A high-performance, cost-optimized routing layer for Claude models with virtual threads support.
- Racing Router - Races multiple models in parallel, returns fastest response
- Adaptive Learning - Automatically identifies best-performing models over time
- StructuredTaskScope - Java 25 structured concurrency for clean, safe parallelism
- Tiered Routing - Routes requests to Haiku, Sonnet, or Opus based on priority
- Cost Optimization - Achieves up to 96% cost savings vs using Opus for everything
- Virtual Threads - Java 25 virtual threads for high concurrency (~9K req/sec)
- Metrics Collection - Tracks latency, success rates, tokens, and costs per model
| Model | Tier | Input $/1M | Output $/1M | Use Case |
|---|---|---|---|---|
| Haiku | 1 | $0.25 | $1.25 | Simple queries, high volume |
| Sonnet | 2 | $3.00 | $15.00 | Balanced tasks |
| Opus | 3 | $15.00 | $75.00 | Complex analysis |
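Per-request cost follows directly from this table. A minimal sketch (the `Model` enum here is illustrative; the repo's `Model.java` may be shaped differently):

```java
public class PricingSketch {
    // Pricing per 1M tokens, taken from the table above
    enum Model {
        HAIKU(0.25, 1.25), SONNET(3.00, 15.00), OPUS(15.00, 75.00);

        final double inputPerMTok, outputPerMTok;
        Model(double in, double out) { inputPerMTok = in; outputPerMTok = out; }

        double costUSD(long inputTokens, long outputTokens) {
            return inputTokens * inputPerMTok / 1_000_000
                 + outputTokens * outputPerMTok / 1_000_000;
        }
    }

    public static void main(String[] args) {
        // Same workload (1,000 input + 500 output tokens) on each tier
        for (Model m : Model.values()) {
            System.out.printf("%s: $%.6f%n", m, m.costUSD(1_000, 500));
        }
    }
}
```

For that workload, Haiku costs $0.000875 per request versus $0.0525 on Opus, which is where the headline savings come from when most requests can be won by Haiku.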
The RacingRouter uses Java 25's StructuredTaskScope to race multiple models in parallel and return the fastest response. This is significantly simpler than the traditional CompletableFuture approach, shown below for comparison.
```java
private LLMResponse raceModels(List<Model> models, LLMRequest request) {
    try (var scope = StructuredTaskScope.open(
            Joiner.<LLMResponse>anySuccessfulResultOrThrow())) {
        // Fork concurrent tasks
        for (Model model : models) {
            scope.fork(() -> executeModel(model, request));
        }
        // Wait for first success - others auto-cancelled
        return scope.join();
    } catch (Exception e) {
        return handleError(e);
    }
}
```

Benefits:
- Automatic cancellation of losing tasks
- Structured lifetime - scope closes cleanly
- Exception handling is straightforward
- No thread pool management
- No Future tracking or cleanup
```java
private LLMResponse raceModelsTraditional(List<Model> models, LLMRequest request) {
    ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
    @SuppressWarnings("unchecked")
    CompletableFuture<LLMResponse>[] futures = new CompletableFuture[models.size()];
    AtomicBoolean completed = new AtomicBoolean(false);
    try {
        // Create futures for each model
        for (int i = 0; i < models.size(); i++) {
            Model model = models.get(i);
            futures[i] = CompletableFuture.supplyAsync(() -> {
                if (completed.get()) {
                    throw new CancellationException("Race already won");
                }
                return executeModel(model, request);
            }, executor);
        }
        // Wait for the first future to complete (anyOf completes on the
        // first completion, whether it succeeded or failed)
        CompletableFuture<Object> anyOf = CompletableFuture.anyOf(futures);
        LLMResponse winner = (LLMResponse) anyOf.get(30, TimeUnit.SECONDS);
        completed.set(true);
        // Manually cancel remaining futures
        for (CompletableFuture<LLMResponse> future : futures) {
            if (!future.isDone()) {
                future.cancel(true);
            }
        }
        return winner;
    } catch (TimeoutException e) {
        // Cancel all on timeout
        for (CompletableFuture<LLMResponse> future : futures) {
            future.cancel(true);
        }
        return handleError(e);
    } catch (Exception e) {
        return handleError(e);
    } finally {
        executor.shutdown();
    }
}
```

Problems:
- Manual cancellation required
- Complex exception handling
- Race conditions with the `completed` flag
- Must manage executor lifecycle
- Type casting from `anyOf()`
- Timeout handling is verbose
| Aspect | StructuredTaskScope | CompletableFuture |
|---|---|---|
| Cancellation | Automatic | Manual |
| Resource Cleanup | try-with-resources | Manual shutdown |
| Type Safety | Full generics | Casting required |
| Thread Management | Built-in | ExecutorService lifecycle |
The RacingRouter adapts its strategy based on accumulated data:
| Phase | Races | Behavior | Cost |
|---|---|---|---|
| Cold Start | 0-100 | Race ALL 3 models | 3x |
| Learning | 100-500 | Race top 2 by win rate | 2x |
| Optimized | 500+ | Single model if >80% wins | 1x |
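The phase transitions above can be sketched as a pure function of the race count and observed win rates. This is an illustrative sketch whose thresholds mirror the `racing_config` defaults shown later; the actual `RacingRouter`, including its exploration rate, may decide differently:

```java
import java.util.List;
import java.util.Map;

public class PhaseSketch {
    enum Phase { COLD_START, LEARNING, OPTIMIZED }

    // Mirrors min_sample_size, learning_threshold, dominance_threshold
    static final int MIN_SAMPLE_SIZE = 100;
    static final int LEARNING_THRESHOLD = 500;
    static final double DOMINANCE_THRESHOLD = 0.80;

    static Phase phase(long totalRaces, double topWinRate) {
        if (totalRaces < MIN_SAMPLE_SIZE) return Phase.COLD_START;
        if (totalRaces < LEARNING_THRESHOLD || topWinRate <= DOMINANCE_THRESHOLD) {
            return Phase.LEARNING;
        }
        return Phase.OPTIMIZED;
    }

    // Which models to race, best win rate first
    static List<String> modelsToRace(long totalRaces, Map<String, Double> winRates) {
        List<String> ranked = winRates.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .toList();
        return switch (phase(totalRaces, winRates.get(ranked.get(0)))) {
            case COLD_START -> ranked;                // race all models: 3x cost
            case LEARNING   -> ranked.subList(0, 2);  // top 2 by win rate: 2x cost
            case OPTIMIZED  -> ranked.subList(0, 1);  // single dominant model: 1x cost
        };
    }

    public static void main(String[] args) {
        Map<String, Double> rates = Map.of("haiku", 0.96, "sonnet", 0.038, "opus", 0.002);
        System.out.println(modelsToRace(50, rates));    // cold start: all three
        System.out.println(modelsToRace(1_000, rates)); // optimized: haiku only
    }
}
```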
- Java 25+
- Gradle 8.14+
```bash
# Clone the repo
git clone https://github.com/SivagurunathanV/claude-router.git
cd claude-router

# Run the server
./gradlew run

# Test the endpoint
curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Hello","priority":"LOW"}'
```

Route a request through the RacingRouter.
```json
{
  "prompt": "Your prompt here",
  "priority": "LOW | MEDIUM | HIGH"
}
```

Racing Behavior:
- Races multiple models in parallel (based on current phase)
- Returns the fastest successful response
- Automatically adapts model selection based on win rates
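For a programmatic client, the same request can be issued with the JDK's built-in `HttpClient`. A sketch (`ChatClientSketch` is hypothetical; it assumes the server from the quick start is listening on localhost:8080):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChatClientSketch {
    static HttpRequest buildChatRequest(String baseUrl, String prompt, String priority) {
        String body = "{\"prompt\":\"" + prompt + "\",\"priority\":\"" + priority + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildChatRequest("http://localhost:8080", "Hello", "LOW");
        // Only send when invoked with "--send", so the sketch runs without a live server
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        } else {
            System.out.println("Would POST to " + request.uri());
        }
    }
}
```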
Get cost, performance metrics, and race statistics.
```json
{
  "models": {
    "haiku": { "totalRequests": 1000, "raceWins": 960, "totalCostUSD": 0.26 },
    "sonnet": { "totalRequests": 500, "raceWins": 38, "totalCostUSD": 7.50 },
    "opus": { "totalRequests": 50, "raceWins": 2, "totalCostUSD": 3.75 }
  },
  "race_stats": {
    "total_races": 1000,
    "current_phase": "optimized",
    "models_racing": ["haiku"],
    "win_rates": { "haiku": 0.96, "sonnet": 0.038, "opus": 0.002 }
  },
  "total_cost_usd": 11.51,
  "cost_savings_pct": 95.9
}
```

View current racing configuration.
```json
{
  "racing_config": {
    "min_sample_size": 100,
    "learning_threshold": 500,
    "dominance_threshold": 0.80,
    "exploration_rate": 0.10
  },
  "current_phase": "optimized",
  "total_races": 1000
}
```

Health check with phase info.
```json
{
  "status": "ok",
  "active_threads": 12,
  "total_cost_usd": 11.51,
  "current_phase": "optimized"
}
```

Reset all metrics and race statistics.
Environment variables:
| Variable | Default | Description |
|---|---|---|
| `PORT` | `8080` | Server port |
| `USE_VIRTUAL_THREADS` | `true` | Enable Java 25 virtual threads |
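Reading these variables might look like the following sketch (`ServerConfigSketch` is illustrative; the repo's `RouterConfig.java` may handle them differently):

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ServerConfigSketch {
    final int port;
    final boolean useVirtualThreads;

    ServerConfigSketch(Map<String, String> env) {
        // Defaults match the table above
        port = Integer.parseInt(env.getOrDefault("PORT", "8080"));
        useVirtualThreads = Boolean.parseBoolean(env.getOrDefault("USE_VIRTUAL_THREADS", "true"));
    }

    // One virtual thread per task when enabled; a bounded platform-thread pool otherwise
    ExecutorService newExecutor() {
        return useVirtualThreads
                ? Executors.newVirtualThreadPerTaskExecutor()
                : Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        ServerConfigSketch cfg = new ServerConfigSketch(System.getenv());
        System.out.println("port=" + cfg.port + " virtualThreads=" + cfg.useVirtualThreads);
    }
}
```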
| Metric | Value |
|---|---|
| P50 Latency | 96ms |
| Throughput (concurrent) | 231 req/s |
| HAIKU Win Rate | 96% |
| Cost Savings vs Opus | 95.9% |
| Metric | RacingRouter | TierRouter |
|---|---|---|
| Strategy | Adaptive racing | Deterministic |
| P50 Latency | 96ms | ~100ms |
| Adapts to Performance | Yes | No |
| Cost (1000 reqs) | $0.46 | $0.29 |
10,000 requests, 1,000 concurrency
| Metric | Virtual | Platform | Improvement |
|---|---|---|---|
| Requests/sec | 3,078 | 1,530 | 2x |
| p50 latency | 103ms | 475ms | 4.6x |
| p95 latency | 420ms | 1,276ms | 3x |
See BENCHMARK_RESULTS.md for detailed results.
```
app/src/main/java/example/
├── api/
│   ├── Model.java           # Model enum with pricing
│   ├── LLMRequest.java      # Request with priority
│   ├── LLMResponse.java     # Response with metrics
│   ├── ModelFactory.java    # Model instantiation
│   └── RequestHandler.java  # Simulated inference
├── router/
│   ├── Router.java          # Router interface
│   ├── RacingRouter.java    # Adaptive racing with StructuredTaskScope
│   ├── RacingConfig.java    # Racing tunable parameters
│   ├── TierRouter.java      # Priority-based routing
│   ├── AdaptiveRouter.java  # Performance-based routing
│   ├── RouterServer.java    # HTTP server
│   └── RouterConfig.java    # Server configuration
└── metrics/
    ├── MetricsCollector.java  # Metrics aggregation + race stats
    └── ModelMetrics.java      # Per-model stats + race wins
```
```bash
./gradlew test
```

```bash
# Start server
./gradlew run &

# Run racing router benchmark (tests all phases)
./scripts/racing_benchmark.sh

# Compare RacingRouter vs TierRouter
./scripts/router_comparison.sh

# Run simple load test
./scripts/simple_load_test.sh LOW 10000 1000
```

MIT