
Searching 3,000 Routes in 10ms


First real version of the engine. We type `npm run route`, wait, wait some more, check if it's frozen, and... 14 seconds later we get a result. For one transfer. Fourteen seconds. This was never going to work.

The math: 4 bridges, 8 chains, up to 3 hops. That's 3,000+ candidate paths. For each one we were fetching quotes, calculating slippage, estimating gas, running the scorer. Sequentially. All of it.

For context, here's what "3,000 candidate paths" means in practice. For a single ETH→SOL transfer of $50K USDC:

```text
Direct routes (1 hop):
  ETH → SOL via Wormhole
  ETH → SOL via deBridge
  ETH → SOL via LayerZero
  ETH → SOL via Allbridge
  = 4 candidates

2-hop routes:
  ETH → ARB → SOL   (4 bridges × 4 bridges = 16 combos)
  ETH → BSC → SOL   (16 combos)
  ETH → AVAX → SOL  (16 combos)
  ETH → MATIC → SOL (16 combos)
  ETH → OP → SOL    (16 combos)
  ETH → BASE → SOL  (16 combos)
  = ~96 × multiple token paths = ~800 candidates

3-hop routes:
  ETH → ARB → BSC → SOL (4 × 4 × 4 = 64 per chain pair)
  ... for every viable intermediate chain combination
  = ~2,200 candidates

total: 3,000+
```

Most of those 3-hop routes are terrible. Nobody's going ETH→AVAX→BSC→SOL to move USDC. But you can't know that without checking, and sometimes a weird multi-hop route actually wins because of liquidity distribution.
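As a sanity check on those numbers: with 4 bridges and 6 viable intermediate chains (the 8 supported chains minus source and destination), the counts above fall out of a small combinatorial sketch. The constants and helper names here are illustrative, not engine code.

```typescript
// Assumption: 4 bridges, 6 intermediate chains (ETH and SOL excluded).
const BRIDGES = 4;
const INTERMEDIATES = 6;

// Ordered sequences of k distinct intermediate chains: P(n, k).
function permutations(n: number, k: number): number {
  let result = 1;
  for (let i = 0; i < k; i++) result *= n - i;
  return result;
}

// An h-hop route picks one bridge per hop and h-1 distinct
// intermediate chains, giving BRIDGES^h × P(INTERMEDIATES, h-1).
function candidates(hops: number): number {
  return BRIDGES ** hops * permutations(INTERMEDIATES, hops - 1);
}

console.log(candidates(1)); // 4 direct routes
console.log(candidates(2)); // 96 two-hop combos
console.log(candidates(3)); // 1920 three-hop combos
```

That's roughly 2,000 chain/bridge combinations before the token-path variants multiply the total past 3,000.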

Attempt 1: Parallelize the API Calls

Obviously. Fetch all bridge quotes at the same time instead of one by one. Went from 14s to about 4s. Cool. Still too slow.

```typescript
// before: sequential. obviously stupid in retrospect.
for (const route of allRoutes) {
  const quote = await route.bridge.getQuote(route.params);
  scores.push(evaluate(quote));
}

// after: parallel fetching. better but not enough.
const quotes = await Promise.all(
  allRoutes.map(r => r.bridge.getQuote(r.params).catch(() => null))
);
const scores = quotes.filter(Boolean).map(evaluate);
```

Profiled it and the API calls weren't even the main bottleneck anymore. It was the scoring loop — running 3,000 routes through the evaluation function after quotes came back. Each evaluation hits the slippage model, the gas estimator, the MEV exposure calculator, and the reliability scorer. Individually fast. 3,000 times, not fast.
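For a sense of the shape of that loop: each score is a composition of the four sub-models. The interfaces and constants below are invented for illustration; the post names the models but not their signatures.

```typescript
// Hypothetical quote shape and sub-models — illustrative only.
interface Quote { amountOut: number; feeUsd: number; }

function slippageCost(q: Quote): number { return q.amountOut * 0.0005; } // made-up rate
function gasCost(_q: Quote): number { return 12; }                       // flat USD, made-up
function mevExposure(q: Quote): number { return q.amountOut * 0.0001; }  // made-up rate
function reliabilityPenalty(_q: Quote): number { return 5; }             // made-up

// One evaluation is cheap. Running it 3,000 times in a tight loop,
// with each sub-model doing real work, is where the time went.
function evaluate(q: Quote): number {
  return -(q.feeUsd + slippageCost(q) + gasCost(q)
           + mevExposure(q) + reliabilityPenalty(q));
}
```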

Attempt 2: Cache Everything

Cache bridge state, refresh every 5 seconds, evaluate against cached data. Got it to ~800ms. Problem: in a volatile 5-second window, pool conditions change enough to produce garbage recommendations.
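A minimal sketch of what Attempt 2 looked like, assuming a generic per-bridge state fetch behind a TTL cache; the class and method names are ours, not the engine's.

```typescript
interface CacheEntry<T> { value: T; fetchedAt: number; }

// Generic time-to-live cache: serve the cached value if it's younger
// than ttlMs, otherwise re-fetch and store.
class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  constructor(private ttlMs: number) {}

  async get(key: string, fetch: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && Date.now() - hit.fetchedAt < this.ttlMs) {
      return hit.value; // stale-but-recent state, up to ttlMs old
    }
    const value = await fetch();
    this.entries.set(key, { value, fetchedAt: Date.now() });
    return value;
  }
}

// Hypothetical usage, 5-second window as in the post:
// const bridgeState = new TtlCache<BridgeState>(5_000);
// const state = await bridgeState.get("wormhole", () => wormhole.fetchState());
```

The 5-second window is exactly what bit us: it's long enough for pool conditions to move under a volatile market.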

We tested this against live data for a week. Ran both the cached version and the live version side by side, compared their recommendations.

```text
cache test results (1 week, 83 test transfers):
  same recommendation as live: 70 (84.3%)
  different recommendation:    13 (15.7%)

  of the 13 differences:
    cached was worse by <$100:    5
    cached was worse by $100-500: 4
    cached was worse by >$500:    4  ← not acceptable
```

15.7% wrong is worse than no engine at all. If users can't trust the recommendation, they'll just go back to their default bridge. Scrapped it.

Attempt 3: Stop Doing Stupid Work

Should have started here honestly. We were evaluating all 3,000 routes when the vast majority were obviously bad. A 3-hop route through congested chains when a direct bridge is sitting right there? Don't even score it.

Alpha-beta pruning. The technique dates to 1975, from Knuth and Moore's analysis of it. The idea: maintain a window of scores that can still change the answer. While evaluating a route, if even its most optimistic outcome can't beat the best found so far, stop immediately and move on.

```typescript
function search(
  node: RouteNode,
  depth: number,
  alpha: number,  // best score we can guarantee
  beta: number    // best score the opponent can guarantee
): number {
  if (depth === 0 || node.isTerminal()) {
    return evaluate(node);
  }

  for (const child of orderMoves(node.children)) {
    const score = -search(child, depth - 1, -beta, -alpha);

    if (score >= beta) {
      // this branch can't beat what we already have.
      // everything below here is a waste of time. cut it.
      return beta;
    }

    if (score > alpha) {
      alpha = score; // new best found
    }
  }

  return alpha;
}
```

The algorithm is simple. Getting it to work well is not. The key insight: pruning effectiveness depends entirely on move ordering. If you evaluate good routes first, the alpha-beta window tightens quickly, and you prune more aggressively. If you evaluate bad routes first, you barely prune anything.

We spent about a week on move ordering alone. The heuristic we landed on:

```typescript
function orderMoves(candidates: RouteNode[]): RouteNode[] {
  return candidates.sort((a, b) => {
    // cheap heuristic: estimate total cost without full evaluation
    const costA = a.estimatedFees + a.estimatedSlippage * 1.5;
    const costB = b.estimatedFees + b.estimatedSlippage * 1.5;

    // prefer direct routes over multi-hop
    const hopPenaltyA = (a.hops - 1) * 500;
    const hopPenaltyB = (b.hops - 1) * 500;

    return (costA + hopPenaltyA) - (costB + hopPenaltyB);
  });
}
```

The 1.5x slippage multiplier in the heuristic and the $500 per-hop penalty are hand-tuned from looking at which routes actually ended up winning. Not scientific. Works well enough to get 90%+ prune rate.

Result: 3,000 routes fully searched in under 10 milliseconds. 14 seconds to 10ms. The remaining ~300 routes that survive pruning get full scoring across all five dimensions.

Where the Time Actually Goes Now

With search at 10ms, the bottleneck shifted completely.

```text
timing breakdown (typical ETH→SOL $50K query):
  state collection (bridge APIs): 800ms - 2.1s
    ├─ wormhole API:  200-400ms
    ├─ debridge API:  150-350ms
    ├─ layerzero API: 300-600ms
    └─ allbridge API: 200-500ms
  route search + scoring: 6-12ms
  result formatting: <1ms

  total:                   ~1-3s
  our code:                ~10ms
  waiting for bridge APIs: ~1-2s
```

We can't make Wormhole's API respond faster. But we can start working before all responses arrive. The search now runs incrementally — as soon as the first bridge responds, we start pruning. When the second bridge arrives, we refine. By the time the slowest bridge responds, we've already narrowed the candidate set significantly.
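The incremental pattern can be sketched like this, assuming each bridge's quotes arrive as an independent promise. `incrementalSearch` and `refine` are illustrative names, not the engine's API.

```typescript
interface BridgeQuote { bridge: string; score: number; }

// Process bridge responses in arrival order: refine the best-known
// result as each one lands instead of blocking on the slowest API.
async function incrementalSearch(
  pending: Promise<BridgeQuote[]>[],
  refine: (quotes: BridgeQuote[], bestSoFar: number) => number
): Promise<number> {
  let best = -Infinity;
  await Promise.all(
    pending.map(p =>
      p
        .then(quotes => { best = refine(quotes, best); })
        // a failed or timed-out bridge simply contributes nothing
        .catch(() => {})
    )
  );
  return best;
}
```

By the time the last promise resolves, `refine` has already run against every earlier response, so the final bridge usually only confirms the candidate set rather than reshaping it.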

In practice, the route that's optimal after all four bridges respond is the same as the route we identified after three bridges about 85% of the time. The fourth bridge confirms rather than changes the result.