First real version of the engine. We type npm run route, wait, wait some more, check if it's frozen, and... 14 seconds later we get a result. For one transfer. Fourteen seconds. This was never going to work.
The math: 4 bridges, 8 chains, up to 3 hops. That's 3,000+ candidate paths. For each one we were fetching quotes, calculating slippage, estimating gas, running the scorer. Sequentially. All of it.
For context, here's what "3,000 candidate paths" means in practice. For a single ETH→SOL transfer of $50K USDC:
```
Direct routes (1 hop):
  ETH → SOL via Wormhole
  ETH → SOL via deBridge
  ETH → SOL via LayerZero
  ETH → SOL via Allbridge
  = 4 candidates

2-hop routes:
  ETH → ARB → SOL (4 bridges × 4 bridges = 16 combos)
  ETH → BSC → SOL (16 combos)
  ETH → AVAX → SOL (16 combos)
  ETH → MATIC → SOL (16 combos)
  ETH → OP → SOL (16 combos)
  ETH → BASE → SOL (16 combos)
  = ~96 × multiple token paths = ~800 candidates

3-hop routes:
  ETH → ARB → BSC → SOL (4×4×4 = 64 per chain pair)
  ... for every viable intermediate chain combination
  = ~2,200 candidates

total: 3,000+
```

Most of those 3-hop routes are terrible. Nobody's going ETH→AVAX→BSC→SOL to move USDC. But you can't know that without checking, and sometimes a weird multi-hop route actually wins because of liquidity distribution.
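The enumeration above is easy to sanity-check in a few lines. This is a sketch, not the engine's enumerator: it assumes 4 bridges per leg and the 6 intermediate chains named in the text, and it leaves out the per-token-path multiplier, so the totals here are lower bounds on the figures quoted above.

```typescript
// Candidate-count sketch. Assumes 4 bridges per hop and the 6 intermediate
// chains listed above; ignores the token-path multiplier, so these are
// lower bounds on the ~800 / ~2,200 figures in the text.
const BRIDGES = 4;
const INTERMEDIATES = ["ARB", "BSC", "AVAX", "MATIC", "OP", "BASE"];

function candidateCount(hops: number): number {
  if (hops === 1) return BRIDGES; // direct: one bridge choice
  if (hops === 2) return INTERMEDIATES.length * BRIDGES ** 2; // 6 × 16 = 96
  // 3 hops: ordered pairs of distinct intermediates, 4 bridges per leg
  const pairs = INTERMEDIATES.length * (INTERMEDIATES.length - 1); // 30
  return pairs * BRIDGES ** 3; // 30 × 64 = 1,920
}

const lowerBound = candidateCount(1) + candidateCount(2) + candidateCount(3);
```

Even without token paths, the 3-hop tier dominates the count, which is why pruning there matters most.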
Attempt 1: Parallelize the API Calls
Obviously. Fetch all bridge quotes at the same time instead of one by one. Went from 14s to about 4s. Cool. Still too slow.
```typescript
// before: sequential. obviously stupid in retrospect.
for (const route of allRoutes) {
  const quote = await route.bridge.getQuote(route.params);
  scores.push(evaluate(quote));
}

// after: parallel fetching. better but not enough.
const quotes = await Promise.all(
  allRoutes.map(r => r.bridge.getQuote(r.params).catch(() => null))
);
const scores = quotes.filter(Boolean).map(evaluate);
```

Profiled it and the API calls weren't even the main bottleneck anymore. It was the scoring loop — running 3,000 routes through the evaluation function after the quotes came back. Each evaluation hits the slippage model, the gas estimator, the MEV exposure calculator, and the reliability scorer. Individually fast. 3,000 times, not fast.
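One detail worth making explicit about the parallel version: with plain `Promise.all`, a single slow bridge still gates the whole batch. A per-call timeout bounds that. This is a sketch under assumptions — the 2-second budget is illustrative, not a number from our production config.

```typescript
// Per-call timeout so one slow bridge API can't stall the whole batch.
// Resolves to null on timeout or error, matching the .catch(() => null)
// convention used above. The 2000ms budget is an illustrative assumption.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | null> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(null), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      () => { clearTimeout(timer); resolve(null); } // treat errors like misses
    );
  });
}
```

Quotes then become `withTimeout(r.bridge.getQuote(r.params), 2000)` inside the `Promise.all`, and the existing `filter(Boolean)` drops timeouts and failures alike.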
Attempt 2: Cache Everything
Cache bridge state, refresh every 5 seconds, evaluate against cached data. Got it to ~800ms. Problem: in a volatile 5-second window, pool conditions change enough to produce garbage recommendations.
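For reference, the caching layer was conceptually nothing fancier than a TTL map. A minimal sketch, assuming a string cache key and the 5-second window from the text; the fetcher signature is hypothetical:

```typescript
// Minimal TTL cache matching the 5s refresh window described above.
// The fetch callback stands in for the real bridge-state fetcher.
class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  constructor(private ttlMs: number) {}

  async get(key: string, fetch: () => Promise<V>): Promise<V> {
    const hit = this.store.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // serve cached state
    const value = await fetch();
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
    return value;
  }
}
```

`new TtlCache(5000)` matches the 5-second refresh. The mechanics were never the problem; the staleness was.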
We tested this against live data for a week. Ran both the cached version and the live version side by side, compared their recommendations.
```
cache test results (1 week, 83 test transfers):
  same recommendation as live: 70 (84.3%)
  different recommendation:    13 (15.7%)

of the 13 differences:
  cached was worse by <$100:    5
  cached was worse by $100-500: 4
  cached was worse by >$500:    4  ← not acceptable
```

15.7% wrong is worse than no engine at all. If users can't trust the recommendation, they'll just go back to their default bridge. Scrapped it.
Attempt 3: Stop Doing Stupid Work
Should have started here honestly. We were evaluating all 3,000 routes when the vast majority were obviously bad. A 3-hop route through congested chains when a direct bridge is sitting right there? Don't even score it.
Alpha-beta pruning. The concept is from 1975 — Knuth and Moore. The idea: you maintain a window of scores still worth caring about, bounded below by the best result found so far. As you evaluate a route, if even its most optimistic outcome can't beat that best, you stop immediately and move on.
```typescript
function search(
  node: RouteNode,
  depth: number,
  alpha: number, // best score we can guarantee
  beta: number   // best score opponent can guarantee
): number {
  if (depth === 0 || node.isTerminal()) {
    return evaluate(node);
  }

  for (const child of orderMoves(node.children)) {
    const score = -search(child, depth - 1, -beta, -alpha);

    if (score >= beta) {
      // this branch can't beat what we already have.
      // everything below here is a waste of time. cut it.
      return beta;
    }

    if (score > alpha) {
      alpha = score; // new best found
    }
  }

  return alpha;
}
```

The algorithm is simple. Getting it to work well is not. The key insight: pruning effectiveness depends entirely on move ordering. If you evaluate good routes first, the alpha-beta window tightens quickly and you prune aggressively. If you evaluate bad routes first, you barely prune anything.
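The ordering effect is easy to demonstrate on a flattened version of the problem. This is a toy, not the engine's search: each candidate carries a cheap optimistic bound plus a true score, and we skip any candidate whose bound can't beat the best so far. Same candidate set, two orderings, very different amounts of work.

```typescript
// Toy pruning demo: skip candidates whose optimistic bound can't beat
// the best score found so far. Higher score = better route here.
interface Candidate { bound: number; score: number } // invariant: bound >= score

function prunedSearch(cands: Candidate[]): { best: number; evaluated: number } {
  let best = -Infinity;
  let evaluated = 0;
  for (const c of cands) {
    if (c.bound <= best) continue; // can't win even in the optimistic case
    evaluated++;                   // the expensive "full evaluation" happens here
    if (c.score > best) best = c.score;
  }
  return { best, evaluated };
}

// 100 candidates, true scores 1..100, bound = score + 5 (a cheap overestimate).
const cands = Array.from({ length: 100 }, (_, i) => ({ bound: i + 6, score: i + 1 }));
const goodOrder = [...cands].sort((a, b) => b.bound - a.bound); // best-first
const badOrder  = [...cands].sort((a, b) => a.bound - b.bound); // worst-first

const good = prunedSearch(goodOrder); // evaluates 5 of 100
const bad  = prunedSearch(badOrder);  // evaluates all 100
```

Both orderings find the same best route; best-first just gets there after evaluating a handful of candidates instead of every one. That's the entire payoff of move ordering.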
We spent about a week on move ordering alone. The heuristic we landed on:
```typescript
function orderMoves(candidates: RouteNode[]): RouteNode[] {
  // copy before sorting so we don't mutate the caller's array
  return [...candidates].sort((a, b) => {
    // cheap heuristic: estimate total cost without full evaluation
    const costA = a.estimatedFees + a.estimatedSlippage * 1.5;
    const costB = b.estimatedFees + b.estimatedSlippage * 1.5;

    // prefer direct routes over multi-hop
    const hopPenaltyA = (a.hops - 1) * 500;
    const hopPenaltyB = (b.hops - 1) * 500;

    return (costA + hopPenaltyA) - (costB + hopPenaltyB);
  });
}
```

The 1.5× slippage multiplier and the $500 per-hop penalty are hand-tuned from looking at which routes actually ended up winning. Not scientific. Works well enough to get a 90%+ prune rate.
Result: 3,000 routes fully searched in under 10 milliseconds. 14 seconds to 10ms. The remaining ~300 routes that survive pruning get full scoring across all five dimensions.
Where the Time Actually Goes Now
With search at 10ms, the bottleneck shifted completely.
```
timing breakdown (typical ETH→SOL $50K query):
  state collection (bridge APIs): 800ms - 2.1s
    ├─ wormhole API:  200-400ms
    ├─ debridge API:  150-350ms
    ├─ layerzero API: 300-600ms
    └─ allbridge API: 200-500ms
  route search + scoring: 6-12ms
  result formatting: <1ms

  total: ~1-3s
  our code: ~10ms
  waiting for bridge APIs: ~1-2s
```

We can't make Wormhole's API respond faster. But we can start working before all the responses arrive. The search now runs incrementally — as soon as the first bridge responds, we start pruning. When the second response arrives, we refine. By the time the slowest bridge responds, we've already narrowed the candidate set significantly.
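The incremental behavior can be sketched without any async machinery: fold the quotes in arrival order, keep a running best, and note when the answer stops changing. The `Quote` shape and cost numbers here are illustrative, not real bridge data.

```typescript
// Incremental refinement sketch: quotes arrive one bridge at a time
// (fastest API first); we keep a running best instead of waiting for
// the full set. Lower costUsd = better. All values are illustrative.
interface Quote { bridge: string; costUsd: number }

function refineIncrementally(arrivals: Quote[]): { best: Quote; settledAfter: number } {
  let best = arrivals[0];
  let settledAfter = 1; // how many responses it took to reach the final answer
  arrivals.forEach((q, i) => {
    if (q.costUsd < best.costUsd) {
      best = q;
      settledAfter = i + 1;
    }
  });
  return { best, settledAfter };
}

// Arrival order roughly follows the latency breakdown above.
const arrivals: Quote[] = [
  { bridge: "debridge",  costUsd: 312 },
  { bridge: "wormhole",  costUsd: 280 },
  { bridge: "allbridge", costUsd: 295 },
  { bridge: "layerzero", costUsd: 330 },
];
const { best, settledAfter } = refineIncrementally(arrivals);
```

In this toy run the answer settles after the second response and the last two bridges only confirm it, which is the shape of the 85% figure below.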
In practice, the route that's optimal after all four bridges respond is the same as the route we identified after three bridges about 85% of the time. The fourth bridge confirms rather than changes the result.