
Statistical arbitrage

Status: commoditised. Reviewed 4 June 2026. As of 2026: widely known and implemented; the edge is in execution, not the idea.

Trade a portfolio of correlated instruments on the bet that temporary statistical relationships revert. The edge is the average behaviour of many small, diversified positions, not any single forecast.

The idea

[Figure: Statistical arbitrage, annotated diagram]


What is statistical arbitrage, precisely?

Statistical arbitrage is a family of strategies that profit from the relative mispricing of related instruments rather than the direction of any one of them. You build a portfolio whose value is a mean-reverting spread, enter when the spread sits statistically far from its mean, and exit when it reverts. The "arbitrage" is statistical, not riskless: the spread can diverge before it converges.

Intuition first. Two assets that are economically linked (Coca-Cola and Pepsi, an ETF and its basket, a future and its cash index) should move together. When they drift apart for no fundamental reason, the gap is a spread that tends to close. Stat arb buys the laggard, sells the leader, and waits for the gap to shut. Your profit is the spread closing, regardless of whether the market as a whole rises or falls.

The defining property is market neutrality: the long and short legs are sized so the portfolio has little net exposure to the common factor (the market, the sector, the index). You are not betting on direction; you are betting on convergence. And "arbitrage" is a loose word here. A true arbitrage is riskless and self-financing; statistical arbitrage is neither. It is a bet that a historically estimated relationship will hold long enough for the spread to revert before your stop or your capital runs out. The edge is statistical (positive expected value over many independent bets), not a guarantee on any single one.
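To make "sized so the portfolio has little net exposure" concrete: if each leg's exposure to the common factor is its notional times its market beta, neutrality fixes the size of the short leg. A minimal sketch, assuming a single common factor and betas you have already estimated; the beta values, notionals and function names below are hypothetical. This market beta is a different quantity from the price hedge ratio $\beta$ used in the spread.

```python
def neutral_short_notional(long_notional: float, beta_long: float, beta_short: float) -> float:
    """Short-leg notional whose factor exposure cancels the long leg's.

    Exposure of a leg = notional * market beta (single-factor assumption),
    so neutrality needs long_notional * beta_long == short_notional * beta_short.
    """
    if beta_short <= 0:
        raise ValueError("short leg must have positive factor exposure to hedge with")
    return long_notional * beta_long / beta_short


def net_factor_exposure(long_notional: float, beta_long: float,
                        short_notional: float, beta_short: float) -> float:
    """Net dollar exposure to the common factor (positive = net long the factor)."""
    return long_notional * beta_long - short_notional * beta_short


# Hypothetical inputs: $100,000 long a beta-1.2 name, hedged with a beta-0.9 name.
short_notional = neutral_short_notional(100_000, 1.2, 0.9)
print(f"short leg: ${short_notional:,.2f}")  # short leg: $133,333.33
residual = net_factor_exposure(100_000, 1.2, short_notional, 0.9)
print(abs(residual) < 1e-6)  # True
```

Note that a dollar-neutral book is not factor-neutral here: $100,000 long at beta 1.2 against $100,000 short at beta 0.9 still leaves $30,000 of net factor exposure.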

A stat-arb position is a spread: long the cheap leg, short the rich leg, scaled by a hedge ratio so the combination is market-neutral. You profit as the spread reverts to its mean, not on either leg's direction.
$$S_t = P_{A,t} - \beta\,P_{B,t}, \qquad z_t = \frac{S_t - \mu_S}{\sigma_S}$$
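A minimal sketch of the two formulas above in plain Python, assuming the price series are already time-aligned and that $\beta$, $\mu_S$ and $\sigma_S$ come from a separate fit window (the function names are illustrative, not from any library). Estimating $\mu_S$ and $\sigma_S$ on the same data you then trade would be look-ahead.

```python
from statistics import mean, stdev


def spread(p_a: list[float], p_b: list[float], beta: float) -> list[float]:
    """S_t = P_{A,t} - beta * P_{B,t} for aligned price series."""
    if len(p_a) != len(p_b):
        raise ValueError("price series must be the same length (time-aligned)")
    return [a - beta * b for a, b in zip(p_a, p_b)]


def fit_mean_std(fit_window: list[float]) -> tuple[float, float]:
    """Estimate mu_S and sigma_S (sample standard deviation) on a fit window."""
    return mean(fit_window), stdev(fit_window)


def zscores(s: list[float], mu: float, sigma: float) -> list[float]:
    """z_t = (S_t - mu_S) / sigma_S."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return [(x - mu) / sigma for x in s]


# Synthetic prices; beta = 0.8, the same value used in the worked example.
s = spread([10.0, 10.4, 10.8], [10.0, 10.0, 10.0], beta=0.8)
print([round(x, 2) for x in s])         # [2.0, 2.4, 2.8]
print(zscores([0.8, 0.0], 0.0, 0.40))   # [2.0, 0.0]
```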

Where does the mean-reversion come from?

The reversion comes from an economic or mechanical link that ties the instruments together: substitutes, an index-and-constituents relationship, a creation/redemption mechanism, or a lead-lag in how information propagates. When the link is real, deviations are temporary noise; when it is spurious, the "spread" is a coin flip. The entire skill is telling the two apart.

Economic linkage means substitutes, a shared supply chain or a shared factor: two airlines, two gold miners, two payment networks. The shared driver means idiosyncratic divergence tends to revert; this is the classic pairs trade.

Mechanical linkage is a contractual or structural identity: an ETF must trade near its net asset value because authorised participants can create and redeem shares; a future must trade near spot plus carry because of cash-and-carry arbitrage; an index must equal its weighted constituents. Arbitrageurs enforce these links, so deviations are smaller and close faster, but they revert more reliably. That is index arb and ETF arb.

Microstructure linkage is the bid-ask bounce and transient order-flow pressure: on very short horizons, the overshoot from a large taker order reverts as liquidity replenishes. This is intraday mean reversion, a different beast: mechanical and tiny, not economic.

The danger throughout is the spurious relationship: two series that correlated historically by coincidence, with no link to enforce reversion. Correlation is not enough; you need cointegration (a stationary spread), and even that can break when the regime changes. The correlation-versus-cointegration distinction is developed in full on pairs trading.
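The correlation-versus-cointegration point can be seen in a few lines. This is a minimal sketch of the first step of an Engle–Granger-style check, not the full test: fit the hedge ratio by OLS, then measure how strongly the residual spread pulls back via its AR(1) coefficient $\phi$ ($\phi$ well below 1 suggests reversion; $\phi$ near 1 suggests a random walk). A real test compares an ADF statistic on the residuals against Engle–Granger critical values (for example `statsmodels.tsa.stattools.coint`). The data are synthetic and seeded, and every helper name here is hypothetical. Two independent random walks, like A and C below, can still show a sizeable correlation by chance.

```python
import random


def ols_fit(y: list[float], x: list[float]) -> tuple[float, float]:
    """Intercept and slope (the hedge ratio) of y = alpha + beta * x by OLS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta


def ar1_phi(e: list[float]) -> float:
    """Least-squares phi in e_t = phi * e_{t-1} + noise (OLS residuals have mean ~0)."""
    num = sum(cur * prev for cur, prev in zip(e[1:], e[:-1]))
    den = sum(prev * prev for prev in e[:-1])
    return num / den


def residual_phi(y: list[float], x: list[float]) -> tuple[float, float]:
    """Fit the hedge ratio, then return (beta, AR(1) phi of the residual spread)."""
    alpha, beta = ols_fit(y, x)
    resid = [yi - alpha - beta * xi for yi, xi in zip(y, x)]
    return beta, ar1_phi(resid)


def random_walk(n: int, start: float, rng: random.Random) -> list[float]:
    """Synthetic price path with standard-normal daily steps."""
    path, p = [], start
    for _ in range(n):
        p += rng.gauss(0.0, 1.0)
        path.append(p)
    return path


rng = random.Random(7)
n = 2000
p_b = random_walk(n, 100.0, rng)

# Cointegrated: A = 5 + 0.8 * B + a stationary AR(1) deviation with phi = 0.9.
u, p_a = 0.0, []
for b in p_b:
    u = 0.9 * u + rng.gauss(0.0, 0.5)
    p_a.append(5.0 + 0.8 * b + u)

# Unrelated: an independent random walk with no link to B.
p_c = random_walk(n, 100.0, rng)

beta_ab, phi_ab = residual_phi(p_a, p_b)
beta_cb, phi_cb = residual_phi(p_c, p_b)
print(f"A vs B: beta={beta_ab:.3f}, phi={phi_ab:.3f}")  # expect beta near 0.8, phi near 0.9
print(f"C vs B: beta={beta_cb:.3f}, phi={phi_cb:.3f}")  # expect phi close to 1: no reliable reversion
```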

What's in the family? The map of stat-arb variants

Statistical arbitrage spans pairs and baskets (relative value between two or many names), index and ETF arbitrage (the instrument versus its constituents), and intraday or microstructure mean reversion (very-short-horizon reversal). All share the same core (construct a mean-reverting spread, trade its deviations), but they differ in horizon, in what enforces the reversion, and in how crowded they are.

Pairs trading trades the spread between two cointegrated names, enforced by economic linkage, over minutes to days: commoditised in liquid equities, with surviving alpha in crypto, small caps and faster horizons. Basket / multi-name trades a portfolio long-short against a factor, enforced by statistical factor structure; this is the crowded space of WorldQuant- and Two Sigma-style quant shops, where the edge is in the residual model. Index arbitrage trades an index future against its cash basket, enforced by cash-and-carry no-arbitrage over seconds to minutes: mechanical, low-margin and latency-gated. ETF arbitrage trades an ETF's price against its NAV, enforced by creation/redemption through authorised participants: structural and tight in liquid ETFs, wider in illiquid or leveraged ones. Intraday mean reversion trades short-horizon price against a local fair value, enforced by microstructure: niche alpha, mechanical, tiny and cost-sensitive.

The unifying picture is that every one of those constructs a spread (a linear combination of prices designed to be stationary) and trades its deviations from a fitted mean. The faster and more mechanical the linkage (index and ETF arb), the tighter and more reliable the reversion, but the lower the margin and the more it becomes a latency game, not a modelling game. The slower and more economic the linkage (pairs and baskets), the richer the spread but the higher the risk it never reverts. The maths of the spread (cointegration, the z-score, the half-life) is developed in full on pairs trading and reused across the whole family.

The lifecycle of a stat-arb strategy, and why it decays

A stat-arb strategy has a predictable arc: discovery (a spread that reverts), profitability (you trade it before others do), crowding (others find it, deviations shrink and reversions speed up), and death (the spread is arbitraged on entry, so there is nothing left to capture after costs). The decay is structural: publishing or scaling an edge accelerates it.

Discovery. You find a spread with a statistically significant tendency to revert: a half-life short enough to trade and deviations large enough to clear costs. Beware: most "discoveries" are backtest artefacts (survivorship bias, look-ahead bias, or multiple testing across thousands of pairs). Run an Engle–Granger test on 5,000 unrelated pairs at $p < 0.05$ and roughly 250 will pass by chance, so a single passing pair is exactly what noise produces.

Profitability. While few trade it, deviations are large and reversions slow enough to capture. You earn the reversion minus costs, and costs are the whole story: the gross deviation must clear the bid-ask spread, plus fees and market impact on both legs, twice (in and out).
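The multiple-testing trap in the Discovery step is plain arithmetic. A minimal sketch, assuming the tests are independent (overlapping pairs are not, so treat the numbers as indicative); the function names are hypothetical, and the Bonferroni threshold on the last line is one simple, conservative correction shown for illustration rather than as the article's prescription.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests that pass by chance when no relationship is real."""
    return n_tests * alpha


def prob_any_false_positive(n_tests: int, alpha: float) -> float:
    """Probability that at least one of n independent null tests passes."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_pairs, alpha = 5_000, 0.05
print(f"{expected_false_positives(n_pairs, alpha):.0f}")  # 250
print(f"{prob_any_false_positive(n_pairs, alpha):.6f}")   # 1.000000
print(f"{alpha / n_pairs:.0e}")  # 1e-05, the Bonferroni per-test threshold
```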

Crowding. As capital piles in (see capacity & alpha decay), deviations get arbitraged faster and stay smaller. Each trade now captures less reversion while the cost of crossing both legs in and out stays the same, so the net edge shrinks toward zero. This is why the classic equity pairs trade, public for decades, is dead in liquid names.

Regime break. This is the other way a stat-arb strategy dies: not gradual decay but a structural break, where the cointegrating relationship stops holding (a merger, a sector shock, a constituent change). The spread that always reverted now diverges and keeps diverging. This is how stat-arb books blow up: the August 2007 "quant quake", when crowded equity market-neutral books all delevered the same positions at once, is the canonical case.

Relative-value P&L is negatively skewed: many small wins as spreads revert, the occasional large loss when one breaks. A diversified book of spreads does not protect you when the same crowd holds them all and unwinds at once.
$$\text{net edge per trade} \;\approx\; \underbrace{|\Delta S|}_{\text{gross reversion}} \;-\; \underbrace{c_{\text{legs}\times\text{in/out}}}_{\text{fixed cost floor}}$$
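The cost floor in the formula above can be written as a one-line function. A minimal sketch, assuming a flat cost per spread-crossing and two legs each crossed twice (in and out), which matches the four crossings used in the worked example; the function name and the 0.10 cost are illustrative.

```python
def net_edge(gross_reversion: float, cost_per_crossing: float,
             legs: int = 2, crossings_per_leg: int = 2) -> float:
    """|Delta S| minus a fixed cost floor of legs * crossings_per_leg * cost."""
    return abs(gross_reversion) - legs * crossings_per_leg * cost_per_crossing


# Crowding shrinks the gross reversion; the cost floor does not move.
for gross in (0.80, 0.60, 0.40, 0.20):
    print(f"gross {gross:.2f} -> net {net_edge(gross, 0.10):+.2f}")
# gross 0.80 -> net +0.40
# gross 0.60 -> net +0.20
# gross 0.40 -> net +0.00
# gross 0.20 -> net -0.20
```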

The honest takeaway: stat arb is not a static edge you own; it is a renewable search for spreads, run against decay and the ever-present risk of a break. The durable advantage is the research-and-execution machine, not any single pair. Relative value as a mechanism is permanent (see fat tails for why the loss tail matters); any specific spread's edge decays.

Is statistical arbitrage still profitable in 2026?

Partly. The slow, classic pairs trade in liquid large-cap equities is competed away: spreads are arbitraged in milliseconds and net of costs there is nothing left. Edge survives where the field is thinner or slower: faster horizons, small caps, ETFs and baskets, cross-venue and crypto relationships, and in modelling spread dynamics better than the consensus.

Dead / commoditised: the textbook two-stock pairs trade in S&P 500 names on a daily horizon: public since the 1990s, arbitraged on entry. Live, for the equipped: intraday relative value with good data and low costs; ETF and basket arb where creation/redemption frictions leave room; crypto pairs and cross-exchange relationships, where the venues are open and the field is younger; and statistically modelled residual portfolios where your factor model and execution beat the crowd. What AI changes: machine learning helps select and model spreads (which pairs cointegrate out-of-sample, how the dynamics shift with regime), but it does not conjure reversion where the economic link is absent, and it intensifies crowding by handing everyone the same tools. See machine learning in HFT and what AI changes.

For the brand-level answer to "is HFT still profitable?", see is HFT still profitable in 2026, which places stat arb against the other segments.

Worked example

A schematic stat-arb trade on a cointegrated pair, as of 2026, synthetic and illustrative. Take two names $A$ and $B$ with a fitted hedge ratio $\beta = 0.8$ from a cointegrating regression (see pairs trading). The spread is $S = P_A - 0.8\,P_B$. Over the fit window the spread has mean $\mu = 0$ and standard deviation $\sigma = 0.40$, so the z-score is $z = (S - \mu)/\sigma$.

Entry. The spread widens to $S = +0.80$, so $z = +2.0$: two standard deviations rich. You sell one unit of A and buy 0.8 units of B (short the spread), betting it reverts toward zero. Exit. The spread reverts to $S = 0$ ($z = 0$); you unwind. Gross P&L is $\approx 0.80$ per unit of spread (the distance the spread travelled), and the position is market-neutral throughout.
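The entry and exit rules above amount to a small state machine on the z-score. A minimal sketch, assuming symmetric bands (enter at $|z| \ge 2$, exit when $z$ crosses back to 0), one unit of spread, positions changed at each bar's close, and no stop-loss or costs; the function names and the spread path are hypothetical.

```python
def band_positions(z: list[float], entry_z: float = 2.0, exit_z: float = 0.0) -> list[int]:
    """Position in the spread after each bar: -1 short, +1 long, 0 flat."""
    pos, out = 0, []
    for zt in z:
        if pos == 0:
            if zt >= entry_z:
                pos = -1          # spread rich: sell A, buy beta units of B
            elif zt <= -entry_z:
                pos = 1           # spread cheap: buy A, sell beta units of B
        elif pos == -1 and zt <= exit_z:
            pos = 0               # short spread has reverted: unwind
        elif pos == 1 and zt >= -exit_z:
            pos = 0               # long spread has reverted: unwind
        out.append(pos)
    return out


def spread_pnl(s: list[float], positions: list[int]) -> float:
    """Gross P&L per unit of spread: each bar's position times the next bar's spread change."""
    return sum(p * (s1 - s0) for p, s0, s1 in zip(positions[:-1], s[:-1], s[1:]))


sigma = 0.40
s_path = [0.20, 0.48, 0.80, 0.56, 0.24, 0.00, -0.12]   # synthetic spread, mu = 0
pos = band_positions([x / sigma for x in s_path])
print(pos)                                # [0, 0, -1, -1, -1, 0, 0]
print(f"{spread_pnl(s_path, pos):.2f}")   # 0.80
```

The gross P&L of 0.80 is the full distance the spread travelled from entry at $z = 2$ back to the mean, matching the trade described above.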

You cross the bid-ask spread and pay fees and impact on two legs, twice (entry and exit): roughly four spread-crossings. At 0.10 per crossing the cost is 0.40, so the net is 0.40. Halve the gross deviation by entering at $z = 1.0$ and the trade only breaks even; any extra slippage turns it into a loss.
$$\text{net} = \underbrace{0.80}_{\text{gross}} - \underbrace{4 \times 0.10}_{\text{cost}} = 0.40, \qquad \text{at } z=1.0:\; 0.40 - 0.40 = 0.00$$
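The break-even entry level follows from the same numbers. Assuming the trade exits at the mean, the gross reversion is the entry z-score times $\sigma$, so with four crossings at cost $c$ each:

$$
\text{net} = z_{\text{entry}}\,\sigma - 4c \;\ge\; 0
\quad\Longleftrightarrow\quad
z_{\text{entry}} \;\ge\; \frac{4c}{\sigma} = \frac{4 \times 0.10}{0.40} = 1.0
$$

Any entry inside $z = 1.0$ has negative net edge on this pair even with full reversion. If crowding shrinks $\sigma$ while the cost floor stays fixed, the break-even band moves outward, to deviations the spread now rarely reaches.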

This is exactly why the edge dies as crowding shrinks the deviations: the gross has to clear a fixed cost floor.

Half-life. If the spread's Ornstein–Uhlenbeck half-life is 3 days, a $z = 2.0$ deviation is expected to halve to $z = 1.0$ in 3 days; the half-life sets the timescale of your holding period and your capital turnover. A half-life of weeks means slow turnover and longer exposure to a regime break; a half-life of minutes means a fast, cost-sensitive intraday strategy. The live IX-COINT widget on pairs trading lets you set $\beta$, $\sigma$ and the z-bands and watch the P&L. Real spreads, costs and half-lives must be measured per instrument and dated; these figures are synthetic, educational only, and not investment advice.
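The half-life arithmetic can be sketched in a few lines. A minimal sketch, assuming the z-score's expected path decays exponentially toward zero (the expected path of an Ornstein–Uhlenbeck process) and, for the discrete version, that the daily spread follows an AR(1) with coefficient $\phi$, giving a half-life of $\ln 0.5 / \ln \phi$ bars; the function names are hypothetical.

```python
import math


def half_life_from_phi(phi: float) -> float:
    """Half-life in bars of an AR(1) spread e_t = phi * e_{t-1} + noise."""
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must be in (0, 1) for a mean-reverting spread")
    return math.log(0.5) / math.log(phi)


def expected_z(z0: float, t: float, half_life: float) -> float:
    """Expected z-score after time t, halving once per half-life."""
    return z0 * 0.5 ** (t / half_life)


print(expected_z(2.0, 3, 3))              # 1.0: the 3-day half-life example
print(expected_z(2.0, 6, 3))              # 0.5: still not back at the mean
print(f"{0.5 ** (1 / 3):.4f}")            # 0.7937: daily phi for a 3-day half-life
print(f"{half_life_from_phi(0.9):.2f}")   # 6.58 bars
```

The second line shows why the half-life sets a timescale rather than a fixed holding period: under exponential decay the expected spread keeps halving but never formally reaches the mean, so practical exits use a band.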


Common questions

What is statistical arbitrage?
Statistical arbitrage is a market-neutral approach that trades a portfolio of correlated instruments on the expectation that temporary statistical relationships (a spread, a residual, a factor exposure) revert to their historical norm. It bets on the average behaviour of many small, diversified positions rather than any single forecast. Returns come from frequent, low-edge trades whose statistics hold over a large sample.