The business of HFT

Research-to-production pipeline

structural
Reviewed 4 June 2026. As of 2026: a permanent feature of the market, not an edge that decays.

Idea → backtest → paper trade → live, with risk gates at every step. The pipeline is where most edges die, and where they should die, cheaply, before they cost real money.

The idea

Figure: Research-to-production pipeline (annotated diagram).

What is the research-to-production pipeline?

It is the disciplined path a trading idea travels from hypothesis to live, risk-managed P&L and back again: research → backtest → execution → risk → operations, closed by a feedback loop that turns live results into the next idea. It exists because most ideas are wrong, and the few that are right leak their edge at every handoff unless the pipeline is fast and honest.

Intuition first: an idea is cheap and almost always wrong. The pipeline is a filter that kills bad ideas quickly and cheaply, and a conveyor that moves the survivors to live trading with as little of their edge lost in transit as possible. A firm with a great idea and a slow, leaky pipeline loses to a firm with an ordinary idea and a fast, tight one. The edge is rarely a single model; it is the speed and discipline of this loop.

And it is a loop, not a line. The output of operations (attributed live P&L, the gap between expected and realised performance) is the most valuable research input the firm has, because it is the only data not contaminated by hindsight. A pipeline that does not feed live results back into research is half a pipeline. The honest framing for the whole page: 2026's AI tooling compresses the research and engineering legs dramatically, which moves the differentiator to the two stages AI cannot shortcut: finding a real edge, and not blowing up deploying it. The discipline is the moat; speed just raises the stakes.

Stage 1, Research: finding a candidate edge

Research is the hunt for a hypothesis with genuine, surviving edge: a relationship in the data (a signal, an inefficiency, a structural rebate) that you believe will persist out of sample. The output is a precisely specified candidate: what you trade, when, why it should work, and how it could fail. Most candidates die here, and that is the point.

Intuition first: good research starts from an economic reason a price relationship should hold, not from data-mining a curve that happens to go up. "This pair cointegrates because they are the same risk in two listings" survives; "these two tickers correlated last year" does not. The discipline is to state the why before the whether. The output is a hypothesis, not a backtest: the instruments, the entry and exit logic, the horizon, the claimed edge per trade, the expected number of trades, and the conditions under which it should stop working. Naming the failure mode up front is what separates research from rationalisation.

The roles are quant researchers and signal researchers; the tooling is research notebooks (Python, pandas or Polars), a clean historical data store, and (increasingly in 2026) AI research assistants that propose and screen hypotheses at speed (see machine learning in HFT and what AI changes). Where ideas die here: overfitting. Generate ten thousand candidates and some will look brilliant by chance, and the entire performance topic exists to tell the survivors from the lucky.
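The overfitting point can be made concrete with a small simulation. The sketch below is illustrative only: it generates candidates whose true edge is exactly zero (pure Gaussian noise, an assumption chosen for simplicity), picks the best in-sample Sharpe, and then scores that same candidate on fresh data. The function names and parameters are hypothetical, not part of any real harness.

```python
import random
import statistics

def sharpe(returns):
    """Per-period Sharpe ratio (mean / sample standard deviation), not annualised."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0

def best_of_noise(n_candidates, n_periods, seed=0):
    """Best in-sample Sharpe among zero-edge candidates, and the same
    candidate's Sharpe on independent out-of-sample noise."""
    rng = random.Random(seed)
    best_is, best_oos = float("-inf"), 0.0
    for _ in range(n_candidates):
        in_sample = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        out_sample = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        s = sharpe(in_sample)
        if s > best_is:
            best_is, best_oos = s, sharpe(out_sample)
    return best_is, best_oos

# Synthetic experiment: 1,000 candidates, 250 periods each, no real edge anywhere.
best_is, best_oos = best_of_noise(n_candidates=1000, n_periods=250)
print(f"best in-sample Sharpe: {best_is:.3f}")
print(f"same candidate out of sample: {best_oos:.3f}")
```

The winner looks good in sample purely because it was selected from many tries; its out-of-sample score is just another draw from noise. Any real screening process would need to account for the number of candidates tested.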

Stage 2, Backtest: testing the edge honestly

The backtest replays the strategy against historical data to estimate whether the edge is real and how it behaves: its risk-adjusted return, its capacity and its decay. The hard part is not running it but running it honestly: a backtest that admits look-ahead bias, survivorship bias or optimistic fills will conjure an edge that does not exist live.

Intuition first: a backtest is a simulation, and every simulation is a set of assumptions about how the market would have responded to your orders. Get those assumptions wrong (assume you always filled at the touch, ignore your own impact, peek at data you would not have had) and you are not testing a strategy, you are writing fiction with a great Sharpe. The cardinal sins, each toggleable on the backtest sandbox (IX-BACKTEST): look-ahead bias (using information from the future), survivorship bias (testing only on instruments that still exist), and optimistic fills (assuming you traded at prices you would not have got). Each one fabricates edge; together they make almost anything look profitable.
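Look-ahead bias is easy to demonstrate on data with no edge at all. The following is a minimal sketch on synthetic Gaussian returns (an assumption; real returns are not Gaussian): the same trivial rule is run once using only the previous bar and once peeking at the bar it is trading. The `backtest` function and its `lag` parameter are hypothetical names for illustration.

```python
import random

def backtest(returns, lag):
    """Trade the sign of a return `lag` bars back.
    lag=1 uses only the previous bar (point-in-time);
    lag=0 uses the bar being traded, i.e. information from the future."""
    pnl = 0.0
    for t in range(1, len(returns)):
        signal = 1.0 if returns[t - lag] > 0 else -1.0
        pnl += signal * returns[t]
    return pnl

rng = random.Random(1)
returns = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # synthetic, zero true edge

honest = backtest(returns, lag=1)   # hovers around zero, as it should
peeking = backtest(returns, lag=0)  # sums |return| on every bar: fabricated edge
print(f"point-in-time P&L: {honest:.1f}")
print(f"look-ahead P&L:    {peeking:.1f}")
```

A one-index slip is enough to turn noise into a spectacular equity curve, which is why a harness that enforces point-in-time data matters more than the strategy code itself.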

A good backtest produces not a single number but a distribution and a verdict: the risk-adjusted return, the capacity ceiling (how much capital before your own market impact eats it), and an estimate of decay. Headline backtest return alone is the cheapest, most misleading number in the firm. The roles are quant researchers plus backtest and infrastructure engineers; the tooling is a harness that enforces point-in-time data and realistic fills, a fills and cost model, and a market simulator. This is the joint where the datasets and harness on the waitlist do their work.

Stage 3, Execution: capturing the edge without giving it back

Execution is the order logic that turns a paper edge into realised P&L: how you actually place, modify and cancel orders so the costs of trading do not exceed the edge. A signal worth two basis points executed badly (crossing the spread, leaking impact, missing the queue) is a loss. Execution is where most modest edges are won or lost.

Intuition first: the gap between a backtest's "we'd have bought here" and a live "we actually bought there, at this price, this often" is pure execution. For HFT, where the edge per trade is a fraction of the spread, execution quality is not a refinement; it is the strategy. The decisions are passive versus aggressive (maker-taker), queue position and fill probability, and slicing to limit impact, the whole execution-algorithms topic: VWAP/TWAP/POV, Almgren–Chriss, smart order routing. The same idea executed passively or aggressively can have opposite signs after costs.
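The claim that one signal can have opposite signs after costs is simple arithmetic. The sketch below uses a deliberately crude toy model, and every number in it (half-spread, fees, rebate, adverse selection, fill probability) is a hypothetical placeholder rather than a figure from any venue; only the two-basis-point signal echoes the text.

```python
def aggressive_net_bps(signal_bps, half_spread_bps, taker_fee_bps):
    """Crossing the spread: pay the half-spread and the taker fee on every trade."""
    return signal_bps - half_spread_bps - taker_fee_bps

def passive_net_bps(signal_bps, half_spread_bps, maker_rebate_bps,
                    adverse_selection_bps, fill_prob):
    """Resting quote: earn the half-spread and rebate when filled, give some
    back to adverse selection, and only fill a fraction of the time."""
    per_fill = signal_bps + half_spread_bps + maker_rebate_bps - adverse_selection_bps
    return fill_prob * per_fill

# Hypothetical inputs, all in basis points of notional.
taker = aggressive_net_bps(signal_bps=2.0, half_spread_bps=2.0, taker_fee_bps=0.5)
maker = passive_net_bps(signal_bps=2.0, half_spread_bps=2.0, maker_rebate_bps=0.2,
                        adverse_selection_bps=3.0, fill_prob=0.5)
print(f"aggressive: {taker:+.2f} bps per attempt")
print(f"passive:    {maker:+.2f} bps per attempt")
```

With these inputs the aggressive version loses and the passive version earns, from the same signal. Changing the adverse-selection or fill-probability assumption can flip the result, which is the point: the sign depends on execution, not just on the idea.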

The roles are execution and low-latency engineers, often with a dedicated execution-research function; the tooling is the low-latency stack, colocation and FPGA where speed gates the edge, messaging protocols (FIX and binary feeds), and a smart order router. The whole systems topic is, in pipeline terms, the execution leg.

Stage 4, Risk: limits, kill-switches and pre-trade checks

Risk is the layer that decides how much you can lose and stops you before you lose more: position limits, loss limits, pre-trade risk checks and kill-switches that pull the strategy automatically when it misbehaves. In HFT, where a bug can send thousands of orders a second, risk is not a quarterly review; it is a real-time circuit sitting in the order path.

Intuition first: the failure mode that ends firms is not a slowly losing strategy; it is a fast one. A mis-specified model or a feed glitch can lose a year's profit in seconds, so the controls have to act at machine speed, in the order path, independent of the strategy logic that might itself be the thing malfunctioning. The controls are layered: pre-trade (reject orders that breach price, size or rate limits before they leave), in-flight (throttles, fat-finger and message-rate limits), and post-trade / kill-switch (auto-flatten and halt on a loss-limit or anomaly breach). These are mandated as much by regulation as by prudence; see measuring HFT risk and the regulatory circuit breakers.
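The layered controls described above can be sketched as a small gate that sits between the strategy and the exchange. This is a minimal illustration, not a production design: the class names, limit fields and one-second rate window are all assumptions, and the auto-flatten step is left out (the gate only halts new orders).

```python
from dataclasses import dataclass

@dataclass
class Order:
    price: float
    size: float

@dataclass
class RiskLimits:
    max_size: float
    max_price_deviation: float   # fraction of the reference price, e.g. 0.01 = 1%
    max_orders_per_second: int
    max_daily_loss: float

class RiskGate:
    """Independent of strategy logic: every order must pass through it."""

    def __init__(self, limits):
        self.limits = limits
        self.halted = False
        self._window_second = None
        self._orders_in_window = 0

    def check_pre_trade(self, order, reference_price, now_seconds):
        """Return (accepted, reason). Rejected orders never leave."""
        if self.halted:
            return False, "halted"
        if order.size <= 0 or order.size > self.limits.max_size:
            return False, "size"
        band = self.limits.max_price_deviation * reference_price
        if abs(order.price - reference_price) > band:
            return False, "price"
        second = int(now_seconds)
        if second != self._window_second:   # new one-second window
            self._window_second, self._orders_in_window = second, 0
        if self._orders_in_window >= self.limits.max_orders_per_second:
            return False, "rate"
        self._orders_in_window += 1
        return True, "ok"

    def on_pnl_update(self, daily_pnl):
        """Kill-switch: halt once the daily loss limit is breached."""
        if daily_pnl <= -self.limits.max_daily_loss:
            self.halted = True
        return self.halted

gate = RiskGate(RiskLimits(max_size=0.5, max_price_deviation=0.01,
                           max_orders_per_second=2, max_daily_loss=100.0))
print(gate.check_pre_trade(Order(price=100.0, size=0.1), 100.0, now_seconds=0.0))
print(gate.check_pre_trade(Order(price=105.0, size=0.1), 100.0, now_seconds=0.1))
gate.on_pnl_update(-150.0)  # loss limit breached: the gate halts
print(gate.check_pre_trade(Order(price=100.0, size=0.1), 100.0, now_seconds=0.2))
```

The design choice worth noting is that the gate holds its own state and takes no input from the strategy beyond the order itself, so a malfunctioning strategy cannot talk its way past it.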

The roles are risk managers plus the risk-systems engineers who build the kill-switch, deliberately independent of the trading desk, because the people running a strategy should not be the only ones able to switch it off. The honest point for the solo operator: this is the stage that going independent in 2026 warns you not to skimp on. A research edge without a kill-switch is not a business; it is an unexploded liability.

Stage 5, Operations and the feedback loop

Operations is running the live strategy day to day (monitoring, reconciliation, P&L attribution) and then closing the loop by feeding what actually happened back into research. Attribution is the keystone: until you know which part of your P&L was spread capture, which leaked to adverse selection, and which was your own impact, you cannot improve the book.

Intuition first: a live strategy is not "done"; it is a process that drifts. Markets change, the edge decays, competitors crowd in. Operations is the discipline of watching the gap between expected and realised performance and acting on it before the strategy quietly bleeds out. Attribution is the loop's payload: decomposing live P&L by source (spread versus adverse selection versus your own impact versus signal alpha versus fees and rebates, per symbol, per venue, per hour) is the single highest-leverage activity in the firm, because it is the only research input not contaminated by hindsight. It tells you exactly which idea to build next.
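The decomposition described above can be sketched as a simple aggregation. The example assumes each fill has already been tagged with its P&L components in basis points, which is the hard part in practice and is not shown here; the field names, the venue label and the sample values are hypothetical.

```python
from collections import defaultdict

COMPONENTS = ("spread_capture", "adverse_selection", "own_impact", "fees_rebates")

def attribute(fills):
    """Average each P&L component (bps per trade) by (symbol, venue)."""
    totals = defaultdict(lambda: dict.fromkeys(COMPONENTS, 0.0))
    counts = defaultdict(int)
    for fill in fills:
        key = (fill["symbol"], fill["venue"])
        counts[key] += 1
        for name in COMPONENTS:
            totals[key][name] += fill[name]
    report = {}
    for key, sums in totals.items():
        avg = {name: sums[name] / counts[key] for name in COMPONENTS}
        avg["net"] = sum(avg[name] for name in COMPONENTS)
        report[key] = avg
    return report

# Synthetic fills, illustrative only.
fills = [
    {"symbol": "BTC-PERP", "venue": "VENUE_A", "spread_capture": 1.2,
     "adverse_selection": -0.4, "own_impact": 0.0, "fees_rebates": 0.0},
    {"symbol": "BTC-PERP", "venue": "VENUE_A", "spread_capture": 1.0,
     "adverse_selection": -0.2, "own_impact": 0.0, "fees_rebates": 0.0},
]
for key, row in attribute(fills).items():
    print(key, {name: round(value, 2) for name, value in row.items()})
```

Adding an hour-of-day field to the grouping key would give the per-hour view mentioned above; the structure stays the same.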

The roles are trade-support and operations, plus the desk and researchers who consume attribution; the tooling is real-time monitoring and reconciliation, an attribution engine, and the data pipeline that aligns your fills to the book state at execution, again the datasets and harness the waitlist is built around. The loop, restated: operations' attributed P&L is research's best input. A firm that closes this loop fast and honestly out-researches a firm with better individual ideas. That is the whole thesis of the page.

How does AI change the pipeline in 2026?

AI compresses the top of the pipeline: code generation, research agents and infrastructure-as-code collapse the research and engineering legs from weeks to days. It does not shortcut the two hard stages: finding an edge that genuinely survives out of sample, and the risk discipline to deploy without blowing up. So AI makes a good pipeline faster and a sloppy one fail quicker.

The practical consequence is that the bottleneck shifts decisively. When implementation is cheap, the binding constraints become the things AI cannot fake: a hypothesis with real economic content, an honest fills model, and a kill-switch that fires. The research and backtest legs that once took a researcher weeks now take days; the execution and risk legs (and the honesty of the simulation) are exactly as hard as before. The pipeline got faster at the top and no easier at the bottom. For the fuller treatment see what AI changes for HFT, and for what this means for a one-person shop, going independent in 2026.

Worked example

A single idea travelling the pipeline: a small-team crypto market-making strategy, illustrative and as of 2026. The point is the shape of the loss at each handoff, not the figures.

Research. "BTC-PERP top-of-book imbalance predicts the next 50 ms move; quote around the microprice", stated with an economic reason and a named failure mode (the imbalance threshold overfits and the signal does not hold out of sample). Backtest. Replay against six months of L2 data with realistic fills: gross edge ≈ 1.2 bps per trade, about 8,000 trades a day, net Sharpe ≈ 6 after fees, provided the fills are not optimistic and your own impact is not ignored; either error would inflate both the edge and the capacity.

Execution. Passive, queue-aware quotes; measured live fills come in 0.4 bps worse than the backtest assumed, so the net edge drops to ≈ 0.8 bps, still positive. Risk. Inventory limit of ±0.5 BTC, a daily loss limit, an auto-flatten kill-switch on breach; without an independent kill-switch, one feed glitch ends the account. Operations. Live for a week; attribution reads +1.1 bps spread capture, −0.3 bps adverse selection, net +0.8 bps, fed back to research to widen quotes on toxic hours.

The idea claimed 1.2 bps gross; honest execution gave back 0.4 bps; the net 0.8 bps times ~8,000 trades a day is the business, and it only exists because no stage flattered its own numbers. Each handoff is a place the edge leaks; the pipeline's job is to leak as little as possible.
$$\text{daily net} \approx \underbrace{(1.2 - 0.4)\,\text{bps}}_{\text{after execution leak}} \times \underbrace{8{,}000}_{\text{trades/day}} = 0.8\,\text{bps} \times 8{,}000$$
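The same arithmetic can be sketched in code. The edge and trade count come from the worked example; the notional per trade is not stated anywhere in the example, so the value used below is a purely hypothetical placeholder to show the unit conversion, not an estimate of anything.

```python
def daily_net(gross_bps, execution_leak_bps, trades_per_day, notional_per_trade):
    """Return (net edge per trade in bps, daily net in currency units)."""
    net_bps = gross_bps - execution_leak_bps
    per_trade = notional_per_trade * net_bps / 10_000  # 1 bp = 1/10,000 of notional
    return net_bps, per_trade * trades_per_day

# notional_per_trade=1_000 is an assumed placeholder, not a figure from the example.
net_bps, daily = daily_net(gross_bps=1.2, execution_leak_bps=0.4,
                           trades_per_day=8_000, notional_per_trade=1_000)
print(f"net edge: {net_bps:.1f} bps per trade")
print(f"daily net at the assumed notional: {daily:.0f}")
```

Because the result scales linearly with both the leak and the notional, rerunning it with a slightly larger execution leak shows how quickly a thin edge disappears.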

AI's effect on this example in 2026: the research and backtest legs that once took a researcher weeks compress to days; the execution and risk legs (and the honesty of the fills model) are exactly as hard as before. Numbers are synthetic, rounded and illustrative; real edges, fill quality, costs and decay vary enormously by venue and instrument and change over time. Reverify against primary sources before relying on any of it. Educational only, not investment advice; no P&L is promised.

Where this fits