🏥 Production ML · ★★★ FEATURED
AutoTrader — Regime-Aware Quant ML Lab
A two-tier quant-ML lab for NSE equities — 36 regime-specialist models under a GPT-based "checker" (a Decision Transformer + RL ensemble), with honest walk-forward backtesting and a documented win for simplicity.
Overview
AutoTrader is a research platform for equities ML built around one structural idea: don’t trust a single model. Layer 1 is a bench of regime specialists; Layer 2 is a learned “checker” — a GPT-based Decision Transformer backed by an RL ensemble — that sits above the specialists and decides whether to act on them. The whole thing runs on free data against 20 years of NSE history, on a single MacBook / 6 GB GPU.
It is strictly a paper-trading research system: no live capital, and not financial advice. The most honest result here is that, while Layer 2 was still being made to work, a deliberately simple path beat it, and the system was built to surface that rather than hide it.
The problem
Markets aren’t stationary. A model tuned on a 2021 bull run quietly falls apart in a 2022 drawdown — yet its validation metrics still look fine, because validation accuracy and trading P&L are different things. AutoTrader is built around that gap: detect the regime → ask a specialist trained for it → let a checker decide if the trade is worth taking → judge everything on walk-forward paper-trading.
The architecture: two tiers
```mermaid
flowchart LR
    A["20yr OHLCV<br/>NSE · yfinance + jugaad"] --> B["Feature pipeline<br/>86 feats → ~25 selected"]
    B --> C{"Regime?<br/>HMM 60d · SMA 50/200d"}
    C -->|"bull / bear / sideways"| D["LAYER 1: Specialists<br/>36 = 3 stocks × 3 regimes × 4 horizons"]
    D --> E["LAYER 2: Trader Brain (checker)<br/>Decision Transformer + RL ensemble"]
    D -. "Direct Mode bypass" .-> R
    E --> R["Risk + sizing<br/>Adaptive Kelly · ATR stops · 2% risk"]
    R --> G["Paper-trading engine<br/>walk-forward · Nov'24 → Jan'26"]
```
Layer 1 — Regime specialists
The core bet is that one model can’t be good at everything, so AutoTrader trains 36 specialists (3 stocks × 3 regimes × 4 horizons), each with a regime-tuned Optuna objective: Sharpe for bull, drawdown protection for bear, win-rate for sideways. A SignalAggregator combines them by a horizon-weighted vote (20d → 40%, 10d → 30%, 5d → 20%, 1d → 10%).
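A minimal Python sketch of the SignalAggregator's horizon-weighted vote follows. It assumes that, for one stock in its current regime, the four horizon specialists each cast a direction (+1, -1 or 0). The `SpecialistVote` type, that encoding, and the renormalisation over the horizons that actually voted are illustrative assumptions, not the project's code. Only the weights come from the text above.

```python
from dataclasses import dataclass

# Horizon weights taken from the text: 20d -> 40%, 10d -> 30%, 5d -> 20%, 1d -> 10%.
HORIZON_WEIGHTS = {20: 0.40, 10: 0.30, 5: 0.20, 1: 0.10}


@dataclass
class SpecialistVote:
    """One specialist's view (hypothetical encoding: +1 bullish, -1 bearish, 0 no view)."""
    horizon_days: int
    direction: int


def aggregate_votes(votes: list[SpecialistVote]) -> float:
    """Horizon-weighted vote, returned as a score in [-1, 1].

    Weights are renormalised over the horizons that actually voted (an
    assumption), so a missing specialist does not drag the score toward 0.
    """
    total_weight = sum(HORIZON_WEIGHTS[v.horizon_days] for v in votes)
    if total_weight == 0:
        return 0.0
    weighted = sum(HORIZON_WEIGHTS[v.horizon_days] * v.direction for v in votes)
    return weighted / total_weight


# Usage: the two longer horizons are bullish, the two shorter ones bearish.
score = aggregate_votes([
    SpecialistVote(20, +1), SpecialistVote(10, +1),
    SpecialistVote(5, -1), SpecialistVote(1, -1),
])
print(f"aggregate score: {score:.2f}")  # 0.40 + 0.30 - 0.20 - 0.10
```

Renormalising keeps the score on the same [-1, 1] scale even when only some horizons have a trained specialist available.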
Regime detection runs two ways — a 3-state Gaussian HMM (60-day, probabilistic) and an SMA detector — and a real finding fell out of it: switching from the textbook 200-day SMA to a fast 50-day detector mattered, because the 200-day lag costs you 4–5 months of being in the wrong regime.
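The lag argument can be reproduced on a toy series. The sketch below is a simple price-versus-SMA detector with a hypothetical ±2% "sideways" band. `sma_regime`, `first_day` and the band are assumptions; the project's SMA rule may differ, and the 3-state HMM is not shown. On a steady rally followed by a steady decline, the 50-day version flags the bear regime earlier than the 200-day version, because the 200-day average still sits far below the peak when prices start falling.

```python
def sma(values, window):
    """Simple moving average; None until `window` observations are available."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def sma_regime(prices, window, band=0.02):
    """Label each day bull / bear / sideways by price relative to its SMA.

    `band` is a hypothetical dead-zone: within +/-2% of the SMA counts as sideways.
    """
    labels = []
    for price, avg in zip(prices, sma(prices, window)):
        if avg is None:
            labels.append(None)            # not enough history yet
        elif price > avg * (1 + band):
            labels.append("bull")
        elif price < avg * (1 - band):
            labels.append("bear")
        else:
            labels.append("sideways")
    return labels


def first_day(labels, target, start=0):
    """Index of the first day at or after `start` carrying `target`, else None."""
    for i in range(start, len(labels)):
        if labels[i] == target:
            return i
    return None


# Toy series: 300 days of steady gains, then a 250-day decline (peak at day 299).
prices = [100.0 + i for i in range(300)] + [399.0 - k for k in range(1, 251)]
fast = first_day(sma_regime(prices, 50), "bear")
slow = first_day(sma_regime(prices, 200), "bear")
print(f"50-day detector flags bear on day {fast}, 200-day on day {slow}")
```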
The feature side is deliberate too: 86 features (36 technical + 30 price + 20 volume), with volatility handled by four estimators — Parkinson, Garman-Klass, Rogers-Satchell, Yang-Zhang — plus ATR-normalisation and z-scoring so features stay comparable across regimes. Sizing is Adaptive Kelly with regime multipliers and drawdown protection, ATR trailing stops, 2% risk per trade, and a hard 15% max-drawdown ceiling.
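The four range-based volatility estimators have standard closed forms over daily open/high/low/close bars. Below is a plain-Python sketch of each, returning a per-bar variance (annualise by multiplying by 252 before taking the square root). The function names are illustrative, and how the project windows, ATR-normalises or z-scores these features is not shown.

```python
import math
import statistics


def parkinson_var(highs, lows):
    """Parkinson: uses only the high-low range. Per-bar variance."""
    n = len(highs)
    return sum(math.log(h / l) ** 2 for h, l in zip(highs, lows)) / (4 * n * math.log(2))


def garman_klass_var(opens, highs, lows, closes):
    """Garman-Klass: range term minus an open-to-close correction."""
    terms = [0.5 * math.log(h / l) ** 2 - (2 * math.log(2) - 1) * math.log(c / o) ** 2
             for o, h, l, c in zip(opens, highs, lows, closes)]
    return sum(terms) / len(terms)


def rogers_satchell_var(opens, highs, lows, closes):
    """Rogers-Satchell: designed to be independent of the price drift."""
    terms = [math.log(h / c) * math.log(h / o) + math.log(l / c) * math.log(l / o)
             for o, h, l, c in zip(opens, highs, lows, closes)]
    return sum(terms) / len(terms)


def yang_zhang_var(opens, highs, lows, closes):
    """Yang-Zhang: overnight + open-to-close + Rogers-Satchell, blended by k.

    Bar 0 only supplies the first previous close, so at least 3 bars are needed.
    """
    o, h, l, c = opens[1:], highs[1:], lows[1:], closes[1:]
    n = len(o)
    overnight = [math.log(o_t / c_prev) for o_t, c_prev in zip(o, closes[:-1])]
    open_to_close = [math.log(c_t / o_t) for o_t, c_t in zip(o, c)]
    k = 0.34 / (1.34 + (n + 1) / (n - 1))
    return (statistics.variance(overnight)          # sample variance, n - 1 denominator
            + k * statistics.variance(open_to_close)
            + (1 - k) * rogers_satchell_var(o, h, l, c))


def annualised_vol(per_bar_var, bars_per_year=252):
    """Convert a per-bar variance into an annualised volatility."""
    return math.sqrt(per_bar_var * bars_per_year)
```

Having four estimators is useful because each makes different assumptions: Parkinson and Garman-Klass assume no drift, Rogers-Satchell tolerates drift, and Yang-Zhang also accounts for overnight gaps.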
Layer 2 — the “checker”: a Decision Transformer
This is the part of the design I’m most proud of. Instead of a hand-written rule deciding whether to trust the specialists, Layer 2 learns it. A Decision Transformer reframes trading as conditional sequence modeling: feed it a target return and the recent history of [specialist predictions, price action, portfolio state], and it predicts the action that would have achieved that return — classic offline RL.
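The return-conditioning bookkeeping is the part of a Decision Transformer that is easiest to get wrong, so here is a hedged Python sketch of just that piece (the model itself is omitted). Training sequences carry the undiscounted return-to-go R̂_t = r_t + r_{t+1} + ... + r_T alongside each state and action, in the (R̂, s, a) token order the Decision Transformer uses. At inference the model is conditioned on a target (20% here, per the diagram) that is decremented by each realised return. Reading "x100 scale" as "fractional returns multiplied by 100" is an assumption, as are all the names.

```python
RTG_SCALE = 100.0  # "x100 scale" from the diagram, assumed to mean fractional returns x 100


def returns_to_go(rewards):
    """R_hat_t = r_t + r_{t+1} + ... + r_T (undiscounted)."""
    rtg, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg


def interleave(rtg, states, actions, context=20):
    """Flatten the last `context` steps into (R_hat, s, a) token order."""
    steps = list(zip(rtg, states, actions))[-context:]
    return [token for step in steps for token in step]


class ReturnConditioner:
    """Inference-time bookkeeping: start from a target return, subtract what was realised."""

    def __init__(self, target_return=0.20, scale=RTG_SCALE):
        self.scale = scale
        self.remaining = target_return * scale   # 20% target -> 20.0 on this scale

    def update(self, realised_return):
        self.remaining -= realised_return * self.scale
        return self.remaining


# Offline: per-step (already scaled) returns become the RTG column of the training data.
print(returns_to_go([1.0, -0.5, 2.0]))   # [2.5, 1.5, 2.0]
# Online: after a +5% step, the model is asked for the remaining 15 points.
cond = ReturnConditioner()
print(round(cond.update(0.05), 6))        # 15.0
```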
```mermaid
flowchart TB
    subgraph IN["Custom input layer"]
        S["State · 64-d<br/>specialist preds + price action + portfolio"]
        Rtg["Return-to-go<br/>20% target · x100 scale"]
    end
    EMB["Trading embeddings → 768-d"]
    GPT["Frozen GPT-2 backbone<br/>6 layers · 8 heads · 768-d · context 20"]
    subgraph OUT["Custom output heads"]
        ACT["Action head<br/>BUY · SELL · HOLD"]
        SZ["Position-size head"]
    end
    RL["RL ensemble · online fine-tune<br/>PPO bear · A2C returns · DDPG single-stock"]
    DEC["Decision<br/>BUY size / SELL size / HOLD"]
    S --> EMB
    Rtg --> EMB
    EMB --> GPT
    GPT --> ACT
    GPT --> SZ
    ACT --> RL
    SZ --> RL
    RL --> DEC
```
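The ensemble coordinator's selection rule (best rolling 30-day Sharpe across the PPO, A2C and DDPG agents) is simple enough to sketch directly in Python. This assumes each agent's daily paper returns are tracked in parallel, and it uses a zero risk-free rate and 252 trading days for annualisation. `pick_agent`, `rolling_sharpe` and the flat-returns convention are hypothetical.

```python
import math
import statistics


def rolling_sharpe(daily_returns, window=30, periods_per_year=252):
    """Annualised Sharpe over the last `window` daily returns (risk-free rate taken as 0)."""
    recent = daily_returns[-window:]
    if len(recent) < 2:
        return float("-inf")                 # not enough history to judge an agent
    sd = statistics.stdev(recent)
    if sd == 0:
        return 0.0                            # hypothetical convention for flat returns
    return statistics.mean(recent) / sd * math.sqrt(periods_per_year)


def pick_agent(agent_returns, window=30):
    """Name of the agent whose recent paper returns have the highest rolling Sharpe."""
    return max(agent_returns, key=lambda name: rolling_sharpe(agent_returns[name], window))


# Hypothetical shadow returns for each agent over the last 30 sessions.
history = {
    "PPO": [0.010, -0.005] * 15,    # higher mean, much noisier
    "A2C": [0.002, 0.001] * 15,     # small but steady gains
    "DDPG": [-0.010, 0.005] * 15,   # losing
}
print(pick_agent(history))  # A2C
```

Ranking by Sharpe rather than raw return is what lets a steady agent beat a noisier one with a higher mean, as A2C does over PPO in this toy history.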
The interesting engineering choices:
- A frozen GPT-2 backbone. The transformer core (6 layers, 8 heads, 768-d, a 20-step context window) is a pretrained GPT-2, kept frozen — only the custom trading input-embeddings and output heads train. It borrows GPT-2’s sequence priors cheaply instead of learning attention from scratch on scarce financial data.
- Custom I/O around a general backbone. A 64-d market-state embedding and a return-to-go token go in; an action head (BUY/SELL/HOLD) and a position-size head come out. The backbone is generic; the trading intelligence lives in the layers wrapped around it.
- An RL ensemble on top. PPO, A2C and DDPG agents fine-tune online, and the coordinator picks between them on a rolling 30-day Sharpe — PPO tends to win bear markets, A2C the cumulative-return race, DDPG single-stock runs.
- Triple-Barrier labels. Trade outcomes are labeled with a take-profit / stop-loss / time barrier (3% / 2% / 10-day, ATR-scaled) so the model learns from realistic trade results, not raw next-day returns.
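A Python sketch of triple-barrier labeling for a long entry, using the 3% / 2% / 10-day barriers above, follows. The text does not say how the barriers are ATR-scaled, so `atr_scaled` shows one hypothetical option: stretch each barrier by current ATR% over a reference ATR%. Labeling a time-out as 0 and resolving a same-bar double touch as a stop are also assumptions.

```python
def triple_barrier_label(closes, highs, lows, entry_idx, tp=0.03, sl=0.02, max_days=10):
    """Label a long entry at closes[entry_idx] with the first barrier it touches.

    +1: take-profit hit first, -1: stop-loss hit first, 0: time barrier
    (neither hit within `max_days` bars).
    """
    entry = closes[entry_idx]
    upper, lower = entry * (1 + tp), entry * (1 - sl)
    last = min(entry_idx + max_days, len(closes) - 1)
    for t in range(entry_idx + 1, last + 1):
        hit_upper, hit_lower = highs[t] >= upper, lows[t] <= lower
        if hit_upper and hit_lower:
            return -1   # both inside one bar: order unknown, assume the stop (conservative)
        if hit_upper:
            return 1
        if hit_lower:
            return -1
    return 0


def atr_scaled(base_pct, atr, price, ref_atr_pct):
    """Hypothetical ATR scaling: stretch a barrier by current ATR% over a reference ATR%."""
    return base_pct * (atr / price) / ref_atr_pct


# With ATR at 3% of price against a 2% reference, the 3% / 2% barriers widen 1.5x.
tp = atr_scaled(0.03, atr=3.0, price=100.0, ref_atr_pct=0.02)
sl = atr_scaled(0.02, atr=3.0, price=100.0, ref_atr_pct=0.02)

closes = [100.0, 101.0, 102.0, 104.0]
highs = [100.0, 101.5, 102.5, 104.0]
lows = [100.0, 100.5, 101.0, 103.0]
print(triple_barrier_label(closes, highs, lows, entry_idx=0))  # 1: +3% target touched on day 3
```

Labels like these ask "would this trade have paid off under these exits?" instead of "did tomorrow close higher?", which is the distinction the bullet draws.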
The honest finding: simple beat the half-built checker
Here’s the negative result I chose to keep. The first version of the checker (a plain ML coordinator) was broken — threshold-based labeling had handed it a training set that was 95%+ “HOLD”, so it learned to do almost nothing. Rather than paper over it, I built Direct Mode: bypass the coordinator and act on the specialist probabilities directly, with simple rules (prob > 0.52 → buy, < 0.48 → sell, horizon-weighted, wider 10–12% stops, larger 15–30% positions, fast 50-day regime).
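Direct Mode's rules are explicit enough to write down. The Python sketch below uses the thresholds, horizon weights, stop range and position range from the text. The linear 15%-to-30% sizing ramp, the `FULL_SIZE_EDGE` constant, and picking the 10% end of the stop range are hypothetical fill-ins, not the project's actual sizing.

```python
HORIZON_WEIGHTS = {20: 0.40, 10: 0.30, 5: 0.20, 1: 0.10}
BUY_ABOVE, SELL_BELOW = 0.52, 0.48   # thresholds from the text
MIN_POS, MAX_POS = 0.15, 0.30        # 15-30% position range from the text
FULL_SIZE_EDGE = 0.10                # hypothetical: full size once |p - 0.5| >= 0.10
STOP_PCT = 0.10                      # lower end of the 10-12% stop range


def weighted_prob(probs_by_horizon):
    """Horizon-weighted average of specialists' P(up), renormalised over horizons present."""
    total = sum(HORIZON_WEIGHTS[h] for h in probs_by_horizon)
    return sum(HORIZON_WEIGHTS[h] * p for h, p in probs_by_horizon.items()) / total


def direct_mode(probs_by_horizon, entry_price):
    """Return (action, position fraction, stop price) with no learned checker in the loop."""
    p = weighted_prob(probs_by_horizon)
    if SELL_BELOW <= p <= BUY_ABOVE:
        return "HOLD", 0.0, None
    action = "BUY" if p > BUY_ABOVE else "SELL"
    # Hypothetical sizing: ramp linearly from 15% at the threshold to 30% at full edge.
    threshold_edge = BUY_ABOVE - 0.5
    edge = abs(p - 0.5)
    ramp = min(1.0, (edge - threshold_edge) / (FULL_SIZE_EDGE - threshold_edge))
    size = MIN_POS + (MAX_POS - MIN_POS) * max(0.0, ramp)
    # Wide fixed-percentage stop on the losing side of the entry.
    stop = entry_price * (1 - STOP_PCT) if action == "BUY" else entry_price * (1 + STOP_PCT)
    return action, size, stop


# Long horizons lean bullish, the 1d specialist bearish: weighted P(up) is about 0.545.
print(direct_mode({20: 0.60, 10: 0.55, 5: 0.50, 1: 0.40}, entry_price=1000.0))
# BUY, ~19.7% position, stop 10% below entry
```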
| Metric | Direct Mode | Broken v1 checker |
|---|---|---|
| Return | +2.05% | −0.68% |
| Sharpe | 0.53 | −0.82 |
| Win rate | 61% | 38.5% |
| Trades | 18 | 13 |
In the bear stretch the specialists protected capital (≈ −$5k vs a potential −$20k). The Decision Transformer above is the proper rebuild of that checker layer — and it’s held to the same honesty bar: it reached 82.8% validation accuracy but 100% training accuracy (textbook overfitting on ~36k samples), so it stays clearly flagged experimental while the calibrated specialists + Direct Mode remain the production path. The architecture is the ambition; the discipline is refusing to ship the ambitious part until it actually generalises.
What I took away
- Validation accuracy ≠ trading P&L — walk-forward paper-trading is the only verdict that counts.
- Negative results are results — the broken-coordinator discovery is the most useful thing the project produced.
- Borrow priors, train the edges — freezing a general GPT-2 backbone and learning only the trading-specific I/O is far more sample-efficient than training a transformer on scarce market data.
- Regimes are real, and lag is expensive — fast (50-day) regime detection materially outperformed the textbook 200-day.
Stack
Python · XGBoost · LightGBM · PyTorch · GPT-2 (frozen backbone) · Decision Transformer · PPO / A2C / DDPG (RL ensemble) · 3-state Gaussian HMM · Optuna · Triple-Barrier labeling · Adaptive Kelly sizing · a walk-forward paper-trading engine.