
AutoTrader — Regime-Aware Quant ML Lab

A two-tier quant-ML lab for NSE equities — 36 regime-specialist models under a GPT-based "checker" (a Decision Transformer + RL ensemble), with honest walk-forward backtesting and a documented win for simplicity.

Overview

AutoTrader is a research platform for equities ML built around one structural idea: don’t trust a single model. Layer 1 is a bench of regime specialists; Layer 2 is a learned “checker” — a GPT-based Decision Transformer backed by an RL ensemble — that sits above the specialists and decides whether to act on them. The whole thing runs on free data against 20 years of NSE history, on a single MacBook / 6 GB GPU.

It is strictly a paper-trading research system: no live capital, and not financial advice. The most honest result here is that, while Layer 2 was still being made to work, a deliberately simple path beat it, and the system was built to surface that rather than hide it.

The problem

Markets aren’t stationary. A model tuned on a 2021 bull run quietly falls apart in a 2022 drawdown — yet its validation metrics still look fine, because validation accuracy and trading P&L are different things. AutoTrader is built around that gap: detect the regime → ask a specialist trained for it → let a checker decide if the trade is worth taking → judge everything on walk-forward paper-trading.
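As a rough illustration of what "walk-forward" means here, the sketch below rolls a training window forward through time and always tests on the bars that come immediately after it. The function name and the window sizes are hypothetical, not taken from the project.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_range, test_range) index pairs that roll forward in time.

    Each test window starts right after its training window, so the model
    is never evaluated on data that precedes what it was fitted on.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window


# Example: 10 bars, train on 4, test on the next 2.
splits = list(walk_forward_splits(10, 4, 2))
for train, test in splits:
    print(list(train), "->", list(test))
```

Sliding by the test size keeps the test windows non-overlapping, so every out-of-sample bar is scored exactly once.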

The architecture: two tiers

flowchart LR
  A["20yr OHLCV<br/>NSE · yfinance + jugaad"] --> B["Feature pipeline<br/>86 feats → ~25 selected"]
  B --> C{"Regime?<br/>HMM 60d · SMA 50/200d"}
  C -->|"bull / bear / sideways"| D["LAYER 1 — Specialists<br/>36 = 3 stocks × 3 regimes × 4 horizons"]
  D --> E["LAYER 2 — Trader Brain (checker)<br/>Decision Transformer + RL ensemble"]
  D -. "Direct Mode bypass" .-> R
  E --> R["Risk + sizing<br/>Adaptive Kelly · ATR stops · 2% risk"]
  R --> G["Paper-trading engine<br/>walk-forward · Nov'24 → Jan'26"]

Layer 1 — Regime specialists

The core bet is that one model can’t be good at everything, so AutoTrader trains 36 specialists (3 stocks × 3 regimes × 4 horizons), each with a regime-tuned Optuna objective: Sharpe for bull, drawdown protection for bear, win-rate for sideways. A SignalAggregator combines them by a horizon-weighted vote (20d → 40%, 10d → 30%, 5d → 20%, 1d → 10%).
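The horizon-weighted vote can be sketched as a weighted average of per-horizon probabilities. The weights come from the text above; the function name, the input shape and the renormalisation when a horizon is missing are assumptions for illustration, not the real SignalAggregator API.

```python
# Horizon weights from the text: 20d 40%, 10d 30%, 5d 20%, 1d 10%.
HORIZON_WEIGHTS = {"20d": 0.40, "10d": 0.30, "5d": 0.20, "1d": 0.10}


def aggregate_signal(probs_by_horizon):
    """Blend per-horizon up-probabilities into one weighted probability.

    Horizons missing from the input are skipped and the remaining weights
    are renormalised (an assumption, not stated in the source).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for horizon, prob in probs_by_horizon.items():
        weight = HORIZON_WEIGHTS[horizon]
        weighted_sum += weight * prob
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no horizon predictions supplied")
    return weighted_sum / total_weight


blended = aggregate_signal({"20d": 0.60, "10d": 0.55, "5d": 0.50, "1d": 0.40})
print(round(blended, 3))
```

Because the 20-day specialist carries four times the weight of the 1-day one, a bearish short-horizon call only drags the blended probability down slightly.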

Regime detection runs two ways — a 3-state Gaussian HMM (60-day, probabilistic) and an SMA detector — and a real finding fell out of it: switching from the textbook 200-day SMA to a fast 50-day detector mattered, because the 200-day lag costs you 4–5 months of being in the wrong regime.
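To make the lag concrete, here is a minimal SMA regime detector and a synthetic price path where the 50-day and 200-day windows disagree. The price-versus-SMA rule and the 2% neutral `band` are assumptions; the project's detector may use a different rule (for example a 50/200 crossover).

```python
def sma(values, window):
    """Simple moving average of the last `window` values, or None if too short."""
    if len(values) < window:
        return None
    return sum(values[-window:]) / window


def detect_regime(closes, window=50, band=0.02):
    """Classify the latest bar as 'bull', 'bear' or 'sideways'.

    `band` is a hypothetical neutral zone: a close within +/- band of the
    moving average counts as sideways.
    """
    average = sma(closes, window)
    if average is None:
        return "sideways"  # not enough history yet; assumed default
    deviation = closes[-1] / average - 1.0
    if deviation > band:
        return "bull"
    if deviation < -band:
        return "bear"
    return "sideways"


# Synthetic path: 250 days of steady gains, then 40 days of steady losses.
rally = [100.0 + 0.5 * i for i in range(250)]
selloff = [rally[-1] - 0.5 * j for j in range(1, 41)]
closes = rally + selloff

fast = detect_regime(closes, window=50)
slow = detect_regime(closes, window=200)
print(fast, slow)
```

On this path the fast detector has already flipped while the slow one still reflects the old trend, which is the lag the paragraph describes.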

The feature side is deliberate too: 86 features (36 technical + 30 price + 20 volume), with volatility handled by four estimators — Parkinson, Garman-Klass, Rogers-Satchell, Yang-Zhang — plus ATR-normalisation and z-scoring so features stay comparable across regimes. Sizing is Adaptive Kelly with regime multipliers and drawdown protection, ATR trailing stops, 2% risk per trade, and a hard 15% max-drawdown ceiling.
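Two of the four range-based estimators are short enough to sketch using their standard textbook formulas. This is an illustration rather than the project's feature code: function names are hypothetical, the result is per-bar (not annualised) volatility, and Rogers-Satchell and Yang-Zhang are omitted for brevity.

```python
import math


def parkinson_vol(highs, lows):
    """Parkinson estimator: uses only the high-low range of each bar."""
    n = len(highs)
    total = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return math.sqrt(total / (4.0 * n * math.log(2.0)))


def garman_klass_vol(opens, highs, lows, closes):
    """Garman-Klass estimator: adds open-to-close information to the range."""
    n = len(opens)
    k = 2.0 * math.log(2.0) - 1.0
    total = 0.0
    for o, h, l, c in zip(opens, highs, lows, closes):
        total += 0.5 * math.log(h / l) ** 2 - k * math.log(c / o) ** 2
    # Guard against tiny negative values from bad ticks or rounding.
    return math.sqrt(max(total / n, 0.0))


# Three illustrative daily bars (made-up numbers).
opens = [100.0, 101.0, 102.0]
highs = [102.0, 103.0, 104.0]
lows = [99.0, 100.0, 100.5]
closes = [101.0, 102.0, 101.0]

print(parkinson_vol(highs, lows))
print(garman_klass_vol(opens, highs, lows, closes))
```

Both estimators extract more information per bar than close-to-close returns, which helps when each regime window contains relatively few bars.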

Layer 2 — the “checker”: a Decision Transformer

This is the part of the design I’m most proud of. Instead of a hand-written rule deciding whether to trust the specialists, Layer 2 learns it. A Decision Transformer reframes trading as conditional sequence modeling: feed it a target return and the recent history of [specialist predictions, price action, portfolio state], and it predicts the action that would have achieved that return. That is offline RL recast as supervised sequence prediction over logged trajectories.
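The conditioning signal, return-to-go, is the return still to be earned from each step until the end of the episode. The sketch below computes it from logged per-step returns. Summing plain returns (rather than compounding) and multiplying by a `scale` of 100 are assumptions about how the target is encoded, and the function names are hypothetical.

```python
def returns_to_go(step_returns, scale=100.0):
    """Suffix sums of per-step returns, multiplied by `scale`.

    rtg[t] is the total return collected from step t to the end of the
    episode, which is what the model is conditioned on during training.
    """
    rtg = [0.0] * len(step_returns)
    running = 0.0
    for t in reversed(range(len(step_returns))):
        running += step_returns[t]
        rtg[t] = running * scale
    return rtg


def context_window(tokens, length=20):
    """Keep only the most recent `length` steps as model context."""
    return tokens[-length:]


episode = [0.01, 0.02, -0.01]  # made-up fractional returns per step
rtg = returns_to_go(episode)
print(rtg)
print(len(context_window(list(range(50)))))
```

At inference time the same quantity starts at the chosen target and is reduced by each realised return, so the model is always asked for the remaining return.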

flowchart TB
  subgraph IN["Custom input layer"]
    S["State · 64-d<br/>specialist preds + price action + portfolio"]
    Rtg["Return-to-go<br/>20% target · x100 scale"]
  end
  EMB["Trading embeddings → 768-d"]
  GPT["Frozen GPT-2 backbone<br/>6 layers · 8 heads · 768-d · context 20"]
  subgraph OUT["Custom output heads"]
    ACT["Action head<br/>BUY · SELL · HOLD"]
    SZ["Position-size head"]
  end
  RL["RL ensemble · online fine-tune<br/>PPO bear · A2C returns · DDPG single-stock"]
  DEC["Decision<br/>BUY size / SELL size / HOLD"]
  S --> EMB
  Rtg --> EMB
  EMB --> GPT
  GPT --> ACT
  GPT --> SZ
  ACT --> RL
  SZ --> RL
  RL --> DEC

The interesting engineering choices:

A frozen GPT-2 backbone. The transformer core (6 layers, 8 heads, 768-d, a 20-step context window) is a pretrained GPT-2, kept frozen — only the custom trading input-embeddings and output heads train. It borrows GPT-2’s sequence priors cheaply instead of learning attention from scratch on scarce financial data.
Custom I/O around a general backbone. A 64-d market-state embedding and a return-to-go token go in; an action head (BUY/SELL/HOLD) and a position-size head come out. The backbone is generic; the trading intelligence lives in the layers wrapped around it.
An RL ensemble on top. PPO, A2C and DDPG agents fine-tune online, and the coordinator picks between them on a rolling 30-day Sharpe — PPO tends to win bear markets, A2C the cumulative-return race, DDPG single-stock runs.
Triple-Barrier labels. Trade outcomes are labeled with a take-profit / stop-loss / time barrier (3% / 2% / 10-day, ATR-scaled) so the model learns from realistic trade results, not raw next-day returns.
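Two of the choices above lend themselves to a short sketch: triple-barrier labeling and picking an agent by rolling Sharpe. The barrier levels and the 30-day window come from the text; everything else is an assumption for illustration, including the function names, the use of closing prices to test the barriers, fixed-percentage barriers without ATR scaling, a zero risk-free rate, and 252-day annualisation.

```python
import math
import statistics


def triple_barrier_label(closes, entry_index, take_profit=0.03,
                         stop_loss=0.02, max_days=10):
    """Return +1 if take-profit is hit first, -1 if stop-loss, 0 on timeout."""
    entry = closes[entry_index]
    upper = entry * (1.0 + take_profit)
    lower = entry * (1.0 - stop_loss)
    last = min(entry_index + max_days, len(closes) - 1)
    for i in range(entry_index + 1, last + 1):
        if closes[i] >= upper:
            return 1
        if closes[i] <= lower:
            return -1
    return 0  # time barrier reached without touching either price barrier


def rolling_sharpe(daily_returns, window=30):
    """Annualised Sharpe over the last `window` returns (risk-free rate 0)."""
    recent = daily_returns[-window:]
    if len(recent) < 2:
        return 0.0
    spread = statistics.stdev(recent)
    if spread == 0.0:
        return 0.0
    return statistics.fmean(recent) / spread * math.sqrt(252)


def pick_agent(returns_by_agent, window=30):
    """Name of the agent with the highest rolling Sharpe."""
    return max(returns_by_agent,
               key=lambda name: rolling_sharpe(returns_by_agent[name], window))


label = triple_barrier_label([100.0, 101.0, 102.0, 103.5], entry_index=0)
best = pick_agent({
    "ppo": [0.01, 0.02, 0.01, 0.02],
    "a2c": [-0.01, 0.02, -0.02, 0.01],
    "ddpg": [0.0, 0.0, 0.0, 0.0],
})
print(label, best)
```

Labeling by which barrier is touched first ties each training example to an outcome a trade could actually realise, which a raw next-day return does not capture.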

The honest finding: simple beat the half-built checker

Here’s the negative result I chose to keep. The first version of the checker (a plain ML coordinator) was broken: threshold-based labeling had handed it a training set that was 95%+ “HOLD”, so it learned to do almost nothing. Rather than paper over it, I built Direct Mode, which bypasses the coordinator and acts on the specialist probabilities directly with simple rules: buy when the horizon-weighted probability is above 0.52, sell when it is below 0.48, and run with wider 10–12% stops, larger 15–30% positions and the fast 50-day regime detector.
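The Direct Mode rule reduces to a pair of thresholds on the blended probability. A minimal sketch, with a hypothetical function name and the 0.52 / 0.48 cut-offs from the text; stops and position sizing are left out.

```python
def direct_mode_action(prob_up, buy_above=0.52, sell_below=0.48):
    """Map a blended up-probability to BUY, SELL or HOLD.

    Anything inside the [sell_below, buy_above] band is treated as HOLD,
    so the rule only acts when the specialists lean clearly one way.
    """
    if prob_up > buy_above:
        return "BUY"
    if prob_up < sell_below:
        return "SELL"
    return "HOLD"


for p in (0.545, 0.50, 0.40):
    print(p, direct_mode_action(p))
```

The narrow dead band is what keeps this rule from collapsing to an all-HOLD policy the way the broken coordinator did.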

Metric	Direct Mode	Broken v1 checker
Return	+2.05%	−0.68%
Sharpe	0.53	−0.82
Win rate	61%	38.5%
Trades	18	13

In the bear stretch the specialists protected capital (≈ −$5k vs a potential −$20k). The Decision Transformer above is the proper rebuild of that checker layer — and it’s held to the same honesty bar: it reached 82.8% validation accuracy but 100% training accuracy (textbook overfitting on ~36k samples), so it stays clearly flagged experimental while the calibrated specialists + Direct Mode remain the production path. The architecture is the ambition; the discipline is refusing to ship the ambitious part until it actually generalises.

What I took away

Validation accuracy ≠ trading P&L — walk-forward paper-trading is the only verdict that counts.
Negative results are results — the broken-coordinator discovery is the most useful thing the project produced.
Borrow priors, train the edges — freezing a general GPT-2 backbone and learning only the trading-specific I/O is far more sample-efficient than training a transformer on scarce market data.
Regimes are real, and lag is expensive — fast (50-day) regime detection materially outperformed the textbook 200-day.

Stack

Python · XGBoost · LightGBM · PyTorch · GPT-2 (frozen backbone) · Decision Transformer · PPO / A2C / DDPG (RL ensemble) · 3-state Gaussian HMM · Optuna · Triple-Barrier labeling · Adaptive Kelly sizing · a walk-forward paper-trading engine.