Fintech
IBKR Futures Automation
NLP-driven command interface for futures and options trading — Claude as the command parser, risk gates before every order.
Customer
Self — proof-of-concept for retail trading automation
Timeline
2025–Present
Status
Working prototype
Customer Context
Who they are and what world they live in
Retail futures and options trading — specifically vertical spreads on micro contracts — requires precise multi-leg order entry across a complex brokerage API. The workflow is: identify a trade setup, calculate position sizing based on account risk tolerance, enter a multi-leg order with specific strikes and expirations, monitor position, and exit on target or stop. Every step that requires switching between mental calculation and the brokerage UI is a place where mistakes happen.
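The position-sizing step in that workflow can be sketched as a fixed-fraction risk rule: risk at most some fraction of account equity per trade, and convert that budget into a contract count for a defined-risk spread. The function name and numbers below are illustrative assumptions, not the system's actual rule.

```python
# Hypothetical fixed-fraction sizing rule for a defined-risk vertical spread.
# All names and thresholds here are illustrative, not the system's actual limits.

def contracts_for_spread(account_equity: float,
                         risk_fraction: float,
                         max_loss_per_spread: float) -> int:
    """Number of spreads whose worst-case combined loss stays within the risk budget."""
    if max_loss_per_spread <= 0:
        raise ValueError("max loss per spread must be positive")
    risk_budget = account_equity * risk_fraction
    return int(risk_budget // max_loss_per_spread)

# e.g. $25,000 account, 1% risk per trade, $250 max loss per spread -> 1 contract
```

Automating this arithmetic is exactly the kind of mental-calculation step the workflow description calls out as error-prone when done by hand between the chart and the order ticket.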
The Problem
The fuzzy ask, translated
The stated goal was 'automate the trading workflow.' The real design challenge: how do you build a natural-language command interface for a domain where a misinterpreted command costs real money? The LLM has to be right, not just helpful. And 'right' means: correct symbol, correct strike, correct expiration, correct quantity, correct order type — with a risk gate that prevents execution if any parameter is outside bounds.
The Constraints
Time · Budget · Regulatory · Technical · Organizational
Real-money trading APIs — IBKR's ib_insync has no sandbox mode for futures; mistakes execute against real positions
Zero tolerance for command misinterpretation — 'buy 2 MES calls at 5400' must parse exactly, not approximately
Real-time risk gates — position sizing, account exposure, and margin checks must run before any order touches the API
Latency budget — options orders on micro futures are time-sensitive; the NLP parse + risk check must complete in under 2 seconds
Phase-based development: paper trading → micro contracts ($50 margin) → scaled positions
Architecture Decisions
What I chose. What I rejected. Why.
Command parsing
Chosen
Claude as NLP parser with strict structured output schema — command → JSON with explicit fields for symbol, action, quantity, strike, expiration, order type
Rejected
Regex-based parser / traditional NLP
Why
Trading commands have infinite natural-language variation. Regex breaks on the second person who uses it. Claude's schema-constrained output gives structured JSON from any valid command phrasing — and refuses to parse ambiguous commands rather than guessing.
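The schema-constrained output described above implies a strict validation layer on the model's JSON before anything else runs. A minimal sketch, assuming a payload with the six fields named in the decision (exact field names and allowed values here are illustrative):

```python
import json

# Illustrative schema: the six fields named in the design, one allowed action set.
REQUIRED = {"symbol", "action", "quantity", "strike", "expiration", "order_type"}
ACTIONS = {"buy", "sell"}

def validate_parse(raw: str) -> dict:
    """Accept the model's JSON only if every field is present and well-typed.
    Anything missing or malformed is a hard rejection, never a guess."""
    cmd = json.loads(raw)
    missing = REQUIRED - cmd.keys()
    if missing:
        raise ValueError(f"ambiguous command, missing fields: {sorted(missing)}")
    if cmd["action"] not in ACTIONS:
        raise ValueError(f"unknown action: {cmd['action']!r}")
    if not isinstance(cmd["quantity"], int) or cmd["quantity"] <= 0:
        raise ValueError("quantity must be a positive integer")
    return cmd
```

The point of the hard rejection is the same as the model's refusal behavior: an ambiguous command should fail loudly upstream of the risk gate, not arrive there half-parsed.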
Risk gate placement
Chosen
Risk validation as a blocking step between command parse and order submission — account exposure check, position sizing rules, margin check, all gates must pass
Rejected
Post-execution risk monitoring
Why
The risk gate is the actual hard problem. Post-execution monitoring is a loss management tool, not a risk management tool. Every order that violates risk rules must be blocked before touching the API.
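The blocking gate described above can be sketched as a chain of checks that must all pass before submission; the thresholds, dataclass fields, and limits below are illustrative assumptions, not the system's actual rules.

```python
from dataclasses import dataclass

@dataclass
class Order:
    symbol: str
    quantity: int
    max_loss: float  # worst-case loss of the defined-risk spread

@dataclass
class Account:
    equity: float
    open_risk: float        # summed worst-case loss of open positions
    available_margin: float

# Illustrative limits; the real thresholds come from the trader's rules.
MAX_RISK_PER_TRADE = 0.01   # 1% of equity per trade
MAX_TOTAL_EXPOSURE = 0.05   # 5% of equity across all open positions

def risk_gate(order: Order, account: Account, margin_required: float) -> list[str]:
    """Return every violated rule; the order submits only if the list is empty."""
    violations = []
    if order.max_loss > account.equity * MAX_RISK_PER_TRADE:
        violations.append("per-trade risk limit exceeded")
    if account.open_risk + order.max_loss > account.equity * MAX_TOTAL_EXPOSURE:
        violations.append("account exposure limit exceeded")
    if margin_required > account.available_margin:
        violations.append("insufficient margin")
    return violations
```

Returning the full violation list, rather than failing on the first check, makes the blocked order explainable back to the user in one message.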
Development phasing
Chosen
Paper trading (no real orders) → micro contracts (real orders, minimum size) → scaled positions
Rejected
Full-scale testing
Why
The only way to verify that the system behaves correctly under real API conditions is to run it with real orders at minimum position size. Paper trading tests the logic; micro contracts test the API integration, error handling, and latency under production conditions.
The Hard Problem
The one thing that almost broke the deployment
The LLM command parser is not the hard problem. The hard problem is the eval harness for command interpretation accuracy. How do you measure whether 'sell 3 ES put spreads at 5300/5200 for 20 points' is being parsed correctly across 200 command variations before deploying against a live account? Without an eval harness, you're flying blind.
The Fix
Building the eval harness now before scaling. 200 command/intent pairs, automated comparison of parsed JSON against ground truth, coverage across all supported order types and common variations. The system does not move out of paper trading until eval accuracy on the test set exceeds 99.5% and all edge cases (ambiguous expirations, mid-sentence corrections, multi-leg abbreviations) have known behavior.
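The comparison loop at the core of such a harness is small. A minimal sketch, assuming each case is a (command text, expected JSON dict) pair and `parse_fn` wraps the Claude call (both names are illustrative):

```python
def run_eval(parse_fn, cases):
    """Score a parser against ground truth.

    cases: list of (command_text, expected_dict) pairs.
    Returns (accuracy, failures) where each failure records the command,
    the expected parse, and what the parser actually produced.
    """
    failures = []
    for text, expected in cases:
        try:
            got = parse_fn(text)
        except Exception as exc:
            # A refusal or crash on a command with known ground truth is a failure.
            got = {"error": str(exc)}
        if got != expected:
            failures.append((text, expected, got))
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures
```

With 200 cases, the 99.5% bar described above means at most one failure on the test set, and the `failures` list is the edge-case inventory (ambiguous expirations, mid-sentence corrections, multi-leg abbreviations) that drives the next round of prompt and schema changes.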
Production Reality
What I had to fix in week 2
ib_insync's async event loop and the Claude API's HTTP client do not share an event loop gracefully. Early builds had race conditions between incoming market-data events and the async NLP parse calls. The fix was separation: market data runs on a dedicated ib_insync event loop thread; NLP parsing runs in a separate thread pool fed by a queue; order submission is synchronous on the ib_insync thread.
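The separation described here is a standard pattern: one thread owning its own event loop, a thread pool for the blocking NLP calls, and a queue between them. A runnable sketch with stand-ins in place of the real ib_insync and Claude calls (everything here is illustrative structure, not the system's code):

```python
import asyncio
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Orders flow one way: NLP pool -> queue -> data-loop thread.
order_queue: "queue.Queue[dict]" = queue.Queue()

submitted = []
def submit_order(order: dict):
    """Stand-in for the real (synchronous) broker submission call."""
    submitted.append(order)

def market_data_thread(stop: threading.Event):
    """Dedicated thread owning its own event loop: in the real system this loop
    also services ib_insync market-data callbacks."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

    async def pump():
        while not stop.is_set():
            try:
                order = order_queue.get_nowait()  # parsed + risk-checked upstream
                submit_order(order)               # synchronous, on this thread only
            except queue.Empty:
                await asyncio.sleep(0.01)         # yield to other loop tasks
    loop.run_until_complete(pump())
    loop.close()

nlp_pool = ThreadPoolExecutor(max_workers=2)      # NLP parsing stays off the data loop

def handle_command(text: str):
    """Parse in the pool, then hand the result across via the thread-safe queue."""
    future = nlp_pool.submit(lambda: {"cmd": text})  # stand-in for the Claude parse
    order_queue.put(future.result())
```

The queue is the only shared object, so neither event loop ever awaits inside the other's runtime, which is what eliminated the race conditions.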
Lessons Carried Forward
What this taught me that I apply to every deployment
The risk gate is the actual hard problem in trading automation — LLM command parsing is the easy part
Build the eval harness before scaling — 200 command/intent pairs reveal edge cases that a demo never surfaces
Separate async event loops for I/O-bound tasks that don't share a runtime — ib_insync and aiohttp are not friends on the same loop
Phase-based development is not optional when real money is involved — paper trading tests logic, micro contracts test the API