Open Source AI
timesfm-mcp
Zero-config time-series forecasting for any AI agent — MCP server wrapping Google's TimesFM 2.5 plus a pure-NumPy statistical baseline.
Customer
Open-source developers integrating time-series forecasting into AI agent workflows
Timeline
2026 · v0.1.6
Status
Published on PyPI · v0.1.6
Customer Context
Who they are and what world they live in
AI agents built with Claude, Cursor, or any MCP client can generate text, write code, and analyze data, but they can't produce reliable numeric time-series forecasts on their own. A language model might describe a trend narratively, but it can't produce calibrated point predictions with confidence bands or run backtests against held-out data. Any workflow that needs a forecast recommendation (inventory planning, capacity management, energy demand, sales projections) requires the agent to call out to a real forecasting model and receive structured numeric results it can reason about.
The Problem
The fuzzy ask, translated
LLMs on their own do not forecast numeric time series reliably: they can describe trends, but not produce calibrated point predictions or confidence intervals. The design challenge was not building a forecasting model, since Google's TimesFM 2.5 and a pure-NumPy statistical baseline handle that, but bridging the gap between an AI agent and a forecasting backend. The agent needs to call a tool, receive structured output (point predictions, confidence bands, trend/seasonality summary), and then write the recommendation itself. The backtest tool adds a second requirement: validate the forecast on held-out data before the agent trusts it.
The Constraints
Time · Budget · Regulatory · Technical · Organizational
Zero-config baseline: pip install timesfm-mcp must work on any machine without CUDA, GPUs, or special hardware — the NumPy statistical baseline runs everywhere
TimesFM 2.5 is not yet on PyPI (upstream issue google-research/timesfm#432), so it must be installed from vendored source and is an optional backend rather than a required dependency
MCP protocol compliance — tool definitions, parameter schemas, and response shapes must be valid for Claude, Cursor, and any MCP client without modification
Structured output for agent reasoning — point predictions, confidence bands, and a trend/seasonality summary in one response so the agent has enough context to write the recommendation itself
backtest tool must report MAE/sMAPE on held-out data so forecasts are validated before they're trusted in production workflows
Architecture Decisions
What I chose. What I rejected. Why.
Two-backend design
Chosen
Pure-NumPy statistical baseline (always available) + Google TimesFM 2.5 (optional, installed from vendored source while google-research/timesfm#432 is open)
Rejected
TimesFM 2.5 only / require GPU for all users
Why
TimesFM 2.5 is not yet on PyPI. Requiring it would block most users. The statistical baseline runs on any machine without special dependencies and is sufficient for many routine forecasting use cases. The MCP server detects which backend is available at startup and routes accordingly.
MCP server framework
Chosen
FastMCP — minimal boilerplate, automatic tool schema generation from Python type hints and Pydantic models
Rejected
Raw MCP protocol implementation
Why
FastMCP generates compliant MCP tool definitions from Python function signatures. This keeps the forecasting logic cleanly separated from the protocol layer and makes tool parameters self-documenting for any MCP client.
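To make the idea of deriving a tool definition from a function signature concrete, here is a minimal standard-library sketch. It is not FastMCP's code and does not use FastMCP; the `forecast` signature, the `describe_tool` helper, and the simplified type mapping are all assumptions for illustration. Only the `name`, `description`, and `inputSchema` field names follow the MCP tool definition format.

```python
import inspect
from typing import get_origin, get_type_hints

# Simplified mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def forecast(series: list[float], horizon: int = 12) -> dict:
    """Hypothetical tool: forecast `horizon` steps past the end of `series`."""
    return {}  # body omitted; only the signature matters for this sketch


def describe_tool(func) -> dict:
    """Build a JSON-Schema-like description of a function's parameters."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        hint = hints.get(name)
        if get_origin(hint) is list:
            json_type = "array"
        else:
            json_type = _JSON_TYPES.get(hint, "object")
        properties[name] = {"type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default, so the caller must supply it
        else:
            properties[name]["default"] = param.default
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


schema = describe_tool(forecast)
```

Here `schema["inputSchema"]["required"]` contains only `series`, because `horizon` carries a default. A real framework handles far more cases (nested models, unions, validation); the sketch only shows why type hints are enough to make parameters self-documenting.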
Output design
Chosen
Structured response per tool call: point predictions array, confidence bands (lower/upper), trend direction, seasonality flag, plain-English summary — all normalized through a shared Pydantic model
Rejected
Return raw numbers only / split context across multiple tool calls
Why
The agent writes the recommendation. To do that well, it needs not just numbers but context: is this trend up or down? Is there seasonality? What's the confidence range? One response with all of this means the agent can reason in a single context window without chaining tool calls. Normalizing both backends through a shared Pydantic model ensures the response shape is identical regardless of which backend runs.
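A rough sketch of what a single structured response could look like. The project uses a Pydantic model; a standard-library dataclass stands in for it here so the example has no dependencies. The field names, the `build_response` helper, and the first-to-last trend rule are assumptions, not the package's actual schema or logic.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ForecastResponse:
    """Hypothetical response shape; a dataclass stands in for the Pydantic model."""
    predictions: list[float]
    lower: list[float]
    upper: list[float]
    trend: str      # "up", "down", or "flat"
    seasonal: bool
    summary: str


def build_response(predictions, lower, upper, seasonal, flat_tolerance=0.01):
    """Bundle numbers and context into one object the agent can reason about."""
    if not predictions or not (len(predictions) == len(lower) == len(upper)):
        raise ValueError("predictions, lower, and upper must be non-empty and equal length")
    # Assumed trend rule: relative change between first and last prediction.
    change = predictions[-1] - predictions[0]
    scale = max(abs(predictions[0]), 1e-9)
    if abs(change) / scale <= flat_tolerance:
        trend = "flat"
    elif change > 0:
        trend = "up"
    else:
        trend = "down"
    summary = (
        f"Forecast trends {trend} over {len(predictions)} steps, "
        f"ending near {predictions[-1]:.1f} "
        f"(range {lower[-1]:.1f} to {upper[-1]:.1f}); "
        f"seasonality {'detected' if seasonal else 'not detected'}."
    )
    return ForecastResponse(list(predictions), list(lower), list(upper),
                            trend, seasonal, summary)


response = build_response(
    predictions=[100.0, 104.0, 109.0],
    lower=[95.0, 97.0, 99.0],
    upper=[105.0, 111.0, 119.0],
    seasonal=False,
)
payload = asdict(response)  # plain dict, ready to serialize as the tool result
```

Because every field arrives in one payload, the agent does not need a second tool call to learn the trend direction or the width of the confidence band.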
Validation layer
Chosen
backtest tool: split the input series into training and held-out windows, forecast the held-out period, report MAE and sMAPE
Rejected
Trust forecast quality without validation
Why
A forecast that hasn't been validated against held-out data is not trustworthy in a production agent workflow. The backtest tool closes the loop: the agent can call forecast, then backtest, and only recommend the forecast if the error metrics are within acceptable bounds.
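The split-forecast-score loop can be sketched in a few lines of plain Python. The function names, the returned keys, and the last-value forecaster are assumptions standing in for whichever backend is active. sMAPE has several variants in the literature; this sketch uses the mean of |a - p| / ((|a| + |p|) / 2), expressed in percent.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE in percent; a zero denominator contributes zero error."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        total += abs(a - p) / denom if denom else 0.0
    return 100.0 * total / len(actual)


def naive_forecast(train, horizon):
    """Stand-in forecaster (an assumption): repeat the last observed value."""
    return [train[-1]] * horizon


def backtest(series, holdout, forecaster=naive_forecast):
    """Hold out the last `holdout` points, forecast them, and score the result."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    train, held_out = series[:-holdout], series[-holdout:]
    predicted = forecaster(train, holdout)
    return {
        "holdout": holdout,
        "mae": mae(held_out, predicted),
        "smape": smape(held_out, predicted),
    }


result = backtest([10.0, 12.0, 11.0, 13.0, 12.0, 14.0], holdout=2)
```

An agent can compare `result["smape"]` against a threshold it has been given and decline to recommend the forecast when the error is too high.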
The Hard Problem
The one thing that almost broke the deployment
TimesFM 2.5 distribution gap. The upstream PyPI package doesn't exist yet (google-research/timesfm#432), which means users who want the foundation model backend can't install it with pip install timesfm-mcp alone. The naive solution — just require TimesFM — would break the package for anyone without a GPU or the source install. The other extreme — omit TimesFM entirely — would make the package less useful for the use cases where a foundation model's accuracy matters most.
The Fix
Dual-backend architecture with graceful degradation. The MCP server attempts to import TimesFM at startup inside a try/except. If it's available, it becomes the default backend for the forecast tool. If it's not, the NumPy statistical baseline takes over transparently. The startup log tells the user which backend is active. Users who want TimesFM 2.5 get a one-time install command from the vendored source — the README links directly to the upstream issue so they know the PyPI gap is tracked and temporary.
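The startup check described above might look roughly like this. The `select_backend` function, the backend labels, and the logger name are assumptions; `timesfm` is assumed to be the import name of the upstream package.

```python
import importlib
import logging

logger = logging.getLogger("timesfm_mcp_sketch")  # hypothetical logger name


def select_backend(module_name: str = "timesfm") -> str:
    """Return the backend label to use, preferring the optional foundation model."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Optional dependency missing: fall back instead of failing at startup.
        logger.info("%s not importable; using the NumPy statistical baseline", module_name)
        return "numpy-baseline"
    logger.info("%s found; using it as the default forecast backend", module_name)
    return "timesfm"


active_backend = select_backend()
```

Catching only `ImportError` keeps unrelated failures visible. A heavy optional dependency can also fail at import time for other reasons (for example a missing shared library), so a real server may choose to catch more broadly and log the cause.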
Production Reality
What I had to fix in week 2
The agent UX loop — call forecast, get numbers, write recommendation — only works if the structured output is stable enough for the agent to act on. Early versions returned slightly different key names across backends (the TimesFM backend used 'forecast' where the NumPy backend used 'predictions'). Fixed by normalizing all output through a shared Pydantic response model before returning from the MCP tool. Both backends now return an identical response shape regardless of which one runs.
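The key-name divergence can be illustrated with a small normalizer. Only the two key names, 'forecast' and 'predictions', come from the text above; the `normalize_output` function and its pass-through behavior are assumptions, and the actual fix uses a shared Pydantic model rather than a plain dict.

```python
# Keys that different backends have used for the same point-prediction array.
_PREDICTION_KEYS = ("predictions", "forecast")


def normalize_output(raw: dict) -> dict:
    """Return a copy of `raw` with the point predictions under one canonical key."""
    found = [key for key in _PREDICTION_KEYS if key in raw]
    if len(found) != 1:
        raise ValueError(f"expected exactly one of {_PREDICTION_KEYS}, got {found}")
    # Pass other fields through unchanged; rename only the prediction array.
    normalized = {k: v for k, v in raw.items() if k not in _PREDICTION_KEYS}
    normalized["predictions"] = [float(v) for v in raw[found[0]]]
    return normalized


from_timesfm = normalize_output({"forecast": [1, 2, 3]})
from_numpy = normalize_output({"predictions": [1.0, 2.0, 3.0]})
```

After normalization the two results are equal, so the agent sees the same shape whichever backend produced it.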
Lessons Carried Forward
What this taught me that I apply to every deployment
Graceful backend degradation is not optional when an upstream dependency isn't on PyPI: design the dual-backend architecture from day one, not as an afterthought
The agent writes the recommendation; the tool provides the numbers — structured output with context (trend, seasonality, confidence) is more valuable than raw predictions alone
Normalize response shapes across backends through a shared Pydantic model — any key-name divergence between backends will break the agent's downstream reasoning
Backtest before trust: any forecasting tool used in a production agent workflow needs a validation layer the agent can call before committing to a recommendation
The vendored-source install of TimesFM 2.5 is a stopgap tied to upstream issue google-research/timesfm#432; once that issue is resolved, the install path simplifies to one command