
Open Source AI

timesfm-mcp

Zero-config time-series forecasting for any AI agent — MCP server wrapping Google's TimesFM 2.5 plus a pure-NumPy statistical baseline.

Customer

Open-source developers integrating time-series forecasting into AI agent workflows

Timeline

2026 · v0.1.6

Status

Published on PyPI · v0.1.6

Capability

Open Source · MCP · Time-Series · Python

Stack

Python · MCP · FastMCP · NumPy · Pydantic · TimesFM 2.5 · Time-Series

Outcome

PyPI release: v0.1.6 (pip install timesfm-mcp)
MCP tools: 2 (forecast + backtest)
Backends: 2 (NumPy baseline + TimesFM 2.5)
Backtest metrics: MAE/sMAPE (validated before trusted)

Customer Context

Who they are and what world they live in

AI agents built with Claude, Cursor, or any MCP client can generate text, write code, and analyze data, but they can't produce reliable numeric time-series forecasts. A language model might describe a trend narratively, but it can't produce calibrated point predictions with confidence bands, or run backtests against held-out data. Any workflow that needs a forecast recommendation (inventory planning, capacity management, energy demand, sales projections) requires the agent to call out to a real forecasting model and receive structured numeric results it can reason about.

The Problem

The fuzzy ask, translated

LLMs on their own don't forecast numeric time series reliably. They can describe trends, but can't produce calibrated point predictions or confidence intervals. The design challenge was not building a forecasting model (Google's TimesFM 2.5 and a pure-NumPy statistical baseline handle that) but bridging the gap between an AI agent and a forecasting backend. The agent needs to call a tool, receive structured output (point predictions, confidence bands, trend/seasonality summary), and then write the recommendation itself. The backtest tool adds a second requirement: validate the forecast on held-out data before the agent trusts it.

The Constraints

Time · Budget · Regulatory · Technical · Organizational

01

Zero-config baseline: pip install timesfm-mcp must work on any machine without CUDA, GPUs, or special hardware — the NumPy statistical baseline runs everywhere

02

TimesFM 2.5 is not yet on PyPI (upstream issue google-research/timesfm#432) — must be installed from vendored source, making it optional rather than the default

03

MCP protocol compliance — tool definitions, parameter schemas, and response shapes must be valid for Claude, Cursor, and any MCP client without modification

04

Structured output for agent reasoning — point predictions, confidence bands, and a trend/seasonality summary in one response so the agent has enough context to write the recommendation itself

05

backtest tool must report MAE/sMAPE on held-out data so forecasts are validated before they're trusted in production workflows

Architecture Decisions

What I chose. What I rejected. Why.

Two-backend design

Chosen

Pure-NumPy statistical baseline (always available) + Google TimesFM 2.5 (optional, installed from vendored source while google-research/timesfm#432 is open)

Rejected

TimesFM 2.5 only / require GPU for all users

Why

TimesFM 2.5 is not yet on PyPI. Requiring it would block most users. The statistical baseline runs on any machine without special dependencies and covers the majority of production forecasting use cases. The MCP server detects which backend is available at startup and routes accordingly.

MCP server framework

Chosen

FastMCP — minimal boilerplate, automatic tool schema generation from Python type hints and Pydantic models

Rejected

Raw MCP protocol implementation

Why

FastMCP generates compliant MCP tool definitions from Python function signatures. This keeps the forecasting logic cleanly separated from the protocol layer and makes tool parameters self-documenting for any MCP client.
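To make "schema from a function signature" concrete, here is a toy stand-in that uses only the standard library. It is not FastMCP's implementation, and the forecast signature (series, horizon) is a hypothetical one chosen for illustration. It only shows how type hints and defaults can be turned into the name, description, and inputSchema fields of an MCP tool definition.

```python
import inspect
from typing import get_args, get_origin, get_type_hints

# Toy mapping from Python types to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def tool_schema(func):
    """Build an MCP-style tool definition from a function's type hints."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        hint = hints[name]
        if get_origin(hint) is list:
            # list[T] becomes a JSON Schema array of T.
            (item_type,) = get_args(hint)
            prop = {"type": "array", "items": {"type": _JSON_TYPES[item_type]}}
        else:
            prop = {"type": _JSON_TYPES[hint]}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the client must supply it
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# Hypothetical tool signature; the real parameters may differ.
def forecast(series: list[float], horizon: int = 12) -> dict:
    """Forecast the next `horizon` points of `series`."""
    raise NotImplementedError


schema = tool_schema(forecast)
```

With a framework doing this step, the forecasting function stays a plain typed Python function and the protocol details live elsewhere.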

Output design

Chosen

Structured response per tool call: point predictions array, confidence bands (lower/upper), trend direction, seasonality flag, plain-English summary — all normalized through a shared Pydantic model

Rejected

Return raw numbers only / split context across multiple tool calls

Why

The agent writes the recommendation. To do that well, it needs not just numbers but context: is this trend up or down? Is there seasonality? What's the confidence range? One response with all of this means the agent can reason in a single context window without chaining tool calls. Normalizing both backends through a shared Pydantic model ensures the response shape is identical regardless of which backend runs.
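A minimal sketch of what such a shared response model could look like. The class name, field names, and allowed values below are assumptions for illustration, not the package's actual schema.

```python
from typing import List, Literal

from pydantic import BaseModel, ValidationError


class ForecastResponse(BaseModel):
    """Hypothetical response shape shared by both backends."""

    backend: Literal["numpy", "timesfm"]   # which backend produced the result
    predictions: List[float]               # point forecasts, one per step
    lower: List[float]                     # lower confidence band
    upper: List[float]                     # upper confidence band
    trend: Literal["up", "down", "flat"]   # trend direction
    seasonal: bool                         # seasonality flag
    summary: str                           # plain-English summary for the agent


# Made-up values, only to show the shape.
example_fields = {
    "backend": "numpy",
    "predictions": [101.0, 103.0],
    "lower": [95.0, 96.0],
    "upper": [107.0, 110.0],
    "trend": "up",
    "seasonal": False,
    "summary": "Upward trend, no seasonality detected.",
}
response = ForecastResponse(**example_fields)
```

Because every backend result has to pass through the same model, a missing field or an unexpected value fails with a ValidationError inside the server instead of reaching the agent as a malformed response.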

Validation layer

Chosen

backtest tool: split the input series into training and held-out windows, forecast the held-out period, report MAE and sMAPE

Rejected

Trust forecast quality without validation

Why

A forecast that hasn't been validated against held-out data is not trustworthy in a production agent workflow. The backtest tool closes the loop: the agent can call forecast, then backtest, and only recommend the forecast if the error metrics are within acceptable bounds.
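The split-forecast-score loop can be sketched in NumPy as follows. The function names are hypothetical, and naive_forecast (repeat the last value) is only a placeholder for whichever backend is active. The sMAPE variant used here is the 0 to 200 percent form, mean of 2|y - ŷ| / (|y| + |ŷ|); the package may define it differently.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def smape(actual, predicted):
    """Symmetric MAPE in percent (0 to 200). Terms where both values are 0 count as 0."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    diff = np.abs(actual - predicted)
    denom = np.abs(actual) + np.abs(predicted)
    ratio = np.divide(2.0 * diff, denom, out=np.zeros_like(diff), where=denom > 0)
    return float(100.0 * np.mean(ratio))


def naive_forecast(train, horizon):
    """Placeholder forecaster: repeat the last observed value."""
    return np.full(horizon, float(train[-1]))


def backtest(series, holdout, forecaster=naive_forecast):
    """Hold out the last `holdout` points, forecast them, and score the result."""
    series = np.asarray(series, dtype=float)
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    train, test = series[:-holdout], series[-holdout:]
    predicted = forecaster(train, holdout)
    return {"mae": mae(test, predicted), "smape": smape(test, predicted)}


# Train on [10, 10, 10], hold out [12, 8]; the naive forecast is [10, 10].
result = backtest([10, 10, 10, 12, 8], holdout=2)
```

Here result["mae"] is 2.0. An agent would compare these metrics against a threshold appropriate to the use case before recommending the forecast; what counts as acceptable is a caller decision, not something this sketch prescribes.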

The Hard Problem

The one thing that almost broke the deployment

TimesFM 2.5 distribution gap. TimesFM 2.5 is not yet published on PyPI (google-research/timesfm#432), which means users who want the foundation model backend can't get it from pip install timesfm-mcp alone. The naive solution, requiring TimesFM, would break the package for anyone without a GPU or the source install. The other extreme, omitting TimesFM entirely, would make the package less useful for the use cases where a foundation model's accuracy matters most.

The Fix

Dual-backend architecture with graceful degradation. The MCP server attempts to import TimesFM at startup inside a try/except. If it's available, it becomes the default backend for the forecast tool. If it's not, the NumPy statistical baseline takes over transparently. The startup log tells the user which backend is active. Users who want TimesFM 2.5 get a one-time install command from the vendored source — the README links directly to the upstream issue so they know the PyPI gap is tracked and temporary.
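The import-time fallback described above can be sketched as below. The function name, logger name, and return values are assumptions; only the try/except-around-import pattern comes from the description.

```python
import importlib
import logging

logger = logging.getLogger("timesfm_mcp")  # hypothetical logger name


def detect_backend(module_name: str = "timesfm") -> str:
    """Return "timesfm" if the optional module imports cleanly, else "numpy"."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Optional dependency is missing: fall back to the always-available baseline.
        logger.info("%s not importable; using the NumPy baseline", module_name)
        return "numpy"
    # Note: a broken install can raise something other than ImportError;
    # this sketch does not handle that case.
    logger.info("%s found; using it as the default backend", module_name)
    return "timesfm"
```

Running the check once at startup and logging the result means the active backend is visible before the first tool call rather than discovered from a failure.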

Production Reality

What I had to fix in week 2

The agent UX loop — call forecast, get numbers, write recommendation — only works if the structured output is stable enough for the agent to act on. Early versions returned slightly different key names across backends (the TimesFM backend used 'forecast' where the NumPy backend used 'predictions'). Fixed by normalizing all output through a shared Pydantic response model before returning from the MCP tool. Both backends now return an identical response shape regardless of which one runs.
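A small illustration of the key-name fix, done here with plain dictionaries before any model validation. The alias table and the choice of predictions as the canonical name are assumptions; the source only states that the two backends disagreed on forecast versus predictions.

```python
# Hypothetical alias table: backend-specific key -> canonical key.
_KEY_ALIASES = {"forecast": "predictions"}


def normalize_keys(raw: dict) -> dict:
    """Rename backend-specific keys to the canonical names."""
    normalized = {}
    for key, value in raw.items():
        canonical = _KEY_ALIASES.get(key, key)
        if canonical in normalized:
            # Both the alias and the canonical key were present: refuse to guess.
            raise ValueError(f"duplicate field after normalization: {canonical}")
        normalized[canonical] = value
    return normalized


timesfm_style = normalize_keys({"forecast": [1.0, 2.0]})
numpy_style = normalize_keys({"predictions": [1.0, 2.0]})
```

After normalization both results are the same dictionary, so the agent never has to know which backend ran.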

Lessons Carried Forward

What this taught me that I apply to every deployment

01

Graceful backend degradation is not optional when an upstream dependency isn't on PyPI: design the dual-backend architecture from day one, not as an afterthought

02

The agent writes the recommendation; the tool provides the numbers — structured output with context (trend, seasonality, confidence) is more valuable than raw predictions alone

03

Normalize response shapes across backends through a shared Pydantic model — any key-name divergence between backends will break the agent's downstream reasoning

04

Backtest before trust: any forecasting tool used in a production agent workflow needs a validation layer the agent can call before committing to a recommendation

05

Installing TimesFM 2.5 from vendored source is a stopgap tied to upstream issue google-research/timesfm#432: when that issue is resolved, the install path simplifies to one command
