Top AI Product

We track trending AI tools across Product Hunt, Hacker News, GitHub, and more — then write honest, opinionated takes on the ones that actually matter. No press releases, no sponsored content. Just real picks, published daily. Subscribe to stay ahead without drowning in hype.


mcp2cli: The Tool That Cuts MCP Token Costs by 99% Just Hit Hacker News

If you’ve been building with MCP (Model Context Protocol) servers, you already know the pain: every tool schema gets injected into your LLM’s context on every single turn, whether the model uses those tools or not. With 30 tools, that’s roughly 3,600 tokens burned per turn doing absolutely nothing. Scale that to 120 tools over a 25-turn conversation, and you’re looking at 362,000 wasted tokens.
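The arithmetic behind those figures can be sketched in a few lines, assuming the ~121 tokens per tool schema that the project's benchmarks use (actual counts vary with schema verbosity):

```python
TOKENS_PER_TOOL = 121  # approximate average; real schemas vary

def schema_overhead(tools: int, turns: int) -> int:
    """Tokens spent re-injecting every tool schema on every turn."""
    return tools * TOKENS_PER_TOOL * turns

per_turn_30 = schema_overhead(30, 1)    # ~3,630 tokens burned per turn
enterprise = schema_overhead(120, 25)   # ~363,000 tokens over 25 turns
print(per_turn_30, enterprise)
```

The multiplication by turn count is the crux: the schemas don't just cost once, they cost on every single exchange.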

mcp2cli, which just landed on Hacker News’s front page with 133 points and 92 comments, offers a blunt solution: stop injecting schemas altogether. Instead, turn every MCP server and OpenAPI spec into a CLI that LLMs can discover on demand.

## The Problem With MCP’s Token Appetite

MCP has become the standard way AI tools connect to external data and services. Anthropic launched it, and now Claude, Cursor, Codex, and dozens of other tools support it. The protocol works well — until you start counting tokens.

Here’s what happens under the hood: every MCP server you connect exposes a set of tool definitions. These definitions — names, descriptions, parameter types, response formats — get stuffed into the model’s context window before the conversation even starts. Connect 10 servers and you can burn 8,000-15,000 tokens just on tool definitions. Add your system prompt and custom instructions, and you’ve eaten a significant chunk of your context window before typing a single word.
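To make the overhead concrete, here is a representative tool definition of the kind each MCP server exposes (field names per the MCP spec: `name`, `description`, `inputSchema`), with a rough token estimate using the common ~4-characters-per-token heuristic. For exact counts you'd use a real tokenizer such as cl100k_base:

```python
import json

# A representative MCP tool definition. Every connected server sends
# one of these per tool, and they sit in context on every turn.
tool_def = {
    "name": "create_task",
    "description": "Create a task with a title, due date, and priority.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Task title"},
            "due": {"type": "string", "description": "ISO 8601 due date"},
            "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        },
        "required": ["title"],
    },
}

# Rough estimate: ~4 characters per token for English-heavy JSON.
approx_tokens = len(json.dumps(tool_def)) // 4
print(approx_tokens)
```

Even this modest three-parameter tool lands near 100 tokens; richer schemas with nested objects and long descriptions run considerably higher.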

A recent academic paper on arXiv (2602.14878) confirmed this isn’t just a theoretical concern. While augmenting tool descriptions improves task success rates by about 5.85 percentage points, it also increases execution steps by 67% and actually hurts performance in 16.67% of cases. The researchers found that compact description variants often preserve reliability while dramatically cutting overhead.

mcp2cli takes that finding to its logical extreme: don’t inject any tool schemas at all.

## How mcp2cli Actually Works

The approach is straightforward. Instead of loading tool schemas into context, mcp2cli converts MCP servers and OpenAPI specs into standard CLI commands. The LLM interacts with tools the same way a developer would — by running shell commands.

The system follows a four-stage pipeline:

1. **Load** — Fetch and resolve the spec from a URL, file, or running MCP server. Cache the result (default TTL: 1 hour).
2. **Extract** — Convert the spec’s tools into uniform command definitions.
3. **Build** — Generate argument parsers with type validation.
4. **Execute** — Dispatch as HTTP requests (for OpenAPI) or tool calls (for MCP).
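The four stages can be sketched as follows. This is a minimal illustration with hypothetical names, not the project's actual code — the real tool resolves live MCP servers and full OpenAPI specs and dispatches over HTTP, SSE, or stdio:

```python
import argparse
import time

CACHE: dict[str, tuple[float, dict]] = {}
TTL = 3600  # default cache TTL: 1 hour

def load(source: str, fetch) -> dict:
    """Stage 1: fetch and resolve the spec, serving from cache within the TTL."""
    now = time.time()
    if source in CACHE and now - CACHE[source][0] < TTL:
        return CACHE[source][1]
    spec = fetch(source)
    CACHE[source] = (now, spec)
    return spec

def extract(spec: dict) -> list[dict]:
    """Stage 2: convert the spec's tools into uniform command definitions."""
    return [
        {"name": t["name"],
         "params": t.get("inputSchema", {}).get("properties", {})}
        for t in spec.get("tools", [])
    ]

def build(commands: list[dict]) -> argparse.ArgumentParser:
    """Stage 3: generate an argument parser per command."""
    parser = argparse.ArgumentParser(prog="mcp2cli-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    for cmd in commands:
        p = sub.add_parser(cmd["name"])
        for param in cmd["params"]:
            p.add_argument(f"--{param}")
    return parser

def execute(args: argparse.Namespace, dispatch) -> str:
    """Stage 4: dispatch as an HTTP request (OpenAPI) or tool call (MCP)."""
    payload = {k: v for k, v in vars(args).items()
               if k != "command" and v is not None}
    return dispatch(args.command, payload)

# Wiring the stages together against an in-memory spec:
spec = {"tools": [{"name": "create_task",
                   "inputSchema": {"properties": {"title": {}, "due": {}}}}]}
parser = build(extract(load("mem://spec", lambda _: spec)))
args = parser.parse_args(["create_task", "--title", "demo"])
result = execute(args, lambda name, payload: f"{name}({payload})")
print(result)  # create_task({'title': 'demo'})
```

The point of the design is that everything after **Load** happens at runtime: no code generation step, no build artifacts, just a CLI that exists as soon as the spec resolves.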

When an LLM needs to discover what tools are available, it runs `mcp2cli --list`, which returns summaries at roughly 16 tokens per tool. When it needs details on a specific tool, it runs `mcp2cli --help`, costing 80-200 tokens. Compare that to the native MCP approach: approximately 121 tokens per tool injected on every turn, regardless of whether the tool gets used.

The key insight is lazy discovery. The model only pays for the tools it actually looks up, and only when it needs them.

## The Numbers Behind the 99% Claim

The token savings aren’t hand-waved estimates. The project includes benchmarks using the cl100k_base tokenizer across three scenarios:

| Scenario | Tools | Turns | Native MCP Cost | mcp2cli Cost | Savings |
|---|---|---|---|---|---|
| 30-tool task manager | 30 | 15 | 54,525 tokens | 2,309 tokens | 96% |
| Multi-server setup | 80 | 20 | 193,360 tokens | 3,897 tokens | 98% |
| Enterprise platform | 120 | 25 | 362,350 tokens | 5,181 tokens | 99% |

The per-interaction breakdown: the system prompt costs 67 tokens per turn, discovery via `--list` runs once per conversation, and help lookups happen only on first use of each tool. Native MCP, by contrast, re-injects everything every turn.
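That breakdown yields a simple cost model. The sketch below uses the figures above plus two assumptions of ours — a ~140-token average help lookup (midpoint of the 80-200 range) and 6 tools actually used — which together land close to the reported 2,309 tokens for the 30-tool scenario:

```python
def native_cost(tools: int, turns: int, tokens_per_tool: int = 121) -> int:
    """Native MCP: every schema re-injected on every turn."""
    return tools * tokens_per_tool * turns

def mcp2cli_cost(tools: int, turns: int, tools_used: int,
                 system_prompt: int = 67,
                 list_tokens_per_tool: int = 16,
                 help_tokens: int = 140) -> int:
    """Lazy discovery: pay for --list once and --help per tool used."""
    prompt = system_prompt * turns            # system prompt, every turn
    discovery = tools * list_tokens_per_tool  # --list runs once
    help_lookups = tools_used * help_tokens   # --help on first use only
    return prompt + discovery + help_lookups

native = native_cost(30, 15)        # 54,450 (reported: 54,525)
lazy = mcp2cli_cost(30, 15, 6)      # 2,325 (reported: 2,309)
print(native, lazy, f"{1 - lazy / native:.0%} saved")
```

Note how the native cost scales with `tools * turns` while the lazy cost scales mostly with `tools_used` — that's why the savings percentage grows with fleet size.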

Installation is minimal — `pip install mcp2cli` or run it without installing via `uvx mcp2cli --help`. It supports HTTP/SSE and stdio transports for MCP, handles OpenAPI 3.x specs in JSON or YAML (local or remote), and works with any LLM since it’s just a CLI tool the model shells out to.

## What the Hacker News Community Actually Thinks

The 92-comment thread on Hacker News reveals a mixed but engaged reception.

**The praise:** Multiple developers called the lazy discovery pattern “clever.” The core thesis — that injecting full schemas every turn is wasteful — resonated with anyone who’s watched their token bills climb while building MCP-heavy workflows. Several commenters confirmed they’d hit the exact same pain point in production.

**The skepticism:** The sharpest criticism came around metrics. Several users argued that tokens saved shouldn’t be the north star — what matters is whether tool call accuracy holds up with less context. If the LLM needs extra roundtrips to discover and understand tools, the latency cost and potential for errors might offset the token savings.

**The saturation point:** One commenter noted this was “the 5th one of these I have seen this week,” pointing to a growing ecosystem of CLI-to-MCP conversion tools. The space is getting crowded, with projects like CLIHub, mcporter, and Philipp Schmid’s MCP CLI all tackling similar problems from different angles.

**The documentation issue:** Several people flagged the README as “obviously generated slop,” suggesting the project’s AI-generated documentation undermined its credibility — an ironic problem for a tool designed to make AI tooling more efficient.

## mcp2cli vs. the Alternatives

mcp2cli isn’t alone in this space. Here’s how it stacks up:

**CLIHub** by Kagan Yilmaz was an early mover, demonstrating 92-98% cost reduction through CLI-based tool access. mcp2cli explicitly credits CLIHub as inspiration and pushes the approach further with runtime generation and OpenAPI support.

**Philipp Schmid’s MCP CLI** takes a similar approach but focuses more on direct MCP server interaction from the terminal rather than acting as a bridge for LLMs.

**Native MCP lazy loading** is the elephant in the room. Some Hacker News commenters asked: why not just fix MCP itself to support on-demand tool loading? If the protocol added native lazy discovery, tools like mcp2cli might become unnecessary. But until that happens, the wrapper approach fills a real gap.

mcp2cli’s differentiators are its zero-codegen runtime approach (point it at a spec URL and the CLI exists immediately), its dual support for both MCP and OpenAPI, and its provider-agnostic design that works with Claude, GPT, Gemini, or local models.

## Who Should Care About This

mcp2cli is most relevant if you’re:

- **Running multi-server MCP setups** where token overhead compounds across dozens or hundreds of tools
- **Building AI agents** that need access to many tools but only use a few per conversation
- **On a cost-conscious team** watching API bills grow as MCP adoption increases
- **Using Claude Code, Cursor, or Codex** — mcp2cli can be installed as a skill in these environments

If you’re only connecting one or two MCP servers with a handful of tools, the overhead probably isn’t worth worrying about. The savings become dramatic at scale.

The project has 273 GitHub stars, 96 tests, and an MIT license. With 30 commits on main, it’s still early-stage but actively maintained.

## FAQ

**How much does mcp2cli cost?**
mcp2cli is free and open source under the MIT license. The token savings it provides directly reduce your LLM API costs — the 96-99% reduction applies to the tokens consumed by tool definitions, which are billed at your provider’s standard rates.

**Does mcp2cli work with Claude Code and Cursor?**
Yes. mcp2cli can be installed as a skill in Claude Code, Cursor, and Codex. It acts as a standard CLI tool that the LLM shells out to, so it’s compatible with any environment that supports tool execution.

**Does reducing token context hurt tool call accuracy?**
This is the key open question. The Hacker News discussion highlighted that token savings don’t automatically mean better outcomes — if the LLM needs extra roundtrips to discover tools, latency increases and there’s potential for errors. The project’s benchmarks focus on token counts, not task completion rates, so real-world accuracy testing is still needed.

**How does mcp2cli compare to just fixing MCP’s protocol?**
MCP currently doesn’t support native lazy tool loading — every server injects all schemas upfront. If the protocol adds on-demand discovery in a future version, tools like mcp2cli might become less necessary. For now, it’s the most practical workaround available.

**Can I use mcp2cli with local LLMs?**
Yes. Since mcp2cli is just a CLI tool, it works with any model that can execute shell commands — including locally running models via Ollama, llama.cpp, or similar frameworks. There’s no dependency on any specific LLM provider.

