Top AI Product

Every day, hundreds of new AI tools launch across Product Hunt, Hacker News, and GitHub. We dig through the noise so you don't have to — surfacing only the ones worth your attention with honest, no-fluff reviews. Explore our latest picks, deep dives, and curated collections to find your next favorite AI tool.


Caveman scores 333 HN points for making Claude talk like a caveman — does it actually save 75% of tokens?

“Why use many token when few token do trick.” That’s the tagline of Caveman, a Claude Code skill by Julius Brussee that went viral over the weekend. The idea is absurdly simple: make Claude drop articles, prepositions, and all the conversational fluff it loves so much. Instead of “I’ll execute the web search tool to find that information for you,” Claude says “Tool work. Search now.”

The repo hit 700+ GitHub stars in days. The Hacker News thread “Talk like caveman” pulled 333 points and 209 comments. Twitter went nuts — half the replies calling it genius, the other half calling it snake oil.

The truth, as usual, is somewhere in between. And the debate it triggered is more interesting than the tool itself.

What Caveman actually does

It’s a Claude Code skill — a set of instructions Claude reads at the start of every session. The rules are straightforward: no filler words, no articles (“the,” “a,” “an”), no polite preamble (“I’d be happy to help!”), no verbose explanations. Just the information, caveman-style.
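For a sense of what such a skill looks like on disk, here's a hypothetical, minimal sketch in the SKILL.md style Claude Code uses for skills. This is illustrative only; the real Caveman rules live in Brussee's repo:

```markdown
---
name: caveman
description: Make responses terse. Drop articles, filler, and polite preamble.
---

When active:
- No articles ("the", "a", "an"), no filler words.
- No preamble ("I'd be happy to help!") and no sign-offs.
- State facts and actions only: "Race condition. Fix: abort controller."
```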

Normal Claude response to a debugging question: “I’ve analyzed your code and found that the issue is related to a race condition in the useEffect hook. The problem occurs because the state update happens asynchronously, and the cleanup function isn’t properly canceling the pending request.” ~45 tokens.

Caveman Claude: “Race condition in useEffect. State update async, cleanup not cancel pending request. Fix: add abort controller.” ~18 tokens.

That’s a 60% reduction on output, and the technical content is identical. For short answers the savings are even higher — closer to that 75% claim. Installation is trivial: add the skill and invoke it with /caveman. That’s it. No dependencies, no config, no code changes.
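Taking the approximate token counts from the debugging example above, the arithmetic is easy to check (real counts depend on the model's tokenizer, so treat these as ballpark figures):

```python
def reduction(before_tokens: int, after_tokens: int) -> float:
    """Percent of output tokens saved by the terse rewrite."""
    return (before_tokens - after_tokens) / before_tokens * 100

# Approximate counts from the debugging example: 45 tokens verbose, 18 caveman.
print(f"{reduction(45, 18):.0f}% fewer output tokens")  # → 60% fewer output tokens
```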

The concept quickly spawned companions. caveman-compression, by community member wilpel, takes the idea further with rule-based semantic compression that strips predictable grammar while preserving factual content. It offers three modes: LLM-based (40-58% reduction), NLP-based (15-30%, free and offline), and MLM-based (20-30%, predictability-aware). Another spin-off, Claude Peptides, packages similar ideas into a collection of slash commands, claiming 73% savings.
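To make the rule-based idea concrete, here's a toy compressor in the spirit of the free NLP mode: strip articles and common filler, keep content words. The filler list is an assumption for illustration, nothing like wilpel's actual rules:

```python
import re

# Toy filler list; the real caveman-compression rules are far more elaborate.
FILLER = {"the", "a", "an", "just", "really", "basically", "certainly",
          "i'd", "be", "happy", "to", "help"}

def compress(text: str) -> str:
    # Keep any word whose bare form (punctuation stripped) isn't filler.
    return " ".join(
        w for w in text.split()
        if re.sub(r"[^\w']", "", w).lower() not in FILLER
    )

print(compress("I'd be happy to help! The issue is a race condition."))
# → "issue is race condition."
```

Even this crude version shows the trade-off: stripping "to" saves tokens but can mangle sentences where it carries meaning, which is why the real tool layers on predictability-aware rules.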

The math problem nobody can ignore

Here’s where it gets interesting. Monali Dambre dropped a thread on Twitter that reframed the entire conversation: “This is INCORRECT. The ‘caveman Claude’ hack promising 75% token savings is misleading. It only trims the visible output slightly. The real costs (15k–40k+ tokens) come from the hidden system prompt + tool results sent on every message.”

She’s pointing at something most Caveman enthusiasts are ignoring. When you use Claude Code, the output you see — the text Claude types back to you — is a fraction of the total token spend. The system prompt alone runs 15,000+ tokens. Every tool call result gets injected into the context. Every file read, every search result, every MCP server response — all input tokens, all invisible, all expensive.

Output tokens typically represent about 4% of Claude Code’s total token usage. Input tokens account for 93.4%. This is the same math problem that came up when claude-token-efficient (Universal Claude.md) hit HN a week earlier — a 63% reduction on 4% of your total spend isn’t transformative. It’s rounding error.

Worse: the Caveman skill itself adds tokens to every message as input context. For quick, short interactions, you might actually spend more tokens than you save.
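The objection reduces to one line of arithmetic. Using the shares quoted above, with a hypothetical overhead parameter standing in for the skill's own instructions being re-sent as input:

```python
# Back-of-envelope: overall savings from cutting output tokens, when output
# is only a small share of total usage. The 4% output share and 63% cut come
# from the figures above; skill_overhead_share is a hypothetical allowance
# for the skill's instructions riding along as input on every message.
def total_savings(output_share: float, output_reduction: float,
                  skill_overhead_share: float = 0.0) -> float:
    return output_share * output_reduction - skill_overhead_share

print(f"{total_savings(0.04, 0.63):.1%} of total tokens saved")  # → 2.5% of total tokens saved
```

Any overhead share above that ~2.5% flips the result negative, which is exactly the "short interactions cost you more" scenario.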

And there’s an even deeper objection. Forcing models to be terse can make them dumber. As Andrej Karpathy and others have observed, autoregressive models tend to reason better when they generate more tokens — the “thinking out loud” phenomenon. When you tell Claude to skip the reasoning and jump to the answer, you’re not just cutting fluff. You’re potentially cutting accuracy. Multiple Hacker News commenters flagged this: pushing models into unnatural response patterns takes them out-of-distribution in ways that are hard to measure but easy to feel during a long coding session.

The case for Caveman (it’s not about money)

But here’s the thing Monali’s analysis misses: most Caveman users aren’t on API billing. They’re on Claude Pro or Max subscriptions — flat monthly fees with usage limits measured in tokens consumed, not dollars spent.

For subscription users, the calculus is completely different. Every “I’d be happy to help with that!” eats into your daily limit. Every verbose explanation that repeats what you already know burns capacity you could be using for actual work. Ziwen put it perfectly on Twitter: “Everyone’s laughing at caveman Claude but the guy accidentally cracked the best prompt hack of 2026. Your LLM burns 30-40% of every response being polite to you. You are literally paying for ‘I’d be happy to help!’”

When you’re on a subscription with a token ceiling, Caveman isn’t a cost optimization tool. It’s a capacity optimization tool. You get more turns per session, more work done before hitting the limit. That distinction matters.
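The capacity framing can be sketched with made-up numbers. Everything here is an illustrative assumption (the budget, the per-turn figures, and the premise that prompt caching keeps per-turn input modest), not Anthropic's actual limits:

```python
# With a fixed session token budget, fewer output tokens per turn means
# more turns before hitting the cap. All numbers below are hypothetical.
def turns_per_session(budget: int, input_per_turn: int, output_per_turn: int) -> int:
    return budget // (input_per_turn + output_per_turn)

budget = 200_000  # hypothetical session token cap
verbose = turns_per_session(budget, input_per_turn=2_000, output_per_turn=1_000)
caveman = turns_per_session(budget, input_per_turn=2_000, output_per_turn=400)
print(f"{verbose} turns verbose vs {caveman} turns caveman")  # → 66 turns verbose vs 83 turns caveman
```

Note that when per-turn input dominates (say, 20K tokens of context per message), the gap collapses to nearly nothing, which is the input-side critique restated in capacity terms.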

There’s also the developer experience angle. If you’re running automation pipelines — agent loops, batch processing, CI integrations — you don’t want Claude chatting at you. You want structured, terse output that’s easy to parse. No Unicode smart quotes breaking your shell scripts. No unsolicited suggestions cluttering your logs. No “Certainly!” before every response. For this use case, Caveman isn’t a hack. It’s a productivity feature.

The token optimization arms race keeps accelerating

Caveman is the latest entry in what’s becoming a crowded field. In the past month alone, the Claude Code community has produced a remarkable number of tools all attacking the same problem from different angles.

code-review-graph goes after the input side — building a local knowledge graph that cuts code review tokens from 739K to 15K, a 49x reduction. mcp2cli eliminates MCP tool schema injection entirely for a 99% reduction in wasted context. claude-token-efficient takes the CLAUDE.md approach — twelve rules in a config file for 63% output reduction. SuperClaude has an “UltraCompressed Mode” targeting 70%.

They’re each grabbing a different part of the same elephant. Caveman compresses output. code-review-graph compresses input context. mcp2cli kills unnecessary schema bloat. But none of them address the fundamental question: why does Claude need to be told to shut up in the first place?

Every one of these tools exists because Anthropic optimized Claude for a casual conversational experience — the kind where “Sure! I’d be happy to help!” feels friendly and reassuring. For developers who’ve moved their entire workflow into Claude Code, that personality is a tax. Caveman makes Claude talk like a caveman. But the real question is why we need a 700-star GitHub repo to achieve what should be a built-in toggle.

