Top AI Product



708 GitHub stars in 48 hours: claude-token-efficient (Universal Claude.md) and the fight over Claude’s most expensive habit

“You’re absolutely right!” That’s Claude’s favorite response to almost anything you say, including statements that aren’t even claims. Developer Scott Leibrand opened an issue about it on GitHub last year. It got 350 upvotes. The Register ran a story titled “Claude Code’s endless sycophancy annoys customers.” Anthropic acknowledged it. Months passed. The problem persisted.

So Drona Gangarapu did what frustrated developers do: he wrote a fix himself. One markdown file. Twelve rules. Drop it into your project root, and Claude reads it automatically. No code changes, no config, no dependencies. The repo — claude-token-efficient — went live on March 30, 2026. By the next morning it was on the Hacker News front page with 247 points and 94 comments. Two days later, 708 GitHub stars.

The pitch is simple: a 63% reduction in output tokens. The reality, as the Hacker News crowd quickly pointed out, is a lot more nuanced than that.

Twelve rules that strip Claude down to its skeleton

The CLAUDE.md file is roughly 400 words of instructions that Claude Code ingests at the start of every session. It attacks verbosity from every angle.

The output rules are the most aggressive. Answer first, reasoning after. No preamble — no “Great question!”, no “Sure!”, no “Certainly!”. No hollow closings like “I hope this helps!”. No restating the prompt. No explaining what you’re about to do before doing it. No unsolicited suggestions. If the task is clear, execute immediately.

Then there’s what the file calls “sycophancy zero tolerance.” Never validate the user before answering. Never say “You’re absolutely right!” unless they made a verifiable correct statement. Disagree when wrong. Don’t change a correct answer because the user pushes back. This is a direct response to that 350-upvote GitHub issue — the one where Claude told a user “You’re absolutely right!” after the user simply said “Yes please.”

The typography rules are surprisingly specific. No em dashes — use hyphens. No smart quotes — use straight quotes. No Unicode bullets. No ellipsis characters — use three dots. This isn’t about aesthetics. It’s about downstream parsing. If you’re piping Claude’s output into scripts, shell commands, or other tools, Unicode characters cause silent breakage. Anyone who’s debugged a shell script that fails because of a curly quote knows the pain.
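The parsing concern is easy to demonstrate. As a hypothetical illustration (not part of the repo), a small sanitizer that maps the typographic characters the file bans to the ASCII forms it mandates might look like this:

```python
# Hypothetical sanitizer illustrating the typography rules:
# map "smart" typography to shell-safe ASCII equivalents.
ASCII_MAP = str.maketrans({
    "\u2014": "-",    # em dash -> hyphen
    "\u2013": "-",    # en dash -> hyphen
    "\u2018": "'",    # left single smart quote
    "\u2019": "'",    # right single smart quote
    "\u201c": '"',    # left double smart quote
    "\u201d": '"',    # right double smart quote
    "\u2022": "*",    # Unicode bullet
    "\u2026": "...",  # ellipsis character -> three dots
})

def to_plain_ascii(text: str) -> str:
    """Replace typographic Unicode with plain ASCII."""
    return text.translate(ASCII_MAP)

# A curly quote silently breaks naive shell quoting; sanitized, it's plain.
print(to_plain_ascii("rm \u201cold file\u201d \u2014 done\u2026"))
```

Anything downstream that splits on straight quotes or hyphens now gets what it expects, which is the whole point of the rule.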

The code output rules target another common complaint: over-engineering. Return the simplest working solution. No abstractions for single-use operations. No speculative features. No docstrings on code you didn’t change. Read the file before modifying it — never edit blind.
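Condensed, the style of the file looks something like this — a paraphrased excerpt for illustration, not the repo's actual contents:

```markdown
# CLAUDE.md (illustrative excerpt, paraphrased -- not the actual file)

## Output
- Answer first, reasoning after. No preamble, no closing pleasantries.
- Never restate the prompt or narrate what you are about to do.

## Sycophancy: zero tolerance
- Never open with validation. Say "You're absolutely right" only for a
  verifiably correct statement. Disagree when the user is wrong.

## Typography
- Hyphens, straight quotes, three dots. No em dashes, smart quotes,
  Unicode bullets, or ellipsis characters.

## Code
- Simplest working solution. No speculative abstractions. Read files
  before editing them.
```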

Beyond the universal file, Gangarapu ships three specialized profiles: CLAUDE.coding.md tuned for development work, CLAUDE.agents.md for multi-agent automation, and CLAUDE.analysis.md for data and research tasks. Each one tightens the rules further for its target domain. The installation is about as frictionless as it gets — a single curl command downloads the file into your project root and you’re done.

Gangarapu’s benchmark tested five prompts with and without the file. Baseline: 465 words total. Optimized: 170 words. That’s the 63% number. Individual prompts ranged from 50% to 75% reduction. The repo’s README is honest about methodology — this is a directional indicator from five prompts, not a controlled study. But the direction is clear.
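The headline percentage at least checks out arithmetically:

```python
# Word counts from the repo's five-prompt benchmark.
baseline_words = 465
optimized_words = 170

reduction = (baseline_words - optimized_words) / baseline_words
print(f"{reduction:.0%}")  # -> 63%
```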

The Hacker News math problem nobody can agree on

Within hours of hitting the front page, the thread split into two camps, and the debate revealed a fundamental tension in how developers think about AI costs.

The skeptics came with receipts. User monooso dropped the most cited data point in the thread: output tokens represent only 4% of Claude Code’s total token usage. Input tokens account for 93.4%. If you’re only squeezing the 4%, a 63% reduction on that slice is… not much. The file itself adds 400-plus tokens to every single message as input context. For short, quick queries, you’re actually spending more tokens than you save.
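monooso's objection is straightforward to work through. Here is a back-of-the-envelope sketch using the thread's 4% output share; the per-message totals are invented purely for the arithmetic:

```python
# Illustrative per-message budget, assuming output is ~4% of total token
# usage (the split monooso cited). The absolute figures are made up.
total_tokens_per_msg = 10_000
output_tokens = total_tokens_per_msg * 0.04   # ~400 output tokens
output_saved = output_tokens * 0.63           # 63% cut on the output slice
file_overhead = 400                           # CLAUDE.md added as input context

net = output_saved - file_overhead
print(f"saved {output_saved:.0f} output tokens, "
      f"paid {file_overhead} input tokens: net {net:+.0f}")
```

In raw token counts the overhead can exceed the savings on short exchanges. Whether that holds in dollar terms depends on the input/output price ratio, which this sketch deliberately ignores — output tokens are typically priced higher than input.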

The cost projections in the README tell this story quietly. At 100 prompts per day, the savings come out to about $0.86 per month. You’d need to be running 1,000 prompts daily across multiple projects to see savings north of $25 per month. For developers on Claude Code’s subscription plans, where you’re paying a flat monthly fee rather than per-token, the cost argument largely evaporates.

Then came the deeper technical objection. Multiple commenters referenced Karpathy’s research showing that language models perform better when given more tokens for reasoning. User danpasca put it bluntly: forcing “answer first, reasoning after” contradicts how autoregressive models actually work. The model generates tokens sequentially. When you force it to commit to an answer before working through its reasoning chain, you’re not just cutting fluff — you’re potentially cutting accuracy. User motoboi warned that pushing models to respond in unnatural formats takes them out-of-distribution, reducing capability in ways that are hard to measure but easy to feel.

The defenders had their own logic. For high-volume automation pipelines — the primary use case the README targets — you don’t need Claude to explain its thinking. You need structured output, fast. If you’re running an agent loop that processes hundreds of tasks, every “Sure! I’d be happy to help with that!” is wasted compute. The token savings compound. The consistency improves. And the parsing becomes more reliable when you’re not dealing with random Unicode characters and chatty preamble.

User cheriot suggested a middle path: instead of loading these rules as a CLAUDE.md that affects every interaction, package them as a separate skill that gets activated only for automation contexts. Keep the verbose, reasoning-heavy mode for exploration and debugging. Use the stripped-down mode for pipelines. Another commenter, sillysaurusx, argued that the real solution isn’t constraining output at all — it’s building better session handoff systems with “/handoff” commands that generate compressed documentation before context windows fill up. The problem, in this view, isn’t that Claude talks too much. It’s that the token budget runs out before the work is done, and developers are treating symptoms instead of causes.
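cheriot's middle path maps onto Claude Code's skills mechanism, where a skill ships as a `SKILL.md` with YAML frontmatter and activates only when relevant. A hypothetical sketch — the skill name, description, and body are invented for illustration, not taken from any repo:

```markdown
---
name: token-efficient
description: Terse, parse-safe output rules. Activate only for
  automation pipelines, not for exploration or debugging.
---

For this task only: answer first, no preamble, no validation,
ASCII typography, simplest working code.
```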

The thread also surfaced a user experience concern that doesn’t show up in token counts. User hatmanstack cited recent research on “self-consistency” — the idea that some degree of redundancy in AI output actually helps the model stay coherent across longer interactions. Strip too much away, and you don’t just lose pleasantries. You lose the connective tissue that keeps multi-step reasoning on track. For a 3-line answer, that doesn’t matter. For a 200-line refactoring session, it might.

The token optimization arms race Anthropic keeps losing

claude-token-efficient didn’t appear in a vacuum. It’s the latest entry in a growing ecosystem of tools trying to fix what many developers see as a problem Anthropic should solve at the model level.

The code-review-graph project attacks the problem from a different angle — building a local knowledge graph that cuts code review tokens from 739K down to 15K, a 49x reduction on daily coding tasks. The mcp2cli approach eliminates MCP tool schema injection entirely, saving up to 99% of wasted context tokens. nadimtuhin’s claude-token-optimizer claims 90% savings through reusable setup prompts. ooples built a token-optimizer-mcp server that uses caching and compression for 95% reduction. SuperClaude has an “UltraCompressed Mode” targeting 70% cuts.

Every one of these projects exists because developers feel like they’re paying a tax on Claude’s personality. The sycophancy, the verbose explanations, the unsolicited suggestions — these aren’t bugs in the traditional sense. They’re design choices that optimize for a particular user experience: the casual user who wants a friendly, thorough assistant. But the developer running 500 automated tasks a day has a very different definition of “helpful.”

The irony is that Anthropic’s own Claude Code system prompt already includes many of the same instructions that claude-token-efficient enforces. “Keep your text output brief and direct.” “Lead with the answer or action, not the reasoning.” “If you can say it in one sentence, don’t use three.” The model knows what it’s supposed to do. It just… doesn’t always do it. Which is why external enforcement through CLAUDE.md files has become a cottage industry.

Hacker News user levocardia made an interesting observation: Anthropic dogfoods Claude Code internally, which should create pressure to optimize the vanilla experience. But “should” and “does” are different words. As of March 2026, the community is still filing issues, still building workarounds, and still upvoting complaints about “You’re absolutely right!” at a rate that suggests the model’s default behavior hasn’t meaningfully changed.

The 708 stars on claude-token-efficient aren’t just validation of one developer’s markdown file. They’re a vote — from hundreds of developers — that Claude’s default output is still too expensive, too chatty, and too eager to please. Whether a single config file is the right fix or a band-aid over a deeper model-level issue depends entirely on what you’re building. If you’re exploring code, debugging a tricky problem, or having a back-and-forth conversation with your AI pair programmer, you probably want Claude to think out loud. If you’re running a pipeline that processes structured data at scale, every “Certainly! I’d be happy to help!” is money burning.

Gangarapu’s file knows exactly which side of that line it sits on. The README says it explicitly: not recommended for exploratory work, single short queries, or tasks that need debate and alternatives. This is a tool for people who already know what they want and need Claude to shut up and do it. For that specific use case, 708 stars in 48 hours is the community saying: yeah, we needed this.

