Most approaches to improving LLM reasoning involve expensive fine-tuning, synthetic data pipelines, or reinforcement learning loops that eat GPU-weeks. LLM Circuit Finder throws all of that out. Instead, it copies three specific transformer layers, pastes them back into the forward pass, and watches logical deduction scores jump from 0.22 to 0.76 on Big-Bench Hard. No weight changes. No training. The entire discovery was made on two AMD consumer GPUs in a single evening.
The project hit Hacker News this week with 88 points and 30 comments, and landed on bestofshowhn.com’s March 2026 picks. The discussion it sparked — about whether transformers contain manipulable “cognitive units” — is arguably more interesting than the tool itself.
The RYS Method: Where This All Started
LLM Circuit Finder builds on David Ng’s RYS (Repeat Yourself) technique, which first surfaced in mid-2024. Ng’s original insight was deceptively simple: what if certain blocks of transformer layers function as indivisible reasoning pipelines, and running inputs through them twice makes the model think harder?
He tested this on Qwen2-72B by duplicating seven middle layers. The results: +17.72% on MuSR and +8.16% on MATH benchmarks. No weights were modified. The model just got a second pass through its reasoning circuit with the exact same parameters.
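The mechanics are easy to sketch. The toy model below (plain Python, each "layer" a callable) is not Ng's code, just an illustration of the core move: the duplicated block re-inserts the same objects, so the second pass runs with identical parameters.

```python
# Toy illustration of RYS-style layer duplication (not the actual RYS code).
# Duplicating re-inserts the SAME layer objects, so the second pass through
# the block uses identical parameters -- no weights change.

def run(layers, x):
    for layer in layers:
        x = layer(x)
    return x

def rys_duplicate(layers, start, end):
    """Return a forward-pass order with a second pass through layers[start:end]."""
    return layers[:end] + layers[start:end] + layers[end:]

# A 5-layer "model" where layer i just records its index in the output:
layers = [lambda x, i=i: x + str(i) for i in range(5)]
doubled = rys_duplicate(layers, 1, 3)   # duplicate layers 1-2

print(run(doubled, ""))                 # layer order: 0121234
assert doubled[3] is layers[1]          # same object, shared "weights"
```

In a real model the same reshuffling would be applied to the decoder-layer list before inference; the duplicated layers add compute but no new parameters.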
What happened next validated the approach in a way benchmarks alone couldn’t. Community members took Ng’s RYS-modified models, fine-tuned them further, and the resulting models topped the HuggingFace Open LLM Leaderboard. As of early 2026, descendants of those RYS models still dominate several leaderboard categories.
Two observations originally inspired Ng’s hypothesis. First, LLMs can reason in Base64 encoding — suggesting that reasoning happens in abstract intermediate representations, not at the token level. Second, the Goliath-120B “frankenmerge” model (which stitched together layers from different models) worked despite violating every assumption about weight distribution consistency. If you could Frankenstein layers together and get coherent output, maybe the layer-level organization was more modular than anyone thought.
What LLM Circuit Finder Actually Does
The toolkit automates the process of finding these reasoning circuits across any transformer model. Rather than guessing which layers matter, it runs a systematic three-phase sweep:
Phase 1: Coarse scan. Test large blocks (8 layers) with wide strides across the full model to identify “hot zones” — regions where duplication improves benchmark scores.
Phase 2: Precision targeting. Narrow down to 3-5 layer blocks with stride-1 scanning to find exact circuit boundaries.
Phase 3: Pattern exploration. Experiment with multi-pass and interleaved configurations to discover different cognitive profiles.
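For intuition, the first two phases reduce to generating candidate (start, end) blocks to benchmark with duplication applied. The function names and defaults below are illustrative assumptions, not the toolkit's actual API.

```python
# Illustrative sketch of the sweep's candidate generation (names and defaults
# are assumptions, not the toolkit's real API).

def coarse_scan(num_layers, block=8, stride=4):
    """Phase 1: wide-stride 8-layer blocks across the whole model."""
    return [(s, s + block) for s in range(0, num_layers - block + 1, stride)]

def precision_scan(zone_start, zone_end, sizes=(3, 4, 5)):
    """Phase 2: stride-1 blocks of 3-5 layers inside a hot zone."""
    return [(s, s + k) for k in sizes
            for s in range(zone_start, zone_end - k + 1)]

# A 40-layer model (e.g. Devstral-Small): 9 coarse candidates, then a
# fine-grained sweep of a hypothetical hot zone around layers 8-16.
print(coarse_scan(40))        # (0, 8), (4, 12), ..., (32, 40)
print(precision_scan(8, 16))  # 15 candidates, incl. (12, 15) = layers 12-14
```

Each candidate block would then be duplicated and the model re-benchmarked, keeping the block with the best score.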
That last phase produced one of the project’s most surprising findings: different duplication patterns activate different capabilities from the same weights. Double-pass through the reasoning circuit enhances mathematics. Triple-pass strengthens what the creator calls “emotional intelligence” (measured via a custom EQ benchmark). Interleaved patterns maximize math performance further. Same model, same weights, different routing — different cognitive specialization.
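In layer-index terms, these routing patterns might look like the following. The "interleave" variant is one plausible reading of the write-up (the exact scheme isn't documented), so treat the whole sketch purely as an illustration.

```python
def routed_layers(num_layers, start, end, mode="double"):
    """Layer-index order for different duplication patterns over [start, end).
    'interleave' is a guess at the project's interleaved variant."""
    idx = list(range(num_layers))
    block = idx[start:end]
    if mode == "double":
        middle = block * 2        # one extra pass: the math-boost profile
    elif mode == "triple":
        middle = block * 3        # two extra passes: the "EQ" profile
    elif mode == "interleave":
        middle = [i for layer in block for i in (layer, layer)]  # 12,12,13,13,14,14
    else:
        raise ValueError(mode)
    return idx[:start] + middle + idx[end:]

# Devstral's circuit at layers 12-14 (end index exclusive):
print(routed_layers(40, 12, 15, "double"))
# ..., 11, 12, 13, 14, 12, 13, 14, 15, ...
```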
The Numbers That Got People Talking
The headline results come from two models:
Devstral-Small-2-24B (40 total layers, reasoning circuit at layers 12-14):
– BBH Logical Deduction: 0.22 → 0.76 (+245%)
– GSM8K math reasoning: 0.48 → 0.64 (+33%)
– MBPP code generation: 0.72 → 0.78 (+8%)
– Average improvement across all benchmarks: +8%
Qwen2.5-Coder-32B (64 total layers, reasoning circuit at layers 7-9):
– Reasoning composite: 76.5% → 94.1% (+23%)
– EQ benchmark: 92.1 → 93.6 (+1.6%)
The boundaries are surgical. Shift the duplicated block by a single layer in either direction, and the improvement vanishes — or inverts. Layers 12-14 on Devstral are the reasoning circuit. Layers 11-13 or 13-15 are not. This isn’t a gradient; it’s a cliff.
The cost is modest. Three extra layers on a 40-layer model adds roughly 7.5% to inference time and about 1.5 GiB of extra VRAM. For a 245% improvement in logical deduction, that’s a trade most practitioners would take without blinking.
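That 7.5% figure is just the layer-count ratio, assuming per-layer compute is roughly uniform; since the weights are shared, the extra VRAM comes from activations and KV cache for the repeated layers, not from a second copy of the parameters.

```python
def extra_inference_time_pct(total_layers: int, duplicated: int) -> float:
    """Added inference time, assuming each decoder layer costs about the same."""
    return 100 * duplicated / total_layers

print(extra_inference_time_pct(40, 3))  # 7.5
```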
Why This Matters Beyond the Benchmarks
The Hacker News discussion around LLM Circuit Finder centered less on the tool and more on what it implies about how transformers organize knowledge.
The conventional view treats transformer layers as a continuous pipeline where each layer refines representations incrementally. LLM Circuit Finder’s results suggest something more structured: transformers self-organize into discrete functional modules during training. These modules — the “circuits” — are multi-layer units that perform complete cognitive operations. A single layer does almost nothing meaningful on its own, but the right block of 3-4 layers constitutes an indivisible reasoning pipeline.
This aligns with broader work in mechanistic interpretability. Anthropic’s 2025 circuit tracing research on Claude 3.5 revealed similar modular structures — sequences of features forming coherent computational pathways from input to output. MIT Technology Review named mechanistic interpretability one of the 10 breakthrough technologies of 2026.
But LLM Circuit Finder takes a more pragmatic angle than most interpretability research. Where Anthropic and Google DeepMind are mapping circuits to understand model behavior (and detect potential misalignment), this project exploits circuit structure to boost performance directly. It’s the difference between studying anatomy and performing surgery.
How It Compares to Other Approaches
The landscape of “make LLMs smarter without full retraining” has several contenders, and LLM Circuit Finder occupies a unique niche:
Fine-tuning / LoRA / QLoRA: These modify weights and require training data, compute budgets, and hyperparameter tuning. LLM Circuit Finder modifies architecture only — it’s orthogonal to fine-tuning and can be stacked on top of it. Ng’s original RYS models were later fine-tuned by the community and achieved even higher scores.
Prompt engineering / chain-of-thought: Training-free, but limited in scope and inconsistent across prompts, and chain-of-thought spends extra output tokens at inference. Circuit duplication provides structural, reproducible improvements that don’t depend on prompt quality.
Model merging (frankenmerging): Similar in spirit — both rearrange model architecture without training — but merging combines layers from different models, introducing distribution mismatches. Circuit duplication stays within a single model’s learned representations.
Sparse autoencoders (SAEs) and mechanistic interpretability tools: These aim to understand models, not improve them directly. Google DeepMind actually deprioritized SAE research in 2025 after finding that SAEs underperformed simple linear probes on practical tasks like detecting harmful intent. LLM Circuit Finder sidesteps the understanding question entirely and goes straight to applied performance gains.
Speculative decoding and inference optimization: These speed up inference without changing outputs. Circuit duplication deliberately changes outputs by giving the model extra reasoning passes, trading a small amount of speed for substantially better accuracy.
The project is MIT-licensed, written entirely in Python, and runs on llama.cpp for inference. At 43 GitHub stars and 10 commits, it’s still early — but the technique it validates has already proven itself through Ng’s earlier leaderboard results.
What the Community Is Saying
The Hacker News thread surfaced several recurring themes. Skeptics questioned whether the benchmark improvements would hold on more diverse evaluations beyond BBH and GSM8K. Others pointed out that the sharp layer boundaries — where a single-layer shift destroys the effect — need explanation. If these circuits are real cognitive units, why are their boundaries so exact?
Enthusiasts were more excited about the implications for local LLM users. If you’re running a 24B or 32B model on consumer hardware, copying three layers is essentially free compared to upgrading to a larger model. The 1.5 GiB VRAM overhead is trivial on modern GPUs, and the inference slowdown is negligible.
Several commenters noted the democratizing aspect: the entire discovery pipeline runs on two AMD GPUs (an RX 7900 XT and RX 6950 XT) that together cost less than a single A100. If reasoning circuits can be found this cheaply, the technique could be applied systematically across the growing catalog of open-weight models.
FAQ
What models does LLM Circuit Finder support?
It has been tested on Mistral-architecture models (Devstral) and Qwen2-architecture models (Qwen2.5). Since the technique operates on transformer layers generically through llama.cpp, it should work on most transformer-based models, though the specific circuit locations will differ per model and must be discovered through the sweep process.
Does circuit duplication slow down inference?
Yes, proportionally to the number of extra layers. Duplicating 3 layers on a 40-layer model adds approximately 7.5% to inference time. For most use cases, this is a negligible cost relative to the reasoning improvements — especially compared to the alternative of running a much larger model.
How does this compare to just using a bigger model?
Circuit duplication can close the gap between a smaller and larger model on specific capabilities (particularly logical reasoning and math) without the full VRAM and compute cost of the bigger model. It won’t match a larger model across all capabilities, but for targeted reasoning tasks, it’s remarkably cost-effective.
Can I combine circuit duplication with fine-tuning?
Yes, and this is actually how the technique achieved its most impressive results. David Ng’s original RYS-modified models were fine-tuned by community members and ended up topping the HuggingFace Open LLM Leaderboard. The methods are orthogonal — one modifies architecture, the other modifies weights.
Is this related to Anthropic’s circuit tracing research?
Both deal with identifying functional circuits inside transformers, but from different angles. Anthropic’s work maps circuits to understand model behavior and improve safety. LLM Circuit Finder exploits circuit structure to boost performance. The underlying insight — that transformers contain modular, functional sub-networks — is shared, but the applications diverge.