Forge is an open-source guardrail framework from antoinezambelli that takes an 8B self-hosted model from 53% to 99% accuracy on agentic workflows — within 1 percentage point of frontier APIs running the same framework. It showed up on Show HN this week and was presented as a demo at CAIS 2026.
## What’s in the guardrail stack
Retry nudges, step enforcement, error recovery, context compaction, and hardware-aware VRAM budgeting. The whole stack operates independently of the specific tools or workflows being executed — meaning you wrap your existing agent setup without rewriting it. It’s a Python framework for self-hosted LLM tool-calling and multi-step agentic workflows.
## The striking finding
The same 8B model plus Forge outperforms frontier APIs running without guardrails. In other words, the reliability gap between a small open model and GPT-5.5 or Opus 4.7 on agentic tasks isn’t mostly about model size — it’s about the scaffolding around the model. Forge closes most of that gap with engineering rather than parameters.
## Why it matters
If an 8B model with good guardrails matches frontier APIs on agentic reliability, the economics of self-hosting flip. Run it on your own hardware, no per-token API bill, no rate limits, no data leaving your network. For agentic workloads specifically, this is the strongest case yet for going local.

Leave a comment