Top AI Product



700 GitHub Stars in a Week: Apfel Exposes the Free LLM Apple Locked Behind Siri

Every Mac with Apple Silicon has a large language model built into the operating system. Not downloaded. Not sideloaded. Baked in. Apple ships it as part of Apple Intelligence — a 3-billion-parameter model that runs entirely on your Neural Engine and GPU. Zero cloud. Zero cost. Zero API keys.

You just can’t use it. Not directly, anyway. Apple locked it behind Siri and a handful of system features — summarization, Writing Tools, Smart Reply. The FoundationModels framework technically lets developers access it, but only inside apps, only through Xcode, and only within Apple’s guardrails. If you wanted to pipe a prompt through it from Terminal, or spin up a local API server, or use it with any tool that speaks the OpenAI protocol — tough luck.

A developer named franze decided that was absurd. The result is Apfel, and the developer community’s reaction tells you everything about how much pent-up demand there was.

The Narrative That Lit Up Hacker News

The pitch writes itself: every Mac owner is already paying for an AI model through their hardware purchase. Apple just won’t let them use it freely. Apfel wraps the FoundationModels framework in a standalone CLI and HTTP server. No Xcode required. No app bundling. Just brew install and you have a local AI endpoint running in under a minute.

The Show HN post hit 448 points and 93 comments in hours. On GitHub, the repo went from roughly 400 stars to over 700 in days, with 295 new stars on April 3 alone. For a project that doesn’t introduce a new model, doesn’t require a GPU cluster, and doesn’t cost a dime, that traction says something specific about where developers are right now.

We’re deep into AI subscription fatigue. OpenAI charges $20-200/month. Anthropic charges $20/month for Claude Pro. Even local inference tools like Ollama require you to download multi-gigabyte model files and manage disk space. Apfel’s value proposition skips all of that: the model is already on your machine. It shipped with the OS. You literally paid for it when you bought the Mac.

“Every Mac has a free AI locked inside it” — that framing is what drove the virality. It’s technically accurate and emotionally resonant in a way that most developer tools aren’t.

Three Interfaces, One Hidden Model

Apfel ships as a Swift 6.3 binary with three access modes, and each one targets a different workflow.

The CLI is pure UNIX. Pipe text in, get text out. apfel "Summarize this project" -f README.md does exactly what you'd expect. JSON output mode for scripting. File attachments. Streaming. System prompts via the -s flag. It slots into shell workflows without ceremony — chain it with other commands, wrap it in scripts, use it in CI pipelines.

The HTTP server is where things get interesting for the broader ecosystem. Run apfel --serve and you get an endpoint at localhost:11434 that responds to the standard /v1/chat/completions format. This is the same protocol that Ollama, LM Studio, and every OpenAI-compatible tool on the planet already speak. Claude Code, Cursor, Continue, custom Python scripts — anything that can hit an OpenAI API can now talk to Apple's on-device model with zero configuration changes on the client side.

The interactive chat mode handles longer conversations with five context management strategies — newest-first, oldest-first, sliding-window, summarize, and strict. Useful when you need to control how the model allocates its limited context window across turns.
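Apfel's exact implementations aren't documented here, but the sliding-window idea is easy to picture: always keep the system prompt, then keep the newest turns that fit the token budget and drop the rest. A rough sketch, using the common ~4-characters-per-token heuristic rather than Apple's real tokenizer:

```python
def rough_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token for English."""
    return max(1, len(text) // 4)

def sliding_window(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system prompt plus the newest turns that fit in `budget` tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    used = sum(rough_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    for m in reversed(turns):  # walk newest to oldest
        cost = rough_tokens(m["content"])
        if used + cost > budget:
            break  # oldest turns beyond this point are dropped
        kept.append(m)
        used += cost
    return system + list(reversed(kept))  # restore chronological order
```

The other strategies are variations on the same trade-off: summarize compresses old turns instead of dropping them, strict presumably refuses rather than silently truncating.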

Under the hood, the architecture is cleaner than you’d expect for a v0.6 project. An ApfelCore library separates testable logic from the FoundationModels dependency. 48 unit tests. 51 integration tests. MIT license. 114 commits. One developer taking quality seriously.

Apple’s 3B Model: Surprisingly Good at What It’s Good At

Let’s be honest about what 3 billion parameters can and can’t do.

Apple’s own benchmarks show the model scoring 67.85% on MMLU, 60.60% on multilingual MMMLU, and 74.91% on MGSM math reasoning. For its size class, those numbers are competitive with Qwen-2.5-3B and Gemma-3n. Here’s the surprising part: on specific benchmarks, Apple’s 3B model actually outperforms much larger models — Phi-3-mini (3.8B), Mistral-7B, Gemma-7B, and even Llama-3-8B. Apple’s quantization-aware training and KV-cache sharing are doing serious work here.

One Hacker News commenter ran backtesting and found Apple’s model outperformed frontier cloud models in 6 out of 10 task-specific cases. Not because the model is smarter in absolute terms, but because for focused tasks — summarization, extraction, classification, code explanation — a fast local model with zero latency and zero cost beats a slow API call to a model that’s 100x larger. The right tool for the right job.

But the limitations are real and you need to know them.

The 4,096-token context window is the biggest constraint. That’s combined input and output — roughly 3,000 English words total. You’re not feeding it a codebase. You’re not having a 20-turn conversation. Apple’s own documentation explicitly states the model “is not designed for world knowledge or advanced reasoning.” This is a task model — summarize this, extract that, classify this, explain that. If you approach it that way, it delivers. If you expect ChatGPT, you’ll be disappointed.
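A back-of-the-envelope check makes the constraint concrete. This sketch uses the same ~4-chars-per-token heuristic as above — a rough approximation, not Apple's tokenizer — and the key point is that the 4,096 tokens are shared, so you must reserve room for the reply:

```python
CONTEXT_WINDOW = 4096  # tokens, shared between input AND output

def fits(prompt: str, max_output_tokens: int = 512) -> bool:
    """Rough check: does this prompt leave room for the reply?
    Assumes ~4 characters per token for English text."""
    prompt_tokens = len(prompt) // 4
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW
```

A 2,000-word email thread fits comfortably; a 10,000-word document (~60,000 characters, ~15,000 tokens) is nearly four windows over budget before the model writes a single word. Hence: chunk your inputs, or reach for a different tool.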

The safety guardrails also drew fire on HN. Multiple users reported the model refusing benign requests — a pattern Apple has been aggressive about across its AI products. Apfel includes a --permissive flag that loosens some restrictions, but it’s not a full bypass. Apple’s safety training is baked deep into the weights.

And the hard prerequisite: macOS 26 (Tahoe) with Apple Intelligence enabled. If you’re on Sequoia or earlier, or you’ve opted out of Apple Intelligence on principle, Apfel simply won’t work. Several HN commenters pointed out this excludes a significant chunk of Mac users. Fair criticism.

Apfel vs. Ollama vs. LM Studio: Different Problems Entirely

It’s tempting to compare Apfel to Ollama or LM Studio, but they’re not really competing. They’re solving different problems with different trade-offs.

Ollama recently switched its entire Mac inference backend to Apple’s MLX framework, hitting 1,810 tokens/sec on prefill and 112 tokens/sec on decode — nearly 2x improvement over the old llama.cpp backend. LM Studio has been running MLX even longer, with a polished GUI for browsing and downloading models. Both tools give you access to thousands of models on Hugging Face — Llama, Qwen, Mistral, Gemma, DeepSeek, anything that fits your hardware. You pick the model, the quantization, the context length.

Apfel gives you exactly one model. Apple’s model. You can’t swap it. You can’t fine-tune it (outside of Apple’s separate adapter training framework). You can’t choose a different quantization. It’s the model Apple shipped, running on the hardware Apple designed, through the framework Apple controls.

So why would anyone choose that constraint?

Three reasons. First, zero setup friction. No model downloads, no disk space allocation, no waiting for a 4GB GGUF to transfer. brew install apfel and you’re running inference in seconds. Second, it’s genuinely free — not “free tier with rate limits” but free as in the model already exists on your computer. No subscriptions. No tokens to count. No billing dashboards. Third, the privacy guarantee is architectural, not contractual. There are no network calls in the inference path. Not “we promise not to log” but “physically impossible to leak.” For developers building health apps, legal tools, financial analysis — anything where data sensitivity matters — that distinction is real.

The practical use case is clear: Apfel is the tool you reach for when you need a decent LLM right now, on this machine, with zero dependencies and zero data exposure risk. It’s not replacing your Ollama setup for serious work. It’s the AI equivalent of the calculator app — always there, instant, free, good enough for most quick tasks.

The CORS Incident Worth Mentioning

Within hours of the HN launch, a commenter flagged a CORS vulnerability in Apfel’s HTTP server. The default configuration allowed any webpage to send requests to the localhost endpoint — meaning a malicious website could theoretically interact with your local AI server without permission.

Franze responded the same day. Version 0.6.23 shipped with CORS restricted by default, a published security guide, and an opt-in flag for cross-origin access. The HN thread gave credit for the rapid fix, though several commenters noted this is becoming a recurring pattern across the local-AI and MCP ecosystem — developers shipping localhost servers without thinking through browser security models.

Small incident, but it reveals something about the project’s character. One developer, v0.6, MIT license, 99 tests, same-day security patches. This is early-stage open source done right — building in public, taking feedback seriously, iterating fast.

The broader story here isn’t really about Apfel the tool. It’s about Apple spending years building on-device AI capabilities — Neural Engine hardware, the MLX framework, unified memory architecture, a custom-trained 3B model — and then locking the consumer-facing AI behind Siri’s mediocre interface. The developer community is doing what it always does: finding the seams, building the tools the platform owner won’t, and proving demand exists for capabilities that were already paid for but never exposed. Seven hundred stars in a week for a CLI wrapper isn’t just enthusiasm for a neat hack. It’s a signal that developers are done paying monthly fees for AI when there’s already a free one sitting on their hardware, doing nothing.

