Salvatore Sanfilippo (antirez, the Redis guy) dropped ds4.c, a native inference engine for DeepSeek V4 Flash written as a single C file with zero external dependencies. The whole thing is a Metal graph executor wired to DS4's MoE topology: custom loader, prompt rendering, KV state, server glue. No GGUF wrapper, no llama.cpp fork on the main path. It hit 223 points on Hacker News within a day.
What it actually does
ds4 is a command-line inference engine for Apple Silicon. Heavy KV cache compression pushes context length far past what local stacks usually manage, and KV state persists to disk, so long sessions stay warm across runs. The engine pairs with a custom 2-bit quantization scheme, which is what makes 128GB of unified memory enough to host V4 Flash at all, thinking mode included. Other local runtimes basically can't serve that under the same constraints.
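The post doesn't describe ds4's actual 2-bit format, so here is only a generic sketch of what "2-bit weights" means, assuming a simple per-block min/scale layout. The `q2_block` struct, the 64-weight block size, and every function name are hypothetical, not taken from ds4.c. Each weight becomes one of four levels, four weights pack into a byte, and each block carries a float offset and step so the levels can be mapped back to real values.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; NOT ds4's real format. */
#define QBLOCK 64 /* weights per block */

typedef struct {
    float min;                 /* offset: smallest weight in the block */
    float scale;               /* step between adjacent 2-bit levels */
    uint8_t codes[QBLOCK / 4]; /* four 2-bit codes packed per byte */
} q2_block;

/* Quantize one block of QBLOCK floats to codes in 0..3. */
static void q2_quantize_block(const float *w, q2_block *out) {
    float lo = FLT_MAX, hi = -FLT_MAX;
    for (int i = 0; i < QBLOCK; i++) {
        if (w[i] < lo) lo = w[i];
        if (w[i] > hi) hi = w[i];
    }
    out->min = lo;
    out->scale = (hi - lo) / 3.0f; /* four levels span [lo, hi] */
    for (int i = 0; i < QBLOCK / 4; i++) out->codes[i] = 0;
    for (int i = 0; i < QBLOCK; i++) {
        int q = 0; /* a constant block (scale == 0) stores all zeros */
        if (out->scale > 0.0f) {
            /* value is non-negative, so +0.5 then truncate rounds to nearest */
            q = (int)((w[i] - lo) / out->scale + 0.5f);
            if (q > 3) q = 3;
        }
        out->codes[i / 4] |= (uint8_t)(q << (2 * (i % 4)));
    }
}

/* Quantize n weights (n must be a multiple of QBLOCK) into n / QBLOCK blocks. */
static void q2_quantize(const float *w, size_t n, q2_block *out) {
    assert(n % QBLOCK == 0);
    for (size_t b = 0; b < n / QBLOCK; b++)
        q2_quantize_block(w + b * QBLOCK, &out[b]);
}

/* Reconstruct approximate weights: w ~= min + code * scale. */
static void q2_dequantize_block(const q2_block *b, float *w) {
    for (int i = 0; i < QBLOCK; i++) {
        int q = (b->codes[i / 4] >> (2 * (i % 4))) & 3;
        w[i] = b->min + (float)q * b->scale;
    }
}

/* Effective storage per weight, counting the per-block min/scale. */
static double q2_bits_per_weight(void) {
    return 8.0 * (double)sizeof(q2_block) / QBLOCK;
}
```

Dequantizing is a single multiply-add per weight, cheap enough to do on the fly when the weights are consumed. Note the metadata cost: in this naive layout two float fields per 64 weights push the effective size to 3 bits per weight, which is why real low-bit formats work hard to shrink that overhead (for example with half-precision scales or shared super-block scales).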
Why anyone cares
antirez’s name carries: Redis-grade C, one file, no build hell. And right now it’s the cleanest path to a frontier MoE actually running on your own machine. If you have an M-series MacBook with 128GB, you have V4 Flash with thinking, locally, no API key.