AI Developer Tools & SDKs
-
DFlash beats EAGLE-3 by 2.5x using block diffusion as the speculative draft model
Z-Lab (Chen, Liang, Liu) shipped DFlash this week. 3.6k GitHub stars, +671 in a single day. It’s an inference speedup layer for any LLM, and the trick is genuinely new. What’s actually different: Speculative decoding has been around for a while; a small draft model guesses N tokens, the big model verifies them in one…
-
ds4 (DeepSeek-V4 Metal local inference engine by antirez): Redis creator runs V4 Flash on a single MacBook
Salvatore Sanfilippo, the Redis guy, dropped ds4.c, a native inference engine for DeepSeek V4 Flash written as one C file with zero external dependencies. The whole thing is a Metal graph executor wired to DS4’s MoE topology: custom loader, prompt rendering, KV state, server glue. No GGUF wrapper, no llama.cpp fork on the…
-
VectifyAI PageIndex throws out vector databases — 98.7% on FinanceBench
Vector search isn’t the only answer to RAG anymore. VectifyAI’s PageIndex just crossed 29K GitHub stars (+953 today) with a different pitch: skip embeddings, give the LLM a tree index of your document, let it reason its way to the right page like a human would. How it works: Feed in a PDF. PageIndex builds…
-
Tilde.run hits 162 on Show HN with a ‘Git for agent runs’ sandbox from the lakeFS team
Coding agents have one nasty habit: they nuke your working directory. Half-applied edits, deleted files, an rm -rf nobody asked for. Tilde.run hit 162 points on Show HN today by treating that as a database problem: every agent run is a transaction. Clean exit commits, crash rolls back, nothing silently overwritten. What it actually…
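The commit-or-rollback idea can be sketched in a few lines. This is a minimal illustration of the general pattern, not Tilde.run's implementation or API: the `transactional_run` helper is hypothetical, it copies the whole directory instead of using any versioned storage, and the final swap is two renames, not a truly atomic operation.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def transactional_run(workdir):
    """Yield a scratch copy of workdir; commit on clean exit, discard on error.

    Hypothetical helper for illustration only.
    """
    workdir = Path(workdir).resolve()
    # Stage next to the working directory so the final moves stay on one filesystem.
    staging_root = Path(tempfile.mkdtemp(prefix="agent-run-", dir=workdir.parent))
    scratch = staging_root / "work"
    shutil.copytree(workdir, scratch)
    try:
        yield scratch  # the agent edits the copy, never the original
    except BaseException:
        # Rollback: throw the scratch copy away, the original is untouched.
        shutil.rmtree(staging_root, ignore_errors=True)
        raise
    else:
        # Commit: move the old tree aside, move the scratch copy into place.
        shutil.move(str(workdir), str(staging_root / "old"))
        shutil.move(str(scratch), str(workdir))
        shutil.rmtree(staging_root, ignore_errors=True)
```

Usage would look like `with transactional_run("my-repo") as scratch: run_agent(scratch)`, where `run_agent` stands in for whatever launches the agent. If it raises, `my-repo` is left exactly as it was.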
-
DeerFlow 2.0 hits 65K stars: ByteDance open-sources its long-horizon agent stack
DeerFlow 2.0 is ByteDance’s open-source super-agent harness for tasks that run minutes to hours. 65.6K stars, 8.7K forks, still on GitHub Trending. It’s a full v2 rewrite on LangChain and LangGraph: a supervisor agent plans the work, spawns sub-agents in parallel, hands them sandboxed tools, and keeps memory across the run. What it actually…
-
Google ships Gemma 4 multi-token prediction drafters: 2.7-3.5x faster inference, free
What it is: Tiny helper models that ride alongside Gemma 4 and guess 4-8 tokens ahead per forward pass. The main model just verifies. Right guess, you get the whole sequence in one pass. Wrong guess, fall back to normal. No quality loss because the big model still signs off on every token. Same speculative-decoding…
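The draft-then-verify loop is easy to see in a toy. This is a sketch of greedy speculative decoding in general, not Gemma's drafter code: `draft_next` and `target_next` are hypothetical stand-ins for the small and large models, and the verify loop calls the target once per position for readability where a real system would score all guesses in one batched pass.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int = 4) -> List[int]:
    """Run one draft-then-verify step and return the tokens it produced."""
    # Draft: the small model guesses k tokens, one after another.
    guesses: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        token = draft_next(ctx)
        guesses.append(token)
        ctx.append(token)

    # Verify: keep guesses while they match what the big model would emit.
    accepted: List[int] = []
    ctx = list(prefix)
    for guess in guesses:
        expected = target_next(ctx)
        if guess != expected:
            accepted.append(expected)  # first miss: take the big model's token
            return accepted
        accepted.append(guess)
        ctx.append(guess)
    return accepted


def generate(prefix: List[int], draft_next: NextToken,
             target_next: NextToken, n_tokens: int, k: int = 4) -> List[int]:
    """Generate n_tokens after prefix using repeated speculative steps."""
    out = list(prefix)
    while len(out) - len(prefix) < n_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[:len(prefix) + n_tokens]
```

Every emitted token is either a guess the target agreed with or the target's own choice, so the output matches plain greedy decoding with the big model. Sampling-based variants use a probabilistic acceptance test in place of the exact match used here.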
-
DeepSeek-TUI tops GitHub Trending: a Claude Code clone wired to DeepSeek’s API
DeepSeek-TUI hit #1 on GitHub Trending today. 2,389 stars in 24 hours. It’s a Rust-based terminal coding agent built specifically for DeepSeek models, sitting in your shell like Claude Code or Codex CLI, but routing every call to DeepSeek V4 / DeepSeek-Coder through the official API. What you actually get: A TUI that reads…
-
Mistral Workflows ships Temporal-powered AI orchestration, already running at ASML and CMA-CGM
Mistral didn’t roll their own agent runtime. They wrapped Temporal, the same durable execution engine Netflix and Stripe use, and pointed it at LLM workflows. Public preview hit late April 2026, and ASML, ABANCA, France Travail, and CMA-CGM are pushing millions of executions through it daily. What it actually is: An orchestration product…
-
OpenAI Realtime Voice WebRTC Stack: the infra blueprint every voice agent startup now has to compete with
OpenAI dropped an engineering deep dive May 4 on how it serves real-time voice to 900M+ weekly users. It hit the Hacker News front page with 324 points, the first time OpenAI has formally walked through the architecture behind ChatGPT Voice and the Realtime API. What they rebuilt: They rewrote the WebRTC stack from scratch.…
-
DeepClaude lets Claude Code run on DeepSeek V4 Pro — $0.87 vs $15 per million tokens
DeepClaude is sitting at 467 points and 179 comments on Hacker News today, and the GitHub repo has crossed 540 stars in a few days. The pitch is one line: keep Claude Code’s agent loop, swap Anthropic’s models for DeepSeek V4 Pro. The bill drops about 17x. What it actually is: A self-hosted proxy. You…
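The 17x figure follows directly from the two headline prices. A quick check, assuming both numbers are per-million-token rates for the same kind of token (the headline does not say input or output); the 200M-token monthly volume is a made-up workload for illustration.

```python
DEEPSEEK_USD_PER_M = 0.87   # headline price per million tokens
ANTHROPIC_USD_PER_M = 15.00  # headline price per million tokens


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of a token volume at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


ratio = ANTHROPIC_USD_PER_M / DEEPSEEK_USD_PER_M
monthly_tokens = 200_000_000  # hypothetical workload

print(f"price ratio: {ratio:.1f}x")
print(f"Anthropic: ${cost_usd(monthly_tokens, ANTHROPIC_USD_PER_M):,.2f}")
print(f"DeepSeek:  ${cost_usd(monthly_tokens, DEEPSEEK_USD_PER_M):,.2f}")
```

At these rates the ratio works out to roughly 17.2, consistent with the "about 17x" claim.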
