At WWDC 2026, Apple shipped AFM 3, the third generation of its Foundation Models: five models split between on-device and cloud. The headline is AFM 3 Core Advanced, a 20-billion-parameter sparse model that actually runs on an iPhone.
The flash-memory trick
A 20B model won’t fit in phone RAM, so Apple doesn’t try. The full model lives in flash (NAND), and a lightweight router picks a fixed set of experts per prompt, so only 1–4B parameters are active at a time. They call it Instruction-Following Pruning. Result: 20B-class quality on hardware that can only hold a few billion active weights in memory. Alongside it sit the 3B AFM 3 Core, the server-side AFM 3 Cloud, an image model, and AFM 3 Cloud Pro. Cloud Pro is Apple’s most capable model, and it extends to NVIDIA GPUs in Google Cloud while keeping Private Cloud Compute’s privacy guarantees.
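To make the per-prompt routing idea concrete, here is a minimal sketch in Python. It is not Apple's implementation: the article does not describe how Instruction-Following Pruning scores experts, so the dot-product router, the expert count, the per-expert size, and the always-resident "shared" weights below are all hypothetical, chosen only so the totals line up with the 20B and 1–4B figures above.

```python
import random

# Hypothetical sizes, picked so the totals match the article's figures.
NUM_EXPERTS = 76                  # experts kept in flash
PARAMS_PER_EXPERT = 250_000_000   # weights per expert
SHARED_PARAMS = 1_000_000_000     # weights assumed to stay in RAM at all times
EXPERTS_PER_PROMPT = 8            # experts loaded into RAM for one prompt
EMBED_DIM = 32                    # size of the toy prompt embedding


def select_experts(prompt_embedding, router_weights, k):
    """Score every expert once for the prompt and return the k best indices.

    The choice is made per prompt, not per token, so the same set of
    experts serves the whole generation.
    """
    scores = [
        sum(w * x for w, x in zip(row, prompt_embedding))
        for row in router_weights
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def total_parameters():
    """Everything stored in flash: shared weights plus all experts."""
    return SHARED_PARAMS + NUM_EXPERTS * PARAMS_PER_EXPERT


def active_parameters(selected):
    """What has to sit in RAM for this prompt: shared weights plus chosen experts."""
    return SHARED_PARAMS + len(selected) * PARAMS_PER_EXPERT


# Toy router and prompt embedding; a real router would be learned.
rng = random.Random(0)
router_weights = [
    [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(NUM_EXPERTS)
]
prompt_embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]

selected = select_experts(prompt_embedding, router_weights, EXPERTS_PER_PROMPT)
print("experts loaded from flash:", selected)
print("parameters in flash:", total_parameters())
print("parameters in RAM:", active_parameters(selected))
```

With these made-up sizes, the model in flash totals 20B parameters while a single prompt needs 3B in memory, which sits inside the 1–4B range quoted above. Choosing the experts once per prompt rather than per token would mean flash is read up front instead of on every generation step, which is presumably what makes the slower storage tolerable.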
What developers actually get
Through the Foundation Models framework, you call AFM 3 directly inside your app, on-device or in the cloud, your choice. It suits agentic tool use, on-device dictation, summarization, and structured generation, with no API key and no per-token bill. That last part is the real shift.