AI Models & APIs
-
pixserp gives LLMs one endpoint, 10 answer shapes, $1.50 per 1k requests
pixserp launched on Product Hunt this week: a single API endpoint that returns 10 different answer shapes (web, news, images, places, shopping, flights, hotels, YouTube, transcripts, any URL) so an LLM can pick the right format for the question instead of stitching together five different services.

## The pricing and architecture

$1.50 per 1,000…
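To put the headline price in per-call terms, here is a rough cost estimator. It is only a sketch: it assumes the $1.50 per 1k requests figure from the title is billed flat with no tiers, minimums, or per-shape surcharges, none of which the excerpt confirms. The function name and the 250,000-request volume are made up for illustration.

```python
from decimal import Decimal

# Headline price from the post title; flat, untiered billing is an assumption.
PRICE_PER_1K_REQUESTS = Decimal("1.50")


def estimated_cost(requests: int) -> Decimal:
    """Return the estimated USD cost for a given number of requests."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    return PRICE_PER_1K_REQUESTS * requests / 1000


per_request = estimated_cost(1)      # cost of a single call
monthly = estimated_cost(250_000)    # hypothetical monthly volume
print(f"per request: ${per_request}, 250k requests: ${monthly}")
```

Under that flat-rate assumption a single request costs $0.0015, so the price only starts to matter at volumes in the hundreds of thousands of calls.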
-
Krea 2 ships Krea’s first AI image foundation model with moodboard-based style control
Krea released Krea 2 on May 12. It is the company’s first foundation image model built completely from scratch. Where most image models compete on prompt understanding, Krea 2 is built around the second half of the problem: how you want the image to look.

## Moodboards as the headline feature

You upload multiple reference images.…
-
NVIDIA Ising ships open-source AI models for quantum error correction — 2.5x faster, 3x more accurate
NVIDIA released Ising, the first open-source family of AI models purpose-built for quantum computing. The headline claim: 2.5x faster and 3x more accurate error-correction decoding compared to traditional approaches. Early adopters include Harvard and Fermi National Accelerator Laboratory.

## What Ising actually does

Quantum computers generate error-correction problems that classical algorithms struggle with: qubit…
-
δ-mem boosts frozen LLMs by 31% on MemoryAgentBench with an 8×8 online memory state
δ-mem is a lightweight memory mechanism from DECLARE Lab that augments a frozen full-attention LLM with a compact online state of associative memory. The paper hit Hacker News with 216+ points this weekend. Open-source code is up at declare-lab/delta-Mem.

## The mechanism

Past information gets compressed into a fixed-size state matrix updated by delta-rule learning.…
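For readers unfamiliar with delta-rule learning, the idea of a fixed-size state matrix can be sketched with the textbook update S ← S + β(v − S·k)kᵀ. This is a generic illustration, not the declare-lab/delta-Mem implementation: the 8×8 size comes from the headline, while the one-hot keys, the values, and β = 1 are assumptions chosen to keep the example easy to follow.

```python
# Generic delta-rule associative memory; NOT the delta-Mem code.
DIM = 8  # matches the 8x8 state in the headline; everything else is assumed


def new_state(dim=DIM):
    """Return a dim x dim zero matrix as a list of rows."""
    return [[0.0] * dim for _ in range(dim)]


def read(state, key):
    """Retrieve the value currently associated with key (state @ key)."""
    return [sum(s * k for s, k in zip(row, key)) for row in state]


def write(state, key, value, beta=1.0):
    """Delta-rule update: state += beta * (value - state @ key) key^T."""
    predicted = read(state, key)
    for i, row in enumerate(state):
        err = beta * (value[i] - predicted[i])  # correction for output i
        for j in range(len(row)):
            row[j] += err * key[j]
    return state


def one_hot(index, dim=DIM):
    """Hypothetical key: a unit vector, so distinct keys do not interfere."""
    return [1.0 if i == index else 0.0 for i in range(dim)]


state = new_state()
key_a, key_b = one_hot(0), one_hot(3)
value_a = [float(i) for i in range(DIM)]
value_b = [1.0] * DIM

write(state, key_a, value_a)
write(state, key_b, value_b)
recalled_a = read(state, key_a)   # value_a survives the second write

write(state, key_a, value_b)      # overwrite: the error term replaces value_a
overwritten_a = read(state, key_a)
```

The point of the error term is that the state never grows: writing to a key that is already stored corrects the old association instead of stacking a new one on top, which is what lets the memory stay a fixed size however long the history gets.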
-
Supertonic v3: 99M-parameter on-device TTS covers 31 languages with expression tags
Supertone Inc shipped Supertonic v3: 99M parameters, 31 languages, running entirely on-device via ONNX Runtime with zero cloud calls. GitHub trending lit up this week with 745+ daily stars as the broader dev community discovered the release.

## The size argument

At 99M parameters Supertonic v3 is roughly 7-20x smaller than competing open TTS…
-
Orthrus-Qwen3 hits 7.8x tokens-per-forward on Qwen3-8B with identical output distribution
Orthrus is a dual-architecture framework that wraps a frozen Qwen3-8B base model with a lightweight trainable diffusion module. It delivers up to 7.8x more tokens per forward pass while producing the exact zero-shot accuracy of the base model: no sampling drift, no quality regression.

## How it works

Most speculative decoding methods (EAGLE-3, DFlash)…
-
NVIDIA SANA-WM: 2.6B-parameter open-source world model generates one-minute 720p video on a single GPU
NVIDIA Labs has released SANA-WM, a 2.6B-parameter controllable world model that generates 720p, one-minute videos with 6-DoF camera control. The model is open source with weights on GitHub at NVlabs/Sana.

## The efficiency story

SANA-WM trained on roughly 213K public video clips with metric-scale pose supervision. Total training run: 15 days on 64 H100 GPUs.…
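The training figures convert to a single compute number with simple arithmetic. The sketch below assumes all 64 GPUs ran continuously for the full 15 days, which the excerpt does not state, so treat the result as an upper-bound estimate. The function name is illustrative.

```python
HOURS_PER_DAY = 24


def training_gpu_hours(days: int, gpus: int) -> int:
    """Total GPU-hours, assuming every GPU runs for the whole period."""
    return days * HOURS_PER_DAY * gpus


# 15 days on 64 H100 GPUs, as quoted in the excerpt.
total = training_gpu_hours(days=15, gpus=64)
print(f"{total:,} H100 GPU-hours")  # 23,040 H100 GPU-hours
```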
-
Mistral Cybersecurity Model for European Banks (Anthropic Mythos Rival) lands as ECB sounds the alarm on Mythos attacks
The timing is almost too perfect. On May 13, the ECB publicly warned eurozone lenders that Anthropic’s Mythos, the agentic model that auto-discovers zero-days at machine speed, is already being used to probe them. The same day, Bloomberg reported Mistral is in talks with European banks to ship a rival they can actually…
-
Needle by Cactus Compute squeezes Gemini 3 tool calling into 26M parameters
Cactus Compute, the YC-backed on-device inference startup, open-sourced Needle today: a 26M-parameter Simple Attention Network that does single-shot function calling on phones and smartwatches. No MLPs, just attention and gating. The team argues FFN params are wasteful at this scale, and cross-attention is the right primitive for routing a query to the right tool.…
-
MiniCPM-V 4.6 packs Qwen3.5-2B-level vision into a 1.3B model
OpenBMB open-sourced MiniCPM-V 4.6 on May 11. It is a 1.3B-parameter multimodal model built on SigLIP2-400M plus Qwen3.5-0.8B, aimed squarely at the edge: phones, laptops, consumer GPUs. The trick is in the visual encoder. LLaVA-UHD v4 brings intra-ViT early compression with a hybrid 4x/16x token compression ratio, cutting vision encoding compute by more than 50% versus…
