AI Video & Image
-
Gemini Omni: Google ships a multimodal video model that takes image, audio, video, and text as input
At I/O 2026, Google announced Gemini Omni, a new model series that combines Gemini’s reasoning capabilities with native video generation. The first release, Gemini Omni Flash, accepts image, audio, video, and text input and outputs video grounded in real-world knowledge that can be easily edited.

## What’s actually new

Most video generation models today…
-
HKUDS’s ViMax orchestrates Director, Screenwriter, Producer, and Video Generator agents for multi-shot AI video
HKUDS released ViMax, a multi-agent video generation framework that combines four specialized roles into one end-to-end pipeline: Director, Screenwriter, Producer, and Video Generator. Input a concept, output a multi-shot video with consistent characters and scenes.

## The agent roles

Each agent owns a discrete stage. Screenwriter drafts the script from your concept. Director plans…
-
Odyssey ships Starchild-1, the first real-time multimodal world model that generates synchronized audio and video
On May 17, Odyssey ML announced Starchild-1, the first general world model that autoregressively generates synchronized audio and video in real time while continuously responding to streaming user input. The kicker: world models until now have been silent.

## What’s actually new

Previous world models (Genie, Sora video, Decart’s models) learned visual dynamics from large-scale…
-
Krea 2 ships Krea’s first AI image foundation model with moodboard-based style control
On May 12, Krea released Krea 2, the company’s first foundation image model built completely from scratch. Where most image models compete on prompt understanding, Krea 2 is built around the second half of the problem: how you want the image to look.

## Moodboards as the headline feature

You upload multiple reference images.…
-
Vivago Video Agent bundles Sora 2, Veo 3.1, and Kling under one conversational interface for $8/month
Vivago.ai’s Video Agent lets you describe a video in plain language and routes the request across OpenAI’s Sora 2, Google’s Veo 3.1, Kling v2.6 Pro, Nano Banana Pro, and Seedream v4, picking the right model and writing the prompt for you. No prompt engineering, no comparing pricing across providers.

## What’s actually in the…
-
NVIDIA SANA-WM: 2.6B-parameter open-source world model generates one-minute 720p video on a single GPU
NVIDIA Labs has released SANA-WM, a 2.6B-parameter controllable world model that generates 720p, one-minute videos with 6-DoF camera control. The model is open source with weights on GitHub at NVlabs/Sana.

## The efficiency story

SANA-WM was trained on roughly 213K public video clips with metric-scale pose supervision. Total training run: 15 days on 64 H100 GPUs.…
-
Picsart MCP puts 140+ image and video models behind one endpoint
Picsart shipped an MCP server and GenAI CLI on April 28, 2026. Plug it into Claude Code, Cursor, Codex, or Windsurf and one connection gives an agent access to 140+ models (Nano Banana, Flux, Sora, Kling, Veo, Runway, Recraft, ElevenLabs, GPT Image) across image, video, and audio.

## What you actually call

It’s an…
-
Higgsfield Supercomputer routes Sora, Kling, Veo, and Seedance through one chat agent
Higgsfield just shipped Supercomputer, an agentic creative pipeline. Describe a reel, an ad, or a week of content in one chat, and it plans the shots, picks the right model for each clip, generates, and delivers. No tab-switching across five generators, no prompt rewrites per model.

## What it actually is

A video-first creative agent sitting…
-
Anthropic Claude Creative Connectors ship to Photoshop, Blender, and Ableton
Anthropic finally moved past coding. On April 28 the company shipped nine official Claude connectors that plug straight into Adobe Creative Cloud (Photoshop, Premiere, Express, 50+ apps), Blender, Autodesk Fusion, Ableton Live, Splice, Affinity, SketchUp, and Resolume.

## What the connectors actually do

These are MCP-based agents, not chat plugins. In Blender, Claude reads your whole scene…
-
Grok Imagine ranks #1 on DesignArena across all three video arenas, beating Sora 2 Pro and Veo 3.1
xAI’s Grok Imagine took the top spot on every DesignArena video board: Video Arena (Elo 1337), Video Editing Arena (1291), and Image-to-Video Arena (1298, confirmed at 1329 in the latest run). It beat Runway Gen-4.5, Sora 2 Pro, and Google Veo 3.1 on the same leaderboard, which is run by Arcada Labs. During the 30-day pre-launch…
