AI Video & Image
-
Hero Studio Photos Turns One Snapshot Into Studio Shots From Every Angle
Product photography is the unglamorous tax on selling anything online: good lighting, multiple angles, a model or mannequin for clothing. Hero Studio Photos collapses that into a single snapshot. Photograph whatever you’re selling, and it generates clean, listing-ready studio shots from every angle.

## What Hero Studio Photos does

From one photo, the app produces…
-
Veridive Searches the Spoken Web and Jumps to the Exact Second
Most video search still means scrubbing a timeline or skimming a transcript. Veridive, billed as a discovery platform for the spoken world, does something more precise: ask a question in plain language and it returns a cited answer pinpointed to the exact second it’s spoken, across YouTube, podcasts, lectures, and interviews.

## What Veridive…
-
Vaani Dubs Video Into 40+ Languages With Lip-Sync
Dubbing a video into another language usually means a studio, a slow turnaround, and a result where the mouth never matches the words. Vaani, an AI dubbing platform built for creators, broadcasters, and studios, aims to make that output broadcast-ready rather than just “done.” It translates and re-voices video into 40+ languages with cloned voices…
-
LoomVideo does unified video generation and editing at 5B parameters, not 13B
Most “unified” video models (ones that both generate and edit from mixed text, image, and video inputs) are heavy, at 13B parameters or more, and they handle editing by concatenating the source video’s tokens, which doubles the sequence length and, because self-attention cost grows quadratically with sequence length, quadruples attention cost. LoomVideo, a new arXiv release from Peking University, aims for the…
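The "doubles the sequence length, quadruples attention cost" arithmetic can be checked with a short sketch. It assumes plain full self-attention, where the score matrix has one entry per query-key pair; the function name and the token count below are illustrative assumptions, not figures from the LoomVideo paper.

```python
def attention_pair_count(seq_len: int) -> int:
    """Entries in a full self-attention score matrix: one per query-key pair."""
    return seq_len * seq_len


# Hypothetical token count for the target video alone (illustrative only).
target_tokens = 8_192

# Concatenating an equally long source video doubles the sequence length.
concat_tokens = 2 * target_tokens

# Ratio of attention work with concatenation versus without it.
ratio = attention_pair_count(concat_tokens) / attention_pair_count(target_tokens)
print(ratio)  # 4.0
```

The ratio does not depend on the chosen token count: any sequence of length n grows from n² to (2n)² = 4n² pairs when doubled.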
-
VideoKR builds the first large training corpus for knowledge- and reasoning-intensive video understanding
Most video AI gets graded on shallow recall: what’s on screen at minute three. VideoKR, a new arXiv release, targets the harder thing: video questions that need outside knowledge and multi-step reasoning, not a textual shortcut. It’s billed as the first large-scale training corpus built specifically for that.

## What’s in it

The dataset…
-
Grok Imagine 1.5 turns a still image into 720p video with synced audio and tops the image-to-video arena
xAI’s Grok Imagine 1.5 takes a still image (or a text prompt) and animates it into a clip with native, synchronized audio baked in: music, sound effects, even lip-synced dialogue. It shipped as an API preview on June 3.

## What’s new in 1.5

The headline is audio in every generation, no separate…
-
Ideogram 4.0 is the first open-weight text-to-image model to top the DesignArena leaderboard
Ideogram 4.0, released June 3, is Ideogram’s first open-weight text-to-image model, and it landed at #1 among all open-weight models on the DesignArena leaderboard the day it shipped. It’s a 9.3-billion-parameter Diffusion Transformer trained from scratch, not a fine-tune of someone else’s base.

## Built for design, not just pretty pictures

The headline feature…
-
Microsoft MAI-Image-2.5 debuts at No. 3 on Arena with built-in image editing
## What it is

MAI-Image-2.5 is Microsoft’s new in-house image generation and editing model, shown at Build 2026 and available in Foundry. It debuted at No. 3 on Arena.ai’s image leaderboard, a +75-point jump over MAI-Image-2, with its biggest gains in text rendering (+107) and cartoon, anime, and fantasy (+90). It’s already running…
-
PhysX-Omni generates simulation-ready 3D objects — rigid, deformable, and articulated — for embodied AI
PhysX-Omni is a unified framework for generating simulation-ready physical 3D assets across object types: rigid bodies, deformable objects, and articulated objects (things with joints, like doors or robot arms). “Simulation-ready” is the key phrase: the outputs aren’t just pretty meshes; they carry the physical properties a simulator needs.

## What’s under the hood

A…
-
GenEvolve makes image-generation agents self-improve by distilling their own successful attempts
GenEvolve, from MeiGen-AI, is a self-evolving framework for image generation agents. The core reframe: each generation attempt isn’t a one-shot prompt but a tool-orchestrated trajectory, in which the agent gathers evidence, selects reference images, invokes generation skills, and composes them into a prompt-reference program.

## Tool-orchestrated visual experience distillation

Instead of calling a diffusion model once,…
