Multimodal
-
Baidu NAVA Treats Audio and Video as One Signal Instead of Two Streams to Align
NAVA (Native Audio-Visual Alignment for Generation) comes out of Baidu’s ERNIE research group and stakes a position the field has been edging toward: audio and video should be learned together as one signal, not two separate streams stapled at the seam. Native, not stitched: most audio-visual generation pipelines today are bolted together.…
-
When Imagination Becomes Motion: Inside OiiOii.ai

If you’ve ever been intimidated by animation software or daunted by the idea of turning a story idea into a finished video, OiiOii.ai feels like a breath of fresh air. It’s one of those products that makes you pause and realize how far creative tools have come. Instead of asking you to learn complex timelines,…
-
Inside the Billion-Dollar Mystery Startup Backed by SenseTime Veterans Liu Yu: Vivix AI

In a year dominated by rapid advances in generative video models, one of the most talked-about companies is also one of the most secretive. A stealth startup founded by former senior researchers from SenseTime has reportedly reached a valuation exceeding $1.2 billion, despite having no public product, no demos, and barely any official communication.…
