Turning a video into a single great still usually means scrubbing for a frame and editing it by hand. Google’s Gemini image models now do it from a prompt: you can pass a video — a direct upload or a public YouTube URL — as context alongside text, and the model generates a thumbnail, poster, or summary infographic that actually reflects what’s in the clip.
## What it does
The capability runs on gemini-3.1-flash-image, treating the video as multimodal context rather than asking you to pick and crop a frame yourself. Ask for a cinematic movie poster, a YouTube thumbnail, or an infographic summarizing the footage, and it composes a new image informed by the video’s content. It’s aimed squarely at creators and marketers who generate this kind of derivative art constantly.
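As a rough illustration of passing a video as context alongside text, here is a minimal sketch that builds such a request by hand. It assumes the model is reachable through the public Gemini API `generateContent` REST endpoint, that a YouTube URL is passed as a `file_data` part the way the video-understanding docs describe, and that the model id is exactly the string named above. The helper names (`build_request`, `extract_images`, `generate`) are hypothetical, and none of this is confirmed by the announcement itself.

```python
import base64
import json
import urllib.request

# Model id as named in the article; whether the API accepts this exact
# string is an assumption.
MODEL = "gemini-3.1-flash-image"
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(video_url: str, prompt: str) -> dict:
    """Pair a video reference with a text instruction in one user turn."""
    return {
        "contents": [{
            "parts": [
                # Assumed: a public YouTube URL is passed as a file_data part.
                {"file_data": {"file_uri": video_url}},
                {"text": prompt},
            ]
        }]
    }


def extract_images(response: dict) -> list[bytes]:
    """Collect decoded image bytes from any inline-data parts in the reply."""
    images = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            # Accept either JSON casing for the inline-data field.
            inline = part.get("inlineData") or part.get("inline_data")
            if inline and "data" in inline:
                images.append(base64.b64decode(inline["data"]))
    return images


def generate(api_key: str, video_url: str, prompt: str) -> list[bytes]:
    """Send the request over HTTPS; needs a valid key and network access."""
    req = urllib.request.Request(
        f"{API_ROOT}/models/{MODEL}:generateContent",
        data=json.dumps(build_request(video_url, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return extract_images(json.load(resp))
```

Calling `generate(key, "https://www.youtube.com/watch?v=VIDEO_ID", "Design a bold YouTube thumbnail for this video")` with a real key and video URL would return a list of raw image bytes to write to disk, provided the assumptions above hold. Directly uploaded videos would need a separate upload step that this sketch does not cover.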
## Part of the Nano Banana Pro lineup
The video-to-image capability sits alongside Nano Banana Pro (Gemini 3 Pro Image), now generally available, which leans on stronger reasoning and world knowledge, and can pull from Google Search, to build context-rich infographics and diagrams. Together they push Google’s image stack past pure text-to-image toward generation grounded in real source material, whether that source is the web or a video you hand it.
