Introducing V2:
The Image-First Pipeline
Preview every scene. Edit any prompt. Get consistent characters. The biggest upgrade to PeakMV since launch.
V1 asked you to trust the AI blindly. You'd hit generate, wait minutes, and hope the output matched your vision.
V2 flips that. See every frame before a single second of video renders.
What was wrong with V1?
V1 was a straight shot: your audio went in, text-to-video prompts were generated, and the AI rendered clips directly. It worked, but it had three painful limitations:
1. No preview. You couldn't see what the video would look like until it was fully rendered. If you didn't like scene 7, you had to regenerate the entire video.
2. Inconsistent characters. The same "person" looked different in every scene. Brown hair in scene 1, blonde in scene 3, different face entirely in scene 5.
3. No lip sync. Characters on screen couldn't move their lips to your vocals. It felt disconnected.
How V2 works: Image-First
Instead of generating video directly from text, V2 splits the process into two stages. First, the AI generates a still image for each scene — a cinematic keyframe that captures the exact composition, lighting, and character. Then, it animates that frozen frame into motion.
This means you get to see and approve every single scene before any expensive video rendering starts. Don't like the lighting in scene 4? Regenerate just that one image. Want to tweak the camera movement? Edit the motion prompt without touching the visual.
The V2 pipeline
1. Upload & trim your audio: pick your segment, choose 720p or 1080p.
2. AI analyzes audio + generates scene images: genre detection, lyrics extraction, keyframe generation.
3. Preview & edit every scene: regenerate images, tweak prompts, perfect your vision.
4. Render final video: images are animated into motion, lip sync is applied, and the clips are stitched together.
What's new in V2
Scene Preview Board
Every scene generates a still image first. Browse them all in a visual grid, grouped by location. Approve what you love, regenerate what you don't. No more blind rendering.
Per-Scene Regeneration
Hate scene 5 but love the rest? Regenerate just that one. Edit the image prompt, tweak the motion direction, or swap the location entirely. Each scene is independent.
Consistent Main Character
Upload your photo or let the AI generate a fictional star. Either way, the same character appears consistently across every scene. No more identity shifts between cuts.
Lip Sync
Toggle lip sync on and your character's mouth moves to the actual vocals. It's the detail that makes an AI video feel like a real music video instead of a slideshow.
Smart Scene Deduplication
The AI groups visually similar scenes and reuses keyframes with different motion. A 60-second video with 12 scenes might only need 5-6 unique images. Lower cost, faster generation, tighter visual coherence.
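The grouping idea can be shown with a toy sketch. This is illustrative only: here scenes are grouped by a `setting` key for simplicity, whereas the real pipeline clusters by visual similarity.

```python
from collections import OrderedDict

scenes = [
    {"id": 1, "setting": "rooftop at dusk", "motion": "slow push-in"},
    {"id": 2, "setting": "neon alley",      "motion": "handheld pan"},
    {"id": 3, "setting": "rooftop at dusk", "motion": "crane up"},
    {"id": 4, "setting": "neon alley",      "motion": "whip pan"},
    {"id": 5, "setting": "stage close-up",  "motion": "slow zoom"},
]

def group_by_setting(scenes):
    # Scenes sharing a setting reuse one keyframe; each scene
    # still keeps its own motion prompt for the video stage.
    groups = OrderedDict()
    for s in scenes:
        groups.setdefault(s["setting"], []).append(s)
    return groups
```

Here five scenes collapse to three unique keyframes, which is where the cost and speed savings come from: you pay for one image per group, not one per scene.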
V1 vs V2 at a glance
| Feature | V1 Classic | V2 Image-First |
|---|---|---|
| Pipeline | Text-to-Video | Image-to-Video |
| Scene Preview | None | Full preview board |
| Edit Individual Scenes | No | Yes |
| Consistent Character | No | Yes (upload or AI) |
| Lip Sync | No | Yes |
| Quality Options | 720p / 1080p | 720p / 1080p |
| Prompt Control | Single prompt | Image + motion per scene |
| Smart Dedup | No | Yes (~50% fewer images) |
| Wizard Steps | 5 steps | 3 steps |
A simpler wizard
V1 had five steps. V2 has three. We combined upload + settings into one step, merged concept generation with the scene preview, and streamlined checkout into a single render confirmation.
The main character is always present — either upload your face or let the AI design one. No more choosing "No Character." Every video deserves a star.
Lyrics are detected automatically from your audio. No need to paste them manually. The AI extracts the vocals, transcribes them in any language, and uses the result to time each scene to your lyrics.
Under the hood
V2 isn't just a UI refresh. The entire generation backend was rebuilt:
- Dual prompt system — separate image prompts (frozen keyframe composition) and video prompts (motion directives). Each optimized for its model.
- Genre-aware prompting — 8 genre-specific visual vocabularies. Hip-hop gets low-angle power shots. Classical gets slow crane movements. The AI matches the visual language to your sound.
- Face injection pipeline — Nano Banana model composites your face into scene images naturally, not as a crude paste but as a contextual blend that respects lighting and pose.
- Main character system — a detailed physical description is generated once and embedded verbatim into every scene prompt. Same person, every frame.
- LTX-2 for 1080p — new model with 6-second clips at $0.04/second, replacing Kling v2.5. Faster renders, lower cost.
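Two of the ideas above fit in a short sketch: the dual prompt system with the character description embedded verbatim, and the LTX-2 clip cost. The function and constant names are hypothetical; only the $0.04/second rate and 6-second clip length come from the post.

```python
# Generated once per project, then spliced verbatim into every scene's
# image prompt (this example description is invented for illustration).
CHARACTER_DESC = ("woman in her late 20s, shoulder-length auburn hair, "
                  "silver bomber jacket")

def build_scene_prompts(scene_desc: str, motion: str) -> dict:
    # Dual prompt system: one prompt for the frozen keyframe,
    # a separate one for the motion directive.
    return {
        "image_prompt": f"{scene_desc}, featuring {CHARACTER_DESC}",
        "video_prompt": motion,
    }

# LTX-2 cost math: 6-second clips at $0.04/second.
COST_PER_SECOND = 0.04
CLIP_SECONDS = 6
clip_cost = CLIP_SECONDS * COST_PER_SECOND   # $0.24 per clip
```

Embedding the same description string into every prompt, rather than re-describing the character each time, is what keeps the face and wardrobe stable from cut to cut.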
Try V2 now
See your scenes before they render. Upload a track and experience the difference.
Create with V2