PeakMV
Major Release

Introducing V2:
The Image-First Pipeline

Preview every scene. Edit any prompt. Get consistent characters. The biggest upgrade to PeakMV since launch.

PeakMV Team · March 26, 2026

V1 asked you to trust the AI blindly. You'd hit generate, wait minutes, and hope the output matched your vision.

V2 flips that. See every frame before a single second of video renders.

What was wrong with V1?

V1 was a straight shot: your audio went in, text-to-video prompts were generated, and the AI rendered clips directly. It worked, but it had three painful limitations:

  1. No preview. You couldn't see what the video would look like until it was fully rendered. If you didn't like scene 7, you had to regenerate the entire video.
  2. Inconsistent characters. The same "person" looked different in every scene. Brown hair in scene 1, blonde in scene 3, different face entirely in scene 5.
  3. No lip sync. Characters on screen couldn't move their lips to your vocals. It felt disconnected.

How V2 works: Image-First

Instead of generating video directly from text, V2 splits the process into two stages. First, the AI generates a still image for each scene — a cinematic keyframe that captures the exact composition, lighting, and character. Then, it animates that frozen frame into motion.

This means you get to see and approve every single scene before any expensive video rendering starts. Don't like the lighting in scene 4? Regenerate just that one image. Want to tweak the camera movement? Edit the motion prompt without touching the visual.

The V2 pipeline

  1. Upload & trim your audio
     Pick your segment, choose 720p or 1080p
  2. AI analyzes audio + generates scene images
     Genre detection, lyrics extraction, keyframe generation
  3. Preview & edit every scene
     Regenerate images, tweak prompts, perfect your vision
  4. Render final video
     Images are animated into motion, lip sync applied, scenes stitched together

What's new in V2

Scene Preview Board

Every scene generates a still image first. Browse them all in a visual grid, grouped by location. Approve what you love, regenerate what you don't. No more blind rendering.

Per-Scene Regeneration

Hate scene 5 but love the rest? Regenerate just that one. Edit the image prompt, tweak the motion direction, or swap the location entirely. Each scene is independent.

Consistent Main Character

Upload your photo or let the AI generate a fictional star. Either way, the same character appears consistently across every scene. No more identity shifts between cuts.

Lip Sync

Toggle lip sync on and your character's mouth moves to the actual vocals. It's the detail that makes an AI video feel like a real music video instead of a slideshow.

Smart Scene Deduplication

The AI groups visually similar scenes and reuses keyframes with different motion. A 60-second video with 12 scenes might only need 5-6 unique images. Lower cost, faster generation, tighter visual coherence.
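The reuse logic behind deduplication can be shown with a toy sketch. The grouping key used here (a location string) is an assumption for illustration; the real system presumably matches scenes on richer visual descriptions.

```python
# Illustrative keyframe-reuse sketch: scenes that share a grouping key
# reuse one still image, each with its own motion prompt.
def dedupe_keyframes(scenes: list[dict]) -> tuple[int, list[int]]:
    """Return (unique_image_count, plan), where plan[i] is the shared
    keyframe id assigned to scene i."""
    keyframe_ids: dict[str, int] = {}
    plan = []
    for scene in scenes:
        key = scene["location"]
        if key not in keyframe_ids:
            keyframe_ids[key] = len(keyframe_ids)  # new image needed
        plan.append(keyframe_ids[key])
    return len(keyframe_ids), plan

scenes = [
    {"location": "rooftop", "motion": "slow pan"},
    {"location": "alley",   "motion": "handheld"},
    {"location": "rooftop", "motion": "crane up"},
    {"location": "alley",   "motion": "whip pan"},
]
count, plan = dedupe_keyframes(scenes)
# four scenes, but only two unique keyframes to generate
```

Image cost scales with `count` rather than with the number of scenes, which is where the roughly 50% savings comes from.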

V1 vs V2 at a glance

Feature                | V1 Classic    | V2 Image-First
Pipeline               | Text-to-Video | Image-to-Video
Scene Preview          | None          | Full preview board
Edit Individual Scenes | No            | Yes
Consistent Character   | No            | Yes (upload or AI)
Lip Sync               | No            | Yes
Quality Options        | 720p / 1080p  | 720p / 1080p
Prompt Control         | Single prompt | Image + motion per scene
Smart Dedup            | No            | Yes (~50% fewer images)
Wizard Steps           | 5 steps       | 3 steps

A simpler wizard

V1 had five steps. V2 has three. We combined upload + settings into one step, merged concept generation with the scene preview, and streamlined checkout into a single render confirmation.

The main character is always present — either upload your face or let the AI design one. No more choosing "No Character." Every video deserves a star.

Lyrics are detected automatically from your audio. No need to paste them manually. The AI extracts the vocal track, transcribes it in any language, and uses the result to sync scenes to your lyrics.

Under the hood

V2 isn't just a UI refresh. The entire generation backend was rebuilt:

  • Dual prompt system — separate image prompts (frozen keyframe composition) and video prompts (motion directives). Each optimized for its model.
  • Genre-aware prompting — 8 genre-specific visual vocabularies. Hip-hop gets low-angle power shots. Classical gets slow crane movements. The AI matches the visual language to your sound.
  • Face injection pipeline — Nano Banana model composites your face into scene images naturally, not as a crude paste but as a contextual blend that respects lighting and pose.
  • Main character system — a detailed physical description is generated once and embedded verbatim into every scene prompt. Same person, every frame.
  • LTX-2 for 1080p — new model with 6-second clips at $0.04/second, replacing Kling v2.5. Faster renders, lower cost.
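The dual-prompt and main-character ideas fit together naturally: one character description, generated once, is injected verbatim into every image prompt, while motion stays in a separate video prompt. The names and template wording below are illustrative, not PeakMV's internals.

```python
# Sketch of the dual-prompt system with verbatim character injection:
# each scene gets an image prompt (composition) and a video prompt
# (motion directive), and the same character text appears in every
# image prompt so the subject stays consistent across scenes.
CHARACTER = ("a woman in her late 20s, short black hair, "
             "silver bomber jacket")  # generated once, reused verbatim

def build_prompts(scene_desc: str, motion: str) -> dict[str, str]:
    return {
        "image": f"{scene_desc}, featuring {CHARACTER}, cinematic keyframe",
        "video": motion,  # motion only; composition is fixed by the still
    }

p = build_prompts("neon-lit rooftop at night", "slow dolly forward")
```

Because the character text is embedded rather than re-described per scene, the image model sees identical subject wording every time, which is what keeps the "same person, every frame" property.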

Try V2 now

See your scenes before they render. Upload a track and experience the difference.

Create with V2