Question 1

How does the voiceover feature work?

Accepted Answer

Type a script (up to 500 characters) and pick one of 14 voices. We synthesize the audio using Kokoro v1.0 — an open-source TTS model based on StyleTTS2 — running locally on our server. The generated WAV is mixed into your video alongside any background music you've added.

Question 2

What voices are available?

Accepted Answer

14 voices total: 5 American female (Bella, Nicole, Sarah, Sky, Kore), 4 American male (Adam, Michael, Onyx, Eric), 3 British female (Alice, Emma, Lily), and 2 British male (Daniel, George). All English. Quality is comparable to ElevenLabs for short narration scripts.
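
For readers who want to see the voice list and the 500-character script limit as data, here is a minimal Python sketch. The `af_`/`am_`/`bf_`/`bm_` prefixes follow the naming scheme in Kokoro's published voice list; that this service uses those exact identifiers internally is an assumption, and `voice_id` and `validate_request` are hypothetical helper names.

```python
# Voice catalogue taken from the answer above, grouped by accent and gender.
VOICES = {
    "American female": ["Bella", "Nicole", "Sarah", "Sky", "Kore"],
    "American male": ["Adam", "Michael", "Onyx", "Eric"],
    "British female": ["Alice", "Emma", "Lily"],
    "British male": ["Daniel", "George"],
}

# Assumption: identifiers follow Kokoro's public naming scheme (e.g. "af_bella").
PREFIXES = {
    "American female": "af_",
    "American male": "am_",
    "British female": "bf_",
    "British male": "bm_",
}

MAX_SCRIPT_CHARS = 500  # script limit stated in the first answer


def voice_id(name: str) -> str:
    """Map a display name such as 'Bella' to an assumed Kokoro-style voice id."""
    for group, names in VOICES.items():
        if name in names:
            return PREFIXES[group] + name.lower()
    raise ValueError(f"unknown voice: {name!r}")


def validate_request(script: str, voice: str) -> str:
    """Check the script length and voice name; return the voice id if both are valid."""
    if not script.strip():
        raise ValueError("script is empty")
    if len(script) > MAX_SCRIPT_CHARS:
        raise ValueError(f"script has {len(script)} characters; limit is {MAX_SCRIPT_CHARS}")
    return voice_id(voice)
```

For example, `validate_request("Welcome to the demo.", "George")` returns `"bm_george"`, while a 501-character script raises `ValueError`.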

Question 3

Does this cost extra per character?

Accepted Answer

No. Kokoro TTS runs entirely on our server — no third-party API calls, no per-character billing. Your $5/month subscription covers unlimited voiceovers. Compare that to ElevenLabs ($22/mo + per-character) or OpenAI TTS ($15/M characters).
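
To put the per-character comparison in concrete terms, the arithmetic can be sketched as below. It uses only the two figures quoted above ($5/month flat, $15 per million characters for OpenAI TTS). The ElevenLabs per-character rate is not given here, so it is left out. The function names are hypothetical.

```python
OPENAI_USD_PER_MILLION_CHARS = 15.0  # figure quoted in the answer above
SUBSCRIPTION_USD_PER_MONTH = 5.0     # flat subscription price quoted above
MAX_SCRIPT_CHARS = 500               # per-script limit from the first answer


def metered_cost_usd(total_chars: int) -> float:
    """Cost of synthesizing total_chars characters at the quoted metered rate."""
    return total_chars * OPENAI_USD_PER_MILLION_CHARS / 1_000_000


def break_even_chars() -> int:
    """Monthly character count at which the metered cost equals the flat fee."""
    return int(SUBSCRIPTION_USD_PER_MONTH / OPENAI_USD_PER_MILLION_CHARS * 1_000_000)


if __name__ == "__main__":
    chars = break_even_chars()
    print(f"one full-length script, metered: ${metered_cost_usd(MAX_SCRIPT_CHARS):.4f}")
    print(f"break-even: {chars} characters, about {chars // MAX_SCRIPT_CHARS} full-length scripts per month")
```

At these rates the flat fee matches metered TTS pricing at roughly 333,000 characters per month, or about 666 full-length scripts. Keep in mind that the subscription also covers rendering, not just speech synthesis.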

Question 4

Can I record my own voice instead of using AI?

Accepted Answer

Yes — upload your own audio file in the music panel. The mixing pipeline treats it like any other audio track, so it'll play alongside your video. (Soon we'll surface a dedicated 'upload narration' option for clarity, but the mechanism is already there via the music upload.)

Question 5

What if I don't know what to say?

Accepted Answer

If your input is a video with text on screen (kinetic-typography animation, marketing video with overlays), click the 'Pull script from video' button. Tesseract OCRs the on-screen text and pre-fills the script field for you. Edit before rendering if needed.
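
On-screen text usually stays visible across many frames, so OCR output from sampled frames tends to repeat. The sketch below shows one way the per-frame text could be merged into a single script. The merging strategy (drop repeated lines, join, cut at 500 characters) and the name `merge_ocr_frames` are assumptions for illustration, not a description of what the button actually does. The OCR step itself is left out: the input is whatever text Tesseract returned for each frame.

```python
MAX_SCRIPT_CHARS = 500  # script limit from the first answer


def merge_ocr_frames(frame_texts: list[str], limit: int = MAX_SCRIPT_CHARS) -> str:
    """Merge per-frame OCR text into one script, skipping lines already seen."""
    lines: list[str] = []
    seen: set[str] = set()
    for text in frame_texts:
        for raw in text.splitlines():
            line = " ".join(raw.split())  # collapse runs of whitespace
            key = line.lower()            # compare case-insensitively
            if line and key not in seen:
                seen.add(key)
                lines.append(line)
    script = " ".join(lines)
    return script[:limit].rstrip()        # respect the script length limit
```

For example, `merge_ocr_frames(["Hello  world\n", "Hello world\nBuy now", ""])` returns `"Hello world Buy now"`. A hard cut at the limit can split a word, which is one reason to review the pre-filled script before rendering.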

Question 6

Will the music drown out the voiceover?

Accepted Answer

No. When both are present, music is automatically ducked under speech via FFmpeg's sidechaincompress filter. Music drops 8 dB whenever the voice signal exceeds the threshold and lifts back up between phrases. The result sounds like a podcast or radio ad: voice on top, music underneath.
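
For the curious, a ducking setup of this kind can be sketched as an FFmpeg command built in Python. The filters and options used (`asplit`, `sidechaincompress` with `threshold`, `ratio`, `attack`, `release`, and `amix`) are real FFmpeg features, but the parameter values, file names, and the function name `build_ducking_command` are illustrative assumptions, not the service's actual settings. In particular, these values are not tuned to give exactly 8 dB of reduction.

```python
def build_ducking_command(
    video: str,
    voice: str,
    music: str,
    output: str,
    threshold: float = 0.05,   # sidechain level above which music is compressed (assumed)
    ratio: float = 8.0,        # compression ratio applied to the music (assumed)
    attack_ms: float = 20.0,   # how quickly the music drops when speech starts (assumed)
    release_ms: float = 300.0, # how quickly the music recovers after speech (assumed)
) -> list[str]:
    """Return an ffmpeg argv list that ducks music under a voiceover."""
    filter_graph = (
        # The voice is needed twice: once as the sidechain, once in the final mix.
        "[1:a]asplit=2[voice_sc][voice_mix];"
        # First input is the signal to compress (music), second is the sidechain (voice).
        f"[2:a][voice_sc]sidechaincompress=threshold={threshold}:ratio={ratio}"
        f":attack={attack_ms}:release={release_ms}[ducked];"
        # Mix the ducked music with the voice.
        "[ducked][voice_mix]amix=inputs=2:duration=longest[aout]"
    )
    return [
        "ffmpeg", "-y",
        "-i", video, "-i", voice, "-i", music,
        "-filter_complex", filter_graph,
        "-map", "0:v", "-map", "[aout]",
        "-c:v", "copy", "-c:a", "aac",
        output,
    ]
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run it, assuming `ffmpeg` is installed. Depending on the inputs, an explicit `aresample` or `aformat` step may be needed so the voice and music share a sample rate and channel layout.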

Question 7

What input formats are supported?

Accepted Answer

MP4, MOV, M4V, and WebM video files. You can also add a voiceover to HTML animations (Claude Design exports, Lottie, GSAP, CSS), which is handy for narrating a silent kinetic-typography piece.

Question 8

How long does voiceover generation take?

Accepted Answer

On our 2-core CPU box, Kokoro takes roughly 1.2× the length of the audio to generate it. So a 200-character script (~20 seconds of speech) generates in about 25 seconds. Total render overhead is typically under 2 minutes.
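
The estimate above can be reproduced with a few lines of Python. The speaking rate of 10 characters per second is inferred from the "200 characters is about 20 seconds" figure, and the 1.2 factor is read as generation time divided by audio length. Both are rough averages, and `estimate_seconds` is a hypothetical helper name.

```python
CHARS_PER_SECOND = 10.0   # inferred from "200 characters is about 20 seconds of speech"
GENERATION_FACTOR = 1.2   # generation time as a multiple of audio length


def estimate_seconds(script_chars: int) -> tuple[float, float]:
    """Return (speech_seconds, generation_seconds) for a script of the given length."""
    speech = script_chars / CHARS_PER_SECOND
    return speech, speech * GENERATION_FACTOR


if __name__ == "__main__":
    for chars in (200, 500):
        speech, generation = estimate_seconds(chars)
        print(f"{chars} chars: ~{speech:.0f} s of speech, ~{generation:.0f} s to generate")
```

By this estimate a 200-character script needs about 24 seconds of synthesis, and a maximum-length 500-character script about a minute.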

Question 9

How much does it cost?

Accepted Answer

$5 per month, unlimited renders, unlimited voiceover characters. Cancel anytime.

AI Voiceover for Your Video: 14 Voices, No API Costs

Studio-quality narration without the studio bill

Adding Voiceover to Video — Common Questions

