Generate a lip-synced video of a person speaking the given text.

Video of the person who will speak the text.

source_video

Text for the person in the video to speak.

text

Lipsync engine to use. Supported engines: "musetalk", "video_retalking".

lipsync_engine

Voice engine to use. Supported engines: "cartesia", "elevenlabs".

voice_engine
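A minimal client-side sketch of checking the two engine parameters before submitting a request, so that a misspelled engine name fails early. The helper `check_engines` and the constant names are hypothetical. The service performs its own validation, which may differ, for example in case sensitivity.

```python
# Engine names listed as supported for the lipsync_engine and voice_engine parameters.
SUPPORTED_LIPSYNC_ENGINES = frozenset({"musetalk", "video_retalking"})
SUPPORTED_VOICE_ENGINES = frozenset({"cartesia", "elevenlabs"})


def check_engines(lipsync_engine: str, voice_engine: str) -> None:
    """Raise ValueError if either engine name is not one of the documented options."""
    if lipsync_engine not in SUPPORTED_LIPSYNC_ENGINES:
        raise ValueError(
            f"unsupported lipsync_engine {lipsync_engine!r}; "
            f"expected one of {sorted(SUPPORTED_LIPSYNC_ENGINES)}"
        )
    if voice_engine not in SUPPORTED_VOICE_ENGINES:
        raise ValueError(
            f"unsupported voice_engine {voice_engine!r}; "
            f"expected one of {sorted(SUPPORTED_VOICE_ENGINES)}"
        )


# Usage: a valid pair passes silently; an unknown name raises before any request is sent.
check_engines("musetalk", "elevenlabs")
```

Any combination of one lipsync engine and one voice engine from the lists above is accepted by this check.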

downsample_video

Value between 0 and 1. Lower values increase variability, which can make speech more expressive, with output varying between re-generations; they can also lead to instabilities.

voice_stability
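A minimal sketch of enforcing the documented 0 to 1 range for `voice_stability` on the client side. The function name `check_voice_stability` is hypothetical, and whether the service rejects or clamps out-of-range values is not stated here, so this sketch simply rejects them.

```python
def check_voice_stability(value: float) -> float:
    """Return voice_stability as a float if it lies in the documented range [0, 1].

    Hypothetical client-side check; out-of-range values (and NaN, since every
    comparison with NaN is False) raise ValueError instead of being clamped.
    """
    value = float(value)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"voice_stability must be between 0 and 1, got {value}")
    return value


# Usage: integers are accepted and converted; 1.5 would raise ValueError.
stability = check_voice_stability(1)
```

Rejecting rather than clamping keeps a typo such as `5` instead of `0.5` from silently becoming the maximum value.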

Value between 0 and 1. High values are recommended if the speaking style should be exaggerated relative to the original source audio, but higher values can make the generated speech less stable. The default, 0.0, greatly increases generation speed.

Text to Video Lipsyncing