Generate a video of a person speaking a given text using lipsyncing.
INPUTS
source_video
Video to lip-sync.
text
Text to speak.
lipsync_engine
Lipsync engine to use. Supported engines: "musetalk", "video_retalking".
voice_engine
Voice engine to use. Supported engines: "cartesia", "elevenlabs".
downsample_video
Whether to downsample the video to 720p.
voice_stability
Value between 0 and 1. Increasing variability can make speech more expressive, with output varying between re-generations, but it can also lead to instabilities.
voice_style
Value between 0 and 1. High values are recommended if the style of the speech should be exaggerated compared to the original source audio. Higher values can lead to more instability in the generated speech. Setting this to 0.0 will greatly increase generation speed and is the default setting.
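As a rough illustration of the inputs listed above, the sketch below collects them into a dict and checks the documented constraints (supported engine names and the 0 to 1 ranges) before a job is submitted. The helper name `build_lipsync_inputs` and the idea of validating locally are assumptions made for this example and are not part of the app. Only `voice_style` is given a default, because 0.0 is the only default stated here.

```python
from typing import Any, Dict

# Engine names as listed in the input descriptions above.
LIPSYNC_ENGINES = {"musetalk", "video_retalking"}
VOICE_ENGINES = {"cartesia", "elevenlabs"}


def build_lipsync_inputs(
    source_video: str,
    text: str,
    lipsync_engine: str,
    voice_engine: str,
    downsample_video: bool,
    voice_stability: float,
    voice_style: float = 0.0,  # documented default; also the fastest setting
) -> Dict[str, Any]:
    """Hypothetical helper: validate the inputs and return them as a dict."""
    if lipsync_engine not in LIPSYNC_ENGINES:
        raise ValueError(f"unsupported lipsync_engine: {lipsync_engine!r}")
    if voice_engine not in VOICE_ENGINES:
        raise ValueError(f"unsupported voice_engine: {voice_engine!r}")
    # Both voice settings are documented as values between 0 and 1.
    for name, value in (
        ("voice_stability", voice_stability),
        ("voice_style", voice_style),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {
        "source_video": source_video,
        "text": text,
        "lipsync_engine": lipsync_engine,
        "voice_engine": voice_engine,
        "downsample_video": downsample_video,
        "voice_stability": voice_stability,
        "voice_style": voice_style,
    }


if __name__ == "__main__":
    # "clip.mp4" is a placeholder path used only for illustration.
    inputs = build_lipsync_inputs(
        source_video="clip.mp4",
        text="Hello there!",
        lipsync_engine="musetalk",
        voice_engine="elevenlabs",
        downsample_video=True,
        voice_stability=0.5,
    )
    print(inputs)
```

The resulting dict can then be handed to whichever client you use to call the app. Catching an unsupported engine name or an out-of-range value locally avoids submitting a job that would be rejected.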
README

Text to Video Lipsyncing

This app takes a video file and a piece of text and outputs a video in which the person appears to be saying the text. It combines two apps readily available on Sieve, one for text-to-speech and one for lipsyncing.

To learn more about the various engines for lipsyncing and text-to-speech, check out their respective documentation.