The basic flow is: place SVG characters on a scene, write dialogue, pick voices, and render to MP4. It handles word timestamps, mouth cues, and lip-sync automatically.
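To illustrate the idea (this is a hypothetical sketch, not the app's actual code or API): once a TTS provider returns per-word timestamps, they can be turned into simple open/closed mouth cues by opening the mouth at each word's start and closing it only when the silence before the next word is long enough to be visible.

```typescript
// Hypothetical sketch of timestamp-driven lip-sync cues.
// All names here are illustrative, not from the Cartoon Studio codebase.

interface WordTiming {
  word: string;
  start: number; // seconds
  end: number;   // seconds
}

interface MouthCue {
  time: number;            // seconds
  mouth: "open" | "closed";
}

function mouthCues(words: WordTiming[], minGap = 0.12): MouthCue[] {
  const cues: MouthCue[] = [];
  for (let i = 0; i < words.length; i++) {
    const w = words[i];
    cues.push({ time: w.start, mouth: "open" });
    const next = words[i + 1];
    // Skip the "closed" cue if the gap to the next word is too short
    // to register at typical frame rates; the mouth stays open.
    if (!next || next.start - w.end >= minGap) {
      cues.push({ time: w.end, mouth: "closed" });
    }
  }
  return cues;
}
```

A real pipeline would map phonemes to a larger viseme set, but the two-frame version above is already enough for a South Park look.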
This started as me playing around with Jellypod's Speech SDK and HeyGen's HyperFrames. I wanted a small tool that could go from script to video without a big animation pipeline, and the next thing I knew I was trying to create my own South Park-style show, and here we are. :D
A few details:
- desktop app built with Electron
- supports multiple TTS providers through Jellypod's Speech SDK
- renders via HyperFrames
- lets you upload or generate characters and backdrop scenes
- includes default characters/scenes so you can try it quickly
- open source
It runs from source today. AI features use bring-your-own API keys, but the app itself is fully inspectable and local-first: there's no hosted backend and no telemetry.
Here are some fun examples of the types of videos you can create:
https://x.com/deepwhitman/status/2046425875789631701
https://x.com/deepwhitman/status/2047040471579697512
And the repo:
https://github.com/Jellypod-Inc/cartoon-studio
Happy to answer questions and appreciate any feedback!

Discussion (1 comment)
Also, Moho offers far more comprehensive (and comprehensible!) lip-sync: https://lostmarble.com/papagayo/
I get that you're using AI to boost capability with less effort, but at the moment, I think the more specialized tools are still a better avenue for this.
Lastly, I followed the link to Jellypod (https://www.jellypod.com/). It's pretty good, but falls into a vocal "uncanny valley". Even a human reading from a script wouldn't sound that perfect; the enunciations immediately come across as artificial.
Now, if this was an extension to Synfig (also open source!), it would be a much more interesting venture...