
Discussion (27 Comments)
My biggest challenge is finding a model that is both fast enough and accurate enough: I have to caption about 600 hours of video per week, and I would prefer to run all of it on a tiny server (2 cores, 4 GB of memory). This tool could easily do that with the kroko model, but I'll have to test whether the accuracy is good enough.
Also, in my own scripts I use ffmpeg to download just the audio of the videos I want to caption, which saves a lot of bandwidth and speeds up the whole process. As far as I can see, this tool doesn't do that. It would be a nice feature to add, along with an option to write the output as a working .srt file.
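For anyone wanting to try the two ideas in this comment, here is a minimal sketch, not yapsnap's actual code. It assumes the transcriber gives you segments as (start_seconds, end_seconds, text) tuples, which is a hypothetical shape and not a documented yapsnap output. The helper names `audio_only_cmd`, `srt_timestamp`, and `to_srt` are made up for illustration, and the 16 kHz mono target is just a common choice for speech models, not something the thread specifies.

```python
from typing import Iterable, List, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def audio_only_cmd(source: str, out_path: str) -> List[str]:
    """Build an ffmpeg command that keeps only the audio track.

    -vn drops video, -ac 1 downmixes to mono, -ar 16000 resamples to 16 kHz.
    The sample rate and channel count are assumptions, not yapsnap requirements.
    """
    return ["ffmpeg", "-y", "-i", source, "-vn", "-ac", "1", "-ar", "16000", out_path]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in .srt files."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as SRT cues: index, time range, text, blank line."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(" ".join(audio_only_cmd("input.mp4", "audio.wav")))
    print(to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "world")]))
```

The command list can be passed to `subprocess.run(cmd, check=True)`. Whether ffmpeg actually avoids fetching the video bytes depends on how the source serves its streams, so treat the bandwidth saving as something to measure.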
sudo apt update && sudo apt install -y ffmpeg python3-pip python3-venv && git clone https://github.com/kouhxp/yapsnap.git && cd yapsnap && python3 -m venv ~/yapsnap-venv && source ~/yapsnap-venv/bin/activate && pip install --upgrade pip && pip install .
On a 32 GB ThinkPad X13, yapsnap processed a 21-minute YouTube video in under 2 minutes.
Very well done!
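As a rough sanity check on the numbers in this thread, the sketch below compares the speed needed to caption 600 hours of video per week with the speed implied by 21 minutes of video in under 2 minutes. The function names are made up for illustration, and the comparison is only indicative: the timing comes from a ThinkPad, not from the 2-core, 4 GB server mentioned earlier.

```python
HOURS_PER_WEEK = 7 * 24  # 168 wall-clock hours in a week


def required_speedup(audio_hours_per_week: float, workers: int = 1) -> float:
    """Real-time factor needed to keep up, assuming the machine runs nonstop."""
    return audio_hours_per_week / (HOURS_PER_WEEK * workers)


def observed_speedup(audio_minutes: float, wall_minutes: float) -> float:
    """Real-time factor implied by one measured run."""
    return audio_minutes / wall_minutes


if __name__ == "__main__":
    # 600 hours per week on a single machine.
    print(f"required: {required_speedup(600):.2f}x real time")  # 3.57x
    # 21 minutes in 2 minutes; "under 2 minutes" makes this a lower bound.
    print(f"observed: {observed_speedup(21, 2):.1f}x real time")  # 10.5x
```

So the laptop result leaves roughly a 3x margin over the weekly requirement, but a 2-core server would need its own measurement before drawing any conclusion.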
I guess if it encourages you to install and figure out how to use ffmpeg, yt-dlp, kroko, numpy, and onnx, that's a good thing. Sometimes just knowing a thing is possible is a huge benefit.
This repo is now a good way to centralize hacks around the sure-to-come blockers those platforms will add to prevent download.
Just like uBlockOrigin was a way to centralize all the "just run this greasemonkey script" comments, I can see this getting a huge following for people who really value transcriptions.
I think I’m gonna go read a book.
NPUs: definitely a good use case for at least part of it. There are ports of whisper that use Core ML/ANE, drawing less power and running at about 3x the speed of CPU-only.