Back to News
Advertisement
00xchamin about 13 hours ago 0 commentsRead Article on github.com

HI version is available. Content is displayed in original English for accuracy.

I watch a lot of Stanford/Berkeley lectures and YouTube content on AI agents, MCP, and security. Got tired of scrubbing through hour-long videos to find one explanation. Built v1 of mcptube a few months ago. It performs transcript search and implements Q&A as an MCP server. It got traction (34 stars, my first open-source PR, some notable stargazers like CEO of Trail of Bits).

But v1 re-searched raw chunks from scratch every query. So I rebuilt it.

v2 (mcptube-vision) follows Karpathy's LLM Wiki pattern. At ingest time, it extracts transcripts, detects scene changes with ffmpeg, describes key frames via a vision model, and writes structured wiki pages. Knowledge compounds across videos rather than being re-discovered. FTS5 + a two-stage agent (narrow then reason) for retrieval.

MCPTube works both as CLI (BYOK) and MCP server. I tested MCPTube with Claude Code, Claude Desktop, VS Code Copilot, Cursor, and others. Zero API key needed server-side.

Coming soon: I am also building SaaS platform. This platform supports playlist ingestion, team wikis, etc. I like to share early access signup: https://0xchamin.github.io/mcptube/

Happy to discuss architecture tradeoffs — FTS5 vs vectors, file-based wiki vs DB, scene-change vs fixed-interval sampling. Give it a try via `pip install mcptube`. Also, please do star the repo if you enjoy my contribution (https://github.com/0xchamin/mcptube)

Advertisement

Discussion (0 Comments)Read Original on HackerNews

No comments available or they could not be loaded.