Another feature I use a lot is selecting an audio fragment and sending a predefined prompt to an AI to "explain grammar" or "explain nuances of meaning". I am still experimenting with prompts.
And because shadowing is so easy I also use it as a player to improve my English pronunciation. (I am not a native English speaker.)
I made a quick video showing the workflow for creating Anki cards and shadowing: https://youtu.be/TaR58uuDBvU?si=o5aGLAi2S-BZ7Zy9
The app supports 15 input languages (Japanese and Chinese are the latest experimental additions), and more than 30 output languages.
I would really appreciate it if you could try it: https://lingochunk.com/try. I know there are other tools with similar functionality, but I created something that fits my workflow, and it is fun to build.
Also I struggled to find public domain audio for the try page. I'd be grateful if anyone could point me to public domain sources (I used LibriVox, Wikimedia and FSI courses), or if you're a creator, let me feature some of your own recordings with credits and links.

Discussion (23 Comments)
Japanese is often described as using multiple writing systems (kanji, kana, numerals, and sometimes the Latin alphabet), and the pronunciation of kanji in particular is not well constrained. This creates tons of homophones and homographs: "koushou", for example, is shared across more than 20 words, and the character for "life" is said to have more than 150 different readings across the words it appears in.
Even more off-topic, but the Unicode code space used for Japanese kanji is famously shared with Chinese hanzi, which leads to ambiguities.
As a result, AI-based TTS (and also image generators) trained directly on Unicode text goes weird on kanji, even simple ones such as "tomorrow". Classical pre-LLM Japanese TTS avoids this by operating on generated or manually specified pronunciations, skipping kanji altogether. That does occasionally lead to wrong readings, but it won't lead to the sound generation code producing butchered middle-of-the-road sounds.
It doesn't seem like most, or any, AI TTS systems tackle this problem, but I'm not in that field. Does anyone know the status of this?
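To make the point above concrete, here is a minimal Python sketch of the "resolve the reading first" approach. The `READINGS` table and the `to_reading` helper are hypothetical and hand-written for illustration; a real front end would rely on a morphological analyzer and a full dictionary. The code points it prints carry no language or pronunciation information, which is the ambiguity described above.

```python
import unicodedata

# Hypothetical, hand-written reading table for illustration only.
# "明日" (tomorrow) has several valid readings depending on register.
READINGS = {
    "明日": ["ashita", "asu", "myounichi"],
}


def codepoints(text):
    """Describe each character by its Unicode code point and name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def to_reading(word, preferred=0):
    """Pick one explicit reading, so that a synthesizer never sees kanji.

    A wrong choice yields a wrong but well-formed reading, rather than
    a blend of several readings.
    """
    candidates = READINGS.get(word)
    if candidates is None:
        raise KeyError(f"no reading known for {word!r}")
    return candidates[preferred]


if __name__ == "__main__":
    for line in codepoints("明日"):
        print(line)
    print(to_reading("明日"))
```

The same code points are used whether the text is Japanese or Chinese, so a model trained directly on the characters has to infer both the language and the reading from context.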
Great project, and congrats on launching :)
Are you willing to share more technical details?
- Which data sources do you ingest?
- How do you transform and enrich the data? How does your pipeline look?
- What are your key challenges?
- Which tools do you use? What is your 'stack'? (Stanza, wordfreq, Whisper, wn, ...)
Background: I am currently building a multi-lang vocabulary hub for language learning. The goal is to match core words/lemmas to their senses/concepts, and then be able to generate multi-language flash cards.
I am still stuck on the sense alignment and fingerprinting (example: should 'to shop', 'einkaufen', 'alışveriş yapmak' and 'go shopping' point to the same concept of 'shop'?), but at a later stage I want to allow user submissions and data enrichment for IPA, pictograms [1] and audio.
[1: https://arasaac.org/pictograms/search]
Use-case (the dream): I come back from language class, I input new vocab and I output new Anki cards that work across all my fluent languages.
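A rough Python sketch of what that alignment could look like as a data structure, assuming every lemma has already been mapped to a single concept. The `Lemma` class, the `concept:shop` identifier, and the `make_card` helper are all made up for illustration; deciding the mapping itself is the hard, unsolved part.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Lemma:
    lang: str
    text: str


# Hypothetical alignment: the four lemmas from the example above all
# point to one made-up concept id.
CONCEPT_OF = {
    Lemma("en", "to shop"): "concept:shop",
    Lemma("en", "go shopping"): "concept:shop",
    Lemma("de", "einkaufen"): "concept:shop",
    Lemma("tr", "alışveriş yapmak"): "concept:shop",
}


def build_index(concept_of):
    """Invert lemma -> concept into concept -> {lang: [lemma texts]}."""
    index = defaultdict(lambda: defaultdict(list))
    for lemma, concept in concept_of.items():
        index[concept][lemma.lang].append(lemma.text)
    return index


def make_card(lemma, target_langs, concept_of=CONCEPT_OF):
    """Return (front, back) for one flash card across the target languages."""
    by_lang = build_index(concept_of)[concept_of[lemma]]
    back = " | ".join(
        f"{lang}: {', '.join(sorted(by_lang[lang]))}"
        for lang in target_langs
        if lang in by_lang
    )
    return lemma.text, back


if __name__ == "__main__":
    print(make_card(Lemma("de", "einkaufen"), ["en", "tr"]))
```

Keeping the concept id as an opaque string leaves room to swap in wordnet synset ids later without changing the card generation.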
Currently, I mostly find myself knee-deep in problems of linguistics, NLP, Python and getting an LLM to do exactly what I want. At the same time it is a super fun project, and really makes me feel the joy of programming again. LLMs are magic, time just flies by, and all the random projects I always wanted to do suddenly materialize.
For coding, I mostly use free Gemini and some deepseek-v4-flash via openrouter to keep tight oversight and understand the problem space. Maybe this slows me down, but agentic coding just does not suit me. Overall, I haven't spent more than 2 € in total.
So far, surprisingly, the biggest problem is the lack of high-quality, free input data (example: English has the Oxford 5000 words as core vocabulary, but it is difficult to find the same for e.g. Turkish).
Second place is the lack of high-quality synsets/wordnets (cross-language coverage is mostly incomplete), and third place is getting LLMs to reliably play to their strengths (on paper, an LLM is the perfect tool to provide multi-lang sense equivalents).
I plan to do a full writeup sometime, but first I need it to work :)
And the site itself is a great idea and implementation, though the font size and family of the UI (not of the actual playback area) have a lot of room for improvement, but those are just minor changes.
[0] https://lingochunk.com/privacy
Also the pinyin for 誰/谁 is coming through as shuí. This character has two pronunciations, and I believe shéi is the more common one.
I use a Firefox extension to convert simplified to traditional. It looks like it's open source, so it may be of some use to you: https://github.com/tongwentang/tongwentang-extension.
Although there are some clashes that it does not handle, e.g. 隻 and 只 are both 只 in simplified, you just have to know which one it is from context, but the extension fails to convert to 隻 where appropriate.
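A toy Python sketch of the kind of context rule that would be needed for this one clash. This is not how the extension works; it is a hypothetical heuristic that treats 只 as the measure word 隻 only when it directly follows a numeral, and the `NUMERALS` set and `convert_zhi` name are made up for illustration.

```python
# Characters that commonly precede a measure word (illustrative, not complete).
NUMERALS = set("一二三四五六七八九十百千两兩几幾")


def convert_zhi(text):
    """Convert simplified 只 to traditional 隻 when it follows a numeral.

    Elsewhere 只 is left alone, because traditional also writes 只
    for the adverb meaning "only". Other characters are not converted.
    """
    out = []
    for i, ch in enumerate(text):
        if ch == "只" and i > 0 and text[i - 1] in NUMERALS:
            out.append("隻")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(convert_zhi("我只有一只猫"))
```

The rule misses measure-word uses after a demonstrative (这只猫), and adding demonstratives to the set would misfire on phrases like 那只是, so a proper fix probably needs word segmentation rather than a single-character lookbehind.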
https://talkhabit.com/shadow Or, as an example of one exercise: https://talkhabit.com/shadow?videoUrl=https%3A%2F%2Fwww.yout...
Stuff I need to work on:
- It only works with videos that have auto-generated captions
- It works best with monologue videos
[1]: https://github.com/hiAndrewQuinn/audio2anki