Amateur may have cracked Linear A, a 120-year-old puzzle

Kosturdistan • about 4 hours ago • 101 comments • Read article on aiclambake.com

Discussion (101 Comments, originally on Hacker News)

stratocumulus0•about 2 hours ago

As an amateur who's been fascinated by this puzzle himself, I will add some context that might be relevant in assessing the plausibility of this claim:

- The "Libation Formula", which the author used as the base for his translations, is the most studied piece of writing in Linear A, because it's the only recurring phrase (with grammatical variation) that we have. The corpus is extremely fragmentary, with just a handful of instances of longer text (and even then, the texts are the length of an average sentence in English). The majority of documents available to us are lists (of inventory, personnel, offerings or something of this sort). The longer texts make use of punctuation marks, likely put in between words. This gives us a non-trivial vocabulary, which still does not match that of any known language.

- With such fragmentary remaining material, we cannot be sure that a) all the texts we call "Linear A" are written in the same language, and b) the recognizable words are not abbreviations, for example.

- The author made an assumption that Linear A symbols which have counterparts in Linear B should have the same phonetic values. This gives us an already known glyph that represented "NA". "Duplicate" glyphs are only found in the P-series, and are assumed to represent syllables which were distinguished by the Linear A language, but not by Greek - such as aspirated/unaspirated P. There is a glyph that stands for "NWA" in Linear B, but instances of it have been found in Linear A as well.

- There are countless words with no known etymology in Ancient Greek, assumed to originate from a substrate language or languages spoken in the area at the time Greeks migrated to their present-day homeland. The language of Linear A would be a likely candidate for such a substrate. If Linear A were a Semitic language, then we should already be able to establish Semitic etymologies for those words as they were in Greek. Of course it could also be the case that these words came from another language which did not adopt writing or whose writing did not survive to our times.

tamarru•26 minutes ago

Ciao. I'm Tom di Mino, and I'm on vacation in Bellingham, Washington right now. I'll get back to you later with a formal response.

I've also reached out to Dr. Ester Salgarella, so I'm familiar with attempts to apply computational analysis to the corpus, and where previous efforts erred.

stratocumulus0•8 minutes ago

Always glad to exchange! I'm a software engineer and a hobby linguist only myself, so don't expect wonders from me. But this is a fun topic to research for sure.

Kosturdistan•3 minutes ago

Tom has entered the chat!

_alternator_•about 1 hour ago

Thanks for the context; how do you think this impacts plausibility? Presumably the fact that he made progress in a well studied passage is cause for skepticism? What's your take?

yorwba•24 minutes ago

Well, the reasoning in the article is that if you take A-TA-I-*301-WA-JA, keep only W-J and assume *301 starts with N, then you get a claimed Semitic root N-W-Y related to dwelling, except I wonder whether that shouldn't be N-W-H instead https://en.wiktionary.org/wiki/%D7%A0%D7%95%D7%95%D7%94 (Semitic isn't my area) so at best one fifth of one word matches two thirds of another therefore iT mUsT bE sEmItIc. A serious attempt at decipherment should at least try to explain the A-TA-I, or any of the other words in the sentence, for that matter.

simonw•13 minutes ago

> Di Mino used Claude Code to build a suite of Python scripts that query, cross-reference, and organize the digitized Linear A corpus (drawn from the GORILA and SigLA databases), enabling systematic hypothesis testing at a scale that would have been impractical to do manually.

That's exactly the kind of thing I'd hope Claude would be used for in these kinds of projects - building tools, not black-box "solving" the problem.

Tuna-Fish•about 2 hours ago

The reason Linear A is so difficult is that the total remaining corpus of Linear A text is ~7500 characters, spread out over ~1500 inscriptions.

If you have a 4k screen, you can fit all remaining Linear A text on your screen at once, in 14pt high font.

stratocumulus0•about 2 hours ago

And in addition to that, a vast majority of documents are lists which consist of a "header" (1 to 3 words) and word-number pairs afterwards. Another common class is small clay seals with 1 or 2 characters carved into them. It's likely that in both cases, we may be dealing with abbreviations.

Some of the lists end with "ku-ro" and a number that's the sum of all the previous numbers, oddly frequently off by one.
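
The ku-ro check described above amounts to summing the listed quantities and comparing the result with the stated total. A minimal Python sketch follows; the entries and the `check_total` helper are hypothetical illustrations, not a transcription of any real tablet, and real Linear A lists also contain fractional signs that this ignores.

```python
def check_total(entries, stated_total):
    """Return (computed_sum, stated_total - computed_sum) for one list."""
    computed = sum(quantity for _, quantity in entries)
    return computed, stated_total - computed

# Hypothetical word-number pairs followed by a hypothetical ku-ro total.
entries = [("item-a", 5), ("item-b", 3), ("item-c", 12)]
computed, difference = check_total(entries, stated_total=21)

# A non-zero difference flags a list whose stated total disagrees with its entries.
print(computed, difference)
```

Running a check like this over every list with a ku-ro line would show how often the totals disagree and by how much.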

humodz•about 1 hour ago

It would be amusing if archaeologists in the future also end up spending countless hours trying to decipher my shopping lists and poor math skills

_kst_•about 1 hour ago

They hadn't yet decided whether to count from 0 or from 1.

cwmma•about 1 hour ago

Surprisingly this comes up more than you'd think, for instance in Ancient Rome, tomorrow is two days away so all the dates are off by one from what you'd think it was. They mainly count down and it goes, 5, 4, 3, day before, day.

kps•about 1 hour ago

“Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration.” — Stan Kelly-Bootle (first person to obtain a postgraduate degree in computer science)

red_admiral•16 minutes ago

ku-ro obviously means "carry in" :)

dehrmann•about 2 hours ago

Very vaguely, it makes it like a one-time pad where it can be anything you want it to be. Not quite, but so little text leaves a lot of options open.

AaronAPU•about 1 hour ago

I wonder, is there a form of analysis which lets you quantify how ambiguous a set of symbols is? Maybe related to entropy?

Obviously one symbol can mean literally anything, but you could also have very long strings of symbols with many different meanings.

red_admiral•14 minutes ago

Yes. Somewhere in Claude Shannon's work, called the "unicity distance".
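
For reference, the unicity distance is usually written U = H(K) / D, where H(K) is the entropy of the key space in bits and D is the redundancy of the plaintext language in bits per character. Below is a minimal Python sketch for the textbook case of a simple substitution cipher over 26 letters, assuming roughly 1.5 bits per character of entropy for English. Both figures are standard illustrations, not numbers from this thread, and an undeciphered script in an unknown language is not a cipher over known plaintext, so this is only an analogy for Linear A.

```python
import math

def unicity_distance(key_entropy_bits, redundancy_bits_per_char):
    """Approximate ciphertext length at which only one key remains plausible."""
    return key_entropy_bits / redundancy_bits_per_char

# Key space of a simple substitution cipher: 26! possible alphabets.
key_entropy = math.log2(math.factorial(26))

# Redundancy = maximum entropy per letter minus the assumed actual entropy.
assumed_english_entropy = 1.5  # assumption, bits per character
redundancy = math.log2(26) - assumed_english_entropy

u = unicity_distance(key_entropy, redundancy)
print(round(u, 1))
```

Under these assumptions the result is a few dozen characters. With an unknown language, both the "key" (sign values) and the redundancy are themselves unknown, which is part of why a corpus of a few thousand characters leaves so much room for competing readings.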

elbasti•about 1 hour ago

I would love to have this image available!

tclancy•about 1 hour ago

I’d send it to you but you probably wouldn’t understand it.

WithinReason•about 2 hours ago

As observed by archaeologist John Younger, the entire Linear A corpus takes up only 1.84 pages of letter paper when typeset in 12 point font and 1-inch margins.

stringfood•about 2 hours ago

when I first read the title thought he was talking about linear algebra and I was like damn it's not that hard

Kosturdistan•about 4 hours ago

A lot of loonies make this claim, but Tom's work is credible enough that it's being reviewed by linguistics experts at Rutgers and Cambridge. Additional validation: his approach produces results. He's translated over 300 words, and that's never been done before, and his solution actually solves some problems in Linear B. Tom is an AI engineer, and Claude Code was key to his work. Disclosures: I know Tom socially, and I wrote the post at the link.

kubb•about 3 hours ago

Let's wait until it's been verified.

gus_massa•10 minutes ago

I agree. The post has too little information. Also

>> reviewed by linguistics experts at Rutgers and Cambridge.

Here in Argentina, around 2005, we had like 5 guys who claimed to have 5 independent solutions of the Goldbach Conjecture. Each one got a PhD student who volunteered to read it, discussed the obvious problems with the author, tried to help solve them, and after a few months of back and forth they concluded that none of the solutions were correct or had an interesting insight. Nobody was surprised about that, but some wanted to give them a try.

Until there is an official report by Rutgers or Cambridge, it doesn't mean too much.

>> He's translated over 300 words

Where is the table of translations?

mikestorrent•about 2 hours ago

You're absolutely right! We've opened a ticket with the Linear A folks, hopefully they'll get back to us soon with an update as to whether we've got it correct or not. Hang tight!

kridsdale1•about 2 hours ago

This comment sure is load bearing.

saagarjha•about 2 hours ago

A Linear ticket, hopefully

bawolff•about 1 hour ago

How does an expert even verify something like this?

red_admiral•8 minutes ago

They verified Linear B against a new tablet that turned up in a dig after the Kober/Ventris* solution had been published. It had pictures of jars with no or one or two handles, and the claimed Linear B for "two handled jar" and such next to the correct picture.

* Ventris' publication, but given Kober's contribution to the work they should really share equal credit. I like to think Kober would have got there on her own if she had access to the larger corpus that Ventris had (the Pylos tablets) and a comparable amount of free time and money available.

Kosturdistan•26 minutes ago

You look at the proposed sound values and compare it to other known languages. Languages from the same family share grammar and vocabulary.

sigbottle•about 1 hour ago

I mean it's not like anyone could objectively go back in time and query ancient civilizations for what they meant, but presumably "verified" means that it passes the verification heuristics they currently have, shows pragmatic success, and earns expert consensus.

yorwba•about 3 hours ago

Then why is there no link to the actual write-up?

GavinMcG•about 3 hours ago

Presumably because it hasn’t yet been published?

Kosturdistan•about 1 hour ago

The only write-up at the moment is my blog post, hopefully that changes in the coming weeks.

Sniffnoy•about 1 hour ago

The blog post mentions a draft of a manuscript though. I was expecting something like a preprint. He's not willing to post that draft yet?

m0llusk•about 3 hours ago

It seems this is still extremely early in the process. There is an apparent finding that was shared. Evidence which would be the basis for a paper is "being reviewed by linguistics experts at Rutgers and Cambridge". So they are trying to do the right thing by talking about what they believe they have done but holding off publication and serious claims until later. The general idea that written forms can be categorized by systems built with Claude could be applied to other as yet undecipherable languages could be used by other interested investigators just with what is discussed here.

sillysaurusx•about 2 hours ago

> The general idea that written forms can be categorized by systems built with Claude could be applied to other as yet undecipherable languages could be used by other interested investigators just with what is discussed here.

Could you rephrase this or explain it more thoroughly? I don’t follow. What does it mean to categorize a written form by systems built with Claude?

kelseyfrog•about 3 hours ago

You can use Claude, like the author, to reproduce the result.

_verandaguy•about 2 hours ago

This isn't really a reasonable approach, is it?

The original prompts aren't provided, nor is the original context; even then, you can't really treat a stochastic system like an LLM as a major component in reproducibility.

atrus•about 2 hours ago

somehow I suspect it was a bit more involved than: Claude, please solve Linear A.

justin_dash•about 3 hours ago

Unless if it was done by Fable!

grey-area•about 3 hours ago

Amazing work and refreshing to see a well written and cogent post to summarise it. Would love to hear more about how he used Claude to help solve the puzzle.

dwroberts•about 2 hours ago

You know him socially but is there a reason you’re writing this rather than him? It looks like he has his own web presence.

Cynical read would be you’re stealing his thunder a bit by prematurely announcing this before it’s fully confirmed

jstanley•about 1 hour ago

Promoting your friends' work is hardly stealing their thunder. It's increasing their thunder!

Kosturdistan•about 1 hour ago

Tom knows I'm a freelance writer and decided to give me the scoop. He's more interested in linguistics than he is in journalism.

Conscat•about 2 hours ago

Isn't it customary for the author of a post shared on HN to leave a comment on the thread?

dwroberts•about 2 hours ago

I’m not referring to the parent comment: The post is not written by the author of the claimed breakthrough.

iwontberude•about 2 hours ago

What thunder? Claude did the work and used a human to interface with experience and causality better.

Kosturdistan•about 1 hour ago

Claude helped, it did not do the work. It would have taken Tom more time to crack on his own, and it would have been harder, but the key insights were Tom's not Claude's.

ben_w•about 2 hours ago

The thunder is as per the headline. Assuming it passes review.

One of the things I find weird with AI is how the dismissals of work that involves AI split into two camps: like yours, saying the AI did the work while the human played no role and deserves no credit; and those saying the AI rips off its training data while the human using it played no role and deserves no credit.

loudmax•about 2 hours ago

This is very exciting. Congrats to Tom on the accomplishment.

To be clear, this is an attempt at a decipherment. This is not proven, and we shouldn't consider Linear A to be "solved" until experts in the field have reviewed the work. In fact, it probably shouldn't be considered "proof" unless some more Linear A writings are uncovered and these are congruent with the method proposed. All that can be said for certain at this point is that this is an interesting conjecture.

But this is a story worth following. This could be the real deal. More research and validation should follow and we should have a better idea in the next few weeks or months whether Linear A has really been solved. At the very least, this is an interesting attempt, and optimistically, it could yield real insight into Minoan culture. Kudos.

Kosturdistan•about 1 hour ago

Thanks, I hope Tom is right, but now it's in the hands of the pros.

cwmma•about 1 hour ago

Isn't a big problem with Linear A that there are so few symbols you can "solve" it relatively straightforwardly with no way to tell if it's correct or not?

Kosturdistan•9 minutes ago

The lack of discovered inscriptions does make deciphering it harder, but it is possible!

rich_sasha•24 minutes ago

Gotta love the nominative determinism: Tom Di Mino ("of Mino"?) cracks a Minoan language.

mNovak•about 2 hours ago

Interesting writeup. Would be nice to have a couple images of Linear A/B scripts to visualize. Looking on google, they're very daunting!

Blahah•about 1 hour ago

lineara.xyz is your friend

evilfred•29 minutes ago

i'm gonna write a blog post now about how my buddy discovered cold fusion and will have a paper out real soon now

WhitneyLand•about 2 hours ago

If confirmed this is really cool and impressive work.

Honestly curious how many years before it can be one shotted in a coding harness with Fable.next by someone who’s not a linguistics expert.

“Develop, test, and rank hypotheses about the phonetic values, morphology, grammar, and possible language family of Linear A using the full available corpus. Do not assume any decipherment is correct. Treat all candidate readings as hypotheses to be scored…”

danishanish•20 minutes ago

I don’t imagine a model capable of the first part would require being told not to assume a decipherment is correct

bazoom42•about 1 hour ago

I wonder how you would even know if you have “cracked” it, given the corpus is so small?

Kosturdistan•8 minutes ago

You know you have cracked it because using the proposed system you are able to translate the uncracked language. Also helpful if your proposed system for Linear A makes sense relative to related languages that are not Linear A. Tom's proposed phonetic values work for more than one language.

vb-8448•about 1 hour ago

I wonder if LLMs trained specifically for this purpose can perform well with "forgotten languages".

I know I'm simplifying a lot, but all this deciphering isn't it just some kind of pattern matching?

doubleorseven•about 2 hours ago

crossing my fingers for this guy.

however, nawaya or whatever examples are around it are not part of the Hebrew language.

drcode•39 minutes ago

Don't know about the situation for this particular example, but keep in mind this type of analysis will necessarily involve extremely archaic dialects of all the involved languages

indiv0•about 2 hours ago

Can I get his decipher-forgotten-ancient-text skill? I want to try my hand at the Voynich Manuscript

fooster•about 2 hours ago

A lot of the comments in this thread are disappointing. Rather than celebrating an achievement (whether or not it is validated yet), many of you seem to want to put him down, or make it seem like Claude did all the work.

Claiming that Claude did all the work is patently ridiculous. Claude is a tool, like any other. The corpus of Linear A is ~7500 characters across ~1500 inscriptions and Claude, no matter how smart, doesn't just solve that on its own.

What a shame.

evilfred•30 minutes ago

this isn't an achievement, it's yet another amateur crank claiming he solved a famous puzzle, without a paper and without any critical review. many people have claimed to decode Linear A before. just because this guy used an LLM doesn't make it more credible

Kosturdistan•5 minutes ago

He has a working draft of a manuscript that may form the basis of a scholarly article, it has been shared with experts, and there is an excerpt of the paper in my blog post. I have also seen and read the paper with my own 2 eyes. I can't publish it though; Tom wants to keep that under wraps while it's reviewed by linguistics experts.

OutOfHere•about 2 hours ago

Is this extendible to a generalizable approach to translate any language pair (without a translation map or translation dataset)?

retrac•about 2 hours ago

I think it is an open question: can an unknown language be cracked -- without any dictionary or grammar or understanding of the language? Just lots and lots of texts, maybe some of it bilingual.

It's a common misconception that is what happened with Ancient Egyptian with the Rosetta Stone. The Rosetta Stone was just one of the big pieces of the puzzle. The decoding came when people realized that Coptic (a language written alphabetically and still in use in the Coptic Church today) is actually descended from Ancient Egyptian; as Spanish is to Latin, Coptic is to Ancient Egyptian.

Similarly the attempts to decode classical Maya were all dead ends. Until Yuri Knorozov realized that it encoded the ancestor of the Maya languages which are still spoken to this day. (Knorozov's Wikipedia article is worth checking out just for his photo with his cat. [0] IMHO.)

I have written before about the La Mojarra 1 stele in Mexico [1]. It looks a lot like Maya. [2] But it isn't Maya. Maybe the difference is like that between Russian and Latin writing?

No one can read it. It's undecipherable. There are some attempts to identify it with a proposed ancient language that would have been related to the modern Mixe-Zoque languages: some of the glyphs that are shared with Maya, when read phonetically, start sounding like a Mixe-Zoque language. But no one has proposed a confident decipherment. There probably isn't enough text. La Mojarra 1 is the only long example of the Isthmian script.

Deciphering Akkadian was very difficult, at first. The process started with Persian; Old Persian was written in a simplified, adapted form of the Mesopotamian cuneiform (wedges on clay). It was a kind of alphabet. And Old Persian was already understood. And there was a bilingual text on a monument carved by Darius I. But even then -- decoding relied heavily on the fact that Akkadian is a Semitic language distantly related to Hebrew and, more distantly, to Ancient Egyptian. So again, we sort of knew what we were looking for.

That is all to say: even if the Voynich manuscript (for example) contains real text in an otherwise completely lost language, I'm not sure it is possible even theoretically to translate it.

[0] https://en.wikipedia.org/wiki/Yuri_Knorozov

[1] https://en.wikipedia.org/wiki/La_Mojarra_Stela_1

[2] https://commons.wikimedia.org/wiki/File:La_Mojarra_Stela_1_S...

Kosturdistan•about 1 hour ago

Tom thinks he may be able to use his approach to crack more languages, but that's not confirmed.

SoftTalker•about 2 hours ago

Towards the Star Trek universal translator.....

rw_panic0_0•about 2 hours ago

would like to hear more about Tom's learning/education path in ML/AI.

Kosturdistan•about 1 hour ago

I haven't talked to him extensively about how he learned his engineering skills, but he is I believe 100% self taught. His background is in copywriting.

NooneAtAll3•about 2 hours ago

relevant xkcd: https://xkcd.com/2151/

iwontberude•about 2 hours ago

Sorry but I don’t recognize this as being an achievement by an amateur. This dude had no chance in hell until we trained a model to use his time to suss it out.

jonahx•about 2 hours ago

Assuming this pans out, every other professional linguist in the world has had the option to use Claude or other LLMs, but has not solved this problem, despite the incentives for doing so. It stands to reason the human is adding crucial value.

Kosturdistan•about 1 hour ago

I drilled down on this with Tom. He thinks that it might not have happened without Claude Code, but Claude was used to organize all of the symbols, and to run I think it was 100,000 simulations to assess whether or not he had an actual insight, or if he just randomly got lucky. Claude did NOT crack the code. Significant supporting role though.

BretonForearm•about 1 hour ago

So Claude Code was used to generate software that ran simulations? I don't think LLMs in and of themselves can execute simulations, esp. a specific, non-single digit count like 100k.
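
For what it's worth, "run N simulations to see whether a result could be luck" usually means ordinary code doing something like a permutation test: shuffle the proposed sign values at random many times and count how often a random assignment matches a reference lexicon at least as well as the proposed one. The Python sketch below illustrates that general technique only. It is not a description of what Tom's scripts do, and the signs, values, words, and lexicon are all invented toy data.

```python
import random

def count_matches(words, mapping, lexicon):
    """Count words whose reading under `mapping` appears in `lexicon`."""
    return sum("".join(mapping[sign] for sign in word) in lexicon for word in words)

def permutation_p_value(words, mapping, lexicon, trials=10_000, seed=0):
    """Estimate how often a random sign-to-value assignment does as well."""
    rng = random.Random(seed)
    observed = count_matches(words, mapping, lexicon)
    signs = list(mapping)
    values = [mapping[sign] for sign in signs]
    at_least_as_good = 0
    for _ in range(trials):
        shuffled = values[:]
        rng.shuffle(shuffled)
        if count_matches(words, dict(zip(signs, shuffled)), lexicon) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_good + 1) / (trials + 1)

# Toy data, invented for illustration only.
mapping = {"s1": "na", "s2": "wa", "s3": "ja", "s4": "ku", "s5": "ro", "s6": "ta"}
words = [("s4", "s5"), ("s1", "s2", "s3"), ("s6", "s1")]
lexicon = {"kuro", "nawaja", "tana"}

observed, p_value = permutation_p_value(words, mapping, lexicon)
print(observed, p_value)
```

A small p-value here only says the matches are unlikely under random reassignment of the same values. It says nothing about whether the lexicon was chosen after the fact or whether the matched meanings are plausible, which is where expert review comes in.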