Discussion (23 Comments)

asveikau•1 day ago
> TI-BASIC programs are stored as tokens, not text: every command, function, and variable is a token of 1 or 2 bytes. The OS detokenizes (token→display string) to show a program and tokenizes (keypress/text→token) on entry; the parser walks tokens to execute.

From my memory of using a TI-83 in the late 90s, I would not be surprised if the keypad UI injects tokens directly based on your keypress, rather than "tokenizing the text". I seem to recall, for example, that you could not position the cursor in the middle of a BASIC token, and if you typed out a token's name letter by letter it would not work; you needed to find the right menu item to inject the correct token.
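
To make the token-stream idea in the quote concrete, here is a minimal Python sketch of detokenization (token bytes to display string). The table is a tiny hypothetical subset: the byte values, the `0xBB` two-byte prefix, and the names `ONE_BYTE`, `TWO_BYTE`, and `detokenize` are assumptions for illustration and should be checked against a real token reference before being relied on.

```python
# Hypothetical token table. Byte values are assumptions for illustration,
# not verified against TI's actual token sheet.
ONE_BYTE = {
    0xDE: "Disp ",
    0xD3: "For(",
    0xD4: "End",
    0x2A: '"',
    0x3F: "\n",
    0x41: "A",
}

# Assumed prefix byte that introduces a two-byte token.
TWO_BYTE_PREFIX = 0xBB
TWO_BYTE = {
    (0xBB, 0xB0): "a",  # assumed: a lowercase letter stored as two bytes
}


def detokenize(data: bytes) -> str:
    """Walk the byte stream and emit each token's display string."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == TWO_BYTE_PREFIX and i + 1 < len(data):
            # Two-byte token: look up the (prefix, second byte) pair.
            out.append(TWO_BYTE.get((b, data[i + 1]), "?"))
            i += 2
        else:
            # One-byte token; unknown bytes are shown as "?".
            out.append(ONE_BYTE.get(b, "?"))
            i += 1
    return "".join(out)


program = bytes([0xDE, 0x2A, 0xDE, 0x2A])
print(detokenize(program))  # Disp "Disp "
```

In this sketch a four-byte program displays as twelve characters, which is why a cursor that moves token by token cannot land inside a keyword.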

duskwuff•1 day ago
I can confirm that. On the TI-83, many of the TI-BASIC tokens contained lowercase characters which couldn't be typed at all - you could only type uppercase letters on the keyboard. (There were a few lowercase letters available as tokens for special purposes, but it wasn't a full set.)

Interestingly, you could print tokens in strings - e.g. you could Disp "Disp ".

suburban_strike•1 day ago
The 83+ let you type the full set of lowercase characters as well, but they used twice as many bytes per character for storage.
7jjjjjjj•1 day ago
There's actually a hidden lowercase feature; you can use an assembly program to enable it.
jamesfinlayson•about 23 hours ago
Ah makes sense. I remember a younger me trying to open .8xp files back in the day and seeing gibberish, and eventually finding the TI IDE which... felt like it had been written a long time ago (the file select dialog capped the display of file names at 8.3 I think and used ~1 and ~2 etc as "the rest of the file name").
siraben•1 day ago
Yes, to type a TI-BASIC program you have to go through the calculator menus which directly insert the tokenized input into the buffer.

The weird thing about TI-BASIC is how seemingly innocent changes in the input can cause huge performance regressions, e.g. https://siraben.github.io/ti84p-re/sub-tibasic-for-paren.htm...

  For(I,1,N
  If 0
  1
  End
is much slower than

  For(I,1,N)
  If 0
  1
  End
asveikau•1 day ago
The open paren being part of the tokens was always weird. I could imagine that doing strange things for the parser; when it sees a close paren it needs to know that several of the preceding tokens may have an open paren even without having a '(' token.
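
A small Python sketch of the bookkeeping described above, where a function token such as `For(` opens a parenthesis without a separate `(` token. This is not how the TI-OS parser is known to work; the function name `unclosed_parens` and the rule "any token whose display text ends in `(` is an opener" are assumptions made for illustration.

```python
from typing import Iterable


def unclosed_parens(tokens: Iterable[str]) -> int:
    """Count parentheses still open at the end of a tokenized line.

    Assumption: any token whose display text ends in "(" counts as an
    opener, whether it is a bare "(" or a function token like "For(".
    """
    depth = 0
    for tok in tokens:
        if tok.endswith("("):
            depth += 1
        elif tok == ")":
            if depth == 0:
                raise ValueError("close paren with nothing open")
            depth -= 1
    return depth


print(unclosed_parens(["For(", "I", ",", "1", ",", "N"]))       # 1
print(unclosed_parens(["For(", "I", ",", "1", ",", "N", ")"]))  # 0
```

The two calls mirror the two `For(` lines in the example above: the first leaves one parenthesis open at end of line, the second closes it.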
analogpixel•1 day ago
I couldn't tell: is a person doing this, or was this an LLM dissecting it?
siraben•1 day ago
This was made collaboratively by me directing coding agents at the binary, using Ghidra MCP extensively, disassembly and also dynamic analysis with an emulator. I don't have a writeup of the process but it was definitely not fully automatable (I wish though). I might prepare a blog post with transcripts and session history and things I learned along the way.

Broad takeaways:

- Ghidra MCP is not a silver bullet. Lots of opportunities for mis-decoding especially on older instruction sets (e.g. conflating code + data), which requires user input to flag data layout/structs.

- Agents still need a lot of user direction otherwise the RE production is just kind of a random walk. With Z80 it's decent at reading code but I expect that it has much worse performance than reading x86 or ARM for instance. The TI-84+ has a bunch of hardware quirks as well.

- GPT 5.5 is better than Opus 4.8 at RE. Opus 4.8 loves plausible-sounding RE'd logic without any checking. The gold standard is actually dynamically executing the binary and comparing the logic against the prose.

- Maintaining consistency in style and prose is a PITA across the wiki. Hard to reconcile prose <-> code. Can be somewhat mitigated by agent loops.

I was also in discussions with people in the TI calculator programming space, who helped provide guidance as well. We previously did not have a catalogue of every subsystem in TI-OS, let alone most subroutines in the OS.

RgrTheShrubbr•1 day ago
Having just recently heard about Ghidra and started using it with Claude, I am absolutely blown away by how little resistance it has decompiling old Win95/98 binaries. It's turning into a bit of a hobby of mine to take old software, decompile it, and find hidden treasures like images or messages.
Chu4eeno•about 8 hours ago
There's this unfortunate common misconception (that LLMs luckily don't tend to share) that reverse engineering is illegal or immoral, when it's a great source of learning, a necessity for things like interop/preservation, and even has explicit carve-outs in the copyright laws of many/sane countries.

I know my government has a good amount of reverse engineers on the payroll (mostly in the security services).

hedgehog•1 day ago
Do you have plans to generate a buildable version of the sources, and do you know the original implementation language (C?)?
siraben•1 day ago
It's highly likely that the original implementation language was assembly. The code is very idiomatic.

Regarding source build, I think reverse engineering it to the point where you can reconstruct the source is possibly legally problematic, so I don't plan to do this, but maybe for certain subsystems like MathPrint (equation display) which was especially fun to RE. I have a PR up for it and it will be live at

https://siraben.github.io/ti84p-re/mathprint

analogpixel•1 day ago
How much have you spent so far on this (for tokens)?
siraben•1 day ago
The plans are heavily subsidized by the AI companies so I didn't end up needing to do API usage or buy another subscription. I have ChatGPT Pro and Claude Code Max.
xkcd-sucks•1 day ago
> Confidence is flagged: .....

> The big picture

> The structural reverse-engineering is comprehensive (every subsystem mapped, both cross-page mechanisms resolved ...

> Confidence summary / open items

Probably an LLM wrote the docs.

> (the GhidraMCP plugin reconnects for interactive work)

Probably LLM+Ghidra for the actual RevEng. Ultimately, does it matter if the end product works, though?

markus_zhang•1 day ago
I think it’s fine as long as it works. Personally I prefer doing everything manually because that’s where the fun is, but everyone has their own fun.
tadfisher•1 day ago
I love that this project produced so much info, and also I'm disappointed with the prose. You probably didn't mean to explain the typographic nuances of em vs. en-dashes to the reader: https://siraben.github.io/ti84p-re/conventions.html#typograp...
siraben•1 day ago
Thanks for the feedback, fixing.
thwgrw•1 day ago
I am sure you did a lot of hard work here. But with all the LLM smell in the text, my mind zoned out after a few lines. I'd rather read a flawed but human-written text than a perfect one written or co-written with an LLM.