Discussion (22 Comments) on HackerNews
I do still wonder if adapting something like card forge for LLM use would result in engaging gameplay with an LLM.
https://github.com/Card-Forge/forge
But also with a rules engine, you have to manually go through every step, and pass priority after every action.
I think it makes more sense to let an LLM play magic like a person would. On early turns it is acceptable to say "I play a land and pass" without going through every phase. And you can say "I tap all my land and play this card" without having to use a tool call and agent turn for every land tap.
Also card forge would not let you goldfish a deck. You must have opponents.
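A rough sketch of the "play like a person" idea above: the LLM submits one coarse action, and a thin layer expands it into the fine-grained steps a rules engine would want. Everything here (`Macro`, `expand`, the step names) is hypothetical and made up for illustration; it is not Forge's API or any real engine's.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Macro:
    """One coarse action submitted in a single tool call (hypothetical shape)."""
    play_land: Optional[str] = None
    tap_lands: List[str] = field(default_factory=list)
    cast: Optional[str] = None
    pass_turn: bool = False


def expand(macro: Macro) -> List[Tuple[str, str]]:
    """Translate one macro into the step-by-step actions an engine might expect."""
    steps: List[Tuple[str, str]] = []
    if macro.play_land is not None:
        steps.append(("play_land", macro.play_land))
    # One engine step per land, but only one LLM turn for all of them.
    steps.extend(("tap", land) for land in macro.tap_lands)
    if macro.cast is not None:
        steps.append(("cast", macro.cast))
        steps.append(("pass_priority", ""))
    if macro.pass_turn:
        steps.append(("pass_turn", ""))
    return steps
```

So "I play a land and pass" becomes `expand(Macro(play_land="Forest", pass_turn=True))`, one tool call instead of one per phase. A real version would still need the engine to check each expanded step.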
Have the LLM submit a proposed move and either advance the game state or reply "permission denied, try again". Probably also log the number of times it happens, since attempted violations seem like a valuable signal as well.
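A minimal sketch of that propose-and-validate loop, assuming the caller supplies three hypothetical callbacks: `propose` (asks the model for a move), `is_legal` (the rules check), and `apply_move` (advances the state). None of these names come from a real library, and the retry cap is an arbitrary choice.

```python
from typing import Any, Callable, Optional, Tuple

DENIED = "permission denied, try again"


def run_move(
    state: Any,
    propose: Callable[[Any, Optional[str]], Any],
    is_legal: Callable[[Any, Any], bool],
    apply_move: Callable[[Any, Any], Any],
    max_attempts: int = 3,
) -> Tuple[Any, int]:
    """Ask for a move until one is legal; return (new_state, violation_count)."""
    violations = 0
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        move = propose(state, feedback)
        if is_legal(state, move):
            return apply_move(state, move), violations
        # Count every rejected attempt so it can be logged as a signal.
        violations += 1
        feedback = DENIED
    # Out of attempts: leave the game state untouched.
    return state, violations
```

The violation count comes back alongside the state, so a benchmark could report it per game or per model.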
[0] https://maxbittker.github.io/runebench/
I'll have to look into that project, but I also have an RTX 5090 and did a lot of testing with Qwen3.6 27B and Gemma 4 31B. I was not able to get either to play legal turns consistently. I had to keep expanding the system prompt and adding rules for edge cases. By the end, the prompt was over 10k tokens, and while it mostly made legal turns, it did not make good turns. And all the heuristics in the prompt degraded the performance and increased the cost for frontier models.
I think I object more to the decks used in testing than the machines' decisions. I do have nitpicks though: this hand is quite poor and should be mulliganed: https://app.mtgautodeck.com/public/benchmarks/4bd9955b-ebe1-.... The poor runout reinforces this decision.
This project is cool though, props for making it!
https://github.com/CallumFerguson/mtg-auto-deck/blob/a877c08...
https://mtg.fandom.com/wiki/Judge_Tower
Like how the strawberry example was overtrained for, or how the pelican on a bike started being used in official release posts.
IOW, it's as complicated as possible.