Discussion (17 Comments) · Read Original on HackerNews
I hadn't really thought about trying to create a harness for agents to play the full game interactively. I'd love to explore this. If you don't mind, here are a few questions:
1) Am I right to assume that I'd still need a text-only harness, even though my game is already text-based, because it relies on menu selections made with the arrow keys and Enter?
2) Do you have prompt recommendations for getting the type of feedback you have found useful? I would guess that in your case the objectives of the game are clearer than in an open-world RPG. What dead ends have you run into? Maybe a variety of approaches would be good: one agent tries to fight everything, another focuses on picking up and completing as many quests as possible?
3) How bad is the token burn doing this? Any optimization strategies you've employed?
The degree of choice point-to-point in the skill tree is actually quite limited in most circumstances. There are of course items such as Thread of Hope and Intuitive Leap, or inversion-of-choice items like Unnatural Instinct, which change that slightly.
If the question is path optimization for reaching these nodes, Path of Building already does a good job. If the question is "what single node will give me the most theoretical power?", it solves that too.
That's actually the beauty of Path of Exile as a whole: the different systems work in combination to lead to an outcome. As an example, if you're playing a life-stacking build, the goal is finding unique ways to get as many life/strength nodes as possible. That's your gear and your passive tree working in tandem.
As for using AI to optimize characters, not just the skill tree, you'd need to build some pretty sophisticated tools that do not yet exist to make that happen. No AI alone would be able to do it.
So we went down a rabbit hole and decided to do everything purely based on pixels and OS inputs.
We're currently only live for mobile but happy to give you early access to nunu ai for PC if interested. Would love to see how we compare!
1. The single biggest jump in test quality came from giving the agent BOTH source code analysis AND live browser snapshots, not either alone. With code-only the agent hallucinates selectors; with browser-only it misses project conventions. Two MCP servers feeding the same agent — one local file-read, one Playwright in-process — was the architecture that worked.
2. For the browser snapshot tool, returning the raw DOM ate tens of thousands of tokens per call and the agent struggled to navigate it. Swapping to accessibility-tree refs (e1, e2, ...) cut token usage by ~10x and made the agent reliably target the right elements.
3. We avoided Docker-based MCP servers in production (we run on ECS Fargate). The in-process SDK MCP pattern (create_sdk_mcp_server + @tool decorator) keeps the browser handle in scope of the tool definition, which let us attach page.on('console') listeners and have the agent read them via a separate tool. Hard to do that across stdio process boundaries.
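To make point 2 concrete, here is a minimal sketch of the idea of replacing a raw DOM dump with a compact accessibility-style outline in which only interactive elements get short refs (e1, e2, ...). It is an illustration, not Playwright's actual snapshot format or the commenter's implementation: the `Node` class, the `INTERACTIVE_ROLES` set, and the `snapshot` function are all hypothetical names, and the tree is built by hand rather than read from a browser.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one accessibility-tree node."""
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


# Assumption: only these roles are worth targeting, so only they get a ref.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def snapshot(root: Node) -> tuple[str, dict[str, Node]]:
    """Return a compact text outline plus a ref -> node lookup table."""
    lines: list[str] = []
    refs: dict[str, Node] = {}

    def walk(node: Node, depth: int) -> None:
        label = f'{node.role} "{node.name}"' if node.name else node.role
        if node.role in INTERACTIVE_ROLES:
            ref = f"e{len(refs) + 1}"  # e1, e2, ... in document order
            refs[ref] = node
            label += f" [ref={ref}]"
        lines.append("  " * depth + "- " + label)
        for child in node.children:
            walk(child, depth + 1)

    walk(root, 0)
    return "\n".join(lines), refs


# Hand-built example page; a real tool would read this from the browser.
page = Node("main", children=[
    Node("heading", "Sign in"),
    Node("textbox", "Email"),
    Node("textbox", "Password"),
    Node("button", "Log in"),
])

text, refs = snapshot(page)
print(text)
```

The agent only ever sees the short outline and answers with a ref such as `e3`; the harness resolves that ref back to the real element through the lookup table, so no selector has to be guessed.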
For game testing specifically — your text-renderer detail is interesting because it sidesteps the visual-grounding problem (how does the agent verify what it's seeing?). Curious how you'd extend this to a 2D/3D rendered game where the screen state isn't easily textualized.
I'd like a `mud_or_moo --state-dir ./tmp/some-mud` that stores most things as plain text, or maybe SQLite if really necessary. What I'm angling towards is a MUD core that is conceptually similar to a wiki browser over markdown files (i.e. room-001.md => exits => room-002.md), so that _editing and linking_ feel more comfortable and GUI-like to a human user.
Once I had the core authorship MCPs working, Claude itself created the whole world, including an initial tutorial sequence, combat, etc.
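A rough sketch of what the room-001.md => exits => room-002.md idea could look like, assuming each room is one markdown file in the state directory and each exit is a list item containing an ordinary markdown link to another room file. The file layout, the `EXIT_LINK` pattern, and the `load_exits` and `follow` helpers are assumptions made for illustration, not part of any existing `mud_or_moo` tool.

```python
import re
import tempfile
from pathlib import Path

# Assumed convention: an exit is a list item such as "- [north](room-002.md)".
EXIT_LINK = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)]+\.md)\)", re.MULTILINE)


def load_exits(state_dir: Path, room_file: str) -> dict[str, str]:
    """Map each exit name in a room file to the room file it links to."""
    text = (state_dir / room_file).read_text(encoding="utf-8")
    return {name.strip().lower(): target
            for name, target in EXIT_LINK.findall(text)}


def follow(state_dir: Path, start: str, moves: list[str]) -> str:
    """Walk a sequence of exits and return the file of the final room."""
    room = start
    for move in moves:
        exits = load_exits(state_dir, room)
        if move not in exits:
            raise ValueError(f"no exit {move!r} from {room}")
        room = exits[move]
    return room


# Build a tiny two-room world in a throwaway state directory.
with tempfile.TemporaryDirectory() as tmp:
    state_dir = Path(tmp)
    (state_dir / "room-001.md").write_text(
        "# Cellar\n\nDamp stone walls.\n\n## Exits\n\n- [north](room-002.md)\n",
        encoding="utf-8",
    )
    (state_dir / "room-002.md").write_text(
        "# Hallway\n\n## Exits\n\n- [south](room-001.md)\n",
        encoding="utf-8",
    )
    final_room = follow(state_dir, "room-001.md", ["north", "south", "north"])
    print(final_room)
```

Because exits are plain markdown links, the same files render as a clickable wiki in any markdown viewer, which is where the comfortable editing and linking would come from.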