Discussion (52 Comments)
I'm not a game developer myself, but some of my favorite games carry a deep sense of intentionality. For instance, there is typically not a single item misplaced in a FromSoftware game (or, for instance, Lies of P -- more recently). Almost every object is placed intentionally.
Games which lack this intentionality often feel dead in contrast. You run into experiences which break immersion, or pull you out of the experience that the developer is trying to convey to you.
It's difficult for me to imagine world models getting to a place where this sort of intentionality is captured. The best frontier LLMs fail to do this in writing (all the time), and even in code, and the surface of experiences for those mediums often feels "smaller" than the user interaction profile of a video game.
It's not clear how these world models could be used modularly by humans hoping to develop intentional experiences? I don't know much about their usage (LLMs are somewhat modular: they can produce text, humans can work on it, other LLMs can work on it). Is the same true for the video output here?
All this to say, I'm impressed with these world models, but similar to LLMs with writing, it's not really clear what it is that we are building towards? We are able to create less satisfying, less humane experiences faster? Perhaps the most immediate benefit is the ability for robotic systems to simulate actions (by conjuring a world, and imagining the implications).
In general, I have the feeling that we are hurtling towards a world with less intentionality behind all the things we experience. Everything becomes impersonal, more noisy, etc.
Making a world internally consistent by explicit placement gets harder as you increase in scale. When internal consistency is a factor impacting quality, there is a scale at which generated content eventually becomes the higher quality solution.
Secondly, when generating content with AI, the same rules around carelessness apply. There are certainly generative AI tools out there that offer few options for composing what you want, but that is not an inherent part of AI. Some of it is because people want rudimentary interfaces; some of it is that the generators are new enough that the control mechanisms are limited, because the focus is on doing something at all before doing it with a high degree of control. In some ways the problem is that things are new enough that it can be hard to describe what desirable controllability even looks like, so shipping the generator to see what people would like it to be able to do is, I think, a reasonable path to follow prior to creating the controls that people want. Part of it is also that there _are_ tools that give a high level of control over what is generated, but far fewer people get to see them. There are ways to control styles, object placement, camera motion, scene composition, etc. The more specialised you get, the smaller the subset of people who need that specific control.
I think AI can make things possible for people who could not have done so without them, but it's still going to take care to make something special.
That's a pretty specific and one-sided example. There are tons of good games that don't rely on elaborate item placement (e.g. many Bethesda games are great because most items are useless decorations; they broke that rule in recent games by giving clutter a purpose, and it made them a lot worse). There are tons of good games not relying on this intentionality at all; they're either literally random cool ideas thrown at the wall, or even procedurally generated.
One aspect of intentionality is that there’ll be a narrative payoff when you discover something you find interesting. In videogames, the world is mostly pre-designed, so the designer has to predict what you’ll be interested in for the most part (in pen-and-paper RPGs, this is usually done better, because the human dungeon master/DM can plan ahead, but also improvise a payoff or modify the plot between sessions). If there were a world-model-generated game world, I guess the model would have to be “smart” enough to set up and execute those payoffs.
An advantage that the world model would have (and shares with a good human DM) is that everything is an interactable, and the players get to pick what they think is interesting. If everything is improv with a loose skeleton around it, you don’t have to predict as far out. I think world model generated games, if they even become a thing, will be quite a bit worse than conventionally designed ones for a long time (improv can be quite shallow!) but have a lot of potential if they work out.
FromSoft is an interesting example. They make the game more believable by having extremely missable quests; most of them don’t block progress through the game, and you usually stumble across enough side quests naturally (although IMO the density was too low in Elden Ring; their system showed a bit of weakness in the less-guided context). The plot is pretty vague, but the vibes tell enough of a story that you don’t really mind. It’s sort of improv/pen-and-paper, but the player’s imagination is doing the job of the DM.
Where you look for an intentionally evoked experience authored by a game designer, I am looking for an unexplored world unfolding before me filled with emergent and unique phenomena that perhaps no one and not even the game designer has seen before.
These world models are key for robotics and for coherence in video generation.
Give a world model images of a factory, and a robot can simulate tasks and pick the best course of action.
Give a world model images/context etc., and it can generate a coherent world for video generation.
What this world model system might be able to do for us in regard to gaming or virtual reality: either simulate 'old' environments like the house of your grandparents (Gaussian splatting, but interactive) or potential new ones, like remodeling a house or kitchen.
It can also be a very interesting, easy-to-approach VR environment where you can start building your world with voice. That would be very intentional. After all, world building is not necessarily connected to being able to generate 3D assets. Just because you need to go this route today doesn't mean you have to do this tomorrow.
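The simulate-tasks idea above is essentially model-predictive control: roll candidate actions through the world model, score the imagined outcomes, and execute the best first action. A minimal 1-D sketch (the `world_model` function here is a hypothetical stand-in for a learned dynamics/video model, not any real API):

```python
import random

# Toy stand-in for a learned world model: given a state and an action,
# predict the next state. A real world model would be a learned network;
# this 1-D version just illustrates the simulate-then-act loop.
def world_model(pos, action):
    return pos + action

def plan(pos, goal, n_candidates=64, horizon=5):
    """Random-shooting planner: roll candidate action sequences through
    the world model and return the first action of the best sequence."""
    best_cost, best_first = float("inf"), 0.0
    for _ in range(n_candidates):
        seq = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        p, cost = pos, 0.0
        for a in seq:
            p = world_model(p, a)
            cost += abs(goal - p)  # stay close to the goal at every step
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

random.seed(0)
pos, goal = 0.0, 3.0
for step in range(10):
    pos = world_model(pos, plan(pos, goal))  # replan at every step
print(f"final position: {pos:.2f} (goal {goal})")
```

The point is that the robot never touches the real factory while planning; all the trial-and-error happens inside the model's imagination.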
Yes, we haven't gone that far with creating consciousness yet, but there is gonna be a lot of money around neural computing devices for consumers in the coming decades, so that will speed up knowing what sense data you need for consciousness.
for example, I am 100% certain that ANY model could write a better Dragon Age sequel than the rotting corpse of Bioware did, because only humans can despise their audience and their source material. an LLM would dutifully attempt to produce more of the thing rather than 're-imagine' the thing for 'the modern audience'.
Many of the most popular games in the past decade are procedurally generated and have nothing “intentionally” placed (apart from tuning/tweaking the balance of the seeding algorithms).
I think you underestimate the intentionality that goes into developing procedural generation. Something like Dwarf Fortress isn't "place objects randomly" - it is layers upon layers of carefully crafted systems that build upon each other to produce specific patterns of outcome
I guess what I'm saying is: couldn't a world model with targeted training and thoughtfully tuned system prompts be directionally similar to those layered systems, producing specific patterns of outcomes?
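The "layers upon layers" point can be made concrete with a tiny sketch (illustrative only, not Dwarf Fortress's actual algorithm): one layer generates terrain, a second derives moisture from it, and a third places objects only where both earlier layers allow, so nothing is placed purely at random.

```python
import random

random.seed(42)
SIZE = 16

# Layer 1: random terrain height per tile (a real generator would use
# coherent noise, e.g. Perlin, but random values suffice to show layering).
height = [[random.random() for _ in range(SIZE)] for _ in range(SIZE)]

# Layer 2: moisture derived from height (lower terrain collects water).
moisture = [[1.0 - height[y][x] for x in range(SIZE)] for y in range(SIZE)]

# Layer 3: rule-based placement on top of the earlier layers.
def tile(y, x):
    if height[y][x] > 0.8:
        return "^"  # mountain
    if moisture[y][x] > 0.7:
        return "~"  # water
    if 0.4 < height[y][x] < 0.6 and moisture[y][x] > 0.45:
        return "T"  # tree: only mid-altitude, damp spots
    return "."

world = ["".join(tile(y, x) for x in range(SIZE)) for y in range(SIZE)]
print("\n".join(world))
```

Each layer constrains the next, which is why the output has structure (trees cluster at mid-altitudes near water) rather than uniform noise.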
Are video game developers using these systems in their workflows? Would love to learn more!
The combination of "many", "most popular", and "nothing" is overstating it by a wide margin, but, for example, the majority of the vegetation in games as far back as Oblivion was procedurally placed.
Everyone is right to be skeptical of this coming from a 2.8B model. Weights or it didn't happen.
Also, will this run on RTX 4090 with 24GB memory?
Thank you!
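As a rough sanity check on the 24 GB question: back-of-envelope VRAM for the weights alone of a 2.8B-parameter model, at common precisions. This counts only dense weights; activations, caches, and framework overhead add more on top.

```python
# Weight memory = parameter count x bytes per parameter.
params = 2.8e9
sizes = {}
for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    sizes[name] = params * bytes_per_param / 2**30  # GiB
    print(f"{name:>9}: {sizes[name]:5.1f} GiB of weights")
```

At fp16/bf16 the weights are about 5.2 GiB, so a 24 GB card has plenty of headroom for the weights themselves; whether inference fits overall depends on activation and cache memory.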
The 'Refiner' effect seems to do the opposite: if the examples are representative, in all cases the first-stage images look better than the 'refined' ones. Less clutter, more realistic, less 'cowbell' for those who know the phrase.
There’s no doubt they’re technically impressive, but what does one do with it?
This one is probably too small to be useful for that, and not diverse enough? But I could be wrong.
Imagine playing Red Dead Redemption 2: you attempt to ride your horse from Saint Denis to Valentine, and Valentine no longer exists, or is a completely different town located half a mile off from where it was originally.
I just don't see how this would work...
In this case, what looks interesting is the one minute coherence and the massive speedup - they claim 36x over open models with similar capabilities. You can tell they aren’t aiming for state of the art visuals — looks very SD 1.5 in terms of the output quality.
Seedance 2.0 and Kling 3 are regarded as the best closed-source video models we have. I have subscribed to a few AI video subreddits; the consensus at the moment is that they are good for anything but long-form videos with humans.
No surprise that we're very good at spotting even the most subtle differences when looking at other people.