Discussion (79 Comments)
The gauge-reading example here is great, but in reality of course having the system synthesize that Python script, run the CV tasks, come back with the answer etc. is currently quite slow.
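A minimal sketch of the kind of script such a system might synthesize: once a needle angle has been extracted from the image (here just assumed; a real script would detect it first, e.g. with Hough line detection in OpenCV), mapping it to a reading is plain linear interpolation. The angle sweep and value range below are hypothetical.

```python
def gauge_reading(needle_deg, min_deg=-45.0, max_deg=225.0,
                  min_val=0.0, max_val=100.0):
    """Map a detected needle angle (degrees) to a gauge value by
    linear interpolation over the gauge's sweep. The sweep and
    range here are made up; a real gauge's would be calibrated."""
    frac = (needle_deg - min_deg) / (max_deg - min_deg)
    return min_val + frac * (max_val - min_val)

# A needle halfway through the 270-degree sweep reads mid-scale:
print(gauge_reading(90.0))  # 50.0
```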
Once things get much faster, you could also use image generation to have models extrapolate possible futures from the photos they take, then describe those futures back to themselves and make decisions based on them, in loops like this. I think the assumption is that our brains do something similar unconsciously, before the results are integrated into our conscious experience.
I'm really curious what things we could build if we had 100x or 1000x inference throughput.
Just want to pedantically point out that we're not at our evolutionary endpoint yet. Humans are still evolving!
The planning-ahead-through-simulation thing, for example, seems to be a very good tool in neural-network-based architectures.
A few robot legs and arms, big battery, off-the-shelf GPU. Solar panels.
Prompt: "Take care of all this land within its limits and grow some veggies."
Or it could turn out to look like satoyama (Japanese peasant forests), or it could be more similar to the crop rotation traditionally practiced in many parts of Central Africa, where root crops were important.
In Russia, before the Soviets forced "modern scientific agriculture" on peasants in the name of modernization, they practiced things like contour farming (interplanting rows of crops along the contours of the land to slow water down) and maslins (intermixing multiple varieties of wheat and barley in the same patch). Contour farming is now an active area of research for its ability to prevent topsoil loss and build soil health, while maslins provide superior yield stability with little to no pesticides.
That's not even getting into the 40,000 to 120,000 documented varieties of rice, most of which are hyper-adapted to a very specific location, often even a single village.
My point is there is no one way to take care of a plot of land. It's all relative to a number of factors beyond just the abiotic characteristics of the land itself. Your goals and intentions matter and you will always find localized unique adaptations.
BD sat back on traditional programming/light ML techniques for ages whilst transformers went wild and it's only now that they're like "oh shit".
Hence the partnership with Google; BD lacks the capabilities otherwise. I bet their internal marketing departments did a bit of hand shaking to spin this piece as a favour for Hyundai/BD. Because from Google's (and our) perspectives - reading a gauge etc isn't that impressive and multimodal transformers solved that years ago, OpenCV many years before that also. But to BD it's impressive/a desperate grasp of "we swear we're using modern ML now! Yes our robot dances were sequenced and took dozens of takes but now we'll start doing it for real, we swear!"
Hyundai now owns Boston Dynamics and is pushing to get the robots into their factories.
https://colinator.github.io/Ariel/post1.html
It was about Google's PaLM-E evolution and progress. It basically has two models: one controls the robot, the other is an LLM, and they are combined in some attention layer.
Anyway, cool.
Of course this is for counting animal legs while giving coordinates and reading analog clocks, not coding or solving puzzles. I imagine this model's image performance per unit of model weight is very high.
And, I was disappointed to see that pointing was just giving x,y coords. I wanted to see robots pointing at stuff.
So we're going to have some engineers specify suitable digital replacements given the process/environment/safety requirements. We'll procure those (noting that an industrial digital pressure transducer can easily push up towards $10k), schedule a plant shutdown (how much does that cost?), then pay a pipefitter/boilermaker to replace the old gauges with new pressure transmitters (do you need a hot work permit for that? Did you get your engineer to sign that off?). Then, your controls sparky has to find a way to route a drop back to your marshalling cabinet for connection into your fieldbus/HART/modbus/whatever network (do you have one of those?) so that your SCADA system can talk to it (do you have one of those?).
Obviously it's not really an apples-to-apples comparison, but I think the costs involved with making "simple" changes in industrial settings are easy to wildly underestimate.
Dumb silicon is so super cheap now, just look to nfc etc, 1c microcontrollers. We can litter our world with sensors.
Which I would love to see - but I'm also not discounting the usefulness of any robot just being able to read something we can read and vice versa.
If it ain't broke don't fix it — pointing a cheap camera at it with some cloud compute will suffice.
Like your washing machine reporting its state, knowing if sun is out, running only when there is a lot of sun.
Your basement heater sending out its stats.
And your industry machine doing the same thing.
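A hedged sketch of the kind of rule these examples imply. All the names and the threshold are illustrative, not any real IoT API:

```python
def should_run(appliance_ready: bool, solar_watts: float,
               threshold_watts: float = 1500.0) -> bool:
    """Run a deferrable appliance (washing machine, heater cycle)
    only when local solar production exceeds a threshold. The
    threshold and signature are made up for illustration."""
    return appliance_ready and solar_watts >= threshold_watts

print(should_run(True, 2000.0))  # True: plenty of sun, run now
print(should_run(True, 300.0))   # False: wait for more sun
```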
Then you realize that we've been talking about Industry 4.0 for a decade now, yet everything IoT is either closed source or costs extra. And interoperability? Hahaha...
I don't know why we can't have nice things; it would be that easy :|
So there might be awesome progress behind the scenes, just not ready for the general public.
That's a bit exaggerated, no? Early Roombas would get tangled in socks, drag pet poop all over the floor, break glass stuff and so on, and yet the market accepted that, evolved, and now we have plenty of cleaning robots from various companies, including cheap spying ones from China.
I actually think that there's a lot of value in being the first to deploy bots into homes, even if they aren't perfect. The amount of data you'd collect is invaluable, and by the looks of it, can't be synth generated in a lab.
I think the "safer" option is still the "bring them to factories first, offices next and homes last", but anyway I'm sure someone will jump straight to home deployments.
My non-AI dishwasher can't even always keep the water inside. Nothing is perfect.
Depending on what the rate of breaking dishes is, this would be a massive improvement on me, a human being, since I break a really important dish I needed to use like ~2x per month on average.
Not here to shame you for it, for the record.
That's me ;_;
My concern with a household robot is not the dishwasher but the TV screen, the glass door, the glass table, animals (fish/aquarium), etc. that the robot might walk into, knock against, or fall onto.
VLA models essentially take a webcam screenshot + some text (think "put the red block in the right box") and output motor control instructions to achieve that.
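In interface terms, a VLA can be thought of as a function from (image, instruction) to an action chunk. A toy sketch of that contract with a stubbed-out model; the joint count and types are assumptions, not any real API:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Action:
    joint_deltas: List[float]  # per-joint position deltas, radians

def vla_step(image_rgb, instruction: str) -> Action:
    """Stub of the VLA contract: webcam frame + text in, motor
    commands out. A real model replaces this body with a learned
    policy; this stub just returns a zero action for a 6-DoF arm."""
    num_joints = 6  # assumed arm with 6 degrees of freedom
    return Action(joint_deltas=[0.0] * num_joints)

act = vla_step(image_rgb=None,
               instruction="put the red block in the right box")
print(len(act.joint_deltas))  # 6
```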
Note: "Gemini Robotics-ER" is not a VLA, though Gemini does have a VLA model too: "Gemini Robotics".
A demo: https://www.youtube.com/watch?v=DeBLc2D6bvg
The safety guidelines are interesting, they treat them as a goal that they are aspiring to achieve, which seems realistic. It’s not quite ready for prime time yet.
I'm all for the task reasoning and the multi-view recognition, based on relevant points. I'm very uncomfortable with the loose world "understanding".
The fault model I see is that e.g., this "visual understanding" will get things mostly right: enough to build and even deliver products. However, these are only probabilistic guarantees based on training sets, and those are unlikely to survive contact with a complex interactive world, particularly since robots are often repurposed as tasks change.
So it's a kind of moral hazard baked into the product: it delivers initial results but defers the risk, so product developers have an incentive to build, ship, and leave users holding the bag. (Indeed: users are responsible for integration risks anyway.)
It hacks our assumptions: we think that you can take an MVP and productize it, but in this case, you'll never backfit the model to conform to the physics in a reliable way. I doubt there's any way to harness Gemini to depend on a physics model, so we'll end up with mostly-working sunk investments out in the market - slop robots so cheap that tight ones can't survive.
Nothing was reported on Google's status page, and the CLI isn't even responding; it just sits there waiting for an answer that never arrives, even after 10 minutes.
Be careful, because you can easily overpay out the ass for "robot kits" online.
I haven't tested the LeKiwi specifically, but I have tested lots of SO-ARMs and a custom-built LeKiwi-like robot. I think some people have had issues with the rear omni wheel when moving forward, but I haven't seen that myself.
LLMs are really good at the sort of tasks that have been missing from robotics: understanding, reasoning, planning, etc., so we'll likely see much more use of them in various robotics applications. I guess the main questions right now are:
- who sends in the various fine-motor commands. The answer most labs/researchers have is "a smaller diffusion model", so the LLM acts as a planner, then a smaller faster diffusion model controls the actual motors. I suspect in many cases you can get away with the equivalent of a tool call - the LLM simply calls out a particular subroutine, like "go forward 1m" or "tilt camera right"
- what do you do about memory? All the models are either purely reactive or take a very small slice of history and use that as part of the input, so they all need some type of memory/state management system to actually allow them to work on a task for more than a little while. It's not clear to me whether this will be standardized and become part of models themselves, or everyone will just do their own thing.
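The tool-call variant mentioned in the first point can be sketched as a dispatcher: the LLM emits a subroutine name plus arguments, and a small table routes that to a motor primitive. All names here are illustrative, not any real robot API:

```python
# Hypothetical motor primitives the planner LLM can call.
def go_forward(meters: float) -> str:
    return f"moved forward {meters}m"

def tilt_camera(direction: str) -> str:
    return f"camera tilted {direction}"

TOOLS = {"go_forward": go_forward, "tilt_camera": tilt_camera}

def dispatch(tool_call: dict) -> str:
    """Route an LLM tool call such as
    {'name': 'go_forward', 'args': {'meters': 1.0}}
    to the matching primitive."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["args"])

print(dispatch({"name": "go_forward", "args": {"meters": 1.0}}))
# moved forward 1.0m
```

In a real stack the primitives would command a motor controller rather than return strings, but the planner/controller split looks the same.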
As for memory: my approach is to give the robot a Python REPL and, basically, a file system; the LLM can write modules, poke at the robot via interactive Python, etc.
Basically, the LLM becomes a robot programmer, writing code in real-time.
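A minimal sketch of that idea, assuming the LLM's output is a string of Python: the robot keeps one persistent namespace across turns, so definitions written earlier stay available later, and that persistence is the "memory". A real system would also sandbox the exec and persist modules to disk.

```python
class RobotRepl:
    """Persistent Python namespace an LLM can write into across
    turns. The LLM 'remembers' by defining functions and variables
    that survive between calls. Deliberately simplified: no
    sandboxing, no on-disk module store."""
    def __init__(self):
        self.ns = {}

    def run(self, code: str):
        # Execute LLM-generated code in the shared namespace.
        exec(code, self.ns)

repl = RobotRepl()
repl.run("def greet(name): return 'hello ' + name")  # turn 1: define
repl.run("msg = greet('robot')")                     # turn 2: reuse
print(repl.ns["msg"])  # hello robot
```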