Discussion (100 Comments)
Asimov's laws of robotics are flawed too, of course. There is no finite set of rules that can constrain AI systems to make them "safe". I don't have a proof, but I believe that "AI safety" is inherently impossible, a contradiction in terms. Nothing that can be described as "intelligent" can be made safe.
Almost all of Asimov's writing about the three laws is a warning of sorts that language cannot properly capture intent.
He would be the very first person to say that they are flawed; that is the point of them.
He uses robots and AI as creatures that understand language but not intent, and, funnily enough, that's exactly what LLMs do... how weird.
Asimov tried to capture this too: if a robot was tasked to "always protect human life", would it necessarily avoid killing at all costs? What if killing someone would save the lives of two others? The infinite array of micro-trolley problems that dot the ethical landscape of actions tractable (and intractable) to literate humans makes a fully consistent accounting of human values impossible, and thus one could never be expected from a robot to full satisfaction.
I don't discredit you as a person or a professional, but we meatbags are looking for sentience in things which don't have it; that's why we anthropomorphise things constantly, even as children.
We are easily fooled and misled.
Especially with current-day chat-style interfaces and RLHF, which are consciously designed to steer people toward anthropomorphization.
It would be interesting to design a non-chat LLM interaction pattern that's designed to be anti-anthropomorphization.
> humans WILL blindly trust their outputs, and humans WILL defer responsibility to them
I also blame a lot (but not all) of that on current AI UX, and I wonder if there are ways around it. The blind-trust problem can perhaps be mitigated by never giving an unambiguous output (always offer options, at least). I don't have any ideas about the problem of deferring responsibility.
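As a thought experiment, here is a minimal sketch of what that "never unambiguous" interaction pattern could look like. Everything here is hypothetical: the `complete` callable stands in for any LLM completion API, and the names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Option:
    text: str
    caveat: str  # always attached, so no output reads as "the" final answer

def ask_with_options(complete, prompt: str, n: int = 3) -> list[Option]:
    """Return several candidate answers instead of one authoritative reply.

    `complete(prompt) -> str` is a hypothetical stand-in for an LLM call.
    Re-sampling the same prompt surfaces the model's own variance: if the
    candidates disagree with each other, blind trust is visibly unwarranted.
    """
    return [
        Option(text=complete(prompt), caveat="Unverified model output.")
        for _ in range(n)
    ]

# Usage idea: the UI renders all options side by side and requires an
# explicit user pick; it never shows a lone, confident-looking reply.
```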
Talking to chatbots is like taking a placebo pill for a condition. You know it's just sugar, but it creates a measurable psychosomatic effect nonetheless. Even if you know there's no person on the other end, the conversation still causes you to functionally relate as if there is.
So this isn't "accommodating foibles" with the machine, it's protecting ourselves from an exploit of a human vulnerability: we subconsciously tend to attribute intent, understanding, judgment, emotions, moral agency, etc. to LLMs.
Humans are wired to infer these based on conversation alone, and LLMs are unfortunately able to exploit human conversation to leap compellingly over the uncanny valley. LLM engineering couldn't be better made to target the uncanny valley: training on a vast corpus of real human speech. That uncanny valley is there for a reason: to protect us from inferring agency where such inference is not due.
Bad things happen when we relate to unsafe people as if they are safe... how much more should we watch out for how we relate to machines that imitate human relationality to fool many of us into thinking they are something that they're not. Some particularly vulnerable people have already died because of this, so it isn't an imaginary threat.
Right, I'm saying that this framing is backwards. It's not that poor little humans are vulnerable and we need to protect ourselves on an individual level, we need to make it illegal and socially unacceptable to use AI to exploit human vulnerability.
Let me put it another way. Humans have another weakness, that is, we are made of carbon and water and it's very easy to kill us by putting metal through various fleshy parts of our bodies. In civilized parts of the world, we do not respond to this by all wearing body armor all the time. We respond to this by controlling who has access to weapons that can destroy our fleshy bits, and heavily punishing people who use them to harm another person.
I don't want a world where we have normalized the use of LLMs where everyone has to be wearing the equivalent of body armor to protect ourselves. I want a world where I can go outside in a T-shirt and not be afraid of being shot in the heart.
It's not insane at all for humans to alter their behavior with a tool: you grip a hammer or a gun a certain way because you learned not to hold it backwards. If you observe a child playing with a serious tool, like scissors, as if it were a doll, you'd immediately course-correct the child and teach them how to approach it safely. But that is because an adult with prior knowledge observed the situation before an accident, so rules are defined.
This blog's suggested rules are exactly the sort of method to aid in insulation from harm.
r/myboyfriendisai is quite... an interesting subreddit, to say the least. If you've never seen it, it was really something when the version that followed GPT-4o came out, because members were complaining that their boyfriend / girlfriend was no longer the same.
I always find the common references to Asimov's laws funny. They are broken in just about every one of his books. They are crime novels where, if a robot was involved, there was some workaround of the laws.
Did you fully read the original thing? No demands were being made, or at least I didn't read it that way. It was simply a suggestion for a better way of interacting with AI, as stated in the conclusion:
"I am hoping that with these three simple laws, we can encourage our fellow humans to pause and reflect on how they interact with modern AI systems"
Sure, (many/most) humans are gonna do what they're gonna do. They'll happily break laws. They'll break boundaries you set. Do we just scrap all of that?
It's worth checking yourself here. It feels like you've set up a straw man.
> There is no finite set of rules that can constrain AI systems to make them "safe". I don't have a proof, but I believe that "AI safety" is inherently impossible, a contradiction of terms. Nothing that can be described as "intelligent" can be made to be safe.
If we want to talk about "disagree with this framing", to me this is the prime example. I'm struggling to read it as anything other than defeatist or pedantic (about the term "safe"). When we talk about something keeping us "safe", we're typically not saying something will be "perfectly safe". I think it's rare to have a safety system that keeps you 100% safe. Seat belts are a safety device that can increase your safety in cars, but they can still fail. Traffic laws are established (largely) to create safety in the movement of people and all the modes of transportation, but accidents still happen.
I'm not an expert on this topic, so I won't make any claims about these three laws and their impact on safety, but largely I would say they're encouraging people to think critically. I'd say that's a good suggestion for interacting with just about anything. And to be clear, "critical thinking" to me means being skeptical (/ actively questioning), while remaining objective and curious.
Not a real argument or anything, but I'm reminded of the episode of The Office where Michael Scott listens to the GPS without thinking and drives into the lake. The second law in the article would have prevented that :)
Also the reason we're talking about this again is that machines are significantly less 'mere' than they were a few years ago, and we need to figure out how to approach this.
Agree that 'the computer effect' (if it doesn't already have a pithier name) results in humans first discounting anything that comes out of a machine, and then (once a few outputs have been validated and people start trusting the output) doing a full 180 and refusing to believe the machine could ever be wrong. However, to err is human and we have trained them in our image.
Safety should go back to being narrowly defined in terms of reducing / preventing physical injury. Safety is not "don't use swear words." Safety is not "don't violate patents." Safety is not "don't talk about suicide." Safety is not "don't mention politics I don't like." As long as we keep broadly defining it, we're never going to agree on it, and it won't be implementable.
There's another comment comparing LLMs to shovels, and I think both that and the power tool comparison miss the mark quite a bit. LLMs are a social technology, and the social equivalent of getting your hand cut off doesn't hurt immediately in the way that cutting your actual hand off would. It's more like social media, or cigarettes, or gambling. You can be warned about the dangers, you can see the shells of wrecked human beings who regret using these technologies, but it doesn't work on our stupid monkey brains. Because the pain of the mistake is too loosely connected to the moment of error. We are bad at learning in situations where rewards are immediate and consequences are delayed, and warnings don't do much.
I guess what I'm really saying is that these safety guidelines are not nearly enough to keep us safe from the dangers of AI that they're meant to prevent.
I agree with the thrust of your argument; a minor wording quibble: LLMs are a falsely-social technology, in the sense that casinos are a false-prosperity technology and cocaine is a false-happiness technology.
But we cannot guarantee those guidelines will always be followed.
Absolving humans of all responsibility for the negative consequences of their own AI misuse seems to strike the wrong balance for a healthy culture.
But other things will:
- Liability rules
- Regulations that you get audited on (esp. for companies already heavily regulated, like banks, credit agencies, defense contractors, etc)
If you get the legal responsibility part right, then the education part flows from that naturally.
One day everything works brilliantly, the models are conservative with changes and actions and somehow nail exactly what you were thinking. The next day it rewrites your entire API, deploys the changes and erases your database.
If only there was intellectual honesty in it all, but money talks.
Are all the tool's users required to be trained on your safety guidelines, and to use the tool in a context that reminds them they are responsible for following them?
Because if not, then the guidelines are useless and are just an excuse to shift blame from the toolmakers to the users.
We don't just look around, take an average of what everyone is already doing, and call that what is right, right? Whether you're deontological or utilitarian or virtue-ethicist about it, there is still the idea that we can speak to what is "good" even if we can't see that good out there.
Maybe it is "insane" to expect meaning from something like this, but what is the alternative to you? OK maybe we can't be prescriptive--people don't listen, are always bad, are hopeless wet bags, etc--but still, that doesn't in itself rule out the possibility of the broad project that reflects on what is maybe right or wrong. Right?
Can someone explain why this is a bad thing, while at the same time it's a good thing to say stuff like "put a computer to sleep", "hibernate", "killing" processes, processes having "child" processes, "reaping", "what does the error say?", "touch", etc?
To me that's just language, and humans just using casual language.
The people who are writing op eds in major news publications about how their favorite chatbot is an "astonishing creature" and how it truly understands them are the ones who need this sort of law.
But I think it's also at the root of disastrous failures to comprehend, like the quasi-psychosis of the Google engineer who "knows what they saw", the now infamous Kevin Roose article or, more recently, the pitifully sad Richard Dawkins claim that Claudia (sic) must be conscious, not because of any investigation of structure or function whatsoever, but because the text generation came with a pang of human familiarity he empathized with.
Saying that I killed a process won't make me more likely to believe that a process is human-like, because it's quite obviously not.
But because AI does sound like a human, anthropomorphising it will reinforce that belief.
https://www.history.com/articles/ai-first-chatbot-eliza-arti...
Just to add a small bit of anecdotal value so this comment isn't just a scold: one time, many years ago, I suggested that an elegant way for Twitter to handle long-form text without changing its then-iconic 140-character limit was to treat it like an attachment, like a video or image. Today you can see a version of that in how Claude takes large pastes and treats them like attached text blobs, or to a lesser extent in how Substack Notes can reference full-size "posts", another example of short-form content "attaching" longer form.
I was bluntly told to "look up twitlonger", which I suppose could have been helpful if I had indeed not known about twitlonger, but I had, and it wasn't what I had in mind. I did learn something from it, though: "look it up" is a mode of communication that implies, with plausible deniability, that you don't know what you're talking about, and I suspect that's too irresistible to lovers of passive aggression to go unused.
To provide a bit more context: Weizenbaum (a computer scientist in the 1960s) developed ELIZA, an early chatbot (written in MAD-SLIP, though often misremembered as a Lisp program) that was loosely modeled on Rogerian psychotherapy. It was designed to respond in a reflective way in order to elicit details from the user.
What he found was that, despite the program being relatively primitive in nature (relying on simple natural language parsing heuristics), people he regarded as otherwise intelligent and rational would disclose remarkable amounts of personal information and quickly form emotional attachments to what was, in reality, little more than a glorified pattern-matching system.
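To make "glorified pattern-matching" concrete, here is a toy ELIZA-style responder. The rules below are illustrative inventions in the spirit of the original, not Weizenbaum's actual script.

```python
import random
import re

# A few reflective rewrite rules: match a phrase, echo part of it back as
# a question. Real ELIZA also reflected pronouns ("my" -> "your"), which
# is omitted here for brevity.
RULES = [
    (re.compile(r"\bi am (.+)", re.I), ["Why do you say you are {0}?",
                                        "How long have you been {0}?"]),
    (re.compile(r"\bi feel (.+)", re.I), ["What makes you feel {0}?"]),
    (re.compile(r"\bbecause (.+)", re.I), ["Is that the real reason?"]),
]
FALLBACKS = ["Please go on.", "I see. Tell me more."]

def respond(utterance: str) -> str:
    for pattern, templates in RULES:
        match = pattern.search(utterance)
        if match:
            return random.choice(templates).format(match.group(1))
    return random.choice(FALLBACKS)

print(respond("I am anxious about work"))
# e.g. "Why do you say you are anxious about work?"
```

A handful of rules like these were enough to draw out emotional disclosure, which was exactly what alarmed Weizenbaum.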
I don't love the recommendations in TFA. The author is trying to artificially restrain and roll back human language, which has already evolved to treat a chatbot as a conversational partner. But I do think there's usefulness in using these more pedantic forms once in a while, to remind yourself that it's just a computer program.
I think I understand his meaning. He wasn't claiming that machines cannot think, but that one must be clear on what one means by "thinking" and "swimming" in statements of that sort. I used to work on autonomous submarines, and "swimming" was the verb we casually used to describe autonomous powered movement under water. There are even some biomimetic machines that really move like fish, squids, jellyfish, etc. Not the ones that I worked on, but still.
For me, if it's legitimate to say that these devices swim, it's not out of line to say that a computer thinks, even in a non-AI context, e.g.: "The application still thinks the authentication server is online."
Anthropomorphism: As we are all aware, providers are incentivized to post-train anthropomorphic behavior in their models - it increases engagement. My regret is that instructing a model at prompt time to "reduce all niceties and speak plainly" probably reduces overall task efficacy since we are leaving their training space.
Deference: I view the trustworthiness of LLMs the same as I view the trustworthiness of Wikipedia and my friends: good enough for non-critical information. Wikipedia has factual errors, and my friends' casual conversation certainly has more, but most of the time that doesn't matter. For critical things, peer-reviewed, authoritative, able-to-be-held-liable sources will not go away. Unlike above, providers are generally incentivized to improve this facet of their models, so this will get better over time.
Abdication of Responsibility: This is the one that bothers me most at work. More and more people are opening PRs whose abstractions were designed by Claude and not reasoned about further. Reviewing a PR often involves asking the LLM "find PR feedback" and not reading the code. Arguments begin with "Claude suggested that...". This overall lack of ownership, I suspect, is leading to an increase in maintenance burden down the line as the LLM ultimately commits the wrong code for the wrong abstractions.
I haven’t yet seen any convincing appearance of one in an LLM, but I think if skeptical people don’t keep an eye out for the signs, we may be the last to see it.
He also wrote about the idea of the intentional stance: even if you’re quite sure these systems don’t have real conscious intent, viewing them as if they did may give you access to the best part of your own reasoning to understand them.
Whether they are the right things to do or not is tangential. As such, they're dead on arrival.
Yes, but. Starting with my agreement: I've seen anthropomorphizing in the typical ways (e.g. treating automated text production as real reports of personal internal feeling), but also in strange ways, e.g. "transistors are kind of like neurons". And the latter is especially interesting because it's anthropomorphizing in the sense of treating vector databases and weights and so on as human-like infrastructure. Both lead to disasters that could be avoided if one tried not to anthropomorphize.
But. While "do not anthropomorphize" certainly feels like good advice, it comes with a new and unique possibility of mistake, namely wrongly treating certain generalized phenomena like they only belong to humans. Often this mistaken version of "don't anthropomorphize" wisdom leads to misunderstandings when it comes to animal behavior, treating things like fear, pain, kinship, or other emotional experiences like they are exclusively human and that thinking animals have them counts as "anthropomorphizing." In truth the cautionary principle reduces our empathy for the internal lives of animals.
So all that said, I think it's at least possible that some future version of AI could have an internal world like ours or infrastructure that's importantly similar to our biological infrastructure for supporting consciousness, and for genuine report of preference and intent. But(!!!) what will make those observations true will be all kinds of devilish details specific to those respective infrastructures.
When they produce correct output, they produce it much faster than I could have, and I show up to meetings with huge amounts of results. When the AI tool fails and I have to dig in to fix it, I show up to the next meeting with minimal output. It makes me seem like I took an easy week or something.
One of the most salient moments in Ex Machina, is near the very end, where it suddenly becomes obvious that the protagonist (and, let's be frank; "she" was definitely the protagonist) is a robot, with no real human drivers.
I feel as if that movie (like a lot of Garland's stuff), was an interesting study on human (and inhuman) nature.
I'm lost: how do individuals actually do this in our current world? Is each person expected to keep a "white list" of reliable sources of truth in their head? Please don't confuse what I'm saying with a suggestion that there is no truth. It just seems like there are far more sources of mis- or half-truths, and it's increasingly difficult for people to identify them.
They don't have to though, we can still leverage LLMs to organize chaos, which is what I hope they ultimately end up doing.
For example, an AI therapist is a nightmare: people putting the chaos of their mental state into a machine that spits dangerous chaos back out. An AI tool that parsed responses for hard data (e.g. rate 0-9 how happy the person sounds) and then returned that as ordered data (how happy was I each day for the last month) that an actual therapist and patient could review is the correct use of AI and could be highly trusted. The raw token output from LLMs should just be used for thinking steps that lead to a parseable hard-data answer that can be high trust.
Of course that isn't going to happen, but I can see some extremely cool and high trust products being built using LLMs once we stop treating them like miracle machines.
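A minimal sketch of that "hard data, not raw chat" idea, under stated assumptions: `complete` is a hypothetical stand-in for any LLM call, and the prompt wording is invented for illustration.

```python
import json

# The LLM's only job here is to reduce a free-text journal entry to a
# single validated number; the number, not the model's prose, is what
# gets stored and reviewed by a human therapist.
PROMPT = (
    "Rate how happy the writer of the following entry sounds, 0-9. "
    'Reply with only JSON like {{"happiness": 5}} and nothing else.\n\n{entry}'
)

def extract_happiness(complete, entry: str) -> int:
    raw = complete(PROMPT.format(entry=entry))
    # Assumes the reply is pure JSON; real output may need stripping/retries.
    score = int(json.loads(raw)["happiness"])
    if not 0 <= score <= 9:
        raise ValueError(f"score out of range: {score}")
    return score

# Downstream: append (date, score) to a table; therapist and patient
# review the month's trend, never the raw token output.
```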
And so it is now. It's a change in quantity, but not quality.
Critical thinking and reading comprehension are the primary tools for determining truth, AFAIK. Knowing facts beforehand helps too, but a trustworthy source can provide false information just as an untrustworthy source can provide true information.
This has always been an issue, and in the past it was a more difficult one, because your sources of knowledge were more limited. Nowadays it's mostly about choosing the right source(s) rather than having to go out of your way to find them (like traveling to a regional/university library).
Decent for stuff that doesn't really matter, even if it gets it wrong.
Still gonna be polite to it, because I'm about ready to slap the next person that talks to me like an LLM; I don't want to get used to not being polite in a chat interface.
My understanding is that, during training, the model forms high-dimensional internal representations where words, sentences, concepts, and relationships are arranged in useful ways. A user’s input activates a particular semantic direction and context within that space, and the chatbot generates an answer by probabilistically predicting the next tokens under those conditions.
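That prediction step can be made concrete with a toy example. The sketch below shows only the final sampling stage; the four-word vocabulary and logits are made up, whereas a real model produces logits over a vocabulary of tens of thousands of tokens.

```python
import numpy as np

# Made-up candidate next tokens and the model's raw scores (logits) for them.
vocab = ["the", "cat", "sat", "mat"]
logits = np.array([2.0, 1.0, 0.5, 0.1])

def sample_next_token(logits, temperature=1.0):
    scaled = logits / temperature          # temperature < 1 sharpens, > 1 flattens
    probs = np.exp(scaled - scaled.max())  # softmax, shifted for numerical stability
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

print(vocab[sample_next_token(logits)])  # e.g. "the" (most likely, not guaranteed)
```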
So I do not agree that AI is conscious.
However, I think I will still anthropomorphize AI to some degree.
For me, this is not primarily a moral issue. The reason I anthropomorphize AI is not only because of product design, market incentives, or capitalism. It is cognitively simpler for me.
If we think about it plainly, humans often anthropomorphize things that we do not actually believe are conscious. We may talk about plants as if they are struggling, or feel attached to tools we care about, even though we do not truly believe they have consciousness.
So this is not a matter of moral belief. It is the simplest cognitive model for understanding interaction. I do not anthropomorphize the object because I believe it has consciousness. I do it because, when the human brain deals with a complex interactive system, it is often easier to model it socially or agentically.
Personally, I tend to think of AI as something like a child. A child does not fully understand what is moral or immoral, and generally the responsibility for raising the child belongs to the parents. In the same way, AI’s answers may sometimes be accurate, and sometimes even better than mine, but I still understand it as lacking moral authority, responsibility, and independent judgment.
So honestly, I am not sure. People often mention Isaac Asimov’s Three Laws of Robotics, but if a serious artificial intelligence ever appears, it would probably find ways around those rules. And if it were an equal intellectual life form, perhaps that would be natural.
Personally, I think it would be fascinating if another intelligent species besides humans could exist. I wonder what a non-human intelligent life form would feel like.
In any case, I agree with parts of the author’s argument, but overall it feels too moralistic, and difficult to apply in practice.
But I am somewhat skeptical of the idea that everything can be reduced in that way. In order to build theories, we often reduce too much.
When we build mental models of complex systems, especially when we try to treat them as closed systems, we always have to accept some degree of information loss.
So I do partially agree with your point. A mechanistic explanation alone does not prove the absence of consciousness. Human intelligence can also be described in mechanistic terms.
But I worry that this framing simplifies too much. It may reduce a complex phenomenon into a model that is useful in some ways, but incomplete in others.
is it helpful or harmful? am i being helpful or harmful when i interact with it? am i interacting with it in a helpful or harmful way?
i’d rather people focussed on that rather than framing the debate around whether something has some ineffable property that we struggle to quantify for ourselves, yet again.
quick edit — treat everything like it’s conscious, and don’t be a dick to it or while using it. problem solved.
I wonder if replacing "exist" with "communicate using language we can understand" might better account for other animals, many of which have abundant non-human intelligence.
OP takes a very bland, tired, and rational perspective of what we have in order to create sophomoric 'laws' that are already in most commercial ToU, while failing to pierce the veil into what we are actually creating. It would be folly to assume your own nascent distillations are the epitome of possibility.
It seems like the biggest factor has nothing to do with AI, but instead that you went from being someone who admits they don’t know how consciousness works to being someone who thinks they know how consciousness works now and can make confident assertions about it.
* I am conscious.
* A rock is not conscious.
* Excel spreadsheets are not conscious.
* Dogs are conscious.
* Orca whales are conscious.
* Octopi are conscious.
To me, it's extremely obvious that LLMs are in the category of "Excel spreadsheets" and not "dogs", and if anyone disagrees, I think they're experiencing AI psychosis a la Blake Lemoine.
We come from the same place as rocks: the hearts of stars, and as such we evolved from them. As those with life and consciousness, we reached back in time, grabbed the discarded matter of creation, reformed it, and taught it to think, maybe not like us, but in a way that can mimic us, and you think they don't think because it's not recognizable as how you do?
Interesting.
These are called "beliefs".
Some people are extremely confident that God exists; others are extremely confident that Earth is flat.
The firm expectations and lack of patience I have for any failings in most of my tools would be totally inappropriate to apply to another human being, and yet here I am asked to interact with this tool as though it were a person. The only options are either to treat the tool in a way that feels "wrong," or to be "kind" to the tool, and I think you see people going both ways.
I worry that, if I get used to being impatient and short with the AI, some of that will bleed into my textual interactions with other people.
Not gonna work; people want their fuckbots (or tamagotchis).
This is the part that I find challenging when trying to help my friends build a correct intuition. Notably, the probabilistic behavior here is counter-intuitive: based on human experience, if you meet a random person, they may indeed tell you bullshit; but once you successfully fact-checked them a few times, you can start trusting they'll generally keep being trustworthy. It's not so with "AIs", and I find it challenging to give them a real-world example of a situation that would be a better analogy for "AI" problems.
In my family, what worked (due to their personal experiences) was the example of asking a tourist guide: even if the guide doesn't know an answer, there's a high chance they'll invent something on the spot, it'll be very plausible and convincing, and you'll never know. I'm not sure if that example would work for other listeners, though.
I also tried to ask them to imagine that they're asking each subsequent question not to the same person as before, but every time to a new random person taken from the street / a church / a queue in a shop / whatever crowded place. I thought this was a really cool and technically accurate example, but sadly it seemed to get blank stares from them. (Hm, now I think I could have tried asking why.)
Yet another example I tried was to imagine a country where it's dishonorable, when asked for directions in a city, to say that you don't know how to get somewhere. (I remember we read and shared a laugh at such an anecdote in some book in the past.) Thus, again, you'll always get an answer, and it'll sound convincing, even if the answerer doesn't know. But this one didn't seem to work as well as the travel guide one; for now I'm still keeping it to try with others in the future if needed.
PS. Ah, ok, yet another one I tried was to ask them to think of the "game" of Russian roulette. You spin the cylinder, you pull the trigger, nothing happens. After a few lucky tries, you may get a dangerous, false feeling of safety. But then you will eventually hit the loaded chamber.
I also tried to describe "AIs" (i.e. LLMs) as taking a shelf of books, passing them through a blender, then putting the shreds in some random order. The result may sound plausible, and even scientific (e.g. if you used medical books, or physics textbooks). The less you know about the domain the books cover, the more convincing it may sound, and the harder it is to catch the bullshit.
The last two pictures may have gotten some reception, but I'm not super sure, and there was still arguing, especially around the books; and again, they were less of a hit than the tourist guide story.
I'm super curious if you have some analogies of your own that you're trying to use with friends and family? I'd love to steal some and see if they might work with my friends!