Discussion (18 Comments)
As an example, creating recipes with Claude Opus based on flavor profiles and preferences feels magical, right up until the point at which it can't accurately convert between tablespoons and teaspoons. It's like the point in the movie where a character is acting nearly right, but something is a bit off, and then it turns out they're a zombie about to eat your brain. This note-taking example feels similar: it nearly works in some pretty impressive ways, then fails at the important details in a way that something supposedly capable of everything AI can allegedly do really shouldn't.
It's these failures that make me more and more convinced that while current-generation AI can do some pretty cool things if you manage it right, we're not actually on the right track to achieve real intelligence. The persistence of these incredibly basic failure modes even as models advance makes it fairly obvious that continued advancement isn't going to address them.
So instead of an LLM trying to answer a math or reasoning question by finding a statistical match with other similar groups of words it found on 4chan, the All-In podcast, and a terrible recipe for soup written by a terrible cook, it can use a calculator when it needs a calculator answer.
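To make that concrete, here is a minimal sketch of the tool-dispatch idea, using the tablespoon/teaspoon conversion from the first comment as the "calculator" case. Everything in it is hypothetical: llm_complete stands in for whatever model API you use, and the keyword routing fakes what a real system would do with a structured tool call.

    # Minimal sketch of "use a calculator when you need a calculator answer".
    # Hypothetical throughout: llm_complete stands in for a real model API,
    # and the routing below fakes a structured tool call.

    def llm_complete(prompt: str) -> str:
        """Stand-in for a real model call; replace with your provider's client."""
        return "(model-generated answer)"

    def convert_volume(amount: float, from_unit: str, to_unit: str) -> float:
        """Deterministic unit conversion -- the 'calculator' the model defers to."""
        teaspoons_per = {"tsp": 1.0, "tbsp": 3.0, "cup": 48.0}  # US customary
        return amount * teaspoons_per[from_unit] / teaspoons_per[to_unit]

    def answer(question: str) -> str:
        # A real system would have the model emit a tool call with parsed
        # arguments; the argument here is hard-coded to keep the sketch short.
        if "tablespoons" in question and "teaspoons" in question:
            return f"2 tablespoons = {convert_volume(2, 'tbsp', 'tsp'):g} teaspoons"
        return llm_complete(question)

The point is only that the conversion is exact by construction; the model never gets a chance to improvise the arithmetic.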
You ask an LLM "What's wrong with your answer?" and you get pretty good results.
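A minimal sketch of that critique-and-revise loop, reusing the hypothetical llm_complete stub from the sketch above (the prompts are illustrative, not any vendor's recommended phrasing):

    def critique_and_revise(question: str, rounds: int = 1) -> str:
        # Ask, then ask the model what's wrong with its own answer, then revise.
        draft = llm_complete(question)
        for _ in range(rounds):
            critique = llm_complete(
                f"Question: {question}\nAnswer: {draft}\n"
                "What's wrong with this answer?")
            draft = llm_complete(
                f"Question: {question}\nDraft: {draft}\n"
                f"Critique: {critique}\nWrite a corrected answer.")
        return draft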
Real intelligence means saying "I don't know" when you don't know, or asking for help, or even just refusing to help when the subtext is that you don't want to appear stupid.
The models could ostensibly do this when they have low confidence in their own results, but they don't. What I don't know is whether that's because it would be very computationally difficult or because it would harm the reputation of the companies charging a good sum to use them.
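If a model API exposes token log-probabilities (many do), the gating the commenter imagines could look something like the sketch below; the geometric-mean aggregation and the 0.6 threshold are illustrative assumptions, not anyone's published method.

    import math

    def gated_answer(answer: str, token_logprobs: list[float],
                     threshold: float = 0.6) -> str:
        # Aggregate per-token probabilities into one crude confidence score
        # (geometric mean) and admit uncertainty below the threshold.
        mean_logprob = sum(token_logprobs) / len(token_logprobs)
        confidence = math.exp(mean_logprob)
        if confidence < threshold:
            return "I don't know. Low-confidence guess: " + answer
        return answer

Whether this would actually track correctness is the open question; one plausible reason nobody ships it is that token-level probabilities reflect fluency more than truth.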
They don't like hearing "I don't know"
In other cases, I have seen it miss the mark when the discussion is not very linear. For example, when I'm going back and forth with the SOC team about their response to a recent alert or incident, it'll get the gist right, but if you're relying on it for accuracy, holy hell does it miss the mark.
I can see the LLM taking great notes for that initial nurse visit when you're at the hospital: summarize your main issue, weight, height, recent changes, etc. I would not trust it when it comes to a detailed and technical back-and-forth with the doctor. I would think that for compliance reasons hospitals would not want to alter the records and would go only by transcripts, but what do I know...
Not mentioned, as far as I can see: the comparative human mistake rate.
Having seen a lot of medical records, I'd say 60% sounds about normal lol.
(And if you already see 60% error rates in standard, pre-AI note-taking, how does that not translate into many deaths and injuries? At least one country's health system in the world should have caught that.)
Presumably most doctor visits are a one-problem-one-solution-one-doctor type of thing. Done deal, notes are never read again. That alone would explain why high error rates don't result in injuries or deaths very often.
Any injury or death caused by poor notes would have to occur when mistakes are made while you're being followed for a serious chronic condition, or when you're handled by a team where effective communication is required.
Because most of it is just written down and never looked at again until there’s a lawsuit or something.
I do wonder if people would be pushing AI so hard if their organizations were planning to hold them accountable for mistakes the AI made
I bet if that were the case we'd see a much slower rollout of AI systems.