RU version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
84% Positive
Analyzed from 1663 words in the discussion.
Trending Topics
#more#marketing#mythos#better#code#curl#hype#find#llms#used

Discussion (51 Comments)Read Original on HackerNews
"My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing."
It's a good reminder for us all that the competition in this space is rough and lots of more or less subtle marketing is involved.
More seriously, so far I haven’t seen much indication that Mythos is more than Opus with a security focused code analysis harness. That said, the fact it can find these bugs in an automated fashion is the more important takeaway outside of the hype.
I’m curious what the error rate is on the detections, because none of that means much if it is wrong 90% of the time and we are only hearing about the examples that are useful marketing.
The other alternative is that Curl is simply secure enough that there was far less to find than in other projects.
About as subtle as a personal injury lawyer's billboard
It's almost Trump-esque - "this model will change everything forever; we are doomed; we are saved; we will all be fired; we will all be rich", etc
They need the hype to pay off way more than we do. So many of us who still write code directly stand to lose nothing of our capabilities if the marketing claims cannot hold water.
I'm surprised you say that because it is all over Hacker News. Every single post is co-opted into promoting AI. Try finding a submission with fifty points or more than doesn't have AI or LLM's mentioned somewhere in the comments.
This. Well done by Antropic.
It even reached the CISO of my small semi-government org in the Netherlands, who slightly panicked at the announced 'tsunami' of vulnerabilities that was coming with Mythos.
Got us some more money and priority with the board, though.
Never waste a good marketing scare.
I'm not sure that follows. As noted, curl was already analyzed to death with every tool available; most software isn't at that level.
If so, it would still follow. "Most software" isn't analyzed as much as curl, by either other tooling or other models, that might well find close to the same as Mythos did. As such, Mythos then isn't especially/particularly dangerous.
Also curl in this regard is a open source project, relativly small but critical, well known and used everywhere. Besides image libraries, tools like curl or sudo, su, passwd, etc. would also be my first try.
Mythos is still not known at all what it can do. What does it mean from cost and benchmark pov to have a 10 Trillion parameter model?
Nonetheless, the fact that LLMs got significant better in finding this, better than humans, started to happen half a year ago? so at one point we need to address the elefant in the room and state that today you need to do security scanning additional with LLMs. You need to take this serious.
In worst case, use Anthropics marketing to state that its a must now and something changed.
*rools eyes* regular static analyzers also have been "better than humans" for decades, being better than a human at a specific mechanical task really doesn't mean much. The interesting new thing is the type of potential "fuzzy bugs" described in the article that LLMs are able to identify (a comment not matching the code it describes, uncommon usage of a 3rd party library, mismatch of code and a protocol it implements, or often just generally weird looking code somebody should have a closer look at... this closes a gap in the traditional debugging toolboxes, but shouldn't replace them)
My mind still cannot understand the quality and refinement that's gone into cURL. It really is the perfect example of something done so right, that people barely think twice about.
However in the days of race to bottom, offshoring for penies, and now LLM powered code generation, this is a quality most companies won't care unless there is liability in place.
[0] https://tsz.dev
https://news.ycombinator.com/item?id=48072916
The point wasn't actual cross-platform portability even though that was a nice side effect. It was to flush out all the weird edge cases.
Edges like security flaws. Buffer overflows are usually platform specific. There are plenty of other ways to find these issues but simply recompiling for a different platform surfaces all sorts of issues.
> It’s not that I would have a lot of time to explore lots of different prompts and doing deep dive adventures anyway.
His expertise I think would elevate the results quite a bit. Although if he never uses LLMs, which it reads like he doesn't, I guess it might backfire just as well. Prompting style (still?) does matter after all, certainly in my experience anyways.
> using these tools interactively
I did read the article. It seems to me they're using LLMs in a prepared manner instead, as mere scanners that produce reports.
Typo, or is there a spoof I should go read?
Does it say anything else? Just 'Aaaarggghhhh'?
Source: voice typing this with Swedish vocal chords, and only had to correct "different lives" to "differently", and add /[^\w\s]/.
I also thought they were contending the word count before noticing. Even remarked how I find this a weird metric, given that code is not prose [0], but then I deleted that once I picked up on what's going on.
[0] comparing the output of `wc -w` with the word counts of books I'm reasonably sure will be super off