Discussion (46 Comments)

rzmmm•about 2 hours ago
Quote:

"My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing."

It's a good reminder for us all that the competition in this space is rough and lots of more or less subtle marketing is involved.

greendude29•about 2 hours ago
I'd go out and say the marketing is not subtle. The hype and fanboys/girls are so in line with the marketing that any level of skepticism is seen as an act of defection, but if you look at the words, hyperbole and volume that are used, there is nothing subtle about it.

It's almost Trump-esque - "this model will change everything forever; we are doomed; we are saved; we will all be fired; we will all be rich", etc

apexalpha•about 1 hour ago
> An amazingly successful marketing stunt for sure.

This. Well done by Anthropic.

It even reached the CISO of my small semi-government org in the Netherlands, who slightly panicked at the announced 'tsunami' of vulnerabilities that was coming with Mythos.

Got us some more money and priority with the board, though.

Never waste a good marketing scare.

yjftsjthsd-h•about 2 hours ago
> Not particularly “dangerous”

I'm not sure that follows. As noted, curl was already analyzed to death with every tool available; most software isn't at that level.

bilekas•about 2 hours ago
I don't think I understand what you mean. The "not particularly dangerous" comment was in relation to the vulnerability that was found, right? Surely they would know what constitutes a lower severity level.
Ekaros•about 2 hours ago
My guess is that it is in the category of "you are holding it wrong". Still worth fixing, but it requires very specific user input, for example, or a very weird scenario, or some less-used protocol or flag combination.
AntiUSAbah•about 2 hours ago
There is always marketing involved and people should be able to put marketing into perspective.

Also, curl in this regard is an open source project, relatively small but critical, well known and used everywhere. Besides image libraries, tools like curl or sudo, su, passwd, etc. would also be my first try.

It is still not known at all what Mythos can do. What does a 10 trillion parameter model mean from a cost and benchmark point of view?

Nonetheless, LLMs got significantly better at finding this, better than humans, starting about half a year ago? So at some point we need to address the elephant in the room and state that today you need to do security scanning with LLMs as well. You need to take this seriously.

In the worst case, use Anthropic's marketing to argue that it's a must now and that something changed.

bilekas•about 2 hours ago
> The single confirmed vulnerability is going to end up a severity low CVE planned to get published in sync with our pending next curl release 8.21.0 in late June

My mind still cannot grasp the quality and refinement that's gone into cURL. It really is the perfect example of something done so right that people barely think twice about it.

ahofmann•about 2 hours ago
Putting on my tinfoil-hat: Sooo, the guy who runs the test and delivers the report could just have removed the more interesting bugs and delivered those to any three letter agency?
bilekas•about 2 hours ago
No, based on cURL's history, it really seems like they would love to have found a really novel bug. Now if it was a for profit company.. Tinfoil hat would be shared!
Ekaros•about 2 hours ago
Curl is likely one of the most combed-over pieces of code at this point. It feels like it has some special draw for people looking for vulnerabilities. Not that this means some novel idea can't still be explored or checked.
perching_aix•about 1 hour ago
It's a shame he seems to reject the idea of actually diving in and using these tools interactively:

> It’s not that I would have a lot of time to explore lots of different prompts and doing deep dive adventures anyway.

His expertise I think would elevate the results quite a bit. Although if he never uses LLMs, which it reads like he doesn't, I guess it might backfire just as well. Prompting style (still?) does matter after all, certainly in my experience anyways.

jph00•41 minutes ago
He states in the article that they use LLMs for this purpose and find them extremely useful.
perching_aix•15 minutes ago
Which can be true without this also being true:

> using these tools interactively

I did read the article. It seems to me they're using LLMs in a prepared manner instead, as mere scanners that produce reports.

mohsen1•about 2 hours ago
I don't know about Mythos, but in recent weeks I've noticed Opus is constantly failing to fix things in tsz[0], while GPT 5.5 can easily churn out fixes that are solid and pass tests. I've stopped paying for Claude for now and all my money is going to OpenAI at the moment. Either Opus is massively nerfed or GPT 5.5 is really head and shoulders above it on very difficult tasks. The last percent of conformance tests in tsz are really, really difficult and I've seen Opus bailing again and again. So annoying to waste time and tokens to finally get "this is too involved" or "this requires a multi-week sprint to fix".

[0] https://tsz.dev

_pdp_•about 1 hour ago
The new Opus feels like a step backwards. More expensive, thinks more, and it does not get the job done.
vincent_s•24 minutes ago
From a user’s perspective 4.7 is a downgrade compared to 4.6. It’s intended to give Anthropic more control over their compute resources and profitability:

https://news.ycombinator.com/item?id=48072916

dyauspitr•about 1 hour ago
Having never used Claude and only Codex, does Claude actually say “this is too involved” as a response to a prompt?
mohsen1•about 1 hour ago
Yes it does. Usually after hours of working and not getting results
absynth•about 1 hour ago
I used to routinely compile C programs on other compilers to find defects that one or another didn't find. Compiling on Windows vs Linux. You could summarize / minimize it down to compiling with warnings as errors etc., but you'd be missing the point.

The point wasn't actual cross-platform portability even though that was a nice side effect. It was to flush out all the weird edge cases.

Edges like security flaws. Buffer overflows are usually platform specific. There are plenty of other ways to find these issues but simply recompiling for a different platform surfaces all sorts of issues.
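
A rough sketch of the multi-compiler half of that idea, in Python. It assumes `gcc` and `clang` as the compilers and a particular set of strict flags; both are illustrative choices, not something from the comment above, and the helper names are hypothetical. It only compares diagnostics from compilers on one machine, so it does not replace actually building and running on a second platform.

```python
import shutil
import subprocess

# Assumed defaults for illustration; any C compilers on PATH would do.
DEFAULT_COMPILERS = ("gcc", "clang")
STRICT_FLAGS = ("-std=c11", "-Wall", "-Wextra", "-Werror", "-fsyntax-only")


def build_command(compiler, source):
    """Return the argv list for a strict, syntax-only check of one file."""
    return [compiler, *STRICT_FLAGS, source]


def check_with_all(source, compilers=DEFAULT_COMPILERS):
    """Run each available compiler and map its name to (exit code, stderr).

    Compilers that are not found on PATH are skipped, not treated as errors.
    """
    results = {}
    for compiler in compilers:
        if shutil.which(compiler) is None:
            continue
        proc = subprocess.run(
            build_command(compiler, source),
            capture_output=True,
            text=True,
        )
        results[compiler] = (proc.returncode, proc.stderr)
    return results


def disagreements(results):
    """Return compilers that rejected the file while at least one accepted it."""
    failed = sorted(name for name, (code, _) in results.items() if code != 0)
    passed = [name for name, (code, _) in results.items() if code == 0]
    return failed if failed and passed else []
```

Calling `disagreements(check_with_all("example.c"))` lists the compilers that complained about a file the others accepted, which is where the interesting edge cases tend to sit.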

yjftsjthsd-h•about 2 hours ago
> The source code consists of 660,000 words, which is 12% more words than the entire English edition of the novel War and Piece.

Typo, or is there a spoof I should go read?