Discussion (61 comments)
> and since nobody stepped up to help us deal with the influx of the AI-generated bug reports we need to move it out of tree to protect our sanity.
This thread from the linux-hams mailing list [2] has more insight into this decision. I guess the silver lining is that more modern protocols (in userspace), written in modern languages, will become the norm for ham radio on Linux now.
[1] : <https://lwn.net/ml/all/20260421021824.1293976-1-kuba@kernel....>
[2] : <https://lore.kernel.org/linux-hams/CAEoi9W5su6bssb9hELQkfAs7...>
That's really it. The list of things that "need" to be in the kernel is shrinking steadily, and the downsides of having C code running in elevated privilege levels are increasing. None of that is about LLMs at all, except to the extent that it's a notable inflection point in a decades-scale curve.
The future, as we basically all agree, puts complexities like protocol handling and state into daemons and leaves only hardware, process, and I/O management in the kernel.
Basically, Tanenbaum was right about the design but wrong about the schedule and the path to get there.
Why do they even need to be in the kernel repo, rather than brought in at or after install time?
I wrote and maintained 10GbE drivers for a small company in the 2000s, and just the shim file for our Linux driver, which smoothed over API differences, was well over 1,000 lines. I think it was close to the same size as the entire driver for one of the BSDs.
People have been asking this question since Linux was first invented…
Xbox/PS controllers, for example. I believe some old RAID controller and WiFi drivers were removed too. Whatever they don't want to support.
The only problem here, if any, is the false sense of confidence given by LLMs to people who have no business touching kernel code.
In terms of quality ("are there bugs that professional humans can't see at any budget but LLMs can?") - it's not very clear, because Opus is still worse than a human specialist, but Mythos might be comparable. We'll just have to wait and see what results Project Glasswing gets.
Either way, cybersecurity is going to get real weird real soon, because even slightly-dumb models can have a large effect if they are cheap and fast enough.
EDIT: Mozilla thinks "no" to the second question, by the way: "Encouragingly, we also haven’t seen any bugs that couldn’t have been found by an elite human researcher.", when talking about the 271 vulnerabilities recently found by Mythos. https://blog.mozilla.org/en/firefox/ai-security-zero-day-vul...
Being flooded with these kind of reports can make the actual real problems harder to see.
The plural of "Opus" is "Opera". Might be a tad confusing tho :)
Of course some people don't do that, and send all the reports anyway... and then scream from the hilltops about how incredible LLMs are when by sheer luck one happens to be right. Not only is that blatant p-hacking, it's incredibly antisocial.
It's disingenuous marketing speak to say LLMs are "finding" any security holes at all: they find a thousand hypotheticals of which one or two might be real. A broken clock is right twice a day.
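To put rough numbers on the broken-clock point: if a tool emits a thousand candidate reports of which only a couple are real, precision is near zero and the triage cost still lands on a human. A back-of-the-envelope sketch (the counts and review time are illustrative, not from any cited study):

```python
# Illustrative base-rate arithmetic: a scanner that "finds" 1,000
# candidate vulnerabilities of which only 2 are real has near-zero
# precision, and every report still costs human review time.
def triage_cost(reports: int, true_positives: int, minutes_per_review: int) -> tuple[float, float]:
    """Return (precision, total review hours) for a batch of reports."""
    precision = true_positives / reports
    hours = reports * minutes_per_review / 60
    return precision, hours

precision, hours = triage_cost(reports=1000, true_positives=2, minutes_per_review=15)
print(f"precision: {precision:.1%}")      # 0.2%
print(f"review time: {hours:.0f} hours")  # 250 hours
```

Even at a generous 15 minutes per report, that's weeks of maintainer time to surface a handful of real bugs, which is exactly the economics the "finding" framing hides.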
Yes, what we see coming out of the bottom of the funnel now is a little better. But it's sort of like reading day trading blogs: nobody shares their negative results, which in my direct experience are so bad they almost negate any investigative benefit. I also think part of this is that a small set of very prolific spammers were sufficiently discouraged to stop.
> "Remarkably few of them are complete false positives."
Modern LLMs with a reasonable prompt and some form of test harness are, in my experience, excellent at taking a big list of potential vulnerabilities and figuring out which ones might be real. They're also pretty good, depending on the class of vuln and the guardrails in the model, at developing a known-reachable vulnerability into real exploit tooling, which is also a big win. This does require the _slightest_ bit of work (ie - don't prompt the LLM with "find possible use after free issues in this code," or it will give you a lot of slop; prompt the LLM with "determine whether the memory safety issues in this file could present a security risk" and you get somewhere), but not some kind of elaborate setup or prompt hacking, just a little common sense.
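As a concrete sketch of the prompting difference described above (the file path, finding text, and `build_triage_prompt` helper are made up for illustration; the framing of the prompt string follows the advice in the comment, and you'd feed the result to whatever model API you use):

```python
# Sketch: frame the triage question narrowly, per the comment above.
# Asking "is this specific issue a real security risk?" gives the model
# something it can check against the code, unlike "find possible issues".
def build_triage_prompt(file_path: str, finding: str) -> str:
    return (
        f"In {file_path}, a static pass flagged: {finding}\n"
        "Determine whether this memory safety issue could present a "
        "security risk. Trace how the affected buffer is allocated, "
        "sized, and freed before answering."
    )

# Hypothetical example inputs.
prompt = build_triage_prompt("net/ax25/af_ax25.c", "possible use-after-free of sk")
print(prompt.splitlines()[0])
```

The point is just that the prompt names one finding and asks for a reachability judgment, rather than inviting the model to brainstorm a list of speculative issues.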
At the same time, a lot of these bugs were in places people weren't looking because the code isn't actually important. This kernel code had already been a longstanding problem in terms of low-effort bot-driven security reports, and nobody had any interest in maintaining it. So this was more LLM-assisted technical management than LLM-assisted security: it finally made the situation uncomfortable enough for the team to do something about it.
Another example: Mythos found a real bug in FreeBSD that occurs when running an NFS server exposed to a public connection. But... who on earth is doing that? I would guess 99.9% of FreeBSD NFS installations are on home LANs. More importantly, Anthropic spent $20,000 to find this bug. Think of it as paying a full-time FreeBSD dev for a month, and that's all they find: I'd say "OK, looks like FreeBSD has a pretty secure codebase; let's fix that stupid bug, stop wasting our money, and get you on a more exciting project."
I do think anyone who has a legacy open-source C/C++ codebase owes it to their users to run it by Claude/Codex, check your pointers and arrays, make sure everything looks ok. I just wish people were able to discuss it in proper context about other native debugging tools!
No.
Like everything else an LLM touches, it is prone to slop and hallucinations.
You still need someone who knows what they are doing to review (and preferably manually validate) the findings.
What all this recent hype carefully glosses over is the volume of false-positives. I guarantee you it is > 0 and most likely a fairly large number.
And like most things LLM, the bigger the codebase the more likely the false-positives due to self-imposed context window constraints.
It's all very well for these blog posts to say "LLM found this serious bug in Firefox," but that's only because a security analyst filtered out all the junk (and knew what to ask the LLM in the prompt in the first place).
Another way to see this: you mentioned "LLM found this serious bug in Firefox," but the actual number in that Mozilla report [2] was 14 high-severity bugs and 90 minor ones. However you look at it, that's an impressive result for a security audit, and I doubt the Anthropic team had to manually filter out hundreds to thousands of false positives to produce it.
They did have to manually write minimal exploits for each bug, because Opus was bad at it[3]. This is a problem that Mythos doesn't have. With access to Mythos, to repeat the same audit, you'd likely just need to make the model itself write all the exploits, which incidentally would also filter out a lot of the false positives. I think the hype is mostly justified.
[1] https://lwn.net/Articles/1065620/
[2] https://blog.mozilla.org/en/firefox/hardening-firefox-anthro...
[3] https://www.anthropic.com/news/mozilla-firefox-security
To be clear, I'm not saying 0% false-positive because that will always be impossible with any LLM.
However, to greatly over-simplify what I already said ...
The presence of >0 false-positives means you still need someone who knows what they are doing behind the keyboard.
The presence of an LLM, no matter how good, will never remove the need for a human with domain expertise in security analysis.
You cannot blindly fix stuff just because the LLM says it needs fixing.
You cannot report stuff just because the LLM says it needs reporting.
There may well be scope for LLM-assisted workflows, but WHO is being assisted is a critical part of the equation.
That is the fundamental point I am making.
> As part of our continued collaboration with Anthropic, we had the opportunity to apply an early version of Claude Mythos Preview to Firefox. This week’s release of Firefox 150 includes fixes for 271 vulnerabilities identified during this initial evaluation.
What commenters don't seem to understand is that especially CVE spam / bug bounty type vulnerability research has always been an exercise in sifting through useless findings and hallucinations, and LLMs, used well, are great at reducing this burden.
Previously, a lot of "baseline" / bottom-tier research consisted of "run fuzzers or pentest tools against a product; if you're a bottom feeder, just stuff these vulns all into the submission box; if you're more legit, tediously try to figure out which ones are reachable." LLMs with a test harness do an _amazing_ job at reducing this tedium; in the memory safety space, "read across 50 files to figure out if this UAF might be reachable," or in the web space, "follow this unsanitized string variable to see if it can be accessed by the user," are tasks that LLMs with a harness are awesome at. The current models are also about 50% there at "make a chain for this CVE," depending on the shape of the CVE (they usually get close given a good test harness).
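A toy illustration of the "follow this unsanitized string" task (this is a deliberately naive line-by-line scan with made-up variable and sink names, nothing like the cross-file dataflow a real harness does; it just shows the shape of the question being asked):

```python
# Naive sketch: follow a tainted variable through a snippet and report
# whether it reaches a dangerous sink without first passing through a
# sanitizer. Real tools track dataflow across files; this only handles
# simple "lhs = ...tainted..." assignments in order.
def reaches_sink(lines: list[str], var: str, sink: str, sanitizer: str) -> bool:
    tainted = {var}
    sanitized = False
    for line in lines:
        if sanitizer in line and any(t in line for t in tainted):
            sanitized = True
            continue
        # Propagate taint through simple assignments.
        if "=" in line and any(t in line.split("=", 1)[1] for t in tainted):
            tainted.add(line.split("=", 1)[0].strip())
        if sink in line and any(t in line for t in tainted):
            return not sanitized
    return False

snippet = [
    'user_input = request.args["q"]',
    'query = "SELECT * FROM t WHERE c = " + user_input',
    "cursor.execute(query)",
]
print(reaches_sink(snippet, "user_input", "execute", "escape"))  # True: no sanitizer on the path
```

The LLM-with-harness version of this is the model reading the real code and answering the same reachability question, with the harness there to let it actually exercise the path.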
It seems the concern with the unreleased models is that things have advanced once again from where they are today (where you need smart prompting and a good harness) to the LLM giving you exploit chains in exchange for "giv 0day pl0x." Based on my experience, while this has an element of puffery and classic capitalist goofiness to it ("the model is SO DANGEROUS only our RICHEST CUSTOMERS can have it!"), it is just a small incremental step and entirely plausible.
To summarize: "more efficient than all but the best" comes with too many qualifiers, but "are LLMs meaningfully useful in exercising vulnerabilities in OS kernel code," or "is it possible to accelerate vulnerability research and development with LLMs" - 100% absolutely.
And you don't have to believe one random professional (me); this opinion is fairly widespread across the community:
https://sockpuppet.org/blog/2026/03/30/vulnerability-researc...
https://lwn.net/Articles/1065620/
etc.
Yes, I don't see the point of maintaining technical debt just for the sake of it.
The security environment in 2026 is such that legacy unmaintained code is a very real security risk for obscure zero-days to exploit to gain a foot in the door.
Reading through the list I don't see it being an issue for the overwhelming majority of Linux users.
Who, for example, still uses ISDN in 2026? Most telcos have stopped all new sales, and existing ISDN circuits will be forcefully disconnected within 3–5 years as the telcos complete their FTTP build-outs and the copper network is subsequently decommissioned.
Most TV and radio stations.
I doubt it. And as I said, telcos have ceased new sales of ISDN and will be shutting down copper networks within 3–5 years.
Therefore, if there are TV and radio stations still using it, they will be forced to stop by circumstance: their ISDN will simply cease working once the telco shuts down the kit in the exchange.
- Nobody is familiar with the code
- Almost all of the recent fixes are from static analysis
- Nobody is even sure if anyone uses the code
This feels a lot like CPython culling stdlib modules and making them pypi packages. The people who rely on those things have a little bit of extra work if they want a recent kernel version, and everyone else benefits (directly or indirectly) by way of there being less stuff that needs attention.
The overlap of bugs being found, nobody caring enough to bother reading the reports or fixing the code, and nobody caring that the modules are pushed out of main seems good.
In general, drivers make up the largest attack surface in the kernel and many of them are just along for the ride rather than being actively maintained and reviewed by researchers.
[0] not trivially if you want to validate if it works
Be real with yourself: do you know anyone using ISA or PCI in 2026? Everything is built on PCIe except in specific industrial settings or on ancient hardware that's only relevant for retrocomputing. Is anyone using the ATM network protocol anymore? MPLS and Metro Ethernet mostly replaced ATM, and now MPLS is being largely supplanted by SD-WAN technologies and ordinary Internet connections. I have been doing networking in some capacity for nearly my entire career: the last time I touched X.25 or Frame Relay was in the early 2000s, the last time I touched ATM was in the mid-2000s, and the last time I touched ISDN was in the mid-2010s, and that was an IDSL setup, which is itself a dead technology. The last laptop I owned with a PCMCIA card slot was manufactured in 2008.
I don't want to see these capabilities completely disappear, but there's no reason they should ship in the mainline kernel in 2026. They should be separate kernel modules in their own tree.
Can't wait for the AI-braindead folks to get cut down to size, for everyone's good.