Shai-Hulud Themed Malware Found in the PyTorch Lightning AI Training Library

jj12y • about 6 hours ago • 80 comments • Read article on semgrep.dev

wlkr•about 4 hours ago

This might just be the frequency illusion at play, but there seem to have been a number of high-profile supply chain attacks of late in major packages. There are several articles on the first few pages of HN right now with different cases.

Looking back ten years to `left-pad`, are there more successful attacks now than ever? I would suspect so, and surely the value of a successful attack has also increased, so are we actually getting better as a broad community at detecting them before package release? It's a complex space, and commercial software houses should do better, but it seems that whilst there are some excellent commercial products (e.g. CI scan tools), generally accessible, idiot friendly tooling is somewhat lacking for projects which start as hobby/amateur code but end up being a dependency in many other projects.

I've cross-posted my comment from the current SAP supply chain attack thread [0].

[0]: https://news.ycombinator.com/item?id=47964003

JohnMakin•about 4 hours ago

People are ramming tons of code into places without ever looking at it, it would follow that supply chain attacks would increase thusly.

eddythompson80•about 4 hours ago

Yeah, and ultimately nobody cares. Everyone assumes it’s just some process miss, and we need to add another step to the process and move on. Fuck-ups that would have killed the credibility of projects 10 years ago are now treated as “eeh, what are you gonna do. Sometimes you ship malware. Will look into it.”

jackdoe•about 4 hours ago

I can't wait to have no dependencies.

An extreme example is now when I make interactive educational apps for my daughter, I just make Opus use plain js and html; from double pendulums to fluid simulations, works one shot. Before I had hundreds of dependencies.

Luckily, with MIT-licensed code I can just tell Opus to extract exactly the pieces I need and embed them, tweaked for my use case. So far this works great for hobby projects, but hopefully in the future production software will have no dependencies.

solid_fuel•about 3 hours ago

And of course, you will go over every line of code that Opus produces with the same scrutiny we expect of open source maintainers, right? Right?

I'm going to go publish some MIT-licensed remote access code and get that into Opus's training data.

mandevil•about 4 hours ago

The problem with this is now you are solely responsible for managing all of the changes, all of the variation of life. Chrome changed the shape of this API, you are responsible for finding it and updating it. Morocco changed when their daylight savings took effect, now you need to update your date/time handling code. There are a lot of these things that we take for granted because our libraries handle it for us, and with no dependencies you have to do all the work. Not a big deal for making a double-pendulum simulator for your daughter to play with that will stop mattering next week, but is a concern for a company which is trying to build something that can run indefinitely into the future.

v4nderstruck•about 4 hours ago

well surely Opus would never introduce vulnerabilities into the code so that sounds like the solution.

Aperocky•about 4 hours ago

I am torn because I like rust over go, and rust is better from an LLM perspective. But the dependency philosophy on rust is basically a security blackhole whereas go is much better.

kblissett•about 4 hours ago

I have found Go is an amazing language for LLMs. What do you prefer about Rust?

Aperocky•about 4 hours ago

Part of the context that would otherwise be required is handed off to the compiler. In addition, Rust binaries are generally smaller in terms of both size and footprint.

mamcx•about 4 hours ago

Doesn't vendoring basically copy what Go does?

gib444•about 4 hours ago

Your LLM isn't a dependency?

auraham•about 3 hours ago

This week I was wondering whether using uv for managing Python versions is a good idea.

From their website [1]

> Python does not publish official distributable binaries. As such, uv uses distributions from the Astral python-build-standalone project. See the Python distributions documentation for more details.

It points to this GitHub repo https://github.com/astral-sh/python-build-standalone which mentions this other link https://gregoryszorc.com/docs/python-build-standalone/main/r...

If I understand correctly, the source code for building Python is not fetched directly from python.org. Not so sure how secure is that.

I have the same concern for asdf [2]. However, they use pyenv [3] which, I think, feels more official.

Can someone clarify this? Which tool is better/more secure for installing python: uv or asdf?

[1] https://docs.astral.sh/uv/guides/install-python/

[2] https://github.com/asdf-community/asdf-python

[3] https://github.com/pyenv/pyenv/tree/master/plugins/python-bu...

woodruffw•about 3 hours ago

> If I understand correctly, the source code for building Python is not fetched directly from python.org. Not so sure how secure is that.

python-build-standalone fetches CPython sources directly from python.org[1]. I don't even know where else we would get them from!

[1]: https://github.com/astral-sh/python-build-standalone/blob/a2...

auraham•about 2 hours ago

Thanks for pointing that out.

throawayonthe•about 3 hours ago

i mean... uv is already a binary you run on your computer to manage python binaries, packages (and any binaries with those), systemwide tools etc; how much does it change whether they build the python binaries or someone else?

auraham•about 2 hours ago

Both uv and asdf can be compiled from source. I prefer that way.

mkeeter•about 5 hours ago

A repository search shows 2.2K repos with the text "A Mini Shai-Hulud has Appeared", all created within the past day:

https://github.com/search?q=A%20Mini%20Shai-Hulud%20has%20Ap...

rhdunn•about 5 hours ago

The repository names all look like two terms/words from Dune (Harkonnen, mentat, ornithopter, etc.) followed by a number. This would indicate that an account (possibly via a GitHub auth/Actions token) was compromised and then used to create the repository.

spate141•about 5 hours ago

what's this all about?

foo12bar•about 5 hours ago

FTFA

> The attack steals credentials, authentication tokens, environment variables, and cloud secrets, while also attempting to poison GitHub repositories.

CodeAndCuffs•about 5 hours ago

That doesn't really explain why there is a bunch of GitHub repos created as well.

If I remember correctly from Shai-Hulud 2, the attacker exfiltrated creds by posting them in public GitHub repos with minimal, easily reversible encoding. I believe it was double base64 last time.

I'm assuming the logic there is that every security researcher and company is going to pull and scan those creds for their stuff and their clients' stuff. So the attacker is just 1 of N people downloading it. As opposed to trying to send it to their own machine directly.
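To illustrate why double base64 counts as "easily reversible", here is a minimal sketch. The function names and the sample secret are made up for illustration, and this is not taken from the actual malware; it only shows that two rounds of base64 need no key to undo.

```python
import base64


def encode_twice(secret: bytes) -> bytes:
    """Apply base64 two times in a row (hypothetical helper)."""
    return base64.b64encode(base64.b64encode(secret))


def decode_twice(blob: bytes) -> bytes:
    """Undo the double encoding; no key or password is involved."""
    return base64.b64decode(base64.b64decode(blob))


# Made-up placeholder, not a real credential.
sample = b"EXAMPLE_TOKEN=not-a-real-secret"
blob = encode_twice(sample)

# Anyone who downloads the blob can recover the original bytes.
assert decode_twice(blob) == sample
```

Since decoding needs nothing but the blob itself, anything posted this way should be treated as fully public.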

progbits•about 5 hours ago

Malware uploading the credentials it managed to steal

nrengan•about 3 hours ago

Most of my pip installs come from Claude Code suggesting them now and me just hitting enter. Model was trained months ago, so it has no clue what got compromised this week. We built the worst possible filter for "is this package safe right now".

moritzwarhier•about 2 hours ago

What filter?

You say you rely on CC to suggest software to install from the internet, and then you install it.

I haven't heard anyone suggest CC or any LLM as a "filter" for "is this package safe right now", and it seems like a very bad heuristic to me, not only for the reason you gave.

nrengan•about 1 hour ago

Well, people weren't checking CVEs before pip install before CC either, CC just scaled the habit to a larger audience at a faster cadence. The blast radius for day-zero compromises is what changed.

BrenBarn•about 2 hours ago

By "the worst possible filter" do you mean "hitting enter when claude tells you to"?

throwawayqqq11•about 2 hours ago

"Sandbox this project before you make no mistakes."

mixedbit•about 3 hours ago

When I was doing the Fast.AI Deep Learning course, I was surprised by the number of Python dependencies machine learning projects bring in. Web front-end projects were always considered very heavy on third-party dependencies, but to me, the machine learning ecosystem looks much more entangled. In addition, unlike web development, which is considered security critical and has over the years accumulated a lot of wisdom and good security-related practices, machine learning development looks much more ad hoc, with many common software engineering practices not applied.

For example, at that time, one way to distribute machine learning models was via Python pickles, which are executable objects with no restrictions built in. Models in this format could do anything on a computer where the model was imported. Such an early 'wild-west' ecosystem can definitely make security compromises easier and resulting supply chain attacks more common.
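A minimal sketch of the pickle point above, using only the standard library. The `Payload` class is hypothetical and the callable is deliberately harmless; the mechanism shown (`__reduce__` naming a callable that runs at load time) is standard pickle behaviour.

```python
import pickle


class Payload:
    """Hypothetical stand-in for a 'model' object shipped as a pickle."""

    def __reduce__(self):
        # pickle records this (callable, args) pair and calls it on load.
        # eval of simple arithmetic is harmless here; an attacker would
        # name something far less friendly.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload; it calls eval("6 * 7").
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The loader never asked to run code, yet `pickle.loads` ran it anyway. That is why loading an untrusted pickle is equivalent to running an untrusted program.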

zelphirkalt•1 minute ago

There are many people in that ecosystem who are not primarily software engineers. Some just learned some coding along the way. Some are mathematicians. Some are devs who are AI drunk or something. Some have the mindset of "code doesn't matter any longer, if it works it works". For many, proper dependency management is just a chore that they don't want to care about. These things come together in various ML projects, even though ML projects should be amongst the projects most focused on reproducibility.

achandra03•about 5 hours ago

Bless the Maker and His water.

brahman81•about 4 hours ago

Thanks to the community for reporting the security issues with PyTorch Lightning 2.6.2 and 2.6.3 - we're actively looking into it.

In the meantime, please use 2.6.1 until we publish 2.6.4.

For more details: https://github.com/Lightning-AI/pytorch-lightning/security/a...
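A quick way to check a local environment against the two versions named above is sketched below. Checking both the `lightning` and `pytorch-lightning` distribution names is an assumption on my part; the advisory is the authority on which distributions were affected.

```python
from importlib import metadata

# Versions named in the comment above as having security issues.
COMPROMISED = {"2.6.2", "2.6.3"}

# Assumption: both distribution names are worth checking.
CANDIDATE_DISTS = ("lightning", "pytorch-lightning")


def is_compromised(version: str) -> bool:
    """True if the version string is one of the flagged releases."""
    return version in COMPROMISED


def check_installed(dists=CANDIDATE_DISTS) -> dict:
    """Map each name to (version, flagged); (None, False) if not installed."""
    report = {}
    for name in dists:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (version, is_compromised(version))
    return report


if __name__ == "__main__":
    for name, (version, flagged) in check_installed().items():
        status = "FLAGGED" if flagged else "ok"
        print(f"{name}: {version or 'not installed'} [{status}]")
```

This only reports what is installed right now. A clean result does not show that a flagged version was never installed in that environment earlier.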

upupupandaway•about 4 hours ago

Not a security guy here. How did the dependency get compromised, exactly? Did they submit a PR into the main repo at github and it was approved by the maintainers? Or just host compromised versions in other mirrors?

andymcsherry•about 4 hours ago

Andy from Lightning here. The malicious code was not submitted to the main repo at GitHub. It appears our PyPI credentials were leaked and compromised packages were published directly there for versions 2.6.2 and 2.6.3.

caycep•about 5 hours ago

just to clarify it's not PyTorch, it's the library for this Lightning AI company?

mort96•about 4 hours ago

Oh shit I had assumed PyTorch Lightning was affiliated with PyTorch. Not a great name for an unaffiliated third party thing.

lostmsu•about 5 hours ago

Yes

ks2048•about 4 hours ago

I'm curious what they do with various kinds of credentials if they get access.

I can see trying to steal crypto, but what do they do if they get some AWS credentials? Try to run some crypto mining instances? Try to use your account for other types of crimes? Or is it mainly trying to steal data and then ask for ransoms?

bigfluffydonkey•about 4 hours ago

It's always crypto. A client got some AWS credentials stolen and, without anyone checking the account, the hacker managed to spin up big EC2 instances across many regions. The bill after a month, as I recall, was around 100K. Since the activity was clearly fraudulent, the bill was forgiven eventually. So remember to lock down your AWS key permissions...

0fflineuser•about 5 hours ago

The nixpkg from unstable seems to be infected, as it's 2.6.2 https://search.nixos.org/packages?channel=unstable&include_h...

minkowski•about 4 hours ago

Nixpkgs uses the GitHub source, not the PyPI dist, for lightning; unclear to me from the advisory whether this should also be considered compromised.

andymcsherry•about 4 hours ago

Andy from Lightning here. Thanks for pointing that out, we are updating the CVE. Only the versions from PyPI were affected. The malicious code was not checked into the GitHub repository.

deforciant•about 4 hours ago

github is fine, the package was only pushed into pypi directly

gcapu•about 4 hours ago

On GitHub, I saw this message from April 20, and I’m a bit confused.

"deependujha hi @thebaptiste, thanks for inquiring. Release of 2.6.2 is blocked due to some internal reasons. Will notify once release is made. "

I'd hate it if they knew of the problem that long ago and didn't warn until now. If someone has more info and can clarify I'd be thankful.

https://github.com/Lightning-AI/pytorch-lightning/issues/216...

mil22•about 4 hours ago

For those using uv: https://docs.astral.sh/uv/reference/settings/#exclude-newer
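The `exclude-newer` setting linked above takes an RFC 3339 timestamp. Here is a small sketch for producing one a fixed number of days in the past. The helper name `cutoff_timestamp` and the seven-day window are illustrative choices, not anything uv prescribes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cutoff_timestamp(days: int, now: Optional[datetime] = None) -> str:
    """Return an RFC 3339 UTC timestamp `days` before `now`.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now.astimezone(timezone.utc) - timedelta(days=days)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # Example: ignore anything uploaded in the last 7 days.
    print(cutoff_timestamp(7))
```

The printed value can be used as the `exclude-newer` setting, or passed through the `UV_EXCLUDE_NEWER` environment variable or the `--exclude-newer` flag.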

gcapu•about 3 hours ago

I appreciate the tip, but your response has nothing to do with my question

notatallshaw•about 4 hours ago

> Running pip install lightning is all that is needed to activate

FYI, pip added cooldowns in 26.1:

  * https://discuss.python.org/t/announcement-pip-26-1-release/107108
  * https://ichard26.github.io/blog/2026/04/whats-new-in-pip-26.1/

To use:

  * CLI: pip install --uploaded-prior-to=P1D ...
  * Env Var: PIP_UPLOADED_PRIOR_TO=P1D pip install ...
  * Config: pip config set global.uploaded-prior-to P1D
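`P1D` in the options above is an ISO 8601 duration meaning one day. The idea behind a cooldown can be sketched as below. This is a conceptual illustration only, not how pip implements it; the function name, version numbers, and upload times are all made up.

```python
from datetime import datetime, timedelta, timezone


def newest_outside_cooldown(releases, cooldown, now):
    """Pick the most recently uploaded version older than the cooldown.

    `releases` maps version string -> upload datetime (hypothetical data).
    Returns None if every release is still inside the cooldown window.
    A real resolver would also compare version numbers, not just times.
    """
    cutoff = now - cooldown
    eligible = [
        (uploaded, version)
        for version, uploaded in releases.items()
        if uploaded <= cutoff
    ]
    return max(eligible)[1] if eligible else None


# Made-up release history for an imaginary package.
now = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
releases = {
    "1.4.0": datetime(2026, 3, 1, 9, 0, tzinfo=timezone.utc),
    "1.4.1": datetime(2026, 5, 1, 6, 0, tzinfo=timezone.utc),  # 6 hours old
}

choice = newest_outside_cooldown(releases, timedelta(days=1), now)
print(choice)
```

With a one-day cooldown, the six-hour-old release is skipped and the older one is chosen. A cooldown works by giving the community time to spot a bad upload before it gets installed automatically.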

throwa356262•about 5 hours ago

Advisory, fresh from the oven

https://github.com/Lightning-AI/pytorch-lightning/security/a...

csvance•about 4 hours ago

The decision to run all of my experiments in a monorepo with a single uv.lock continues to be validated. I usually only update it a few times a year. It was pinned at 2.6.1 for lightning \o/

fnoef•about 3 hours ago

Looks like coding is in a downward spiral towards complete chaos

SupLockDef•about 2 hours ago

When I was a kid, we were told to be cautious with third-party dependencies, that code can do anything and it's a risk to evaluate.

With the new generation of yolo NPM scripters, they simply don't evaluate the risks. They will even fight back, telling you that it's the way of doing things.

In reality, this is the warning we learned back then coming true: it's the result of mindlessly importing third-party dependencies without thinking.

In other words, the risks were always there; the new "modern way", let's put it that way, just doesn't put in the effort anymore.

ashishb•about 2 hours ago

Always run third party code inside a sandbox

lysace•about 2 hours ago

Is there some string to recursively grep for to know if you have been infected?

sieve•about 3 hours ago

I find this constant churn in the software world to be tiresome. I get it if there is a security update. Or you are building something new; it takes time and a series of updates to reach feature parity on 1.0. But most software is not like that. All these online registries make the problem worse. Any random tool installation pulls in 300 different dependencies.

This is why I have been building, for my own usecases, a new language + compiler + vm that is completely source based. The compiler does not understand linking. You must vendor every single dependency you use, including the standard library, so that it makes its way into the bytecode. The register VM itself is a few thousand lines of freestanding C. Any competent programmer can audit it over a weekend.

v1 deliberately keeps FFI (outside of a bounded set of linux syscalls) outside the current spec as libc has the habit of infecting everything it touches and I want to keep Vm0 freestanding. The last time I compiled the VM, it produced a 70KB binary and supported a loader with structural verification, the entire instruction set using a threaded interpreter, a simple Cheney+MS GC, concurrency via an Erlang-style M:N scheduler working on a single thread, and 20-odd marshaled functions.

Most software in the world does not need anything more than this. Everyone acts as if they are building the next Google.

silverwind•about 2 hours ago

Maybe now people can stop blaming npm and realize none of these unreviewed package ecosystems are safe.

rvz•about 5 hours ago

Shai-Hulud strikes again and continues to turn innocent packages into zombies.

Think twice before looking at a package and most importantly, always pin your dependencies.

pixel_popping•about 4 hours ago

Yeah, pin the malware :p

0xbadcafebee•about 5 hours ago

something something Safety Requires A Building Code something thing

csvance•about 4 hours ago

Shai-Hulud dug my 100 ft trench. Should be OSHA compliant right?

spate141•about 5 hours ago

ah shit, here we go again

12_throw_away•about 5 hours ago

this is fine, we are definitely a perfectly normal industry that knows what it is doing