
Discussion (91 Comments)
Looking back ten years to `left-pad`, are there more successful attacks now than ever? I would suspect so, and surely the value of a successful attack has also increased, so are we actually getting better, as a broad community, at detecting them before package release? It's a complex space, and commercial software houses should do better, but it seems that whilst there are some excellent commercial products (e.g. CI scan tools), generally accessible, idiot-friendly tooling is somewhat lacking for projects which start as hobby/amateur code but end up being a dependency in many other projects.
I've cross-posted my comment from the current SAP supply chain attack thread [0].
[0]: https://news.ycombinator.com/item?id=47964003
If you're interested in synchronicity and frequency illusion, Sergei v. Chekanov wrote a book that sounds interesting https://jwork.org/designed-world/
Have you ever experienced coincidences that cannot be logically explained? This book helps the readers understand the meaning of synchronicity, or remarkable coincidences in people's lives. This work not only explains the mystery of synchronicity, originally introduced by Carl Jung, but it also shows how to make simple calculations to estimate the chances that coincidences are not due to mere randomness.
The value has increased, and that is what drives all these attacks. Cryptocurrencies are particularly to blame: they not only provide a way to launder the proceeds but are also a juicy target in themselves.
And what is stolen with today's malware? Cloud credentials. Either to use for illicit mining, which is on the decline, or to run extortion campaigns, which is made possible by cryptocurrencies. All too often it's North Korea or Iran running these campaigns.
I can't vouch for the number of attacks, but, since we are talking about Python, nothing has substantially changed since the time of `left-pad`. The same bad things that enabled supply chain attacks in Python ten years ago are in place today. However, it looks like there are more projects and they are more interconnected than before, so it's likely that there are either more supply chain attacks, or that they are more damaging, or both.
Here's my anecdotal experience with Python's packaging tools. For a while, I was maintaining a package to parse the libconfuse configuration language. It started as a Python 2.7 project, but some version of Python 3 was already available at the time, so it was written in a way that was supposed to be future-proof.
I didn't need to change the project's code in the last ten or so years, but roughly once a year something would break in setup.py. Usually because PyPA decided to remove a thing that didn't bother anyone.
When Python 3.13 came out, like clockwork, setup.py broke. I rolled up my sleeves and removed the dependency on setuptools; instead, I wrote some Python code that generated a wheel from the project's sources. I didn't look up the specification of the RECORD file in the dist-info directory, and assumed that sha256().hexdigest() would generate the checksums in the desired format. And that's how I shipped my packages...
Some time later, the company I work for added an AI reviewer to its repos, and it discovered that instead of hexdigest() the checksums have to be base64-encoded with the padding removed...
Now, to the punchline: nobody cared. The incorrectly generated packages installed perfectly fine without warnings. Nobody checks the checksums.
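For reference, the wheel spec wants RECORD hashes as urlsafe base64 of the raw digest with the padding stripped, not a hex string. A minimal sketch of both the correct form and what I actually shipped:

```
import base64
import hashlib

def record_hash(path: str) -> str:
    """Hash a file the way wheel RECORD entries expect:
    urlsafe base64 of the raw sha256 digest, trailing '=' stripped."""
    digest = hashlib.sha256(open(path, "rb").read()).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

# What I shipped instead -- a plain hex digest -- and nothing ever complained:
# "sha256=" + hashlib.sha256(data).hexdigest()
```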
More so: nobody checks that during `pip install`, or the fancier `uv pip install`, the packages aren't built locally (i.e. nobody cares that package installation can result in arbitrary code execution). It's not just common, it's almost universal to run `pip install` on production machines as a means of deploying a Python program. How do I know this? The company I work for ships its Python client as a... source package. Not intentionally. We are just lazy. But nobody cares.
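If you do care, pip can at least be told to refuse source distributions outright, so nothing is built (and no setup.py runs) on the machine doing the install; `somepackage` below is a placeholder:

```
# Refuse sdists entirely; fail instead of building anything locally.
pip install --only-binary :all: somepackage

# Or make it the default via pip.conf:
# [install]
# only-binary = :all:
```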
An extreme example: nowadays, when I make interactive educational apps for my daughter, I just make Opus use plain JS and HTML; from double pendulums to fluid simulations, it works one-shot. Before, I had hundreds of dependencies.
Luckily, with MIT-licensed code I can just tell Opus to extract exactly the pieces I need, embed them, and tweak them for my use case. So far this works great for hobby projects, but hopefully in the future production software will have no dependencies.
We seem to greatly overestimate the amount of code needed to do something.
For example, there are billions of lines of code between me pressing a key and you seeing what I wrote. But if we were to write a special program that communicates via IPv6 and ICMP, written for the RP2350's Hazard3 cores with a WIZnet W5500 ethernet breakout, the whole thing, including the C compiler to compile your code (which could very well outperform gcc -O3), would be 5-6k lines of code, including register allocation, barebones SPI drivers, and even a small preemptive OS.
So, it is not unreasonable to manage all of those changes.
versus: the dependency broke something, and now you're responsible for working around someone else's broken code.
Honestly, I've seen much more of the latter. Especially nowadays, with every single dependency thinking it's a fully fledged OS because an agent can add a thousand features/bugs in no time. Picking the right dependency, maintained by a sane maintainer, is like digging potatoes in a minefield.
I don't buy the notion of things breaking down over time, though. For "first-party" code that sticks to HTML and CSS standards, and Stage 4 / finished ECMAScript standards, the web is an absurdly stable platform.
It certainly used to be that we had to do all sorts of weird vendor hacks because nobody agreed on anything: supporting IE6 and 7 was a nightmare, and BlackBerry's browser was awful. But those days are largely behind us, unless you're doing some cutting-edge, Chrome-only, early-days proposed stuff, or a browser-specific extension, or something else that isn't a polished standard.
Even with timezone changes, you're better off using the system's information with Intl.DateTimeFormat.
So if you just need to do something simple like fire off a compute heavy background task and then get a result when it is done, you should probably just roll your own implementation on top of the threading API in your language. That'll probably be very stable. You don't need a massive background task orchestration framework.
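A minimal sketch in Python, with names of my own choosing (and with the caveat that for CPU-bound work under CPython you'd swap in ProcessPoolExecutor because of the GIL):

```
from concurrent.futures import ThreadPoolExecutor

# One shared pool is usually all the "orchestration" you need.
_executor = ThreadPoolExecutor(max_workers=4)

def run_in_background(fn, *args):
    """Fire off a task; returns a Future you can poll or wait on."""
    return _executor.submit(fn, *args)

def expensive(n: int) -> int:
    return sum(i * i for i in range(n))

future = run_in_background(expensive, 10_000_000)
# ... do other work ...
print(future.result())  # blocks only when you actually need the answer
```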
People might object that the frameworks will handle edge cases that you've never thought of, but I've actually found in enterprise settings that the small custom implementations--if you actually keep it small and focused--can cover more of the edge cases. And the big frameworks often engineer their own brittle edge cases due to concerns that you just don't have.
So anyway, it isn't as simple as "dependencies are bad" or "dependencies are good"; every dependency has a cost/benefit analysis that needs to go along with it. And in an enterprise, I'd argue that if you audit the existing dependencies you will find way too many that should be removed or consolidated, because they were added for the speed of initial delivery and greenfielding. Once you accumulate too many of them, the exposure to supply chains, the need to keep them updated, the need to track CVEs in them, the need to fix code to use updated versions, and the lack of any direct ability to bugfix them all combine into an ongoing tax of either continual maintenance or tech debt that will eventually bite you hard.
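For the CVE-tracking part at least, the Python world has low-effort tooling; PyPA's pip-audit will scan an environment or a requirements file:

```
pip install pip-audit
pip-audit                      # audit the currently installed packages
pip-audit -r requirements.txt  # or audit a requirements file
```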
I'm going to go publish some MIT-licensed remote access code and get that into Opus's training data.
It should be feasible to design vulnerabilities which look benign individually in training data, but when composed together in the agent plane & executed in a chain introduce an exploit.
There’s nothing technical really stopping that from existing right now. It’s just that nobody has put the effort in yet.
https://github.com/search?q=A%20Mini%20Shai-Hulud%20has%20Ap...
This malware isn't even trying. Then again it's Microsoft so they're not even trying either.
> The attack steals credentials, authentication tokens, environment variables, and cloud secrets, while also attempting to poison GitHub repositories.
If I remember correctly from Shai-Hulud 2, the attacker exfiltrated creds by posting them in public GitHub repos with minor, easily reversible encoding. I believe it was double base64 last time.
I'm assuming the logic there is that every security researcher and company is going to pull and scan those creds for their own stuff and their clients' stuff, so the attacker is just one of N people downloading it, as opposed to trying to send the creds to their own machine directly.
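If it really was double base64, the "decryption" is a one-liner. A sketch, with the exact scheme being my recollection rather than a confirmed detail:

```
import base64

def decode_double_b64(blob: bytes) -> bytes:
    # Two rounds of base64: obfuscation, not encryption.
    return base64.b64decode(base64.b64decode(blob))

leaked = base64.b64encode(base64.b64encode(b"AWS_SECRET_ACCESS_KEY=..."))
print(decode_double_b64(leaked))
```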
From their website [1]
> Python does not publish official distributable binaries. As such, uv uses distributions from the Astral python-build-standalone project. See the Python distributions documentation for more details.
It points to this GitHub repo https://github.com/astral-sh/python-build-standalone which mentions this other link https://gregoryszorc.com/docs/python-build-standalone/main/r...
If I understand correctly, the source code for building Python is not fetched directly from python.org. I'm not so sure how secure that is.
I have the same concern for asdf [2]. However, they use pyenv [3] which, I think, feels more official.
Can someone clarify this? Which tool is better/more secure for installing python: uv or asdf?
[1] https://docs.astral.sh/uv/guides/install-python/
[2] https://github.com/asdf-community/asdf-python
[3] https://github.com/pyenv/pyenv/tree/master/plugins/python-bu...
python-build-standalone fetches CPython sources directly from python.org [1]. I don't even know where else we would get them from!
[1]: https://github.com/astral-sh/python-build-standalone/blob/a2...
”…for Shai-Hulud!!!”
You say you rely on CC to suggest software to install from the internet, and then you install it.
I haven't heard anyone suggest CC or any LLM as a "filter" for "is this package safe right now", and it seems like a very bad heuristic to me, not least for the reason you gave.
For example, at that time, one way to distribute machine learning models was via Python pickles, which are executable objects with no restrictions built in. Models in this format could do anything on a computer where the model was loaded. Such an early "wild west" ecosystem can definitely make security compromises easier and the resulting supply chain attacks more common.
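The footgun is easy to demo: unpickling calls whatever `__reduce__` asks for. Here it's a harmless print; a malicious model file would invoke a shell instead:

```
import pickle

class Payload:
    # __reduce__ tells pickle what to call at load time -- anything goes.
    def __reduce__(self):
        return (print, ("arbitrary code ran during unpickling",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # prints the message; loading IS executing
```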
In the meantime, please use 2.6.1 until we publish 2.6.4.
For more details: https://github.com/Lightning-AI/pytorch-lightning/security/a...
"deependujha hi @thebaptiste, thanks for inquiring. Release of 2.6.2 is blocked due to some internal reasons. Will notify once release is made. "
I'd hate it if they knew of the problem that long ago and didn't warn until now. If someone has more info and can clarify, I'd be thankful.
https://github.com/Lightning-AI/pytorch-lightning/issues/216...
If they haven't already, they should require a second factor for publishing as well.
I can see trying to steal crypto, but what do they do if they get some AWS credentials? Try to run some crypto mining instances? Try to use your account for other types of crimes? Or is it mainly trying to steal data and then ask for ransoms?
https://github.com/Lightning-AI/pytorch-lightning/security/a...
FYI, pip added cooldowns in 26.1:
To use:
The new generation of YOLO npm scripters simply don't evaluate the risks. They will even fight back, telling you that it's the way of doing things.
In reality, it's the warning we learnt back then: this is the result of mindlessly importing third-party dependencies without thinking.
In other words, the risks were always there; the new "modern way", let's put it that way, just doesn't put in the effort anymore.
This is why I have been building, for my own use cases, a new language + compiler + VM that is completely source-based. The compiler does not understand linking. You must vendor every single dependency you use, including the standard library, so that it makes its way into the bytecode. The register VM itself is a few thousand lines of freestanding C. Any competent programmer can audit it over a weekend.
v1 deliberately keeps FFI out of the current spec (aside from a bounded set of Linux syscalls), as libc has the habit of infecting everything it touches and I want to keep Vm0 freestanding. The last time I compiled the VM, it produced a 70KB binary and supported a loader with structural verification, the entire instruction set via a threaded interpreter, a simple Cheney+MS GC, concurrency via an Erlang-style M:N scheduler running on a single thread, and 20-odd marshaled functions.
Most software in the world does not need anything more than this. Everyone acts as if they are building the next Google.
Think twice before taking on a package and, most importantly, always pin your dependencies.
The attacker would have to publish the infected package first to infect others who haven't pinned their dependencies. With a simple `pip install -U`, anyone whose dependency is not pinned will get the vulnerable version.
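Pinning the version alone still trusts whatever the index serves under that version, so where it matters, pin hashes too; `somepkg` and the digest below are placeholders:

```
# requirements.txt
somepkg==1.2.3 \
    --hash=sha256:<full hex digest of the exact artifact>

# pip then refuses anything that doesn't match:
pip install --require-hashes -r requirements.txt
```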