
Discussion (50 Comments)

2001zhaozhao•20 minutes ago
AKA, the beginning of big companies being able to roll over small companies with moar money

(Note: I don't expect this to actually happen until AI gets good enough to either nearly entirely replace humans or solve cooperation, but the long-term trend of scarce AI points in that direction.)

dmazin•about 2 hours ago
Constraints can lead to innovation. Just two things that I think will get dramatically better now that companies have incentive to focus on them:

* harness design

* small models (both local and not)

I think there is still tremendous low-hanging fruit in both areas.

cesarvarela•about 2 hours ago
Harness is a big one; Claude Code still has trouble editing files with tabs. I wonder how many tokens per day are wasted on Claude attempting the same edit multiple times.
com2kid•about 2 hours ago
China already operates like this. Low-cost specialized models are the name of the game. Cheaper to train, easy to deploy.

The US has a problem of too much money leading to wasteful spending.

If we go back to the 80s/90s, remember OS/2 vs Windows. OS/2 had more resources, more money behind it, more developers, and they built a bigger system that took more resources to run.

Mac vs Lisa: the Mac team had constraints, the Lisa team didn't.

Unlimited budgets are dangerous.

henry2023•about 2 hours ago
The US is bound by energy and China is bound by compute. Whichever solves its limitation first will end this “Scarcity Era”.
jakeinspace•about 2 hours ago
China is installing something like 500 GW of wind and solar per year now. Even if they're only able to build and otherwise access chips that have half the SoTA performance per watt, they will win.
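
The comment's arithmetic, made concrete as a toy sketch: effective compute scales with power installed times chip efficiency, so a large enough power gap swamps a 2x chip gap. The 500 GW figure is from the comment above; the US figure and the efficiency ratio are purely hypothetical.

    # Toy model: effective compute ~ power installed x performance per watt.
    # 500 GW/yr is from the comment; all other numbers are hypothetical.
    cn_power_gw, cn_perf_per_watt = 500, 0.5   # chips at half SoTA efficiency
    us_power_gw, us_perf_per_watt = 60, 1.0    # hypothetical additions, SoTA chips

    cn_compute = cn_power_gw * cn_perf_per_watt   # 250 "SoTA-GW equivalents"
    us_compute = us_power_gw * us_perf_per_watt   # 60

    print(cn_compute > us_compute)  # True: the power gap dominates the chip gap
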
wg0•about 2 hours ago
There's other side to it too.

Whoever running and selling their own models with inference is invested into the last dime available in the market.

Those valuations are already ridiculously high be it Anthropic or OpenAI to the tune of couple of trillion dollars easily if combind.

All that investment is seeking return. Correct me if I'm wrong.

Developers and software companies are the only serious users because they (mostly) review output of these models out of both culture and necessity.

Anywhere else? Other fields? There these models aren't any useful or as useful while revenue from software companies by no means going to bring returns to the trillion dollar valuations. Correct me if I'm wrong.

To make the matter worst, there's a hole in the bucket in form of open weight models. When squeezed further, software companies would either deploy open weight models or would resort to writing code by hand because that's a very skilled and hardworking tribe they've been doing this all their lives, whole careers are built on that. Correct me if I'm wrong.

Eventually - ROI might not be what VCs expect and constant losses might lead to bankruptcies and all that build out of data centers all of sudden would be looking for someone to rent that compute capacity result of which would be dime a dozen open weight model providers with generous usage tiers to capitalize on that available compute capacity owners of which have gone bankrupt and can't use it any more wanting to liquidate it as much as possible to recoup as much investment as possible.

EDIT: Typos

vessenes•about 2 hours ago
It seems very possible that we have at least five years of real limitations on compute coming up. Maybe ten, depending on ASML. I wonder what an overshoot looks like. I also wonder if there might be room for new entrants in a compute-scarce environment.

For instance, at some point, could Coreweave field a frontier team as it holds back 10% of its allocations over time? Pretty unusual situation.

itmitica•about 2 hours ago
The current inference system is on a down slope.

It remains to be seen what new wave of AI system or systems will replace it, making the whole current architecture obsolete.

Meanwhile, they are milking it, in the name of scarcity.

stupefy•about 2 hours ago
What limits LLM inference accelerators? I've heard about Groq (https://groq.com/), but I'm not sure how far it pushes the problem away.
vessenes•about 2 hours ago
ASML only makes a certain number of machines a year that can do extreme ultra-violet lithography.

Also - turbine blades limit power, according to Elon.

Between them, we cannot build chip fabs past a certain rate, and we cannot stand up the datacenters to run the desired chips past a certain rate. Different people believe one or the other is the 'true' current bottleneck. The turbine supply chain looks much more tractable to scale -- EUV is essentially the most complicated production process humans have ever devised.

zozbot234•about 2 hours ago
You don't really need EUV for reasonable hardware; you can use DUV and scale up your design effort with things like multiple patterning to more closely approximate EUV outcomes. Sure, your compute-per-watt figures will suffer, but if AI compute is as profitable as it's claimed to be, that's still a viable approach.
ls612•about 2 hours ago
Presumably ASML can increase production if demand is high enough; the question is over what time frame. Five years seems plausible to me, but I honestly don't know what that number is.
vessenes•about 2 hours ago
It's ... really long, according to Dylan Patel on the Dwarkesh Podcast. The supply chain is extremely deep and complex.
yalogin•about 2 hours ago
Does this also mean RAM prices are not coming down anytime soon?
stronglikedan•about 2 hours ago
they already are
dist-epoch•about 2 hours ago
yes, and they will keep increasing
com2kid•about 2 hours ago
To bang on the same damn drum:

Open weight models are 6 months to a year behind SOTA. If you were building a company a year ago based on what AI could do then, you can build a company today with models that run locally on a user's computer. Yes, that may mean requiring your customers to buy MacBooks or desktops with Nvidia GPUs, but if your product actually improves productivity by any reasonable amount, that purchase cost is quickly made up for.

I'll argue that for anything short of full computer control or writing code, the latest Qwen model will do fine. Heck, you can get a customer service voice chatbot running in 8GB of VRAM plus a couple more gigs for the ASR and TTS engines, and it'll be more capable than the chatbots that hundreds of millions were spent on when they were powered by GPT 4.x.
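
A minimal sketch of that kind of local stack, assuming faster-whisper for ASR, llama-cpp-python for the LLM, and pyttsx3 for TTS; the model file name and the VRAM notes are illustrative guesses, not measurements:

    # Local voice bot loop: ASR -> local LLM -> TTS, no cloud calls.
    # Assumes: pip install faster-whisper llama-cpp-python pyttsx3
    # and a quantized chat model (e.g. a small Qwen GGUF) on disk.
    from faster_whisper import WhisperModel
    from llama_cpp import Llama
    import pyttsx3

    asr = WhisperModel("small", device="cuda")  # roughly a gigabyte of VRAM
    llm = Llama(model_path="qwen-7b-instruct-q4.gguf",  # hypothetical file name
                n_gpu_layers=-1, n_ctx=4096)            # ~5-6 GB at 4-bit quant
    tts = pyttsx3.init()

    history = [{"role": "system", "content": "You are a helpful support agent."}]

    def turn(wav_path: str) -> str:
        """One conversational turn: transcribe, respond, speak."""
        segments, _ = asr.transcribe(wav_path)
        user_text = " ".join(s.text for s in segments)
        history.append({"role": "user", "content": user_text})
        out = llm.create_chat_completion(messages=history, max_tokens=256)
        reply = out["choices"][0]["message"]["content"]
        history.append({"role": "assistant", "content": reply})
        tts.say(reply)       # speak the reply through the local TTS engine
        tts.runAndWait()
        return reply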

This is like arguing the age of personal computing was over because there weren't enough mainframes for people to telnet into.

It misses the point. Yes, deployment and management of personal PCs were a lot harder than a dumb terminal plus mainframe, but the future was obvious.

space_fountain•about 2 hours ago
I've seen this claimed, but I'm not sure it's been true for my use cases? I should try a more involved analysis, but so far open models seem much less even in their skills. I think this makes sense if a lot of them are built on distillations of larger models. It seems likely that with task-specific fine-tuning this is true?
zozbot234•about 2 hours ago
The real advantage of open weight models from a compute scarcity POV is that they repurpose the compute users need to have around anyway for their own use. That's great, but it's also limited in scope. There are only so many engineering/architecture/Gfx special effects workstations that can now run reasonable mid-sized models "for free" during downtime because they had to be available already for other uses. Everything else will only increase the scarcity, not redress it, unless you only expect users to run very small or very slow models.
byyoung3•about 2 hours ago
distillation is an equalizing force
isawczuk•about 2 hours ago
It's artificial scarcity. LLM inference will soon be a commodity, like cloud compute.

There are still 2-3 years before ASIC LLM inference catches up.

observationist•about 2 hours ago
The problem with this idea is that someone can, and likely will, come up with the next best architecture that leapfrogs the current frontier models at least once a year, likely faster, for the foreseeable future. This means that by the time you've manufactured your LLM on an ASIC, it's 4-5 generations behind, and probably much less efficient than the current SOTA model at scale.

It won't make sense for ASIC LLMs to manifest until things start to plateau, otherwise it'll be cheaper to get smarter tokens on the cloud for almost all use cases.

That said, a 10 trillion parameter model on a bespoke compute platform overcomes a lot of the efficiency and FOOM aspects of market fit, so the angle is: when will models that can be run on an ASIC be good enough that people will still want them for various things, even if the frontier models are 10x smarter and more efficient?

I think we're probably a decade of iteration on LLMs out, at least, and the entire market could pivot if the right breakthrough happens - some GPT-2 moment demonstrating some novel architecture that convinces the industry to make the move could happen any time now.

vessenes•about 2 hours ago
I don't think so. GB200 prices are GOING UP. A100s are still expensive. This implies massive utilization and demand, no? These machines are not sitting idle, or prices would drop in the very competitive hyperscaler environment.
czk•about 2 hours ago
"adaptive" thinking
mattas•about 2 hours ago
This notion that "we don't have enough compute" does not cleanly reconcile with the fact that labs are burning cash faster than any cohort of companies in history.

If I am a grocery store that pays $1 for oranges and sells them for $0.50, I can't say, "I don't have enough oranges."

FloorEgg•about 2 hours ago
There is a major logic flaw in what you're saying.

'If I am a grocery store that pays $1 for oranges and sells them for $0.50, I can't say, "I don't have enough oranges."'

How about: if I'm a grocery store and I see no limit on demand for oranges at $0.50, but they currently cost me $1, I can say, 'if oranges were cheaper I could sell orders of magnitude more of them.'

Buying oranges for $1 and selling them for $0.50 is an investment in acquiring market share and customer relationships, and a gamble that the price of oranges will fall in the future.
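
The gamble is easy to make concrete with a toy calculation; all numbers here are hypothetical:

    # Loss-leader math: lose money per orange now, bet on cheaper oranges
    # and elastic demand later. All numbers hypothetical.
    cost_now, price = 1.00, 0.50               # buy at $1, sell at $0.50
    units_now = 1_000
    loss_now = (cost_now - price) * units_now  # $500 burned on market share

    cost_later = 0.20                          # the bet: oranges get cheap
    units_later = 10 * units_now               # ...and demand really is elastic
    profit_later = (price - cost_later) * units_later  # $3,000

    print(loss_now, profit_later)              # 500.0 3000.0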

0x3f•about 2 hours ago
> acquiring market share and customer relationships

The whole setup rests on this, and it seems mythical to me. These guys have basically equivalent products at this point.

earthnail•about 2 hours ago
If there were more oranges you’d pay less to buy them and your economics would work out.
0x3f•about 2 hours ago
Not sure if this is a joke or not, but competitive pressure still exists. This only really holds if you're the only orange seller.
vessenes•about 2 hours ago
You misunderstand.

"I built a ship to go to the Indies and bring back tea."

"Bro, the ship cost 100,000 pounds sterling and only brought back 50,000 pounds of tea. I don't care if you paid 12,500 pounds for the tea itself, you're losing money."

There is a very rational reason labs are spending everything they can get on more compute right now. The tea (inference) pays 60%+ margins, and that is rising. And that number is AFTER hyperscalers make their margins. There is an immense amount of profit floating around this system, and strategics at the edge believe they can build and control demand through combined spend on training and inference in the proper ratios.
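
The analogy's own numbers show the distinction between capex and unit economics (reading the 50,000 as the tea's sale value):

    # Ship = capex (training/buildout); tea = unit economics (inference).
    ship_capex  = 100_000  # pounds sterling, one-time
    tea_revenue = 50_000   # per voyage
    tea_cogs    = 12_500   # per voyage

    gross_margin = (tea_revenue - tea_cogs) / tea_revenue           # 0.75 -> 75%
    voyages_to_recoup_ship = ship_capex / (tea_revenue - tea_cogs)  # ~2.7

    print(gross_margin, voyages_to_recoup_ship)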

SpicyLemonZest•about 2 hours ago
60%+ margins according to numbers which are not published publicly and have not AFAICT been audited.

Could they be accurate? Sure, I think people who claim this is impossible are overconfident. But I would encourage anyone who assumes they must be right to read a history of the Worldcom scandal. It's really quite easy for a person who wants to be making money (or an LLM who's been instructed to "run the accounts make no mistakes"!) to incorrectly categorize costs as investments when nobody's watching carefully.

paulddraper•about 2 hours ago
This is wrong along multiple axes.

1. Supply can scale. You can point to COVID/supply-chain shocks, but the problem there was a temporary change: no one spins up a whole fab to address a 3-month spike. AI, by contrast, is not a temporary demand change.

2. Models are getting more efficient. DeepSeek V3 was 1/10th the cost of the contemporary ChatGPT. Open weight models get more runnable or smarter every month. The cutting edge is always the cutting edge, but if scarcity is real, model selection will adjust to fit it.

Lapalux•about 3 hours ago
"The first hit is free....."