

Discussion (22 Comments)

jv22222 about 5 hours ago
Why is no one talking about open source models being burned directly onto chips and running inference at 10k-15k tokens a second?

OS models close the gap (via distillation) with frontier models, then get burned to chip, then offer commoditized inference via data farms or local plugins.

With thought loops this fast, even if the models are less smart they can self-correct to level themselves up.

unknown_user_84 about 4 hours ago
I've seen Taalas come up on HN, but only once. I'm not an HN fanatic, but I end up around page 3 before I kill-flip my browser into oblivion. Currently Taalas has Llama 3.1 8B burned onto a chip and offers chatbot and API access. That said, they aren't selling the chips yet, on their website at least.

I expect they are waiting for an open-weight model they really feel is worth burning to a chip, and/or training their own thing. I'd guess they could probably figure out some efficiency speedups if they are doing model development and hardware development at the same time. Though this seems different from Google and friends with their TPUs and similar, TPUs being more general-purpose chips than what Taalas is making. Still, probably something there.

jv22222 about 3 hours ago
If they supply something like an external hard drive form factor with a small SSD for a small corrective memory, it could do very well IMHO.
david_shi about 5 hours ago
If you burn a model to a chip what happens if there's a better model?
scarmig about 5 hours ago
You have a slightly less great model. Depending on your thesis on how fast AI will advance, that might be minimal, or it might be huge. However, "AI is advancing too fast for people to make obvious efficiency improvements economically worth doing" is rather hard to square with "AI is a lie and will never generate profits."
jv22222 about 3 hours ago
They will come with a corrective SSD and RAM that will enable stale models to get some amount of self-correction. After that it will be a typical upgrade path. Actually a nice business model with upgrades built in.
nosioptar about 5 hours ago
I had the same question.

I wondered if it'd be possible to use a rewritable chip or a socketed chip...

unknown_user_84 about 4 hours ago
Not sure about sockets, but I've seen a company with a wafer-sized TPU thing or whatever. They claim to have an approach to route around the defects, and I had to scroll past some press release I didn't read about the stock market, so someone believes in them I guess. They sell a mini-fridge that can handle a model with trillions of parameters, with a contact sales button. Cerebras is their name. I actually ended up misremembering the name of Taalas as Cerebras and discovered them when I was researching the above comment. Taalas burns models to chips. Cerebras makes pizza-sized chips.
saulpw about 5 hours ago
> With thought loops this fast, even if the models are less smart they can self-correct to level themselves up.

You can't self-correct a model that's been burned to a chip. That seems like it'd be the main problem with ASIC AI: when everything's changing on a monthly basis, do you want to spend $x00 on a substandard model that'll be obsolete in 3 months, or wait 3 months?

jv22222 about 3 hours ago
Just talking about longer deep thought loops. You can do a LOT of deep thought at 15k tokens a second and it still feels super fast.

Side note: I really believe in this technology. If anyone building this happens to be reading and is looking for help, give me a shout.
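For a sense of scale on the 15k-a-second figure, here is a minimal back-of-the-envelope sketch. It assumes "15k" means generated tokens per second, and the 100 tokens/s comparison rate is a made-up placeholder, not a measured number for any real deployment.

```python
def thinking_budget(tokens_per_second: float, seconds: float) -> int:
    """Number of tokens a model can generate inside a wall-clock budget."""
    return int(tokens_per_second * seconds)


# Hypothetical comparison: the thread's 15k tokens/s figure versus an
# assumed 100 tokens/s for a conventionally served model.
RATES = [
    ("burned-to-chip, 15k tok/s (figure from the thread)", 15_000),
    ("conventional serving, 100 tok/s (assumed)", 100),
]

for label, rate in RATES:
    budget = thinking_budget(rate, seconds=2.0)
    print(f"{label}: {budget:,} tokens of reasoning in 2 s")
```

Under these assumptions a two-second wait buys 30,000 reasoning tokens on the fast path versus 200 on the slow one. Whether those extra tokens turn into better answers is the open question in this thread.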

aeternum about 3 hours ago
These already exist and there are very few use-cases because typically waiting a little longer for a significantly better answer is preferable.
jv22222 about 3 hours ago
Yes, but we're talking about the fullness of time as things play out. Many improvements will be made to OS models via distillation, and then you also get 15k tokens a second to make it think better.
aesthesia about 4 hours ago
There are some glaring local errors that make this analysis less than trustworthy. For instance, an assumption that corporate income tax applies directly to revenue, or a supposedly generous assumption that GPUs will fully depreciate after 3 years (6-year-old A100s are still in very high demand!). I would love to read a really well thought through investigation of inference costs and how they relate to token pricing, but I have low confidence that this is it.
aesthesia about 4 hours ago
Oh, just noticed one other very significant error: they evaluate revenue using input token pricing while counting capacity using generated tokens per second. There's a big gap between input and output token pricing, and between prefill TPS and generation TPS.
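To make the input-versus-output mismatch concrete, here is a rough sketch. Every number in it is hypothetical (the throughput and both prices are placeholders, not figures from the analysis being criticized). It only shows how the revenue estimate moves when the price used does not match the kind of token being counted.

```python
def hourly_revenue(tokens_per_second: float, usd_per_million_tokens: float) -> float:
    """Revenue per hour for a given sustained throughput and token price."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * usd_per_million_tokens


# All values below are hypothetical placeholders.
GENERATION_TPS = 1_000   # generated (output) tokens per second
INPUT_PRICE = 1.0        # USD per million input tokens
OUTPUT_PRICE = 4.0       # USD per million output tokens

# Mismatched: generated-token capacity priced at the input rate.
mismatched = hourly_revenue(GENERATION_TPS, INPUT_PRICE)
# Consistent: generated-token capacity priced at the output rate.
consistent = hourly_revenue(GENERATION_TPS, OUTPUT_PRICE)

print(f"mismatched estimate: ${mismatched:.2f}/hour")
print(f"consistent estimate: ${consistent:.2f}/hour")
print(f"ratio: {consistent / mismatched:.1f}x")
```

With these placeholder prices the two estimates differ by exactly the output-to-input price ratio. A fuller model would also account for prefill throughput separately, which this sketch leaves out.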
anamax about 3 hours ago
> GPUs will fully depreciate after 3 years (6-year-old A100s are still in very high demand!)

Depreciation is a tax thing. While it is supposed to track useful life, it almost never does.

For example, in the US, residential rental houses are depreciated on a 27.5-year schedule. I'm typing this from a house built in 1902....

Google has yet to decommission any of its Trilliums, and the V1s shipped in 2015.

The prices to rent V2 (2017) and later are on https://cloud.google.com/tpu/pricing .

aesthesia about 2 hours ago
Yep, in their analysis depreciation meant "get no useful work out of the GPU after this point," though.
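The two meanings of "depreciation" in this subthread can be separated with a small sketch. It assumes simple straight-line depreciation and a made-up purchase price. The point is only that book value reaching zero is an accounting result, and says nothing by itself about whether the hardware still does useful work.

```python
def straight_line_book_value(cost: float, schedule_years: float, age_years: float) -> float:
    """Remaining book value under straight-line depreciation, floored at zero."""
    remaining_fraction = max(0.0, 1.0 - age_years / schedule_years)
    return cost * remaining_fraction


# Hypothetical purchase price; the 3-year schedule is the one discussed above.
GPU_COST = 10_000.0
SCHEDULE_YEARS = 3

for age in (0, 1.5, 3, 6):
    value = straight_line_book_value(GPU_COST, SCHEDULE_YEARS, age)
    print(f"age {age} years: book value ${value:,.2f}")
```

Under these assumptions the book value is zero at both 3 and 6 years. Treating that as "no useful work after this point" is a separate modeling assumption, and it is the one being questioned here.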
aatd86 about 6 hours ago
Not convinced. That is a very static view. You would think that the output of AI will be better AI, better energy sources and that will make AI way cheaper in the long run... It will end up a cheap commodity that is basically free to produce. Over the long run it is absolutely one of the best investments in projections.
bwestergard about 6 hours ago
"It will end up a cheap commodity that is basically free to produce."

Wouldn't this just mean that hardware manufacturers capture the profits, not hyperscalers?

RetroTechie about 3 hours ago
Probably. Selling gear to shovel GPUs into datacenters is gonna be profitable for a while, no matter how this pans out.
u1hcw9nx about 6 hours ago
Classic storytellers vs. people who can quantify.

Storytellers: Full self driving was commonplace already in 2020.

aatd86 14 minutes ago
No one claimed that. Besides, negativity is a self-fulfilling prediction.

And investing is not accounting...

vablings about 6 hours ago
The idea of a GPU being useful for 3 years is insane. There is little to no data to support that.