DE version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
86% Positive
Analyzed from 243 words in the discussion.
Trending Topics
#latency#applicable#https#extremely#yes#archive#large#happy#small#models

Discussion (14 Comments)Read Original on HackerNews
I've been trying to hit 100,000tokens/s with a 3.28m dumb model, and even this is an order of magnitude too large to benefit.
It appears to be focussed more on latency, than throughput. Happy to be corrected?
One primary application of this work is in high-energy physics (https://home.cern/smarter-decisions-at-the-speed-of-collisio...). Ultrafast and real-time learning is also very applicable for problems in quantum computing, plasma control, etc. (https://arxiv.org/pdf/2602.02005).
p.s. Thanks for posting this and welcome to HN!