Discussion (4 Comments), originally on HackerNews

jnwatson, about 3 hours ago
Back in the olden days, I took a course called "Large Scale Scientific Computing". It was mostly about multiplying large matrices. I didn't think this was going to be remotely applicable to anything commercial.

Boy was I wrong.

ainch, about 3 hours ago
Tri Dao's lab must have saved countless watts with FlashAttention. Great to see them continuing to open-source massive efficiency gains.

cs702, about 3 hours ago
A superior alternative to standard Muon and AdamW optimizers for training large models.

Fantastic work, instantly valuable, immediately usable.

A big THANK YOU to the authors:

Jack Zhang, Noah Amsel, Berlin Chen, and Tri Dao

akoboldfrying, about 3 hours ago
Only read the first section but this sounds really impressive -- up to a 50% cut in a step that takes up to 17% of training time when using the Muon optimiser, so up to around 8.5% of basically pure improvement with no downside.
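
The arithmetic in the comment above can be checked with a small sketch. It assumes the two quoted figures are upper bounds that simply multiply: the optimiser step takes at most 17% of training time, and at most 50% of that step is saved. The function name and the end-to-end speedup calculation are illustrative additions, not taken from the original post.

```python
def max_time_saved(step_fraction: float, step_reduction: float) -> float:
    """Fraction of total training time saved when a step that takes
    `step_fraction` of the run becomes `step_reduction` cheaper."""
    return step_fraction * step_reduction


# Assumed upper bounds, as quoted in the comment: 17% and 50%.
saved = max_time_saved(0.17, 0.50)

# If `saved` of the run disappears, the rest takes (1 - saved) of the
# original time, so the end-to-end speedup is 1 / (1 - saved).
speedup = 1.0 / (1.0 - saved)

print(f"time saved: {saved:.1%}, end-to-end speedup: {speedup:.3f}x")
```

Under those assumptions the ceiling is 8.5% of total training time, and the real saving shrinks whenever either figure sits below its upper bound.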