Discussion (4 Comments)
Whether this is true depends on what you mean by small. In general, AIUI you don't need more than a handful of experts to get a meaningful probability of overlap. DeepSeek V4 Pro is an exceptionally sparse model, and even there you start to get meaningful overlap at a batch size of 5 or more. More generally, if you treat each token's routing as an independent uniform draw, the expected number of distinct experts activated by a batch of size b is n(1 - (1 - k/n)^b), where k is the number of active experts per token and n is the total number of experts. For DeepSeek V4, k=6 and n is 256 for Flash, 384 for Pro. (The sampling is repeated per layer, not just per token.)
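For anyone who wants to plug numbers into that formula, here is a minimal sketch. It assumes every token picks its k experts uniformly at random and independently of the other tokens, which real learned routers do not do, so treat the results as a rough baseline rather than a measurement. The function names are made up for illustration; the k and n values are the ones quoted in the comment above.

```python
import random


def expected_active_experts(n: int, k: int, b: int) -> float:
    """Expected number of distinct experts hit in one layer by a batch of
    b tokens, assuming each token independently picks k of n experts
    uniformly at random (an idealization, not how a trained router behaves)."""
    return n * (1.0 - (1.0 - k / n) ** b)


def simulate_active_experts(n: int, k: int, b: int,
                            trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        hit = set()
        for _ in range(b):
            # Each token selects k distinct experts out of n.
            hit.update(rng.sample(range(n), k))
        total += len(hit)
    return total / trials


if __name__ == "__main__":
    # k and n as quoted in the thread; batch size 5 is the example given there.
    for name, n in (("Flash", 256), ("Pro", 384)):
        k, b = 6, 5
        closed_form = expected_active_experts(n, k, b)
        simulated = simulate_active_experts(n, k, b)
        # b * k is the count you would see with zero overlap.
        print(f"{name}: no-overlap={b * k} "
              f"formula={closed_form:.2f} simulated={simulated:.2f}")
```

The gap between `b * k` and the formula's value is the expected number of expert loads saved by overlap in a single layer under this uniform-routing assumption.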
Good point, though. Plus, for DeepSeek the shared expert increases the overlap slightly.