Discussion (5 Comments) from HackerNews
Whether this is true depends on what you mean by small. In general, AIUI you don't need more than a handful of experts to get a meaningful probability of overlap. DeepSeek V4 Pro is an exceptionally sparse model, and even there you start to get meaningful overlap at a batch size of 5 or more. Moreover, in general you can think of the expected number of distinct experts activated by a batch of size b as n(1 - (1 - k/n)^b), where k is the number of active experts per token and n is the total number of experts. For DeepSeek V4, k=6, and n is 256 for Flash and 384 for Pro. (The sampling is repeated per layer, not just per token.)
Good point tho. Plus, for DeepSeek the shared expert increases the overlap slightly.
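For anyone who wants to plug numbers into the n(1 - (1 - k/n)^b) formula from the comment above, here is a minimal Python sketch. It assumes each token picks its k experts uniformly at random and independently of other tokens, which a learned router will not do exactly, so treat the results as a rough baseline. It also ignores the shared expert mentioned in the reply. The function names are made up for illustration; the values k=6, n=256 (Flash), and n=384 (Pro) are taken from the comment, not checked against any model card.

```python
def expected_active_experts(n: int, k: int, b: int) -> float:
    """Expected number of distinct experts touched by a batch of b tokens.

    Assumption: every token independently selects k of the n experts
    uniformly at random. A given expert is then missed by one token with
    probability (1 - k/n), and by all b tokens with probability (1 - k/n)**b.
    """
    return n * (1.0 - (1.0 - k / n) ** b)


def expected_overlap(n: int, k: int, b: int) -> float:
    """Expected number of expert slots that land on an already-used expert.

    The batch fills b * k slots in total; whatever does not map to a
    distinct expert is overlap (shared weight loads within one layer).
    """
    return b * k - expected_active_experts(n, k, b)


if __name__ == "__main__":
    k = 6  # active experts per token, as stated in the comment
    for name, n in [("Flash", 256), ("Pro", 384)]:
        for b in (1, 5, 32, 256):
            distinct = expected_active_experts(n, k, b)
            overlap = expected_overlap(n, k, b)
            print(f"{name:5s} n={n} b={b:3d}  "
                  f"distinct={distinct:7.2f}  overlap={overlap:7.2f}")
```

With b=1 the formula collapses to exactly k, which is a quick sanity check. Since the sampling is repeated per layer, these figures are per layer rather than per model.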