LFM2-24B-A2B: Scaling Up the LFM2 Architecture

Discussion (4 Comments)

meatmanek•about 2 hours ago

This model is pretty cool if you don't have a GPU - I was able to get, I think, 20 or 30 tokens per second on CPU alone (DDR4 RAM). (I don't remember if that was with q4 or q8.)

Otherwise, if you have a GPU with more than like 4GB of VRAM, there are better models. Gemma4 and Qwen3.6 (or Qwen3.5 if you need the smaller dense models that haven't yet been released for 3.6) are a good place to start.

alyxya•about 1 hour ago

The blog post was published a couple months ago, and it looks like there hasn't been a follow-up release with the fully trained model. I'm not sure if there's much to take away from an early checkpoint besides the unique architectural choices they made in their model for faster inference.

alfiedotwtf•about 2 hours ago

Tokens per second is nice, but I would also like to see quality benchmarks, especially against other models. I mean, eventually someone’s gonna write a blog post comparing models, so why not just do it yourself… that way your marketing department at least gets to control the narrative rather than a random blogger.

mirekrusin•33 minutes ago

It's a checkpoint in the middle of training, so it makes sense to report speed, which will stay the same, and to report quality as they did.
