these LFM 2.5 models are crazy fast. the 8B-A1B model (the biggest in the series) produces 35-40 t/s on an aged 6-core CPU using llama.cpp. it's my go-to model whenever i need fast local inference. it's also pretty good at tool calling. would love to see more finetunes on HF, but it appears not many people have discovered it yet.