Discussion (1 Comment)

potus_kushner • about 2 hours ago
these LFM 2.5 models are crazy fast. the 8B-A1B model (the biggest in the series) produces 35-40 t/s on an aged 6-core CPU using llama.cpp. it's my go-to model whenever i need fast local inference. it's also pretty good at tool calling. would love to see more finetunes on HF, but it appears not many people have discovered it yet.