Meet LFM2-8B-A1B, our first on-device Mixture-of-Experts (MoE)! 🐘
> LFM2-8B-A1B is the best on-device MoE in terms of both quality and speed.
> Quality on par with 3B-4B class models, with up to 5x faster inference on CPUs and GPUs.
> Quantized variants fit comfortably on high-end phones, tablets, and laptops.
It enables fast, private, low-latency applications across modern phones, tablets, laptops, and embedded systems.
1/n 🧵