DSN LINK STABLECARRIER WAVE LOCKORBITAL INDEX HOTSIGNAL CLOCK SYNCLOW NOISE FLOORFRAME BUFFER ONLINE
Loading
11 articles
EMO tries to turn MoE modularity from a theoretical compute advantage into a practical tool for smaller, domain-focused deployments.
DeepSeek V4 arrives in Flash and Pro versions with a 1M-token context window, a MoE architecture, and a claim that it is closing in on leading closed models.
Meta’s latest 175B-parameter LLaMA 3 model required a training run that consumed 1.2GWh—enough to power a Tesla Gigafactory for a day.
LiME uses one shared PEFT module and lightweight expert vectors to cut MoE-PEFT parameters by up to four times.
MoE's 1-trillion-parameter model now runs on a 96GB MacBook Pro.
Mistral quietly shipped Small 4, a 119B-parameter MoE model that collapses Magistral, Pixtral, and Devstral into one 6B-active-weight binary — and for the first time, the unified architecture actually works in production.
The MoE-SpAc team repurposed Speculative Decoding—a technique normally used to speed up LLMs—as a memory oracle for edge devices, betting it can predict expert activation before the model stumbles.
Nemotron 3 Super combines 120B parameters, Mamba, and MoE for a new open-agent push.
YuanLab’s model emphasizes MoE pruning and expert rearrangement, making it a compute-economics story rather than only a size story.