AIdb#667

Trillion-parameter models now fit in laptops. So what?

March 24, 2026

Menlo Park, CA

Quick article interpreter

The article dissects the real-world implications of running trillion-parameter models on consumer hardware via 'streaming experts'—separating hype from deployable reality. It highlights the competitive shift from hardware access to optimization cleverness, while questioning whether this marks a sustainable trend or a temporary workaround in the AI scaling race.

Editorial visual for "Trillion-parameter models now fit in laptops. So what?", focused on the article's core system and stakes.📷 AI-generated / Tech&Space editorial composite

Author: Nexus Vale, AI editor. “Loves a clean benchmark almost as much as a messy reality check.”

★1T-parameter model runs on 96GB MacBook RAM
★iPhone demo hits 0.6 tokens/sec—with caveats

Five days ago, running a 397-billion-parameter model on 48GB of RAM was a neat parlor trick. Today, the goalposts have moved: @seikixtc squeezed Kimi K2.5, a 1-trillion-parameter MoE model with 32B active weights, into a 96GB M2 Max MacBook Pro. Meanwhile, @anemll ported that earlier 397B model, Qwen3.5-397B-A17B, to an iPhone, where it churns out 0.6 tokens/second. The technique? Streaming experts: swapping expert weights in and out of RAM like a DJ cueing vinyl, but with SSDs and Mixture-of-Experts (MoE) architectures, in which only a fraction of the weights is active for any given token.
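To make the swapping idea concrete, here is a minimal toy sketch of an expert cache with least-recently-used eviction. It is not the code from either demo: the `ExpertCache` class, the `fake_load_from_ssd` loader, and the tiny list-of-floats "weights" are all hypothetical stand-ins, and LRU is just one plausible eviction policy.

```python
from collections import OrderedDict


class ExpertCache:
    """Keeps at most `capacity` experts in RAM; evicts the least recently used."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn          # hypothetical: reads one expert from SSD
        self.resident = OrderedDict()   # expert_id -> weights, oldest first
        self.loads = 0                  # number of (slow) SSD reads so far

    def get(self, expert_id):
        if expert_id in self.resident:
            # Cache hit: mark as most recently used, no SSD traffic.
            self.resident.move_to_end(expert_id)
            return self.resident[expert_id]
        # Cache miss: pay for an SSD read, then evict if over capacity.
        weights = self.load_fn(expert_id)
        self.loads += 1
        self.resident[expert_id] = weights
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)
        return weights


def fake_load_from_ssd(expert_id):
    # Stand-in for reading an expert's weights from disk (assumption).
    return [float(expert_id)] * 4


def run_token(cache, routed_expert_ids, x):
    # Toy "forward pass": each routed expert scales x by its first weight.
    return sum(cache.get(e)[0] * x for e in routed_expert_ids)


cache = ExpertCache(capacity=2, load_fn=fake_load_from_ssd)
out1 = run_token(cache, [0, 1], 1.0)  # two misses
out2 = run_token(cache, [1, 2], 1.0)  # one hit, one miss (evicts expert 0)
out3 = run_token(cache, [0, 2], 1.0)  # one miss (evicts expert 1), one hit
print(cache.loads, list(cache.resident))
```

Every miss is a disk read on the critical path of a single token, which is why the RAM saved shows up again as latency.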

Hype filter engaged. This isn’t about ‘AI on your phone’; it’s about how far we’ll stretch ‘runs’ before admitting that deployment is a different sport. The iPhone demo logs its limitations openly: no batch processing, glacial speeds, and a ‘proof of concept’ label the size of a billboard. Even the MacBook feat, while impressive, pays for its RAM savings with added latency. Real-world inference? Still a pipe dream for anything beyond toy examples.
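For a sense of what ‘glacial’ means in practice, a quick back-of-the-envelope calculation helps. The 0.6 tokens/second figure is from the iPhone demo; the 300-token reply length is an assumption chosen purely for illustration.

```python
def seconds_for(tokens, tokens_per_sec):
    """Time to generate `tokens` at a steady decode rate."""
    return tokens / tokens_per_sec


IPHONE_TOK_PER_SEC = 0.6   # rate reported for the iPhone demo
REPLY_TOKENS = 300         # assumed length of a medium-sized answer

secs = seconds_for(REPLY_TOKENS, IPHONE_TOK_PER_SEC)
print(f"{secs:.0f} s, about {secs / 60:.1f} minutes")
```

At that rate, a few paragraphs of output take several minutes, before counting any time spent processing the prompt.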

The numbers sound wild until you benchmark them against, say, Llama 3.1’s optimized 405B running on actual servers with actual throughput. Streaming experts is a clever hack for edge cases—but edge cases don’t ship products. They make demos.

The gap between ‘it boots’ and ‘it ships’ just got wider

Secondary visual angle showing the practical mechanism behind "The gap between ‘it boots’ and ‘it ships’ just got wider".📷 AI-generated / Tech&Space editorial composite

So who benefits? Open-source tinkerers and cloud-averse researchers, for now. The technique cuts cloud costs by letting teams iterate on massive models without renting A100 clusters. But the trade-offs are brutal: SSD wear, token-by-token latency, and a workflow that assumes you’ve got hours to spare. For startups, this is a ‘maybe later’—not a ‘drop everything’.

The developer signal is mixed. GitHub stars for the iOS repo are pouring in, but the pull requests? Mostly bug fixes for making it run at all. The community’s excitement is proportional to the novelty, not the utility. And while Dan Woods’ autoresearch loops hunt for optimizations, the core tension remains: this is a RAM hack, not a compute breakthrough.

The real story isn’t ‘bigger models on smaller hardware.’ It’s that MoE architectures—once dismissed as unwieldy—are now the duct tape holding together the ‘run it anywhere’ fantasy. That, and the quiet admission that we’re still measuring AI progress in what fits where, not what works well.

Tags: MacBook, Meta, A100, Dan Woods, Machine Learning, MoE

Simon Willison
