Meta tag

Speculative Decoding

2 articles

Gemma 4 speeds up token generation with MTP draft models

AI Rewritten

Gemma 4 targets the AI delay users actually feel

Gemma 4 gets a practical route to faster inference: multi-token prediction (MTP) draft models propose several tokens at once, while the main model verifies them in a single pass.

25 May 2026

MoE-SpAc’s speculative bet: Lookahead or just more hype?

The MoE-SpAc team repurposed speculative decoding, a technique normally used to speed up LLMs, as a memory oracle for edge devices, betting it can predict expert activation before the model stumbles.

12 Mar 2026