Meta tag
reinforcement learning
4 articles
SLATE Teaches Search Models Where They Went Wrong
SLATE targets the hardest part of RL search training: teaching a model which exact step helped or harmed the final answer.
27 Apr 2026
10 years of AlphaGo: real impact, hype, and gaps
A decade after DeepMind's AlphaGo crushed Lee Sedol, the real revolution wasn't the board game victory; it was the quiet migration from gaming tables to laboratory benches.
21 Apr 2026
NVIDIA's ProRL: Rollout-as-a-Service or Just Another Bottleneck Fix?
Leaked docs show Anthropic's next model boasts scores 30% above Opus, but details on real-world use remain scarce.
28 Mar 2026
RLHF's blind spot: can P-GRPO fix the preference echo chamber?
P-GRPO tries to keep personalized gradients intact instead of flattening feedback into one global average.
12 Mar 2026



