Tag: reinforcement learning

SLATE Teaches Search Models Where They Went Wrong

SLATE targets the hardest part of RL search training: teaching a model which exact step helped or harmed the final answer.

27 Apr 2026

10 years of AlphaGo: real impact, hype, and gaps

A decade after DeepMind's AlphaGo crushed Lee Sedol, the real revolution wasn't the board game victory—it was the quiet migration from gaming tables to laboratory benches.

21 Apr 2026

NVIDIA’s ProRL: Rollout-as-a-Service or Just Another Bottleneck Fix?

Leaked docs show Anthropic’s next model boasts scores 30% above Opus—but details on real-world use remain scarce.

28 Mar 2026

RLHF’s blind spot: can P-GRPO fix the preference echo chamber?

P-GRPO tries to keep personalized gradients intact instead of flattening feedback into one global average.

12 Mar 2026