TECH & SPACE

reinforcement learning

4 articles

SLATE Teaches Search Models Where They Went Wrong

SLATE targets the hardest part of RL search training: teaching a model which exact step helped or harmed the final answer.

27 Apr 2026
10 years of AlphaGo: real impact, hype, and gaps

A decade after DeepMind's AlphaGo crushed Lee Sedol, the real revolution wasn't the board game victory—it was the quiet migration from gaming tables to laboratory benches.

21 Apr 2026
NVIDIA’s ProRL: Rollout-as-a-Service or Just Another Bottleneck Fix?

Leaked docs show Anthropic’s next model boasts scores 30% above Opus—but details on real-world use remain scarce.

28 Mar 2026
RLHF’s blind spot: can P-GRPO fix the preference echo chamber?

P-GRPO tries to keep personalized gradients intact instead of flattening feedback into one global average.

12 Mar 2026