1 article
SLATE targets the hardest part of RL search training: teaching a model which exact step helped or harmed the final answer.