RL is so sensitive to numerics; last time torch.compile caused some crashes, now vLLM v1.
Mika Senghaas · Aug 12, 11:23
Moving from vLLM v0 to v1 made our async RL training crash! Read how we fixed it.

We recently migrated from v0 to v1 as part of a larger refactor of prime-rl to make it easier to use, more performant, and naturally async. We confirmed correct training dynamics on many smaller-scale runs, but hit a wall when trying to reproduce a larger-scale run that had completed without problems before the refactor. Specifically, training DeepSeek-R1-Distill-Qwen-1.5B on single-turn math problems from our INTELLECT-2 math dataset at 8k context with a two-step off-policy delay would crash fatally roughly 400 steps into training.
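To make the setup concrete, here is a minimal, hypothetical sketch of what a "two-step off-policy delay" means in an async RL loop: the trainer at step t consumes rollouts that were generated with policy weights from step t - 2, because inference keeps sampling while training proceeds. The function names and structure below are illustrative only, not prime-rl's actual code.

```python
# Illustrative sketch of a two-step off-policy delay in an async RL loop.
# Names (generate_rollouts, train_step) are hypothetical, not prime-rl APIs.
from collections import deque

OFF_POLICY_DELAY = 2  # rollouts lag the trainer by two optimizer steps

def generate_rollouts(policy_version: int) -> dict:
    # stand-in for vLLM inference workers sampling completions with a given weight version
    return {"policy_version": policy_version, "samples": ["..."]}

def train_step(step: int, rollouts: dict) -> None:
    # stand-in for one optimizer step on the training workers
    lag = step - rollouts["policy_version"]
    assert lag == OFF_POLICY_DELAY, f"unexpected off-policy lag: {lag}"

def run(num_steps: int) -> None:
    buffer: deque = deque()
    # pre-fill the buffer so the trainer always trains on rollouts two versions old
    for version in range(OFF_POLICY_DELAY):
        buffer.append(generate_rollouts(policy_version=version))
    for step in range(OFF_POLICY_DELAY, num_steps):
        train_step(step, buffer.popleft())       # train on stale (two-step-old) rollouts
        buffer.append(generate_rollouts(step))   # inference samples with the newest weights

if __name__ == "__main__":
    run(10)
```

The point of the delay is that inference never has to wait for the trainer: while step t is being optimized, the inference engine is already generating the data that step t + 2 will consume.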