🚀 Big update from @grail_ai! We’ve completed our GRPO implementation! Early runs with the Qwen/Qwen2.5-1.5B-Instruct model on the GSM8K dataset show that it is training properly over the SN81 main network, with online rewards steadily improving over time. 1/3 🧵