R1-Zero-like training for math