Qwen released 72B and 7B process reward models (PRMs) built on their recent math models:
https://huggingface.co/Qwen/Qwen2.5-Math-PRM-72B
https://huggingface.co/Qwen/Qwen2.5-Math-PRM-7B
From the model card: "In addition to the mathematical Outcome Reward Model (ORM) Qwen2.5-Math-RM-72B, we release the Process Reward Models (PRMs), namely Qwen2.5-Math-PRM-7B and Qwen2.5-Math-PRM-72B. PRMs have emerged as a promising approach for process supervision in the mathematical reasoning of Large Language Models (LLMs), aiming to identify and mitigate intermediate errors in the reasoning process. Our trained PRMs exhibit both impressive performance in Best-of-N (BoN) evaluation and stronger error-identification performance on ProcessBench."
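For context, Best-of-N with a PRM means sampling N candidate solutions, scoring each reasoning step with the PRM, and keeping the candidate whose aggregated step scores are highest. A minimal sketch of that selection loop (the per-step scores below are mock values standing in for actual PRM outputs, and aggregation by product is one common choice, not necessarily what Qwen uses):

```python
def aggregate(step_scores):
    # Product of per-step correctness scores: a single weak step
    # drags down the whole solution. min() is another common choice.
    p = 1.0
    for s in step_scores:
        p *= s
    return p

def best_of_n(candidates):
    # candidates: list of (solution_text, per_step_prm_scores) pairs.
    # In practice the scores would come from running a PRM such as
    # Qwen2.5-Math-PRM over each candidate's reasoning steps.
    return max(candidates, key=lambda c: aggregate(c[1]))

# Mock example: B wins despite A having the single highest step score,
# because A contains one low-scoring (likely erroneous) step.
candidates = [
    ("solution A", [0.9, 0.2, 0.8]),   # one weak intermediate step
    ("solution B", [0.7, 0.8, 0.75]),  # consistently solid steps
]
best_text, best_scores = best_of_n(candidates)
```

This is also why PRMs pair naturally with error identification: the same per-step scores used for BoN ranking point directly at the step where the reasoning likely went wrong.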
The accompanying paper: "The Lessons of Developing Process Reward Models in Mathematical Reasoning"
arXiv:2501.07301 [cs.CL]: https://arxiv.org/abs/2501.07301