Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Published in International Conference on Machine Learning (ICML), 2024
A. Nika, D. Mandal, P. Kamalaruban, G. Tzannetos, G. Radanovic, A. Singla