DPL: Diverse Preference Learning Without A Reference Model

Abhijnan Nath; Andrey Volozin; Saumajit Saha; Albert Aristotle Nanda; Galina Grunin; Rahul Bhotika; Nikhil Krishnaswamy

Conference Proceedings, Open Access

Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025 (2025), Vol. 1, pp. 3727–3747

DOI: 10.18653/v1/2025.naacl-long.190

Abstract

In direct preference alignment of LLMs, most existing methods seek to retrieve the reward function directly from preference data. However, real-world preference data often contains diversity in preference annotations that reflects true human preferences. Existing algorithms, including KTO (Ethayarajh et al., 2024), do not directly utilize such nuances in the annotations, which limits their applicability. In this work, we propose Diverse Preference Learning (DPL), a reference-model-free method that learns a baseline desirability in LLM responses while remaining robust to the diversity of preference annotations. Our experiments on instruction following (Ultrafeedback and AlpacaEval 2.0) and text summarization (Reddit TL;DR) suggest that DPL is consistently better at learning the diversity of preferences than existing methods, including those that require a reference model in memory. Beyond overall quality, we find that DPL's completions are, on average, more honest, helpful, truthful, and safe than those of existing methods.

Citation (APA)

Nath, A., Volozin, A., Saha, S., Nanda, A. A., Grunin, G., Bhotika, R., & Krishnaswamy, N. (2025). DPL: Diverse Preference Learning Without A Reference Model. In Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025 (Vol. 1, pp. 3727–3747). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2025.naacl-long.190
