Learning term-weighting functions for similarity measures

Wen Tau Yih

Conference Proceedings (Open Access)

EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 (2009) 793-802

DOI: 10.3115/1699571.1699616

20 Citations

113 Readers

Abstract

Measuring the similarity between two texts is a fundamental problem in many NLP and IR applications. Among existing approaches, the cosine measure of the term vectors representing the original texts has been widely used, where the score of each term is often determined by a TFIDF formula. Despite its simplicity, the quality of such a cosine similarity measure is usually domain dependent and is decided by the choice of the term-weighting function. In this paper, we propose a novel framework that learns the term-weighting function. Given labeled pairs of texts as training data, the learning procedure tunes the model parameters by minimizing a specified loss function of the similarity score. Compared to traditional TFIDF term-weighting schemes, our approach shows a significant improvement on tasks such as judging the quality of query suggestions and filtering irrelevant ads for online advertising. © 2009 ACL and AFNLP.
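The baseline the abstract describes, cosine similarity over TFIDF-weighted term vectors, can be sketched as follows. This is a minimal illustration and not the paper's implementation: the smoothed IDF variant, the toy corpus, and the names `tfidf_weight`, `term_vector`, and `cosine` are all assumptions made for the example. The `weight_fn` parameter marks the piece that the proposed framework would replace with a learned function.

```python
import math
from collections import Counter


def tfidf_weight(term, tf, doc_freq, n_docs):
    # Assumed smoothed IDF variant; many TFIDF formulas exist and the
    # abstract does not specify which one is used.
    idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0
    return tf * idf


def term_vector(tokens, doc_freq, n_docs, weight_fn=tfidf_weight):
    # Map each distinct term to its weight. weight_fn is the pluggable
    # term-weighting function (hypothetical hook for a learned model).
    counts = Counter(tokens)
    return {t: weight_fn(t, tf, doc_freq, n_docs) for t, tf in counts.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for illustration only.
corpus = [
    "cheap flights to paris".split(),
    "paris flights cheap tickets".split(),
    "python list comprehension tutorial".split(),
]

# Document frequency: number of documents containing each term.
doc_freq = Counter(t for doc in corpus for t in set(doc))
vectors = [term_vector(doc, doc_freq, len(corpus)) for doc in corpus]

sim_related = cosine(vectors[0], vectors[1])    # texts sharing three terms
sim_unrelated = cosine(vectors[0], vectors[2])  # texts sharing no terms
```

Because the similarity score depends on the texts only through `weight_fn`, swapping in a different weighting function changes the scores without touching the cosine computation, which is the degree of freedom the abstract says the framework tunes from labeled text pairs.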

Cite

Yih, W. T. (2009). Learning term-weighting functions for similarity measures. In EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 (pp. 793–802). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1699571.1699616
