Mining the web for collocations: IR models of term associations

Rakesh Verma; Vasanthi Vuppuluri; An Nguyen; Arjun Mukherjee; Ghita Mammar; Shahryar Baki; Reed Armstrong

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 9623 LNCS 177-194

DOI: 10.1007/978-3-319-75477-2_11

Abstract

Automatic collocation recognition has attracted considerable attention from researchers in diverse fields, since it is one of the fundamental tasks in NLP and feeds into several other tasks (e.g., parsing, idioms, and summarization). Despite this attention, the problem has remained a “daunting challenge.” As others have observed, existing approaches based on frequencies and statistical information have limitations. An even bigger problem is that they are restricted to bigrams, and there is as yet no consensus on how to extend them to trigrams and higher-order n-grams. This paper presents encouraging results based on new angles on general collocation extraction that leverage statistics and the Web. In contrast to existing work, our algorithms apply to n-grams of arbitrary order and are directional. Experiments across several datasets, including a gold-standard benchmark dataset that we created, demonstrate the effectiveness of the proposed methods.
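To make the abstract's contrast concrete, the sketch below shows a conventional frequency-based association score, pointwise mutual information (PMI), computed over adjacent word pairs. It is a generic baseline of the kind the abstract criticizes, not the authors' IR models, which the abstract does not describe. PMI is defined for pairs, which is why extending such measures to trigrams and beyond is not straightforward. PMI is also symmetric in its two arguments, so the sketch adds a pair of conditional probabilities as one simple, assumed notion of "directional" association. The function names `bigram_pmi` and `directional_probs` are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """PMI for each adjacent word pair (a generic baseline, not the paper's method).

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with probabilities estimated
    from unigram and adjacent-bigram counts in a single token list.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs (guard for short input)
    scores = {}
    for (x, y), c_xy in bigrams.items():
        p_xy = c_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def directional_probs(tokens, x, y):
    """Return (P(y follows | x), P(x precedes | y)) for the ordered pair (x, y).

    Assumed illustration of directionality: the two values generally differ,
    unlike PMI, which treats the pair symmetrically.
    """
    c_xy = Counter(zip(tokens, tokens[1:]))[(x, y)]
    if c_xy == 0:
        return (0.0, 0.0)
    left = Counter(tokens[:-1])   # occurrences that have a right neighbour
    right = Counter(tokens[1:])   # occurrences that have a left neighbour
    return (c_xy / left[x], c_xy / right[y])
```

On the toy text "strong tea and strong coffee and strong tea", `directional_probs(tokens, "strong", "tea")` gives 2/3 for P(tea | strong) but 1.0 for P(strong | tea): "tea" is always preceded by "strong", while "strong" is followed by other words too. A single symmetric score hides that asymmetry.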

Citation (APA)

Verma, R., Vuppuluri, V., Nguyen, A., Mukherjee, A., Mammar, G., Baki, S., & Armstrong, R. (2018). Mining the web for collocations: IR models of term associations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9623 LNCS, pp. 177–194). Springer Verlag. https://doi.org/10.1007/978-3-319-75477-2_11