Knowledge-lean Paraphrase identification using character-based features

Asli Eyecioglu; Bill Keller

Conference Proceedings

Communications in Computer and Information Science (2018) 789 257-276

DOI: 10.1007/978-3-319-71746-3_21

Abstract

The paraphrase identification task has practical importance in the NLP community because of the need to deal with the pervasive problem of linguistic variation. Accurate methods should help improve the performance of NLP applications, including machine translation, information retrieval, question answering, text summarization, document clustering and plagiarism detection, amongst others. We consider an approach to paraphrase identification that may be considered “knowledge-lean”. Our approach minimizes the need for data transformation and avoids the use of knowledge-based tools and resources. Candidate paraphrase pairs are represented using combinations of word- and character-based features. We show that SVM classifiers may be trained to distinguish paraphrase and non-paraphrase pairs across a number of different paraphrase corpora with good results. Analysis shows that features derived from character bigrams are particularly informative. We also describe recent experiments in identifying paraphrase for Russian, a language with rich morphology and free word order that presents a particularly interesting challenge for our knowledge-lean approach. We are able to report good results on a three-way paraphrase classification task.
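As a rough illustration of what character-bigram features for a sentence pair might look like, the sketch below computes simple set-overlap counts between the bigram sets of two sentences. The abstract does not specify the exact feature set, so the function names (`char_ngrams`, `overlap_features`), the normalisation (lowercasing, collapsing whitespace) and the choice of features (set sizes, intersection, union, Jaccard ratio) are all assumptions made for this example, not the authors' method.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams in `text`.

    Assumption: text is lowercased and runs of whitespace are collapsed
    to a single space; no other transformation is applied.
    """
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def overlap_features(s1, s2, n=2):
    """Build an illustrative feature vector for a candidate paraphrase pair.

    Features (hypothetical choice): size of each n-gram set, size of their
    intersection, size of their union, and the Jaccard ratio.
    """
    a, b = char_ngrams(s1, n), char_ngrams(s2, n)
    inter = len(a & b)
    union = len(a | b)
    jaccard = inter / union if union else 0.0
    return [len(a), len(b), inter, union, jaccard]


if __name__ == "__main__":
    pair = ("The cat sat on the mat.", "A cat was sitting on the mat.")
    print(overlap_features(*pair))
```

Vectors of this kind, one per sentence pair with a paraphrase or non-paraphrase label, could then be passed to any SVM implementation for training. Because the features are computed directly from the character sequence, no tokenizer, stemmer or lexical resource is required, which is in the spirit of the knowledge-lean approach described above.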

Cite

Eyecioglu, A., & Keller, B. (2018). Knowledge-lean Paraphrase identification using character-based features. In Communications in Computer and Information Science (Vol. 789, pp. 257–276). Springer Verlag. https://doi.org/10.1007/978-3-319-71746-3_21
