Entity recognition for duplicate filtering

Abstract

We propose a system for automatically detecting duplicate entries in a repository of semi-structured text documents. The system uses text-entity recognition to extract information about times, locations, names of persons and organizations, and events described in the document content. Using these structured representations of the content, which we call “metamodels”, we group the entries into clusters based on content similarity. We then apply machine-learning algorithms to the clusters to carry out duplicate detection. We report the precision, recall, and F-value of the proposed system.
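The abstract describes a pipeline with four stages: extract entities, build metamodels, cluster by similarity, and detect duplicates within clusters. It then evaluates the result with precision, recall, and F-value. The sketch below illustrates that flow under stated assumptions and is not the authors' implementation. Entity extraction is assumed to be already done. A metamodel is modelled as a hypothetical dict that maps an entity type to a set of strings. Similarity is an average of per-type Jaccard scores. Clustering is a simple single-pass "leader" scheme. A fixed similarity threshold stands in for the machine-learning classifier used in the paper. The names `similarity`, `cluster`, and `detect_duplicates`, the thresholds, and the toy documents are all illustrative. F-value is assumed to mean the balanced F-measure (F1).

```python
from itertools import combinations

# Hypothetical entity types mirroring those listed in the abstract.
ENTITY_TYPES = ("time", "location", "person", "organization", "event")


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets give 0.0 (no evidence)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(m1, m2):
    """Average per-type Jaccard score over the entity types present in either metamodel."""
    scores = [jaccard(m1.get(t, set()), m2.get(t, set()))
              for t in ENTITY_TYPES if m1.get(t) or m2.get(t)]
    return sum(scores) / len(scores) if scores else 0.0


def cluster(metamodels, threshold):
    """Single-pass leader clustering (an assumption, not the paper's method).

    Each document joins the first cluster whose leader (first member) is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for doc_id, m in metamodels.items():
        for c in clusters:
            if similarity(m, metamodels[c[0]]) >= threshold:
                c.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


def detect_duplicates(metamodels, clusters, dup_threshold):
    """Flag pairs within each cluster whose similarity reaches `dup_threshold`.

    A fixed threshold is a placeholder for a trained classifier.
    Pairs are returned as sorted (id, id) tuples.
    """
    pairs = set()
    for c in clusters:
        for a, b in combinations(sorted(c), 2):
            if similarity(metamodels[a], metamodels[b]) >= dup_threshold:
                pairs.add((a, b))
    return pairs


def precision_recall_f(predicted, gold):
    """Precision, recall, and balanced F-measure over sets of duplicate pairs."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy metamodels, invented purely for illustration.
docs = {
    "d1": {"time": {"2014-05-01"}, "location": {"Monterrey"},
           "person": {"J. Perez"}, "event": {"robbery"}},
    "d2": {"time": {"2014-05-01"}, "location": {"Monterrey"},
           "person": {"J. Perez"}, "event": {"robbery"}},
    "d3": {"time": {"2014-05-01"}, "location": {"Monterrey"},
           "event": {"protest"}},
    "d4": {"time": {"2013-11-20"}, "location": {"Saltillo"},
           "organization": {"ACME"}, "event": {"fire"}},
}

clusters = cluster(docs, threshold=0.4)
pairs = detect_duplicates(docs, clusters, dup_threshold=0.8)
gold = {("d1", "d2")}  # hypothetical ground-truth duplicate pairs

print(clusters)                          # [['d1', 'd2', 'd3'], ['d4']]
print(pairs)                             # {('d1', 'd2')}
print(precision_recall_f(pairs, gold))   # (1.0, 1.0, 1.0)
```

Clustering first means that the more expensive pairwise comparison runs only inside each cluster, not across every pair in the repository. In this toy run, `d3` lands in the same cluster as `d1` because they share the date and location. It is not flagged as a duplicate, because its person and event entities differ.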

Citation

Cordero Cruz, J. A., Garza, S. E., & Schaeffer, S. E. (2014). Entity recognition for duplicate filtering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8821, pp. 253–264). Springer Verlag. https://doi.org/10.1007/978-3-319-11988-5_24