Entity recognition for duplicate filtering

J. A. Cordero Cruz; Sara E. Garza; S. E. Schaeffer

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8821 253-264

DOI: 10.1007/978-3-319-11988-5_24

Citations: 1

Readers: 4

Abstract

We propose a system for automatic detection of duplicate entries in a repository of semi-structured text documents. The proposed system employs text-entity recognition to extract information regarding time, location, names of persons and organizations, as well as events described within the document content. With structured representations of the content, called “metamodels”, we group the entries into clusters based on the similarity of the contents. Then we apply machine-learning algorithms to the clusters to carry out duplicate detection. We present results regarding precision, recall, and F-value of the proposed system.
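The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that duplicates are scored as unordered document pairs and that "F-value" means the balanced F1 score; neither detail is confirmed by this record. The document identifiers and the helper names `pair` and `precision_recall_f` are hypothetical.

```python
def pair(a, b):
    """Unordered pair of document ids (hypothetical representation)."""
    return frozenset((a, b))


def precision_recall_f(predicted, actual):
    """Score predicted duplicate pairs against known duplicate pairs.

    Assumes pairwise evaluation and the balanced F1 measure; the paper's
    actual evaluation protocol is not described in this record.
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Made-up example data, not results from the paper.
predicted = {pair("d1", "d2"), pair("d3", "d4"), pair("d5", "d6")}
actual = {pair("d1", "d2"), pair("d3", "d4"), pair("d7", "d8"), pair("d2", "d9")}

p, r, f = precision_recall_f(predicted, actual)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Using `frozenset` for pairs makes the comparison order-independent, so flagging (d1, d2) counts the same as flagging (d2, d1).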

Citation (APA)

Cordero Cruz, J. A., Garza, S. E., & Schaeffer, S. E. (2014). Entity recognition for duplicate filtering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8821, pp. 253–264). Springer Verlag. https://doi.org/10.1007/978-3-319-11988-5_24
