Indexing Names of Persons in a Large Dataset of a Newspaper

Juliana P.C. Pirovani; Matheus Nogueira; Elias de Oliveira

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11122 LNAI 147-155

DOI: 10.1007/978-3-319-99722-3_15

Abstract

An index is an effective tool for finding information in a set of documents. The index tools currently available in both printed and digital newspaper editions are not sufficient to help users find information. Users must browse the entire newspaper to meet their needs, or discover only after considerable effort that the information they were seeking is not available. We propose to use state-of-the-art strategies for named entity extraction, applied specifically to person names, and to build an index of names that gives users a tool for finding names within newspaper pages. The state-of-the-art system considered used the Golden Collection of the First and Second HAREM, a reference for Named Entity Recognition systems in Portuguese, as training and test sets, respectively. Furthermore, we created a new training dataset from the newspaper’s own articles. In this case, we processed 100 articles of the newspaper and correctly found 87.0% of the extant names and their respective partial citations.
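
The idea of an index of names over newspaper pages can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the person names have already been extracted by some Named Entity Recognition system, and the function name `build_name_index`, the page identifiers, and the sample names are all hypothetical.

```python
from collections import defaultdict


def build_name_index(articles):
    """Map each person name to the sorted list of pages where it appears.

    `articles` is an iterable of (page, names) pairs, where `names` holds the
    person names that an NER step (assumed, not shown here) found on that page.
    """
    index = defaultdict(set)
    for page, names in articles:
        for name in names:
            # Collapse repeated whitespace so that the same name written with
            # different spacing shares a single index entry.
            key = " ".join(name.split())
            index[key].add(page)
    # Return a plain dict ordered alphabetically by name, like a printed index.
    return {name: sorted(pages) for name, pages in sorted(index.items())}


# Hypothetical input: page identifiers and names are invented for illustration.
extracted = [
    ("page-03", ["Maria Souza", "Ana Lima"]),
    ("page-07", ["Ana  Lima"]),
]
name_index = build_name_index(extracted)
```

Looking up `name_index["Ana Lima"]` then returns the pages on which that name was found, so a reader would not have to browse the whole newspaper. Resolving partial citations (for example, a surname alone referring to a full name seen earlier) would need an additional linking step that this sketch does not attempt.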

Citation (APA)

Pirovani, J. P. C., Nogueira, M., & de Oliveira, E. (2018). Indexing Names of Persons in a Large Dataset of a Newspaper. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11122 LNAI, pp. 147–155). Springer Verlag. https://doi.org/10.1007/978-3-319-99722-3_15
