Extracting compound terms from domain corpora

Lucelene Lopes; Renata Vieira; Maria José Finatto; Daniel Martins

Journal Article (Open Access)

Journal of the Brazilian Computer Society (2010) 16(4) 247-259

DOI: 10.1007/s13173-010-0020-4

Abstract

The need for domain ontologies motivates research on extracting structured information from texts. A foundational part of this process is the identification of domain-relevant compound terms. This paper presents an evaluation of compound term extraction from a corpus in the domain of Pediatrics. Bigrams and trigrams were automatically extracted from a corpus composed of 283 texts from a Portuguese journal, Jornal de Pediatria, using three different extraction methods. Because these methods generate a large number of candidates, we analyzed the quality of the resulting terms across the different methods and cut-off points. The evaluation is reported using precision, recall and f-measure, which are computed against a manually built reference list of domain-relevant compounds. © 2010 The Brazilian Computer Society.
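The evaluation described in the abstract can be illustrated with a minimal sketch: a ranked candidate list is truncated at a cut-off point and compared against a reference list. This is not the authors' implementation. The function name `evaluate_at_cutoff`, the candidate ranking, and the example terms are all hypothetical, and f-measure is assumed here to be the balanced harmonic mean of precision and recall.

```python
def evaluate_at_cutoff(ranked_candidates, reference, cutoff):
    """Return (precision, recall, f_measure) for the top `cutoff` candidates.

    ranked_candidates: list of candidate terms, best first (hypothetical ranking).
    reference: set of terms from a manually built reference list.
    """
    selected = set(ranked_candidates[:cutoff])
    hits = len(selected & reference)  # candidates that are also reference terms
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative data only; these terms are not taken from the paper.
ranked = [
    "aleitamento materno",
    "presente estudo",
    "recém nascido",
    "faixa etária",
    "leite materno",
]
reference = {
    "aleitamento materno",
    "recém nascido",
    "leite materno",
    "pressão arterial",
}

for cutoff in (2, 5):
    p, r, f = evaluate_at_cutoff(ranked, reference, cutoff)
    print(f"cut-off={cutoff}: precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

A lower cut-off point keeps fewer candidates, which tends to trade recall for precision. Computing the three metrics at several cut-off points makes that trade-off visible for each extraction method.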

Cite

Lopes, L., Vieira, R., Finatto, M. J., & Martins, D. (2010). Extracting compound terms from domain corpora. Journal of the Brazilian Computer Society, 16(4), 247–259. https://doi.org/10.1007/s13173-010-0020-4
