We describe a new method for constructing custom taxonomies from document collections. It involves identifying relevant concepts and entities in text; linking them to knowledge sources like Wikipedia, DBpedia, Freebase, and any supplied taxonomies from related domains; disambiguating conflicting concept mappings; and selecting semantic relations that best group them hierarchically. An RDF model supports interoperability of these steps, and also provides a flexible way of including existing NLP tools and further knowledge sources. From 2,000 news articles we construct a custom taxonomy with 10,000 concepts and 12,700 relations, similar in structure to manually created counterparts. Evaluation by 15 human judges shows the precision to be 89% for concepts and 90% for relations; recall was 75% with respect to a manually generated taxonomy for the same domain. © 2013 Springer-Verlag Berlin Heidelberg.
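The four steps named in the abstract (identify, link, disambiguate, select relations) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy dictionaries `CANDIDATES`, `CONTEXT`, and `BROADER` stand in for knowledge sources such as Wikipedia or DBpedia, the context-overlap scoring and the `min_children` threshold are simple placeholders, and plain Python sets replace the RDF model. It is not the authors' implementation.

```python
# Hypothetical toy knowledge source; none of this data comes from the paper.
# Surface form -> candidate concepts it may refer to.
CANDIDATES = {
    "apple": ["Apple_Inc", "Apple_(fruit)"],
    "iphone": ["IPhone"],
    "microsoft": ["Microsoft"],
}
# Concept -> words that typically occur near it (used for disambiguation).
CONTEXT = {
    "Apple_Inc": {"iphone", "microsoft", "software"},
    "Apple_(fruit)": {"orchard", "juice"},
}
# Concept -> broader concepts (candidate hierarchical relations).
BROADER = {
    "Apple_Inc": ["Technology_company"],
    "Microsoft": ["Technology_company"],
    "IPhone": ["Smartphone"],
    "Apple_(fruit)": ["Fruit"],
}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".,;:").lower() for t in text.split()]


def disambiguate(term, tokens):
    """Pick the candidate concept whose context words overlap the document most."""
    return max(CANDIDATES[term], key=lambda c: len(CONTEXT.get(c, set()) & tokens))


def build_taxonomy(docs, min_children=2):
    """Return (concepts, relations) for a list of document strings."""
    concepts = set()
    for doc in docs:
        tokens = set(tokenize(doc))
        # Steps 1-3: identify known terms, link to candidates, disambiguate.
        for term in tokens & CANDIDATES.keys():
            concepts.add(disambiguate(term, tokens))

    # Step 4: keep only broader relations that group several concepts together.
    children = {}
    for concept in concepts:
        for parent in BROADER.get(concept, []):
            children.setdefault(parent, set()).add(concept)
    relations = {
        (child, parent)
        for parent, kids in children.items()
        if len(kids) >= min_children
        for child in kids
    }
    return concepts, relations


docs = [
    "Apple unveiled a new iPhone.",
    "Microsoft and Apple compete on software.",
]
concepts, relations = build_taxonomy(docs)
print(sorted(concepts))
print(sorted(relations))
```

In this toy run, "Apple" resolves to the company rather than the fruit because of the surrounding words, and only the parent that groups at least two concepts contributes relations. A real system would replace the overlap score and the threshold with whatever criteria the knowledge sources and domain support.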
Medelyan, O., Manion, S., Broekstra, J., Divoli, A., Huang, A. L., & Witten, I. H. (2013). Constructing a focused taxonomy from a document collection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7882 LNCS, pp. 367–381). Springer Verlag. https://doi.org/10.1007/978-3-642-38288-8_25