Modeling common real-word relations using triples extracted from n-grams

Ruben Sipoš; Dunja Mladenić; Marko Grobelnik; Janez Brank

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009), Vol. 5926 LNCS, pp. 16-30

DOI: 10.1007/978-3-642-10871-6_2

Abstract

In this paper, we present an approach that provides generalized relations for automatic ontology building based on frequent word n-grams. Using publicly available Google n-grams as our data source, we extract relations in the form of triples and compute generalized, more abstract models. We propose an algorithm for building abstractions of the extracted triples using WordNet as background knowledge. We also present a novel heuristic approach to triple extraction, which achieves notably better results than deep parsing applied to n-grams. This allows us to represent information gathered from the web as a set of triples modeling the common and frequent relations expressed in natural language. Our results could be used in different settings, for example as a knowledge base for reasoning or simply as statistical data for improving the understanding of natural languages. © 2009 Springer-Verlag Berlin Heidelberg.
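As a rough illustration of what "building abstractions of the extracted triples" could look like, the sketch below replaces the subject and object of each triple with a more general term and sums the frequencies of triples that collapse to the same generalized form. It is not the authors' algorithm: the `HYPERNYMS` dictionary is a hypothetical toy stand-in for WordNet, the function names are invented for this example, and the triples and counts are made-up values rather than Google n-gram data.

```python
from collections import Counter

# Hypothetical toy hypernym map standing in for WordNet (illustration only).
HYPERNYMS = {
    "dog": "animal",
    "cat": "animal",
    "water": "liquid",
    "milk": "liquid",
}


def generalize(word):
    """Return the toy hypernym of a word, or the word itself if none is known."""
    return HYPERNYMS.get(word, word)


def abstract_triples(triples):
    """Generalize subject and object of each triple and sum the counts.

    Each input item is (subject, verb, object, count). The result maps
    each generalized (subject, verb, object) triple to its total count.
    """
    totals = Counter()
    for subj, verb, obj, count in triples:
        totals[(generalize(subj), verb, generalize(obj))] += count
    return totals


# Invented example triples with invented frequencies (assumption, not real data).
triples = [
    ("dog", "drinks", "water", 120),
    ("cat", "drinks", "milk", 80),
    ("cat", "drinks", "water", 40),
]

abstracted = abstract_triples(triples)
for triple, count in abstracted.most_common():
    print(triple, count)
```

With this toy input, all three specific triples merge into a single generalized triple whose count is the sum of the originals. A real system would also have to decide how far up the hierarchy to generalize, which this sketch does not address.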

Citation (APA)

Sipoš, R., Mladenić, D., Grobelnik, M., & Brank, J. (2009). Modeling common real-word relations using triples extracted from n-grams. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5926 LNCS, pp. 16–30). https://doi.org/10.1007/978-3-642-10871-6_2
