Concept Recognition with Convolutional Neural Networks to Optimize Keyphrase Extraction

Andreas Waldis; Luca Mazzola; Michael Kaufmann

Conference Proceedings

Communications in Computer and Information Science (2019) 862 160-188

DOI: 10.1007/978-3-030-26636-3_8

Abstract

For knowledge management purposes, it is useful to automatically classify and tag documents based on their content. Keyphrase extraction is one way of achieving this, using statistical or semantic methods. Whereas corpus-index-based keyphrase extraction can extract relevant concepts for documents, the inverse document index grows exponentially with the number of words that candidate concepts can have. Document-based heuristics can solve this issue, but often result in keyphrases that are not concepts. To increase concept precision, that is, the percentage of extracted keyphrases that represent actual concepts, we contribute a method to filter keyphrases based on a pre-trained convolutional neural network (CNN). We tested CNNs containing vertical and horizontal filters that learn, from a training set of labeled examples, to decide whether an n-gram (i.e., a consecutive sequence of n words) is a concept or not. The classification training signal is derived from the Wikipedia corpus, assuming that an n-gram certainly represents a concept if a corresponding Wikipedia page title exists. The CNN input feature is the vector representation of each word, derived from a word embedding model; the output is the probability that an n-gram represents a concept. Multiple configurations for vertical and horizontal filters are analyzed and tuned through a hyperparameter optimization process. The results demonstrated an average concept precision for extracted keyphrases of between 60% and 80%. Consequently, by applying a CNN-based concept recognition filter, the concept precision of keyphrase extraction was significantly improved. For an optimal parameter configuration with an average of five extracted keyphrases per document, the concept precision could be increased from 0.65 to 0.8, meaning that on average, at least four out of five keyphrases extracted by our algorithm were actual concepts verified by Wikipedia titles.
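The concept precision metric and the filtering step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0.5 threshold, the title set, and the per-phrase scores are all hypothetical, and a fixed lookup table stands in for the probabilities that the trained CNN would produce.

```python
from typing import Callable, Iterable, List, Set


def concept_precision(keyphrases: Iterable[str], concept_titles: Set[str]) -> float:
    """Fraction of keyphrases whose lowercased text matches a known concept title."""
    phrases = [p.strip().lower() for p in keyphrases]
    if not phrases:
        return 0.0
    hits = sum(1 for p in phrases if p in concept_titles)
    return hits / len(phrases)


def filter_keyphrases(
    keyphrases: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Keep only keyphrases that the classifier scores at or above the threshold."""
    return [p for p in keyphrases if score(p) >= threshold]


# Hypothetical stand-in for the set of Wikipedia page titles (lowercased).
titles = {"neural network", "keyphrase extraction", "word embedding"}

# Hypothetical output of a document-based keyphrase extractor.
extracted = [
    "neural network",
    "keyphrase extraction",
    "based on their",
    "word embedding",
    "would be useful",
]

# Made-up probabilities standing in for the CNN output; note the imperfect
# score for "would be useful", which passes the filter although it is no concept.
toy_scores = {
    "neural network": 0.93,
    "keyphrase extraction": 0.88,
    "based on their": 0.12,
    "word embedding": 0.71,
    "would be useful": 0.64,
}

before = concept_precision(extracted, titles)
kept = filter_keyphrases(extracted, lambda p: toy_scores[p])
after = concept_precision(kept, titles)

print(f"before filtering: {before:.2f}")  # 3 of 5 phrases are concepts
print(f"after filtering:  {after:.2f}")   # 3 of 4 remaining phrases are concepts
```

With these made-up inputs, filtering removes one non-concept phrase and the precision rises from 0.60 to 0.75; the figures illustrate the mechanism only and are unrelated to the results reported in the paper.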

Cite (APA)

Waldis, A., Mazzola, L., & Kaufmann, M. (2019). Concept Recognition with Convolutional Neural Networks to Optimize Keyphrase Extraction. In Communications in Computer and Information Science (Vol. 862, pp. 160–188). Springer Verlag. https://doi.org/10.1007/978-3-030-26636-3_8
