A supervised learning to rank approach for dependency based concept extraction and repository based boosting for domain text indexing

Abstract

In conventional information retrieval systems, keywords extracted from documents are indexed and used for retrieval. Because the same information can be expressed with different keywords, retrieving relevant documents is hindered. Concept-based indexing and retrieval, which identifies semantically similar documents, overcomes this problem by mapping document phrases to a domain repository. This paper explores the problem of extracting and ranking concepts, i.e., key phrases, from domain-oriented text. Concepts are ranked not only on statistical and cue features but also on the dependency relations in which each candidate concept occurs. For each candidate, a vector is formed from the phrase weight and the dependency relations. The features used to score the phrases, and to weight the vector corresponding to each candidate for re-ranking, are the cue features (presence in the title or abstract), the C-value for multi-word phrases, the frequency of occurrence, and the type of dependency relation. RankingSVM then ranks the candidate concepts based on these feature vectors. In addition, to make the ranking domain-sensitive and to determine the domain relevance of the candidate concepts, they are fully or partially matched against the domain repository. Domain-relevant concepts are boosted in the ranking according to the depth of the concept and the presence of its parent and siblings. The results indicate that using the dependency-based context vector together with the domain repository substantially improves key phrase extraction compared with other methods.
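
As an illustration of the features named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It computes the standard C-value for multi-word candidates and assembles one feature vector per candidate. The function names, the ordering of features, and the set of dependency relation labels (`nsubj`, `dobj`, `amod`) are assumptions made for the example; the abstract does not specify them.

```python
import math
from collections import defaultdict

# Hypothetical set of dependency relation types; the abstract does not list them.
DEP_RELATIONS = ("nsubj", "dobj", "amod")


def is_contiguous_sublist(short, long):
    """Return True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(freq):
    """Compute C-values for candidate phrases.

    `freq` maps each candidate (a tuple of tokens) to its frequency.
    Non-nested candidate:  log2(|a|) * f(a)
    Nested candidate:      log2(|a|) * (f(a) - mean frequency of the
                           longer candidates that contain a)
    """
    containers = defaultdict(list)
    for a in freq:
        for b in freq:
            if len(b) > len(a) and is_contiguous_sublist(a, b):
                containers[a].append(b)

    scores = {}
    for a, f_a in freq.items():
        length_weight = math.log2(len(a))  # 0 for single words
        longer = containers.get(a, [])
        if longer:
            penalty = sum(freq[b] for b in longer) / len(longer)
            scores[a] = length_weight * (f_a - penalty)
        else:
            scores[a] = length_weight * f_a
    return scores


def feature_vector(frequency, cval, in_title, in_abstract, relations):
    """Build [title cue, abstract cue, C-value, frequency, relation counts...].

    The layout is an assumption made for this sketch.
    """
    relation_counts = [float(relations.count(r)) for r in DEP_RELATIONS]
    return [float(in_title), float(in_abstract), cval, float(frequency)] + relation_counts


# Toy candidates with made-up frequencies, for illustration only.
freq = {
    ("information", "retrieval"): 5,
    ("information", "retrieval", "system"): 2,
    ("retrieval", "system"): 3,
}
scores = c_value(freq)
phrase = ("information", "retrieval")
vector = feature_vector(freq[phrase], scores[phrase], True, False,
                        ["nsubj", "amod", "nsubj"])
```

Vectors of this kind, paired with relevance labels, are what a pairwise learning-to-rank model such as a ranking SVM would be trained on. The repository-based boosting step would then adjust the resulting order.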

Cite

Naadan, U. K., Geetha, T. V., Kanimozhi, U., Manjula, D., Viswapriya, R., & Karthik, C. (2018). A supervised learning to rank approach for dependency based concept extraction and repository based boosting for domain text indexing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10859 LNCS, pp. 428–436). Springer Verlag. https://doi.org/10.1007/978-3-319-91947-8_44