Application of NLP for Information Extraction from Unstructured Documents

Abstract

The world is intrigued by data. In fact, large amounts of capital are invested in devising means to compute statistics and extract analytics from data. However, the studies performed on applicant tracking systems that retrieve valuable information from candidates’ CVs and job descriptions are mostly rule-based and rarely employ contemporary techniques. Even though these documents vary in content, their structure is almost identical. Accordingly, in this paper, we implement an NLP pipeline for extracting such structured information from a wide variety of textual documents. As a reference, we consider the textual documents used in applicant tracking systems, such as CVs (curricula vitae) and job vacancy descriptions. The proposed NLP pipeline combines several NLP techniques: document classification, document segmentation and text extraction. First, support vector machines (SVM) and XGBoost are implemented to classify the textual documents. The different segments of the identified document are then categorized using NLP techniques such as chunking, regex matching and part-of-speech (POS) tagging. Finally, relevant information is extracted from each segment using techniques such as named entity recognition (NER), regex matching and pool parsing. Extracting such structured information from textual documents yields insights that can be used for document maintenance, document scoring, matching and auto-filling forms.
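
To make the classification step concrete, here is a minimal sketch of a linear SVM that labels a document as a CV or a job description. It assumes plain bag-of-words counts as features and trains the soft-margin hinge-loss objective with stochastic subgradient descent, using only the Python standard library; the XGBoost classifier the paper also uses is not shown. The toy corpus, the labels and the helper names `featurize`, `train_linear_svm` and `classify` are hypothetical and are not the authors' implementation. In practice one would more likely use scikit-learn's `TfidfVectorizer` together with `LinearSVC`.

```python
import re
from collections import Counter


def featurize(text: str) -> Counter:
    """Bag-of-words term counts (a simple stand-in for TF-IDF features)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def score(w: dict, x: Counter) -> float:
    """Linear decision function w . x (no bias term, for simplicity)."""
    return sum(w.get(term, 0.0) * count for term, count in x.items())


def train_linear_svm(docs, labels, lam=0.01, eta=0.1, epochs=20):
    """Soft-margin linear SVM trained by stochastic subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).
    Labels must be +1 or -1; the data is visited in a fixed order.
    """
    w: dict = {}
    feats = [featurize(d) for d in docs]
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            margin = y * score(w, x)
            # Gradient of the L2 regulariser: shrink every weight.
            for term in w:
                w[term] *= 1 - eta * lam
            # Hinge-loss subgradient: only margin violators move w.
            if margin < 1:
                for term, count in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * count
    return w


def classify(w: dict, text: str) -> str:
    return "cv" if score(w, featurize(text)) > 0 else "job_description"


# Hypothetical toy corpus: +1 = CV, -1 = job description.
docs = [
    "Education: BSc Computer Science. Experience: software intern. Skills: Python",
    "Curriculum vitae. Skills: Java, SQL. Education: MSc. Experience: developer",
    "We are hiring a backend engineer. Requirements: 3 years experience. Apply now",
    "Job vacancy: data analyst. Responsibilities include reporting. Requirements: SQL",
]
labels = [1, 1, -1, -1]

w = train_linear_svm(docs, labels)
print(classify(w, "Skills: C++. Education: BEng"))  # → cv
print(classify(w, "Hiring now, apply before the deadline: requirements listed below"))  # → job_description
```

Words seen only in the toy CVs (such as "skills" and "education") end up with positive weights, and words seen only in the toy vacancies (such as "requirements" and "hiring") end up with negative weights, which is what drives the two predictions.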
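
The segmentation step can be illustrated with its regex-matching component alone. The sketch below splits a CV into labelled segments wherever a line matches a known section heading; the chunking and POS tagging that the paper also uses are not shown. The heading vocabulary, the `header` label for text before the first heading, and the function name `segment_cv` are hypothetical assumptions made for illustration.

```python
import re

# Hypothetical heading vocabulary mapping surface forms to segment labels;
# a real system would need a much larger list of section titles.
SECTION_HEADINGS = {
    "education": "education",
    "experience": "experience",
    "work experience": "experience",
    "skills": "skills",
    "technical skills": "skills",
    "contact": "contact",
}

# A line counts as a heading if it consists only of a known title,
# optionally followed by a colon. Longer titles are tried first.
HEADING_RE = re.compile(
    r"^\s*("
    + "|".join(sorted(map(re.escape, SECTION_HEADINGS), key=len, reverse=True))
    + r")\s*:?\s*$",
    re.IGNORECASE,
)


def segment_cv(text: str) -> dict[str, str]:
    """Split a CV into labelled segments at known heading lines."""
    segments: dict[str, list[str]] = {"header": []}
    current = "header"
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            current = SECTION_HEADINGS[match.group(1).lower()]
            segments.setdefault(current, [])
        elif line.strip():
            segments[current].append(line.strip())
    # Drop segments that ended up empty.
    return {name: "\n".join(lines) for name, lines in segments.items() if lines}


cv = """Jane Doe
jane.doe@example.com

EDUCATION
BSc Computer Science, 2019

Work Experience:
Software Engineer, Acme Corp, 2019-2022

Skills
Python, SQL, Docker
"""
parts = segment_cv(cv)
print(sorted(parts))  # → ['education', 'experience', 'header', 'skills']
print(parts["skills"])  # → Python, SQL, Docker
```

Mapping several surface forms ("Work Experience", "Experience") to one label keeps the downstream extraction step independent of how a particular CV phrases its headings.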
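
For the extraction step, the regex-matching part can be sketched as a set of patterns applied to a single segment. NER, which would normally rely on a trained model, and pool parsing are not shown. The patterns below are hypothetical and deliberately simple: the phone pattern, for instance, only recognises international numbers written with a leading "+" country code, and the function name `extract_fields` is an illustrative assumption.

```python
import re

# Hypothetical, deliberately loose patterns; production CV parsers need
# locale-aware phone and date handling.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Only international numbers written with a leading "+" country code.
PHONE_RE = re.compile(r"\+\d{1,3}[ -]?\d[\d -]{6,}\d")
# Four-digit years from 1900 to 2099.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_fields(segment: str) -> dict[str, list[str]]:
    """Pull e-mail addresses, phone numbers and years out of one segment."""
    return {
        "emails": EMAIL_RE.findall(segment),
        "phones": PHONE_RE.findall(segment),
        "years": YEAR_RE.findall(segment),
    }


segment = "Jane Doe\njane.doe@example.com\n+977 98-1234-5678\nBSc Computer Science, 2015-2019"
print(extract_fields(segment))
# → {'emails': ['jane.doe@example.com'], 'phones': ['+977 98-1234-5678'], 'years': ['2015', '2019']}
```

Requiring the "+" prefix is a trade-off: it avoids misreading year ranges such as "2015-2019" as phone numbers, at the cost of missing numbers written without a country code.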

Citation (APA)

Pudasaini, S., Shakya, S., Lamichhane, S., Adhikari, S., Tamang, A., & Adhikari, S. (2022). Application of NLP for Information Extraction from Unstructured Documents. In Lecture Notes in Networks and Systems (Vol. 209, pp. 695–704). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-16-2126-0_54