Abstract
Kurunthogai is a classical Tamil poetic masterpiece and the second book of Ettuthokai, one of the Sangam literary works. Its poems express the love life of men and women who lived during the Sangam age. Kurunthogai is a massive work written by many authors. The poems are set in five different landscapes, namely Kurinchi, Mullai, Marutham, Neythal, and Pālai, and therefore contain much valuable historical information related to these landscapes. This paper proposes a template-based Information Extraction (IE) framework for Kurunthogai that automatically extracts the names of flora, fauna, foods, vessels, and water bodies described in it. Furthermore, it extracts Noun Unigrams, Verb Unigrams, Adjective-Noun Bigrams, and Adverb-Verb Bigrams. A Tamil Morphological Analyzer tool has been used to extract the N-grams. State-of-the-art IE techniques have attempted to extract information from expository texts, whereas the proposed IE framework extracts information from a literature-based text. Existing techniques extract information from monolingual texts, whereas the proposed IE framework extracts information from bilingual texts. The proposed IE framework has achieved a precision of 88.8%. The proposed framework can be applied to any literary text and used in various applications of Natural Language Processing.
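To make the extraction targets named in the abstract concrete, the following is a minimal sketch of how template-style extraction over part-of-speech-tagged tokens might look. It is not the paper's implementation: the coarse tag names, the `LEXICON` entries, the English sample tokens, and the functions `extract` and `precision` are all hypothetical, and the tagged input is assumed to come from some morphological analyzer that is not shown here.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, coarse POS tag); tag names are an assumption

# Hypothetical category lexicon; the entries are illustrative, not from the paper.
LEXICON: Dict[str, str] = {
    "jasmine": "flora",
    "elephant": "fauna",
    "honey": "food",
    "pot": "vessel",
    "waterfall": "water body",
}

# Bigram templates: a pair of adjacent tags mapped to the slot it fills.
BIGRAM_TEMPLATES: Dict[Tuple[str, str], str] = {
    ("ADJ", "NOUN"): "adjective_noun",
    ("ADV", "VERB"): "adverb_verb",
}


def extract(tokens: List[Token]) -> Dict[str, list]:
    """Fill unigram, bigram, and entity slots from a tagged token sequence."""
    result: Dict[str, list] = {
        "noun_unigrams": [],
        "verb_unigrams": [],
        "adjective_noun": [],
        "adverb_verb": [],
        "entities": [],
    }
    # Unigram slots, plus a lexicon lookup for nouns.
    for word, tag in tokens:
        if tag == "NOUN":
            result["noun_unigrams"].append(word)
            category = LEXICON.get(word.lower())
            if category is not None:
                result["entities"].append((word, category))
        elif tag == "VERB":
            result["verb_unigrams"].append(word)
    # Bigram slots: match each adjacent tag pair against the templates.
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        slot = BIGRAM_TEMPLATES.get((t1, t2))
        if slot is not None:
            result[slot].append((w1, w2))
    return result


def precision(extracted: list, gold: list) -> float:
    """Fraction of distinct extracted items that also appear in the gold set."""
    extracted_set = set(extracted)
    if not extracted_set:
        return 0.0
    return len(extracted_set & set(gold)) / len(extracted_set)


# Illustrative English stand-in for a tagged line of verse.
sample = [
    ("fragrant", "ADJ"), ("jasmine", "NOUN"), ("slowly", "ADV"),
    ("blooms", "VERB"), ("near", "ADP"), ("the", "DET"), ("waterfall", "NOUN"),
]
output = extract(sample)
```

Keeping the templates in a dictionary means a new tag pair can be added without touching the matching loop. The `precision` helper follows the standard definition (correct extractions divided by all extractions); how the paper computed its reported figure is not described in the abstract.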
Citation
Subalalitha, C. N. (2019). Information extraction framework for Kurunthogai. Sadhana - Academy Proceedings in Engineering Sciences, 44(7). https://doi.org/10.1007/s12046-019-1140-y