An efficient training dataset generation method for extractive text summarization


Abstract

This work presents a method for automatically generating a training dataset for summarizing text documents using a feature extraction technique. The goal is a dataset that helps perform summarization much as a human would. A document summary is a text produced from one or more source texts that conveys the important information in them. The proposed system consists of three stages: preprocessing, feature extraction, and training-dataset generation. The system is implemented using 50 test documents from DUC2002. Each document is cleaned with preprocessing techniques: sentence segmentation, tokenization, stop-word removal, and word stemming. Eight important features are extracted for each sentence and become the attributes of the training dataset. Good summarization quality requires a high-quality training dataset, and the proposed system aims to generate a well-defined training dataset that is large enough and noise-free. The training dataset is built on a common set of features that can be used for all data mining subtasks. A preliminary subjective evaluation indicates that the generated training dataset is effective and efficient and that the system's performance is promising.
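The abstract names the preprocessing steps and a per-sentence feature stage, but not the eight features themselves. The Python sketch that follows illustrates the general shape of such a pipeline under stated assumptions: the stop-word list, the crude suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's), and the five features (`position`, `length`, `title_overlap`, `avg_tf`, `has_number`) are hypothetical illustrations, not the paper's method or its eight features. It stops at unlabeled attribute rows, one per sentence, serialised as CSV.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "be", "by", "with", "that", "this", "it",
}


def segment_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase word tokens made of letters and digits only."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def crude_stem(word):
    """Hypothetical suffix stripping; a rough stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Return (raw sentences, per-sentence lists of stemmed non-stop-word tokens)."""
    sentences = segment_sentences(text)
    cleaned = [
        [crude_stem(t) for t in tokenize(s) if t not in STOP_WORDS]
        for s in sentences
    ]
    return sentences, cleaned


def sentence_features(sentences, cleaned, title):
    """One attribute row per sentence (illustrative features, not the paper's eight)."""
    title_terms = {crude_stem(t) for t in tokenize(title) if t not in STOP_WORDS}
    doc_tf = Counter(term for terms in cleaned for term in terms)
    max_len = max((len(terms) for terms in cleaned), default=0) or 1
    n = len(sentences)
    rows = []
    for i, (raw, terms) in enumerate(zip(sentences, cleaned)):
        overlap = len(title_terms & set(terms)) / len(title_terms) if title_terms else 0.0
        rows.append({
            # Earlier sentences score higher: 1.0 for the first, 1/n for the last.
            "position": (n - i) / n,
            # Cleaned length relative to the longest cleaned sentence.
            "length": len(terms) / max_len,
            # Fraction of title terms that also appear in the sentence.
            "title_overlap": overlap,
            # Mean document-level frequency of the sentence's terms (unscaled).
            "avg_tf": sum(doc_tf[t] for t in terms) / len(terms) if terms else 0.0,
            # 1 if the raw sentence contains any digit, else 0.
            "has_number": int(bool(re.search(r"\d", raw))),
        })
    return rows


def to_csv(rows):
    """Serialise rows as CSV text: one line per sentence, one column per attribute."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For a document, calling `preprocess(text)`, then `sentence_features(sentences, cleaned, title)`, then `to_csv(rows)` yields a header line plus one line per sentence. Position, length, and title overlap are scaled to [0, 1] so that rows from documents of different lengths stay comparable; `avg_tf` is left unscaled. Assigning summary/non-summary labels to the rows is outside this sketch.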

Citation (APA)

Hannah, E., & Mukherjee, S. (2014). An efficient training dataset generation method for extractive text summarization. In Advances in Intelligent Systems and Computing (Vol. 236, pp. 955–963). Springer Verlag. https://doi.org/10.1007/978-81-322-1602-5_101
