An efficient training dataset generation method for extractive text summarization

Esther Hannah; Saswati Mukherjee

Conference Proceedings


Advances in Intelligent Systems and Computing (2014) 236 955-963

DOI: 10.1007/978-81-322-1602-5_101


Abstract

This work presents a method that uses feature extraction to automatically generate a training dataset for summarizing text documents. The goal of this approach is to design a dataset that helps perform summarization much as a human would. A document summary is a text, produced from one or more texts, that conveys the important information in the original texts. The proposed system consists of pre-processing, feature extraction, and training dataset generation. To implement the system, 50 test documents from DUC2002 are used. Each document is cleaned with preprocessing techniques such as sentence segmentation, tokenization, stop-word removal, and word stemming. Eight important features are extracted for each sentence and converted into attributes of the training dataset. A high-quality training dataset is needed to achieve good document summarization, and the proposed system aims to generate a well-defined training dataset that is sufficiently large and noise-free for text summarization. The training dataset uses a set of common features that can be used for all subtasks of data mining. A preliminary subjective evaluation shows that our training dataset is effective and efficient, and that the performance of the system is promising.
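The pipeline described above (preprocessing, per-sentence feature extraction, one dataset row per sentence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the eight features, so the five used here (sentence position, length, title-word overlap, term-frequency weight, presence of numbers) are hypothetical stand-ins drawn from common extractive-summarization practice. The stop-word list and suffix-stripping stemmer are deliberately tiny toys in place of a full stop list and a real stemmer such as Porter's, and the row layout (`doc_id`, `sent_idx` plus feature columns) is likewise an assumption.

```python
import csv
import re
import sys
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (a real system would use a full list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "or",
              "for", "on", "with", "this", "that", "it", "by", "as", "be"}


def segment_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens; digits are kept for the numeric feature."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def stem(word):
    """Toy suffix stripper standing in for a real stemmer (assumption, not Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Tokenize, remove stop words, then stem."""
    return [stem(t) for t in tokenize(sentence) if t not in STOP_WORDS]


def sentence_features(text, title):
    """Return one feature dict per sentence; the features are illustrative assumptions."""
    sentences = segment_sentences(text)
    processed = [preprocess(s) for s in sentences]
    title_terms = set(preprocess(title))
    doc_tf = Counter(t for terms in processed for t in terms)
    max_len = max((len(terms) for terms in processed), default=1) or 1
    max_tf = max(doc_tf.values(), default=1)
    rows = []
    for i, (raw, terms) in enumerate(zip(sentences, processed)):
        n = len(terms)
        rows.append({
            "position": 1 - i / len(sentences),  # earlier sentences score higher
            "length": n / max_len,  # normalized by the longest sentence
            "title_overlap": (len(title_terms & set(terms)) / len(title_terms)
                              if title_terms else 0.0),
            "term_weight": sum(doc_tf[t] for t in terms) / (n * max_tf) if n else 0.0,
            "has_number": int(any(t.isdigit() for t in tokenize(raw))),
        })
    return rows


def build_training_rows(documents):
    """documents: iterable of (doc_id, title, text); yields one row per sentence."""
    dataset = []
    for doc_id, title, text in documents:
        for idx, row in enumerate(sentence_features(text, title)):
            dataset.append({"doc_id": doc_id, "sent_idx": idx, **row})
    return dataset


if __name__ == "__main__":
    docs = [("d1", "Extractive summarization of documents",
             "Summarization selects important sentences. "
             "The DUC 2002 corpus has 50 documents. Stemming reduces words.")]
    rows = build_training_rows(docs)
    writer = csv.DictWriter(sys.stdout, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```

Each feature is scaled to roughly [0, 1] so that rows from documents of different lengths are comparable in one dataset; the resulting CSV could then be fed to any classifier, with the choice of target label left open here because the abstract does not specify one.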


Citation (APA)

Hannah, E., & Mukherjee, S. (2014). An efficient training dataset generation method for extractive text summarization. In Advances in Intelligent Systems and Computing (Vol. 236, pp. 955–963). Springer Verlag. https://doi.org/10.1007/978-81-322-1602-5_101
