Automatic extraction-based document summarization is a difficult Natural Language Processing task. Previous approaches have usually generated the summary by extracting the top K salient sentences using graph-based ranking algorithms, but their sentence feature representations capture only surface relationships between sentences, so the results may not accurately reflect the user’s intentions. To address this challenge, we propose a method that: (1) obtains deeper semantic concepts among candidate sentences using meaningful sentence vectors that combine word vectors and TF-IDF; (2) ranks the sentences on a heterogeneous graph, considering both the relationships between sentences and the user’s intention for each sentence, to identify significant sentences; and (3) generates the result sentence by sentence to ensure that the summary semantics are properly related to the original document. We verified the proposed approach experimentally using the English summarization benchmark datasets DUC2001 and DUC2002 and the large Chinese summarization dataset LCSTS. We also collected news data, had a group of bank auditing experts produce reference summaries, and compared the output of the proposed approach against them using ROUGE evaluation.
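The three steps listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how word vectors and TF-IDF are combined or how the heterogeneous graph is built, so the sketch assumes a TF-IDF-weighted average of word vectors and swaps in a personalized PageRank over a plain sentence-similarity graph, with the user's intention represented as a query vector. The toy word vectors, the `intent` vector, and all function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(sentences):
    """Per-sentence TF-IDF weights, treating each tokenized sentence as a document."""
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(s))
    return [
        # Smoothed IDF keeps weights positive even for words found in every sentence.
        {w: (c / len(s)) * (math.log((1 + n) / (1 + df[w])) + 1.0)
         for w, c in Counter(s).items()}
        for s in sentences
    ]

def sentence_vector(weights, word_vectors, dim):
    """Assumed combination: TF-IDF-weighted average of a sentence's word vectors."""
    vec, total = [0.0] * dim, 0.0
    for w, wt in weights.items():
        if w in word_vectors:
            total += wt
            for i, x in enumerate(word_vectors[w]):
                vec[i] += wt * x
    return [x / total for x in vec] if total > 0 else vec

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def rank_sentences(vectors, intent, damping=0.85, iters=50):
    """Personalized PageRank: edges are sentence similarities, and the
    teleport distribution favors sentences close to the intent vector."""
    n = len(vectors)
    sim = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out = [sum(row) for row in sim]
    bias = [max(cosine(v, intent), 0.0) for v in vectors]
    total = sum(bias)
    bias = [b / total for b in bias] if total > 0 else [1.0 / n] * n
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) * bias[i]
            + damping * sum(scores[j] * sim[j][i] / out[j]
                            for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores

def summarize(sentences, word_vectors, intent, k=2):
    """Return the indices of the top-k sentences, in document order."""
    dim = len(intent)
    vectors = [sentence_vector(w, word_vectors, dim)
               for w in tfidf_weights(sentences)]
    scores = rank_sentences(vectors, intent)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)

# Hypothetical 2-dimensional word vectors and a toy tokenized document.
word_vectors = {
    "bank": [1.0, 0.0], "audit": [0.9, 0.1], "loan": [0.8, 0.2],
    "risk": [0.7, 0.3], "football": [0.0, 1.0], "match": [0.1, 0.9],
}
sentences = [
    ["bank", "audit", "risk"],
    ["loan", "risk", "bank"],
    ["football", "match"],
]
intent = word_vectors["audit"]  # assumed stand-in for the user's intention
selected = summarize(sentences, word_vectors, intent, k=2)
```

Emitting the selected sentences in document order is one simple reading of "sentence by sentence" generation; the paper's actual procedure may differ.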
CITATION STYLE
Wang, H. Y., Chang, J. W., & Huang, J. W. (2019). User intention-based document summarization on heterogeneous sentence networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11447 LNCS, pp. 572–587). Springer Verlag. https://doi.org/10.1007/978-3-030-18579-4_34