A vocabulary construction method based on difference between source document and summary for neural summarization model

Abstract

The pointer-generator, one of the strong baselines among neural summarization models, generates summaries by selecting words from a fixed set of words (the output vocabulary) and from the words in the source document. The conventional method for constructing the output vocabulary collects the most frequent words in the summaries of the training data. However, words that are frequent in summaries are also likely to be frequent in source documents. An output vocabulary constructed by the conventional method is therefore redundant for the pointer-generator, because the pointer-generator can already copy words from the source document. We propose a vocabulary construction method that selects, for each document-summary pair, the words that are included in the summary but not in its source text. Experimental results on the CNN/Daily Mail corpus and the NEWSROOM corpus showed that our method contributes to improved ROUGE scores while achieving a high ratio of generated novel words, that is, words that do not occur in the source documents.
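The contrast between the two vocabulary construction strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, whitespace tokenization, lowercasing, ranking of candidate words by frequency, and the `size` cutoff are all assumptions made for illustration, since the abstract only states that the proposed method selects words that appear in a summary but not in its source text.

```python
from collections import Counter


def conventional_vocab(pairs, size):
    """Baseline: the `size` most frequent words across all summaries."""
    counts = Counter()
    for _source, summary in pairs:
        counts.update(summary.lower().split())
    return [word for word, _ in counts.most_common(size)]


def difference_vocab(pairs, size):
    """Sketch of the proposed idea: count only summary words that are
    absent from the paired source text.

    Ranking the candidates by frequency and truncating to `size` is an
    assumption; the abstract does not specify the selection criterion.
    """
    counts = Counter()
    for source, summary in pairs:
        source_words = set(source.lower().split())
        counts.update(
            word for word in summary.lower().split()
            if word not in source_words
        )
    return [word for word, _ in counts.most_common(size)]


# Toy (source, summary) pairs, invented purely for illustration.
pairs = [
    ("the cat sat on the mat", "a cat rested on the mat"),
    ("stocks fell sharply on monday", "markets dropped on monday"),
    ("the team won the final game", "the team triumphed"),
]

baseline = conventional_vocab(pairs, 2)
proposed = difference_vocab(pairs, 10)
```

On this toy data the two baseline entries are words that also occur in the source documents, so a copy mechanism could already produce them, whereas every entry in the difference-based vocabulary is a word the model could not obtain by copying from its paired source.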

Cite

Makino, T., & Iwakura, T. (2020). A vocabulary costruction method based on difference between source document and summary for neural summarization model. Transactions of the Japanese Society for Artificial Intelligence, 35(6), B-K46_1-8. https://doi.org/10.1527/tjsai.35-6_B-K46
