A vocabulary construction method based on difference between source document and summary for neural summarization model

Takuya Makino; Tomoya Iwakura

Journal Article, Open Access

Transactions of the Japanese Society for Artificial Intelligence (2020) 35(6) B-K46_1-8

DOI: 10.1527/tjsai.35-6_B-K46

Abstract

Pointer-generator, one of the strong baselines among neural summarization models, generates summaries by selecting words from a fixed set of words (the output vocabulary) and from the words in the source document. The conventional method for constructing the output vocabulary collects the most frequent words in the summaries of the training data. However, words that are frequent in summaries are also likely to be frequent in source documents. An output vocabulary constructed by the conventional method is therefore redundant for pointer-generator, because pointer-generator can already copy words from source documents. We propose a vocabulary construction method that, for each source-summary pair, selects the words that are included in the summary but not in its source text. Experimental results on the CNN/Daily Mail corpus and the NEWSROOM corpus showed that our method contributes to improved ROUGE scores while obtaining high ratios of generated novel words that do not occur in source documents.
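The contrast between the two vocabulary construction strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Whitespace tokenization, the function names `conventional_vocab` and `difference_vocab`, the toy `pairs` data, and ranking the difference words by frequency before truncating to a fixed size are all assumptions made for illustration; the abstract states only that the selected words occur in a summary but not in its paired source.

```python
from collections import Counter


def conventional_vocab(pairs, size):
    """Baseline: the `size` most frequent words over all summaries."""
    counts = Counter()
    for _source, summary in pairs:
        counts.update(summary.split())
    return [word for word, _ in counts.most_common(size)]


def difference_vocab(pairs, size):
    """Sketch of the difference-based idea: count only summary words that
    are absent from the paired source, then keep the most frequent ones.
    Ranking by frequency is an assumption, not taken from the abstract."""
    counts = Counter()
    for source, summary in pairs:
        source_words = set(source.split())
        counts.update(w for w in summary.split() if w not in source_words)
    return [word for word, _ in counts.most_common(size)]


# Hypothetical toy (source, summary) pairs, whitespace-tokenized.
pairs = [
    ("the cat sat on the mat", "a cat rested on the mat"),
    ("stocks fell sharply on monday", "stocks dropped on monday"),
    ("the team won the final game", "the team triumphed"),
]

print(conventional_vocab(pairs, 2))
print(difference_vocab(pairs, 10))
```

On this toy data the baseline spends its vocabulary slots on words such as "the" and "on", which a copy mechanism could already take from the source, while the difference-based list contains only words that have to be generated.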

Cite

Makino, T., & Iwakura, T. (2020). A vocabulary construction method based on difference between source document and summary for neural summarization model. Transactions of the Japanese Society for Artificial Intelligence, 35(6), B-K46_1-8. https://doi.org/10.1527/tjsai.35-6_B-K46
