The pointer-generator, one of the strong baselines among neural summarization models, generates summaries by selecting words from a fixed set of words (the output vocabulary) and from the words in the source document. The conventional method for constructing the output vocabulary collects words that are highly frequent in the summaries of the training data. However, words that are highly frequent in summaries are also likely to be frequent in the source documents. Thus, an output vocabulary constructed by the conventional method is redundant for the pointer-generator, because the pointer-generator can copy words from source documents. We propose a vocabulary construction method that, for each source-summary pair, selects words that appear in the summary but not in its source text. Experimental results on the CNN/Daily Mail and NEWSROOM corpora showed that our method contributes to improved ROUGE scores while achieving a high ratio of generated novel words, i.e., words that do not occur in the source documents.
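The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the details, so the following are assumptions for illustration only: whitespace tokenization with lowercasing, ranking the candidate words by how often they occur as summary-only tokens across pairs, and the hypothetical names `build_difference_vocab`, `pairs`, and `vocab_size`.

```python
from collections import Counter


def build_difference_vocab(pairs, vocab_size):
    """Collect words that occur in a summary but not in its own source.

    pairs: iterable of (source_text, summary_text) tuples.
    vocab_size: maximum number of words to keep (assumed cutoff).
    Ranking by frequency is an assumption, not confirmed by the abstract.
    """
    counts = Counter()
    for source, summary in pairs:
        # Words the pointer mechanism could copy from this source document.
        source_words = set(source.lower().split())
        for word in summary.lower().split():
            # Count only words that cannot be copied from the paired source.
            if word not in source_words:
                counts[word] += 1
    return [word for word, _ in counts.most_common(vocab_size)]


# Toy data, invented purely for illustration.
pairs = [
    ("the cat sat on the mat", "a cat rested on the mat"),
    ("stocks fell sharply on monday", "a market decline hit stocks"),
]
vocab = build_difference_vocab(pairs, vocab_size=10)
```

In this toy example, words such as "cat" and "stocks" are excluded because each can be copied from its own source, whereas a conventional summary-frequency vocabulary would include them.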
CITATION STYLE
Makino, T., & Iwakura, T. (2020). A vocabulary construction method based on difference between source document and summary for neural summarization model. Transactions of the Japanese Society for Artificial Intelligence, 35(6), B-K46_1-8. https://doi.org/10.1527/tjsai.35-6_B-K46