What do Journalists do with Documents? Field Notes for Natural Language Processing Researchers

Jonathan Stray

Report


Abstract

Natural language processing and visualization systems have been proposed to help journalists analyze large sets of documents, but little has been written about what journalists do with documents in practice. We review a collection of 15 stories completed with the Overview document mining platform, characterizing the source material and reporting tasks. The median document set contained 4,000 documents, and the majority arrived as paper or scanned paper. In most cases journalists knew what they were looking for in advance, in contrast to the large research literature concerned with "exploring" a document set. We also review five cases where custom NLP techniques were used to produce a story, including applications of topic modeling, entity recognition, text classification, and sentiment analysis. Based on the cases in these two collections, we recommend six practice-driven themes for natural language processing researchers who want to assist journalists with large document sets: 1) robust import; 2) robust analysis; 3) search, not exploration; 4) quantitative summaries; 5) interactive methods; 6) clarity and accuracy.


Stray, J. (n.d.). What do Journalists do with Documents? Field Notes for Natural Language Processing Researchers.
