Abstract
Natural language processing and visualization systems have been proposed to help journalists analyze large sets of documents, but very little has been said about what journalists do with documents in practice. We review a collection of 15 stories completed with the Overview document mining platform, characterizing the source material and reporting tasks. The median document set contained 4,000 documents, and the majority arrived as paper or scanned paper. In most cases journalists knew what they were looking for in advance, in contrast to the large research literature concerned with "exploring" a document set. We also review five cases where custom NLP techniques were used to produce a story, including applications of topic modeling, entity recognition, text classification, and sentiment analysis. Based on the cases in these two collections, we recommend six practice-driven themes for natural language processing researchers who want to assist journalists with large document sets: 1) Robust import. 2) Robust analysis. 3) Search, not exploration. 4) Quantitative summaries. 5) Interactive methods. 6) Clarity and accuracy.
Stray, J. (n.d.). What do Journalists do with Documents? Field Notes for Natural Language Processing Researchers.