Bag of What? Simple Noun Phrase Extraction for Text Analysis

Abstract

Social scientists who do not have specialized natural language processing training often use a unigram bag-of-words (BOW) representation when analyzing text corpora. We offer a new phrase-based method, NPFST, for enriching a unigram BOW. NPFST uses a part-of-speech tagger and a finite state transducer to extract multiword phrases, which are then added to a unigram BOW. We compare NPFST to both n-gram and parsing methods in terms of yield, recall, and efficiency. We then demonstrate how to use NPFST for exploratory analyses; it performs well, without configuration, on many different kinds of English text. Finally, we present a case study using NPFST to analyze a new corpus of U.S. congressional bills.
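The general idea of enriching a unigram BOW with tag-pattern phrases can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' NPFST implementation. It assumes the text has already been part-of-speech tagged with Penn Treebank tags by some tagger. It substitutes a regular expression over single-letter coarse tags for a hand-built transducer; both recognize the same class of regular patterns. The coarse tag map, the pattern `(A|N)*N(PD*(A|N)*N)*`, the `max_len` cap, and the function names are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical coarse tag map (assumption): Penn Treebank tag -> one letter.
# A = adjective, N = noun, P = preposition, D = determiner, O = anything else.
COARSE = {
    "JJ": "A", "JJR": "A", "JJS": "A",
    "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N",
    "IN": "P",
    "DT": "D",
}

# Assumed noun-phrase grammar over coarse tags: adjectives/nouns ending in a
# noun, optionally extended by "preposition + determiners + adj/noun* + noun".
PHRASE_RE = re.compile(r"(A|N)*N(PD*(A|N)*N)*")


def coarse_tags(tagged):
    """Map (word, tag) pairs to a string with one coarse letter per token."""
    return "".join(COARSE.get(tag, "O") for _, tag in tagged)


def extract_phrases(tagged, max_len=8):
    """Return every span of 2..max_len tokens whose coarse tags fully match
    PHRASE_RE. Overlapping and nested spans are all kept."""
    tags = coarse_tags(tagged)
    n = len(tagged)
    phrases = []
    for i in range(n):
        for j in range(i + 2, min(n, i + max_len) + 1):
            if PHRASE_RE.fullmatch(tags[i:j]):
                phrases.append(" ".join(w.lower() for w, _ in tagged[i:j]))
    return phrases


def phrase_enriched_bow(tagged, max_len=8):
    """Unigram counts plus counts of the extracted multiword phrases."""
    bow = Counter(w.lower() for w, _ in tagged)
    bow.update(extract_phrases(tagged, max_len))
    return bow


if __name__ == "__main__":
    sentence = [("The", "DT"), ("Affordable", "JJ"), ("Care", "NN"),
                ("Act", "NNP"), ("of", "IN"), ("the", "DT"),
                ("United", "NNP"), ("States", "NNPS")]
    for phrase in extract_phrases(sentence):
        print(phrase)
```

On the example sentence, this yields phrases such as "affordable care act" and "united states" alongside the unigrams. Because every matching span is kept, it also yields less useful fragments such as "act of the united"; a real pipeline would typically filter phrases by corpus frequency.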

Citation (APA)

Handler, A., Denny, M. J., Wallach, H., & O’Connor, B. (2016). Bag of What? Simple Noun Phrase Extraction for Text Analysis. In NLP + CSS 2016 - EMNLP 2016 Workshop on Natural Language Processing and Computational Social Science, Proceedings of the Workshop (pp. 114–124). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-5615