Abstract
Social scientists who do not have specialized natural language processing training often use a unigram bag-of-words (BOW) representation when analyzing text corpora. We offer a new phrase-based method, NPFST, for enriching a unigram BOW. NPFST uses a part-of-speech tagger and a finite state transducer to extract multiword phrases to be added to a unigram BOW. We compare NPFST to both n-gram and parsing methods in terms of yield, recall, and efficiency. We then demonstrate how to use NPFST for exploratory analyses; it performs well, without configuration, on many different kinds of English text. Finally, we present a case study using NPFST to analyze a new corpus of U.S. congressional bills.
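The general idea of matching a phrase grammar over part-of-speech tags can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes a regular expression for a finite state transducer, assumes the input has already been tagged with Penn Treebank tags, and uses an assumed noun-phrase grammar. The names COARSE, PHRASE, extract_phrases, enrich_bow, and the max_len parameter are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical coarse mapping from Penn Treebank tags to one-letter classes:
# A = adjective, N = noun, P = preposition. Any other tag becomes "O".
COARSE = {
    "JJ": "A", "JJR": "A", "JJS": "A",
    "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N",
    "IN": "P",
}

# Assumed phrase grammar (not taken from the paper): optional modifiers,
# a head noun, then zero or more prepositional attachments.
PHRASE = re.compile(r"[AN]*N(?:P[AN]*N)*")


def extract_phrases(tagged, max_len=8):
    """Return every multiword span whose tag sequence matches PHRASE.

    `tagged` is a list of (word, tag) pairs. Nested and overlapping
    spans are all returned, which is a choice made for this sketch.
    """
    # One character per token, so string offsets equal token offsets.
    codes = "".join(COARSE.get(tag, "O") for _, tag in tagged)
    phrases = []
    for start in range(len(codes)):
        last_end = min(len(codes), start + max_len)
        # Start at length 2 so that only multiword spans are produced.
        for end in range(start + 2, last_end + 1):
            if PHRASE.fullmatch(codes[start:end]):
                words = [word.lower() for word, _ in tagged[start:end]]
                phrases.append(" ".join(words))
    return phrases


def enrich_bow(tagged):
    """Count unigrams together with the extracted multiword phrases."""
    unigrams = [word.lower() for word, _ in tagged]
    return Counter(unigrams + extract_phrases(tagged))
```

For example, given the tagged tokens the/DT health/NN care/NN reform/NN passed/VBD, extract_phrases yields "health care", "health care reform", and "care reform", and enrich_bow adds these to the five unigram counts. Checking every span with a regular expression is quadratic in sentence length, so this sketch says nothing about the efficiency of the method described in the abstract.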
Cite
Handler, A., Denny, M. J., Wallach, H., & O’Connor, B. (2016). Bag of What? Simple Noun Phrase Extraction for Text Analysis. In NLP + CSS 2016 - EMNLP 2016 Workshop on Natural Language Processing and Computational Social Science, Proceedings of the Workshop (pp. 114–124). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-5615