Distributed Parse Mining

Abstract

We describe the design and implementation of a system for data exploration over dependency parses and derived semantic representations in a large-scale NLP-based search system at powerset.com. Because the document repository and the processing infrastructure are distributed, and the corpus data use complex representations, standard text analysis tools such as grep, awk, or language modeling toolkits are not applicable. This paper explores the challenges of extracting statistical information and building language models in such a distributed NLP environment, and introduces a corpus analysis system, Oceanography, that simplifies the writing of analysis code and transparently takes advantage of existing distributed processing infrastructure.

Waterman, S. A. (2009). Distributed Parse Mining. In NAACL HLT 2009 - Software Engineering, Testing, and Quality Assurance for Natural Language Processing, SETQA-NLP 2009 - Proceedings of the Workshop (pp. 56–64). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1621947.1621957
