A straightforward author profiling approach in MapReduce

Suraj Maharjan; Prasha Shrestha; Thamar Solorio; Ragib Hasan

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8864 95-107

DOI: 10.1007/978-3-319-12027-0_8

13 Citations

13 Readers

Abstract

Most natural language processing tasks deal with large amounts of data, which take a long time to process. Larger datasets and a good set of features generally improve results, but larger volumes of text and high-dimensional feature spaces also mean slower processing. Natural language processing and distributed computing are therefore a good match. In the PAN 2013 competition, test runtimes for author profiling ranged from several minutes to several days. Most author profiling systems available now are inaccurate, slow, or both. Our system, written entirely in MapReduce, employs nearly 3 million features and still finishes the task in a fraction of the time required by state-of-the-art systems, with better accuracy. It demonstrates that distributed systems make sense when dealing with a huge amount of data, a large number of features, or both.
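As a rough illustration of how feature extraction can be phrased as a MapReduce job, the sketch below simulates the map, shuffle, and reduce phases in a single Python process. It is not the authors' implementation: the abstract does not specify the feature types, so the use of word n-grams, the function names (`map_document`, `reduce_counts`, `run_job`), and the class labels are all assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_document(label, text, n=2):
    """Map step: emit ((label, ngram), 1) for each word n-gram in one document.

    Word n-grams are an assumed feature type, chosen only for illustration.
    """
    tokens = text.lower().split()
    for i in range(len(tokens) - n + 1):
        yield (label, " ".join(tokens[i:i + n])), 1


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one (label, ngram) key."""
    return key, sum(values)


def run_job(documents, n=2):
    """Simulate map, shuffle (sort and group by key), and reduce locally."""
    mapped = [
        pair
        for label, text in documents
        for pair in map_document(label, text, n)
    ]
    # The shuffle phase brings all values for the same key together.
    mapped.sort(key=itemgetter(0))
    results = {}
    for key, group in groupby(mapped, key=itemgetter(0)):
        reduced_key, total = reduce_counts(key, (value for _, value in group))
        results[reduced_key] = total
    return results


# Hypothetical toy corpus: (class label, document text) pairs.
docs = [
    ("class_a", "i love this blog"),
    ("class_b", "i love football"),
    ("class_a", "i love this city"),
]
counts = run_job(docs)
```

Because each document is mapped independently and each key is reduced independently, both phases can in principle be spread across many machines, which is the property that makes a very large feature space tractable in a distributed setting.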

Citation (APA)

Maharjan, S., Shrestha, P., Solorio, T., & Hasan, R. (2014). A straightforward author profiling approach in MapReduce. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8864, 95–107. https://doi.org/10.1007/978-3-319-12027-0_8
