Investigating the statistical properties of user-generated documents


Abstract

The importance of the Internet as a communication medium is reflected in the large number of documents generated every day by users of the many services offered online. In this work we aim to analyze the properties of these online user-generated documents for some of the established services on the Internet (Kongregate, Twitter, Myspace and Slashdot) and to compare them with a consolidated collection of standard information retrieval documents (from the Wall Street Journal, Associated Press and Financial Times, as part of the TREC ad-hoc collection). We investigate features such as document similarity, term burstiness, emoticons and Part-Of-Speech analysis, highlighting the applicability and limits of traditional information retrieval content analysis and indexing techniques when applied to the new online user-generated documents. © 2011 Springer-Verlag.

Citation (APA)

Inches, G., Carman, M. J., & Crestani, F. (2011). Investigating the statistical properties of user-generated documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7022 LNAI, pp. 198–209). https://doi.org/10.1007/978-3-642-24764-4_18
