The explosive popularity of microblogging services has produced a large volume of microblogging messages. When the user interface is overwhelmed by so many messages, it is difficult for a user to quickly gauge his or her followees’ opinions. Useful information is buried in disorganized, incomplete, and unstructured text messages. We propose to organize the messages into clusters with meaningful cluster labels, thereby providing an overview of the content that fulfills users’ information needs. Clustering and labeling microblogging messages is challenging because the messages are much shorter than conventional text documents and usually cannot provide sufficient term co-occurrence information for capturing their semantic associations. As a result, traditional text representation models tend to yield unsatisfactory performance. In this paper, we present a text representation framework that harnesses the power of semantic knowledge bases, namely Wikipedia and WordNet. The semantic representation connects texts that were originally uncorrelated, which enhances the performance of short text clustering and labeling. Experimental results on Twitter and Facebook datasets demonstrate the superior performance of our framework in handling noisy and short microblogging messages.
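The term co-occurrence problem described above can be illustrated with a minimal sketch. It is not the framework presented in the paper: the concept map `TOY_CONCEPTS`, the helper names, and the two example messages are hypothetical stand-ins for real lookups against Wikipedia or WordNet, chosen only to show how two short messages with no shared terms can become related once concept features are added.

```python
import math
from collections import Counter

# Hypothetical toy mapping from surface terms to concepts; in practice such
# concepts would come from a knowledge base such as Wikipedia or WordNet.
TOY_CONCEPTS = {
    "iphone": ["apple_inc", "smartphone"],
    "ipad": ["apple_inc", "tablet"],
    "galaxy": ["samsung", "smartphone"],
}


def bag_of_words(text):
    """Plain term-frequency vector over whitespace tokens."""
    return Counter(text.lower().split())


def enrich(text):
    """Term-frequency vector extended with concept features (assumed scheme)."""
    vec = bag_of_words(text)
    for term in list(vec):  # iterate over a copy because vec is modified below
        for concept in TOY_CONCEPTS.get(term, []):
            vec[concept] += vec[term]
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


msg1 = "new iphone looks great"
msg2 = "bought an ipad yesterday"

plain = cosine(bag_of_words(msg1), bag_of_words(msg2))
enriched = cosine(enrich(msg1), enrich(msg2))

print("plain similarity:", plain)
print("enriched similarity:", enriched)
```

With the plain representation the two messages share no terms, so their similarity is zero; after enrichment they share one concept feature and the similarity becomes positive, which is the kind of connection a clustering algorithm can exploit.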