Event extraction from social media data is a tedious task because of the data's volume, velocity, and informality. In previous work [25], we proposed a successful approach for event extraction from social data. However, messages were processed individually, which generated many meaningless events because the relevant details were scattered across millions of text segments. In addition, many unnecessary texts were analyzed, which increased processing time and degraded system performance. In this paper, we address these weaknesses and improve the performance of the system. We propose clustering to group semantically related text segments, filter noise, reduce the volume of data to process, and promote only relevant text segments to the information extraction pipeline. We port the clustering algorithm to a stream processing framework, namely Storm, in order to build a stream clustering solution that scales to continuously growing volumes of data.
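The abstract does not name the clustering algorithm, so the following is only a minimal sketch of the general idea, assuming a single-pass, threshold-based scheme over bag-of-words vectors with cosine similarity. The names `StreamClusterer`, `threshold`, and `min_size` are hypothetical and are not taken from the paper. Each incoming message either joins the most similar existing cluster or starts a new one, and clusters that stay too small are treated as noise and withheld from the extraction pipeline.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric tokens plus hashtag/mention markers.
    return re.findall(r"[a-z0-9#@]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class StreamClusterer:
    """Hypothetical single-pass clusterer; not the authors' algorithm."""

    def __init__(self, threshold=0.3, min_size=2):
        self.threshold = threshold  # assumed similarity cut-off
        self.min_size = min_size    # assumed minimum size of a relevant cluster
        self.clusters = []          # each: {"centroid": Counter, "members": [str]}

    def add(self, text):
        # Assign one message to its nearest cluster, or open a new cluster.
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for i, cluster in enumerate(self.clusters):
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Summed term counts act as an unnormalized centroid.
            self.clusters[best]["centroid"].update(vec)
            self.clusters[best]["members"].append(text)
            return best
        self.clusters.append({"centroid": Counter(vec), "members": [text]})
        return len(self.clusters) - 1

    def relevant(self):
        # Only clusters with enough members are promoted downstream.
        return [c["members"] for c in self.clusters
                if len(c["members"]) >= self.min_size]


clusterer = StreamClusterer()
for message in ["earthquake hits tokyo",
                "strong earthquake in tokyo today",
                "i love pizza"]:
    clusterer.add(message)
promoted = clusterer.relevant()
```

In a Storm topology, logic of this kind would typically live inside a bolt that receives one message per tuple. How the authors partition and maintain cluster state across workers is not described in the abstract.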
Jenhani, F., Gouider, M. S., & Said, L. B. (2018). Social stream clustering to improve events extraction. In Smart Innovation, Systems and Technologies (Vol. 73, pp. 319–329). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-59424-8_30