Weibo is the largest microblogging service in China, and its users publish a huge number of short messages every day. Whenever a hot event occurs, Weibo users rush to publish messages about it, and most of these messages are repeated. In this paper, we propose two methods for identifying similar Weibo messages in massive candidate sets. The first method is based on Simhash, which finds similar messages according to their word similarities. The second method is based on Paragraph Vector, which identifies similar messages according to their semantic similarities. We collect a real-world dataset to conduct experiments, and the results show that both methods can efficiently identify similar messages.
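To make the word-similarity idea concrete, the following is a minimal sketch of a generic Simhash fingerprint in Python. It is not the paper's implementation: the function names (`simhash`, `hamming_distance`, `is_near_duplicate`), the use of MD5 as the token hash, the 64-bit fingerprint width, and the Hamming-distance threshold of 3 are all assumptions chosen for illustration, and the messages are assumed to be tokenized already.

```python
import hashlib
from collections import Counter


def simhash(tokens, bits=64):
    """Return a Simhash fingerprint for a list of tokens (bits must be <= 64)."""
    weights = [0] * bits
    for token, count in Counter(tokens).items():
        # Hash each distinct token to a stable integer (MD5 is an assumption;
        # any well-mixed hash would do).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            # Each token votes +count or -count on every bit position.
            if (h >> i) & 1:
                weights[i] += count
            else:
                weights[i] -= count
    fingerprint = 0
    for i in range(bits):
        # A bit is set when the weighted vote for it is positive.
        if weights[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(tokens_a, tokens_b, threshold=3):
    """Treat two messages as similar if their fingerprints differ in few bits.

    The threshold of 3 is a hypothetical value, not one taken from the paper.
    """
    return hamming_distance(simhash(tokens_a), simhash(tokens_b)) <= threshold
```

Messages that share most of their words tend to produce fingerprints that differ in only a few bits, so candidate pairs can be compared by Hamming distance instead of by full text. The semantic method would instead compare learned paragraph embeddings, which is not sketched here.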
CITATION STYLE
Wang, Y. (2018). Finding similar microblogs according to their word similarities and semantic similarities. In Lecture Notes in Electrical Engineering (Vol. 474, pp. 371–375). Springer Verlag. https://doi.org/10.1007/978-981-10-7605-3_61