Twitter becomes one of the critical channels for disseminating up-to-date information. The volume of tweets can be huge. It is desirable to have an automatic system to analyze tweets. The obstacle is that Twitter users usually invent new words using non-standard rules that appear in a burst within a short period of time. Existing new word detection methods are not able to identify them effectively. Even if the new words can be identified, it is difficult to understand their meanings. In this paper, we focus on Chinese Twitter. There are no natural word delimiters in a sentence, which makes the problem more difficult. To solve the problem, we derive an unsupervised new word detection framework without relying on training data. Then, we introduce automatic tagging to new word annotation which tag the new words using known words according to our proposed tagging algorithm.
CITATION STYLE
Liang, Y., Yin, P., & Yiu, S. M. (2015). New word detection and tagging on Chinese twitter stream. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9263, pp. 310–321). Springer Verlag. https://doi.org/10.1007/978-3-319-22729-0_24
Mendeley helps you to discover research relevant for your work.