On the Semi-unsupervised Construction of Auto-keyphrases Corpus from Large-Scale Chinese Automobile E-Commerce Reviews

Abstract

Long-established automobile e-commerce websites in China have accumulated huge numbers of auto reviews, and extracting keyphrases from these reviews can help researchers and practitioners identify online users' typical opinions and their underlying motivations. However, no relevant text corpus has existed so far. In this paper, the authors propose a semi-unsupervised scheme for constructing a comprehensive auto-keyphrases corpus from reviews collected online from Chinese automobile e-commerce websites. The scheme uses Position Rank, which performs very well at keyphrase extraction when labeled data are scarce. The iterative annotation process consists of three rounds of labeling and two rounds of correction. In each of the three rounds of unsupervised labeling, the model extracts the seven most important words of a paragraph as that paragraph's keyphrases. Between labeling rounds there are manual check, correction, re-check and arbitration stages, in which earlier labeling errors are corrected and new vocabulary and rules are summarized to further improve the unsupervised model. For comparison, the paper also runs experiments with two other unsupervised approaches, TF-IDF and Text Rank; the results show that Position Rank is more efficient and effective than both for keyphrase extraction. At the time of writing, the auto-keyphrases corpus contained 110,023 entries, and there is still much room for improvement in corpus volume and labeling quality.
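The abstract does not include the authors' implementation. As a rough illustration only, here is a minimal sketch of PositionRank-style word scoring. PositionRank runs a PageRank-style random walk over a word co-occurrence graph, and it biases the walk toward words that occur early and often in the text. Several parts of the sketch are assumptions rather than details from the paper: the helper name `position_rank_words` is hypothetical, the input is assumed to be already word-segmented (Chinese text needs a segmenter upstream), a simple stopword filter stands in for part-of-speech filtering, and the defaults `window=2` and `alpha=0.85` are illustrative. The `top_k=7` default mirrors the seven words per paragraph mentioned in the abstract.

```python
from collections import defaultdict


def position_rank_words(tokens, top_k=7, window=2, alpha=0.85,
                        max_iter=100, tol=1e-8, stopwords=frozenset()):
    """Rank candidate words with a PositionRank-style biased PageRank.

    Hypothetical helper (not the paper's code): `tokens` must already be
    word-segmented, and a stopword filter stands in for POS filtering.
    """
    # Position bias: an occurrence at 1-based position i adds 1/i,
    # so words that appear early and often receive a larger prior.
    position_score = defaultdict(float)
    for i, tok in enumerate(tokens, start=1):
        if tok not in stopwords:
            position_score[tok] += 1.0 / i
    if not position_score:
        return []
    total = sum(position_score.values())
    prior = {w: s / total for w, s in position_score.items()}

    # Undirected co-occurrence graph: words within `window` consecutive
    # tokens are linked; edge weight = number of co-occurrences.
    edges = defaultdict(lambda: defaultdict(float))
    for i, tok in enumerate(tokens):
        if tok in stopwords:
            continue
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other in stopwords or other == tok:
                continue
            edges[tok][other] += 1.0
            edges[other][tok] += 1.0

    nodes = list(prior)
    out_weight = {v: sum(edges[v].values()) for v in nodes}

    # Power iteration:
    #   S(v) = (1 - alpha) * prior(v) + alpha * sum_u w(u, v) / out(u) * S(u)
    scores = dict(prior)
    for _ in range(max_iter):
        new_scores = {
            v: (1 - alpha) * prior[v]
            + alpha * sum(w / out_weight[u] * scores[u]
                          for u, w in edges[v].items())
            for v in nodes
        }
        delta = sum(abs(new_scores[v] - scores[v]) for v in nodes)
        scores = new_scores
        if delta < tol:
            break

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(nodes, key=lambda v: (-scores[v], v))[:top_k]


# Usage on a pre-segmented review; segmentation is assumed to happen upstream.
review = ["发动机", "动力", "强劲", "，", "油耗", "偏高", "，", "发动机", "噪音", "小"]
print(position_rank_words(review, stopwords={"，"}))
```

The sketch returns single words rather than multi-word phrases, matching the abstract's description of extracting the seven most important words. Full PositionRank additionally scores candidate phrases by summing the scores of their words.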

Li, Y., Qian, C., Che, H., Wang, R., Wang, Z., & Zhang, J. (2019). On the Semi-unsupervised Construction of Auto-keyphrases Corpus from Large-Scale Chinese Automobile E-Commerce Reviews. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11856 LNAI, pp. 452–464). Springer. https://doi.org/10.1007/978-3-030-32381-3_37