Efficient Sentiment-Aware Web Crawling Methods for Constructing Sentiment Dictionary

Byung Won On; Jun Young Jo; Hyunkwang Shin; Jangwon Gim; Gyu Sang Choi; Soo Mok Jung

Journal Article (Open Access)

IEEE Access (2021) 9 161208-161223

DOI: 10.1109/ACCESS.2021.3129187

6 Citations

21 Readers

Abstract

In traditional web crawling, every crawled web page is first stored in a database. This approach stores unnecessary pages and adds running time when constructing a sentiment dictionary for a particular domain, because sentiment words must then be identified by scanning all pages in the database. To address these problems, we first define the sentiment-aware web crawling problem and then propose two hash-based methods to implement it: one based on hash join and the other on bucket-sorted hash join. In particular, we propose a novel bucket-sorted hash join for efficient sentiment-aware web crawling. Our experimental results show that the proposed crawling method using bucket-sorted hash join outperforms existing web crawling methods, significantly reducing both running time and storage space. With the proposed method, the sentiment-aware task takes 0.016 seconds per web page, and database space is reduced by 59% compared to existing web crawling methods.
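To make the idea of filtering pages at crawl time concrete, the following is a minimal Python sketch of a plain hash join between a seed sentiment lexicon (build side) and the tokens of each crawled page (probe side). It is an illustration under stated assumptions, not the authors' implementation: the function names, the tokenizer, the `min_matches` threshold, and the sample data are all hypothetical, and the bucket-sorted variant is not attempted here because the abstract does not describe how it works.

```python
import re

# Assumption: a simple lowercase word tokenizer; the paper's tokenizer is not specified.
TOKEN_RE = re.compile(r"[a-z']+")


def build_side(seed_words):
    """Build phase: hash the smaller relation (the seed sentiment lexicon)."""
    table = {}
    for word, polarity in seed_words:
        table[word.lower()] = polarity
    return table


def probe_side(table, page_text):
    """Probe phase: look up each page token and collect (token, polarity) pairs."""
    matches = []
    for token in TOKEN_RE.findall(page_text.lower()):
        polarity = table.get(token)
        if polarity is not None:
            matches.append((token, polarity))
    return matches


def crawl_filter(pages, seed_words, min_matches=1):
    """Keep only pages whose tokens join with the lexicon (hypothetical policy)."""
    table = build_side(seed_words)
    kept = {}
    for url, text in pages:
        matches = probe_side(table, text)
        if len(matches) >= min_matches:
            kept[url] = matches  # only sentiment-bearing pages would be stored
    return kept


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    seed = [("great", "pos"), ("terrible", "neg")]
    pages = [
        ("a", "The camera is great but the battery is terrible."),
        ("b", "Opening hours and directions."),
    ]
    print(crawl_filter(pages, seed))
```

In this sketch a page with no lexicon matches is never stored, which is the kind of saving in storage and later scanning that the abstract attributes to sentiment-aware crawling.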

Cite

On, B. W., Jo, J. Y., Shin, H., Gim, J., Choi, G. S., & Jung, S. M. (2021). Efficient Sentiment-Aware Web Crawling Methods for Constructing Sentiment Dictionary. IEEE Access, 9, 161208–161223. https://doi.org/10.1109/ACCESS.2021.3129187
