Efficient Sentiment-Aware Web Crawling Methods for Constructing Sentiment Dictionary

Abstract

In traditional web crawling, every crawled web page is first stored in a database. As a result, this approach can store unnecessary web pages, and constructing a sentiment dictionary for a particular domain requires additional running time because sentiment words must be identified by scanning every web page in the database. To address these problems, we first define the sentiment-aware web crawling problem and then propose two hash-based methods for implementing it: one based on hash join and the other on bucket-sorted hash join. In particular, we propose a novel bucket-sorted hash join for efficient sentiment-aware web crawling. Our experimental results show that the proposed web crawling method using bucket-sorted hash join outperforms existing web crawling methods, significantly reducing both running time and storage space. With the proposed method, the sentiment-aware task takes 0.016 seconds per web page, and database storage is reduced by 59% compared with existing web crawling methods.
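The abstract does not describe how the join is carried out, so the following is only a minimal sketch of the general idea behind hash-join-based sentiment-aware filtering. It is not the authors' implementation and does not model their bucket-sorted variant. It assumes a small hypothetical seed lexicon (`SEED_LEXICON`), a simple regex tokenizer, and hypothetical helper names (`build_phase`, `probe_phase`, `sentiment_aware_filter`). The lexicon serves as the build relation, and each crawled page's tokens are probed against it, so only pages that contain sentiment words are kept.

```python
import re
from collections import Counter

# Hypothetical seed lexicon (word -> polarity); not taken from the paper.
SEED_LEXICON = {"good": 1, "great": 1, "excellent": 1,
                "bad": -1, "poor": -1, "terrible": -1}


def tokenize(text):
    # Naive lowercase word tokenizer (an assumption for illustration).
    return re.findall(r"[a-z']+", text.lower())


def build_phase(lexicon):
    # Build phase: hash the smaller relation (the lexicon) on the join key.
    # A Python dict is a hash table keyed by the word.
    return {word.lower(): polarity for word, polarity in lexicon.items()}


def probe_phase(table, tokens):
    # Probe phase: look up each page token in the hash table and emit one
    # (word, polarity) pair per match, like the output of an equi-join.
    return [(tok, table[tok]) for tok in tokens if tok in table]


def sentiment_aware_filter(pages, lexicon):
    # pages: dict mapping URL -> page text (hypothetical crawler output).
    # Returns only the pages with at least one sentiment word, together with
    # the counts of matched words; all other pages are never stored.
    table = build_phase(lexicon)
    kept = {}
    for url, text in pages.items():
        matches = probe_phase(table, tokenize(text))
        if matches:
            kept[url] = Counter(word for word, _ in matches)
    return kept
```

For example, given a page "The battery is great but the screen is poor." and a page "Shipping details and return policy.", only the first page is kept, with one match each for `great` and `poor`. The hash table is built once over the lexicon, so each page costs a single pass over its tokens with constant-time average lookups. Pages without matches are discarded before they reach the database, which is the kind of storage saving the abstract attributes to sentiment-aware crawling.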

On, B. W., Jo, J. Y., Shin, H., Gim, J., Choi, G. S., & Jung, S. M. (2021). Efficient Sentiment-Aware Web Crawling Methods for Constructing Sentiment Dictionary. IEEE Access, 9, 161208–161223. https://doi.org/10.1109/ACCESS.2021.3129187