Automated controversy detection on the web

Shiri Dori-Hacohen; James Allan

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9022 423-434

DOI: 10.1007/978-3-319-16354-3_46

36 Citations

37 Readers

Abstract

Alerting users about controversial search results can encourage critical literacy, promote healthy civic discourse and counteract the “filter bubble” effect, and therefore would be a useful feature in a search engine or browser extension. In order to implement such a feature, however, the binary classification task of determining which topics or webpages are controversial must be solved. Earlier work described a proof of concept using a supervised nearest neighbor classifier with access to an oracle of manually annotated Wikipedia articles. This paper generalizes and extends that concept by taking the human out of the loop, leveraging the rich metadata available in Wikipedia articles in a weakly-supervised classification approach. The new technique we present allows the nearest neighbor approach to be extended on a much larger scale and to other datasets. The results improve substantially over naive baselines and are nearly identical to the oracle-reliant approach by standard measures of F1, F0.5, and accuracy. Finally, we discuss implications of solving this problem as part of a broader subject of interest to the IR community, and suggest several avenues for further exploration in this exciting new space.
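
The abstract reports results in terms of F1, F0.5, and accuracy for the binary controversial/non-controversial task. As a minimal sketch of how these measures relate, the Python below computes them from confusion-matrix counts. The function names and the example counts are illustrative assumptions; they are not taken from the paper and do not reproduce its results.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion counts.

    beta < 1 weights precision more heavily than recall (e.g. F0.5);
    beta = 1 gives the usual F1. Returns 0.0 when the score is undefined.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all items that were classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts for a controversial-vs-not classifier (made up).
tp, fp, fn, tn = 30, 10, 20, 40

f1 = f_beta(tp, fp, fn, beta=1.0)
f05 = f_beta(tp, fp, fn, beta=0.5)
acc = accuracy(tp, fp, fn, tn)
print(f"F1={f1:.3f}  F0.5={f05:.3f}  accuracy={acc:.3f}")
```

With these made-up counts, precision (0.75) exceeds recall (0.60), so F0.5 comes out higher than F1. F0.5 is the measure to watch when a false "controversial" alert is considered more costly than a missed one.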

Citation (APA)

Dori-Hacohen, S., & Allan, J. (2015). Automated controversy detection on the web. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9022, pp. 423–434). Springer Verlag. https://doi.org/10.1007/978-3-319-16354-3_46
