Since the 2011 Arab Spring revolutions, many scholars of the Middle East have built and analyzed locally-sourced protest event datasets, which have been hailed for providing better coverage than various off-the-shelf datasets that rely primarily on English-language sources. This paper assesses the extent of these coverage improvements. It shows that across five MENA countries, locally-sourced datasets identify considerably more events than most off-the-shelf datasets. It then compares one locally-sourced dataset of protests in Egypt from January 2012 to July 2013 to two prominent off-the-shelf datasets: ACLED and SCAD. These comparisons reveal that both ACLED and SCAD significantly overcount large, urban, violent, and political events. Next, the paper compares the Egypt dataset to data compiled by two Egyptian activist groups and finds that the locally-sourced dataset is also biased in key respects, undercounting small labor events outside the capital. Finally, the paper demonstrates the implications of these biases by showing how statistical models of protest repression differ when using the locally-sourced dataset versus SCAD. Scholars of Mediterranean politics analyzing within-case and sub-national mobilization dynamics should use locally-sourced datasets whenever possible, but should also be aware that using local sources does not entirely eliminate certain forms of bias.
Clarke, K. (2021). Which protests count? Coverage bias in Middle East event datasets. Mediterranean Politics. https://doi.org/10.1080/13629395.2021.1957577