We describe a system for automatically ranking documents by degree of militancy, designed as a tool both for finding militant websites and for prioritizing the data found. We compare three ranking systems: one employing a small hand-selected vocabulary based on group membership markers, which insiders use to identify members and member properties (us) and outsiders and threats (them); one with a much larger vocabulary; and one with a small vocabulary chosen by Mutual Information. We use the same vocabularies to build classifiers. The ranker that achieves the best correlations with human judgments uses the small us-them vocabulary. We confirm and extend recent results in sentiment analysis (Paltoglou and Thelwall 2010), showing that a feature-weighting scheme taken from classical IR (TFIDF) produces the best ranking system; we also find, surprisingly, that adjusting these weights with SVM training produces a better classifier but a worse ranker. Increasing vocabulary size similarly improves classification while worsening ranking. Our work complements previous work tracking radical groups on the web (Chen 2007), which classified such sites with heterogeneous indicators. The method combines elements of machine learning and behavioral science, and should extend to any group organized for collective action. Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
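As a rough illustration of the ranking idea described in the abstract, the following sketch scores each document by summing TFIDF weights over a small fixed vocabulary and sorts documents by that score. It is not the authors' implementation: the function name `rank_by_tfidf`, the tokenizer, the particular TFIDF variant (length-normalized term frequency times `log(N / df)`), and the placeholder marker words are all assumptions made for the example. The abstract does not specify the actual vocabulary or weighting formula.

```python
import math
import re
from collections import Counter


def rank_by_tfidf(docs, vocabulary):
    """Rank documents by summed TFIDF weight over a fixed vocabulary.

    Returns a list of (document_index, score) pairs, highest score first.
    The weighting here is one common TFIDF variant, chosen for illustration.
    """
    # Hypothetical tokenizer: lowercase alphabetic runs only.
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each vocabulary term.
    doc_freq = Counter()
    for tokens in tokenized:
        for term in set(tokens) & set(vocabulary):
            doc_freq[term] += 1

    scores = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in vocabulary:
            if counts[term] == 0:
                continue
            tf = counts[term] / len(tokens)            # length-normalized term frequency
            idf = math.log(n_docs / doc_freq[term])    # zero for terms found in every document
            score += tf * idf
        scores.append((index, score))

    # Highest-scoring documents first; ties keep their original order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder "us"/"them" marker words, not the vocabulary used in the paper.
    markers = {"we", "our", "us", "them", "they"}
    sample_docs = [
        "we must defend our people against them",
        "the weather is nice today",
        "they threaten us and our people",
    ]
    for index, score in rank_by_tfidf(sample_docs, markers):
        print(index, round(score, 3))
```

Because the score depends only on a short list of marker terms, a ranker of this shape can be inspected term by term, which is one reason a small hand-selected vocabulary may be attractive for prioritizing documents.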
Gawron, J. M., Gupta, D., Stephens, K., Tsou, M. H., Spitzberg, B., & An, L. (2012). Using group membership markers for group identification. In ICWSM 2012 - Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (pp. 467–470). https://doi.org/10.1609/icwsm.v6i1.14336