Historical interpretation benefits from identifying analogies among famous people: Who are the Lincolns, Einsteins, Hitlers, and Mozarts? We investigate several approaches for converting approximately 600,000 historical figures into vector representations that quantify similarity according to their Wikipedia pages. We adopt an effective reference standard based on the number of human-annotated Wikipedia categories that two figures share, and use it to evaluate the performance of our similarity detection algorithms. In particular, we investigate four unsupervised approaches to representing the semantic associations of individuals: (1) TF-IDF, (2) weighted averages of distributed word embeddings, (3) LDA topic analysis, and (4) DeepWalk embeddings from page links. All proved effective, with DeepWalk embeddings reaching an overall accuracy of 91.33% in our evaluation at uncovering historical analogies. Combining LDA and DeepWalk yielded even higher performance.
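As a rough illustration of the first approach and of the category-based reference standard, the sketch below builds TF-IDF vectors for a few toy token lists, compares them with cosine similarity, and counts shared categories. It is a minimal sketch under stated assumptions, not the authors' implementation: the page labels, tokens, and function names (`tfidf_vectors`, `cosine`, `shared_categories`) are hypothetical, and the particular TF-IDF weighting (relative term frequency times log inverse document frequency) is one common variant chosen here for simplicity.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document (a list of tokens) to a sparse TF-IDF dict."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def shared_categories(cats_a, cats_b):
    """Reference-standard score: number of categories two pages share."""
    return len(set(cats_a) & set(cats_b))


# Toy stand-ins for tokenized pages (hypothetical data, not from the paper).
pages = {
    "physicist_a": "physics relativity theory physics nobel".split(),
    "physicist_b": "physics quantum theory nobel".split(),
    "composer_a": "music symphony opera composer".split(),
}
names = list(pages)
vecs = dict(zip(names, tfidf_vectors([pages[n] for n in names])))

sim_physicists = cosine(vecs["physicist_a"], vecs["physicist_b"])
sim_mixed = cosine(vecs["physicist_a"], vecs["composer_a"])
print(sim_physicists > sim_mixed)
```

In this toy setup the two physicist pages overlap in vocabulary and the composer page does not, so the first similarity exceeds the second. An evaluation in the spirit of the abstract would then check whether pairs ranked as similar by the vectors also share many human-annotated categories.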
Chen, Y., Perozzi, B., & Skiena, S. (2015). Vector-based similarity measurements for historical figures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9371, pp. 179–190). Springer Verlag. https://doi.org/10.1007/978-3-319-25087-8_17