Vector-based similarity measurements for historical figures

Yanqing Chen; Bryan Perozzi; Steven Skiena

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9371 179-190

DOI: 10.1007/978-3-319-25087-8_17

Abstract

Historical interpretation benefits from identifying analogies among famous people: Who are the Lincolns, Einsteins, Hitlers, and Mozarts? We investigate several approaches to converting approximately 600,000 historical figures into vector representations, so that the similarity between any two of them can be quantified from their Wikipedia pages. As a reference standard, we adopt the number of human-annotated Wikipedia categories that two pages share, and we use it to evaluate our similarity detection algorithms. In particular, we investigate four unsupervised approaches to representing the semantic associations of individuals: (1) TF-IDF, (2) a weighted average of distributed word embeddings, (3) LDA topic analysis, and (4) DeepWalk embeddings learned from page links. All proved effective, but DeepWalk embedding yielded an overall accuracy of 91.33% in our evaluation for uncovering historical analogies. Combining LDA and DeepWalk yielded even higher performance.
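The abstract describes turning each figure's page into a vector, scoring pairs by similarity, and checking those scores against how many Wikipedia categories the pages share. The Python sketch below illustrates the simplest of the four approaches, TF-IDF with cosine similarity, together with one possible way of scoring it against a shared-category reference. Everything concrete here is an assumption for illustration: the toy pages and category sets, the idf variant log(N/df), the helper names, and the triple-based accuracy protocol (for a query q and candidates a and b, the reference prefers whichever candidate shares more categories with q, and ties are skipped). It is not the authors' implementation or their exact evaluation procedure.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each name to a sparse TF-IDF vector {term: weight}.

    Assumed idf variant: log(N / df); terms present in every page get weight 0.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per page
    return {
        name: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for name, tokens in docs.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(vectors, name):
    """Return the other figure whose vector is closest to `name` by cosine."""
    others = (o for o in vectors if o != name)
    return max(others, key=lambda o: cosine(vectors[name], vectors[o]))


def evaluate(vectors, cats, triples):
    """Hypothetical triple protocol: fraction of (q, a, b) where the model
    agrees with the shared-category reference on which candidate is closer."""
    correct = total = 0
    for q, a, b in triples:
        ref_a, ref_b = len(cats[q] & cats[a]), len(cats[q] & cats[b])
        if ref_a == ref_b:
            continue  # reference standard cannot decide; skip this triple
        total += 1
        ref_prefers_a = ref_a > ref_b
        model_prefers_a = cosine(vectors[q], vectors[a]) > cosine(vectors[q], vectors[b])
        correct += ref_prefers_a == model_prefers_a
    return correct / total if total else 0.0


# Toy data (invented for illustration only).
docs = {
    "Lincoln": "president union civil war emancipation lawyer".split(),
    "Grant": "general union civil war president".split(),
    "Mozart": "composer symphony opera vienna".split(),
    "Beethoven": "composer symphony vienna deaf".split(),
}
cats = {
    "Lincoln": {"US presidents", "Union people", "Lawyers"},
    "Grant": {"US presidents", "Union people", "Generals"},
    "Mozart": {"Composers", "Austrian people"},
    "Beethoven": {"Composers", "German people"},
}

vecs = tfidf_vectors(docs)
print(most_similar(vecs, "Lincoln"))  # Grant
print(evaluate(vecs, cats, [("Lincoln", "Grant", "Mozart"),
                            ("Mozart", "Beethoven", "Grant")]))  # 1.0
```

Sparse dictionaries keep the sketch dependency-free. At the scale of roughly 600,000 pages, a sparse-matrix library and an approximate nearest-neighbor index would be the more practical choice. The other three approaches would only change how `vecs` is built; the similarity and evaluation steps stay the same.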

Citation (APA)

Chen, Y., Perozzi, B., & Skiena, S. (2015). Vector-based similarity measurements for historical figures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9371, pp. 179–190). Springer Verlag. https://doi.org/10.1007/978-3-319-25087-8_17