As the volume of literature grows, it becomes increasingly likely that the same name is shared by an unknown number of real-life authors. Ambiguity in author names seriously interferes with the evaluation of researchers' academic ability and with other statistical indicators. To address this issue, we propose an automatic author name disambiguation algorithm based on multiple strategies. First, we extract attributes such as collaborator, affiliation, title, abstract, keywords, and subject from the metadata of NSTL literature; at the same time, we generate for each name a name-associated item, similar to a surname, using a dedicated associated-item generation algorithm. Then, the author data to be disambiguated are divided into clusters according to the name-associated item. Next, we apply custom name compatibility rules to find potentially similar author pairs within each cluster, and we design a similarity measurement for each attribute according to the characteristics of its type. After that, a similarity function is constructed for comparing author pairs, which are tagged according to a similarity threshold. Our experiments are performed on randomly selected WOS data sets with an unbalanced distribution of publications. The results show that the proposed method achieves a higher average F-score than both the traditional clustering disambiguation algorithm and the graph-based disambiguation method.
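The pipeline described above (blocking by a name key, filtering pairs by name compatibility, scoring attribute similarity, and tagging by threshold) can be illustrated with a minimal sketch. It is not the authors' implementation. The blocking key (lowercased surname), the prefix-based compatibility rule, the use of Jaccard similarity for every attribute, the weights, the threshold, and the toy records are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"coauthors": 0.5, "affiliation": 0.3, "keywords": 0.2}
THRESHOLD = 0.3


def name_key(name):
    """Stand-in for the name-associated item: the lowercased surname
    of a 'Surname, Given' string (assumed input format)."""
    return name.split(",", 1)[0].strip().lower()


def given_name(name):
    """Lowercased given name with periods removed; empty if absent."""
    parts = name.split(",", 1)
    return parts[1].replace(".", "").strip().lower() if len(parts) > 1 else ""


def compatible(a, b):
    """Assumed compatibility rule: one given name is a prefix of the other,
    so an initial such as 'W.' is compatible with 'Wei' and with 'Wen'."""
    ga, gb = given_name(a), given_name(b)
    return ga.startswith(gb) or gb.startswith(ga)


def jaccard(x, y):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def similarity(r1, r2):
    """Weighted sum of per-attribute Jaccard similarities."""
    return sum(w * jaccard(r1[attr], r2[attr]) for attr, w in WEIGHTS.items())


def tag_pairs(records):
    """Block records by name key, then tag each compatible pair as
    same author (True) or different author (False)."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[name_key(rec["name"])].append(rec)
    tags = {}
    for block in blocks.values():
        for r1, r2 in combinations(block, 2):
            if compatible(r1["name"], r2["name"]):
                tags[(r1["id"], r2["id"])] = similarity(r1, r2) >= THRESHOLD
    return tags


# Toy records, invented for this example.
records = [
    {"id": 1, "name": "Zhang, Wei", "coauthors": {"chen, l", "liu, y"},
     "affiliation": {"example", "university"},
     "keywords": {"disambiguation", "clustering"}},
    {"id": 2, "name": "Zhang, W.", "coauthors": {"chen, l"},
     "affiliation": {"example", "university"},
     "keywords": {"disambiguation"}},
    {"id": 3, "name": "Zhang, Wen", "coauthors": {"zhao, q"},
     "affiliation": {"other", "institute"},
     "keywords": {"polymer"}},
    {"id": 4, "name": "Chen, Lu", "coauthors": {"zhang, w"},
     "affiliation": {"example", "university"},
     "keywords": {"clustering"}},
]

tags = tag_pairs(records)
```

In this toy run, records 1 and 3 are never compared because "Wei" and "Wen" fail the assumed compatibility rule, and record 4 falls into a different block. Only the pairs (1, 2) and (2, 3) are scored, which shows how blocking and compatibility filtering reduce the number of comparisons before any similarity is computed.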
Li, H., Cui, Y., & Wang, T. (2020). An Effective Approach for Automatic Author Name Disambiguation Based on Multiple Strategies. In ACM International Conference Proceeding Series (pp. 169–175). Association for Computing Machinery. https://doi.org/10.1145/3403746.3403923