Citation data clustering for author name disambiguation

Abstract

We propose a new method of citation data clustering for author name disambiguation. Most citation data appearing in the reference sections of scientific papers abbreviate coauthor first names to their initials. Hence, we often search citation data using such an abbreviated name, e.g. "S. Lee" or "J. Chen", and consequently obtain many irrelevant records in the search results, because a single abbreviated name can refer to many different persons. Our method constructs clusters, each of which contains only the citation data of a unique author. It is based on a probabilistic model that extends the naive Bayes mixture model. Since our model has two hidden variables, we call it the two-variable mixture model. In the evaluation experiment, we used the well-known DBLP data set. The results show that the two-variable mixture model achieves a better balance between precision and recall than the naive Bayes mixture model.
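
To make the baseline concrete, the following is a minimal sketch of clustering with a plain naive Bayes mixture model (a mixture of multinomials fitted by EM). It is not the authors' two-variable mixture model, whose details the abstract does not give. The function name `cluster_citations`, the token representation of each citation (coauthor surnames, title words, venue), the smoothing constant `alpha`, the round-robin initialization, and the toy records are all assumptions made for illustration.

```python
import math

def cluster_citations(docs, n_clusters, n_iters=20, alpha=0.1):
    """Cluster token lists with EM on a naive Bayes (multinomial) mixture."""
    vocab = sorted({w for doc in docs for w in doc})
    # Deterministic round-robin hard initialization (an assumption, chosen
    # so the example is reproducible; random restarts are more common).
    resp = [[1.0 if k == i % n_clusters else 0.0 for k in range(n_clusters)]
            for i in range(len(docs))]
    for _ in range(n_iters):
        # M-step: re-estimate smoothed cluster priors and word distributions.
        log_prior, log_word = [], []
        for k in range(n_clusters):
            weight = sum(r[k] for r in resp)
            log_prior.append(
                math.log((weight + alpha) / (len(docs) + alpha * n_clusters)))
            counts = {w: alpha for w in vocab}
            for doc, r in zip(docs, resp):
                for w in doc:
                    counts[w] += r[k]
            total = sum(counts.values())
            log_word.append({w: math.log(c / total) for w, c in counts.items()})
        # E-step: posterior probability of each cluster given the citation.
        for i, doc in enumerate(docs):
            scores = [log_prior[k] + sum(log_word[k][w] for w in doc)
                      for k in range(n_clusters)]
            top = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in scores]
            norm = sum(exps)
            resp[i] = [e / norm for e in exps]
    # Hard assignment: the most probable cluster for each citation.
    return [max(range(n_clusters), key=lambda k: r[k]) for r in resp]

# Hypothetical citations that all match the abbreviated name "S. Lee".
citations = [
    ["kim", "query", "index", "vldb"],
    ["kim", "query", "join", "vldb"],
    ["park", "index", "join", "vldb"],
    ["wang", "protein", "fold", "bioinformatics"],
    ["wang", "protein", "gene", "bioinformatics"],
    ["zhao", "gene", "fold", "bioinformatics"],
]
labels = cluster_citations(citations, n_clusters=2)
print(labels)
```

On this toy input the three database-style records and the three bioinformatics-style records end up in separate clusters, which is the behavior that disambiguation needs: each cluster stands for one candidate author behind the shared abbreviated name. In practice the number of clusters is unknown and would have to be chosen or estimated separately.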

Masada, T., Takasu, A., & Adachi, J. (2007). Citation data clustering for author name disambiguation. In ACM International Conference Proceeding Series (Vol. 06-08-June-2007). Association for Computing Machinery. https://doi.org/10.4108/infoscale.2007.203
