Many open-source software projects are self-organized and do not maintain official lists with information on developer roles. So, knowing which developers take core and maintainer roles is, despite being relevant, often tacit knowledge. We propose a method to automatically identify core developers based on role permissions of privileged events triggered in GitHub issues and pull requests. In an empirical study on 25/GitHub projects, (1) we validate the set of automatically identified core developers with a sample of project-reported developer lists, and (2) we use our set of identified core developers to assess the accuracy of state-of-the-art unsupervised developer classification methods. Our results indicate that the set of core developers, which we extracted from privileged issue events, is sound and the accuracy of state-of-the-art unsupervised classification methods depends mainly on the data source (commit data versus issue data) rather than the network-construction method (directed versus undirected, etc.). In perspective, our results shall guide research and practice to choose appropriate unsupervised classification methods, and our method can help create reliable ground-truth data for training supervised classification methods.
CITATION STYLE
Bock, T., Alznauer, N., Joblin, M., & Apel, S. (2023). Automatic Core-Developer Identification on GitHub: A Validation Study. ACM Transactions on Software Engineering and Methodology, 32(6). https://doi.org/10.1145/3593803
Mendeley helps you to discover research relevant for your work.