Analysing the behaviours of developers across different platforms (in particular, GitHub and Stack Overflow in this paper) can reveal interesting facts about development activities. However, only a few datasets exist for analysing cross-platform user behaviours, especially across GitHub and Stack Overflow. In these datasets, users on GitHub and Stack Overflow are identified as the same person when their email addresses match. To increase the number of identifiable users in such datasets, this paper retrieves potentially identical users across GitHub and Stack Overflow without relying solely on email addresses. It employs classification-based link prediction, which formulates user identification as a link prediction problem on a bipartite graph whose two node sets are GitHub users and Stack Overflow users. Using this identification method, the paper generates a probabilistic dataset containing pairs of users together with probabilities (or confidences). The paper also publishes the identification tool to enable further data generation on newly appearing datasets from GitHub, Stack Overflow and other platforms. The generated dataset and tool can help accelerate research on mining software repositories.
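The abstract frames identification as classification-based link prediction. Every (GitHub user, Stack Overflow user) pair is a candidate edge of a bipartite graph, a classifier scores each edge, and the scores become the confidences in the probabilistic dataset. The sketch below illustrates only that general idea and is not the authors' tool. Several parts are assumptions made for illustration: the features (login and display-name similarity computed with `difflib.SequenceMatcher`), the hand-rolled logistic regression, the use of email-matched pairs as positive training labels, the toy user records, and the 0.5 threshold. The helper names `pair_features`, `train_logistic` and `predict_links` are likewise hypothetical.

```python
import math
from difflib import SequenceMatcher
from itertools import product

# Hypothetical toy records: (user_id, login, display_name).
TRAIN_GH = [("gh1", "alice_dev", "Alice Smith"), ("gh2", "bob42", "Bob Jones"),
            ("gh3", "carol-k", "Carol King")]
TRAIN_SO = [("so1", "alicedev", "Alice Smith"), ("so2", "bob42", "Bob J."),
            ("so3", "carolk", "Carol King")]
# Assumption: pairs already linked by identical email addresses serve as
# positive labels; every other GitHub/Stack Overflow pair is a negative label.
EMAIL_MATCHES = {("gh1", "so1"), ("gh2", "so2"), ("gh3", "so3")}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(gh, so):
    """Hypothetical feature vector for one candidate edge (gh, so)."""
    # The leading 1.0 acts as the bias term of the linear model.
    return [1.0, similarity(gh[1], so[1]), similarity(gh[2], so[2])]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples, labels, lr=0.5, epochs=1000):
    """Plain stochastic-gradient logistic regression (an assumed classifier)."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w


def predict_links(gh_users, so_users, w, threshold=0.5):
    """Score every edge of the bipartite graph; keep (gh_id, so_id, probability)."""
    links = []
    for gh, so in product(gh_users, so_users):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, pair_features(gh, so))))
        if p >= threshold:
            links.append((gh[0], so[0], p))
    return sorted(links, key=lambda link: link[2], reverse=True)


# Build labelled training edges from all GitHub x Stack Overflow pairs.
pairs = list(product(TRAIN_GH, TRAIN_SO))
X = [pair_features(gh, so) for gh, so in pairs]
y = [1 if (gh[0], so[0]) in EMAIL_MATCHES else 0 for gh, so in pairs]
weights = train_logistic(X, y)

# Hypothetical users with no shared email address: the classifier proposes links.
NEW_GH = [("gh4", "dave_m", "Dave Miller"), ("gh5", "erin_w", "Erin Walsh")]
NEW_SO = [("so4", "davem", "Dave Miller"), ("so5", "erinw", "Erin Walsh")]
for gh_id, so_id, prob in predict_links(NEW_GH, NEW_SO, weights):
    print(f"{gh_id} <-> {so_id}: p = {prob:.3f}")
```

Scoring every pair grows with the product of the two user counts. A practical pipeline would therefore likely need a candidate-filtering (blocking) step before classification. That step is also an assumption and is not something the abstract describes.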
Komamizu, T., Hayase, Y., Amagasa, T., & Kitagawa, H. (2017). Exploring identical users on GitHub and Stack Overflow. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (pp. 584–589). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2017-109