Abstract
The pull-based model, widely used in distributed software development, allows any contributor to fork a public repository, package contributions as a pull-request, and then merge them back into the original repository. Code review is one of the most significant stages in pull-based development: through in-depth discussion among reviewers, it ensures that only high-quality pull-requests are accepted. Automatically identifying what reviewers are talking about in these discussions is therefore beneficial for better understanding the code review process. In this paper, we conduct a case study on three popular open-source software projects hosted on GitHub and construct a fine-grained taxonomy of review comments comprising 11 sub-categories. We then manually label over 5,600 review comments and propose a Two-Stage Hybrid Classification (TSHC) algorithm that classifies review comments automatically by combining rule-based and machine-learning techniques. In comparative experiments against a text-based method, TSHC achieves a reasonable improvement in weighted average F-measure on each project (9.2% in Rails, 5.3% in Elasticsearch, and 7.2% in Angular.js).
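The abstract does not spell out TSHC's rules, features, or classifier, so the following is only a minimal sketch of the general two-stage idea. It assumes that hand-written rules are tried first and that a learned classifier handles the comments no rule matches. The keyword rules, the category names (`style`, `testing`, `documentation`, `defect`, `design`), and the use of multinomial naive Bayes are hypothetical stand-ins, not the paper's 11 sub-categories or its actual model. The sketch also shows how a weighted average F-measure is computed under the usual definition: per-class F1 scores averaged with weights equal to each class's support.

```python
import re
from collections import Counter, defaultdict
from math import log

# Stage 1: HYPOTHETICAL keyword rules (not the paper's rules or categories).
RULES = [
    (re.compile(r"\b(typo|spelling|whitespace|indent)\b", re.I), "style"),
    (re.compile(r"\b(test|spec|coverage)\b", re.I), "testing"),
    (re.compile(r"\b(docs?|documentation|readme)\b", re.I), "documentation"),
]

def tokenize(text):
    """Lowercase bag-of-words tokenizer."""
    return re.findall(r"[a-z0-9_]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (stand-in for stage 2)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        v = len(self.vocab)
        best, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + v
            score = log(count / self.total)  # log prior
            for t in tokens:  # add smoothed log likelihood of each token
                score += log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

def two_stage_classify(text, model):
    # Stage 1: the first matching rule decides the label.
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    # Stage 2: no rule fired, so fall back to the learned classifier.
    return model.predict(text)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
        fn = n - tp
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += f1 * n
    return total / len(y_true)

# Toy, made-up training data for the stage-2 classifier.
train_texts = [
    "this may cause a null pointer error",
    "possible race condition and crash",
    "why not extract this into a helper method",
    "consider renaming this variable for readability",
]
train_labels = ["defect", "defect", "design", "design"]
model = NaiveBayes().fit(train_texts, train_labels)

if __name__ == "__main__":
    for comment in ["Please fix this typo", "this could crash with a null value"]:
        print(comment, "->", two_stage_classify(comment, model))
```

Running rules first lets precise, high-confidence patterns override the statistical model, while the classifier still covers comments the rules miss; how TSHC actually weighs the two stages may differ from this sketch.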
Citation
Li, Z., Yu, Y., Yin, G., Wang, T., Fan, Q., & Wang, H. (2017). Automatic classification of review comments in pull-based development model. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (pp. 572–577). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2017-039