Automatic classification of review comments in pull-based development model

Abstract

The pull-based model, widely used in distributed software development, allows any contributor to fork a public repository, package contributions as a pull-request, and then have them merged back into the original repository. Code review is one of the most significant stages in pull-based development. It ensures that only high-quality pull-requests are accepted, based on in-depth discussion among reviewers. Thus, automatically identifying what reviewers are talking about in these discussions is beneficial for better understanding the code review process. In this paper, we conduct a case study on three popular open-source software projects hosted on GitHub and construct a fine-grained taxonomy including 11 sub-categories for review comments. We then manually label over 5,600 review comments, and propose a Two-Stage Hybrid Classification (TSHC) algorithm to classify review comments automatically by combining rule-based and machine-learning techniques. In comparative experiments against a text-based method, TSHC achieves a reasonable improvement on each project (9.2% in Rails, 5.3% in Elasticsearch, and 7.2% in Angular.js) in terms of the weighted average F-measure.
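The abstract reports results as a weighted average F-measure. As a minimal sketch of how that metric is commonly computed (per-class F1 averaged with weights proportional to each class's share of the true labels), the following may help; the function name and the example labels are hypothetical and are not taken from the paper, whose exact evaluation setup is not given here.

```python
from collections import Counter


def weighted_f_measure(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n / total) * f1
    return score


# Placeholder category names, not the paper's taxonomy.
truth = ["style", "style", "defect", "defect"]
guess = ["style", "defect", "defect", "defect"]
result = weighted_f_measure(truth, guess)
```

Weighting by support means frequent comment categories dominate the score, which is one reason a per-class breakdown is often reported alongside it.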

Cite

Li, Z., Yu, Y., Yin, G., Wang, T., Fan, Q., & Wang, H. (2017). Automatic classification of review comments in pull-based development model. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (pp. 572–577). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2017-039
