Studying the needed effort for identifying duplicate issues

Mohamed Sami Rakha; Weiyi Shang; Ahmed E. Hassan

Journal Article

Empirical Software Engineering (2016) 21(5) 1960-1989

DOI: 10.1007/s10664-015-9404-6

Abstract

Many recent software engineering papers have examined duplicate issue reports. Thus far, duplicate reports have been considered a hindrance to developers and a drain on their resources. As a result, prior research in this area has focused on proposing automated approaches to accurately identify duplicate reports. However, no study has attempted to quantify the actual effort that is spent on identifying duplicate issue reports. In this paper, we empirically examine the effort that is needed to manually identify duplicate reports in four open source projects: Firefox, SeaMonkey, Bugzilla, and Eclipse-Platform. Our results show that: (i) more than 50% of the duplicate reports are identified within half a day, and most duplicate reports are identified without any discussion and with the involvement of very few people; (ii) a classification model built using a set of factors extracted from duplicate issue reports classifies duplicates according to the effort needed to identify them with a precision of 0.60 to 0.77, a recall of 0.23 to 0.96, and an ROC area of 0.68 to 0.80; and (iii) factors that capture developer awareness of the duplicate issue’s peers (i.e., other duplicates of that issue) and the textual similarity of a new report to prior reports are the most influential factors in our models. Our findings highlight the need for effort-aware evaluation of approaches that identify duplicate issue reports, since identifying a considerable number of duplicate reports (over 50%) appears to be a relatively trivial task for developers. To better assist developers, research on identifying duplicate issue reports should put greater emphasis on helping developers identify effort-consuming duplicate issues.
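
The abstract names the textual similarity of a new report to prior reports as one of the most influential factors, but it does not say how that similarity is measured. The sketch below assumes a simple bag-of-words cosine similarity, which is not necessarily the measure used in the paper, and computes the factor as the highest similarity between a new report and any earlier report. The function names (tokenize, cosine_similarity, max_similarity_to_prior) and the example report titles are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep only alphanumeric runs as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over shared terms divided by the product of the vector norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_similarity_to_prior(new_report: str, prior_reports: list[str]) -> float:
    # Hypothetical factor: highest similarity between the new report and any earlier report.
    new_vec = Counter(tokenize(new_report))
    return max(
        (cosine_similarity(new_vec, Counter(tokenize(p))) for p in prior_reports),
        default=0.0,
    )


# Usage with made-up report titles.
prior = ["Crash when opening bookmarks menu", "Tab title not updated after reload"]
score = max_similarity_to_prior("Firefox crash when opening the bookmarks menu", prior)
# score ≈ 0.845 (5 shared terms / sqrt(7 * 5))
```

A high score suggests that a similar report already exists, which is one plausible reason such a factor would relate to how quickly a duplicate is identified; a real study would likely use a weighted representation such as TF-IDF instead of raw term counts.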

Citation (APA)

Rakha, M. S., Shang, W., & Hassan, A. E. (2016). Studying the needed effort for identifying duplicate issues. Empirical Software Engineering, 21(5), 1960–1989. https://doi.org/10.1007/s10664-015-9404-6