iComment : Bugs or Bad Comments ?

by Lin Tan, Ding Yuan, Gopal Krishna, Yuanyuan Zhou
Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, SOSP '07 (2007)

Available from portal.acm.org
/* iComment: Bugs or Bad Comments? */
Lin Tan†, Ding Yuan†, Gopal Krishna†, and Yuanyuan Zhou†‡
†University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
‡CleanMake Co., Urbana, Illinois, USA
{lintan2, dyuan3, gkrishn2, yyzhou}@cs.uiuc.edu
ABSTRACT
Commenting source code has long been a common practice in soft-
ware development. Compared to source code, comments are more
direct, descriptive and easy-to-understand. Comments and source
code provide relatively redundant and independent information re-
garding a program’s semantic behavior. As software evolves, they
can easily grow out-of-sync, indicating two problems: (1) bugs -
the source code does not follow the assumptions and requirements
specified by correct program comments; (2) bad comments - com-
ments that are inconsistent with correct code, which can confuse
and mislead programmers to introduce bugs in subsequent versions.
Unfortunately, as most comments are written in natural language,
no solution has been proposed to automatically analyze comments
and detect inconsistencies between comments and source code.
This paper takes the first step in automatically analyzing com-
ments written in natural language to extract implicit program rules
and use these rules to automatically detect inconsistencies between
comments and source code, indicating either bugs or bad com-
ments. Our solution, iComment, combines Natural Language Pro-
cessing (NLP), Machine Learning, Statistics and Program Analysis
techniques to achieve these goals.
We evaluate iComment on four large code bases: Linux, Mozilla,
Wine and Apache. Our experimental results show that iComment
automatically extracts 1832 rules from comments with 90.8-100%
accuracy and detects 60 comment-code inconsistencies, 33 new
bugs and 27 bad comments, in the latest versions of the four pro-
grams. Nineteen of them (12 bugs and 7 bad comments) have al-
ready been confirmed by the corresponding developers while the
others are currently being analyzed by the developers.
Categories and Subject Descriptors
D.4.5 [Operating Systems]: Reliability; D.2.4 [Software Engi-
neering]: Software/Program Verification—Reliability; D.2.7 [Sof-
tware Engineering]: Distribution, Maintenance, and Enhancement
—Documentation
General Terms
Algorithms, Documentation, Experimentation, Reliability
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SOSP’07, October 14–17, 2007, Stevenson, Washington, USA.
Copyright 2007 ACM 978-1-59593-591-5/07/0010 ...$5.00.
Keywords
comment analysis, natural language processing for software engi-
neering, programming rules, and static analysis
1. INTRODUCTION
1.1 Motivation
Despite costly efforts to improve software-development method-
ologies, software bugs in deployed code continue to thrive and con-
tribute to a significant percentage of system failures and security
vulnerabilities. Many software bugs are caused by a mismatch
between programmers’ intention and code’s implementation. Such a
mismatch can arise from miscommunication between
programmers, misunderstanding of software components, and care-
less programming. For example, one programmer who implements
function Foo() may assume that the caller of Foo holds a lock
or allocates a buffer. However, if such an assumption is not specified
clearly, other programmers can easily violate this assumption and
introduce bugs. The problem above is further worsened by soft-
ware evolution and growth, with programmers frequently joining
and departing from the software development process.
To address the problem, comments became standard practice in
software development to increase the readability of code and to ex-
press programmers’ intention in a more explicit but less rigorous
manner than source code. Comments are written by programmers
in natural language to explain code segments and data structures, to
specify assumptions, to record reminders, etc. that are often not ex-
pressed explicitly in source code. From our simple statistics, Linux
contains about 1.0 million lines of comments for 5.0 million lines
of source code, and Mozilla has 0.51 million lines of comments
for 3.3 million lines of code, excluding copyright notices and blank
lines. These results indicate the common usage of comments to
improve software reliability and maintainability in large software.
Even though comments are less formal and precise than source
code, comments have a unique advantage: comments are much
more direct, descriptive and easy-to-understand than source code.
In other words, many assumptions are specified directly and clearly
in comments but are usually difficult to infer from source code.
For example, the following comment from the latest Linux Kernel
(kernel/irq/manage.c) clearly specifies that function free_irq()
must not be called from interrupt context.
kernel/irq/manage.c:
/* This function must not be called from interrupt context */
void free_irq( … ) { … }
It is hard to infer this assumption from the source code, even with
advanced techniques such as code mining or probabilistic rule in-
ference [16, 24, 26] (more discussion in Section 8.2).
drivers/scsi/in2000.c:
/* Caller must hold instance lock! */
static int reset_hardware( … ) { ... }

static int in2000_bus_reset( … ) {

reset_hardware( … );

}
Annotations: the comment states the assumption. No lock is held before
calling reset_hardware(). Mismatch! A confirmed and fixed bug!
Figure 1: A new bug detected by our tool in the latest version of Linux,
which has been confirmed and fixed by the Linux developers.
security/nss/lib/ssl/sslsnce.c:
/* Caller must hold cache lock when calling this.*/
static sslSessionID * ConvertToSID( … ) { … }

static sslSessionID *ServerSessionIDLookup(…) {...
UnlockSet(cache, set);
...
sid = ConvertToSID( … );
...
}
Annotations: the comment states the assumption. The cache lock is released
before calling ConvertToSID(). Mismatch! Confirmed by developers as a bad
comment after we reported it.
Figure 2: A new misleading bad comment detected by our tool in
the latest version of Mozilla. It has been confirmed by the Mozilla de-
velopers, who replied to us: “I should have removed that comment about
needing to hold the lock when calling ConvertToSID”.
Comments and source code provide relatively redundant and
independent information about a program’s semantic behavior, cre-
ating a unique opportunity to compare the two to check for incon-
sistencies. As pointed out by a recent study [22] of the evolution of
comments, when software evolves, it is common for comments and
source code to be out-of-sync. An inconsistency between the two
indicates either a bug or a bad comment, both of which have severe
implications for software robustness and productivity:
(1) Bugs: source code does not follow correct comments. Such
a case may be caused by time constraints or other reasons, but a
very likely reason is that some code and its associated comments
are updated with a different assumption, while some old code is not
updated accordingly and still follows the old assumption.
Figure 1 shows such a real world bug example from Linux Ker-
nel 2.6.11. The comment above the implementation of function
reset_hardware() explicitly states the requirement that the
caller of this function must hold the instance lock. However, in
the in2000_bus_reset() function body, the lock is not ac-
quired before calling reset_hardware(), introducing a bug (it
has been confirmed by the Linux developer as a true bug and has
been fixed). In Section 7, we will show more new bug examples that
our tool detected in the latest versions of large software including
Linux.
(2) Bad comments that can later lead to bugs. It is common for
developers to change code without updating comments accordingly
as developers may not be motivated, may not have time or simply
forget to do so. Furthermore, as opposed to source code that always
goes through a series of software testing before release, comments
cannot be tested to see if they are still valid. As a result, many com-
ments can be out-of-date and incorrect. We refer to such comments
as bad comments. Note that we do not consider comments with
simple typographical errors or grammar errors as bad comments.
Figure 2 shows a bad comment example, automatically detected
by our tool in the latest version of Mozilla and confirmed by the
developers based on our report. The outdated comment, the caller
must hold cache lock when calling function ConvertToSID(),
does not match with the code that releases the lock before call-
ing ConvertToSID(). Although such out-of-date or incorrect
bad comments do not affect Mozilla’s correctness, they can eas-
ily mislead programmers to introduce bugs later, as also acknowl-
edged by several Mozilla developers after we reported such bad
comments. In Section 7.2, we will show two real world bad com-
ments in Mozilla that have caused new bugs in later versions.
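Both figures involve the same kind of rule: a comment saying that the caller must hold a lock. As a minimal sketch of what that rule means operationally, the C fragment below turns the comment into a runtime check. It is only an illustration and not how iComment works (iComment analyzes comments and code statically). The lock is a plain flag rather than a kernel spinlock, and the names sketch_lock, lock_acquire, lock_release, bus_reset_buggy and bus_reset_fixed are hypothetical.

```c
#include <stdio.h>

/* Hypothetical stand-in for the instance lock: a flag, not a real spinlock. */
struct sketch_lock { int held; };

struct sketch_lock instance_lock_state = { 0 };

void lock_acquire(struct sketch_lock *l) { l->held = 1; }
void lock_release(struct sketch_lock *l) { l->held = 0; }

/* Caller must hold instance lock! (the commented rule, now checked at run time) */
int reset_hardware(void)
{
    if (!instance_lock_state.held) {
        fprintf(stderr, "rule violated: reset_hardware() called without lock\n");
        return -1;
    }
    return 0;
}

/* Mirrors the caller in Figure 1: the lock is never acquired. */
int bus_reset_buggy(void)
{
    return reset_hardware();
}

/* One possible repair (an assumption, not the actual Linux patch):
 * hold the lock around the call. */
int bus_reset_fixed(void)
{
    int rc;

    lock_acquire(&instance_lock_state);
    rc = reset_hardware();
    lock_release(&instance_lock_state);
    return rc;
}
```

In this sketch bus_reset_buggy() returns -1 and bus_reset_fixed() returns 0. A check like this reports only that comment and code disagree. Deciding whether the code is wrong (Figure 1) or the comment is outdated (Figure 2) still requires a developer's judgment.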
The severity of bad comments is also realized by programmers to
some degree. Very often some software patches only fix bad com-
ments to avoid misleading programmers. We analyzed several bug
databases and found that at least 62 bug reports in FreeBSD [4]
are only about incorrect and confusing comments. For example,
FreeBSD patch “kern/700” only modifies a comment in the file
/sys/net/if.h. Similarly, the Mozilla patch for bug report 187257
in December 2002 only fixed a comment in file FixedTableLayout-
Strategy.h.
The bug and bad comment examples above indicate that it is
very important for programmers to maintain code-comment con-
sistency; and it is also highly desirable to automatically detect bad
comments so that they can be fixed before they mislead program-
mers and cause damages.
To the best of our knowledge, no tool has ever been proposed to
automatically analyze comments written in natural language and
detect inconsistencies between comments and source code. Almost
all compilers and static analysis tools simply skip comments as if
they do not exist, losing the opportunity to use comments to their
maximum potential as well as to detect bad comments.
1.2 Challenges in Analyzing Comments
The reason for the almost non-existent work in comment analy-
sis and comment-code inconsistency detection is that automatically
analyzing comments is extremely difficult [44]. As comments are
written in natural language, they are difficult to analyze and almost
impossible to “understand” automatically, even with the most ad-
vanced natural language processing (NLP) techniques [27], which
mostly focus on analyzing well written news articles from the Wall
Street Journal or other rigorous corpora. To make things worse,
unlike these news articles, comments are usually not well written
and many of them are not grammatically correct. Moreover, many
words in comments have different meanings from their real-world
meanings. For example, words “buffer”, “memory” and “lock”
have program domain specific meanings that cannot be found in
general dictionaries. Additionally, many comments are also mixed
with program identifiers (variables, functions, etc.) that do not exist
in any dictionary.
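To illustrate why simple text matching is not enough, here is a deliberately naive keyword matcher in C. It is a strawman written for this discussion and not a technique from the paper. The function name naive_mentions_lock_rule is hypothetical, and only the first comment in the usage note below comes from Figure 1; the other two are made up.

```c
#include <string.h>

/*
 * Naive rule detector: reports 1 if the comment text contains both
 * "hold" and "lock" as substrings, 0 otherwise. It ignores grammar,
 * negation and word boundaries, which is exactly its weakness.
 */
int naive_mentions_lock_rule(const char *comment)
{
    return strstr(comment, "hold") != NULL &&
           strstr(comment, "lock") != NULL;
}
```

The matcher accepts "Caller must hold instance lock!", but it also accepts "Caller need not hold the lock." (the opposite requirement) and "Caller must hold a reference to the block." (where "lock" only appears inside "block"). Telling these apart requires understanding the sentence, which is the difficulty described above.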
Despite the above fundamental challenges, it is highly desirable
for a comment analysis and comment-code inconsistency detection
tool to have the following properties: (1) accuracy: the analysis
and inconsistency results need to be reasonably accurate. Too many
false positives can greatly affect the usability of the tool; (2) prac-
ticality: the tool should be able to analyze real world comments
from existing software (such as Linux) without requiring program-
mers to rewrite all comments; (3) scalability: the tool should be
scalable to handle large software with multi-million lines of code
and comments; (4) generality: the tool cannot be “hard-coded”
to handle only a specific type of comments; and (5) minimum man-
ual effort: while it might be extremely difficult to eliminate pro-
grammers’ involvement, the tool should operate as autonomously
as possible.
1.3 Our Contributions
This paper makes the first step in automatically analyzing pro-
gram comments written in natural language to extract program-
