On the Perception of Software Quality Requirements during the Project Lifecycle
Abstract
Context and motivation A key requirements consideration in software development is the system's quality requirements. Quality is usually defined in terms of global properties for a software system, such as "reliability", "usability" and "maintainability". In the context of software maintenance they are particularly relevant: maintenance activities are performed to ensure software quality. Question/problem Recently an expanded view of RE has been emerging, wherein requirements artifacts play a role throughout a system's lifecycle. How important are quality requirements as the lifecycle progresses? We examine two questions: whether requirements are discussed more as the software matures; secondly, whether different software projects have similar levels of interest about quality requirements. Principal ideas/results We use a software repository mining technique we call signifier extraction, and empirically investigate the treatment of software quality in software projects. Signifiers are keywords about quality requirements that we generate using a controlled taxonomy based on ISO9126. Using source data extracted from eight open-source software projects we extract the signifier frequencies over weekly intervals. We analyze the signifier occurrence patterns statistically and historically. Contribution Our results show that quality requirements are discussed differently in different projects. Furthermore, there is no correlation between project age and the importance of software quality requirements. Finally, we show that these occurrences provide a roadmap to reconstruct the historical changes of qualities as responses to external forces, such as release cycles and usability audits.
Neil A. Ernst and John Mylopoulos
Department of Computer Science
University of Toronto
Toronto, ON, Canada
{nernst,jm}@cs.toronto.edu
Abstract. [Context and motivation] A key requirements consideration in soft-
ware development is the system’s quality requirements. Quality is usually defined
in terms of global properties for a software system, such as “reliability”, “usabil-
ity” and “maintainability”. Quality requirements have been well-studied in the re-
quirements engineering literature. In the context of software maintenance, in par-
ticular, they are characterized as vitally important: maintenance activities should
be performed in order to maintain quality requirements. [Question/problem] Re-
quirements engineering is often placed at the initial phase of software develop-
ment. A more nuanced view has been emerging, however, where requirements
artifacts play a role throughout a system’s lifecycle. To what extent are quality
requirements present in the mind of a developer as the lifecycle progresses? The
popular view is that what is important is ‘functionality’, and fixing errors; it is
not clear whether there is a subsequent link from these activities to a particu-
lar software quality. We therefore examine two questions: whether requirements
are discussed more as the software matures; secondly, whether different software
projects have similar levels of interest about quality requirements. [Principal
ideas/results] We use a software repository mining technique we call signifier extraction,
and empirically investigate the treatment of software quality in software
projects. Our signifiers are keywords about quality requirements that we generate
using a controlled taxonomy based on ISO9126. Using sources extracted from
eight open-source software projects – their mailing lists, subversion comments,
and bug comments – we extract the signifier frequencies over weekly intervals.
We analyze the signifier occurrence patterns statistically and historically. [Con-
tribution] Our results show that quality requirements are discussed differently in
different projects. Furthermore, there is no correlation between project age and
the importance of software quality requirements. Finally, we show that these oc-
currences provide a roadmap to reconstruct the historical changes of qualities as
responses to external forces, such as release cycles and usability audits.
Keywords: Evolution, software quality requirements, repository mining
1 Introduction
Software quality requirements are a key concern throughout the software lifecycle.
Requirements research is increasingly focused on supporting systems beyond the initial
release. Quality requirements are usually defined in terms of global properties for a software system, such
as “reliability”, “usability” and “maintainability”. Because of their global nature, qual-
ity requirements are hard to build into a design and are often treated post facto in terms
of metrics that are applied to the final product.
If requirements are important throughout the life-cycle (and we believe strongly that
they are), a better understanding of requirements after the initial release is important. Are
requirements discussed post-release? One way of answering this question is to examine
current practices using a standardized requirements taxonomy. In particular, we are in-
terested in finding out whether there is any noticeable pattern in how software project
participants conceive of quality requirements. Our study is conducted from the perspec-
tive of project participants (e.g., developers, bug reporters, users). We use a set of eight
open-source software (OSS) products to test two specific questions about software qual-
ity requirements. The first is whether software quality requirements are of more interest
as a project ages, as predicted in Lehman’s ‘Seventh Law’ – that “the quality of sys-
tems will appear to be declining unless they are rigorously maintained and adapted to
environmental changes [2, p. 21].” Our second question is whether quality is of similar
concern among different projects. That is, is a quality such as Usability as important to
one project’s participants as it is to another?
To assess these questions, we need to define what we mean by software quality
requirements. Our position is that requirements for software quality can be conceived as
a set of labels assigned to the conversations of project participants. These conversations
take the form of mailing list discussions, bug reports, and commit logs. Consider two
developers in an OSS project who are concerned about the software’s performance. To
capture this quality requirement, we look for indicators, which we call signifiers, which
manifest the concern. We then label the conversations with the appropriate software
quality, using text analysis. Our qualities are derived from a standard taxonomy – the ISO
9126-1 software quality model [3]. The signifiers are keywords that are associated with
a particular quality. For example, we label a bug report mentioning the slow response
time of a media player with the Efficiency quality.
We discuss related approaches in Section 2. Section 3 describes how we derive these
signifiers and how we built our corpora and toolset for extracting the signifiers. We then
present our observations and a discussion about significance in Section 4. Finally, we
examine some threats to our approach and discuss future work.
2 Related work
Part of our effort with this project is to understand the qualitative and intentional aspects
of requirements in software evolution, a notion we first discussed in [4]. That idea is
derived from, in part, work on narratives of software systems shown in academic work
like [5], or more general-purpose works like [6].
Cleland-Huang and her colleagues published work on mining requirements docu-
ments for non-functional requirements (quality requirements) in [7]. One approach they
tried was similar to this one, with keywords mined from NFR catalogues found in [8].
They managed recall of 80% with precision of 57% for the Security NFR, but could not
find a reliable source of keywords for other NFRs. Instead, they developed a supervised
classifier. There are several reasons we did not follow this route. One, we believe we have a more comprehensive
set of terms due to the taxonomy we chose. Secondly, we wanted to compare across
projects. Their technique was not compared across different projects and the applicabil-
ity of the training set to different corpora is unclear. A common taxonomy allows us to
make inter-project comparison (subject to the assumption that all projects conceive of
these terms in the same way). Finally, the source text we use is less structured than their
requirements documents.
Mens et al. [9] conducted an empirical study of Eclipse, the OSS code editor, to
verify the claims of Lehman [2]. They concerned themselves with source code only,
and found Law Seven, “Declining Quality”, to be too difficult to assess: “[we lacked
an] appropriate measurement of the evolving quality of the system as perceived by the
users [9, p. 388]”. This paper examines the notions of quality in terms of a consistent
ontology, as Mens et al. call for in their conclusions.
Massey [10] and Scacchi ([11, 12]) looked at the topic of requirements in open-
source software. Their work discusses the source of the requirements and how they
are used in the development process. German [13] looked at GNOME specifically, and
listed several sources for requirements: leader interest, mimicry, brainstorming, and pro-
totypes. None of this work addressed quality requirements in OSS, nor did it examine
requirements trends.
3 Methodology
Overview: We first construct a set of signifiers, which produces a word list to extensionally
define the software quality of interest, e.g., Efficiency. We then query corpora from
each project with these lists to identify events. Events are timestamped occurrences of
our signifiers in the corpora.
3.1 Step I – Establishing the corpora
Table 1. Selected Gnome ecosystem products (ksloc = thousand source lines of code)
Product     Language   Size (ksloc)   Age (years)
Evolution   C          313            10.75
Nautilus    C          108            10.75
Metacity    C          66             7.5
Ekiga       C++        54             7
Totem       C          49             6.33
Deskbar     Python     21             3.2
Evince      C          66             9.75
Empathy     C          55             1.5
Our corpora are from a selection of eight Gnome projects, listed in Table 1. Gnome
is an OSS project that provides a unified desktop environment for Linux and its cousins.
Each project operates somewhat independently. In 2002, Koch and Schneider [14] listed 52
developers as being responsible for 80% of the Gnome source code. In our study, the
number of contributors is likely higher, since it is easier to participate via email (e.g.,
feature requests) or bug reports. For example, in Nautilus, there were approximately
2,000 people active on the mailing list, whereas there were 312 committers to the source
repository. 1
The projects used in this paper were selected to represent a variety of lifespans and
codebase sizes (generated with [15]). All projects were written in C/C++, save for one
in Python (Deskbar). For each project we created a corpus from that project’s mailing
list, subversion logs and the bug comments, as of November 2008. From the corpus,
we extracted ‘messages’, that is, the origin, date, and text (e.g, the content of the bug
comment), and placed this information into a MySQL database. A message consists of a
single bug report, a single email message, or a single commit. If a discussion takes place
via email, each individual message about that subject is treated separately. Our dataset
consists of over nine hundred thousand such messages, across all eight projects. We do
not extract information on the mood of a message; that is, we cannot tell whether a message
expresses a positive or a negative attitude towards the requirement in question (e.g., “This menu
is unusable” counts as a Usability event just as praise would). Furthermore, we do not link these messages to the implementation in
code; we have no way of telling to what extent the code met the requirement beyond
participant comments.
3.2 Step II – Defining qualities with signifiers
In semiotics, Peirce drew a distinction between signifier, signified, and sign [16]. In this
work, we make use of signifiers – words like ‘usability’ and ‘usable’ – to capture the oc-
currence in our corpora of the signified – in this example, the concept Usability. We ex-
tract our signified, concept words from the ISO 9126 quality model [3], which describes
six high-level quality requirements (listed in Table 2). There is some debate about the
significance and importance of the terms in this model. However, it is “an international
standard and thus provides an internationally accepted terminology for software quality
[17, p. 58],” which is sufficient for the purposes of this research.
We want to preserve domain-independence, such that we can use the same set of
signifiers on different projects. This is why we eschew more conventional text-mining
techniques that generate keyword vectors from a training set.
We generate the initial signifiers from Wordnet [18], an English-language ‘lexical
database’ that contains semantic relations between words, including meronymy and synonymy.
We extract signifiers using the procedure defined in Algorithm 1. This gives us
a repeatable procedure for each signified quality. We call this initial set of signifiers
WN.
Algorithm 1
Require: T, the set of top-level terms in ISO9126-1
E ← ∅
for all t ∈ T do
    S ← ∅
    identify synset of t (synonyms) from Wordnet
    S ← S ∪ synset
    identify direct hypernyms of t (generalizations) from Wordnet
    S ← S ∪ hypernyms
    identify meronyms of t (components) from ISO9126
    S ← S ∪ meronyms
    identify related forms of t (stemmed) from Wordnet
    S ← S ∪ related forms
    for all s ∈ S do
        query message corpora for s and spelling variations of s {ignore case}
        add each matching message to E as an event for t
    end for
end for
return E, the set of unique ‘events’ per t
Expanding the signifiers – The members of the set of signifiers will have a big
effect on the number of events returned. For example, the term ‘user friendly’ is one
most would agree is relevant to discussion of usability. However, this term does not
appear in Wordnet. To see what effect an expanded list of signifiers would have, we
generated a second set (henceforth ext), by expanding WN with more software-specific
signifiers. The ext signifier sets are shown in Table 3.
1 Generated using Libresoft, tools.libresoft.es
To construct our expanded sets, we first used Boehm’s 1976 software quality model
[19], and classified his eleven ‘ilities’ into their respective ISO9126 qualities. We did
the same for the quality model produced by McCall et al. [20]. Finally, we analyzed two
mailing lists from the KDE project to enhance the specificity of the sets. Like Gnome,
KDE is an open-source desktop suite for Linux, and likely uses comparable terminology.
We selected KDE-Usability, which focuses on usability discussions for KDE as a whole;
and KDE-Konqueror, a list about a long-lived web browser project. For each high-level
quality in ISO9126, we first searched for our existing (WN) signifiers; we then randomly
sampled twenty-five mail messages that were relevant to that quality, and selected co-
occurring terms relevant to that quality. For example, we add the term ‘performance’ to
the synonyms for Efficiency, since this term occurs in most mail messages that discuss
efficiency.
We discuss the differences the two sets create in Section 4.
3.3 Step III – Querying the corpora
Once we constructed our sets of signifiers, we applied them to the message corpora (the
mailing lists, bug trackers, and repositories) to create a table of events. An event is any
message (row) in the corpus table which contains at least one term in the signifier set. A
message can contain signifiers for different qualities, and can thus generate as many as
six events. However, multiple signifiers for the same quality only generate a single event
for that quality. We produced a set of events (e.g., a subversion commit message), along
with the associated time and project. We group events by week for scalability reasons.
Note that each email message in a thread constitutes a single event. This means that it
is possible that a single mention of a signifier in the original message might be replied
to multiple times. We assume these replies are ‘on-topic’ and related to the original
concern.
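The querying step described above can be illustrated with a minimal sketch. It is not the tooling used in the study: the signifier lists are abbreviated to a few terms taken from the ext sets, the function names (`compile_patterns`, `extract_events`) are hypothetical, and whole-word, case-insensitive matching is an assumption about how a signifier is matched against message text.

```python
import re
from collections import Counter
from datetime import date

# Abbreviated, illustrative signifier sets (a few terms from the ext lists).
SIGNIFIERS = {
    "Usability": ["usability", "usable", "menu", "dialog"],
    "Efficiency": ["efficiency", "performance", "slow", "sluggish"],
    "Reliability": ["reliability", "crash", "failure"],
}


def compile_patterns(signifiers):
    """Build one case-insensitive, whole-word regex per quality (assumed matching rule)."""
    return {
        quality: re.compile(
            r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
            re.IGNORECASE,
        )
        for quality, terms in signifiers.items()
    }


def extract_events(messages, patterns):
    """Count events per (quality, ISO year, ISO week).

    A message yields at most one event per quality, however many
    signifiers of that quality it contains.
    """
    events = Counter()
    for sent_on, text in messages:
        iso = sent_on.isocalendar()
        year, week = iso[0], iso[1]
        for quality, pattern in patterns.items():
            if pattern.search(text):
                events[(quality, year, week)] += 1
    return events


# Hypothetical messages: (date, text).
messages = [
    (date(2008, 11, 3), "The menu is slow and the dialog is not usable."),
    (date(2008, 11, 4), "Crash on startup, then another crash."),
    (date(2008, 11, 12), "Updated Swedish translation."),
]
events = extract_events(messages, compile_patterns(SIGNIFIERS))
```

In this toy input the first message produces one Usability event and one Efficiency event, the second produces a single Reliability event despite mentioning "crash" twice, and the third produces none.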
Table 2. Qualities and quality signifiers – Wordnet version (WN).
Quality          Signifiers
Maintainability  testability changeability analyzability stability maintain maintainable modularity modifiability understandability
Functionality    security compliance accuracy interoperability suitability functional practicality functionality
Portability      conformance adaptability replaceability installability portable movableness movability portability
Efficiency       “resource behaviour” “time behaviour” efficient efficiency
Usability        operability understandability learnability useable usable serviceable usefulness utility useableness usableness serviceableness serviceability usability
Reliability      “fault tolerance” recoverability maturity reliable dependable responsibleness responsibility reliableness reliability dependableness dependability
Table 3. Qualities and quality signifiers – extended version (ext). Each quality consists of WN
terms (Table 2) in addition to the ones listed.
Quality          Signifiers
Maintainability  WN + interdependent dependency encapsulation decentralized modular
Functionality    WN + compliant exploit certificate secured “buffer overflow” policy malicious trustworthy vulnerable vulnerability accurate secure vulnerability correctness accuracy
Portability      WN + specification migration standardized l10n localization i18n internationalization documentation interoperability transferability
Efficiency       WN + performance profiled optimize sluggish factor penalty slower faster slow fast optimization
Usability        WN + gui accessibility menu configure convention standard feature focus ui mouse icons ugly dialog guidelines click default human convention friendly user screen interface flexibility
Reliability      WN + resilience integrity stability stable crash bug fails redundancy error failure
We normalized the extracted event counts to remove the effect of changes in mailing
list volume or commit log activity (some projects are much more active). The calculation
takes each signifier’s event count for that period, and divides by the overall number of
messages in the same period. We also remove low-volume periods from consideration,
because a week in which the only message that appeared contained a signifier
would present as a 100% match. From this dataset we conducted our observations and
statistical tests. Table 4 illustrates some of the sample events we dealt with, and our
subsequent mapping to software quality requirements.
Table 4. Classification examples. Signifiers causing a match are highlighted.
Event: ...By upgrading to a newer version of GNOME you could receive bug fixes and new functionality.
Quality: None
Event: There should be a feature added that allows you to keep the current functionality for those on workstations (automatic hot-sync) and then another option that allows you to manually initiate .
Quality: Functionality
Event: Steps to reproduce the crash: 1. Can’t reproduce with accuracy. Seemingly random. ....
Quality: Reliability
Event: How do we go disabling ekiga’s dependency on these functions, so that people who arn’t using linux can build the program without having to resort to open heart surgery on the code?
Quality: Maintainability
Event: U_() is equivalent of _() but returns Unicode (UTF-8) string. Update your xml-i18n-tools from CVS (recent version understands U_), update Swedish translation and close the bug back.
Quality: Portability
Event: On some thought, centering dialogs on the panel seems like it’s probably right, assuming we keep the dialog on the screen, which should happen with latest metacity.
Quality: Usability
Event: These calls are just a waste of time for client and server, and the Nautilus online storage view is slowed down by this wastefulness.
Quality: Efficiency
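The normalization used in Section 3.3 (event count divided by message volume for the same week, with low-volume weeks discarded) can be sketched as follows. The function name `normalize_weekly`, the week keys, and the `min_messages` cut-off are illustrative assumptions; the text does not state the threshold that was actually applied.

```python
def normalize_weekly(event_counts, message_counts, min_messages=10, scale=1.0):
    """Return {week: scale * events / messages}, skipping low-volume weeks.

    event_counts   -- {week: number of events for one quality}
    message_counts -- {week: total number of messages in the corpus}
    min_messages   -- hypothetical cut-off below which a week is ignored
    scale          -- optional multiplier (e.g. 1000 for plotting)
    """
    normalized = {}
    for week, total in message_counts.items():
        if total < min_messages:
            continue  # a single matching message would otherwise show as 100%
        normalized[week] = scale * event_counts.get(week, 0) / total
    return normalized


# Hypothetical weekly counts for one project and one quality.
message_counts = {"2003-W50": 120, "2003-W52": 1, "2004-W01": 40}
event_counts = {"2003-W50": 12, "2003-W52": 1}
normalized = normalize_weekly(event_counts, message_counts)
```

Here the week with a single message is dropped, so its lone matching message cannot appear as a 100% match.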
3.4 Step IV – Precision and recall
We verified the percentage of terms retrieved that were unrelated to a signified software
quality to understand the precision of our method. For example, we encountered some
mail messages from individuals whose email signature included the words “Usability
Engineer”. If the body of the message wasn’t obviously about usability, we coded this
as a false-positive. Our error test was to randomly select messages from the corpora
and code them as relevant or irrelevant. We assessed 100 events per quality, for each
set of signifiers (ext and WN). Table 5 presents the results of this test. False-positives
averaged 21% and 20% of events, for ext and WN respectively (i.e., precision was 79%
and 80%).
Recall, or completeness, is defined as the number of relevant events retrieved di-
vided by the total number of relevant events. Superficially we could describe our recall
as 100%, since the query engine returns all matches we asked for, but true recall should
also count relevant events that contain none of our signifiers. To estimate it, we
sampled our corpora and classified each event into either a quality (Usability, Reliability,
etc.) or None. For extended signifier lists, we had an overall recall of 51%, and
a poor 6% recall for the Wordnet signifiers. We therefore dispensed with the Wordnet
signifiers. This is a very subjective process. For example, we classified a third of the
events as None; however, arguably any discussion of software could be related, albeit
tangentially, to an ISO9126 quality. We think a better understanding of this issue is more
properly suited to a qualitative study, in which project-specific quality models can be
best established.
Table 5. False positive rates for the two signifier sets
Signified quality F.P. Rate ext F.P. Rate WN
Usability 0.47 0.22
Portability 0.11 0.20
Maintainability 0.22 0.31
Reliability 0.15 0.19
Functionality 0.14 0.18
Efficiency 0.16 0.07
Mean 0.21 0.20
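As a worked check of the figures in Table 5, the sketch below recomputes the mean false-positive rate for each signifier set and the corresponding precision (one minus the false-positive rate). The per-quality rates are copied from Table 5; the helper names `mean` and `recall` are illustrative, and `recall` only restates the definition given in the text.

```python
# (ext, WN) false-positive rates per quality, copied from Table 5.
FALSE_POSITIVE_RATES = {
    "Usability": (0.47, 0.22),
    "Portability": (0.11, 0.20),
    "Maintainability": (0.22, 0.31),
    "Reliability": (0.15, 0.19),
    "Functionality": (0.14, 0.18),
    "Efficiency": (0.16, 0.07),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def recall(relevant_retrieved, relevant_total):
    """Recall: relevant events retrieved divided by all relevant events."""
    return relevant_retrieved / relevant_total


mean_fp_ext = mean(ext for ext, _ in FALSE_POSITIVE_RATES.values())
mean_fp_wn = mean(wn for _, wn in FALSE_POSITIVE_RATES.values())
precision_ext = 1.0 - mean_fp_ext
precision_wn = 1.0 - mean_fp_wn
```

The two means come out at roughly 0.208 and 0.195, which agree with the Mean row of Table 5 up to rounding.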
4 Observations and discussion
This section first explains the frequency distributions of the data we collected. We then
use that data to answer the two questions raised in the introduction: 1) Is there a cor-
relation between discussion of quality requirements and project age? 2) Are quality re-
quirements of similar importance relative to each project?
Fig. 1. Frequency distribution for Evolution-Usability. x-axis represents number of events (66
events wide), y-axis the number of weeks in that bin.
Fig. 1 shows an example frequency distribution for the quality requirement Usability,
product Evolution, with non-normalized data. The distributions seem to follow a power-
law distribution, that is, a majority of weeks had few events, with the ‘long tail’ consist-
ing of those weeks with many events. We verified that this pattern also existed for the
remaining qualities and project combinations.
4.2 Examining quality discussions over time
Our first question was whether, as predicted in the literature, there was a correlation
between the importance of software quality requirements and the age of a project. We
examined this in three ways. First, we looked at the overall trends for a project. Secondly,
we used release windows, the time between the release of one version, and the release
of the next (major) version. Finally, we explored qualitative explanations for patterns in
the data.
Table 6. Selected summary statistics, normalized. Examples from Nautilus and Evolution for all
qualities using extended signifiers.
Project     Quality          r²     slope   N (weeks)
Evolution   Efficiency       0.06   -0.02   439
Evolution   Portability      0.08   -0.05   448
Evolution   Maintainability  0.04   -0.02   320
Evolution   Reliability      0.20   0.25    492
Evolution   Functionality    0.03   -0.02   439
Evolution   Usability        0.14   0.27    515
Nautilus    Efficiency       0.16   -0.10   420
Nautilus    Portability      0.16   -0.07   331
Nautilus    Maintainability  0.27   -0.09   216
Nautilus    Reliability      0.19   0.26    454
Nautilus    Functionality    0.12   -0.05   390
Nautilus    Usability        0.08   0.29    459
Using project lifespan – We examined whether, over a project’s complete lifespan,
there was a correlation with quality event occurrences. Recall that we define quality
events as occurrences of a quality signifier in a message in the corpora. We performed
a linear regression analysis and generated correlation coefficients for all eight projects
and six qualities. Figure 2 is an example of our analysis. It is a scatterplot of quality
events vs. time for the Usability quality in Evolution. For example, in 2000/2001, there
is a cluster around the 300 mark, using the extended (ext) set of signifiers. Note that the
y-axis is in units of (events/volume * 1000) for readability reasons.
The straight line is a linear regression. The dashed vertical lines represent Gnome
project milestones, with which the release dates of the projects we study are synchronized.
Release numbers are listed next to the dashed lines. Due to space constraints, Table 6
lists only Nautilus and Evolution as products, and r² (the squared correlation value,
or coefficient of determination) and slope (trend) values for each quality within that
project. r² varies between 0 and 1, with a value of 1 indicating perfect correlation. The
sign of the slope value indicates direction of the trend. A negative slope would imply
a decreasing number of occurrences as the project ages. Table 7 does a similar analysis
for selected products and the Usability and Efficiency (performance) qualities.
Table 7. Selected summary statistics, normalized. Examples for Usability and Efficiency
(performance) for selected products using extended signifiers.
Quality      Project     r²     slope   N (weeks)
Usability    Deskbar     0.08   -0.97   126
Usability    Evolution   0.14   0.27    515
Usability    Nautilus    0.08   0.29    459
Usability    Totem       0.20   0.63    314
Efficiency   Deskbar     0.00   -0.11   34
Efficiency   Evolution   0.06   -0.02   439
Efficiency   Nautilus    0.16   -0.10   420
Efficiency   Totem       0.10   -0.16   158
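For readers who want to reproduce the slope and r² columns, the following is a minimal ordinary least-squares sketch in plain Python. It assumes weeks are encoded as consecutive integers and uses made-up data; the function name `linear_fit` is hypothetical and the statistical package used in the study is not stated in the text.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination (squared Pearson correlation for a simple regression).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, intercept, r2


# Hypothetical normalized occurrences for four consecutive weeks.
weeks = [0, 1, 2, 3]
occurrences = [2.0, 1.0, 2.0, 1.0]
slope, intercept, r2 = linear_fit(weeks, occurrences)
```

With this toy series the slope is negative (-0.2) and r² is 0.2: a weak downward trend that explains little of the variance, which is the kind of result reported in the tables.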
[Figure 2: scatterplot titled “Evolution -- Usability (R² = 18%)”; x-axis: Year (1998–2008); y-axis: Occurrences (normalized); dashed vertical lines mark Gnome releases from 1.0 to 2.24.]
Fig. 2. Signifier occurrences per week, Evolution – Usability
The results are inconclusive. In all cases the r² value, which indicates the
explanatory power of our linear regression model, is quite low, well below the 0.9
threshold used in, for example, [9]. There does not seem to be any reason to move to
non-linear regression models based on the data analysis we performed. We conclude
that our extended list of signifiers does not provide any evidence of a relationship between
discussions of software quality requirements and time. In other words, either the
occurrences of our signifiers are random, or there is a pattern, and our signifier lists
are not adequately capturing it. The former conclusion seems more likely based on our
inspection of the data.
[Figure 3: scatterplot titled “Nautilus -- Reliability (R² = 4%)”; x-axis: Year (2000–2005); y-axis: Occurrences (actual); dashed vertical lines mark Gnome releases from 1.2 to 2.12.]
Fig. 3. Signifier occurrences per week, Nautilus – Reliability
Using release windows – It is possible that the event occurrences are more strongly
correlated with time periods prior to a major release, that is, that there is some cyclical
or autocorrelated pattern in the data. We defined a release window as the period from
immediately after a release to just before the next release. We investigated whether there
was a higher degree of correlation between the number of quality events and release age,
for selected projects and keywords. Was this release window correlation better than the
one we found for project lifespan as a whole? For space reasons we do not include these
results, but there was no improvement in correlation. There is no relationship between
an approaching release date and an increasing interest in software quality requirements.
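One way to set up the release-window analysis is to re-index each weekly observation by its age within the window that contains it, so that a regression can then be run per window. The sketch below is an assumption about how this could be done, not the procedure used in the study: the function name `assign_release_windows`, the release dates, and the data are hypothetical, and an observation falling on a release date is assigned to the window that starts on that date.

```python
from bisect import bisect_right
from datetime import date


def assign_release_windows(observations, release_dates):
    """Group (day, value) observations by release window.

    Returns {release_date: [(days_since_release, value), ...]}.
    Observations dated before the first release are dropped.
    """
    releases = sorted(release_dates)
    windows = {release: [] for release in releases}
    for day, value in observations:
        idx = bisect_right(releases, day) - 1  # latest release on or before day
        if idx < 0:
            continue  # before the first known release
        start = releases[idx]
        windows[start].append(((day - start).days, value))
    return windows


# Hypothetical release dates and normalized weekly occurrences.
releases = [date(2003, 3, 1), date(2003, 9, 1)]
observations = [
    (date(2003, 2, 1), 5.0),
    (date(2003, 3, 8), 10.0),
    (date(2003, 8, 31), 20.0),
    (date(2003, 9, 1), 7.0),
]
windows = assign_release_windows(observations, releases)
```

Each window's list of (age, value) pairs can then be regressed separately and its r² compared with the value obtained over the whole project lifespan.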
Analysis of key peaks in selected graphs – The final explanation we explore is
that the data are unrelated to software age or release cycle, and are instead responding
to external events, such as a usability audit. We chose to look at Evolution, a mail and
calendar client, and Nautilus, a web browser and file manager, for more detailed ‘his-
torical’ analysis. We tried two approaches: one used the normalized data, and identified
periods where our signifier occurred more frequently with respect to everyday volume.
The second approach used the actual signifier counts to see why that signifier occurred
more frequently than other periods.
We looked at the normalized Usability events in Evolution, shown in Fig. 2. To
eliminate bug reports and triaging events, we excluded these types of data from our
query. Many bug reports are auto-generated, and contribute more noise than signal. For
instance, one initial peak we examined was related to the “Mass close of stale bugs >= 4
months old.” This generated a lot of noise as the signifiers in these reports are considered
once more by our algorithm (since we treat any discussion on a bug similarly to mail
threads).
Mailing list discussions at that time turned to a question about the default option for
forwarding mail messages, e.g., “... I know this was discussed a few weeks ago ... could
it be implemented as an advanced option that has to be turned on and is off by default?”
Later that year, in October, another spike in our graph can be attributed to a feature-
freeze on Evolution and associated UI cleanups. As Evolution 1.4 is released in mid-
2003, there is a small upward trend. Events at that time reflect problems with the new
release, reflecting some UI changes. We still see some effects due to volume, such as the
outlier near the end of 2003, where nearly one third of mailing messages were usability
related. The issue here is one of overall volume over the winter holidays. In this case a
single mail thread about keyboard shortcuts consumed the discussions.
For our second approach, we used the actual signifier event counts, and targeted Re-
liability events for Nautilus. In November, 2000, 50 events occur. Inspecting the events,
one can see that a number have to do with bug testing the second preview release that
was released a few days prior. For example, one event mentions ways to verify relia-
bility requirements using hourly builds: “As a result, you may encounter a number of
bugs that have already been fixed. So, if you plan to submit bug reports, it’s especially
important to have a correct installation!”. Secondly, in early 2004 there is a point with
29 events just prior to the release of Gnome 2.6. Discussion centers around the proper
treatment of file types that respects reliability requirements. It is not clear whether these
discussions are in response to the external pressure of the deadline or are just part of a
general, if heated, discussion.
These investigations show that there is value to examining the historical record of a
project in detail, beyond quantitative analysis. While some events are clearly respond-
ing to external pressures such as release deadlines, other events are often prompted by
something as simple as participant interest, which seems to be central to the OSS devel-
opment model.
4.3 Quality importance and project
Table 8. Quality per project. Numbers indicate normalized occurrences per week.
Quality       Project     Occurrences
Efficiency    Evolution   0.012
Efficiency    Nautilus    0.026
Usability     Evolution   0.192
Usability     Nautilus    0.285
Portability   Evolution   0.010
Portability   Nautilus    0.011
Recall that in our second question, we wanted to examine whether certain projects
would be more concerned with software quality requirements than others. We characterized
the importance of a requirement to a project by calculating the mean normalized
occurrences per week. [...] longevity and project size.
Table 8 lists our results; for space considerations, only three (representative) qualities
are listed. We show the mean number of occurrences per week, normalized by
dividing by the overall number of messages in that period, to eliminate the effect of volume.
We would like to know, in other words, what proportion of all messages in that
week were talking about the requirement of interest.
We used the extended signifier set (ext). We cannot compare between qualities, be-
cause the signifier sets are not the same size. However, there is a difference among
projects. We chose to focus on Nautilus and Evolution (both projects of similar longevity,
focused on file management and mail respectively). The Efficiency quality occurs in
Nautilus discussions at a rate of 0.026 occurrences per week, and in Evolution at 0.012
occurrences per week – less than half as often. Usability is discussed 1.5 times as of-
ten in Nautilus, while other requirements, including Portability, show no difference.
One possible explanation is that Evolution participants have a conceptual model of Ef-
ficiency that is a poorer fit to our signifier lists than the model Nautilus participants
use. However, it does seem fair to conclude that projects have different interests with
respect to software quality. We intend to do further testing to explore how communities
conceptualize these fairly abstract ‘-ilities’.
4.4 Threats to validity
Construct validity – The main threat to construct validity is that our signifiers may omit
relevant terms or phrases, e.g., “can’t find the submit button” vs. “usability”. Our quali-
ties are not directly comparable, since their respective signifier set sizes differ. Usability,
for example, has 24 terms in its bubble, versus Functionality with 10. We conducted the
error analysis to determine how accurate our bubbles are. Furthermore, we are assum-
ing that projects share the ontology of software quality expressed in the quality model
(ISO9126). A more domain-specific taxonomy would be useful.
Internal validity – When we perform a linear regression, assuming a linear rela-
tionship may not be a good model of the actual pattern these discussions follow. For
example, the number of occurrences may be changing in response to some other vari-
able, such as co-ordinated release dates, or by things such as developer illness. However,
it is not clear from examining the data that other models would be more suitable – there
is no evidence, from our experiments, that an exponential or log relationship exists, for
example. We focused on the linear model as it is the simplest explanation of the pattern
we would expect to see if quality discussions were increasing with time.
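As an illustration of the linear model discussed here, the sketch below fits an ordinary least-squares line to a weekly series and reports the slope, intercept, and coefficient of determination. It is a hedged example rather than the analysis code from the study: the function name is hypothetical, and the week index 0, 1, 2, ... is assumed as the explanatory variable.

```python
def linear_fit(weekly_values):
    """Least-squares fit of weekly_values against week index 0..n-1.

    Returns (slope, intercept, r_squared). Requires at least two weeks.
    """
    n = len(weekly_values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(weekly_values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(weekly_values))
    syy = sum((y - mean_y) ** 2 for y in weekly_values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A constant series has no variance to explain; report 0.0 by convention.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared
```

A positive slope with a high r_squared would be the pattern expected if quality discussions were increasing steadily with time; a slope near zero or a low r_squared gives no support for that reading.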
External validity – Our data originated from open-source projects, less than ten
years old, from the Gnome ecosystem. Of these, the open-source nature of the project
seems most problematic for external validity. Capra et al. [21], for example, show a
higher software quality in OSS projects than commercial projects. It would be interesting
to determine whether a top-down directive to focus on software quality would present
as a noticeable spike on the event occurrence graph.
4.5 Models of quality requirements
There is a rich history of discussion regarding software quality requirements, and quality
models in particular. The main problem that arose in our study was that the quality mod-
more specific models. However, we maintain that it is useful to have a cross-product
quality requirements model which can be used to compare various types of software.
Many questions are left unanswered when confronted by actual data: for instance, what
is the relationship between product reliability and product functionality? An unreliable
product is not meeting functional requirements, for certain. On the other hand, how can
developers define functionality when it is related to the expectations of users they may
never encounter? Can maintainability be solely an internal requirement, relevant to de-
velopers, if users begin to extend publicly released software components?
The challenge for researchers is to align software quality models, at the high level,
with the product-specific requirements models developers and community participants
work with, even if these models are implicit. One reason discussions of quality require-
ments were difficult to identify is that, without explicit models, these requirements are
not properly considered or are applied haphazardly. We need to establish a mapping
between the platonic ideal and the reality on the ground. This will allow us to compare
maintenance strategies for product quality requirements across domains, to see whether
strategies in, for example, Gnome, can be translated to KDE, Apple, or Windows soft-
ware.
5 Conclusions and future work
This paper presents a novel analysis technique for conducting empirical research in Re-
quirements Engineering. The technique has been applied to study two specific questions
concerning quality requirements. In accordance with Lehman’s laws of software evo-
lution, we hypothesized that there is growing interest in quality requirements within a
developer community as a project matures. However, our analysis provides no evidence
for this hypothesis, although it is sometimes possible to use external events to explain
patterns in the data. We then showed that there is a difference in how different projects
treat software qualities – with some projects discussing certain quality requirements
more than others.
Our ultimate goal is to be able to extract, from available sources, a list of require-
ments for a project, so that we can trace not just the ‘physical’ changes in the codebase,
but also the evolving features and goals inherent in a project. We plan to continue our
experiments with repository mining with this in mind. In all likelihood, we will need to
enhance our taxonomy and our data sources (including source code comments or wiki
pages, for example). We would like to explore additional analysis techniques, including
autocorrelation analysis, in case there is a pattern that is not apparent.
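To make the idea of autocorrelation analysis concrete, the following sketch computes the sample autocorrelation of a weekly series at a given lag. It is only an illustration of the technique under stated assumptions: the function name and the alternating sample series are invented, and no such analysis is reported in this paper.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of series at the given lag (0 < lag < len)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean_value = sum(series) / n
    variance_sum = sum((v - mean_value) ** 2 for v in series)
    if variance_sum == 0:
        return 0.0  # constant series: no variation to correlate
    covariance_sum = sum(
        (series[i] - mean_value) * (series[i + lag] - mean_value)
        for i in range(n - lag)
    )
    return covariance_sum / variance_sum


# Hypothetical series that repeats every two weeks.
weekly = [1, 0, 1, 0, 1, 0, 1, 0]
lag_two = autocorrelation(weekly, 2)
lag_one = autocorrelation(weekly, 1)
```

A strong positive value at some lag would suggest a periodic pattern, for example one tied to a regular release cycle, that a linear trend would not reveal.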
6 Appendix and acknowledgements
We appreciate the comments of the software engineering group at the University of
Toronto and Abram Hindle, and the comments of anonymous reviewers. Source code,
processed data, and related discussions are available at http://neilernst.net/tag/msr/.
References
1. Cheng, B.H., de Lemos, R., Giese, H., Inverardi, P., Magee, J. In: Software Engineering for
Self-Adaptive Systems: A Research Roadmap. LNCS 5525 (2009) 1–26
2. Lehman, M.M., Ramil, J.F., Wernick, P.D., Perry, D.E., Turski, W.M.: Metrics and laws of
software evolution-the nineties view. In: International Software Metrics Symposium, Albu-
querque, NM (1997) 20–32
3. International Standards Organization: Software engineering – Product quality – Part 1: Qual-
ity model ISO 9126-1 (2001)
4. Ernst, N.A., Mylopoulos, J.: Tracing software evolution history with design goals. In: Inter-
national Workshop on Software Evolvability at ICSM, Paris, France (October 2007)
5. Antón, A.I., Potts, C.: Functional paleontology: system evolution as the user sees it. In:
International Conference on Software Engineering, Toronto, Canada (2001) 421–430
6. Waldo, J., ed.: The Evolution of C++. MIT Press, Cambridge, Massachusetts (1993)
7. Cleland-Huang, J., Settimi, R., Zou, X., Solc, P.: The Detection and Classification of Non-
Functional Requirements with Application to Early Aspects. In: International Requirements
Engineering Conference, Minneapolis, Minnesota, IEEE (2006) 39–48
8. Chung, L., Nixon, B.A., Yu, E.S., Mylopoulos, J.: Non-Functional Requirements in Software
Engineering. Volume 5 of International Series in Software Engineering. Kluwer Academic
Publishers, Boston (October 1999)
9. Mens, T., Fernandez-Ramil, J., Degrandsart, S.: The evolution of Eclipse. In: International
Conference on Software Maintenance, Shanghai, China (October 2008) 386–395
10. Massey, B.: Where Do Open Source Requirements Come From (And What Should We Do
About It)? In: Workshop on Open source software engineering at ICSE, Orlando, FL, USA
(2002)
11. Scacchi, W.: Understanding the requirements for developing open source software systems.
IET Software 149(1) (2002) 24–39
12. Scacchi, W., Jensen, C., Noll, J., Elliott, M.: Multi-Modal Modeling, Analysis and Validation
of Open Source Software Requirements Processes. In: International Conference on Open
Source Systems, Genoa, Italy (July 2005) 1–8
13. German, D.M.: The GNOME project: a case study of open source, global software develop-
ment. Software Process: Improvement and Practice 8(4) (2003) 201–215
14. Koch, S., Schneider, G.: Effort, co-operation and co-ordination in an open source software
project: GNOME. Information Systems Journal 12 (2002) 27–42
15. Wheeler, D.: SLOCcount: http://www.dwheeler.com/sloccount/ (2009)
16. Atkin, A.: Peirce’s Theory of Signs. Stanford Encyclopedia of Philosophy (Spring 2009
Edition)
17. Bøegh, J.: A New Standard for Quality Requirements. IEEE Software 25(2) (2008) 57–63
18. Fellbaum, C., ed.: WordNet: An Electronic Lexical Database. MIT Press
19. Boehm, B., Brown, J.R., Lipow, M.: Quantitative Evaluation of Software Quality. In: Inter-
national Conference on Software Engineering. (1976) 592–605
20. McCall, J.: Factors in Software Quality: Preliminary Handbook on Software Quality for an
Acquisition Manager. Volume 1-3. General Electric (November 1977)
21. Capra, E., Francalanci, C., Merlo, F.: An Empirical Study on the Relationship Between Soft-
ware Design Quality, Development Effort and Governance in Open Source Projects. IEEE
Transactions on Software Engineering (2008)


