Murphy Was an Optimist

  • Driscoll, K. R.
Abstract

Embedded, safety-critical systems often have requirements for incredibly small probabilities of failure, e.g. 10⁻⁹ for a one-hour exposure. One often hears designers of safety-critical systems say: "We have to tolerate all credible faults." However, the word "credible" in this assertion contrasts starkly with the word "incredibly" in the sentence before. In fact, there are faults and failures that most designers think can’t happen but which actually can and do happen, with probabilities far greater than the requirements allow. The well-known Murphy’s Law states: "If anything can go wrong, it will go wrong." When requirements limit failure probabilities to one in a million or less, this should be rewritten as: "If anything can’t go wrong, it will go wrong anyway."

A couple of factors lead designers to erroneously think that certain faults and failures are impossible when, in fact, not only are they possible, but some are highly probable. One factor is that the requirements are outside any designer’s experience, even when that experience includes that of colleagues. Using the literature seems like an obvious way of expanding one’s (virtual) experience. However, there are two problems with this. The first is that people who actually design safety-critical systems are rarely given enough time to keep current with the literature. The second is that the literature on actual occurrences of rare failure modes is almost nonexistent. Reasons for this include: people and organizations don’t want to admit they had a failure; designers feel that rare failure occurrences aren’t worth reporting; and, if designers aren’t given enough time to read the literature, they certainly aren’t given enough time to write it. Takeaway: designers should fight their management for time to keep current with the literature, and designers should use every report of a rare failure as an opportunity to imagine other, similar modes of failure.

The other factor that leads designers to erroneously think that certain faults and failures are impossible stems from abstraction. The complexity of modern safety-critical systems requires some form of abstraction. However, when designers limit their thinking to one level of abstraction, certain faults and failures can seem impossible that would clearly be seen as probable if one were to examine layers below that level of abstraction. For example, a designer thinking about electrical components would not include in their FMEA the possibility that one component (e.g. a diode) could transmogrify into another component (e.g. a capacitor). But, at a lower level of abstraction, it can be seen that a crack through a diode die can create a capacitor. And a crack is one of the most highly probable failure modes at the physical material level of abstraction.

Examples of rare but actually occurring failures will be given. These will include a number of Byzantine faults, component transmogrification, fault mode transformation (e.g. stuck-at faults that aren’t so stuck), the dangers of self-inflicted shrapnel, component creation via emergent properties, "evaporating" software, and exhaustively tested software that still failed.
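To make the scale of that requirement concrete, a back-of-the-envelope sketch helps (the career length, fleet size, and utilization figures below are illustrative assumptions, not numbers from the paper): a 10⁻⁹-per-hour failure is essentially unobservable within one designer's working life, yet is roughly an even bet across a large fleet's service life.

    # Sketch: why a 1e-9/hour failure rate lies outside individual
    # experience yet still matters at fleet scale.
    # All exposure figures below are illustrative assumptions.
    import math

    rate = 1e-9  # required failure probability per exposure hour

    # One designer observing systems around the clock for a 40-year career:
    career_hours = 40 * 365 * 24                     # ~350,000 hours
    p_career = 1 - math.exp(-rate * career_hours)    # chance of ever seeing it
    print(f"P(seen in one career): {p_career:.1e}")  # ~3.5e-4

    # A hypothetical fleet: 10,000 aircraft, 3,000 flight hours/year, 30 years.
    fleet_hours = 10_000 * 3_000 * 30                # 9e8 exposure hours
    print(f"Expected fleet occurrences: {rate * fleet_hours:.2f}")  # ~0.90

So a fault that no individual has ever seen can still be expected to occur in service; that gap between personal credibility and the requirement is the abstract's central point.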
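The diode example can likewise be made quantitative at the lower level of abstraction. The following sketch uses invented component values (the saturation current, crack capacitance, and ripple waveform are assumptions for illustration only): reverse-biased, an intact diode leaks only picoamps, while even a modest crack capacitance couples milliamps of AC straight through.

    # Sketch: a cracked diode die acting as a capacitor (illustrative values).
    import math

    I_S = 1e-12        # assumed diode reverse saturation current (A)
    C_CRACK = 100e-12  # assumed parasitic capacitance of a die crack (F)
    V_AMP = 5.0        # assumed reverse-bias ripple amplitude (V)
    FREQ = 1e6         # assumed ripple frequency (Hz)

    i_intact = I_S                                  # reverse leakage only
    i_crack = C_CRACK * V_AMP * 2 * math.pi * FREQ  # peak i = C * dV/dt

    print(f"intact diode:  {i_intact:.1e} A")   # ~1e-12 A
    print(f"cracked diode: {i_crack:.1e} A")    # ~3e-3 A, nine orders larger

At the component level of abstraction this failure looks impossible; at the die level it is just a crack, one of the most probable physical failure modes.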

Citation (APA)

Driscoll, K. R. (2010). Murphy Was an Optimist (pp. 481–482). https://doi.org/10.1007/978-3-642-15651-9_36
