Re: [SystemSafety] Software reliability (or whatever you would prefer to call it)

From: Nick Tudor < >
Date: Mon, 9 Mar 2015 17:52:36 +0000

Hi Bev

My objection still stands and yes, you have parsed incorrectly. Slowly is right, and yes, we do agree that the environment is potentially, likely, almost certainly random. It's just that small, simple detail that the software is not, and hence does not have a "reliability". If you could agree to that, then maybe we would be getting somewhere.

I have no objection to conservatism in safety systems, software-based or otherwise, but I do object to bad advice which forces conservatism to be unsoundly justified and hence too costly. I have read the document to which you refer, among others from the ONR. I know that this, and others, have been based upon interested parties' views and have unfortunately become lore, if not 'law'. In this instance, just because it's in the document doesn't make it right, just unjustifiably conservative.

On Monday, 9 March 2015, Littlewood, Bev <Bev.Littlewood.1_at_xxxxxx > wrote:

> Hi Nick
> On 9 Mar 2015, at 10:14, Nick Tudor <njt_at_xxxxxx > wrote:
> Hi Bev
> The input you have given to support Peter is the same that you have been
> [wrongly] saying for over 30 years.
> 40-odd years in fact: the first paper was in 1973. And all that time
> under the rigorous scrutiny of scientific peer review…:-)
> But this exchange makes it seem like yesterday.
> I just read your latest posting in which you say “it is the
> environment that is random rather than the software.” Exactly. In fact, as
> I put it in my posting:
> The main source of uncertainty lies in software’s interaction with the
>> world outside. There is inherent uncertainty about the inputs it will
>> receive in the future, and in particular about when it will receive an
>> input that will cause it to fail.
> So the software encounters faults randomly, so the software fails
> randomly, so the failure process is random, i.e. *it is a stochastic
> process*. We are getting there, slowly. Can I take it that you withdraw
> your objections and we are now in agreement?
> Simple, really, isn’t it?
> Your comment about the UK nuclear sector is puzzling. I and my
> colleagues have worked with them (regulators and licensees) for 20 years
> (and still do). I have nothing but admiration for the technical competence
> and sense of responsibility of the engineers involved. If by “holding back”
> the sector, you are referring to their rather admirable technical
> conservatism, when building critical computer-based systems, I can only
> disagree with you. But it is probably their insistence on assessing the
> reliability of their critical software-based systems that prompts your ire.
> You are wrong, of course: read “The tolerability of risk from nuclear power
> stations” (tolerability.pdf).
> Cheers
> Bev
> PS I was rather amused to see the following on your website: "Tudor
> Associates is a consultancy that specialises in assisting companies for
> whom safe, reliable systems and software are critical.” Am I parsing that
> wrong? Don’t the adjectives apply to software? Or have you changed your
> mind?
> The best example of this is "Execution of software is thus a
> *stochastic* (random) *process*". No it isn't and you said so in the
> earlier part of the section: "It is true, of course, that software fails
> systematically, in the sense that if a program fails in certain
> circumstances, it will *always* fail when those circumstances are exactly
> repeated".
> So it either works [in the context of use] or it doesn't.
> Coming up with some apparent pattern of behaviour which you claim is
> random because of an indeterminate world does not make the software
> execution in any way random. It merely acknowledges that the world is a
> messy place, which we all knew anyway. Your MS example is yet another
> attempt to justify the approach. MS had so many issues because there were
> an indeterminate number of 3rd-party applications which overwrote MS .dlls,
> causing unforeseen effects. Well, shock! It apparently failed randomly;
> that doesn't support your argument in any way whatsoever.
> I too have a day job and hence I cannot pick apart all of your
> arguments, but could easily do so (and have done in previous posts). So
> I'll cut to the chase.
> In my view, the reason so many have commented on the list is that the
> kind of thinking espoused regarding so-called "software reliability" costs
> industry and taxpayers money, and it is frustrating to have such thinking
> written into standards which ill-informed users, such as those in
> government, take as read. This kind of thinking has held back, and
> continues to hold back, the UK nuclear sector, for example, and, as I
> wrote in an earlier posting, I would rather the whole subject was removed
> entirely from the standard. If it is not possible to remove it entirely
> (which should be possible), then there should be a very clearly written
> disclaimer which emphasises that not everyone believes that the approach
> is viable, and that it is left to the developer to propose the manner in
> which software can be shown to be acceptably safe without having to use
> "software reliability" as a method to justify the contribution of software
> to system safety.
> Going back [again] to the day job
> Regards
> Nick Tudor
> Tudor Associates Ltd
> Mobile: +44(0)7412 074654
> 77 Barnards Green Road
> Malvern
> Worcestershire
> WR14 3LR
> Company No. 07642673
> VAT No: 116495996
> On 8 March 2015 at 14:03, Littlewood, Bev <Bev.Littlewood.1_at_xxxxxx > wrote:
>> As I am the other half of the authorial duo that has prompted this
>> tsunami of postings on our list, my friends may be wondering why I’ve kept
>> my head down. Rather mundane reason, actually - I’ve been snowed under with
>> things happening in my day job (and I’m supposed to be retired…).
>> So I’d like to apologise to my friend and co-author of the offending
>> paper, Peter Ladkin, for leaving him to face all this stuff alone. And I
>> would like to express my admiration for his tenacity and patience in
>> dealing with it over the last few days. I hope others on this list
>> appreciate it too!
>> I can’t respond here to everything that has been said, but I would like
>> to put a few things straight.
>> First of all, the paper in question was not intended to be at all
>> controversial - and indeed I don’t think it is. It has a simple purpose: to
>> clean up the currently messy and incoherent Annex D of 61508. Our aim here
>> was not to innovate in any way, but to take the premises of the original
>> annex, and make clear the assumptions underlying the (very simple)
>> mathematics/statistics for any practitioners who wished to use it. The
>> technical content of the annex, such as it is, concerns very simple
>> Bernoulli and Poisson process models for (respectively) on-demand (discrete
>> time) and continuous time software-based systems. Our paper addresses the
>> practical concerns that a potential user of the annex needs to address - in
>> order, for example, to use the tables there. Thus there is an extensive
>> discussion of the issue of state, and how this affects the plausibility of
>> the necessary assumptions needed to justify claims for Bernoulli or Poisson
>> behaviour.
>> Note that there is no advocacy here. We do not say “Systems necessarily
>> fail in Bernoulli/Poisson processes, so you must assess their reliability
>> in this way”. Whilst these are, we think, plausible models for many
>> systems, they are clearly not applicable to all systems. Our concern was to
>> set down what conditions a user would need to assure in order to justify
>> the use of the results of the annex. If his system did not satisfy these
>> requirements, then so be it.
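[The Bernoulli reasoning behind tables of the kind found in Annex D can be sketched numerically. This is an illustration of the standard calculation only, not text from the original email: if a system survives n independent demands with no failures, a claimed probability of failure on demand (pfd) of p is supported at confidence 1 - (1 - p)^n.]

```python
import math

def demands_required(pfd_claim: float, confidence: float) -> int:
    """Number of independent, failure-free demands needed to support a
    claimed pfd at a given confidence, under the Bernoulli assumptions
    discussed above (demands independent, pfd constant).

    If the true pfd were pfd_claim, the chance of seeing n failure-free
    demands is (1 - pfd_claim)**n; we need that probability to fall
    below 1 - confidence.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - pfd_claim))

# e.g. to support a claim of pfd <= 1e-4 at 95% confidence:
n = demands_required(1e-4, 0.95)
print(n)  # on the order of 30,000 failure-free demands
```

[Whether the assumptions hold — independence between demands, no state carried over, an unchanging demand profile — is exactly the question the paper says a user of the annex must address.]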
>> So why has our innocuous little offering generated so much steam?
>> Search me. But reading some of the postings took me back forty years.
>> “There’s no such thing as software reliability.” "Software is deterministic
>> (or its failures are systematic) therefore probabilistic treatments are
>> inappropriate.” Even, God help us, “Software does not fail.” (Do these
>> people not use MS products?) “Don’t bother me with the science, I’m an
>> *engineer* and I know what’s what” (is that an unfair caricature of a
>> couple of the postings?). “A lot of this stuff came from academics, and we
>> know how useless and out-of-touch with the real world they are (scientific
>> peer-review? do me a favour - just academics talking to one another)”. Sigh.
>> Here are a few comments on a couple of the topics of recent
>> discussions. Some of you may wish to stop reading here!
>> *1 Deterministic, systematic…and stochastic. *
>> Here is some text I first used thirty years ago (only slightly
>> modified). This is not the first time I’ve had to reuse it in the
>> intervening years.
>> "It used to be said – in fact sometimes still is – that 'software
>> failures are systematic *and therefore it does not make sense to talk of
>> software reliability'*. It is true, of course, that software fails
>> systematically, in the sense that if a program fails in certain
>> circumstances, it will *always* fail when those circumstances are
>> exactly repeated. Where then, it is asked, lies the uncertainty that
>> requires the use of probabilistic measures of reliability?
>> "The main source of uncertainty lies in software’s interaction with the
>> world outside. There is inherent uncertainty about the inputs it will
>> receive in the future, and in particular about when it will receive an
>> input that will cause it to fail. Execution of software is thus a *
>> stochastic* (random) *process*. It follows that many of the classic
>> measures of reliability that have been used for decades in hardware
>> reliability are also appropriate for software: examples include *failure
>> rate* (for continuously operating systems, such as reactor control
>> systems); *probability of failure on demand (pfd)* (for demand-based
>> systems, such as reactor protection systems); *mean time to failure*;
>> and so on.
>> "This commonality of measures of reliability between software and
>> hardware is important, since practical interest will centre upon the
>> reliability of *systems* comprising both. However, the mechanism of
>> failure of software differs from that of hardware, and we need to
>> understand this in order to carry out reliability evaluation.” (it goes
>> on to discuss this - no room to do it here)
>> At the risk of being repetitive: The point here is that uncertainty -
>> "aleatory uncertainty" in the jargon - is an inevitable property of the
>> failure process. You cannot eliminate such uncertainty (although you may be
>> able to reduce it). The only candidate for a quantitative calculus of
>> uncertainty is probability. Thus the failure process is a stochastic
>> process.
>> Similar comments to the above can be made about “deterministic” as used
>> in the postings. Whilst this is, of course, an important and useful
>> concept, it has nothing to do with this particular discourse.
>> *2. Terminology, etc.*
>> Serious people have thought long and hard about this. The
>> Avizienis-Laprie-Randell-Neumann paper is the result of this thinking. You
>> may not agree with it (I have a few problems myself), but it cannot be
>> dismissed after a few moments’ thought, as it seems to have been in a couple
>> of postings. If you have problems with it, you need to engage in serious
>> debate. It’s called science.
>> *3. You can’t measure it, etc.*
>> Of course you can. Annex D of 61508, in its inept way, shows how - in
>> those special circumstances that our note addresses in some detail.
>> Society asks “How reliable?”, “How safe?”, “Is it safe enough?”, even
>> “How confident are you (and should we be) in your claims?” The first three
>> are claims about the stochastic processes of failures. If you don’t accept
>> that, how else would you answer? I might accept that you are a good
>> engineer, working for a good company, using best practices of all kinds -
>> but I still would not have answers to the first three questions.
>> The last question above raises the interesting issue of epistemic
>> uncertainty about claims for systems. No space to discuss that here - but
>> members of the list will have seen Martyn Thomas’ numerous questions about
>> how confidence will be handled (and his rightful insistence that it
>> *must* be handled).
>> *4. But I’ll never be able to claim 10^-9….*
>> That’s probably true.
>> Whether 10^-9 (probability of failure per hour) is actually *needed*
>> in aerospace is endlessly debated. But you clearly need *some* dramatic
>> number. Years ago, talking to Mike deWalt about these things, he said that
>> the important point was that aircraft safety needed to improve
>> continuously. Otherwise, with the growth of traffic, we would see more and
>> more frequent accidents, and this would be socially unacceptable. The
>> current generation of airplanes are impressively safe, so new ones face a
>> very high hurdle. Boeing annually provide a fascinating summary of detailed
>> statistics on world-wide airplane safety
>> (www.boeing.com/news/techissues/pdf/statsum.pdf). From this you can infer that
>> current critical computer systems have demonstrated, in hundreds of
>> millions of hours of operation, something like 10^-8 pfh (e.g. for the
>> Airbus A320 and its ilk). To satisfy Mike’s criterion, new systems need to
>> demonstrate that they are better than this. This needs to be done *before*
>> they are certified. Can it?
>> Probably not. See Butler and Finelli (IEEE Trans Software Engineering,
>> 1993), or Littlewood and Strigini (Comm ACM, 1993) for details.
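[The feasibility argument in those papers can be made concrete with a simple calculation — a sketch of the standard reasoning, not their text: with zero failures observed in T hours, the 1-α upper confidence bound on a constant Poisson failure rate is -ln(α)/T, so demonstrating 10^-9 per hour before service needs on the order of 10^9 failure-free test hours.]

```python
import math

def hours_required(rate_claim: float, confidence: float) -> float:
    """Failure-free operating hours needed to support a claimed failure
    rate (per hour) at a given confidence, assuming a Poisson failure
    process with zero observed failures.

    With zero failures in T hours, P(no failure) = exp(-rate * T);
    we need exp(-rate_claim * T) <= 1 - confidence.
    """
    return -math.log(1.0 - confidence) / rate_claim

t = hours_required(1e-9, 0.95)
print(t)  # roughly 3e9 hours of failure-free operation
```

[Three billion hours is some 340,000 years on a single unit — which is why the pre-certification demonstration is, as the text says, probably not possible.]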
>> Michael Holloway’s quotes from 178B and 178C address this issue, and
>> have always intrigued me. The key phrase is "...*currently available
>> methods do not provide results in which confidence can be placed at the
>> level required for this purpose*…” Um. This could be taken to
>> mean: “Yes, we could measure it, but for reasons of practical feasibility,
>> we know the results would fall far short of what’s needed (say 10^-8ish).
>> So we are not going to do it.” This feels a little uncomfortable to me.
>> Perhaps best not to fly on a new aircraft type until it has got a few
>> million failure-free hours under its belt (as I have heard a regulator say).
>> By the way, my comments here are not meant to be critical of the
>> industry’s safety achievements, which I think are hugely impressive (see
>> the Boeing statsum data).
>> *5. Engineers, scientists…academics...and statisticians...*
>> …a descending hierarchy of intellectual respectability?
>> With very great effort I’m going to resist jokes about alpha-male
>> engineers. But I did think Michael’s dig at academics was a bit below the
>> belt. Not to mention a couple of postings that appear to question the
>> relevance of science to engineering. Sure, science varies in quality and
>> relevance. As do academics. But if you are engineering critical systems it
>> seems to me you have a responsibility to be aware of, and to use, the best
>> relevant science. Even if it comes from academics. Even if it is
>> statistical.
>> My apologies for the length of this. A tentative excuse: if I’d spread
>> it over several postings, it might have been even longer…
>> Cheers
>> Bev
>> _______________________________________________
>> Bev Littlewood
>> Professor of Software Engineering
>> Centre for Software Reliability
>> City University London EC1V 0HB
>> Phone: +44 (0)20 7040 8420 Fax: +44 (0)20 7040 8585
>> Email: b.littlewood_at_xxxxxx
>> _______________________________________________
>> The System Safety Mailing List
>> systemsafety_at_xxxxxx
> _______________________________________________
> Bev Littlewood
> Professor of Software Engineering
> Centre for Software Reliability
> City University London EC1V 0HB
> Phone: +44 (0)20 7040 8420 Fax: +44 (0)20 7040 8585
> Email: b.littlewood_at_xxxxxx
> _______________________________________________

Nick Tudor
Tudor Associates Ltd
Mobile: +44(0)7412 074654

77 Barnards Green Road
WR14 3LR
Company No. 07642673
VAT No: 116495996

_______________________________________________ The System Safety Mailing List systemsafety_at_xxxxxx
Received on Mon Mar 09 2015 - 18:52:46 CET

This archive was generated by hypermail 2.3.0 : Tue Jun 04 2019 - 21:17:07 CEST