Re: [SystemSafety] RE : Qualifying SW as "proven in use"

From: Matthew Squair
Date: Fri, 28 Jun 2013 17:15:38 +1000


Yep, it was my poor choice of phrase. My point was that, in terms of evidence, one hour or a thousand hours of data from the old environment should have the same evidential weight if the new environment is different to the old and we have no idea of what it is.

Yes, I would agree that stochastic inputs can generate stochastic behaviour. So if it's inputs we're talking about, isn't the use of 'hours run' as a unit of exposure essentially a side issue, if what you're actually doing is exposing the software to a set of inputs that are stochastic in nature? As a consequence, the amount of time you have to allow to collect a statistically valid sample is driven by the confidence you wish to obtain, the inherent variability of the input, and how frequently it arrives over some period of time.

Going back to my example, if I 'know' that the variability of the inputs is extremely low, one hour of input data may do; if not, then much more may be needed. If the input data is very complex, no number of hours may be enough. All of which is about establishing how well we understand the environment rather than the 'reliability' of the software, as I see it.
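
To make the dependence on confidence and arrival rate concrete, here is a minimal sketch. It rests on assumptions that are not part of the discussion above: demands are treated as independent, each with the same unknown failure probability, and they arrive at a steady average rate. The function names and the numbers are purely illustrative.

```python
import math


def failure_free_demands_needed(p_max: float, confidence: float) -> int:
    """Smallest n such that n failure-free demands support the claim
    'per-demand failure probability <= p_max' at the given confidence.

    Assumes independent, identically distributed demands (an assumption).
    """
    # If the true probability were exactly p_max, n clean demands would be
    # observed with probability (1 - p_max)**n. We require that probability
    # to be no more than 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_max))


def hours_needed(p_max: float, confidence: float, demands_per_hour: float) -> float:
    """Convert the demand count into exposure time for an average arrival rate."""
    return failure_free_demands_needed(p_max, confidence) / demands_per_hour


if __name__ == "__main__":
    # Hypothetical figures: bound of 1e-3 per demand, 95% confidence.
    for rate in (1.0, 60.0, 3600.0):
        print(f"{rate:>7.0f} demands/hour -> {hours_needed(1e-3, 0.95, rate):.2f} hours")
```

Under these assumptions the number of demands is fixed by the bound and the confidence, and 'hours run' only enters through the arrival rate. None of this says anything about whether the inputs seen are representative of a different environment.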

Picking up your example again, why was the difference between the two environments not detected by the designer? Was an assumption made that the new environment was the same as the old? Presuming that to be the case, isn't the decision to deploy software in that new context really about a different kind of uncertainty?

For example, the original environment had an arrival rate of inputs that could be characterised by some frequency; the new environment also has a frequency, but we are uncertain as to its value. We could have estimated some bounds on this possible range of frequencies and run some tests to see what effect differing arrival rates might have, or we could have gone out and gathered field data, but instead (I presume) we elected to assume that the parameters were the same.

So deploying into a new environment carries epistemic uncertainty, and we can reduce this, but if we make an assumption that the environment is the same we are translating that epistemic uncertainty into an ontological one. I infer from your example that we didn't have to wait too long after deployment to find this problem, so I presume that we wouldn't have had to run a trial for very long before we saw the problem input.

As to whether you would or should weigh operation in multiple different environments as better or worse, I was thinking about the open source example, where having multiple different people looking at the code independently seems to generate very low defect rates. Linux is the example a lot of people use, I believe. So couldn't one argue that operation across a range of different environments would be more likely to expose different systematic errors, as compared to operation in one environment for a long time?
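
A toy calculation of that argument, offered only as a sketch: each latent fault is given a hypothetical per-hour trigger probability in each environment, hours are treated as independent trials, and the environment names and numbers are invented for illustration.

```python
def detection_probability(exposure: dict, trigger_rates: dict) -> float:
    """Probability that a fault is triggered at least once.

    exposure:      {environment: hours run there}
    trigger_rates: {environment: per-hour trigger probability of this fault}
    Hours are treated as independent trials (an assumption).
    """
    p_silent = 1.0
    for env, hours in exposure.items():
        p_silent *= (1.0 - trigger_rates.get(env, 0.0)) ** hours
    return 1.0 - p_silent


# Hypothetical faults: fault_1 can trigger anywhere, fault_2 only in env_b,
# fault_3 only in env_c.
FAULTS = {
    "fault_1": {"env_a": 0.01, "env_b": 0.01, "env_c": 0.01},
    "fault_2": {"env_a": 0.0, "env_b": 0.02, "env_c": 0.0},
    "fault_3": {"env_a": 0.0, "env_b": 0.0, "env_c": 0.05},
}

SINGLE = {"env_a": 300}                                # 300 hours in one place
SPREAD = {"env_a": 100, "env_b": 100, "env_c": 100}    # 100 hours in each

if __name__ == "__main__":
    for name, rates in FAULTS.items():
        print(name,
              round(detection_probability(SINGLE, rates), 3),
              round(detection_probability(SPREAD, rates), 3))
```

With these made-up numbers, 300 hours in env_a can never reveal fault_2 or fault_3, whereas the spread exposure has a good chance of revealing both. The conclusion reverses for a fault that only triggers in the single well-exercised environment, so this illustrates the shape of the argument rather than settling it.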

On Thu, Jun 27, 2013 at 9:35 PM, Peter Bernard Ladkin <ladkin_at_xxxxxx> wrote:

> Matthew,
> Scenarios such as those Bertrand describes are not that far-fetched.
> Unfortunately, there are in some places senior management who are in the
> same state of (lack of) expertise as Bertrand describes. That is a problem
> of professional qualification which I would prefer to treat as a separate
> issue.
> On 27 Jun 2013, at 09:18, Matthew Squair <mattsquair_at_xxxxxx> wrote:
> > I've been thinking about Peter's example a good deal, the developer
> > seems to me to have made an implicit assumption that one can use a
> > statistical argument based on successful hours run to justify the safety of
> > the software.
> It is not an assumption. It is a well-rehearsed statistical argument with
> a few decades of universal acceptance, as well as various successful
> applications in the assessment of emergency systems in certain English
> nuclear power plants.
> > I don't think that's true,
> You might like to take that up with, for example, the editorial board of
> > in fact I'd go further and say that whether you operate for a thousand
> > hours or a million hours has no bearing on demonstrating software safety,
> > because what we're interested in are systematic failures rather than random
> > ones.
> I presume you would want to argue that the occurrence of a failure caused
> by a systematic fault is functionally dependent on the inputs, and that is
> what distinguishes it from what you call "random". However, if your inputs
> have a stochastic nature, then anything functionally dependent on them will
> also exhibit stochastic behavior. Failures caused by systematic faults
> thus exhibit stochastic behavior.
> > Example, I have a piece of software and (despite my best efforts)
> > there's a latent fatal fault within it, however testing hasn't discovered
> > it and I'm also in luck in that the operating environment is sufficiently
> > close to the test environment that the fault is not triggered in the
> > operating environment. Now I could run the system for one, one hundred or a
> > thousand years in that operating environment and I wouldn't see a problem.
> > So according to the statistical treatment the software is safe, even with a
> > fatal flaw isn't it?
> No. According to the statistical treatment, if you have seen 3 x 10^X
> operational hours without failure, *and* you are guaranteed to have had
> perfect failure detection, *and* the future operating environment has the
> exact same statistical properties as the previous (not "similar" but exact,
> statistically), then you may be 90% confident that you will see failures
> with a likelihood of not more than 10^(-X) per operating hour. How that
> might relate to a claim that "the software is safe" is up to you. Also, you
> didn't express what level of confidence you might need in such a claim.
> > So logically if the number of hours you run in service in a particular
> > environment has nothing to do with proving the safety of software, why
> > couldn't I say that after one hundred hours the software was 'proven in
> > use', for that specific environment. Why not one hour?
> It is correct that the number of hours.... has nothing to do with proving
> the safety of software, if by that you mean establish without a shadow of
> doubt. Neither does any practical statistical reasoning. Usual levels of
> confidence with statistical reasoning are 95%. Well away from certainty.
> You can of course say that, after 100 hours of failure-free operation, the
> SW is "proven in use", whatever that might mean to you. What you cannot do
> is attribute to that assertion any other than a very, very low level of
> confidence. Even one hour. With an appropriately lower level of confidence
> (= epsilon indistinguishable from zero, I would hope).
> > In Peter's example the number of hours run on the original software
> > version could have been one, or ten million and there still would have been
> > the same end result, e.g. a failure when put into a new operational context.
> > In other words one hour of operations has as much weight as one thousand
> > (in the same environment).
> I am not sure what you mean here. To me, "new operational context" and
> "same environment" are contradictory, so maybe I don't understand the way
> you are using these terms.
> > Another question, say I have developed a piece of software, it's now
> > running in three quite different operating environments, in terms of
> > evidence of 'safety' would I weight 300 hours of operation in a single
> > environment the same as 100 hours from each of these different
> > environments? If so why?
> What you have is 100 hours of experience from each of three different
> distributions. You could superimpose the distributions if you want, but the
> only reason to do that is if you are thinking of deploying the SW in an
> environment identical to that superimposition and want to get a clue as to
> its viability.
> Prof. Peter Bernard Ladkin, University of Bielefeld and Causalis Limited
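
The arithmetic behind the 3 x 10^X figure quoted above can be checked with a short sketch. It assumes a constant failure rate (exponential model), perfect failure detection and an unchanged operating profile, which are the same conditions stated in the quote; the function names are illustrative only.

```python
import math


def confidence_in_bound(failure_free_hours: float, rate_bound: float) -> float:
    """Confidence that the failure rate is <= rate_bound per hour,
    given failure-free operation, under a constant-rate model."""
    # If the true rate were exactly rate_bound, surviving this long without
    # failure would have probability exp(-rate_bound * hours).
    return 1.0 - math.exp(-rate_bound * failure_free_hours)


def hours_for_confidence(rate_bound: float, confidence: float) -> float:
    """Failure-free hours needed to support rate_bound at the given confidence."""
    return -math.log(1.0 - confidence) / rate_bound


if __name__ == "__main__":
    X = 4  # illustrative exponent: bound of 1e-4 failures per hour
    print(confidence_in_bound(3 * 10**X, 10**-X))   # 3 x 10^X failure-free hours
    print(hours_for_confidence(10**-X, 0.90))       # hours needed for 90%
    print(confidence_in_bound(100, 10**-X))         # only 100 hours
    print(confidence_in_bound(1, 10**-X))           # only 1 hour
```

Under this model, 3 x 10^X failure-free hours give about 95% confidence in a bound of 10^(-X) per hour, so the 90% quoted is on the conservative side, while 100 hours or one hour give confidence of roughly 1% and 0.01% respectively for the same illustrative bound.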

*Matthew Squair*
Mob: +61 488770655
Email: MattSquair_at_xxxxxx

_______________________________________________ The System Safety Mailing List systemsafety_at_xxxxxx
Received on Fri Jun 28 2013 - 09:15:50 CEST
