[SystemSafety] Safety Cases: Contextualizing & Confirmation Bias -- Fault Injection

From: Stachour, Paul D CCS < >
Date: Mon, 10 Feb 2014 13:30:26 +0000


Drew, Tracy, Martyn, Peter, ...

   I agree that there is a distinction between claiming that we have done an action (and done the action right), documenting that we have done the action, and showing that the action was done correctly.

   In the products which we build at DETECTOR ELECTRONICS CORP, as part of the V&V activity, there is an action performed which is labeled "fault injection". Fault injection happens as part of the V&V, where we have previously identified things that could go wrong (as part of a hazard analysis or other actions), where we actually inject faults (e.g., cutting a wire, creating a bad CRC on a message, stuffing a normally hidden variable with a bogus value) and then observe how (or not) the product diagnoses that the fault has happened, how (and when) it reports its diagnosis, and the fault-recovery actions (if any) which it takes.

   In other words, while we believe that we have designed and built a safe product, we are now trying to show (by the fault injection) that the product is unsafe (that is, it does not act in a safe manner in the presence of faults).
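
   As a rough illustration only (this is not our actual tooling), the "bad CRC on a message" case above might be sketched as follows. The message framing, the `receive` function, and the diagnosis strings are all hypothetical names invented for the example; the point is simply that the test passes only when the injected fault is diagnosed and reported.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Build a message: payload followed by its CRC-32 (big-endian)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def receive(message: bytes) -> str:
    """Hypothetical receiver under test: returns a diagnosis string."""
    if len(message) < 4:
        return "FAULT: message too short"
    payload = message[:-4]
    (crc,) = struct.unpack(">I", message[-4:])
    if zlib.crc32(payload) != crc:
        return "FAULT: CRC mismatch"
    return "OK"


def inject_bad_crc(message: bytes) -> bytes:
    """Fault injector: flip one bit in the last CRC byte."""
    return message[:-1] + bytes([message[-1] ^ 0x01])


def run_fault_injection() -> dict:
    """Run one fault-free case and two injected faults; record each diagnosis."""
    good = frame(b"ALARM=0")
    return {
        "no fault": receive(good),
        "bad CRC": receive(inject_bad_crc(good)),
        "truncated": receive(good[:3]),
    }


if __name__ == "__main__":
    for case, diagnosis in run_fault_injection().items():
        print(f"{case}: {diagnosis}")
```

   The recorded diagnoses are the documented evidence; a case where an injected fault still yields "OK" would be exactly the kind of unsafe behavior the activity is looking for.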

   This brings up some larger questions:

1) Did we properly list all of the faults which could happen?
     [probably not]
2) Did we properly list all of the important faults?
     [we think that we did]
3) Did we properly categorize those which are most likely to happen?
     [we think we did]
4) Did we properly create test-cases and inject faults after doing 2&3?
     [we think we did]
5) Did we thus properly reduce the confirmation bias that is so easy for a builder of products/systems to have? Especially since there seldom seem to be the resources and time to do the full V&V that one would like to do?
     [We think we did.]

Are we perfect at this activity [fault injection]?   No, no-one is perfect. However, we do think that our actions in identifying potential faults, actually injecting them into the [appropriately instrumented] product, and seeing what actually happens [and documenting that activity] can increase our confidence [and reduce the confirmation bias] that we have built a safe product.

Is such an activity [fault injection] part of a safety case? Or is it "just good product / systems engineering"?

I'll defer labeling of what we do (that is, what it is called) to someone else.

Paul D. Stachour
Software Quality Assurance
Detector Electronics Corporation
A UTC Fire & Security Company
6901 West 110th Street, Bloomington, MN 55438 USA 952-941-5665, x8409
Paul.Stachour_at_xxxxxx

--The ideas and opinions expressed in this message
--are solely those of the message originator(s).
--The opinions of the author(s) expressed
--herein do not necessarily state or reflect those
--of Detector Electronics, or of United Technologies
--Corporation. They may not be further disseminated
--without permission. They may not be used
--for advertising or product endorsement purposes.


From: systemsafety-bounces_at_xxxxxx
Sent: Sunday, February 09, 2014 6:16 PM
To: Tracy White
Cc: systemsafety_at_xxxxxx
Subject: [External] Re: [SystemSafety] Safety Cases

Tracy,
The important point is that "done" and "claimed" are different things, not synonyms as you imply. Activities that are very good at the "done" are not necessarily very useful for the "claimed" and vice versa. In particular, a lot of activities that go into making a safe design are only indirectly evidence that the design is safe. In essence, they are evidence that you've tried hard, not that you've achieved anything. This is why we distinguish between "process" and "product" evidence. One of the advantages of explicit safety cases is that they force you to consider exactly what your evidence shows or doesn't show.

Contrariwise, some activities which are used a lot to generate evidence are only indirectly helpful at making a design safer. A lot of quantitative analysis goes into this basket. Only if it reveals issues that are addressed through changes to design or operation can quantitative analysis actually directly improve safety. Otherwise it is evidence without improvement.

Drew

My system safety podcast: http://disastercast.co.uk
My phone number: +44 (0) 7783 446 814
University of York disclaimer: http://www.york.ac.uk/docs/disclaimer/email.htm

On 9 February 2014 22:26, Tracy White <tracyinoz_at_xxxxxx

[Andrew Rae Stated]

(Note: not all safety activities are about evidence. Most of them are about getting the design right so that there _aren't_ safety problems that need to be revealed).

I completely agree that 'getting the design right' is an important element of any assurance argument but I disagree that it can be done (claimed) without providing 'evidence'. If you think you can claim that you got the 'design right', then you must have done something to achieve that and for those efforts there will be evidence.

Regards, Tracy



On 7 Feb 2014, at 23:24, Andrew Rae <andrew.rae_at_xxxxxx

If I can slightly reframe from Martyn's points, the real problem is asking these questions in the negative. If the system _didn't_ have the properties it needs, what activities or tests would be adequate to reveal the problems?

Whenever there is a focus on providing evidence that something is true, this is antithetical to a proper search for evidence that contradicts. As Martyn points out, most evidence is not fully adequate to show that properties are true. The best we can do is to select evidence that would have a good chance of revealing that the properties were not true. (Note: not all safety activities are about evidence. Most of them are about getting the design right so that there _aren't_ safety problems that need to be revealed.)

Simple question for the list (not directly related to safety cases):

How often have you seen a safety analysis that was:

  1. Conducted for a completed or near completed design
  2. Revealed that the design was insufficiently safe
  3. Resulted in the design being corrected in a way that addressed the revealed problem(s)

Supplementary question: What was the activity?

[Not so hidden motive for asking, just so the question doesn't look like a trap - I've seen a lot of QRA type analysis that meets (1), but the only times I've seen (2) and (3) follow on are when the analysis is reviewed, not when the analysis is conducted]

Drew


My system safety podcast: http://disastercast.co.uk
My phone number: +44 (0) 7783 446 814
University of York disclaimer: http://www.york.ac.uk/docs/disclaimer/email.htm

On 7 February 2014 12:05, RICQUE Bertrand (SAGEM DEFENSE SECURITE) <bertrand.ricque_at_xxxxxx

It seems to me that, at the end of the reasoning, standard xyz (e.g. IEC 61508) requests that some work be done and made available in documents (whatever the name). Standard xyz contains (strong) requirements on 1 and (weaker) requirements on 2, but at least requirements on the means and methods to achieve 1. It looks circular.

In the understanding of stakeholders, being compliant with standard xyz means not doing a lot of engineering stuff that is, unfortunately, explicit or implicit in standard xyz. But most often they never even read it. This is also one explanation for the observed gap in the industry.

Bertrand Ricque
Program Manager
Optronics and Defence Division
Sights Program
Mob : +33 6 87 47 84 64
Tel : +33 1 59 11 96 82
Bertrand.ricque_at_xxxxxx

Sent: Friday, February 07, 2014 12:16 PM
To: systemsafety_at_xxxxxx
Subject: [SystemSafety] Safety Cases

In the National Academies / CSTB Report Software for Dependable Systems: Sufficient Evidence? (http://sites.nationalacademies.org/cstb/CompletedProjects/CSTB_042247) we said that every claim about the properties of a software-based system that made it dependable in its intended application should be stated unambiguously, and that every such claim should be shown to be true through scientifically valid evidence that was made available for expert review.

It seems to me that this was a reasonable position, but I recognise that it is a position that cannot be adopted by anyone whose livelihood depends on making claims for which they have insufficient evidence (or for which no scientifically valid evidence could be provided). Unfortunately, much of the safety-related systems industry is in this position (and the same is true, mutatis mutandis, for security).

It seems to me that some important questions about dependability are these:

1 What properties does the system need to have in order for it to be adequately dependable for its intended use? (and how do you know that these properties will be adequate?)
2 What evidence would be adequate to show that it had these properties?
3 Is it practical to acquire that evidence and, if not, what is the strongest related property for which it would be practical to provide strong evidence that the property was true?
4 What are we going to do about the gap between 1 and 3?

The usual answer to 4 is "rely on having followed best practice, as described in Standard XYZ". That's an understandable position to take, for practical reasons, but I suggest that professional integrity requires that the (customer, regulator or other stakeholder) should be shown the chain of reasoning 1-4 (and the evidence for all the required properties for which strong evidence can be provided) and asked to acknowledge that this is good enough for their purposes.

I don't care what you choose to call the document in which this information is given, so long as you don't cause confusion by overloading some name that the industry is using for something else.

I might refer to the answers to question 1 as a "goal", if I were trying to be provocative.

Martyn



The System Safety Mailing List
systemsafety_at_xxxxxx
Received on Mon Feb 10 2014 - 14:30:56 CET
