Re: [SystemSafety] Critical Design Checklist

From: Les Chambers
Date: Tue, 27 Aug 2013 21:49:40 +1000

  1. Describe your process approach to hazard analysis and requirements definition.
  2. Highlight all hazards based on operational experience and past history of accidents.
  3. Can you provide some background on the people involved in the process? Can you present evidence that they were capable of recognising credible hazards? (Right answer: Yes, we had 20 people with a sum total of 300 years' experience in the application domain. Wrong answer: We had this groovy consultant with a checklist. He sounded like he knew what he was doing.)
  4. What is your strategy for demonstrating that all the safety requirements have been satisfied in the design?
  5. How have you modelled the application domain? In other words, how well do you understand how it works?
  6. What is your strategy for demonstrating your system's response to unsafe conditions in the application domain?
  7. How much of your design depends on human intervention to mitigate safety hazards in high-stress emergency environments?
  8. What design measures have you taken to prevent unsafe system maintenance from destroying the safety integrity of your system?
  9. What measures have you taken to establish the safety integrity of third-party and legacy software?
  10. To what degree does your system depend on the accuracy of configuration data? What measures does your design take to secure its integrity?
  11. How does your design deal with integrating third-party interfaces?
  12. How does your design support observability and testability, with a particular focus on regression testing? What elements of your design specifically support ongoing maintenance by an organisation other than the development team?

Failure scenarios (puerile stuff from bitter experience)

  1. Disconnect and reconnect network cabling from a random selection of points in your system.
  2. Simulate catastrophic system failure and evaluate your operations staff's response.
  3. Simulate power failure and brownouts.
  4. Are your backup systems the same height above sea level?
  5. Evaluate the attitudes and experience of the people responsible for ongoing operations and maintenance. What is their attitude to safety? How many hours of safety training have they had? How many years of safety-related system experience have they had?
  6. Turn off the air conditioner.
  7. Activate the halon/deluge system.
  8. How are your heatsinks kept in contact with your integrated circuit chips? They're not glued, I hope.

Questions I don't want to be asked:

  1. Did you fully regression test this system after the last modification?
  2. With reference to that safety-critical third-party software you integrated, and in respect of your claim that it is proven in use with 10 million failure-free operational hours: please provide evidence that the exact same code executed on the exact same hardware configuration (an environment identical to your target environment) for each and every one of those 10 million hours.
  3. Provide evidence that you stress tested that system at 150% of its nameplate capacity.

I guess that will do; I've frightened myself.



From: systemsafety-bounces_at_xxxxxx [mailto:systemsafety-bounces_at_xxxxxx] On Behalf Of Driscoll, Kevin R
Sent: Tuesday, August 27, 2013 6:38 AM
To: systemsafety_at_xxxxxx
Subject: [SystemSafety] Critical Design Checklist

For NASA, we are creating a Critical Design Checklist:

- Objective
  - Too easy to just check "yes" without doing sufficient work
  - Instead, "What have you done ..."
  - Prove what you have done is sufficient
- We are looking for inputs to include in this checklist
- Do you have any inputs that should be included?
  - Where are the bodies buried?

We are finishing the Checklist by next week and would like to include any good questions you may have that we have overlooked. Realizing this is an imposition on your time, I am hoping some of you would be so kind as to spend just a few minutes to send questions or even question fragments.  



I am also looking for unusual failure scenarios to add to my collection,
like those I've described in my series of "Murphy was an Optimist"
presentations (e.g.


_______________________________________________ The System Safety Mailing List systemsafety_at_xxxxxx
Received on Tue Aug 27 2013 - 13:50:11 CEST
