A program is a mathematical equation. It is either right or wrong, and that does not change over time. Take Pythagoras' theorem. It is right. If I apply it to a triangle that is not right-angled, the theorem remains right. It is the application that is wrong. One can replace correct software with incorrect software, but each is right or wrong from the very beginning and will remain so until the end of the universe. An equation cannot "degrade".

There are no statistics on equations. There are only proofs, whatever the means.

When we measure the successes and failures of software running on hardware, from a black-box perspective, we do not measure whether the equation is right or wrong. We measure:
* The adequacy of the requirements with respect to the needs,

Is this really what we want? I am not against it, but it must be made very clear to readers of the standard that what we measure is not a property of the software/equation but a property of this particular HW/SW package alone, within a very specific context (the input vectors that generated the success/failure series). And only in this context.

I would find this clear and rigorous enough to be in a standard, from an engineer's perspective. I would also find it of absolutely no practical use, but I have been convinced of that from the very beginning...

Don't forget that the supporters of proven-in-use/prior-use principles are driven by only one thing: money. The aim is to find a way to convince an authority (which is far from proving something to a scholar) that something is as safe as something else (let's say IEC 61508 "compliant"). The claim might be true (but who knows), or it might be wrong.

As was pointed out in a previous post, we are dealing with safety. This type of approach in a standard lowers the level of the requirements and transfers the burden of proof to the auditor, who then has to demonstrate to the claimant that the claim is wrong, insufficiently documented, based on wrong data, etc.

Michael's proposal is very similar to the approach contained in ISO 26262 Part 8 Clause 12 "Qualification of software components".

To grossly simplify, it requires confidence to be generated through black-box testing (covering abnormal cases as well as testing against requirements), but for ASIL D, evidence of structural coverage is also required.

Martyn suggests that we put the language to one side.

My take on the core problem.

IEC 61508-7 [2010] Annex D "provides initial guidelines on the use of a probabilistic approach to determining software safety integrity for pre-developed software based on operational experience. This approach is considered particularly appropriate as part of the qualification of operating systems, library modules, compilers and other system software."

In effect, I select an appropriate set of test data, run my system for a long time (or run lots of systems for a short time) and conclude - if no failures are detected - that the system is safe.

The longer I test, the higher the SIL that can be assigned to the component being evaluated.
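As a rough illustration of why longer failure-free operation supports a stronger claim, here is a minimal Python sketch. It assumes a constant failure rate (an exponential model), that the test inputs are statistically representative of real use, and that every failure is observed; none of these is guaranteed in practice. The function name `failure_free_hours` and the target rates are illustrative only and are not taken from the standard.

```python
import math


def failure_free_hours(target_rate_per_hour, confidence):
    """Failure-free hours needed to claim, at the given confidence,
    that the failure rate is no worse than target_rate_per_hour.

    Assumption: constant failure rate, so the probability of seeing
    no failure in t hours is exp(-rate * t). Solving
    exp(-rate * t) <= 1 - confidence for t gives the result.
    """
    if target_rate_per_hour <= 0:
        raise ValueError("target rate must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) / target_rate_per_hour


if __name__ == "__main__":
    # Illustrative target rates (failures per hour), not quoted from IEC 61508.
    for rate in (1e-5, 1e-6, 1e-7, 1e-8):
        hours = failure_free_hours(rate, 0.99)
        print(f"rate <= {rate:g}/h at 99% confidence: {hours:,.0f} failure-free hours")
```

Under these assumptions, each tenfold reduction in the claimed failure rate needs ten times as many failure-free hours, which is one reason the claim that can be supported this way has a practical ceiling.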

In my book, this is Black-Box testing.

If we revise this appendix as Peter proposes, then we may be able to help people to select more appropriate test data (and this may be an improvement) - but this will still be Black Box testing.

If we can't avoid this appendix altogether (and I'm sure that Bertrand is right about this), then we should - surely - be able to require some additional "White Box" assessments, such as code reviews, design reviews, etc (in line with the rest of the standard).

If we can achieve this, I would sleep more easily.


I'm puzzled by much of this discussion. Consider this common example:

A company creates a software package and submits it for beta testing by a group of users. Assume that the package reports how often it is used and for how long, and the users report all errors they encounter. Assume there is a single instance of the software on a server that all the users use.

The company corrects some of the errors that are reported.

The company calculates some measure of the amount of usage before failure. Call it MTBF.

The MTBF is observed to increase.
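The calculation in this example can be sketched in a few lines of Python. The release names, usage hours and failure counts below are invented purely for illustration, and `mtbf` is a hypothetical helper that uses the simplest possible estimate: total usage divided by the number of reported failures.

```python
def mtbf(usage_hours, failures):
    """Crude MTBF estimate: total usage time divided by failure count.

    Returns None when no failure was reported, because the data then
    give only a lower bound on the MTBF, not an estimate.
    """
    if usage_hours < 0 or failures < 0:
        raise ValueError("usage and failure count must be non-negative")
    if failures == 0:
        return None
    return usage_hours / failures


# Hypothetical beta-test data: (release, total usage in hours, reported failures).
periods = [
    ("beta 1", 1200.0, 24),
    ("beta 2", 1500.0, 10),
    ("beta 3", 2000.0, 4),
]

values = [mtbf(hours, failures) for _, hours, failures in periods]

if __name__ == "__main__":
    for (release, _, _), value in zip(periods, values):
        print(f"{release}: MTBF = {value} hours")
```

The estimate says nothing about usage patterns the beta testers never exercised, so it describes the package under this particular usage profile only.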

What word shall we use to describe the property of the software that is increasing?

I'd call it "reliability". If you would, too, then how can software reliability not exist?

I don't mind if you want to use a different word to describe the property. Let's just agree on one, do a global replace in the offending standards and move on ...

... to discussing a practical upper bound on the "reliability" that can be assessed in this way - and on the assumptions that should be made explicit before using any such assessment as a prediction of future performance.


