Re: [SystemSafety] Software reliability (or whatever you would prefer to call it)

From: Ian Broster
Date: Tue, 10 Mar 2015 08:49:44 -0000

Here's a different view on software reliability and an example.

We know that:

  1. We /can/ write software that is very well defined and does not exhibit any stochastic behaviour.
  2. We /can/ also intentionally (or unintentionally) write software that exhibits unpredictable failure behaviour, which can be characterized using statistical techniques (and can therefore be called stochastic behaviour). One way to achieve this is to use random number generators. (1)

The challenge, as software grows in size and complexity, is the practical difficulty of writing software (like 1) that is so well defined and verified that it does not exhibit the stochastic failure behaviour (of 2).

Indeed, at some point on the size/complexity scale, the development and verification of fully deterministic software becomes a practical impossibility, and we then have little option other than to use some statistical measure of confidence that we have achieved the goal of no failure.
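To give a feel for what such a statistical measure of confidence costs, here is a minimal sketch. It assumes a constant failure rate (an exponential model) and purely failure-free observation, neither of which is claimed above; the function name and the figures plugged in are illustrative only.

```python
import math

def failure_free_hours_needed(target_rate_per_hour, confidence):
    """Hours of failure-free operation needed to claim, at the given
    confidence, that the failure rate is below target_rate_per_hour.

    Assumption: constant failure rate, so P(no failure in T) = exp(-rate * T).
    Requiring exp(-rate * T) <= 1 - confidence gives T >= -ln(1 - confidence) / rate.
    """
    return -math.log(1.0 - confidence) / target_rate_per_hour

# Illustrative figures: a 1e-9 per hour target at 99% confidence.
hours = failure_free_hours_needed(1e-9, 0.99)
print(f"{hours:.3e} failure-free hours")
```

Under those assumptions the answer is about 4.6e9 hours, which is why black-box observation alone is rarely enough and some structure in the system has to be exploited.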

One example of this that is gaining traction is the PROXIMA EU project, which is specifically focused on software timing for multi-core processors. The basic idea is that for very complex hardware/software systems, it is beyond practical feasibility to understand the worst-case execution time of the software. ("How can you possibly have tested/analysed sufficient inputs, initial states, and the impact from other cores to give a bound which is both accurate and *practically/economically small enough*?")

The direction in this project is to intentionally produce a system that is designed to have stochastic timing behaviour at the low level. By doing so, you can legitimately start to use all kinds of statistical methods that are not normally available for a digital system.

The result is a software computation that has some probability of failing to produce its result within its allotted time. However, you also have a reliable method of computing that probability, and the probability itself can be well below the oft-quoted 10^-9/hour.
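As a rough illustration of how a probability of overrunning a time budget might be estimated from measurements, here is a sketch using block maxima and a Gumbel fit by the method of moments. This is a generic extreme-value technique chosen for illustration, not a description of the project's actual method. The simulated task, its cycle counts, and all function names are hypothetical, and the approach relies on the measurements being independent and identically distributed.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def simulate_execution_time(rng):
    """Hypothetical task: fixed base cost plus randomised per-access penalties."""
    base_cycles = 1000
    # 50 memory accesses, each a miss with probability 0.1, costing 20 cycles.
    misses = sum(1 for _ in range(50) if rng.random() < 0.1)
    return base_cycles + 20 * misses

def block_maxima(samples, block_size):
    """Split samples into consecutive full blocks and keep each block's maximum."""
    return [max(samples[i:i + block_size])
            for i in range(0, len(samples) - block_size + 1, block_size)]

def fit_gumbel(maxima):
    """Method-of-moments fit of a Gumbel (maximum) distribution."""
    mean = statistics.fmean(maxima)
    std = statistics.stdev(maxima)
    beta = std * math.sqrt(6) / math.pi   # scale
    mu = mean - EULER_GAMMA * beta        # location
    return mu, beta

def exceedance_probability(budget, mu, beta):
    """P(block maximum > budget) under the fitted Gumbel model."""
    z = (budget - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 to keep precision for tiny values.
    return -math.expm1(-math.exp(-z))

rng = random.Random(2015)  # fixed seed so the sketch is repeatable
samples = [simulate_execution_time(rng) for _ in range(20000)]
maxima = block_maxima(samples, block_size=100)
mu, beta = fit_gumbel(maxima)

for budget in (1300, 1400, 1500):
    print(budget, exceedance_probability(budget, mu, beta))
```

Note that the figure produced here is per block of 100 runs; turning it into a per-hour failure probability needs the task's activation rate and further argument about how well the model fits the tail.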


(1) [You could also map a partially testable massive input domain to a random-number generator, or consider race conditions driven by apparently randomly timed input data and the like].
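A minimal sketch of the first idea in this footnote: the function below is fully deterministic, but when its inputs are drawn at random from a large domain, its failures look like a stochastic process whose rate can be estimated. The function, its defect, and the sampling scheme are all invented for illustration.

```python
import random

def double_value(x):
    """Deterministic function with a hypothetical defect on a sparse input region."""
    if x % 1000 == 999:      # defective region: roughly 1 input in 1000
        return 2 * x + 1     # wrong result
    return 2 * x

def estimate_failure_probability(num_samples, seed):
    """Sample inputs uniformly from a 32-bit domain and count wrong results."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(num_samples):
        x = rng.randrange(2**32)
        if double_value(x) != 2 * x:
            failures += 1
    return failures / num_samples

estimate = estimate_failure_probability(100_000, seed=1)
print(f"estimated failure probability per call: {estimate:.5f}")
```

The estimate is only meaningful for the input distribution that was sampled; a different operational profile could hit the defective region far more or far less often.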

Dr Ian Broster, General Manager
Rapita Systems Ltd
Tel: +44 1904 413 945 Mob: +44 7963 469 090

_______________________________________________ The System Safety Mailing List systemsafety_at_xxxxxx
Received on Tue Mar 10 2015 - 09:49:51 CET
