Re: [SystemSafety] Stupid Software Errors [was: Overflow......]

From: David Haworth < >
Date: Tue, 5 May 2015 08:39:44 +0200


It seems to me that this thing is being hyped up (in the media and here) as yet another example of stupid programmers making stupid mistakes that could have been, should have been, or definitely would have been detected by some magical analysis tool. And really, no one has any detailed information, as far as I can tell.

What if, in the final analysis, we find something like the following code, called from the interrupt service routine that handles the 100 Hz timer interrupt:

int32 jiffies = 0;

void RecordTime(void)
{
    if ( jiffies >= ABSOLUTEMAXUPTIME )
    {
        /* TraceLink: DesignSpec.MaxUptime
         *
         * The specified maximum uptime of this system is 72 hours.
         * ABSOLUTEMAXUPTIME is the maximum time we can possibly
         * support (rounded down to a whole number of days)
         * and turns out to be 248 days.
         * If we ever reach this limit there must be something seriously
         * wrong. Bad things will happen.
         */
        ShutdownSystem(AbsoluteMaximumUptimeExceeded);
    }
    else
    {
        jiffies++;
    }
}
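
For anyone wondering where the 248 days in that comment comes from, the arithmetic can be sketched as below. This is only an illustration: the names TICK_HZ, SECONDS_PER_DAY, TICKS_PER_DAY and MAX_WHOLE_DAYS and the two helper functions are made up for this example, and the definition of ABSOLUTEMAXUPTIME shown here is an assumption, not taken from any real system. It assumes a signed 32-bit counter incremented at 100 Hz, as in the snippet above.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Assumed values: a 100 Hz tick, as in the hypothetical ISR above. */
#define TICK_HZ          100L
#define SECONDS_PER_DAY  86400L
#define TICKS_PER_DAY    (TICK_HZ * SECONDS_PER_DAY)   /* 8,640,000 ticks */

/* Whole days that fit in a signed 32-bit tick counter (integer division
 * rounds down, matching "rounded down to a whole number of days"). */
#define MAX_WHOLE_DAYS   (INT32_MAX / TICKS_PER_DAY)

/* Hypothetical definition of the limit, expressed in ticks. */
#define ABSOLUTEMAXUPTIME ((int32_t)(MAX_WHOLE_DAYS * TICKS_PER_DAY))

/* Checked at compile time: 2147483647 / 8640000 is 248 (remainder dropped). */
static_assert(MAX_WHOLE_DAYS == 248, "a 32-bit 100 Hz counter holds 248 whole days");

/* Small accessors so the values can be inspected from a test or debugger. */
long max_whole_days(void)      { return MAX_WHOLE_DAYS; }
int32_t max_uptime_ticks(void) { return ABSOLUTEMAXUPTIME; }
```

With these assumptions the limit is 2,142,720,000 ticks, which leaves a margin of roughly half a day before a signed 32-bit counter would actually wrap.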

Questions for the list inhabitants:

  1. would the behaviour of the system containing this code match the behaviour reported in the press?
  2. would the static analysis tools find this "error"?
  3. would the requirements/design tracing (which I presume takes place) lead us to the exact requirement for maximum operating time before a restart?
  4. would we find the place in the operation manual, service manual or some other document where it already states when these systems should be shut down or restarted?
  5. have these testers read all of the documentation about this aeroplane? Remember, by the McDonnell-Douglas Law of Aircraft Design, the aircraft shall not fly until the weight of the documentation exceeds the weight of the aircraft. :-)
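
Question 1 can be tried out on a host machine without waiting 248 days. The following is a minimal sketch, not anyone's real code: the limit value, the stub ShutdownSystem(), the shutdown_requested flag and the helpers simulate_near_limit() and current_jiffies() are all assumptions made up for the experiment. It fast-forwards the counter to just short of the limit and then delivers a few ticks.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit: 248 days of 100 Hz ticks (248 * 86400 * 100). */
#define ABSOLUTEMAXUPTIME INT32_C(2142720000)

/* The guard trips before a signed 32-bit counter could overflow. */
static_assert(ABSOLUTEMAXUPTIME < INT32_MAX, "limit must be below INT32_MAX");

typedef enum { AbsoluteMaximumUptimeExceeded } ShutdownReason;

static int32_t jiffies = 0;
static bool shutdown_requested = false;

/* Stub standing in for the real shutdown path: it only records the request. */
static void ShutdownSystem(ShutdownReason reason)
{
    (void)reason;
    shutdown_requested = true;
}

/* Same logic as the snippet in the post. */
void RecordTime(void)
{
    if (jiffies >= ABSOLUTEMAXUPTIME) {
        ShutdownSystem(AbsoluteMaximumUptimeExceeded);
    } else {
        jiffies++;
    }
}

/* Start ticks_before_limit ticks short of the limit, deliver `ticks`
 * timer interrupts, and report whether a shutdown was requested. */
bool simulate_near_limit(int32_t ticks_before_limit, int32_t ticks)
{
    jiffies = ABSOLUTEMAXUPTIME - ticks_before_limit;
    shutdown_requested = false;
    for (int32_t i = 0; i < ticks; i++) {
        RecordTime();
    }
    return shutdown_requested;
}

int32_t current_jiffies(void) { return jiffies; }
```

Starting 10 ticks short, 10 ticks bring the counter exactly to the limit without a shutdown; the 11th tick requests the shutdown and the counter stays put. Nothing in this code overflows, so a tool that looks only for arithmetic overflow would have no reason to complain about it.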

My personal opinion: this is a non-issue.

Dave

On 2015-05-04 15:05:56 +0100, Martyn Thomas wrote:

> Was this 8 months of simulation, to find an overflow error that static
> analysis could find in seconds?
> 
> It may even be true that the developers assumed correctly that no one
> would fly for 8 months without powering off the generators - in which
> case their fault may have just been not documenting that assumption as a
> requirement.
> 
> Martyn
> 
> On 04/05/2015 13:31, Matthew Squair wrote:
> > On the other hand I don't think we should lose sight of the fact that
> > the Boeing 'bug' was found by running a long duration simulation, not
> > by an airliner falling out of the sky. So perhaps thanks is due to the
> > Boeing safety or software engineer(s) who insisted on a long run
> > endurance test and who might have actually learned something from history?
> >  
> >
> 
> _______________________________________________
> The System Safety Mailing List
> systemsafety_at_xxxxxx

-- 
David Haworth B.Sc.(Hons.), OS Kernel Developer    david.haworth_at_xxxxxx
Tel: +49 9131 7701-6154     Fax: -6333                  Keys: keyserver.pgp.com
Elektrobit Automotive GmbH           Am Wolfsmantel 46, 91058 Erlangen, Germany
Geschäftsführer: Alexander Kocher, Gregor Zink       Amtsgericht Fürth HRB 4886



Received on Tue May 05 2015 - 08:40:34 CEST
