Re: [SystemSafety] Stupid Software Errors [was: Overflow......]

From: Matthew Squair
Date: Mon, 4 May 2015 22:31:37 +1000

Maybe the conclusion is just that 'people are just no damn good', to quote that great Australian philosopher Nick Cave?

On the other hand, I don't think we should lose sight of the fact that the Boeing 'bug' was found by running a long-duration simulation, not by an airliner falling out of the sky. So perhaps thanks are due to the Boeing safety or software engineer(s) who insisted on a long-run endurance test and who might have actually learned something from history?

On Mon, May 4, 2015 at 4:41 PM, Peter Bernard Ladkin <ladkin_at_xxxxxx> wrote:

> I wrote a version of the following a few days ago to a closed list.
>
> AA has EFBs crashing on a number of flights. Apparently two copies of the approach chart for Reagan Washington National airport were included after the latest update of the EFB, and the app wasn't able to handle having two files with almost-identical metadata denoted as "favorites".
>
> A colleague who flies for a major airline (not AA) which uses EFBs spoke of some colleagues having their EFBs crash early on Jan 1 one year - they fixed it by rolling the date back a day.
>
> On the Boeing 787: think of 32-bit Unix clock, and lots of examples. There's even a Wikipedia page.
>
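As a rough illustration of the 32-bit clock point (this sketch is not part of the quoted message): a fixed-width tick counter runs out after a predictable amount of continuous running. The code assumes a signed counter that starts at zero. The 100-ticks-per-second rate is an assumption picked for illustration only, since the thread does not say how the 787 generator code counts time, and `days_until_overflow` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def days_until_overflow(bits: int, ticks_per_second: int, signed: bool = True) -> float:
    """Days of continuous running before a tick counter of the given width wraps."""
    # A signed counter gives up one bit to the sign, so starting from zero
    # it can count 2**(bits - 1) ticks before it wraps.
    capacity = 2 ** (bits - 1 if signed else bits)
    return capacity / ticks_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Seconds since an epoch in a signed 32-bit integer (the Unix clock case).
    print(f"{days_until_overflow(32, 1):.0f} days at 1 tick per second")
    # Assumed finer-grained counter: same width, much shorter lifetime.
    print(f"{days_until_overflow(32, 100):.1f} days at 100 ticks per second")
```

At one tick per second the counter lasts roughly 68 years; at the assumed 100 ticks per second it lasts a little under 249 days. This kind of arithmetic is cheap to do at design time, and it also indicates how long an endurance test would need to run to reach the wrap.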
> Remember Apple's go-to fail (CVE-2014-1266) from 2014: missing parsing checks.
>
> These are simple, known types of error. Forty years ago, it was known how to avoid all these kinds of problems. Twenty years ago, there were industrial-quality engineering tools available (proper languages and coding standards checkers) which enabled companies to avoid such problems without undue development costs.
>
> I don't buy Derek Jones's or Tom Ferrell's versions of the curate's egg. I don't see why anyone else should, either. Are they still going to be saying "well, it depends, it's complicated" in another twenty years when stupid coding errors still make it through into supposedly-dependable software products?
>
> Look at go-to fail. That's critical code! How come critical code such as that is not routinely subject to static analysis?
>
> Look at the 787 generator code. A systematic loss of all generators is surely a hazardous event. That should make it 10^(-7). Oh, but I forgot. Even though correct operation of SW contributes to the 10^(-7), the reliability of the SW itself is not assessed. But surely it gets to be at least DAL B, since the result is a hazardous event? Oh, but I forgot something else. A systematic failure like that would be common cause, and the certification requirements concern single failures, not common cause failures. So that's all right then. Tom's suggestion that it might have been a design compromise is vitiated by the fact that the phenomenon is subject to an AIRWORTHINESS Directive by the FAA. (Is that sufficient emphasis?)
>
> If people had told me thirty years ago that we'd still be making the same stupid mistakes in the same ways, but this time in code more fundamental to the safe or secure operation of everyday engineered objects, I wouldn't have believed it.
>
> Maybe it's a social thing. Mostly, people actually writing the code and inspecting it are in their twenties and their bosses maybe at most in their early thirties. The young people have never made *this* mistake before - the previous lot had of course, but they're all in management now. I'm reminded of Philip Larkin's ode to rediscovery, Annus Mirabilis:
>
>    Sexual intercourse began
>    In nineteen sixty-three
>    (Which was rather late for me)-
>    Between the end of the Chatterley ban
>    And the Beatles' first LP.
>
> The Ensuing Discussion.
>
> There was obviously discussion on the list of why we are making the same old mistakes forty years after it was known how to avoid them. Some discussants suggested it might help to professionally certify software engineers, a PE. Others referred to the Knight-Leveson study a decade ago for the ACM, in which inserting SE into the current PE scheme was not seen as advantageous. UK discussants pointed out that such certification exists in the UK, as a CEng through the BCS or IET, and that there had been some UK consideration of extra qualification for critical-software engineering.
>
> Such qualification for system safety hasn't (yet) generally caught on anywhere. SARS offer it in the UK for example. It didn't catch on in the US. Over a decade ago, the System Safety Society introduced an option for system safety engineering into the PE exam. They had to pay the NSPE or NCEES (I forget which) lots of money per year to maintain the option - and two people took it in some number of years. So they dropped it. (I was at the board meeting in Ottawa in 2004 when this was decided.)
>
> The UK qualification regime hasn't stopped IT disasters in government procurement. And it hasn't stopped the kind of poor engineering which allows bank ATMs which use supposedly pseudo-one-time-pad nonce generation to be subject to replay attacks (see a recent paper reciting local experiments performed by Ross Anderson's group). I do note, however, that the three examples I mentioned above are all US examples. It's not ruled out that having some degree of formal professional training, as in the UK, encourages software engineers to avoid repeating simple mistakes whose prophylaxis has been well known for decades.
>
> Time was, when UK and US cars were not known for their reliability. Kind of like SW, relatively-inexpensive cars used to go wrong a lot. However, some very expensive cars such as those made by Rolls-Royce/Bentley and Wolseley were reliable. So there was proof of concept. Japanese companies decided it was possible to produce reliable relatively-inexpensive cars and make money, and did it.
>
> There is proof of concept in SE, too. Unlike Rolls-Royce cars, it is not prohibitively expensive. Three out of my four examples involve run-time error. It is feasible to produce SW cost-effectively which is free from run-time error. Just like the Japanese approach to cars, you just have to decide to do it.
>
> How about the following? We design a document called A Programmer's Pledge. It has thirty or so numbered clauses:
>
> * I promise never to deliver SW which is subject to a data-range roll-over phenomenon (especially dates and times)
> * I promise never to deliver software which is subject to a numerical overflow or underflow exception
> * I promise never to deliver software which reads data on which it raises an "out of range" exception
> * ..... and so on
>
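To make the overflow and "out of range" clauses above concrete, here is a minimal sketch (not part of the quoted message) of the defensive style they imply: compute or parse the value, check it against the range the representation can hold, and hand the caller an explicit "no result" instead of letting a run-time exception escape. The names `checked_add_i32` and `parse_day_of_year` are hypothetical, and the signed 32-bit range is an assumption chosen to mimic a fixed-width target, since Python integers do not overflow on their own.

```python
from typing import Optional

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def checked_add_i32(a: int, b: int) -> Optional[int]:
    """Return a + b if it fits in a signed 32-bit integer, otherwise None."""
    result = a + b  # exact, because Python integers are unbounded
    if INT32_MIN <= result <= INT32_MAX:
        return result
    return None  # the caller must decide what an overflow means here


def parse_day_of_year(text: str, leap_year: bool) -> Optional[int]:
    """Validate externally supplied data at the boundary; None means rejected."""
    try:
        value = int(text.strip())
    except ValueError:
        return None  # not a number at all
    last_day = 366 if leap_year else 365
    return value if 1 <= value <= last_day else None


if __name__ == "__main__":
    print(checked_add_i32(INT32_MAX, 1))              # overflow is reported, not raised
    print(parse_day_of_year("366", leap_year=False))  # out-of-range input is rejected
```

Returning `None` forces each call site to handle the rejected case deliberately. In languages with range-constrained types or with static analysis for run-time errors, the same obligations can be checked before the code ever runs.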
> A professional programmer signs it and files it with his/her professional organisation. Quality control issues in programs (such as the above phenomena) are routinely subject to RCA of sorts. When a programmer is responsible for a piece of code with such an error in it, the company reports it to the professional organisation and the programmer gets "points" attached to the corresponding clause in his/her Pledge. Like with driving (Germans say "points in Flensburg" which is where the office is. What is it in the UK? "Points in Cardiff"?). I bet lots of organisations, from companies hiring programmers to professional-insurance companies, will find uses for it.
>
> Prof. Peter Bernard Ladkin, Faculty of Technology, University of
> Bielefeld, 33594 Bielefeld, Germany
> Je suis Charlie
> Tel+msg +49 (0)521 880 7319
> _______________________________________________
> The System Safety Mailing List
> systemsafety_at_xxxxxx

*Matthew Squair*

Mob: +61 488770655
Email: MattSquair_at_xxxxxx

Received on Mon May 04 2015 - 14:31:45 CEST

This archive was generated by hypermail 2.3.0 : Tue Jun 04 2019 - 21:17:07 CEST