Re: [SystemSafety] Modelling and coding guidelines: "Unambiguous Graphical Representation"

From: Steve Tockey < >
Date: Sat, 5 Mar 2016 20:53:57 +0000

Derek,

"You seem to be willfully refusing to deal with a reality that does not conform to your ideal view of how the world should be."

This is the crux of where we differ, I guess. Yes, you are entirely concerned with how software *is* practiced today. And I respect that. Where I'm coming from is real, on-project experience involving at least 19 projects where the results are what I stated:

Development cost and schedule cut in half from typical
Delivered defects < 0.1 of typical
Long term maintenance < 0.25 of normal

I'm not dreaming of some pie-in-the-sky, never-been-done-before ideal. We
*have* done it. More than once. And yet the people who weren't on those
projects refuse to acknowledge those successes.

"Quality as seen by the paying customer, which can be very different from code quality."

The key word here is "can". As I said before, software could have perfect internal structure and completely meet the stated requirements, but the stated requirements could be totally wrong. So yes, high quality code internally, panned by customers externally. It happens. But I'm talking about how crappy internal code quality affects the paying customer's perception of quality. A word processor that goes into an infinite loop on a paste is crap quality regardless of how well the software might claim to meet the "functional" requirements. I can give you a laundry list of other defects that are clearly caused by crap code quality, not by incorrect requirements. A small sampling:

*) Re-opening a document after an infinite-loop crash has about a 10%
chance of failing with "There is insufficient memory". I have to quit and restart a second time before it will read the exact same document it just complained was too big

*) I try to save a document and it complains "Not enough space on
disk" despite there being gigabytes of free space on the disk. Only one file is 7 meg; the rest are well under 1 meg, most around 400k

*) Lines of text in the document disappear at page boundaries. The lines
are really there; page-rendering defects make them disappear. Adding a blank line makes them come back. That's offset by the opposite case, where the same line appears both before and after the page boundary

*) Columns in tables can't be resized if that table crosses a page boundary

*) Even when a table is contained on a single page, column margins can't
be moved by more than 1/8 inch at a time. It takes many click-drag, click-drag, click-drag events to move a column margin any appreciable amount

*) If a word in a section of text was hyphenated by line-wrapping, the
hyphen is still in the pasted text when that text is copied and pasted, even though line-wrapping isn't happening at that word anymore

*) If a table is pasted into a document and the right edge of the table falls
outside the right-hand page margin, that edge of the table can't be selected (for resize) until the page layout is set to landscape, so that the table edge is back inside the right-side page margin

*) Switching page orientation to landscape to re-size a table has a 50%
chance of resulting in a "Document is too big for <schlock product> to handle" error

*) "Paste and match style" frequently doesn't match the style of what that
text was pasted into

*) Sometimes changing the type size in a footnote causes the displayed content
to move by as much as 7 pages

*) Editing text within a footnote sometimes ends up as a footnote to the
footnote

I could go on. These are not mis-matched requirements issues; they are simply amateur programmers at work. The same holds for other products from this same company; I can give you a laundry list of crap-code-induced problems in them as well. If I wanted to, I could start lists for products from other companies (next on my list would be Adobe), but seriously, I don't have the time.

"Ok. Can I have a copy of your data please. I would be happy to analyze and write about your success."

Unfortunately I don't have detailed to-the-person-hour data for these projects. Besides, that data would almost certainly be considered proprietary by the organization in question. That said, however, here's data that I can give you:

*) A discrete event simulation system was built to allow traffic
throughput calculations for commercial airports. The legacy code was 44K SLOC of Fortran and handled only a single runway; the company needed to support parallel-runway simulations, and the two full-time maintainers said the code could not handle parallel runways. The system was rebuilt from the ground up: one developer, coached on model-based requirements and design, replaced the legacy code in about 9 months with 8K SLOC of Ada that also handled parallel runways and needed less than 1/4 FTE to maintain from then on.
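
In case it helps to be concrete about what "discrete event simulation" means here, below is a minimal sketch. It is Python and purely illustrative (the real system was Ada, and every name in it is mine): events sit on a time-ordered queue, each arrival is matched to the earliest-free runway, and supporting parallel runways is just a matter of how many entries are in the resource table.

    import heapq

    def simulate(arrival_times, num_runways, occupancy):
        # Aircraft arrive, wait for a free runway, occupy it for
        # `occupancy` minutes, then depart.
        free_at = [0.0] * num_runways        # when each runway frees up
        events = [(t, "arrive") for t in arrival_times]
        heapq.heapify(events)                # time-ordered event queue
        completions = []
        while events:
            now, kind = heapq.heappop(events)
            if kind == "arrive":
                r = min(range(num_runways), key=free_at.__getitem__)
                start = max(now, free_at[r])   # queue if all runways busy
                free_at[r] = start + occupancy
                heapq.heappush(events, (free_at[r], "depart"))
            else:
                completions.append(now)      # runway-clear times
        return completions

    # e.g. simulate([0, 1, 2, 3], num_runways=2, occupancy=5.0)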

*) 767 Engine Simulator ATE: the most credible estimate for the project
was 18 months with 4 developers. The business need was 9 months. The 4 developers delivered in 7 months. Industry data for delivered defects in real-time systems (Boehm) was 15-17 per K SLOC. We delivered 13K SLOC. We claimed we would be no worse than the data for IT systems, 5-7 defects per K SLOC. Commercial products were supposed to deliver 1-3 defects per K SLOC. We based our estimate on 6 per K SLOC, which predicted 78 delivered defects. Only 4 defects were reported by users in the entire first year. That's a delivered defect rate of 0.3 defects per K SLOC, and that's for a real-time system. When we looked at defects found in model reviews (for both requirements and design), we counted pretty close to 72 defects found and fixed before coding. We made mistakes; we just found and fixed the majority of them before code was ever written.
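
For anyone who wants to check the arithmetic, in the same sketch style (all numbers from the paragraph above):

    ksloc = 13                # delivered size, K SLOC
    predicted = 6 * ksloc     # estimate basis: 78 predicted delivered defects
    reported = 4              # defects users reported in the first year
    rate = reported / ksloc   # ~0.31 delivered defects per K SLOC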

*) 777 ATE: The business need was 30 months, and the project was staffed to
get it done in that time. The original team wasted the first half of that time writing research papers, surfing the web, playing games, etc.--not doing any ATE requirements, design, or code work. When management found the team had done nothing in 1.25 years, that team got shuffled out and a new team was brought in. Using model-based development, we were able to deliver to the factory floor 3 days ahead of the original schedule, with delivered defect densities similar to 767 ATE. The 777 Factory Manager insisted on 24x7 pager coverage for the first 6 weeks of use: if 777 ATE so much as hiccuped, he wanted someone there to fix it NOW. Not a single call ever happened. Interviews with shop floor mechanics (the users) showed they were astonished that such complex functionality could work so flawlessly. They even set up a betting pool: $20 buy-in, and the first to repeatably crash 777 ATE would win the $600 pool. As long as I was with 777 ATE, nobody ever won because nobody could crash it. Even with a $600 incentive.

*) ATE Switch Matrix Driver re-write: Part of the general ATE software in
Everett was the driver for a "switch matrix". This device allows ATE instrumentation on one axis of wires to be arbitrarily switched into and out of airplane test points on the other axis of wires. There are rules about what switching is legal, like "No two output instruments can be switched onto the same airplane point at the same time". The existing code was about 4000 lines of spaghetti crap that was constantly failing. We re-developed it in roughly a month and ended up with about 800 lines of clean code that never broke from then on. When I say "never", I mean for as long as I was on the 777 ATE project, which was about another 8 months. I can't say it never failed after that because I lost contact with them for about 15 years. But if the switch matrix code was going to fail, it should have failed at least once in that time.
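
To give a flavor of the kind of switching rule involved, here is a hypothetical sketch (again Python, with invented names and data structures; the real driver looked nothing like this). Closed connections are (instrument, point) pairs, and a close request is refused if it would put a second output instrument onto the same airplane point:

    def may_close(closed, instrument, point, output_instruments):
        # closed: set of (instrument, point) pairs already switched in.
        # Refuse to close (instrument, point) if some *other* output
        # instrument is already on the same airplane point.
        if instrument in output_instruments:
            for other_inst, other_point in closed:
                if (other_point == point
                        and other_inst != instrument
                        and other_inst in output_instruments):
                    return False
        return True

The real code had to enforce a whole family of rules like this one; the point is only that each such rule reduces to a small, checkable predicate.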

*) 787 ATE: Can you say déjà vu? Again, the original team wasted the
first half of the schedule. Again, I got called in to bail out the project with a replacement team. We delivered earlier than required by the original airplane development schedule. Remember that the whole airplane program slipped by about 2 years; we would have been ready even if there had been no slips at all anywhere else in the airplane program. Similar quality performance to 777 ATE and 767 ATE.

*) Oso and Altair: A pair of projects at Los Alamos National Lab that
prepare input data (2-D and 3-D models) for computational fluid dynamics code. The need date for both projects was about 12 months out; both were delivered in about 6. Similar quality performance to ATE.

*) 777 ATE ACS Re-write: Under FAA rules, one can't test a commercial
airplane with obsolete instrumentation. One part of 777 ATE was "ACS", a driver for hardware that allows ATE to talk on the ARINC 629 bus. CDS, the maker of the hardware, notified Boeing that the version in use would go obsolete in 12 months. ATE management estimated it would take at least 12 months with 4 people to rewrite the driver code. I was called in to help the team. We started with the original ACS requirements models and modified them to account for hardware changes in the new device. Those same model modifications were carried forward into the design models and then into the code. They were back on the air with the new ARINC 629 hardware in 3 months.

*) P-8 Mission Systems: The P-8 is the replacement for the Lockheed P-3
submarine hunter (https://en.wikipedia.org/wiki/Boeing_P-8_Poseidon). All of the fly-the-airplane avionics were essentially stock 737 equipment--out of scope for us. Our project was the mission systems in the back of the plane: mission planning, mission execution, secure plane-to-plane communication, target acquisition and tracking, targeting, . . . There were about 250 developers on the project, working for about 7 years. Unfortunately, the project went black (need-to-know security clearance, I mean) before they finished, and they're understandably tight-lipped about it now. My best guess is that the mission systems code is in the 5-7 million SLOC range. I can't say the software was done in half the time, because they won't tell me. On the other hand, it's normal for projects of this size to be substantially late, on the order of 100%. All I can say is that there have been no reports of schedule delay or software quality problems on the P-8. It's been so out of the news that most people don't even know it exists. OTOH, if it were anything like a typical military project of that size, it would almost certainly have been in the news for some software-induced delay or another.

*) Well Placement Engineering: This was for an oilfield services company.
Most people think that oil wells are drilled straight down. Not so. To take advantage of softer layers, wells can run many times farther horizontally than vertically. The drill bit is literally steered by an operator to follow a pre-determined path. This project was for part of the software that gives the drill operator control over the drill. Again, model-based requirements and design led to the project being delivered early (sorry, I don't have exact how-early figures). I also don't have delivered-defect data from the field. However, the testers told me personally that they were amazed by the quality of that software; they could find almost no defects in it at all, and it passed nearly every test they threw at it.

There have been other projects, too. Unfortunately, I have less data about those projects than about these. And of course, anyone can throw out the "But this is all anecdotal data" defense, and there's nothing I can do to counter that. However, when you look at the typical results of software projects in terms of over-cost, over-schedule, under-scope, and delivered defects, the pattern on these projects is decidedly different: none was over schedule (the pattern, as I said, is to be early), none was over cost, all delivered everything they were supposed to, and customers were delighted that the code did what they wanted without crashing or giving wrong answers.

Is this enough data? Or, do you need more?

-----Original Message-----
From: systemsafety <systemsafety-bounces_at_xxxxxx> on behalf of Derek M Jones <derek_at_xxxxxx>
Organization: Knowledge Software, Ltd
Date: Friday, March 4, 2016 2:23 PM
To: "systemsafety_at_xxxxxx" <systemsafety_at_xxxxxx>
Subject: Re: [SystemSafety] Modelling and coding guidelines: "Unambiguous Graphical Representation"

Steve,

>> A very important question.  I'm not sure I have the answer, but I
>> have data showing it happening at multiple levels."
>
> That's interesting data, yes. But remember, that data is limited by how
> software development is being practiced today (by largely "highly paid

I want data about today's practices (the data is actually from 2005-2014, the dates used by the papers that originally worked on it). Why would you want data from further back (not that I think things have changed much)?

> amateurs"), and not how software development SHOULD be practiced.

You seem to be willfully refusing to deal with a reality that does not conform to your ideal view of how the world should be.

>> "The benefits of any upfront investment to make savings maintaining
>> what survives to be maintained has to be compared against the
>> costs for all the code that got written, not just what remains."
>
> Some of the software that gets built is intended to be single-use or is
> of intentionally short lifetime. My experience is that's a very small
> subset.

The data is from large, long lived (10-20 years) projects.

> I argue that what determines survivability of the vast majority of
> software is, essentially, quality. High quality (i.e., well-structured,

Quality as seen by the paying customer, which can be very different from code quality.

> clean, etc.) code that reliably automates a solution is far, far more
> likely to survive than code that's crap (e.g., high technical debt) and
> is

Unsubstantiated, biased point of view.

Cost of maintenance is a drop in the pond compared to marketing expenses.

What is important is the ability to add new features quickly so the next version can be sold to generate revenue. If a product is selling, management will pay what it takes to get something out the door.

> Using your terminology from
...

> I'm saying that I can make d1 and m1 such that:
>
> d1 = 0.5 * d
> m1 <= 0.75 * m
>
> Therefore, clearly, for any y >= 0
>
> d1 + y * m1 << d + y * m

Ok. Can I have a copy of your data please. I would be happy to analyze and write about your success.

-- 
Derek M. Jones           Software analysis
tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com
_______________________________________________
The System Safety Mailing List
systemsafety_at_xxxxxx
