Re: [SystemSafety] Logic

From: Les Chambers < >
Date: Fri, 21 Feb 2014 11:52:02 +1000


Steve
You said: " I have a real-world example of a variant on this theme."

All I can say is: Hallelujah brother. How do you clap on an email list? clap, clap, clap, clap (I guess).

For those interested in case studies, there is another one from my past along the same lines (model-based development). Sounds like Alice in Wonderland, but it's actually true.

Story: the soul of a chemical reactor
For many years engineers have used software to enhance the performance of their machines. My first experience of this was in chemical processing (1975 - 1985). Working in a multidisciplinary team I developed process control software for chemical processing reactors. Put simply, we made synthetic latex, plastics, chlorine, insulating foams ... by controlling reaction kinetics with software. Our goal was not only to make high quality product but also to maximise the yield from our reactants and to make the most efficient use of resources: energy, water, labour and so on. Given that most of our reactants were either a threat to human health or potentially explosive, all this had to be achieved safely. The software became the brains (nay the soul) of the plant, elevating the operator to a supervisor of automated chemical production. Software also allowed us to optimise the physical design of the plant (pipes, pumps and reactor vessels). Using smart control we could actually reduce the amount of plant hardware required without compromising safety.

From a business perspective, these applications were a resounding success. With much tighter, intelligent control we were able to produce more consistent quality at higher yields. Planned outages became less frequent as smart software predicted problems and took automatic corrective action before they escalated into a plant shutdown. As for the computers themselves, they were nothing but a tool in the service of chemistry; software was a means to an end. The programmers were actually chemical engineers on a mission. They couldn't care less about the beauty of their code; they were more interested in beautiful on-spec products. For me this was more than a job, it was a meaning of life; it was the best fun I ever had in my life and, to this day, this remains the most successful suite of software applications I have ever witnessed.

Building Complex Things
This software success story was largely due to the adoption of the standard engineering approach to building a complex thing:
. Develop a clear understanding of the problem - then document it. All projects started with something we called "English language", a clear statement of the operating discipline structured such that it could be easily transformed into design models - something similar to what is now called Requirements State Machine Language. The English language was usually of high quality because it was written by plant engineers intimately familiar with plant operations and reaction kinetics. As the plant engineers who wrote the control programs were typically not expert in the application of computers to control systems, I worked in a central support group that trained them in analysis methods, control theory and programming techniques.

. Apply the best science to the problem. The applicable sciences were
the mathematics of control theory, and basic chemistry. Control theory, in existence for some years, was augmented with sampled-data systems theory to produce computer-based control. The chemical process technology was well established and documented by technology centres responsible for maintaining corporate memory of "the way we make chemicals".
. Partition the problem. Chemical processing plants could be very large and complex, with thousands of sensors and final control elements. However, the control problem could be simply partitioned into unit operations (reactor, heat exchanger, premix tank, scrubber, distillation column). We applied the finite state machine model to each unit operation and developed mechanisms for cooperation between unit operations based on state. This approach has since become known as "model-based development" - that is, create a model of the system that will support detailed validation of proposed behaviour before you write a line of code. (A minimal sketch of one unit operation as a state machine follows this list.)
. Simplify. Our application language was a simplified variant of
Fortran. It had no looping constructs (no do-whiles, no do-untils, certainly no go-tos). It could be taught to any engineer with foundation programming skills (some operators with no programming skills became programmers).
. Apply standard engineering practices - no exceptions. The control requirements of all plants were analysed using the same analysis method. All process control programs were organised in the same way. They could be easily read and understood by anyone in the world who had received basic training. These techniques were successfully rolled out in three regions of the USA and several European plants. The Asia-Pacific rollout became my responsibility - it wasn't hard; there was a strong engineering culture in place before I arrived and, as this was brand-new technology, no preconceived ideas to overcome.
. Reuse elements of past solutions where possible. Successful control
strategies that had been proven elsewhere were reused. Technology centres were tasked with making sure innovations in process control were communicated and reuse was maximised. Some programming exercises morphed into comprehensive copy and paste, with the attendant cost savings. My support group also took responsibility for "remembering neat ways of doing things," documenting them and promulgating them - complex stuff like feed-forward control using process modelling or simple stuff like "open the downstream valve before you start the pump - you idiot!!!"
. Apply strict quality control. It was easy for a newbie programmer to stray from our best-known practices, so the dos and don'ts of control system design and coding were well established, documented and rigorously enforced in requirements and code reviews. We effected a pseudo-Nazi regime in this respect.
. Perform analysis and simulation. From the beginning it was possible
to test our programs via primitive simulations of plant conditions using dummy inputs. After a while we began to experiment with running our control algorithms against full-blown plant simulations. The effort required to analyse, specify and develop these simulations was roughly equivalent to that ploughed into the control program itself. The outcome was plant start-ups that took a week instead of a month; a massive cost saving.
. Manage risk - ask what could kill us next. Every plant I worked on presented many opportunities to screw up. The ramifications varied from destruction of plant equipment, to causing sickness or death, to triggering explosions that would not only destroy the chemical processing complex, but also the surrounding neighbourhood. There was therefore a formal approach to risk management. A team was tasked with identifying dangerous states of each plant. Code was then written to sense these conditions, abort any existing control actions and return the plant to a safe operating state.
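
For the flavour of it, here is a minimal sketch of one unit operation modelled as a finite state machine, with the risk-management abort folded in. It is Python rather than our simplified Fortran variant, and the state names, thresholds and actions are invented for illustration only - the real operating discipline was far richer.

# One unit operation as a finite state machine (illustrative only).
IDLE, CHARGING, REACTING, COOLING, SAFE_HOLD = (
    "IDLE", "CHARGING", "REACTING", "COOLING", "SAFE_HOLD")

class Reactor:
    def __init__(self):
        self.state = IDLE

    def step(self, sensors):
        """One scan: check for dangerous conditions first, then run the
        normal state transition logic."""
        # Risk management: sense a dangerous condition, abort any control
        # action in progress and drive the unit to a safe operating state.
        if sensors["temperature"] > 180.0 or sensors["pressure"] > 12.0:
            self.state = SAFE_HOLD
            return "close feed valves, full cooling, alarm operator"

        # Normal operating discipline: one transition per scan.
        if self.state == IDLE and sensors["batch_requested"]:
            self.state = CHARGING
            return "open monomer feed valve"
        if self.state == CHARGING and sensors["charge_complete"]:
            self.state = REACTING
            return "start agitator, enable temperature control loop"
        if self.state == REACTING and sensors["conversion"] >= 0.95:
            self.state = COOLING
            return "stop catalyst feed, ramp cooling water"
        if self.state == COOLING and sensors["temperature"] < 40.0:
            self.state = IDLE
            return "batch complete, transfer to blend tank"
        return "hold"

# Cooperation between unit operations was based on state: a downstream
# unit only started when the upstream unit reported the right state.
reactor = Reactor()
print(reactor.step({"temperature": 25.0, "pressure": 1.0,
                    "batch_requested": True, "charge_complete": False,
                    "conversion": 0.0}))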

Benefits of the Engineering Approach
There were many positive outcomes from our engineering approach, the most telling of which was:
... in 10 years of working in this application domain - with at least 10 projects running concurrently at any point in time, I NEVER once heard of or experienced a project failure.
I attribute this to:
. Customer focus. The control system software NEVER failed to meet the
needs of the processing plant - mainly due to the customer being embedded in the project and the high levels of expertise brought to bear on requirements definition.
. Process control. The software development process was simple, well defined and rigorously enforced. We had no choice; we were always part of a larger systems project with immovable deadlines.
. Analysis. Analysis methods were mandated and therefore made
consistent across all plants. You could write any program you liked as long as it implemented the plant control system as a set of cooperating finite state engines. Further, formal analysis of reaction kinetics and the equipment under control, followed by generation of simulations, significantly reduced the resources required for plant start-ups.
. Early validation with model-based development. The use of the finite state machine model allowed us to validate the overall plant control strategy long before code was written. This eliminated overruns due to rework. We discovered that the most complex element of control was the logic around state transitions. This logic could be clearly stated in English and validated by engineers highly experienced in plant operations, but with no programming skills. If you like, it allowed non-programmers to look deeper into what the system was about to do and have more control over its behaviour. (A small illustrative sketch of such transition logic follows this list.)
. Quantification. Effort estimates were accurate. Using the state machine as an estimating proxy we could predict, to within a week, how long it would take to develop the control software, regardless of who was performing the work. This injected welcome predictability into our projects.
. Documentation. The statement of requirement became the plant operating manual. Safety imperatives meant it was always kept up-to-date. Prior to plant commissioning these requirements were subject to rigorous review by process technology centres.
. Reuse of past solutions. A managed process for reusing innovative
control strategies enhanced quality and saved money.
. Concern for maintainability and safety. Maintainability and safety were key issues in software development. Plants were constantly optimised and had long operational lives. Explicit requirement statements, easily traceable to simple design archetypes (state machines) and implemented with simple, readable code, allowed anyone with process knowledge and basic programming training to enhance operations technology through software without compromising safety. There was a standing joke that about three months after start-up you had to move on the plant engineers who wrote the program, because life got incredibly boring. Often these plants started up as optimised as they were ever going to be. All the plant engineer could do was "stick his fingers in the program" (read: over-optimise) and screw it up. Better to move him on to another problem.
. Systems thinking. The software was always considered as a component of a larger system (never an end in itself). The impact of software on the chemical plant as a whole was assessed and substantial benefits flowed. Introducing software into a chemical processing plant produced emergent behaviour: high quality product for one, but by far the greatest benefit came from the ability to trade off plant hardware for smarter software. For example, before computer control it was considered unsafe to mix certain combinations of reactants in the same reactor. The problem was solved by using premix reactors to create less volatile, intermediate products. Computer control gave us tighter control of reactant ratios, allowing us to eliminate premix operations and charge heretofore dangerous chemical mixes into the same reactor, at a saving of hundreds of thousands of dollars.
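
To make the early-validation point concrete: the heart of each control program was transition logic that could be written down in a form a non-programmer could review line by line. Here is a toy sketch of the idea, again in Python with invented states and conditions; the real tables were reviewed by plant engineers before any control code existed.

# State transition logic stated in near-English, validated before coding.
TRANSITIONS = [
    # (current state, condition in plain terms,       next state)
    ("IDLE",     "operator requests batch",          "CHARGING"),
    ("CHARGING", "charge weight reached",            "REACTING"),
    ("REACTING", "conversion at least 95 percent",   "COOLING"),
    ("REACTING", "temperature above safe limit",     "SAFE_HOLD"),
    ("COOLING",  "temperature below 40 degrees",     "IDLE"),
]

def next_state(current, condition):
    """Look up the validated table; hold the current state if no rule matches."""
    for state, cond, target in TRANSITIONS:
        if state == current and cond == condition:
            return target
    return current

assert next_state("REACTING", "temperature above safe limit") == "SAFE_HOLD"
assert next_state("IDLE", "charge weight reached") == "IDLE"  # no rule: hold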

Cheers
Les

-----Original Message-----
From: systemsafety-bounces_at_xxxxxx [mailto:systemsafety-bounces_at_xxxxxx Steve Tockey
Sent: Friday, February 21, 2014 10:20 AM
To: Heath Raftery; systemsafety_at_xxxxxx
Subject: Re: [SystemSafety] Logic

This may very well always be a challenge. People generally have a way too short term outlook, particularly mid-level managers in big corporations. I would like to (optimistically) extend Heath Raftery's example as follows (by the way, I refuse to refer to person B as "Engineer B" because they clearly aren't one):

Possibility A) Person B's implementation of doodad D is still little more than just flashing LEDs and clicking relays when Engineer A's solution is ready to go into production. Engineer A's production version works essentially flawlessly.

Possibility B) Person B does provide a "production version" of doodad D; however, that production system gives defective output on every 32nd use and crashes--requiring a complete OS reboot--on every 64th use. Engineer A's production version works essentially flawlessly.

But will the decision makers ever even notice??? Sadly, probably not.

I have a real-world example of a variant on this theme. Details are changed to keep PBL and his organization out of trouble.

I worked for about 8 years at a company that makes very high-tech transportation devices. I'll use cars as an analogy, but their devices are at least an order of magnitude more complex than cars.

We start off with Car Product Line 1, which the company has been building with, say, gasoline engines for almost 20 years. There's a comprehensive suite of "automated test equipment" (ATE) software--embedded in a hardware platform--for testing Product Line 1 cars as they are being manufactured. All existing test programs were traditionally-built C code that ran on HP/UX 9. Along comes the need to produce Product Line 1 cars, but with diesel engines. The "engine simulator ATE" application for gasoline engines is 25K SLOC. The most reasonable estimate is that engine sim ATE for diesel engines will also be about 25K SLOC; however, code reuse is not possible for reasons that can't be elaborated here. Nonetheless, at typical programmer productivity rates and the project's allocated staffing level, that's more than a year of development. The problem is that we're already in July and the first diesel engine car will be coming through the factory the following March. We only have 8 months, not 12. They simply couldn't wait until the following July (or, realistically, much later given typical software project schedule over-runs) for the diesel version of the engine sim ATE software.

The project manager, Mike (his real name), had worked with me before on some non-related projects and was aware of my involvement in model-based development so he invited me to give the team of four a presentation on the topic. The team was intrigued with the idea that we could significantly accelerate delivery because that's exactly what was needed. Everyone agreed to take the model-based development route. Estimates derived from early modeling predicted a mid-January completion date for the model-based project. We could get it done in about 6 months, well ahead of the March need date.

A mid-level manager (to remain nameless), having experienced horrible software project delays--due to necessary debugging--in the past, insisted on having the initial code written by the end of November. This was intended to allow adequate debugging time before the need date for the first diesel car. Long story short, the requirements modeling took until the middle of October. The design modeling took until early December. When that mid-level manager visited the project in early December, he almost went into orbit when he realized that the team had not yet written even one line of production code and the project was already past the point that he had mandated for "code complete". Mike almost lost his job right then and there.

There was a slight underestimate in the project: code complete (13K SLOC) and hardware integration were achieved around January 21, not January 14 as predicted back in late July. We had done all of the testing that could be done without an actual car and everything worked as expected. The engine sim ATE system then sat there until mid March, waiting for the first diesel car. When that first diesel car was ready to be tested, both it and the ATE performed flawlessly.

A little more than a year earlier, the corporate executive management team approved the engineering & development of Car Product Line 2. The schedule from approval to Product Line 2 car #1 rolling off the assembly line was 2.5 years. The entire ATE software suite for Car Product Line 2 was included in the schedule and needed all of the 2.5 years for development. Unfortunately, that project's original software team wasted the first 1.25 years. When the executive management did a check of the Car Product Line 2 engineering & development critical path, they realized that the ATE software team was still sitting back at the starting line. The team members had authored a few interesting technical papers and played a lot of computer games but had made zero progress on actually producing ATE software. Most of that original team got reassigned to other projects and a new team was brought in. This new team noticed that Car Product Line 1's engine sim ATE was completed in about half the time that had been predicted, and that's pretty much what they had: half the time. So I and three of the four from Car Product Line 1 engine sim ATE were brought over to get Car Product Line 2 ATE going.

There was a management mandate to "reuse as much of the Car Product Line 1 code as possible". Unfortunately, code reuse was simply not an option because for some reason Car Product Line 2 had chosen C++ on HP/UX 10 for implementation. We did reuse a little code, but only 83 SLOC. Long story short, the entire ATE suite for Car Product Line 2 was delivered 3 days ahead of the need date for car #1. Keep in mind that the full ATE suite was a far bigger job: we had 30 developers and delivered 113K SLOC. Six to eight weeks after going live on the factory floor, we met with the ATE operators to see how they liked it. Simply, they were amazed at how such a complex piece of software could work so flawlessly from the very beginning. They had set up a contest to see which operator could crash the ATE and nobody had been able to.

With such fully documented, high quality code the middle managers decided they didn't need nearly as many software weenies to maintain the Product Line 2 ATE code base. In their infinite wisdom they completely ignored the fact that we had built a team that took a project in seriously deep doo doo and made it successful. Rather than find another project that was in deep doo doo and turn the extra people loose on that, the excess staff got laid off (made redundant). The team's reward for doing a great job was that most of them lost their jobs. Sigh...

Now wind the clock forward about 12-15 years. I'm no longer working at the manufacturing company. By this time they were about half way into Car Product Line 3 engineering and development. Deja vu all over again as they realized that the original ATE software team had wasted the first half of the project schedule. Again, those people got re-assigned to other projects and I got a panic call from the new ATE software team. "Aren't you the guy who bailed out the Product Line 2 ATE software project?" "Yes, why?" "We desperately need your help..."

But again, code reuse was simply not an option because the Product Line 3 ATE project had already committed to C#/.net. Nonetheless, ATE software was ready well before Product Line 3 Car #1 was in a position to be tested. And when tested, both car #1 and ATE software performed flawlessly.

One very important lesson that this company never learned was that one major reason each of these projects was able to finish on time or early was that we reduced the amount of rework to negligible levels. Software projects at that company, like traditional software projects everywhere, suffered from 50-60% rework ("debugging"). All of the model-based ATE software projects featured peer review of the models, which revealed and allowed removal of the majority of the defects before a single line of code was ever written. Rework on these projects was well under 10%, probably closer to 5%.

The other very important lesson that the company never learned was that the other major reason for finishing on time or early was requirements model reuse. If you laid the three requirements models side-by-side you would notice that 80-90% of the content was identical. From a "what does it mean to test a car?" perspective, each version of ATE was largely just a minor modification of the earlier version, thus saving huge amounts of requirements development time.

In the end, my point is that the data is there. Projects have been done this way and those projects have been successful. But the business has to take the blinders off and understand what was done differently and why it made a difference. They seem to be totally incapable of this. Insert another sigh here...

I should add that what was done on these projects was not strictly "formal methods" in the sense that's being hotly debated here. We didn't use Z, VDM, or any of those formal languages. We didn't use theorem provers either. We used UML (and pre-UML, because of project timing) class diagrams and state charts mostly, but we had a carefully defined and enforced (via the model inspections) single interpretation of that modeling language. Much as I mentioned earlier regarding Jeannette Wing's "A Specifier's Introduction to Formal Methods" paper, the modelers were using a comfortable surface syntax (UML) but there was a rigorous (albeit not exactly formally-defined) semantics behind those models.

I can only speculate on the scalability of formal methods based on my experience. I suspect that they will scale just fine, provided that the people doing the majority of the "methods" work can work in comfortable surface syntaxes like UML and keep the Z, VDM, Larch, etc. stuff hidden under the covers. If anyone wants to do a theorem proof of some interesting property, they are free to do so. Simply take the existing UML model, translate it into the underlying formal language equivalent and run the analysis on that. Every property proven about the formal representation has to apply to the UML version because the semantics are equivalent; only the syntax differs.
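
As a toy illustration of that translate-and-analyse step (Python here rather than Z or VDM, and both the model and the property are invented for the example): once a state chart is captured as an explicit transition relation, a property can be checked mechanically over every reachable state.

# A state chart reduced to an explicit transition relation.
TRANSITIONS = {
    ("Idle",      "start_test"): "Testing",
    ("Testing",   "pass"):       "Reporting",
    ("Testing",   "fail"):       "Reporting",
    ("Reporting", "ack"):        "Idle",
}

def reachable(initial="Idle"):
    """Exhaustively enumerate every state reachable from the initial state."""
    seen, frontier = {initial}, [initial]
    while frontier:
        state = frontier.pop()
        for (src, _event), dst in TRANSITIONS.items():
            if src == state and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen

# Property: every reachable state has at least one outgoing transition
# (no deadlock in this little test-equipment model).
states = reachable()
for s in states:
    assert any(src == s for (src, _event) in TRANSITIONS), "deadlock in " + s
print("checked", len(states), "reachable states, no deadlock")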

I don't have to speculate on the scalability of the ("semi-formal"?) model-based development process. I've personally been involved in projects that had more than 250 programmers working for about 5 years on code bases of up to 5-10 million SLOC. The projects delivered on time (or early) and the users were amazed by how few defects they encountered in actual use. We just have to find a way to get the corporate decision makers to notice...

-----Original Message-----
From: Heath Raftery <hraftery_at_xxxxxx
Organization: ResTech Pty Ltd
Date: Wednesday, February 19, 2014 3:35 PM
To: "systemsafety_at_xxxxxx <systemsafety_at_xxxxxx
Subject: Re: [SystemSafety] Logic

On 19/02/2014 11:28 PM, Michael J. Pont wrote:

> It may - of course - be that the organisations I have closest contact with
> are atypical: they are, after all, a self-selecting group.  However, while
> I'm sure that there are many organisations that have mature processes in
> place for the development of real-time embedded systems, I'm equally sure
> that this isn't the norm.
>
> If we assume - for the moment - that my model is correct, how do we ensure
> that the situation is different in 10 years time?

Great points. I'd suggest that changes to education focus, while very important, won't be the necessary trigger. There needs to be a market force. The scenario that plays out in my world goes like this:

  1. Customer C requests doodad D to solve problem P.
  2. Engineer A says right, no problem, we just need to articulate the requirements and capture them in an unambiguous way. Formal methods can help, I'll show you the way.
  3. Engineer B says, no problem, in fact here's a prototype I whipped up. We're almost there.

Engineer A studied embedded development at an excellent facility and has sound knowledge of formal methods.

Engineer B taught herself programming and has been writing code since before she could drive.

4. A's manager asks how D is coming along and A says fine, we're working through the requirements.
5. B's manager asks how D is coming along and B says fine, look I've got the LEDs flashing and the relays clicking.

Guess which engineer gets rewarded?



The System Safety Mailing List
systemsafety_at_xxxxxx
