Re: [SystemSafety] Chicago controller halts Delta jet's near-miss....

From: Les Chambers < >
Date: Mon, 29 Jun 2015 09:32:10 +1000


Steve
I am assuming that even if there are other mechanisms for detecting unauthorised movement, ATC still depends on the voice channel to stop the aircraft taking off. Am I correct?
Les

-----Original Message-----
From: Steve Tockey [mailto:Steve.Tockey_at_xxxxxx
Sent: Saturday, June 27, 2015 9:47 PM
To: Les Chambers; 'Peter Bernard Ladkin'
Cc: systemsafety_at_xxxxxx
Subject: Re: [SystemSafety] Chicago controller halts Delta jet's near-miss....

Les,
I think this might be overkill. If the conversation alone were the only mechanism to assure safety, then your proposed scheme would make more sense. But it's not the only mechanism. The Midway incident was recognized visually, and ground radar can also help. So my point is: if failures can be recognized by other means, is it really necessary to put such a burden on communication alone?

I'd be willing to bet that miscommunication happens far more often than this, and rarely ends up in a worst-case outcome (e.g., Tenerife). The vast majority of the time, one of these other mechanisms catches the problem before it turns into a disaster. It might not be economically viable to add this extra layer, because it does come at a cost.
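The argument above is essentially defense in depth: a miscommunication becomes an accident only if every independent detection layer misses it. A minimal sketch, assuming the layers fail independently (a strong assumption; fog, for example, can defeat visual checks from the tower and the cockpit at once) and using purely hypothetical miss probabilities rather than data from any incident; the function name is illustrative:

```python
from math import prod
from typing import Sequence


def undetected_probability(miss_probs: Sequence[float]) -> float:
    """Probability that an error slips past every detection layer.

    Assumes the layers fail independently; common-cause failures
    (e.g. fog blinding several layers at once) violate this.
    """
    for p in miss_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"miss probability out of range: {p}")
    return prod(miss_probs)


# Hypothetical per-layer miss probabilities, for illustration only:
# readback check, visual observation from the tower, ground radar.
layers = {"readback": 0.1, "visual": 0.2, "ground_radar": 0.05}

print(f"All layers:    {undetected_probability(list(layers.values())):.4f}")
print(f"Without radar: {undetected_probability([0.1, 0.2]):.4f}")
```

Removing a layer multiplies the residual risk by the reciprocal of that layer's miss probability, which is the quantity being weighed against the cost of an extra protocol step.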

Cheers,

-----Original Message-----
From: Les Chambers <les_at_xxxxxx
Date: Friday, June 26, 2015 11:48 PM
To: 'Peter Bernard Ladkin' <ladkin_at_xxxxxx
Cc: "systemsafety_at_xxxxxx <systemsafety_at_xxxxxx
Subject: Re: [SystemSafety] Chicago controller halts Delta jet's near-miss....

Peter
Further, on my proposition that the air traffic controller (ATC)/pilot takeoff protocol should be:

[1] ATC: cleared to take off
[2] Pilot: preparing for takeoff
[3] ATC: approved for takeoff

... Or words to that effect (NOTE: I am referring to the conceptual design of the protocol which addresses the number, sequence, rationale and meaning or essence of each message and excludes exact details of format and content).
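The conceptual design above can be sketched as a pilot-side state machine. This is a minimal illustration under assumptions not in the thread: messages are reduced to three symbolic values (no call signs, no phraseology), and the names `Msg`, `PilotState` and `pilot_step` are hypothetical. What it shows is that only an explicit message [3] moves the aircraft into the high-risk state.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class Msg(Enum):
    CLEARED = auto()    # [1] ATC: cleared to take off
    PREPARING = auto()  # [2] Pilot: preparing for takeoff (the readback)
    APPROVED = auto()   # [3] ATC: approved for takeoff


class PilotState(Enum):
    HOLDING = auto()            # low-risk: stationary, awaiting clearance
    AWAITING_APPROVAL = auto()  # still low-risk: readback sent, waiting for [3]
    TAKING_OFF = auto()         # high-risk


def pilot_step(state: PilotState, received: Optional[Msg]) -> Tuple[PilotState, Optional[Msg]]:
    """Return (next state, message the pilot transmits in reply, if any)."""
    if state is PilotState.HOLDING and received is Msg.CLEARED:
        return PilotState.AWAITING_APPROVAL, Msg.PREPARING
    if state is PilotState.AWAITING_APPROVAL and received is Msg.APPROVED:
        return PilotState.TAKING_OFF, None
    # Anything else, including silence (None), leaves the aircraft where it is.
    return state, None


state = PilotState.HOLDING
state, reply = pilot_step(state, Msg.CLEARED)  # readback PREPARING is transmitted
state, _ = pilot_step(state, None)             # silence: still AWAITING_APPROVAL
state, _ = pilot_step(state, Msg.APPROVED)     # only now does the aircraft roll
print(state)
```

An out-of-sequence approval (message [3] without a prior clearance) is ignored, so the aircraft stays in its safe state.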
I have a problem with your reasoning that message [3] is unnecessary. I am passionate about this issue because the absence of a message [3] was responsible for one of the two career-ending near misses I have experienced in the past 40 years.

You stated, "There is no good reason for a controller to ACK a correct readback, and it would complicate matters cognitively when transmissions take up almost all the air time, which often happens at a major airport."

At face value this seems true in a perfect world. But we do NOT live in a perfect world.

In response, let me first state some propositions that I believe are self-evident: axioms, if you like (please stay with me while I state the bleeding obvious).

The ATC/pilot takeoff protocol is a master-slave protocol. The ATC is the master; the pilot is the slave.
The pilot is the slave because his scope for independent action is severely limited while under the direction of the ATC. He basically does what he's told. And the safety of the system as a whole is dependent on him doing exactly what he is told.

The ATC/pilot takeoff protocol is life-critical because failure of this protocol has a high probability of causing loss of life.

A life-critical protocol must, as far as possible, be robust. That is, it must compensate for failures in both master and slave, maintaining the system as a whole in a safe state in the presence of credible failure scenarios.

The most common failure mode of a party to a protocol (master or slave) is "poor health", where "poor health" means the person/party/system/subsystem ceases to behave in compliance with its specification. Classic examples are errors of commission and omission:
- the master issues an incorrect control for the current context;

An aircraft lying stationary at a turn-off is in a low-risk state. The probability of harm to the pilgrims aboard is relatively low.

An aircraft in take off is in a high-risk state where the probability of harm, based on past experience, is comparatively high.

In transitioning his/her aircraft from a low- to a high-risk state, a pilot makes the following assumptions:
1. The ATC is healthy; that is, it has issued a correct control for the current context (message [1]).
2. The control, as received by the pilot, has been correctly interpreted by the pilot.
3. The pilot's interpretation of the control, as acknowledged to the ATC (message [2]), has been received and understood by the ATC.
4. If the pilot's interpretation of the control is INCORRECT, the ATC plus the message transmission medium are sufficiently healthy to transmit a "stop stop stop" control.
5. Upon receipt of the "stop stop stop" control, the pilot is sufficiently healthy to correctly interpret and act upon it.

For the two-message protocol to succeed for all time, all the above assumptions must be correct for all time. At the core of these assumptions is the proposition that both master and slave will be healthy for all time. This is a dangerous assumption, as these entities are human beings capable of gross errors of judgement.

Some examples:
- Distracted by personal problems. His wife has left him, his child is sick, his daughter has died of a drug overdose (I cite the television drama series Breaking Bad, season 2, episode 13). In the real world, I once frogmarched an engineer to the front door and told him to go home and look after his family. He had a three-year-old daughter with a temperature of 104°F and a wife pleading with him on the hour to come home because their daughter was dying. This particular start-up was the culmination of three years of his engineering. He was rightfully torn between work and home. In this induced, highly negative mental state he was a danger to himself and everyone else on the start-up.
- Heart attack in progress. On one project we worked through the complete community of 40 operators and gave those with potential heart or other potentially debilitating health problems very attractive redundancy packages.
- High on various chemical substances. The Australian Navy just revealed that several weapons electronics operators had been using various drugs while on war-fighting duty aboard a destroyer. Drug tests were ineffective. Drug tests ARE typically ineffective (I cite Lance Armstrong's 10-year career in drug cheating).
- In possession of an urgent desire to commit suicide (or watch entranced as others do). I cite Germanwings.
- Unsighted. Fog means that the ATC cannot see the aircraft (as at Tenerife, scene of the world's worst aviation disaster).

The three-step protocol eliminates the need for assumptions 2, 3, 4 and 5. Message [3] pretty much catches everything. If you don't hear it, you don't move.
Further, my core objection to the two-step protocol is that the pilot takes radical action (that is, transitions his aircraft from a low- to a high-risk state) on a NULL value. Silence means assent. In doing this the pilot assigns meaning to a NULL. This is recognised bad practice. The only situation in which I can justify it is when the system is in a safe state and NULL is interpreted as "do nothing"; that is, remain in your current safe state. This is what the three-step protocol achieves. If you hear nothing from the ATC when you are expecting message [3], you do nothing and stay safe.
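The difference between the two rules lies entirely in how a missing message is interpreted. A minimal sketch, assuming a standard-library `queue.Queue` stands in for the radio channel and that "STOP" and "APPROVED" are symbolic placeholders for real phraseology; the helper names are hypothetical. It contrasts the two rules when the controller's transmitter has failed and nothing is heard after the readback:

```python
import queue
from typing import Optional


def await_message(channel: queue.Queue, timeout_s: float) -> Optional[str]:
    """Wait for the next transmission; return None if nothing is heard in time."""
    try:
        return channel.get(timeout=timeout_s)
    except queue.Empty:
        return None  # silence: the NULL value


def two_step_action(heard: Optional[str]) -> str:
    # After the readback, only an explicit STOP prevents departure,
    # so silence (None) is treated as assent.
    return "HOLD" if heard == "STOP" else "TAKE_OFF"


def three_step_action(heard: Optional[str]) -> str:
    # After the readback, only an explicit APPROVED permits departure,
    # so silence (None) means "stay in the current safe state".
    return "TAKE_OFF" if heard == "APPROVED" else "HOLD"


silent_channel: queue.Queue = queue.Queue()  # the controller's transmitter has failed
heard = await_message(silent_channel, timeout_s=0.05)
print(two_step_action(heard))    # TAKE_OFF: meaning was assigned to a NULL
print(three_step_action(heard))  # HOLD: the NULL means "do nothing"
```

The design choice is which value the fail-safe default maps to: the three-step rule makes every unrecognised or absent input fall through to HOLD.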

Returning to my career-ending near miss, let me set the scene. A SCADA system is implemented with extensive use of optic fibre networks over distances of 20 kilometres. The optic fibre has one or more bad joints, which cause disconnect-reconnect scenarios, typical of dry joints in copper. The SCADA master communications module is required to compensate for this, but it has a bug: on reconnect it blasts all its slaves with messages including random bit patterns. In a demonstration of genuine bad luck, one of these bits is misinterpreted as a control by one of its slave controllers and an unsafe action is taken. Luckily for us, nobody was killed. Had a three-step protocol been implemented, the slave would have:
1. not received message [3] and done nothing (the most likely scenario, as the master was profoundly unhealthy);
2. received another random bit pattern with a low probability of repeating the same control bit position (I know, I know, we should have used a 32-bit word, not a single bit, to command a critical action); or
3. received a healthy response that cancelled the previous erroneous control.

Overall the probability of total system failure would have been significantly reduced.
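The aside about commanding a critical action with a single bit can be quantified. A minimal sketch, assuming each corrupted message is an independent, uniformly random bit pattern (real line faults are often neither) and that the slave acts only on an exact match; the function name is hypothetical:

```python
from fractions import Fraction


def p_spurious_command(command_bits: int, confirmations: int = 1) -> Fraction:
    """Probability that random line noise is accepted as a valid command.

    Assumes every corrupted message is an independent, uniformly random bit
    pattern, that the slave acts only on an exact match of `command_bits`
    bits, and that it must see the command `confirmations` times in a row.
    """
    if command_bits < 1 or confirmations < 1:
        raise ValueError("need at least one command bit and one message")
    return Fraction(1, 2 ** command_bits) ** confirmations


for bits, msgs in [(1, 1), (1, 2), (32, 1), (32, 2)]:
    p = p_spurious_command(bits, msgs)
    print(f"{bits:>2}-bit command, {msgs} message(s): {float(p):.3g}")
```

Under these assumptions a single-bit command is triggered by half of all random bursts, while an exact 32-bit match is triggered by about one burst in 4.3 billion (2^-32). A stuck line or a replayed burst is not independent, which is why practical protocols typically add checksums and sequence numbers rather than relying on repetition alone.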



In response to your comment on my opening paragraph:

LES:

> It seems to me that the ATC - Pilot voice protocol is missing a step.
> ... In concept, a safer protocol might look like this: ATC: You are
> cleared for takeoff. Pilot: My understanding is that I am cleared for
> takeoff. ATC: Your understanding is correct

PETER:
"This misses crucial information, namely addressing, which is critical in a multiuser broadcast context. "

As stated above, my opening paragraph was a statement of conceptual design. Conceptual designs do not include detailed design or implementation detail; they test fundamental concepts, principles, rationales and assumptions. This is what I have attempted to achieve above. I raised this issue because it is a common failing of technical designers to dive into massive detail without due consideration of the purity of their concepts (for example, asking the question: are we allowing a transition to a high-risk state on the basis of a NULL?). Skimping on conceptual design is the ready-fire-aim approach from which the world suffers every waking hour.



On your comment regarding fixing the grammar used in ATC conversations, I wonder. You may have fixed the grammar, but your linked papers show no evidence that cognition between pilot and ATC was tested and improvements demonstrated. Do you have any such evidence?

In conclusion let me ask you:
If, as my slave, I commanded you to jump off a cliff, would you respond, "acknowledged, jumping", and execute the control? I suspect not. Any rational person would opt for message [3] before transitioning their body from the safe to the unsafe state.

Even more probable would be the response, "Les, go jump yourself." Syntactically poor but semantically rich.

Cheers
Les

-----Original Message-----
From: Peter Bernard Ladkin [mailto:ladkin_at_xxxxxx
Sent: Sunday, June 21, 2015 6:36 PM
To: Les Chambers
Subject: Re: [SystemSafety] Chicago controller halts Delta jet's near-miss....


On 2015-06-21 01:28, Les Chambers wrote:
> It seems to me that the ATC - Pilot voice protocol is missing a step. ... In concept, a safer
> protocol might look like this: ATC: You are cleared for takeoff Pilot: My understanding is that
> I am cleared for takeoff ATC: Your understanding is correct

This misses crucial information, namely addressing, which is critical in a multiuser broadcast context. A clearance is preceded by a call sign, and an ACK is succeeded by the call sign. Call signs may be abbreviated, which can lead to confusion when the abbreviations are close, and when transmissions are stepped on, which might have been the case in the incident in question. So let's correct for call signs, and translate into the standard ATC-aircraft controlled language. What you suggest is:

> [1] ATC: [call sign] Cleared for takeoff
> [2] CRW: Cleared for takeoff [call sign]
> [3] ATC: [call sign] Affirmative

Steps 1 and 2 are required. Step 3 is not; if Step 2 is not correctly executed, then the controller will respond:

> [1] ATC: [call sign] Cleared for takeoff
> [2] CRW: Cleared for takeoff [other call sign]
> [3] ATC: <other call sign> Negative [other call sign]

or

> [1] ATC: [call sign] Cleared for takeoff
> [2] CRW: Cleared for takeoff [other call sign]
> [3] ATC: [other call sign] Negative [other call sign]; <[call sign] Cleared for takeoff>

which, if you analyse it, works just as well and is more efficient. (The "<...>" indicates an optional expression.) Don't forget that this may be interspersed with other transmissions, for example

> [2'] CRW: Cleared for takeoff [call sign]

in which case the option will not be exercised.
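The scheme above places the error check on the controller's side. A minimal sketch, assuming a readback can be compared with the clearance as an exact (call sign, text) pair, which real voice communication cannot guarantee; `Transmission`, `controller_response` and the call signs ABC123/ABC132 are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transmission:
    call_sign: str
    text: str


def controller_response(issued: Transmission, readback: Transmission,
                        reissue: bool = False) -> list[str]:
    """Controller's reply to a readback in the two-step scheme.

    A readback matching the clearance exactly gets no acknowledgement (an empty
    list: silence). Otherwise the station that read back is told "Negative",
    optionally followed by the clearance re-issued to the intended aircraft.
    """
    if readback == issued:
        return []
    replies = [f"{readback.call_sign} Negative {readback.call_sign}"]
    if reissue:
        replies.append(f"{issued.call_sign} {issued.text}")
    return replies


clearance = Transmission("ABC123", "Cleared for takeoff")
print(controller_response(clearance, Transmission("ABC123", "Cleared for takeoff")))
print(controller_response(clearance, Transmission("ABC132", "Cleared for takeoff"), reissue=True))
```

The correct-readback case returns nothing at all: it is exactly the silence that the three-step proposal declines to give meaning to.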

There is no good reason for a controller to ACK a correct readback, and it would complicate matters cognitively when transmissions take up almost all the air time, which often happens at a major airport.

Further, complete expression as whole phrases doesn't illustrate the resilience of the language in the face of partial obscuration, which is an important feature.

Cushing (op. cit. antea) provided a grammar for such communications in his book. Cushing is a linguist, but his grammar was partially incorrect and also structurally more complex than need be. Twelve or thirteen years ago, some people working with me fixed it. See "Review of the Cushing Grammar", by Martin Ellermann and Mirco Hilbert, at http://www.rvs.uni-bielefeld.de/publications/Papers/hillermann-critique.pdf and "Building a Parser for ATC Language", by the same authors, at http://www.rvs.uni-bielefeld.de/publications/Papers/hillermann-critique.pdf

PBL Prof. Peter Bernard Ladkin, Faculty of Technology, University of Bielefeld, 33594 Bielefeld, Germany
Je suis Charlie
Tel+msg +49 (0)521 880 7319 www.rvs.uni-bielefeld.de




The System Safety Mailing List
systemsafety_at_xxxxxx

Received on Mon Jun 29 2015 - 01:32:21 CEST

This archive was generated by hypermail 2.3.0 : Sun Feb 17 2019 - 16:17:07 CET