[SystemSafety] NTSB report on Boeing 787 APU battery fire at Boston Logan

From: Peter Bernard Ladkin < >
Date: Thu, 04 Dec 2014 14:42:39 +0100

Has been published at http://www.ntsb.gov/doclib/reports/2014/AIR1401.pdf

There was an NYT article yesterday:

Just the summary of the NTSB report is astonishing in itself! Keep in mind this is a 2.2kW-hr energy storage device (75 Amp-hours at nominal 29.6 V). When they decide to go, they can get rid of all that energy in a relatively short space of time. But apparently the regulator didn't think so. Before.
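The energy figure above follows directly from capacity times nominal voltage. As a quick check, here is a minimal sketch; the function and variable names are illustrative only, and the joule conversion is added purely for scale:

```python
# Figures quoted in the message: 75 Ah capacity at a nominal 29.6 V.
CAPACITY_AH = 75.0
NOMINAL_VOLTAGE_V = 29.6


def stored_energy_wh(capacity_ah: float, voltage_v: float) -> float:
    """Nominal stored energy in watt-hours: E = Q * V."""
    return capacity_ah * voltage_v


energy_wh = stored_energy_wh(CAPACITY_AH, NOMINAL_VOLTAGE_V)
energy_kwh = energy_wh / 1000.0
energy_mj = energy_wh * 3600.0 / 1e6  # 1 Wh = 3600 J

print(f"{energy_wh:.0f} Wh = {energy_kwh:.1f} kWh = {energy_mj:.1f} MJ")
```

That is about 2.2 kWh, or roughly 8 MJ, of nominal stored energy.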

Pp viii-ix of the Report contain a summary of the conclusions. It makes clear an astonishingly superficial grasp of the technology on the part of Boeing and the FAA. The manufacturer's processes allowed FOD (foreign-object debris) and improper cell winding without having effective detection methods in place.

My remarks are in square brackets.

[begin quote NTSB]

The NTSB identified the following safety issues as a result of this incident investigation:

[Huh? How could one not anticipate internal short circuits? How could one not anticipate thermal
runaway from an internal short circuit? Answer: this was an assumption derived from a single nail-penetration test.]

..... Boeing’s analysis of the main and APU battery did not consider the possibility that cascading thermal runaway of the battery could occur as a result of a cell internal short circuit.

[This is an astonishing assertion, but appears to be well justified in the NTSB analysis. How do
you miss such an obvious phenomenon? I guess it's an example of groupthink. The NTSB notes the lack of effective traceability in the system safety assessment, pp73ff.]

[That is, the manufacturer of extremely powerful Li-ion secondary batteries was not using
"established industry practices". Not only that, but its quality-control processes were flawed. And this after all those public claims about careful oversight.]

[Well, yes. That is or should be routine safety analysis and factor mitigation. But both the
manufacturer's FMEA and the FAA requirements of the system safety assessment seem to have been lacking; see below.]

[This is *really* hard to fathom!]

[end quote]

The manufacturing-line defects were quite straightforward. It astonishes me that the NTSB was able to observe "perturbations" in electrode/separator strips being wound during their inspection, and that such defects were not discovered by the manufacturer's own quality control (CT scanning of the results, which was too coarse to detect the kind of FOD that might well have got in, or even the perturbations the NTSB found), because these things don't appear to be subtle.

The manufacturer's FMEA was apparently based upon in-service data of 14,000 cells of a similar design to the LVP65.

On p68 we read:

[begin quote]

Boeing and Thales performed preliminary and final EPS safety assessments, which included fault tree analyses, FMEAs, and failure rate data provided by GS Yuasa. These assessments considered internal short circuit failures but were developed with the underlying assumption that the most severe effect of an internal short circuit within a cell would be limited to venting of only that cell without fire and propagation to other cells. Thus, the potential for an internal short circuit to lead to multiple-cell or battery thermal runaway with venting, electrolyte leakage, excessive heat, and fire was not analyzed in the safety assessment.

[end quote]

In other words, the FMEA contained an inadequate "E" part - thermal runaway resulting from an internal short circuit apparently didn't appear in it as an effect of a failure. Why not?

The FMEA is talked about on pp49-51, in Section 1.7.3 System Safety Assessment:

[begin quote]

Boeing’s FMEA was based on information contained within GS Yuasa’s FMEA, which GS Yuasa developed with assistance from Boeing and Thales. GS Yuasa’s FMEA included a calculation of a representative failure rate for the LVP65 cell. This calculation was based on in-service data from about 14,000 existing large-scale industrial lithium-ion cells manufactured by GS Yuasa, which had a similar design and manufacturing process as the LVP65 cell. GS Yuasa’s FMEA indicated that none of the industrial cells had experienced any failures, including venting, electrolyte release, or rupture of a vent disc. (GS Yuasa’s FMEA did not include an analysis of usage and environmental similarities between the industrial cells and the LVP65 cells or a discussion of the hazardous effects of a lithium-ion cell failure, including overheating or venting.)

[end quote]
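To put the 14,000-cell figure in the quote above in perspective: zero observed failures does not mean a zero failure probability. Here is a minimal sketch of the standard zero-failure bound (the "rule of three"), not a reconstruction of GS Yuasa's calculation. It assumes, purely for illustration, that the cells fail independently and that the industrial cells are representative of the LVP65 in usage and environment, which is exactly the similarity analysis the NTSB says was missing. The function name is hypothetical, and the result is a per-cell probability over the observed service period, not a per-flight-hour rate, since the excerpt gives no exposure hours.

```python
def zero_failure_upper_bound(n_units: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the per-unit failure probability p
    when 0 failures are seen in n independent units.

    Solves (1 - p)**n = 1 - confidence for p.
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_units)


n_cells = 14_000  # "about 14,000" cells, per the quoted report text

exact_bound = zero_failure_upper_bound(n_cells)
rule_of_three = 3.0 / n_cells  # common approximation to the 95% bound

print(f"95% upper bound on per-cell failure probability: {exact_bound:.2e}")
print(f"rule-of-three approximation:                     {rule_of_three:.2e}")
```

Under those assumptions the data are still consistent with a failure probability on the order of 2 in 10,000 cells, and they say nothing at all about what the effect of a failure would be.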

So they did an FMEA using data from cells, none of which had failed. Looks good so far! Perfect manufacturing! But then there was a Nov 2006 fire at Securaplane, which makes the battery charging system (BCS). Investigation put this down to a cell-internal short, and overcharging of at least one other cell (Note 81, p43). There was also a thermal runaway in an APU battery on July 7, 2009 (Note 82, p43). Both of these incidents vitiate the assumptions made in the system safety assessment that thermal runaway was not a possible effect, but apparently nobody at Boeing or the FAA noticed. In other words, the SSA was not revisited as a result of these two incidents.

Yes, a lack of joined-up thinking. In some sense, this was known to be a problem with the heavily outsourced/subcontracted 787 project - one might even guess that "ensuring joined-up thinking" is THE big challenge with such efforts. Recall the cable-bundle mismatching that occurred on the A380, which if I remember rightly was partly put down to different Airbus plants using different versions of the CAD tool CATIA. But this lack of joined-up thinking went beyond the manufacturer (on this project more a systems integrator) to include in the 787 case the regulator as well!

A significant piece of information concerning aircraft safety assessment is contained in Note 86, p44:

[begin quote]

The FAA did not consider the 787 battery to be a critical component because the Seattle Aircraft Certification Office (which was responsible for the airplane’s certification) regarded the battery as a redundant system. ......

[end quote]

You are only "critical" according to airworthiness regulations if you're a single point of failure, and you only get selected for top scrutiny if you are manufacturing a "critical" component. There is an obvious argument here for a notion of criticality referring to the severity of consequences of (faulty or otherwise) behavior.

In any case, that won't help if the FMEA/FHA is faulty and doesn't indicate any effect greater than a single smoky cell.

Once again, it seems that faulty safety assessment, in this case (again) an obviously inadequate FMEA, played a significant role, despite the presence of incidents contradicting the analysis.

(There are people here who have heard me say enough times that I haven't seen an FMEA I can't fault. There are plenty of other people on this list who can likely say that also. Now it's the NTSB's turn to say it, even if discreetly.)

PBL Prof. Peter Bernard Ladkin, Faculty of Technology, University of Bielefeld, 33594 Bielefeld, Germany Tel+msg +49 (0)521 880 7319 www.rvs.uni-bielefeld.de

The System Safety Mailing List
systemsafety_at_xxxxxx Received on Thu Dec 04 2014 - 14:42:49 CET
