Re: [SystemSafety] NTSB report on Boeing 787 APU battery fire at Boston Logan

From: Mike Ellims
Date: Thu, 4 Dec 2014 15:44:49 -0000


Peter comments that "Huh? How could one not anticipate internal short circuits? How could one not anticipate thermal runaway from an internal short circuit? Answer: this was an assumption derived from a single nail-penetration test."

So could the assumption have been validated as it should have been?

The simplest way to test this is to do a literature search.

Section 1.2 of the report states: "The 787 program began in April 2004, with the 787's first flight in December 2009, certification in August 2011, and first delivery in September 2011."

Searching Google Scholar for "lithium ion battery fire" and "lithium ion battery failure", restricted to papers published up to 2003, produces thousands of results.

For example, from a paper published in 2003:

... their safety is still a major concern, and are less safe as the capacity of the battery increases. In particular, one of the unsolved problems that can occur during operation, abrupt overcharge to the voltage-supply limit (12 V) owing to a defect or a malfunction in the protective devices[+] of the cell, has not been prevented. Moreover, numerous battery accidents with accompanying fires and explosions have been reported.[3] The main cause of such disasters is that LiCoO2 cathodes can undergo a violent exothermic reaction with the electrolyte during overcharge, which may result in the cell shortcircuiting. In addition, lithium deposited on the graphite anode accelerates the reaction, and results in a sharp rise in temperature.[4-6] Furthermore, this process converts LiCoO2 into the strong oxidizing agent Co2O3, which releases oxygen during overcharging. A combination of the temperature increase and the internal short circuit of the cell eventually results in an explosion of the cell. In spite of this, no fundamental solution has been found.  

Full paper at http://ep.snu.ac.kr/publication/pdf/2003%20Angew%2042_1618.pdf

-----Original Message-----
From: systemsafety-bounces_at_xxxxxx
[mailto:systemsafety-bounces_at_xxxxxx Peter Bernard Ladkin
Sent: 04 December 2014 13:43
To: The System Safety List
Subject: [SystemSafety] NTSB report on Boeing 787 APU battery fire at Boston Logan

Has been published at http://www.ntsb.gov/doclib/reports/2014/AIR1401.pdf

There was an NYT article yesterday:
http://www.nytimes.com/2014/12/02/business/report-on-boeing-787-dreamliner-batteries-assigns-some-blame-for-flaws.html

Just the summary of the NTSB report is astonishing in itself! Keep in mind this is a 2.2kW-hr energy storage device (75 Amp-hours at nominal 29.6 V). When they decide to go, they can get rid of all that energy in a relatively short space of time. But apparently the regulator didn't think so. Before.
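The 2.2 kW-hr figure follows directly from the two nominal values quoted above. A minimal sketch of that arithmetic, assuming only the 75 Ah and 29.6 V figures from the message (the function and variable names are illustrative, not from the report):

```python
# Nominal figures quoted in the message above.
CAPACITY_AH = 75.0        # rated capacity, ampere-hours
NOMINAL_VOLTAGE_V = 29.6  # nominal battery voltage, volts


def stored_energy_wh(capacity_ah: float, voltage_v: float) -> float:
    """Nominal stored energy in watt-hours: E = Q * V."""
    return capacity_ah * voltage_v


energy_wh = stored_energy_wh(CAPACITY_AH, NOMINAL_VOLTAGE_V)
energy_kwh = energy_wh / 1000.0
energy_mj = energy_wh * 3600.0 / 1e6  # 1 Wh = 3600 J

print(f"{energy_kwh:.2f} kWh, about {energy_mj:.1f} MJ")
```

That is roughly 2.2 kWh, or about 8 MJ, at nominal voltage. It is only a nameplate estimate; the energy actually released in a thermal runaway depends on state of charge and on the chemical reactions involved, which this sketch does not model.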

Pp viii-ix of the Report contain a summary of the conclusions. It makes clear an astonishingly superficial grasp of the technology on the part of Boeing and the FAA. The manufacturer's processes allowed FOD and improper cell winding without having effective detection methods in place.

My remarks are in square brackets.

[begin quote NTSB]

The NTSB identified the following safety issues as a result of this incident investigation:

[Huh? How could one not anticipate internal short circuits? How could one
not anticipate thermal runaway from an internal short circuit? Answer: this was an assumption derived from a single nail-penetration test.]

..... Boeing's analysis of the main and APU battery did not consider the possibility that cascading thermal runaway of the battery could occur as a result of a cell internal short circuit.

[This is an astonishing assertion, but appears to be well justified in the
NTSB analysis. How do you miss such an obvious phenomenon? I guess it's an example of group think. The NTSB notes the lack of effective traceability in the system safety assessment, pp73ff.]

[That is, the manufacturer of extremely powerful Li-ion secondary batteries
was not using "established industry practices". Not only that, but its quality-control processes were flawed.
And this after all those public claims about careful oversight. ]

[Well, yes. That is or should be routine safety analysis and factor
mitigation. But both the manufacturer's FMEA and the FAA requirements of the system safety assessment seem to have been lacking; see below.]

[This is *really* hard to fathom!]

[end quote]

The manufacturing-line defects were quite straightforward. It astonishes me that the NTSB was able to observe "perturbations" in electrode/separator strips being wound during their inspection, and that such things were not discovered by the manufacturer's quality control (CT scans of the results, which were too coarse to detect the kind of FOD that might well have got in, or even the perturbations the NTSB found), because these things don't appear to be subtle.

The manufacturer's FMEA was apparently based upon in-service data of 14,000 cells of a similar design to the LVP65.

On p68 we read:

[begin quote]

Boeing and Thales performed preliminary and final EPS safety assessments, which included fault tree analyses, FMEAs, and failure rate data provided by GS Yuasa. These assessments considered internal short circuit failures but were developed with the underlying assumption that the most severe effect of an internal short circuit within a cell would be limited to venting of only that cell without fire and propagation to other cells. Thus, the potential for an internal short circuit to lead to multiple-cell or battery thermal runaway with venting, electrolyte leakage, excessive heat, and fire was not analyzed in the safety assessment.

[end quote]

In other words, the FMEA contained an inadequate "E" part - internal short circuits leading to thermal runaway apparently didn't occur as an effect of a failure. Why not?

The FMEA is talked about on pp49-51, in Section 1.7.3 System Safety Assessment:

[begin quote]

Boeing's FMEA was based on information contained within GS Yuasa's FMEA, which GS Yuasa developed with assistance from Boeing and Thales. GS Yuasa's FMEA included a calculation of a representative failure rate for the LVP65 cell. This calculation was based on in-service data from about 14,000 existing large-scale industrial lithium-ion cells manufactured by GS Yuasa, which had a similar design and manufacturing process as the LVP65 cell.106 GS Yuasa's FMEA indicated that none of the industrial cells had experienced any failures, including venting, electrolyte release, or rupture of a vent disc. (GS Yuasa's FMEA did not include an analysis of usage and environmental similarities between the industrial cells and the LVP65 cells or a discussion of the hazardous effects of a lithium-ion cell failure, including overheating or venting.)

[end quote]

So they did an FMEA using data from cells, none of which had failed. Looks good so far! Perfect manufacturing! But then there was a Nov 2006 fire at Securaplane, which makes the battery charging system (BCS). Investigation put this down to a cell-internal short, and overcharging of at least one other cell (Note 81, p43). There was also a thermal runaway in an APU battery on July 7, 2009 (Note 82, p43). Both of these incidents vitiate the assumptions made in the system safety assessment that thermal runaway was not a possible effect, but apparently nobody at Boeing or the FAA noticed. In other words, the SSA was not revisited as a result of these two incidents.
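To put a number on how little "14,000 cells, zero failures" can support, here is a rough sketch using the standard zero-failure binomial bound (often called the "rule of three"). It is not the calculation in GS Yuasa's FMEA, which the report does not reproduce. It assumes each cell is an independent trial with the same failure probability and ignores exposure time, and it assumes the industrial cells are representative of the LVP65, which is exactly the similarity analysis the NTSB says was not done.

```python
import math


def zero_failure_upper_bound(n_units: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the per-unit failure probability after
    observing zero failures in n_units independent trials.

    Solves (1 - p) ** n_units = 1 - confidence for p.
    """
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # log1p/expm1 keep precision when p is very small.
    return -math.expm1(math.log1p(-confidence) / n_units)


N_CELLS = 14_000  # cell count quoted from the NTSB report

exact = zero_failure_upper_bound(N_CELLS)
rule_of_three = 3.0 / N_CELLS  # common approximation at 95% confidence

print(f"exact 95% upper bound: {exact:.2e} per cell")
print(f"rule-of-three approx.: {rule_of_three:.2e} per cell")
```

Under these assumptions the data cannot rule out a per-cell failure probability of about 2 in 10,000, and they say nothing at all about what a failure would do once it occurred.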

Yes, a lack of joined-up thinking. In some sense, this was known to be a problem with the heavily outsourced/subcontracted 787 project - one might even guess that "ensuring joined-up thinking" is THE big challenge with such efforts. Recall the cable-bundle mismatching that occurred on the A380, which if I remember rightly was partly put down to different Airbus plants using different versions of the CAD tool CATIA. But this lack of joined-up thinking went beyond the manufacturer (on this project more a systems integrator) to include in the 787 case the regulator as well!

A significant piece of information concerning aircraft safety assessment is contained in Note 86, p44:

[begin quote]

The FAA did not consider the 787 battery to be a critical component because the Seattle Aircraft Certification Office (which was responsible for the airplane's certification) regarded the battery as a redundant system. ......

[end quote]

You are only "critical" according to airworthiness regulations if you're a single point of failure, and you only get selected for top scrutiny if you are manufacturing a "critical" component. There is an obvious argument here for a notion of criticality referring to the severity of consequences of (faulty or otherwise) behavior.

In any case, that won't help if the FMEA/FHA is faulty and doesn't indicate any effect greater than a single smoky cell.

Once again, it seems that faulty safety assessment, in this case (again) an obviously inadequate FMEA, played a significant role, despite the presence of incidents contradicting the analysis.

(There are people here who have heard me say enough times that I haven't seen an FMEA I can't fault.
There are plenty of other people on this list who can likely say that also. Now it's the NTSB's turn to say it, even if discreetly.)

PBL Prof. Peter Bernard Ladkin, Faculty of Technology, University of Bielefeld, 33594 Bielefeld, Germany
Tel+msg +49 (0)521 880 7319 www.rvs.uni-bielefeld.de



The System Safety Mailing List
systemsafety_at_xxxxxx
Received on Thu Dec 04 2014 - 16:45:17 CET