Re: [SystemSafety] Research topics

From: Christopher Johnson
Date: Fri, 26 Jul 2013 09:24:44 +0000


This is really interesting - we are working with the European Network and Information Security Agency (ENISA) on the Cyber-security proposals for mandatory incident reporting in critical infrastructures (known as Article 14) which will also cover more general forms of systems failure.

Our particular focus in the project has been to design a European reporting system for private and public Cloud architectures in critical infrastructures.

The comments by John and Thierry really strike a chord. On the other hand, by comparison to the internal incident reporting systems used by Amazon, Google and Microsoft Cloud systems, those used in most safety industries look like the work of amateurs. The same also applies to the way they use redundancy in their server farms.

Just one aside to John - in these industries, reporting is explicitly included within SLAs in most contracts (frequency, definition of incidents, recourse if a report is not made, etc.), but the problems he mentions do not go away 8(

From: systemsafety-bounces_at_xxxxxx
Sent: 26 July 2013 10:03
To: systemsafety_at_xxxxxx
Subject: Re: [SystemSafety] Research topics

One of the main obstacles in getting the data from real systems in the field is that the operator, who observes and reports an incident, will not know whether the effect they see is anything to do with software. Indeed, if they do know any detail of the implementation, it could hinder the incident investigation rather than help. The investigator wants to know what actually happened, not be told of theories about "circumbobulators having flange misalignment", or whatever.

Another real-world effect is that an incident probably will not be reported if there is an established work-around. Management may think that their engineered system is wonderfully reliable when, in reality, they are employing a bunch of heroes who keep the Customers happy despite repeated system drop-outs... and, no doubt, because the system is so reliable, they are planning to "cut costs" by getting rid of the more-experienced operators.

John
Usual caveat about this being my opinion and not those of my employers, Customers or Clients.

Sent: 26 July 2013 09:44
To: systemsafety_at_xxxxxx
Subject: Re: [SystemSafety] Research topics

Hi,
One suggestion would be to look at data-gathering in the field for actual reliability data for safety-critical SW. It seems there are many obstacles to getting good (any?) data. And the research would lead to actually asking which properties in the field are the right ones to measure. One particular area of concern is the lack of data on incidents (minor failures in SW that are not in themselves critical but are advance warnings and predictors of the presence of major defects). I suppose the research would look at technical, organizational and legal aspects, all of which might be interesting to a PhD student (and for his funding?).

Best regards,
Thierry Coq
DNV








The System Safety Mailing List
systemsafety_at_xxxxxx Received on Fri Jul 26 2013 - 11:24:56 CEST
