Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring Software]

From: Derek M Jones < >
Date: Sat, 29 Jun 2013 16:08:01 +0100


Steve,

I have been meaning to do this for a while:

http://shape-of-code.coding-guidelines.com/2013/06/29/empirical-se-groups-doing-interesting-work-2013-version/

> "I know of over a dozen people working towards PhDs on these and related
> topics; there are probably a lot more."
> 
> I would be interested in knowing who these people are, is there a way that
> you could please make an introduction?
> 
> 
> Thanks,
> 
> -- steve
> 
> 
> 
> -----Original Message-----
> From: Derek M Jones <derek_at_xxxxxx>
> Organization: Knowledge Software, Ltd
> Date: Wednesday, June 26, 2013 7:14 AM
> To: "systemsafety_at_xxxxxx" <systemsafety_at_xxxxxx>
> Subject: Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring Software]
> 
> Steve,
> 

>> I've been trying to drum up interest in this among my customers. They all
>> have significant code bases (totaling 10 million lines of code or more,
>> easily) with years of operational defect data. The two tricks are 1)
>> correlating the defect fixes to specific elements of code, and 2) finding
>> the time to do the analysis. What I'm hoping to do is run across someone
> 
> There is plenty of bug data from open source projects and
> researchers are slowly building an infrastructure to
> extract the desired information.  The number of issues
> that need to be addressed to turn raw fault data into something
> useful seems to be never ending (spotting duplicate reports,
> miscategorized reports, fault fixes intermingled with other
> work and vice versa).
> 
> There are now tools that will connect bug reports to the changed
> files, search back through the change history to find when the
> code was first written, spot most duplicates and even highlight
> threads in the discussion list relating to the report.
> 
> We live in interesting times in that good quality data is starting
> to become available in volume, i.e., the output of data extraction and
> human/machine cleaned data.
> 
> What the commercial data could have that the open source data rarely has
> is manpower information, how much effort was needed to do the work.
> 

>> who is working at one of these companies who is in (or will start) a
>> masters or Ph.D. Program. What a thesis topic this would be, huh?
> 
> I know of over a dozen people working towards PhDs on these and related
> topics; there are probably a lot more.
> 

>>
>> "We are not going to move towards professionalism until there are less
>> software development jobs than half competent developers. Hiring
>> people based on their ability to spell 'software' is not an
>> environment where professionalism takes root."
>>
>> As we say in our company, "the ability to fog a mirror shouldn't
>> count"...
>>
>> "I keep telling people that the best way to reduce faults in code is
>> to start sending developers to prison. Nobody takes me seriously (ok,
>> yes, it would probably be a difficult case to bring)."
>>
>> If the damage is that substantial, why not? On the other hand, simple
>> financial liability would suffice in my opinion.
>>
>> I recently had an internal discussion with my boss about our "friends" at
>> Microsoft offering a "bug bounty" for reporting security defects in MS
>> products (see http://m.usatoday.com/article/news/2441675). I'm fine with
>> that, but I think the program would have a lot more positive effect if
>> the bounty was deducted from the salary of the idiot programmer who wrote
>> that code.
>>
>> "Even the simpler issue of identifier semantics is still way beyond our
>> reach. See:
>> http://www.coding-guidelines.com/cbook/sent792.pdf
>> for more than you could ever want to know about identifier selection
>> issues."
>>
>>
>> Very interesting, indeed. Thanks for the reference. I used to work with a
>> guy at Boeing who said, "90% of programming is choosing good names". I
>> surely agree with his intent.
>>
>>
>> Cheers,
>>
>> -- steve
>>
>>
>>
>> -----Original Message-----
>> Organization: Knowledge Software, Ltd
>> Date: Tuesday, June 25, 2013 5:54 PM
>> To: <systemsafety_at_xxxxxx>
>> Subject: Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring Software]
>>
>> Steve,
>>
>>> I think we both strongly agree that there really needs to be a lot more
>>> evidence.
>>
>> Yes. No point quibbling over how little little might be.
>>
>>> But I'm not looking for a correlation of overall, total code base
>>> cyclomatic complexity to overall defects. I'm looking for the
>>> correlation of cyclomatic complexity within a single function/method
>>> to the defect density within that same single function/method.
>>
>> Left to their own devices developers follow fairly regular patterns
>> of code usage. An extreme outlier of any metric is suspicious and
>> often worth some investigation; it might be the case that
>> the developer had a bad day or perhaps that function has to implement
>> some complicated application functionality, or something else.
>>
>> Outliers are the low hanging fruit.
>>
>> The problems start, or rather the time wasting starts, when
>> specific numbers get written into documents and are used to
>> judge what developers produce.
>>
>>> along, what we need in the end is a balancing of a collection of
>>> syntactic complexity metrics. When functions/methods are split, it
>>> always increases fan out. When functions/methods are merged, it always
>>> decreases fan out. The complexity didn't go away, it just moved to a
>>> different place in the code. So having a limit in only one place easily
>>> allows people to squeeze it into any other place. Having a set of
>>> appropriate limits means there's a lot less chance of it going unnoticed
>>> somewhere else.
>>
>> Yes, what we need is lots of good quality data for lots of code
>> attributes so we can start looking at these trade-offs.
>> Unfortunately the only good quality data I have involves small
>> numbers of attributes.
>>
>> Having seen what a hash some researchers make of analysing the data
>> they have I am loath to accept findings where the data is not made
>> available.
>>
>>> accident. Just the same, I'm basically arguing for more professionalism
>>> in the software industry. I mean seriously, the programmer who was
>>> responsible for that single C++ class with a single method of 3400 lines
>>> of code with a cyclomatic complexity over 2400 is a total freaking moron
>>> who has no business whatsoever in the software industry.
>>
>> We are not going to move towards professionalism until there are less
>> software development jobs than half competent developers. Hiring
>> people based on their ability to spell 'software' is not an
>> environment where professionalism takes root.
>>
>> I keep telling people that the best way to reduce faults in code is
>> to start sending developers to prison. Nobody takes me seriously (ok,
>> yes, it would probably be a difficult case to bring).
>>
>>> And, we will also always need semantic evaluation of code (which, as I
>>> said earlier, has to be done by humans) because syntax-based metrics
>>> alone will probably always be game-able.
>>
>> Until strong AI arrives that will not happen.
>> Even the simpler issue of identifier semantics is still way beyond our
>> reach. See:
>> http://www.coding-guidelines.com/cbook/sent792.pdf
>> for more than you could ever want to know about identifier selection
>> issues.
>>
>>>
>>> Regards,
>>>
>>> -- steve
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Derek M Jones <derek_at_xxxxxx>
>>> Organization: Knowledge Software, Ltd
>>> Date: Tuesday, June 25, 2013 4:21 PM
>>> To: "systemsafety_at_xxxxxx" <systemsafety_at_xxxxxx>
>>> Subject: Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring Software]
>>>
>>> Steve,
>>>
>>> ...
>>>> "local vs. global" categories, it's just that nobody has yet published
>>>> any data identifying which ones should be paid attention to and which
>>>> ones should be ignored.
>>>
>>> So you agree that there is no empirical evidence.
>>>
>>> Your statement is also true of almost every metrics paper published
>>> to date.
>>>
>>> With so many different metrics having been proposed at least one of
>>> them is likely to agree with the empirical data that is yet to be
>>> published.
>>>
>>> You cited the paper: "A Practical Guide to Object-Oriented Metrics"
>>> as the source of the cyclomatic complexity vs fault correlation
>>> claim. Fig 4 looks like it contains the data. No standard
>>> deviation is given for the values, but this would have to be
>>> very large to ruin what looks like a reasonable correlation.
>>>
>>> Such a correlation can often be found, however:
>>>
>>> o cyclomatic complexity is just one of many 'complexity'
>>> metrics that have a high correlation with quantity of code,
>>> so why not just measure lines of code?
>>>
>>> o once developers know they are being judged by some metric
>>> or other they can easily game the system by actions such as
>>> splitting/merging functions. If the metric has a causal connection
>>> to the quantity of interest, e.g., faults, then everybody is happy
>>> for developers to do what they will to reduce the metric,
>>> but if the connection is simply a correlation (based on code
>>> written by developers not trying to game the system) then
>>> developers doing whatever it takes to improve the metric value
>>> is at best wasted time.
>>>
>>>>
>>>> -----Original Message-----
>>>> Date: Monday, June 24, 2013 7:20 PM
>>>> To: "systemsafety_at_xxxxxx" <systemsafety_at_xxxxxx>
>>>> Subject: Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring Software]
>>>>
>>>> ST> For example, the code quality measure "Cyclomatic Complexity"
>>>> ST> (reference: Tom McCabe, "A Complexity Measure", IEEE Transactions on
>>>> ST> Software Engineering, December, 1976) was validated many years ago
>>>> ST> by simply
>>>>
>>>> DMJ> I am not aware of any study that validates this metric to a
>>>> DMJ> reasonable standard. There are a few studies that have found a
>>>> DMJ> medium correlation in a small number of data points.
>>>>
>>>> Les Hatton had an interesting presentation in '08, "The role of
>>>> empiricism in improving the reliability of future software" that shows
>>>> there is a strong correlation between source-lines-of-code and
>>>> cyclomatic complexity, and that defects follow a power law
>>>> distribution:
>>>>
>>>> http://www.leshatton.org/wp-content/uploads/2012/01/TAIC2008-29-08-2008.pdf
>>>>
>>>> Just another voice, which probably just adds evidence to the argument
>>>> that we haven't yet found a trivial metric to predict bugs...
>>>>
>>>> -TC
>>>>
>>>> On 6/24/2013 6:38 PM, Derek M Jones wrote:
>>>>> All,
>>>>>
>>>>>> Actually, getting the evidence isn't that tricky, it's just a lot of
>>>>>> work.
>>>>>
>>>>> This is true of most things (+ getting the money to do the work).
>>>>>
>>>>>> Essentially all one needs to do is to run a correlation analysis
>>>>>> (correlation coefficient) between the proposed quality measure on the
>>>>>> one hand, and defect tracking data on the other hand.
>>>>>
>>>>> There is plenty of dirty data out there that needs to be cleaned up
>>>>> before it can be used:
>>>>>
>>>>> http://shape-of-code.coding-guidelines.com/2013/06/02/data-cleaning-the-next-step-in-empirical-software-engineering/
>>>>>
>>>>>
>>>>>> For example, the code quality measure "Cyclomatic Complexity"
>>>>>> (reference: Tom McCabe, "A Complexity Measure", IEEE Transactions on
>>>>>> Software Engineering, December, 1976) was validated many years ago by
>>>>>> simply
>>>>>
>>>>> I am not aware of any study that validates this metric to a reasonable
>>>>> standard. There are a few studies that have found a medium
>>>>> correlation in a small number of data points.
>>>>>
>>>>> I have some data whose writeup is not yet available in a good enough
>>>>> draft form to post to my blog. I only plan to write about this
>>>>> metric because it is widely cited and is long overdue for relegation
>>>>> to the history of good ideas that did not stand the scrutiny of
>>>>> empirical evidence.
>>>>>
>>>>>> finding a strong positive correlation between the cyclomatic
>>>>>> complexity of functions and the number of defects that were logged
>>>>>> against those same
>>>>>
>>>>> Correlation is not causation.
>>>>>
>>>>> Cyclomatic complexity correlates well with lines of code, which
>>>>> in turn correlates well with number of faults.
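
One way to probe the point above is a partial correlation: how much of the
complexity/fault correlation is left once lines of code is held fixed. The
following is only a minimal sketch, not something posted in this thread. It
assumes per-function measurements are available as three equal-length
lists, and the function names are illustrative rather than taken from any
tool mentioned here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_corr_from_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    Undefined (division by zero) when x or y is perfectly correlated
    with z, i.e. when r_xz or r_yz is exactly +1 or -1.
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr(complexity, faults, loc):
    """Correlation of complexity with faults after removing the LOC effect."""
    return partial_corr_from_r(
        pearson(complexity, faults),
        pearson(complexity, loc),
        pearson(faults, loc),
    )
```

If the partial correlation is close to zero, the metric adds little beyond
what a simple line count already says about faults.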
>>>>>
>>>>>> functions (i.e., code in that function needed to be changed in order
>>>>>> to repair that defect).
>>>>>
>>>>> Changing the function may increase the number of faults. Creating two
>>>>> functions where there was previously one will reduce an existing peak
>>>>> in the distribution of values, but will it result in fewer faults
>>>>> overall?
>>>>>
>>>>> All this stuff with looking for outlier metric values is pure hand
>>>>> waving. Where is the evidence that the reworked code is better not
>>>>> worse?
>>>>>
>>>>>> According to one study of 18 production applications, code in
>>>>>> functions with cyclomatic complexity <=5 was about 45% of the total
>>>>>> code base but this code was responsible for only 12% of the defects
>>>>>> logged against the total code base. On the other hand, code in
>>>>>> functions with cyclomatic complexity of >=15 was only 11% of the code
>>>>>> base but this same code was responsible for 43% of the total defects.
>>>>>> On a per-line-of-code basis, functions with cyclomatic complexity >=15
>>>>>> have more than an order of magnitude increase in defect density over
>>>>>> functions measuring <=5.
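
The "order of magnitude" figure can be checked from the quoted shares alone.
This is a sketch only: it assumes the percentages are exact and that the
share of the code base is measured in lines of code. The variable names are
illustrative.

```python
# Shares quoted in the message above: percent of code base, percent of defects.
low = {"code_share": 45.0, "defect_share": 12.0}    # cyclomatic complexity <= 5
high = {"code_share": 11.0, "defect_share": 43.0}   # cyclomatic complexity >= 15


def relative_density(band):
    """Defects per unit of code, relative to the code-base average of 1.0."""
    return band["defect_share"] / band["code_share"]


low_density = relative_density(low)
high_density = relative_density(high)
ratio = high_density / low_density

# Prints: 0.27 3.91 14.7
print(f"{low_density:.2f} {high_density:.2f} {ratio:.1f}")
```

On these numbers the high-complexity band has roughly 14.7 times the defect
density of the low-complexity band, consistent with the claim, though the
calculation says nothing about whether complexity or sheer function size is
the driver.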
>>>>>>
>>>>>> What I find interesting, personally, is that complexity metrics for
>>>>>> object-oriented software have been around for about 20 years and yet
>>>>>> nobody (to my knowledge) has done any correlation analysis at all (or,
>>>>>> at a minimum they have not published their results).
>>>>>>
>>>>>> The other thing to remember is that such measures consider only the
>>>>>> "syntax" (structure) of the code. I consider this to be *necessary*
>>>>>> for code quality, but far from *sufficient*. One also needs to
>>>>>> consider the "semantics" (meaning) of that same code. For example, to
>>>>>> what extent is the code based on reasonable abstractions? To what
>>>>>> extent does the code exhibit good encapsulation? What are the cohesion
>>>>>> and coupling of the code? Has the code used "design-to-invariants /
>>>>>> design-for-change"? One can have code that's perfectly structured in a
>>>>>> syntactic sense and yet it's garbage from the semantic perspective.
>>>>>> Unfortunately, there isn't a way (that I'm aware of, anyway) to do the
>>>>>> necessary semantic analysis in an automated fashion. Some other
>>>>>> competent software professionals need to look at the code and assess
>>>>>> it from the semantic perspective.
>>>>>>
>>>>>> So while I applaud efforts like SQALE and others like it, one needs to
>>>>>> be careful that it's only a part of the whole story. More work--a lot
>>>>>> more--needs to be done before someone can reasonably say that some
>>>>>> particular code is "high quality".
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> -- steve
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> Date: Friday, June 21, 2013 6:04 AM
>>>>>> To: "systemsafety_at_xxxxxx" <systemsafety_at_xxxxxx>
>>>>>> Subject: Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring Software]
>>>>>>
>>>>>> I agree with Derek
>>>>>>
>>>>>> Code quality is only a means to an end
>>>>>> We need evidence to show the means actually helps to achieve the
>>>>>> ends.
>>>>>>
>>>>>> Getting this evidence is pretty tricky, as parallel developments for
>>>>>> the same project won't happen.
>>>>>> But you might be able to infer something on average over multiple
>>>>>> projects.
>>>>>>
>>>>>> Derek M Jones wrote:
>>>>>>> Thierry,
>>>>>>>
>>>>>>>> To answer your questions:
>>>>>>>> 1°) Yes, there is some objective evidence that there is a
>>>>>>>> correlation between a low SQALE index and quality code.
>>>>>>>
>>>>>>> How is the quality of code measured?
>>>>>>>
>>>>>>> Below you say that SQALE DEFINES what is "good quality" code.
>>>>>>> In this case it is to be expected that a strong correlation will
>>>>>>> exist between a low SQALE index and its own definition of quality.
>>>>>>>
>>>>>>>> For example ITRIS has conducted a study where the "good quality"
>>>>>>>> code is statistically linked to a lower SQALE index, for industrial
>>>>>>>> software actually used in operations.
>>>>>>>
>>>>>>> Again how is quality measured?
>>>>>>>
>>>>>>>> No, there is not enough evidence, we wish there would be more
>>>>>>>> people working on getting the evidence.
>>>>>>>
>>>>>>> Is there any evidence apart from SQALE correlating with its own
>>>>>>> measures?
>>>>>>>
>>>>>>> This is a general problem, lots of researchers create their own
>>>>>>> definition of quality and don't show a causal connection to external
>>>>>>> attributes such as faults or subsequent costs.
>>>>>>>
>>>>>>> Without running parallel development efforts that
>>>>>>> follow/don't follow the guidelines it is difficult to see how
>>>>>>> reliable data can be obtained.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> The System Safety Mailing List
>>>> systemsafety_at_xxxxxx
>>>
>>

>
-- 
Derek M. Jones                  tel: +44 (0) 1252 520 667
Knowledge Software Ltd          blog:shape-of-code.coding-guidelines.com
Software analysis               http://www.knosof.co.uk
_______________________________________________
The System Safety Mailing List
Received on Sat Jun 29 2013 - 17:08:14 CEST
