Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring Software]

From: Matthew Squair < >
Date: Thu, 27 Jun 2013 13:42:56 +1000


Thanks Steve,

The full paper can be found at the link below; note that the metrics were applied to each of the three-thousand-odd components in the library. I take it you'd say that even at component level (presuming components are not method-sized) the view is still too wide to generate a meaningful correlation?

http://www.leshatton.org/wp-content/uploads/2012/01/NAG01_01-08.pdf

On Wed, Jun 26, 2013 at 10:18 PM, Steve Tockey <Steve.Tockey_at_xxxxxx

>
> "Reading the presentation of Les Hatton's 2008 paper, "The role of
> empiricism in improving the reliability of future software" he found using
> empirical techniques in a large scale study (of NAG Fortran and C
> libraries) that cyclomatic complexity was 'effectively useless' and that no
> metric strongly correlated (some actually weakly anti-correlated)."
>
> I looked at the slide deck on his web site and it appears to me that
> he's making the same mistake I referred to earlier:
>
> ----- begin cut here -----
>
> Maybe we are using different applications of cyclomatic complexity to
> code? Yes, sure, increasing the total number of lines of code in some code
> base will almost certainly increase the total number of decisions in that
> code base, and probably by roughly an equal proportion. 10,000 lines of
> code with 2000 decisions almost certainly implies close to 4000 decisions
> in 20,000 lines of code.
>
> But I'm not looking for a correlation of overall, total code base
> cyclomatic complexity to overall defects. I'm looking for the correlation
> of cyclomatic complexity within a single function/method to the defect
> density within that same single function/method. Figure 4 in the Schroeder
> paper shows a strong correlation of function/method-level cyclomatic
> complexity and function/method-level defect density. Again, reverse
> engineering from the numbers in Figure 4 shows that the defect density
> goes up by more than an order of magnitude between cyclomatic complexity
> less than/equal to 5 vs greater than/equal to 15 ***within a single
> function***.
>
> ----- end cut here -----
>
> My interpretation of Hatton's results is that he's looking at total
> cyclomatic complexity in the entire code base. It's not relevant at that
> level. Look at it at the function/method level and it becomes relevant.
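>
> (To make the distinction concrete, here is a minimal sketch of the
> function-level analysis; it is not taken from the Schroeder paper, the example
> source is invented, and the defects-per-KLOC figures are placeholders for what
> would come from a defect tracker keyed to the function that had to be changed.)
>
> import ast
> import statistics   # statistics.correlation needs Python 3.10+
>
> SOURCE = '''
> def parse_header(line):
>     if not line or ":" not in line:
>         return None
>     return line.split(":", 1)
>
> def validate(rec):
>     if rec is None:
>         return False
>     for field in rec:
>         if not field:
>             return False
>     return True
>
> def dispatch(msg, handlers):
>     if msg.kind == "A" and msg.priority > 3:
>         return handlers["urgent"](msg)
>     elif msg.kind == "A":
>         return handlers["normal"](msg)
>     elif msg.kind in ("B", "C"):
>         return handlers["bulk"](msg)
>     return None
> '''
>
> def cyclomatic_complexity(func_node):
>     """Approximate McCabe number: decision points + 1."""
>     decisions = 0
>     for node in ast.walk(func_node):
>         if isinstance(node, (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)):
>             decisions += 1
>         elif isinstance(node, ast.BoolOp):   # each extra and/or adds a path
>             decisions += len(node.values) - 1
>     return decisions + 1
>
> tree = ast.parse(SOURCE)
> cc = {n.name: cyclomatic_complexity(n)
>       for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
>
> # Invented defects-per-KLOC for the same functions (would come from the tracker).
> defects = {"parse_header": 0.8, "validate": 1.1, "dispatch": 4.2}
>
> names = sorted(defects)
> r = statistics.correlation([cc[n] for n in names], [defects[n] for n in names])
> print(cc, "function-level Pearson r:", round(r, 2))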
>
> "So perhaps we should not use metrics, period?"
>
> That a tool gets misapplied is not the fault of the tool; it's the
> fault of the tool user. People should be educated in the proper use of
> tools before they use them…
>
>
> Regards,
>
> -- steve
>
>
>
> From: Matthew Squair <mattsquair_at_xxxxxx >
> Date: Tuesday, June 25, 2013 7:15 PM
> To: Bielefield Safety List <systemsafety_at_xxxxxx >
> Subject: Re: [SystemSafety] Qualifying SW as "proven in use" [Measuring
> Software]
>
> Reading the presentation of Les Hatton's 2008 paper, "The role of
> empiricism in improving the reliability of future software" he found using
> empirical techniques in a large scale study (of NAG Fortran and C
> libraries) that cyclomatic complexity was 'effectively useless' and that no
> metric strongly correlated (some actually weakly anti-correlated).
>
> So it does seem that there is a basis on which we can empirically judge
> the efficacy of software metrics.
>
> So perhaps we should not use metrics, period?
>
>
>
> On Wed, Jun 26, 2013 at 10:54 AM, Derek M Jones <derek_at_xxxxxx >
>> Steve,
>>
>> > I think we both strongly agree that there really needs to be a lot more
>> > evidence.
>>
>> Yes. No point quibbling over how little 'little' might be.
>>
>> > But I'm not looking for a correlation of overall, total code base
>> > cyclomatic complexity to overall defects. I'm looking for the correlation
>> > of cyclomatic complexity within a single function/method to the defect
>> > density within that same single function/method.
>>
>> Left to their own devices developers follow fairly regular patterns
>> of code usage. An extreme outlier of any metric is suspicious and
>> often worth some investigation; it might be the case that
>> the developer had a bad day or perhaps that function has to implement
>> some complicated application functionality, or something else.
>>
>> Outliers are the low hanging fruit.
>>
>> The problems start, or rather the time wasting starts, when
>> specific numbers get written into documents and are used to
>> judge what developers produce.
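>>
>> (A minimal sketch of that outlier-first use of a metric; the numbers and the
>> median-plus-MAD cutoff are arbitrary, the point being to rank candidates for
>> review rather than to set a pass/fail limit:)
>>
>> from statistics import median
>>
>> def flag_outliers(metric_by_function, k=5.0):
>>     """Return functions whose metric is an extreme outlier (above median + k*MAD)."""
>>     values = list(metric_by_function.values())
>>     med = median(values)
>>     mad = median(abs(v - med) for v in values) or 1.0   # guard against zero spread
>>     cutoff = med + k * mad
>>     return sorted((name for name, v in metric_by_function.items() if v > cutoff),
>>                   key=metric_by_function.get, reverse=True)
>>
>> # Hypothetical per-function cyclomatic complexity values.
>> cc = {"init": 2, "step": 4, "render": 3, "parse_config": 6, "do_everything": 47}
>> print(flag_outliers(cc))   # ['do_everything']: worth a look, not automatically "bad"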
>>
>> > along, what we need in the end is a balancing of a collection of syntactic
>> > complexity metrics. When functions/methods are split, it always increases
>> > fan out. When functions/methods are merged, it always decreases fan out.
>> > The complexity didn't go away, it just moved to a different place in the
>> > code. So having a limit in only one place easily allows people to squeeze
>> > it into any other place. Having a set of appropriate limits means there's
>> > a lot less chance of it going unnoticed somewhere else.
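>> >
>> > (A back-of-the-envelope check of that point, using CC = decisions + 1 per
>> > function; the numbers are made up purely for illustration:)
>> >
>> > # One function containing 20 decision points:
>> > #   CC = 20 + 1 = 21, and the caller's fan-out is unchanged.
>> > # Split it into four helpers of 5 decisions each:
>> > #   CC per helper = 5 + 1 = 6, so each one now passes a "CC <= 10" check,
>> > #   but total CC = 4 * 6 = 24 and the caller's fan-out goes up by 4.
>> > decisions, parts = 20, 4
>> > print("single function CC:", decisions + 1)
>> > print("per-part CC:", decisions // parts + 1,
>> >       "total CC:", parts * (decisions // parts + 1))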
>>
>> Yes, what we need is lots of good quality data for lots of code
>> attributes so we can start looking at these trade-offs.
>> Unfortunately the only good quality data I have involves small
>> numbers of attributes.
>>
>> Having seen what a hash some researchers make of analysing the data
>> they have I am loath to accept findings where the data is not made
>> available.
>>
>> > accident. Just the same, I'm basically arguing for more professionalism in
>> > the software industry. I mean seriously, the programmer who was
>> > responsible for that single C++ class with a single method of 3400 lines
>> > of code with a cyclomatic complexity over 2400 is a total freaking moron
>> > who has no business whatsoever in the software industry.
>>
>> We are not going to move towards professionalism until there are fewer
>> software development jobs than half-competent developers. Hiring
>> people based on their ability to spell 'software' is not an
>> environment where professionalism takes root.
>>
>> I keep telling people that the best way to reduce faults in code is
>> to start sending developers to prison. Nobody takes me seriously (ok,
>> yes, it would probably be a difficult case to bring).
>>
>> > And, we will also always need semantic evaluation of code (which, as I
>> > said earlier, has to be done by humans) because syntax-based metrics alone
>> > will probably always be game-able.
>>
>> Until strong AI arrives that will not happen.
>> Even the simpler issue of identifier semantics is still way beyond our
>> reach. See:
>> http://www.coding-guidelines.com/cbook/sent792.pdf
>> for more than you could ever want to know about identifier selection
>> issues.
>>
>> >
>> > Regards,
>> >
>> > -- steve
>> >
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Derek M Jones <derek_at_xxxxxx >
>> > Organization: Knowledge Software, Ltd
>> > Date: Tuesday, June 25, 2013 4:21 PM
>> > To: "systemsafety_at_xxxxxx
>> > <systemsafety_at_xxxxxx >
>> > Subject: Re: [SystemSafety] Qualifying SW as "proven in use"
>> > [Measuring Software]
>> >
>> > Steve,
>> >
>> > ...
>> >> "local vs. global" categories, it's just that nobody has yet published
>> >> any
>> >> data identifying which ones should be paid attention to and which ones
>> >> should be ignored.
>> >
>> > So you agree that there is no empirical evidence.
>> >
>> > Your statement is also true of almost every metrics paper published
>> > to date.
>> >
>> > With so many different metrics having been proposed at least one of
>> > them is likely to agree with the empirical data that is yet to be
>> > published.
>> >
>> > You cited the paper: “A Practical Guide to Object-Oriented Metrics”
>> > as the source of the cyclomatic complexity vs fault correlation
>> > claim. Fig 4 looks like it contains the data. No standard
>> > deviation is given for the values, but this would have to be
>> > very large to ruin what looks like a reasonable correlation.
>> >
>> > Such a correlation can often be found, however:
>> >
>> > o cyclomatic complexity is just one of many 'complexity'
>> > metrics that have a high correlation with quantity of code,
>> > so why not just measure lines of code?
>> >
>> > o once developers know they are being judged by some metric
>> > or other they can easily game the system by actions such as
>> > splitting/merging functions. If the metric has a causal connection
>> > to the quantity of interest, e.g., faults, then everybody is happy
>> > for developers to do what they will to reduce the metric,
>> > but if the connection is simply a correlation (based on code
>> > written by developers not trying to game the system) then
>> > developers doing whatever it takes to improve the metric value
>> > is at best wasted time.
>> >
>> >>
>> >> -----Original Message-----
>> >> From: Todd Carpenter <todd.carpenter_at_xxxxxx >
>> >> Date: Monday, June 24, 2013 7:20 PM
>> >> To: "systemsafety_at_xxxxxx
>> >> <systemsafety_at_xxxxxx >
>> >> Subject: Re: [SystemSafety] Qualifying SW as "proven in use"
>> >> [Measuring Software]
>> >>
>> >> ST> For example, the code quality measure "Cyclomatic Complexity"
>> >> (reference:
>> >> ST> Tom McCabe, "A Complexity Measure", IEEE Transactions on Software
>> >> ST> Engineering, December, 1976) was validated many years ago by simply
>> >>
>> >> DMJ> I am not aware of any study that validates this metric to a
>> >> reasonable
>> >> DMJ> standard. There are a few studies that have found a medium
>> >> DMJ> correlation in a small number of data points.
>> >>
>> >> Les Hatton had an interesting presentation in '08, "The role of empiricism
>> >> in improving the reliability of future software" that shows there is a
>> >> strong correlation between source-lines-of-code and cyclomatic complexity,
>> >> and that defects follow a power law distribution:
>> >>
>> >>
>> >>
>> >> http://www.leshatton.org/wp-content/uploads/2012/01/TAIC2008-29-08-2008.pdf
>> >>
>> >> Just another voice, which probably just adds evidence to the argument that
>> >> we haven't yet found a trivial metric to predict bugs...
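>> >>
>> >> (If anyone wants to check that power-law claim on their own defect data, a
>> >> crude test is a least-squares fit on the log-log frequency plot; a minimal
>> >> sketch, with invented defects-per-component counts:)
>> >>
>> >> import math
>> >> from collections import Counter
>> >> from statistics import linear_regression   # Python 3.10+
>> >>
>> >> # Invented defect counts, one entry per component.
>> >> defects = [1, 1, 1, 1, 2, 1, 3, 1, 2, 5, 1, 8, 2, 1, 13, 1, 2, 1, 4, 21]
>> >>
>> >> freq = Counter(defects)                     # number of components with k defects
>> >> ks = sorted(freq)
>> >> fit = linear_regression([math.log(k) for k in ks],
>> >>                         [math.log(freq[k]) for k in ks])
>> >> print("log-log slope (rough power-law exponent):", round(fit.slope, 2))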
>> >>
>> >> -TC
>> >>
>> >> On 6/24/2013 6:38 PM, Derek M Jones wrote:
>> >>> All,
>> >>>
>> >>>> Actually, getting the evidence isn't that tricky, it's just a lot of
>> >>>> work.
>> >>>
>> >>> This is true of most things (+ getting the money to do the work).
>> >>>
>> >>>> Essentially all one needs to do is to run a correlation analysis
>> >>>> (correlation coefficient) between the proposed quality measure on the one
>> >>>> hand, and defect tracking data on the other hand.
>> >>>
>> >>> There is plenty of dirty data out there that needs to be cleaned up
>> >>> before it can be used:
>> >>>
>> >>>
>> >>>
>> >>> http://shape-of-code.coding-guidelines.com/2013/06/02/data-cleaning-the-next-step-in-empirical-software-engineering/
>> >>>
>> >>>
>> >>>> For example, the code quality measure "Cyclomatic Complexity"
>> >>>> (reference:
>> >>>> Tom McCabe, "A Complexity Measure", IEEE Transactions on Software
>> >>>> Engineering, December, 1976) was validated many years ago by simply
>> >>>
>> >>> I am not aware of any study that validates this metric to a reasonable
>> >>> standard. There are a few studies that have found a medium
>> >>> correlation in a small number of data points.
>> >>>
>> >>> I have some data whose writeup is not yet available in a good enough
>> >>> draft form to post to my blog. I only plan to write about this
>> >>> metric because it is widely cited and is long overdue for relegation
>> >>> to the history of good ideas that did not stand the scrutiny of
>> >>> empirical evidence.
>> >>>
>> >>>> finding a strong positive correlation between the cyclomatic complexity
>> >>>> of functions and the number of defects that were logged against those same
>> >>>
>> >>> Correlation is not causation.
>> >>>
>> >>> Cyclomatic complexity correlates well with lines of code, which
>> >>> in turn correlates well with number of faults.
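>> >>>
>> >>> (The usual way to test whether CC adds anything over LOC is a partial
>> >>> correlation: CC against faults with LOC held fixed. A minimal sketch with
>> >>> invented per-function numbers:)
>> >>>
>> >>> from math import sqrt
>> >>> from statistics import correlation   # Python 3.10+
>> >>>
>> >>> # Invented per-function measurements: lines of code, cyclomatic complexity, faults.
>> >>> loc    = [10, 25, 40, 60, 80, 120, 150, 200]
>> >>> cc     = [ 2,  4,  5,  9, 11,  14,  18,  25]
>> >>> faults = [ 0,  1,  1,  2,  2,   4,   5,   7]
>> >>>
>> >>> r_cf, r_cl, r_lf = correlation(cc, faults), correlation(cc, loc), correlation(loc, faults)
>> >>> partial = (r_cf - r_cl * r_lf) / sqrt((1 - r_cl**2) * (1 - r_lf**2))
>> >>> print("CC vs faults controlling for LOC:", round(partial, 2))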
>> >>>
>> >>>> functions (i.e., code in that function needed to be changed in order to
>> >>>> repair that defect).
>> >>>
>> >>> Changing the function may increase the number of faults. Creating two
>> >>> functions where there was previously one will reduce an existing peak
>> >>> in the distribution of values, but will it result in fewer faults
>> >>> overall?
>> >>>
>> >>> All this stuff with looking for outlier metric values is pure hand
>> >>> waving. Where is the evidence that the reworked code is better, not
>> >>> worse?
>> >>>
>> >>>> According to one study of 18 production applications, code in functions
>> >>>> with cyclomatic complexity <=5 was about 45% of the total code base but
>> >>>> this code was responsible for only 12% of the defects logged against the
>> >>>> total code base. On the other hand, code in functions with cyclomatic
>> >>>> complexity of >=15 was only 11% of the code base but this same code was
>> >>>> responsible for 43% of the total defects. On a per-line-of-code basis,
>> >>>> functions with cyclomatic complexity >=15 have more than an order of
>> >>>> magnitude increase in defect density over functions measuring <=5.
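>> >>>>
>> >>>> (Taking those percentages at face value, the arithmetic behind the
>> >>>> order-of-magnitude claim is just:)
>> >>>>
>> >>>> low_cc  = 0.12 / 0.45   # defect share per unit of code, CC <= 5  (~0.27)
>> >>>> high_cc = 0.43 / 0.11   # defect share per unit of code, CC >= 15 (~3.9)
>> >>>> print(round(high_cc / low_cc, 1))   # ~14.7x higher defect density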
>> >>>>
>> >>>> What I find interesting, personally, is that complexity metrics for
>> >>>> object-oriented software have been around for about 20 years and yet
>> >>>> nobody (to my knowledge) has done any correlation analysis at all (or, at
>> >>>> a minimum, they have not published their results).
>> >>>>
>> >>>> The other thing to remember is that such measures consider only the
>> >>>> "syntax" (structure) of the code. I consider this to be *necessary* for
>> >>>> code quality, but far from *sufficient*. One also needs to consider the
>> >>>> "semantics" (meaning) of that same code. For example, to what extent is
>> >>>> the code based on reasonable abstractions? To what extent does the code
>> >>>> exhibit good encapsulation? What are the cohesion and coupling of the
>> >>>> code? Has the code used "design-to-invariants / design-for-change"? One
>> >>>> can have code that's perfectly structured in a syntactic sense and yet
>> >>>> it's garbage from the semantic perspective. Unfortunately, there isn't a
>> >>>> way (that I'm aware of, anyway) to do the necessary semantic analysis in
>> >>>> an automated fashion. Some other competent software professionals need to
>> >>>> look at the code and assess it from the semantic perspective.
>> >>>>
>> >>>> So while I applaud efforts like SQALE and others like it, one needs to be
>> >>>> careful that it's only a part of the whole story. More work--a lot
>> >>>> more--needs to be done before someone can reasonably say that some
>> >>>> particular code is "high quality".
>> >>>>
>> >>>>
>> >>>> Regards,
>> >>>>
>> >>>> -- steve
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: Peter Bishop <pgb_at_xxxxxx >
>> >>>> Date: Friday, June 21, 2013 6:04 AM
>> >>>> To: "systemsafety_at_xxxxxx
>> >>>> <systemsafety_at_xxxxxx >
>> >>>> Subject: Re: [SystemSafety] Qualifying SW as "proven
>> >>>> in use" [Measuring Software]
>> >>>>
>> >>>> I agree with Derek
>> >>>>
>> >>>> Code quality is only a means to an end.
>> >>>> We need evidence to show the means actually helps to achieve the
>> >>>> ends.
>> >>>>
>> >>>> Getting this evidence is pretty tricky, as parallel developments for
>> >>>> the
>> >>>> same project won't happen.
>> >>>> But you might be able to infer something on average over multiple
>> >>>> projects.
>> >>>>
>> >>>> Derek M Jones wrote:
>> >>>>> Thierry,
>> >>>>>
>> >>>>>> To answer your questions:
>> >>>>>> 1°) Yes, there is some objective evidence that there is a correlation
>> >>>>>> between a low SQALE index and quality code.
>> >>>>>
>> >>>>> How is the quality of code measured?
>> >>>>>
>> >>>>> Below you say that SQALE DEFINES what is "good quality" code.
>> >>>>> In this case it is to be expected that a strong correlation will exist
>> >>>>> between a low SQALE index and its own definition of quality.
>> >>>>>
>> >>>>>> For example ITRIS has conducted a study where the "good quality" code
>> >>>>>> is statistically linked to a lower SQALE index, for industrial
>> >>>>>> software actually used in operations.
>> >>>>>
>> >>>>> Again how is quality measured?
>> >>>>>
>> >>>>>> No, there is not enough evidence; we wish there would be more people
>> >>>>>> working on getting the evidence.
>> >>>>>
>> >>>>> Is there any evidence apart from SQALE correlating with its own
>> >>>>> measures?
>> >>>>>
>> >>>>> This is a general problem: lots of researchers create their own
>> >>>>> definition of quality and don't show a causal connection to external
>> >>>>> attributes such as faults or subsequent costs.
>> >>>>>
>> >>>>> Without running parallel development efforts that
>> >>>>> follow/don't follow the guidelines it is difficult to see how
>> >>>>> reliable data can be obtained.
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >> _______________________________________________
>> >> The System Safety Mailing List
>> >> systemsafety_at_xxxxxx
>> >
>>
>> --
>> Derek M. Jones tel: +44 (0) 1252 520 667
>> Knowledge Software Ltd blog:shape-of-code.coding-guidelines.com
>> Software analysis http://www.knosof.co.uk
>> _______________________________________________
>> The System Safety Mailing List
>> systemsafety_at_xxxxxx >>
>
>
>
> --
> *Matthew Squair*
> Mob: +61 488770655
> Email: MattSquair_at_xxxxxx >

-- 
*Matthew Squair*
Mob: +61 488770655
Email: MattSquair_at_xxxxxx



_______________________________________________
The System Safety Mailing List
systemsafety_at_xxxxxx
Received on Thu Jun 27 2013 - 05:43:10 CEST
