Notes from the Field

Fiona Dossin is a Senior Support Engineer at SourceIQ. She has spent many years as a release engineer helping companies and development teams automate and streamline their build and development processes. She continues her mission to optimize development teams and companies by working closely with customers to take full advantage of the SourceIQ product and expertise. To contact Fiona, please email fiona.dossin@sourceiq.com.

Code Coverage Up, Complexity Down

April 21, 2010

Pretty much every body of code has some set of unit tests to go with it. Usually, however, the unit tests don't even come close to exercising all the code properly. And at every meeting someone, whether the dev manager, the QA team or the release engineer (and sometimes a developer or two), complains that the code coverage needs to be increased. But, of course, no one has time to write unit tests. And who can blame them? With all the efforts to increase productivity, once you are done with one project, it is right on to the next. But by decreasing code complexity, you'll be able to increase code coverage more easily and get a few other side benefits out of the effort.

Code complexity is a measure of how many paths or outcomes a method has. And code coverage testing is supposed to exercise all those different paths. Suppose you just spent a good week developing a highly sophisticated method with a complexity of 50. Now you have to go and spend at least another week writing an equally complex test or set of tests to exercise the 50 paths through your method. Keeping track of all the different paths to test is a project in its own right. Instead, spend some time refactoring the method into, for example, 10 new methods, each with a complexity of 5. Now writing a unit test that has to exercise 5 outcomes does not seem so daunting.

As for those added side benefits: the smaller methods will be much easier to maintain and debug in the future because they are easier to read. And there is a high probability that some of the smaller methods could be reused, since some of them will inevitably be generic enough for other methods to use. An exercise meant to increase code coverage actually delivers three benefits. Now that is productivity!
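To make the refactoring idea concrete, here is a minimal Python sketch. The shipping rules, function names and numbers are all invented for illustration; the point is only that the small functions each have a handful of outcomes to test, while the composed function has no branching of its own.

```python
# Hypothetical example: the shipping rules and names below are invented.

def shipping_cost_monolithic(weight_kg, express, international):
    """One method with nested branches: every combination needs its own test."""
    if international:
        if express:
            cost = 40.0
        else:
            cost = 25.0
    else:
        if express:
            cost = 15.0
        else:
            cost = 5.0
    if weight_kg > 10:
        cost += 10.0
    return cost


def base_rate(express, international):
    """Small method: four outcomes, covered by four simple assertions."""
    if international:
        return 40.0 if express else 25.0
    return 15.0 if express else 5.0


def weight_surcharge(weight_kg):
    """Small method: two outcomes, and generic enough to be reused elsewhere."""
    return 10.0 if weight_kg > 10 else 0.0


def shipping_cost(weight_kg, express, international):
    """Composes the small methods and adds no branches of its own."""
    return base_rate(express, international) + weight_surcharge(weight_kg)
```

The two small functions can be tested independently with six assertions in total, instead of enumerating all eight combinations through the original method.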

Delta Metrics, Changing Analysis for the Better

February 5, 2009

SourceIQ's quantitative metrics are a powerful means of assessing and correcting critical issues in a code base. The latest version of SourceIQ introduces version-over-version changes in quantitative metrics such as Cyclomatic Complexity, Class Length, and Method Length. The advantage of seeing the deltas in these metrics is that files with volatile metrics can be separated from files with static metrics. Finding the files that significantly violate a metric standard is highly useful in isolating and correcting critical issues in the source code. However, the same analysis that pinpoints the outliers might be hiding files that are more disconcerting. These files may not have staggering metric violations, but their quantitative metrics are rapidly or constantly changing. If an outlier's metric is static over time, then it will probably not be a quality or maintenance issue in the future. In contrast, a file with a volatile quantitative metric either is, or has the potential to become, a major issue in testing or in production.

There are instances where a file or a method will violate the normal standard for a given quantitative metric, but the method is either so well written, or by its nature can only be constructed in this manner, that the violation is acceptable. As long as the file is not being changed often or causing major defects, refactoring may not be possible or prudent. Take the example of Cyclomatic Complexity. Some methods and classes, due to the nature of their functionality, are inherently complex. This usually includes code that contains discrete business logic and large conditional statements. These methods are inherently complex, but are rarely indications of inappropriate complexity. Also, while other designs are possible, they are often not justified given the cost to implement. As long as these methods are well architected, well written and tested, they usually will not cause problems. The complexity of these files tends to remain constant over time and their volatility low, indicating that they will most likely not cause defects and will need little maintenance. Identifying these methods is important, but, in general, once assessed and accepted, they deserve little attention in ongoing development.

However, these classes will still show up in any analysis based on the absolute quantitative metrics. Once the baseline assessment and corrections have been made, a more powerful and precise measurement is the change in these quantitative metrics over time. For example, a method with a cyclomatic complexity of 50 that adds a single case, increasing its complexity to 51, is less of a concern than another method whose complexity increases from 10 to 20. Absolute metrics do not provide a simple means of identifying and isolating these more significant changes, as they are often washed out by the noise of more complex, but less critical, methods. An analysis that focuses on the delta of the complexity metric can discover files with growing complexity before they become outliers or potential issues. This type of analysis gives information on the file in real time, rather than assessing the absolute complexity and file data after the code has been shipped to QA or production. Once code is shipped to production, addressing complex files becomes more expensive and more critical to the maintenance effort. Thus it is always better to rein in a method or piece of code before it can become a production showstopper.
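The contrast between absolute and delta rankings can be sketched in a few lines of Python. This is only an illustration of the idea, not how SourceIQ computes its delta metrics; the method names are hypothetical, and the numbers are the 50-to-51 and 10-to-20 cases from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class MethodComplexity:
    name: str        # hypothetical method name
    previous: int    # cyclomatic complexity in the previous version
    current: int     # cyclomatic complexity in the current version

    @property
    def delta(self):
        return self.current - self.previous

    @property
    def relative_growth(self):
        # Cyclomatic complexity is at least 1; the max() guards bad input.
        return self.delta / max(self.previous, 1)


methods = [
    MethodComplexity("parse_rules", 50, 51),
    MethodComplexity("apply_discount", 10, 20),
]

# An absolute ranking puts the large but static method first...
by_absolute = sorted(methods, key=lambda m: m.current, reverse=True)
# ...while a delta ranking surfaces the method that doubled in complexity.
by_growth = sorted(methods, key=lambda m: m.relative_growth, reverse=True)

for m in by_growth:
    print(f"{m.name}: {m.previous} -> {m.current} ({m.relative_growth:+.0%})")
```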

Both absolute metrics and delta metrics have a place in the development cycle. An analysis of an absolute quantitative metric can provide a baseline used to evaluate and establish code standards for the release cycle, while an analysis of the delta of the quantitative metrics allows one to evaluate the progress and quality of the code base during the release cycle. The absolute measurement of a particular revision of a file is invaluable, but the ability to quickly assess deltas in these metrics over time, or within a particular release, gives both development and QA teams a powerful and critical means of assessing change.

 

Budgets Anyone?  

September 23, 2008

On Sundays, especially if we are going for a long car ride, we always listen to a talk radio show that gives advice about money. It is general advice about investments, making smart money choices and getting out of debt. The piece of advice the guy gives to almost every caller is that you cannot manage your money until you know what you are spending it on. He tells the callers to carry a pen and notebook around with them for a week, two weeks or a month and write down everything they spend money on. Every cash or credit card transaction, and every bill the person pays, should be noted in the notebook. At the end of the period, sort the transactions into categories (groceries, utilities, transportation, Dunkin Donuts, entertainment, etc.). Once the transactions have been sorted, you really get a good sense of where your money is going. I used to do this when I first bought my condo and I had a mortgage and business school tuition to pay. You'd be surprised how much Dunkin Donuts adds up over a month! But once you know where your money is going, you can make better decisions on how to manage it. Obviously there are some things you just have to pay, like the electric bill, but do you really need to eat out every weekend? Or it might make sense to invest in a decent coffee maker rather than stopping at DoubleD every morning.

The same exercise can be applied to software management and resources. Management has to decide where to allocate resources and money. But how do they know that next year's maintenance budget needs to be increased by 15%? Or that it is time to invest in a third-party tool rather than trying to fix the same feature for the umpteenth time? If they knew where the money and resources were going right now, they would be better able to decide where they should go tomorrow. Well, that is not too hard to find out. There are several data sources in every company that can tell you where money and resources are going: the source code management system, the bug/feature tracking system, and timesheets and payroll. The trick is extracting the data from them, as they are usually not all interconnected.

Let's start with the bug/feature tracking system. Most systems will tell you the severity/priority of the item, whether it is a customer bug, a QA bug or a new feature, and who performed the work on it. Now you can check the person's timesheet to see how much time was invested in this item. Or if the timesheet is a little vague (for example, Joe Schmo spent 20 hours on project A, but fixed 2 bugs and implemented a new feature), then check the SCM to see how many files or lines of code were changed for the item. Then you can estimate time and cost per item.

Once you have completed the tedious task of estimating costs for each bug and feature (btw... SourceIQ can help you make this automatic), you can sort them into categories. I mentioned three general categories above: customer bug, QA bug and feature. A customer bug is any bug reported by the customer or found after the application went into production. It is generally a high priority to fix and speaks directly to the maintenance of the application. A QA (or development) bug is found either by the QA team or by developers before the application is released to the world. These types of bugs can be considered the cost of quality. And features are new development and thus an investment in the future of the application (and the company). As a general rule you want to be investing the most in new development. But as applications grow, so does the effort to maintain them. Thus maintenance budgets grow, and so does the importance of the decisions on how best to use them. You can further categorize the bugs by feature to see which features need more help than others. So now you know you need to increase the maintenance budget by 15%, and that features X, Y and Z are first on the list to be refactored.
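The estimation step above can be sketched as follows, assuming you have already pulled the changed-line counts from the SCM and the category of each item from the tracker. The item ids, line counts and hourly rate are invented for illustration; only the "20 hours, 2 bugs and a feature" situation comes from the text.

```python
from collections import defaultdict

HOURLY_RATE = 100.0  # assumed cost per developer hour


def allocate_hours(total_hours, lines_changed):
    """Split a vague timesheet entry across items in proportion to lines changed."""
    total_lines = sum(lines_changed.values())
    return {item: total_hours * n / total_lines for item, n in lines_changed.items()}


def cost_by_category(hours_by_item, category_by_item, hourly_rate):
    """Sum the estimated cost of each item into its tracker category."""
    totals = defaultdict(float)
    for item, hours in hours_by_item.items():
        totals[category_by_item[item]] += hours * hourly_rate
    return dict(totals)


# Joe spent 20 hours on project A: two bug fixes and one new feature.
lines_changed = {"BUG-1": 50, "BUG-2": 150, "FEAT-1": 300}      # hypothetical SCM data
categories = {"BUG-1": "customer bug", "BUG-2": "qa bug", "FEAT-1": "feature"}

hours = allocate_hours(20, lines_changed)
costs = cost_by_category(hours, categories, HOURLY_RATE)
print(costs)
```

Lines changed is a crude proxy for effort, so the result is an estimate to sort into budget categories, not an accounting figure.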

 

Size Does Matter.  

May 27, 2008

Size must matter because all our customers keep asking me to help them measure how big it is. "It" being their current code base. The size of the code can be a major factor in maintenance costs. If you were running a manufacturing plant, the biggest, most complex machines would most likely be the most expensive ones to maintain. The bigger the machine, the more moving parts there are to fail. And the complex machines will need someone with particular skills to fix them. The same is true for software. The more code, the more places for potential errors. And the more complex the code, the greater the need for developers with particular skills. As code grows, companies are seeing more of their software budgets being spent on maintenance rather than new development. Knowing the size of your code base and being able to monitor its growth will give you a better understanding of how to budget costs and developer activities going forward.

So how does one measure code size and growth? It is very easy to do with SourceIQ. First let's find out how big the code is in the latest release (code currently in production) or how big the code is right now on the release branch (code about to go into production). In the query panel, pick the most recent release label, which should pull in all files associated with the most recent release. Or pick the latest release branch to find all the files that have the potential to go into production. In both cases check "Most Recent Revisions Only" under the File Processing heading in the query panel. This will ensure that only current revisions are selected. The metric selections in both cases should be as follows:

Chart: Files, line graph, Linear, Plot data only
Period: All, All Years
View: Sum, all files within each query
Metric: LOC Total

The analysis will graph one point which is the total number of lines at this point in time (aka the total number of lines of code that went into production at the latest release date, or the total number of lines of code that is on the release branch right now).

Now that you know how big the code is, let's take a look at how it is growing. Code growth in production is really the most interesting. You could look at code size and growth for all code in your SCM, but that is only going to speak to how to maintain your SCM. To look at code growth in production, select the production branch (main) or all the release branches or integration branches in the query panel. Then the metric selections should be as follows:

Chart: Files, line graph, Linear, Plot data only
Period: All, Months
View: Sum, all files within each query
Metric: LOC Net Growth
   

This analysis should graph a line that is most likely rising over time. The graph will show you not only how big your code is, but how fast it is growing. If the line has a really steep angle, then you know lots of new code is being added quickly, and thus your development team is probably growing quickly too. If the line seems to have reached a plateau, then the code has probably matured and the development team will start to be reallocated to other projects.

But how do you know how many people will be needed to maintain a certain code size? SourceIQ can tell you that too. You can create an analysis in SourceIQ that will tell you historically how many developers have been working on the code. Use the same set of branches as in the analysis to see code growth. Then change the metric selections to the following:

Chart: Files, line graph, Linear, Plot data only
Period: All, Months
View: Sum, all files within each query
Metric: # Authors

This analysis will give you a line (set of points) that will tell you how many people worked on the code at each point in time. Compare this graph to the graph of code growth and you'll be able to see clearly how your development team grows and changes as your code grows. By combining the data points on both graphs you should be able to tell your manager that every time the code grows by X million lines, the development team needs to grow by Y persons. As developers are the major expense in producing software, knowing when you need new developers, and how many, is a huge step toward better planning and budgeting going forward.
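Combining the two graphs amounts to dividing one series by the other. The sketch below assumes the monthly totals have been exported by hand into two dictionaries; the months and numbers are invented, and a simple average ratio is only one of several reasonable ways to relate team size to code size.

```python
import math


def lines_per_developer(loc_by_month, authors_by_month):
    """Lines of code per active author for each month present in both series."""
    ratios = {}
    for month, loc in loc_by_month.items():
        authors = authors_by_month.get(month, 0)
        if authors > 0:
            ratios[month] = loc / authors
    return ratios


def developers_needed(target_loc, lines_per_dev):
    """Round up: you cannot hire a fraction of a developer."""
    return math.ceil(target_loc / lines_per_dev)


# Hypothetical exports of the "LOC Total" and "# Authors" analyses.
loc_by_month = {"2008-01": 1_000_000, "2008-02": 1_200_000, "2008-03": 1_500_000}
authors_by_month = {"2008-01": 10, "2008-02": 12, "2008-03": 15}

ratios = lines_per_developer(loc_by_month, authors_by_month)
average_ratio = sum(ratios.values()) / len(ratios)
print(developers_needed(2_000_000, average_ratio))
```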

 

Creating a stable relationship with your code. Presenting the Stability Metric.  

May 14, 2008

Recently, one of our customers was describing a metric he wanted to see from SourceIQ. The best way I can describe it is as a stability metric. The stability metric pinpoints features, modules and classes that have the potential to cause major problems in production and may require a substantial maintenance effort. Basically this customer wanted to know where he will be spending most of his maintenance budget. The idea is that highly stable code will present very few production issues or bugs and have a relatively low complexity, which makes it easier and cheaper to maintain. Conversely, highly unstable code will create the bulk of production issues and bug fixes as well as being highly complex. Thus many developer hours will be required to fix and maintain the code. How does one determine the stability metric of a feature/module/file? We have identified two factors of stable code: the number of bug fixes associated with the code and the complexity or maintainability of the code. Let's take a look at each factor and then combine them to create a stability metric.

Bug fixes are generally easy to track. Depending on your SCM and bug tracking tool, the code may be easily correlated to the bugs it fixes. For example, if you are using the IBM Rational suite of ClearCase UCM and ClearQuest, you should be able to easily find the bug severity and the files that were changed to fix it. For our stability metric we want to know, for each feature/module/class/piece of code, how many different bugs are associated with it. SourceIQ works in tandem with ClearCase and ClearQuest to find the number of bugs associated with a piece of code's changes. For each file or directory SourceIQ can find the number of activities (bug fixes and features) associated with the code's revisions, which gives a very accurate representation of which code is actually changing to fix bugs. But not all bugs are created equal. Some are showstoppers and some you can live with. In addition to the total number of bugs, factoring in priority or severity will give a more accurate picture of the impact of each piece of code. A weighted average, as in the example below, combines the number of bugs and their priority/severity rating into one number, a bug rating: multiply the count of bugs at each priority by that priority's weight, add the results, and divide by the total number of bugs. Even though Hello.java and World.java are associated with the same number of bugs, Hello.java is more of a concern, with the higher bug rating, because it has more high-priority bugs.

  Feature       Total Bugs   High Priority (5)   Medium Priority (3)   Low Priority (1)   Bug Rating
  Hello.java        20               8                    9                   3               3.5
  World.java        20               3                    9                   8               2.5
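The weighted average behind the bug rating column can be written out as a short function. This is a sketch of the arithmetic in the table, using the weights 5, 3 and 1 from the column headers; the function and dictionary names are my own.

```python
# Weights taken from the table headers: High (5), Medium (3), Low (1).
PRIORITY_WEIGHTS = {"high": 5, "medium": 3, "low": 1}


def bug_rating(bug_counts, weights=PRIORITY_WEIGHTS):
    """Weighted average priority of the bugs associated with a piece of code."""
    total = sum(bug_counts.values())
    if total == 0:
        return 0.0  # no bugs recorded yet
    weighted = sum(weights[priority] * count for priority, count in bug_counts.items())
    return weighted / total


hello = bug_rating({"high": 8, "medium": 9, "low": 3})   # (40 + 27 + 3) / 20
world = bug_rating({"high": 3, "medium": 9, "low": 8})   # (15 + 27 + 8) / 20
print(hello, world)
```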

The big issue with the bug rating is that it is historical. Bugs are opened on code that has already gone into testing or production. So bug ratings are established either during the testing phase of the product or once the code has been pushed into production. Unfortunately bugs seem to be an inevitable part of software production. But keeping track of how each feature/module/file's bug rating changes over time will help reveal whether code is becoming better or worse. If your development team is making a concerted effort to clean up a feature, that feature's bug rating should trend down. In contrast, a feature with a growing bug rating might also have growing complexity and need to be addressed.

The second part of the stability metric is maintainability. There are a few ways of looking at the maintainability of code. Often, cyclomatic complexity is a good indicator. The more complex the code is, the more difficult it is to read, debug and amend. Also, for Java code there are several standard coding rules that speak to maintainability: method length, long/short variable names and confusing ternary statements, to name a few. Whichever way you define the maintainability factor, just be consistent going forward. In this example I am going to use cyclomatic complexity.

  Feature       Complexity
  Hello.java        55
  World.java        12

Complexity can be monitored and addressed before code goes to testing and production. SourceIQ can easily run complexity metrics on code in the SCM as well as code on a developer's workstation. Complex code can be identified and either refactored or validated before it is checked into the SCM. This half of the stability metric can be directly controlled and monitored by the development team before the code hits production. Now, just because a piece of code is complex does not mean it is unstable. There may be a very good reason for complex code to exist. And that complex code may have been written by a very proficient development team. Hence the two parts of the stability metric. The bug rating catches simple code that has become problematic and keeps intentionally complex code from being automatically suspect. And the complexity metric can be used as a rough predictor for the bug rating.

Once the two factors of the stability metric have been established, they can be combined into one number to evaluate all parts of the code. The table below gives a general idea of how the two factors relate to each other in terms of stability.

                       Low Complexity                High Complexity
Low Bug rating             Stable             Potential to become error prone
High Bug rating      Potential to become complex       Unstable

If your bug severity scale goes from 1 to 5, then a bug rating over 3 would be considered high and one under 3 low. Complexity is a little different, seeing as the complexity of a file can technically go from 1 to infinity. It is best to pick a complexity standard and use that as the midpoint between high and low complexity. For example, if your development standard is to keep complexity under 15, then all complexity values below 15 are low and those above 15 are high.

To get one number for the stability metric, multiply the complexity (or maintainability) number by the bug rating. Note that with this formula a higher number indicates less stable code. As the complexity numbers and bug ratings can have varying ranges from product to product, there is no one stability number that you should be striving for, nor a good industry standard. You'll need to figure out what stability number works best for your development team and product. As you start using the stability metric, use it to compare the different features/modules/files. Pick out the features/modules/files with the highest stability metric and take a look at them first. Evaluate what needs to be done to reduce their bug rating and complexity. As you investigate each piece of code, you may notice that for some the stability metric cannot be changed much. This could be an indicator of a comfortable stability metric that each set of code should not exceed. As your development team addresses code with high stability numbers, you should see the overall stability metrics trend down and level out around a certain number. This would also be an indicator of a comfortable stability metric.

In the example below, the development team should first address the two features Hello and Moon because of their high stability ratings. After working with the stability metric for some time, this development team may come to the conclusion that a stability rating of 25 or less is acceptable.

  Feature      Avg Complexity   # of Bugs   High Priority   Medium Priority   Low Priority   Bug Rating   Stability Rating
  Hello              25             20            8                9               3            3.5             87.5
  World               7             20            3                9               8            2.5             17.5
  GoodNight           7             20            8                9               3            3.5             24.5
  Moon               25             20            3                9               8            2.5             62.5
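The stability rating column and the quadrant table can be reproduced with a short sketch. The thresholds of 15 for complexity, 3 for bug rating and 25 for an acceptable stability rating are the example values from the text; treating a value exactly on a threshold as "low" is my assumption, since the text does not say.

```python
def stability_rating(avg_complexity, bug_rating):
    """Stability rating as described: complexity multiplied by bug rating."""
    return avg_complexity * bug_rating


def quadrant(avg_complexity, bug_rating, complexity_limit=15, bug_rating_limit=3):
    """Classify a feature using the 2x2 table; values on the limit count as low."""
    high_complexity = avg_complexity > complexity_limit
    high_bug_rating = bug_rating > bug_rating_limit
    if high_complexity and high_bug_rating:
        return "Unstable"
    if high_complexity:
        return "Potential to become error prone"
    if high_bug_rating:
        return "Potential to become complex"
    return "Stable"


# (average complexity, bug rating) per feature, from the table above.
features = {
    "Hello": (25, 3.5),
    "World": (7, 2.5),
    "GoodNight": (7, 3.5),
    "Moon": (25, 2.5),
}

ratings = {name: stability_rating(c, b) for name, (c, b) in features.items()}
worst_first = sorted(ratings, key=ratings.get, reverse=True)
needs_attention = [name for name in worst_first if ratings[name] > 25]

for name in worst_first:
    print(name, ratings[name], quadrant(*features[name]))
```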
  

The stability metric helps identify areas of the code that could be costly to maintain. These areas may need more testing, may need to be refactored, or may just need to be scrapped altogether. It also helps showcase areas of the code that are working well and could potentially be reused for other products. It simply provides a little more insight into the health of your code.

The Metrics are Coming! mySourceIQ Save Me!

May 4, 2008

So management has decided to sponsor an initiative to monitor code quality. They are using SourceIQ to audit the code in the SCM tool to see which files violate the established coding standards. Next they are going to hunt down those who have been working on those files. And you'll end up spending a weekend shortening variable names, reworking if statements and filling in all those empty catch blocks. Well, what can you do to save your weekend? How can you be sure that the code you check in today won't show up as a red flag on a manager's dashboard tomorrow? The answer is mySourceIQ! The mySourceIQ tool allows you to upload the code in your workspace directly to the SourceIQ server and run all the same code quality metrics that your manager will look at. The best part is that you can run mySourceIQ and see the metrics BEFORE you check in any code. So if a few red flags do pop up, you can address them before checking in the code and your manager will never know.

To use mySourceIQ, log into the SourceIQ frontend. Go to Tools->MySourceIQ. The wizard will guide you through creating the mySourceIQ conduit that is used to upload the files in your workspace, and through picking the metrics to run on the code. The most useful metrics to run on a mySourceIQ conduit are Indexer (to allow for searching), LabelProcessing, LineCountMetrics, sLOC, PMDQuality (Java), PMDMetrics (Java), CheckStyles (Java), FXCop (.NET). Once the conduit is created, the upload and processing begins. When that is done, which can take a few minutes depending on the amount of code, a pop-up box appears asking you to log in again. Then you can create analyses and use the mySourceIQ conduit like you would any other conduit.

Later on, if you make changes to the same workspace, you can update the mySourceIQ conduit to gather and assess the changes. Go into Tools->Portfolio Summary, highlight the mySourceIQ conduit, right-click and select "Update MySourceIQ conduit...". After a few updates you'll be able to see a trend of your changes over time. By using mySourceIQ, you'll always know that your code is in good shape. Your manager will be thrilled that your code never sets off any rule violation alarms. And your weekend will be yours to enjoy.

 

Hey SourceIQ, I deleted that file a year ago! How come it is still showing up in my analysis as actively changing?

April 29, 2008

Turns out that the data showing in SourceIQ is correct; only the file name is wrong. If you take a closer look at that deleted file in ClearCase, you'll see that the file was renamed or moved. In ClearCase a renamed file is considered one file element with two (or more) names. ClearCase stores one object id and one version tree for a file element under the different names. A user can rename a file as many times as necessary and still access the version tree of the file element by any of the file's names, past and present. Thus if I rename the file old.txt to new.txt, I can see the file's version tree by running cleartool lsvtree against either name. Also, ClearCase will return every version under the file name that is accessible by the current view. Thus if new.txt is visible in my view, all versions of new.txt will be new.txt@@main\... despite the fact that old versions had the file name old.txt. Also, a cleartool find command will return all versions of the file under only one name, generally the file name that is visible in the view, or the file name that chronologically appears first. The cleartool find command will never return two names for one file element.

How does this impact SourceIQ? SourceIQ gathers information about existing file elements by running a cleartool find command. Usually the ClearCase view is configured using a default "\main\LATEST" config spec. Thus the find command will return the first file name on the main branch, or on the first revision on a branch off of the main branch. Usually this is the original name of the file. As ClearCase will return all versions of the file element under the original name, SourceIQ will not discover the newer name of the file. During analysis it is therefore possible to see current data under the old name of a file that has since been renamed. The data is current and correct; it is only the file name that is inconsistent with the true state of the VOB.

 

LOC, cLOC, sLOC, eLOC... What's the difference?

April 29, 2008

LOC, cLOC, sLOC, eLOC are all Line of Code metrics, but they each count different sets of lines of code. Let's start with the general LOC metric. The LOC metric counts all lines in all text files except blank lines. Whether the line is a comment, source or just a curly brace, the LOC metric will count it. The cLOC metric stands for comment lines of code, and counts only comments. Any line that begins with the programming language's comment marker will be counted. For example, in Java any line beginning with "//" or lying between lines with "/*" and "*/" will be counted as a comment line of code. In contrast to cLOC is sLOC, which stands for source lines of code. The sLOC metric counts any line that does not begin with a comment. What about lines that contain both source and comments? Well, whatever appears first on the line determines how the line is counted. So if the comment comes first, then the whole line is considered to be a comment. And if the source comes first, then the whole line is considered source and will be included in sLOC. Finally there is eLOC, which stands for effective or executable lines of code. eLOC is a subset of sLOC and only contains lines of code that will be executed. Lines of code that contain just a curly brace or a semicolon are not considered executable lines of code. Let's take a code sample and see how LOC, cLOC, sLOC and eLOC measure up.

                                          LOC    cLOC    sLOC    eLOC
  //lines of code metric sample            1      1
  if (x+y == z)                            1              1       1
  {                                        1              1
     x = x + 1;                            1              1       1
     return x;                             1              1       1
  } else {                                 1              1       1
     y = y + 1;                            1              1       1
     return y;                             1              1       1
  }                                        1              1

  Totals                                   9      1       8       6

As you can see, each line of code metric gives a different number. Generally cLOC + sLOC = LOC, because comments plus source equal all lines of code. Also eLOC will always be a subset of sLOC. The next step is figuring out which line of code metric is right for you. Each metric will give you valuable information about the size, growth and volatility of your codebase, but some metrics might be more appropriate than others. For example, if your development process relies heavily on Javadoc, or the maintenance of the codebase is moving to another team or site, then changes to comments are of the utmost importance. You'll need to keep track of the comments to ensure that the documentation is comprehensive or that the new team will be able to understand and maintain the codebase. But if you couldn't care less about comments, then just looking at the sLOC or eLOC metrics is probably more appropriate. Looking at only sLOC or eLOC gets right to the heart of the lines of code that impact the final production product. But how do you choose between sLOC and eLOC? Well, every developer codes differently. Some developers like to keep their code compact and others like to spread it out. Choosing eLOC will remove some of these coding preferences and syntax differences and allow you to concentrate on just those lines of code that will really have an impact. Still not sure which line of code metric is right for you? Then stick with the general LOC; that way you've covered every line. Finally, once you've picked a line of code metric, be consistent and use that same metric going forward.
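The counting rules above are simple enough to sketch in Python. This is a simplified illustration of the rules as described in this post, not SourceIQ's actual counter: it only understands "//" and "/* ... */" comments at the start of a line, and it ignores block comments that open in the middle of a source line.

```python
def count_lines(source):
    """Count LOC, cLOC, sLOC and eLOC for Java-style source text."""
    counts = {"LOC": 0, "cLOC": 0, "sLOC": 0, "eLOC": 0}
    in_block_comment = False
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are not counted at all
        counts["LOC"] += 1
        if in_block_comment:
            counts["cLOC"] += 1
            if "*/" in line:
                in_block_comment = False
            continue
        if line.startswith("//"):
            counts["cLOC"] += 1
            continue
        if line.startswith("/*"):
            counts["cLOC"] += 1
            if "*/" not in line[2:]:
                in_block_comment = True
            continue
        # Whatever comes first decides: the line starts with source, so it is sLOC.
        counts["sLOC"] += 1
        # Lines made up only of braces and semicolons are not executable.
        if line.strip("{}; \t"):
            counts["eLOC"] += 1
    return counts


SAMPLE = """\
//lines of code metric sample
if (x+y == z)
{
   x = x + 1;
   return x;
} else {
   y = y + 1;
   return y;
}
"""

print(count_lines(SAMPLE))
```

Run on the sample, the sketch gives LOC 9, cLOC 1, sLOC 8 and eLOC 6, matching the totals in the table.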


 
Last modified April 21, 2010