CSMN-646-1121: Software Development and CASE Tools

Establishing a Metrics Program for an Integrated Software
Development and Support Service Organization to Aid
Software Development and User Support Process Improvement


David R. Mapes

Date: 07/13/1995


This paper examines the needs of the US EPA's SDC for the creation of a process improvement metrics program. Metrics specific to software/document development and system support are identified, and a basic set of each are proposed to aid in the baselining and tracking of process improvement efforts. Methods for implementing a metrics program are identified along with some of the potential pitfalls and hazards. The current status of the SDC's proposed metrics guideline is reviewed. Metrics are identified by reviewing the goals of the SDC as they apply to process improvement: better cost/schedule estimation, improved product/service quality, and ongoing process improvement as defined by the SEI - CMM repeatable and defined maturity level key process areas. The needs of a start-up metrics program for limited data collection scope, good analysis, and strong support are kept in mind. A total of ten basic measures are identified: cost, schedule, size, requirements tracing, customer acceptance, and user found errors for software and documents; and cost, number of calls, number of problems resolved, time to resolve problems, and number of calls to resolve problems for user support. Monthly data collection and analysis as part of current schedule and cost reporting is advocated. The relationship between the proposed metrics and various SEI - CMM level two and three KPAs is described. Finally, it is recommended that the collection and analysis of the ten metrics be incorporated in the SDC's SEE manual as a required policy, instead of an optional guideline, to ensure valid data for driving overall SDC policy.


This paper will cover the basic requirements and methods for instituting a metrics program in an integrated software development and support organization. The focus of this paper will be on the basics for getting the program up and running; it will leave the details of expansion and embellishment of the program for later consideration. The organization in question is the United States Environmental Protection Agency's (EPA) Software Development Center (SDC). The SDC is working toward an improved methodology for providing software and document products and support services to the EPA. In support of this goal the SDC is preparing for a Software Engineering Institute (SEI) Software Process Assessment (SPA). As part of this preparation, the SDC is investigating the institution of a software/document development/support metrics program to allow for baselining and feedback as various process improvement and SEI Capability Maturity Model (CMM) Key Process Area (KPA) measures are instituted.

The SDC provides software and document development, and system maintenance and support services to the EPA. This somewhat complicates the task of instituting a metrics program as we have to deal with more than just software development type metrics. Simply put, the types of data on quality and productivity that would be collected for a development effort differ sharply from those that are required to track the same facets for a help desk/hot line or database administration project.

While it is clear from an overview of the types of projects in progress at the SDC that the SEI's CMM is a little too narrowly scoped to account for all of the SDC's business directly (Jones, 1995, p. 44), its concepts can be generalized from their specialized concentration on software development, to cover at least some issues for support efforts. In other areas, notably work products and schedule, support delivery orders will have to go their own way.

The reason for the creation of a metrics program within the SDC is to support the SDC in attaining the following goals: SEI - CMM assessments of level two and beyond, and monitoring and improving development, documentation, and support processes and products with regard to accuracy of cost and schedule estimates, quality, and customer satisfaction. With regard to the SDC's CMM goals, the key process areas (KPA) for a level two (repeatable) organization are: requirements management, software project planning, software subcontract management, software quality assurance, and software configuration management (Kan, 1995, p. 40). And the KPAs for a level three (defined) organization are: organizational process improvement, organizational process definition, training program, integrated software management, software product engineering, intergroup coordination, and peer reviews (Kan, 1995, p. 41). Where these items are covered by the measurement program, it serves as a monitor on process improvement within the CMM. Where the metrics program requires a KPA, the program serves as a beneficiary of process improvement.

Lord Kelvin best expressed the need for measurement in the 19th century when he said: "'...when you can measure what you are speaking about, and express it in numbers, you know something about it; when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the stage of science ...'" (Moller and Paulish, 1993, p. 4). With the above goals in mind and a strong desire to progress toward them in a scientific and empirically valid manner, the need for a metrics program within the SDC becomes clear.

Starting a Metrics Program: Organizational Considerations

The institution of a metrics program requires a number of basic prerequisites within an organization. Support for the metrics program must be gained from top management (Harding, 1995, p. 6). The collection, storage, and analysis of metrics data will require a substantial resource commitment in the form of staff time and hardware/software support. Management must place their sponsorship (LaMarsh, 1995, p. 77) behind it because money must be spent to support the metrics program. Indeed the cost of a metrics program can be as much as 5 percent (Jones, 1991, p. 35) of a development effort's cost. Also, there is likely to be some cultural resistance to instituting a measurement program on the part of the very people who must collect the baseline data (LaMarsh, 1995, p. 104).

When implementing any new policy in a work environment it is a good idea to relate it to the overall business goals of the organization. This is true whether the business is the manufacture of wing nuts or the development of computer software (LaMarsh, 1995, pp. 15-18). This will help obtain both management and staff buy-in to the program. This is doubly so if the new policy requires the people on the "shop floor" to take more responsibility for monitoring their own work, as is the case with any metrics program (Moller and Paulish, 1993, p. 49). Given a metrics policy required to support software process improvement (SPI), Moller and Paulish (1993, p. 47) suggest the following implementation approach:

     1.  Software Development Process.  Establish and
     document the existing software development
     process.  This will be the baseline process which
     will be measured and incrementally improved.

     2.  Goals.  Identify the target improvement goals
     which are derived from and supportive of the
     strategic  business objectives.

     3.  Responsibility.  Identify management
     responsibility for the Metrics Program, and
     provide the organizational and cultural support
     needed.

     4.  Initial Research.  Validate the goals and
     customer expectations through internal customer
     survey and/or assessment.

     5.  Metrics Definition.  Define the initial basic
     set of metrics for measuring goal achievement.

     6.  Sell.  Introduce and communicate the Metrics
     Program to the organization such that high
     visibility and cooperation is achieved.

     7.  Feedback & Process Improvement.  Identify the
     metrics reporting and feedback mechanisms such
     that software development process improvement
     actions can be determined and implemented.

To start with, a metrics program requires a stable, documented, and practiced business process to measure (Moller and Paulish, 1993, p. 5). Once this level of control is established, the metrics program can proceed with the establishment of a baseline data collection and analysis effort. In the case of the SDC this takes the form of the policies, procedures, and guidelines contained in the SDC Software Engineering Environment manual (SEE). This document combined with the above stated goals provides an overall framework to direct the metrics effort. It also at least partially fulfills the SEI - CMM level three KPA for an organizational process definition.

Finally there needs to be support in the form of staff and other resources to design and administer the metrics program. How much staff time (the principal cost of the program) is required? Kan (1995, p. 335) recommends that any organization "with over 100 members should have at least one full-time metrics person." With more than 200 employees the SDC could do well to change that "at least one" to more than one. Within the SDC these functions should be (and are) placed under the Development and Maintenance Methodology Group (DMMG) with input and assistance from Independent Product Assurance (IPA).

Generating a Baseline: Deciding What Data to Collect and How

An initial baseline metrics effort has already been accomplished within the SDC. It began in December 1994 with the selection of representative projects to be monitored for three months. The types of projects selected covered the spectrum of efforts within the SDC: development/maintenance, documentation, and user support.

While this pilot program has served as an adequate basis for designing an initial guideline, it is not sufficient to use as a basis for future estimating and planning activities outside of the selected projects. What is needed is a broader and somewhat less detailed view of SDC schedule, cost, and quality performance. In addition, the following characteristics of good metrics should be kept in mind (Down, Coleman and Absolon, 1994, pp. 27-28):

     -  Usefulness:
        The metric should provide quantified feedback
        that can be used as a basis for comparison and/or
        a trigger for corrective or enhancement action.

     -  Easily collectable/measurable:
        Collecting metric data should not interfere with
        the business of meeting customer requirements and
        should allow for a minimum possibility of errors
        in collection.

     -  Consistent and defensible:
        Metric collection methods should be applied
        consistently and the metrics collected should be
        readily identifiable and agreeable as useful in
        measuring the desired characteristic for purposes
        of comparison and as a basis for action.

Leaving aside quality measures for the moment, cost and schedule information is already reported on a monthly basis, in the form of presentations and reports to SDC management, and technical and financial reports to EPA Delivery Order Project Officers (DOPO). Baseline schedule and cost information is available for each project in annual project plans. What this means is that an SDC-wide, historical picture of cost and schedule estimating performance can be had by simply gathering and analyzing existing, documented data. This information, coupled with sizing data that can be gleaned from the work products of these projects on file, could provide an overall, and by project, picture of productivity at the SDC from inception up to the present. This is by no means a small undertaking, but may be more productive for building a useful, real-world picture of the SDC for use in gauging the impact of process improvement/SEI-CMM efforts on these facets of SDC performance than relying strictly on the collection of current reporting and planning data.

In the future these data will need to be collected in some automated fashion, but, for the present, this will be a manual process. Once, or as, these data are collected, they should be stored in an electronic format that will allow them to be classified and analyzed in a meaningful and statistically valid way.

Basic cost and schedule data, actual versus planned, provide little information of use beyond historical facts about how well we have planned. This information needs to be set in perspective against some measure of the size of work products produced (lines of code (LOC) or function points (FP) for software, number of pages for documents) in order to serve as a basis for future estimation and as a baseline measure of productivity. Methods of gauging complexity are controversial at best. Shepperd and Ince (1993, pp. 51-52) make three key points about these methods: there is little agreement or empirical evidence for exactly what these metrics really mean; the models used to generate these metrics are not based upon any great formal rigor, and are thus suspect as to their validity; and many of these complexity metric models are assumed to be general in nature when this may not be the case.
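To make the intended calculation concrete, a minimal sketch follows; the project names, costs, and size figures are invented for illustration, and in practice the inputs would come from the monthly reports and annual project plans described above:

```python
# Hypothetical sketch: cost variance and size-normalized productivity.
# All project names, costs, and sizes below are invented.

def variance_pct(planned, actual):
    """Percent by which actual exceeded (+) or undershot (-) the plan."""
    return 100.0 * (actual - planned) / planned

projects = [
    # (name, planned cost, actual cost, size in LOC or pages)
    ("Project A", 100000, 120000, 12000),
    ("Project B", 250000, 240000, 30000),
]

for name, planned, actual, size in projects:
    cv = variance_pct(planned, actual)
    productivity = size / actual  # size units delivered per dollar spent
    print(f"{name}: cost variance {cv:+.1f}%, {productivity:.3f} units/$")
```

Run over the historical project files, the same two numbers per project would form the baseline against which later process improvements are judged.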

For the purposes of the initial metrics program, the other facet of work product size, complexity/difficulty, should be ignored. Because of the difficulty in coming to an agreeable definition (Conte, Dunsmore, and Shen, 1986, pp. 80-87) that can be easily applied across various software and documentation projects, this metric should be assumed to "average out" of productivity baselines. Also, for the purposes of the initial program, sizing data should only be gathered for initial releases due to the inherent difficulty (impossibility for most past projects) of gathering meaningful data on changed/updated LOC/FP or pages in addition to wholly new ones. To help support future analysis and planning, two additional pieces of data should be captured for each deliverable: the development methodology used (software development life cycle (SDLC)), to support evaluation and selection for future projects (Raffo and Kellner, 1995, p. 2); and the language used for development, recorded so that, when function points are not calculated, some normalization among projects remains possible and apples may be compared with apples and oranges with oranges rather than kumquats.

The preceding paragraphs have dealt with the kind of data gathering required for software development and document production; the framework for user support, however, is quite different. In a support environment there are no natural deliverables to use as key milestones to gauge schedule, cost, and productivity. Instead, there are measures like number of calls handled, number of problems resolved, time between initial call and problem resolution, number of customer call backs for the same problem, and level of effort/cost. These can give a picture of productivity and quality/customer satisfaction in the support environment. Again, collection and analysis of this data should be automated as much and as soon as possible.
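A minimal sketch of how these support measures might be computed from a call log follows; the record layout and the sample calls are invented for illustration:

```python
# Hypothetical sketch of support metrics computed from a call log.
# Each entry is (problem_id, call_opened, call_resolved_or_None);
# the field layout and sample data are invented.
from collections import defaultdict
from datetime import datetime, timedelta

calls = [
    ("P1", datetime(1995, 6, 1, 9, 0), datetime(1995, 6, 1, 10, 30)),
    ("P1", datetime(1995, 6, 2, 9, 0), datetime(1995, 6, 2, 9, 15)),
    ("P2", datetime(1995, 6, 3, 14, 0), None),  # still open
]

calls_handled = len(calls)

by_problem = defaultdict(list)
for pid, opened, closed in calls:
    by_problem[pid].append((opened, closed))

# a problem counts as resolved only when none of its calls remain open
resolved = {pid for pid, recs in by_problem.items()
            if all(closed is not None for _, closed in recs)}

# calls per problem (call backs for the same problem inflate this)
calls_per_problem = {pid: len(recs) for pid, recs in by_problem.items()}

# mean time from first call to final resolution, resolved problems only
durations = [max(c for _, c in by_problem[pid]) -
             min(o for o, _ in by_problem[pid]) for pid in resolved]
mean_ttr = sum(durations, timedelta()) / len(durations)
```

Even this toy log yields all four non-cost support measures; an automated help desk tool would supply the same records without manual transcription.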

Gathering these types of metrics allows the metrics program to serve as an important part of the software project planning and oversight KPA, both as a beneficiary of planning and an enforcer of oversight.

Quality Metrics: Definitions and Targets

There are two definitions of quality that can be freely applied to SDC products and services. Kan (1995, pp. 2-3) lists these as the "popular" and "professional" views of quality. The popular view could best be characterized by the statements "I know it when I see it" (Kan, 1995, p. 2) and the more features the better. This is essentially a user's view of quality as a subjective judgement about a product. It does not lend itself to measurement within the metrics framework. At best it is a loose indicator of customer satisfaction. The professional view as defined by Kan, and Down, Coleman, and Absolon (1994, pp. 28-32) is more amenable to measurement.

Kan emphasizes conformance to requirements and fitness for use as key aspects of professional quality. Conformance with requirements implies the ability to trace user requirements through each stage of the development process to the final product. Conformance as traceability then is as much a verification (Lewis, 1992, p. 7) of the correctness of the development process as an indicator of product quality. Fitness for use is a more complex attribute because it possesses a user centered flavor. Fitness for use includes conformance to user requirements, but adds the dimension of user acceptance. From the measurement standpoint then fitness for use validates the rightness of the product (Lewis, 1992, p. 8), and is a more valid starting point for product quality measures.

Down, Coleman and Absolon present a table of quality "attributes", "metric areas", and "beneficiaries" that displays IBM's CUPRIMD quality acronym definition (1994, pp. 30-31):
 Measurement areas for quality attributes
 Attribute       Metric Areas                      Beneficiary
 Capability      Functionality delivered           Development/User
                 versus requirements
                 Volume of function delivered      User
 Usability       Ease of learning important tasks  User
                 Ease of completing a task         User
                 Intuitiveness                     User

 Performance     Transaction throughput            User
                 Response time to enquiry          User
                 Size of machine needed to         User
                 run the product

 Reliability     Mean time between failures        User
                 Number of defects                 Development
                 Severity of defects               User
                 Severity/impact of failures       User

 Installability  Ease of making product available  User/System programmer
                 for use
                 Time to complete installation     User/System programmer
                 Skill level needed by installer   User/System programmer

 Maintainability Ease of problem diagnosis         Developer
                 Ease of fixing problem correctly  Developer

 Documentation   Ease of understanding             User
                 Ease of finding relevant          User
                 information
                 Completeness of information       User

Down, Coleman and Absolon point out that each of the "metric areas" really needs more refinement before it can be used to gather data, but a quick look at the "beneficiary" column readily indicates the user-centered nature of the quality question.

While these definitions are useful for creating an extended set of quality metrics, the full list of possible quality elements to collect is outside of the scope of a start-up metrics program. At this stage in the process there are three areas that will have the biggest payoff in terms of gauging quality and process improvement: requirements tracing, customer acceptance of deliverables, and user-discovered error rate. Instituting a measure of traceability between requirements and end products will verify that (at least within the limits of the requirements) the product is complete and correct to some known degree. Customer acceptance of deliverables with no or minimum changes can validate that the product is right. Tracking user-discovered error rates by type will allow for a determination of quality for software when analyzed in relation to product size:

  (1)  hardware;
  (2)  software failures (abends, crashes, hangs);
  (3)  design/specification errors (failed requirements); and
  (4)  analysis errors (missed requirements).  (Blazy, "n.d.", p. 7)

Note also that items 3 and 4 from Blazy's list can as easily be applied to documents as software. The quality metric of requirements traceability benefits and enforces the CMM level two KPA of requirements management as requirements that are not managed are difficult to trace, and requirements tracing makes their management more understandable. That the products (be they code, data or documents) are accepted by the DOPO serves as a validation of the software quality assurance KPA (SDC-IPA).
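As an illustration of the user-discovered error rate measure, the following sketch tallies reported errors by Blazy's four types and normalizes by product size; the sample error reports and the size figure are invented:

```python
# Hypothetical sketch: user-discovered error rate by type, normalized
# by product size. The reported errors and KLOC figure are invented.
from collections import Counter

ERROR_TYPES = ("hardware", "software failure",
               "design/specification", "analysis")

reported_errors = ["software failure", "analysis",
                   "software failure", "design/specification"]
product_size_kloc = 25.0  # thousands of lines of code (or pages/FP)

counts = Counter(reported_errors)
for etype in ERROR_TYPES:
    rate = counts[etype] / product_size_kloc  # defects per KLOC
    print(f"{etype}: {counts[etype]} ({rate:.2f}/KLOC)")
```

For documents, the same tally applies to types 3 and 4 with pages substituted for KLOC as the size denominator.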

Caveats: Potential Problems and Solutions

While measurement is an essential part of installing an engineering paradigm in software, documentation and support processes, there is no guarantee that a metrics program will succeed. Indeed, Rubin (1995, p. 22) found that "... for the past 20 years, it has been possible to find evidence for almost 600 organizations that have attempted to implement measurement programs. Unfortunately it has only been possible to see about 80 documented successes." Rubin's criteria for success were that "the measurement program results are actively used in IS organizational decision-making," that "the results are communicated to areas of the organization outside of IS and accepted," and that "the program lasts longer than 2 years."

Possible problems with a metrics collection and analysis effort according to Moller and Paulish (1993, pp. 48-53) include:

   -  Lack of Acceptance
   -  Personnel Appraisal Fear
   -  Quick Fixes - Unrealistic Expectations
   -  Loss of Momentum
   -  Tools Availability
   -  Management Support
   -  Poor Goals Follow-Up
   -  Lack of Team Players

Other authors have identified additional problems. Rubin notes (1995, p. 22) that many metrics program failures are due to overreliance on a single measure, citing by way of example what he terms "Fatal Function Point Attraction" (also a plug for an article of his that appeared in the October 1989 issue of Systems Development). Moller and Paulish (1993, pp. 40-41) also stress the need to keep initial collection efforts narrowly focused and to allocate more resources to analyzing and applying metrics data rather than getting carried away counting everything in sight. And Jones (1991, pp. 198-200) lists a number of "...Problems Beyond the Current State of the Art".

Lack of acceptance may be due to a number of causes, among them the fear that "metrics will create additional work" (Moller and Paulish, 1993, p. 48). The resource commitment required must be planned for, and ways must be found to streamline the collection and analysis process. This is of particular concern within the SDC, as Technical Project Leaders (TPL) have been suggested as the major asset for metrics collection without considering their current work load. Automating the collection of cost and schedule metrics as a part of the monthly reporting process would greatly ease this tension. Moller and Paulish (pp. 48-49) list other reasons for lack of acceptance -- fear that "metrics may restrict the creative process", "benefits are not clear", "fear of being measured", and "difficulty in admitting that improvement is necessary" -- but these apply to a much lesser degree within the proactive SDC culture.

As the metrics program moves beyond the introductory/baseline stage to greater detail in data collection and analysis, the fear that the information gained could be used for personnel appraisal may cause inaccuracies or lapses in reporting. It must be made clear at the outset that the personnel management and appraisal system is a separate entity from the metrics program. Also, a policy of proactive in-place training, like Hitachi's "don't withdraw" policy that allows workers who are having problems to learn from them by continuing to work on the project, will help ease this fear. Only a mature, culturally embedded metrics program can weather use in personnel assessment and continue to provide valid data (Moller and Paulish, 1993, pp. 49-50). This is a legitimate concern within the SDC that is further complicated by the number of subcontractors involved. However, with the exception of a very few single-person projects, there will not be data at this level of detail from the metrics program for some time.

Another problem identified by Moller and Paulish (1993, p. 50) is the expectation of metrics as a quick fix. This is unrealistic; metric-driven process/quality improvement is gradual and incremental in nature. The SDC is in the position of not needing a quick fix. The aim here is gradual, continuous improvement within the documented process (SEE) after establishing a baseline. Along with this, loss of momentum, as enthusiasm fades in the face of hard work and small initial payoffs, can be a problem. Improvements derived from metrics collection and analysis are incremental in nature and require time and hard work to implement. As with any organization, this will require both patience and good leadership for the SDC program.

Lack of supporting software tools is, perhaps, the single biggest stumbling block for the SDC metrics program. While any good spreadsheet program will support the kinds of analysis needed for this program, performing the initial data collection, entry, and structuring in the absence of well-integrated software tool support is daunting. Management should keep in mind that, as Moller and Paulish state (1993, p. 51), Siemens was able to cost-effectively allocate up to 10% of development employees to tools production.

Management must be behind this effort in visible ways. They must stress the importance of gathering and analyzing metrics data to the process, quality, and productivity goals of the SDC by providing incentives for these functions. They must be seen to be utilizing the information generated to take positive steps toward those goals (Moller and Paulish, 1993, pp. 51-52). Once the initial program is in place with established baselines, the goals and the program will need to be refined to meet the ongoing requirement for continuous process improvement. As each goal is met a new one must be set while ensuring that no previously met goals are allowed to lapse.

Within the SDC a lack of team players should not be a major problem. The SDC's SEE includes a mandate for continuous process improvement, and, at this stage in its existence, the SDC has become a well-integrated entity with quite blurred distinctions between subcontractors and well-defined lines of communication between organizational entities. This team atmosphere helps avoid poor cooperation with the metrics program requirements and eliminates most difficulty in bridging gaps between parts of the organization (Moller and Paulish, 1993, pp. 52-53).

Rubin's concern about the overemphasis of a single metric (1995, p. 22) -- attempting to derive too much information from too little data -- is less of a worry given a balanced focus on process, product, and customer/quality measures. Of greater concern is the possibility of collecting so much data that it will overwhelm the still elementary capacity to organize and analyze it. This is a real concern because a review of the metrics proposed in the preceding two sections shows that the ideal starter-set count of 3 to 5 has been exceeded: there are ten, double the upper limit, even without subdivisions:

   1) cost/level of effort (planned/actual)
   2) schedule/milestones (planned/Actual)
   3) product size (LOC, FP, pages)
   4) number of service calls
   5) number of problems resolved
   6) time required to resolve problems
   7) number of calls to resolve problem
   8) requirements tracing
   9) customer acceptance of deliverables
  10) user discovered error rate by type

Multiply these by their various subdivisions and the number of projects at the SDC, and one forms an appreciation for the scale of the undertaking.
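One way to appreciate (and manage) that scale is to treat each project-month as a single record covering the ten measures. The sketch below uses invented field names; in practice these would be tied to the SDC's existing reporting formats, and any given project would leave inapplicable measures blank:

```python
# Hypothetical sketch of a monthly data-collection record covering the
# ten proposed measures. Field names are invented for illustration.
MONTHLY_RECORD_FIELDS = [
    "project", "month",
    "cost_planned", "cost_actual",            # 1) cost/level of effort
    "milestones_planned", "milestones_met",   # 2) schedule
    "size", "size_unit",                      # 3) LOC, FP, or pages
    "service_calls",                          # 4)
    "problems_resolved",                      # 5)
    "mean_time_to_resolve_hours",             # 6)
    "calls_per_problem",                      # 7)
    "requirements_traced_pct",                # 8)
    "deliverables_accepted_unchanged",        # 9)
    "user_errors_by_type",                    # 10) mapping keyed by type
]

def make_record(**values):
    """Build a record; unsupplied measures stay None, since not every
    measure applies to every project type."""
    return {f: values.get(f) for f in MONTHLY_RECORD_FIELDS}

rec = make_record(project="Sample", month="1995-07",
                  cost_planned=50000, cost_actual=55000)
missing = [f for f, v in rec.items() if v is None]
```

Multiplied by the number of projects and months, even this flat layout makes clear why electronic storage and automated collection are advocated.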

Thanks to the SDC way of doing business as regulated through the SEE, organizational problems beyond the current state of the metrics art are not the major threats they might be. While there is not yet a standard set of methods for estimating development schedules across projects, available tools and experienced technical managers ensure that there is some accuracy in this area. So irrational scheduling is not a major concern. The other concerns -- "incompetent management and staff, layoffs or drastic reductions in force, and large systems after the design phase" -- are also of little worry within the SDC framework.

Conclusions: The Current Status

The SDC goals, supported by the metrics program, remain the same: attain improved cost and schedule planning, quality, and customer satisfaction by refining the SDC process to meet an SEI - CMM level two and above assessment. One of the CMM level three KPAs directly supported by the metrics effort is organizational process improvement. Basic requirements for a metrics program include the additional CMM level three KPAs of organizational process definition and some degree of intergroup coordination between the collection and analysis group (DMMG), IPA, and developers and support staff. Also the level two KPAs of requirements management and software project planning provide primary sources for requirements tracing, and baseline cost and schedule data. Finally the software quality assurance KPA function is validated for the SDC's IPA organization.

Currently there is no formal metrics program under way at the SDC. Costs and schedules are planned and reported on a monthly basis, but no effort is made to tie those data to product size. Despite the active presence of a robust IPA organization, the only real SDC-wide data available on customer-oriented quality and satisfaction is the semiannual award fee rating. While the award fee rating as it is implemented under the Mission Oriented Systems Engineering and Support (MOSES) contract is a very accurate indicator of overall customer satisfaction, it does not occur with a frequency that allows within-fiscal-year trend analysis, nor does it allow for timely corrective action, as the award fee rating is the final arbiter of how well the SDC is doing.
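A monthly collection cycle, by contrast, would permit exactly the kind of within-fiscal-year trend analysis the award fee rating cannot provide. As a hypothetical sketch (the monthly figures are invented for illustration), a growing cost variance could be flagged early enough for corrective action:

```python
# Hypothetical sketch: monthly cost-variance trend for one project.
# The monthly planned/actual figures below are invented.
monthly = [  # (month, planned cost, actual cost)
    ("1995-01", 10000, 10200),
    ("1995-02", 10000, 10900),
    ("1995-03", 10000, 11800),
]

variances = [(m, 100.0 * (a - p) / p) for m, p, a in monthly]

# flag a variance that grows every month -- a trend, not a blip
worsening = all(later > earlier
                for (_, earlier), (_, later) in zip(variances,
                                                    variances[1:]))

for month, v in variances:
    print(f"{month}: {v:+.1f}%")
if worsening:
    print("Cost variance is growing month over month -- investigate.")
```

A semiannual rating would see only the end state; the monthly series shows the drift while there is still time to act on it.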

Recommendations: Where to Go From Here

As an essential feature of a process improvement effort, a metrics program needs to be implemented within the SDC. While the current trend is to make this metrics effort a part of the SEE as a guideline (optional item), it is recommended that the basic measures of cost, schedule, product size, number of service calls, number of problems resolved, time required to resolve problems, number of calls to resolve problem, requirements tracing, customer acceptance of deliverables, and user discovered error rate be made part of an SDC policy requiring their collection, as applicable, within development, documentation, and support projects. Collection, analysis and reporting should be done on a monthly basis to permit the development of valid cost and schedule variance trends inside of a useful time frame for corrective action when needed. Finally, sufficient resources in terms of personnel, training, software tools (or tool development), supporting hardware, incentives, mandates, and authority need to be provided to make and keep the program a successful part of the SDC's business environment.


Blazy, L. ("n.d.") Metrics paper (Draft).  Unpublished 
     manuscript, University of Maryland University College, 
     College Park.

Conte, S. D., Dunsmore, H. E., & Shen, V. Y.  (1986).  Software 
     engineering metrics and models.  Menlo Park, CA:  Benjamin/
     Cummings.

Down, A., Coleman, M., & Absolon, P.  (1994).  Risk management
     for software projects.  London:  McGraw-Hill.

Harding, J. T. (1995).  Maximizing the usefulness of your 
     software metrics.  1995 Software Engineering Process Group
     Conference, May 22 - 25.  Pittsburgh, PA:  Software 
     Engineering Institute, Carnegie Mellon University.

Jones, C. (1995, March).  Flaws: Gaps in SEI programs.  Software
     Development, 41-48.

Jones, C. (1991).  Applied software measurement: Assuring 
     productivity and quality.  New York:  McGraw-Hill, Inc.

Kan, S. H. (1995).  Metrics and models in software quality
     engineering.  Reading, MA:  Addison-Wesley.

LaMarsh, J.  (1995).  Changing the way we change:  Gaining
     control of major operational change.  Reading, MA:  Addison-
     Wesley.

Lewis, R. O. (1992).  Independent verification and validation: A
     Life cycle engineering process for quality software.  New
     York: John Wiley & Sons, Inc.

Moller, K. H., & Paulish, D. J. (1993).  Software metrics: A
     practitioner's guide to improved product development.  
     London:  Chapman & Hall.

Raffo, D. M., & Kellner, M. I. (1995).  Supporting process 
     improvement programs by evaluating the performance of 
     alternative software processes quantitatively.  1995 
     Software Engineering Process Group Conference, May 22 - 25.
     Pittsburgh, PA:  Software Engineering Institute, Carnegie 
     Mellon University.

Rubin, H. A. (1995, January).  Measurement: Despite its promise, 
     successful programs are rare.  Application Development
     Trends. 21-24.  

Shepperd, M., & Ince, D.  (1993).  Derivation and validation of
     software metrics.  Oxford:  Clarendon Press.