THE UNIVERSITY OF MARYLAND UNIVERSITY COLLEGE GRADUATE SCHOOL
STUDENT RESEARCH REPORT
for
CSMN - 635 - 1111: Systems Development and Project Control


Estimating Software Size for Web Database Development

by

David R. Mapes
dmapes@erols.com
05/17/1999

Abstract

In this author's experience, the weakest facet of the software development process is the step immediately after determining what software should be built and (hopefully) how it fits into the organization's business process. Having identified a business need for a program, system, or enhancement, determining what level of resources it will require in terms of personnel, time, and money seems to be the most difficult and intractably error-prone area in the profession. The first step in this estimating process is a determination of the planned product's size. Given that, at this stage in the development process, little may be known of the project's dimensions, what is the best approach to framing an initial estimate of the size of the effort, and how may this estimating process be refined over time to enhance its contribution to future planning efforts? This is the central question addressed by this paper; the environment is an internet-based database system development and maintenance effort.

Introduction

This paper focuses on identifying and selecting or developing a methodology for up-front software size estimation, for use in software project planning, in a world-wide-web database application development environment. This subject area was selected because an estimate of the size of the software to be developed forms the basis for planning and estimating the overall software project. The general problems with sizing and planning software development stem in large degree from the innate features of software. These include the characteristics or "essence" of software as posed by Brooks:

    -  Complexity
    -  Conformability
    -  Changeability
    -  Invisibility

    (Brooks, 1987, pp. 10-12)

It is Brooks' contention that any future attempt to enhance the software development process must address at least one of these essential elements. These same elements make sizing and estimating proposed software a difficult and error-prone process. All software of any significant functionality is inherently complex. This complexity makes software increasingly difficult to conceptualize as a whole, making the development of a size estimate challenging at best. Software's inherent conformability and changeability make predicting its "final" form (and size) a slippery and error-prone activity. Finally, the invisible nature of software renders it difficult to grasp conceptually as a whole. The fact that, in the case of estimation, the software does not yet exist does not help matters. In essence the problem is one of describing and sizing a complex, changeable, intangible, and non-existent entity. Also, for many software professionals, software sizing and estimation is a foreign and arcane topic given minimal (or no) coverage in college or even graduate school curricula.

While it may seem difficult or impossible to develop an accurate assessment of a software product's size in advance of its development, there are a number of good reasons why one would want to try. These include the need to effectively plan the utilization of the development organization's resources, the need to produce accurate costing figures for external bids (not a direct concern for this paper), the need to plan "...how to manage the changes to this work as the work progresses..." (Donaldson & Siegel, 1997, p. 31), and the need to project a realistic implementation date (including the need to be able to tell one's supervisor how long a particular piece of work is likely to take). For the purposes of this paper (and the completely self-serving motives of this author) the focus will be on finding methods that can be applied manually to varying information systems applications across a variety of platforms, with an eye towards internet-type database applications.

Software Sizing Metrics

Software size can be described in any of a number of ways. One can speak of compiled program file size, the size of the execution memory footprint, the number of requirements, the number of source lines of code, or the number of function points of one type or another. For the purposes of size estimation and project planning, compiled program file size and memory footprint can only be speculated upon or treated as implementation targets; they cannot serve as a basis for planning. Requirements might at first appear to provide a basis for software size projection, but this is only true if they are rigorously detailed and structured; an unlikely occurrence early in the planning stages of a development effort (indeed this state seems to be a somewhat rare occurrence at any stage in the process). Source lines of code (SLOC), while it forms the basis for many estimating techniques and, given a stable development environment, can provide good results (Symons, 1991, p. 10), does not fit the bill here because of the lack of a like-system historical database upon which to base size estimates. Finally, function points (FP) were developed in response to the perceived shortcomings of the SLOC metric and have continued to evolve as estimating practitioners identified new opportunities for improved FP methodologies.

For the purposes of this paper SLOC has limited utility because the development environment includes a world wide web (WWW) intranet development and implementation platform that is undergoing almost continuous change, a mixture of implementation languages (i.e., Oracle Corp.'s PL/SQL, HTML, JavaScript, and PERL), the use of a code generator for a portion of the effort (Oracle's Designer 2000), and a politicized customer/user base that insists upon fomenting continuous requirements change. Dealing with rapid environmental change, differing computer languages, varying coding styles (human or machine), and uncertain requirement details are the types of problems that render SLOC meaningless (Symons, 1991, pp. 13-15) and for which FP methodologies were developed. FP methods seem to best lend themselves to the development environment in question here in that they produce more realistic productivity comparisons and can more easily be applied early in the development effort (Jones, 1991, pp. 56-57); hence they are the focus of the remainder of this paper.

Function Point Methods

Function points, "...first proposed by Albrecht [ALB79], who suggested a measure called the function point..." (Pressman, 1997, p. 85), have a number of advantages in that they are derived by assessing functional information characteristics by objectively counting them, weighting them for assessed complexity, and summing the results (Pressman, 1997, pp. 85-86). Pressman delineates five functional information areas and three levels of complexity that are used to assign a weighting factor as shown below:

                                              Weighting Factor
  Measurement Parameter         Count     Simple Average Complex

  number of user inputs         _____   X   3       4       6    =  _____

  number of user outputs        _____   X   4       5       7    =  _____

  number of user inquiries      _____   X   3       4       6    =  _____

  number of files               _____   X   7      10      15    =  _____

  number of external interfaces _____   X   5       7      10    =  _____

  count = total ------------------------------------------------>  ______

  FIGURE 4.5.  Computing function point metrics.

  
  (Pressman, 1997, p. 86)
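
To make the arithmetic concrete, the short sketch below applies Pressman's weighting table to a set of purely hypothetical measurement parameter counts and complexity choices; none of the figures come from a real project, and Python is used here only as a convenient notation for a method that is performed manually.

    # Illustrative sketch of Pressman's basic function point count (Figure 4.5).
    # The weights come from the table above; the counts and complexity levels
    # chosen here are hypothetical, not data from any real project.

    WEIGHTS = {
        "user inputs":         {"simple": 3, "average": 4, "complex": 6},
        "user outputs":        {"simple": 4, "average": 5, "complex": 7},
        "user inquiries":      {"simple": 3, "average": 4, "complex": 6},
        "files":               {"simple": 7, "average": 10, "complex": 15},
        "external interfaces": {"simple": 5, "average": 7, "complex": 10},
    }

    def count_total(measurements):
        """Sum count x weight over the five measurement parameters."""
        return sum(count * WEIGHTS[parameter][complexity]
                   for parameter, (count, complexity) in measurements.items())

    # Hypothetical example: a small data-entry application.
    example = {
        "user inputs":         (12, "average"),
        "user outputs":        (8, "simple"),
        "user inquiries":      (5, "average"),
        "files":               (4, "average"),
        "external interfaces": (2, "complex"),
    }

    print(count_total(example))   # 12*4 + 8*4 + 5*4 + 4*10 + 2*10 = 160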


Pressman's approach is only one of more than 30 that have been published since 1974 (Jones, 1998, pp. 275-276), and he (Pressman) admits that, as stated, it is an oversimplification of the general function point methodology (see note 4 (1997, p. 85)), but for demonstration and/or simple cases it fills the bill. Alan Albrecht's original idea was to create a sort of "Dow Jones Index" (Symons, 1991, p. 15) for the size of software. His FP methodology was essentially identical to that provided by Pressman except that he treated the total figure as an unadjusted indication of information processing size. To arrive at his final figure he developed a technical complexity adjustment (TCA) that was derived from an assessment of 14 General Application Characteristics (GAC):
   General Application Characteristics
   |------------------------------------------------------------------------|
   |    |     Characteristics       |      |     |  Characteristics  |      |
   |____|___________________________|______|_____|___________________|______|
   |    |                           |      |     |                   |      |
   | C1 |        Data Communications| ____ |  C8 |     On-line Update| ____ |
   |    |                           |      |     |                   |      |
   | C2 |      Distributed Functions| ____ |  C9 | Complex Processing| ____ |
   |    |                           |      |     |                   |      |
   | C3 |                Performance| ____ | C10 |       Re-usability| ____ |
   |    |                           |      |     |                   |      |
   | C4 | Heavily Used Configuration| ____ | C11 |  Installation Ease| ____ |
   |    |                           |      |     |                   |      |
   | C5 |           Transaction Rate| ____ | C12 |   Operational Ease| ____ |
   |    |                           |      |     |                   |      |
   | C6 |         On-Line Data Entry| ____ | C13 |     Multiple Sites| ____ |
   |    |                           |      |     |                   |      |
   | C7 |        End User Efficiency| ____ | C14 |  Facilitate Change| ____ |
   |-----------------------------------------------------------------|------|
   |                                                                 |      |
   |                                        Total Degree Of Influence| ____ |
   |-----------------------------------------------------------------|------|


                                 DI Values


   Not Present, or No Influence          = 0       Average Influence             = 3
   Insignificant Influence               = 1       Significant Influence         = 4
   Moderate Influence                    = 2       Strong Influence Throughout   = 5


             TCA =  0.65 + ( 0.01 x  (Total Degree of Influence) )

                       (TCA Range = 0.65 - 1.35)

   (Symons, 1991, p. 18)
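
Continuing the sketch above under the same caveats, Albrecht's adjustment step can be expressed as follows; the 14 degree-of-influence scores and the unadjusted count are hypothetical values carried over from the previous illustration.

    # Illustrative sketch of Albrecht's technical complexity adjustment (TCA).
    # The 14 degree-of-influence (DI) scores below, one per GAC C1..C14, are
    # hypothetical.

    def tca(degrees_of_influence):
        """TCA = 0.65 + 0.01 x (total degree of influence); range 0.65 - 1.35."""
        assert all(0 <= d <= 5 for d in degrees_of_influence)
        return 0.65 + 0.01 * sum(degrees_of_influence)

    di_scores = [3, 2, 4, 1, 3, 5, 4, 5, 2, 1, 2, 3, 0, 3]   # C1..C14

    unadjusted_count = 160                  # e.g., the total from the sketch above
    adjusted_fp = unadjusted_count * tca(di_scores)
    print(round(tca(di_scores), 2), round(adjusted_fp, 1))   # 1.03 164.8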


Criticisms of Albrecht's methodology center around the limited impact of a proposed software's internal complexity on the resulting function point count, the way in which certain other of the GACs seem to overlap, the limited scope of the GACs causing certain key characteristics to be ignored, and the degree to which a set of interconnected but separate systems produces a much higher FP count than one integrated system with similar functionality (Symons, 1991, pp. 19-20). Symons' Mk II Function Point Analysis methodology attempts to address these shortcomings by looking more closely at the internal workings of the proposed system when calculating unadjusted function points (UFP), by allowing a greater value range for the TCA, and by basing it upon a larger number of more carefully defined GACs. Symons calculates UFPs by enumerating the transactions that can occur against each primary entity (which he defines as "'the main entities for which the system was constructed to collect and store data'" (Symons, 1991, pp. 25-26)) and counting the number of input data elements, entities accessed, and output data elements (D.E.) for each possible outcome of the transaction in question (Note: the weights listed for X, Y and Z below are Symons' "industry averages" (1991, p. 30)):

  Mk. II Function Point Analysis
  |---------------------|----------|------------|----------|
  |                     |          |            |          |
  |     Transaction     |  Input   |  Entities  |  Output  |
  |                     |No. D.E.'s| Referenced |No. D.E.'s|
  |                     |          |            |By Outcome|
  |---------------------|----------|------------|----------|
  |                     | [input   | [count of  |[output D.|
  | [transactions list] |   D.E.   |key entities|E. counts |
  |                     |  counts] | referenced]| by trans'|
  |                     |          |            | outcome] |
  |---------------------|----------|------------|----------|
  |                     |          |            |          |
  |  Total              |     X    |      Y     |     Z    |
  |                     |          |            |          |
  |---------------------|----------|------------|----------|

  Information Processing Size = X x 0.58 + Y x 1.66 + Z x 0.26 = UFP

  (adapted from Symons, 1991, pp. 26, 30)
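
As a worked illustration of this calculation, the industry-average weights can be applied to the column totals X, Y, and Z; the transactions and counts listed below are hypothetical examples, not data from the application discussed in this paper.

    # Illustrative sketch of the Mk. II unadjusted function point calculation.
    # Each transaction is (input D.E. count, entities referenced, output D.E.
    # count); the transactions and counts listed here are hypothetical.

    MK2_WEIGHTS = (0.58, 1.66, 0.26)   # Symons' industry averages for X, Y, Z

    transactions = {
        "create customer": (14, 2, 3),
        "match address":   (6, 4, 12),
        "print report":    (2, 5, 40),
    }

    X = sum(i for i, e, o in transactions.values())
    Y = sum(e for i, e, o in transactions.values())
    Z = sum(o for i, e, o in transactions.values())

    ufp = MK2_WEIGHTS[0] * X + MK2_WEIGHTS[1] * Y + MK2_WEIGHTS[2] * Z
    print(X, Y, Z, round(ufp, 2))   # 22 11 55 45.32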


In calculating his TCA Symons follows Albrecht's model except that he uses a total of 19 GACs, allows the user to add more where appropriate, and makes the weighting of the TCA GACs tunable by the user through a carefully defined calibration process (Symons, 1991, p. 72). Symons also carefully calls out how each of these GACs is to be scored on a 0 - 5 scale:


    Mk. II FPA General Application Characteristics
     1.       Data Communication
     2.       Distributed Function
     3.       Performance
     4.       Heavily Used Configuration
     5.       Transaction Rates
     6.       On-Line Data Entry
     7.       Design for End-User Efficiency
     8.       On-Line Update
     9.       Complex Processing
    10.       Usable in Other Applications
    11.       Installation Ease
    12.       Operations Ease
    13.       Multiple Sites
    14.       Facilitate Change
    15.       Requirements of Other Applications
    16.       Security, Privacy, Auditability
    17.       User Training Needs
    18.       Direct Use by Third Parties
    19.       Documentation

    TCA = 0.65 + C x (sum of Degree of influence for GACs)

    C:        is a tunable factor with an "Industry-Average"
    value of 0.005

    (adapted from Symons, 1991, pp. 73-80)
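
A minimal sketch of this adjustment, assuming hypothetical scores for the 19 GACs and Symons' industry-average value for the tunable factor C, follows; the UFP figure is carried over from the previous illustration.

    # Illustrative sketch of the Mk. II TCA with a tunable weighting factor C.
    # The 19 degree-of-influence scores (0 - 5) are hypothetical; C defaults to
    # Symons' industry-average value of 0.005.

    def mk2_tca(degrees_of_influence, c=0.005):
        return 0.65 + c * sum(degrees_of_influence)

    di_scores = [3, 1, 4, 2, 3, 5, 4, 5, 3, 1, 2, 3, 0, 4, 1, 4, 2, 0, 3]

    ufp = 45.32                            # e.g., the UFP from the sketch above
    print(round(mk2_tca(di_scores), 3), round(ufp * mk2_tca(di_scores), 1))   # 0.9 40.8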



The Mk. II FPA approach to sizing better takes into account the internal complexity of information systems by tracking the number of entities referenced by a given logical transaction (where a transaction could loosely be defined as any access to the system's primary data entities). It also provides for a wider range of TCA values based upon a more realistic, complete, and flexible assessment of the factors that influence the complexity (relative difficulty) of a given development effort. However, Symons' methodology breaks down for complex internet applications in two areas: A) his insistence that the interfaces between non-integrated application components be ignored in the UFP calculation, and B) his allowing complex processing too weak an influence in the UFP calculation. Of special note, however, is the fact that the Mk. II FPA method makes allowance for the impact of the development organization's process maturity in its TCA calculation by including an assessment of the level of documentation required for the product in question as its final general application characteristic.

Derived Function Point Methodology

The sizing methodology that makes the most sense for a complex inter/intra-net based database processing application is one that, like Mk. II FPA, is tunable and takes into account the internal complexity of the system while, like Albrecht's FPA, allowing for the impact of the many cross-language and cross-platform interfaces that also occur in this environment. The central problems in the application area being discussed are the connectionless nature of internet systems architecture and the common gateway interface (CGI), the multiple programming language environment required to build a friendly and functional interface, and the internal programming complexity of the applications being developed.

Unlike a LAN-based client server application, an inter/intra-net application does not maintain a connection between the client and host server. Connections are made only when a packet containing a server processing request is sent by the client to the host server or the server sends a packet containing a response back to the client. This lack of a persistent connection greatly complicates the job of user authentication and system security. It also means that all current information about a given "user session" must either be maintained by the client and passed to the server with each request or be regenerated by the server as each request is made. This greatly complicates the interfaces between modules of even the most integrated system. Another issue that makes this effort difficult and annoyingly tedious is the nature of passing parameters across the CGI interface from the client, to the web listener, and on to the Oracle or Netscape web servers that handle database and more standard hypertext protocol requests respectively in the current server configuration. On the surface this appears to be a simple question of matching up server and client side parameter names, data types, order, and ranges. But when the number of these interfaces grows into the hundreds, the number of data elements being passed rises into the thirties and beyond for some modules, six or seven programmers are developing and maintaining the interfaces, and poor quality or no tools are available to aid in debugging, these interfaces begin to demand a disproportionate amount of the programmer's time and effort to develop, test, and maintain. For this reason module interfaces that breach the client/server CGI boundary deserve their own column in the UFP counting matrix.

The lack of a truly mature web programming language tool set for the selected database software (Oracle was a customer-selected system development constraint, JAVA's much ballyhooed capabilities notwithstanding) requires the system to be implemented using a mixture of Oracle's Procedural Language/SQL (PL/SQL), HTML (much of it embedded in PL/SQL procedures and functions), JavaScript (also mostly embedded in PL/SQL), and PERL. While trying to account for this situation within the FP framework would be a waste of time, as it gets far too close to the physical system implementation to have any meaning for a FPA type of approach, it is also worthwhile to note that this situation really does invalidate any attempt to apply a SLOC metric based method.

Finally, the application area being supported requires complex text string processing and database searching to support the identification and matching of location addresses and names using data of varying quality, format, and completeness. This processing requires the sort of string format profiling, content parsing, and weighted scoring that inherently creates a high volume of complex and volatile code. The system not only handles this complex matching process but also supports extensive data evaluation and reporting tools that allow it to display data on the current and past quality of data and location-based matches maintained within the system. The system is also required to maintain a complete audit trail of changes to its primary entities. These are features that breed programming complexity and whose impact cannot be adequately dealt with through the application of a TCA-type correction factor that may (due to other GAC value considerations) actually cause the adjusted FP total to shrink. This situation indicates that a module's (Symons' logical transaction) internal processing complexity should also get its own column in the UFP (information processing size) calculation matrix for internet applications:



    Internet Function Point Analysis
    |---------------------|----------|------------|----------|-----------|----------|
    |                     |          |            |          |           |          |
    |       Modules       |  Input   |  Entities  |  Output  | No. of CGI|  Module  |
    |                     |No. D.E.'s| Referenced |No. D.E.'s| Parameters| Internal |
    |                     |          |            |By Outcome|   Passed  |Complexity|
    |---------------------|----------|------------|----------|-----------|----------|
    |                     | [input   | [count of  |[output D.|[count of  |[internal |
    |    [module list]    |   D.E.   |key entities|E. counts |    CGI    |complexity|
    |                     |  counts] | referenced]| by trans'|parameters]|assessment|
    |                     |          |            | outcome] |           | 0 - 5]   |
    |---------------------|----------|------------|----------|-----------|----------|
    |                     |          |            |          |           |          |
    |  Total              |     X    |      Y     |     Z    |     P     |    C     |
    |                     |          |            |          |           |          |
    |---------------------|----------|------------|----------|-----------|----------|

    Information Processing Size = X x 0.58 + Y x 1.66 + Z x 0.26 + P + C = UFP

    (adapted and extended from Symons, 1991, pp. 26, 30)


From this UFP calculation an adjusted function point (AFP) value can be derived by following the standard Mk. II FPA TCA process. In this case the weighting factors for P (CGI parameters) and C (internal module complexity) are taken to be one (1) because of a lack of any hard data to assess their weight other than the experience of this author that they can significantly impede development. At first glance the specific counting of CGI parameters might seem a little too close to the physical implementation of the system for good FP practice, but they simply reflect the nature of the application area and are a fact of life for all significant internet-platform systems development. Treating CGI parameters in this form also takes into account the need for and impact of managing security for an expanding level of high content network traffic. Because of this focus on accounting for the management of network traffic in the UFP calculation, this author proposes that the metric be referred to as Internet Function Points (IFP).
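
The sketch below illustrates the proposed IFP information processing size calculation with hypothetical module data; the module names, all of the counts, and the unit weights applied to P and C are assumptions of this illustration only.

    # Illustrative sketch of the proposed Internet Function Point (IFP)
    # information processing size.  Each module is (input D.E.s, entities
    # referenced, output D.E.s, CGI parameters passed, internal complexity
    # 0 - 5); the module names and counts are hypothetical.

    modules = {
        "address match":   (10, 6, 15, 22, 4),
        "audit report":    (4, 5, 30, 8, 1),
        "update location": (16, 3, 6, 18, 0),
    }

    X = sum(m[0] for m in modules.values())   # input data elements
    Y = sum(m[1] for m in modules.values())   # entities referenced
    Z = sum(m[2] for m in modules.values())   # output data elements
    P = sum(m[3] for m in modules.values())   # CGI parameters, weight of 1 assumed
    C = sum(m[4] for m in modules.values())   # internal complexity, weight of 1 assumed

    ifp_ufp = 0.58 * X + 1.66 * Y + 0.26 * Z + P + C
    print(round(ifp_ufp, 2))   # 106.9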

In directly accounting for an application's internal complexity in the IFP UFP calculation, the standard that is assumed for a reference point would be a typical information systems create, read, update, and delete (CRUD) application. Such an application, implementing simple business rules, standard levels of input validation, and user friendliness, would score a zero (0) in the internal complexity column for most of its modules. Only those modules having a significantly complex internal processing component would score one (1) or more. Note that, while this is a subjective judgement of complexity, certain questions can be asked to aid in the assessment:

    A)        Are there aspects of the module that will require
              extensive decomposition to implement? This implies
              substantial structural fan-out (in Yourdon, Constantine,
              and Myers' parlance) and, according to Kan (1995, p.
              263), will positively correlate with an increase in
              error rate (one indirect measure of a system's
              complexity).

    B)        Are there aspects of the problem that require extensive
              numeric processing, as in graphics presentation,
              scientific processing, or engineering or financial
              modeling? This implies a significant level of
              complexity on Halstead's Software Science scale (Kan,
              1995, p. 256).

    C)        Are there portions of the module that must deal with
              data of unknown quality or that must be tuned once
              initial results are obtained? This implies a real-
              world/naturally occurring data acquisition and
              evaluation problem (e.g., address matching) that, like
              nature itself, is messy and complex.

    D)        Are there large portions of the module's functionality
              embedded in the database outside of the code itself?
              This could constitute a high degree of hidden
              cyclomatic complexity on McCabe's scale (Kan, 1995, p.
              257).


Each of these questions addresses key aspects of the complexity issue; unfortunately they do not provide a way to place a given module at a given complexity level between one (1) and five (5). Also, Shepperd and Ince (1993, pp. 51-52) make three key points about some of these complexity assessment methods: there is little agreement or empirical evidence for exactly what these complexity metrics really mean; the models used to generate them are not based upon any great formal rigor, and are thus suspect as to their validity; and many of these complexity metric models are assumed to be general in nature when this may not be the case. For the moment at least, complexity must be assessed subjectively by comparing the module to others in the same application or in other similar applications. Hopefully, over time, certain of these module characteristics can be tied to a given ordinal level of complexity, but for the moment this must be left solely to the judgement of the assessor.
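
Purely as an illustration of how the answers to questions A) through D) might be turned into a provisional score, the following sketch counts affirmative answers and caps the result at five; the mapping from answers to an ordinal level is an assumption of this author, not an empirically calibrated scale, and it does not replace the subjective comparison described above.

    # Illustrative sketch only: a provisional starting point for the 0 - 5
    # internal-complexity score, derived from the four questions above.  The
    # mapping from "yes" answers to an ordinal level is an assumption of this
    # author, not an empirically calibrated scale.

    def complexity_score(fan_out, numeric_processing, messy_data, db_embedded_logic):
        """Each argument is a yes/no answer to questions A) through D)."""
        answers = [fan_out, numeric_processing, messy_data, db_embedded_logic]
        score = sum(answers)                    # one point per affirmative answer
        if messy_data and db_embedded_logic:    # compounding sources of hidden complexity
            score += 1
        return min(score, 5)

    print(complexity_score(True, False, True, True))   # 4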

From Internet Function Points to Development Schedule

The central point of measuring software size, by whatever metric, is to establish a way to measure productivity and predict development schedules for projects. The implementation of any sizing method for a new development environment and organization will not automatically result in a valid scheduling and planning tool. While the IFP method's basis in Mk. II FPA may allow some generalized schedule prediction based upon the Mk. II FPA model's past performance and industry standards, only the mapping of actual productivity metrics from within the development organization back to IFP size estimates will yield a really useful basis for development schedule planning.

In establishing a productivity tracking metrics program there are some key issues that must be addressed. These issues have as much to do with intra-organizational relations as they do with any other aspect of valid software metrics collection. Possible problems with a metrics collection and analysis effort, according to Moller and Paulish (1993, pp. 48-53), include:

    -  Lack of Acceptance
    -  Personnel Appraisal Fear
    -  Quick Fixes - Unrealistic Expectations
    -  Loss of Momentum
    -  Tools Availability
    -  Management Support
    -  Poor Goals Follow-Up
    -  Lack of Team Players


Lack of acceptance may be due to a number of causes; among these is the fear that "metrics will create additional work" (Moller and Paulish, 1993, p. 48). The resource commitment required must be planned for, and ways must be found to streamline the collection and analysis process. As the metrics program moves beyond the introductory/baseline stage to greater detail in data collection and analysis, the fear that the information gained could be used for personnel appraisal may cause inaccuracies or lapses in reporting. It must be made clear at the outset that the personnel management and appraisal system is a separate entity from the metrics program. Only a mature, culturally embedded metrics program can weather use in personnel assessment and continue to provide valid data (Moller and Paulish, 1993, pp. 49-50).

Another problem identified by Moller and Paulish (1993, p. 50) is the expectation of metrics as a quick fix. This is unrealistic; metric-driven productivity tracking and planning improvement programs are gradual and incremental in nature. The aim here is gradual, continuous improvement once a baseline has been established. Along with this, loss of momentum, due to enthusiasm fading under hard work and the small initial payoffs of the program, can be a problem. Improvements derived from metrics collection and analysis of any type are incremental in nature and require time and hard work to implement. As with any organization this will require both patience and good leadership. Lack of a good supporting software tool is, perhaps, the single biggest stumbling block for the development of a productivity metrics program. While any good spreadsheet program will support the kinds of analysis needed for this program, performing the initial data collection, data entry, and analysis in the absence of well-integrated software tool support is daunting. It should be kept in mind that, as Moller and Paulish state (1993, p. 51), Siemens was able to cost-effectively allocate up to 10% of development employees to tools production. It would seem to follow that the location and evaluation of third-party tools would be very economically feasible.

To grow beyond the personal software process level, the active support of management must be obtained for the metrics effort. Managers must understand the importance of gathering and analyzing metrics data to the productivity and planning goals of the organization. They must also be seen to be utilizing the information generated to take positive steps toward those goals (Moller and Paulish, 1993, pp. 51-52). Once the initial program is in place with established baselines, the goals and the program will need to be refined to meet the ongoing requirement for continuous process and productivity improvement. As each goal is met a new one must be set, while ensuring that no previously met goals are allowed to lapse. Within the development organization, lack of team players should not be a major problem due to the organization's existing mandate for continuous process improvement and its well-integrated structure, which has blurred the distinctions between subcontractors and established well-defined lines of communication between organizational entities. With this in mind, the reasons for the need for team players within the organization are obvious. The team atmosphere helps avoid poor cooperation with the metrics program's requirements and eliminates most difficulties in bridging gaps between parts of the organization (Moller and Paulish, 1993, pp. 52-53). There is, however, likely to be some cultural resistance to instituting a measurement program on the part of the very people who must collect the baseline data (LaMarsh, 1995, p. 104). To get their buy-in to the program, the following characteristics of good metrics should be kept in mind and be made clear to the collecting individuals (Down, Coleman and Absolon, 1994, pp. 27-28):

   -  Usefulness:
      The metric should provide quantified feedback
      that can be used as a basis for comparison and/or
      a trigger for corrective or enhancement action.

   -  Easily collectable/measurable:
      Collecting metric data should not interfere with
      the business of meeting customer requirements and
      should allow for a minimum possibility of errors
      in collection.

   -  Consistent and defensible:
      Metric collection methods should be applied
      consistently and the metrics collected should be
      readily identifiable and agreeable as useful in
      measuring the desired characteristic for purposes
      of comparison and as a basis for action. 


Given the above definition of what constitutes a good metric, the specific metrics that should be collected become fairly clear. For the purposes of productivity measurement and project schedule planning, the time spent on the development and testing of each module as defined in the IFP's UFP matrix, and the error rates experienced (both in testing and once the module is fielded), should be tracked. These permit both a direct measure of productivity performance within the application area (in both development and testing) and a way of mapping that productivity to out-the-door quality and application rework.
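
A minimal sketch of how such per-module data might be recorded and normalized by IFP size follows; the record layout, field names, and sample figures are hypothetical, and any real program would of course define its own collection forms or tools.

    # Illustrative sketch of per-module productivity and quality tracking.
    # Field names and sample figures are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class ModuleRecord:
        name: str
        ifp: float              # estimated size in Internet Function Points
        dev_hours: float        # development effort
        test_hours: float       # testing effort
        defects_in_test: int
        defects_fielded: int

        def hours_per_ifp(self):
            return (self.dev_hours + self.test_hours) / self.ifp

        def defects_per_ifp(self):
            return (self.defects_in_test + self.defects_fielded) / self.ifp

    records = [ModuleRecord("address match", 38.5, 120, 60, 14, 3)]
    for r in records:
        print(r.name, round(r.hours_per_ifp(), 1), round(r.defects_per_ifp(), 2))
        # address match 4.7 0.44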

Conclusion

While it is not strictly possible to predict a development schedule based upon the initial implementation of the IFP method proposed in this paper, that implementation is the first step in gaining that result. The method addresses key areas of the software development problem as defined by Brooks (1987, pp. 10-12) in that the complexity of the software to be developed is explicitly accounted for, and conformability and changeability are handled by GACs for facilitating change and for alternate uses and users of the application code. The problem of invisibility is abstracted by using some of the more tangible elements (i.e., modules, entities, input and output data elements, and CGI parameters) of the software as major contributors to the size estimate. Finally, a method for developing data to permit the projection of a schedule based upon the size data is identified in the form of a productivity and quality software metrics collection program.

References


Brooks, F. P. (1987, April). No silver bullet: Essence and accidents of software engineering. Computer, 20(4), 10-19.
Donaldson, S. E. & Siegel, S. G. (1997). Cultivating successful software development. Upper Saddle River, NJ: Prentice Hall Inc.
Down, A., Coleman, M., & Absolon, P. (1994). Risk management for software projects. London: McGraw-Hill.
Jones, C. (1998). Estimating Software. New York: McGraw-Hill.
Jones, C. (1991). Applied software measurement: Assuring productivity and quality. New York: McGraw-Hill, Inc.
Kan, S. H. (1995). Metrics and models in software quality engineering. Reading, MA: Addison-Wesley Publishing Company.
LaMarsh, J. (1995). Changing the way we change: Gaining control of major operational change. Reading, MA: Addison-Wesley.
Moller, K. H., & Paulish, D. J. (1993). Software metrics: A practitioner's guide to improved product development. London: Chapman & Hall.
Pressman, R. S. (1997). Software engineering: A practitioner's approach (4th ed.). New York: The McGraw-Hill Companies, Inc.
Roetzheim, W. H., & Beasley, R. A. (1997). Software project cost and schedule estimating: Best practices. New York: Prentice Hall.
Shepperd, M., & Ince, D. (1993). Derivation and validation of software metrics. Oxford: Clarendon Press.
Symons, C. R. (1991). Software sizing and estimating: Mk II FPA. Chester, England: John Wiley & Sons.