We rigorously define the concept of a bug, explain why a specification is crucial for controlling software bugs, highlight the differences between a specification and an implementation, and discuss a cost-effective means for developing specifications.
What Is a Bug?
In order to discuss the act of debugging, it is important to define precisely what does and does not constitute a bug.
For the purposes of this text, I will define a bug as "program behavior that deviates from its specification." This definition does not include:
- Poor performance, unless a threshold level of performance is included as part of the specification.
- An awkward or inefficient user interface, although user interface design is an important topic in its own right.
- Lack of features, lack of a particular useful feature, or lack of any feature not included in the program specification (even if it was intended to be in the specification).
The lack-of-features category illustrates an important aspect of our definition of bugs: they are inextricably linked to a program specification. If there is no program specification, then there literally are no bugs. To be sure, there are some generally accepted behavioral qualities expected from any software, e.g., it won't crash, it won't run forever without producing output, etc. Properties like these are implicitly part of the specification of any software. But these properties are the exception; most behavior must be explicitly specified. Because specifications define behavior, and deviations from specified behavior define bugs, we had better discuss what constitutes a specification.
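To make this concrete, a fragment of a specification can be as small as an executable check. The routine and the property names below are hypothetical, invented for illustration; the point is only that once such checks exist, any observed behavior that violates one of them is, by this definition, a bug. A minimal sketch:

```java
// A hypothetical, partial specification for an absolute-value routine,
// written as executable checks: any observed behavior that violates a
// check is, by this chapter's definition, a bug.
public class AbsSpec {
    // The implementation under test.
    static int abs(int x) {
        return x < 0 ? -x : x;
    }

    // Specified property 1: the result is never negative.
    // (A complete specification would also have to address
    // Integer.MIN_VALUE, which has no positive counterpart among Java
    // ints; that is exactly the kind of detail a precise specification
    // forces us to confront.)
    static boolean neverNegative() {
        for (int x : new int[] { -5, 0, 7 }) {
            if (abs(x) < 0) return false;
        }
        return true;
    }

    // Specified property 2: magnitude is preserved.
    static boolean preservesMagnitude() {
        return abs(-5) == 5 && abs(7) == 7;
    }

    public static void main(String[] args) {
        System.out.println(neverNegative() && preservesMagnitude()); // prints true
    }
}
```

Note that the specification is deliberately partial; it pins down only the behavior we have chosen to specify, and is silent about everything else.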
Intuitively, a program specification is a description of the behavior of a program. Therefore, having some kind of specification is essential to determining when the system is misbehaving. What form would we like this specification to take? First, let's consider how traditional software engineering answers this question.
Specification as Monolithic Treatise
The traditional method of software engineering is to develop a thorough specification of the system's functionality before writing a single line of code. This specification is made as formal as possible, so as to minimize ambiguities. The programmers then slog through the various details of this specification (often a large book) as they implement the system.
This method of specification was adapted from other engineering disciplines, where it can be extremely costly to make any changes to a specification after deployment begins. Microprocessor design is one of these disciplines. Currently, the specifications of microprocessors are interpreted and analyzed automatically. In fact, many aspects of a microprocessor design can be proven correct by machine, without human assistance. But such techniques would be impossible if the specification weren't formalized.
In the software arena, where changes to a specification after deployment aren't nearly as costly, it's natural to question whether this style of up-front, formal specification is so useful. To consider this question, let's first examine how well that specification style works for a particular type of software artifact: a programming language.
Among software systems, programming languages are most similar to microprocessors in terms of the cost of modifying a specification. The cost of making even minor modifications to a language design after people have begun using it can be especially high. All the existing programs written in that language will have to be modified and recompiled.
As we might expect, the specifications of programming languages, compared with other software systems, are often quite formal, especially in the case of syntax. Virtually all modern programming languages have a formally specified syntax. Most parsers are constructed through the use of automatic parser-generators that read in such grammars and produce full-fledged parsers as output.
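To illustrate the idea (this is a toy grammar invented for this sketch, not any real language's syntax), a formally specified grammar mechanically determines a parser; a parser generator automates exactly this translation, but the correspondence is visible even in a hand-written recursive-descent parser:

```java
// Toy grammar (hypothetical, for illustration only):
//   expr   ::= term   { "+" term }
//   term   ::= factor { "*" factor }
//   factor ::= digit  | "(" expr ")"
//
// Each grammar rule becomes one method; the parser doubles as an
// evaluator so the effect of the grammar is observable.
public class TinyParser {
    private final String src;
    private int pos;

    TinyParser(String src) { this.src = src; }

    static int eval(String s) { return new TinyParser(s).expr(); }

    // expr ::= term { "+" term }
    private int expr() {
        int v = term();
        while (peek() == '+') { pos++; v += term(); }
        return v;
    }

    // term ::= factor { "*" factor }
    private int term() {
        int v = factor();
        while (peek() == '*') { pos++; v *= factor(); }
        return v;
    }

    // factor ::= digit | "(" expr ")"
    private int factor() {
        char c = peek();
        if (c == '(') { pos++; int v = expr(); pos++; return v; } // consume ")"
        pos++;
        return c - '0';
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    public static void main(String[] args) {
        System.out.println(TinyParser.eval("1+2*3"));   // prints 7: precedence comes from the grammar
        System.out.println(TinyParser.eval("(1+2)*3")); // prints 9
    }
}
```

Notice that operator precedence is not a special case in the code; it falls out of the shape of the grammar, which is precisely why a formal syntax leaves so little room for implementations to disagree.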
What about language semantics? Let's take a look at the following four languages, all either currently or formerly popular, and examine the relative degree of precision in the semantic specification for each:
- C++
- Python
- ML
- Java
For each language, let's look at how the degree of precision in the specification has helped determine its effectiveness.
The C++ language specification leaves many details implementation dependent, and even declines to define the behavior of many valid C++ programs. Although the designers of C++ would claim that the programs for which C++ semantics is undefined are not valid C++ programs, it is impossible in principle for a machine to determine automatically whether a program is valid under this criterion. The implication is that many (most?) real-world software applications written in C++ are not valid C++ programs.
The result is that many C++ programs don't behave as intended when ported from one platform to another.
The Python Language Reference is an informal language specification that leaves many details implementation dependent. In this case, the decision not to use a formal specification was made deliberately, with full awareness of the formalisms available for language semantics. In the words of Guido van Rossum, Python's inventor:
While I am trying to be as precise as possible, I chose to use English rather than formal specifications for everything except syntax and lexical analysis. This should make the document more understandable to the average reader, but will leave room for ambiguities. Consequently, if you were coming from Mars and tried to re-implement Python from this document alone, you might have to guess things and in fact you would probably end up implementing quite a different language. On the other hand, if you are using Python and wonder what the precise rules about a particular area of the language are, you should definitely be able to find them here.
But the ambiguities of the English language aren't just a problem for Martians. Various implementations of Python, such as JPython and CPython, have faced a formidable challenge in providing compatible behavior across platforms. This problem would be much worse if it weren't for the relative simplicity and elegance of the Python language.
The ML specification formally defines the full operational semantics of the language. Consequently, ML programmers enjoy an unprecedented level of precision and cross-platform standardization. The formal specification of ML has even allowed computer scientists to discover subtle inconsistencies in the ML type system, and correct them. Such inconsistencies no doubt exist in many other languages, but they are difficult to find without a formal specification.
Pascal is one language that suffered for quite a while from an ambiguity in its specification: the rules for determining type equivalence were left unspecified. For example, consider the following two Pascal types:
    type complex = record
        left : integer;
        right : integer;
    end;

    type coordinate = record
        left : integer;
        right : integer;
    end;
Are these two types identical? Clearly, they contain the same types of subcomponents. Depending on how we define the language, a value of type coordinate may be passed to a procedure that takes an argument of type complex, and vice versa. If so, our language would be said to use structural equivalence when identifying types. Alternatively, we might define two types as equivalent if and only if they have the same name. Then our language would use name equivalence.
What choice is made in the Pascal language specification? Originally, no choice was made at all; nobody had yet realized that there was more than one way to define type equivalence! As a result, each implementation team for Pascal had to make this choice on its own (and often those teams didn't realize they were making a choice either). The result of this ambiguity is that Pascal code written for one implementation can behave in a drastically different fashion under another.
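Java, for what it's worth, comes down firmly on the side of name equivalence: two classes with identical fields are still distinct, incompatible types. The class and method names below are hypothetical, chosen to mirror the Pascal example; a minimal sketch:

```java
// Two structurally identical classes: same field names, same field types.
// Under structural equivalence they would be interchangeable; Java
// treats them as unrelated types, distinguished by name.
public class NameEquivalence {
    static class Complex    { int left; int right; }
    static class Coordinate { int left; int right; }

    static boolean compatible() {
        // Name equivalence: despite identical structure, a Coordinate
        // is not assignable to a Complex.
        return Complex.class.isAssignableFrom(Coordinate.class);
    }

    public static void main(String[] args) {
        // The declaration below would be rejected at compile time if
        // uncommented, which is name equivalence at work:
        // Complex c = new Coordinate();
        System.out.println(compatible()); // prints false
    }
}
```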
Benefits of Specifications
Although no formal specification (akin to that of ML) exists for Java, a good deal of care was put into development of a precise informal specification. Many smaller, toy versions of Java have been formalized from this specification, and correctness properties have been proven about them. Furthermore, Java is typically compiled to bytecode for the Java Virtual Machine, which itself is well specified (although, at the time of this writing, the process of bytecode verification is not). The result is an unprecedented level of portability for programs written in this language.
The conclusion we can draw from this is that there really are advantages to having as precise a specification as possible. The costs of an ambiguity or inconsistency can be quite high, leading to decreased portability, reliability, or even to a security hole.
But even in the world of programming languages, where problems in a specification are most costly, formal specifications are rare. Some of the reasons for this are:
- Few language properties are checked automatically. The process of proving properties about a programming language specification hasn't been automated, as of yet, to the same degree that proving properties about hardware design has. As a result, there's not quite as much advantage to formalizing them.
- Many language users prefer the informal. Informal specifications are preferred by most of the people who will actually read them, such as compiler writers. (In fact, compiler writers often revel in less formal specifications because they leave more room to optimize a program.) The other, occasional users of a language specification are programmers, and most of them greatly appreciate an informal specification that they can easily understand.
- It costs money to produce a formal specification. Producing a formal specification up front is expensive. Companies have found it more cost-effective to ship early and flesh out the details of a specification later (or, more often, never). Indeed, a development team that commits to producing a formal specification may not finish specifying its system before its competitors have shipped! If Sun had waited to produce a formal language semantics for Java before releasing it, the language might not have come out in time to ride to fame as the preferred language for Internet programming.
But if up-front and formal specification is too costly, what approach should a development team take in specifying software? Many development teams have been so turned off by the cost of up-front specification that they've renounced specification entirely. But that's never a good idea.
Implementations Are Not Specifications
Like it or not, a great deal of industrial software is implemented without a discernible specification. If and when the software is completed, the implementation is then presented as the specification. Whatever behavior the software exhibits is said to be the specified behavior. Some poor souls might argue that this is a good approach, since it doesn't bog the developers down with working out some sort of formal plan that is bound to change anyway. But, while it is true that project specifications often change, an implementation makes for a lousy specification in several respects:
- Many of the choices made in an implementation are arbitrary. Thus, a team that wishes to implement the system on another platform has nothing to go on but the existing implementation. The developers will have to wade through numerous implementation details to determine the behavior that the implementation entails. It is much easier to determine such behavior when it is specified at a higher level of abstraction.
- You cannot define a bug. If an implementation is literally taken as its own specification, then, as in the case where there is no specification at all, it is impossible to identify any behavior as a bug!
- Initial developers have no model of behavior. Obviously, an implementation cannot serve as the specification for the initial developers, since no such implementation yet exists. These developers must rely on some model of behavior for the system they're creating. But the source of this model should then serve as the software's specification.
This last point sheds some light on what sort of a specification a developer might use with reasonable cost. While it's true that developers must have some mental model of the feature they are implementing, they needn't have a mental model of the entire application.
In other words, specifications can be developed a piece at a time. Not only does this make them more tractable, but it also allows them to be modified more efficiently as the customers' needs change.
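One inexpensive way to grow a specification a piece at a time is to write each piece as an executable check alongside the feature it describes; each new requirement adds another check. The feature and "story" names below are hypothetical, and a test framework such as JUnit would serve the same role as these plain boolean checks:

```java
import java.util.ArrayList;
import java.util.List;

// A specification grown one piece at a time: each "story" contributes a
// small executable check. The cart feature and story names are
// hypothetical, invented for this sketch.
public class PiecewiseSpec {
    // Feature under development: a tiny shopping cart.
    static class Cart {
        private final List<Integer> prices = new ArrayList<>();
        void add(int price) { prices.add(price); }
        int total() { return prices.stream().mapToInt(Integer::intValue).sum(); }
    }

    // Story 1: a newly created cart has a total of zero.
    static boolean storyNewCartIsEmpty() {
        return new Cart().total() == 0;
    }

    // Story 2, added later when the need arose: the total is the sum
    // of the items added so far.
    static boolean storyTotalSumsItems() {
        Cart c = new Cart();
        c.add(3);
        c.add(4);
        return c.total() == 7;
    }

    public static void main(String[] args) {
        System.out.println(storyNewCartIsEmpty() && storyTotalSumsItems()); // prints true
    }
}
```

Because each story stands alone, a change in the customer's needs means revising one small check rather than reworking a monolithic document.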