Most of what is written about error and exception handling is fairly abstract and vague. When specific recommendations are made, those usually consist of examples given for specific circumstances. Yet, error handling is a fundamental task for developers. This brief review attempts to categorize the different types of error conditions a program can encounter. By describing these categories, I also provide suggestions on how to handle each type of error condition.
In general, errors a program can encounter tend to be the result of one of three things:
- Restrictions: Arguments to a routine that can never work, and always result in an error, define a restriction on the use of the routine. Ensuring that only correct arguments are passed to a routine is the type of thing that programming by contract is meant to address.
- Inconsistencies: When values or resources are not what they are expected to be, or are missing, that creates an inconsistency between the expected state of the environment and the actual state. This may be the internal environment, such as a
null pointer, or the external environment, such as a corrupt file. It doesn't encompass inconsistencies in the data model, which often needs to be temporarily inconsistent during an operation (e.g. adding a node to a linked list).
- Failures When an operation simply does not work, and it's out of the program's control, this is a failure. For example, a pulled network cable.
These types of errors overlap to an extent (say, a working network could be considered part of the expected state, making it an inconsistency error). In general, though, most errors can fall into one of these categories.
Sometimes failures are not errors, and are just a way of detecting the current external state. For example, opening a file that doesn't exist may fail, but results in an error only when that file is actually needed.
Error Handling Responsibilities
Program code is responsible for the consistency of the internal program state. Generally certain code has primary (ideally, exclusive) responsibility for parts of the internal state. Inconsistency errors that occur within the code responsible for that state are bugs.
Sometimes the state management responsibility is shared between different sections of code. This is a bad idea, because it makes assigning responsibility for an inconsistency error harder, but it does happen in practice.
It's important to make a distinction between error detection and debugging. Often, data generated in the process of error handling is mixed together with diagnostic information. If possible, these types of information should be kept completely separate—at least conceptually, even if combined in a single data structure.
Safe Zones
Restrictions can be checked before calling a routine, or within a routine. It seems a waste of time to check arguments every time a routine is called when you already know those arguments are correct. One strategy is to separate parameter checking from parameter usage. This doesn't work reliably for library code, where anything can happen between the check and the use of the parameters, but within a base of code for a particular application or within a library, you can restrict the code to not change a value known to be safe.
The code between a parameter check and the next change to a parameter variable is a safe zone, where parameters don't have to be re-checked. This is only valid for restriction errors, because inconsistency and failure errors can be caused by things outside the code's safe zone. Things like critical sections (in multithreaded environments), semaphores and file locks are meant to create a very limited kind of safe zone for inconsistency and failure errors.
The code safe zones for parameters can overlap with others, and may not be well defined. One way to deal with this is to assign known safe values to variables which indicate this safety. Joel Spolsky wrote about one way to do this using variable naming conventions in Making Wrong Code Look Wrong. Safe values should be assigned to variables declared constant.
Reporting Errors
Code calling a routine needs to know three things to decide how to proceed: First, whether the data is returned, if any, or if the method invocation succeeded; second, whether an error occurred; and, third, whether the error is permanent or transitory. This defines the following possible error states returned from a routine:
- Successful
- Restriction error (always permanent)
- Permanent (bug) inconsistency
- Transitory (detected) inconsistency
- Failure (transitory for all we know)
It's often a bad idea to mix an error code with a return value, such as designating a specific values—say, 0 or -1—to be invalid. Some languages, like Python, allow multiple values to be returned from a method as tuples. A tuple is basically an anonymous class, and can be implemented in a language like Java by defining a class for objects returned by a method, or in C by defining a struct which is passed as a parameter and is updated by the function. But in many cases, exceptions are a much better way to separate error information from return values.
Exceptions transmit an object from the exception location to a handler in a scope surrounding it, or surrounding the point where the routine was called. The exception objects include information by the object type and class, and debugging information the data contained within that type. Exceptions by themselves don't indicate the error state, so that must be included as an attribute of the exception object, or the error state must be deduced from the debugging information (object type and data).
Java introduced the controversial notion of checked exceptions, which must either be caught or declared to be thrown by a method in order to compile, while unchecked (or runtime) exceptions behave like exceptions in other languages. The main cause of the controversy is that there has been no good definition of why there should be a difference and, as a result, no consistent strategy in the implementation in various libraries, including standard parts of the different Java runtime libraries.
In general, unchecked exceptions are meant for bugs, where an error indicates that the code is simply wrong and must be fixed (restriction and bug inconsistency errors). An example is a NullPointerException . Checked exceptions are for detected inconsistency and failure errors, where the program may have a strategy of handling the error. An example is an I/O error.
Transactional Operations
One strategy to handle errors is to make all operations transactional, so that if they fail, it's as if the operation was never tried. One way implement this is to define an "undo" operation for every change:
In this example, the functions are also transactional, and thus don't need to be rolled back if they fail. This can be done with nested if/else blocks, or with nested try/catch blocks. If the "undo" operations themselves have errors, the result looks more like this:
One way of dealing with this is to modify a copy of the program state, and if all operations succeed, only then commit the changes. The commit may fail, but this isolates the possible state changing errors to one point, and is similar how databases implement transactions. Another way to implement transactional operations is to make a copy of before any state is changed, and use that copy to restore the expected state, in case of an error.
In summary, having a clear taxonomy of error conditions that code may encounter helps develop better strategies for dealing with, and possibly recovering from, those errors.
|