Signalling failures on functions that return values

Some functions will always return a valid value. Unfortunately, there are plenty of others that might fail – maybe the input data wasn't valid!

When the errors aren't critical and don't justify an assert, we need a way to handle them and to signal unambiguously to the caller that something went wrong. This is known as the semipredicate problem.

How can we deal with it?


The naive way

The first thing that probably comes to mind is encoding the error as a specific value or range of values.

int FooBar()
{
    ...
    if (!success) return -1;
    ...
}

This approach has a major problem: the state of failure must be represented with the same type returned by the function.

Now, for some types, it is easy to represent the errors.

For integers, -1 is the de facto value to signal that the function failed and returned an invalid value. The caller then needs a sanity check to see if the returned value is non-negative. But what happens if the return type is unsigned instead? What value or range of values can we sacrifice?

For functions returning containers, it is common to return a new, empty instance – an empty list or map, for example. But even then, how can we tell whether it is empty because of an error or because "empty" is a valid output value?

For more complex, user-defined structures, things begin to get murkier.
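
To make the ambiguity concrete, here is a minimal C++ sketch. ParseIntOrMinusOne is a hypothetical helper invented for illustration; it only relies on std::stoi from the standard library and uses -1 as its error value.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses a decimal integer and returns -1 on failure.
int ParseIntOrMinusOne(const std::string& text)
{
    try {
        std::size_t consumed = 0;
        const int value = std::stoi(text, &consumed);
        // Treat trailing garbage such as "12abc" as a failure.
        return consumed == text.size() ? value : -1;
    } catch (const std::exception&) {
        // std::stoi throws on invalid or out-of-range input.
        return -1;
    }
}
```

With this helper, ParseIntOrMinusOne("-1") and ParseIntOrMinusOne("abc") both return -1, so the caller cannot tell a valid result from a failure.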


Return a bool, modify a reference of T

There is a better way to handle error states. Programming languages already have a binary type that can be used to represent a success or a failure – booleans. Why not use them?

We can change the function to return a boolean and take a non-const parameter, by reference, that will contain the result of the function if it succeeds. This is particularly handy since we can avoid writing to the value when not necessary – an operation that could be potentially expensive.

bool FooBar(int& out_result)
{
    ...
    if (success) out_result = ...;
    return success;
}

The issue with this approach is that the function signature now looks dirtier and the name of the function might no longer match the fact that the returned value is a boolean. If a function is called GetItemIndex and it returns something other than an index, then it becomes less readable.
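
As a rough sketch of this pattern, the hypothetical function below searches a vector and reports the position through an output parameter. The name TryGetItemIndex is an assumption of mine: a "Try" prefix is one common way to make the name agree with the boolean return.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: looks for `wanted` in `items`.
// On success it writes the position to `out_index` and returns true.
// On failure it returns false and leaves `out_index` untouched.
bool TryGetItemIndex(const std::vector<int>& items, int wanted, std::size_t& out_index)
{
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (items[i] == wanted) {
            out_index = i;
            return true;
        }
    }
    return false;
}
```

The caller has to declare the output variable first, and then check the returned boolean before trusting it.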


Return a tuple of bool and T

Another alternative is to return both the value and the success of the function, bundled together.

In languages like C++, that would look like this:

#include <tuple>

std::tuple<bool, int> FooBar()
{
    ...
    return success ? std::make_tuple(true, ...) : std::make_tuple(false, ...);
}

For languages that have native support for tuples, such as Rust or Python, this solution is less verbose, as we can build tuples using ( , ).

This makes the signature a bit cleaner, but we are required to create an instance of T even if the operation failed.
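
In standard C++, the bundle could be a std::tuple. The sketch below assumes a hypothetical FindFirstEven function; note the placeholder 0 that has to be built on the failure path.

```cpp
#include <tuple>
#include <vector>

// Hypothetical helper: returns (true, n) for the first even number n,
// or (false, 0) when the list contains no even number.
std::tuple<bool, int> FindFirstEven(const std::vector<int>& numbers)
{
    for (int n : numbers) {
        if (n % 2 == 0) {
            return std::make_tuple(true, n);
        }
    }
    // A dummy int still has to be created, even though it means nothing.
    return std::make_tuple(false, 0);
}
```

Since C++17, the caller can unpack the result with a structured binding, for example auto [found, value] = FindFirstEven(numbers); and should only read value when found is true.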


Return an "optional" value

The name might vary but most languages feature, one way or another, a type representing "optional" data, where the real data inside of them might not exist. They also provide methods to check if the data is set or not.

Optional<T> FooBar()
{
    // If it failed, Optional<T>() will return an empty box.
    return success ? Optional<T>(...) : Optional<T>();
}

Optional structures use the concept of Boxing, since they can be thought of as boxes that may or may not contain something. Empty boxes likely contain garbage or have their memory initialised to 0.
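
In C++17 this type is std::optional. As a minimal sketch, here is a hypothetical GetItemIndex whose name now matches what it returns: an index that may or may not be there.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: returns the position of `wanted` in `items`,
// or an empty optional when it is not present.
std::optional<std::size_t> GetItemIndex(const std::vector<int>& items, int wanted)
{
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (items[i] == wanted) {
            return i;
        }
    }
    return std::nullopt; // the "empty box"
}
```

The caller checks the box with has_value() (or by testing it in an if) and reads the content with value() or the * operator.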

Pointers as optionals

In languages like C++, which feature explicit pointers, something similar can be done by returning a pointer to the data. This is wonderful, since pointers have a built-in way to represent no data: nullptr.

This method has one huge limitation, though. It should only be used in scenarios where it makes sense to return a pointer. If the returned value is a local variable created inside the function, then returning its address and accessing it later will cause problems, since the value is destroyed once the function returns.
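
A case where returning a pointer does make sense is a lookup into data that outlives the call. The sketch below assumes a hypothetical FindScore function over a map owned by the caller.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical lookup: returns a pointer to the stored score, or nullptr
// when `name` is absent. The pointee lives inside `scores`, not inside
// this function, so it is still alive after the function returns.
const int* FindScore(const std::unordered_map<std::string, int>& scores,
                     const std::string& name)
{
    const auto it = scores.find(name);
    return it != scores.end() ? &it->second : nullptr;
}
```

The pointer is only safe to use while the map is alive and the entry has not been erased, so this trades the problem of local values for a lifetime the caller must keep in mind.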


Return a compound value

Following the Boxing concept, another solution would be to use a special type that can represent data in different states. In C and C++, unions could be used for this purpose.
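
In C++17, a type-safe alternative to a raw union is std::variant. The sketch below swaps the plain union for it; the CheckedResult alias and the CheckPercentage function are hypothetical names chosen for illustration.

```cpp
#include <string>
#include <variant>

// Hypothetical alias: holds either the valid value (int)
// or a message explaining why the operation failed (std::string).
using CheckedResult = std::variant<int, std::string>;

// Hypothetical helper: accepts values in [0, 100] and rejects the rest.
CheckedResult CheckPercentage(int raw)
{
    if (raw < 0 || raw > 100) {
        return std::string("value out of range: ") + std::to_string(raw);
    }
    return raw;
}
```

The caller can ask which state the result is in with std::holds_alternative<int>(result) and then read it with std::get<int>(result) or std::get<std::string>(result).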

I particularly love how this can be accomplished in Rust, using enums. For example, std::result::Result is an enum with two variants:

  • Ok(T): containing the actual value in T.
  • Err(E): representing an error and containing the error value in E, such as a message.

fn baz(...) -> ... {
    ...
    match foobar(...) {
        Ok(good_result) => { ... },
        Err(why_failed) => println!("{}", why_failed),
    }
    ...
}


For dynamically-typed languages...

Things get easier in dynamic languages. Since variables don't have a static type, functions can return any kind of data. Errors could be represented as any type other than T.

Additionally, these languages usually provide a way to represent a null value or reference. For example, Python has None, which represents no valid value.

def foobar():
    return ... if success else None

def baz():
    ...
    result = foobar()
    if result is not None:
        ...

All of the solutions have their own pros and cons. As usual, the best approach is the one that best suits the problem.

A good rule of thumb is to use common sense and consider whether the function is part of a public API. If it will be used by someone else, or called from many different places, then the best practice is to make the code as readable and maintainable as possible.

Which version do you prefer? Can you think of another way to handle this?