The Recurse of Factorial

Deconstructing factorial implementations in unnecessary detail, from iterative to recursive, from control structure to data structure, from one language to another

Kevlin Henney
12 min read · Sep 6, 2021

Factorial played a walk-on part in a previous blog post.

I’m not saying the reason many programmers are hesitant about recursion is because factorial is their introductory example… but I’m sure it doesn’t help.

Now it gets to star in its own mini-series of articles. I want to examine what recursive and non-recursive implementations can teach us about programming style, paradigms, language design and constructs.

For the record — and for brevity — one thing that is not the focus of this mini-series is error handling. Behaviour outside the valid domain and range of any of the factorial implementations is not defined. To clarify, this means that garbage values, infinite loops, exceptions, crashes, etc., are all valid realisations of undefined behaviour — caveat programmer!

Breaking down the numbers

With the factorial of a positive integer n, n!, defined as n×(n-1)! and the factorial of 0, 0!, defined as 1, factorial is commonly used as the gateway to recursion:

int factorial(int n)
{
    if (n == 0)
        return 1;
    else
        return n * factorial(n - 1);
}

It is, however, hard to get hooked when you compare it to the loop-based version in C:

int factorial(int n)
{
    int product = 1;
    while (n != 0)
        product *= n--;
    return product;
}

The imperative version has more visible moving parts but has a more compact visual footprint than the recursive version. If you’ve never really seen or taken to recursion before, this is not likely to strike you either as significantly elegant or significantly compelling.

Unless you’re already comfortable with the ternary operator in C-like languages, the following compaction is also not likely to sway you:

int factorial(int n)
{
    return n == 0 ? 1 : n * factorial(n - 1);
}

Of course, not all imperative languages suffer from having a conditional expression that is so syntactically unrelated to the conditional statement. For example, Python would render this recursive expression form as

def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)

Python favours familiar keywords over symbols for its conditional expression, although the ordering of consequence–condition–alternative differs from the more usual condition–consequence–alternative ordering of the if statement.

Languages like Rust and Algol 68 sidestep the problem by not introducing a hard distinction between statement and expression. For example, in Rust we could write factorial as follows:

fn factorial(n: i64) -> i64 {
    if n == 0 {
        1
    } else {
        n * factorial(n - 1)
    }
}

The last expression on the path of control of a function becomes the result of that function. If we wanted to make the return of control and value explicit we could rewrite it as follows, emphasising statements that return values as a consequence of a choice:

fn factorial(n: i64) -> i64 {
    if n == 0 {
        return 1;
    } else {
        return n * factorial(n - 1);
    }
}

Or as follows, emphasising returning an expression that involves a choice:

fn factorial(n: i64) -> i64 {
    return if n == 0 { 1 } else { n * factorial(n - 1) };
}

But, of course, sidestepping the question of both recursion and imperative mechanics, the simpler option is to express factorial directly in its product form, i.e., n! = Πᵢ₌₁ⁿ i. (If you’re unfamiliar with the notation, Π is capital pi, i.e., p for product, and this notation is related to Σ, which is capital sigma, i.e., s for summation.) Python has the prod function:

from math import prod

def factorial(n):
    return prod(range(1, n + 1))

A function that calculates the product of a sequence exists as standard in many languages and environments, from Haskell to Excel. Of course, the same is also true of factorial, which, like prod, can be found in Python’s standard math module. We won’t be exploring the mix of approaches used in the implementation of math.factorial, but the function does serve as a useful test oracle.
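As a minimal sketch of the oracle in use, assuming whichever factorial definition is currently in scope under the name factorial, a quick check might look like this:

import math

# sanity-check the factorial under test against math.factorial as the oracle
for n in range(10):
    assert factorial(n) == math.factorial(n)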

The goal here is to understand the different trade-offs and nudges that different languages and constraints make when we consider writing factorial, and what we can learn by turning some of the insights inside out.

Unstacking factorial

It is easy to assume the choice between recursion and looping is actually a choice the programmer can always make… but not all languages support both recursion and loops.

Conceived around the same time, Fortran and Lisp embodied distinctions that can still be found in some languages today. Lisp 1.5 had recursion but not loops (although later Lisps added loops, and Common Lisp retains a goto construct from Lisp 1.5). Fortran did not officially support recursion until Fortran 90, and even then an explicit recursive keyword is required to mark that function state — parameters and local variables — is held dynamically on a call stack rather than statically.

Another language lacking support for call re-entrancy is occam, a now defunct language for parallel computing. Everything in occam is allocated statically, giving the call stack a minimal footprint to enable fast context-switching. Although a strongly procedural language, occam is a stickler for side-effect-free functions:

INT FUNCTION factorial(VAL INT n)
  INT product :
  VALOF
    SEQ
      product := 1
      SEQ i = 1 FOR n
        product := product * i
    RESULT product
:

Functions in occam make many functional programming languages look almost lax and libertine when it comes to matters of state. The requirement for side-effect-free functions in occam is strictly enforced by disallowing calls to procedures, use of channels, creation of processes and access to any mutable state in outer scopes.

Another aspect of language design relevant to the discussion in this article is occam’s representation of the classic for loop. It uses a replicator, which, in the case of SEQ, replicates sequential composition, with each replicant uniquely indexed. The assignment statement is effectively replicated n times, each replicant receiving a different value of i, and then executed in sequence. The same replication applies to other control structures, such as PAR, which composes statements for parallel execution. In occam, statements are considered to be processes, which leads to the following observation:

A bounded loop may be thought of as being an array of processes.

In other words, bounded loops can be considered the time or control-flow version of arrays, and arrays are the spatial or structural counterpart of bounded loops. There is a connection of ideas here that is deeper and more general than just arrays. It may also be familiar from another context: the common performance tuning technique of trading space for time, and vice versa. Need something to run faster? Use more space, e.g., caching. Need to use less space? Use more time, e.g., calculate on demand.
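To make the observation concrete outside occam, here is a loose sketch in Python (an illustration of the idea rather than a translation of the occam above, with the nested step and process helpers being my own naming) in which the body of the factorial loop becomes an array of parameterless “processes” that are then run in sequence:

def factorial(n):
    product = 1
    def step(i):
        def process():
            nonlocal product
            product *= i
        return process
    # the loop, as an array of processes...
    processes = [step(i) for i in range(1, n + 1)]
    # ...and the array, run as a loop
    for process in processes:
        process()
    return product

Run in sequence, the array behaves exactly as the loop it was unrolled from; occam’s PAR composes the same kind of replicants for parallel execution, which is where the array framing earns its keep.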

Restacking factorial

Erlang is a language that lacks any iterative control-flow construct:

  • It allows expressions to be organised for sequential execution, and actors — functions executed as processes — can be spawned for concurrent execution.
  • It offers short-circuiting Boolean operators.
  • It has a number of selection-based constructs: if expressions, which are like Dijkstra’s guarded commands but evaluated top to bottom until a true condition is found; case expressions, which are a generalised pattern-matching version of case statements; and receive expressions, which select on an actor’s received messages.

But no loops.

All repetition in Erlang is achieved through recursion. When the last expression of a function is a recursive call to itself, such a tail call can be eliminated. Tail recursion recycles the current call frame rather than consuming more stack, which means that a process can run forever without blowing the call stack. Tail-call elimination is often referred to as an optimisation, which makes it sound optional — which, in many languages, it is. In recursion-based languages, however, tail-call elimination is a necessity — there is nothing optional about it.

If we think in terms of if when coding factorial, we get the following:

factorial(N) ->
    if
        N == 0 -> 1;
        true -> N * factorial(N - 1)
    end.

This is a conventional — what we have seen before — and compact expression of a recursive factorial. But it does not make best use of the language. Rather than thinking about a function as a single entity into which execution flows, whereupon decisions are made, a function in Erlang is defined as one or more clauses, each of which can express the function in a different situation. For example, here is a definition that expresses factorial as two clauses, the first of which applies to the zero case and the second of which is the general case:

factorial(N) when N == 0 -> 1;
factorial(N) -> N * factorial(N - 1).

The guard clause in the first case holds true for factorial(0) but not for other factorial(N). An even more direct expression of this idea relies not on guards but on pattern matching, which in Erlang is based on Prolog unification:

factorial(0) -> 1;
factorial(N) -> N * factorial(N - 1).

Such a definition of factorial feels uncomplicated and unsurprising, declaratively and quite literally stating what factorial is rather than teasing it apart into a how-to guide of execution. It demonstrates a close fit between language design and intended paradigm of use.

This example in Erlang — and other functional languages such as Haskell — shows that factorial is not always an example to be wheeled out to torture programmers with recursion. But it also demonstrates the necessity of alignment of language and method of expression — sometimes factorial really is just an example wheeled out to torture programmers with needless recursion.

From recursion to loops

Let’s return to Python and a statement-based recursive implementation:

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

I want to transform and reveal a particular aspect of recursive implementation that is often overlooked. Let’s start by inverting the condition:

def factorial(n):
    if n != 0:
        return n * factorial(n - 1)
    else:
        return 1

The reason for doing this and some of the longer, less idiomatic constructs in what follows is to highlight some points of similarity across ideas and approaches normally perceived as different.

The likeness of form enables the reader to recognize more readily the likeness of content and function.

— William Strunk, Jr., and E. B. White

Although it may not look like it, there is a data structure at work here: the humble stack. It’s hidden from view, but without it there is no recursion: it’s the call stack. This also tells us that whenever we have a recursive algorithm we can always make it non-recursive by introducing an explicit stack and a loop. In the following code, we use a list with a last-in-first-out discipline:

def factorial(n):
    # recursive calls pushing onto stack
    factors = []
    while n != 0:
        factors.append(n)
        n = n - 1
    # recursive returns popping from stack
    result = 1
    while factors:
        n = factors.pop()
        result = result * n
    return result

For factorial(5), printing factors after the first while loop shows

[5, 4, 3, 2, 1]

Of course, because of the symmetric nature of multiplication and the simple arithmetic involved, we gain nothing by making the intermediate data structure explicit — this code is clearly more complex than either the direct recursive or simple loop-based forms.

While all recursive algorithms can indeed be made non-recursive by using loops and a stack, it is also true that not all recursive algorithms need a stack to be flattened. Sometimes a simpler form of state than a stack, such as an accumulating variable, will do the job, as is the case with a more conventional looped implementation of factorial:

def factorial(n):
    product = 1
    for i in range(1, n + 1):
        product *= i
    return product

In other cases, such as those that use tail recursion, there may be no need for any additional state at all. For example, replacing a recursing function with a label and a tail call with a goto (in languages that support it) highlights the closeness of this translation. If there are arguments to pass, this can be expressed by updating the current values; if there are no arguments to pass, the translation is direct.

Note that the recursive implementations of factorial shown so far are not examples of tail recursion: although factorial is called within the last expression of the function, this call is not the entirety of the last expression of the function. To implement factorial using a tail call requires a little more choreography, whereupon the loop state being abstracted becomes more obvious:

def factorial(n):
    def loop(i, product):
        if i == 0:
            return product
        else:
            return loop(i - 1, i * product)
    return loop(n, 1)
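For comparison, and as a sketch of the label-and-goto translation mentioned above, the tail call in loop can be replaced by rebinding the parameters and jumping back to the top of a while loop:

def factorial(n):
    # i and product play the role of loop's parameters
    i, product = n, 1
    while True:
        if i == 0:
            return product
        # the tail call loop(i - 1, i * product), as a rebinding plus a jump
        i, product = i - 1, i * product

The simultaneous assignment stands in for passing both arguments at once in the tail call.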

Data is normally considered the repository of state in a program, but control flow has both visible and implied state, as can be seen when trying to convert between one form of flow and another. Where code is expressed with a specified time ordering, such as composing statements in sequence or a loop ticking through its repetitions, there is also state. This stateful aspect is a subtlety often overlooked when developers are deciding between different coding styles: state is not always locked up in explicitly named variables.
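One small aside, not from the discussion so far, that makes such implied state tangible in Python is a generator: the position at which the loop below is suspended is state, but it is not held in any variable the caller can name.

def factorials():
    # the resumption point of this loop is state held in the suspended frame
    product, i = 1, 0
    while True:
        yield product
        i += 1
        product *= i

fs = factorials()
print([next(fs) for _ in range(6)])   # [1, 1, 2, 6, 24, 120]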

Control flow as data structure

The observation about control and data can be taken further to unify a dichotomy: many data structures can be considered frozen algorithms; many algorithms can be considered thawed data structures. For example, a binary tree can be considered the solid form of a binary search, while a binary search can be considered a fluid animation of a binary tree.
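As a rough sketch of that claim in Python (the nested-tuple layout and the names here are mine, purely for illustration), the frozen form is a binary tree built as tuples, and the fluid form is a binary search over the sorted list the tree freezes in place:

# a binary tree as nested (left, value, right) tuples, with () as the empty tree
tree = ((((), 1, ()), 3, ()), 4, ((), 7, ()))   # one freezing of the sorted list [1, 3, 4, 7]

def contains(tree, sought):
    # walk the frozen decisions
    while tree:
        left, value, right = tree
        if sought == value:
            return True
        tree = left if sought < value else right
    return False

def binary_search(values, sought):
    # replay the same kind of decisions over the sorted list
    low, high = 0, len(values)
    while low < high:
        mid = (low + high) // 2
        if values[mid] == sought:
            return True
        elif sought < values[mid]:
            high = mid
        else:
            low = mid + 1
    return False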

We can make the recursion in the call even more explicit as recursion in data structure:

def factorial(n):
    factors = ()
    while n != 0:
        factors = (n, factors)
        n = n - 1
    result = 1
    while factors:
        (n, factors) = factors
        result = result * n
    return result

Tuples have different manifestations and interpretations depending on language. In Python a tuple is an immutable list. Because the goal here is to emphasise the data structuring, I have made tuple construction and unpacking explicit in this code — the parentheses around the (n, factors) pair are optional in both cases.

For factorial(5), printing factors after the first while loop would show

(1, (2, (3, (4, (5, ())))))

A little depth and displacement perhaps helps to make the structural and temporal connection more obvious:

(1,
 (2,
  (3,
   (4,
    (5,
     ()
    )
   )
  )
 )
)

A list expressed as a pair in which the second value is also a pair in which the second value is also a pair, etc., is a recursive data structure. It is also the bedrock of the Lisp — LISt Processor — language. Understanding that the lists in question are singly linked — and therefore have a bias towards operations on the top or initial element — it can also be considered a stack-processing language. Lisp also unifies code and data in other ways:

The fundamental conceit of the Lisp programming language is that everything can be represented and manipulated as lists, including code.

To construct a list in Lisp, i.e., cons, you create a pair — a tuple of two — from a head value and an existing list. This effectively prepends (or pushes, if you want the stack framing) the head value onto the tail. A corresponding cons function in Python simply wraps the normal tuple construction:

def cons(head, tail):
    return (head, tail)

A special value, nil, represents both an empty list and falsehood.

nil = ()

All non-empty lists are terminated with an empty list, which is how you bootstrap creating a list — a cons with nil as the second in the pair.
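For example, using the cons and nil above, a three-element list, or stack, builds up from the empty list outwards:

factors = cons(3, cons(2, cons(1, nil)))   # (3, (2, (1, ())))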

Deconstructing a list involves extracting the head (the first element) and the tail (the second element). These operations are respectively known as car and cdr, which, in an otherwise well-considered naming system, reflect the IBM 704 architecture LISP was first implemented on: ‘contents of the address part of register’ and ‘contents of the decrement part of the register’.

def car(whole):
    (head, tail) = whole
    return head

def cdr(whole):
    (head, tail) = whole
    return tail

We can now rewrite the tuple-based stack version of factorial with an affected lisp:

def factorial(n):
    factors = nil
    while n != 0:
        factors = cons(n, factors)
        n = n - 1
    result = 1
    while factors:
        n = car(factors)
        factors = cdr(factors)
        result = result * n
    return result

This deconstruction is not something anyone would be happy to see in production code, but that’s not what it’s for.

deconstruction

The act of breaking something down into its separate parts in order to understand its meaning, especially when this is different from how it was previously understood.

Spacetime continuum

There’s an idea we keep returning to: space and time in code are, if not fungible, often translatable into one another. This is perhaps most obvious when we make a space or time trade-off, such as an optimisation, but we can see it more fundamentally as a way of thinking about how we structure code in general and how we divide up paradigms and designs.

Recursion is a form of flow expressed through and expressible as a stack. Esoteric as it is, sleep sort is an example of taking a structural concept — a sorted array — and turning it into a temporal progression. Process models of execution, such as the actor model or CSP, take a simple functional abstraction, such as a function or a procedure, that is normally only thought of in terms of call and return, and stretch it out over time, its transient local variables becoming encapsulated process state.
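For the curious, here is sleep sort as a toy sketch in Python (the function name and timing factor are my own), assuming small non-negative integers and a forgiving scheduler, and very much not intended for production use:

import threading
import time

def sleep_sort(values):
    result = []
    def emit(value):
        # each value sleeps in proportion to itself, so position in the
        # result becomes position in time
        time.sleep(value * 0.1)
        result.append(value)
    threads = [threading.Thread(target=emit, args=(value,)) for value in values]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result

print(sleep_sort([3, 1, 4, 1, 5]))   # [1, 1, 3, 4, 5], scheduler permitting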

To see a world in a grain of sand is to understand why factorial is an interesting example to repeatedly dive into.
