Lifting the Recurse of Factorial

Recursion? Iteration? Or a third (or fourth) way?

Kevlin Henney
12 min read · Sep 17, 2021

It has been said that if you want to create something unique, something likely never seen in the universe, shuffle a pack of cards. The probability that the particular arrangement of cards has ever existed before is vanishingly small. The number of permutations of a 52-card deck is 52×51×50×…×2×1, which we can write more briefly as ∏ᵢ₌₁⁵² i. Another way of writing this is 52!, which is the factorial of 52.

Factorial often finds itself as recursion’s poster child. L Peter Deutsch felt that

To iterate is human, to recurse divine.

Cool as it sounds, it’s not obvious this is always true. Divinity, if there can be such a thing in code, is a more subtle quality achieved by many different paths. Defining something in terms of itself is profound and important, but that doesn’t mean it’s always either necessary or appropriate. Simplicity, elegance and commodity are as much a function of context as they are a context of function.

In the case of factorial, the real lesson is seeing how something as seemingly simple as factorial — and therefore any code — can manifest in so many different ways, inviting so many considerations and techniques, and revealing what the limits and styles of languages can teach us about data structures, control flow and paradigms. Continuing that journey, this time we unask the question of iteration versus recursion.

Thanks for the memoize

Let’s start by revisiting a recursive Python implementation:

def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)

Except for the base case of 0, each call will involve multiple invocations of factorial, executing and multiplying all the factorials from n down to 0. Each call of factorial will recalculate many or all of the factorial values previously visited. Rather than recalculate the values, we can memoize them.
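To get a feel for the scale of that repetition, we can instrument the function with a call counter — the counter and the driving loop below are my illustrative scaffolding, not part of the original:

```python
calls = 0

def factorial(n):
    global calls
    calls += 1
    return 1 if n == 0 else n * factorial(n - 1)

# Computing every factorial from 0! to 20! in turn...
for n in range(21):
    factorial(n)

# ...takes 1 + 2 + ... + 21 = 231 calls, most of them recalculating
# values already computed by earlier iterations.
print(calls)  # 231
```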

memoize

To store the result of a computation so that it can be subsequently retrieved without repeating the computation.

Let’s cache in on the previous calls:

from functools import cache

@cache
def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)

The cache decorator is

A thin wrapper around a dictionary lookup for the function arguments.

It’s not that software development is stuck on repeat but… when I looked up the link for cache I noticed the example in the documentation was factorial.
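Because cache is a thin wrapper around functools.lru_cache, the decorated function also exposes cache_info(), which lets us watch the memoization at work:

```python
from functools import cache

@cache
def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)

factorial(10)
print(factorial.cache_info().misses)  # 11: one calculation each for 0..10

factorial(10)
print(factorial.cache_info().hits)    # 1: the repeat call is a single lookup
```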

We can, however, do better. Because, by definition, a general-purpose cache is not specific to our situation, it is inevitably less optimal than one that takes advantage of the particular contours of our problem and solution spaces. Our function takes a single argument that is a non-negative integer. Different invocations of a function can be calculated in terms of previous invocations — they are not independent. Any call to factorial(n) either will rely on an existing cached value for n or will populate all the entries from the highest cached value up to n.

The type, the domain and the dependency mean we can replace the dictionary-based approach with a list-based approach:

factorials = [1]

def factorial(n):
    while len(factorials) <= n:
        factorials.append(factorials[-1] * len(factorials))
    return factorials[n]

Memoization is expressed in the factorials list. If a requested result is there, it is looked up directly by index. If a requested result is not yet there, i.e., it would index beyond the existing length of the factorials list, the list is extended until it reaches the right length. A loop calculates and appends the next factorial based on the last value in the list.

In common with many other JIT and cache techniques, such lazy evaluation is not thread-safe out of the box: the memoized function is stateful rather than stateless like the original function. In truth, if you’re working in Python, this is not really a problem, because you should be steering well clear of threads:

Python’s global interpreter lock (GIL) and casually modifiable underlying object model make it almost uniquely unsuited to pre-emptive threading.
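Should you nonetheless need the list-based memo to survive concurrent callers, the textbook fix is a lock around the extension loop — a sketch, where the lock and its scope are my addition rather than part of the original design:

```python
from threading import Lock

factorials = [1]
factorials_lock = Lock()  # guards extension of the shared list

def factorial(n):
    with factorials_lock:
        while len(factorials) <= n:
            factorials.append(factorials[-1] * len(factorials))
    return factorials[n]
```

The final read can sit outside the lock because entries, once appended, are never modified.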

Factorial unrolled

We may find that, in practice, our optimisation won’t demand much space — factorial has the kind of growth that makes native integer representations whimper. We can easily get a sense of how rapidly factorial grows:

>>> for n in range(22): print(f'{n:2}! = {factorial(n)}')
 0! = 1
 1! = 1
 2! = 2
 3! = 6
 4! = 24
 5! = 120
 6! = 720
 7! = 5040
 8! = 40320
 9! = 362880
10! = 3628800
11! = 39916800
12! = 479001600
13! = 6227020800
14! = 87178291200
15! = 1307674368000
16! = 20922789888000
17! = 355687428096000
18! = 6402373705728000
19! = 121645100408832000
20! = 2432902008176640000
21! = 51090942171709440000

Here are some signposts on factorial’s combinatorially explosive journey: 7! is the highest value factorial that can be held in a signed 16-bit integer (8! still squeezes into an unsigned one); 12! is the highest for a 32-bit integer; 20! is the highest for a 64-bit integer. As shown, Python’s integers are not so constrained.
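With Python’s unbounded integers to hand, those signposts are easy to verify against signed two’s-complement limits — highest_fitting is a throwaway helper of my own devising:

```python
from math import factorial

def highest_fitting(bits):
    # highest n whose factorial fits a signed two's-complement
    # integer of the given bit width
    limit = 2**(bits - 1) - 1
    n = 0
    while factorial(n + 1) <= limit:
        n += 1
    return n

print(highest_fitting(16))  # 7
print(highest_fitting(32))  # 12
print(highest_fitting(64))  # 20
```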

But if our integers have a fixed bit width, as they do in C, there’s no need to optimise with runtime caching or, indeed, perform any runtime calculation at all: predefine a table that already holds all the values that could ever be used. All the values are determined at development time rather than runtime; every runtime evaluation becomes a constant-time lookup.

static const long long factorials[] =
{
    1,
    1,
    2,
    6,
    24,
    120,
    720,
    5040,
    40320,
    362880,
    3628800,
    39916800,
    479001600,
    6227020800,
    87178291200,
    1307674368000,
    20922789888000,
    355687428096000,
    6402373705728000,
    121645100408832000,
    2432902008176640000,
};
long long factorial(int n)
{
    return factorials[n];
}

Quite literally, we have lifted a structure out of the flow of execution, unfolding space from time. We have moved from lazy evaluation to eager evaluation, eliminated any and all threading constraints and, by unrolling an iteration into an array, transformed control flow into data structure:

This family of approaches, from lookup tables to table-driven code, finds expression across many paradigms and contexts but is often overlooked.

Precalculating all the results in advance and putting them in a table for runtime lookup is a general technique that can be employed for any function with a small domain (here, just 21 values), not just factorial.
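In Python, for example, the same unrolling can be done once at import time rather than spelled out by hand — a sketch, with a tuple keeping the precalculated values immutable:

```python
from math import prod

# every factorial that fits a 64-bit integer, calculated once at import time
FACTORIALS = tuple(prod(range(1, n + 1)) for n in range(21))

def factorial(n):
    return FACTORIALS[n]
```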

Returning to the specifics of the code above, the long tail of history and the wide range of platforms has meant that while some systems map long to a 64-bit integer (LP64), others keep long at 32 bits and map long long to 64-bit integers (LLP64). This question of 64-bit mapping extends beyond native platform mapping with, for example, a 64-bit long mapping in .NET and the JVM and a 64-bit long long mapping in OMG IDL (MIDL chose hyper for this mapping).

For the sake of portability, long long is the preferred option in C when you’re hitting that part of the integer domain. I’ve made a further assumption that long long is exactly 64 bits rather than simply being at least 64 bits. In practice this is a portable assumption but, if you want to underscore the dependency more explicitly, you can use int64_t rather than long long.

Generations

We can precalculate the values by hand or by code and copy them into the table… or just copy them from the code above (go ahead, those numbers are not subject to copyright)… or use code generation:

print('static const long long factorials[] =\n{')
for i in range(21):
    print(f'    {factorial(i)},')
print('};')

Pipe the output into a file named factorials.h and include it as a header:

#include "factorials.h"

long long factorial(int n)
{
    return factorials[n];
}

Or we could just generate all of the C code:

print('''\
static const long long factorials[] =
{''')
for i in range(21):
    print(f'    {factorial(i)},')
print('''\
};
long long factorial(int n)
{
    return factorials[n];
}''')

As an aside, when it comes to code generation, don’t skimp on the formatting or naming. There’s no reason generated code has to look bad — indeed, it’s one time you have total control over making it look good! You’re not saving time by not caring about it. The idea that the generated code is not for human consumption and will only ever be seen by the compiler is one of those fantasy bubbles that gets burst after you’ve worked with generated code for any length of time (say, five minutes or so…).

It may be that no one (yourself included) will ever thank you for caring about the readability, but that future readers (yourself included) won’t take your name in vain one day when reading through the code is the silent reward you’re aiming for. It’s not presence of praise that matters: it’s absence of damnation.

Getting meta

If you’re unhappy with magical constants appearing in your code, no matter what their provenance, another approach is to unroll the calculation for each table entry.

static const long long factorials[] =
{
    1LL,
    1LL,
    2LL*1,
    3LL*2*1,
    4LL*3*2*1,
    5LL*4*3*2*1,
    6LL*5*4*3*2*1,
    7LL*6*5*4*3*2*1,
    8LL*7*6*5*4*3*2*1,
    9LL*8*7*6*5*4*3*2*1,
    10LL*9*8*7*6*5*4*3*2*1,
    11LL*10*9*8*7*6*5*4*3*2*1,
    12LL*11*10*9*8*7*6*5*4*3*2*1,
    13LL*12*11*10*9*8*7*6*5*4*3*2*1,
    14LL*13*12*11*10*9*8*7*6*5*4*3*2*1,
    15LL*14*13*12*11*10*9*8*7*6*5*4*3*2*1,
    16LL*15*14*13*12*11*10*9*8*7*6*5*4*3*2*1,
    17LL*16*15*14*13*12*11*10*9*8*7*6*5*4*3*2*1,
    18LL*17*16*15*14*13*12*11*10*9*8*7*6*5*4*3*2*1,
    19LL*18*17*16*15*14*13*12*11*10*9*8*7*6*5*4*3*2*1,
    20LL*19*18*17*16*15*14*13*12*11*10*9*8*7*6*5*4*3*2*1,
};
long long factorial(int n)
{
    return factorials[n];
}

Pretty… tedious. And error prone. (Confession: for those reasons, I chose to generate the table entries above rather than cycle through a copy–paste–tweak process. Note, too, the LL suffixes: a plain L would give the constants type long, which on LLP64 platforms is only 32 bits wide — the multiplications would overflow long before reaching 20!.)

Each entry shows the long-hand (or long-long-hand…) calculation. In this case, a double dose of unrolling may be more of an overdose than a preferred solution. If we switch from C to C++, we can have both compile-time calculation and functional abstraction.

Once upon a time, abstracting compile-time computation in C++ necessarily meant resorting to the impenetrability of its accidental template metaprogramming capability. Recent C++ standards have made the language’s compile-time expressiveness friendlier and more intentional.

#include <array>

namespace factorials
{
    constexpr long long calculate(int n)
    {
        return n == 0 ? 1 : n * calculate(n - 1);
    }
    constexpr std::array calculated
    {
        calculate(0),
        calculate(1),
        calculate(2),
        calculate(3),
        calculate(4),
        calculate(5),
        calculate(6),
        calculate(7),
        calculate(8),
        calculate(9),
        calculate(10),
        calculate(11),
        calculate(12),
        calculate(13),
        calculate(14),
        calculate(15),
        calculate(16),
        calculate(17),
        calculate(18),
        calculate(19),
        calculate(20),
    };
}
constexpr long long factorial(int n)
{
    return factorials::calculated[n];
}

Here, the working has been placed out of the way in a namespace of its own, factorials, leaving the function of interest, factorial, as before. The working involves the separation of an actual function to calculate the factorial from the table of entries that hold the values. Both the function and the table are qualified with constexpr, which means they can be evaluated at compile time. For data, this means that it is also const. For a function, it means that if its arguments are compile-time constants, the function can be evaluated at compile time to produce a compile-time constant. This could be further constrained by defining calculate as consteval, making it an immediate function that can only be executed at compile time.

Transparent maps

Taking a slightly different path, let’s implement factorial in Ada to highlight another language-related consideration. Here’s the iterative version:

function Factorial(N: Integer) return Long_Integer is
    Result: Long_Integer := 1;
begin
    for I in 1..N loop
        Result := Result * Long_Integer(I);
    end loop;
    return Result;
end;

The introduction of expression functions and conditional expressions in Ada 2012 makes the recursive implementation syntactically lighter than it would otherwise have been:

function Factorial(N: Integer) return Long_Integer is
    (if N = 0 then 1 else Long_Integer(N) * Factorial(N - 1));

Note that when integers overflow in Ada, an exception is raised rather than wrapping around. For both of the implementations above, the following loop blows up the last time round, with N = 21:

for N in 0..21 loop
    Put_Line(N'Image & "! = " & Factorial(N)'Image);
end loop;

As you might expect, Integer is a 32-bit integer and Long_Integer is a 64-bit integer. Ada requires even a widening conversion to be made explicit in this context. Converting the code from flow to lookup has the welcome side-effect of eliminating such conversions.

function Factorial(N: Integer) return Long_Integer is
    Factorials: constant array(0..20) of Long_Integer :=
    (
        1,
        1,
        2,
        6,
        24,
        120,
        720,
        5040,
        40320,
        362880,
        3628800,
        39916800,
        479001600,
        6227020800,
        87178291200,
        1307674368000,
        20922789888000,
        355687428096000,
        6402373705728000,
        121645100408832000,
        2432902008176640000
    );
begin
    return Factorials(N);
end;

We can hoist the constant array out of the function and define Factorial as an expression function:

Factorials: constant array(0..20) of Long_Integer :=
(
    1,
    1,
    2,
    6,
    24,
    120,
    720,
    5040,
    40320,
    362880,
    3628800,
    39916800,
    479001600,
    6227020800,
    87178291200,
    1307674368000,
    20922789888000,
    355687428096000,
    6402373705728000,
    121645100408832000,
    2432902008176640000
);
function Factorial(N: Integer) return Long_Integer is
    (Factorials(N));

There’s something worth noting here: Ada, in common with a few other languages (from Fortran to Scala), indexes arrays using rounded parentheses (i.e., Factorials(N)) rather than square brackets (i.e., Factorials[N]). From a mathematical perspective, this is both reasonable and quite attractive: a function represents a mapping from a domain to a range; an array also represents a mapping, with the domain considered to be the index and the range considered to be a stored value rather than, say, a formula. The mapping here is of a natural number to another natural number, i.e., factorial: ℕ₀ → ℕ.

This commonality suggests that it would make sense to use a uniform syntax to highlight the similarity and substitutability. Given the strong emphasis on mathematical thinking in functional programming, it is — curiously — an opportunity for referential transparency that functional languages do not in general seem to opt for.

But we’re looking at Ada where that option exists, so let’s use it:

Factorial: constant array(0..20) of Long_Integer :=
(
    1,
    1,
    2,
    6,
    24,
    120,
    720,
    5040,
    40320,
    362880,
    3628800,
    39916800,
    479001600,
    6227020800,
    87178291200,
    1307674368000,
    20922789888000,
    355687428096000,
    6402373705728000,
    121645100408832000,
    2432902008176640000
);

We have refactored the factorial function away: it is not a wrapper around a table lookup; it is a table lookup. The circle between algorithmic function and data structure is closed.
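Python insists on square brackets for indexing, so it can’t close the circle quite so seamlessly, but the spirit of the refactoring carries over: the name factorial can simply be bound to data — a sketch of the idea rather than a drop-in replacement for the earlier functions:

```python
from math import factorial as exact

# factorial is now a table, not a function: factorial[n] rather than factorial(n)
factorial = [exact(n) for n in range(21)]

print(factorial[20])  # 2432902008176640000
```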

A sterling approximation

So maybe you don’t have a convenient big integer type to hand and are limited to the complimentary two’s complement native integers and floating-point numbers.

Or perhaps you simply don’t have a need for that level of precision in your factorial — after all, factorial gets very big very quickly, and the differences between one ridiculously large number and the next ridiculously large number may be better captured in orders of magnitude than in significant figures.

An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.

John Tukey

Or perhaps you care about performance, both in time and space, in which case, hardware-backed floating-point numbers can edge out coded-up big integers.

Whatever the motivation, there’s no need to go in circles with loops, deep dive into recursion or draw up tables when you can take advantage of Stirling’s approximation, which is named for James Stirling, but was first stated by Abraham de Moivre.

Stigler’s law of eponymy states that no scientific discovery is named after its original discoverer. Theories, inventions and discoveries are often repeated, and are often named after the person or people who, for reasons of timing, popularity or cultural bias, may not be the person or people first responsible.

Factorial can be approximated by using n! ≈ (n/e)ⁿ√(2πn). The value of the approximation approaches the actual factorial value as n tends to infinity (which you can be quite confident of never reaching, regardless of hardware).

from math import e, pi, sqrt

def factorialish(n):
    return (n / e)**n * sqrt(2 * pi * n)

In its vanilla form, this works with double-precision floating-point numbers.

>>> for n in range(22): print(f'{n:2}! = {factorialish(n):.2f}')
 0! = 0.00
 1! = 0.92
 2! = 1.92
 3! = 5.84
 4! = 23.51
 5! = 118.02
 6! = 710.08
 7! = 4980.40
 8! = 39902.40
 9! = 359536.87
10! = 3598695.62
11! = 39615625.05
12! = 475687486.47
13! = 6187239475.19
14! = 86661001740.60
15! = 1300430722199.47
16! = 20814114415223.14
17! = 353948328666101.12
18! = 6372804626194313.00
19! = 121112786592294192.00
20! = 2422786846761135104.00
21! = 50888617325509746688.00

If you have big integers to hand, as is the case in Python (but not C), it makes sense to return the factorial approximation as an integer — what would be expected of a factorial function — rather than a floating-point number.

from math import ceil, e, pi, sqrt

def factorialish(n):
    return ceil((n / e)**n * sqrt(2 * pi * n))

Rounding up to the next integer rather than taking the floored integer part of the result gives more accurate values at the lower end.

>>> for n in range(22): print(f'{n:2}! = {factorialish(n)}')
 0! = 0
 1! = 1
 2! = 2
 3! = 6
 4! = 24
 5! = 119
 6! = 711
 7! = 4981
 8! = 39903
 9! = 359537
10! = 3598696
11! = 39615626
12! = 475687487
13! = 6187239476
14! = 86661001741
15! = 1300430722200
16! = 20814114415224
17! = 353948328666102
18! = 6372804626194313
19! = 121112786592294192
20! = 2422786846761135104
21! = 50888617325509746688
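That convergence is easy to see by tracking the relative error against an exact reference — here math.factorial, with the error shrinking roughly as 1/(12n):

```python
from math import e, pi, sqrt, factorial

def factorialish(n):
    return (n / e)**n * sqrt(2 * pi * n)

# Stirling consistently underestimates, by an ever-smaller proportion
for n in (1, 10, 100):
    relative_error = 1 - factorialish(n) / factorial(n)
    print(f'{n:3}: relative error {relative_error:.4%}')
```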

When we change the tolerance on our expectations of functionality, we should not assume we are bound to the same design as before, or that the path from one solution to the next is necessarily a smooth refactoring progression. The path may well be discontinuous rather than continuous. Sometimes, changing the game slightly changes the game completely, and a radically different solution is both possible and desirable. This is as true of large systems as it is of factorial.

Holding all the cards

You may recall we opened with the observation that shuffling a pack of cards creates something unique in the universe. The probability that two randomly shuffled decks — or two games of 52-card pickup — have the same sequence is 1 in 52!. You now have many ways to calculate how unlikely that is.
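For the record, the exact count is within easy reach — here via math.factorial, though any of the big-integer-friendly implementations above would do:

```python
from math import factorial

deck_orderings = factorial(52)
print(f'{deck_orderings:.3e}')  # roughly 8.066e+67 distinct shufflings
```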

Of course, the point of this and the previous article is not about tricks and tips for writing a factorial function. Factorial is just an excuse, a lever we can use to understand design trade-offs and the nuances of paradigms, to explore language design and its consequences, to reveal the interplay between structure and time in code.

But perhaps tricks and tips were all you were after. In which case, next time the question of factorial comes, you may find yourself better equipped to offer something a little unexpected. (Because there will be a next time — it’s not that software development is stuck on repeat but…)
