Get the Drop on Sorting

Dropsort adventures in algorithm design, programming languages and sequential and concurrent control flow

Kevlin Henney
16 min read · Aug 9, 2021

In a previous post I explored sleep sort, a somewhat pathological sorting algorithm that reframes spatial rearrangement as temporal. There is no sense in which the algorithm is intended for practical work, but such esoteric algorithms are useful in what they can teach us about paradigms, performance and programming languages. They hold up a mirror to our assumptions and offer us a playground in which we can run riot, playing with possibilities we might not otherwise entertain. In short, the value of esoteric algorithms is that they are freeing and, therefore, fun and educational.

Following that lead, let’s look at dropsort, another NSFW sorting algorithm to see what we can learn about algorithms, coding styles, control flow, software architecture, performance qualities and the nature of software development.

[Image: water drops turning to ice]

A drop in the ocean of sorting algorithms

Whereas sleep sort superficially looks like an O(n) algorithm, dropsort actually is one. It was first published in 2006 by David Morgan-Mar.

Unlike sleep sort, there is no sleight of hand here: whichever way you look at it, it takes linear time, requiring precisely n–1 comparisons for a sequence of n elements. Here’s a Java implementation:

public static <T extends Comparable<T>> void dropsort(List<T> values)
{
    T previous = null;
    for (var sorting = values.iterator(); sorting.hasNext();)
    {
        var current = sorting.next();
        if (previous != null && current.compareTo(previous) < 0)
            sorting.remove();
        else
            previous = current;
    }
}

This is a stable single-pass algorithm that ensures each element in the sequence is not less than the previous one. And if it is? It gets dropped. This process of elimination also explains why this sorting algorithm has also been (unknowingly reinvented over a decade later and) called Stalin sort.

It’s a feature, not a bug

It would be easy to complain that dropsort is broken. To claim that something is ‘broken’, however, implies being able to define what ‘not broken’ looks like — this is harder than it might first appear.

When asked what the postcondition of a sorting algorithm is — i.e., what must be true following a call to a correctly implemented sort routine — many developers will respond that the result must be sorted. That’s the truth, but it’s perhaps not the whole truth. (Likewise, my claim there’s no sleight of hand at play with dropsort might itself be considered a sleight of hand rather than the whole truth…)

Dropsort certainly meets the requirement that its result is sorted, so clearly people expect more from sorting than they are usually able to articulate:

The full postcondition is that “The result of sorting a sequence of elements is a sequence of the original elements sorted in non-descending order.”
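To make that two-part contract concrete, here is a minimal sketch of a postcondition check in Python; the function name is mine, and Counter is used to test the permutation property for hashable elements:

from collections import Counter

def satisfies_sort_postcondition(result, original):
    # Part 1: the result is in non-descending order...
    non_descending = all(a <= b for a, b in zip(result, result[1:]))
    # Part 2: ...and it is a permutation of the original elements
    same_elements = Counter(result) == Counter(original)
    return non_descending and same_elements

assert satisfies_sort_postcondition(sorted([3, 1, 4, 1, 5, 9]), [3, 1, 4, 1, 5, 9])
assert not satisfies_sort_postcondition([3, 4, 5, 9], [3, 1, 4, 1, 5, 9])

Dropsort passes the first part of this check and fails the second.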

That we find it so challenging to specify the properties of a case as familiar, definable and seemingly trivial as sorting should keep us humble and cautious about what we can achieve for more complex code and systems.

Dropsort’s USP — if it can be called that — is not so much that it is an efficient algorithm — or even that it is a stable one — but that it is a lossy one. Its lossiness might be dismissed out of hand as a defect but, a priori, losslessness versus lossiness is a quality of service consideration that demands more context before judgement can be passed.

Of itself, lossiness is neither good nor bad. We accept — and expect — lossiness to be the appropriate approach in other contexts. For example, compressing a bitmap into a JPEG format is a lossy conversion: the restored image need not be pixel-perfect identical to the original. The benefit to us is in the compression; the lossiness is a tolerable trade-off. Similarly, watching a live video feed, the expected quality of service is to be as up-to-date as possible in preference to recovering any frames that may previously have been lost because of bandwidth limitations or other connection issues.

Context, therefore, determines appropriateness of lossiness, a truth that applies to many qualities in software execution. In the context of sorting, lossiness is unlikely to be a virtue; that context comes with some strong expectations. (But you never know…)

Shelter in place

The Java algorithm just shown sorts in place, hence it is an imperative implementation that works by side effect on an existing data structure. The ‘drop’ of dropsort is quite literal, expressed via the iterator’s uncommonly used remove method. The lossy nature of this implementation requires the original collection be modifiable, so dropsort for immutable collections (which, in Java, means surprising callers with an UnsupportedOperationException for methods that would mutate) and arrays (whose length is fixed at creation) needs to be approached differently.

C++ algorithms operate on iterator ranges, with an iterator representing a position in an iteration and an iterator range defining the extent of the iteration in terms of a start position and a past-the-end position, i.e., the iterators define a half-open interval. For in-place algorithms, element values can be changed, but the source of the values, such as a container or array, cannot be accessed or modified. C++ iterators do not support the rug-pulling antics of Java’s iterators. The convention (and its correspondingly common gotcha) is to return an iterator to where the resulting range ends. Any values between this newly returned end and the original end are unspecified, and it is left to the caller to decide what to do with the surplus elements.

For a C++ in-place, iterator-based implementation of dropsort, the ‘drop’ can be expressed compactly as a left shift of elements to overwrite the dropped position:

auto dropsort(auto sorting, auto done)
{
    while ((sorting = is_sorted_until(sorting, done)) != done)
        done = shift_left(sorting--, done, 1);
    return done;
}

This means the following array can be dropsorted, even though the array cannot be resized:

int values[] {3, 1, 4, 1, 5, 9};
auto result = dropsort(begin(values), end(values));

The first four elements of values will be {3, 4, 5, 9} and result will point to the position immediately following, i.e., &values[4]. For the implementation shown, the six elements of values will be {3, 4, 5, 9, 9, 9}. For a more efficient implementation that only writes as it needs to, rather than shifting all remaining elements each time it drops a value, the six elements of values would be {3, 4, 5, 9, 5, 9}.

For a container such as vector, a caller might choose to erase any surplus elements, shrinking the container to fit:

vector values {3, 1, 4, 1, 5, 9};
auto result = dropsort(begin(values), end(values));
values.erase(result, end(values));

Alternatively, to create the final result in a single expression:

vector values {3, 1, 4, 1, 5, 9};
values.erase(dropsort(begin(values), end(values)), end(values));

Either way, values will contain only {3, 4, 5, 9}.

Out of place

The Java and C++ versions show two ways an in-place algorithm can be realised for an algorithm whose result is a filtered — and therefore lossy — version of its input. Out-of-place algorithms, where the result is composed without affecting the input, are typically favoured in functional and data-flow programming.

The sorted function in Python follows an out-of-place style, returning a new list as its result no matter what iterable it receives as its input. The following procedural Python implementation of dropsort follows this style:

def dropsorted(values):
    result = []
    for value in values:
        if result == [] or value >= result[-1]:
            result.append(value)
    return result

The result is composed by adding values, so the ‘drop’ is more conceptual and less visible than in either the Java version, where it equates to remove, or the C++ version, where dropping is more an emergent property of movement than an explicit call. In using addition and omission rather than removal and action, this imperative Python implementation can be considered an inversion of the Java one. The following implementation uses a more functional style based on reduce:

from functools import reduce

def dropsorted(values):
    return [] if values == [] else reduce(
        lambda result, value:
            result + [value] if value >= result[-1] else result,
        values[1:], values[:1])

Although not noticeably simpler than the imperative version — and certainly not more efficient — there is a pleasing symmetry that arises from Python’s regular approach to indexing and slicing.

Pipe dreams

The previous section opened with an observation that the algorithm’s result is a filtered version of its input. This suggests that another way to organise the code for an out-of-place sort is explicitly as a filter. Filters are easily composable into pipelines, an approach that underpins Java streams and is readily available in many shell environments, such as Bash and PowerShell.

The following PowerShell dropsorts the values 3, 1, 4, 1, 5, 9 to 3, 4, 5, 9:

3, 1, 4, 1, 5, 9 | ?{$_ -ge $previous} | %{$previous = $_; $_}

The comma-separated list before the first pipe constructs an array that is then piped through a conditional filter that lets the current value pass if it is greater than or equal to the previously set value — on the first pass through, comparison against the as-yet-unset previous value is successful. The last part of the pipeline updates the previous value and passes on the sorted value to the output.

We can formalise and name a Dropsort filter, using a conventional if to roll the conditional prefilter into a single stage:

filter Dropsort
{
    if ($_ -ge $previous)
    {
        $previous = $_
        $_
    }
}

We can rewrite the previous invocation as

3, 1, 4, 1, 5, 9 | Dropsort

We can make the lifecycle more explicit and remove the implicit reliance on a valid comparison against an unset value:

function Dropsort
{
    begin
    {
        $previous = ''
    }
    process
    {
        if ($_ -ge $previous)
        {
            $previous = $_
            $_
        }
    }
}

Pipes reify the concept of a buffered queue, with dedicated and exclusive endpoints that decouple a producer and consumer from one another. The decoupling takes two forms:

  1. Neither the producer nor the consumer has knowledge of the other.
  2. A queue allows the producer and the consumer to operate asynchronously, so that they are decoupled across time. As well as connecting them for communication, the use of a pipe between two filters causes them to run concurrently.

As well as expressing a functional programming style, classic pipes form a simple co-ordination language for units of execution arranged concurrently in a dataflow architecture.
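To make the dataflow reading concrete, here is a rough Python sketch of dropsort as a filter stage between a producer and a consumer, each running in its own thread. The stage names, the use of unbounded queue.Queue instances as stand-ins for pipes and the None end-of-stream sentinel are all conventions of this sketch rather than anything intrinsic to the model:

from queue import Queue
from threading import Thread

def produce(out):
    for value in [3, 1, 4, 1, 5, 9]:
        out.put(value)
    out.put(None)                   # sentinel marking end of stream

def dropsort_filter(source, sink):
    previous = None
    while (value := source.get()) is not None:
        if previous is None or value >= previous:
            sink.put(value)
            previous = value
    sink.put(None)                  # propagate end of stream

def consume(source):
    while (value := source.get()) is not None:
        print(value)

upstream, downstream = Queue(), Queue()
stages = [Thread(target=produce, args=(upstream,)),
          Thread(target=dropsort_filter, args=(upstream, downstream)),
          Thread(target=consume, args=(downstream,))]
for stage in stages:
    stage.start()
for stage in stages:
    stage.join()

Each stage knows only its queue endpoints, and all three run concurrently, printing 3, 4, 5 and 9.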

Channelling the past

Whether as a classic data structure or as an active intermediary, queues introduce a spacetime decoupling between any producer and consumer. They let a value from one piece of running code be picked up by another piece of running code at a later point and, if distributed across execution contexts, in another location by another piece of running code. An N-place queue offers buffering and asynchronicity for N>0. An anonymous pipe, as previously described, is an N-place queue constrained to a single producer and a single consumer. What about constraining N? For example, what about N=1? A promise–future pairing can be modelled as a single-place, single-use queue, decoupling the promise holder from the future recipient.
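In Python, for example, the shape of a promise–future pairing can be sketched with a one-place queue.Queue; note that nothing below enforces the single-use discipline, which remains a convention of the sketch:

from queue import Queue
from threading import Thread

slot = Queue(maxsize=1)             # a single-place queue

def fulfil():                       # the 'promise' end
    slot.put(6 * 7)                 # used exactly once, by convention

Thread(target=fulfil).start()
print(slot.get())                   # the 'future' end blocks until fulfilled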

Dropping from N=1 to N=0, however, fundamentally changes the nature of the communication: for N=0 a queue is unbuffered and synchronous. Both producer and consumer must be available at the same time to pass and receive a value. Some might argue that, even though it still has two directional ends for enqueuing and dequeuing, a zero-length queue is technically not a queue. Whether we view it strictly as a queue or not, such a construct is useful when it comes to constructing and reasoning about concurrent programs. Most obviously, a zero-length queue equates to the channel concept of CSP (Communicating Sequential Processes). Less obviously, it can be used to express Ada’s task rendezvous model — which, if squinted at just right, resembles a synchronous actor model.

But I digress.

Queues connecting concurrent control flows, that’s where we were. And if we remove the buffering we have channels. Although it has consequences for decoupling the rate and variability of production from the rate and variability of consumption, buffering does not affect whether or not we can implement dropsort as a filtering process. Here is dropsort in occam, an imperative programming language that was modelled on the declarative CSP formalism:

PROC dropsort(CHAN OF INT producer, consumer)
  BOOL previously :
  INT previous :
  SEQ
    previously, previous := FALSE, 0
    WHILE TRUE
      INT value :
      SEQ
        producer ? value
        IF
          NOT previously OR value >= previous
            SEQ
              consumer ! value
              previously, previous := TRUE, value
          TRUE
            SKIP
:

Some elements of the syntax may be unfamiliar, so here’s a quick guide:

  • Indentation is significant.
  • The language is block-structured, with SEQ introducing a sequentially executed compound statement. Support for block-structured programming extends to parallel structured programming: there is a corresponding PAR keyword that introduces parallelism, i.e., all statements within a parallel compound are executed in parallel.
  • Multiple assignment is supported, i.e., a, b := c, d. Some ideas and syntax in occam are inspired by BCPL and its predecessor, CPL, which include this ‘simultaneous assignment’ form. This construct is enjoying more popularity today thanks to languages like Python and Go.
  • SKIP is an explicit no-op statement, like pass in Python. Strictly speaking, SKIP is a primitive process that communicates nothing, does nothing and terminates successfully. There is also a STOP primitive that communicates nothing, does nothing and, technically, never terminates.
  • The IF is based on Dijkstra’s guarded command, but with lexically top-to-bottom condition evaluation rather than the non-deterministic evaluation Dijkstra originally proposed. It must always have a matched outcome, hence the TRUE guard that does nothing. The use of a TRUE guard in last place is occam’s idiom for ‘else’.
  • The ? and ! operators receive from and send to a channel, respectively.

One thing made clear in the occam version with WHILE TRUE is that the process can run forever. Unlike a non-lossy sorting algorithm, dropsort doesn’t need to see the whole input before determining its output. A conventional (i.e., non-lossy) sorting facility, like Unix sort or Java streams’ sorted, has to store all the input and wait for it to complete before proceeding.

However, as it can run forever, there is a small question as to whether this realisation of dropsort can properly be called an algorithm. There is a distinction — one many programmers are unaware of, and one that often doesn’t matter — between a procedure and an algorithm:

A procedure which always terminates is called an algorithm.

That is, all algorithms are procedures, but not all procedures are necessarily algorithms. Dropsort is intrinsically an algorithm in that its execution is bounded in (linear) time for a bounded input, which is a foundational assumption in the definition of algorithms. We can, however, use it as the basis for a non-terminating process that can, in theory, execute over an unbounded input stream — the lack of actual infinite streams of data in practice is not going to stop us.

Channelling the present

The truth is that these days, when people talk about channels in programming languages, they will be talking about Go rather than occam, which is effectively a dead language. Like occam, Go is a procedural language that supports channels, multiple assignment and inline function blocks, but apart from this handful of similarities, Go is more familiar, more conventional and much looser than occam, owing more to C than to CSP. (To be fair, in my experience, all other languages are looser than occam — it is the strictest language I have ever used.)

By default, Go channels are proper channels, i.e., unbuffered and resulting in synchronous communication. It is possible to create buffered channels, i.e., more like pipes than channels in the strictest sense, but that makes no difference to how dropsort can be expressed and run:

func Dropsort(producer <-chan int, consumer chan<- int) {
    previously, previous := false, 0
    for value := range producer {
        if !previously || value >= previous {
            consumer <- value
            previously, previous = true, value
        }
    }
}

For a given source and sink that are both channels of int, the following code launches Dropsort as a concurrently running goroutine:

go Dropsort(source, sink)

Please note that, unlike all the other implementations presented so far, the occam version I showed before is both untested and uncompiled. I first wrote the Go version and then — dusting off both my memory and a couple of old books — translated it into occam.

Iterating co-operatively

Any program that can be written in terms of synchronous communication between its units of execution can be executed using only a single active thread of execution. In such cases, pre-emptive multitasking can be replaced by co-operative multitasking. Coroutines are an example of this approach, although the use of an async keyword to define them in some languages can be a little misleading (rather than being truly decoupled in time, ‘asynchrony’ with coroutines manifests sequentially and synchronously with respect to non-pre-emptive scheduling).

If there is a steady to and fro of control flow, coroutines can be further simplified. In the mid-1970s, the influential CLU language took advantage of this particular possibility. As noted by Barbara Liskov and her team (yes, that Liskov):

Iterators are a form of coroutine; however, their use is sufficiently constrained that they are implemented using just the program stack.

CLU’s iterator concept — where an iterator is a function that acts out a whole traversal rather than being an encapsulated position within a traversal — and syntax — such as the use of yield to suspend execution and return control and values to the looping context — has found its way into a number of languages, including Python, Ruby, JavaScript and C#, with Python later generalising this to coroutines and Ruby to fibres.

Constrained over an array of integers, dropsort would be coded in CLU something like the following:

dropsort = iter (values: array[int]) yields (int);
    previously: bool := false;
    previous: int := 0;
    for value: int in array[int]$elements (values) do
        if ~previously | value >= previous then
            yield (value);
            previously, previous := true, value;
        end;
    end;
end dropsort;

For an array of integers named unsorted, the dropsort invocation would be something like the following:

for next: int in dropsort (unsorted) do
    stream$putl (stream$primary_output (), int$unparse (next));
end;

Of course, it is ‘something like’ this code because CLU was a research language with limited distribution. Although I have read up on it, I have precisely zero hands-on experience with it. That said, CLU’s long-term and far-reaching influence is such that its paradigms and syntax are familiar. I patterned it after the Go code and the following Python:

def dropsort(values):
    previous = None
    for value in values:
        if previous is None or value >= previous:
            yield value
            previous = value
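Because it is a generator, this version shares the channel-based implementations’ indifference to whether the input ever ends. As a small demonstration, noisy_ramp below is a made-up source of an unbounded, mostly ascending stream of numbers:

from itertools import islice
from random import random

def noisy_ramp():
    value = 0.0
    while True:                     # an unbounded stream...
        yield value
        value += random() - 0.1     # ...that occasionally dips

print(list(islice(dropsort(noisy_ramp()), 10)))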

It’s worth noting that all of the dropsort implementations based on some form of concurrency are structurally similar. It’s also worth noting that if you’re interested in the history of programming languages, CLU is a real treasure trove — exception handling and data abstraction are two other obvious areas of influence, along with multiple assignment and syntax-sugared operator overloading. Taken together with occam and a few others, it’s clear that much of what is considered by many programmers to be new or recent in mainstream programming languages is anything but.

Reading, ‘Riting and ‘Rithmetic

If concurrency with synchronous communication can be flattened to coroutines, and coroutines can be flattened to iterators, iterators can be flattened further.

A different way to look at the producer–consumer concurrency is in terms of a reader and a writer. We can take inspiration from a tape recorder that has a read head (the playback head) and a write head (the record head). The stream of values is the tape and the read and write heads can potentially be at different positions along the tape. Drawing on this analogy, we can return to the C++ code from earlier and offer an alternative out-of-place implementation of dropsort:

auto dropsort(auto reading, auto done, auto writing)
{
    for (auto wrote = reading; reading != done; ++reading)
    {
        if (*reading >= *wrote)
        {
            wrote = reading;
            *writing++ = *reading;
        }
    }
    return writing;
}

Here we have the idea of iterating the read head from where it starts up until the end position, writing to and advancing the write head as the next sorted value is encountered. This can be used to sort from one range to another, or to sort in place in the same range, with the write position overwriting existing elements:

int values[] {3, 1, 4, 1, 5, 9};
auto result = dropsort(begin(values), end(values), begin(values));

The first four elements of values are now {3, 4, 5, 9} and, as before, result will point to the position immediately following, i.e., &values[4]. Alternatively, via adaptation, the sorted result can be appended to an initially empty container:

const vector values {3, 1, 4, 1, 5, 9};
vector<int> result;
dropsort(begin(values), end(values), back_inserter(result));

Initially empty, result ends up containing {3, 4, 5, 9}, whereas values is unchanged. The out-of-place dropsort algorithm can also be used to (re)implement the in-place dropsort we had earlier:

auto dropsort(auto sorting, auto done)
{
    return dropsort(sorting, done, sorting);
}

Here the read head is set to the same position as the write head, hence overwriting the original sequence.

Learning from loss

What have we learned from dropsort? Requirements and specification are hard, but not always because of essential complexity. Lossiness can be considered a quality-of-service issue, not simply a quality-of-implementation issue. The character and expression of an algorithm depends hugely on paradigm, programming language and other matters of idiom.

A language that doesn’t affect the way you think about programming, is not worth knowing.

Alan Perlis

In looking at algorithm design, it is conventional to assume that out-of-place algorithms require additional memory compared to in-place algorithms. However, this is more a matter of computational model than a necessity. For a style based on pure functions, this is true. For a style based on concurrent streams or channels, however, it is not. Java streams are lazy but not concurrent, and so the sorted operation is implemented as a sink that must receive and store all elements before sorting and passing them on.
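The same contrast can be demonstrated in Python rather than Java: a lossless sort must drain its input before yielding anything, whereas the dropsort generator shown earlier yields as it goes. The chatty wrapper here is just an illustrative tracing device:

def chatty(values):
    for value in values:
        print('producing', value)
        yield value

next(dropsort(chatty([3, 1, 4])))   # prints 'producing 3' only
sorted(chatty([3, 1, 4]))           # prints all three before returning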

That said, it is possible to optimise dropsort’s retention characteristics by using additional memory, while still preserving its O(n) performance:

A concept of recent memory is added to the algorithm. For example, if several elements are consecutively dropped, the preceding element is assumed to be far from ordered and is removed instead. This is shown to increase the usefulness of Dropsort by increasing the number of retained elements. With a specific type of introduced error, 10,000% more of the original elements are retained over standard Dropsort.
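The quoted passage doesn’t spell out the mechanism, so the following Python sketch is only one possible reading of it; the drop threshold and the decision to evict just the most recently retained element are my assumptions:

def dropsorted_with_memory(values, max_consecutive_drops=2):
    result, drops = [], 0
    for value in values:
        if not result or value >= result[-1]:
            result.append(value)
            drops = 0
        else:
            drops += 1
            if drops == max_consecutive_drops:
                result.pop()        # evict the suspected outlier...
                drops = 0
                if not result or value >= result[-1]:
                    result.append(value)    # ...and retry the current value
    return result

For [3, 9, 4, 5, 6], standard dropsort retains only [3, 9], whereas this variant evicts the 9 on the second consecutive drop and retains [3, 5, 6].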

With such tweaks, dropsort is perhaps also not as entirely frivolous as we might have assumed:

This improved algorithm has many applications to very quickly sort large data sets when specific elements are not as important as the assurance that the list is sorted. This could be the case when a sorted list is critical but the source providing the pre-sorted input is not trustworthy. Also, applications in fields with large data sets and where speed is of utmost importance are discussed.

That said, I’m not going to close by recommending this algorithm for your current codebase! A visit to your regular library is all you need — this essay and its explorations are just some extra-curricular fun on the side.

As with the sleep sort essay, this one ended up being longer than expected, pulling me in different directions and inviting me to consider other possibilities and details. It started with the idea of just showcasing dropsort. I couldn’t, however, decide between Java and Python, so I chose both… and then realised there were enough points of difference that more discussion was appropriate… and then two languages became three as I realised there were other first-class implementation options to be considered in C++. And then I started considering concurrency… and iteration constructs… which added more languages, along with some historical spelunking… and, well, you’ve come this far, so you see where it went and how it went. Unlike some of the code shown, however, this essay should terminate…

STOP
