Implementation Details¶
The Sorted Containers internal implementation is based on a couple observations. The first is that Python’s list is fast, really fast. Lists have great characteristics for memory management and random access. The second is that bisect.insort is fast. This is somewhat counter-intuitive since it involves shifting a series of items in a list. But modern processors do this really well. A lot of time has been spent optimizing mem-copy/mem-move-like operations both in hardware and software.
But using only one list and bisect.insort would produce sluggish behavior for lengths exceeding ten thousand. So the implementation of Sorted List uses a list of lists to store elements. In this way, inserting or deleting is most often performed on a short list. Only rarely does a new list need to be added or deleted.
Sorted List maintains three internal variables: _lists, _maxes, and _index. The first is simply the list of lists, each member is a sorted sublist of elements. The second contains the maximum element in each of the sublists. This is used for fast binary-search. The last maintains a tree of pair-wise sums of the lengths of the lists.
Lists are kept balanced using the load factor. If a sublist’s length exceeds double the load then it is split in two. Likewise at half the load it is combined with its neighbor. By default this factor is 1,000 which seems to work well for lengths up to ten million. Lengths above that are recommended a load factor that is the square root to cube root of the average length. (Although you will probably exhaust the memory of your machine before that point.) Experimentation is also recommended. A load factor performance comparison is also provided. For more in-depth analysis, read Performance at Scale which benchmarks Sorted Containers with ten billion elements.
Finding an element is a two step process. First the _maxes list, also known as the “maxes” index, is bisected which yields the position of a sorted sublist. Then that sublist is bisected for the location of the element.
Compared to tree-based implementations, using lists of lists has a few advantages based on memory usage.
Most insertion/deletion doesn’t require allocating or freeing memory. This can be a big win as it takes a lot of strain off the garbage collector and memory system.
Pointers to elements are packed densely. A traditional tree-based implementation would require two pointers (left/right) to child nodes. Lists have no such overhead. This benefits the hardware’s memory architecture and more efficiently utilizies caches.
The memory overhead per item is effectively a pointer to the item. Binary tree implementations must add at least two more pointers per item.
Iteration is extremely fast as sequentially indexing lists is a strength of modern processors.
Traditional tree-based designs have better big-O notation but that ignores the realities of today’s software and hardware. For a more in-depth analysis, read Performance at Scale.
Indexing uses the _index list which operates as a tree of pair-wise sums of the lengths of the lists. The tree is maintained as a dense binary tree. It’s easiest to explain with an example. Suppose _lists contains sublists with these lengths (in this example, we assume the load factor is 4):
list(map(len, _lists)) -> [3, 5, 4, 5, 6]
Given these lengths, the first row in the index is the pair-wise sums:
[8, 9, 6, 0]
We pad the first row with zeros to make its length a power of 2. The next rows of sums work similarly:
[17, 6]
[23]
Then all the rows are concatenated in reverse order so that the index is finally:
[23, 17, 6, 8, 9, 6, 0, 3, 5, 4, 5, 6]
With this list, we can efficiently compute the index of an item in a sublist and, vice-versa, find an item given an index. Details of the algorithms to do so are contained in the docstring for SortedList._loc and SortedList._pos.
For example, indexing requires traversing the tree to a leaf node. Each node
has two children which are easily computable. Given an index, pos, the
left-child is at pos * 2 + 1
and the right-child is at pos * 2 + 2
.
When the index is less than the left-child, traversal moves to the left sub-tree. Otherwise, the index is decremented by the left-child and traversal moves to the right sub-tree.
At a leaf node, the indexing pair is computed from the relative position of the node as compared with the offset and the remaining index.
For example, given the following index:
_index = 14 5 9 3 2 4 5
_offset = 3
Tree:
14
5 9
3 2 4 5
Indexing position 8 involves iterating like so:
Starting at the root, position 0, 8 is compared with the left-child node (5) which it is greater than. When greater, the index is decremented and the position is updated to the right child node.
At node 9 with index 3, we again compare the index to the left-child node with value 4. Because the index is the less than the left-child node, we simply traverse to the left.
At node 4 with index 3, we recognize that we are at a leaf node and stop iterating.
To compute the sublist index, we subtract the offset from the index of the leaf node:
5 - 3 = 2
. To compute the index in the sublist, we simply use the index remaining from iteration. In this case, 3.
The final index pair from our example is (2, 3) which corresponds to index 8 in the sorted list.
Maintaining the position index in this way has several advantages:
It’s easy to traverse to children/parent. The children of a position in the _index are at
(pos * 2) + 1
and(pos * 2) + 2
. The parent is at(pos - 1) // 2
. We can even identify left/right-children easily. Each left-child is at an odd index and each right-child is at an even index.It’s not built unless needed. If no indexing occurs, the memory and time accounting for position is skipped.
It’s fast to build. Calculating sums pair-wise and concatenating lists can all be done within C-routines in the Python interpreter.
It’s space efficient. The whole index is no more than twice the size of the length of the _lists and contains only integers.
It’s easy to update. Adding or removing an item involves incrementing or decrementing only
log2(len(_index))
items in the index. The only caveat to this is when a new sublist is created/deleted. In those scenarios the entire index is deleted and not rebuilt until needed.
The construction and maintenance of the positional index is unusual compared to other traditional designs. Whether the design is novel, I (Grant Jenks) do not know. Until shown otherwise, I would like to refer to it as the “Jenks” index.
Each sorted container has a function named _check for verifying consistency. This function details the data-type invariants.