Sorted Containers Introduction¶
Installation¶
The first step to using any software library is getting it properly installed. There are several ways to install Sorted Containers.
The recommended way to install Sorted Containers is using the pip command:
$ python3 -m pip install sortedcontainers
You may also choose instead to use the newer pipenv command:
$ pipenv install sortedcontainers
These commands will install the latest version of Sorted Containers from the Python Package Index.
Sorted Containers is actively developed on GitHub, where the code is open source. You may choose to install directly from the source repository. First, you will need a copy of the sources. The recommended way to get a copy of the source repository is to clone the repository from GitHub:
$ git clone git://github.com/grantjenks/python-sortedcontainers.git
You may also choose instead to download the Sorted Containers tarball or download the Sorted Containers zipball.
Once you have a copy of the sources, you can embed it in your Python package, or install it into your site-packages using the command:
$ python3 setup.py install
Sorted Containers is available in Debian distributions as python3-sortedcontainers and python-sortedcontainers.
Sorted Containers is looking for a CentOS/RPM package maintainer. If you can help, please open an issue in the Sorted Containers Issue Tracker.
Sorted List¶
At the core of Sorted Containers is the mutable sequence data
type SortedList
. The SortedList
maintains its values in
ascending sort order. As with Python’s built-in list data type,
SortedList
supports duplicate elements and fast random-access
indexing.
>>> from sortedcontainers import SortedList
>>> sl = SortedList()
Values may be added to a SortedList
using either
SortedList.update()
or SortedList.add()
. When doing so, the list
remains sorted.
>>> sl.update([5, 1, 3, 4, 2])
>>> sl
SortedList([1, 2, 3, 4, 5])
>>> sl.add(0)
>>> sl
SortedList([0, 1, 2, 3, 4, 5])
Several methods may be used to remove elements by value or by index. The
methods SortedList.discard()
and SortedList.remove()
remove
elements by value. And the methods SortedList.pop()
and
SortedList.__delitem__()
remove elements by index. All values may be
removed using SortedList.clear()
.
>>> sl.remove(0)
>>> sl.discard(1)
>>> sl
SortedList([2, 3, 4, 5])
>>> sl.pop()
5
>>> del sl[1]
>>> sl
SortedList([2, 4])
>>> sl.clear()
Because SortedList
is sorted, it supports efficient lookups by value
or by index. When accessing values by index, the SortedList
can be
used as an order statistic tree. Rather than performing a linear scan,
values can be found in logarithmic time by repeatedly bisecting the internal
tree structure. Methods for looking up values are
SortedList.__contains__()
, SortedList.count()
,
SortedList.index()
, SortedList.bisect_left()
,
SortedList.bisect_right()
, and SortedList.__getitem__()
.
>>> sl = SortedList('abbcccddddeeeee')
>>> 'f' in sl
False
>>> sl.count('e')
5
>>> sl.index('c')
3
>>> sl[3]
'c'
>>> sl.bisect_left('d')
6
>>> sl.bisect_right('d')
10
>>> sl[6:10]
['d', 'd', 'd', 'd']
Several methods can be used to iterate values in a SortedList
. There
are the typical sequence iterators: SortedList.__iter__()
and
SortedList.__reversed__()
. There are also methods for iterating by value
or by index using SortedList.irange()
and
SortedList.islice()
. These methods produce iterators that are faster than
repeatedly indexing the SortedList
.
>>> sl = SortedList('acegi')
>>> list(iter(sl))
['a', 'c', 'e', 'g', 'i']
>>> list(reversed(sl))
['i', 'g', 'e', 'c', 'a']
>>> list(sl.irange('b', 'h'))
['c', 'e', 'g']
>>> list(sl.islice(1, 4))
['c', 'e', 'g']
A SortedList
also supports the typical sequence operators
SortedList.__add__()
and SortedList.__mul__()
as well as their
in-place counterparts.
>>> sl = SortedList('abc')
>>> sl + sl
SortedList(['a', 'a', 'b', 'b', 'c', 'c'])
>>> sl * 3
SortedList(['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'])
>>> sl += 'de'
>>> sl
SortedList(['a', 'b', 'c', 'd', 'e'])
>>> sl *= 2
>>> sl
SortedList(['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd', 'e', 'e'])
Although SortedList
implements most of the mutable sequence methods,
there are five which are not implemented. Each of these methods assigns a value
at an index which is not supported by SortedList
.
>>> sl = SortedList('abcde')
>>> sl[2] = 'c'
Traceback (most recent call last):
...
NotImplementedError: use ``del sl[index]`` and ``sl.add(value)`` instead
>>> sl.reverse()
Traceback (most recent call last):
...
NotImplementedError: use ``reversed(sl)`` instead
>>> sl.append('f')
Traceback (most recent call last):
...
NotImplementedError: use ``sl.add(value)`` instead
>>> sl.extend(['f', 'g', 'h'])
Traceback (most recent call last):
...
NotImplementedError: use ``sl.update(values)`` instead
>>> sl.insert(5, 'f')
Traceback (most recent call last):
...
NotImplementedError: use ``sl.add(value)`` instead
Comparison with SortedList
uses lexicographical ordering as with other
sequence types.
Refer to the Sorted List documentation for additional parameters, more examples, and descriptions of runtime complexity.
Sorted-key List¶
The Sorted Containers project also maintains a specialized
sorted-list-like type that accepts a key-parameter as found in Python’s
built-in sorted function. SortedKeyList
provides the same
functionality as SortedList
but maintains the order of contained
values based on the applied key-function. This simplifies the pattern of boxing
and un-boxing which would otherwise be required.
>>> from operator import neg
>>> from sortedcontainers import SortedKeyList
>>> skl = SortedKeyList(key=neg)
The key function extracts a comparison key for ordering items in the list. In our example above we apply the negation operator. In the example above, a sorted list of integers would be ordered in descending sort order.
You can also construct a SortedKeyList
using the SortedList
type by passing a key-function to the initializer.
>>> from sortedcontainers import SortedList
>>> values = SortedList([1, 2, 3, 4, 5], key=neg)
>>> values
SortedKeyList([5, 4, 3, 2, 1], key=<built-in function neg>)
>>> isinstance(values, SortedList)
True
>>> issubclass(SortedKeyList, SortedList)
True
>>> values.key
<built-in function neg>
SortedKeyList
adds three additional methods to the SortedList
type. They are SortedKeyList.bisect_key_left()
,
SortedKeyList.bisect_key_right()
, and SortedKeyList.irange_key()
.
Each of these methods accepts the key rather than the value for its
SortedList
counterpart.
>>> skl = SortedKeyList([1, 2, 3, 4, 5], key=neg)
>>> skl
SortedKeyList([5, 4, 3, 2, 1], key=<built-in function neg>)
>>> skl.bisect_key_left(-4.5)
1
>>> skl.bisect_key_right(-1.5)
4
>>> list(skl.irange_key(-4.5, -1.5))
[4, 3, 2]
Refer to the Sorted List documentation for additional parameters, more examples, and descriptions of runtime complexity.
Caveats¶
Sorted Containers data types have three requirements:
The comparison value or key must have a total ordering.
The comparison value or key must not change while the value is stored in the sorted container.
If the key-function parameter is used, then equal values must have equal keys.
If any of these three requirements are violated then the warranty of Sorted Containers is void and it will not behave correctly. While it may be possible to design useful data types that do not have these requirements, these are the caveats of Sorted Containers and they match those of alternative implementations. Each of these requirements allow for optimizations and together they are an attempt to find the right tradeoff between functionality and performance.
Let’s look at some examples of what works and what doesn’t. In Python, all
objects inherit from object
which provides a default implementation of
equality. In pseudocode, the object type looks something like:
>>> class Object:
... def __eq__(self, other):
... return id(self) == id(other)
The default implementation defines equality in terms of identity. Note that
Object does not define comparison methods like less-than or
greater-than. While Python objects are comparable by default in Python 2, the
feature was removed in Python 3. Instances of object can not be stored in a
SortedList
.
We can extend this example by creating our own Record data type with name and rank attributes.
>>> class Record(object):
... def __init__(self, name, rank):
... self.name = name
... self.rank = rank
... def __eq__(self, other):
... return self.name == other.name
The Record type defines equality in terms of its name attribute which may
be thought of as a record identifier. Each Record also has a rank which
would be useful for ranking records in sorted order. The Record type does not
define comparison operators and so can not be stored in a
SortedList
.
>>> alice1 = Record('alice', 1)
>>> bob2 = Record('bob', 2)
>>> carol3 = Record('carol', 3)
Since the rank attribute is intended for ordering records, the key-function presents a tempting but invalid use for ordering records:
>>> get_rank = lambda record: record.rank
>>> sl = SortedList([alice1, bob2, carol3], key=get_rank)
Although the sorted list now appears to work, the requirements have been invalidated. Specifically #3, since it is now possible for equal values to have inequal keys:
>>> bob4 = Record('bob', 4)
>>> bob2 == bob4 # Equal values.
True
>>> get_rank(bob2) == get_rank(bob4)
False
>>> # ^-- Equal values should have equal keys.
>>> bob4 in sl # <-- Here's the problem. BAD!
False
In the example above, bob4 can not be found in sl because although bob2 and bob4 are equal, the corresponding keys are not equal. The mistake is a bit easier to see without the key-function. The key-function defined comparisons between records like so:
>>> class Record(object):
... def __init__(self, name, rank):
... self.name = name
... self.rank = rank
... def __eq__(self, other):
... return self.name == other.name
... def __lt__(self, other):
... return self.rank < other.rank
Written as above, equality between objects is more clearly seen as unrelated to ordering between objects. This is the most common mistake made when using Sorted Containers. The Record type now also violates requirement #1 because equal instances can also be strictly less than each other:
>>> bob2 = Record('bob', 2)
>>> bob4 = Record('bob', 4)
>>> bob2 == bob4
True
>>> bob2 < bob4 # <-- Here's the problem. BAD!
True
In the above example, bob2 and bob4 are equal to each other while bob2 is also strictly less than bob4. The Record type therefore does not have a total ordering. In pseudocode the three requirements for a total ordering are:
If
a <= b and b <= a
thena == b
.And if
a <= b and b <= c
thena <= c
.And
a <= b or b <= a
.
Intuitively, a total ordering is best understood through integer and string types. Each of these common types defines a total ordering and can be used for comparisons in Sorted Containers. Of the built-in types in Python, these have a total ordering:
Integers
Strings and bytes.
All floating-point numbers except
float('nan')
.Sequences like list and tuple of values with a total ordering.
There are also some built-in Python types and values which lack a total ordering:
Sets and frozensets (not a total ordering).
float('nan')
(not a total ordering).Mapping types (not comparable, changed in Python 3).
The best way to fix the Record type is to define equality and comparison in terms of the same fields.
>>> class Record(object):
... def __init__(self, name, rank):
... self.name = name
... self.rank = rank
... def _cmp_key(self):
... return (self.rank, self.name)
... def __eq__(self, other):
... return self._cmp_key() == other._cmp_key()
... def __lt__(self, other):
... return self._cmp_key() < other._cmp_key()
The example above uses a comparison-key method named _cmp_key and the
lexicographical ordering semantics of tuples to define equality and comparison
in terms of the rank and name fields. It would also be possible to omit the
Record.__lt__ method and instead use a key-function which called
record._cmp_key(). But the key-function will take more memory and be slower
as it uses a SortedKeyList
which stores a reference to the key for
every value.
The Record example above is complicated by equality defined as those objects with equal names. When using equality inherited from object, that is, defined in terms of instance identity, the situation is simplified. For example:
>>> class Record(object):
... def __init__(self, name, rank):
... self.name = name
... self.rank = rank
... def __lt__(self, other):
... return self.rank < other.rank
The Record type now can be stored in SortedList
because equality
based on instance identity guarantees the rank attributes are equal so long
as its value has a total ordering.
>>> alice1 = Record('alice', 1)
>>> bob2 = Record('bob', 2)
>>> carol3 = Record('carol', 3)
>>> bob4 = Record('bob', 4)
>>> bob2 != bob4 # <-- Different instances, so unequal. GOOD!
True
>>> sl = SortedList([alice1, bob2, carol3, bob4])
>>> bob2 in sl
True
>>> bob4 in sl
True
The above example displays that all three requirements are followed:
The comparison key, rank, is an integer, which has a total ordering.
The comparison key does not change while the value is stored in the sorted container.
Equal Record instances have equal rank attributes based on instance identity.
These examples can be summarized in two pieces of advice:
If the data type defines its own equality, that is
__eq__
, then make sure the comparison methods or key-function define a total ordering and equal values have equal comparison keys.If the data type does not define equality then it inherits equality from object and uses instance identity. Compare objects using comparison methods like
__lt__
or the key-function. The compared values must have a total ordering.
Another invalid use case of SortedKeyList
occurs when the key-function
returns a different comparison key for a given value while the value is stored
in the sorted container.
>>> from random import random
>>> key_func = lambda value: random()
>>> sl = SortedList([1, 2, 3, 4, 5], key=key_func)
>>> # ^-- Corrupt sorted list.
>>> 3 in sl
False
>>> key_func(1) == key_func(1) # <-- Here's the problem. BAD!
False
The example above violates two requirements of sorted lists: equal values must have equal keys and the key function must return the same key for the given value while the value is stored in the sorted container. The random key-function does not reliably return the same key for a given value. The order of values in a sorted container can be made arbitrary and reliable by changing the key-function like so:
>>> from random import seed
>>> def key_func(value):
... "Key-function for arbitrary but reliable order."
... seed(value)
... return random()
Another way the problem above manifests itself is when the comparison key of an
object is mutated while the object is stored in the SortedList
. Using
the Record definition from above:
>>> class Record(object):
... def __init__(self, name, rank):
... self.name = name
... self.rank = rank
... def __lt__(self, other):
... return self.rank < other.rank
>>> alice1 = Record('alice', 1)
>>> bob2 = Record('bob', 2)
>>> carol3 = Record('carol', 3)
>>> sl = SortedList([alice1, bob2, carol3])
>>> bob2 in sl
True
Python objects are typically mutable so while the above example works and is
correct, there’s nothing preventing a modification to the rank of a Record.
If the rank is modified while the Record instance is stored in the
SortedList
, then the container is corrupted.
>>> bob2.rank = 20 # <-- Here's the problem. BAD!
>>> bob2 in sl
False
The Record instance bob2 can no longer be found in the SortedList
because modifying the rank changed its sort order position without updating
its position in sl. To modify the sorted order position, remove
the value, update it, and then add
the value back again.
Similar limitations also apply to Python’s dict data type which can be corrupted by modifying the hash of a key while the item is stored in the dict. The hashing protocol also requires that equal keys have equal hashes.
One final caveat: indexing a sorted list is fast but not as fast as indexing Python’s built-in list data type. The runtime complexity for indexing a sorted list is O(log(n)) while it is O(1) for Python’s built-in list data type. Indexing a sorted list requires building and maintaining a positional index which is not built if not necessary. The index is fast and light on memory overhead but avoid positional indexing if able. Indexing at the front or back, indexes like 0 and -1, is optimized and will not require the positional index.
Sorted Dict¶
Built atop Python’s built-in dict data type and SortedList
is the
mutable mapping data type SortedDict
. Sorted dict keys are maintained
in sorted order. The design of SortedDict
is simple: sorted dict
inherits from dict to store items and maintains a sorted list of
keys. SortedDict
keys must be hashable and comparable. The hash and
total ordering of keys must not change while they are stored in the
SortedDict
.
>>> from sortedcontainers import SortedDict
>>> sd = SortedDict()
Items may be added to a SortedDict
using
SortedDict.__setitem__()
, SortedDict.update()
or
SortedDict.setdefault()
. When doing so, the keys remain sorted.
>>> sd['e'] = 5
>>> sd['b'] = 2
>>> sd
SortedDict({'b': 2, 'e': 5})
>>> sd.update({'d': 4, 'c': 3})
>>> sd
SortedDict({'b': 2, 'c': 3, 'd': 4, 'e': 5})
>>> sd.setdefault('a', 1)
1
>>> sd
SortedDict({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
Several methods may be used to remove items by key or by index. The methods
SortedDict.__delitem__()
and SortedDict.pop()
remove items by
key. And the method SortedDict.popitem()
removes items by index.
>>> del sd['d']
>>> sd.pop('c')
3
>>> sd.popitem(index=-1)
('e', 5)
>>> sd
SortedDict({'a': 1, 'b': 2})
Because SortedDict
uses both dict and SortedList
, there are
many methods for looking up keys from each type. The mapping interface supports
SortedDict.__getitem__()
, SortedDict.__contains__()
,
SortedDict.get()
, and SortedDict.peekitem()
.
>>> sd['b']
2
>>> 'c' in sd
False
>>> sd.get('z') is None
True
>>> sd.peekitem(index=-1)
('b', 2)
The sequence interface supports the same lookup methods as those provided by
SortedList
.
>>> sd.bisect_right('b')
2
>>> sd.index('a')
0
>>> list(sd.irange('a', 'z'))
['a', 'b']
The keys, items, and values views also support both set semantics and sequence semantics with optimized methods for lookups by index.
>>> keys = sd.keys()
>>> keys[0]
'a'
>>> items = sd.items()
>>> items[-1]
('b', 2)
>>> values = sd.values()
>>> values[:]
[1, 2]
The SortedDict
initializer supports one additional position argument.
The positional argument must come before any sequences, mappings, or keyword
arguments used to initialize the items in a SortedDict
. The first
positional argument is an optional callable key-function used to extract a
comparison key from the keys of the SortedDict
. For example, to
construct a SortedDict
with integer keys in descending order:
>>> sd = SortedDict(neg, enumerate('abc', start=1))
>>> sd
SortedDict(<built-in function neg>, {3: 'c', 2: 'b', 1: 'a'})
>>> keys = sd.keys()
>>> list(keys)
[3, 2, 1]
Because SortedDict
inherits from dict, the __missing__ method can
be used to give missing keys a default value. Customizing the __missing__
method creates a kind of defaultdict like that in the collections module.
>>> class DefaultSortedDict(SortedDict):
... def __missing__(self, key):
... value = 0
... self[key] = value
... return value
>>> dsd = DefaultSortedDict()
>>> dsd['z']
0
Refer to the Sorted Dict documentation for additional parameters, more examples, and descriptions of runtime complexity.
Sorted Set¶
Standing on the shoulder’s of Python’s built-in set data type and
SortedList
is the mutable set data type SortedSet
. Sorted set
values are maintained in sorted order. The design of SortedSet
is
simple: sorted set uses Python’s built-in set for set-operations and
maintains a sorted list of values. Values stored in a SortedSet
must
be hashable and comparable. The hash and total ordering of values must not
change while they are stored in the SortedSet
.
>>> from sortedcontainers import SortedSet
>>> ss = SortedSet()
SortedSet
implements optimized versions of the core mutable set
methods: SortedSet.__contains__()
, SortedSet.add()
,
SortedSet.update()
, SortedSet.discard()
, and the more strict
SortedSet.remove()
.
>>> ss.add('c')
>>> ss.add('a')
>>> ss.add('b')
>>> ss
SortedSet(['a', 'b', 'c'])
>>> 'c' in ss
True
>>> ss.discard('a')
>>> ss.remove('b')
>>> _ = ss.update('def')
>>> ss
SortedSet(['c', 'd', 'e', 'f'])
SortedSet
also behaves like a sequence with
SortedSet.__getitem__()
and SortedSet.__reversed__()
methods. And
includes the mutable sequence methods SortedSet.__delitem__()
and
SortedSet.pop()
.
>>> ss[0]
'c'
>>> list(reversed(ss))
['f', 'e', 'd', 'c']
>>> del ss[0]
>>> ss.pop(index=-1)
'f'
>>> ss
SortedSet(['d', 'e'])
Set-operation methods like SortedSet.difference()
,
SortedSet.intersection()
, SortedSet.symmetric_difference()
, and
SortedSet.union()
, as well as their in-place counterparts and operators
are all supported.
>>> abcd = SortedSet('abcd')
>>> cdef = SortedSet('cdef')
>>> abcd.difference(cdef)
SortedSet(['a', 'b'])
>>> abcd.intersection(cdef)
SortedSet(['c', 'd'])
>>> abcd.symmetric_difference(cdef)
SortedSet(['a', 'b', 'e', 'f'])
>>> abcd.union(cdef)
SortedSet(['a', 'b', 'c', 'd', 'e', 'f'])
>>> abcd | cdef
SortedSet(['a', 'b', 'c', 'd', 'e', 'f'])
>>> abcd |= cdef
>>> abcd
SortedSet(['a', 'b', 'c', 'd', 'e', 'f'])
Several SortedList
methods are also exposed on SortedSet
objects like SortedList.bisect()
, SortedList.index()
,
SortedList.irange()
, and SortedList.islice()
just as with
SortedDict
.
>>> ss = SortedSet('abcdef')
>>> ss.bisect('d')
4
>>> ss.index('f')
5
>>> list(ss.irange('b', 'e'))
['b', 'c', 'd', 'e']
>>> list(ss.islice(-3))
['d', 'e', 'f']
Like SortedList
an additional key parameter can be used to
initialize the SortedSet
with a callable which is used to extract a
comparison key.
>>> ss = SortedSet([1, 2, 3], key=neg)
>>> ss
SortedSet([3, 2, 1], key=<built-in function neg>)
Sorted set comparisons use subset and superset relations. Two sorted sets are equal if and only if every element of each sorted set is contained in the other (each is a subset of the other). A sorted set is less than another sorted set if and only if the first sorted set is a proper subset of the second sorted set (is a subset, but is not equal). A sorted set is greater than another sorted set if and only if the first sorted set is a proper superset of the second sorted set (is a superset, but is not equal). The comparison semantics of sorted sets do not define a total ordering.
Refer to the Sorted Set documentation for additional parameters, more examples, and descriptions of runtime complexity.
Migrating¶
The performance comparison page documents several alternative implementations of the sorted types described. Some of those projects have deprecated themselves and now recommend Sorted Containers instead. When migrating from other projects, there are a couple of things to keep in mind.
Sorted Containers went through a major version change between version one and version two. The goal of the change was to adopt Python 3 semantics wherever possible:
Several
SortedList
methods now raiseNotImplementedError
:SortedList.__setitem__()
,SortedList.append()
, andSortedList.extend()
. UseSortedList.add()
orSortedList.update()
instead.SortedDict
has adopted the Python 3 semantics of dict views as the default. TheSortedDict.keys()
,SortedDict.items()
, andSortedDict.values()
methods now return views with support for optimized indexing. The view objects also implement set and sequence protocol methods. Prefer using theSortedDict
methods directly for better performance.Some type and parameter names were changed. SortedListWithKey was renamed to SortedKeyList but an alias remains for compatibility. Several methods which accepted a val parameter now accept value for better readability.
The Sorted Containers Release History documents all the changes made in every version in the history of the project. The Version 2 release notes detail all the changes made.
The blist project remains the most similar as its API was the original
inspiration for Sorted Containers. The main difference has always
been the SortedList.pop()
method. The blist project pops the first
element by default while Sorted Containers pops the last element
and matches the API of Python’s built-in list data type. The sorted dict data
type in blist also continues to use the old Python 2 semantics for dict
views.
The bintrees project now recommends using Sorted Containers
instead and has stopped development. The API differs significantly but the
supported functionality is the same. The Tree object in bintrees is most
similar to SortedDict
. All of the mapping methods and set methods are
available using either SortedDict
or SortedKeysView
. The
slicing methods and previous/successor iterator methods correspond to
SortedDict.irange()
and the heap methods correspond to indexing with
views like SortedKeysView.__getitem__()
.
The banyan project has data types similar to SortedDict
and
SortedSet
. Most methods have a direct counterpart in Sorted
Containers. But the frozen sorted dict and frozen sorted set data types
have no direct comparison. The functionality of hashing can be implemented by
inheriting and defining the __hash__ method. Do so with care, because the
instance is still mutable. The banyan project also supports tree
augmentation which can be useful in implementing interval trees or segment
trees. There is no support for tree argumentation in Sorted
Containers.