Comparison and Sorting in Python3

Although I have written Python for several years, I am still not very familiar with its idioms. In particular, I don't quite understand how to compare and sort the objects in Python. That's the reason why I am writing this article.

The rest of this article consists of following sections:

  1. Comparison Operators -- lists the comparison operators.
  2. Built-in Types -- lists the semantics of the comparison operators corresponding to the operand types:
    1. Numbers: Bool, Int, and Float
    2. Complex Numbers
    3. String
    4. Sequential Types: Tuple, List, ByteArray, and Bytes
    5. Sets and FrozenSets
  3. User-defined Classes -- discusses how to define our own comparison operators for the user-defined classes.
  4. Sort the List -- will talk about how to customize the ordering while sorting the elements in a list.

Comparison Operators

There are several comparison operators in Python. All of them are listed in the following table:

Operator Semantics Override Method
< Strictly less than __lt__
<= Less than or equal __le__
> Strictly greater than __gt__
>= Greater than or equal __ge__
== Equal __eq__
!= Not equal __ne__
is Object identity Not overridable
is not Negated object identity Not overridable

The six items from the beginning are the comparison operators that determine the order of the values.

The last two items at the end are the identity operators which can tell whether the objects on both sides are the same, i.e. both the expressions are evaluated to same object.

For what it's worth, it is a bad idea to compare int or float values with is or is not operator. You will confront some non-intuitive results. We will find out more examples in the following section.

Built-in Types

There are several built-in types in Python 3. Most of them have reasonable comparison ordering. However, notice that complex numbers, dict, and range are not ordered.

Category Types Ordering
Numberic bool, int, float Real number ordering
String str Unicode lexicographical
Sequential tuple, list, bytearray, bytes Element-wise
Set set, frozenset Subset
Unordered complex, dict, range, NoneType N/A (Equality comparable)

Numbers: Bool, Int, and Float

The number types in Python have reasonable comparison rules:

  • For the bool values, False is equal to 0, and True is equal to 1.
  • The int values are mapped to corresponding real numbers.
  • Most float values are mapped to corresponding real numbers. Here are some special rules:
    • float('nan') is a special floating point number which is standing for not a number. NAN is not equal to all of the numbers. Besides, <, <=, >, and >= will return False as well.
    • float('inf') will be treated as infinite. It will be greater than other numbers.
    • float('-inf') will be treated as negative infinite. It will be less than other numbers.

These examples demonstrate the simple cases with implicit type conversions:

>>> 1 < 2
True
>>> 1.0 < 2.0
True
>>> 1 < 1.2
True
>>> 1 == 1.0
True
>>> -0.0 == 0
True
>>> -0.0 == 0.0
True
>>> False == 0.0
True
>>> True == 1
True

Here are the examples for float('nan'):

>>> nan = float('nan')
>>> nan == nan
False
>>> nan != nan
True
>>> nan < 0
False
>>> nan > 0
False
>>> nan <= 0
False
>>> nan >= 0
False

Here are the examples for float('inf') and float('-inf'):

>>> inf = float('inf')
>>> inf == inf
True
>>> inf != inf
False
>>> inf < inf
False
>>> inf < nan
False
>>> inf > nan
False
>>> ninf = float('-inf')
>>> ninf < ninf
False
>>> ninf < nan
False
>>> ninf > nan
False

Here are the examples which compare float('inf') with the maximum floating point number on the system and a huge integer numbers.

>>> import sys
>>> mx = sys.float_info.max
>>> huge = (1 << 5000)
>>> mx < huge
True
>>> huge < inf
True
>>> mn = sys.float_info.min
>>> -huge < mn
True
>>> ninf < -huge
True

Please notice that the floating point arithmetics are performed in binary. As a result, the outcome might be different from the one computed in decimal:

>>> 0.1 + 0.2 > 0.3
True
>>> 0.1 + 0.2
0.30000000000000004

By the way, DON'T compare int or float values with identity operators (is or is not.) Some implementations will reuse the objects for small numbers and give non-intuitive, unexpected, and surprising results:

>>> (1 + 2) is 3
True
>>> (1 << 4) is 16
True
>>> (1 << 30) is 0x40000000
False
>>> (1 << 30) == 0x40000000
True

Complex Numbers

We can compare the complex numbers with equality or inequality operators:

>>> complex(1) == 1.0
True
>>> complex(1.4j) == 1.4j
True
>>> complex(1.4j) == 1.4
False
>>> complex(2j) != 2
True

However, the complex numbers are unordered:

>>> 1 < 1j
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < complex()

Even if their imaginary parts are equal to zero, they are unordered either:

>>> complex(4) < complex(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < complex()

String

The str objects are ordered by lexicographical ordering. In the other words, the characters in the strings are element-wise compared. The characters are encoded in unicode and the comparison is case sensitive. If two strings have common prefix, then the shorter one is less than the longer one.

>>> 'a' < 'b'
True
>>> 'aa' < 'b'
True
>>> 'a' < 'aa'
True
>>> '' < 'a'
True
>>> 'aa' < 'aa'
False
>>> 'a' == 'A'
False

Sequential Types: Tuple, List, ByteArray, and Bytes

The sequential types, such as tuple, list, bytearray and bytes, are element-wise compared. Like str objects, if two objects have common prefix, then the shorter one is less than the longer one.

Here are the examples for tuple:

>>> () < (0,)
True
>>> (0, 1) < (0, 1, 2)
True
>>> (0,) < (1,)
True
>>> (0, 1) == (0, 1)
True
>>> (0, 1) == (0, 1, 2)
False

Here are the examples for list:

>>> [] < [0,]
True
>>> [0, 1] < [0, 1, 2]
True
>>> [0] < [1]
True
>>> [0, 1] == [0, 1]
True
>>> [0, 1] == [0, 1, 2]
False

Here are the examples for bytes:

>>> b'' < b'a'
True
>>> b'ab' < b'abc'
True
>>> b'a' < b'b'
True
>>> b'ab' == b'ab'
True
>>> b'ab' == b'abc'
False

We are allowed to compare bytearray with bytes:

>>> bytearray(b'a') == b'a'
True
>>> bytearray(b'aa') > b'a'
False

We are also allowed to compare tuple and list with equality and inequality operations:

>>> [0, 1] == (0, 1)
True
>>> [0, 1] != (0,)
True

But, unfortunately, we can't get the order between tuple object and list object without type conversion:

>>> [0, 1] < (0, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: list() < tuple()
>>> (0, 2) > [0, 1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: tuple() > list()

Sets and FrozenSets

The set and frozenset instances are ordered by the subset relationship. For example,

>>> a = {0, 1}
>>> b = {0}
>>> a < b
False
>>> a <= b
False
>>> a == b
False
>>> a != b
True
>>> a > b
True
>>> a >= b
True

Notice that subset is not a total order. We can check this by adding 2 to b:

>>> b.add(2)
>>> a < b
False
>>> a <= b
False
>>> a == b
False
>>> a != b
True
>>> a > b
False
>>> a >= b
False

Of course, it is fine to compare set objects with frozenset objects:

>>> c = frozenset(a)
>>> c == a
True
>>> a.remove(0)
>>> a < c
True

User-defined Classes

We can declare a new class with a class statement. For example, we can define a Date class in date.py:

#!/usr/bin/env python3

class Date:
    def __init__(self, year, month, date):
        self.year = year
        self.month = month
        self.date = date

    def __str__(self):
        return '{0}/{1:02}/{2:02}'.format(self.year, self.month, self.date)

    def __repr__(self):
        return 'Date({0},{1},{2})'.format(self.year, self.month, self.date)

What will happen if we compare two instances of Date? By default, we can only compare them with equality/inequality operators and identity operators:

$ python3
>>> from date import Date
>>> a = Date(2015, 6, 14)
>>> b = Date(2015, 6, 14)
>>> a == b
False
>>> a != b
True
>>> a is b
False
>>> a is not b
True

As we can see, the default behavior for equality/inequality operators are equal to the identity operators.

If we compare the objects with other operators, such as less-than operator, then the TypeError exception will be raised:

>>> a < b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: Date() < Date()

Customize the Equality and Inequality Operators

We can customize the equality operators by overriding __eq__:

class Date:
    # ... skipped ...

    def __eq__(self, other):
        return (self.year == other.year and
                self.month == other.month and
                self.date == other.date)

Let's try out:

>>> from date import Date
>>> a = Date(2015, 6, 14)
>>> b = Date(2015, 6, 14)
>>> c = Date(2015, 6, 15)
>>> a == b
True
>>> a == c
False

We can see that the equality operator works as expected.

Let's look at the inequality operator. If you don't have special logic, it is suggested to define __ne__ in terms of not and ==. For example:

class Date:
    # ... skipped ...

    def __ne__(self, other):
        return not self == other
>>> from date import Date
>>> a = Date(2015, 6, 14)
>>> b = Date(2015, 6, 14)
>>> c = Date(2015, 6, 15)
>>> a != b
False
>>> a != c
True

On a side note, we should not assume a == b is complement with a != b. This assumption may not be true in general. Some class may return True on both cases.

Remarks: If you override __eq__, then you should override __hash__ as well. Otherwise, the object won't be hashable. In the other words, we can't insert the objects to set and the objects can't be used as the key of dict. Besides, we have to make sure that a == b implies hash(a) == hash(b); otherwise, the uniqueness property of the set will become invalid. Here's the __hash__ function for our Date class:

class Date:
    # ... skipped ...

    def __hash__(self):
        return self.year * 10000 + self.month * 100 + self.date
>>> from date import Date
>>> a = Date(2015, 6, 14)
>>> b = Date(2015, 6, 14)
>>> c = Date(2015, 6, 15)
>>> hash(a)
20150614
>>> set([a, b, c])
{Date(2015,6,14), Date(2015,6,15)}

Customize the Ordering Comparison Operators

We can customize the ordering comparison operators by overriding __lt__, __le__, __gt__, and __ge__. They are corresponding to <, <=, >, and >= respectively. For example:

class Date:
    # ... skipped ...

    def __hash__(self):
        return self.year * 10000 + self.month * 100 + self.date

    def __lt__(self, other):
        return hash(self) < hash(other)

In our example, we simply implement __lt__ by comparing the hash value of the object. You may implement this with a series of integer comparisons and boolean arithmetics as well. Here are the results:

>>> from date import Date
>>> a = Date(2015, 6, 14)
>>> b = Date(2015, 6, 14)
>>> c = Date(2015, 6, 15)
>>> a < b
False
>>> a < c
True
>>> c < a
False

However, you might be surprised that the following code works as expected even though we haven't overridden __gt__ yet:

>>> a > b
False
>>> c > a
True

It is working because the Python run-time will reflect the comparison. If (1) the left-hand-side class does not override the corresponding method, or (2) the corresponding method returns NotImplemented, then the Python run-time will swap the arguments and call the reflected method defined in the right-hand-side class. Here are the mapppings:

Original Reflects To Mnemonic
__lt__(a, b) __gt__(b, a) a < b <=> b > a
__gt__(a, b) __lt__(b, a) (ditto)
__le__(a, b) __ge__(b, a) a <= b <=> b >= a
__ge__(a, b) __le__(b, a) (ditto)
__eq__(a, b) __eq__(b, a) a == b <=> b == a
__ne__(a, b) __ne__(b, a) a != b <=> b != a

With the reflection rules, it will be sufficient to define a total ordering class by overriding __eq__, __ne__, __lt__, and __le__.

Total Ordering Decorator

To define a class with total ordering, we should override all of __eq__, __ne__, __lt__, and __le__. However, it might be cumbersome to override all of them. It will be simpler to decorate the class with functools.total_ordering.

With functools.total_ordering decorator, overriding __eq__ and one of __lt__, __le__, __gt__, or __ge__ is sufficient to define a total ordering class:

import functools

@functools.total_ordering
class Date:
    # ... skipped ...

    def __hash__(self):
        return self.year * 10000 + self.month * 100 + self.date

    def __eq__(self, other):
        return (self.year == other.year and
                self.month == other.month and
                self.date == other.date)

    def __lt__(self, other):
        return hash(self) < hash(other)
>>> from date import Date
>>> a = Date(2015, 6, 14)
>>> b = Date(2015, 6, 14)
>>> c = Date(2015, 6, 15)
>>> (a == b, a != b, a < b, a <= b, a > b, a >= b)
(True, False, False, True, False, True)
>>> (a == c, a != c, a < c, a <= c, a > c, a >= c)
(False, True, True, True, False, False)

Althoguh it is much simpler to define a total ordering class with the decorator, it comes with the performance penalty. If this has been proved to be a performance bottleneck, then it will be better to override six methods directly.

Sort the List

We can sort the list with either a.sort() or b = sorted(a). The former will sort the associate list and the later will create a new sorted list from the iteratables (and leave the argument unchanged). Both of them will sort the elements by less-than comparison (i.e. in ascending order.) For example,

>>> a = [2, 5, 3, 13, 8, 1]
>>> sorted(a)
[1, 2, 3, 5, 8, 13]
>>> a
[2, 5, 3, 13, 8, 1]
>>> a.sort()
>>> a
[1, 2, 3, 5, 8, 13]

The element type doesn't matter as long as they are pairwise comparable:

>>> sorted([set([1]), frozenset([1,2,3]), set(), set([4, 2, 3, 1])])
[set(), {1}, frozenset({1, 2, 3}), {1, 2, 3, 4}]

Sort the List with Customized Ordering

If we want to sort the elements in different order, we can specify the key function to extract the key from the elements.

First, let's sort the list of tuples with sorted() directly:

>>> xs = [(0, 3), (1, 1), (0, 2), (2, 3), (1, 3)]
>>> sorted(xs)
[(0, 2), (0, 3), (1, 1), (1, 3), (2, 3)]

As what we have expected, the tuple are pairwisely compared.

Second, let's sort the tuples by the first element:

>>> xs = [(0, 3), (1, 1), (0, 2), (2, 3), (1, 3)]
>>> sorted(xs, key=lambda x: x[0])
[(0, 3), (0, 2), (1, 1), (1, 3), (2, 3)]

As we can see, the tuples with x[0] equals to 0 comes before the tuples with x[0] equals to 1 or 2. However, (0, 3) and (0, 2) are not reordered. This is due to the stable guarantee of the sorting algorithm.

Third, the key function can return a new tuple as well. For example, we can compare the second element first and then compare the first element.

>>> xs = [(0, 3), (1, 1), (0, 2), (2, 3), (1, 3)]
>>> sorted(xs, key=lambda x: (x[1], x[0]))
[(1, 1), (0, 2), (0, 3), (1, 3), (2, 3)]

However, returning a new tuple might result in performance penalty since

  1. we have to allocate new tuples from the heap, and
  2. the element-wise comparison for tuples might be more expensive than necessary.

It will be better to return primitive types if it is possible. For example, if it is guaranteed that both x[0] and x[1] are less than 1000, then we can simply sort the tuples with:

>>> xs = [(0, 3), (1, 1), (0, 2), (2, 3), (1, 3)]
>>> sorted(xs, key=lambda x: (x[1] * 1000 + x[0]))
[(1, 1), (0, 2), (0, 3), (1, 3), (2, 3)]

Wrap Up

In this article, we have discussed the Comparison Operators that are available in the Python programming language. Besides, we have discussed the semantics of the comparison operators for the Built-in Types and how to create User-defined Classes with specific ordering. Finally, we have mentioned how to Sort the List with customized order by extracting the keys.

Hope you enjoy this post!