Although I have written Python for several years, I am still not very familiar with its idioms. In particular, I don't quite understand how to compare and sort the objects in Python. That's the reason why I am writing this article.
The rest of this article consists of following sections:
- Comparison Operators -- lists the comparison operators.
- Built-in Types -- lists the semantics of the comparison operators corresponding to the operand types:
- User-defined Classes -- discusses how to define our own comparison operators for the user-defined classes.
- Sort the List -- will talk about how to customize the ordering while sorting the elements in a list.
Comparison Operators
There are several comparison operators in Python. All of them are listed in the following table:
Operator | Semantics | Override Method |
---|---|---|
< |
Strictly less than | __lt__ |
<= |
Less than or equal | __le__ |
> |
Strictly greater than | __gt__ |
>= |
Greater than or equal | __ge__ |
== |
Equal | __eq__ |
!= |
Not equal | __ne__ |
is |
Object identity | Not overridable |
is not |
Negated object identity | Not overridable |
The six items from the beginning are the comparison operators that determine the order of the values.
The last two items at the end are the identity operators which can tell whether the objects on both sides are the same, i.e. both the expressions are evaluated to same object.
For what it's worth, it is a bad idea to compare int
or float
values with is
or is not
operator. You will confront some
non-intuitive results. We will find out more examples in the following
section.
Built-in Types
There are several built-in types in Python 3. Most of them have reasonable
comparison ordering. However, notice that complex
numbers,
dict
, and range
are not ordered.
Category | Types | Ordering |
---|---|---|
Numberic | bool , int ,
float |
Real number ordering |
String | str |
Unicode lexicographical |
Sequential | tuple , list ,
bytearray , bytes |
Element-wise |
Set | set , frozenset |
Subset |
Unordered | complex ,
dict , range ,
NoneType |
N/A (Equality comparable) |
Numbers: Bool, Int, and Float
The number types in Python have reasonable comparison rules:
- For the
bool
values,False
is equal to0
, andTrue
is equal to1
. - The
int
values are mapped to corresponding real numbers. - Most
float
values are mapped to corresponding real numbers. Here are some special rules:float('nan')
is a special floating point number which is standing for not a number. NAN is not equal to all of the numbers. Besides,<
,<=
,>
, and>=
will returnFalse
as well.float('inf')
will be treated as infinite. It will be greater than other numbers.float('-inf')
will be treated as negative infinite. It will be less than other numbers.
These examples demonstrate the simple cases with implicit type conversions:
>>> 1 < 2
True
>>> 1.0 < 2.0
True
>>> 1 < 1.2
True
>>> 1 == 1.0
True
>>> -0.0 == 0
True
>>> -0.0 == 0.0
True
>>> False == 0.0
True
>>> True == 1
True
Here are the examples for float('nan')
:
>>> nan = float('nan')
>>> nan == nan
False
>>> nan != nan
True
>>> nan < 0
False
>>> nan > 0
False
>>> nan <= 0
False
>>> nan >= 0
False
Here are the examples for float('inf')
and float('-inf')
:
>>> inf = float('inf')
>>> inf == inf
True
>>> inf != inf
False
>>> inf < inf
False
>>> inf < nan
False
>>> inf > nan
False
>>> ninf = float('-inf')
>>> ninf < ninf
False
>>> ninf < nan
False
>>> ninf > nan
False
Here are the examples which compare float('inf')
with the maximum
floating point number on the system and a huge integer numbers.
>>> import sys
>>> mx = sys.float_info.max
>>> huge = (1 << 5000)
>>> mx < huge
True
>>> huge < inf
True
>>> mn = sys.float_info.min
>>> -huge < mn
True
>>> ninf < -huge
True
Please notice that the floating point arithmetics are performed in binary. As a result, the outcome might be different from the one computed in decimal:
>>> 0.1 + 0.2 > 0.3
True
>>> 0.1 + 0.2
0.30000000000000004
By the way, DON'T compare int
or float
values with
identity operators (is
or is not
.) Some implementations
will reuse the objects for small numbers and give non-intuitive,
unexpected, and surprising results:
>>> (1 + 2) is 3
True
>>> (1 << 4) is 16
True
>>> (1 << 30) is 0x40000000
False
>>> (1 << 30) == 0x40000000
True
Complex Numbers
We can compare the complex numbers with equality or inequality operators:
>>> complex(1) == 1.0
True
>>> complex(1.4j) == 1.4j
True
>>> complex(1.4j) == 1.4
False
>>> complex(2j) != 2
True
However, the complex
numbers are unordered:
>>> 1 < 1j
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < complex()
Even if their imaginary parts are equal to zero, they are unordered either:
>>> complex(4) < complex(5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < complex()
String
The str
objects are ordered by lexicographical ordering. In the other
words, the characters in the strings are element-wise compared. The characters
are encoded in unicode and the comparison is case sensitive. If two strings
have common prefix, then the shorter one is less than the longer one.
>>> 'a' < 'b'
True
>>> 'aa' < 'b'
True
>>> 'a' < 'aa'
True
>>> '' < 'a'
True
>>> 'aa' < 'aa'
False
>>> 'a' == 'A'
False
Sequential Types: Tuple, List, ByteArray, and Bytes
The sequential types, such as tuple
, list
, bytearray
and bytes
, are element-wise compared. Like str
objects, if two
objects have common prefix, then the shorter one is less than the longer one.
Here are the examples for tuple
:
>>> () < (0,)
True
>>> (0, 1) < (0, 1, 2)
True
>>> (0,) < (1,)
True
>>> (0, 1) == (0, 1)
True
>>> (0, 1) == (0, 1, 2)
False
Here are the examples for list
:
>>> [] < [0,]
True
>>> [0, 1] < [0, 1, 2]
True
>>> [0] < [1]
True
>>> [0, 1] == [0, 1]
True
>>> [0, 1] == [0, 1, 2]
False
Here are the examples for bytes
:
>>> b'' < b'a'
True
>>> b'ab' < b'abc'
True
>>> b'a' < b'b'
True
>>> b'ab' == b'ab'
True
>>> b'ab' == b'abc'
False
We are allowed to compare bytearray
with bytes
:
>>> bytearray(b'a') == b'a'
True
>>> bytearray(b'aa') > b'a'
False
We are also allowed to compare tuple
and list
with equality and
inequality operations:
>>> [0, 1] == (0, 1)
True
>>> [0, 1] != (0,)
True
But, unfortunately, we can't get the order between tuple
object and
list
object without type conversion:
>>> [0, 1] < (0, 2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: list() < tuple()
>>> (0, 2) > [0, 1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: tuple() > list()
Sets and FrozenSets
The set
and frozenset
instances are ordered by the subset
relationship. For example,
>>> a = {0, 1}
>>> b = {0}
>>> a < b
False
>>> a <= b
False
>>> a == b
False
>>> a != b
True
>>> a > b
True
>>> a >= b
True
Notice that subset is not a total order. We can check this by adding
2
to b
:
>>> b.add(2)
>>> a < b
False
>>> a <= b
False
>>> a == b
False
>>> a != b
True
>>> a > b
False
>>> a >= b
False
Of course, it is fine to compare set
objects with frozenset
objects:
>>> c = frozenset(a)
>>> c == a
True
>>> a.remove(0)
>>> a < c
True
User-defined Classes
We can declare a new class with a class
statement. For example, we can
define a Date
class in date.py
:
#!/usr/bin/env python3
class Date:
def __init__(self, year, month, date):
self.year = year
self.month = month
self.date = date
def __str__(self):
return '{0}/{1:02}/{2:02}'.format(self.year, self.month, self.date)
def __repr__(self):
return 'Date({0},{1},{2})'.format(self.year, self.month, self.date)
What will happen if we compare two instances of Date
? By default, we
can only compare them with equality/inequality operators and identity
operators:
$ python3
>>> from date import Date
>>> a = Date(2015, 6, 14)
>>> b = Date(2015, 6, 14)
>>> a == b
False
>>> a != b
True
>>> a is b
False
>>> a is not b
True
As we can see, the default behavior for equality/inequality operators are equal to the identity operators.
If we compare the objects with other operators, such as less-than operator,
then the TypeError
exception will be raised:
>>> a < b
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: Date() < Date()
Customize the Equality and Inequality Operators
We can customize the equality operators by overriding __eq__
:
class Date:
# ... skipped ...
def __eq__(self, other):
return (self.year == other.year and
self.month == other.month and
self.date == other.date)
Let's try out:
>>> from date import Date
>>> a = Date(2015, 6, 14)
>>> b = Date(2015, 6, 14)
>>> c = Date(2015, 6, 15)
>>> a == b
True
>>> a == c
False
We can see that the equality operator works as expected.
Let's look at the inequality operator. If you don't have special logic, it
is suggested to define __ne__
in terms of not
and ==
.
For example:
class Date:
# ... skipped ...
def __ne__(self, other):
return not self == other
>>> from date import Date
>>> a = Date(2015, 6, 14)
>>> b = Date(2015, 6, 14)
>>> c = Date(2015, 6, 15)
>>> a != b
False
>>> a != c
True
On a side note, we should not assume a == b
is complement with
a != b
. This assumption may not be true in general. Some class may
return True
on both cases.
Remarks: If you override __eq__
, then you should override
__hash__
as well. Otherwise, the object won't be hashable. In the
other words, we can't insert the objects to set
and the objects can't
be used as the key of dict
. Besides, we have to make sure that
a == b
implies hash(a) == hash(b)
; otherwise, the uniqueness
property of the set
will become invalid. Here's the __hash__
function for our Date
class:
class Date:
# ... skipped ...
def __hash__(self):
return self.year * 10000 + self.month * 100 + self.date
>>> from date import Date
>>> a = Date(2015, 6, 14)
>>> b = Date(2015, 6, 14)
>>> c = Date(2015, 6, 15)
>>> hash(a)
20150614
>>> set([a, b, c])
{Date(2015,6,14), Date(2015,6,15)}
Customize the Ordering Comparison Operators
We can customize the ordering comparison operators by overriding __lt__
,
__le__
, __gt__
, and __ge__
. They are corresponding to
<
, <=
, >
, and >=
respectively. For example:
class Date:
# ... skipped ...
def __hash__(self):
return self.year * 10000 + self.month * 100 + self.date
def __lt__(self, other):
return hash(self) < hash(other)
In our example, we simply implement __lt__
by comparing the hash value
of the object. You may implement this with a series of integer comparisons and
boolean arithmetics as well. Here are the results:
>>> from date import Date
>>> a = Date(2015, 6, 14)
>>> b = Date(2015, 6, 14)
>>> c = Date(2015, 6, 15)
>>> a < b
False
>>> a < c
True
>>> c < a
False
However, you might be surprised that the following code works as expected even
though we haven't overridden __gt__
yet:
>>> a > b
False
>>> c > a
True
It is working because the Python run-time will reflect the comparison. If
(1) the left-hand-side class does not override the corresponding method, or (2)
the corresponding method returns NotImplemented
, then the Python
run-time will swap the arguments and call the reflected method
defined in the right-hand-side class. Here are the mapppings:
Original | Reflects To | Mnemonic |
---|---|---|
__lt__(a, b) |
__gt__(b, a) |
a < b <=> b > a |
__gt__(a, b) |
__lt__(b, a) |
(ditto) |
__le__(a, b) |
__ge__(b, a) |
a <= b <=> b >= a |
__ge__(a, b) |
__le__(b, a) |
(ditto) |
__eq__(a, b) |
__eq__(b, a) |
a == b <=> b == a |
__ne__(a, b) |
__ne__(b, a) |
a != b <=> b != a |
With the reflection rules, it will be sufficient to define a total ordering
class by overriding __eq__
, __ne__
, __lt__
, and
__le__
.
Total Ordering Decorator
To define a class with total ordering, we should override all of
__eq__
, __ne__
, __lt__
, and __le__
. However,
it might be cumbersome to override all of them. It will be simpler to decorate
the class with functools.total_ordering
.
With functools.total_ordering
decorator, overriding __eq__
and
one of __lt__
, __le__
, __gt__
, or __ge__
is
sufficient to define a total ordering class:
import functools
@functools.total_ordering
class Date:
# ... skipped ...
def __hash__(self):
return self.year * 10000 + self.month * 100 + self.date
def __eq__(self, other):
return (self.year == other.year and
self.month == other.month and
self.date == other.date)
def __lt__(self, other):
return hash(self) < hash(other)
>>> from date import Date
>>> a = Date(2015, 6, 14)
>>> b = Date(2015, 6, 14)
>>> c = Date(2015, 6, 15)
>>> (a == b, a != b, a < b, a <= b, a > b, a >= b)
(True, False, False, True, False, True)
>>> (a == c, a != c, a < c, a <= c, a > c, a >= c)
(False, True, True, True, False, False)
Althoguh it is much simpler to define a total ordering class with the decorator, it comes with the performance penalty. If this has been proved to be a performance bottleneck, then it will be better to override six methods directly.
Sort the List
We can sort the list with either a.sort()
or b = sorted(a)
.
The former will sort the associate list and the later will create a new sorted
list from the iteratables (and leave the argument unchanged). Both of them
will sort the elements by less-than comparison (i.e. in ascending order.)
For example,
>>> a = [2, 5, 3, 13, 8, 1]
>>> sorted(a)
[1, 2, 3, 5, 8, 13]
>>> a
[2, 5, 3, 13, 8, 1]
>>> a.sort()
>>> a
[1, 2, 3, 5, 8, 13]
The element type doesn't matter as long as they are pairwise comparable:
>>> sorted([set([1]), frozenset([1,2,3]), set(), set([4, 2, 3, 1])])
[set(), {1}, frozenset({1, 2, 3}), {1, 2, 3, 4}]
Sort the List with Customized Ordering
If we want to sort the elements in different order, we can specify the key function to extract the key from the elements.
First, let's sort the list of tuples with sorted()
directly:
>>> xs = [(0, 3), (1, 1), (0, 2), (2, 3), (1, 3)]
>>> sorted(xs)
[(0, 2), (0, 3), (1, 1), (1, 3), (2, 3)]
As what we have expected, the tuple are pairwisely compared.
Second, let's sort the tuples by the first element:
>>> xs = [(0, 3), (1, 1), (0, 2), (2, 3), (1, 3)]
>>> sorted(xs, key=lambda x: x[0])
[(0, 3), (0, 2), (1, 1), (1, 3), (2, 3)]
As we can see, the tuples with x[0]
equals to 0
comes before
the tuples with x[0]
equals to 1
or 2
. However,
(0, 3)
and (0, 2)
are not reordered. This is due to the
stable guarantee of the sorting algorithm.
Third, the key function can return a new tuple as well. For example, we can compare the second element first and then compare the first element.
>>> xs = [(0, 3), (1, 1), (0, 2), (2, 3), (1, 3)]
>>> sorted(xs, key=lambda x: (x[1], x[0]))
[(1, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
However, returning a new tuple might result in performance penalty since
- we have to allocate new tuples from the heap, and
- the element-wise comparison for tuples might be more expensive than necessary.
It will be better to return primitive types if it is possible. For example, if
it is guaranteed that both x[0]
and x[1]
are less than 1000,
then we can simply sort the tuples with:
>>> xs = [(0, 3), (1, 1), (0, 2), (2, 3), (1, 3)]
>>> sorted(xs, key=lambda x: (x[1] * 1000 + x[0]))
[(1, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
Wrap Up
In this article, we have discussed the Comparison Operators that are available in the Python programming language. Besides, we have discussed the semantics of the comparison operators for the Built-in Types and how to create User-defined Classes with specific ordering. Finally, we have mentioned how to Sort the List with customized order by extracting the keys.
Hope you enjoy this post!