python has two different built-in data types that represent an ordered sequence of items, list and tuple .
Now you might know that the difference between a tuple and a list is that a tuple is immutable . So you can do this to a list:
# You can mutate a list a = [1, 2, 3] a.append(4) print(a) # this displays [1, 2, 3, 4] a[1] = 5 print(a) # this now displays [1, 5, 3, 4] # But you can't do it for tuples b = (1, 2, 3) b[1] = 5 # this raises a TypeErrorSo you can change elements of a list and also add new elements to it. But you can't do it for a tuple.
But have you ever wondered why this is useful? Can't we just use lists everywhere instead of tuples? Why do we need tuples? Or frozensets?
Understanding hash tablesTo understand this let's talk about a data structure called a hash table.
The abstraction of a hash table is a key-value pair . It is a data structure that associates a key with a value.
Does that sound like a Python dict ? That's because Python dicts are implemented using hash tables.
There are many possible implementations of a key-value pair (such as binary trees), but a hash table is particularly efficient, in that given a key, you can theoretically access its associated value in O(1) (constant) time.
Let's talk about how a hash table is implemented in a lower level language like C.
Are you familiar with C arrays? An array is a contiguous sequence of variables. You can think of them like Python lists:
int array[5] = {1, 2, 3, 4, 5}; printf("%d", array[3]); // This prints "4" (C arrays are zero-indexed just like Python) One way to think of hash tables is a gigantic array. Notice how we can access elements of an array using an index (i.e, array[index] )? If we can map a key to a unique index, given a key you can access its value in constant time (O(1)).For example, you can map the string 'ladida' to the index 2334. So given the key 'ladida' you can simply access index 2334 of the array to get the associated value for the key 'ladida' .
The mapping of a value to an index is done via a computation called a hash function .
Basically, you can conceptually think of hash functions as functions that compute an integer given an object supplied as a key. This allows us to put an associated value in the gigantic array. That integer number hopefully turns out to uniquely map to an index in the array, but sometimes collisions can occur (depending on the array size we use to implement the hash table).
But anyway, that's a topic for another blog post. Hash functions and collision handling are too deep to cover in a single post (I recently talked to someone who did a PhD on hash functions!).
Hashable data typesOkay, so why are we talking about this?
In Python, mutable data types like lists are unhashable . You can try doing this quickly in the shell:
>>> a = [1, 2, 3] >>> d = {a: 'hello'} Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unhashable type: 'list'And then if you try doing it with a tuple, it works fine:
>>> a = (1, 2, 3) >>> d = {a: 'hello'} >>> d[(1, 2, 3)] 'hello'The reason this happens is that a hash function needs to differentiate between objects (e.g. (1, 2, 3) and (1, 2, 3, 4) to be useful. When an object is mutable, you can't reliably compute a hash function because the object the dict holds a reference to might change.
That's why we use immutable data types like tuples and frozensets , and strings as keys in Python dictionaries, so that we can reliably compute a hash function for these data types.
SummaryNow you know how immutable data types like tuples and frozensets can be useful as keys in Python dictionaries.
When you want to use a collection of objects as a key to a Python dictionary, using a hashable data type is necessary.
All built-in immutable data types in Python are hashable, whereas built-in mutable data types are unhashable.
Another fun fact: elements of Python sets also need to be hashable.
For more detailed information, refer to the Python documentation on the topic [1].Hope this helps!
[1] Built-in types, Python 3 documentation