Hashable objects in Python

Hashable objects are the foundation of dictionaries and here is how they work.

Principle

First, what is a dictionary? A dictionary is a hash table. A data structure in python that map keys to values and that implements a hash function. This function compute the index to an array of slots. This function will use the key and then return the index where the value is stored. All immutable data types in Python come with a buil-in hash function called __hash__

>>> a = 10
>>> a.__hash__()
10
>>> b = 10.0
>>> b.__hash__()
10.0
>>> c = 10.2
>>> c.__hash__()
3454543454554
>>> d = 10.2
>>> d.__hash__()
3454543454554
>>> e = "10"
>>> e.__hash__()
-4577622685821588571
>>> f = (1, 2, 3)
>>> f.__hash__()
34644454456455
>>> g = [1, 2, 3]         # no immutable object
>>> g.__hash__()       
TypeError: 'NoneType' object is not callable

Because g in our previous example is not hashable, list can not be used as keys in dictionaries (just like dictionaries can not be keys in a dictionary).

When using a tuple as key, please note that there are two ways to retrieve the element.

>>> key = (1, 2, 3)
>>> my_dico = {key: 10}
>>> my_dico[key]
10
>>> my_dico[(1, 2, 3)]
10
>>> my_dico[1, 2, 3]
10

But what happens if 2 keys have the same hash values?

>>> var1 = 'a'
>>> var1.__hash__()
12345678910
>>> var2 = 12345678910
>>> var2.__hash__()
12345678910
>>> var1.__hash__() == var2.__hash__()
True

>>> var3 = {var1: 'var1'}
>>> var3[var2] = 'var2'
>>> var3
{'a': 'var1', 12345678910: 'var2'}

Good news, python is smart enough to figure out that those 2 keys should be different. That means python does not only uses the hash function to create the unique id.

Working with classes

It’s possible to overwrite the __hash__ method when defining our class but again, what will happen if two instances have the same hash values. Let’s see.

>>> class MyClass:
       def __init__(self, value):
          self.value = value
>>> my_obj_1 = MyClass(1)
>>> my_obj_1.__hash__()
3948549493045959
>>> my_obj_2 = MyClass(2)
>>> my_obj_2.__hash__()
-3903949858848444

# let's overwrite the hash method
>>> class MyNewClass(MyClass):
       def __hash__(self):
          return int(self.value)
>>> my_new_obj_1 = MyNewClass(1)
>>> my_new_obj_2 = MyNewClass(2)
>>> my_new_obj_1.__hash__() == my_new_obj_2.__hash__()
True

# now let's use those as keys in a dictionary
>>> my_dico = {my_new_obj_1: 'my_new_obj_1'}
>>> my_dico[my_new_obj_2] = 'my_new_obj_2'
>>> my_dico
{my_new_obj_1: 'my_new_obj_1', my_new_obj_2: 'my_new_obj_2'}

As you can see, again, Python is smart enough to see that we really want to have those keys different. Is it because the 2 keys are different objects after all?

>>> my_new_obj_1 == my_new_obj_2
False

So let’s modify our class and force it to return True when 2 instances have the same parameter value

class MyClass:
    def __init__(self, var):
        self.var = var

    def __hash__(self):
        return int(self.var)

    def __eq__(self, other):
        return other.var == self.var

And now, if we compare them we should get True if they have been initialized with the same value.

>>> var1 = MyClass(1)
>>> var2 = MyClass(2)
>>> var3 = MyClass(1)
>>> var1 == var2
False
>>> var1 == var3
True

Sounds good, so var1 and var3 have the same hash value and are logically equal. Let’s use them again in our dictionary to see how they behave.

>>> my_dico = {var1: 'var1'}
>>> my_dico[var3] = 'var3'
>>> my_dico
{var3: 'var3'}

Finally we see now what happens. Python looks at the hash value but also at the value of the key. If they both match, the key is identical.

Links