Python data structures heapq and heapify

I came across a python beginner tutorial that mentioned heapq and heapify. Because I never heard of those guys before and because like I said it was part of a beginner tutorial I decided to look into those and here is what I learned.

Let’s suppose you have a list of simple numbers, strings or even dictionaries, you want to find the N smallest or largest elements from that list, heapq is your friend in this case.

>>> import heapq
>>> nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
>>> heapq.nsmallest(3, nums)
[-4, 1, 2]
>>> heapq.nlargest(2, nums)
[42, 37]

But this also works with a more complicated data structure such as dictionaries

>>> portfolio = [
       {'name': 'IBM', 'shares': 100, 'price': 91.1},
       {'name': 'AAPL', 'shares': 50, 'price': 543.22},
       {'name': 'FB', 'shares': 200, 'price': 21.09},
       {'name': 'HPQ', 'shares': 35, 'price': 31.75},
       {'name': 'YHOO', 'shares': 45, 'price': 16.35},
       {'name': 'ACME', 'shares': 75, 'price': 115.65}
    ]
>>> cheap = heapq.smallest(3, portfolio, key=lambda s: s['price'])
>>> cheap
[{'name': 'YHOO', 'shares': 45, 'price': 16.35}, {'name': 'FB', 'shares': 200, 'price': 21.09}, {'name': 'HPQ', 'shares': 35, 'price': 31.75}]

In the case of a simple search where the number of elements to retrieve is small compare to the number of elements in the list, use heapify

>>> nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
>>> heap = list(nums)
>>> heapq.heapify(heap)
>>> heapq[0:2]
[-4, 1]

It’s also possible to use heappop to find one by one the smallest elements. This operation is a O(logN) operation.

To find the 3 smallest numbers for example

>>> heapq.heappop(heap)
-4
>>> heapq.heappop(heap)
1
>>> heapq.heappop(heap)
2

Other solution to retrieve the min and max are

  • using min() and max()
  • sorted(items)[:N] or sorted(items)[-N:]

heappush, heappop

heappush provides a way to “sort” or “prioritize” the elements by giving it a 2-elements tuple where the first arguments is the priority key and the second the item itself. Here is in action.

import heapq

class PriorityQueue(object):
    def __init__(self):
        self._queue = []
        self._index = 0
    def push(self, item, priority):
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1
    def pop(self):
        return heapq.heappop(self._queue)[-1]
class Item(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return 'Item({!r})'.format(self.name)
>>> q = PriorityQueue()
>>> q.push(Item("Barcelona"), 1)
>>> q.push(Item("Manchester"), 4)
>>> q.push(Item("PSG"), 10)
>>> q.push(Item("LA Galaxy"), 0)
>>> q.pop()
Item('PSG')
>>> q.pop()
Item('Manchester')

It’s important to notice that the push and pop operations have a O(logN) complexity (N is the number of items in the heap) so they are fairly efficient.

Links