Binary Trees, Heaps

K08 Δομές Δεδομένων και Τεχνικές Προγραμματισμού

Κώστας Χατζηκοκολάκης

Binary trees

A binary tree (δυαδικό δέντρο) is a set of nodes such that:

  • Exactly one node is called the root
  • All nodes except the root have exactly one parent
  • Each node has at most two children
    • and they are ordered: called left and right

Example: a binary tree

Example: a different binary tree

Whether a child is left or right matters.

Terminology

  • path: sequence of nodes traversing from parent to child (or vice-versa)
  • length of a path: number of nodes -1 (= number of “moves” it contains)
  • siblings: children of the same parent
  • descendants: nodes reached by travelling downwards along any path
  • ancestors: nodes reached by travelling upwards towards the root
  • leaf / external node: a node without children
  • internal node: a node with children

Terminology

  • The nodes of a tree can be arranged in levels / depths:
    • The root is at level 0
    • Its children are at level 1, their children are at level 2, etc.
  • Note: node level = length of the (unique) path from the root to that node
  • height of the tree: the largest depth of any node
  • subtree rooted at a node: the tree consisting of that node and its descendants

Complete binary trees

A binary tree is called complete (πλήρες) if

  • All levels except the last are “full” (have the maximum number of nodes)
  • The nodes at the last level fill the level “from left to right”

Example: complete binary tree

Example: not complete binary tree

Example: not complete binary tree

Level order

Ordering the nodes of a tree level-by-level (and left-to-right in each level).

Nodes of a complete binary tree

  • How many nodes does a complete binary tree have at each level?
  • At most
    • $1$ at level $0$.
    • $2$ at level $1$.
    • $4$ at level $2$.
    • $2^k$ at level $k$ .

Properties of binary trees

  • The following hold:
    • $h + 1 \le n \le 2^{h+1} - 1$
    • $1 \le n_E \le 2^h$
    • $h \le n_I \le 2^h - 1$
    • $\log(n+1) - 1 \le h \le n-1$
  • Where
    • $n$: number of all nodes
    • $n_I$: number of internal nodes
    • $n_E$: number of external nodes (leaves)
    • $h$: height
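
For intuition on the first bound: a binary tree of height $h$ has at most $2^l$ nodes at each level $l$ and at least one node per level, so

$h + 1 \le n \le \sum_{l=0}^{h} 2^{l} = 2^{h+1} - 1$

The remaining inequalities can be derived similarly (full proofs are in the readings).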

Properties of complete binary trees

$h \le \log n$

  • Very important property, the tree cannot be too “tall”!
  • Why?
    • Any level $l < h$ contains exactly $2^l$ nodes
    • Level $h$ contains at least one node
    • So $1 + 2 + \ldots + 2^{h-1} + 1 = 2^h \le n$
    • And take logarithms on both sides

How do we represent a binary tree?

Sequential representation

Store the entries in an array, in level order.

  • Common for complete trees
  • A lot of space is wasted for non-complete trees
    • missing nodes will have empty slots in the array

How to find nodes

| To find | Use | Provided |
|---|---|---|
| The left child of $A[i]$ | $A[2i]$ | $2i \le n$ |
| The right child of $A[i]$ | $A[2i+1]$ | $2i + 1 \le n$ |
| The parent of $A[i]$ | $A[i/2]$ | $i > 1$ |
| The root | $A[1]$ | $A$ is nonempty |
| Whether $A[i]$ is a leaf | $2i > n$ | |
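
As an illustration, this index arithmetic can be wrapped in a few tiny helpers. A minimal sketch in C, assuming a 1-based array with $n$ nodes (the helper names are made up for this example, they are not part of any ADT):

#include <stdbool.h>

// Level-order index navigation in a 1-based array A[1..n].
int left_child(int i)      { return 2 * i; }        // exists only if 2*i <= n
int right_child(int i)     { return 2 * i + 1; }    // exists only if 2*i + 1 <= n
int parent(int i)          { return i / 2; }        // defined only if i > 1 (integer division)
bool is_leaf(int i, int n) { return 2 * i > n; }    // no left child means no children at all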

Heaps

A binary tree is called a heap (σωρός) if

  • It is complete, and
  • each node is greater than or equal to its children



(Sometimes this is called a max-heap, we can similarly define a min-heap)

Example

Heaps and priority queues

  • Heaps are a common data structure for implementing Priority Queues
  • The following operations are needed
    • find max
    • insert
    • remove max
    • create with data
  • We need to preserve the heap property in each operation!

Find max

  • Trivial, the max is always at the root
    • remember: we always preserve the heap property
  • Complexity?

Inserting a new element

  • The new element can only be inserted at the end
    • because a heap must be a complete tree
  • Now all nodes except the last satisfy the heap property
    • to restore it: apply the bubble_up algorithm on the last node

Inserting a new element

bubble_up(node)

  • Before
    • node might be larger than its parent
    • all other nodes satisfy the heap property
  • After
    • all nodes satisfy the heap property
  • Algorithm
    • if node > parent
      • swap them and call bubble_up(parent)

Example insertion

Inserting 15 and running bubble_up

Inserting 12 and running bubble_up

Complexity of insertion

  • We travel the tree from the last node to the root
    • on each node: 1 step (constant time)
  • So we need at most $O(h)$ steps
    • $h$ is the height of the tree
    • but $h \le \log n$ on a complete tree
  • So $O(\log n)$
    • the “complete” property is crucial!

Removing the max element

  • We want to remove the root
    • but the heap must be a complete tree
  • So swap the root with the last element
    • then remove the last element
  • Now all nodes except the root satisfy the heap property
    • to restore it: apply the bubble_down algorithm on the root

Removing the max element

bubble_down(node)

  • Before
    • node might be smaller than one of its children
    • all other nodes satisfy the heap property
  • After
    • all nodes satisfy the heap property
  • Algorithm
    • max_child = the largest child of node
    • If node < max_child
      • swap them and call bubble_down(max_child)

Example removal

Removing 9 and restoring the heap property

Complexity of removal

  • We travel a single path from the root to a leaf
  • So we need at most $O(h)$ steps
    • $h$ is the height of the tree
  • Again $O(\log n)$
    • again, having a complete tree is crucial

Building a heap from initial data

  • What if we want to create a heap that contains some initial values?
    • we call this operation heapify
  • “Naive” implementation:
    • Create an empty heap and insert elements one by one
  • What is the complexity of this implementation?
    • We do $n$ inserts
    • Each insert is $O(\log n)$ (because of bubble_up)
    • So $O(n\log n)$ total
  • Worst-case example?
    • elements in increasing order: each value will have to bubble_up all the way to the root

Efficient heapify

  • Better algorithm:
    • Visit all internal nodes in reverse level order
      • last internal node: $\lfloor n/2 \rfloor$ (the parent of the last leaf, node $n$)
      • first internal node: 1 (root)
    • Call bubble_down on each visited node
  • Why does this work?
    • when we visit a node, its subtree is already a heap
      • except possibly for the node itself (the precondition of bubble_down)
    • So bubble_down restores the heap property in the subtree
    • After processing the root, the whole tree is a heap
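
To make the algorithm concrete, here is a minimal sketch over a plain 1-based array of ints (the array-based setting and the function names are just for illustration):

// Sketch: turn heap[1..n] into a max-heap in place (1-based indices, slot 0 unused).
static void bubble_down_array(int heap[], int n, int node) {
	int left = 2 * node, right = left + 1;
	if (left > n)
		return;                                 // node is a leaf, nothing to do

	int max_child = left;                       // largest of the (at most) two children
	if (right <= n && heap[right] > heap[left])
		max_child = right;

	if (heap[node] < heap[max_child]) {         // swap and continue downwards if needed
		int tmp = heap[node];
		heap[node] = heap[max_child];
		heap[max_child] = tmp;
		bubble_down_array(heap, n, max_child);
	}
}

void heapify_array(int heap[], int n) {
	// visit the internal nodes n/2, ..., 1 (reverse level order), bubbling each one down
	for (int node = n / 2; node >= 1; node--)
		bubble_down_array(heap, n, node);
}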

Heapify example

Visit internal nodes in reverse level order, call bubble_down.

Complexity of heapify

  • We call bubble_down $\frac{n}{2}$ times

    • So $O(n\log n)$?
  • But this is only an upper bound

    • bubble_down is faster closer to the leaves
    • and most nodes live there!
    • we might be over-approximating the number of steps

Complexity of heapify

  • More careful calculation of the number of steps:
    • If node is at level $l$, bubble_down takes at most $h-l$ steps
    • At most $2^{l}$ nodes at this level, so $(h-l) 2^{l}$ steps for level $l$
    • For the whole tree: $\sum_{l = 0}^{h-1}(h-l) 2^{l}$
    • This can be shown to be less than $2n$ (exercise if you're curious)
  • So we get worst-case $O(n)$ complexity
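
One way to verify the $2n$ bound: substituting $j = h - l$ and using the facts that $\sum_{j \ge 1} j/2^{j} = 2$ and that $2^{h} \le n$ in a complete tree,

$\sum_{l = 0}^{h-1}(h-l)\, 2^{l} = \sum_{j = 1}^{h} j\, 2^{h-j} = 2^{h} \sum_{j = 1}^{h} \frac{j}{2^{j}} < 2 \cdot 2^{h} \le 2n$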

Efficient vs naive heapify

  • For naive_heapify we found $O(n\log n)$

    • maybe we are also over-approximating?
  • No: in the worst-case (sorted elements) we really need $n\log n$ steps

    • try to compute the exact number of steps
  • The difference:

    • bubble_up is faster closer to the root, but few nodes live there
    • bubble_down is faster closer to the leaves, and most nodes live there
  • Note: in the average-case, the naive version is also $O(n)$

Implementing ADTPriorityQueue

Types

// A PriorityQueue is a pointer to this struct

struct priority_queue {
	Vector vector;              // The data, stored in a Vector for a variable size
	CompareFunc compare;        // The ordering
	DestroyFunc destroy_value;  // Function that destroys an element
};

ADTPriorityQueue implementation

Finding the max is trivial.

Pointer pqueue_max(PriorityQueue pqueue) {
	return node_value(pqueue, 1);		// root
}
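
The helpers node_value and node_swap used here are not shown in these slides. A possible sketch, assuming the Vector ADT offers vector_get_at and vector_set_at (note that node ids are 1-based while Vector positions are 0-based):

// Possible sketch of the helpers; the Vector operation names are assumed.

static Pointer node_value(PriorityQueue pqueue, int node_id) {
	// node ids are 1-based, Vector positions are 0-based
	return vector_get_at(pqueue->vector, node_id - 1);
}

static void node_swap(PriorityQueue pqueue, int node_id1, int node_id2) {
	Pointer value1 = node_value(pqueue, node_id1);
	Pointer value2 = node_value(pqueue, node_id2);

	vector_set_at(pqueue->vector, node_id1 - 1, value2);
	vector_set_at(pqueue->vector, node_id2 - 1, value1);
}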

ADTPriorityQueue implementation

For pqueue_insert, the non-trivial part is bubble_up.

// Restores the heap property.
// Before: all nodes satisfy the heap property, except for
//         node, which may be _greater_ than its parent.
// After: all nodes satisfy the heap property.

static void bubble_up(PriorityQueue pqueue, int node) {
	// If we have reached the root, stop
	if (node == 1)
		return;

	int parent = node / 2;		// The node's parent. Node ids are 1-based

	// If the parent has a smaller value than the node, swap them and continue recursively upwards
	if (pqueue->compare(node_value(pqueue, parent), node_value(pqueue, node)) < 0) {
		node_swap(pqueue, parent, node);
		bubble_up(pqueue, parent);
	}
}
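
The rest of pqueue_insert is then simple. A sketch, assuming vector_insert_last appends a value at the end of the Vector:

// Sketch: append the new value as the last node (keeping the tree complete),
// then restore the heap property by bubbling it up.
void pqueue_insert(PriorityQueue pqueue, Pointer value) {
	vector_insert_last(pqueue->vector, value);
	bubble_up(pqueue, pqueue_size(pqueue));		// the new node is the last one
}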

ADTPriorityQueue implementation

// Before: all nodes satisfy the heap property, except for
//         node, which may be _smaller_ than one of its children.
// After: all nodes satisfy the heap property.

static void bubble_down(PriorityQueue pqueue, int node) {
	// find the node's children (if there are none, stop)
	int left_child = 2 * node;
	int right_child = left_child + 1;
	int size = pqueue_size(pqueue);
	if (left_child > size)
		return;

	// find the largest of the 2 children
	int max_child = left_child;
	if (right_child <= size && pqueue->compare(node_value(pqueue, left_child), node_value(pqueue, right_child)) < 0)
		max_child = right_child;

	// If the node is smaller than the largest child, swap them and continue downwards
	if (pqueue->compare(node_value(pqueue, node), node_value(pqueue, max_child)) < 0) {
		node_swap(pqueue, node, max_child);
		bubble_down(pqueue, max_child);
	}
}
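
Similarly, the non-trivial part of pqueue_remove is bubble_down. A sketch, assuming vector_remove_last removes the Vector's last element (destroying the removed value is omitted here):

// Sketch: swap the root (the max) with the last node, remove the last node
// (keeping the tree complete), then restore the heap property from the root.
void pqueue_remove(PriorityQueue pqueue) {
	node_swap(pqueue, 1, pqueue_size(pqueue));
	vector_remove_last(pqueue->vector);
	bubble_down(pqueue, 1);
}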

Other possible representations

| Operation | Heap | Sorted List | Unsorted Vector |
|---|---|---|---|
| pqueue_create (with data) | $O(n)$ | $O(n\log n)$ | $O(1)$ |
| pqueue_remove | $O(\log n)$ | $O(1)$ | $O(n)$ |
| pqueue_insert | $O(\log n)$ | $O(n)$ | $O(1)$ |



All of them have some advantage

  • Heaps provide a great compromise between insertions and removals

Using ADTPriorityQueue for sorting

  • We can easily sort data using ADTPriorityQueue

    • create a priority queue with the data
    • remove elements in sorted order
  • When ADTPriorityQueue is implemented by a heap

    • this algorithm is called heapsort
    • and runs in time $O(n\log n)$
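
A possible sketch of heapsort through the ADT, sorting an array of ints in increasing order. The exact signatures of pqueue_create and pqueue_destroy are assumed here; the values are copied first so that writing the sorted output back into the array does not disturb elements still inside the queue.

#include <stdlib.h>			// plus the ADTPriorityQueue header

static int compare_ints(Pointer a, Pointer b) {
	return *(int*)a - *(int*)b;
}

void heapsort(int array[], int n) {
	int* copy = malloc(n * sizeof(int));		// the queue stores pointers to these copies

	PriorityQueue pqueue = pqueue_create(compare_ints, NULL, NULL);		// assumed signature

	for (int i = 0; i < n; i++) {
		copy[i] = array[i];
		pqueue_insert(pqueue, &copy[i]);		// O(n log n) total, or O(n) if created from the data directly
	}

	// removing the max repeatedly yields the values in decreasing order,
	// so fill the array from the end to obtain increasing order
	for (int i = n - 1; i >= 0; i--) {
		array[i] = *(int*)pqueue_max(pqueue);
		pqueue_remove(pqueue);
	}

	pqueue_destroy(pqueue);
	free(copy);
}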

Readings

  • T. A. Standish. Data Structures, Algorithms and Software Principles in C. Chapter 9. Sections 9.1 to 9.6.

  • R. Sedgewick. Αλγόριθμοι σε C (Algorithms in C). Chapters 5 and 9.

Proofs of given statements can be found in the following book:

  • M. T. Goodrich, R. Tamassia and D. Mount. Data Structures and Algorithms in C++. 2nd edition. John Wiley and Sons, 2011.