# Red-black tree

A red-black tree is a type of self-balancing binary search tree, a data structure used in computer science. It was invented in 1972 by Rudolf Bayer who called them "symmetric binary B-trees". It is complex, but has good worst-case running time for its operations and is efficient in practice: it can search, insert, and delete in O(log n) time, where n is the number of elements in the tree.

 Contents

## Background and terminology

A red-black tree is a special type of binary tree, which is a structure used in computer science to organize pieces of data, such as numbers. Each piece of data is stored in a node. One of the nodes always functions as our starting place, and is not the child of any node; we call this the root node or root. It has up to two "children", other nodes which it connects to. Each of these children can have children of its own, and so on. The root node thus has a path connecting it to any other node in the tree.

If a node has no children, we call it a leaf node, since intuitively it is at the edge of the tree. A subtree is the portion of the tree that can be reached from a certain node, considered as a tree itself.

As red-black trees are also binary search trees, they must satisfy the constraint that every node contains a value greater than or equal to all nodes in its left subtree, and less than or equal to all nodes in its right subtree. This makes it quick to search the tree for a given value.

Red-black trees are conceptually the same as 2-3-4 trees.

Red-black trees, along with AVL trees, offer the best possible worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures which provide worst-case guarantees; for example, many data structures used in computational geometry can be based on red-black trees.

Red-black trees are also particularly valuable in functional programming, where they are one of the most common persistent data structures, used to construct associative arrays and sets which can retain previous versions after mutations. The persistent version of red-black trees requires O(log n) space for each insertion or deletion, in addition to time.

## Properties

A red-black tree is a binary search tree where each node has a color attribute, the value of which is either red or black. In addition to the ordinary requirements imposed on binary search trees, we make the following additional requirements of any valid red-black tree:

1. A node is either red or black.
2. The root is black.
3. All leaves are black (or null).
4. Both children of each red node are black.
5. The paths from each leaf up to the root contain the same number of black nodes.
Missing image
Red-black_tree_example.png
An example of a red-black tree

These constraints enforce a critical property of red-black trees: that the longest possible path from the root to a leaf is no more than twice as long as the shortest possible path. The result is that the tree is roughly balanced. Since operations such as inserting, deleting, and finding values requires worst-case time proportional to the height of the tree, this theoretical upper bound on the height allows red-black trees to be efficient in the worst-case, unlike ordinary binary search trees.

To see why these properties guarantee this, it suffices to note that no path can have two red nodes in a row, due to property 4. The shortest possible path has all black nodes, and the longest possible path alternates between red and black nodes. Since all maximal paths have the same number of black nodes, by property 5, this shows that no path is more than twice as long as any other path.

A common source of confusion with these properties is that they assume that all the leaves in the tree are nil leaves, which contain no data and serve merely to indicate where the tree ends. These nodes are often omitted in drawings, resulting in a tree which seems to contradict the above principles, but which in fact does not.

## Operations

Read-only operations on a red-black tree require no modification from those used for binary search trees, since it is a binary search tree. However, after an insertion or removal, the red-black properties might become violated. Restoring the red-black properties requires a small number (O(log n)) of color changes (which are very quick in practice) and no more than three tree rotations (two for insertion). This allows insertion and deletion to remain O(log n) time, but it also turns them into very complex operations.

### Insertion

We begin by adding the node as we would in a binary search tree and coloring it red. What happens next depends on the color of other nearby nodes. We will use the term uncle node to refer to the child of a node's grandparent which is not the node's parent, as in human family trees. Note that:

• Property 3 always holds.
• Property 4 is threatened only by adding a red node, repainting a black node red, or a rotation.
• Property 5 is threatened only by adding a black node, repainting a red node black, or a rotation.

In the accompanying diagrams, the node being inserted is N, the node's initial parent node is P, the node's initial grandparent is G, and the node's initial uncle is U. Any color shown in the diagram is either assumed in its case or implied by these assumptions.

Case 1: The new node is at the root of the tree. In this case, we repaint it black to satisfy property 1. This threatens property 5, but since it adds one to the number of black nodes on every path, property 5 remains satisfied.

Case 2: The new node's parent node is black, so property 4 is not invalidated. In this case, the tree is still valid. Property 5 is threatened, because the new node has two black leaf children, but because the new node is red, the path through each of its children has the same number of black nodes as the path through the leaf it replaced, which was black, and so this property remains satisfied.

Note: In the following cases we can assume the new node has a grandparent node, because the parent node is red, and if it were the root, it would be black. Thus, the new node also has an uncle node, although it may be a leaf in cases 4 and 5. Case 3: If both the parent node and the uncle node are red, then we can repaint them both black and repaint the grandparent node red (to maintain property 5). Now our new red node has a black parent. Since any path through the parent or uncle must pass through the grandparent, the number of black nodes on these paths has not changed. However, the grandparent may now be a red node with a red parent, violating property 4. If so, we perform this entire procedure recursively on it.

Note: In the remaining cases, we assume the parent node P is the left child of its parent. If it is the right child, left and right should be reversed throughout cases 4 and 5. Case 4: The parent is red but the uncle is black or missing; also, the new node is the right child of its parent, and the parent in turn is the left child of its parent. In this case, we can perform a left rotation that switches the roles of the new node and its parent; then, we deal with the former parent node using Case 5. This causes some paths to pass through either the new node or the parent where they did not before, but both these nodes are red, so property 5 is not violated. Case 5: The parent is red but the uncle is black or missing, the new node is the left child of its parent, and the parent is the left child of its parent. In this case, we perform a right rotation about the grandparent; the result is a tree where the former parent is now the parent of both the new node and the former grandparent. We know that the former grandparent is black, since the parent could not have been red otherwise. We then switch the colors of the former parent and grandparent nodes, and the resulting tree satisfies property 4. Property 5 also remains satisfied, since all paths that went through any of these three nodes went through the grandparent before, and now they all go through the former parent. In each case, this is the only black node of the three.

### Deletion

In a normal binary search tree, when deleting a node with two non-leaf children, we find either the maximum element in its left subtree or the minimum element in its right subtree, and move its value into the node being deleted (as shown here). We then delete the node we copied the value from, which must have less than two non-leaf children. Because merely copying a value does not violate any properties, if we can figure out how to delete a node with fewer than two non-leaf children, we'll know how to do this as well.

If we are deleting a red node, we can simply replace it with its black child, if any. All paths through the deleted node will simply pass through one less red node, and both the deleted node's parent and child must be black, so properties 3 and 4 still hold. Another simple case is when the deleted node is black and its non-leaf child is red. Simply removing a black node could break property 4, but if we repaint its child black, all paths that used to pass through it will pass through its black child, preserving the property.

The complex case is when both the node to be deleted and its single non-leaf child (or either leaf, if it has none) are black. We begin by replacing the node to be deleted with its child. Call this child N, and, for convenience, call its sibling (its parent's other child) S. In the diagrams below, we will also use P for N's parent, SL for S's left child, and SR for S's right child.

Because we have deleted N's black parent, paths which proceed through N have one fewer black node than paths that do not. As this violates property 4, the tree must be rebalanced. There are several cases to consider:

Case 1: N is the new root. In this case, we're done. We removed one black node from every path, and the new root is black, so the properties are preserved.

Note: In cases 2, 5, and 6, we assume N is the left child of its parent. If it is the right child, left and right should be reversed throughout these three cases.

 Missing imageRed-black_tree_delete_case_2.pngDiagram of case 2 Case 2: S is red. In this case we rotate left at N's parent, turning the red sibling into N's grandparent. We then reverse the colors of N's new parent and grandparent. Although all paths still have the same number of black nodes, now N has a black sibling and a red parent, so we can proceed to step 4, 5, or 6. (Its new sibling is black because it was once the child of the red S.)
 Missing imageRed-black_tree_delete_case_3.pngDiagram of case 3 Case 3: N's parent, S, and S's children are black. In this case, we simply repaint S red. The result is that all paths passing through S, which are precisely those paths not passing through N, have one less black node. Because deleting N's original parent made all paths passing through N have one less black node, this evens things up. However, all paths through P now have one fewer black node than paths that do not pass through P, so property 4 is still violated. To correct this, we perform the rebalancing procedure on P, starting at case 1.
 Missing imageRed-black_tree_delete_case_4.pngDiagram of case 4 Case 4: S and S's children are black, but N's parent is red. In this case, we simply exchange the colors of N's sibling and parent. This doesn't affect the number of black nodes on paths not going through N, but it does add one to the number of black nodes on paths going through N, making up for the deleted black node on those paths.
 Missing imageRed-black_tree_delete_case_5.pngDiagram of case 5 Case 5: S is black, S's left child is red, S's right child is black, and N is the left child of its parent. In this case we rotate right at S, so that S's left child becomes S's parent and N's new sibling. We then exchange the colors of S and its new parent. All paths still have the same number of black nodes, but now N has a black sibling whose right child is red, so we fall into case 6. Neither N nor its parent are affected by this transformation.
 Missing imageRed-black_tree_delete_case_6.pngDiagram of case 6 Case 6: S is black, S's right child is red, and N is the left child of its parent. In this case we rotate left at N's parent, so that S becomes the parent of N's parent and S's right child. We then exchange the colors of N's parent and S, and make S's right child black. The subtree still has the same color at its root, so property 3 is not violated. However, N now has one additional black ancestor: either N's parent has become black, or it was black and S was added as a black grandparent. Thus, the paths passing through N pass through one additional black node. Meanwhile, if a path does not go through N, then there are two possibilities: It goes through N's new sibling. Then, it must go through S and N's parent, both formerly and currently, and they've only exchanged colors. Thus the path contains the same number of black nodes. It goes through N's new uncle, S's right child. Then, it formerly went through S, S's parent, and S's right child, but now only goes through S, which has assumed the color of its former parent, and S's right child, which has changed from red to black. The net effect is that this path goes through the same number of black nodes. Either way, the number of black nodes on these paths does not change. Thus, we have restored property 4. The white node in the diagram can be either red or black, but must refer to the same color both before and after the transformation.

## Proof of asymptotic bounds

A red black tree which contains n internal nodes has a height of O(log(n)).

Definitions:

• h(v) = height of subtree rooted at node v
• bh(v) = the number of black nodes (not counting v if it is black) from v to any leaf in the subtree (called the black-height).

Lemma: A subtree rooted at node v has at least [itex]2^{bh(v)}-1[itex] internal nodes.

Proof of Lemma (by induction height):

Basis: h(v) = 0

If v has a height of zero than it must be nil, therefore bh(v) = 0. So:

[itex] 2^{bh(v)}-1 = 2^{0}-1 = 1-1 = 0 [itex]

Inductive Hypothesis: v such that h(v) = k, has [itex]2^{bh(v)-1}-1[itex] internal nodes implies that [itex]v'[itex] such that h([itex]v'[itex]) = k+1 has [itex]2^{bh(v')}-1[itex] internal nodes.

Since [itex]v'[itex] has h([itex]v'[itex]) > 0 it is an internal node. As such it has two children which have a black-height of either bh([itex]v'[itex]) or bh([itex]v'[itex])-1 (depending on whether [itex]v'[itex] is red or black). By the inductive hypothesis each child has at least [itex]2^{bh(v')-1}-1[itex] internal nodes, so [itex]v'[itex] has:

[itex] 2^{bh(v')-1}-1 + 2^{bh(v')-1}-1 + 1 = 2^{bh(v')}-1 [itex]

internal nodes.

Using this lemma we can now show that the height of the tree is logarithmic. Since at least half of the nodes on any path from the root to a leaf are black (property 4 of a red black tree), the black-height of the root is at least h(root)/2. By the lemma we get:

[itex] n \geq 2^{{h(root) \over 2}} - 1 \leftrightarrow \; \log{(n+1)} \geq {h(root) \over 2} \leftrightarrow \; h(root) \leq 2\log{(n+1)} [itex]

Therefore the height of the root is O(log(n)).

## References

• Art and Cultures
• Countries of the World (http://www.academickids.com/encyclopedia/index.php/Countries)
• Space and Astronomy