# Red-black Trees in Erlang

Working through Chris Okasaki's "Purely Functional Data Structures", I found that I couldn't find an Erlang version of the red-black tree implementation Chris shows on page 28. The code, also available as a pastie and download (with more comments of mine) .erl and .hrl:

%% redblack.hrl -record(node, {color, data, left, right}). %% redblack.erl %% Adapted from Okasaki "Purely Functional Data Structures" p. 28 member(_X, undefined) -> false; member(X, #node{left=A, data=Y}) when X < Y -> member(X, A); member(X, #node{data=Y, right=B}) when X > Y -> member(X, B); member(_X, _S) -> true. insert(X, undefined) -> #node{color=black, data=X}; insert(X, S) -> % to do recursive anonymous functions, pass the created fun in as 2nd parameter Ins = fun(undefined, _F) -> #node{color=red, data=Data}; % insert the new data as a red node (#node{color=Color, left=A, data=Y, right=B}, F) when X < Y-> balance(Color, F(A, F), Y, B); (#node{color=Color, left=A, data=Y, right=B}, F) when X > Y-> balance(Color, A, Y, F(B, F)); (Node, _F) -> Node end, #node{left=A, data=Y, right=B} = Ins(S, Ins), #node{color=black, left=A, data=Y, right=B}. %% detect Black->Red->Red Patterns and balance the situation balance(black, #node{color=red, left=#node{color=red, left=A, data=X, right=B}, data=Y, right=C}, Z, D) -> #node{color=red, left=#node{color=black, left=A, data=X, right=B}, data=Y, right=#node{color=black, left=C, data=Z, right=D}}; balance(black, #node{color=red, left=A, data=X, right=#node{color=red, left=B, data=Y, right=C}}, Z, D) -> #node{color=red, left=#node{color=black, left=A, data=X, right=B }, data=Y, right=#node{color=black, left=C, data=Z, right=D}}; balance(black, A, X, #node{color=red, left=#node{color=red, left=B, data=Y, right=C}, data=Z, right=D}) -> #node{color=red, left=#node{color=black, left=A, data=X, right=B }, data=Y, right=#node{color=black, left=C, data=Z, right=D}}; balance(black, A, X, #node{color=red, left=B, data=Y, right=#node{color=red, left=C, data=Z, right=D}}) -> #node{color=red, left=#node{color=black, left=A, data=X, right=B }, data=Y, right=#node{color=black, left=C, data=Z, right=D}}; balance(Color, Left, Data, Right) -> #node{color=Color, left=Left, data=Data, right=Right}.

So, I'm going to go through this, mainly to prove to myself that I kinda
get it, and because Ben will complain I didn't
dumb it down enough for him. Line 2 defines the node structure in an
Erlang header file. Comparable to a struct in C. **Member** Lines 6-13
define the member function. Each member() -> … is a function clause,
essentially a case statement on steroids. The first clause says if the
tree is undefined, then whatever X is, it is not a member of undefined,
so return false. The second clause unpacks the current node (
#node{left=A, data=Y} ) and assigns the values of left and data to A and
Y, respectively (I have tried to maintain Chris's variable names as much
as possible, though I think Erlang's record syntax complements the
Standard ML syntax found in the book). The clause then compares Y, the
node's data, to the query X, and if X < Y, checks the left branch (the
variable A) of the node. The third clause is nearly identical, except it
handles when X > Y, and checks the right branch (B) for X's existence.
The fourth and final clause handles a match (because when X is not
greater than or less than Y, it must be equal), returning true. Simple
enough to follow and grok. **Insertion** When I read several different
articles online concerning red-black trees, the insertion routine
sounded quite complex to me. I was quite pleased when I read the
algorithm in the book, as it crystallized my thoughts and allowed me to
grok the action. The first clause of the insert function handles the
empty (or undefined) tree, creating and returning a black node (since
the root of the tree must be black). The second clause takes the new
data X and a tree S and creates a new function, stored in the var Ins.
Now, calling an anonymous function recursively is tricky, and I will
explain why this part varies slightly from the text. Since there's no
way to refer to the function (as it isn't named), we have to pass the
anonymous function as a second parameter to the anonymous function,
which is what happens on line 28. So when you see F(A, F), mentally
think Ins(A). The first parameter to Ins() is our current node in the
tree. If it is undefined, we insert our new data as a red node. Why?
That is discussed at length elsewhere, but it boils down to being easier
to fix a red violation than a black violation (look up the
constraints
a red-black tree has on it, and the violations will make more sense).
The second clause unpacks the node and makes our less than/greater than
comparison like a normal BST. What's different is that we call the
appropriate recursive insert call (remember F(A, F)? This inserts X
somewhere into the left side of Node, and F(B, F) inserts X somewhere
into the right side), and pass the result to the balance function. The
final clause handles a match between X and Y and returns the Node and
its branches unmodified. **Balance** Why do we need a balance function? In
case our data being inserted into the tree is more ordered than we may
have expected. A vanilla BST with inputs [3,2,1] will be a glorified
list. We force the situation by inserting a red node in the tree and
causing a situation where the parent of the newly inserted red node is
also a red node. This violates the constraint that a red node's children
must be black nodes. There are four possible combinations of this
violation:

- (B)-Left->(Red)-Left->(Red)
- (B)-Left->(Red)-Right->(Red)
- (B)-Right->(Red)-Left->(Red)
- (B)-Right->(Red)-Right->(Red)

Each of those possibilities maps to the first four clauses of the balance function. The fifth clause passes a node back unchanged. So let's look at the first clause. This clause is activated when a black node's left child is red, and that red node's left child is also red.

B3 / \ R2 * / \ R1 *

So we unpack the three nodes and return the three nodes rotated to remove the red violation.

Variable | Meaning | Value |
---|---|---|

A | R1's Left node | undefined |

X | R1's Data | 1 |

B | R1's Right node | undefined |

Y | R2's Data | R2 |

C | R2's Right node | undefined |

Z | B3's Data | 3 |

D | B3's Right node | undefined |

Unpacking from the pattern match, we then pack up a new tree structure according to the clause's definition:

#node{color=red, left=#node{color=black, left=A, data=X, right=B}, data=Y, right=#node{color=black, left=C, data=Z, right=D}};

This essentially replaces the black node having a red child and red grandchild with a red node that has two black nodes for children. The next illustration shows the result of the balance call:

R2 / \ B1 B3

Now, the bubbling up of the red node may cause a red violation with the parent and grandparent of R2, which will be handled as the recursion of Ins() unwinds. When the initial Ins() ends (line 28), the resulting node is the root of the new tree, which we then unpack into the data, left, and right fields. We then force the root node to be black by creating the root node with the unpacked parts, forcing color=black. So there you have it. Simplified, with a lot of things glossed over/ignored, but this is my port of the red-black code listing on page 28. Looking forward to progressing through the book more and increasing my data structure- and Erlang-fu. Further fun: Read Chris's reflections 10 years after publishing his book.