Balancing our haskell tree
9 December 2007Some days ago I presented to you a simple binary tree implementation in Haskell. It seems that I didn’t have enough, and if you have been checking the bzr branch, you have probably noticed that by now (rev10), the tree is actually an AVL Tree. A wise self-balancing tree, that keeps the lookup, insertion and deletion operations in logarithmic time. As inorder traversal is linear, sorting is on a nice O(n log(n)) quota. The implementation also keeps the maximum depth of the tree to sqrt(2)log(n).
This is made in 97 lines with proper comments and whitespace. Compare this to the OpenSolaris AVL Tree implementation that is 10 times larger. I’m not claiming my implementation is better or more efficient… but if you want my opinion, having a AVL Tree in C or Java is something that can be hardly done in a hundred of lines.
So, how you do balancing?
If we know that an empty tree is balanced, the way to keep the tree balanced is to check if rebalancing is needed every time we delete or insert a new node. As easy as that. First things first:
How you know that you unbalanced a tree?
You go through the tree following a certain path. Just after the insertion the call stack keeps the nodes of every node that could have been unbalanced in the process: the ones that can reach the inserted node. There’s no way you can unbalance the left branch of a tree if you inserted (or deleted) in the right branch. So we keep checking as we go back in the recursion. The function that checks that a node is balanced is a one-liner as follows (supose that depth does what is supposed to do :)
balFactor :: BT -> BT -> Int balFactor t u = (depth t) - (depth u)
Ok, this node is wrong, how do you solve it?
There are only 4 different cases, that I just copied from the Wikipedia:
Click for a bigger image. Notice that the original tree has a balFactor of +/- 2; while the resulting tree, 0. Notice also that the 2 last cases are combinations of the 2 first.
Do you know how many lines of Haskell is that? Just 4…
balanceLL (N v (N vl tl ul) u) = (N vl tl (N v ul u)) balanceLR (N v (N vl tl (N vlr tlr ulr)) u) = (N vlr (N vl tl tlr) (N v ulr u)) balanceRL (N v t (N vr (N vrl trl url) ur)) = (N vrl (N v t trl) (N vr url ur)) balanceRR (N v t (N vr tr ur)) = (N vr (N v t tr) ur)
We are just restructuring the tree, but still is confusing. In Haskell you spend 1 hour to get those 4 lines and it’s worth it. I couldn’t manage to get those lines, until I draw myself the tree. Notice how the equation corresponds to the drawing…
balanceLR (N v (N vl tl (N vlr tlr ulr)) u) = (N vlr (N vl tl tlr) (N v ulr u))
And what you do with that?
Knowing when to apply which balancing is as difficult as to know how to do the balancing. In case of insertion, this is how I do it:
insert :: BT -> Int -> BT
insert L i = (N i L L)
insert (N v t u) i
| i == v = (N v t u)
| i < v && (balFactor ti u) == 2 && i < value t = balanceLL (N v ti u)
| i < v && (balFactor ti u) == 2 && i > value t = balanceLR (N v ti u)
| i > v && (balFactor t ui) == -2 && i < value u = balanceRL (N v t ui)
| i > v && (balFactor t ui) == -2 && i > value u = balanceRR (N v t ui)
| i < v = (N v ti u)
| i > v = (N v t ui)
where ti = insert t i
ui = insert u i
Of course the only lines we are interested in are the ones which apply the balance* functions (yep, 4 lines again). Notice that checking for balFactor being equal to 2, already means that i < v (otherwise, we couldn't have get this unbalancing), but is nice to have them there to document the function a bit. The 3rd condition is just checking in which direction the new value has been inserted (the value function just returns the value of the given node, as its name says) to know which of the balance functions we have to apply.
Deletion is a bit different, but also needs to check for 4 different cases. With some time, a pen and a piece of paper you get those 4 lines. If you ask me, the best achievement of Haskell is that you spend more time thinking than coding: not that you cannot do that with other language, but in Haskell you are forced to it, and that's the only way to bug-free code.
Oh, and just look at the sort function:
sort = inorder . (load empty)
Gimme, gimme
The full implementation is kept in the bazaar repository:
http://bzr.geeksynapse.net/haskell
But if you don’t know what is bazaar (although I recommend you to investigate about it), you can get the rev10 here:
http://dropbox.geeksynapse.net/code/haskell/AVLTree.hs
Leave a Reply
You must be logged in to post a comment.


2 Responses to “Balancing our haskell tree”
December 9th, 2007 at 7:47 pm
The concept of binary tree is too dificult for me (at the moment). Tomorrow we will begin the Computer Complexity unit in PRG (Do you remember your time as a student of the UPV?), so I’ve no idea about this data structure.
PS: I haven’t downloaded the Haskell compiler yet!!!
December 9th, 2007 at 8:59 pm
You’ll have to wait for LPP (Third year?) There you’ll be taught some Haskell and Prolog. Good luck with it ;)