{"id":533,"date":"2012-04-29T13:17:11","date_gmt":"2012-04-29T13:17:11","guid":{"rendered":"http:\/\/www.themissingdocs.net\/wordpress\/?p=533"},"modified":"2012-04-29T13:17:11","modified_gmt":"2012-04-29T13:17:11","slug":"red-black-trees-advanced","status":"publish","type":"post","link":"https:\/\/www.themissingdocs.net\/?p=533","title":{"rendered":"Red-Black Trees, Advanced"},"content":{"rendered":"<p>So, I thought I knew my way around a red-black tree, maybe not great, but I&#8217;ve implemented it a few times now.<\/p>\n<p>The contest analysis for Q3 in GCJ12 R1A states that the question can be performed in O(N^2 log N) time, but leaves how as an exercise to the reader.\u00a0 I decided to take up this challenge, and came to the conclusion that what I needed was a data structure which took items which have an order, and could join\/split subsets of that order in log(N) time.<\/p>\n<p>I actually thought I could do such a data structure with O(1) join\/split &#8211; but then I realised that I needed to associate state with the joined groups, and hence the split operations were becoming O(N) or accessing the state would take O(N) if I started from the wrong node.\u00a0 So I went back to the drawing board. As is often the case when you see a solution with O(1) for some operations and O(N) for others, there is a way to do somewhere in between for all operations and it often involves trees&#8230;<\/p>\n<p>So, if I represented each connected section as a binary tree, I could get from any element to the root of its section, and associate the state with that root, giving O(log N) lookups.\u00a0 Good start.\u00a0 But what about the joins and splits?\u00a0 Could I join a binary search tree with another adjacent one in O(log N) time, or even worse, can I split a binary search tree at an arbitrary point in O(log N) time?\u00a0 I felt the answer was going to be yes, and not just because of the Contest Analysis suggestion of a O(N^2 log N) lower bound.<\/p>\n<p>This called for some research, and it didn&#8217;t take long to find a couple of papers describing how to do concatenate and split in red-black trees.\u00a0 The first one was confusing me, it kept adding nodes to the tree, or removing them and not putting them back. (And the PDF had the pages in reverse order!)\u00a0 But the second one was much clearer, and explained that the former was using a variant of red-black trees where only leaf nodes contained values, and how to augment these algorithms to work with what I considered a more standard red-black tree approach.<\/p>\n<p>So, off to implement I went.\u00a0 For my new data structure I decided to reuse a single memory allocation containing all the tree nodes, and perform the splits\/joins and manage the multiple trees all in the one array.\u00a0 This held promise of best performance, something of a target considering I am going to make this part of TMD.Algo&#8217;s next release.\u00a0 But&#8230; it did mean I had to write my red-black tree logic from scratch, since my other red-black tree implementations all use heap allocated nodes and pointers and sentinel nodes.\u00a0 This took a while on its own, and since my data structure only has 3 API members (Add, Union, Split) &#8211; and I had only written 1 (Add), it was not very favourable to testing yet&#8230;<\/p>\n<p>Union was up next.\u00a0 Union requires the Insertion fix-up logic from a standard red-black tree, but is otherwise not actually that hard.\u00a0 First, remove the left-most element of the right tree.\u00a0 Then spend another O(log N) steps to find the black-height of both trees, and the black node on the higher tree which is the same height as the smaller tree.\u00a0 Then you take that left-most element you removed, make it red, and add the two nodes with equal black-height you have identified as children, then add it back in to the taller tree where you just stole a sub-tree to use as a child. (Unless they trees are of equal height, then there is nothing to add it back to.)\u00a0 Pretty simple, but still you end up with 3 code paths, and having to handle corner cases like the right tree being a single node to start with, or either of the trees being entirely empty.\u00a0 At this point my code got its first real test-case, but it was hardly capable of stressing much.\u00a0 I was probably a couple of hours in to writing code at this point.<\/p>\n<p>Split.\u00a0 When I first started this journey, I didn&#8217;t expect split to be all that much worse than Union.\u00a0 But it certainly was!\u00a0 The description starts out simple&#8230; Walk the tree from the root to the split location.\u00a0 At each node you visit, you are either at the target node, or the target node is to the left or right.\u00a0 If the target node is left, union the right sub-tree with the right split result (starts as null) and store the current node to use as a pivot for the next union rather than having to take the left most element out of the right tree.\u00a0 Similar (but reversed) for if the target node is right.\u00a0 And if the target node is equal, do both, but then add the current node to the left tree (or right if you want) rather than storing it to use as the next pivot.<\/p>\n<p>The devil is of course in the details.\u00a0 In order for this Split algorithm to be O(log N) the unions have to be performed in O(1) time.\u00a0 But the Union implementation takes O(log N) time?!?\u00a0 Also not mentioned is that you have to handle cases where you&#8217;ve stored a node as a pivot, but when you get to the point where you would use it the current node has no right sub-tree, or you have no accumulated result tree, or both&#8230;<\/p>\n<p>But back to the performance problem &#8211; how do we do Union in O(1) time?\u00a0 Well it gets mentioned that if you augment your tree with black height information in each node, and keep it updated through all the insertions and stuff you can do Union in time equal to the difference in the height of the trees.\u00a0 Because when it comes down to it, Union is O(1), you just have to find where to perform the union itself.\u00a0 But keeping track of the black height as part of the nodes isn&#8217;t exactly trivial, and so I went with the approach suggested in the paper I was reading, which was to track the relative heights between the items which you are merging.\u00a0 You can even start by saying that the original root node of the tree is height 0, since it is only the relative heights that matter and a red-black tree starts with the property of every path having the same black height.<\/p>\n<p>This is still easier said than done! Black height of the current node, that is easy, you just decrement it by 1 each time you leave a black node.\u00a0 So you know the black height of that right (or left) sub-tree you want to concatenate with the (right or left) result easily enough.\u00a0 But what is the effective black height of the (left or right) result, and how can we be sure that the difference in heights is effectively O(1)?\u00a0 It was at this point that I had to do some actual thinking, because it wasn&#8217;t expanded upon in the paper.\u00a0 The first sub-tree was easy enough, we knew its black height when we got it, so we just stored it, and the height difference between it and the next sub-tree is going to be at most equal to the number of steps between sub-trees being on the same side.\u00a0 This isn&#8217;t O(1), but clearly the sum of these O(steps) is at most O(log N), which is the goal.\u00a0 This however will only work if we can move between the first sub-tree and the next in O(steps).<\/p>\n<p>So what happens when we union these trees?\u00a0 We know there is a pivot, we know that the pivot&#8217;s black height starts as the same as the smaller tree (and the trees that we are joining are in non-increasing height sequence, so that is always the newer tree for which the black-height is easy to calculate), since the pivot is always red to start with.\u00a0 Will this pivot always be on the outside edge of the tree once we have run the insertion fix-up, and will it always have that same black height?\u00a0 The answers to this is yes and no respectively.\u00a0 Most scenarios just recolour, but there can be rotations.\u00a0 However, these rotations all occur at the parent or grand parent level.\u00a0 With the parent level rotating to push the pivot down the outside and the grand parent level to pull it up the outside.\u00a0 The recolouring also occurs at parent levels and above, so you might think the black height won&#8217;t change either.\u00a0 However there is one exception, if the pivot is or becomes the root of the tree, it gets forced to black to maintain the root node is black requirement.\u00a0 This increases the black height by 1.\u00a0 This increase by one is worst case going to double the cost of walking to the next join point (1 step becomes 2), so the order of magnitude hasn&#8217;t changed.<\/p>\n<p>So, we can use the pivot point, and we can track its black height efficiently.\u00a0 All that remains is to put it all together, handle the black heights in the corner cases where there is no sub-tree and trying not to make too many errors writing almost the same rather complicated logic 4 times.<\/p>\n<p>Now that I had &#8216;Split&#8217; I could write some real stress tests.\u00a0 I like to have a random (with fixed seed) test which runs a large number of repetitions for cases like this, because manually identifying all the corner cases to write test cases for is harder than writing the code!\u00a0 I started out with 1000 elements and 100k random operations.\u00a0 The first time I ran my test case it crashed on operation 1.\u00a0 (I hadn&#8217;t handled splitting something which wasn&#8217;t unioned.) and it wasn&#8217;t long before I realised that 1000 elements was too many to debug all the problems in my code, so I dropped it down to 3.\u00a0 It passed! (Yay! My code could handle some trivial cases!)\u00a0 I stepped up to 4.\u00a0 Fail.\u00a0 Fix, fix fix. Works, 5, Fail. Fix, fix fix. Works, 6, Fail.\u00a0 At 7 the only failure was for splitting the perfectly balanced tree.\u00a0 At 8 I found I had implemented the algorithm incorrectly.\u00a0 Not a typo or off-by-one or similar like all my previous bugs, I had implemented the wrong thing.\u00a0 Specifically I had missed the requirement that union could only work if the nodes at same black level were both black.\u00a0 You couldn&#8217;t use a red parent of the equal black level nodes from the larger tree.\u00a0 I wondered why only my split implementation failed, then I realised that my union walked down to the leaves then walked back up to find equal height, so stopping early was always stopping in the right place.\u00a0 For Split, I was always walking down from the last pivot point, which in this case was red and started with the right black height.\u00a0 I fixed that and it works.\u00a0 9, 10, 20, 30, 60, 130.\u00a0 All works.<\/p>\n<p>One last thing I want to mention was my test assertions.\u00a0 After each operation, union or split, I could trivially assert that the union had joined or the split has split.\u00a0 But a failure at one step might not get noticed for dozens of steps later.\u00a0 So I ran a full suite of compliance tests after each step as well.\u00a0 Significantly slowed down execution, since they were O(n^2) and each step was O (log N) otherwise, but I tested the following and every one of them helped me find a bug.<\/p>\n<ol>\n<li>Each root node was black. (red black tree requirement 0)<\/li>\n<li>Each node was only in one tree, and no cycles.<\/li>\n<li>Each red node only has black children (red-black tree requirement 1)<\/li>\n<li>The black distance from each node with 0 or 1 children to root is the same as the black distance from root to left most child. (red-black tree requirement 2)<\/li>\n<li>At each point in each tree, every node to the left is smaller and every node to the right is larger. (I originally only checked that immediate children satisfied this property, but found my tree mangling was sometimes too subtle to get picked up.)<\/li>\n<li>For every pair of root nodes, that the left most edge of one tree isn&#8217;t on the same side of the right-most edge of the other as its root, or vice versa.<\/li>\n<\/ol>\n<p>All in all I think I spent almost 10 hours to get this to work.\u00a0 Certainly glad I wasn&#8217;t trying to do it during a programming competition! (And I haven&#8217;t even written the solution to the actual problem&#8230;)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>So, I thought I knew my way around a red-black tree, maybe not great, but I&#8217;ve implemented it a few times now. The contest analysis for Q3 in GCJ12 R1A states that the question can be performed in O(N^2 log N) time, but leaves how as an exercise to the reader.\u00a0 I decided to take &hellip; <a href=\"https:\/\/www.themissingdocs.net\/?p=533\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Red-Black Trees, Advanced<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-533","post","type-post","status-publish","format-standard","hentry","category-code-competitions"],"_links":{"self":[{"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=\/wp\/v2\/posts\/533","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=533"}],"version-history":[{"count":0,"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=\/wp\/v2\/posts\/533\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=533"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=533"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.themissingdocs.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=533"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}