June 2011 – The Missing Docs .Net

TCO11 R2

No t-shirts for me this year…

Only 760 out of 850 potential contestants turned up, and I came 443rd, short of the required top 350 which advance to shirt round. I think 7 Aussies qualified for this round, but only 6 registered. Out of those 6, 3 failed to get any points (possibly fell asleep!), 2 of us around the 440th place, and John Dethridge advances 315th place.

I could have made it through to get a t-shirt, but I took too long to realise I had an overflow bug in my count number of powers of k less than or equal to n loop. I really should have implemented it using a divide, but also I should stop using 32bit integers – except where absolutely needed. One day I will learn!

I had time enough to do question 2, but it was not easy and I was very tired. I looked for corner cases for Q1 hoping to get the single challenge I needed to get through, but the given examples covered all of them, so there was basically no one to challenge.

Q1) How many boolean sequences of length n are are plausible, if the ith entry is the answer to whether a number is divisible by i. Return answer mod 1000000007 since input can be up to 1000000.

A1) I was a bit slow in realizing the correct approach here, but it wasn’t too long before I realized the answer was related purely based on the primes less than or equal to n. Specifically each prime doubled the answer at a minimum, more if different powers of that prime were less than n as well. More specifically if the maximum power of a prime p less than or equal to n is p^k then the prime independently contributes (k+1) ways to build the final sequence. So, multiply all these factors mod 1 billion and 7 and you have your answer.

What kept me too slow to advance was it took a long time to realize that my loop for determining max power was potentially squaring numbers just less than 1 million, which doesn’t fit into 32bit integer. I had correctly identified that I needed to use 64bit integers for the result as small powers go more than squared and so mod 1 billion and 7 is not sufficient to keep you under 32bit limit, but I should have used them more liberally from the start.

Q2) Given an square array of strings which are built up by concatenating the smallest of the vertically and horizontally previous cells in front of the largest, where the first row and column are provided (and each have a single character in each cell), return up to 50 characters starting at a specific index of the string in the far bottom right corner (the largest string). Note: no character will be repeated in the first row or column.

A2) Pascals triangle gives the length of each cell – and with a worst case of 30×30 the length of the largest string can be very large!

Having built up pascals triangle, all that is really needed is to know for each cell, which of the two vertically and horizontally previous cells is lexically smallest. As usual, ‘all’ is of course the whole actual problem. I thought I was on the right track for this, but I suspect its performance is too slow, the trick being to define a memotizable recursion which doesn’t require an offset and length for comparison for both cells involved.

Edit: Before I managed to fall asleep I think I solved this. The memotizable recursion is ‘which is smaller for 2 cells on a common diagonal?’. If the 2 cells are neighbours on a diagonal, they are built out of the 3 cells on the previous diagonal. From there it can be seen that the middle cell can be ignored as it is either the common start, the common end or transitive property applies and you can just compare the outside pair. If they are not neighbours, you get 4 cells which are in neighbouring pairs. By using the memotized recursion on each neighbouring pair you can then reapply the recursion on the smallest of each pair. Memotization ensures that the algorithm runs in O(n^3), both in space and time.

Now if you can read between the lines there you’ll notice that I have failed to take in to account an important special case. Once the recursion hits an edge we need to compare that edge value to a cell on the same diagonal. Unless that other cell is on the opposite edge of the diagonal, this doesn’t appear trivial. However the original problem was ‘determine the nth character in the bottom right hand corner’, we can trivially expand that to also ‘determine the first character in every cell’. Since the answer to these new questions depends on the which is smaller question, it might seem we are at an impasse – but because each question is entirely answered by considering the previous diagonal alone, we can invoke mutual memotized recursion safely.

All that remains is to prove that this method works. Something which is obvious is that each cell is only made up of characters which appear on the edge in the same or lesser row, or same or lesser column. Thus since every comparison in the algorithm involves two different cells on the same diagonal, it can be seen that the termination comparison is never equal, as an edge on a diagonal compared to a different cell in the diagonal cannot be equal as the cells on the diagonal cannot be made out of that edges character and all edge characters are distinct.

Q3) Given 2 arrays of positions, and a set of ‘connections’ from one array to another, determine the number of orderings of the two arrays such that no connections cross. (Return mod 1000000007 as with arrays of length 50, the answer might be very large.)

A3) Edit: Also solved this one before going to sleep. I think it is easier than Q2 even.

First step, DFS the connections to form them into groups. If you discover a loop, return 0. Now we have our first factor in the answer, n! where n is the number of groups. Each group is going to be independent, since there is no way they can cross each other without causing a crossed connection so the n! comes from the orderings of the n groups.

Each group now contributes an independent factor. There are 2 scenarios, either every edge in the group joins at a common endpoint, or there are 2 or more locations in the group where 2 or more edges join. If it is a single common endpoint the answer is k! where k is the number of edges in the group. If it is more than two common endpoints we can either arrange said endpoints left to right, or right to left – so that gives a factor of 2. Then for each endpoint we also get a factor of (m-2)! where m is the number of edges which join to that endpoint.

Finally we have additional factors for the array locations which do not participate in a connection. These can be added in any place in any order, so if there are n spots in the array and k of them are endpoints of 1 or more connections, the additional factor is n!/k!. Since there are 2 arrays, this gives 2 more factors total. Multiply everything together using repeated modulo and 64 bit integers… and we should be done.

All in all it makes me wish I was less tired when I did the competition. I think I could have solved that second question during the competition if I had of been able to concentrate at all. Third one, well I’m pretty sure I would have run out of time having only opened it with a few minutes to go.

TCO11 R1

With 2000 qualifiers able to sign up, only 1550 actually did.

I started writing this post while the solutions were going up, I wasn’t very impressed with my effort (even allowing for the 2am start time and the remnants of the head-cold I’ve had over the last few days), and fully expected that I had failed to make it through to the next round. Scores were still updating but I was 852 and very few re-orderings were occurring due to system test failures. However, at that point I also thought that the cut-off was the top 600. I was wrong – top 850 go through – and final results have me at 845th!

I guess that means I get the privilege of waking up at 2am next week, along with the 6 other Australian’s who progressed. Only 8 Australians bothered to wake up (including myself), I’m sure more qualified than that…

Q1) Given a sequence of characters X and O, you want to reorder them into a different sequence. 4 operations are available, move the front to the end of either of 2 separate storage locations, or move the front of either of the 2 separate storage locations to the end of the original. Return the minimum number of operations required.

A1) Simple! – although at 2am nursing a muddled head it certainly took me long enough – a few seconds longer and I would have been eliminated this round. The answer is 2 times the minimum number of characters to remove from the front before the rest is equal to the start of the desired sequence.

Obviously there is no way it can be smaller than this, all that remains is to justify that it can always be achieved in this number of turns. To do this, all it takes is to realize that there are 2 storage locations and 2 character types. So every X you move to storage 1 and every 0 you move to storage 2. Then you can put them back on to the end of the initial location in whatever order you like, no restrictions since each storage is all the same character.

Q2) Given a set of probabilities that a road segment is muddy, and the ability to jump over any single road segment you like, what is the expectation value for the minimum number of muddy road segments you have to stand in to get from start to end? (Start and end are both guaranteed to be dry.)

A2) As probability problems go, this one isn’t as bad as some. However, my first write up I tried to build from start to finish, rather than from finish back to start – which was completely wrong, so I had to scrap that and start again. I then made about a dozen mistakes getting 0 and 1 mixed up. No idea why but I had decided to choose 1 as dry and 0 as muddy and half way through I switched that all backwards… I should have realised that 0 as dry and 1 as muddy was a more natural assignment from the start. I finally sorted them all out, but only a couple of minutes after the end of the available coding time!

So I wrote the solution as a recursion of expectation value for a given position depending on whether the position closer to the finish is muddy or not as well as the current position is muddy or not. (Apparently it can even be defined as a straight recursion, like a complicated version of the Fibonacci series – but I am too tired to see the path to that.) From there the answer is the sum of the two scenarios for position 0(start) where they are not muddy, weighted average by the probability of each scenario which is given by the probability of position 1 being muddy. Memotized recursion or DP to the answer.

Details of the recursion are rather long to fully write out, but can be summarized as a couple of conditions. If the next position is dry, take expectation value of that position for the dry case (2 scenarios weighted average), and then add 1 if the current position is muddy. If the next position is wet, take expectation value of the position after in either dry or wet case (4 scenarios weighted average) and add 1 if the current position is muddy.

This works because if the next cell is dry jumping is not required, and worst case you jump into mud, so you never try jumping. If next cell is muddy, you always jump, because if the cell after is dry its a winner, if it is muddy then you have stepped in the same number of muddy cells as if you had not jumped, but you are closer to the finish (that is not a formal proof, but hopefully obvious).

Q3) Imagine IP addresses where each of the 4 components are valued 0 to 999. Then imagine that you effectively own all of them, and that people are willing to pay for each IP address. Up to 50 people make a request for 1 or more IP addresses and give a price they are willing to pay per address. Each request is in the dotted form, with wild cards allowed for any/all of the 4 segments(no partial wild cards, each segment is either a number or a match anything wild card). Determine the maximum profit to be made.

A3) Interesting question. With up to 50 trillion operations in brute force, there definitely needs to be some accelerated counting.

One failed solution I saw calculated all intersections, to gather each section where there is multiple possibilities. Sort them by size largest to smallest, then do inclusion/exclusion counting to get the answer. I think the general principle was sound, but the implementation was lacking. The trick comes in determining inclusion/exclusion counting. Worst case for unique intersections is less than 70000, but for each one it could be the result of many different ways of combining, some of which might be needing an inclusion, while others need an exclusion (I think?). I am thinking forming them into a potentially interlinking forest, which is depth first searched from each root with some kind of memotization to produce the result without a performance explosion.

5am has arrived, I’ll have to come back to this one. (And maybe QR3, which I never wrote up…)

GCJ11 R3

A quick look at how people performed in this round shows it to be a very tricky round. No perfect scores this round, but with only 25 people advancing to the finals, unless you solved question 4 large (which was worth 31 points and only had 1 person solve it correctly), you needed everything else and a decent time to advance. Representing Australia, tikitikirevenge came close in 48th place.

Q1) Given an irregular shape defined by two poly-lines with identical start and end x coordinates, and continuously increasing x coordinates in between (but each poly line itself may have a different number of points) and these poly lines do not intersect. Determine G – 1 x coordinate positions where cuts with dx=0 (vertical) divide the shape into G equal parts.

A1) First thing to do is to break the shape into trapeziums, at each x coordinate specified in the 2 poly lines, taking care for the case where the x-coordinates are identical, and otherwise having to determine the y coordinate for the other side using interpolation. Area of a each trapezium is easy, so we can then work out the total area, and the G equal sub divisions of that total area. Iteration easily finds which trapezium contains each division point, all that is left is determining where in the trapezium the appropriate point is. This could be done using a binary search, or simply derive the equation (which will be quadratic) and solve for the position given the target area (second solution will be outside the trapezium).

All in all, this question is fiddly but very doable. Not surprisingly then this was the question with the highest success rate.

Q2) Given a set of N numbers each between 1 and 10000 inclusive, with possible repetition, you want to organize them into sets of consecutive integers such that the smallest such set is maximal. What is the size of that maximal smallest set?

A2) Small input size is 10, and so a brute force approach is plausible there. Large input is up to 1000, which is much less friendly.

Firstly I think it is useful to break the problem down. If you sort the numbers, you can then break the problem into sections based on where there is a gap in the sorted sequence. The answer is then the minimum of the answer for each section.

Now all we need is the answer to a given section. Where ‘all’ is pretty much the entire problem. Obvious cases first. Same number of repetitions for every value; answer equals set size. If the repetition of a single value is greater than the sum of the repetitions of its neighbours, the answer is 1. In any case that we get an answer of 1, we can avoid looking at all other sections.

That still leaves a fair bit to consider. Anywhere that the repetition count increases has to be the start of at least a number of sequences equal to the difference. I am wondering if the greedy approach of starting no more sequences than required, and when the count decreases, stopping the oldest sequences, will produce the ideal answer. It certainly seems sound on the first glance. A quick check of the answers shows this to be the solution.

Definitely a lot more thinking than the first problem, but again doable.

Q3) There is an RxC grid where each square contains a conveyor belt which points either vertically, horizontally, or one of the 2 diagonals. Each conveyor belt has an individual direction control. Determine how many out of the 2^(R*C) possible scenarios result in the conveyor belts pointing only in loops. (I.E. No cell can have 2 cells pointing to it.) The grid wraps around on to itself ‘by the miracle of science’. Answer modulo 1000003 to avoid huge answer problems.

A3) Small input RxC is up to 4×4, so only 2^16 scenarios to investigate, and each scenario can be checked relatively quickly, simply by counting how many cells point to each cell. So, small input is trivial (and it receives a tiny number of points too). Large input is 100×100 – hence why we need that modulo.

Special cases – a cell might have nothing that can possibly point to it, causes a result of 0. Also a cell might have only one thing pointing too it, hence the direction of that conveyor is ‘defined’. A straight line loop of conveyors has 4 possibilities if length even, 2 if length odd. Any such loops can be removed from the graph as a multiple of the answer obtained otherwise. Otherwise the puzzle contains no reversible loops. Which is a bit of a pain really, because it means decomposition appears infeasible.

At this point I’m pretty sure I’m out of my league, so I shall consult the answers. Interesting! Having repeatedly forced conveyors where a cell has only one option pointing to it, and exiting if there is ever no options for a cell, an interesting thing occurs. Every remaining partially (or complete) undecided cell has either exactly 2 options for how conveyors point to it, or away from it (or both), due to conservation of edges. This implies that choosing one direction, always results in forcing a direction, which in-turn forces another until you come round in a loop, and since you come round in loops, the problem is decomposable after all. Each of these ‘loops of force’ is entirely independent and contributes exactly 2 independent ways to solve the problem. So count the loops, and keep doubling modulo 1000003 to get the answer.

In competition I doubt I would have gotten this out. The fact that the ‘loops of force’ aren’t constructed of consecutive cells makes it hard to observe by accident.

Q4) Given a binary representation of a square number, where some of the bits have been replaced with question marks (but not the highest set bit), determine the original number. (Only one answer will be possible.)

A4) Small input constraints of maximum 60 digits and at most 20 question marks made brute force an obvious and trivial course of action. This was actually answered more frequently than the easy answer to question 3.

The large input only had one person successfully solve it it, so I don’t feel too bad that nothing springs to mind. Constraints were 125 digits and up to 40 question marks. My first thoughts are, maybe fill in the highest 20 question marks with each possibility, and then get the bounds of the 64bit integer which matches the lower 20 question marks, and then try each one. Problem is, in the worst case scenario, that appears to be even worse than brute forcing all 2^40 scenarios.

A few little trivial cases to get started. If the right most digit is 0, it and the digit to its left can be removed and ’00’ appended back onto the answer afterwards. If the second right most digit is a 1, something is wrong because all squares either end in 01 or 00. If second right most digit is a question mark, it must be a 0. This leaves last 2 digits are ’01’ or ‘0?’.

The winning approach was to divide the number into half, if most of the question marks are in the top half, try each possibility for the bottom half, and use that to infer the top half (somehow). Otherwise try each possibility for the top half, and use that to determine the bottom half.

Lets look at the second option first, since it is easiest. If we calculate the square root of the maximum value which matches the specific pattern for the top half, round down and then square – we have our answer (assuming that it matches the pattern.) Only trick is to acquire the square root of a 125bit number. Binary search using BigInteger will do the trick.

This leaves the first option. Lets put aside the possibility of the answer being even. Above it was mentioned that the last 2 digits we actually work with are 01 or 0?, we can use this as justification for ignoring the possibility of the answer being even, because 0? can be broken into 00 or 01 sub cases, 00 causes recursion, 01 does not. (Since only one side of the question mark ever causes recursion, there is no exponential run time problem to worry about. Another important point is to only consider whether the low or high bits have the most question marks after ensuring the last 2 digits are 01, otherwise the algorithm can be easily tricked into running too slow.)

So, since the answer is odd, it ends in 01, but how does this help us? Well if the answer is odd, then the square root is odd, so it either ends in 01 or 11. IE it is 4A+1 or 4A+3. If it is the former than the square is 16A^2+ 8A + 1 = 8A+1 mod 16, if our input is a specific mod 16, we can determine whether A is odd or even, which gives us one more digit of the square root. Similar applies to 4A+3. In both cases, this continues. Given the right half digits of the original number, we can slowly build 2 possibilities for the same number of digits of the square root, which either ends in 01 or 11. Since the number of digits in the square root is no more than half the number of digits in the input and we’ve filled in all question marks in the right hand half we have 2 complete possibilities for the square root. Given these two possibilities we can then check which either actually matches the input.

Thinking about this problem, in hindsight I believe I’ve previously been shown this approach of determining the square root of a number given only its right hand half, but I doubt it would have helped me in the heat of competition, not even if it was the only problem I did.

Final analysis – the small inputs are generally all pretty easy to implement, the first 2 large inputs were in my doable category, this leaves my ‘feasible ideal placement’ between 160th and 90th – which matches with where I get eliminated in my best days. Would have been nice not to screw up round 2 so badly 🙂

GCJ11 R2

Out again in round 2. Last year in ~800th, this year in 1263rd. So again no t-shirt from Google.

I am a bit disappointed with my effort, the answer to question 3 had been staring me in the face for an hour, but my late night addled brain just refused to see it. If I had of seen it, I think I could have even solved the large input and made it through to round 3. Maybe I would have seen it if I had of spent a few more minutes on it rather than doing question 2 small input.

Q1) If you are in a corridor of a certain length, and there are moving walkways some or all of the length of that corridor, and each walkway has its own speed boost which it applies to your speed, and you can walk at a speed S and run at a speed R, but you will run out of puff if you run for longer than T total, what is the shortest time to get from the start of the corridor to the end?

A1) This problem was very easy, although it was a bit fiddly. The key to notice is that you want to run during the period of time that your walking speed + walkway speed is slowest, as that has the greatest net effect on the final time. So, just sort the input by walkway speed, run on all non-walkway bits you can, then on the walkways in the order they are now sorted.

Q2) Given an rectangular array of point masses, where each mass is given as a deviation from the smallest mass (and no deviation is more than 9 units), determine the largest square array of these point masses, which if you remove the 4 corners has a centre of mass at its actual centre.

A2) Again, simple enough, but the brute force attack (which is what I implemented to try and get a few more points) is O(N^5) and the large input is up to 500×500. The key is to do a bunch of pre-calculation of moment of mass and sum of mass of every rectangle array starting from the top left corner, about (0,0) in O(N^2) time by using the fact that moment of mass and mass is additive, a larger rectangle can be determined from the sum of the rectangles 1 smaller in each dimension individually, minus the overlap, then finally tacking on the new mass element. Once these pre-calculations are done, any rectangle can be calculated quickly by adding the bottom right corner and top left corner, and subtracting the bottom left and top right corners. Then finally this can be adjusted by taking off the corners. And to finish off we can check whether the centre of mass is in the middle by seeing whether the moments are equal to the product of total mass and the centre position. Since there are O(N^3) scenarios to check, and they can now all be considered in O(1) time, the 500×500 isn’t too bad at all.

All in all, this question was perfectly doable, but I had decided to focus on question 3 for too long to come back and think about it properly. If I had of done this problem I suspect I would have implemented a different approach. Consider the number of 1xn and nx1 subsections of the input. There are O(N^3) of these, which can each be calculated in O(1) by building off each previous 1x(n-1) (n-1)x1 sections. From there, each of the O(N^3) scenarios can be built up in O(1) from each previous scenario, by slapping on the rectangular edges and then subtracting off the corners twice (and adding on the previously removed corners).

Q3) Given an integer N, the numbers 1 to N can be ordered and then processed in the following way. As each number is considered, a total (starting from 0) is checked to see if it is a multiple of this number. If it is not, the total is increased to the smallest total which is a multiple of this number. If the number is no longer a multiple of one of the numbers previously considered, this process continues until all processed numbers divide into the current total. Throughout all possible orders what is the difference between the most and least number of times that adding a new number to consideration results in the total being increased.

A3) The wording was tricky, but it boils down to how do you order the first N numbers, such that the running lowest common multiple changes the most number of times, or the least number of times.

I fairly quickly identified that the most number of times occurred when you process the numbers in order, further identifying that the actual answer was equal to 1 + the sum of the maximum number each prime could be raised to and still be less than N. That is half the problem solved…

However, I started by considering the reverse order, which worked correctly for small values. I then considered maybe ordered by the number of prime factors, but that was actually even worse. Finally I ran out of time with my sleep deprived brain giving in.

So, what was that second half of the answer? It comes right back to the first part, put the maximum power of each prime in first, then everything else will not change the LCM. I have no idea why I failed to consider this scenario, it seems so very obvious now.

And this leads to how to do the large input, where a maximum value of N was 10^12. O(N) was going to be too slow, and looking at 10^12, I had immediately thought that the answer would involve the primes less than or equal to the square root. (I even wrote the once off pre-calculation of all primes less than 1million, to be ready for needing that scenario.) With the minimum answer being the number of primes less than N, and the maximum answer being the number of primes, and all of their powers less than N, the difference is all of the prime powers of at least squared, which are less than N. Hence that array of the prime numbers less than 1million is all you need to implement a very fast answer.

Q4) Given a graph with unit edge distances, determine the shortest path between 2 points, which is also within 1 unit of the most number of vertexes.

A4) The shortest path part of this algorithm is easy enough, the maximizing the number of vertices near the path is an entirely different story. Fairly obviously it will be a memotized recursion/dynamic programming solution, but the recursion is not trivially obvious.

I looked up the answer because it is way early in the morning and I just wanted to finish writing this up before dawn… So the recursion you use is to consider the maximum number of vertexes ‘near’ the shortest path to b where the previous node in the path was a. This is equal to the maximum over the sum of paths ending in a with a each possible previous node before that, plus the number of additional new ‘near’ nodes which get added extending that path to b. This is the key point, it is possible to work out the number of new ‘near’ nodes in a path if you know the last 3 nodes of the path. This observation occurs because you can see that if a node next to the latest node in the path was close to anything before those last 2, then the second latest node isn’t actually on the shortest path, as it could have been skipped via this theoretical other path.

And in-fact doing that is actually the most time consuming part of the problem, but it isn’t exactly hard, just consider every vertex, is it next to the end of the path, if so, is it next to either of the previous 2 on the path. Using an edge matrix, this is O(V) for each scenario. Cache each triple to avoid repeated calculation, and you are done. (Slight optimizations exist, but are not required given the input size.)