Code Competitions – Page 5 – The Missing Docs .Net

TCO14

Registration is open already. I think this is why I almost forgot to register for this years competition…

Last on-line round of the TCO/GCJ ‘season’. 138 people eligible to compete, 125 turned up. Unlike last time the first 2 problems were completed by a significant proportion, so advancing required a fast time for both the first 2 problems. Only one person solved all 3, the same person also had the only solution to the 3rd problem.

Q1) Given a sequence of numbers where each element is between 1 and n, determine how many sequences of equal length with each element also between 1 and n are a ‘match’ with the original sequence if you remove one element from each sequence. Since the answer is large, return modulo large prime.

A1) At first glance this doesn’t seem too complicated, consider the original sequence, remove each element in turn, and count how many sequences you can create by adding one element. The problem is of course that this double counts. Even just considering a single case, if you insert the same value before/after a given number, the sequences are the same. So either we need to determine how many double counts occur and subtract them off, or use a different approach entirely.

Looking at a solution (because I feel lazy today) scenarios can be divided in to 2 cases. Where the number we insert is a member of the original sequence, and where it is not. If it is in the original sequence, we have complicated possibility of double counting, if it is not, then it is much simpler. So for each element removed (up to 50 cases) for each insertion position ( up to 50 places) for each distinct original value (up to 50 values) we have something which needs checking for duplicates. This is a fairly small sample size, so we can easily use a set. For each of the element removed, for each insertion position, we can also insert the n-distinct_count_options other values. There is still the ability to double count here, but these cases can be grouped by what the inserted value is (regardless of whatever that value is), so we can use a place holder (0 for instance) and the same set, and each time we get a non-match we add n-distinct_count_options to the final result.

Q2) Given pairs of values, determine the minimum number of individual value swaps (swapping one value from one pair with one value from another pair) is required such that all pairs have a maximum difference of k. If it is impossible, return -1.

A2) There is only a maximum of 16 pairs here, which suggests bit-masks may play a role.

But again, I am feeling lazy today, so I went straight to a solution. The first step is to determine whether the problem is possible at all. It seems that if you take all the values from the set of pairs, put them together, and sort them, they must fall out pairwise into values which are close enough. The logic behind this seems to be rather straight forward – if you consider the smallest value you need a value close enough, and if there are multiple which work, choosing anything other than the smallest of those, can only potentially make it harder to form the next pair.

To find the minimum number of swaps we can look at this problem from a different angle. Any solvable situation can easily be done in at most n-1 swaps, if we look at the sorted order pairing, we can swap the second to be next to the first, the 4th to be next to the 3rd, and so on, until the final number is already in the right place, the question is how much can we short cut that. If we have a solvable scenario which can be divided into 2 solvable subsets, worst case we can do the first in k-1 steps and the second in n-k-1 steps, giving n – 2 steps. What remains is to determine the most number of steps which can be saved, by following this down recursively. The maximum number of steps which can be saved by a scenario is the best sum of the maximum number of steps that can be saved over each partition into 2 solvable subsets. With a minimum of 1. Since we already have worked out how to determine if a situation is solvable, we can perform that for every subset, then considering every subset, of every subset (which is not a small number, but is achievable), determine if that subset and its compliment are both solvable, and use that as the framework to build up a recursive cache of maximum number of steps to be saved for a given subset. Then the final answer is n – maximum number of steps saved for the full set, assuming it is solvable. If it isn’t the answer is of course -1.

Q3) Determine the minimum number of jumps to get from 0,0 to x,y if each jump can’t be more than a distance sqrt(d) in a straight line, and jumps may only be to/from integer coordinates.

A3) Unusual bit here is that rather than just asking this question once, each test case can be up to 50 of these questions, which you have to answer all at once. For d=1 the answer is |x| + |y|. For d=2 the answer is max(|x|,|y|). From there the question that first needs to be asked I think is whether a greedy solution is correct. If so we should be able to find a pattern in the jumps which repeats and use that to ‘accelerate’ to the finish.

Looking at the solution again, it appears some fancy maths comes into play. Given d, you start off by determining the pair of maximal distance jumps which are either side of the angle connecting 0,0 to |x|,|y|, do this by constructing each possible maximum distance jump and watching the crossproduct change signs as we go from one side to the other. (May want to take care of the x= 0 or y = 0 cases separately just to be sure you don’t screw that up.) Once you have this pair of vectors it just becomes a question of how many of one and how many of the other to add together to give the right destination. This is where the maths seems to get a bit fancy. Taking directly from the solution, the answer is crossproduct((x,y),V2-V1) / crossproduct(V1, V2-V1), rounded up. I’m not quite sure how to derive this exactly, but at first glance I wouldn’t say it was wrong.

GCJ13 R3

So, out of 500 qualifiers, only 311 positive scores. And no perfect scores.

Cut-off to advance to the world finals was having solved the first and last problems. But given only 3 people fully solved the last problem, a safer grouping was having solved first, 3rd and the small input of the last.

Interestingly solving Q1 and small inputs of the rest got between 29th and 54th, which was a reasonably achievable score if you gave up and didn’t get bogged down on any of the tougher large inputs.

Q1) In a rigged game of roulette where the ball always lands on one of the values with minimum payout, given how everyone else has already played, determine your maximum profit you can make with your money.

A1) So for this question it seems likely that we can enumerate the scenarios of maximum winnings. The small input certainly is trivial, with a maximum budget of 1000, you can use the strategy that placing money on an outcome which is not currently one of the lowest payout outcomes, is a wasted bet. Furthermore when there are multiple equal, you choose the one which has the least of your money on it. Then just apply that strategy by placing each bet of size 1, one at a time, and find which is best. The large input however we need to enumerate the ideal scenarios more efficiently as you could have up to 10^12 to spend.

So, rather than going 1 by 1, we need to skip ahead by chunks at a time. Specifically, any scenario which is profitable, will be more so, if we can raise all the minimum payout outcomes by 1, without increasing the number of minimum payout scenarios. So we can seed with a few scenarios, and push them as high as we can, without reaching equality with another players bet or running out of budget. Each time we reach another players bet, we start again assuming having brought everything up equal to that. I think that the scenarios to seed with are the 37 equal or above equality with a bet (or equal or above 0 if there are less than 37 player bets). Running time is thus 37*37 divisions/minimum/maximum calculations – easily running in time.

I think this would work, but I see solutions instead using binary search to find the best scenario rather than division. It seems that there is a more aggressive strategy which always spends as much of the budget as possible for each of 37 scenarios, rather than 37*37 scenarios spending up to the next price threshold.

Q2) Given a set of vertices construct a non-self intersecting polygon which uses all the vertices with at least half the area of the convex hull.

A2) Just like Round 2, another problem where it doesn’t state maximum area, just at least half, and any satisfying result can be returned. Being a computational geometry question, it isn’t my favourite. So I think the approach would be to start by calculating the convex hull. Then it becomes a question of where to insert the contained points into the order described by the convex hull. Small input however is limited to 10 vertices, so it can be brute forced, just consider every polygon, build them up to rule out self-intersection, then calculate the area that remains. Large input has 1000 vertices, and I want to say a greedy approach will work, where for each vertex not in the current border you calculate its minimum cut-out location to add it to the list, and perform that cut-out. Problem is doing this in better than O(N^3) time seems like it would require some fancy spatial data structures.

Looking at the answers there is a far simpler approach which avoids the whole question of maximum and just dives straight at the ‘at least half’ criterion instead. It is not at all obvious though… If you calculate the convex hull, and consider connecting all the internal points in a left-to-right sequence to the outside of the hull (choosing an orientation to ensure that internal points can be strictly ordered left-to-right, you create 2 areas (both of which are valid non-self-intersecting polygons), which together add to give the area of the convex hull. Hence one of them must be at least half. So if you take half the convex hull, and all the other points ordered left-to-right, that can only be larger than the previous area, so one of those 2 must be more than half the area. Since all coordinates are integer, you can select a very slight fractional angle, which will ensure they all have a distinct left-to-right order. Calculate the convex-hull, and the left/right most coordinates of the convex hull under this order then serve as the split points to divide the hull into 2 parts, try both versions, calculate the maximum area. Since the polygons are known to be non-self intersecting, the running cost is O(N log N) for sorting on a slant, and then O(N) to implement convex hull on sorted data and O(N) to calculate the area, so 1000 is easy.

Q3) Given a set of one direction edges on a graph and an ordered subset of them which represents a path from vertex 1 to vertex 2, determine the first edge of that subset which is definitely not on the shortest path, if any, given a range of possible costs for each edge. When considering each edge for whether it can be on the shortest path, the shortest path must include all previous edges.

A3) This is a very strange question, on my first reading I didn’t understand it very well and thought it was about ensuring each edge specified was on a shortest path, not only on a shortest path prefixed by a given set of edges. I think the small input can be brute forced by trying each combination given by choosing one of the extreme values for each edge and calculating the shortest path for each of these 2^20 scenarios. Whichever one goes furthest down the specified path will define the answer. Large input certainly needs a different approach. I am thinking that we start with every edge at minimum value, then apply all source shortest path, then for each edge on the route we take the specified previous edges, and from the end of the edge to vertex 2 we use the previously calculated shortest path to the destination, then every edge not on that list has its cost set to maximum, and we determine if the shortest path goes through our selected edge. Running time should be okay, O(N^2 log N) or so, but I am not certain it is actually correct.

Looking at the solution I am still not entirely sure, but I am pretty sure there are corner cases it breaks where the shortest path back from end of prefix allows for a short-cut that skips the selected prefix edges but there exists another ‘short path’ back which doesn’t cause that problem. The solution does a simultaneous 2 starting point shortest path by priority first search. One starting point is vertex 1 with initial cost 0, the other is the end of the current prefix being considered, with a cost implied by the minimum for each edge in that prefix (minus a half to give it priority since equally fast is still on the shortest path). Each of the prefix edges are forced to have weight minimal. If we are at a vertex which has a fractional best cost, we use minimal weight for outgoing edges, because our prefix shortest path got there first, so its always good. Otherwise we use maximal weight for outgoing edges, since the raw search got there first, which is bad. If the destination vertex ends up being fractional, shortest path has succeeded with the prefix.

Q4) Given a starting list of boolean values, determine the average sum of the number of consecutive true values starting from randomly selected locations until the first false to the right (wrapping back to the beginning if needed), if each found false location becomes true, and the process repeats until every value is true.

A4) Again this is a problem where the small input is quite easily achieved. With only 20 entries in the list, the problem can easily be recursively defined with a cache over the possible states in the list. Each state has 20 random approaches which have to be averaged to give the final value of that state.

Large input, I had no clue, so I went straight to the solutions. The solution is an O(N^3) approach by calculating the probability of filling a consecutive subsequence (including wrapping) of values with a specific member being last, and the expectation value of the same. Each of these calculations can be expressed in terms the items before the specific last entry to fill, and the ones after it and using some fancy combinatorics for the probability calculation. I won’t go into it here, you can always read the official contest analysis for more details.

TCO13 R3A

So, having been knocked out in Round 2, I didn’t have to wake up at 2am to do this round, which is good because I’m not sure I would have gotten any points anyway… As usual, there were 3 questions, but no one solved question 3 and only 5 people successfully solved question 2, meaning you could advance to the finals with just by solving Q1 quickly and a successful challenge or 2.

Q1) Given a set of 32bit numbers, determine whether each of a second set of 32bit numbers is in the resultant of K rounds of expanding the the set using the bitwise and operation. Specifically, every pair in the set is anded and the result is added to the new set, the input and output sets can contain duplicates.

A1) So, my initial thoughts were around that the and operation is rather limiting, so the total count of numbers will never get very large, so you could just simulate the problem, collecting duplicate outputs into counts. But that would be mistaken. using an appropriate set of initial numbers you can generate a very large set of outputs even once you apply distinct, and as a result simulation will take too long.

So, K=1 is easy, you just consider every pair. K=2 there are 2 cases, the 2 input numbers are either from distinct original pairs (N choose 4) or they are from overlapping original pairs (N choose 3). K=3 there are more scenarios. 8 distinct original numbers anded together is the maximum, you can obviously do scenarios for 7, 6, 5, 4. What is interesting is that you can still do 3. In the K=2 case, each scenario of N choose 3 can be created 3 different ways. ((1,2) (2,3)) ((1,3) (2,3)) ((1,2) (1,3)). Since all duplicates are kept this means that these would be anded together pairwise to create another 3 new copies of each case of (N choose 3). From there on it follows that every possible combination involving at least 3 and at most 2^K distinct original numbers will be in the output. And since the original input is at most 16 numbers, we can consider every subset easily enough. We just need to remember to special case K=1, as it is the only case where you can choose pairs.

Q2) Given a NxN array with 3 specified positions on the edge being important, determine the minimum number of filled in positions which need to be added (some may already be filled in) such that there is no path of connection between any pair of the 3 original specified points (path consists of horizontal and vertical only, no diagonals). If there is no way to do this is less than 100 positions, return 100.

A2) The ‘100’ limit is a misdirection, other than when any 2 of the specified positions are adjacent (and hence cannot be blocked) the maximum answer is 6, 3 for each of 2 of the original positions required to surround them against an edge – the 3rd position doesn’t matter of course once 2 of them are blocked in. However, considering the cases of placing 1-5 options on a 20×20 grid, and for each case performing a connection check is 400^6 operations in the worst case, which is far too many to complete in the few second time limit. It is quite feasible to check each single location, or even each pair, but beyond that is pushing your luck.

Looking at the 5 successful solutions they all seem to use the same kind of approach. The idea is to think about the minimum cost to create a wall between 2 points. Then the final result is either 2 walls which each block in 1 edge location, or 3 walls which meet in the middle. The minimum cost to create a wall between 2 points is a graph problem where a block already filled in creates 0 cost edges for the paths it blocks, a block not filled in creates 1 cost edges instead. The net result is you can run all pairs shortest path. Once you have the all pairs shortest path, you can then consider each centre location, and for each of those consider the shortest paths to create three walls from there to any location on the edge between each pair of the 3 selected edge locations. Finally you can choose each pair of edge positions and choose the cheapest 2 walls which go from one side of a selected location to the other. (The 2 cheapest selected must obviously be for different selected locations.) Choose the best result out of all of these scenarios and, presto you have the answer. In the worst case of a large empty map with 3 spread out selected edge locations the 2 walls will always win out, but on a map which is already filled in much more significantly, the 3 walls meeting is more likely.

Q3) There is a set of m positive integers , n of which are between 1 and t inclusive, and the sum of all m is less than or equal to s. Determine how many sets satisfy these requirements (returning count modulus large prime). Constraints, n is at most 100 less than m, t is at most 10^5, m is at most 10^9 and s is at most 10^18 and is at least n*t.

A3) So I don’t have any answers to fall back on here, what I find most interesting is the constraints, they are rather detailed – s being at least n*t for instance is suspicious. Looking at this problem I see possible approaches with running times of at least O(s) which is obviously out of the question. The combinatorics certainly seems beyond me at the moment.

GCJ13 R2

So, looks like this was a tough round – only 1 person solving the 4th question, and they broke the large input on question 1, so no maximal scores. Advancement cut-off was solving the first 2 questions small/large with at least 6 minutes to spare. T-shirt cut-off was Q2 small/large with more than 40minutes to spare. Safe T-shirt was Q1 small/large and Q2 small.

Q1) Given that the cost to travel between any 2 points on a line in one direction is given by a quadratic formula which has a negative quadratic component in the distance between the points (hence average cost by distance is monotonically decreasing as distance increases) and a list of trips and the number of people completing them, determine how much the total cost can be reduced by maintaining the same number of exits/entries at each stop, but changing where a person’s ticket starts/stops the journey.

A1) My question statement above is a bit more generic than the specific one in the comp – but I think they are equivalent. The first thing to identify that in any case of 2 trips that touch/overlap the best result is to have one ticket travel the long way and one ticket travel the short way. This leads to the greedy approach – ignore the people and just watch the tickets. At each trip start location, new tickets arrive, at each trip exit location we remove the newest tickets, leaving the old ones. This leads to a natural implementation involving a stack – order all start/end locations from first stop to last stop, such that starts are sorted before ends when start and end locations are coincident, then walk them in order, pushing onto the stack ticket start locations and counts, and at each end location pop the stack as much as required to satisfy the exit count – putting any remnants back on the stack if we get a partially fulfilled starting location. Sum up all the costs from entering/exiting and compare to the expected cost. This is O(N) in the number of distinct trips, so it will easily perform in time for the large input which only had 1000 distinct trips.

The real challenge for the large input is the size of the result – the number of tickets in flight can get up to 10^12, and they can all exit at once, or all enter at once. (Problem statement does not rule out the 1000 trips actually being identical – just that each one has a maximum of 10^9 people). So you need to remember to use 64bit integers at this point already. But then the cost of individual trips can be ~10^18 itself, so calculating the total cost can overflow 64bit multiplication. So you either have to remember to apply modulus before and after the multiplication, or simply do all math using BigInteger and apply the modulus at the very end. If you take the modulus approach, you have to remember that when taking the difference of 2 modulus values (to calculate the value saved) the answer is always positive under modulus, but the % operator in many languages will give you the negative remainder when used on a negative result. As a bonus, the small input is so constrained that it can’t even hit the modulus at all, so you can’t get any confidence in your modulus code before tackling the large input, without doing some manual testing yourself.

Q2) Given 2^N competitors where each competitor can be ranked in a strict order where higher ranked players always beat lower ranked players and p prizes to be awarded, determine the lowest rank which is guaranteed to receive a prize and the lowest rank which could receive a prize. Players start in potentially any order, but after each of the N rounds they have a stable sort applied based on their Win/Loss record for the tournament so far (lexically ordered comparison favouring early wins rather than number of wins). Prizes are then awarded to the top p entries in the final sort.

A2) This is an interesting competition structure – in order to be in the top half you have to win your first round. Also of note that because there are 2^N competitors and N rounds and in each round everyone will always be playing against someone with the same win/loss record – the final results are fully ordered by the win/loss record, no reliance on the stable sort. Small input is up to 10 rounds, large input is up to 50. So 64bit integers are again important.

I did not find this problem very approachable for some reason, but after some thought I think the approach to take is to binary search on rank and ask the question, how high (or how low) can this rank go in the final result. Comparing the result of that question to the prize threshold you can decide whether to go higher or lower to find the minimum rank which could (or is guaranteed to) win a prize. So then the question becomes, for a given rank, what range of placings is possible. Highest/lowest ranks have no options they are always stuck in their natural position. Second highest/lowest ranks can go from their natural ranking to exactly one position past the middle in the direction away from their natural rank. But what about the rest? If we consider a given rank, at each round its ‘division’ (same win/loss record) will consist of a certain number of ranks higher and a certain number of ranks lower. The actual value of these ranks is not relevant, just how many there are. If we are trying to win we can consider what the effect of optimal strategy will have on these 2 numbers, and vice versa for trying to lose.

So if we have x above and y below and are trying to determine how high a final placing we can get, what is the ideal strategy? I would suggest it is to maximise y at all stages. One of the lower ranked will play you, and lose, maximising how many of the y-1 remaining win involves getting them to pair off with each other as much as possible, as every time someone from x plays against someone from y, x is guaranteed to win. So new y = (y-1)/2 (rounding down) and new x = (x+y+1)/2 – 1 – new y. Eventually y=0 and you have to lose, and from then on you also can’t win. This sequence of wins/losses will give you the maximum placing for a given rank. You can also reverse everything to get a sequence of losses/wins to get the minimum placing for a given rank. Assuming these strategies are correct it is also fairly obvious that the maximum/minimum placing never has any turning points with respect to changing starting rank, so the binary search approach I originally suggested is plausible (although due to the step-function nature some extra care might be needed). On the other hand, the fact that this strategy demonstrates that ‘ideal’ scenarios always consist of sequences of wins then losses, or vice versa (but no mixing) it should be possible to come up with a much more direct approach simply calculating the size of y backwards. (Consistently assuming that a round down was required in the forward version, or not, in order to get the range of values that will reach a specific ‘limit’ in the ideal scenario.) I think this direct approach can be performed in O(N) with a sufficiently smart implementation, as compared to O(N^2) of the binary search approach.

Q3) Given a table of lengths of the longest increasing sub-sequences ending at i and a table of the longest decreasing sub-sequences starting from i, determine the lexicographically smallest permutation of the numbers 1 to N, which satisfies both tables.

A3) Inspecting the tables gives us a bunch of inequalities regarding elements in the final sequence, from there we just find the smallest permutation which satisfies the inequalities and presto we are done? Somehow I think it isn’t as simple as that sounds. But lets start anyway. If two consecutive elements of the first table are increasing, we know that the values are increasing at that point as well. If they are decreasing in the table, we know the values are decreasing, and if they are equal in the table we know the values are decreasing. Corresponding logic also applies for the longest decreasing sub-sequences. However we can learn more. Each value in the first table is either 1 (indicating a new lowest value) or it is precisely 1 more than a previous number. The new value is then between that previous number and all intermediate numbers. However there might be multiple previous numbers that are exactly one less – it appears that maybe we can assume the least constrictive of those, which will be the closest one.

Is that enough constraints? I am not sure. The next question is, given these constraints how do we build the output? My guess is we build a directed graph and perform a kind of topological sort. Care must be taken to ensure that in the multiple ways the topological sort can be performed, we choose the one which gives the best lexicographical output.

Looking at a solution, I think that I have covered all the constraints, although it can be stated more simply. For each entry, walk backwards through the table, the first one you are equal to, you are less than that, the first one you are exactly one more than, you are larger than that. (And vice versa for the second table). When stated this way, you can more clearly see that the resultant graph is fully connected and so the way to populate the topological sort is clear, leaf pruning, where the leaf to be trimmed next is always the leaf which corresponds to the earliest position in the output.

Q4) Given a multi-player version of pong where the paddles are points and can pass by each other and each team has to rotate through its paddles (in order) to hit it back, determine how long the game goes for (and who wins, if it ever finishes). You are given the initial position and velocity of the ball, the size of each team, how fast each team can move its paddles, and the size of the game world.

A4) Reading the description I thought, sure this question sounds reasonable. Then I saw the small/large input constraints. Even the small inputs contain numbers that won’t fit in 32bit integers. Large input contains values which barely fit in 384bit numbers…

Small input does at least look feasible – each team has at most 1 million players, so as long as your code is fast, you can consider each player and think about bouncing cycles and reflection, identify the few scenarios to determine whether the paddle can make it in time between each turn it has to take – calculate the smallest failure out of the up to 2 million players and return. The large input however has 10^100 players per team. The only way this version is plausible is if you can consider ever player in a team all at once, identify the worst case within the team (maybe using a binary/ternary search?). Given there was only one successful solution, I don’t feel too bad at leaving my discussion there. In practice the implementation will require a BigRational class (or scaling all positions by the ball and paddle velocities so they are all integers in seconds per unit, cementing the need for BigInteger even for the small input).

GCJ13 R1C

So I am a bit late with this write up, I completely forgot about it…

Cutoff for the top 1000 which advance to round 2, was problem A small/large and problem B small with 45minutes spare.

Q1) Determine the number of subsequences which contain n (or more) consecutive consonants in a given string.

A1) Small input here is trivial, with a small maximum size you can use the brute force O(N^3) approach of considering each starting point, each end point, and checking each scenario for maximum consecutive consonants. Large input is size 10^6, so we need to do better than O(N^3/2) in order to run in time.

In practice it is actually not too hard to implement a solution in O(N) runtime – although given the answer can be proportional to O(N^2) we still need to remember to use 64bit integers. The solution is to use a walking range of length n, you keep track of how many consonants in a row at the end of the range, first by just counting them, then as it walks forwards, adding 1 if the new location is a consonant, and subtracting 1 if the count was previously equal to n. Each time the current number of consecutive consonants is n, add (length-n-current_starting_index+1)*(current_starting_index-last_successful_starting_index) where you initially set last_successful_starting_index to -1 and update it to equal the current_starting_index each time this calculating is performed and the total incremented. The use of last_successful_starting_index here avoids double counting when there are multiple cases where the number of consecutive consonants is n.

Q2) Given you can only move along 2D grid in horizontal/vertical direction and each movement has a length of 1,2,3,4… determine a list of NESW instructions to get to any specific point from the origin. Point is guaranteed to be reachable, and you should return the minimum length path.

A3) Small input was an interesting variation, you didn’t have to print the shortest path, it just had to be less than 500 moves, which makes the problem trivial as any consecutive pair of moves can be made opposing to have a delta of 1 in whatever direction you want, allowing you to step your way from 0,0 to x,0 to x,y, and since x,y are both less than 100, you will get there in less than 400 moves.

Large input however is not so forgiving, the result has to be the shortest path. I had a few thoughts regarding a greedy approach where you do something like NENENENENNNN to get as close as possible (assuming the target is in the NE quadrant) and then use repeated approaches of ssssswswswswswsnenenenennnnnnnn type to converge even closer. But I don’t see any way to guarantee that that is ideal. My second thoughts was that maybe we can ignore the whole 2 dimensional nature of the problem for a bit, if instead of trying to get from 0,0 to x,y we tried to get from 0 to abs(x)+abs(y) we could end up with a sequence of EWWEEEWEW which appropriate use of replacement with NS can get us to abs(x),abs(y) instead. Which can then be trivially flipped to get to the actual x,y.

This 1D version of the problem seems like it might be simpler. Reading a solution it actually is definitely simpler. Find how many steps it takes to go past the target. Now we go backwards, starting at the destination, subtracting each value off in reverse, if we go negative, we add the next one back on instead. Ultimately you can see that it must converge to 0. Now how do we convert this sequence into one that works with 2D? Using the same number of steps, we subtract off the largest absolute value out of x and y at each step. This doesn’t converge to 0 quite so obviously, but apparently it does all the same.

Q3) Given a bunch of tribes which have a ‘length’ and move along a line and occasionally attack towards the other side of the line, and succeed if any part of the wall isn’t at least height s (which is potentially different for each team, and changes as time progresses), determine how many time they succeed if at the end of each day the wall is raised to the minimum height that would have stopped any attack made that day (if it isn’t already higher).

A3) The small data set is a trivial simulation – you only need an array which covers from -200 to 200, and to order the maximum 100 events correctly and handle them happening simultaneously correctly.

The large input is a bit more tricky. 1000 tribes, each with up to 1000 attacks gives up to 1 million events, so if we are still doing pure simulation, we need do better than simulating each unit of the length of the tribe, since that is up to 10^6 and it moves up to 10^5 each move, giving a 10^8 space range and a 10^12 operation count. I think using an interval tree will do the job, especially since the ranges are disjoint. We need to handle splitting and merging, the former for correctness, the merging to avoid catastrophic performance. I think with merging, each operating may match at worst 1000 height ranges – giving a 10^9 operation count, which is huge, but with only 20 test cases, maybe plausible.

Looking at a solution, I think I needed to make the interval tree a bit more fancy otherwise it wouldn’t be fast enough, but the alternative is easier to implement than a full interval tree. They don’t use explicit merging or an actual tree implementation, instead they create a list where the first element represents the entire set of distinct start/end locations for attacks, the second the first half of those locations, the third the second half, the 4th the first quater, a bit like a common binary heap implementation. For each node they have 2 values, ‘whole’ and ‘min’. ‘whole’ is used to shortcut when an attack area covers a large section – so if an entire node is covered, we don’t have to update all of its children, we just update the ‘whole’ entry, and from then on we use the higher of whole and min of the two children when looking at the tree to find the minimum height in a range, since min may not have been updated because of whole. This ‘whole’ strategy obviates the need for merging, because it ensures n log n regardless (where n is the number of events) which is trivially fast enough.

TCO13 R2C

Low turnout as usual – only 1032 registered out of ~1900 potential. With the top 100 through already, this round was always going to be the best opportunity to get a t-shirt, and I was probably close. I got 258th with just a solution to the first problem I was slow to implement, a fast implementation time of shortest path would have gotten me into t-shirt contention range. Alternatively a successful challenge would have got me 150th which is good t-shirt contention. I was looking at a broken 1000pt solution, which I was confident was broken, when I saw that a bunch of 250pt problems had been challenged and had a quick look at the carnage. By the time I switched back to the 1000pt and was about to open the challenge window for a try, someone else picked it off first.

There is always next year of course…

Q1) Given an undirected graph determine the minimum number of steps to make 2 specific vertices neighbours where a step consists of taking one or more distinct pairs of vertices which each have a common neighbour, and making them neighbours.

A1) Looking back on this now, I should have realised after the clarification announcement that this question would have a lot of challenge opportunities. Clarification seemed to state the obvious to me, so I didn’t pay any attention to it, but the 7 challenged solutions in my room all had the same mistake – assuming a step could only introduce one new edge into the graph at a time.

So the solution is straight forward – we need to know the shortest path between the 2 nodes, then calculate the number of steps to make them neighbours. Most of the quickly implemented solutions used the DP version of all pairs shortest path which is O(N^3) – but given the restriction to 50 vertices this was not an issue. I implemented the much longer piece of code for a breadth first search (O(N^2) due to adjacency matrix rather than edge lists) for single source shortest path, this cost me a bit of time.

Once we know the shortest path length, the actual vertexes are irrelevant, along that chain we takes groups of 3 consecutive and remove the middle one. We repeat this phase until we get down to 2. This can simply be implemented as subtracting floor(number of vertexes remaining/3) repeatedly until there is only 2 left, and counting the number of steps. I stupidly actually implemented the loop of copying the vertex ids that remain each time into a new list and repeating until the list only had 2 elements – which again cost me time.

Q2) Given a requirement that a rectangular grid of numbers have a ‘peak’ where every value before the peak row is strictly ascending and after the peak row is strictly descending, and correspondingly for the peak column, determine the minimum sum of elements given that some of the elements have to be specific values. (Up to 200 by 200, but only up to 50 specific values.) Solution is not always possible, impossible states must be identified as such.

A3) I made a start on this question, but I’m pretty sure I was heading in the wrong direction, so I gave up. Code required was quick lengthy and prone to mistakes, justifying the extra points (550 instead of the usual 500). It appears that the correct path was to realise that in the ideal scenario, the 4 corners will be as low as possible. So you create 4 pseudo answers by numbering starting from 1 (or given value) in each corner, and satisfying all the fixed values. Then you merge the 4 pseudo answers together to get the smallest possible sum. (For actual details, you’d need to read a solution, they are in-depth.) Its O(N^3) unlike my original idea which I suspect would have been O(N^4) in the worst case.

Q3) Given a numeric range 1 to n and the ability to make guesses regarding a hidden selected value where each guess is told correct, left or right – determine the maximum probability you can select the right answer between turns a and b inclusive. (n, a and b are all less than 1 million.)

A3) First off you have to realise that this question wants you to maximise the probability of selecting the right answer between a and b turns, not what the probability of ending between turns a and b if you play optimally with intent to win as quickly as possible. So the actual problem has a straight forward cached recursion solution, which is O(n^2*a) time and O(n*a) space – which given the range of values you can clearly see is not even going to remotely work. The solution I was about to challenge used this approach, with a few short cuts to make it fast enough for the basic cases given in the problem, but it was clearly not enough for general case of 1million, 500k, 500k type scenarios.

Only a handful of solutions for this problem, but unlike Q2, they are short. Short but difficult to understand unfortunately. One key thing I see is if the difference between a and b is more than ~21 they replace b with ~a+21 – because if you make it to a turns without getting it correct, you can guarantee that you get it right in the next ~21 turns because of binary search, so any optimal strategy will never go past a+21 turns. The rest is not so easy to understand – at first glance it appears they are walking through possible first guesses and finding the best one, but I suspect this may not actually be a correct reading.

GCJ13 R1B

This time around the top 1000 advancement cut-off was Q1 + Q3 small in fairly fast time, or Q1 and Q2 small.

Q1) Given a game where you can absorb other smaller objects to become the total size of the 2, a starting size and a set of other objects, determine the minimum number of changes that have to be made to the set in order to be able to absorb every object into one big super object. When adding to the set you can choose whatever size object you like.

A1) So this problem is somewhat greedy in nature. It should be fairly obvious that there is never a downside to absorbing an object you can currently absorb. So you start off by sorting the objects by ascending size, and absorb from the start until you can’t absorb any more. At this point you have 2 choices – remove every larger object, or add a new object to make yourself larger in the hopes of continuing. Again greedy comes into play, there is never any value in adding an object any larger than yourself (right now, you can always do it later when you are bigger) and no point making it any smaller than the largest you can absorb. So add an object exactly equal to your current size minus one, absorb that, and repeat from the beginning. You can continue this process until they are all done.

Along the way at each point you have to add an item, check to see whether your current number of added items + the number of remaining items, is better than your best result. If it is, replace the best result. Finally check whether always adding was better still at the very end. Return this result. Since you almost double in size every time, the answer is always going to be no more than about 21, so everything will easily run in time. There is one special case that I have ignored so far. If the initial object size is 1, you have to remove everything, because adding an object never helps in that case.

Q2) In a scenario where diamonds fall down and stack up (with random choice in fall direction when there isn’t an obvious choice) what is the probability that after n diamonds fall a given location has a diamond at it.

A2) The important thing to notice with this question is that the falling diamonds occur in phases. Until the mound of diamonds reaches height n, it never goes wider than n either side of 0. So the answer is either 1 or 0, unless the location being asked for is in the current phase. Each phase adds 1, 5, 9, 13, 17 … diamonds. This is of course an arithmetic series, so it has a quadratic sum formula, so the large input is obviously no worse than ~1000 phases, so it is safe to walk through them all until we get to the one of interest. If the point of interest is in the final shell, we need to work out the probability. For the small input there are only a handful of scenarios, you can probably work them out by hand and just hard code them, but for the large input it can get a bit more complicated.

If the final shell is full, obviously the answer is 1, otherwise we have 4n locations to fill with k objects where the 4n locations form 2 stacks of height at most 2n. If k is <= 2n then each of those k had a 50% chance of falling to either side and the probability of reaching a certain location of height h is the probability of getting h or more heads from k coin tosses, which is standard probability theory (although some care will be needed for the calculations in the large input, 1600 choose h can be a very large number, as is 2^1600 – if you create your pascal triangle cache pre-scaled by 2^-depth and use double precision that should be sufficient accuracy). If k > 2n, then it is possible to fill up an entire side, after which every diamond is guaranteed to fall on the other side. I think this means that k-2n on both sides are guaranteed and we can treat the remaining (k – 2k+4n)=(4n-k) diamonds exactly as before, but with the height of the target location being reduced by k-2n – but I can’t say that I’ve double checked that carefully for probability theory mistakes. If that isn’t true you can instead formulate a separate DP for each problem case (rather than reusing the same pascal triangle cache over and over) and build it up for each diamond dropped.

Q3) Given a string of ‘words’ where the spaces are removed and then someone introduces errors no more than once every 5 characters, determine the best cost match to a starting scenario based off a dictionary.

A3) First thing to realise is that while the dictionary is quite large (500k words) the words themselves have a maximum length of 10. This means that there can be no more than 2 mistakes per word. This problem then boils down to a fairly standard dynamic programming problem. The problem state is what character are we up to in the input, and how many characters since the last mistake (capped at 5). From each problem state we consider each word length (1 to 10), and each valid scenario of adding errors to that next section (no errors, 1 error after the minimum gap, 2 errors after the minimum gap with minimum gap), check if the result exists, if so produce a new potential minimum for the final state (length and characters since the last mistake).

Only question is will this run fast enough for the large input. Large input is 4000 characters, which gives 20k states. Each state has 10 potential word lengths, and the worst case of those word lengths with has ~9500 cases, and the exists check requires operation cost at least equal to word length. This gives an estimate cost magnitude of 20k*900k. Or 18billion memory ops. Even with only 4 test cases, that is very scary time wise. We can probably trim quite a bit by skipping any scenarios where we can’t possibly improve the target cost scenario, and that cost magnitude is a bit of an exaggeration due to taking easy approximations. So release build, no debugging, cross fingers… (or test run time before you submit…)

However, we can probably do better. One option which seems a likely candidate is to use a prefix tree rather than a hashset to store the original dictionary contents – that way while we are generating our potential cases, we are going to get early failure quite often. I would estimate this probably gives at least a factor of 20 improvement in practice. Another option is to pre-compute separate dictionaries for each single character error position, then use them as a basis for the exists check, to avoid having to generate 2 character error scenarios which are the vast majority of the cases to check during the DP. Memory usage will be quite large in this last case, but it should also definitely give a factor 20 improvement if implemented correctly.

I originally considered doing a once off creation of every 2 character error scenario from the dictionary, but the memory requirements can easily start heading towards 50+GB which defeats the point since I don’t have a home super computer. Oh well.

GCJ13 R1A

An interesting new style of question turned up in this round, where you could potentially fail even if you do everything right…

Top 1000 advancing, cut-off ended up being Q1 small/large, Q2 small in quick time, or Q3 small as well.

Q1) Determine how many rings on a bullseye can be painted using a specific amount of paint.

A1) Conveniently the ratio of paint volume to surface area involves pi, so this problem ends up being integer. First off you calculate the area of an annulus with inner radius r and outer radius r+1 – this is pi*(2r+1). Paint volume per ring is thus 2r+1, and we can get all rings by summing from the initial r, increasing r by 2 each time. For small input you can just keep adding until you don’t have enough paint for the new ring and return the count of rings so far. Large input paint volumes need 64 bit integers, so brute force needs over 1 billion iterations in the worst case, so that is unlikely to work fast enough.

So for the large input you need to be a bit smarter. The sum is a classic arithmetic series total, which has the closed form (a+b)(b-a)/2k – where a is the smallest element, b the largest element and k the element spacing. First option (which I would suggest is the best choice) is to binary search on the number of rings, using this formula to calculate the sum. Being very careful to avoid integer overflow. (So long as your binary search range is selected very carefully (a+b)*(b-a)/2k should able to be made to work with 64bit integer, as k is 4 and a+b and b-a are both guaranteed to be even – but it is simpler to just use BigInteger.) Alternatively you can expand the closed form to give a quadratic formula in terms of the number of elements and the target paint usage, solve for n using double precision floating point, and then check which of the nearby numbers is the actual maximum using the closed form (to avoid floating point error triggering the floor function to incorrectly give you the one less than the answer).

Q2) Given an ordered list of tasks where you can spend your energy to create value, and you have a maximum energy E and a regeneration between each task R, and each task has a value per energy expended V_i – determine the maximum value you can get out of your energy, assuming you start at maximum energy.

A3) Small input the maximum energy is only 5, so going into any given task there are at most 5 possible scenarios, and only 10 tasks, gives a true brute force approach of 5^10, which is only a few million. Obviously you are always going to spend all remaining energy on the last task, so its only really 5^9 even. But you could use a cache to get the state space down to ~50 entries.

Large input maximum energy is 10^7, maximum value per energy is 10^7 and maximum number of tasks is 10^4. So even the cached approach is ~10^18 runtime operations. (Which is coincidentally also the range of the maximum answer, so 64bit integers are back in play.) Looking at the problem more closely it seems that a greedy approach is more likely – you want to use your energy all up on the most valuable tasks, unless you could have conserved some of that energy for a later even more valuable task.

Looking at someone’s solution this appears to be the case. You have an array of energy used at each location, starting off by using all energy you have for the first task, and everything you regenerate at the second. And so on, using all the energy you have as fast as you can. But also at each location after the first, you look backwards 1 by 1 to steal some energy forwards if you are higher value per energy. If the total energy used between your current steal candidate and yourself is less than or equal to max, you can safely steal it all forwards. But if it exceeds max, you take what you can, and leave the rest – and stop looking further back. Also if you ever find an element more valuable than you, you stop going back because everything prior to that will have already been stolen forwards to that element as much as possible. This greedy solution looks worst case O(N^2) which is ~10^8 – but the case to trigger that is first task is most valuable, rest are in ascending order, maximum energy is greater than N^2 * regen rate – so there probably weren’t too many inputs that were that bad. I suspect that an O(N) (or close to it) solution is possible by tracking whether we have pulled all the energy forwards in certain ranges, and thus skipping them during any walks backwards.

Q3) Given a set of products selected at random by choosing random subsets of N cards with randomly selected 1 digit numbers (between 2 and M inclusive) take a guess at what the original cards were.

A3) Yes, that’s right, the question doesn’t want you to give the answer or -1 if it isn’t possible to tell for certain, it wants you to take a guess if you aren’t sure. It then specifies that you pass if you have a certain percentage correctness. Since that is a bit nasty, there is no large input, just 2 small inputs that you can repeatedly try until you succeed. Time penalty still applies (which is kind of unfortunate).

Helpfully the first small input is very constrained, only 3 cards, only 4 possible values per card, giving 64 scenarios to consider as to whether they are possible given the presented products. Then to maximise your chances, if there are multiple possibilities, you choose the ‘best one’, weighted by the probability that a given choice of cards resulted in that products given – grouping them by the sorted order (of which there is only 20 cases). Also, since you are given 7 products, your probability of being given a product which is maximal for the value of the cards is actually pretty high. (You could run a bunch of simulations and determine probability for each possible product as to what the original input was, then combine them to choose the most likely input.)

Looking at solutions, it appears that even that is more complicated than required. Success could be achieved by looking at the products – highest power of 3, start your answer with that many 3s, highest power of 5, add that many 5’s to the answer – then look at the maximum power of 2. If you only have 1 digit remaining, you know the answer and return. If you have 2 digits remaining and the power of 2 is 8, then add a 4 and a 2, otherwise add 2 2’s. Finally return 3 2’s otherwise.

The second small input however has 7 possibilities per card and up to 12 cards. This one really needs some smarts, but 7^12 is 17billion, so you can’t consider every starting hand, let alone the probabilities of every subset. Also since there are only up to 12 products given per case, probability of getting the maximal product is quite low. While we can’t consider every starting hand, if you group them by repeats it should be much better. So if you generate the non-descending only hands, and calculate the number of repeats using the standard combinatorics (which I would have had to look up…) and then looking at the probabilities for each possible product (by trying the 2^12 multiplications) for frequency you can generate a big database which can be reused for each sub-test case. For each possible answer you calculate the probability by multiplying all the probabilities of each product together under each potential initial scenario group, and which ever answer is best, you return it.

Looking at a solution it seems I was on the right track. Only thing I saw was that rather than calculating actual repeats, they calculated inverse repeats divided by 12! and then used that as a division factor in the probability rather than a multiplication. Probably avoids requiring to normalize the probability calculations and worry about range errors.

TCO13 R2B

Seems low turn out is normal now – 1214 registered out of a possible 1950. 6 Australians braving the 2am start.

Tough round again, top 50 was determined by having solved the 250 and 500pt, with a couple of 250pt only thrown in through large challenge success rates. Despite throwing away some points with an obviously wrong challenge of a correct solution and getting off to a slow start with multiple silly bugs in my 250pt code, I managed to get top 400 placing and my TC rating went back over 1700 again. But still not in the running to get a t-shirt.

Q1) Given 3 infinite arithmetic series which you can freely slide to different integer offsets determine the maximum minimum gap between any 2 numbers amongst the 3 series. The 3 infinite series are defined by their spacings a,b,c which are all between 1 and 2000 inclusive.

A1) Initial guess was GCD(a,b,c)/3 rounded down – but that doesn’t work for cases like 20,30,40 (which was one of the given test cases helpfully).

As it turns out, the answer is either min(GCD(a,b)/2, GCD(b,c)/2, GCD(a,c)/2) or GCD(a,b,c)/3 (both cases, rounding down) – the determining factor being whether all 3 pairwise gcds are equal or not. Equal results in the second case, not equal results in the first case.

But since the problem states that the maximum range is 1 to 2000, you can find the solution with a bit less smarts and a bit more brute force. (As I did.) Firstly you observe that offset of one of the arrays doesn’t matter, it can be fixed at offset 0 – since it is an infinite scenario, the origin is meaningless. Secondly, it is obvious that an offset greater than the sequences spacing adds no value, since again the sequences are infinite.

This leaves 4 million scenarios to evaluate. So long as we can do so quickly, we are done. So for a pair of infinite sequences (with spacings s1 and s2) with a relative offset of i – what is the closest approach? Turns out this isn’t too difficult to work out – it is min (i % gcd(s1, s2), gcd – (i % gcd(s1, s2)). So calculate this for all 3 pairs and take the minimum then return maximum of those across all scenarios. My implementation (mostly pointlessly) went a bit faster than this by skipping the inner loop if the outer loop gave a pair of sequences which was worse than the current best. Also the outer loop can be simplified to between 0 and gcd(a,b) rather than 0 and b. However a mistake I made at first was trying to make the same range reduction for the inner loop – which isn’t valid as it doesn’t fully explore the combinations of modulus.

Q2) Given a directed graph of up to 50 vertices, where edges can come in 3 different flavours (and any given pair of vertexes can therefore have up to 3 edges in each direction between them) a game is played. This game involves one person choosing a starting location and announcing the flavour of each transition made to move to a new location. The second person then tries to state with certainty where the first person currently is. If the first person reaches a dead-end, they must declare as such, and the game is over (even if there are multiple dead-ends). Game is scored by the number of moves the first player makes before the game is over. Given the graph, and assuming optimal play by player 1, return how long the game is – or -1 if the game is infinitely long.

A2) So it isn’t too hard to write a solution to this problem (which I did) – it is however much more difficult to write one which isn’t too slow in some corner cases. I actually tried to construct a large test case to break my solution, and failed – but the system tester did not. My solution approached the problem from player 2’s perspective. At the start the ‘possible space’ is the set of all locations. Each called out transition maps the possible space to a new possible space. If that space consists of a single location – the game is over. If that space is empty, the optimally playing player could not have called that transition out. A simple dictionary approach lets you store how many moves remain for a given state. When you first see a state you store -1 – thus if you see a state on your current path you can immediately return -1, but if not you can replace that -1 with the actual best number of turns possible. Running cost of this is worst case O(n^2*2^n) – which for n=50 is way too slow. (Using bitwise operations like I did its O(n*2^n) 64bit integer operations, but that is only valid till n=64, and its not a significant improvement anyway.) But constructing this worst case is non-trivial – so I gave it a shot (and failed).

It appear the correct approach is to consider that for player 2 it doesn’t matter whether there are 2 or 22 possible locations, anything greater than one rules out the game ending. So if you take any pair of locations, and consider the possible ways to map that pair to another pair and so on and so on – if you ever find the same pair again, you’ve created a loop and the game can continue forever, but if your search always terminates (no mapping for pair, or only mapping is to them both going to the same node) it isn’t too far of a stretch to see that the best number of turns you find in this search (starting from each possible starting pair) is the same as the best number of turns from the more extensive search.

Q3) Given an initially empty rectangle of size x, y – and the ability to perform 2 operations where you select a sub-rectangle sx, sy and fill 0 or more of the squares in the sub-rectangle – determine how many possible different final states there (modulo large prime) x and y bounded to at most 40.

A3) Only 2 solvers for this problem. Answers look like serious combinatorics – adding and subtracting numerous cases. Seems to break it down into rectangle size where changes are made. For each rectangle size where changes are made count how many possibilities, and multiply that by number of possible offset locations of that rectangle in the larger rectangle. That somehow makes the counting of possibilities a bit easier. Seems that for each of these rectangles you consider the case where the 2 sub-rectangles are fixed to opposite corners – and then subtract off the overlap between the 2 cases of opposite corner. Then there is some magic to do with the edges of the rectangles which must be to avoid some kind of double-counting – its all way too complicated for me at 5am in the morning…

Category: Code Competitions