GCJ11 R1C

The problems in this round were just a tiny bit easier and it showed – qualifying required 2 whole problems, or a good time on 1 whole and 2 smalls.

Q1) Given a rectangular array where some tiles are marked with a hash, replace it with a new array in which the hashed tiles are covered by 2×2 arrangements of ‘\’ and ‘/’ that look like diamonds.  This may not be possible; if not, report as much.

A1) Pretty straightforward question; the key is to identify that if a hash mark has nothing to its left or above, it has to be the top-left corner of a 2×2.  So replace the whole 2×2 immediately.  If this overlaps with the edge of the array, or any of the other cells of the 2×2 aren’t marked, fail immediately.  Once done, repeat until complete or failed.  A simple left-to-right, top-to-bottom walk solves the problem quickly and efficiently enough.
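
A minimal sketch of that walk in C# (the grid encoding and method name are mine, not the problem’s exact I/O):

    // Sketch: replace '#' cells with 2x2 diamonds, walking top-left first.
    // grid is a char[][] of '#' and '.' cells (my own encoding); returns false if impossible.
    static bool Tile(char[][] grid)
    {
        int rows = grid.Length, cols = grid[0].Length;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
            {
                if (grid[r][c] != '#') continue;
                // Nothing unprocessed above or to the left, so this '#' must be
                // the top-left corner of a 2x2 diamond.
                if (r + 1 >= rows || c + 1 >= cols ||
                    grid[r][c + 1] != '#' || grid[r + 1][c] != '#' || grid[r + 1][c + 1] != '#')
                    return false;
                grid[r][c] = '/'; grid[r][c + 1] = '\\';
                grid[r + 1][c] = '\\'; grid[r + 1][c + 1] = '/';
            }
        return true;
    }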

Q2) Given a sequence of stars, and the distance between each, you want to travel from the first star to the final star in as short a time as possible while stopping at every star in the exact order given.  To slightly complicate things, your engineers can build up to L speed boosters on stars.  These speed boosters take a time t to be built, but once one is built, all travel between the star it was built on and the next star in the chain is performed at double speed.  This even applies if the spacecraft is between those stars when the building completes.

A2) With the large input claiming up to 1 million stars, this problem looks a bit scary at first.  However, a quick think and you should realize that a greedy algorithm applies.

The answer is (sum of the distances)/speed – (sum of speed boost benefits).  Since the first part is a constant, and the second part has a constant number of components – the minimal time comes from maximising each component in the sum of speed boost benefits.

So, simply calculate the speed boost benefit for each star as if a booster were built there, then sort the array descending (no problem for 1 million entries using merge sort), and sum the first L elements.

The only things to watch out for are using 64-bit integers and ensuring that your speed boost benefit formula correctly uses t and covers all 3 scenarios – ‘useless’, ‘full benefit’, and ‘spacecraft in mid-flight when building completes’.
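
A rough sketch of the greedy in C#, following the three scenarios above and judging each booster against the unboosted timetable.  The names, the speed of 1 unit per second, and the use of doubles are my own simplifications; the contest solution should stay in 64-bit integers as noted above.

    using System;

    // Sketch only: benefit of a booster on star i, judged against the unboosted
    // timetable (speed 1 unit/s, every booster finishes at time t). dist[i] is the
    // distance from star i to star i+1; L is the booster budget. Names are mine.
    static double MinTime(double[] dist, double t, int L)
    {
        int n = dist.Length;
        double total = 0, arrive = 0;
        var benefit = new double[n];
        for (int i = 0; i < n; i++)
        {
            double depart = arrive;            // when we leave star i on the unboosted schedule
            arrive += dist[i];                 // when we reach star i+1 at speed 1
            total += dist[i];
            if (arrive <= t) benefit[i] = 0;                 // booster finishes too late: useless
            else if (depart >= t) benefit[i] = dist[i] / 2;  // whole leg at double speed: save half the leg
            else benefit[i] = (arrive - t) / 2;              // mid-flight: only the remaining distance is halved
        }
        Array.Sort(benefit);
        Array.Reverse(benefit);                // largest benefits first
        double saved = 0;
        for (int i = 0; i < Math.Min(L, n); i++) saved += benefit[i];
        return total - saved;
    }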

Q3) Given a list of N integers, determine the smallest integer in the range L to H which either divides or is a multiple of each integer in the list, if such an integer exists.

A3) I solved this one on my own for a change, but the implementation for the large input is a bit hairy – so there’s a high risk I would have failed it during the competition.  The small input constraints allowed a trivial brute force, so many people submitted that on its own.

The key is to sort the integers; then the answer is either smaller than the smallest, larger than the largest, or part of one of the segments described by each consecutive pair.

Let’s treat each case separately for now.  Less than the smallest: determine the GCD of all numbers, and find the smallest divisor of that GCD which is in the range L-H.  If L>GCD there is nothing; otherwise you can generate all divisor pairs by walking the integers 1 to Sqrt(GCD).  Larger than the largest: determine the LCM of all numbers, and if it is less than L use a quick division to determine the first multiple of the LCM which is in the L-H range.  If it is greater than H, obviously no answer.

This leaves the gaps.  Based on the two cases above it should become apparent what is required.  Determine the LCM of the first numbers, and the GCD of the rest.  Then the answer is a multiple of that LCM which also divides that GCD.  Obviously this won’t work if the LCM doesn’t divide the GCD, so that is a quick check.  Next, L<->H must overlap with the LCM<->GCD range or there is no point trying.  If the LCM is in the L<->H range, obviously we are done.  Otherwise we walk the integers 1 to Sqrt(GCD) again, checking to see if any divisor pairs contain a multiple of the LCM in the L-H range, choosing the smallest if there are multiple options.

So putting it all together: generate the LCMs starting from the left and the GCDs starting from the right, try the less-than-smallest case, then walk each consecutive pair of the sorted order, and finally the greater-than-largest case – and if at any time a solution is found in any of these parts, return it.

So I mentioned that the large input was a bit hairy, and this is because of the LCM calculations involving numbers which are up to 10^16.  Using BigInteger saves you from overflow risk, but unless you realize that there is no need to continue calculating LCMs once they exceed 10^16 (since they are then greater than the maximum H value) you run the risk of performance being a problem, as BigInteger becomes slower as the size of the data increases.  Using a 64-bit integer is okay, but you need to hand code or otherwise check for overflow in multiplication, and you need to apply the GCD division before the multiplication!
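
A small helper along those lines: divide by the GCD first and cap at just over 10^16 (a value I’m assuming from the stated bound on H).  A sketch, not the contest code.

    // Sketch: LCM with the GCD division applied before the multiplication, capped at
    // just over 10^16 (assumed from the stated bound on H). Inputs assumed positive.
    const long Cap = 10000000000000001L;   // 10^16 + 1; anything at or above behaves the same

    static long Gcd(long a, long b) => b == 0 ? a : Gcd(b, a % b);

    static long CappedLcm(long a, long b)
    {
        if (a >= Cap || b >= Cap) return Cap;
        long step = a / Gcd(a, b);         // divide first to delay overflow
        if (step > Cap / b) return Cap;    // step * b would exceed the cap (and maybe 63 bits)
        return step * b;
    }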

Performance wise the biggest cost is the walk from 1 to Sqrt(GCD), as this can be up to 10^8 modulo operations.  Luckily this can only happen for at most 1 segment, but in case 100 million modulo operations is too scary, an improvement to the average case is to divide the L<->H range and the GCD by the LCM.  With the L<->H range rounded inwards, the GCD will divide cleanly.  Then you can start at ceil(L/LCM), walking upwards to Sqrt(GCD/LCM), then jump back to just below ceil(L/LCM) and walk down to the first divisor for which the other side of the pair exceeds H/LCM.  This doesn’t improve the worst case, but it does give you an early out and a reduced calculation space in the average case.

GCJ11 R1B

Since I got through in round 1A, there was no need for me to get up at 2am and try this round.  Apparently 53 other Aussies did though.  This time round a fast time on question 1 was not sufficient; you needed to get out a small input from either the second or third question as well.  The third question was probably a good choice there because it was a simple brute force at that input size – but it appears most people just went for question 2.

Q1) Given a matrix of teams, where each team has played at least 2 matches, and there are no draws – just win/loss/hasn’t played yet – determine the RPI for every team.  RPI is defined as 0.25*WP + 0.5*OWP + 0.25*OOWP.  WP is the percentage of games a team has won.  OWP is defined as the average of the WP for each opponent of the team, where that WP has been calculated excluding the current team.  OOWP is defined as the average of each opponent’s OWP, defined as above.

A1) Pretty straightforward question – as the 97% success rate from 4500+ people suggests.  In the large input there are up to 100 teams, but all but the most brute force of approaches will pass this.  To be sure you pass the large input, the approach to take is not to start by trying to calculate the RPI of each team, but instead to calculate sub-problems and then build the results from those.  A memoized recursion approach would work as well, but the problem is a bit simple for that level of code complexity.

The basic sub-problem is ‘what is the WP for team a, optionally excluding team b as an opponent’.  This can be represented in a square array, using the diagonal for the plain WP of each team, since excluding self achieves nothing.  This can be calculated in O(N^2) time by using a multiple-pass approach.  First fill the array with the WP for each team, then adjust each entry by removing the a vs b game from the average.  (Use fractions rather than real numbers by storing them in 2 arrays.)  Given the constraints though, a brute force O(N^3) algorithm is fine…

Next up is the OWP list, filled in O(N^2) trivially using the previous array.  Finally we can do the RPI, using the diagonal of the first array, the values in the OWP list, and calculating the OOWP as we go by averaging OWP entries as appropriate.  Again this is filled in O(N^2), so the overall algorithm is achievable in O(N^2).
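
A sketch of that build-up in C#, using the O(N^3)-ish brute force mentioned above.  The results encoding (1 win, 0 loss, -1 not played) and the assumption that each pair plays at most once are mine:

    // Sketch using the O(N^3)-ish brute force. res[i, j] is 1 if i beat j, 0 if i lost
    // to j, and -1 if they haven't played (my own encoding; assumes each pair plays at most once).
    static double[] Rpi(int[,] res)
    {
        int n = res.GetLength(0);

        // wp[a, b] = a's win percentage ignoring games against b (wp[a, a] = plain WP).
        var wp = new double[n, n];
        for (int a = 0; a < n; a++)
            for (int b = 0; b < n; b++)
            {
                int wins = 0, games = 0;
                for (int o = 0; o < n; o++)
                {
                    if (o == a || o == b || res[a, o] < 0) continue;
                    games++;
                    wins += res[a, o];
                }
                wp[a, b] = games == 0 ? 0 : (double)wins / games;
            }

        // owp[a] = average of opponents' WP, each computed excluding a.
        var owp = new double[n];
        for (int a = 0; a < n; a++)
        {
            double sum = 0; int count = 0;
            for (int o = 0; o < n; o++)
                if (o != a && res[a, o] >= 0) { sum += wp[o, a]; count++; }
            owp[a] = sum / count;          // every team has played at least 2 games
        }

        // RPI = 0.25*WP + 0.5*OWP + 0.25*OOWP, averaging OOWP on the fly.
        var rpi = new double[n];
        for (int a = 0; a < n; a++)
        {
            double oowp = 0; int count = 0;
            for (int o = 0; o < n; o++)
                if (o != a && res[a, o] >= 0) { oowp += owp[o]; count++; }
            rpi[a] = 0.25 * wp[a, a] + 0.5 * owp[a] + 0.25 * (oowp / count);
        }
        return rpi;
    }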

Q2) Given positions of groups of shop carts, and a minimum distance required between each shop cart, and a moving speed of 1m/s, determine the minimum time for them to satisfy these constraints.

A2) The small input wasn’t too bad: at most 100 carts, at most 20 groups, at most a 5 meter separation.  The large input however was up to 1 million carts, in up to 200 groups, and the required separation might be 1000km.  Initial positions were all within a 200km range for both.  It is worth noting that with 1 million carts and a 1000km separation, 64-bit integers will be required.

For a single group the answer is easy enough: (cart count - 1)/2 * min separation.  The problem comes when the groups interact, forcing the centre of a group to move.  However it seems fairly obvious to me that that is all that happens: the centre of a group moves, and that adds to the minimum required time for that group.  The final distribution of carts from a single group, relative to that group, never changes.  So we can reduce the problem to groups, just with a varying separation requirement, which is governed by the sum of two neighbours’ cart counts.

One more point to prove and I think this problem is solved: that the centres of two groups will never pass one another while going from start position to final position.  This is pretty simple, because if the centres pass then at least 2 carts have passed.  And no 2 carts should ever pass, because it would take the same time for them to meet, turn around and go to each other’s former destination instead, which is obviously inefficient.

Right, so we have a list of groups with a minimum separation between each neighbour, so we can determine exactly where the centre of each group will be relative to a single number, the far edge.  So now we can do a binary satisfiability search on time to work out the minimum time.  Start with a lower bound which is the time to spread out the largest group without moving its centre, and an upper bound derived from the scenario of a single group of 1 million carts with a 1000km separation requirement; the binary search then proceeds.  Satisfiability is determined by walking each group left to right – the centre of the leftmost group is moved as far left as the time will allow – and from then on the time constraint and the border of the group to the left both contribute to where to place the centre of the next group.  If the problem is not satisfiable, the position of the group to the left will eventually be so close (or even past the starting position to the right) that there is no way to move the centre of the current group right far enough in time.

Binary search can be performed on floating point numbers, or it can be seen that the answer is always a multiple of 0.5, and so an integer binary search on 2*t will also work.
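
A sketch of that satisfiability check for a candidate time T.  The names and the use of doubles are mine for readability; the contest version would binary search over 2*T in 64-bit integers as noted above.

    using System;

    // Sketch of the satisfiability check for a candidate time T. groups are (centre, count)
    // pairs sorted left to right; sep is the required spacing. Names and doubles are mine.
    static bool Feasible(double T, (double centre, long count)[] groups, double sep)
    {
        double prevRight = double.NegativeInfinity;   // right edge of the previously placed group
        foreach (var g in groups)
        {
            double half = (g.count - 1) * sep / 2.0;  // half-width of the group once spread out
            double budget = T - half;                 // time left to move the centre after spreading
            if (budget < 0) return false;
            // Push the centre as far left as the time budget and the previous group allow.
            double centre = Math.Max(g.centre - budget, prevRight + sep + half);
            if (centre > g.centre + budget) return false;   // cannot move right far enough in time
            prevRight = centre + half;
        }
        return true;
    }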

I think I probably would have solved this in time; whether I would have remembered to use 64-bit integers (as I probably would have used the integer binary search approach) is a different question.

Q3) Given an N vertex convex polygon, and M internal non-intersecting walls (which create M+1 internal polygon partitions), determine the maximum number of different colours which can be used to paint each vertex, such that every partition has every colour present for 1 or more of its vertexes.  Also output any such colouring to prove it possible.

A3) Obviously the answer is at most N.  We can fairly easily go further and say that the answer is at most k, where k is the minimum vertex count over the internal partitions.  The minimum answer is also obviously 1, or we can go further and say that it is 1 + the minimum number of ‘non-shared’ vertexes over the internal partitions.  For the small input the maximum for N was 8, which made a brute force of every possible colouring perfectly feasible.

Problem can be turned into a graph, but with up to 2000 vertexes, it isn’t obvious to me how this could be implemented with a fast enough running time.  Time to look at the answers… (Because I am lazy and don’t want to keep thinking about this problem at this point…)

So, the key here is realizing that each partition joins to another by a single wall, and there is no set of 3 or more partitions which join in a circle – partitions form a tree, joined by specific vertex pairs.  If you choose a random partition and colour its vertexes, then so long as the two vertexes on each partition wall aren’t the same colour, you aren’t limiting the other connecting partitions in any way – except that if you colour this partition with too many colours, the next partition might not be able to use all of those colours as well.  From this you can see that the partition with the least number of vertexes is the critical bottleneck.  If you label every one of its vertexes a different colour, any/all connecting partitions will have no problem colouring themselves, so long as you don’t repeat the same colour two vertexes in a row.  Since the minimum number of colours is 3, you can just paint vertexes clockwise around a polygon by going through the colours in order, and if it happens you get 2 of the same colour in a row when you complete your walk, you just switch the last 2 vertexes you coloured.

The other tricky part is generating the partition graph fast enough from the input as given.  This actually isn’t too hard, but it certainly is possible to screw it up.

 

GCJ11 R1A

So like last year, I scraped through in R1A with only 1 out of the available questions solved.  But this year I was a bit faster, so I got 655th rather than 800+.

I was actually very close to a respectable score – an off by 1 which took me almost 2 hours to find… *sigh*

Q1) Given an inclusive upper bound on the number of games played today (and at least 1 game was played), with an ‘exact integer’ percentage win rate for today of Pd and an ‘exact integer’ percentage win rate for all time Pg, is the scenario even possible?

A1) The large inputs put the inclusive upper bound at 10^15 – which seemed daunting at first, but fairly quickly I discovered a short cut which I thought made it irrelevant…

First we can rule out a couple of obvious scenarios – if Pg is 100% then Pd must be 100% or something is wrong.  Similarly Pg of 0% implies Pd of 0%, or again we fail.  For me the next thing I realized was that if the inclusive upper bound was >= 100 then the option of playing 100 games is available – an important option in the presence of percentages.  In that case every value of Pd is valid, and all that remains is to consider whether Pg is valid…

Pg being valid is actually very simple: so long as it isn’t 0 or 100, we can add so many games prior to today that today becomes irrelevant, other than serving as a tiny modulation factor which needs to be compensated for to make Pg an exact integer.  This applies regardless of whether the number of games today was 100 or much smaller.

The final scenario is where the inclusive upper bound doesn’t include 100.  This is fewer than 100 options to step through, so why not!  For each value of n, we check whether n*Pd/100 is an integer, or equivalently, whether n*Pd = 0 mod 100.  If we find a match the situation is plausible; if not then the situation is invalid.
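
Putting those checks together in a short sketch (parameter names are mine; Pd and Pg are whole-number percentages):

    // Sketch of the checks described above. maxGames is the inclusive upper bound (64-bit);
    // Pd and Pg are whole-number percentages. Names are mine.
    static bool Possible(long maxGames, int Pd, int Pg)
    {
        if (Pg == 100 && Pd != 100) return false;   // all-time 100% forces today to be 100%
        if (Pg == 0 && Pd != 0) return false;       // and likewise for 0%
        if (maxGames >= 100) return true;           // 100 games today makes any integer Pd exact
        for (long n = 1; n <= maxGames; n++)        // fewer than 100 options: just try them
            if (n * Pd % 100 == 0) return true;
        return false;
    }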

So as I alluded to earlier the upper bound max of 10^15 isn’t a problem, 3 if statements take care of all scenarios >= 100.  However when I went to run my code on the large input file, it crashed!  I hadn’t noticed exactly how big the upper bound max was, and was using int.Parse.

Q2) In a variant of hangman where there is no way to lose – just a score that gets worse the more useless guesses you make – you are choosing words from a dictionary and your friend is playing to a specific strategy.  You want to choose the word which will give your friend the worst score; if there are equal options, the first such word in the dictionary is the one you take.  The strategy is described by a permutation of the alphabet: each letter is guessed in turn, unless no dictionary word matching the previous guesses contains that letter, in which case the letter is skipped.  Given a list of these permutations and the dictionary, what word should you select?

A2) A very wordy question (pun not intended!), and for the longest time the question text itself was self-contradicting.  The large input had a dictionary size of 10k words and up to 100 permutations to try for each dictionary.  A maximum of 10 test cases meant that it wasn’t entirely insane though.  A brute force attack requires iterating over the dictionary for each entry in the dictionary – and performing this multiple times even – so it was clearly not going to work in time; O(D^2*P) is billions upon billions of operations for the large input.

I spent a while considering whether there was some dynamic programming approach but eventually came to the conclusion that I was just trying too hard, and that an O(D*P) solution would be okay even with a rather high constant multiplier.  So I started implementing.  The approach is not to try each possible word choice and simulate, but instead to simulate all possible word choices at once.

At this point I should mention something which possibly isn’t clear from how I wrote the question above, which is that since the first thing you do in hangman is to write down the full list of blanks, your friend immediately knows the length of the word, so whatever word you select your friend will only consider words of that length.

Put all the words of a specific length in a list, then going through each one, determine what its response is to the current letter being tried in the strategy.  The responses (which can be represented by a bit field, as the maximum word length is small) break the words into groups.  Each of these groups is self-consistent – if any of those words had been the originally selected word, the group would be the same set of words that your friend would consider still plausible, and the basis of the strategy.  Once each group is created you can move to the next letter in the strategy, applying it to each group, splitting it into potentially even more groups which remain self-consistent.  All that remains is to keep track of the score for each group, and then select the best score once you get to the end.  Repeat for each different word length, and select the best of the best.

Keeping track of the score for each group is fairly easy; the first group has score 0.  If a group responds to a letter by splitting into 2 or more groups, and one of those groups contains words that do not have that letter, that group has its score increased by 1 compared to the input group.  The key thing to notice here is that if the response of a group to a letter is to not split, and the words do not contain the letter, the score doesn’t go up – because this corresponds to the scenario where the strategy is to skip a letter, because none of the plausible words contain it.

I actually had this all written with almost an hour to go; I even took the time to generate the maximal input file to make sure it wasn’t too slow.  Except it didn’t produce the correct answers.  I was confused; I hunted for potential issues – only found 1, and it was still wrong.  With 30 minutes to go I decided I would write the brute force implementation, just to get the points for the small input, or if I had time, maybe use it to work out why the proper implementation didn’t work.  But I made copious mistakes in the brute force code – even though it was theoretically simpler, I got it wrong in multiple ways.  After the competition finished, I still had to fix 3 more bugs before it passed the practice small input.  At this point I compared it to my proper implementation, and found a specific test case to focus on.  Turned out that my proper implementation was actually much better; it hadn’t suffered from any of the bugs of the brute force code.  Just a single mistake: iterating over the dictionary grouping words by length, I hadn’t used foreach, but had instead just iterated over the possible lengths, 1 to 10.  Except my loop stopped at 9, oops!

Q3) You are playing a special single-player card game.  You have at least 1 card in hand at the start, and the rest in the deck – with a total of up to 80 cards.  You start with 1 turn remaining until the end of the game, and each card has 3 values: T, S and C.  T is the number of additional turns you gain by playing it, S is the score you get by playing it, and C is the number of cards you draw from the deck if you play it.  You actually know the exact order and contents of both your hand and the deck before you start – determine the maximum score you can achieve.

A3)  This was the ‘hard’ problem and I definitely agree it was far harder than Q2.  Only 3 people got out the large input, and barely over 100 got the small.

The key difference between small and large was of course the limits imposed.  In the small input, C is at most 1, whereas in the large input C could also be 2.  I think with the C-is-at-most-1 limit, the problem can be more obviously turned into a dynamic programming problem…

I certainly had to look up the answer here, but the key comes from determining properties of the ideal answer and using those to limit your search space, which would otherwise be huge.  Firstly it is worth realizing that you should always play a T>0 card if you have one available.  Since they give more turns, there is no downside to playing them.  Next is the C=0 cards.  For a given maximal solution, any C=0 cards (which are not T>0 cards) can be reordered to the end.  Furthermore amongst C=0,T=0 cards you will always want to play them in largest score first order.  All that remains is to determine the order of playing the C=1 and C=2 (T=0) cards in between, if they are played at all.

The final and most important trick here is to determine that if you play 2 C=1 or 2 C=2 (T=0) cards, they can be reordered such that you play the one which arrives in your hand first, first.  Finally this gives enough limitations on the search for an ideal solution.  So we have a graph with status ‘index in deck of last drawn card’, number of turns remaining (capped at 80, since over 80 is pointless), the first C=1 card we might yet play, the first C=2 card we might yet play, and the first T>0 card we might yet play.  The transitions on this graph always increase one or more of these 5 values, so it is a directed acyclic graph.  Since there are no cycles you can fairly easily search this graph for the ‘longest path’ from the source, where each edge is a score and each vertex has an end bonus related to maximal use of remaining turns to play available C=0 cards.

To enumerate the possible transitions we have ‘play a T>0 card’, ‘play a C=1 card’, ‘play a C=2 card’, ‘decide to never play the current C=1 card’, and ‘decide to never play the current C=2 card’.  A performance improvement option here is to remove the C=1/C=2 card options if there is a play-a-T>0-card option, since we know playing T>0 first is never a bad thing (or alternatively, get rid of all T>0 from the status info and state transitions – presume to have played all that could be in the hand as part of every transition, and in determining the initial status).  Use a DFS to generate the graph, as it is going to be sparse and the maximum recursion depth is 80.  Also add backwards-pointing links, because they will be useful for determining the maximum length.

Recursion with memoization will solve the problem at this point.  Starting from the implicit end node which every node connects to via the play of playable C=0 cards, recurse across each back-pointing link added before, using the edge weight (score increase due to that action) to build upon the result of the recursion.  Memoization ensures the runtime is fine.

TCO11 QR2

So this one was at a more reasonable time, so I participated.  281st, not as good as I might like, but good enough to get through.

Q1) Given a scrambled list of two item types with a total count of N, and the ability, at most once for each i in 1 to N, to swap two elements that are i apart, determine the minimum number of swaps to order the list, if it is possible.

A1) So, the answer to this question was so obvious that it takes a while to see.  In the case of there only being 2 element types, a single pivot sort pass will result in an ordered list.  If you consider a pivot sort where you walk from the outside towards the middle, swapping if both elements are on the wrong side, and advancing a specific end if its element is already on the right side – you can see that it will perform a minimal number of swaps and that every swap will be closer together than the last.

Hence it is always possible to sort, and the answer is just (the count of items not in the right spot to start with)/2.  Or you can simulate the pivot sort, like I did.
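
As a sketch, the counting version, assuming the sorted order puts one type (encoded as 0) first:

    // Sketch: count the items of one type sitting in the other type's region, which is
    // exactly (items out of place)/2. Assumes the sorted order puts the 0s first.
    static int MinSwaps(int[] items)
    {
        int zeros = 0;
        foreach (int x in items) if (x == 0) zeros++;
        int misplaced = 0;
        for (int i = 0; i < zeros; i++)
            if (items[i] != 0) misplaced++;   // a 1 sitting where a 0 belongs
        return misplaced;                     // each swap fixes one 1 and one 0
    }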

Q2) Given a ‘map’ (1-dimensional array) which shows the positions of two critter types, which are grumpy either at times t % K = 0 or at times t % C != 0, determine the minimum time to get from the left to the right of the map if you cannot be in a cell at a time when its occupant is grumpy, if it is even possible.  Maximum values for K, C and the length of the array are all 50.

A2) So this is really pretty easy: it’s a shortest path algorithm, with a small twist.  Because the behaviour of a critter depends on time, each cell needs to be replicated in the graph to represent how time plays a part.  Obviously replicating once for every time step defeats the whole purpose of a shortest path algorithm… but since behaviours are dependent on mod K and mod C, it is pretty obvious that t mod K and t mod C are the only things that matter for a given cell’s possible future paths.  So each cell is replicated K*C times (or LCM(K,C) if you want to get really fancy, but given the availability of primes near 50, the worst cases for K*C and LCM(K,C) are pretty close)…

So 50*50*50 vertexes in the graph, and 3 edges from each vertex.  From there you just use a priority queue based Dijkstra algorithm and presto, successful solution done.

I on the other hand used a non-priority queue – which for a while I had convinced myself should perform much slower in the worst case.  Shows how rusty I am!  Graphs with equal edge distances don’t need a priority queue, as a normal queue already walks them in breadth first search order.  Using a priority queue would only make things slower by adding the logarithmic insert/remove cost overhead.  So yeah, in hindsight my algorithm works pretty well…  If .Net had had a built-in priority queue I probably would have used it and only made things marginally worse!
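
A sketch of that state-space BFS using a plain queue.  The start/goal conventions and the critter encoding are my guesses at the problem details, so treat them as assumptions.

    using System.Collections.Generic;

    // Sketch of the state-space BFS. map[i] is 'K' or 'C' for the critter type in cell i
    // (my encoding); we start just left of cell 0 at time 0 and finish by stepping past
    // the last cell. All of these conventions are assumptions about the problem details.
    static int MinTime(char[] map, int K, int C)
    {
        int n = map.Length;
        bool Grumpy(int cell, int tK, int tC) =>
            cell >= 0 && cell < n &&
            (map[cell] == 'K' ? tK == 0 : tC != 0);

        var dist = new int[n + 2, K, C];          // indexed by (pos + 1, t mod K, t mod C)
        for (int a = 0; a < n + 2; a++)
            for (int b = 0; b < K; b++)
                for (int c = 0; c < C; c++)
                    dist[a, b, c] = -1;

        var q = new Queue<(int pos, int tK, int tC)>();
        dist[0, 0, 0] = 0;
        q.Enqueue((-1, 0, 0));
        while (q.Count > 0)
        {
            var (pos, tK, tC) = q.Dequeue();
            int d = dist[pos + 1, tK, tC];
            if (pos == n) return d;               // stepped past the right edge: done
            int nK = (tK + 1) % K, nC = (tC + 1) % C;
            foreach (int np in new[] { pos - 1, pos, pos + 1 })
            {
                if (np < -1 || np > n) continue;
                if (Grumpy(np, nK, nC)) continue; // can't be there while its occupant is grumpy
                if (dist[np + 1, nK, nC] != -1) continue;
                dist[np + 1, nK, nC] = d + 1;
                q.Enqueue((np, nK, nC));
            }
        }
        return -1;                                // no way across
    }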

Q3) Define S(x) to be the sum of the digits in x.  Define D(x) to be the recursive application of S until the input equals the output.  Given a range, determine how many of the contained numbers can be represented as y*D(y).  Range constraints: less than 10 billion.

A3) This problem was easy! – unfortunately I only realized as much with 10 minutes on the clock, and I just wasn’t fast enough to write it all up.  I was 4 lines of code from being complete.  (Although said 4 lines required a little bit of thought each…)  If I had rushed rather than taking everything at an easy pace I might have saved those extra 2-3 minutes I needed to finish those 4 lines and perform the final debugging.  That would have gotten me in the top 40, which is where I know I am capable of reaching.

D(y) = 9 if y is a multiple of 9, else y mod 9.  If you’ve played around with digit sums before you would know this; if not, a short exploration should make it fairly obvious.  So y*D(y) can be decomposed into 9 linear functions, which sometimes overlap.  The inclusion/exclusion principle applies, so you can sum, subtract, sum, subtract your way through every possible combination of the 9.  That is quite a few equations to write out, but every one of them is pretty simple – if you could be bothered…

However!  Every one of the 9 equations is of the form a*9k+b.  Note that these equations all have constant ‘mod 9’ values, and they aren’t all the same.  Hence many pairs don’t have any overlap, and so the inclusion/exclusion can be simplified significantly.  It decomposes into 3 pairs and 1 triple.  This means you only need 9 base equations, 5 pair equations, and 1 triple equation.  I was half way through the pair equations when time ran out.  Interestingly I found one of the pair equations cancels with one of the base equations, and apparently there are more cancellations, because looking at one of the successful answers, they only needed 9 equations and 7 of those were base equations.  (So technically I was 2 lines of code away from being finished :P, 1 add and 1 delete…)
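
For completeness, the digital-root identity in code, plus a brute force counter that is only sane for small ranges but is handy for checking the formulas (names are mine):

    using System.Collections.Generic;

    // Digital root: for positive y, D(y) depends only on y mod 9.
    static long D(long y) => y % 9 == 0 ? 9 : y % 9;

    // Brute force counter, only usable for small ranges, but handy for sanity-checking
    // the inclusion/exclusion formulas. Distinct values only, since classes can collide.
    static int CountBrute(long lo, long hi)
    {
        var seen = new HashSet<long>();
        for (long y = 1; y <= hi; y++)            // y*D(y) >= y, so nothing beyond hi matters
        {
            long v = y * D(y);
            if (v >= lo && v <= hi) seen.Add(v);
        }
        return seen.Count;
    }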

TCO11 QR1

So TCO11 kind of crept up on me; I wasn’t paying much attention compared to Google Code Jam.  But here it is: qualification rounds 1, 2 and 3 intermixing with the schedule for GCJ Round 1.

I was thinking of entering this round, though it probably would have been tricky to get through – QR1 is not usually agreeable to my skill balance.  I signed up for it, but by the time it hit midnight here I decided that I was too tired to stay up another 2 hours until the competition started.  QR2 and 3 are at much better times for me, although QR3 is during a work day, so hopefully I get through in QR2.

Almost 1600 contestants competed in QR1 – the cut-off for 600th place to advance to round 1 appears to be 235 points.  So a ‘fast’ question 1, or question 1 + challenges, or question 2 was the target.  Only 1 Australian got through this round – but given the time zone it is hardly surprising…

Q1) There is a group of up to 50 people and each knows exactly how many of them are compulsive liars.  Each makes a claim ‘there are at least n liars’; for a compulsive liar this means there are actually ‘fewer than n liars’.  Determine the minimum number of liars, or return -1 if there is no possible correct answer.

A1) So this isn’t a difficult question: although there might be 2^50 possible scenarios for consideration, they can easily be excluded in bulk, and the exact scenario is not required.  We just need to walk through each possible liar count and determine whether it is satisfiable.

So what is the requirement for satisfiability?  For a trial n, there must be exactly n claims greater than n, and the rest must be less than or equal to n.  This corresponds to the n liars and the N-n truth tellers.  Trivial enough really… probably trivial enough that I would have been too slow.
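
A sketch of that walk (the claims array and the -1 return convention are mine):

    // Sketch: try each liar count n in increasing order and test the condition above.
    static int MinLiars(int[] claims)
    {
        int N = claims.Length;
        for (int n = 0; n <= N; n++)
        {
            int above = 0;
            foreach (int c in claims)
                if (c > n) above++;       // a claim of "at least c" with c > n must be a lie
            if (above == n) return n;     // exactly n lies: consistent, and minimal since n is ascending
        }
        return -1;
    }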

Some corner-cutting options, which are totally pointless and just serve to make your answer more complicated, include counting the 0 and 1 entries – as those cannot be liars – and counting the N-and-greater entries – as those must be liars – then performing the walk in between.  If the range had been larger, sorting the entries would allow for an O(n log n) implementation rather than O(n^2) – but it was just pointless at this small input size.

Q2) Given a set of up to 50 integer music track lengths, and random selection from them (where repetition is possible), determine the probability that each track is the one playing at a given time (T).

A2) Looks like an interesting question on the surface… seems like it calls for dynamic programming, but with T being up to 80000, need to be careful not to invoke O(T^2) on the DP array population.

At T=0, all probabilities are equal, and it stays that way up until the minimum track length.  At that point the probability of that track drops and the others go up.  So it looks like maybe the probabilities can be linked to when tracks last ended.

At this point in my investigations into the solution I went down a long path of counting the number of ways to be ending a given song at time T.  This was a problem in 2 ways: 1) it produced huge numbers which were just too large to handle, and 2) it was useless because it didn’t account for the probability of the starting point of each song.  Counting is not a good option; probabilities are better.  I doubt I would have realized this during the competition soon enough, so it is highly likely I would have failed to get through.

So the correct approach is to ask “what is the probability that a song can start playing at time t?”  This is even easier than the counting question.  P(0) = 1.  P(t) = Sum(P(t-ti)/N).  The probability is the sum, over each song, of the probability at the potential previous start time, divided by the number of song options.  From there the answer for each track is just the sum over each possible start time that would have it still playing at the finish time (remembering to divide by the number of tracks that could have started playing at that time).
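
A sketch of that DP.  Whole-second track lengths, and the convention that a track starting exactly at T still counts as playing, are my assumptions:

    using System;

    // Sketch of the start-time DP. len holds whole-second track lengths; names are mine,
    // and I'm assuming a track starting exactly at T counts as playing at T.
    static double[] Probabilities(int[] len, int T)
    {
        int n = len.Length;

        // p[t] = probability that some track starts playing exactly at time t.
        var p = new double[T + 1];
        p[0] = 1.0;
        for (int t = 1; t <= T; t++)
            foreach (int L in len)
                if (t - L >= 0) p[t] += p[t - L] / n;

        // Track i is playing at time T if one of its plays started in (T - len[i], T].
        var ans = new double[n];
        for (int i = 0; i < n; i++)
            for (int s = Math.Max(0, T - len[i] + 1); s <= T; s++)
                ans[i] += p[s] / n;               // /n: that start also had to pick track i
        return ans;
    }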

Q3) Given a pattern of ‘B’, ‘W’ and ‘?’, where there is always exactly 1 question mark and 0 or more of the other characters, determine the shortest and lexicographically minimal replacement of ‘?’ which makes the sequence have a value of N, using the following rules: a sequence of length 1 has value 1; a letter alternation increases the value by 1; a letter repetition decreases it by 1.  The sequence as evaluated left to right must never decrease below value 1.  Further conditions: the input pattern is at most 50 characters, and the final value N is at most 100.

A3) Initially looking at this question I am thinking it must be a joke…

Four scenarios.  1) The question mark is on the end.  Evaluate the pattern start for viability, and then proceed from the evaluation’s end value to the target value as directly as possible – empty, alternations, or continuous repeats.  2) The question mark is at the start.  Slightly more tricky: check for empty, or add alternations on the front until it works (if it is going to at all).  3) The question mark is in the middle.  Hardest case; similar to case 2, but the target to get back to isn’t empty.  4) The question mark is on its own.  The answer is alternation starting from B until the target value is reached.

Scenario 3 is the only tricky part.  Given a start and end character, a target end value and a given start value, without ever going below the allowed minimum, what is the lexicographically smallest approach that yields the correct result?  I suspect this could be done using a list of maybe a dozen or so scenarios which are simply thought through for the optimal strategy, but it can also be turned into a recursion – one which needs to be evaluated in a greedy fashion so as to avoid considering solutions which are known to be way too long.

So with a known start character and start value, we can look at it as a shortest path to a specific value, ending in a specific character.  Adding a character, the new state is determined by the old value, the old last character, and the new last character, with the outputs being the new last character and new value.  If the assumption is made that the value never needs to exceed a certain threshold in order to achieve the shortest path, this state space is limited and easily fully explored (only two possible transitions from each spot, which we only need to consider if we find a better answer for that position).  A simple threshold to assume is double the maximum target, or 200 – but I suspect the true threshold is much lower.

So, not a joke after all…  still I think I was more likely to solve it than question 2…

GCJ11 QR Post-Round Analysis

So it looks like a pretty big turnout – 10566 qualifiers, and 1855 perfect scores – which I was a part of – although in some ways I don’t think I deserved to be.

The 4 problems were definitely more than I could have handled in a 2.5 hour competition (not without fewer huge insane mistakes at least!) – I actually took more than 5 hours to submit my answers, although I had a lot of distractions and took a break of a few hours in the middle.  So sure, it took less than 2.5 hours to write up the code, but I cannot ignore the value of that break allowing the answers to mull over in my head.

Q1) Given starting positions for 2 robots and that they can both move at 1 unit per second, and that it takes 1 second to press a button, and a list of integer button positions and who has to press them in exactly the given order, what is the minimum time to complete the required work?

A1) This puzzle was actually Portal themed – there was mention of possible cake.  As usual they provided a sample example of ‘one way’ to solve a sample optimally.  I almost wonder whether I should start to assume that any time they say that, it is an example of a sub-optimal strategy which just happens to be optimal.

Anyway the question itself is a fairly basic simulation using a greedy algorithm.  At any given point in time the two robots each have a target position, which is based on the next button which needs pushing by them.  As a tiny optimization, this only ever changes when they push a button, but given the constraints even on the large input you could probably have gotten away with updating the target every timestep.  Then it just comes down to simulating each timestep.  If a robot is at its target, it doesn’t move.  If it is supposed to push a button, it pushes the button and doesn’t move.  A likely trivial mistake would have been to accidentally let the second robot push its button in the same turn as the first robot, because you’ve updated which is the next button to press but haven’t yet ended the current turn.  The maximum number of turns is something like 100k for the large input – no problem given the 8 minutes you have to submit the answer.
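
A sketch of that per-timestep simulation.  The button encoding and the assumption that both robots start at position 1 are mine:

    using System;
    using System.Collections.Generic;

    // Per-second simulation sketch. buttons is the sequence of (robot, position) presses;
    // I'm assuming robots 'O' and 'B' both start at position 1, as in the sample.
    static long Simulate((char robot, int pos)[] buttons)
    {
        var pos = new Dictionary<char, int> { ['O'] = 1, ['B'] = 1 };
        int next = 0;                      // index of the next button that must be pressed
        long time = 0;
        while (next < buttons.Length)
        {
            time++;
            bool pressed = false;
            foreach (char r in new[] { 'O', 'B' })
            {
                // This robot's target is its next button in the remaining sequence.
                int target = pos[r];
                for (int i = next; i < buttons.Length; i++)
                    if (buttons[i].robot == r) { target = buttons[i].pos; break; }

                if (pos[r] != target)
                    pos[r] += Math.Sign(target - pos[r]);      // walk one step toward the target
                else if (!pressed && next < buttons.Length && buttons[next].robot == r)
                {
                    next++;                                    // pressing takes the whole second
                    pressed = true;                            // at most one press per second, in order
                }
            }
        }
        return time;
    }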

Q2) Given a list of pairs which, if matched consecutively, are replaced by a third element (which will never be found in the list of pairs), and a list of pairs that, if they are ever both found anywhere in the list (except if they are instantly replaced by something in the first list), cause the entire list to be cleared – process a list of input characters, adding each one to the output list in turn subject to these rules, and print the output list in a pretty format.

A2) At 82% this problem had the lowest small-to-large conversion ratio – and it isn’t surprising, because the small problem constrains each list of pairs to be at most 1 per case and the large allows up to 30.  This isn’t a performance problem at all, but it does mean that several corner cases just cannot occur.  Writing up my solution for this problem probably took longest of all of them; although I found it easy to solve in theory, I made copious errors during implementation – several of which would have crept past the small inputs if I hadn’t noticed them in time.  (Keeping with the computer gaming theme, this problem was inspired by Magicka.)

So again this is a straightforward simulation.  If the list is empty you add the character.  If not, first check whether the incoming character and the existing last character are a pair in the first list of pairs.  Performance is not a problem – just walk them all, checking both orders (the scale optimization is to add the pairs to a dictionary of dictionaries, but that’s not needed here).  If a match is found, replace the current last character with the replacement and continue the loop onto the next character.  If not, it is time to check whether there is a need to clear the entire list.

Again, because the scale constraints are so small there is no need to optimize this using dictionaries; you can just walk every entry already in the list, and consider it with the incoming character against every pair provided, in both orders.  Obviously this is O(n*k), but k is at most 30 in the large input.  If you do optimize this scenario you would probably do what I did: add a set to keep track of the unique elements in the input list so you can dictionary-lookup the next character, and do a set intersection between the dictionary results and the set of unique elements in the input list.  The one thing to be careful of here is that a set is not a good implementation for keeping track of the unique elements in the input list – you need a count of each unique element.  Without a count, the pair transformations which remove the last element can cause you to incorrectly claim that an element is no longer in the list.  In any case, if you find a match, you clear the list and do not add the incoming character.

Finally, if neither transformation nor clearing occurs, add the character.  At the end of the simulation loop the nice formatting was trivially achieved using string.Join – but a simple loop with a first-iteration condition does the job too.
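
A sketch of the whole simulation.  The rule encodings (3-character combine strings, 2-character opposed strings) are my assumption about the input format:

    using System.Collections.Generic;

    // Sketch of the simulation. combine rules are 3-character strings like "QRI" (Q+R -> I)
    // and opposed rules are 2-character strings like "QF" - my assumption about the format.
    static List<char> Process(string input, string[] combine, string[] opposed)
    {
        var list = new List<char>();
        foreach (char c in input)
        {
            // 1. Does the incoming element combine with the last element on the list?
            char? combined = null;
            if (list.Count > 0)
            {
                char last = list[list.Count - 1];
                foreach (string rule in combine)
                    if ((rule[0] == last && rule[1] == c) || (rule[1] == last && rule[0] == c))
                        combined = rule[2];
            }
            if (combined.HasValue) { list[list.Count - 1] = combined.Value; continue; }

            // 2. Is the incoming element opposed to anything already on the list?
            bool clash = false;
            foreach (string rule in opposed)
                foreach (char e in list)
                    if ((rule[0] == e && rule[1] == c) || (rule[1] == e && rule[0] == c))
                        clash = true;
            if (clash) { list.Clear(); continue; }

            // 3. Otherwise just append it.
            list.Add(c);
        }
        return list;
    }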

Q3) Two boys have a set of items which each have a value.  One boy can’t add – any time he adds two numbers he always forgets to carry the 1.  He also adds in binary… The other boy however can add just fine and wants to create two piles of these items which the first boy will think have the same value, but hopefully do not.  In fact he wants to minimize the value of one non-empty pile, and being a mean boy, give that to the first boy and take the rest for himself. For a given set of values, is it even possible, and if so what is the maximum value the second boy can take for himself.

A3) Looking at the questions, I would say this one gives the first question a run for its money as the easiest question – but the statistics don’t back me up.  I was bemused to see people asking the admins questions about this one during the competition.  ‘Does the first boy add the items left to right?’ – ‘Yes’.  Addition without carrying the 1 is both commutative and associative, so the order is irrelevant.

First off we note that the first boy thinks A+A=0.  This, combined with commutativity and associativity, quickly shows that the sum of all items must be 0 if there are 2 piles which have the same sum.  The next leap is to see that A+A is the only way to get 0; hence, again because of the commutative and associative natures, any bi-partition of such a pile with sum 0 has two halves of equal value.

So, from there it is trivial.  Second boy gives the single smallest item to the first boy and takes everything else.

Implementation involves checking the no-carry sum is 0.  Afterwards I realized that I could have implemented this using a simple xor on the whole number, but I broke the integers into arrays of bools and reimplemented the bitwise no-carry sum (xor!) by hand.  Once you confirm that the xor of all inputs is 0, sort, sum the top n-1 elements and return that as the result.  Simple.
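
The xor version as a sketch, returning -1 where the real output would say it isn’t possible:

    using System;

    // Sketch: the no-carry sum is just xor; if it is 0, give away the single smallest item.
    // Returns -1 where the real output would report that it isn't possible.
    static long MaxTake(long[] values)
    {
        long xor = 0, sum = 0, min = long.MaxValue;
        foreach (long v in values)
        {
            xor ^= v;
            sum += v;
            min = Math.Min(min, v);
        }
        return xor == 0 ? sum - min : -1;
    }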

Q4) Given a permutation of the first N integers, and a single operation which is ‘randomly permute a selected subset’, and acting ideally – what is the expected number of operations until the list is sorted?

A4) This was by far both the hardest and easiest problem.  The provided sample gave a hint which was both useful and useless at the same time…  Random simulation looked liable to time out before convergence, especially for the large input, and presumed you knew what the ideal strategy was.

So, what is the ideal strategy?  The sample provided a hint that if you have n disordered elements, acting on each sub-permutation cycle independently appeared a good strategy.  (A permutation cycle in the disordered elements is a smallest set of entries which can be reordered on their own to end up completely ordered correctly.  For example 2341 is a permutation cycle of 1234, as is 2413, but 2143 and 4321 are not, as they can be divided into pairs which when switched are in the correct positions.  This may not be the correct definition of a permutation cycle, but I can’t think of a better term to use in this discussion…)

Indeed it seemed reasonably obvious to me that randomizing only a subset of a permutation cycle was inefficient.  I didn’t prove it, but a trial with small permutation cycles agreed with the hypothesis.  So I started down the path of dividing the input into a count of the number of each permutation cycle length, under the assumption that the ideal strategy was to randomize each permutation cycle in turn until it was ordered.  At this point I tried some examples – in fact the first example I tried was a cycle of length 3, 312.  However due to some incredibly stupid mistakes (I made 3 in quick succession, but only noticed 2) I incorrectly calculated an average of 2.5 operations to order this cycle.  I then made a completely mistaken leap from this mistaken value to presume that it was further evidence that working with cycles was better than working at random.

So at this point I start breaking out some serious combinatorics, deriving the formula for the average number of operations for a cycle of length n, assuming the ideal behaviour after each operation was to work with one cycle at a time.  I’m not great with combinatorics, and it took me a while to realize that most of the combinations and permutations and factorials in my formula would actually cancel out, and I wouldn’t be needing to calculate the ratio of 3000 digit numbers to 7 significant figures.  I then made a mistake doing the cancellations and ended up with something, again, completely wrong.  It was obvious though, so I started some trials again.  This time on the core piece of my formula, which is ‘how many different ways are there for n randomly permuted entries to have a length k cycle which includes the first element’.  I was bemused to find that my trials didn’t depend on k – it was always (n-1)!, or 1/nth of the total permutations.  I could see where I had screwed up my cancellations now and ploughed ahead with my formula.  I implemented mutual recursion with memoization, and hit go.  It was fast, and it produced the correct answers for the sample.

At this point I decided to have a look at my memoization tables – I laughed, because there, accurate to 7 or more significant figures, was a table which contained ‘n’ at index ‘n’.  I had successfully demonstrated that for ‘n’ disordered elements the sum of the times for each sub-cycle is equal to ‘n’, which is tantalizingly close to suggesting that advanced strategy was irrelevant.  The only thing you have to do is make sure you don’t randomize something which is already in the correct place (and make sure you do randomize every member of any given cycle, which comes from letting everything else randomize).

So at this point I ripped out my memoization tables and replaced them with ‘if passed n, return n’.  I then realized the obvious: I didn’t actually need almost any logic any more at all.  The answer was equal to the number of disordered elements in the input.  Counter, for loop, if statement, increment, return result.
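
In code, that is all there is (a sketch, assuming the target order is 1..N):

    // The whole algorithm: the expected number of operations is just the number of
    // elements out of place (assuming the target order is 1..N).
    static int Expected(int[] perm)
    {
        int count = 0;
        for (int i = 0; i < perm.Length; i++)
            if (perm[i] != i + 1) count++;
        return count;
    }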

I probably would have solved this problem quickly, if it wasn’t for that mistake of calculating 2.5 as the expectation value for cycle length 3…  Instead I wasted almost 3 hours on it alone.  3 hours for 4 lines of code… (ignoring the file parsing code of course!)

 

GCJ11 Qualification Round

So it’s almost halfway over, and over 4500 people currently have the requisite 25 points to their name – of course the final testing could still play havoc with a few of them.

A significant change in format this year – previously there was no way to qualify by solving just the small input files; solve one problem in full and you were through.

This year there are 4 questions, and you can qualify by solving just small input files – indeed I’ve solved them all, so barring some catastrophic event there is no way I am not through to round 1.  (I also solved the large inputs; we’ll see how the final testing treats those…)  Also, the first question isn’t worth enough points to get through on its own.

So I plan on doing a write up tomorrow once it is all finished, but it is looking to be rather a busy day, so it might get delayed…