Code Competitions – Page 9 – The Missing Docs .Net

TCO11 QR1

So TCO11 kind of crept up on me, wasn’t paying much attention compared to Google code jam. But here it is qualification rounds 1 2 3 intermixing with the schedule for GCJ Round 1.

I was thinking of entering this round, probably would have been tricky to get through though, QR1 is not usually agreeable to my skill balance. I signed up for it, but by the time it hit midnight here I decided that I was too tired to stay up another 2 hours until the competition started. QR2 and 3 are much better times for me, although QR3 is during a work day so, hopefully I get through on QR2.

Almost 1600 contestants competed in QR1 – cut off for 600th place to advance to round 1 appears to be 235 points. So a ‘fast’ question 1, or question 1 + challenges, or question 2 was the target. Only 1 Australian through this round – but given the time zone it is hardly surprising…

Q1) There is a group of up to 50 people and each knows exactly how many of them are compulsive liars. Each makes a claim ‘there are at least n liars’, for a compulsive liar this means there are ‘less than n liars’. Determine the minimum number of liars, or return -1 if there is no possible correct answer.

A1) So this isn’t a difficult question, although there might be 2^50 possible scenarios for consideration they can easily be excluded in bulk, and exact scenario is not required. We just need to walk through each possible liar count and determine whether it is satisfiable.

So what is the requirement for satisfiability? For a trial n, there must be n claims greater than n, and the rest must be less than or equal to n. This corresponds to the n liars and the N-n truth tellers. Trivial enough really… probably trivial enough that I would have been too slow.

Some corner cutting options which are totally pointless and just serve to make your answer more complicated include counting the 0 and 1 entries – as those cannot be liars, and counting the N and greater entries – as those must be liars, then performing the walk in between. If the range had of been larger, sorting entries would allow for an O(n log n) implementation rather than O(n^2) – but it was just pointless in this small input size scenario.

Q2) Given a set of up to 50 integer music track lengths, and random selection from them (where repetition is possible) determine the probabilities that a given time (T) will be playing each track.

A2) Looks like an interesting question on the surface… seems like it calls for dynamic programming, but with T being up to 80000, need to be careful not to invoke O(T^2) on the DP array population.

At T=0, all probabilities are equal and it stays that way up until minimum track length. At that point probability of that track drops and the others go up. So it looks like maybe the probabilities can be linked to when tracks last ended.

At this point in my investigations into the solution I went down a long path of counting the number of ways to be ending a given song at time T. This was a problem in 2 ways. 1) It produced Huge numbers which were just too large to handle, and 2) it was useless because it didn’t account for the probability of the starting point of each song. Counting is not a good option, probabilities are better. I doubt I would have realized this during the competition soon enough, so highly likely I would have failed to get through.

So the correct approach is to ask “what is the probability that a song can start playing at time t?” This is even easier than the counting question. P(0) = 1. P(t) = Sum(P(t-ti)/N). The probability is the sum of the probability at the previous potential start time for each song divided by the number of song options for playing at that time. From there the answer is just the sum for each track over each possible start time that would have it still playing at the finish time. (Remembering to divide by the number of tracks that could have started playing at that time.)

Q3) Given a pattern of ‘B’, ‘W’ and ‘?’, where there is always exactly 1 question mark and 0 or more of the other characters, determine the shortest and lexicographically minimal replacement of ‘?’ which makes the sequence have a value of N, using the following rules. Sequence of length 1 has value 1. A letter alternation increases value by 1, a letter repetition decreases by 1. The sequence as evaluated left to right must never decrease below value 1. Further conditions, input pattern is at most 50 characters, and the final value of N is at most 100.

A3) Initially looking at this question I am thinking it must be a joke…

Four scenarios. 1) Question mark is on the end. Evaluate the pattern start for viability, and then proceed from evaluation end value to target value with directness – empty, alternates, or continuous repeats. 2) Question mark is on the start. Slightly more tricky, Check for empty, or add alternating on the front until it works (if it is going to at all). 3) Question mark is in the middle. Hardest case. Similar to case 2, but target to get back to isn’t empty. 4) Question mark is on its own. Answer is alternation starting from B until target value reached.

Scenario 3 is the only tricky part. Given a start and end character, and a target end value and a given start, without ever going below 0, what is the lexicographically smallest approach that yields the correct result. I suspect this could be done using a list of maybe a dozen or so scenarios which are simply thought through for the optimal strategy, but it can also be turned into a recursion, but one which needs to be evaluated in a fashion which is greedy so as to avoid considering solutions which are known to be way too long.

So with a known start character and start value, we can look at it as a shortest path to specific value ending in specific character. Adding a character the new state is determined by the old value, the old last character, and the new last character, with the outputs being the new last character and new value. If the assumption is made that the value never needs to exceed a certain threshold in order to achieve the shortest path, this state space is limited and easily fully explored (only two possible transitions from each spot, which we only need to consider if we find a better answer for that position). A simple threshold to assume is double the maximum target, or 200 – but I suspect the true threshold is much lower.

So, not a joke after all… still I think I was more likely to solve it than question 2…

GCJ11 QR Post-Round Analysis

So looks like a pretty big turn out – 10566 qualifiers. 1855 perfect scores – which I was a part of – although in some ways I don’t think I deserved to be.

The 4 problems was definitely more than I could have handled in a 2.5 hour competition (not without less huge insane mistakes at least!) – I actually took more than 5 hours to submit my answers – although I had a lot of distractions and took a few hour break in the middle. So sure it took less than 2.5 hours to write up the code, but I cannot ignore the value of that break allowing to the answers to mull over in my head.

Q1) Given starting positions for 2 robots and that they can both move at 1 unit per second, and that it takes 1 second to press a button, and a list of integer button positions and who has to press them in exactly the given order, what is the minimum time to complete the required work?

A1) This puzzle was actually portal themed – there was mention of possible cake. As usual they provided a sample example of ‘one way’ to solve a sample optimally. I almost wonder whether I should start to assume that any time they say that it is an example of a sub-optimal strategy which just happens to be optimal.

Anyway the question itself is a fairly basic simulation using a greedy algorithm. At any given point in time the two robots have a target position, which is based on the next button which needs pushing by them. For a tiny optimization, this only ever changes when they push a button, but given the constraints even on the large input you could probably have gotten away with updating the target every timestep. Then it just comes down to simulating each timestep. If a robot is at its target, it doesn’t move. If it is supposed to push a button it pushes the button and doesn’t move. A likely trivial mistake to make would have been to accidentally let the second robot push its button in the same turn as the first robot because you’ve updated which is the next button to press but haven’t yet ended the current turn. Maximum number of turns is something like 100k for the large input, no problems given the 8minutes you have to submit the answer.

Q2) Given a list of pairs which if matched consecutively are replaced by a third (which will never be found in the list of pairs), and a list of pairs that if they are ever found anywhere in a list (except if they are instantly replaced by something in the first list) cause the entire list to be cleared. Process a list of input characters adding each one to the output list in turn, subject to these rules, print the output list in a pretty format.

A2) At 82% this problem had the lowest small to large conversion ratio – and it isn’t surprising because the small problem constrains the list of pairs to be at most 1 for each case and the large allows up to 30. This isn’t a performance problem at all, but it does mean that several corner cases just cannot occur. Writing up my solution for this problem probably took longest of all of them, although I found it easy to solve in theory – I made copious errors during implementation – several of which would have crept past the small inputs if I hadn’t of noticed them in time. (Keeping with the computer gaming theme, this problem is inspired by Magika)

So again this is a straightforward simulation. If the list is empty you add the character. If not first check whether the incoming character and the existing last character are a pair in the first list of pairs. Performance is not a problem, just walk them all, checking both orders (scale optimization is to add pairs a dictionary of dictionaries – but not needed here). If a match is found, replace the current last character with the replacement and continue loop onto next character. If not it is time to check whether there is a need to clear the entire list.

Again because the scale constraints are so small there is no need to optimize this using dictionaries, you can just walk every entry already in the list, consider it with the incoming character against every pair provided in both orders. Obviously this is O(n*k) but k is at most 30 in the large input. If you do optimize this scenario you would probably do what I did and add a set to keep track of the unique elements in the input list so you can dictionary lookup the next character, and do a set intersection between the dictionary results and the set of unique elements in the input list. The one thing to be careful of here is that a set is not a good implementation for keeping track of the set of unique elements in the input list, you need a count of each unique. Without a count, the pair transformations which remove the last element can cause you to incorrectly claim that element is no longer in the list. In any case, if you find a match, you clear the list and do not add the incoming character.

Finally if neither transformation nor clearing occurs, add the character. At the end of the simulation loop the nice formatting was trivially achieved using string.Join – but a simple loop with a first loop condition does the job.

Q3) Two boys have a set of items which each have a value. One boy can’t add – any time he adds two numbers he always forgets to carry the 1. He also adds in binary… The other boy however can add just fine and wants to create two piles of these items which the first boy will think have the same value, but hopefully do not. In fact he wants to minimize the value of one non-empty pile, and being a mean boy, give that to the first boy and take the rest for himself. For a given set of values, is it even possible, and if so what is the maximum value the second boy can take for himself.

A3) Looking at the questions, I would say this one gives the first question a run for its money as the easiest question – but the statistics don’t back me up. I was bemused to see people asking the Admins questions about this question during the competition. ‘Does the first boy add the items left to right?’ – ‘Yes’. Addition without carrying the 1 is both commutative and associative, so the order is irrelevant.

First off we note that the first boy thinks A+A=0. This combined with commutative and associative, quickly shows that the sum of all items must be 0 if there are 2 piles which have the same sum. The next leap is to see that A+A is the only way to get 0, hence again because of commutative and associative natures any bi-partition of such a pile with sum 0, has equal values.

So, from there it is trivial. Second boy gives the single smallest item to the first boy and takes everything else.

Implementation involves proving the no-carry sum is 0. Afterwards I realized that I could have implemented this using simple xor on the whole number, but I broke the integers into arrays of bools and reimplemented bitwise no-carry sum (xor!) by hand. Once you prove that the xor of all inputs is 0, sort, and sum the top n-1 elements and return that as result. Simple.

Q4) Given a permutation of the first N integers, and a single operation which is ‘randomly permute a selected subset’ and acting ideally – what is the expectation value for the number of operations until the list is sorted.

A4) This was by far both the hardest and easiest problem. The provided sample provided a hint which was both useful and useless at the same-time… Random simulation looked liable to time-out before convergence, especially for large input and presumed you knew what the ideal strategy was.

So, what is the ideal strategy? The sample provided a hint that if you have n disordered elements, acting on each sub-permutation cycle independently appeared a good strategy. (Permutation cycle in disordered elements is the smallest set of entries which can be reordered on their own to end up completely ordered correctly. For example 2341 is a permutation cycle of 1234, as is 2413, but 2143 and 4321 are not as they can be divided into pairs which when switched are in the correct positions. This may not be the correct definition of a permutation cycle, but I can’t think of a better term to use in this discussion…)

Indeed I could see that it seemed reasonably obvious, that not randomizing a subset of a permutation cycle was inefficient. I didn’t prove it, but a trial with small permutation cycles agreed with the hypothesis. So I started down the path of dividing the input into a count of the number of each permutation cycle length, under the assumption that the ideal strategy was to randomize each permutation cycle in turn until it was ordered. At this point I tried some examples – in fact the first example I tried was a cycle of length 3, 312. However due to some incredibly stupid mistakes (I made 3 in quick succession, but only noticed 2) I incorrectly calculated an average of 2.5 operations to order this cycle. I then made a completely mistaken leap from this mistaken value to presume that this was further evidence that working with cycles was better than working at random.

So at this point I start breaking out some serious combinatorics deriving the formula for the average number of operations for a cycle of length n, assuming the ideal behaviour of working after each operation was to work with one cycle at a time. I’m not great with combinatorics, and it took me a while to realize that most of the combinations and permutations and factorials in my formula would actually cancel out, and I wouldn’t be needing to calculate the ratio of 3000 digit numbers to 7 significant figures. I then made a mistake doing the cancellations and ended up with something again, completely wrong. It was obvious though, so I started some trials again. This time on the core piece of my formula which is ‘how many different ways is there for n randomly permuted entries to have a length k cycle which includes the first element. I was bemused to find that my trials didn’t depend on k, it was always (n-1)! – or 1/nth of the total permutations. I could see where I screwed up my cancellations now and ploughed ahead with my formula. I implemented mutual recursion memotization, and hit go. It was fast, it produced the correct answers for the sample.

At this point I decided to have a look at my memotization tables – I laughed because there, accurate to 7 or more significant figures, was a table which contained ‘n’ at index ‘n’. I had successfully demonstrated that for ‘n’ disordered elements the sum of the times for each subcycle is equal to ‘n’, which is tantalizingly close to suggesting that advanced strategy was irrelevant. The only thing you have to do, is make sure you don’t randomize something which is already in the correct place (and make sure you do randomize every member of any given cycle, which comes from letting everything else randomize).

So at this point I ripped out my memotization tables and replaced them with ‘if passed n, return n’. I then realized the obvious, that I didn’t actually need almost any logic any more at all. The answer was equal to the number of disordered elements in the input. Counter, for loop, if statement, increment, return result.

I probably would have solved this problem quickly, if it wasn’t for that mistake of calculating 2.5 as the expectation value for cycle length 3… Instead I wasted almost 3 hours on it alone. 3 hours for 4 lines of code… (ignoring the file parsing code of course!)

GCJ11 Qualification Round

So its almost halfway over and over 4500 people currently have the requisite 25 points to their name – of course the final testing could still play havoc with a few of them.

A significant change in format this year – previously there was no way to qualify solving just the small input files, solve one problem in full and you are through.

This year there are 4 questions, and you could solve just small input files to get through – indeed I’ve solved them all so baring some catastrophic event there is no way I am not through to round 1. (I also solved the large inputs, we’ll see how the final testing treats those…) Also, the first question isn’t worth enough points to get through.

So I plan on doing a write up tomorrow once it is all finished, but it is looking to be rather a busy day, so it might get delayed…

GCJ11 is coming

Just 2 and a bit days till GCJ11 qualification round now. Registration is open on the day I believe, so no rush.

TC SRM 478

Been a while since I last posted… I’ve fallen behind as was to be expected.

Q1) Given carrots are placed at locations 0 mod 1billion 7, and from any given location there are 2 options to go to, 4x+3 or 8x+7 and at most 100thousand movements are allowed, what is the minimum number of movements to get from starting position x, to a carrot.

A1) Obviously this problem is about transitions modulo 1 billion and 7. The trick is needing to rule out trying all 2^100000 options. 1 billion and 7 is too large to compute all transitions, so a simple memotization won’t work.

The first step is to realise that order of application doesn’t matter. A(B(x)) == B(A(x)). That gets rid of almost all of the solution space. Furthermore A(A(A(x)))=B(B(x)) and since we are trying to minimise moves the ideal solution will always choose 2 8x rather than 3 4x. From there it is a simple matter of repeated application of B and trying 0 1 or 2 applications of A. (Technically you need to verify that another application of B isn’t equal of 2 appliations of A are equal given the statements made so far, but A(A(x))!=B(x) for all x so that is not a problem)

Another solution is to notice that all transition states available are of the form 2^(n)*(x+1)-1, and that every possible n is reachable except n=1. Furthermore since the order of application doesn’t matter, and a greedy approach of using 8x as often as possible therefore is the ideal scenario. Therefore the answer is n/3 if n is a multiple of 3, or n/3+1 otherwise, for the first n which results in a 0 mod 1 billion and 7 answer. (if 2 mod 3 then apply 4x, if 1 mod 3 then apply 4x twice from one less 8x.)

Q2) Given up to 15 bottles each with capacity C and a list of initial contents levels (all integer), and a price list for how much a bottle with any specific content level is worth determine maximum profit given the ability to pour 1 bottle into another until it is empty or the other is full. Maximum C is 50 and maximum price is 1million.

A2) Each pouring operation either results in a bottle which is empty or one which is full (potentially both). Empty and full bottles are of no interest for future pouring operations, so there are at most 14 pouring operations that will be performed. Giving a search space of 14! – which is too large.

A dynamic programming approach might seem like a good idea, all we have to do is determine the maximum for each subset and combine them. But a closer look shows that it doesn’t make sense because we don’t know the contents levels in bottles as they may be neither full, empty nor their initial value.

So a different approach is required. If a subset of the bottles all pour, one into another in some order. They all end up either full or empty except for possibly 1. Given we know the total contents and the capacity of each one, div gives you the number of full bottles, and mod gives you the capacity of the last possibly non-empty one, the rest are empty. Presto we can trivially calculate the maximum for each subset where they all pour. Therefore we can determine the maximum for a specific subset, it is the maximum for all partitions of pouring. Number of partitions of 15 elements is still ~1billion so we can’t brute force those.

The maximum of n elements is therefore found by considering each possible non-empty subset as all pour, and the maximum of the rest. This has a running time of less than 2^29, but it seems likely to time-out in the worst case.

Next approach – For each subset one of the elements will have content j, the rest are already sold. You can generate a subset 1 larger by combining the unsold one with an unused one, or selling the unsold one and keeping the new one as unsold. This has a state space of 2^15*50, but only an average of about 15 options to consider for each spot. This gives running time <2^25 which is clearly going to run in time.

Q3) Given a list of the count of each type of apple in up to 50 boxes, determine the probability of selecting each specific apple type at random if you first randomly select a non-empty subset of the boxes to further select from randomly. (Up to 50 types of apple and each count is <200, no box is empty)

A3) The dual random selection is kind of annoying as the number of subsets is up to 2^50. So obviously we can’t consider each scenario and average.

Dynamic programming might be a possibility. Maybe determine the probabilities using the first n boxes? Except how do you expand from n boxes to n+1. This appears to be impossible. To modify the probabilities you need to know how many apples there were before. Hmm, that <200 limitation is pretty small, gives a maximum of 500k apples in any subset. Each transition only depends on the one before, so we don’t need to keep everything in memory. So probabilities using the first n boxes with a total of k apples. That is 2meg ram for just one probabilitiy, so we’ll have to do each probability separately, which I think is fine…

Well maybe a bit slow.

A faster way to do it is to break down the problem differently so you can accumulate the probabilities for each item simultaneously without requiring any additional ram. One breakdown is to consider the answer as being the sum of count of apples of specific type in a box divided by total apples in subset, times number of ways to create that total with a subset that contains the specific box divided by total number of subsets, for each possible box and total number of apples in a subset. Efficient calculation of number of ways to create total with a subset that contains the box means that running time is proportional to 500k times 50 worst case, as each apple type can be accumulated at the same time, because the part which requires the giant array is constant over each apple type. Unlike the previous calculation where the giant array is specific to the apple type, which is 50 times slower. (Although apparently an efficient implementation can pass system tests… without this further optimization)

TC SRM 477

As is usual with SRM I didn’t do this one. However, I figure I might keep writing up blogs until next years TCO.

Q1) Given a boolean rectangular array which maps to a hexagonal grid by shifting every second row half an entry to the right, determine the number of edges between true and false exist.

A1) This is easy enough, its a simple case of walking each cell and checking its 6 ‘neighbours’ and incrementing a counter if its not the same. Once complete divide answer by 2. The only difficulty is making sure you don’t screw up the determination of the 6 neighbours, which is different depending on whether the current cell is in an odd or even row.

Q2) Given a list of numbers, determine the greatest number of distinct pairs which can be selected such that all such pairs are the shorter sides of a integer sided right triangle and have a gcd of 1. Numbers are all less than 1 million and worst case there might be 1250 numbers (less if they aren’t all single digit – maximum of 350 6 digit numbers).

A2) The constraints on the pairs are pretty significant, and could easily rule out most of the numbers entirely. However 600 3s and 600 4s are a real possibility. We can easily go through each pair of numbers and work out if they are compatible. We can turn this into a graph with counts available for each number. Not certain whether the graph can contain cycles. If the graph can’t contain cycles, it is bipartite and we can perform a max flow to determine the answer. If it can contain cycles we have a problem…

So I checked out some answers – and yes it does seem max matching is now considered a question 2 level algorithm, not question 3 as I thought it used to be. However, whether there are cycles or not is not relevant, as 2 odd numbers apparently cannot satisfy the requirements (since the sum of their squares will be 2 mod 4), and 2 even numbers is ruled out by the gcd(a, b) == 1 requirement. Hence the pairs are bipartite based on even or odd.

Q3) Given a weighted tree with up to 200 nodes, determine the minimum weight sum of a tour which starts and ends at node 0, visits every edge at least once, and can teleport for cost L up to K times.

A3) If K is 0, the answer is 2 times the sum of the edge weights as every edge will be traversed twice. If K is 1, we can subtract the longest path and add L, and choose which is best. That is to say if the depth of the deepest node from any other node is > L we will teleport between those nodes. However once K is > 1, it becomes a lot trickier…

Again looking at solutions. The problem is a dynamic programming problem. Recurse over the minimum tour length within a subtree if there are i teleport endpoints in that subtree. This can be determined by considering that if a subtree has an even number of teleport endpoints, it will be both exited and entered, otherwise it will only be exited or entered, not both. So there are n nodes, and up to 2k teleport endpoints, so the DP has O(nk) states. Determining each recursion requires considering how to distribute the i teleport endpoints between a nodes children. Considering every combination is too much, so we need to switch up the DP a bit. If instead of thinking about each node, we consider each edge, we realise we only need to consider each possibility for i down a given edge, not in combinations with other edges out of a given node. So we can instead either recurse to the left most child and all its outgoing edges, or to the ‘set of edges to the right of the left most child’ Since the number of edges equals number of nodes -1 in a tree, the DP states is still O(nk). Number of possibilities to try for each case is O(k) giving O(nk^2) running time to populate the DP table. Once the table is populated all we have to do is check for root of the tree, each even number of end points, modified by the cost of teleporting, and determine the best result.

The real trick here is disassociating the end points, realising the problem can be solved without caring about specific locations teleporting to specific others.

This is the second time I’ve seen the ‘dfs edge walk on a tree’ DP problem. I should probably write up a solution to practice it.

TCO10 R4

So, as predicted by the odds I am out ;). I did at least score some points, so I came 125th. The question structure didn’t favour me, an easy first question jumping to a hard second question. A fast time on the first question was sufficient to get through, or a couple of challenges and a terrible time. However the first problem was so easy the challenges were hard to come by, and the second problem’s input requirements so obtuse that constructing a failing test case was next to impossible.

My time for the first problem was terrible – I had the problem solved in the first minute, but a plus sign where I wanted a minus sign sent me down a warren of verifying things by hand, and quickly half an hour had vanished. That being said I probably would not have gotten through even if I hadn’t screwed up badly, had to be fast. Although an extra 20minutes and I might have actually had a serious attempt at the second question.

For a while I thought maybe I could do the 3rd question, thinking it was a well known computational geometry problem which I could just grab an algorithm off a textbook for. However the closest related problem in the textbook told me the running time would be too slow.

Q1) Given a set of bank accounts with known values, where 1 dollar gets you one ticket in a lottery draw each week for a fixed amount, what is the expectation value of bank account 0 after n weeks. Limits: n up to 1000, winnings per week up to 1000, up to 50 bank accounts, each with a starting balance of up to 1million.

A1) I implemented this as a DP of probabilities of number of wins in a given number of weeks. Note that the total amount of tickets in a given week stays the same regardless of who wins in the previous weeks, so only bank account 0 and the initial total of the incoming bank accounts is of interest. Once DP is formed, simply multiply each probability in the final week by account balance for that many wins and return.

As it turns out the problem can be solved without the DP. It appears that simply using the expectation value of one week can be used to generate the expectation value of the next week. Reducing O(N^2) down to O(N).

Q2) Given a sequence of numbers formed by multiplying the digits of each of up to 50 consecutive integers, determine the location of the first of those integers, assuming the sequence refers to the earliest such match. Limits: the input numbers will be less than 1 billion, output will be less than 2^63.

A2) After the competition I realised that this problem is not as hard as it looks. I really might have worked this out if I had of made a serious attempt (and had that extra half hour :P). If the input is all 0’s you can return 10,100, 1000 depending on input length quickly. In fact brute forcing for possible answers of 1-1000 takes care of some corner cases. Find the first non-zero number, brute force the last 3 non-zero digits and solve for the rest by trying as many 9s then 8s then 7s down to 2s going from right to left, then check against all other numbers in the input to see if it is right. Return the smallest result. For kicks I wrote this up and it failed test cases – but only because of a mistake in my code, fixed that and it now passes system tests. My first thought was brute force 2 digits (well my real first thoughts were more complicated than that), but input size > 12 allows some control over the 3rd digit. A number followed by 12 0s indicates that the last 3 digits were 9s for instance. You can look at the input set and break it down in to all sorts of combinations, but in the end, simply brute forcing all last 3 digit scenarios is easier…

As an aside I was left wondering what checks the challenge system used, since the input limitations are significant. Other than 0 no two consecutive numbers in the input can be equal. No input number can have a prime factor greater than 7. At least 5 of the inputs must be 0, and after the last consecutive 0 every 10th number must be a 0. Two consecutive non-zero numbers must be increasing and the difference must divide them.

Q3) Given up to 1200 points, where no 3 are co-linear and no 2 are coincident, determine the number of line segment intersections from choosing any 4 points of the points.

A3) Brute force is obviously no good. A look on Wikipedia indicates that even the best algorithm for enumerating all intersections of the complete set of line segments from n points is in the order of the number of intersections, and if the 1200 points form a convex hull the number of intersections is well over N^3 (Close to N^4 even I think).

Therefore we can’t enumerate the actual intersections we have to count them in batches. Specifically the time needed needs to be about O(1) for every line segment. O(log n) would be sufficient.

I had a quick look at one solution, and it considers each point, and sorts the other points by angle relative to that point, then for each of those segments in order, mystically calculates the intersection count really quickly. … going to have to read it again after some sleep.

Edit: Post sleep update.

I had a look at some more solutions. General approach seems to be assume everything crosses. Then for every point, sort the other points around it by angle. If you can get 180 degree separation in the sorting, all the points on the left will not form crosses with the points on the right. So you subtract these scenarios away. However there is over-counting which you have to adjust for. Looks like the over-counting adjustment is probably the hardest bit.

TC SRM 468

Q1) Given a dictionary and a mapping of numbers to sets of letters, and an input sequence consisting of numbers, ‘try alternate’ button pushes and ‘spaces’, determine the output. Assume dictionary is processed in alphabetical order.

A1) Easy enough, make a dictionary of number sequences to list of words that match, sort each list, then process input to produce output…

Q2) Given a graph with a line of nodes each connecting from one to the next, determine the minimum time to get from the first node to the last. Subject to some special conditions. Each edge has two time options for its travel. The first is always available, but the second can only be used if there are less than k contiguous sections where second option has already been used, or you used the second option for the last edge. The number of nodes n is up to 400thousand, and k is up to n or 40, whichever is lower. The first and second times for each travel option are generated using a pseudo-random number generator with specified parameters.

A2) While writing my question text above I may have made this question even easier due to my phraseology. Its a DP problem, 3 parameters, number of contiguous sections of second option so far, boolean if last was second option and how many nodes passed already. Very easy, only problem is that it requires 64meg ram in the default dp table. Luckily every entry only depends on the ones at one less nodes passed. Lucky because that means we can get rid of the memory problem, and also because if it depended on all of them the processing time would have been excessive.

Q3) Given a k-partite graph where each ‘layer’ only connects to its neighbours (arranging the k layers in a circle), determine the maximum independent set. Limits: Maximum vertices is 100, k between 10 and 50.

A3) When I first saw this problem I was shocked, a question 3 which maybe I can actually do. (The original question isn’t phrased as given, but still it was obvious this was the real question.) I unfortunately pretty quickly came to the realisation that the graph itself didn’t have any limitations which guaranteed it was bipartite. If k was even, I was good to go, but when k is odd, I had a problem. I thought about it for quite a while, and along the way I had the idea that I could try all the possibilities for one of the layers, and do the rest pretending it was a bipartite graph. Unfortunately I threw this thought away because I didn’t remember that k was at least 10, and so I decided that worst case would be 33 vertices as the smallest layer and 2^33 scenarios is too many to consider. With k at least 10 (or we might as well say 11 since 10 is trivial and a maximum vertices of 100, the worst case scenario is when the 100 vertices are distributed in layers of 10 or 9. Select one of the rows with only 9 in it, and try all 2^9 scenarios. For a given layer if we are trialling a vertex as being in the independent set, we can remove it, and any immediate neighbour vertexes. If not we just remove it. This removes the entire layer, leaving us with a bipartite graph which we can solve via the dual maximum matching problem using simple maxflow on a graph with capacities which are all 1. Add in the number of ‘independent’ vertexes in the current trial and choose maximum over all trials and done. Probably one of the easier question 3’s I’ve seen recently.

TC SRM 469

Q1) Given a n rows of m seats, and a list of specific taken seats, return how many ways 2 people can sit next to each other. Limits: n and m up to 1 billion and number of taken seats up to 50.

A1) Very very easy. Use long long to avoid overflow. For each row with taken seats get the sorted taken seats in that row, then for each gap greater than 1, add gap-1 to the total. Finally add number of empty rows times m-1.

Not my favourite kind of question, simply because I am not uber-speed code writer.

Q2) Given a counter which ticks down every minute, and a set of k items which each have a time length, but increase the counter by j at a specific time within that time length, determine the maximum number of items which can be processed completely before the counter drops below 0. If there are multiple ways to achieve this goal, return the order which prefers lower numbered items first. Limits: k up to 20.

A2) Each item has a minimum requirement, and a net effect. If there is a non-negative net effect which we can meet the minimum requirement for, we want to use it. If there are only negative net effects, which we can meet the minimum requirements for, we will never be able to use anything which is currently not at minimum requirement. However, the order of the things left is non-trivial.

With a limit on k of only 20, every possible subset is a valid consideration if we can process it quickly. This provided a hint, but I needed to look at a solution to get the rest. If we divide the input into start and end sections, we can determine which of the end section can be placed immediately after the start section. This lets us create a DP of maximum number which can be processed from a set which appear after the rest not in the set (regardless of whether the rest not in the set can actually all be processed before it). That final condition may sound crazy, but once you get to the full set, the not-in set becomes empty set and the condition is fine, so the rest doesn’t matter.

Once the DP is done, you just have to do the ‘reverse dp walk’ to construct the output.

Q3) Given a list of items and 2 queues to put them in, where each item has a processing time, which depends which queue it is in, and after processing is placed into the other queue, unless it has already been processed by that queue, determine how many different ways to place the items in the 2 queues subject to 2 conditions. 1) The items have to be added to the queues in the same order as the input. 2) Neither queue must ever be empty before the other queue has finished its initial list. Limits: up to 47 items and each processing time is between 1 and 20.

A3) No clue – the processing time limit seems suspiciously small though. Maybe a dp on the number of solutions with an initial queue time of n? Let me look at a solution.

Rather than solving the problem as given, solve a different problem, number of ways for queue 1 to run out before queue 2 is done its initial list. Then switch queues and use the same logic again. Subtract both from number of possible distributions.

So, how to count the number of ways for it to run out. Each time we put something on to queue 1, it makes it harder to run out, each time we put something on queue 2 it is easier to run out. Some analysis is required to work out the DP. The definition of fail is if total queue 1 time + total time of first i-1 from queue 2 processed at queue 1 speed – total time of first i from queue 2 processed at queue 2 speed < 0 for any i. We want to know what the minimum of the left for any i is, for each possible distribution in order to know whether it has failed, if that minimum is less than 0, then it is a fail. As each new item is added to the pile this function changes. If we add one to queue 1, all possible left hand sides increase by the processing time, thus also increasing the minimum. If we add one to queue 2, the inequality doesn’t change except for the new i=new length of queue 2 case. In that case the inequality equals the previous scenario where all were processed minus the new addition as a possible new minimum. So we can minimize between that and the old minimum. The only problem is we need to know that previous scenario as well. So ‘previous scenario’ and ‘minimum’ are 2 parameters to the DP, the 3rd being number of items en-queued.

Now that we have this new parameter we need to know how it changes. If we add to the first pile, it goes up exactly the same as the minimum does, adding the queue 1 processing time. If its added to the second queue, that causes a subtraction of the queue 2 processing time, and an addition of the queue 1 processing time.

Potential range on previous scenario and minimum are pretty large so memory is an issue if the entire dp was kept in memory. Helpfully each case only depends on the previous number of items en-queued, and the final summation is only on the full list of items en-queued, so we can throw away memory as we go. (Avoid garbage collection issues and alternate arrays, don’t keep allocating.) Additionally with the inequality range being +/- ~900, that is close to 200million dp locations filled without any optimisation. Running with c++ this is probably reasonable, but with Java further trimming helped. For instance minimums only ever get smaller and we are only interested in the negative ones, so treat all positive minimums as 0, and we save half the state space. Furthermore the area of the DP increases by +/- 20 in both dimensions with each step, taking this into account can save another factor of ~3 rather than blindly checking every spot.

Again out of my range, but useful to sit down and understand how it works.

TC SRM 470

Q1) Given a line of n (up to 51) rooms separated by gates of up to 16 different types, a game is to be played. One of the (non-end) rooms is the ‘target’ room, where both player 1 and player 2 want to get to. Player 1 starts in room 0, and player 2 starts in room n-1. Each player (starting with player 1) takes turns removing all gates of a specific type. At any time if someone can reach the target room, they have won. If both can reach the target room the game is considered a draw. Assuming optimal play return how quickly player 1 can win (if they can), or how quickly player 2 can win (if they can), or ‘draw’.

A1) This is a simple memotized recursion, using the bitset of gates removed as its state. At each stage of the recursion first check if the given bitset is a p1 win, p2 win or draw. If it is one of these 3 states, return 1, -1 or 0 – swapping the first two if the popcount of bitset is odd (cache the result of course). If it is not a win for either player and not a draw, recurse on each bit which is not in the bitset, and select the closest negative number to 0, 0, or largest number of all recursion options and return that (after caching it).

While this approach is fast enough (50*2^16 worst case), a greedy implementation actually works. If you divide gate types into left, right, or both compared to the target room, then player 1 wins if left + both <= right, player 2 wins if right + both < left and in all other cases its a draw. Number of turns is 2*(left + both)-1 in the former, and 2*(right + both) in the later. Giving an O(n) solution, which works regardless of the number of gate types.

Q2) Given two parallel rows of n dots, where some specific dots in row 1 are already connected to specific dots in row 2, determine the expected number of pairwise crosses formed if each remaining dot in row 1 is joined to a random remaining dot in row 2. Every dot in both rows will have exactly 1 line connecting it to another dot in a valid randomly generated situation. Limits: n is up to 10000 and count of specific starting joins is up to 50.

A2) Summation of the expectation value of crossing for a specific pair over each pair of dots in row 1 will give the answer. However with n=10k n^2 is not fast enough. We can divide the problem into 3 cases.

2 non-specified dots. Expectation value for all of these pairs is exactly 0.5, so we can sum all of these quickly. (O(1) time even.)
2 specified dots. 1 or 0, we know whether they cross. The 50*49/2 pairs to consider is easily executed.
1 specified dot and 1 non-specified dot. At 50*10k, this could be brute forced, with probability defined by the ratio of non-specified locations to the left/right of the specified location in row 2, flipped if the specified dot in row 1 is to the left instead of right of the non-specified dot. However every non-specified dot is going to have the same ratio, only depending on whether it is left or right of the specified dot in row 1. Therefore by knowing how many specified dots there are to the left/right of each end of each specified dot pair, this can be performed much faster.

Q3) Given a map, up to 50×50, with up to 4 pairs of towns which want a road built between them, determine the minimum cost to build the up to 4 roads required. Roads are free to build, and can go anywhere, except through a rock. Rocks are a side to side connected set of square tiles for which the entire set can revert to being normal ground for a specific cost. There are up to 62 rocks, and no 2 rocks have the same cost. No rocks overlap with any town, nor do any towns overlap with other towns.

A3) This is a tough question. No one solved it in the competition.

A single pair of towns is trivial, you just create a directed graph where nodes are connected sets of map tiles of the same type, and side-to-side adjacency defines the graph edges. The incoming edges to rocks all have a cost equal to the cost of removing the rock, and the outgoing edges are free. Then you calculate shortest path from town to town.

Handling 4 town pairs is tricky, because you have to consider the case where the cost of a given rock is shared between more than 1 road. Looking at the solution write up, the suggested way to approach this is a dp over location and a mask indicating the towns are you are trying to join through this location, and the towns which are already joined. For each mask and position you consider each possible way of breaking the mask into 2 submasks. The cost for that combination is then the sum of the 2 sub mask costs minus the cost of getting to this node if both of the submasks have one or more lone towns out of the town pairs. The reason for the subtraction is if the submasks have a lone town then both of them have already incurred the cost of destroying this rock to get here, and so we need to refund it.

So far so good, however we have not handled anything to do with the cost between between nodes yet. For a given mask and location, we need to minimise its cost by considering we could come from another location with the same mask. If the mask has only pairs, it is already fully connected, so there is no road being built and we freely minimise everything. This lets us combine any road with another when we build up the mask to a higher bit count. If it does have a lone town, we have to pay the shortest path cost between the two vertexes (if it has multiple lone towns, we only have to pay the cost once!). This second case requires n^2 steps to ensure minimisation is complete. (The former can be performed in n steps, but it isn’t a significant performance boost.)

The final problem is that n^2 is kind of expensive considering we have to do it for most of the up to 256 masks. A particularly nasty arrangement of rocks can result in ~800 nodes in the graph. However, excluding the empty space nodes, there can only be at most 70 nodes ( 8 towns and 62 rocks). Every empty space node can be eliminated by connecting its incoming edges to its outgoing edges, since the incoming edges always have 0 cost, you basically just throw away that edge. Number of edges will increase significantly in some scenarios, but still all points shortest path will easily run in time with the lesser number of vertexes, and the overall algorithm benefits hugely from the decreased vertex count.

An interesting algorithm, certainly beyond my abilities to create on the fly.