GCJ 09 R2

So… I am out.

840th, definitely had a bad round. (Did I ever mention I hate competitions scheduled for 2am…)

Questions
1) This was a pretty easy question, but… I took almost an hour to do it because I was being brain-dead.
To start with I misread the question and thought you could swap any 2 rows, and spent a while trying to solve that. Which I think may be a harder problem…
Then I tried some really stupid greedy heuristics (at least I realised greedy was correct this time). My first 3 implementations looped infinitely because I was swapping items which were equal!
Anyway, this question finally resolves down to: the first entry must be satisfied, so find the closest item which can be moved in to satisfy it and move it there, then repeat for all subsequent entries. So incredibly obvious, but I sucked.
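For reference, a minimal sketch of that greedy (my own reconstruction, not checked against an official solution), assuming each row is summarised by the column index of its last non-zero entry:

```csharp
using System.Collections.Generic;

static class CrazyRowsSketch
{
    // lastOne[i] = column index of the last '1' in row i, or -1 for an all-zero row.
    // Row i is satisfied once the row sitting at position i has its last '1' at column <= i.
    public static int MinAdjacentSwaps(int[] lastOne)
    {
        var rows = new List<int>(lastOne);
        int swaps = 0;
        for (int i = 0; i < rows.Count; i++)
        {
            int j = i;
            while (rows[j] > i) j++;          // closest row below that can satisfy position i (guaranteed to exist)
            for (; j > i; j--)                // bubble it up with adjacent swaps
            {
                (rows[j], rows[j - 1]) = (rows[j - 1], rows[j]);
                swaps++;
            }
        }
        return swaps;
    }
}
```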
I skipped ahead to question 4 at this point.
4) Circle covering problem, but the small data set constraints have at most 3 circles. Trivial brute force!! Knocked that over to get myself briefly into the top 400, since I had taken so long on Q1. However it was clear that I needed to do another problem to get to round 3. (Or do the hard input for Q4… but that was beyond me…)
And here is where I lost the competition for sure.
3) I noticed a lot more people had solved the easy for this question instead of Q2 at this point, so I decided I would go for it. I wrote what I thought was going to be a solution for the small data set, but it turned out to be wrong. Fairly obviously wrong, so with 30 minutes remaining I converted it into a cached recursion to make sure it solved the problem. However, due to my brain-dead state I removed the greedy heuristic entirely, rather than just considering all equal greedy scenarios. The result was that the program ran in 10 minutes instead of a few seconds, and so I failed. (Looking at the leader board, getting the small out for Q3 was not sufficient anyway; you had to solve it prior to the 90-minute mark – so I wasn’t even close really.)
The real disappointment was that I spent the first 10 minutes looking at this problem going ‘I am sure I’ve seen this before, there is some cool way to do it’. What I didn’t notice was that this was a disguised variant of a problem from last year’s round 3! Specifically, one which I thought was impossible, and spent quite a bit of time learning the solution to because it was cool. The number of sets is related to the maximum bipartite matching over the non-constrained edges. I even thought maximum matching might be related, but I failed to recognise how similar it really was to last year’s problem. Admittedly, on further inspection it is a pretty good disguise, since the bipartite graph choice is much more obvious in last year’s problem, so I don’t feel too bad about missing it.
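For future reference, the core of an augmenting-path maximum bipartite matching is only a dozen or so lines. If I’m reading the construction right, you add an edge wherever one chart sits strictly below another, and the answer is then N minus the size of the maximum matching – that mapping is my guess, not something I’ve verified against an official solution:

```csharp
using System.Collections.Generic;

static class MatchingSketch
{
    // adj[u] lists the right-side vertices compatible with left-side vertex u.
    public static int MaxMatching(List<int>[] adj, int rightCount)
    {
        var matchedTo = new int[rightCount];
        for (int v = 0; v < rightCount; v++) matchedTo[v] = -1;

        int size = 0;
        for (int u = 0; u < adj.Length; u++)
        {
            var visited = new bool[rightCount];
            if (TryAugment(u, adj, matchedTo, visited)) size++;
        }
        return size;
    }

    private static bool TryAugment(int u, List<int>[] adj, int[] matchedTo, bool[] visited)
    {
        foreach (int v in adj[u])
        {
            if (visited[v]) continue;
            visited[v] = true;
            // v is free, or its current partner can be rematched somewhere else.
            if (matchedTo[v] == -1 || TryAugment(matchedTo[v], adj, matchedTo, visited))
            {
                matchedTo[v] = u;
                return true;
            }
        }
        return false;
    }
}
```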

Above I mentioned that doing Q3 was where I lost the competition, because doing Q2 would have gotten me through to the next round.
2) I looked at this problem when I first failed on Q3, but concluded that I probably didn’t have enough time to risk it, and I had just realised that a cached recursion for Q3 would probably work.
Ultimately I should have gone to this question immediately after Q4, because it really doesn’t look that hard; it is just one of those problems where coding errors consume time. I liked the problem, and given its forward-only nature, the fact that the environment was being modified as you progressed shouldn’t be the obstacle it first seemed.
The problem was a nice easy shortest path; the only difficulty was constructing the graph. Graph nodes were not simply locations in the maze, but also encoded what had been dug already. But at each node, the only digging which matters is anything done on the current row and the row immediately beneath you. Furthermore, the pattern of digging is irrelevant – it never helps to dig sparsely – so it can be simplified to the start and end points of the dug area. This is sufficient to solve the small data set; to solve the large, the trick is to realise that the digging of the row beneath you can be dropped from the node definition if you always transition to a new row between nodes. (That is, reduce the number of nodes by combining multiple digs into a single edge.)

Q4’s large data set is tricky. With 40 plants, there are 2^40 ways to divide them into 2 groups to which you could apply a minimum-radius calculation. A first guess would be to binary search the sprinkler radius with a ‘can it be done’ check. This appears to be the correct approach from looking at a solution, but the mechanism for determining the ‘can it be done’ defies my sleep-deprived comprehension. It appears that the candidate sprinkler locations can be constrained to being either directly above a plant, or at the extremities of the sort-of-midpoint line along which a single sprinkler covers a given pair of plants. I can see this as being plausible, but I certainly don’t understand why yet.
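The outer loop of that guess would look something like this – all of the difficulty is in the hypothetical Feasible predicate (can two sprinklers of the given radius cover all the plants?), which I’m leaving as a placeholder:

```csharp
using System;

static class RadiusSearchSketch
{
    // Binary search on the answer: feasible(r) must be monotonic in r.
    public static double SmallestRadius(Func<double, bool> feasible, double lo, double hi)
    {
        for (int iter = 0; iter < 100; iter++)   // plenty of iterations for 1e-6 precision
        {
            double mid = (lo + hi) / 2;
            if (feasible(mid)) hi = mid;
            else lo = mid;
        }
        return hi;
    }
}
```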

Experiential Compression

Had an Idea this morning.
Just jotting it down so I can call prior art when someone tries to patent it in the future 😛

Data compression comes in two kinds, lossless and lossy – PNG vs JPEG…

I propose a new kind of compression, Experiential Compression, which is technically a lossy format but perfect at the same time.

Step 1, Expose brain to the stimulus which is to be compressed.
Step 2, Record nerve impulses at some point in the neural stack. (Deeper is probably better for compression, but more difficult to achieve… Also more likely to fail to replay correctly due to brain plasticity.)
Step 3, (Optional) Compress nerve impulse recordings with lossless compression.

Decompression requires nerve impulses to be replayed direct into the brain. Kind of a problem right now but maybe not in future.

A more ambitious option would be to simulate the neural stack to determine the most compressible inputs which generate the same nerve signals. Sounds seriously tricky…

Simulating the neural stack is also an alternative to using an actual brain for steps 1 and 2.

I don’t know enough neuroscience to be sure, but I am guessing that this compression would be individualised – data recorded this way would only be good for the original brain. But in a neural integration scenario, it could reduce the storage required in neural implants, or the bandwidth required to talk to external storage, for individual playback at a later date.

Okay, crazy idea all typed up, back to work.

GCJ 09 Round 1C

Looking at the questions I expected a high cut-off for this one. However, Q1 plus the small data set of Q3 was apparently sufficient.

Q1 – I haven’t checked any answers but this seems trivial – count the number of unique symbols; that is your base (if there is only one symbol then it’s base 2). The first symbol maps to 1, the second to 0, and subsequent symbols to 2…n in left-to-right order. All that remains is to convert the resulting digits into a number, using a 64-bit int to avoid overflow.
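A rough sketch of that conversion:

```csharp
using System;
using System.Collections.Generic;

static class AllYourBaseSketch
{
    public static long MinimumValue(string symbols)
    {
        // First distinct symbol -> 1, second -> 0, then 2, 3, ... in order of appearance.
        var digitOf = new Dictionary<char, long>();
        long nextDigit = 2;
        foreach (char c in symbols)
        {
            if (digitOf.ContainsKey(c)) continue;
            digitOf[c] = digitOf.Count == 0 ? 1 : digitOf.Count == 1 ? 0 : nextDigit++;
        }

        long numberBase = Math.Max(2, digitOf.Count);   // a single distinct symbol still means base 2
        long value = 0;
        foreach (char c in symbols) value = value * numberBase + digitOf[c];
        return value;
    }
}
```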
Q2 – This appears trivial, although there was one special case I noticed right off the bat. Looking at another answer there is a second, one I strongly suspect I would have noticed during implementation. The velocity of the centre of mass is the average velocity of all particles, since they have the same mass. At that point you have a simple point-line distance equation to determine the closest location. (Admittedly I don’t remember the equation off by heart, but it’s 30 seconds to find it on MathWorld or somewhere…) The special cases are if the closest location is prior to the starting point (in which case you use the starting point) and if the average velocity is 0 (in which case you also take the starting point). The point-line equation gets a divide by zero if the average velocity is 0 anyway, so that special case shouldn’t be hard to spot. In any case I was surprised that this wasn’t required to pass on to the next round. I guess 1A and 1B had thinned the top levels enough that this round was easier to get through.
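Since I’ve already admitted to forgetting the equation once, here is the closest-approach calculation written out – just minimise |p0 + v0*t|^2 over t >= 0, with p0 and v0 being the average position and velocity of the swarm:

```csharp
using System;

static class ClosestApproachSketch
{
    // p0 and v0 are the centre of mass's initial position and (constant) velocity.
    // Minimising |p0 + v0*t|^2 gives t = -(p0.v0)/(v0.v0), clamped to t >= 0,
    // and t = 0 when v0 is zero (which also avoids the divide by zero).
    public static (double distance, double time) ClosestToOrigin(double[] p0, double[] v0)
    {
        double pv = 0, vv = 0;
        for (int i = 0; i < p0.Length; i++) { pv += p0[i] * v0[i]; vv += v0[i] * v0[i]; }

        double t = vv > 0 ? Math.Max(0, -pv / vv) : 0;

        double dist2 = 0;
        for (int i = 0; i < p0.Length; i++)
        {
            double x = p0[i] + v0[i] * t;
            dist2 += x * x;
        }
        return (Math.Sqrt(dist2), t);
    }
}
```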
Q3 – This is the real problem of the set, but the constraints on the small data set mean it is a trivial brute force. The problem feels familiar but I don’t remember where I saw it before. It is, however, a classic ‘brute force is too expensive and the seemingly obvious greedy solution is wrong’ problem. I looked at one of the solutions and they used dynamic programming, as is to be expected in that scenario, but it’s late enough that I couldn’t translate the DP back into a sensible description of an approach. I’ll have to think about it some more and update this later.

Edit: Okay, so I definitely remember this problem now…
The last time I saw a problem like this, this is what happened:
1) Identified the n! brute force as a failure.
2) Identified that after you select one prisoner to remove, either side is now an independent problem. This allows an algorithmic improvement on n!, but it’s still exponential, and I realised it was too slow.
3) Decided (incorrectly) that either the mean or median of a given section is always the right one, and wasted an hour trying variations on a theme to get it to work.

If I were to solve this problem correctly I would have to have a different step 3 (obviously):
3) Recognise that there are repeated sub-problems in the independent problems.
4) Add caches
5) Win (hopefully).
The cool DP solution I couldn’t work out immediately is a variation on this starting at step 4.
To go into step 4 in more detail, there are 3 different scenarios which have to be cached.
1) Minimum bribe for everything left of removed prisoner.
2) Minimum bribe for everything right of removed prisoner.
3) Minimum bribe for everything between 2 removed prisoners.
Using sentinels we can store all 3 scenarios in a single cache rather than needing 3 caches, but still it is kind of messy.
The cool DP solution realises that, having added sentinels, the problem can be restated as: one extra cell on each end also contains a prisoner, and those 2 prisoners are always the first 2 released – what is the minimum bribe now? Adjust that by the bribe paid when the first 2 are released and you have the answer to the original question. Or make it ‘what is the minimum bribe left to be paid for this section after each end is released’, and then no adjustment is required. The advantage of this new question is that every sub-problem is now a variant of type 3; there are never any type 1 or type 2 scenarios. This makes the DP much simpler to write, although slightly harder to understand…
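Something like this is how I picture that restated DP, with the two sentinel cells 0 and P+1 treated as already released:

```csharp
using System;

static class BribeSketch
{
    // P = number of cells; cells = sorted 1-based cell numbers of the prisoners to release.
    public static int MinBribe(int P, int[] cells)
    {
        int q = cells.Length;
        var pos = new int[q + 2];
        pos[0] = 0;                 // left sentinel, treated as already released
        pos[q + 1] = P + 1;         // right sentinel, treated as already released
        Array.Copy(cells, 0, pos, 1, q);

        var cache = new int?[q + 2, q + 2];

        // Minimum bribe still to be paid strictly between released positions l and r.
        int Cost(int l, int r)
        {
            if (r - l <= 1) return 0;
            if (cache[l, r].HasValue) return cache[l, r].Value;

            int best = int.MaxValue;
            for (int m = l + 1; m < r; m++)                 // choose the next release within (l, r)
                best = Math.Min(best, Cost(l, m) + Cost(m, r));
            best += pos[r] - pos[l] - 2;                    // everyone else still inside (l, r) gets a coin

            cache[l, r] = best;
            return best;
        }

        return Cost(0, q + 1);
    }
}
```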

46 Australians are through to round 2, a good 1.5% of contestants. 1C was a good round for Australia. Assuming (stupidly) that results are random, we should expect <0.4 Australians in the on-site finals 😛 It is very annoying that rounds 2 and 3 are at 2am local time. But anyway.

GCJ09 Round 1B

So looking through some of the stats, it appears that round 1B was easier than 1A, and a quick look at the questions would have me agreeing.

Q1 – a very simple question from a problem-solving point of view; what made it different was that reading the input involved some real parsing. Also, the format of the question lends itself to a tree data structure, which most people would have to roll for themselves. The only performance optimisation I can see is putting the animal attributes into a dictionary for fast lookup – looking at the constraints of the large data set, I’m not sure that is even needed, but it probably wouldn’t hurt. As well as solving the next problem, almost everyone who got through solved at least the small for this problem.
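Roughly how I imagine the parse and evaluation going – the exact token format is my assumption from memory, with the tree text padded so that parentheses become separate whitespace-delimited tokens:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class DecisionTreeSketch
{
    // Tokens are assumed to look like "(", "0.5", "cool", "(", "1.0", ")", ... .
    private class Node
    {
        public double Weight;
        public string Feature;      // null at a leaf
        public Node Yes, No;
    }

    private static Node Parse(Queue<string> tokens)
    {
        if (tokens.Dequeue() != "(") throw new FormatException("expected '('");
        var node = new Node { Weight = double.Parse(tokens.Dequeue(), CultureInfo.InvariantCulture) };
        if (tokens.Peek() != ")")
        {
            node.Feature = tokens.Dequeue();
            node.Yes = Parse(tokens);
            node.No = Parse(tokens);
        }
        if (tokens.Dequeue() != ")") throw new FormatException("expected ')'");
        return node;
    }

    // Walk the tree, multiplying weights; branch on whether the animal has the feature.
    private static double Probability(Node node, HashSet<string> features) =>
        node.Feature == null
            ? node.Weight
            : node.Weight * Probability(features.Contains(node.Feature) ? node.Yes : node.No, features);

    public static double CutenessProbability(Queue<string> treeTokens, IEnumerable<string> animalFeatures) =>
        Probability(Parse(treeTokens), new HashSet<string>(animalFeatures));
}
```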
Q2 – This looks very easy. Except for the case where the number is already entirely in non-ascending order from the left (in which case the answer is: reverse it, push all 0s to just after the first non-zero digit, and insert an extra 0 after the first digit), the next number can always be found by finding the first descending digit (starting from the right), swapping it with the next highest digit to its right, and sorting the rest of the digits to its right into non-descending order. Given this is an n log n operation, the large data set constraints are trivial. Basically everyone who got through solved this problem.
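A sketch of that procedure; the special case here builds the answer by sorting rather than reversing, which should come to the same thing:

```csharp
using System;
using System.Text;

static class NextNumberSketch
{
    public static string Next(string number)
    {
        char[] d = number.ToCharArray();

        // Find the rightmost digit that is smaller than the digit after it.
        int i = d.Length - 2;
        while (i >= 0 && d[i] >= d[i + 1]) i--;

        if (i < 0)
        {
            // Already the largest arrangement: smallest non-zero digit first,
            // then the extra '0', then everything else in ascending order.
            Array.Sort(d);
            int firstNonZero = Array.FindIndex(d, c => c != '0');
            var sb = new StringBuilder();
            sb.Append(d[firstNonZero]).Append('0');
            for (int k = 0; k < d.Length; k++)
                if (k != firstNonZero) sb.Append(d[k]);
            return sb.ToString();
        }

        // Swap with the smallest digit to the right that is still larger than d[i]...
        int j = d.Length - 1;
        while (d[j] <= d[i]) j--;
        (d[i], d[j]) = (d[j], d[i]);

        // ...then sort the tail into non-descending order.
        Array.Sort(d, i + 1, d.Length - i - 1);
        return new string(d);
    }
}
```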
Q3 – This appears to be the only algorithmically complicated problem in the set, and the real differentiator between those who got through and those who did well. Just solving the small data set here (in addition to the first 2 questions) was enough to get 130th place. My initial analysis of this question appears to be the correct solution, after looking at a couple of the top answers; the problem does take a bit of skill to implement correctly, though, given the amount of time you likely have left after doing the first 2. The correct approach is basically to perform a breadth first search starting at each node, recording the minimum depth needed to reach a given number ending (or starting) at a specific location. The input data set can be transformed into two independent directed graphs, but you might as well just treat it as one. The only trick with the breadth first search is that you need to terminate a branch if the number it reaches at a location has already been reached there at the same or a smaller depth (the equal-depth case still needs to be considered for answer ordering though). In order to satisfy the ordering constraints without an exponential backtrack to construct the solution, the easiest way is to store the best string for each possible number in a separate dictionary, and to store the best string for each location and number as well. This apparently performs okay under the constraints, and I suspect it is where I would have failed on this question if I had been competing in practice. I would likely not have considered the scenario of an entire grid of 1’s and + signs and being asked to find the answer for 250. Backtracking the breadth first search in that case would result in the code never finishing, as at each location it wouldn’t know which branch has the better sort order and so would have to try each one.
It is probably also possible to avoid constructing the strings and instead solve the second half of the problem as a separate problem; I suspect that is algorithmically superior too.
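A rough sketch of the search described above, for the small data set only – the checkerboard layout of digits and operators and the value bound are my assumptions from memory:

```csharp
using System;
using System.Collections.Generic;

static class SquareMathSketch
{
    // Assumes digit and operator cells alternate like a checkerboard and that no
    // queried value exceeds maxAbs in magnitude. Returns the shortest, then
    // lexicographically smallest (by ordinal comparison), expression per value.
    public static Dictionary<int, string> BestExpressions(string[] grid, int maxAbs)
    {
        int h = grid.Length, w = grid[0].Length;
        int[] dr = { -1, 1, 0, 0 }, dc = { 0, 0, -1, 1 };

        var best = new Dictionary<int, string>();
        var visited = new HashSet<(int, int, int)>();
        var frontier = new Dictionary<(int, int, int), string>();

        // Depth 1: every digit cell is an expression on its own.
        for (int r = 0; r < h; r++)
            for (int c = 0; c < w; c++)
                if (char.IsDigit(grid[r][c]))
                    Keep(frontier, (r, c, grid[r][c] - '0'), grid[r][c].ToString());

        while (frontier.Count > 0)
        {
            // Record values reached for the first time; ties at this depth keep the smallest string.
            var newValues = new Dictionary<int, string>();
            foreach (var kv in frontier)
            {
                var (_, _, v) = kv.Key;
                if (!best.ContainsKey(v)) Keep(newValues, v, kv.Value);
            }
            foreach (var kv in newValues) best[kv.Key] = kv.Value;
            foreach (var state in frontier.Keys) visited.Add(state);

            // Extend every expression by one operator cell and then one digit cell.
            var next = new Dictionary<(int, int, int), string>();
            foreach (var kv in frontier)
            {
                var (r, c, v) = kv.Key;
                for (int a = 0; a < 4; a++)
                {
                    int opR = r + dr[a], opC = c + dc[a];
                    if (opR < 0 || opR >= h || opC < 0 || opC >= w) continue;
                    char op = grid[opR][opC];
                    for (int b = 0; b < 4; b++)
                    {
                        int nr = opR + dr[b], nc = opC + dc[b];
                        if (nr < 0 || nr >= h || nc < 0 || nc >= w) continue;
                        int digit = grid[nr][nc] - '0';
                        int nv = op == '+' ? v + digit : v - digit;
                        if (Math.Abs(nv) > maxAbs) continue;            // can never be a queried value
                        if (visited.Contains((nr, nc, nv))) continue;   // already reached at a smaller depth
                        Keep(next, (nr, nc, nv), kv.Value + op + grid[nr][nc]);
                    }
                }
            }
            frontier = next;
        }
        return best;
    }

    private static void Keep<TKey>(Dictionary<TKey, string> d, TKey key, string s)
    {
        if (!d.TryGetValue(key, out var old) || string.CompareOrdinal(s, old) < 0) d[key] = s;
    }
}
```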

Google Code Jam 09 Early Rounds

So it appears I am through to round 2, not that that is amazing news or anything.

Qualifying round was pretty easy, solved all the problems – like 2392 other people.
7516 people qualified with a minimum score of 33, over 8000 people submitted at least one correct answer for a small data set. So quite a few people gave it a shot obviously.

The first question could be solved within the constraints by brute force. Although there was potential for optimisations of the average case, I couldn’t see a reliable worst-case improvement, and any such optimisations were too complicated to be bothered with anyway.
The second question was a bit more interesting. The constraints of the large input set were enough that the simplest solution would be unlikely to execute quickly enough (I think…). I immediately thought of using a disjoint set data structure, but I was too lazy to open up TMD.Algo and copy my code. It would have been a simpler solution, but in my laziness I went with a DFS-based method, which was possible since each disjoint set had a single, easily identifiable root node. Ultimately the running time for both approaches is O(n), where n is the land area w*h; the disjoint set data structure would have had a better constant though. (Since I did not cache which way a given square flowed, with my DFS approach each square might be asked its flow direction once per neighbour.)
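For reference, the disjoint set structure I was too lazy to copy is only a handful of lines anyway (a minimal stand-alone version with path compression, not the TMD.Algo implementation):

```csharp
// Union each cell index with the neighbour it drains to; afterwards every cell
// with the same representative belongs to the same drainage basin.
sealed class DisjointSet
{
    private readonly int[] parent;

    public DisjointSet(int n)
    {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    public int Find(int x) => parent[x] == x ? x : parent[x] = Find(parent[x]);

    public void Union(int a, int b)
    {
        a = Find(a);
        b = Find(b);
        if (a != b) parent[a] = b;
    }
}
```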
The third question was a straightforward, stock-standard dynamic programming problem (or cached recursion), although the small data set could conceivably be brute forced.
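The DP in question is short enough to jot down – from memory the answer wanted is the last four digits of the count, hence the modulus:

```csharp
static class WelcomeSketch
{
    // dp[j] = number of ways to form the first j characters of the target as a
    // subsequence of the text read so far, kept modulo 10000.
    public static int CountSubsequences(string text, string target = "welcome to code jam")
    {
        const int Mod = 10000;
        var dp = new int[target.Length + 1];
        dp[0] = 1;                                   // one way to form the empty prefix

        foreach (char c in text)
            for (int j = target.Length; j >= 1; j--) // backwards so each character is used at most once per way
                if (target[j - 1] == c)
                    dp[j] = (dp[j] + dp[j - 1]) % Mod;

        return dp[target.Length];
    }
}
```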

Round 1A – I had a d’oh moment just after finishing this one, when I realised that 40 choose 20 is much smaller than long.MaxValue. I spent the last 10 minutes trying to reorganise my code to avoid an overflow which probably would never have occurred anyway. Had I realised sooner, I probably could have easily gotten another 30 points, taking my 37/100 to 67. It doesn’t matter though: solving the small data set for the first problem in less than 45 minutes (including time penalties), or solving anything else, was sufficient to progress as part of the Round 1A top 1000. My 37 (in 2:20) was sufficient for 403rd place; 67 would have gotten me to about 200th, a much more respectable number given 1A was probably missing a lot of the good European-time-zone contestants, who will likely compete in 1B. 1B is in less than 11 hours now, but I think I’ll avoid staying up late until I have to for round 2. (I could do the problems as an observer and try to keep to the time limit to practice more realistically, but having just come back from overseas, some sleep would be an idea I suspect.)

To the questions – much harder than the qualifying round of course:
Question 1 – Happy numbers – the small data set for this should have been pretty easy: with a maximum of 3 constraints to be met simultaneously, the maximum answer is easily small enough to be brute forced. I was pretty happy with my simple solution though. Even with up to 9 constraints as in the large data set (although base 2 is irrelevant, since every base-2 number greater than 0 is happy), you can brute force the full set of all 256 (ignoring base 2) constraint sets pretty quickly – I think the maximum answer was under a hundred million. Then, either having saved them to disk (I didn’t bother) or computing them once up front, you can answer all 500 inputs in a moment. (Given there were 500 inputs and 511 different possible inputs, you were pretty much guaranteed to be asked for any given combination, so calculating them all at once was an obvious win.)
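The per-base happiness test at the heart of that brute force is just cycle detection:

```csharp
using System.Collections.Generic;

static class HappySketch
{
    // N is happy in base b if repeatedly replacing it with the sum of the squares
    // of its base-b digits eventually reaches 1 (rather than entering a cycle).
    public static bool IsHappy(long n, int numberBase)
    {
        var seen = new HashSet<long>();
        while (n != 1 && seen.Add(n))
        {
            long next = 0;
            for (long m = n; m > 0; m /= numberBase)
            {
                long digit = m % numberBase;
                next += digit * digit;
            }
            n = next;
        }
        return n == 1;
    }
}
```

The precompute then just walks N upwards from 2, recording which bases each N is happy in, until every constraint set has its smallest answer.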
Question 2 – I didn’t do this one, but the small data set looked doable by brute force. It was just complicated enough not to bother with, given the third question and how slow I am. I haven’t looked at any of the successful solutions, but I didn’t see any obvious way to do the large data set at the time. Thinking for a moment now, I see that the problem really is a shortest path graph problem after all: despite the edge lengths being different depending on when you arrive at a given node, the shortest path can still be determined using simple repeated relaxations. I’m not 100% sure if convergence is guaranteed to be in O(EV), but it certainly should be close enough if not. (Had a look at a solution now, and if you use a time-based priority queue you can do it in approximately O(V log V + E). Since there are no negative edges, a priority-first search is guaranteed to only need to visit each vertex once; you just need to allow for the potential overhead of the priority queue.)
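That observation in code is just Dijkstra with arrival time as the priority; the earliestArrival callback (the earliest you can be at v having stood at u at time t, waiting included) is the problem-specific part and is only a placeholder here:

```csharp
using System;
using System.Collections.Generic;

static class TimeDijkstraSketch
{
    // Priority-first search where "distance" is the earliest arrival time. Assumes
    // waiting never hurts (arrival times are monotone in departure time), which is
    // what makes plain Dijkstra valid here. Needs .NET 6+ for PriorityQueue.
    public static long[] EarliestArrivalTimes(int n, List<int>[] adj,
        Func<int, int, long, long> earliestArrival, int start)
    {
        var best = new long[n];
        Array.Fill(best, long.MaxValue);
        best[start] = 0;

        var queue = new PriorityQueue<int, long>();
        queue.Enqueue(start, 0);

        while (queue.TryDequeue(out int u, out long t))
        {
            if (t > best[u]) continue;               // stale queue entry
            foreach (int v in adj[u])
            {
                long arrive = earliestArrival(u, v, t);
                if (arrive < best[v])
                {
                    best[v] = arrive;
                    queue.Enqueue(v, arrive);
                }
            }
        }
        return best;
    }
}
```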
Question 3 – Probability questions, oh how I hate them. With an unusually generous tolerance of 10^-5 I thought I could possibly get away with randomised trials. It quickly became apparent that convergence was too slow even for the small data set, so I went back to doing it the correct way. It’s so rare that I actually have to do real math these days that formulating a few simple relationships took me almost an hour, but eventually I got the right formula for the central part of the recursion and the small data set was solved. The hardest part was working out the correct self-recurrence relation to factor out the infinite recursion case. I already had all the caching I needed to solve the large data set, but an incorrect assumption that choose-from-40 values were too large for my 64-bit integer code to handle led me way down a garden path of wasted time. I do think I’ve seen an almost identical question before; I can’t remember if it was last year’s code jam or somewhere else. (Side note: this is a dynamic programming problem with running time O(n^2); if it wasn’t for overflow and accuracy, the constraints could have been made much higher.)

One note here: I did discover that the latest copy of TMD.Algo has a typo in the Choose implementation which makes it completely useless. That’s what I get for not having any unit tests on the combinatorics library yet. I guess I should release a new version at some point. I should also add a cached choose function which derives its values using the Pascal’s triangle method – slower for a single calculation, but with less risk of overflow.
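Something like this is what I have in mind for the cached version – values are built by addition only, so there is no intermediate factorial to overflow:

```csharp
static class ChooseSketch
{
    // choose[n][k] built via Pascal's triangle: C(n, k) = C(n-1, k-1) + C(n-1, k).
    public static long[][] BuildTable(int maxN)
    {
        var choose = new long[maxN + 1][];
        for (int n = 0; n <= maxN; n++)
        {
            choose[n] = new long[n + 1];
            choose[n][0] = choose[n][n] = 1;
            for (int k = 1; k < n; k++)
                choose[n][k] = choose[n - 1][k - 1] + choose[n - 1][k];
        }
        return choose;
    }
}
```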
I also think I should write up the self-recurrence calculation for expectation values, since it is non-obvious.
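Roughly, the factoring I mean is this: if the recursion for an expectation loops back on itself, E = 1 + p*E + sum_i p_i*E_i (one more step taken, probability p of ending up back in the same state, probabilities p_i of moving to genuinely new states), then rearranging gives E = (1 + sum_i p_i*E_i) / (1 - p), which removes the infinite self-recursion and leaves an ordinary cached recursion over the E_i.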