GCJ15 R2

The all-important t-shirt round: the top 500 advance, but 1000 t-shirts were on offer.  Four problems for round 2; in the end advancing meant solving all the small inputs plus the easiest large input, or one fewer small in a very fast time.  Even with a slow time that was enough to get the t-shirt; 2 smalls and a large was also okay if you were fast, or if the second small was the hardest problem's.  I think I stood a decent chance of advancing if I had been competing this year, and I definitely would have gotten the t-shirt…

Q1) Given a 2D grid where some cells propel you in a direction, what is the minimum number of cells with directions that need to be changed to ensure that, regardless of starting location, you never fall off the edge?  (Note it's not always possible to solve, so return 'impossible' if there is no solution.)

A1) I didn't immediately get this problem, but when I looked at it again I realized it was trivial.  The contestants seem to have agreed, with a very high success ratio and a high solve rate.

In the end it boils down to the question: why would you fall off the edge at all?  Because a cell is pointing off the edge with no other directed cell in its path.  So for each such cell, just point it at another cell with a direction.  If you can't, the problem is impossible and you should return as such.  Otherwise you are done; just count how many cells you had to change.  The run time of the simple brute force is obviously no more than O(WH*max(W, H)), which given the constraints is trivial.  A tighter bound is actually O(WH): every cell is traversed at most 5 times, once to find the cells with directions, and then 4 more times by being reached from each of the 4 directions by other cells.  So even if the problem had allowed much larger grids, it would still have run in time.
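A minimal sketch of that brute force, assuming arrows are given as '^v<>' characters and empty cells as '.' (my own representation, not the exact input format):

    def min_changes(grid):
        # Returns the number of arrows that must be redirected, or None if impossible.
        H, W = len(grid), len(grid[0])
        DIRS = {'^': (-1, 0), 'v': (1, 0), '<': (0, -1), '>': (0, 1)}

        def hits_arrow(r, c, dr, dc):
            # walk from (r, c) in direction (dr, dc); True if another arrow is met
            r, c = r + dr, c + dc
            while 0 <= r < H and 0 <= c < W:
                if grid[r][c] in DIRS:
                    return True
                r, c = r + dr, c + dc
            return False

        changes = 0
        for r in range(H):
            for c in range(W):
                if grid[r][c] not in DIRS:
                    continue
                if hits_arrow(r, c, *DIRS[grid[r][c]]):
                    continue                    # already safe
                if any(hits_arrow(r, c, dr, dc) for dr, dc in DIRS.values()):
                    changes += 1                # repoint it at some other arrow
                else:
                    return None                 # no arrow reachable in any direction
        return changes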

Q2) Determine the minimal time to fill a container of volume V with water at temperature X, given a bunch of sources with flow rates F_i and temperatures T_i.

A2)  The small input here has only 2 flow sources, and the problem gives the formula for mixing temperatures.  Solve simple simultaneous equations in 2 variables and you are done.  The inputs are given to 4 decimal places, and the answer needs 1 part in 10^6 accuracy, so doubles sound like they would be fine here.  But this isn't quite like other such floating point problems.  This one has a 'can't be done' answer: when both temperatures are on the same side of the target temperature.  So you need to be sure that you don't introduce a numerical error which, in the case where one of the input temperatures is exactly equal to the desired temperature, makes you incorrectly return impossible.  One solution is to ignore the fact that the question specifies things to 4 decimal places and just multiply everything by 10000.  These factors of 10000 cancel out in determining the time, so you can switch to mostly integer arithmetic, converting back to floating point only for the final division.   You have to be careful with overflows though.
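For the two-tap case the simultaneous equations collapse to a closed form.  A floating-point sketch (ignoring the integer-scaling trick just described; the signature is my own):

    def fill_time_small(V, X, F1, T1, F2, T2):
        # volumes must satisfy v1 + v2 = V and T1*v1 + T2*v2 = X*V
        if T1 == T2:
            return V / (F1 + F2) if T1 == X else None   # None = impossible
        v1 = V * (X - T2) / (T1 - T2)
        v2 = V - v1
        if v1 < 0 or v2 < 0:
            return None   # X lies outside what the two taps can mix to
        # run both taps flat out; the tap needing proportionally longer governs
        return max(v1 / F1, v2 / F2)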

The large input is too large to brute force, which suggests a greedy approach.  My best guess was to order the sources by temperature.  I took a look at some passing solutions and one of them did exactly this.  Order by temperature, then slowly prune either the hottest or coldest source until the output switches from being too hot to too cold or vice versa.  Then it's like solving the small input, just with one high-flow mixed-temperature source made of everything remaining, plus the last source you removed.  I found it interesting that they didn't bother to solve the simultaneous equations though; they just binary searched for the flow rate that produced the right output temperature.  (Since you can simulate lower flow rates just by having that tap turned off for part of the time.)

Another solution I saw instead binary searched over 'can it be done by time t'.  They also sorted by temperature, but instead of pruning backwards one by one, they built outwards from the middle.  Given the time t, they had actual volumes rather than rates, and could determine whether the volume from a given tap could be mixed with some from a tap on the other side of the target to create the right temperature.  As each tap was used up, they moved to the next one outward.  Eventually all of the colder-than-target taps or all of the hotter-than-target taps get used up, and the rest is discarded.  If the total used volume is greater than or equal to the target, success, so binary search a smaller time; otherwise a longer time is needed.
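A sketch of that feasibility test, with taps as (flow, temperature) pairs in exact Fractions to dodge the precision trap from the small input (the middle-outwards pairing follows the description above; the rest of the shape is my own guess):

    from fractions import Fraction

    def feasible(taps, X, V, t):
        # can we draw volume V at temperature X if every tap runs at most time t?
        total = sum(F * t for F, T in taps if T == X)   # on-temperature taps are free
        cold = sorted(((T, F * t) for F, T in taps if T < X), key=lambda p: -p[0])
        hot = sorted(((T, F * t) for F, T in taps if T > X), key=lambda p: p[0])
        i = j = 0
        while i < len(cold) and j < len(hot):
            Tc, vc = cold[i]
            Th, vh = hot[j]
            # volumes mix to exactly X when vc*(X-Tc) == vh*(Th-X)
            need_hot = vc * (X - Tc) / (Th - X)
            if need_hot <= vh:
                total += vc + need_hot
                hot[j] = (Th, vh - need_hot)
                i += 1                # cold tap exhausted, move outwards
            else:
                need_cold = vh * (Th - X) / (X - Tc)
                total += vh + need_cold
                cold[i] = (Tc, vc - need_cold)
                j += 1                # hot tap exhausted, move outwards
        return total >= V

The outer loop is then a binary search for the smallest t where this returns true.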

Q3) Given a set of sentences which are in 'English' or 'French', determine the minimum number of words that must be in common between English and French.  The first two sentences are labelled; the rest are unknown.

A3) The small input looks like a trivial brute force, but unless you start by replacing strings with numbers, the hash/equality cost is quite high and will probably stop your code from running in time.  So relabel everything to numbers.   Given the constraints there will be no more than a few hundred distinct words, which you can number consecutively 0 to N, so rather than hash lookups you can just use indexed arrays.  Enumerate the 2^18 scenarios, adding to the counts of times you've seen each 'word' in either the English or French arrays.  Then sweep both arrays looking for any index which is positive in both.
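A compact sketch of that brute force.  I've used sets of relabelled ints rather than count arrays, which is enough to show the shape; the indexed-array trick matters more in a language where the inner loop is the bottleneck:

    def min_common_small(sentences):
        # sentences[0] is known English, sentences[1] known French, the rest unknown
        words = {}
        coded = [[words.setdefault(w, len(words)) for w in s.split()] for s in sentences]
        best = len(words)
        unknown = len(coded) - 2
        for mask in range(1 << unknown):        # bit i set: sentence i+2 is French
            eng, fre = set(coded[0]), set(coded[1])
            for i in range(unknown):
                (fre if (mask >> i) & 1 else eng).update(coded[i + 2])
            best = min(best, len(eng & fre))
        return best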

The large input however had me stumped.  My best guess was some kind of maximal matching or flow graph, but I couldn't see how to construct it.  Looking at other solutions, it appears a max-flow graph was the answer.

Each sentence has two nodes, and each distinct word has two nodes.  There is unlimited flow from the first to the second node of each sentence, but only unit flow between the first and second nodes of each distinct word.  Then connect the second node of each sentence to the first node of each distinct word in that sentence, and vice versa – both with unlimited flow.  The answer is then the maximum flow from the first node of the first sentence to the second node of the second sentence.
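A self-contained sketch of the construction, paired with a plain Edmonds-Karp max flow (node naming and the INF constant are mine):

    from collections import defaultdict, deque

    def max_flow(g, s, t):
        # g: dict of dicts of residual capacities; BFS augmenting paths
        flow = 0
        while True:
            parent = {s: None}
            q = deque([s])
            while q and t not in parent:
                u = q.popleft()
                for v, cap in g[u].items():
                    if cap > 0 and v not in parent:
                        parent[v] = u
                        q.append(v)
            if t not in parent:
                return flow
            path, v = [], t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            aug = min(g[u][v] for u, v in path)
            for u, v in path:
                g[u][v] -= aug
                g[v][u] += aug
            flow += aug

    def min_common_large(sentences):
        INF = 10 ** 9
        g = defaultdict(lambda: defaultdict(int))
        word_sets = [set(s.split()) for s in sentences]
        for w in set().union(*word_sets):
            g[('w1', w)][('w2', w)] = 1        # unit flow through each distinct word
        for i, ws in enumerate(word_sets):
            g[('s1', i)][('s2', i)] = INF      # unlimited flow through each sentence
            for w in ws:
                g[('s2', i)][('w1', w)] = INF
                g[('w2', w)][('s1', i)] = INF
        return max_flow(g, ('s1', 0), ('s2', 1))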

I don't understand why this works though: maximum flow for a minimal answer.  Even the meaning of each node doesn't seem obvious.  The two nodes per sentence seem redundant since only unlimited flows are involved; probably just a convenience rather than having dedicated source and sink nodes for the first two sentences.  A flow between the two nodes of a word means it must be both English and French.

Looking at the case of just the input two sentences: every word in common allows a flow which goes first node of first sentence, second node of first sentence, first node of word, second node of word, first node of second sentence, second node of second sentence.  Words not in common don't allow flows because they just form a cycle with their parent sentence.  Next simplest example: a one-word third sentence whose word is in common with English (or French) but not both.  It gets ignored.  Good.  A three-word sentence with two words in common with English and one in common with French: one more flow, from the first sentence through a word in common with English, to the third sentence, then over to French via the word in common with that.  The second potential flow gets stuck at the third sentence.  It all works, but the why still escapes me.

 

Q4) Given a cylinder labelled with a rectangular grid of positive numbers, such that each number has exactly that many neighbours of the same value, determine how many such labellings exist, modulo 1 billion and 7.  Two labellings are considered distinct only if they aren't rotations of each other.

A4) Brute force on the small input here seems scary, but a backtracking algorithm stands a good chance of running in time because the problem is so restrictive.  Some smarts might still be needed.  Pure brute force is 3^36, which is obviously too slow.  (4 and higher can't be on the board: 5+ trivially, since a cell only has 4 neighbours, and 4 because it implies an infinite board.)

Looking more deeply unlocks this problem.  There is a limited number of 'units' which a solution can be made up of: a row of 2's where nothing above or below is a 2, and 2 rows of 3's where nothing above or below is a 3.  These two are the obvious ones given in the question itself.  The trick is to come to understand that there are only 3 more building blocks.  They all involve just 1's and 2's.

122
122

112222
222112

1222
1212
2212

Each can only be placed next to itself in a row, and each can be rotated a number of places up to its width, but again rotated versions can only be next to identically rotated versions.

The problem then becomes a DP over rows: 2 rows of 3's separating one of 4 scenarios of 2's.  One scenario uses one row; two use 2 rows but have 3 or 6 rotation variants; one uses 3 rows with 4 variants.  The trick is to make sure you don't over-count.  Each variant is like a rotation, so if you only use one pair of rows of mixed 1's and 2's, you have actually over-counted by exactly 3 or 6 times.  One solution is to assume that you are always going to over-count by a factor of 12, so divide the result by 12 at the end, and at the deepest level of your recursion force the over-counting to actually be 12.  If you've not used any mixed 1/2 block, return 12; if you've used only the first kind, return 4; only the second kind (or second and first), return 2; only the third kind, return 3; both the first and third, or second and third (or all 3), return 1.  The DP is then on rows plus 4 bools: whether you've ever used each of the 3 types of mixed 1/2 blocks, and whether the last set of rows was all 3's or not.

GCJ15 R1C

Three easier questions this round, although I have to say I found the 'easy' question the hardest…  The advancing cut-off was either of the first 2 problems plus the smalls from both the others, with a few minutes to spare.  Solving the 3rd problem and the smalls of the others was worth 1 more point, and ~400 places as a result.

Q1) In a game of finding all the locations of a 1xW rectangle in an RxC space, where before each guess the opponent is free to move the 1xW rectangle anywhere consistent with the previous answers, determine the minimum number of guesses needed to guarantee victory, assuming the opponent plays to maximize your guesses.

A1) So I spent a while rabbit-holing in entirely the wrong direction.  I seem to have 'divide and conquer' on the brain; this is the second time I've gotten stuck on a problem this year because I was convinced that dividing the problem in half was optimal.

So in practice this question boils down to forcing the opponent to have no options as fast as possible.  This starts by reducing the places the rectangle can be.  The fastest way to do this is obviously (in hindsight) to guess W cells away from an edge of where it could be.  The opponent will answer no unless it has to; a yes quickly leads to a smaller number of guesses.  Each no answer eliminates W spots.

When you are down to one last row with 2*W-1 or fewer possible spots, the strategy changes.  With exactly W spots you know everything, so just guess every spot for the win.  Between W+1 and 2*W-1 spots inclusive, you know some subset of the middle.  Guess immediately next to that known subset; the opponent will answer no (or possibly yes if there is more than one unknown spot left – it makes no difference to them if we assume ideal play).   Once they answer no, we know everything and can guess all the remaining spots.

In the end this entire problem boils down to C/W*R + W – (C%W==0 ? 1 : 0).  Each row other than the last takes C/W guesses to eliminate entirely.  The last takes C/W – 1 guesses to get down to the last section, then W+1 guesses to pin everything down – with one exception: if C%W == 0, the last section is exactly W spots, so only W guesses are required to pin everything down.
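As code the closed form is a one-liner (integer division, per the derivation above):

    def min_guesses(R, C, W):
        # C // W guesses per row; the last row swaps its final elimination guess
        # for W + 1 pin-down guesses, or just W if the last section is exactly W wide
        return (C // W) * R + W - (1 if C % W == 0 else 0)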

In the end I wonder why they limited the large input to just 20 x 20 – I guess it leaves open a DP solution rather than just the greedy.

Q2) Given a list of letters (size N) to choose from at random, a length of output (size S) and a 'target' sequence (size T), determine the maximum number of times the target sequence could appear in the output, then determine the difference between that and the expected number of times it appears if the output is generated at random.

A2) So this is a deceptively easy problem.  I usually don't like probability problems, but in this case the details clearly point out that multiple overlapping instances all count.  Hence the probabilities are easily calculated.  The list of letters can be turned into base probabilities by summing 1/N for each occurrence of a letter.  The target is then converted into a probability by taking the product of those base probabilities over each letter in the target.  The full expectation value is then that 'target' probability times S-T+1, since by linearity of expectation each possible starting point simply contributes additively.

The maximum number of times the target sequence can appear is determined by the maximal non-complete overlap of the target sequence with itself, which can be found with a simple nested for loop.  The maximum is then (S – T)/(T-overlap) + 1: one full length T, then as many T-overlap sections as will fit.
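A sketch putting both halves together (names and signature are mine):

    from collections import Counter

    def banner_difference(letters, target, S):
        N, T = len(letters), len(target)
        counts = Counter(letters)
        p = 1.0                           # probability a given position starts a match
        for ch in target:
            p *= counts[ch] / N
        expected = p * (S - T + 1)
        overlap = 0                       # largest proper self-overlap of target
        for k in range(1, T):             # smallest shift gives the largest overlap
            if target[k:] == target[:T - k]:
                overlap = T - k
                break
        max_count = (S - T) // (T - overlap) + 1
        return max_count - expected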

Q3) Given a list of values, of which you can have up to c of each, determine the minimum number of extra values needed in order to be able to create every possible total between 1 and V.

A3)  I found this question easy, but only because I've had previous experience with this kind of question.  It's a greedy problem.

Start off having used none of the provided values; with this you can make 0.  Take the smallest unused provided value: if it is at most what you can currently make + 1, add c times that value to what you can currently make.  Otherwise add c * (what you can currently make + 1) and increment your counter of extra values.  Repeat until you can make V or higher.

One corner case to be careful of for the large input: while the target V can never be more than 1 billion, and the provided values are always small, the extra values you have to add can be very close to 1 billion, so adding c * (extra value) can overflow a 32 bit integer very easily.
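A sketch of the greedy; Python's integers can't overflow, but in a 32-bit language the running total needs 64 bits, per the note above:

    def extra_values_needed(values, c, V):
        reachable = 0                 # every total in [0, reachable] is makeable
        extras = 0
        i, values = 0, sorted(values)
        while reachable < V:
            if i < len(values) and values[i] <= reachable + 1:
                reachable += c * values[i]        # provided value extends the range
                i += 1
            else:
                reachable += c * (reachable + 1)  # invent the best possible new value
                extras += 1
        return extras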

TCO15 R1C

TCO hasn't really been drawing the crowds this year – another relatively low turnout for R1C.  Only 815 people registered out of 2500 available slots, with 750 to advance (in theory), but again the positive score requirement kicked in and only 622 will advance as a result.  So including byes there are only ~2150 people advancing to round 2.  The problems were pretty easy this time round, although problem 2 wore a pretty good disguise.  I think I could have solved all 3.

Q1) In a given 2xN grid, no two filled-in cells are touching by edge or corner.  Without clearing any of the existing filled-in cells, what is the maximum number of cells that can be filled in without any 2 touching by edge or corner?

A1) So this problem doesn't look that difficult, but it becomes trivial once you realize that in a 2xN grid where no two filled cells can touch by edge or corner, nothing changes if you move every existing filled cell to the first row and delete the second row entirely.  Obviously at most one cell can be filled in any column, and if it is, both of its neighbouring columns have to be empty.

Once the problem is reduced to 1xN, it's trivially a greedy left-to-right fill-in while both neighbours aren't filled in – then count how many are filled in.
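A quick sketch, taking the collapsed strip as a list of booleans (a column counts as filled if either of its two original cells was):

    def max_filled(strip):
        cells = list(strip)
        for i in range(len(cells)):
            left = cells[i - 1] if i > 0 else False
            right = cells[i + 1] if i + 1 < len(cells) else False
            if not cells[i] and not left and not right:
                cells[i] = True       # greedy: fill whenever legal
        return sum(cells)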

Q2) Given a rooted tree, and defining a path as a sequence of one or more distinct vertexes where consecutive elements in the sequence are connected by an edge, determine the maximum number of paths that can be selected at once such that no element of any one path is an ancestor of any element of any other.

A2)  Key here is that a path is a sequence of one or more vertexes, but every vertex you add to a path can only decrease the number of other paths that could be selected without ancestor commonality.  Therefore the whole paths thing can be disregarded; the question is just how many vertexes you can select such that none is an ancestor of any other.

Given a non-leaf node in a tree, if it is selected, none of its (direct or indirect) children can be selected.  So if it has more than one direct child, it is obviously better to replace it with its children instead.  If it has only one child, replacing it with its child doesn't make anything worse, and it potentially leads to a further replacement if there is a branching factor greater than 1 anywhere among its indirect children.  Ultimately this means there is no point selecting anything other than leaf nodes.  Leaf nodes also cannot possibly be ancestors of other leaf nodes, so the problem reduces down to counting the leaf nodes in the tree.
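With a parent-array representation (an assumed input shape, not necessarily the actual one), counting leaves is a one-liner:

    def max_paths(parent):
        # parent[i] is the parent of node i, with parent[root] = -1;
        # leaves are exactly the nodes that never appear as anyone's parent
        return len(set(range(len(parent))) - set(parent))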

Q3) For a list of boolean values, its 'value' is equal to the number of distinct subsections which consist entirely of alternating 1's and 0's.  A subsection of length 1 is considered trivially 'alternating'.  For a given n and k, determine how many possible lists of boolean values of length n have 'value' k.

A3) This problem is clearly the hardest of the 3, but it has a fairly obvious dynamic programming style solution.  A sequence of length i ending with an alternating section of length j, with existing value v, can either be extended to length i+1 with alternating length 1 and value v+1, or to length i+1 with alternating length j+1 and value v+j+1 (the j+1 new alternating subsections all end at the new element).  So if you have x such sequences, you add x to both of the destination cells.  Start with a seed of length 1 with alternating section length 1 and value 1.  Each target cell is always strictly greater in i and v, so iterate over j last.

Note that I didn't talk about 0 or 1 here.  You don't need to: every sequence has a complement of the same value obtained by swapping all 0's to 1's and vice versa, so the basic program's output just needs to be doubled.  Then finally the n and k from the original input map to i and v, and you sum over all j.
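A sketch of that DP, fixing the first bit and doubling at the end (the v+j+1 transition is as derived above; add a modulus if the problem asks for one):

    from collections import defaultdict

    def count_lists(n, k):
        # dp[(j, v)]: sequences of the current length whose maximal alternating
        # suffix has length j and whose value so far is v
        dp = defaultdict(int)
        dp[(1, 1)] = 1
        for _ in range(n - 1):
            ndp = defaultdict(int)
            for (j, v), x in dp.items():
                if v >= k:
                    continue                     # value only ever grows, so prune
                ndp[(1, v + 1)] += x             # repeat the last bit: new run of length 1
                ndp[(j + 1, v + j + 1)] += x     # alternate: j+1 new subsections end here
            dp = ndp
        return 2 * sum(x for (j, v), x in dp.items() if v == k)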

GCJ15 R1B

This was a round of relatively tough problems.  In the end, solving the first and the small of the second was sufficient to advance, so long as your time penalty wasn't excessive.

Q1) Given the operations 'increment' and 'reverse digit order', what is the minimum number of steps to get from 1 to x?

A1) So the small input can be brute forced using a breadth first search, but why this is true is one of the key insights towards the large input, which cannot be brute forced.

The danger with any breadth first search is that the search space might explode.  But in this case a quick inspection finds that a search for x never needs to consider values greater than 10^(ceil(log10(x))).  Of the operations available, only the increment can increase the number of digits, so in order to get to a value in the next tier, you have to first get to 999…  It is therefore interesting to see how many steps it takes to get to each tier.  Inspecting the BFS, it takes 1 to get to 1, 9 more to get to 10, 19 more to get to 100, 109 more to get to 1000, and 199 more to get to 10000.  This pattern leads to the insight of what the minimum number of steps is to get from one tier to the next: first increment until the last half of the digits are all 9, then reverse, then increment until the next tier is reached.
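A sketch of the small-input BFS, assuming that reversing a number ending in 0 simply drops the resulting leading zeros, and capping the search one tier above x per the observation just made:

    from collections import deque

    def min_steps(x):
        limit = 10 ** len(str(x))     # never worth climbing above x's tier
        dist = {1: 0}
        q = deque([1])
        while q:
            v = q.popleft()
            if v == x:
                return dist[v]
            for nxt in (v + 1, int(str(v)[::-1])):
                if nxt <= limit and nxt not in dist:
                    dist[nxt] = dist[v] + 1
                    q.append(nxt)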

Having made this observation, it is natural to jump to a greedy approach: calculate the number of steps for each tier, then on the last tier increment until the last half of the digits match the start of the desired number, reverse, and then increment until the desired number is reached.  There is one corner case however.  If the target number ends in zeros, the reversal lands you on a number ending in 1, just above the target, and you can't go backwards.  The natural fix here is to aim at the first half of the digits minus 1, reversed, as the last half of the digits.  So for a target of 9900 the first step is to get to 1089 (which reverses to 9801, and 99 increments then reach 9900).

This just leaves proving the approach is right.  The greedy algorithm is pretty obviously the minimum number of steps which stays within a tier, but we need to rule out going up a tier and coming back down by reversing a number ending in zero.  This falls out from the number of steps the greedy algorithm takes to reach the worst-case numbers within a tier.  For anything that doesn't end in half-zeros, the number of steps is obviously less than the number of steps to get to the next tier, since each sub-part of the algorithm takes fewer steps.  This leaves 90, 900 and 990 as the potential worst case scenarios.  The first takes 18 steps (less than 19) from 10.  The second takes 108 steps (compared to 109) from 100.  Finally the last takes 99 steps (compared to 109) from 100.  So going up and then down is never cheaper.

Q2) Given an RxC grid in which N squares must be filled, what is the minimum number of edges shared between filled squares?

A2) The small input is only 15 cells, so brute force simulation is nice and easy.   Conveniently 15 is also 3×5, which ensures good coverage of the main corner case.

The large input seems a natural fit for a greedy approach: checkerboard coverage first, then fill in any empty corner locations (for 2 shared edges each), then any empty edge locations (for 3 each), then the centre (for 4 each).  But this fails.  Comparing brute force vs greedy, the differing scenario is in 3×5.

Greedy for N=12 gives:

xxxxx
xx.xx
x.x.x

Brute force on the other hand gives:

xxxxx
x.x.x
xx.xx

Greedy scores 12, brute force scores 11.

At this point I have to say I was stumped.  I started to consider DP style approaches, but they were all very computationally expensive; the large input was not plausible.  So I looked at one of the submitted solutions.  Now, staring at the cases I presented above, it seems obvious, but maybe it would have been more obvious if I had presented N=11.

Greedy vs Brute-force for N=11.

xxxxx xx.xx
xx.x. x.x.x
x.x.x xx.xx

For an odd-by-odd grid, there are 2 ways to tile checkerboard style, and the maximal one is not always the best starting point for the greedy algorithm.  A single extra spot has a cost of 3 in the maximal grid, but in the inverted grid the first 4 extra spots cost 2 each.  The inverted grid starts one spot behind, so 2 extra spots at cost 2 each vs 1 at cost 3 is a loss, but 3 at 2 each vs 2 at 3 each is a draw, and 4 at 2 each vs 3 at 3 each is a win for the inverted grid.  Ultimately the inverted grid starts losing again because it has more interior points which cost 4, but while its edges are still being filled in, it gains and then holds the advantage.

Ultimately 3×5 is not required to expose the inverted grid case; 3×3 has a win for the inverted grid at N=8.  But 3×5 does cover the full progression of cases better.

Q3) Given a bunch of people walking endlessly at constant speeds around a circular track, with varying starting points, determine the minimum number of times you must meet anyone else in order to complete one loop of the track, given that you can move at any speed you want, whenever you want.

A3) So I think your superpowers with regard to speed mean the problem can be turned into one of arrival time.  If you arrive at the destination at time x, you must have gone past everyone who hasn't yet completed their lap (for 1 meeting each), and you have one additional meeting for every time you were 'lapped' by someone who completed 2 or more laps.  Trivially, infinite speed with an arrival time of 0 gives an upper bound equal to the number of walkers.  It also seems obvious that an arrival time later than the slowest walker's first arrival is not worth considering: it is a possible minimum, but waiting any longer you can only be lapped by more walkers, so the count can't get any lower.  The trick is to determine the potential arrival times of interest and calculate the number of meetings for each of them in time.

It is easy enough to sort everyone by their first arrival time, which gives a bunch of ranges over which to calculate the number of times you get lapped, but the naive approach of calculating that count is O(N) per arrival time, and O(N^2) is not going to cut it for N=500000.

In between arrival times the number of meetings can only go up as you get lapped; the instant after an arrival is the only time it can go down, and it only goes down by one.  This leads to an observation.  Arrival time 0 gives an upper bound of N on the answer, and if you are considering the kth earliest first arrival and the number of meetings which have occurred is greater than N+(N-k), there is no way the remaining first arrivals can get the total meetings back below N.  Therefore there is no value in continuing once you have reached 2N-k meetings, or more simply 2N.  2N is 1 million.  So if we can simulate 'every' arrival time (not just the first arrivals) in O(log N) time each, and early-out once we hit 2N, we can run in time.  Simply putting walkers in a heap ordered by their next arrival time, and pushing them back in with their following arrival time, gives the desired behaviour.  Just track whether it's a first arrival or not in order to decrement or increment the count.

Oh, and there is likely a corner case to do with equal arrival times: they have to be processed as a group, because you can't arrive in between them, only before or after.  And getting exactly equal arrival times to compare as equal requires using fractions rather than floating point.
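A sketch of that event loop, assuming each walker has already been converted into a (time of first arrival, time per lap) pair of exact Fractions:

    import heapq
    from fractions import Fraction

    def min_meetings(walkers):
        n = len(walkers)
        heap = [(first, lap, True) for first, lap in walkers]
        heapq.heapify(heap)
        meetings = n                  # arriving at time 0 passes every walker once
        best = meetings
        while meetings < 2 * n:       # early out per the 2N bound above
            t, lap, is_first = heapq.heappop(heap)
            meetings += -1 if is_first else 1
            heapq.heappush(heap, (t + lap, lap, False))
            if heap[0][0] != t:       # whole group of equal arrival times processed
                best = min(best, meetings)
        return best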