GCJ13 R2

So, looks like this was a tough round – only 1 person solving the 4th question, and they broke the large input on question 1, so no maximal scores.  Advancement cut-off was solving the first 2 questions small/large with at least 6 minutes to spare.  T-shirt cut-off was Q2 small/large with more than 40minutes to spare.  Safe T-shirt was Q1 small/large and Q2 small.

Q1) Given that the cost to travel between any 2 points on a line in one direction is given by a quadratic formula which has a negative quadratic component in the distance between the points (hence average cost by distance is monotonically decreasing as distance increases) and a list of trips and the number of people completing them, determine how much the total cost can be reduced by maintaining the same number of exits/entries at each stop, but changing where a person’s ticket starts/stops the journey.

A1) My question statement above is a bit more generic than the specific one in the comp – but I think they are equivalent.  The first thing to identify that in any case of 2 trips that touch/overlap the best result is to have one ticket travel the long way and one ticket travel the short way.  This leads to the greedy approach – ignore the people and just watch the tickets.  At each trip start location, new tickets arrive, at each trip exit location we remove the newest tickets, leaving the old ones.  This leads to a natural implementation involving a stack – order all start/end locations from first stop to last stop, such that starts are sorted before ends when start and end locations are coincident, then walk them in order, pushing onto the stack ticket start locations and counts, and at each end location pop the stack as much as required to satisfy the exit count – putting any remnants back on the stack if we get a partially fulfilled starting location.  Sum up all the costs from entering/exiting and compare to the expected cost. This is O(N) in the number of distinct trips, so it will easily perform in time for the large input which only had 1000 distinct trips.

The real challenge for the large input is the size of the result – the number of tickets in flight can get up to 10^12,  and they can all exit at once, or all enter at once. (Problem statement does not rule out the 1000 trips actually being identical – just that each one has a maximum of 10^9 people).  So you need to remember to use 64bit integers at this point already.  But then the cost of individual trips can be ~10^18 itself, so calculating the total cost can overflow 64bit multiplication.  So you either have to remember to apply modulus before and after the multiplication, or simply do all math using BigInteger and apply the modulus at the very end.  If you take the modulus approach, you have to remember that when taking the difference of 2 modulus values (to calculate the value saved) the answer is always positive under modulus, but the % operator in many languages will give you the negative remainder when used on a negative result.  As a bonus, the small input is so constrained that it can’t even hit the modulus at all, so you can’t get any confidence in your modulus code before tackling the large input, without doing some manual testing yourself.

Q2) Given 2^N competitors where each competitor can be ranked in a strict order where higher ranked players always beat lower ranked players and p prizes to be awarded, determine the lowest rank which is guaranteed to receive a prize and the lowest rank which could receive a prize.  Players start in potentially any order, but after each of the N rounds they have a stable sort applied based on their Win/Loss record for the tournament so far (lexically ordered comparison favouring early wins rather than number of wins).  Prizes are then awarded to the top p entries in the final sort.

A2) This is an interesting competition structure – in order to be in the top half you have to win your first round.  Also of note that because there are 2^N competitors and N rounds and in each round everyone will always be playing against someone with the same win/loss record – the final results are fully ordered by the win/loss record, no reliance on the stable sort.  Small input is up to 10 rounds, large input is up to 50.  So 64bit integers are again important.

I did not find this problem very approachable for some reason, but after some thought I think the approach to take is to binary search on rank and ask the question, how high (or how low) can this rank go in the final result.  Comparing the result of that question to the prize threshold you can decide whether to go higher or lower to find the minimum rank which could (or is guaranteed to) win a prize.  So then the question becomes, for a given rank, what range of placings is possible.  Highest/lowest ranks have no options they are always stuck in their natural position.  Second highest/lowest ranks can go from their natural ranking to exactly one position past the middle in the direction away from their natural rank.  But what about the rest?  If we consider a given rank, at each round its ‘division’ (same win/loss record) will consist of a certain number of ranks higher and a certain number of ranks lower.  The actual value of these ranks is not relevant, just how many there are.  If we are trying to win we can consider what the effect of optimal strategy will have on these 2 numbers, and vice versa for trying to lose.

So if we have x above and y below and are trying to determine how high a final placing we can get, what is the ideal strategy?  I would suggest it is to maximise y at all stages.  One of the lower ranked will play you, and lose, maximising how many of the y-1 remaining win involves getting them to pair off with each other as much as possible, as every time someone from x plays against someone from y, x is guaranteed to win.  So new y = (y-1)/2 (rounding down) and new x = (x+y+1)/2 – 1 – new y.  Eventually y=0 and you have to lose, and from then on you also can’t win.  This sequence of wins/losses will give you the maximum placing for a given rank.  You can also reverse everything to get a sequence of losses/wins to get the minimum placing for a given rank.  Assuming these strategies are correct it is also fairly obvious that the maximum/minimum placing never has any turning points with respect to changing starting rank, so the binary search approach I originally suggested is plausible (although due to the step-function nature some extra care might be needed). On the other hand, the fact that this strategy demonstrates that ‘ideal’ scenarios always consist of sequences of wins then losses, or vice versa (but no mixing) it should be possible to come up with a much more direct approach simply calculating the size of y backwards.  (Consistently assuming that a round down was required in the forward version, or not, in order to get the range of values that will reach a specific ‘limit’ in the ideal scenario.)  I think this direct approach can be performed in O(N) with a sufficiently smart implementation, as compared to O(N^2) of the binary search approach.

Q3) Given a table of lengths of the longest increasing sub-sequences ending at i and a table of the longest decreasing sub-sequences starting from i, determine the lexicographically smallest permutation of the numbers 1 to N, which satisfies both tables.

A3) Inspecting the tables gives us a bunch of inequalities regarding elements in the final sequence, from there we just find the smallest permutation which satisfies the inequalities and presto we are done?  Somehow I think it isn’t as simple as that sounds.  But lets start anyway.  If two consecutive elements of the first table are increasing, we know that the values are increasing at that point as well.  If they are decreasing in the table, we know the values are decreasing, and if they are equal in the table we know the values are decreasing.  Corresponding logic also applies for the longest decreasing sub-sequences.  However we can learn more.   Each value in the first table is either 1 (indicating a new lowest value) or it is precisely 1 more than a previous number.  The new value is then between that previous number and all intermediate numbers.  However there might be multiple previous numbers that are exactly one less – it appears that maybe we can assume the least constrictive of those, which will be the closest one.

Is that enough constraints?  I am not sure.  The next question is, given these constraints how do we build the output?  My guess is we build a directed graph and perform a kind of topological sort.  Care must be taken to ensure that in the multiple ways the topological sort can be performed, we choose the one which gives the best lexicographical output.

Looking at a solution, I think that I have covered all the constraints, although it can be stated more simply.  For each entry, walk backwards through the table, the first one you are equal to, you are less than that, the first one you are exactly one more than, you are larger than that. (And vice versa for the second table).  When stated this way, you can more clearly see that the resultant graph is fully connected and so the way to populate the topological sort is clear, leaf pruning, where the leaf to be trimmed next is always the leaf which corresponds to the earliest position in the output.

Q4) Given a multi-player version of pong where the paddles are points and can pass by each other and each team has to rotate through its paddles (in order) to hit it back, determine how long the game goes for (and who wins, if it ever finishes).  You are given the initial position and velocity of the ball, the size of each team, how fast each team can move its paddles, and the size of the game world.

A4) Reading the description I thought, sure this question sounds reasonable.  Then I saw the small/large input constraints.  Even the small inputs contain numbers that won’t fit in 32bit integers.  Large input contains values which barely fit in 384bit numbers…

Small input does at least look feasible – each team has at most 1 million players, so as long as your code is fast, you can consider each player and think about bouncing cycles and reflection, identify the few scenarios to determine whether the paddle can make it in time between each turn it has to take – calculate the smallest failure out of the up to 2 million players and return.  The large input however has 10^100 players per team.  The only way this version is plausible is if you can consider ever player in a team all at once, identify the worst case within the team (maybe using a binary/ternary search?).  Given there was only one successful solution, I don’t feel too bad at leaving my discussion there.  In practice the implementation will require a BigRational class (or scaling all positions by the ball and paddle velocities so they are all integers in seconds per unit, cementing the need for BigInteger even for the small input).

GCJ13 R1C

So I am a bit late with this write up, I completely forgot about it…

Cutoff for the top 1000 which advance to round 2, was problem A small/large and problem B small with 45minutes spare.

Q1) Determine the number of subsequences which contain n (or more) consecutive consonants in a given string.

A1) Small input here is trivial, with a small maximum size you can use the brute force O(N^3) approach of considering each starting point, each end point, and checking  each scenario for maximum consecutive consonants.  Large input is size 10^6, so we need to do better than O(N^3/2) in order to run in time.

In practice it is actually not too hard to implement a solution in O(N) runtime – although given the answer can be proportional to O(N^2) we still need to remember to use 64bit integers.  The solution is to use a walking range of length n, you keep track of how many consonants in a row at the end of the range, first by just counting them, then as it walks forwards, adding 1 if the new location is a consonant, and subtracting 1 if the count was previously equal to n.  Each time the current number of consecutive consonants is n, add (length-n-current_starting_index+1)*(current_starting_index-last_successful_starting_index) where you initially set last_successful_starting_index to -1 and update it to equal the current_starting_index each time this calculating is performed and the total incremented.  The use of last_successful_starting_index here avoids double counting when there are multiple cases where the number of consecutive consonants is n.

Q2) Given you can only move along 2D grid in horizontal/vertical direction and each movement has a length of 1,2,3,4… determine a list of NESW instructions to get to any specific point from the origin.  Point is guaranteed to be reachable, and you should return the minimum length path.

A3) Small input was an interesting variation, you didn’t have to print the shortest path, it just had to be less than 500 moves, which makes the problem trivial as any consecutive pair of moves can be made opposing to have a delta of 1 in whatever direction you want, allowing you to step your way from 0,0 to x,0 to x,y, and since x,y are both less than 100, you will get there in less than 400 moves.

Large input however is not so forgiving, the result has to be the shortest path.  I had a few thoughts regarding a greedy approach where you do something like NENENENENNNN to get as close as possible (assuming the target is in the NE quadrant) and then use repeated approaches of ssssswswswswswsnenenenennnnnnnn type to converge even closer.  But I don’t see any way to guarantee that that is ideal.  My second thoughts was that maybe we can ignore the whole 2 dimensional nature of the problem for a bit, if instead of trying to get from 0,0 to x,y we tried to get from 0 to abs(x)+abs(y) we could end up with a sequence of EWWEEEWEW which appropriate use of replacement with NS can get us to abs(x),abs(y) instead.  Which can then be trivially flipped to get to the actual x,y.

This 1D version of the problem seems like it might be simpler.  Reading a solution it actually is definitely simpler.  Find how many steps it takes to go past the target.  Now we go backwards, starting at the destination, subtracting each value off in reverse, if we go negative, we add the next one back on instead.  Ultimately you can see that it must converge to 0.  Now how do we convert this sequence into one that works with 2D?  Using the same number of steps, we subtract off the largest absolute value out of x and y at each step.  This doesn’t converge to 0 quite so obviously, but apparently it does all the same.

Q3) Given a bunch of tribes which have a ‘length’ and move along a line and occasionally attack towards the other side of the line, and succeed if any part of the wall isn’t at least height s (which is potentially different for each team, and changes as time progresses), determine how many time they succeed if at the end of each day the wall is raised to the minimum height that would have stopped any attack made that day (if it isn’t already higher).

A3)  The small data set is a trivial simulation – you only need an array which covers from -200 to 200, and to order the maximum 100 events correctly and handle them happening simultaneously correctly.

The large input is a bit more tricky. 1000 tribes, each with up to 1000 attacks gives up to 1 million events, so if we are still doing pure simulation, we need do better than simulating each unit of the length of the tribe, since that is up to 10^6 and it moves up to 10^5 each move, giving a 10^8 space range and a 10^12 operation count.  I think using an interval tree will do the job, especially since the ranges are disjoint.  We need to handle splitting and merging, the former for correctness, the merging to avoid catastrophic performance.  I think with merging, each operating may match at worst 1000 height ranges –  giving a 10^9 operation count, which is huge, but with only 20 test cases, maybe plausible.

Looking at a solution, I think I needed to make the interval tree a bit more fancy otherwise it wouldn’t be fast enough, but the alternative is easier to implement than a full interval tree.  They don’t use explicit merging or an actual tree implementation, instead they create a list where the first element represents the entire set of distinct start/end locations for attacks, the second the first half of those locations, the third the second half, the 4th the first quater, a bit like a common binary heap implementation.  For each node they have 2 values, ‘whole’ and ‘min’.  ‘whole’ is used to shortcut when an attack area covers a large section – so if an entire node is covered, we don’t have to update all of its children, we just update the ‘whole’ entry, and from then on we use the higher of whole and min of the two children when looking at the tree to find the minimum height in a range, since min may not have been updated because of whole.  This ‘whole’ strategy obviates the need for merging, because it ensures n log n regardless (where n is the number of events) which is trivially fast enough.

TCO13 R2C

Low turnout as usual – only 1032 registered out of ~1900 potential.  With the top 100 through already, this round was always going to be the best opportunity to get a t-shirt, and I was probably close.  I got 258th with just a solution to the first problem  I was slow to implement, a fast implementation time of shortest path would have gotten me into t-shirt contention range.  Alternatively a successful challenge would have got me 150th which is good t-shirt contention.  I was looking at a broken 1000pt solution, which I was confident was broken, when I saw that a bunch of 250pt problems had been challenged and had a quick look at the carnage.  By the time I switched back to the 1000pt and was about to open the challenge window for a try, someone else picked it off first.

There is always next year of course…

Q1) Given an undirected graph determine the minimum number of steps to make 2 specific vertices neighbours where a step consists of taking one or more distinct pairs of vertices which each have a common neighbour, and making them neighbours.

A1) Looking back on this now, I should have realised after the clarification announcement that this question would have a lot of challenge opportunities.  Clarification seemed to state the obvious to me, so I didn’t pay any attention to it, but the 7 challenged solutions in my room all had the same mistake – assuming a step could only introduce one new edge into the graph at a time.

So the solution is straight forward – we need to know the shortest path between the 2 nodes, then calculate the number of steps to make them neighbours.  Most of the quickly implemented solutions used the DP version of all pairs shortest path which is O(N^3) – but given the restriction to 50 vertices this was not an issue.  I implemented the much longer piece of code for a breadth first search (O(N^2) due to adjacency matrix rather than edge lists) for single source shortest path, this cost me a bit of time.

Once we know the shortest path length, the actual vertexes are irrelevant, along that chain we takes groups of 3 consecutive and remove the middle one.  We repeat this phase until we get down to 2.  This can simply be implemented as subtracting floor(number of vertexes remaining/3) repeatedly until there is only 2 left, and counting the number of steps.  I stupidly actually implemented the loop of copying the vertex ids that remain each time into a new list and repeating until the list only had 2 elements – which again cost me time.

Q2) Given a requirement that a rectangular grid of numbers have a ‘peak’ where every value before the peak row is strictly ascending and after the peak row is strictly descending, and correspondingly for the peak column, determine the minimum sum of elements  given that some of the elements have to be specific values. (Up to 200 by 200, but only up to 50 specific values.)  Solution is not always possible, impossible states must be identified as such.

A3) I made a start on this question, but I’m pretty sure I was heading in the wrong direction, so I gave up.  Code required was quick lengthy and prone to mistakes, justifying the extra points (550 instead of the usual 500).  It appears that the correct path was to realise that in the ideal scenario, the 4 corners will be as low as possible.  So you create 4 pseudo answers by numbering starting from 1 (or given value) in each corner, and satisfying all the fixed values.  Then you merge the 4 pseudo answers together to get the smallest possible sum. (For actual details, you’d need to read a solution, they are in-depth.)  Its O(N^3) unlike my original idea which I suspect would have been O(N^4) in the worst case.

Q3) Given a numeric range 1 to n and the ability to make guesses regarding a hidden selected value where each guess is told correct, left or right – determine the maximum probability you can select the right answer between turns a and b inclusive. (n, a and b are all less than 1 million.)

A3) First off you have to realise that this question wants you to maximise the probability of selecting the right answer between a and b turns, not what the probability of ending between turns a and b if you play optimally with intent to win as quickly as possible.  So the actual problem has a straight forward cached recursion solution, which is O(n^2*a) time and O(n*a) space – which given the range of values you can clearly see is not even going to remotely work.  The solution I was about to challenge used this approach, with a few short cuts to make it fast enough for the basic cases given in the problem, but it was clearly not enough for general case of 1million, 500k, 500k type scenarios.

Only a handful of solutions for this problem, but unlike Q2, they are short.  Short but difficult to understand unfortunately.  One key thing I see is if the difference between a and b is more than ~21 they replace b with ~a+21 – because if you make it to a turns without getting it correct, you can guarantee that you get it right in the next ~21 turns because of binary search, so any optimal strategy will never go past a+21 turns.  The rest is not so easy to understand – at first glance it appears they are walking through possible first guesses and finding the best one, but I suspect this may not actually be a correct reading.

GCJ13 R1B

This time around the top 1000 advancement cut-off was Q1 + Q3 small in fairly fast time, or Q1 and Q2 small.

Q1) Given a game where you can absorb other smaller objects to become the total size of the 2, a starting size and a set of other objects, determine the minimum number of changes that have to be made to the set in order to be able to absorb every object into one big super object.  When adding to the set you can choose whatever size object you like.

A1) So this problem is somewhat greedy in nature.  It should be fairly obvious that there is never a downside to absorbing an object you can currently absorb.  So you start off by sorting the objects by ascending size, and absorb from the start until you can’t absorb any more.  At this point you have 2 choices – remove every larger object, or add a new object to make yourself larger in the hopes of continuing.  Again greedy comes into play, there is never any value in adding an object any larger than yourself (right now, you can always do it later when you are bigger) and no point making it any smaller than the largest you can absorb.  So add an object exactly equal to your current size minus one, absorb that, and repeat from the beginning.   You can continue this process until they are all done.

Along the way at each point you have to add an item, check to see whether your current number of added items + the number of remaining items, is better than your best result.  If it is, replace the best result.  Finally check whether always adding was better still at the very end.  Return this result.  Since you almost double in size every time, the answer is always going to be no more than about 21, so everything will easily run in time.  There is one special case that I have ignored so far.  If the initial object size is 1, you have to remove everything, because adding an object never helps in that case.

Q2) In a scenario where diamonds fall down and stack up (with random choice in fall direction when there isn’t an obvious choice) what is the probability that after n diamonds fall a given location has a diamond at it.

A2)  The important thing to notice with this question is that the falling diamonds occur in phases. Until the mound of diamonds reaches height n, it never goes wider than n either side of 0.  So the answer is either 1 or 0, unless the location being asked for is in the current phase.  Each phase adds 1, 5, 9, 13, 17 … diamonds.  This is of course an arithmetic series, so it has a quadratic sum formula, so the large input is obviously no worse than ~1000 phases, so it is safe to walk through them all until we get to the one of interest.  If the point of interest is in the final shell, we need to work out the probability.  For the small input there are only a handful of scenarios, you can probably work them out by hand and just hard code them, but for the large input it can get a bit more complicated.

If the final shell is full, obviously the answer is 1, otherwise we have 4n locations to fill with k objects where the 4n locations form 2 stacks of height at most 2n.  If k is <= 2n then each of those k had a 50% chance of falling to either side and the probability of reaching a certain location of height h is the probability of getting h or more heads from k coin tosses, which is standard probability theory (although some care will be needed for the calculations in the large input, 1600 choose h can be a very large number, as is 2^1600 – if you create your pascal triangle cache pre-scaled by 2^-depth and use double precision that should be sufficient accuracy).  If k > 2n, then it is possible to fill up an entire side, after which every diamond is guaranteed to fall on the other side.  I think this means that k-2n on both sides are guaranteed and we can treat the remaining (k – 2k+4n)=(4n-k) diamonds exactly as before, but with the height of the target location being reduced by k-2n – but I can’t say that I’ve double checked that carefully for probability theory mistakes.  If that isn’t true you can instead formulate a separate DP for each problem case (rather than reusing the same pascal triangle cache over and over) and build it up for each diamond dropped.

Q3) Given a string of ‘words’ where the spaces are removed and then someone introduces errors no more than once every 5 characters, determine the best cost match to a starting scenario based off a dictionary.

A3)  First thing to realise is that while the dictionary is quite large (500k words) the words themselves have a maximum length of 10.  This means that there can be no more than 2 mistakes per word.  This problem then boils down to a fairly standard dynamic programming problem.  The problem state is what character are we up to in the input, and how many characters since the last mistake (capped at 5).  From each problem state we consider each word length (1 to 10), and each valid scenario of adding errors to that next section (no errors, 1 error after the minimum gap, 2 errors after the minimum gap with minimum gap), check if the result exists, if so produce a new potential minimum for the final state (length and characters since the last mistake).

Only question is will this run fast enough for the large input.  Large input is 4000 characters, which gives 20k states.  Each state has 10 potential word lengths, and the worst case of those word lengths with has ~9500 cases, and the exists check requires operation cost at least equal to word length.  This gives an estimate cost magnitude of 20k*900k.  Or 18billion memory ops.  Even with only 4 test cases, that is very scary time wise.  We can probably trim quite a bit by skipping any scenarios where we can’t possibly improve the target cost scenario, and that cost magnitude is a bit of an exaggeration due to taking easy approximations.  So release build, no debugging, cross fingers… (or test run time before you submit…)

However, we can probably do better.  One option which seems a likely candidate is to use a prefix tree rather than a hashset to store the original dictionary contents – that way while we are generating our potential cases, we are going to get early failure quite often.  I would estimate this probably gives at least a factor of 20 improvement in practice.  Another option is to pre-compute separate dictionaries for each single character error position, then use them as a basis for the exists check, to avoid having to generate 2 character error scenarios which are the vast majority of the cases to check during the DP.  Memory usage will be quite large in this last case, but it should also definitely give a factor 20 improvement if implemented correctly.

I originally considered doing a once off creation of every 2 character error scenario from the dictionary, but the memory requirements can easily start heading towards 50+GB which defeats the point since I don’t have a home super computer.  Oh well.

GCJ13 R1A

An interesting new style of question turned up in this round, where you could potentially fail even if you do everything right…

Top 1000 advancing, cut-off ended up being Q1 small/large, Q2 small in quick time, or Q3 small as well.

Q1) Determine how many rings on a bullseye can be painted using a specific amount of paint.

A1)  Conveniently the ratio of paint volume to surface area involves pi, so this problem ends up being integer.  First off you calculate the area of an annulus with inner radius r and outer radius r+1 – this is pi*(2r+1).  Paint volume per ring is thus 2r+1, and we can get all rings by summing from the initial r, increasing r by 2 each time.  For small input you can just keep adding until you don’t have enough paint for the new ring and return the count of rings so far.  Large input paint volumes need 64 bit integers, so brute force needs over 1 billion iterations in the worst case, so that is unlikely to work fast enough.

So for the large input you need to be a bit smarter.  The sum is a classic arithmetic series total, which has the closed form (a+b)(b-a)/2k – where a is the smallest element, b the largest element and k the element spacing.  First option (which I would suggest is the best choice) is to binary search on the number of rings, using this formula to calculate the sum.  Being very careful to avoid integer overflow. (So long as your binary search range is selected very carefully (a+b)*(b-a)/2k should able to be made to work with 64bit integer, as k is 4 and a+b and b-a are both guaranteed to be even – but it is simpler to just use BigInteger.)  Alternatively you can expand the closed form to give a quadratic formula in terms of the number of elements and the target paint usage, solve for n using double precision floating point, and then check which of the nearby numbers is the actual maximum using the closed form (to avoid floating point error triggering the floor function to incorrectly give you the one less than the answer).

Q2) Given an ordered  list of tasks where you can spend your energy to create value, and you have a maximum energy E and a regeneration between each task R, and each task has a value per energy expended V_i – determine the maximum value you can get out of your energy, assuming you start at maximum energy.

A3) Small input the maximum energy is only 5, so going into any given task there are at most 5 possible scenarios, and only 10 tasks, gives a true brute force approach of 5^10, which is only a few million.  Obviously you are always going to spend all remaining energy on the last task, so its only really 5^9 even.  But you could use a cache to get the state space down to ~50 entries.

Large input maximum energy is 10^7, maximum value per energy is 10^7 and maximum number of tasks is 10^4.  So even the cached approach is ~10^18 runtime operations. (Which is coincidentally also the range of the maximum answer, so 64bit integers are back in play.)  Looking at the problem more closely it seems that a greedy approach is more likely – you want to use your energy all up on the most valuable tasks, unless you could have conserved some of that energy for a later even more valuable task.

Looking at someone’s solution this appears to be the case.  You have an array of energy used at each location, starting off by using all energy you have for the first task, and everything you regenerate at the second. And so on, using all the energy you have as fast as you can.  But also at each location after the first, you look backwards 1 by 1 to steal some energy forwards if you are higher value per energy.  If the total energy used between your current steal candidate and yourself is less than or equal to max, you can safely steal it all forwards.  But if it exceeds max, you take what you can, and leave the rest – and stop looking further back.  Also if you ever find an element more valuable than you, you stop going back because everything prior to that will have already been stolen forwards to that element as much as possible.  This greedy solution looks worst case O(N^2) which is ~10^8 – but the case to trigger that is first task is most valuable, rest are in ascending order, maximum energy is greater than N^2 * regen rate – so there probably weren’t too many inputs that were that bad.  I suspect that an O(N) (or close to it) solution is possible by tracking whether we have pulled all the energy forwards in certain ranges, and thus skipping them during any walks backwards.

Q3) Given a set of products selected at random by choosing random subsets of N cards with randomly selected 1 digit numbers (between 2 and M inclusive) take a guess at what the original cards were.

A3) Yes, that’s right, the question doesn’t want you to give the answer or -1 if it isn’t possible to tell for certain, it wants you to take a guess if you aren’t sure.  It then specifies that you pass if you have a certain percentage correctness.  Since that is a bit nasty, there is no large input, just 2 small inputs that you can repeatedly try until you succeed.  Time penalty still applies (which is kind of unfortunate).

Helpfully the first small input is very constrained, only 3 cards, only 4 possible values per card, giving 64 scenarios to consider as to whether they are possible given the presented products.  Then to maximise your chances, if there are multiple possibilities, you choose the ‘best one’, weighted by the probability that a given choice of cards resulted in that products given – grouping them by the sorted order (of which there is only 20 cases). Also, since you are given 7 products, your probability of being given a product which is maximal for the value of the cards is actually pretty high. (You could run a bunch of simulations and determine probability for each possible product as to what the original input was, then combine them to choose the most likely input.)

Looking at solutions, it appears that even that is more complicated than required.  Success could be achieved by looking at the products – highest power of 3, start your answer with  that many 3s, highest power of 5, add that many 5’s to the answer – then look at the maximum power of 2.  If you only have 1 digit remaining, you know the answer and return.  If you have 2 digits remaining and the power of 2 is 8, then add a 4 and a 2, otherwise add 2 2’s.  Finally return 3 2’s otherwise.

The second small input however has 7 possibilities per card and up to 12 cards.  This one really needs some smarts, but 7^12 is 17billion, so you can’t consider every starting hand, let alone the probabilities of every subset.  Also since there are only up to 12 products given per case, probability of getting the maximal product is quite low.  While we can’t consider every starting hand, if you group them by repeats it should be much better.  So if you generate the non-descending only hands, and calculate the number of repeats using the standard combinatorics (which I would have had to look up…) and then looking at the probabilities for each possible product (by trying the 2^12 multiplications) for frequency you can generate a big database which can be reused for each sub-test case.  For each possible answer you calculate the probability by multiplying all the probabilities of each product together under each potential initial scenario group, and which ever answer is best, you return it.

Looking at a solution it seems I was on the right track.  Only thing I saw was that rather than calculating actual repeats, they calculated inverse repeats divided by 12! and then used that as a division factor in the probability rather than a multiplication.  Probably avoids requiring to normalize the probability calculations and worry about range errors.

TCO13 R2B

Seems low turn out is normal now – 1214 registered out of a possible 1950.  6 Australians braving the 2am start.

Tough round again, top 50 was determined by having solved the 250 and 500pt, with a couple of 250pt only thrown in through large challenge success rates.  Despite throwing away some points with an obviously wrong challenge of a correct solution and getting off to a slow start with multiple silly bugs in my 250pt code, I managed to get top 400 placing and my TC rating went back over 1700 again.  But still not in the running to get a t-shirt.

Q1) Given 3 infinite arithmetic series which you can freely slide to different integer offsets determine the maximum minimum gap between any 2 numbers amongst the 3 series.  The 3 infinite series are defined by their spacings a,b,c which are all between 1 and 2000 inclusive.

A1) Initial guess was GCD(a,b,c)/3 rounded down – but that doesn’t work for cases like 20,30,40 (which was one of the given test cases helpfully).

As it turns out, the answer is either min(GCD(a,b)/2, GCD(b,c)/2, GCD(a,c)/2) or GCD(a,b,c)/3 (both cases, rounding down) – the determining factor being whether all 3 pairwise gcds are equal or not.  Equal results in the second case, not equal results in the first case.

But since the problem states that the maximum range is 1 to 2000, you can find the solution with a bit less smarts and a bit more brute force. (As I did.)  Firstly you observe that offset of one of the arrays doesn’t matter, it can be fixed at offset 0 – since it is an infinite scenario, the origin is meaningless.  Secondly, it is obvious that an offset greater than the sequences spacing adds no value, since again the sequences are infinite.

This leaves 4 million scenarios to evaluate.  So long as we can do so quickly, we are done.  So for a pair of infinite sequences (with spacings s1 and s2) with a relative offset of i – what is the closest approach?  Turns out this isn’t too difficult to work out – it is min (i % gcd(s1, s2), gcd – (i % gcd(s1, s2)).  So calculate this for all 3 pairs and take the minimum then return maximum of those across all scenarios.  My implementation (mostly pointlessly) went a bit faster than this by skipping the inner loop if the outer loop gave a pair of sequences which was worse than the current best.  Also the outer loop can be simplified to between 0 and gcd(a,b) rather than 0 and b.  However a mistake I made at first was trying to make the same range reduction for the inner loop – which isn’t valid as it doesn’t fully explore the combinations of modulus.

Q2) Given a directed graph of up to 50 vertices, where edges can come in 3 different flavours (and any given pair of vertexes can therefore have up to 3 edges in each direction between them) a game is played.  This game involves one person choosing a starting location and announcing the flavour of each transition made to move to a new location.  The second person then tries to state with certainty where the first person currently is.  If the first person reaches a dead-end, they must declare as such, and the game is over (even if there are multiple dead-ends).  Game is scored by the number of moves the first player makes before the game is over.  Given the graph, and assuming optimal play by player 1, return how long the game is – or -1 if the game is infinitely long.

A2)  So it isn’t too hard to write a solution to this problem (which I did) – it is however much more difficult to write one which isn’t too slow in some corner cases.  I actually tried to construct a large test case to break my solution, and failed – but the system tester did not.  My solution approached the problem from player 2’s perspective.  At the start the ‘possible space’ is the set of all locations.  Each called out transition maps the possible space to a new possible space.  If that space consists of a single location – the game is over.  If that space is empty, the optimally playing player could not have called that transition out.  A simple dictionary approach lets you store how many moves remain for a given state.  When you first see a state you store -1 – thus if you see a state on your current path you can immediately return -1, but if not you can replace that -1 with the actual best number of turns possible.  Running cost of this is worst case O(n^2*2^n) – which for n=50 is way too slow. (Using bitwise operations like I did its O(n*2^n) 64bit integer operations, but that is only valid till n=64, and its not a significant improvement anyway.)  But constructing this worst case is non-trivial – so I gave it a shot (and failed).

It appear the correct approach is to consider that for player 2 it doesn’t matter whether there are 2 or 22 possible locations, anything greater than one rules out the game ending.  So if you take any pair of locations, and consider the possible ways to map that pair to another pair and so on and so on – if you ever find the same pair again, you’ve created a loop and the game can continue forever, but if your search always terminates (no mapping for pair, or only mapping is to them both going to the same node) it isn’t too far of a stretch to see that the best number of turns you find in this search (starting from each possible starting pair) is the same as the best number of turns from the more extensive search.

Q3) Given an initially empty rectangle of size x, y – and the ability to perform 2 operations where you select a sub-rectangle sx, sy and fill 0 or more of the squares in the sub-rectangle – determine how many possible different final states there (modulo large prime) x and y bounded to at most 40.

A3) Only 2 solvers for this problem.  Answers look like serious combinatorics – adding and subtracting numerous cases.  Seems to break it down into rectangle size where changes are made.  For each rectangle size where changes are made count how many possibilities, and multiply that by number of possible offset locations of that rectangle in the larger rectangle.  That somehow makes the counting of possibilities a bit easier.  Seems that for each of these rectangles you consider the case where the 2 sub-rectangles are fixed to opposite corners – and then subtract off the overlap between the 2 cases of opposite corner.  Then there is some magic to do with the edges of the rectangles which must be to avoid some kind of double-counting – its all way too complicated for me at 5am in the morning…

GCJ13 Qualifying Round

As is typical for GCJ – qualifying round had a large turn out, ~22000 people made a successful submission, of those 17059 qualified (at the moment at least).

63 perfect scores – 2 of which were Australian, and I don’t recognize either handle.  Radeye gets the unfortunate distinction of 64th place, having managed to solve all of the hardest problems, but having the large input on the trivial first question fail… (I have to wonder if he was trying the different language for small/large input thing, as his small input in perl looks fine at first glance… but it is perl, so yeah…)

Q1) Do some simple analysis of a board from a 4×4 equivalent of  tic-tac-toe where there is a special piece which counts as both X and O. Determine winner/draw/incomplete.

A1) This problem was so trivial they had to point out not to make it harder on yourself by attempting to determine inevitable winner/draw.  Small and large input, but the only difference was the number of test cases. (Importantly, solving this problem completely is not enough to advance to round 1.)

The problem also specifies that the board will be from a valid game, so you don’t have to handle cases where both X and O have won.  Check each row/column/diagonal for either winner – return if found.  Otherwise return draw if board is full, incomplete if not.

Q2) Given a rectangular array of ‘heights’ and a device which reduces all heights in a row, or column above an adjustable setting, determine whether the given heights could be obtained if everything started with equal height (which is greater than or equal to anything in the given heights).  Adjustable setting can only be changed in between usages, not as it performs a row/column ‘trim’.

A2) The important thing to notice here is that for any given square, it doesn’t matter what order the device interacts with it, it only matters what the minimum setting was across all interactions.  So if we can determine what height was used for each row/column – we can just run a simulation and check whether the input matches expectation.

For each row/column – the device cannot have been set any lower than the highest member of that row/column.  And if it was set any higher, the usage definitely could have been ignored entirely.  Therefore you can just determine the maximum value for each row/column – perform that simulation, and check result versus input.  This runs in O(nm) time and additional space.

A slower alternative (which I thought of first – and have not verified is actually correct…) is for each cell, check to ensure that it is the highest value of either the row, or column (or both).  If it is not the highest value of either the row or the column, it cannot possibly work.  Trivial implementation here is O(nm*max(n,m)) – which would still be fast enough (and requires no additional space, if you cared for some reason).  This can also be done in O(nm) by painting cells which are equal to max of row, or max of column in two passes over each row/column.  If all cells are painted, you have succeeded.  But the boolean for each cell has to be stored somewhere.

Q3) Determine the number of palindromic squares of palindromic numbers there are in the range A to B. Maximum value of B either 1000, 10^14 or 10^100 depending on problem tier.

A3) This is the first time there has been a problem with 2 large inputs, one harder than the other.  This problem was correspondingly worth a very large number of points.

The small input is trivial.  In fact you can do it in your head, since the number of palindromes less than sqrt(1000) is about 12, and of those only 5 are still palindromic when squared.  So just search the set 1, 4, 9, 121, 484 against each given A, B pair to return a count.  I wonder if go-hero.net has a category for the null programming language…  There was only 100 test cases, if you type fast and pre filled the form a bit, it shouldn’t be too hard to do that in 4 minutes.

The first large input poses a bit more of a challenge.  The first thing to realise is that these numbers are very rare, so we can solve all 10000 test cases by generating the list once and performing binary search to find indexes to take difference to get the count.  Indeed we could even pre-calculate the list and hard code it in this case, there aren’t that many less than 10^14.  An approach which works for this case is generating all palindromes less than 10^7, squaring each one and checking whether its a palindrome.  We can even generate all palindromes less than 10^7 by brute force, checking every number to see if it is a palindrome – it should still run in time.

The second large input is where it gets tricky.  The numbers are still rare enough that you can calculate them all before running any test and keep them all in memory, but generating them all becomes a bit more time consuming.  First off, we can’t possibly brute force 10^50 numbers to find palindromes to square, so instead we build the palindromes by taking the 10^25 numbers and either reflecting after or through the last digit to create our palindromes.  2*10^25 however is still way way way too many.  But if you look at the output of such a program for up to 10^8, you can get a pretty confident grasp on some ways to speed things up.  The only case where a digit greater than 2 is used in a palindrome which when squared still gives a palindrome – is 3 itself, giving 9.  So you can special case that and this means you are down to 2*3^25 cases.  This is still too many, but it is a big improvement.  A closer look will show that the digit 2 is only used in very rare circumstances.  If it is the first digit, all other digits are zero, or the last digit is 1 and is reflected through rather than after.  Otherwise it can be the last digit that is reflected through.  The former gives 1.5 cases per power of 10, a tiny number. The other reduces us down to 3*2^25 cases.  This is 100 million cases, each of which is worst case performing a 50 digit by 50 digit multiplication.  This may take minutes even with a good BigInteger implementation – so advisable to test before hand to see if it will be fast enough, or even better – pre-compute and store/load it.

However, we can do better.  A closer look shows that every palindrome which is still a palindrome when squared, the sum of the squares of its digits is at most 9.  This is why 3 only appears once, and why 2 is so rare.  But it does help us beyond that.  Out of the 2^25 scenarios from creating patterns of 0s and 1s – we can skip everything which has more than 5 1s, or more specifically 4 1s in the reflected section and optionally a 1 in the middle.  This gets us closer to 3*25^4 which is clearly going to run in time.  But the complexity of walking permutations like this might be a little bit annoying…

Q4) There are a set of boxes, which each take a specific kind of key to open.  They break the key in the process.  Inside the boxes there may be more keys.  You start with a set of keys and a description of all the boxes, return the canonical opening order to open them all, if possible at all.

A4) Small input wasn’t too bad here, 20 boxes.  For a given subset of the boxes being open the number of keys you have left doesn’t depend on what order you opened them, so you can do a depth first search on the canonical order, ignoring any states you find twice.  This is O(n*2^n) which for 20 is perfectly fine – especially with only 25 test cases to perform.

Large input however was 200 boxes.  The previous algorithm is no longer any good.  I spent some time thinking about this, but couldn’t come up with a convincing idea.  My best thought was that maybe if you did a reverse depth first search, from the answer back to the start, it might be self-pruning to the point where your search space ends up being much smaller than 2^200.

Looking at other peoples code, it appears that the solution is that there exists a much faster method to determine whether it is still possible to get to the final state from any given state, compared to finding a solution.  This method is then just applied iteratively every time you try to make a move in the depth first search on canonical order, and back track immediately if the ‘can still finish’ function returns false.

There appears to be two parts to this approach.  One is that you first check that there are enough keys in the given situation to open every chest.  So you just add up the contents of every chest with the starting scenario and ensure it is higher than the count of the number of chests for each key type.  Somehow, having checked this then makes the following seemingly nonsensical ‘can still finish’ function valid.  Create a directed graph connecting each key type of a chest not yet opened, to the key types contained in those unopened chests.  Then for each key type you have (ignoring count) start traversing the graph using a simultaneous breadth first search.  If you can reach every key type used by an unopened chest – the problem is still solvable.

I actually wonder if some of the solutions I am reading are actually correct, or whether the fact that there was only 25 test cases meant it was easy to fake a truly correct answer.

TCO13 R2A (take 2)

Attendance not that high again, 1279 registered competitors for this rescheduled round. But at least the servers seem to be more stable tonight (even if I am no less sleepy).

6 Australians this time.  I guess one of them had difficulty connecting last night.

Tough round – I only submitted a solution for Q1, and I had low confidence, since it was a greedy solution and I could see the vague potential for non-greedy answers.  During challenge phase, 40% of submitted solutions to Q1 in my room were successfully challenged, including mine.  I was too tired to identify bugs in other peoples code however, which left me with 0 points.

Which given the round difficulty was 560th before systests.  This raised to equal 441st in the final tally.

Only 2 successful solutions to Q3, and both of those people skipped Q2.  Only ~30 solutions to Q2.  Cut-off to advance was a fast time on Q1 and a successful challenge.  Looks like John Dethridge and nabbftw might be in with a t-shirt chance from this round with a decent Q1 time – and EvgeniSergeev managed to get a successful challenge, the other 3 Aussies (including myself) left on 0.

Too tired to do any more write up now – I’ll try and come back to it later.  But I do at least know why my Q1 answer failed.

Q1) Given 2 equal length sequences of letters, determine the correlated sub-sequences which when appended give the maximal result.

A1) I have managed to fashion a ‘brute-force’ solution which passes system tests, but I’m not 100% happy that it is actually correct.  But I will present it here anyway.

The first thing I did was trim the first sequence so that it was in non-ascending order – performing the matching trims in the second sequence.  It seems trivial enough to me that if your first subsequence ever increases, you could have removed the previous lower characters to get a better value.  The problem then seems to become a question of how best to ‘cross the append’.

I identified a few different ways that you could cross the append, and brute forced them.  Each way basically starts at some offset in the first sequence and tries to cross at some letter value.  Each letter which gets in the way (is lower than the desired cross value) is removed as it is encountered – and as each letter is removed we check to see if we’ve found a better solution.  However there is one exception, once you have crossed over, if a letter is in the ‘way’ it may correspond to a letter in the first sequence with value higher than your starting index, in which case removing it can never be a win.  These letters are just left in place and hope they don’t make the solution worse.  Every combination of offset and possible cross letter value is tried and we return the best we can find.

The way I handle this exception lowers my confidence in my solution a lot – but it works well enough for system tests.

A better solution is dynamic programming.  The DP is ‘best solution of length n using the last k characters’.  Initialized with length 0 being empty, each iteration we check the kth last character to see if it makes a better solution to length n+1 (than the previous length n+1 solution, if one exists) by pre-pending it to length n. (Proof of correctness of this still isn’t entirely obvious, I suspect a similar solution to ‘best solution of length n using the first k characters’ would fail – since an obviously correct solution would need to apply arbitrary subsets to jump from multiple iterations before to the new iteration, rather than being based purely off the previous iteration.  Something to do with prefixing vs postfixing would seem to be the key.)

Q2) Given a square grid, where each cell can be any value 0 through 9, and up to 10 specified values, determine the number of scenarios which exist where each row and column adds to the same number modulo 10. (Answer given modulo large prime.)  The grid may be large, up to 1000×1000.

A2) This problem boils down to 2 cases.  Firstly the width of the grid may be larger than the number of given clues.  In this case the answer is 10^((n-1)*(n-1)-k + 1) – where n is the width of the square grid and k is the number of specified values.  Since the number of specified clues is less than the width of the grid, there exists at least 1 row and column which is completely independent of the specified values, which can be used to guarantee whatever desired modulo for each row/column.

This leaves the case where we don’t have an independent row and column.  But we do have a maximum grid size of 10.  We can break this down further.  First if any row or column is fully specified, we know the target modulus.  If there are 2 target modulus that don’t match, there is no solutions, and we can immediately return 0.  If there is no target modulus, we try and see how many solutions there are for each potential target modulus and add them all together.

Looking at someone else’s solution I see that from here we can break the problem down in to independent subsets.  If a cell is not specified, than that row/column pairing are not independent as they are linked by the choice to be made.  But if the value is specified, then you can just subtract that value from the target for that row/column and they remain independent.   A disjoint set tracker can be used to identify these independent sets of rows/columns, and the number of possibilities for each independent set can be multiplied together to get a result.  Once we have an independent set of rows and columns, there is either no solutions or lots of solutions.  If there is lots of solutions, the number of solutions is much like the original calculation, 10^((r-1)*(c-1)-k), where r is the number of rows in the independent subset, c the columns, and k the number of fixed values in those rows/columns.  So all that remains is to identify when there are no solutions.  This appears to be possible to do by looking at the fixed values in each row for columns which are independent, and for each column for rows which are independent.  If the sum of each of these don’t have the same modulus, there is no way to set the values. (How you derive that leap of logic, I am unsure… but it sounds vaguely familiar from magic squares – sum of the sums each have to be equal.)

I wondered if a simpler solution is for a given target modulus, go to each value, set it to 0, perform any cascades forced by the modulus, and for each value you manage to set to 0, that is an extra power of 10.  If you ever get to a contradiction, its not possible.   To check, I wrote this up, and it passes system tests.  Written correctly it is O(N^4) and with N at most 10 this easily runs in time. (The more complicated algorithm above appears to be O(N^3).)

Q3) Given 2 numbers A and B, determine the number of unique values X^Y where 1 <= X <= A and 1 <= Y <= B.  A and B are both potentially large.

A3)  I think I almost had a better chance of solving this problem than Q2.  Although that isn’t saying much given how tired I was.  We start with A*B scenarios, and start subtracting off duplications.  With each of A and B being up to 1 billion, obviously duplications need to be removed in batches.  The first trivial set of duplications is X=1, there are B-1 of these.  Now we only have to consider X > 1.  We need to identify scenarios of X^Y = N^M.  Cases where N is a power of X aren’t too bad, but there are cases like 8^2=4^3 which are far more problematic.

So, taken from someone else’s solution… For each value between 2 and the sqrt of A inclusive, consider every power not greater than A.  For every number between 2 and A inclusive not in this set there are B options.  There is also the 1 extra for the value 1.  What remains is how many contributions count from the others, numbers which have powers less than or equal to A.  If we take each unique non-power in the range 2 to sqrt of (A) inclusive, and consider how many powers of it exist which are less than or equal to A we can work out how many unique values we can reach using an exponent of at most B.  Each power can add B, but then we have to remove the overlaps.  It almost looks like a standard add everything, remove pairwise overlaps, add back in the triplets, removing the 4’s, etc.  But since the power could go up to 30, there are a lot of groups of 15, so it needs to be done a bit smarter.  I can’t really work out how to describe the solution here properly, so if you are interested I guess you can go read a solution.

TCO13 R2A

So, up at 1am to prepare for the 3am start.  Quite tired, but will see how I go…

2000 potential competitors for round 2, only 1260 turned up.  5 Australians battling the 3am start time. (Assuming they are east coast…)

3:10am – round starts, no one has been assigned rooms yet 😛

3:19am – round is cancelled 🙁

Update: Round is rescheduled for 3am tonight …