GCJ Qualifying under way

So, I can’t do a write up about the questions until tomorrow (sometime after I sleep post TCO12 R1C), but I did finally finish them.  Penalty time of 7:20 (which doesn’t count in qualifying since all you have to do is score 20 out of 100 – and even if all my large inputs fail somehow, I’ll still get 50) – but I had lunch and played computer games for 4 hours or so in between the 3rd and 4th questions.  So I think I could have done it in about 3 hours which I think isn’t too bad.  At time of writing only 40 people have submitted solutions to all inputs.  I think that is kind of low compared to normal, even if it is early days…

New this year was a question with no large input, not that exciting, but still something different.

TCO 2012 R1B

Again a late night, and again a poor performance.

I was faster on the 250pt question this week, but I probably spent more time testing than I needed to.  Still 217 is acceptable.  Unfortunately, at the prior to systest phase, the top 600 lowest score is 300 and I failed to get the 500pt.  So I sit in 750th, hoping for a huge number of failures in the 500pt problem that I struggled with.  I definitely saw a few solutions which were greedy in nature in my room, and although I couldn’t find a counter example, greedy is usually high risk.  I guess I’ll find out shortly.

Greedy solution passes… but still quite a few failures, enough for me to go up to 646th.  Just 11 points from cut-off this time.  R1C here I come?

Q1) Given up to 50 objects with lengths between 1 and 50 inclusive, determine the largest k where you can make k rows of length k either by using a single object or 2 objects separated by a gap of 1.

A1) Nice easy problem.  The answer is obviously in the range 0-50, and its not too hard to verify possibility, so try each one descending until it works, or you reach 0.  To verify, sort the objects, any object larger than k can be discarded, every object equal to k, count that up.  Then perform an inward walk of those left checking for sum equal to k-1.  If sum equals step both ends inwards and increase your count, if sum smaller move the bottom point up, if the sum is greater move the top point down.  Continue until the points meet or cross.  If the count is greater or equal to k, then its possible to make the k rows needed.

Q2) Given up to 50 tasks which take time between 1-3600 time units, determine the minimum time to complete the tasks subject to the following strange constraint.  You have a single worker, each worker can only do one task.  However a single worker can split into 2 workers in time t.

A2)  The greedy solution is to sort in descending order and replace the 2 smallest entries with the largest of those two plus the split time.  Repeat until there is only one task remaining and return its time.  Very simple, but is it correct?  The answer is yes.

The path I erroneously took was to try and determine the minimum time to perform the largest i tasks and have k workers left over.  However I couldn’t get this to work because my formulation didn’t allow for the other tasks occurring after a split but still concurrently with another task.  I actually realised this fairly early on and tried to formulate a reverse search more like the greedy solution, but I got confused, went round in a circle and started working on the same broken approach again. (Way too tired!)

‘Proof’ for greedy solution.  You can treat the splitting of workers as creating a tree.  Each leaf node of the tree does some work.  There are no empty leaf nodes because any empty leaf node could be removed and its parent replaced with the other branch.  In an optimal solution the 2 smallest items will be at the deepest level of the tree.  As if they are not, then they can be switched with any non-deepest leaf node and the solution either improves or stays the same.  Furthermore all deepest levels are reached at the same time point, so you can freely interchange any leaf at the deepest level without changing the total time for that tree.  Hence, the two smallest nodes can be made to share a common parent.  This pair of two smallest nodes and their parent can be replaced with a single leaf which takes as long as the longest of the two smallest nodes plus the split time.  This results in an equivalent problem with an identical solution presuming we have started with an optimal solution for the original problem (I don’t have a strong proof of that, but it appears obvious – decomposing a node can only make it easier to arrange the tasks to achieve an optimum so if we are already optimum, we are also optimum for composition).  We can now start all over again, but we have 1 less leaf node.  Hence we can repeat until we only have 1 node, and the total cost is now obvious.

Q3) Given two rows of equal length of up to 16 people each with heights between 140 and 190, determine the minimum number of adjacent people swaps such that every member of the second row is larger than its corresponding member in the first row.

A3)  Interesting question… the limits of only up to 16 people suggests an exponential search is feasible.

Looking at one of the passing answers, it looks like an exponential search on the minimum number of swaps to make the first k elements correct, looking at every possible subset of the k-1 swap locations.  I’m not sure how this accounts for swap ordering… maybe I’ll work it out after some sleep.

So, post sleep revisit…  The approach is to determine the minimum number of swaps required to have index i and smaller satisfy the criteria using i out of the available slots.  The tricky bit in understanding the solutions is that they use a compact state space.  If you use a bit field, with i bits removed, the fact that there are i bits removed indicates that those numbers were placed in the first i slots, in some order.  So all bits set means any ordering, so that is where we start.  For each set bit, we consider moving that value to the beginning.  This will take a number of swaps equal to the distance between the start and the candidate, the rest of the values will retain the same order, and obviously we only do this if moving it results in a satisfiable condition.  We now have a bunch of new bit fields to determine the minimum swaps for, with the bit corresponding to the moved index removed.  So looking at each new state we can determine which index we are moving to by counting the number of gaps, and we can start again.  However when it comes to distance to move to the new spot, we have to account for the fact that in moving an item to the left, the items to its left have all moved right 1.  Conveniently we can determine this from the bit field.  Each 0 bit corresponds to something which was moved, so if we pretend the 0 bits aren’t there and align right, that’s where the candidates are currently placed.  So the count of set bits to the left of the candidate to move gives the number of swaps.

This recursion can be memotized over the bit field to ensure it runs fast enough.  Terminal condition is the empty bit field which has a cost of 0.  All that remains is to prove this approach correct…

Looking at the recursion from the inner most levels first.  Obviously if all are in the right place, the cost to move them into the right place is 0.  If all but 1 of the left most spots have been made correct, obviously the rightmost must stay still, and if it is in the correct spot, the cost is 0.  If all but 2 of the left most spots are correct, either we swap or we don’t.  If we swap the cost is 1, otherwise the cost is 0.  The recursion will try both options so there is definitely no loss of generality yet.  All but 3.  In this scenario if the ideal scenario has the right most moving left 2, there is nothing to gain by swapping the other 2 first, it just increases the cost now and increases the search space, we can leave deciding whether to swap those until after swapping the right most across without any lower cost scenarios lost.  Thus demonstrating that we do not lose generality. (Not a proof, but doesn’t look too hard to formalize.)

Now if you’ve been paying close attention you’ll notice I left something out completely.  Everything above presumes that you only swap the people in the front row, or only in the back row.  I can’t see a good way to demonstrate why this still leads to the correct solution, but intuitively, mixing front and back swaps serves to lose the ability to chain the swaps in order to place items in the correct spot.  In order to perform the chaining you end up needing to do the corresponding front swap for a back swap or vice versa (or simply repeat the same swap again, either way it seems obvious that the number of swaps is increasing).  In some scenarios mixing the swaps doesn’t make things worse, but it does not provide ways to make things better.

Its TopCoder Time!

So I went in topcoder 2012 R1A and wow was I slow…  Easy questions, and I only implemented solutions for 2 out of 3 them with 3 minutes to spare.

724th before system tests, cut-off is top 600.  After system tests 655th.  A single challenge would have gotten me through, and I was staring at an incorrect java solution for the 250 for a while but someone challenged it faster.  So I shall be back next week, which will almost certainly be much easier with 600 of the top people going through this week.

Q1) Given a game where points are given out 1, 1, 0.5, 0.5, 0.25, 0.25 … and list of peoples turns which may not be in the correct order, determine who could have possibly won outright (no draws). (Max 8 turns.)

A1) This question caught me badly, I immediately presumed that there was an order of play, and the turns were just shuffled to be confusing.  This lead me to a simple solution which was completely wrong, which I ‘corrected’ twice before finally realising (looking at the final sample where 1 person had 3 turns and 1 person only had 1) that I had completely misread the question.

So, throw everything away, start again.  Simulate the game for every permutation of the given turns and collect the distinct set of winners.  C++ coders used next_permutation to great effect.  I just wrote the brute force.  Still even if I had have read the question correctly, I wasn’t writing code fast.

Q2) Determine the number of fractions a/b where a < b, a is co-prime with b and a*b equals one of the factorials less than or equal to N!. (Max 250 for N)

A2) Took a moment for my brain to get started here as well, I momentarily went off on a side track of counting the number of fractions less than 1/k which have a*b = m!, but realized pretty quickly that there was no way that was going to help me avoid counting the scenarios where a and b are not co-prime.  A bit more thinking and I realized that I could count all co-prime fractions a/b without restriction of a < b much more easily.  And since there are no factorials which are square numbers (ignoring 1, which we can) the answer is exactly half that.  So how do we count these?  Well break k! factorial into its prime factors and their powers, obviously if a, b are co-prime, then no prime factor is shared, so all we need to do is count how many ways we can divide the distinct prime factors into 2 groups (including cases where the groups are empty, since we can always make a or b 1).

Here I had a moment of stupidity.  How many ways can you divide N items into 2 groups… well that’s the sum of n choose k for all k, 0 to n – and choose can be implemented using a pascals triangle cache for high speed.  Err… Wait a second…  or I could have just raised 2 to the power N…  But too late, already implemented and tested, so submitted it shall be.

But again, it took me way too long.

Q3) Given a set of light switches which can control multiple lights, and each light is controlled by up to 2 switches and the list of exactly which light each switch toggles, and an initial state of lights, determine the minimum (non-negative?!?) number of switch uses required to turn all lights off. (Max 50 lights)

A3) This problem is so easy, I wish I had of had time to look at it.  Since each light is controlled by at most 2 switches, switches can be in 3 states.  Useless (because they aren’t used anywhere), Defined (because they are the only switch for each light they switch) and Joined (because either the two switches must be the same or opposite by the connecting light).

Collect all joined switches into clumps.  For each clump, try toggle or not for a single switch, the rest of the switches become defined.  If all lights turn off, determine how many toggles that matched to and choose the smallest if both toggling and not-toggling works.  Add all clumps together, add in the defined and ignore the useless… and you are done.

Well, except I didn’t mention all the failure scenarios.  A ‘Defined’ switch could have no solution if the lights it is defined for do not start in the same state.  Also neither state for a clump may result in all lights off.  Finally there could be a light which is not referenced by any switch and is not already off.

 

So in the final analysis it looks like if I hadn’t of misread the first question, there is a decent chance I would have saved enough time to get me the 40 points I needed to get through, and potentially even have been making good progress on the 3rd question when time ran out.

TCO11 R2

No t-shirts for me this year…

Only 760 out of 850 potential contestants turned up, and I came 443rd, short of the required top 350 which advance to shirt round.  I think 7 Aussies qualified for this round, but only 6 registered.  Out of those 6, 3 failed to get any points (possibly fell asleep!), 2 of us around the 440th place, and John Dethridge advances 315th place.

I could have made it through to get a t-shirt, but I took too long to realise I had an overflow bug in my count number of powers of k less than or equal to n loop.  I really should have implemented it using a divide, but also I should stop using 32bit integers – except where absolutely needed.  One day I will learn!

I had time enough to do question 2, but it was not easy and I was very tired.  I looked for corner cases for Q1 hoping to get the single challenge I needed to get through, but the given examples covered all of them, so there was basically no one to challenge.

Q1) How many boolean sequences of length n are are plausible, if the ith entry is the answer to whether a number is divisible by i.  Return answer mod 1000000007 since input can be up to 1000000.

A1) I was a bit slow in realizing the correct approach here, but it wasn’t too long before I realized the answer was related purely based on the primes less than or equal to n.  Specifically each prime doubled the answer at a minimum, more if different powers of that prime were less than n as well.  More specifically if the maximum power of a prime p less than or equal to n is p^k then the prime independently contributes (k+1) ways to build the final sequence.  So, multiply all these factors mod 1 billion and 7 and you have your answer.

What kept me too slow to advance was it took a long time to realize that my loop for determining max power was potentially squaring numbers just less than 1 million, which doesn’t fit into 32bit integer.  I had correctly identified that I needed to use 64bit integers for the result as small powers go more than squared and so mod 1 billion and 7 is not sufficient to keep you under 32bit limit, but I should have used them more liberally from the start.

Q2) Given an square array of strings which are built up by concatenating the smallest of the vertically and horizontally previous cells in front of the largest, where the first row and column are provided (and each have a single character in each cell), return up to 50 characters starting at a specific index of the string in the far bottom right corner (the largest string).  Note: no character will be repeated in the first row or column.

A2) Pascals triangle gives the length of each cell – and with a worst case of 30×30 the length of the largest string can be very large!

Having built up pascals triangle, all that is really needed is to know for each cell, which of the two vertically and horizontally previous cells is lexically smallest.  As usual, ‘all’ is of course the whole actual problem.  I thought I was on the right track for this, but I suspect its performance is too slow, the trick being to define a memotizable recursion which doesn’t require an offset and length for comparison for both cells involved.

Edit: Before I managed to fall asleep I think I solved this.  The memotizable recursion is ‘which is smaller for 2 cells on a common diagonal?’.  If the 2 cells are neighbours on a diagonal, they are built out of the 3 cells on the previous diagonal. From there it can be seen that the middle cell can be ignored as it is either the common start, the common end or transitive property applies and you can just compare the outside pair.  If they are not neighbours, you get 4 cells which are in neighbouring pairs.  By using the memotized recursion on each neighbouring pair you can then reapply the recursion on the smallest of each pair.  Memotization ensures that the algorithm runs in O(n^3), both in space and time.

Now if you can read between the lines there you’ll notice that I have failed to take in to account an important special case.  Once the recursion hits an edge we need to compare that edge value to a cell on the same diagonal.  Unless that other cell is on the opposite edge of the diagonal, this doesn’t appear trivial.  However the original problem was ‘determine the nth character in the bottom right hand corner’, we can trivially expand that to also ‘determine the first character in every cell’.  Since the answer to these new questions depends on the which is smaller question, it might seem we are at an impasse – but because each question is entirely answered by considering the previous diagonal alone, we can invoke mutual memotized recursion safely.

All that remains is to prove that this method works.  Something which is obvious is that each cell is only made up of characters which appear on the edge in the same or lesser row, or same or lesser column.  Thus since every comparison in the algorithm involves two different cells on the same diagonal, it can be seen that the termination comparison is never equal, as an edge on a diagonal compared to a different cell in the diagonal cannot be equal as the cells on the diagonal cannot be made out of that edges character and all edge characters are distinct.

Q3) Given 2 arrays of positions, and a set of ‘connections’ from one array to another, determine the number of orderings of the two arrays such that no connections cross. (Return mod 1000000007 as with arrays of length 50, the answer might be very large.)

A3) Edit: Also solved this one before going to sleep.  I think it is easier than Q2 even.

First step, DFS the connections to form them into groups.  If you discover a loop, return 0.  Now we have our first factor in the answer, n! where n is the number of groups.  Each group is going to be independent, since there is no way they can cross each other without causing a crossed connection so the n! comes from the orderings of the n groups.

Each group now contributes an independent factor.  There are 2 scenarios, either every edge in the group joins at a common endpoint, or there are 2 or more locations in the group where 2 or more edges join.  If it is a single common endpoint the answer is k! where k is the number of edges in the group.  If it is more than two common endpoints we can either arrange said endpoints left to right, or right to left – so that gives a factor of 2.  Then for each endpoint we also get a factor of (m-2)! where m is the number of edges which join to that endpoint.

Finally we have additional factors for the array locations which do not participate in a connection.  These can be added in any place in any order, so if there are n spots in the array and k of them are endpoints of 1 or more connections, the additional factor is n!/k!.  Since there are 2 arrays, this gives 2 more factors total.  Multiply everything together using repeated modulo and 64 bit integers… and we should be done.

All in all it makes me wish I was less tired when I did the competition.  I think I could have solved that second question during the competition if I had of been able to concentrate at all.  Third one, well I’m pretty sure I would have run out of time having only opened it with a few minutes to go.

TCO11 R1

With 2000 qualifiers able to sign up, only 1550 actually did.

I started writing this post while the solutions were going up, I wasn’t very impressed with my effort (even allowing for the 2am start time and the remnants of the head-cold I’ve had over the last few days), and fully expected that I had failed to make it through to the next round.  Scores were still updating but I was 852 and very few re-orderings were occurring due to system test failures.  However, at that point I also thought that the cut-off was the top 600.  I was wrong – top 850 go through – and final results have me at 845th!

I guess that means I get the privilege of waking up at 2am next week, along with the 6 other Australian’s who progressed. Only 8 Australians bothered to wake up (including myself), I’m sure more qualified than that…

Q1) Given a sequence of characters X and O, you want to reorder them into a different sequence.  4 operations are available, move the front to the end of either of 2  separate storage locations, or move the front of either of the 2 separate storage locations to the end of the original.  Return the minimum number of operations required.

A1) Simple! – although at 2am nursing a muddled head it certainly took me long enough – a few seconds longer and I would have been eliminated this round.  The answer is 2 times the minimum number of characters to remove from the front before the rest is equal to the start of the desired sequence.

Obviously there is no way it can be smaller than this, all that remains is to justify that it can always be achieved in this number of turns.  To do this, all it takes is to realize that there are 2 storage locations and 2 character types.  So every X you move to storage 1 and every 0 you move to storage 2.  Then you can put them back on to the end of the initial location in whatever order you like, no restrictions since each storage is all the same character.

Q2) Given a set of probabilities that a road segment is muddy, and the ability to jump over any single road segment you like, what is the expectation value for the minimum number of muddy road segments you have to stand in to get from start to end? (Start and end are both guaranteed to be dry.)

A2) As probability problems go, this one isn’t as bad as some.  However, my first write up I tried to build from start to finish, rather than from finish back to start – which was completely wrong, so I had to scrap that and start again.  I then made about a dozen mistakes getting 0 and 1 mixed up.  No idea why but I had decided to choose 1 as dry and 0 as muddy and half way through I switched that all backwards…  I should have realised that 0 as dry and 1 as muddy was a more natural assignment from the start.  I finally sorted them all out, but only a couple of minutes after the end of the available coding time!

So I wrote the solution as a recursion of expectation value for a given position depending on whether the position closer to the finish is muddy or not as well as the current position is muddy or not. (Apparently it can even be defined as a straight recursion, like a complicated version of the Fibonacci series – but I am too tired to see the path to that.)  From there the answer is the sum of the two scenarios for position 0(start) where they are not muddy, weighted average by the probability of each scenario which is given by the probability of position 1 being muddy.  Memotized recursion or DP to the answer.

Details of the recursion are rather long to fully write out, but can be summarized as a couple of conditions. If the next position is dry, take expectation value of that position for the dry case (2 scenarios weighted average), and then add 1 if the current position is muddy.  If the next position is wet, take expectation value of the position after in either dry or wet case (4 scenarios weighted average) and add 1 if the current position is muddy.

This works because if the next cell is dry jumping is not required, and worst case you jump into mud, so you never try jumping.  If next cell is muddy, you always jump, because if the cell after is dry its a winner, if it is muddy then you have stepped in the same number of muddy cells as if you had not jumped, but you are closer to the finish (that is not a formal proof, but hopefully obvious).

Q3) Imagine IP addresses where each of the 4 components are valued 0 to 999.  Then imagine that you effectively own all of them, and that people are willing to pay for each IP address.  Up to 50 people make a request for 1 or more IP addresses and give a price they are willing to pay per address.  Each request is in the dotted form, with wild cards allowed for any/all of the 4 segments(no partial wild cards, each segment is either a number or a match anything wild card).  Determine the maximum profit to be made.

A3) Interesting question.  With up to 50 trillion operations in brute force, there definitely needs to be some accelerated counting.

One failed solution I saw calculated all intersections, to gather each section where there is multiple possibilities. Sort them by size largest to smallest, then do inclusion/exclusion counting to get the answer.  I think the general principle was sound, but the implementation was lacking.  The trick comes in determining inclusion/exclusion counting.  Worst case for unique intersections is less than 70000, but for each one it could be the result of many different ways of combining, some of which might be needing an inclusion, while others need an exclusion (I think?).  I am thinking forming them into a potentially interlinking forest, which is depth first searched from each root with some kind of memotization to produce the result without a performance explosion.

5am has arrived, I’ll have to come back to this one. (And maybe QR3, which I never wrote up…)

GCJ11 R3

A quick look at how people performed in this round shows it to be a very tricky round.  No perfect scores this round, but with only 25 people advancing to the finals, unless you solved question 4 large (which was worth 31 points and only had 1 person solve it correctly), you needed everything else and a decent time to advance. Representing Australia, tikitikirevenge came close in 48th place.

Q1) Given an irregular shape defined by two poly-lines with identical start and end x coordinates, and continuously increasing x coordinates in between (but each poly line itself may have a different number of points) and these poly lines do not intersect.  Determine G – 1 x coordinate positions where cuts with dx=0 (vertical) divide the shape into G equal parts.

A1) First thing to do is to break the shape into trapeziums, at each x coordinate specified in the 2 poly lines, taking care for the case where the x-coordinates are identical, and otherwise having to determine the y coordinate for the other side using interpolation. Area of a each trapezium is easy, so we can then work out the total area, and the G equal sub divisions of that total area.  Iteration easily finds which trapezium contains each division point, all that is left is determining where in the trapezium the appropriate point is.  This could be done using a binary search, or simply derive the equation (which will be quadratic) and solve for the position given the target area (second solution will be outside the trapezium).

All in all, this question is fiddly but very doable.  Not surprisingly then this was the question with the highest success rate.

Q2) Given a set of N numbers each between 1 and 10000 inclusive, with possible repetition, you want to organize them into sets of consecutive integers such that the smallest such set is maximal.  What is the size of that maximal smallest set?

A2)  Small input size is 10, and so a brute force approach is plausible there.  Large input is up to 1000, which is much less friendly.

Firstly I think it is useful to break the problem down.  If you sort the numbers, you can then break the problem into sections based on where there is a gap in the sorted sequence.  The answer is then the minimum of the answer for each section.

Now all we need is the answer to a given section.  Where ‘all’ is pretty much the entire problem.  Obvious cases first.  Same number of repetitions for every value; answer equals set size.  If the repetition of a single value is greater than the sum of the repetitions of its neighbours, the answer is 1. In any case that we get an answer of 1, we can avoid looking at all other sections.

That still leaves a fair bit to consider.  Anywhere that the repetition count increases has to be the start of at least a number of sequences equal to the difference.  I am wondering if the greedy approach of starting no more sequences than required, and when the count decreases, stopping the oldest sequences, will produce the ideal answer.  It certainly seems sound on the first glance. A quick check of the answers shows this to be the solution.

Definitely a lot more thinking than the first problem, but again doable.

Q3) There is an RxC grid where each square contains a conveyor belt which points either vertically, horizontally, or one of the 2 diagonals.  Each conveyor belt has an individual direction control.  Determine how many out of the 2^(R*C) possible scenarios result in the conveyor belts pointing only in loops. (I.E. No cell can have 2 cells pointing to it.)  The grid wraps around on to itself ‘by the miracle of science’.  Answer modulo 1000003 to avoid huge answer problems.

A3) Small input RxC is up to 4×4, so only 2^16 scenarios to investigate, and each scenario can be checked relatively quickly, simply by counting how many cells point to each cell.  So, small input is trivial (and it receives a tiny number of points too).  Large input is 100×100 – hence why we need that modulo.

Special cases – a cell might have nothing that can possibly point to it, causes a result of 0.  Also a cell might have only one thing pointing too it, hence the direction of that conveyor is ‘defined’. A straight line loop of conveyors has 4 possibilities if length even, 2 if length odd.  Any such loops can be removed from the graph as a multiple of the answer obtained otherwise.  Otherwise the puzzle contains no reversible loops.  Which is a bit of a pain really, because it means decomposition appears infeasible.

At this point I’m pretty sure I’m out of my league, so I shall consult the answers.  Interesting!  Having repeatedly forced conveyors where a cell has only one option pointing to it, and exiting if there is ever no options for a cell, an interesting thing occurs.  Every remaining partially (or complete) undecided cell has either exactly 2 options for how conveyors point to it, or away from it (or both), due to conservation of edges.  This implies that choosing one direction, always results in forcing a direction, which in-turn forces another until you come round in a loop, and since you come round in loops, the problem is decomposable after all.  Each of these ‘loops of force’ is entirely independent and contributes exactly 2 independent ways to solve the problem.  So count the loops, and keep doubling modulo 1000003 to get the answer.

In competition I doubt I would have gotten this out.  The fact that the ‘loops of force’ aren’t constructed of consecutive cells makes it hard to observe by accident.

Q4) Given a binary representation of a square number, where some of the bits have been replaced with question marks (but not the highest set bit), determine the original number. (Only one answer will be possible.)

A4) Small input constraints of maximum 60 digits and at most 20 question marks made brute force an obvious and trivial course of action.  This was actually answered more frequently than the easy answer to question 3.

The large input only had one person successfully solve it it, so I don’t feel too bad that nothing springs to mind.  Constraints were 125 digits and up to 40 question marks.  My first thoughts are, maybe fill in the highest 20 question marks with each possibility, and then get the bounds of the 64bit integer which matches the lower 20 question marks, and then try each one.  Problem is, in the worst case scenario, that appears to be even worse than brute forcing all 2^40 scenarios.

A few little trivial cases to get started.  If the right most digit is 0, it and the digit to its left can be removed and ’00’ appended back onto the answer afterwards.  If the second right most digit is a 1, something is wrong because all squares either end in 01 or 00.  If second right most digit is a question mark, it must be a 0.  This leaves last 2 digits are ’01’ or ‘0?’.

The winning approach was to divide the number into half, if most of the question marks are in the top half, try each possibility for the bottom half, and use that to infer the top half (somehow).  Otherwise try each possibility for the top half, and use that to determine the bottom half.

Lets look at the second option first, since it is easiest.  If we calculate the square root of the maximum value which matches the specific pattern for the top half, round down and then square – we have our answer (assuming that it matches the pattern.)  Only trick is to acquire the square root of a 125bit number.  Binary search using BigInteger will do the trick.

This leaves the first option.  Lets put aside the possibility of the answer being even.  Above it was mentioned that the last 2 digits we actually work with are 01 or 0?, we can use this as justification for ignoring the possibility of the answer being even, because 0? can be broken into 00 or 01 sub cases, 00 causes recursion, 01 does not.  (Since only one side of the question mark ever causes recursion, there is no exponential run time problem to worry about.  Another important point is to only consider whether the low or high bits have the most question marks after ensuring the last 2 digits are 01, otherwise the algorithm can be easily tricked into running too slow.)

So, since the answer is odd, it ends in 01, but how does this help us?  Well if the answer is odd, then the square root is odd, so it either ends in 01 or 11.  IE it is 4A+1 or 4A+3.  If it is the former than the square is 16A^2+ 8A + 1 = 8A+1 mod 16, if our input is a specific mod 16, we can determine whether A is odd or even, which gives us one more digit of the square root.  Similar applies to 4A+3.  In both cases, this continues.  Given the right half digits of the original number, we can slowly build 2 possibilities for the same number of digits of the square root, which either ends in 01 or 11.  Since the number of digits in the square root is no more than half the number of digits in the input and we’ve filled in all question marks in the right hand half we have 2 complete possibilities for the square root.  Given these two possibilities we can then check which either actually matches the input.

Thinking about this problem, in hindsight I believe I’ve previously been shown this approach of determining the square root of a number given only its right hand half, but I doubt it would have helped me in the heat of competition, not even if it was the only problem I did.

Final analysis – the small inputs are generally all pretty easy to implement, the first 2 large inputs were in my doable category, this leaves my ‘feasible ideal placement’ between 160th and 90th – which matches with where I get eliminated in my best days.  Would have been nice not to screw up round 2 so badly 🙂

GCJ11 R2

Out again in round 2.  Last year in ~800th, this year in 1263rd. So again no t-shirt from Google.

I am a bit disappointed with my effort, the answer to question 3 had been staring me in the face for an hour, but my late night addled brain just refused to see it.  If I had of seen it, I think I could have even solved the large input and made it through to round 3.  Maybe I would have seen it if I had of spent a few more minutes on it rather than doing question 2 small input.

Q1) If you are in a corridor of a certain length, and there are moving walkways some or all of the length of that corridor, and each walkway has its own speed boost which it applies to your speed, and you can walk at a speed S and run at a speed R, but you will run out of puff if you run for longer than T total, what is the shortest time to get from the start of the corridor to the end?

A1) This problem was very easy, although it was a bit fiddly.  The key to notice is that you want to run during the period of time that your walking speed + walkway speed is slowest, as that has the greatest net effect on the final time.  So, just sort the input by walkway speed, run on all non-walkway bits you can, then on the walkways in the order they are now sorted.

Q2) Given an rectangular array of point masses, where each mass is given as a deviation from the smallest mass (and no deviation is more than 9 units), determine the largest square array of these point masses, which if you remove the 4 corners has a centre of mass at its actual centre.

A2) Again, simple enough, but the brute force attack (which is what I implemented to try and get a few more points) is O(N^5) and the large input is up to 500×500.  The key is to do a bunch of pre-calculation of moment of mass and sum of mass of every rectangle array starting from the top left corner, about (0,0) in O(N^2) time by using the fact that moment of mass and mass is additive, a larger rectangle can be determined from the sum of the rectangles 1 smaller in each dimension individually, minus the overlap, then finally tacking on the new mass element.  Once these pre-calculations are done, any rectangle can be calculated quickly by adding the bottom right corner and top left corner, and subtracting the bottom left and top right corners.  Then finally this can be adjusted by taking off the corners.  And to finish off we can check whether the centre of mass is in the middle by seeing whether the moments are equal to the product of total mass and the centre position.  Since there are O(N^3) scenarios to check, and they can now all be considered in O(1) time, the 500×500 isn’t too bad at all.

All in all, this question was perfectly doable, but I had decided to focus on question 3 for too long to come back and think about it properly.  If I had of done this problem I suspect I would have implemented a different approach.  Consider the number of 1xn and nx1 subsections of the input.  There are O(N^3) of these, which can each be calculated in O(1)  by building off each previous 1x(n-1) (n-1)x1 sections.  From there, each of the O(N^3) scenarios can be built up in O(1) from each previous scenario, by slapping on the rectangular edges and then subtracting off the corners twice (and adding on the previously removed corners).

Q3) Given an integer N, the numbers 1 to N can be ordered and then processed in the following way.  As each number is considered, a total (starting from 0) is checked to see if it is a multiple of this number.  If it is not, the total is increased to the smallest total which is a multiple of this number.  If the number is no longer a multiple of one of the numbers previously considered, this process continues until all processed numbers divide into the current total.  Throughout all possible orders what is the difference between the most and least number of times that adding a new number to consideration results in the total being increased.

A3) The wording was tricky, but it boils down to how do you order the first N numbers, such that the running lowest common multiple changes the most number of times, or the least number of times.

I fairly quickly identified that the most number of times occurred when you process the numbers in order, further identifying that the actual answer was equal to 1 + the sum of the maximum number each prime could be raised to and still be less than N.  That is half the problem solved…

However, I started by considering the reverse order, which worked correctly for small values.  I then considered maybe ordered by the number of prime factors, but that was actually even worse.  Finally I ran out of time with my sleep deprived brain giving in.

So, what was that second half of the answer?  It comes right back to the first part, put the maximum power of each prime in first, then everything else will not change the LCM.  I have no idea why I failed to consider this scenario, it seems so very obvious now.

And this leads to how to do the large input, where a maximum value of N was 10^12.  O(N) was going to be too slow, and looking at 10^12, I had immediately thought that the answer would involve the primes less than or equal to the square root. (I even wrote the once off pre-calculation of all primes less than 1million, to be ready for needing that scenario.)  With the minimum answer being the number of primes less than N, and the maximum answer being the number of primes, and all of their powers less than N, the difference is all of the prime powers of at least squared, which are less than N.  Hence that array of the prime numbers less than 1million is all you need to implement a very fast answer.

Q4) Given a graph with unit edge distances, determine the shortest path between 2 points, which is also within 1 unit of the most number of vertexes.

A4) The shortest path part of this algorithm is easy enough, the maximizing the number of vertices near the path is an entirely different story.  Fairly obviously it will be a memotized recursion/dynamic programming solution, but the recursion is not trivially obvious.

I looked up the answer because it is way early in the morning and I just wanted to finish writing this up before dawn…  So the recursion you use is to consider the maximum number of vertexes ‘near’ the shortest path to b where the previous node in the path was a.  This is equal to the maximum over the sum of paths ending in a with a each possible previous node before that, plus the number of additional new ‘near’ nodes which get added extending that path to b.  This is the key point, it is possible to work out the number of new ‘near’ nodes in a path if you know the last 3 nodes of the path.  This observation occurs because you can see that if a node next to the latest node in the path was close to anything before those last 2, then the second latest node isn’t actually on the shortest path, as it could have been skipped via this theoretical other path.

And in-fact doing that is actually the most time consuming part of the problem, but it isn’t exactly hard, just consider every vertex, is it next to the end of the path, if so, is it next to either of the previous 2 on the path.  Using an edge matrix, this is O(V) for each scenario.  Cache each triple to avoid repeated calculation, and you are done.  (Slight optimizations exist, but are not required given the input size.)

GCJ11 R1C

The problems in this round were just a tiny bit easier and it showed – qualifying required 2 whole problems, or a good time on 1 whole and 2 smalls.

Q1) Given a rectangular array where some tiles are marked with a hash, replace it with a new array where the same tiles which were marked with a hash are marked with 2×2 arrangements of ‘\’ and ‘/’ which look like diamonds.  This may not be possible, if not then return as much.

A1) Pretty straight forward question, the key is to identify that if a hash mark has nothing to its left or above, it has to be the top left corner of a 2×2.  So replace the whole 2×2 immediately.  If this overlaps with the edge of the array, or any of the 2×2 aren’t marked, fail immediately.  Once done, repeat until complete or fail.  A simple left to right, top to bottom walk solves the problem quickly and efficiently enough.

Q2) Given a sequence of stars, and a distance between each you want to travel from the first star, to the final star in as short of time as possible while stopping at every star in the exact order given.  To slightly complicate things your engineers can build up to L speed boosters on stars.  These speed boosters take a time t to be built, but once built all travel between the star it was built on, and the next star in the chain is performed at double speed.  This even applies if the space craft is between those stars when the building completes.

A2) With the large input claiming up to 1million stars, this problem looks a bit scary at first.  However a quick think and you should realize that a greedy algorithm applies.

The answer is (sum of the distances)/speed – (sum of speed boost benefits).  Since the first part is a constant, and the second part has a constant number of components – the minimal time comes from maximising each component in the sum of speed boost benefits.

So, simply calculate the speed boost benefit for each star, if a speed boost was built there, and then sort the array descending (no problem for 1 million entries using merge sort), and sum the first L elements.

Only things to watch out for are using 64bit integers and ensuring that your speed boost benefit formula correctly uses t and covers all 3 scenarios – ‘useless’, ‘full benefit’, ‘space craft in mid-flight when building completes’.

Q3) Given a list of N integers, determine the smallest integer in the range L to H which either divides or is a multiple of each integer in the list, if such an integer exists.

A3)  I solved this one on my own for a change, but the implementation for the large input is a bit hairy – so high risk I would have failed during competition.  The small input constraints allowed a trivial brute force so many people submitted that on its own.

The key is to sort the integers, then the answer is either smaller than the smallest, or larger than the largest, or part of one of the segments now described by each consecutive pair.

Lets treat each case separately for now.  Less than the smallest, determine the GCD of all numbers, and find the smallest divisor of that GCD which is in the range L-H.  If L>GCD – nothing, otherwise can generate all divisor pairs by walking the integers 1 to Sqrt(GCD). Larger than the largest, determine the LCM of all numbers, and if less than L use a quick division to determine the first multiple of the LCM which is in the L-H range.  If greater than H, obviously no answer.

This leaves the gaps.  Based on these two cases above it should become apparent what is required.  Determine the LCM of the first numbers, and the GCD of the rest.  Then the answer is a multiple of the LCM which also divides the GCD.  Obviously this won’t work if the LCM doesn’t divide the GCD, so that is a quick check.  Next L<->H must overlap with the LCM<->GCD range or there is no point trying.  If LCM is in the L<->H range, obviously we are done.  Otherwise we walk the integers 1 to Sqrt(GCD) again, checking to see if any divisor pairs contain a multiple of the LCM in the L-H range.  Choosing the smallest if there are multiple options.

So putting it all together, generate the LCMs starting from the left, and the GCDs starting from the right, try the less than smallest, then walk each consecutive pair of the sorted order, and finally the greater then largest and if at any time a solution is found in any of these parts, return it.

So I mentioned that the large input was a bit hairy and this is because of the LCM calculations involving numbers which are up to 10^16.  Using BigInteger saves you from overflow risk, but unless you realize that there is no need to continue calculating LCMs once they exceed 10^16 (since they are greater than the maximum H value) you run the risk of performance being a problem as BigInteger becomes slower as the size of the data increases.  Using a 64bit integer is okay, but you need to hand code or otherwise check for overflow in multiplication, and you need to apply the GCD division before the multiplication!

Performance wise the biggest cost is the walk from 1 to Sqrt(GCD) as this can be up to 10^8 modulo operations.  Luckily this can only happen for at most 1 segment, but in case 100million modulo operations is too scary, an improvement to the average case is to divide the L<->H range and the GCD by the LCM.  With the L<->H range round inwards, the GCD will divide cleanly.  Then you can start with ceil(L/GCD), walking up wards to Sqrt(GCD/LCM), then jump back to just below ceil(L/GCD) and walk down to the first divisor for which the other side of the pair exceeds H/LCM.  This doesn’t improve the worst case, but does give you early out and reduced calculation space in the average case.

GCJ11 R1A

So like last year, I scraped through in R1A with only 1 out of the available questions solved.  But this year I was a bit faster, so I got 655th rather than 800+.

I was actually very close to a respectable score – an off by 1 which took me almost 2 hours to find… *sigh*

Q1) Given an inclusive upper bound on the number of games played today (and at least 1 game was played), with an ‘exact integer’ percentage win rate for today of Pd and an ‘exact integer’ percentage win rate for all time Pg, is the scenario even possible?

A1) The large inputs put the inclusive upper bound at 10^15 – which seemed daunting at first, but fairly quickly I discovered a short cut which I thought made it irrelevant…

First we can rule out a couple of obvious scenarios – if Pg is 100% then Pd must be 100% or something is wrong.  Similarly Pg of 0% implies Pd of 0% or again we fail.  For me the next thing I realized was that if the inclusive upper bound was >= 100 then the option of 100 is available – an important option in the presence of percentages.  In this case every value of Pd is valid.  In that case all that remains is to consider whether Pg is valid…

Pg being valid is actually very simple, so long as it isn’t 0 or 100, we can add so many games prior to today, that today becomes irrelevant, other than to serve as a tiny modulation factor which needs to be compensated for to make Pg an exact integer.  This applies regardless of whether the number of games today was 100, or much smaller.

Final scenario is where the inclusive upper bound doesn’t include 100.  This is less than 100 options to step through, so why not!  For each value of n, we check whether n*Pd is an integer, or similarly, whether n*Pd = 0 mod 100.  If we find a match the situation is plausible, if not then the situation is invalid.

So as I alluded to earlier the upper bound max of 10^15 isn’t a problem, 3 if statements take care of all scenarios >= 100.  However when I went to run my code on the large input file, it crashed!  I hadn’t noticed exactly how big the upper bound max was, and was using int.Parse.

Q2) In a variant of hangman where there is no way to lose, just a score that gets worse the more useless guesses you make, you are choosing words from a dictionary and your friend is playing to a specific strategy.  You want to choose the word which will give your friend the worst score – if there are equal options the first word in the dictionary is the one you take.  If the strategy can be described by a permutation of the alphabet, where each letter is guessed in turn unless there are no dictionary words which match the previous guesses which contain the letter, in which case the letter is skipped, given a list of these permutations and the dictionary, what word should you select?

A2)  A very wordy question (pun not intended!), and for the longest time the question text itself was self-contradicting.  Large input had a dictionary size of 10k words and up to 100 permutations to try for each dictionary.  Maximum of 10 test cases meant that it wasn’t entirely insane though.  A brute force attack requires iterating over the dictionary, for each entry in the dictionary – and performing this multiple times even – it was clearly not going to work in time O(D^2*P) is billions upon billions of operations for the large input.

I spent a while considering whether there was some dynamic programming approach but eventually came to the conclusion that I was just trying too hard, and that an O(D*P) solution would be okay even with a rather high constant multiplier.  So I started implementing.  The approach is not to try each possible word choice and simulate, but instead to simulate all possible word choices at once.

At this point I should mention something which possibly isn’t clear from how I wrote the question above, which is that since the first thing you do in hangman is to write down the full list of blanks, your friend immediately knows the length of the word, so whatever word you select your friend will only consider words of that length.

Put all the words of a specific length in a list, then going through each one determine what its response to the current letter being tried in the strategy.  The responses (which can be represented by a bit field as maximum word length is small) break the words into groups.  Each of these groups is self consistent – if any of those words had been the originally selected word, the group would be same set of words that your friend would consider as still plausible and the basis of the strategy.  Once each group is created you can move to the next letter in the strategy, applying it to each group, splitting it into potentially even more groups which remain self consistent.  All that remains is to keep track of the score for each group, and then select the best score once you get to the end.  Repeat for each different word length, and select the best of the best.

Keeping track of the score for each group is fairly easy, first group has score 0.  If a group response to a letter by splitting into 2 or more groups, and one of those groups contains words that do not have that letter, that group has its score increased by 1 compared to the input group.  Key thing to notice here is that if the response of a group to a letter is to not split, and the words do not contain the letter, the score doesn’t go up – because this corresponds to the scenario where the strategy is to skip a letter, because none of the plausible words contain the letter.

I actually had this all written with almost an hour to go, I even took the time to generate the maximal input file to make sure it wasn’t too slow.  Except it didn’t produce the correct answers.  I was confused, I hunted for potential issues – only found 1, and it was still wrong.  With 30 minutes to go I decided I would write the brute force implementation, just get the points for the small input or if I had time, maybe use it to work out why the proper implementation didn’t work.  But I made copious mistakes in the brute force code – even though it was theoretically simpler, I got it wrong in multiple ways.  After the competition finished, I still had to fix 3 more bugs before it passed the practice small input.  At this point I compared it to my proper implementation, and found a specific test case to focus on.  Turned out that my proper implementation was actually much better, it hadn’t suffered from any of the bugs of the brute force code.  Just a single mistake, iterating over the dictionary of grouping words by length, I hadn’t used foreach, but had instead just iterated over the possible lengths, 1 to 10.  Except my loop stopped at 9, oops!

Q3) You are playing a special single-player card game.  You have at least 1 card in hand at the start, and the rest in the deck – with a total of up to 80 cards.  You start with 1 turn remaining until the end of the game, but each card has 3 values.  T, S and C.  T is the number of additional turns you gain by playing it, S is the score you get by playing it, and C is the number of cards you draw from the deck, if you play it.  You actually know the exact order and contents of both your hand and the deck before you start – determine the maximum score you can achieve.

A3)  This was the ‘hard’ problem and I definitely agree it was far harder than Q2.  Only 3 people got out the large input, and barely over 100 got the small.

The key difference between small and large was of course the limits imposed.  In the small input, C is at most 1, where as in the large input C could also be 2.  I think with the C is at most 1 limit, the problem can be more obviously turned into a dynamic programming problem…

I certainly had to look up the answer here, but the key comes from determining properties of the ideal answer and using those to limit your search space, which would otherwise be huge.  Firstly it is worth realizing that you should always play a T>0 card if you have one available.  Since they give more turns, there is no downside to playing them.  Next is the C=0 cards.  For a given maximal solution, any C=0 cards (which are not T>0 cards) can be reordered to the end.  Furthermore amongst C=0,T=0 cards you will always want to play them in largest score first order.  All that remains is to determine the order of playing the C=1 and C=2 (T=0) cards in between, if they are played at all.

The final and most important trick here is to determine that if you play 2 C=1 or 2 C=2 (T=0) cards, they can be reordered such that you play the one which arrives in your hand first, first.  Finally this gives enough limitations on the search for an ideal solution.  So we have a graph with status ‘index in deck of last drawn card’, number of turns remaining (capped at 80, since over 80 is pointless), the first C=1 card we might yet play, and the first C=2 card we might yet play and the first T>0 card we might yet play.  The transitions on this graph always increase one or more of these 5 values, so it is an acyclic directed graph.  Since there are no cycles you can fairly easily search this graph for the ‘longest path’ from the source, where each edge is a score and each vertex has an end bonus related to maximal use of remaining turns to play available C=0 cards.

To enumerate the possible transitions we have ‘play a T> card’, ‘play a C=1 card’, ‘play a C=2 card’, ‘decide to never play the current C=1 card’, ‘decide to never play the current C=2 card’.  A performance improvement option here is to remove the C=1/C=2 card options if there is a play a T>0 card option, since we know playing T>0 first is never a bad thing (Or alternatively, get rid of all T>0 from the status info and state transitions – presume to have used all the could be in the hand as part of every transition, and in determining the initial status). Use a DFS to generate the graph, as it is going to be sparse and maximum recursion is 80. Also add backwards pointing links because they will be useful for determining the maximum length.

Recursion with memotization will solve the problem at this point.  Starting from the implicit end node which every node connects to via the play of playable C=0 cards, recur across each back pointing link added before, using the edge weight (score increase due to that action) to build upon the result of the recursion.  Memotization to ensure the runtime is fine.

TCO11 QR2

So this one was at a more reasonable time, so I participated.  281st, not as good as I might like, but good enough to get through.

Q1) Given a scrambled list of two item types with a total count of N, and the ability to at most once for each i in 1 to N switch two elements that far apart, determine the minimum number of swaps to order the list, if it is possible.

A1) So, the answer to this question was so obvious that it takes a while to see.  In the case of there only being 2 element types a single pivot sort pass will result in an ordered list.  If you consider a pivot sort where you walk from the outside towards the middle, swapping if both are on the wrong side, walking a specific end if it is on the right side – you can see that it will perform a minimal number of swaps and that every swap will be closer than the last.

Hence it is always possible to sort, and the answer is just the count of the number of items not in the right spot to start with/2.  Or you can simulate the pivot sort, like I did.

Q2) Given a ‘map’ (1 dimensional array) which shows the position of two critter types, which are either grumpy at time t % K = 0 or at t % C != 0, determine the minimum time to get from the left to the right of the map if you cannot by in a cell at a time when its occupant is grumpy, if it is even possible.  Maximum value for K,C and length of array are all 50.

A2) So this is really pretty easy, its a shortest path algorithm, with a small twist.  Because the behaviour of a critter depends on time, each cell needs to be replicated in the graph to represent how time plays a part.  Obviously replicating once for every time step defeats the whole purpose of a shortest path algorithm… but since behaviours are dependent on mod K and mod C, it is pretty obvious that t mod K and t mod C are the only things that matter for a given cells possible future paths.  So each cell is replicated K*C times (or LCM(K,C) if you want to get really fancy, but given the availability of primes near 50, worst case for K*C and LCM(K*C) are pretty close…

So 50*50*50 vertexes in the graph, and 3 edges from each vertex.  From there you just use a priority queue based Dijkstra algorithm and presto, successful solution done.

I on the other hand used a non-priority queue – which for a while I had convinced myself should perform much slower in the worst case.  Shows how rusty I am!  graphs with equal edge distances don’t need a priority queue as a normal queue already walks them in breadth first search order.  Using a priority queue would only make things slower by adding the logarithmic insert/remove cost overhead.  So yeah, in hindsight my algorithm works pretty well…  If .Net had of had a built in priority queue I probably would have used it and only made things marginally worse!

Q3) Define S(x) to be the sum of the digits in x.  Define D(x) to be the recursive application of S until the input equals the output.  Given a range, determine how many of the contained numbers can be represented as y*D(y).  Range constraints: less than 10 billion.

A3) This problem was easy! – unfortunately I only realized as much with 10minutes on the clock, and I just wasn’t fast enough to write it all up.  I was 4 lines of code from being complete. (Although said 4 lines require a little bit of thought each…)  If I had of rushed rather than taking everything at an easy pace I might have saved those extra 2-3 minutes I needed to finish those 4 lines and perform the final debugging.  That would have gotten me in the top 40, which is where I know I am capable of reaching.

D(y) = 9 if y is a multiple of 9 else y mod 9.  If you’ve played around with digit sums before you would know this, if not a short exploration should make it fairly obvious.  So y*D(y) can be decomposed into 9 linear functions, which sometimes overlap.  Includes/Exclusion principle applies, so you can sum, subtract, sum, subtract your way through every possible n choose 9.  That is quite a few equations to write out, but everyone of them is pretty simple – if you could be bothered…

However! Every one of the 9 equations is of the form a*9k+b.  Note that these equations all have constant ‘mod 9’ values and they aren’t all the same.  Hence many pairs don’t have any overlap and so the inclusion/exclusion can be simplified significantly.  It decomposes into 3 pairs and 1 triple. This means you only need 9 base equations, 5 pair equations, and 1 triple equation.  I was half way through the pair equations when time ran out.  Interestingly I found one of the pair equations cancels with one of the base equations, and apparently there are more cancellations because looking at once of the successful answers, they only needed 9 equations and 7 of those were base equations. (So technically I was 2 lines of code away from being finished :P, 1 add and 1 delete…)