GCJ13 Qualifying Round

As is typical for GCJ – qualifying round had a large turn out, ~22000 people made a successful submission, of those 17059 qualified (at the moment at least).

63 perfect scores – 2 of which were Australian, and I don’t recognize either handle.  Radeye gets the unfortunate distinction of 64th place, having managed to solve all of the hardest problems, but having the large input on the trivial first question fail… (I have to wonder if he was trying the different language for small/large input thing, as his small input in perl looks fine at first glance… but it is perl, so yeah…)

Q1) Do some simple analysis of a board from a 4×4 equivalent of  tic-tac-toe where there is a special piece which counts as both X and O. Determine winner/draw/incomplete.

A1) This problem was so trivial they had to point out not to make it harder on yourself by attempting to determine inevitable winner/draw.  Small and large input, but the only difference was the number of test cases. (Importantly, solving this problem completely is not enough to advance to round 1.)

The problem also specifies that the board will be from a valid game, so you don’t have to handle cases where both X and O have won.  Check each row/column/diagonal for either winner – return if found.  Otherwise return draw if board is full, incomplete if not.

Q2) Given a rectangular array of ‘heights’ and a device which reduces all heights in a row, or column above an adjustable setting, determine whether the given heights could be obtained if everything started with equal height (which is greater than or equal to anything in the given heights).  Adjustable setting can only be changed in between usages, not as it performs a row/column ‘trim’.

A2) The important thing to notice here is that for any given square, it doesn’t matter what order the device interacts with it, it only matters what the minimum setting was across all interactions.  So if we can determine what height was used for each row/column – we can just run a simulation and check whether the input matches expectation.

For each row/column – the device cannot have been set any lower than the highest member of that row/column.  And if it was set any higher, the usage definitely could have been ignored entirely.  Therefore you can just determine the maximum value for each row/column – perform that simulation, and check result versus input.  This runs in O(nm) time and additional space.

A slower alternative (which I thought of first – and have not verified is actually correct…) is for each cell, check to ensure that it is the highest value of either the row, or column (or both).  If it is not the highest value of either the row or the column, it cannot possibly work.  Trivial implementation here is O(nm*max(n,m)) – which would still be fast enough (and requires no additional space, if you cared for some reason).  This can also be done in O(nm) by painting cells which are equal to max of row, or max of column in two passes over each row/column.  If all cells are painted, you have succeeded.  But the boolean for each cell has to be stored somewhere.

Q3) Determine the number of palindromic squares of palindromic numbers there are in the range A to B. Maximum value of B either 1000, 10^14 or 10^100 depending on problem tier.

A3) This is the first time there has been a problem with 2 large inputs, one harder than the other.  This problem was correspondingly worth a very large number of points.

The small input is trivial.  In fact you can do it in your head, since the number of palindromes less than sqrt(1000) is about 12, and of those only 5 are still palindromic when squared.  So just search the set 1, 4, 9, 121, 484 against each given A, B pair to return a count.  I wonder if go-hero.net has a category for the null programming language…  There was only 100 test cases, if you type fast and pre filled the form a bit, it shouldn’t be too hard to do that in 4 minutes.

The first large input poses a bit more of a challenge.  The first thing to realise is that these numbers are very rare, so we can solve all 10000 test cases by generating the list once and performing binary search to find indexes to take difference to get the count.  Indeed we could even pre-calculate the list and hard code it in this case, there aren’t that many less than 10^14.  An approach which works for this case is generating all palindromes less than 10^7, squaring each one and checking whether its a palindrome.  We can even generate all palindromes less than 10^7 by brute force, checking every number to see if it is a palindrome – it should still run in time.

The second large input is where it gets tricky.  The numbers are still rare enough that you can calculate them all before running any test and keep them all in memory, but generating them all becomes a bit more time consuming.  First off, we can’t possibly brute force 10^50 numbers to find palindromes to square, so instead we build the palindromes by taking the 10^25 numbers and either reflecting after or through the last digit to create our palindromes.  2*10^25 however is still way way way too many.  But if you look at the output of such a program for up to 10^8, you can get a pretty confident grasp on some ways to speed things up.  The only case where a digit greater than 2 is used in a palindrome which when squared still gives a palindrome – is 3 itself, giving 9.  So you can special case that and this means you are down to 2*3^25 cases.  This is still too many, but it is a big improvement.  A closer look will show that the digit 2 is only used in very rare circumstances.  If it is the first digit, all other digits are zero, or the last digit is 1 and is reflected through rather than after.  Otherwise it can be the last digit that is reflected through.  The former gives 1.5 cases per power of 10, a tiny number. The other reduces us down to 3*2^25 cases.  This is 100 million cases, each of which is worst case performing a 50 digit by 50 digit multiplication.  This may take minutes even with a good BigInteger implementation – so advisable to test before hand to see if it will be fast enough, or even better – pre-compute and store/load it.

However, we can do better.  A closer look shows that every palindrome which is still a palindrome when squared, the sum of the squares of its digits is at most 9.  This is why 3 only appears once, and why 2 is so rare.  But it does help us beyond that.  Out of the 2^25 scenarios from creating patterns of 0s and 1s – we can skip everything which has more than 5 1s, or more specifically 4 1s in the reflected section and optionally a 1 in the middle.  This gets us closer to 3*25^4 which is clearly going to run in time.  But the complexity of walking permutations like this might be a little bit annoying…

Q4) There are a set of boxes, which each take a specific kind of key to open.  They break the key in the process.  Inside the boxes there may be more keys.  You start with a set of keys and a description of all the boxes, return the canonical opening order to open them all, if possible at all.

A4) Small input wasn’t too bad here, 20 boxes.  For a given subset of the boxes being open the number of keys you have left doesn’t depend on what order you opened them, so you can do a depth first search on the canonical order, ignoring any states you find twice.  This is O(n*2^n) which for 20 is perfectly fine – especially with only 25 test cases to perform.

Large input however was 200 boxes.  The previous algorithm is no longer any good.  I spent some time thinking about this, but couldn’t come up with a convincing idea.  My best thought was that maybe if you did a reverse depth first search, from the answer back to the start, it might be self-pruning to the point where your search space ends up being much smaller than 2^200.

Looking at other peoples code, it appears that the solution is that there exists a much faster method to determine whether it is still possible to get to the final state from any given state, compared to finding a solution.  This method is then just applied iteratively every time you try to make a move in the depth first search on canonical order, and back track immediately if the ‘can still finish’ function returns false.

There appears to be two parts to this approach.  One is that you first check that there are enough keys in the given situation to open every chest.  So you just add up the contents of every chest with the starting scenario and ensure it is higher than the count of the number of chests for each key type.  Somehow, having checked this then makes the following seemingly nonsensical ‘can still finish’ function valid.  Create a directed graph connecting each key type of a chest not yet opened, to the key types contained in those unopened chests.  Then for each key type you have (ignoring count) start traversing the graph using a simultaneous breadth first search.  If you can reach every key type used by an unopened chest – the problem is still solvable.

I actually wonder if some of the solutions I am reading are actually correct, or whether the fact that there was only 25 test cases meant it was easy to fake a truly correct answer.

TCO13 R2A (take 2)

Attendance not that high again, 1279 registered competitors for this rescheduled round. But at least the servers seem to be more stable tonight (even if I am no less sleepy).

6 Australians this time.  I guess one of them had difficulty connecting last night.

Tough round – I only submitted a solution for Q1, and I had low confidence, since it was a greedy solution and I could see the vague potential for non-greedy answers.  During challenge phase, 40% of submitted solutions to Q1 in my room were successfully challenged, including mine.  I was too tired to identify bugs in other peoples code however, which left me with 0 points.

Which given the round difficulty was 560th before systests.  This raised to equal 441st in the final tally.

Only 2 successful solutions to Q3, and both of those people skipped Q2.  Only ~30 solutions to Q2.  Cut-off to advance was a fast time on Q1 and a successful challenge.  Looks like John Dethridge and nabbftw might be in with a t-shirt chance from this round with a decent Q1 time – and EvgeniSergeev managed to get a successful challenge, the other 3 Aussies (including myself) left on 0.

Too tired to do any more write up now – I’ll try and come back to it later.  But I do at least know why my Q1 answer failed.

Q1) Given 2 equal length sequences of letters, determine the correlated sub-sequences which when appended give the maximal result.

A1) I have managed to fashion a ‘brute-force’ solution which passes system tests, but I’m not 100% happy that it is actually correct.  But I will present it here anyway.

The first thing I did was trim the first sequence so that it was in non-ascending order – performing the matching trims in the second sequence.  It seems trivial enough to me that if your first subsequence ever increases, you could have removed the previous lower characters to get a better value.  The problem then seems to become a question of how best to ‘cross the append’.

I identified a few different ways that you could cross the append, and brute forced them.  Each way basically starts at some offset in the first sequence and tries to cross at some letter value.  Each letter which gets in the way (is lower than the desired cross value) is removed as it is encountered – and as each letter is removed we check to see if we’ve found a better solution.  However there is one exception, once you have crossed over, if a letter is in the ‘way’ it may correspond to a letter in the first sequence with value higher than your starting index, in which case removing it can never be a win.  These letters are just left in place and hope they don’t make the solution worse.  Every combination of offset and possible cross letter value is tried and we return the best we can find.

The way I handle this exception lowers my confidence in my solution a lot – but it works well enough for system tests.

A better solution is dynamic programming.  The DP is ‘best solution of length n using the last k characters’.  Initialized with length 0 being empty, each iteration we check the kth last character to see if it makes a better solution to length n+1 (than the previous length n+1 solution, if one exists) by pre-pending it to length n. (Proof of correctness of this still isn’t entirely obvious, I suspect a similar solution to ‘best solution of length n using the first k characters’ would fail – since an obviously correct solution would need to apply arbitrary subsets to jump from multiple iterations before to the new iteration, rather than being based purely off the previous iteration.  Something to do with prefixing vs postfixing would seem to be the key.)

Q2) Given a square grid, where each cell can be any value 0 through 9, and up to 10 specified values, determine the number of scenarios which exist where each row and column adds to the same number modulo 10. (Answer given modulo large prime.)  The grid may be large, up to 1000×1000.

A2) This problem boils down to 2 cases.  Firstly the width of the grid may be larger than the number of given clues.  In this case the answer is 10^((n-1)*(n-1)-k + 1) – where n is the width of the square grid and k is the number of specified values.  Since the number of specified clues is less than the width of the grid, there exists at least 1 row and column which is completely independent of the specified values, which can be used to guarantee whatever desired modulo for each row/column.

This leaves the case where we don’t have an independent row and column.  But we do have a maximum grid size of 10.  We can break this down further.  First if any row or column is fully specified, we know the target modulus.  If there are 2 target modulus that don’t match, there is no solutions, and we can immediately return 0.  If there is no target modulus, we try and see how many solutions there are for each potential target modulus and add them all together.

Looking at someone else’s solution I see that from here we can break the problem down in to independent subsets.  If a cell is not specified, than that row/column pairing are not independent as they are linked by the choice to be made.  But if the value is specified, then you can just subtract that value from the target for that row/column and they remain independent.   A disjoint set tracker can be used to identify these independent sets of rows/columns, and the number of possibilities for each independent set can be multiplied together to get a result.  Once we have an independent set of rows and columns, there is either no solutions or lots of solutions.  If there is lots of solutions, the number of solutions is much like the original calculation, 10^((r-1)*(c-1)-k), where r is the number of rows in the independent subset, c the columns, and k the number of fixed values in those rows/columns.  So all that remains is to identify when there are no solutions.  This appears to be possible to do by looking at the fixed values in each row for columns which are independent, and for each column for rows which are independent.  If the sum of each of these don’t have the same modulus, there is no way to set the values. (How you derive that leap of logic, I am unsure… but it sounds vaguely familiar from magic squares – sum of the sums each have to be equal.)

I wondered if a simpler solution is for a given target modulus, go to each value, set it to 0, perform any cascades forced by the modulus, and for each value you manage to set to 0, that is an extra power of 10.  If you ever get to a contradiction, its not possible.   To check, I wrote this up, and it passes system tests.  Written correctly it is O(N^4) and with N at most 10 this easily runs in time. (The more complicated algorithm above appears to be O(N^3).)

Q3) Given 2 numbers A and B, determine the number of unique values X^Y where 1 <= X <= A and 1 <= Y <= B.  A and B are both potentially large.

A3)  I think I almost had a better chance of solving this problem than Q2.  Although that isn’t saying much given how tired I was.  We start with A*B scenarios, and start subtracting off duplications.  With each of A and B being up to 1 billion, obviously duplications need to be removed in batches.  The first trivial set of duplications is X=1, there are B-1 of these.  Now we only have to consider X > 1.  We need to identify scenarios of X^Y = N^M.  Cases where N is a power of X aren’t too bad, but there are cases like 8^2=4^3 which are far more problematic.

So, taken from someone else’s solution… For each value between 2 and the sqrt of A inclusive, consider every power not greater than A.  For every number between 2 and A inclusive not in this set there are B options.  There is also the 1 extra for the value 1.  What remains is how many contributions count from the others, numbers which have powers less than or equal to A.  If we take each unique non-power in the range 2 to sqrt of (A) inclusive, and consider how many powers of it exist which are less than or equal to A we can work out how many unique values we can reach using an exponent of at most B.  Each power can add B, but then we have to remove the overlaps.  It almost looks like a standard add everything, remove pairwise overlaps, add back in the triplets, removing the 4’s, etc.  But since the power could go up to 30, there are a lot of groups of 15, so it needs to be done a bit smarter.  I can’t really work out how to describe the solution here properly, so if you are interested I guess you can go read a solution.

TCO13 R2A

So, up at 1am to prepare for the 3am start.  Quite tired, but will see how I go…

2000 potential competitors for round 2, only 1260 turned up.  5 Australians battling the 3am start time. (Assuming they are east coast…)

3:10am – round starts, no one has been assigned rooms yet 😛

3:19am – round is cancelled 🙁

Update: Round is rescheduled for 3am tonight …

 

TCO13 R1C

Looks like there was a big turn out again, all 2000 registration slots used.  12 Aussies seems the biggest turn out so far – with 4 advancing. (I think that gives 7 Aussies in to round 2.)

1000pt obviously wasn’t as easy as last week, but a fast time on the 250pt wasn’t enough to get through again, so the 500pt can’t have been too difficult.

Q1) Given a sequence length, maximum step size, start and end values, determine the maximum possible sequence element.

A1) Appears pretty straight forward, the key thing to realise is that the answer will either be a multiple of the step size above the start value, or the end value (or both).  With a maximum sequence length of 1million, you can even try both with brute force without running out of time (so long as for any given reached value, you calculate the number of steps back to the other end using division rather than repeated subtraction).  More efficient methods include an inclusive binary search on the range max(start, end) to max(start, end)+(length*maxstep) and checking feasibility by using division to calculate the minimum number of steps required to get there from each and and comparing the sum to the sequence length.  There is also probably a closed form formula involving the difference between the start and end values which does the problem in O(1) – but I haven’t bothered to derive it.

In any case you have to remember to handle a maximum step size of 0, which breaks all the division calculations.  So treat it as a trivial special case and return start before running any of the above code.

Q2) Given a set of integers which are each formed from the sum of an unknown number of non-negative integer sub-components, determine the smallest number for which it is guaranteed that there is no more than k sub-components which are greater.  Number of integers provided will be <50, but the values and k itself may be up to 1 billion.

A2)  This is a binary search problem – since as you increase the comparison value, the maximum number of sub-components greater than it never increases.  The answer will always be in the range 0 to 1 billion and 1, and each binary search step needs only perform a division calculation for each input value.

Specifically if we are considering whether n is a feasible solution, we divide each sum value by n+1 (since they have to score greater), and take the integer division result as the maximum number of sub-components that there could be which are greater. Add all of these answers together, and if it is no more than k then it is a feasible solution.  Binary search until you find the minimal feasible solution. (As usual, this binary search is technically implemented to find a value between true and false, so you need one which returns the first value above when equality is not found.)

Q3) Given an NxN grid and assuming uniform random allocation of a pair of cells (not coincident), determine the expectation value for the number of cells of the grid ‘selected’ using a pattern which is centred on each of the randomly allocated cells.  The pattern being up to 9 cells relative to the centre of (0,0), (a,b), (b,a), (-a, b), (-b, a), (-a,-b), (-b,-a), (a,-b), (b,-a).  If the pattern falls off the edge of the grid, the corresponding member of the pattern is ignored.  If any two pattern elements are for the same cell, a given cell can only be selected once. N, a and b are all no more than 1 billion.

A3) So yes, this is definitely harder than last week.  With such a large value for N, obviously brute force is out of the question.

It isn’t too difficult to calculate the expectation value for a single randomly allocated centre, using N, a and b there is fairly trivial formulas for what area of the grid the centre can be in, compared to the total area to allow each of the 9 points to be valid, then those fractions can be added together, taking care that the areas are up to 10^18 in size, so even if you take advantage that each fraction has the same base you may even overflow a 64bit integer unless it is unsigned when you add them together.  Also you need to handle the case where a equals b, and not double count the pattern elements which are then equal.  Double this gives a first approximation for the expectation value for 2 randomly selected centres, we just need to subtract off the expectation of coincident pattern elements.  And handle the fact that the 2 pieces will never be placed in the same place.

There are 3 cases of coincident pattern elements.  The centres are coincident, which has an easily calculated chance of 1 in the area size, and cancels out an entire pattern contribution – the expectation value of which we just calculated before.   In fact since centres co-incident is not allowed, we need to cancel out 2 entire pattern contributions.  The remaining 2 cases are more complicated – first is centre is coincident on a non-centre element – this cancels out exactly 2 cells (I think), but the probability is related to a subset of the area divided by the area squared (considered for each of the up to 8 different directions).  Finally we have to deal with the case of non-coincident centres where the pattern elements match.  This appears to have 2 sub-cases, straight and bent.  In straight, the line between the centres passes through the overlap cell, and there is exactly 1 cancelled out cell.  Again the probability is a subset of the area divided by area squared (this time reduced by 2a, 2b or 2b, 2a – for up to 8 different relative arrangements).  In bent, the centres end up a+b apart horizontally/vertically or a+b, a+b apart diagonally.  In the diagonal case, exactly 2 elements are cancelled out, and each of the 4 cases can be handled like straight scenario.  The horizontally/vertically cases are another question though – this time you have to consider up to 3 different scenarios.  Because each of the 2 elements to be cancelled out may not actually fit on the NxN grid.  But you can calculate an area where the ‘left/top’ element exists to be cancelled out, and another with the ‘right/bottom’ element exists to be cancelled out – and use those as individual 1 cancelled out cell cases like the straight scenario.

Once all these adjustments are made, I think a further adjustment of area/(area-1) needs to be made to compensate for the fact that co-incident centres could never be selected in the first place.

All in all, it isn’t an impossible problem if you’ve done some expectation value calculations before – but it is so fiddly, so many cases to code up, and so many chances to overflow/lose precision.  And correctly handling the no-coincident of centres is a bit tricky.

Looking at some solutions, it appears that it is possible to avoid special casing this so much, by considering each pair of elements from the patterns.  For the pair to be co-incident the centres have to be in a certain place relative to each other, which combined with the overlap position gives you the subtraction component as a fraction of area squared.  Each of these single subtraction components all combined, should give the same sum as all the subtraction scenarios I mentioned above.

TCO13 R1A

So pre challenge phase I was on 531st – with the feeling that things could go either way for me – the 500pt had the opportunity for lots of failures because of floating point error margins, which I was confident I had taken care of – but on the other hand, a small missed optimisation I noticed with 15seconds to go, left me uncertain whether it was possible to write a degenerative test case which would cause a time-out.

After challenge I was 480th, with a bit more confidence having seen someone successfully challenge 2 others but not my solution, using what I am pretty sure would have been a time-out attempt.

My solution to the 500pt did fail system tests, but in-spite of that my final placing was 471st.  There were a very large number of failures for the 500pt problem.  In my case it was a degenerative performance test case, as I expected.

471st gets me through to round 2, no need to slog through R1B and/or R1C.  R2A/B/C happen after daylight savings change, so I think they will be at 2am instead of 4am.  I am not sure which is worse really…

The 250pt question was easy, and really so was the 500pt – but it took me over half an hour to realise the obvious aspect of the 500pt problem I was missing.  I blame having to code at 4:30am…  The 1000pt was not – but still there was ~80 submissions for it.

Q1) Given a set of numbers in the range 0-9, what is the minimum cost (where it costs 1 per integer distance a number is moved by) to ensure all the numbers are within 1 of each other.

A1) The limited range made brute force trivial – just try pair of adjacent numbers (0,1), (1,2) … (8,9) and calculate the cost to move each number below to the bottom number and each number above to the higher number.  I managed to score 240 out of 250 points here, which is pretty high for me – but it was a pretty easy problem…  Competition advancement cut-off was 234 points – so I didn’t want to be too much slower though!

Q2) Given a set of ‘gaps’ which start and end on integer boundaries (but are exclusive of the boundaries themselves) determine the smallest number where all multiples of that number do not fall into any of these gaps.

A2) I saw several solutions to this, and many of them failed.  Most of them used floating point numbers for doing most of their code, which was a big risk due to cumulative errors.  Others failed (like mine) because the code didn’t scale well enough.

The key point which took me far too long to realise was that any candidate solution can be improved so long as none of the multiples lie on the end point of a gap.  Otherwise a microscopic reduction in jump size will still leave every multiple not hitting a gap.

This leads to the solution that the number is a divisor of one or more of the end values of a gap.  So for each end value we check each plausible divisor to see if it works for the entire set of gaps.  You can set an upper bound for the divisor by calculating the floor of right edge divided by width – but if the width is 1, this upper bound doesn’t help much.  Checking the divisors I kept everything in integers, basically doing simple fraction math rather than using floating point.  However I didn’t notice that my cost of checking whether a divisor worked or not wasn’t O(N) where N is the number of gaps but was instead O(D) where D is the maximum distance from the start to a right gap edge.  Since the upper bound for D was 30k and N was 50 – this was a significant performance penalty.  The algorithm is O(N^2D) if implemented correctly, but mine was instead O(ND^2).  O(N^2D) was fast enough to pass tests.  I saw an O(ND log ND) implementation which passed, but I don’t actually understand why it works…

The mistake I made which caused the O(D) performance when checking if a specific length worked was that I incremented the multiple counter by 1 while the current location was in front of the current gap.  If the first gap was very late, and the gap size under consideration was 1, it would tight loop increment that counter all the way up.  I needed to either use a simple division short circuit to accelerate to the greatest multiple before the current gap, or instead use a modulus check to see if the jump size would end up in the gap rather than walk my way up at all.

Q3) Given a rectangular grid which is cyclic in nature (left edge joins to right edge, and top edge joins to bottom edge) where each cell in the grid specifies a direction (either left, right, up or down), determine the minimum number of cell changes required such that every cell is part of a cyclic path which comes back to itself.

A3)  This problem is a tough one. (The system tester thought so as well, it refused to rate anyone on this problem the first time round :P)  I heard one competitor saying it was a case of min cost max flow – and I think I see how that could be the case, but min cost/max flow is a very complicated algorithm to try and implement during the competition.

The first level of analysis on this problem is pretty straight forward – if every cell has another cell pointing to it, then every cell is in a cycle.  Correspondingly every cell which has multiple cells pointing to it, has to have changes occur to reduce it to one.  The problem is how to minimise the number of changes required to get each case of excess inputs redirected to a location of insufficient inputs.

A bit of thought has failed to help me construct the appropriate min cost max flow graph to solve the problem, but it is after 7am – it is time to get some more sleep…

Post sleep – I can see how to formulate this as min cost circulation problems.  Each node is duplicated, with a minimum and maximum flow of 1from copy 1 to copy 2 (and a cost of 0) – then each copy 2 connects to copy 1 of its neighbour with maximum flow of 1 and a cost of 1 if it isn’t the neighbour the node currently points at.  One special node is added, post source which connects to every copy 2 with maximum flow of 1 and cost of 0.  Source connects to post-source with maximum flow equal to target cycle count and cost of 0.  All copy 2’s connect to sink with maximum flow of one and cost of 0.

The problem is then repeated for each target cycle count between 1 and V/2 – and the minimum of minimum costs is the result.

The problem is the standard min-cost max-flow algorithms don’t support minimum flows.  One solution in this case where minimum flow equals maximum flow is to make it ‘cheaper’ to go through every minimum flow edge then any other path that doesn’t include them.  To do this we change the edges with minimum flow to have no minimum flow but a negative cost greater than the number of edges in the original graph – this ensures that minimum cost will pass through every one of these edges.  In this case the shortest augmenting path algorithm has to be one which handles negative cost edges.

One refinement – not sure if I’ve considered this before or I am just starting to understand better for the first time – but in this case you can do all of the V/2 problems at once by setting the target cycles to infinite and keeping track of the cost after each augmenting flow.  Since each augment is minimum cost and increases flow by 1, it will pass through every cycle count and their associated minimum cost.

Surprise top coder…

So it appears I wasn’t paying enough attention to my email back in the middle of last year.  Unlike usual when top coder announces it is opening registrations a couple of months before the contest starts, this time it decided to open registrations the middle of the year before!

So I got a reminder email today, registration is open and surprise – round 1A is this weekend.  Guess I will need to quickly muster my ability to stay awake and conscious at 3am in the morning once more.  I should probably start trying to do these competitions in c++, but with this little notice I think I will be using c# again this year.

Since I had been surprised by the sudden appearance of the top coder open – I went and double checked – GCJ13 registrations don’t open for another 3 and a half weeks.  Not that I will be eligible to compete this time!

GCJ12 R3

My first non-competing round of the season… will see how many of these I bother to write up – if previous years is anything to go by, not many!

Q1) Given a set of tasks which each have a time to complete, and percentage chance of failure, determine the order which minimises the expected time to completion if failure of any task means you have to restart your order from scratch.

A1) This was a sneaky question, not insanely difficult, but they set the restrictions on the small input that the time to complete of every task was identical.  In that scenario the solution is trivial, you sort by most likely to fail to least likely to fail (using a stable sort, or secondarily sorting based on original index).

The large input isn’t much more complicated, the only question is what is your sort operator look like when the times of the tasks are not identical.  Unfortunately my initial guess would have been length of task times the chance of failure, when the correct answer is length of time divided by chance of success. Fraction class works here, or you can define your sort operator as a.Time*(100-b.FailChance) < b.Time*(100-a.FailChance).  Pretty sure I’ve seen this question somewhere before…

Q2) Given a hexagonally shaped hexagonal grid and a series of ‘moves’ which fill in hexagons, determine the first point in time that either 2 corners are connected, 3 edges (which don’t include the corners) are connected, or a ring is formed. (A ring is a connected loop of hexagons where there is at least 1 empty hexagon inside.)

A2) The large input has a hex grid with side lengths of up to 3000, and 10k moves, so the implementation needs to be reasonably efficient.  Having written LoopDeLoop, I’ve had opportunity to consider similar problems in the past, so I think I had a chance of solving this problem if I had of been in the round.  We can solve the 3 sub-problems independently, but there will be a common theme.

Corners. if we use a union-find data structure and associate properties with each set representative (specifically the number of corners the set touches), we can do this in linear time. Each move is handled easily Each surrounding position is either empty or a member of a set.  Gather the number of corners for the distinct sets, union them all together, add in the new position and set the number of corners equal to the total (adding 1 if the move is in a corner).  If total exceeds 1 we are done.

Edges.  Almost identical to the corners, only instead of summing, we use a 6bit number to represent the edges touched, and use bitwise or to combine them.  And rather then checking total we do a pop-count.

Loops.  Here is the tricky bit.  Again we can use the union-find data structure, but this time rather than aiming to union things together and track status, we are looking to see if the current move has 2 neighbours which are already in the same set.  There is also the possibility that the move doesn’t create a loop, but just fills in the edge of a ‘blob’.  If precisely 2 of the neighbours are the same set, and they aren’t touching, we have definitely created a loop.  If 3 of the neighbours are the same set and they aren’t ‘sequential’, we’ve created a loop.  If 4, same applies.  If 5 or more there isn’t anyway to not have them sequential (and specifically 6 can never happen since it is already a loop).  Edge and corner placements don’t really change this logic, although you certainly can’t go all the way to 6.

Plenty of opportunity to screw up, but overall I think I had a better chance here than with the large input of the first question…

Q3) Given a delivery fee per delivery, and cost of types of food, and length of time until stale, how long can you eat delivered food given a certain amount of money.

A3) For a given number of days to survive on a single delivery, the answer is fairly obvious.  For each day take the cheapest food which would not be stale.

The problem comes in deciding how many days to go between deliveries.  Obviously if a type of food exceeds the cost of the cheapest food plus delivery fee, that type of food will never be selected, but since the delivery fee is averaged over multiple days the crossover point could be lower.  It wouldn’t be too hard to do an infinite money scenario maximum efficiency but for a finite amount of money the problem is trickier.

The large input is huge, 10^18 money and up to 10^18 stale time.  But the number of food types is much smaller, only 200.  I looked up some solutions, and the one which I think most matches where my thinking was heading is assuming that the solution consists of d day cycles or d-1 cycles to make up the total number of days achievable. Binary search to find the maximum number of days, and ternary search inside that to find the minimum cost to complete that many days.  If minimum cost is below the total amount of money, binary search goes up, otherwise it goes down.

Q4) Given a sequence of characters, some of which can be substituted with numbers, determine the length of the shortest string which contains every possible length k consecutive subsequence of the original sequence or any of its substituted variations.

A4) Large input has k up to 500, small input is only 2.  Small input we can build the list of distinct pairs, including substitutions.  Then you have to work out how short you can make the joining.  For each distinct pair you have to have at least 1 character in the output string, but do you need 2?  For each unique character in the input we can count how many pairs start with it, and how many end with it.  The difference (if positive) is how many times you can’t reuse that character.  Finally even if you could theoretically reuse everything, there is always the first character, so there has to be something which can’t be reused. (I read a solution because I wanted to finish writing this blog post before 1am… but I don’t think this is too hard to come up with…)

The large input, I have no clue!  Unlike with the size 2 input, you can have multiple different length overlaps to save you characters.  (Not to mention that 500 character sequence can have 2^500 versions using substitutions…) Only 1 person solved this during the competition, so I don’t feel to bad…  Turns out he used a mincost/maxflow, although I certainly am not going to read the pages of code to try and understand how mincost/maxflow solves this problem…

TCO12 R2C

Low attendance for this round, out of 1900 potentials, less than 1100 turned up.  Apparently some people were having difficulty reaching the topcoder site just before it started so they delayed the start by 10minutes.  I lost connection twice, once during debugging the first question and again during the challenge phase (I think it was my own modem even, not top-coder).  But I don’t think it affected my performance significantly.

Writer solution for the 900pt problem was wrong… so it took a very long time to get any results…

346th before system tests – but I missed a corner case in my answer to question 1 and ended up with 0 points. Which dropped me down to 525th.  I wasn’t going to advance or get a t-shirt in either case, so not a large drama.

Q1) Given a fully connected directed graph where you can change 1 edge value, determine the longest path taken by a greedy algorithm which always chooses to move to the closest remaining destination, starting at 0 with tie breaking based on order of input.

A1) I made my code more complicated than it needed to be, so my submission time was a bit slow (but apparently not complicated enough).  Brute force is O(kN^4) (k being the maximum range of each edge value), which is too slow.  However it can easily be seen that there are only a few interesting values to change an edge to.  First is the maximum – this either eliminates the edge from consideration, or increases the maximum distance.  Next is the highest value which makes the edge be selected.  If the edge is already selected, this may still be higher than its current value.  So at each point you can choose the second shortest path if the edge being modified is under consideration and try the value equal or immediately below, whichever is the highest value which makes it selected.

There are a few corner cases, making maximum doesn’t rule the edge out of consideration if there all other edges are higher index and maximum length and you can’t have an edge with length less than 1, so you can’t always make an edge selected either.

Specific case I missed was when forcing the maximum, the scenario where it increases the maximum distance rather than forcing a different option doesn’t just include where it is already maximum or it is the only option.  It also covers the case where all other options are higher index and also maximum length.

Q2) Given a large number of points in a 2 dimensional plane where no x or y co-ordinates are in common, determine how many ways you can select 3 points such that if x coordinates are in order 1 2 3 the y coordinates are in order 1 3 2.

A2) The number of points is up to 30k, which rules out O(N^2) algorithms.  Evidently the solution is O(N log N).  I don’t think I’ve seen a problem like this one before, and the solution is kind of interesting.  I certainly wasn’t going to guess it during the comp though…  As a bonus, I completely misread the question and thought the x and y coordinates were in the same order (despite the question having a sentence explicitly calling out the fact that they were not…)

First do some range reduction, sort the x-coordinates and re-number them to be consecutive integers from 0.  Then switch to sorting by the y coordinates.

Now we need to actually solve the problem…  So we consider each point in reverse y order, first checking to see how many points already added have higher x then the current.  Then we add the current point.  At any point in time all points already added have greater y, so greater x is the criterion of interest.  Then we add the new point.

The first trick being that we need to answer the question regarding the number of points already added with higher x, quickly.  For this we use a range tree.  Since we renumbered the x-coordinates, we know the full range, and can do a trivial sideways binary tree to cover up to the next power of 2 greater than the range.  When an element is added, we increase the value of each segment which contains it.  When we need to know how many elements are we look into the tree, summing any right branches whenever we take the left.

The second trick being all I’ve said so far only helps with choosing 2 points, how do we choose 3?  Well we can take the number of values with greater x and do a choose 2, but while we know they have greater y values and x values, we don’t know if they actually are ordered correctly themselves.  So we need to keep track of how many pairings greater than a given are the wrong order.  Since the wrong order is that they are sorted, and we know how to count the number of elements greater, this is not too hard.  Just have another segment tree, but rather than giving unit weight to each number as it is added, weight it by the number of entries strictly greater then itself at the time it was added.  Thus the total of these is a count of all pairings in sorted order, which we can subtract from the choose 2 to get the correct answer.

Don’t think I’ve seen a question involving a segment tree before at all even – I’ll have to keep them in mind in future.

Q3) Given a list of heights, and the rule that at each time step positive heights shift 1 unit of height to the next entry in the list (circular so last moves to first), determine the new list of heights after a very long time.

A3) I think I should have put more effort into this question, because it certainly seems easier than Q2… The solution is to simulate it, but jump forward in time by the largest step you can at any point.

At any point if everything is 0 or negative, you are done.  If everything is positive, you are done.  If everything is 0 or 1, you can rotate the solution to finish.  If everything is 0 or greater, a few simple simulation steps will either make everything positive, or just 0 and 1.

So the only interesting scenarios are the ones which involve a mix of negative and positive numbers.  In these scenarios, you can jump ahead, so long as you don’t fill any holes or clear any mountains, since that changes the dynamics of the scenario.

We can rule out a few corner cases by forcing step by step simulation if the number of turns remaining is smallish.  Also if there are any 0’s between a mountain and a hole, some manual simulation will take care of that.  Then, for each hole we follow left to find the left-most contiguously positive position, and skip ahead a number of steps equal to the minimum of what fills the hole or reduces the height to 0.  The minimum of all these minimums is the number of steps to simulate, which can be done just by subtracting the number of turns off of the left-mountains and adding them to the connected holes.

I think the worst case runtime of this algorithm is something like O(N^3) – finding the connected mountains is linear, but the manual simulation sections might be O(N^2).  However since one mountain or hole is eliminated in each step, there shouldn’t be more than O(N) such sections.