## GCJ16 R1B

So, with 1000 having already advanced there probably wasn’t quite so much competition this round, but it definitely seemed to be a slightly harder round, with only 320 perfect scores.  Specifically the third problem large input was a bit difficult if you didn’t have experience with that kind of problem before.

In the end the cutoff was the first 2 problems in a decent time.

Q1) Given a set of characters made out of the letters in the English words for the digits zero through nine, determine what the original digits were, in non-descending order.

A1) This was a fairly straightforward, if potentially tedious to code, problem – you just have to identify the correct order to try removing the words from the count of each letter in order to avoid getting into a dead end.  One such order is (Z)ERO, T(W)O, FO(U)R, SI(X), EI(G)HT, (O)NE, (T)HREE, (F)IVE, SE(V)EN, N(I)NE.  Each of the letters in brackets does not occur in any word later in the list, so its remaining count tells you exactly how many copies of that word to remove, and removing in that order obviously works.  I like this particular ordering because it’s the even digits followed by the odd digits, each in ascending order.
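A minimal Python sketch of that removal order follows. The function name `recover_digits` and the `ORDER` table are my own names for illustration, not anything from the contest or TMD.Algo.

```python
from collections import Counter

# (word, digit, identifying letter) in the removal order described above.
ORDER = [
    ("ZERO", 0, "Z"), ("TWO", 2, "W"), ("FOUR", 4, "U"), ("SIX", 6, "X"),
    ("EIGHT", 8, "G"), ("ONE", 1, "O"), ("THREE", 3, "T"), ("FIVE", 5, "F"),
    ("SEVEN", 7, "V"), ("NINE", 9, "I"),
]


def recover_digits(letters):
    """Return the digits, in non-descending order, that spell out `letters`."""
    counts = Counter(letters)
    found = []
    for word, digit, key in ORDER:
        # By the time we get here, `key` can only come from this word.
        times = counts[key]
        for ch in word:
            counts[ch] -= times
        found.extend([digit] * times)
    return "".join(str(d) for d in sorted(found))
```

For example, `recover_digits("OZONETOWER")` gives `"012"`.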

Q2) Given a pair of numbers that have been obscured by replacing some of the digits with question marks, and may also be zero padded, determine the way to replace the question marks which results in the minimal difference, breaking ties by minimizing the first value and further by minimizing the second value.

A2) So the contest analysis mentions that this can be done in O(N), which I found interesting, but I’ve not managed to solve yet.  The O(N^2) solution is to consider the replacement of question marks to have 3 phases.  Phase 1 you try to make the numbers equal, Phase 2 you introduce a minimal single digit difference, Phase 3 you try to maximize the difference in the opposite direction to the difference introduced in Phase 2.  You just iterate over all of the possible places for Phase 2, including not ever reaching it.  For each possible phase 2 location you can either try and make the second larger, or the first larger, but that only doubles the total number of passes, so it’s still O(N^2).  You can try and optimize further by aborting when phase 1 fails to make equal, but it’s not trivial to know when not to bother trying with phase 2 – I think you need some kind of pre-processing step to work that out in order to get to the O(N) solution.
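Here is a rough Python sketch of the three phases. It is a simplification rather than a faithful version of what I described: at each Phase 2 position it tries every pair of differing digits and keeps the best result, instead of working out the minimal difference directly, which only adds a constant factor. The name `close_match` is hypothetical, and both inputs are assumed to be non-empty and the same length.

```python
DIGITS = "0123456789"


def close_match(c, j):
    """Fill the '?'s in c and j; returns the best (c, j) pair as strings."""
    best = None

    def consider(a, b):
        nonlocal best
        # Equal-length digit strings compare the same way as their values,
        # so the tuple handles both tie-breaks.
        key = (abs(int(a) - int(b)), a, b)
        if best is None or key < best:
            best = key

    same_c = same_j = ""  # Phase 1 prefix, identical in both numbers
    for p in range(len(c)):
        # Phase 2: first differing digit at position p.
        for dc in DIGITS:
            if c[p] not in ("?", dc):
                continue
            for dj in DIGITS:
                if dj == dc or j[p] not in ("?", dj):
                    continue
                # Phase 3: shrink the larger number, grow the smaller one.
                fill_c, fill_j = ("0", "9") if dc > dj else ("9", "0")
                consider(same_c + dc + c[p + 1:].replace("?", fill_c),
                         same_j + dj + j[p + 1:].replace("?", fill_j))
        # Phase 1: extend the equal prefix by one digit, if possible.
        if c[p] == "?" and j[p] == "?":
            d = "0"
        elif c[p] == "?":
            d = j[p]
        elif j[p] == "?" or j[p] == c[p]:
            d = c[p]
        else:
            break  # fixed digits differ, no longer equal prefix exists
        same_c += d
        same_j += d
    else:
        consider(same_c, same_j)  # Phase 2 never reached: numbers are equal
    return best[1], best[2]
```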

Q3) Given ordered pairs of words, determine the maximum subset which can be made entirely out of the first and second word sets of the inverse of that subset.

A3) I immediately recognized this as a graph problem.  I mistakenly first assumed that the answer was number of edges minus the minimum spanning tree, but the minimum spanning tree was the wrong concept.  Not too long after that I realized it was a bipartite graph problem.  With that I simply assumed that TMD.Algo had the solution and that the answer was the number of edges minus something to do with calculating the bipartite matching.  In the end it’s number of edges minus the size of the bipartite maximum independent set.  Which doesn’t make a lot of sense at first glance, since you want number of edges minus the minimum edge cover size – but for a bipartite graph with no isolated vertices (which is the case here) the size of the maximum independent set is equal to the minimum edge cover size.  The minimum edge cover size is equal to the size of the maximum matching plus one for each vertex not in the maximum matching.  Number of vertexes not in the maximum matching is equal to the number of vertexes minus twice the maximum matching.  Add on the size of the maximum matching and you get number of vertexes minus the maximum matching, which is the same formula as the size of the maximum independent set.
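To make the formula concrete, here is a small Python sketch that computes edges minus (vertexes minus maximum matching). It uses a plain augmenting-path matching, which is my choice for the illustration and not necessarily what TMD.Algo does; `max_fakes` is a made-up name. Note the recursion could hit Python’s default recursion limit on adversarial inputs near the large limits.

```python
def max_fakes(pairs):
    """pairs: list of (first_word, second_word) tuples."""
    firsts = sorted({a for a, _ in pairs})
    seconds = {b for _, b in pairs}
    adjacent = {a: [] for a in firsts}
    for a, b in pairs:
        adjacent[a].append(b)

    matched_to = {}  # second word -> first word it is matched with

    def augment(a, seen):
        # Try to match first word `a`, re-routing earlier matches if needed.
        for b in adjacent[a]:
            if b in seen:
                continue
            seen.add(b)
            if b not in matched_to or augment(matched_to[b], seen):
                matched_to[b] = a
                return True
        return False

    matching = sum(1 for a in firsts if augment(a, set()))
    min_edge_cover = len(firsts) + len(seconds) - matching
    return len(pairs) - min_edge_cover
```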

## TCO16 R1C

Last round 1 for this year’s topcoder, 750 to advance, but only 615 registered.  So positive scores to advance.  In the end 500 people managed to get a positive score.

Q1) Determine if a set of integers is closed under addition.

A1) This is a very simple problem with an optimization which I think I might have tried, but screwed up.  Luckily I already got through in round 1A.  Just try all pairs, add and check for membership.  Even the O(N^3) solution is fast enough.

With all the inputs between +/-50 it’s a trivial O(N) to perform a distinct count, then return false if there are more than 3 distinct values.  If any non-zero number has a count of more than 1, return false.  Otherwise check all pairs on the 3 values.
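The all-pairs check is only a couple of lines in Python. This sketch assumes the input is a list that may contain duplicates and that a pair means two different positions in that list, which is my reading of the problem rather than something stated above; `is_closed` is a hypothetical name.

```python
def is_closed(values):
    """True if the sum of every pair of entries is itself one of the entries."""
    present = set(values)  # O(1) membership instead of a linear scan
    return all(values[i] + values[j] in present
               for i in range(len(values))
               for j in range(i + 1, len(values)))
```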

Q2) Determine if a sequence of C’s B’s and A’s can be reordered so that no B’s are adjacent and no C’s are within 2 of each other.

A2) So I think there should be some kind of pseudo-greedy approach to solve this, but all the solutions I’ve seen are dynamic programming.  The state is the number of A’s remaining, number of B’s remaining, number of C’s remaining and whether the last letter is a B or which of the last 2 letters is a C.  So for each possible state if there is an A remaining you can reduce the A count by one, force the last letter to not be B and shift which of the last 2 letters is a C back one.  If there is a B remaining and last letter is not B, decrement B, mark last as B and shift which of the last 2 letters is a C back one.  Similarly for the C.  If the destination state already has a string assigned to it, skip, else use the current string and append the new letter and assign it to the new state – put it on a queue to consider.

If you ever get to a state of no more A, B, or C – return the string associated with the state.  Number of states is O(N^3) and each state has a string creation cost.  Total O(N^4) is fast enough.  Can be done in O(N^3) by not actually constructing full strings, just storing pointers to the come-from state.
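A sketch of that search in Python, using the simpler version that stores a full string per state. The name `reorder` and the choice to return `None` when no ordering exists are assumptions for the example, not the actual topcoder signature.

```python
from collections import deque


def reorder(letters):
    """Return a valid reordering of the A/B/C string, or None if impossible."""
    # State: counts remaining, the last letter placed, and whether the
    # letter before that was a C.
    start = (letters.count("A"), letters.count("B"), letters.count("C"), "", False)
    built = {start: ""}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        a, b, c, last, c_two_back = state
        if a == b == c == 0:
            return built[state]
        moves = []
        if a:
            moves.append(("A", (a - 1, b, c)))
        if b and last != "B":
            moves.append(("B", (a, b - 1, c)))
        if c and last != "C" and not c_two_back:
            moves.append(("C", (a, b, c - 1)))
        for letter, counts in moves:
            nxt = counts + (letter, last == "C")
            if nxt not in built:  # first string to reach a state wins
                built[nxt] = built[state] + letter
                queue.append(nxt)
    return None
```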

Q3) Determine the number of integers between 1 and X when represented in binary for which the largest non-empty subset of at most length – K digits is less than Y.

A3) X and Y can be huge – so this problem is challenging.  I’ve not grokked any solution yet.  Obviously all numbers less than Y are trivially a pass, but that is where it stops being easy.

## TCO16 R1B

So I wasn’t in this round since I already advanced to round 2, and I was too late getting back to the topcoder website to see the scoreboards, so this will just be a problem discussion.

Q1) Determine the first prime found in a sequence starting with N, or return -1 if not found in N steps.  Sequence is to replace X with the sum of the squares of its digits.

A1) One terminal case is that if you get to the value 1 you can immediately return -1, because it will never change again.  However more generally it is not difficult to observe that once X is less than 200, it never goes over 200 again, and regardless of X (at least for up to 1 billion), it will be less than 200 in at most 3 steps.  Therefore if there is any repetitive loop other than reaching one, the number of steps to identify the cycle is no more than 200.  So just keep a hashset of values seen so far, and if you see something you’ve seen before, return -1.  Otherwise it’s just testing primality and breaking up digits and summing their squares in a loop.  Possibly also be careful for small starting N that you don’t run your loop more than N times looking for a cycle, but given the density of small primes, it’s not obvious there is such a scenario to worry about.
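The loop might look something like this in Python. Two things here are my assumptions rather than the problem’s exact contract: the function returns the prime itself, and “N steps” is taken to mean that at most N values of the sequence (starting with N) are examined. The names `first_prime` and `is_prime` are made up.

```python
def is_prime(x):
    """Trial division, plenty fast for values up to about a billion."""
    if x < 2:
        return False
    f = 2
    while f * f <= x:
        if x % f == 0:
            return False
        f += 1
    return True


def first_prime(n):
    seen = set()
    x = n
    for _ in range(n):  # assumed step limit: look at no more than n values
        if is_prime(x):
            return x
        if x in seen:  # cycle without a prime, it will never show up
            return -1
        seen.add(x)
        x = sum(int(d) ** 2 for d in str(x))
    return -1
```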

Q2) Given the ability to replace digits using a pool of specific counts of digits (no zeros) and an original list of numbers, determine the maximum sum that can be created by modifying the original list of numbers using the pool of replacement digits.

A2) This is a straight greedy problem.  The final sum depends on which digits are replaced with what values in what positions in the numbers, but not the order you make the changes.  Therefore for each digit available you replace them with the highest available digit that increases the number, and you prioritize the ones which provide the greatest increase in local value, since that increase is the same increase applied to the final sum.  As there are only 350 digits at most to consider, you could just do an O(N^2) algorithm to find the greatest increase, apply it, then try again until the pool runs dry or there are no more possible increases to the sum.

For an improvement to the run time you can sort the candidate digits by their place, descending power of ten, breaking ties by their digit, ascending.  Then you just walk through them replacing each with the best available improvement.

So I kind of glossed over why this works.  It isn’t perfectly obvious, since if your pool consists of only 9’s and 2’s and you use up all the 9’s on large numbers starting with 1, and the rest of the numbers don’t have any 1’s in them, those 2’s are going to waste.  However when you use a number, the number of digits which go to waste is at most 1.  Because the wastage is further down the sorted sequence, it either has a lower place (and hence even at 0-9 it’s 90% of the improvement that just increasing the current place one higher that is the minimum you get when invoking the wastage) or the same place but with a smaller difference, in which case the ideal is to take the 2 highest digits which is exactly what happens in the wastage scenario.

Q3) Given a sequence of values and multiple ranges that it costs 1 unit to increase value by 1 and one range that covers the full sequence that it costs T units to increase the value by 1, determine the minimum cost to create a sequence of values which is never less than the original sequence.

A3) The number of ranges, length of the sequence, and potential range of values in each spot are all at least up to 10^5, so it’s clear that you need a very efficient solution.  This problem is a clear step up in difficulty from the previous two.

So due to the high data input sizes my mind immediately went to a minimum find search on the number of units to spend on the range that costs T, then for each potential number of units spent under consideration we need to solve the minimum cost to spend on the other ranges in linear time.

As it happens solving the inner problem isn’t trivial either, to run in linear time you need some pre-computation.  Sort the ranges by their start points, breaking ties by the end point descending.  Then walk through and discard any ranges that don’t increase the end point, these ranges are fully contained by the current element, so are useless.  Now that we’ve got a more useful set of ranges (in O(N log N) time), we still need to do more pre-computation to allow for the inner loop of the search to be linear time.  The aim is to compute for each sequence position the end of the right most range which covers it.  (The reason this is useful I’ll explain below.)  First fill the list with -1 to represent places that aren’t covered at all.  With our sorted list the covered bits are easy, if the next element’s start is lower than the current end point, fill from this start to that start with the current element’s end point, otherwise fill from start to end with that end point.  Then move onto the next element and repeat.  This is linear because no place is written more than twice.

So now that we’ve got this computed table minimum cost isn’t so hard to calculate in linear time.  Consider the left most position which isn’t already satisfied – everything to its left is all good, so there is no point increasing values in that area any more.  So the ideal range to increase is the one which covers this point, and covers as much to the right as possible.  Conveniently that’s what we already calculated.  So we determine the number of points to spend then apply that till the end point?  Not so fast, if all the ranges are about half the length of the sequence one step after another, you’ll suddenly have an O(N^2) algorithm.  Instead you need to keep track of how much you are currently adding, and mark the end point to decrement that addition by how much you just added.  Then you just walk along, if there is more needed, increase the spend, apply to the current location, mark when to decrease it, if there was a mark on this position, decrease the spend.  Accumulate the total spend increases to give a final total.  If there are any places where you are short and there is no covering range (as indicated by -1) abort and return infinity as the cost.
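Here is a Python sketch of just this inner cost calculation (not the outer search). One deliberate difference from my description: instead of sorting and pruning the ranges, it builds the “furthest end covering each position” table with a running maximum over range starts, which gives the same table. The name `min_range_cost` is hypothetical, `need` is the shortfall at each position after the full-width spend has been subtracted, and ranges are inclusive zero-based `(left, right)` pairs.

```python
import math


def min_range_cost(need, ranges):
    n = len(need)
    # furthest_from[i]: largest end among ranges that start exactly at i.
    furthest_from = [-1] * n
    for left, right in ranges:
        furthest_from[left] = max(furthest_from[left], right)

    # reach[i]: end of the right-most range covering i, or -1 if uncovered.
    reach = [-1] * n
    best = -1
    for i in range(n):
        best = max(best, furthest_from[i])
        reach[i] = best if best >= i else -1

    expire = [0] * (n + 1)  # expire[i]: amount that stops applying at i
    active = total = 0
    for i in range(n):
        active -= expire[i]
        if active < need[i]:
            if reach[i] == -1:
                return math.inf  # short here and nothing covers it
            extra = need[i] - active
            active += extra
            total += extra
            expire[reach[i] + 1] += extra
    return total
```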

So all that remains is the minimum find search.  A ternary search is one option to consider, but care is required because ternary search won’t survive a plateau which is not the minimum, and there could be multiple values for which you return infinity.  So you need to special case infinity to always cut off the low third even if both of the probe points are equal.  Another alternative to the ternary search is to use a binary search on slope, looking for zero, if you don’t find zero, you can take the first element with positive slope to the right.  Again with the binary search on slope you have to take care of the infinities, they appear to have zero slope when next to each other, but should be considered to have negative slope.

## GCJ16 Round 1A

So this round was blitz’d – cut-off was top 1000, and all of them finished all of the problems.  I’m not confident I could have solved them all in time, the key observation for the second problem doesn’t seem something I would have caught.

Contest analysis was posted almost immediately again, so my analysis is kind of redundant, but I’ll do it anyway!

Q1) Given a sequence of characters where for each you can choose to put it at the start or end of a word you are building, determine the lexicographically largest word you can make.

A1) So this is a greedy problem.  At every point whichever action makes local sense is also the path to the global maximum.  The contest analysis shows how to solve the large in 4 lines of python, I think it can be done in one (rather long) line of C# using the Aggregate function for enumerables.

However both of these solutions are quadratic in the size of the input (which is fine given a maximum input size of 1000).  It’s possible to do this problem in linear time using a deque.  This is because the larger of adding the new letter as a prefix or postfix is the same as comparing the new letter to the first letter and putting it at the back if the new letter is smaller.  This reduction of the comparison to just the first character works because the generated word will always go down after any first sequence of repeated letters, never up, so extending the sequence of repetition is always an improvement.  Pretty sure this property can pretty easily be proved, but I’ve not done it formally.
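In Python the linear version is short enough to show in full, using `collections.deque`; the function name `last_word` is just for this example.

```python
from collections import deque


def last_word(letters):
    """Lexicographically largest word from front/back insertions."""
    word = deque()
    for ch in letters:
        if word and ch < word[0]:
            word.append(ch)      # smaller than the first letter: back
        else:
            word.appendleft(ch)  # equal or larger: front
    return "".join(word)
```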

Q2) Given 2N-1 sequences of length N which are all strictly ascending and each correspond distinctly to one of the rows or columns of NxN grid where every row and column is strictly ascending, return the missing row or column values in ascending order.

A2) Brute force for the small input isn’t exactly trivial to write but it is at least obvious.  Contest analysis tells me that the trick here for the large input is that every missing value will have an odd frequency due to there only being one missing row/column.  This leads to a very straightforward solution – which would be one (very long) line of C# if there was a method to Sort and return an enumerable…  I think I might add a ToSortedArray extension method to TMD.Algo.  This is O(N^2) since the data to be sorted is length N.
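For comparison with the C# one-liner I had in mind, the odd-frequency trick in Python looks roughly like this (`missing_line` is a made-up name):

```python
from collections import Counter


def missing_line(lists):
    """lists: the 2N-1 known rows/columns; returns the missing one, sorted."""
    counts = Counter(value for line in lists for value in line)
    # Every cell is seen once by its row and once by its column, except the
    # cells of the missing line, which are therefore seen an odd number of times.
    return sorted(value for value, seen in counts.items() if seen % 2 == 1)
```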

However since I feel that trick is too hard to spot reliably, I’ll now present an alternative implementation which doesn’t rely on it, but is still O(N^2).  It’s just a bit more complicated and hence not one I’m confident I could design in the time available.

So this other method also has a trick, but one I think is more obvious.  The smallest value in the grid is in the top left corner.  Therefore the smallest value in the first spot of any of the inputs, is the value of the top left corner.  Furthermore any sequence which starts with that value, must be either the top row or the left most column.  Therefore there will be at most 2 of these.  This identification takes O(N) time, since we only look at the first value of each and find min, then a second pass to tag those that are min.  The trick is that now that we’ve identified the input data which could potentially be in the first row or column, the problem has been reduced.  Ignoring the data we’ve already tagged, the smallest value in the second spot of any of the rest of the inputs is the value from the second position of the diagonal.  Again there are at most two which match and we can tag those as being the second row or column.  We can repeat this until we’ve now identified the entire diagonal of the final grid and at most two options for any row or column pair passing through each position in the diagonal.

So far we’ve spent O(N^2) time, but we’re not yet done, we’ve only identified a fraction of the full grid.  Luckily we’re actually almost done.  Choose one of the elements of the diagonal that has 2 input sequences associated with it.  Since the grid is currently empty you can place these in any orientation without loss of generality, so make one the row and the other the column and place them into the grid.  Now consider the two arrays you just placed, for any indexes where they aren’t identical, the corresponding row/column pair which crosses perpendicular through those indexes is an interesting place to think about.  Since the values are not the same, if you have two options for that row/column pair, only one can be the row, and the other must be the column.  Thus they can be placed.  And so on for every time you place a row/column pair if any of the elements are different you have another row/column pair that can be placed if it’s not already placed.  So use a queue and a mask to know what you’ve put in the queue already, and this cycle of placing and finding new things to place is linear in the number of values in the arrays placed so far.  The one thing is that if there isn’t anything different, or if the difference is for the row/column where one is missing, your queue will flush, but you won’t be finished yet.

Here is the not quite so obviously true bit, but one I’m confident is fine.  Since you’ve gotten to a point where there is no difference in the rows or column pairs that are yet to be placed, you can simply choose arbitrarily again, just like you did for the first pair.  The known cells are guaranteed to match, and the unknown cells are guaranteed not to be influenced by anything you’ve already placed before.  Thus we can just throw a new index on to the queue and keep going.

Finally there is just the row/column pair passing through the diagonal where only one data set was tagged.  Check if the one data set matches the column, if so return the row, otherwise return the column.  Although before you return, do make sure you’ve set the diagonal value itself, it won’t have been populated yet unless you populated it at the very beginning while the diagonal values were being determined.

Every step along the way to fill in the values is linear in the number of values filled in, so O(N^2) just like the much simpler solution proposed by the contest analysis.

Q3) Given a directed graph where every node has exactly one outgoing edge, determine the maximum number of nodes that can be placed in a circle such that every element in the circle is directly connected to one of its neighbouring elements.

A3) Despite this problem being worth the most number of points – I found it to be conceptually the easiest.  Yes even easier than the first problem.  That being said the implementation is a pain, even worse than my extra complicated option for Q2 large input size.

So the problem has two pretty straight forward cases.

Option 1, the circle is fully connected.  Since each node has exactly one outgoing edge, this implies a cycle of some length.  So step one is to find the largest cycle.  That’s one possible answer.  Having found all cycles, we’ll throw them away, along with any nodes that are connected to them, if the cycle length is greater than 2.

All of the remaining nodes form pairs of trees pointing to a cycle of length 2.  Option 2 is to take the longest path to each side of that cycle, sum those together, then sum across all such connected components, this is the alternative answer.  This represents taking all the longest fully connected lines and placing them into a circle. Return the largest of Option 1 and Option 2.

Depth first search will identify all the connected components in linear time, it will also identify the longest cycle in linear time.  A second depth first search on each connected component with cycle length of 2 will find the longest path to the cycle in linear time too if you keep track of path depth and destination to avoid recalculation when your depth first search finds somewhere you’ve searched before when starting from a new possible deepest node.  Or reverse the direction of the arrows and just do a standard height of tree calculation.
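As a sanity check of the two options, here is a deliberately simple Python sketch. It is quadratic, walking the chain from every node, rather than the linear depth first search described above, which is fine for illustrating the idea but is not the efficient version. `max_circle` is a hypothetical name and the input is assumed to be zero-based, with `bff[i]` being the single outgoing edge of node `i`.

```python
def max_circle(bff):
    longest_cycle = 0
    best_chain = {}  # node in a 2-cycle -> longest chain ending at that node
    for start in range(len(bff)):
        seen_at = {}
        node = start
        while node not in seen_at:
            seen_at[node] = len(seen_at)
            node = bff[node]
        # `node` is where the walk re-enters itself, i.e. the cycle entry.
        cycle_len = len(seen_at) - seen_at[node]
        if cycle_len == 2:
            chain = seen_at[node] + 1  # tail plus the entry node itself
            best_chain[node] = max(best_chain.get(node, 0), chain)
        else:
            longest_cycle = max(longest_cycle, cycle_len)
    # Option 1: the longest cycle.  Option 2: all 2-cycle chains side by side.
    return max(longest_cycle, sum(best_chain.values()))
```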

## TMD.Algo GCJ2016QR update

Now that the qualification round is complete and I’ve had a chance to write myself some solutions in TMD.Algo’s test harness, I’ve submitted them.

Along with this update are a bunch of new extensions for turning streams of digits into values, or values into streams of digits.  I used these in three of my solutions.

Commit here: GitHub

## GCJ16 Qualification Round

Contest analysis is already posted – 27170 positive scores and 1710 perfect scores.  Not mentioned was the cut-off, 22154 people are currently eligible for competing in Round 1.

The success rates for large were quite high for both of the first two problems, but quite a bit lower for the third and fourth.  I expected a low pass rate for the large input on the fourth as I’ll discuss later, but the third is less obvious.

Q1) Consider multiples of N, what is the end of the sequence which contains every digit in base 10 at least once at some point during the sequence.

A1) As mentioned in the contest analysis, it can be proved that other than 0 the maximum sequence length has an upper bound of 90, and the specific case of 125 gets close at 72.  Therefore the largest number to consider will be less than 90 million, so there is no risk of overflow.  So this problem boils down to can you correctly break a number down into its base 10 digits.  This is a pretty common operation in coding competitions for some reason or another, but one which is missing from TMD.Algo – I think I’ll fix that.
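In Python the digit breakdown comes for free from `str`, so a sketch of the whole problem is tiny. The name `last_number` and returning `None` for the never-ending case of 0 are my choices for the example.

```python
def last_number(n):
    """Last multiple of n named before every digit 0-9 has been seen."""
    if n == 0:
        return None  # multiples of 0 never show any digit other than 0
    seen = set()
    multiple = 0
    while len(seen) < 10:
        multiple += n
        seen.update(str(multiple))  # add each decimal digit character
    return multiple
```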

Q2) Given a sequence of – and + characters, determine the minimum number of operations to turn them all into +, if the only operation you can perform is to reverse the first k characters and also invert them all so – becomes + and vice versa.

A2) The key to this problem is that a change between – and + in the sequence will always remain a change during any operation unless the operation only includes one of those characters and the first element of the sequence is equal to the kth element of the sequence.  Therefore the number of changes, plus potentially one more because you end up with all -, is a trivial lower bound.  And simply repeatedly fixing the first change in the sequence is the solution because the prefix is always all the same character.  My preferred solution is to append a + on the end then just count the positions where a character is not equal to the next one.
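That preferred solution, sketched in Python with plain ASCII `-` and `+` characters (`min_flips` is a made-up name):

```python
def min_flips(stack):
    """Minimum prefix-flip operations to make the whole string '+'."""
    padded = stack + "+"  # a trailing '+' turns "ends in -" into one more change
    return sum(1 for a, b in zip(padded, padded[1:]) if a != b)
```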

Q3) Generate a set of sequences of 1’s and 0’s that start and end with 1 and have a specific length, and when interpreted in each base 2 through 10, are always non-prime.

A3) This was an unusual problem in that the entire test set was in the problem statement, there was nothing unexpected.  And despite that the large input only had a 70% pass rate.  This suggests a lot of people tried to be tricky like the contest analysis proposed, rather than just brute forcing with an arbitrary precision type.  Or didn’t realize that a 32 digit base 10 number is too large for 64 bit integers – I hope there weren’t too many in that group, given to pass the small they would have already realized a 16 digit base 10 number is too large for 32 bit.

Anyway I just brute forced this problem using BigInteger and checking for trivial composites with factors less than 9.  Interestingly I found that just checking for composites using just 3 and 7 or 5 and 7 was effective, but not using just 3 and 5 or 2 and 7.  I’m not clear on why this is the case though, the contest analysis talks about a 3,2,7 being very popular so I guess 3,7 works for a significant fraction of those??  Really I’ve not done enough investigation to be sure.
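A Python sketch of that brute force follows, where the built-in integers already have arbitrary precision so no BigInteger type is needed. The names `small_divisors` and `jamcoins` are hypothetical, the `factors` parameter is there so different sets like 3 and 7 can be tried, and the length is assumed to be at least 3. I make no claim here about how many candidates any particular factor set finds.

```python
def small_divisors(digits, factors):
    """One small nontrivial divisor per base 2..10, or None if any base fails."""
    divisors = []
    for base in range(2, 11):
        value = int(digits, base)
        d = next((f for f in factors if f < value and value % f == 0), None)
        if d is None:
            return None  # no small factor here, give up on this candidate
        divisors.append(d)
    return divisors


def jamcoins(length, count, factors=(2, 3, 5, 7)):
    """Up to `count` (digits, divisors) pairs found by trying small factors."""
    found = []
    for middle in range(2 ** (length - 2)):
        # Candidates always start and end with 1.
        digits = "1" + format(middle, "0{}b".format(length - 2)) + "1"
        divisors = small_divisors(digits, factors)
        if divisors is not None:
            found.append((digits, divisors))
            if len(found) == count:
                break
    return found
```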

Like the first problem, this problem involved digits, specifically for the brute force, interpreting them as values in different bases.  This is a pretty simple piece of code to write, but also one that shows up in programming competitions a bit, so it feels a bit of a deficit in TMD.Algo which I should fix.

Q4) Assuming that a K^C element sequence of L and G is generated by repeatedly applying a rule that starting with a length K (but otherwise unknown) base pattern of L and G a derived pattern is created by replacing each L with the original base pattern and G with an equal length sequence of G’s, determine S locations to check which will prove whether there are any G’s in the full sequence.

A4) So this problem had a very trivial small input.  Regardless of the base pattern the first K characters are either the base pattern, or all G.  If any of them are G then obviously the pattern contains a G.  If not then the base pattern is all L’s, which obviously means the entire sequence is L.  So when S = K you just return the numbers 1 through K.

The problem is that this approach tells you nothing about the large input.  Unless you actually understand the problem fully, you could come up with ways to do better than S = K, which will pass the small input, but then you’ll fail the large.  I think a second small input set where S could be anything, but K^C was limited to a much smaller number could have caught the first order failure to fully understand the problem without clearly giving away the full depth.

Anyway, I like this problem because of the subtle connection back to problem 1 and 3.  This is actually a problem about digits.  Given a zero-based trial location the zero-based positions in the original base sequence which could affect the trial location’s value are the same as the digits of the trial location when written in base K.  More specifically, if any of those locations are G then the trial will return G.  So the ideal search is a set of C digit base K numbers where there are no unnecessarily repeated digits.  If any of the base pattern contains G, one of the search locations will be a G, otherwise the entire sequence is L’s since the base pattern is all L’s.  Once I have some nice digit sequence logic in TMD.Algo, I think the implementation for this solution will only be a couple of lines.
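Until that digit logic exists in TMD.Algo, here is how the idea might look in Python. The function name `tiles_to_check` is hypothetical, it returns one-based locations, and it returns `None` when S locations cannot cover all K base positions. Padding a short final group by repeating the last base position is my choice; any valid digit would do.

```python
def tiles_to_check(k, c, s):
    """One-based locations covering every base position, or None if impossible."""
    if s * c < k:
        return None  # each location covers at most c base positions
    tiles = []
    for first in range(0, k, c):
        position = 0
        for level in range(c):
            # Digits first, first+1, ... in base k; pad by repeating k-1.
            digit = min(first + level, k - 1)
            position = position * k + digit
        tiles.append(position + 1)
    return tiles
```

For K=3, C=2 this groups the base positions as (0, 1) and (2, 2), giving zero-based locations 1 and 8, so the result is `[2, 9]`.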