## TCO14 R2A

So I barely managed 615th out of the 1439 people who registered out of the 2500 potential people.  My rating didn’t move down much as a result, but I was disappointed with my result all the same.  Getting the second problem out was good for a guaranteed t-shirt, and solving the 1000pt was a guaranteed advancement.  Advancement cut-off was 442 points.

Q1) Given 16 values arrange them into a 4×4 grid such that sum of all edge values + double the corners + the absolute difference between neighbour pairs is maximal.  Return the maximal sum.

A1) I am not a fan of this question.  I solved it quickly, but accidentally, with a poor solution which is not at all obviously correct.  I then spent far too long trying to convince myself as to whether I was missing something critical, getting a low score in the process.

So the trick to this question is to convince yourself that the values given do not matter, if you sort them and place them the same as you would place 1 through 16 optimally, you still get an optimal solution.  One such ordering is as follows:

```16  3 15  8
4 14  1 13
12  2 11  5
7 10  6  9```

The idea is that if you choose to have your largest 8 numbers in a checker board pattern the sum becomes 4 * sum of largest 8 – 4 * sum of the middle 2 (because they are surrounded) – 2 * sum of those on the edges (because their being on the edge offsets one of their 3 neighbours) – 0 * sum of those in the corners (because corner offsets both of the 2 neighbours).  So it smallest 2 numbers go in the middle, next 4 smallest on the edges, and the next 2 in the corners, and the rest are checker boarded is optimal for the checker board scenario.  A small inspection shows that the formula still holds if some of the smallest 8 are equal to some of the largest 8.  All that remains is to prove that there is no scenario where 2 elements of the largest 8 are neighbours produces a better result.  But I have no idea how to do that and the checker board pattern is somewhat ‘obviously’ good.

As a demonstration of why this problem was ‘broken’ – placing the numbers into the 4×4 square in sorted order (but not for all random orders) and them performing a greedy pairwise swap until no pairwise swaps improve the sum, would pass system tests, despite having practically no relation to the actual greedy arrangement.

Q2) Given a set of positions in a long narrow hallway, determine the minimum they have to move in order to get to a different set of positions, if they can only pass each other at either end of the hallway.

A2) This is where I am really disappointed with my self.  Even with all of the time wasting I did on the first problem I still had > 25minutes left after my submission.  However because I had become convinced that I was missing something with that first question, and no one else in the room had submitted the 500pt, and one person had skipped it in favour of the 1000pt, I decided it was a better investment in my time to try and come up with a killer test case to break my (and hopefully others) 250pt question.  Which was both a) futile since everyone else’s code passed system tests and b) unsuccessful, since my solution passed system tests…

So this problem really isn’t that difficult.  Sort by starting locations to get an ordered set of indexes, sort by ending locations to get another ordered set of indexes.  Now there are 2 basic scenarios.

First up, some sub-range of the starting indexes which all have the same destination indexes directly shifts to their destinations while everything to the left shifts via the left end, and everything right shifts via the right end.  This is only valid if every item to the left has a destination on the left, and every item on the right has a destination on the right.  This has O(N^2) scenarios each with an evaluation time of O(N) to give O(N^3).

The second is a bit more complicated and I might have gotten it wrong (if there wasn’t a practice question which would have picked it up).  In this case no elements move directly to their destination, everything moves either via the left or the right ends.  So for every possible partition point you can consider everything going left and right, and getting to their destinations via the corresponding end points.  However this doesn’t work if the set of destination indexes of the left and right sections aren’t also on the left and right.  The trick is that these scenarios can all work, just that some items have to visit both ends.

So having partitioned the starting indexes into left and right groups, you also have to partition the destination indexes into left and right groups, and if they change groups, you have to add the extra distance of having visited both ends.  This gives O(N^2) scenarios each with an evaluation time of O(N) to add an addition O(N^3).  Input constraints were at most 50 items, so the result is easily managed.

Q3) Given a tree with black disks on some (non-root) nodes and a red disk on the root node, determine which nodes the red disk can visit by sliding along edges, if disks cannot pass through each other, but are otherwise all free to move along edges.

A3) Only 6 people solved this successfully, and none of the solutions are trivial to read and understand.  General idea seems to be that you cache the number of black disks and the number of nodes for each sub-tree and then use that to determine whether the black disks can be shuffled out of the way to let the red disk to visit.  If a node has multiple children and can be visited by the red disk, and not all of those children are filled, the red disk can visit some of the children, and potentially the black disks in the other children can be shifted up and into available spaces else where on the tree, allowing the red disk to reach even more locations.  I think this logic gets applied repeatedly until no more red disk reach locations can be found.  Without the cache the solution would be too slow.

## GCJ14 R1C

Top 1000 advanced which looks to be solving question 1 completely and question 2 small in < 1:50.  For safety add in the question 3 small input.  In fact you could drop question 1 large.  All 3 small inputs was sufficient to advance.

Q1) Given a fraction, determine if it can be created by repeated pairwise averaging of 2^40 values which are either 0 or 1.  If so, determine the best case depth of a value of 1 in that averaging. (Best case being closest to the final value.)

A1) Odd little question, but actually quite simple.  Difference between small and large input was that the large could involve 64 bit numbers, and you needed to implement a GCD to reduce the fraction to minimal components, but otherwise still quite simple.

The solution comes that because repeated pairwise averaging is identical to just averaging all the values at once, the fraction must be of the form n/2^40.  So if the base of the fraction is a power of 2, the fraction can be created.  This is where the GCD comes in for the large input, without reducing to the minimal form, the base could be a multiple of a power of 2, and also of some other number, which can be factored out of the numerator.

For calculating the depth, it is obvious that if you consider the n/2^40 form you have n 1’s at the top level.  To keep a 1 as long as possible you shift them all to one side.  Thus the depth will be 40 – the floor of log 2 of n.  Another way of looking at it is that if the fraction >= 1/2 a parent can be a 1, >= 1/4 a grand parent, and so on.  So the depth is equal to the number of times you can double the fraction before it is greater than or equal to 1.

Q2) Given sequences of characters, determine how many ways the sequences can be ordered such that the concatenation has every case of any given letter being in a single contiguous grouping.  Return result modulo 1 billion and 7.

A2) Small input there are 10! orderings to try, so it can just be brute forced.  Large input there are up to 100 sequences, so something smarter is needed.

The approach I would have gone with is categorizing the inputs.  If a sequence contains just the one character some number of times, we call that a ‘repeater’ (even if it is just a single character).  If a sequence starts with some number instances of a letter, we say it starts with that letter.  Similarly for ending.  Any characters between a start and end and the sequence also gets categorized as having those characters in the middle.

From there a bunch of impossible scenarios can be detected straight off the bat.  Only 1 sequence can start with any given letter, excluding repeaters.  Same for ending.  No 2 sequences can have the same characters in the middle, unless they are both repeaters.  A single sequence can’t have the same character in the middle multiple times, unless they are contiguous.  (I note now that the problem can be somewhat simplified by removing all contiguous duplications from the input sequences.  They don’t change the problem and just complicate detecting invalid scenarios.)

Next the inputs need to be grouped.  A starter and an ender of the same letter (and the repeaters) must be in the same group.  When gathering a group, you can follow the starter to an ender, and then to a starter, and to an ender and so on, and vice-versa, to get a full group.  However if you detect a cycle, this is also impossible.

The final part is to calculate the number of orderings.  If there are n groups, those groups can be selected to form n! orderings.  Each group has only one ordering, unless it has repeaters.  Each letter in a group which has k repeaters, contributes k! orderings.  All these factorials must be multiplied together, continually using the modulo 1 billion and 7 as required and use 64 bit integers to avoid overflow.

Q3) Given a NxM grid of points connected by horizontal and vertical lines, determine the minimum number of covered points required to enclose at least K locations.  Enclosing is defined by either being covered or being unable to connect to the edge of the grid without going through a covered point.

A3) The small input has N*M less than or equal to 20.  This actually limits a lot of the possible scenarios, but it does allow a brute force of whether each point is covered or not.

The large input has N*M <= 1000, which gives a lot greater possibilities for tricky corner cases, but does easily allow for a O(N*M*(N + M)) time algorithm.  (Examples of this can be found in the solutions to download.)

So this problem has a few things to it, which makes it a bit tricky.  First of all the problem is actually slightly simpler with an infinite grid.  In this scenario we can ask ‘what is the largest area that can be enclosed with n stones and not have to worry about whether the ideal shape runs into the edge of the grid or not.  So if n = 4k, the ideal shape is a diamond pattern.  Proving this may be non-trivial in practice, but it makes a lot of sense.  In such a shape any attempt to increase the area by moving a stone, forces all the other stones to move, just moving the shape.  If n = 4k+2 the ideal shape has 2 options, either a diamond with 2 longer opposite sides, or take the 4k ideal and duplicate 2 of its vertical or horizontal extremes, kind of making a blunted diamond.

``` ##
#  #
##```

When I was first thinking about this problem I made the mistake of not considering 4k+1 and 4k+3 correctly.  Looking at the case of k=1 led me to conclude that these could only cover one extra position in terms of area compared to the one stone less case.  Instead the ideal scenarios look more like this:

```  ##
#  #
#    #
#     #
#   #
# #
#```

The general pattern that can be derived is that there are 4 diagonal lines, which either meet at a point or a blunted point.  4k is a perfect diamond shape, 4k+1 one diagonal line moves outwards, causing 2 blunts.  4k+2 moves out two diagonal lines, if they are adjacent then there are 2 blunts on opposite corners, if they are opposite it is all points, but the diamond is stretched.  4k+3 moves out 3 diagonal lines, resulting in 2 adjacent blunted points on a stretched diamond.

``` ##
#  #
# #
#```

So, how does this interact with the the borders?  Well it turns out that so long as you choose adjacent case for 4k+2 (or maybe even if not?), simply truncating positions to fit inside the box is as good as you can do.  I don’t have a good proof, but once the shape is too big, the edges have to be connected, so you get runs across the edge, and minimizing stones becomes cutting diagonals across the corners, which ends up being the same as squishing the ideal shape into the available space.

So the O(N*M*(N+M)) solution is to try every area less than or equal to N*M, and cut off corners incrementally until they no longer touch the edge.  Since every area is tried, the ideal scenarios that don’t touch the edges of the full NxM grid are all tried because they are all touch the sides of some size rectangle.  Taking turns cutting the corners off one by one goes through the 4k, 4k-1, 4k-2, 4k-3, 4(k-1) sequence, pushing the diagonal lines inwards incrementally.  The best solution of all tried is returned.

A more interesting (but same running time solution) is to consider ‘can it be solved in n stones’ and then depending on the modulus the formula for the 4 boundary diagonal lines can be formulated, knowing the border will be a total of n stones, the area can then be calculated by walking over every point and checking whether it is inside the 4 diagonal lines.  The solution I saw using this approach tested cutting off corners in both orientations of the NxM grid, but I suspect that with a little thought it could be reduced to only testing one of those 2 orientations.

This approach leads to an O(N+M) solution.  Since the worst case answer is 2N+2M-4, you can walk upwards towards that value checking each n until one works.  If you can calculate the covered area in O(1) time, this is O(N+M).  This is much more complicated piece of code, but by calculating the intersection points of the 4 boundary lines with each other you can calculate a reduced area (if the shape isn’t already going to touch the edges) and then the number of empty points in the corners of this possibly reduced area can be calculated by k(k+1)/2 where k is the index (counting based on 0 from the corner) of the first filled in point along the edge, which is found by calculating the intersection of the diagonal line with the edge.  The sum of these 4 corner cut-offs is subtracted from the the total area to give the number of enclosed points.  There are 4 diagonal line pair intersections and then 4 diagonal line with a edge intersection calculations involved, all of which are simple formula solves (although the blunted corners result in the ‘intersection’ of the 2 lines being somewhere between points where you can place a stone, this is easily special cased) which can be done in O(1) time, for a total of O(1) time to calculate the covered area.

Special cases.  If the desired area is 4 or less, no need to do any work, the number of stones required equals the area.  This is also the case if the minimum of N and M is <= 2, since there is no way to surround a square in such a narrow space.  Also not obvious in the discussion above is when the desired area is >= N*M-3.  In this case the amount cut off a corner is 0 in some cases, which still works, it is just a bit unexpected considering all the use of diagonals in my description.

## GCJ14 R1B

Another week, another round of a coding competition… or so it seems this time of year.

Top 1000 advancing to round 2.  The cut-off was fully solving question 2 and the small input of question 1 in 2:20.  All small inputs in a fast time came 1035th, which was interestingly close to advancing.  The score to get by without having to worry about time was achieved by all small inputs and the large of the first question.

Q1) Given a game where characters can be duplicated and consecutive duplicate characters can be collapsed back to a single character, determine if a set of strings can be made equal, and if so, the minimum number of moves to achieve this goal.

A1) Determining whether the goal can be achieved is easy, just replace all sequences of consecutive repeats with a single character, and determine if all the input strings are equal.  This produces a ‘canonical’ form that all must match.  For the minimum number of moves it isn’t really that much harder.  For the small input there are only 2 strings, so choose one of the strings and count how many moves it takes to make the other equal to it.  There is no advantage to trying to meet in the middle, so on a per canonical string character basis just take the absolute difference of the counts in each of the strings and sum them all for the result.

For the large input there are up to 100 strings, but because each string is only at most 100 characters, no letter can be repeated more than 100 times.  Therefore the ideal to converge upon will be no more than 100 repeats and no less than 1.  Since each character of the canonical form can be considered independently, all possibilities can be easily tested within the time limit.  Just try each value of 1 to 100 inclusive and calculate the sum of absolute differences, the smallest sum is the minimum number of moves for that canonical character.  Then just sum all of those minimums for the final result.

The most challenging part of this question is probably that the large input is considerably more ‘different’ to the small input, so a bug in the code could easily get past the small input only to fail on the large input, even though it would easily run in time.

For interest, an extension of this question where the maximum length of a given string is unbounded by the maximum length of the canonical form is short, the problem is still easily solved so long as the counts are provided rather than having to parse the unbounded length string.  The sum of a set of absolute difference functions all with equal slope has a minimum at the middle of the sorted list of 0 points of the individual functions.  For an even number of functions, either of the middle 2 will work because the the sum is flat in between.  Thus with a value selected the total of absolute differences can be calculated for a total run time proportional only to the number of strings and the number of characters in the canonical form.

Q2) How many pairs of 2 numbers selected from specific ranges when bitwise ‘and’d together have a result in a 3rd specific range.  All ranges start from 0.

A2) Easiest small input ever?  With range sizes of only at most 1000 each, nested loop over the input ranges and calculate the and and compare to the maximum value of the 3rd range.  If less increase counter.  Once loops are done, return counter.  6 trivial lines?

Large input however is much more challenging.  With ranges of up to 1 billion, such brute force is out of the question.  Given the bitwise and function and the large ranges, running time seems like it will need to be related to the logarithm of the input range maximums.

Determining whether 2 numbers ‘and’ to give a 3rd is trivial, which leads to the thought, is there some way to convert the ranges into batches which can be easily processed in parallel.  Perhaps using a ternary notation of 0, 1 and ‘don’t care’?

All numbers less than n can be grouped in a ternary notation by having a prefix which is equal to the start of n, followed by a 0 where there is a 1 in n, followed by ‘don’t care’ for all subsequent bits.  The number of such patterns needed is equal to the number of set bits in n, which is worst case proportional to the logarithm of n.

So the problem now boils down to, generate these ternary representations for the 2 input ranges and the output range, then for every way of selecting from those 3 lists, determine how many results it allows.  Sum those all up, and return.

So the core part of that is ‘determine how many results it allows’.  So inside the triple nested loop to select the 3 ternary representations being tested another loop over each bit is needed.  Each bit is independent in the bitwise and, so the number of ways of doing each bit can be considered a multiplier.  Multiplying them all together will give the number of solutions that are derived from this particular selection.

Handling each bit can be done with a large if statement selecting on the 3 different values which can be in each of the 2 inputs and the output.  For example ‘don’t care’, ‘don’t care’, ‘don’t care’ has 4 possible scenarios and ‘don’t care’, ‘don’t care’, 0 has 3 possible scenarios.  Such an if statement is quite large and convoluted, so some more nested loops might be simpler.  Loop over each possibility for the 2 inputs, skipping if it doesn’t match the actual input values.  Then calculate the and of the 2 input bits and compare it to the expected output value incrementing the counter if it matches.  The simple match function for ‘bit vs ternary value’ is handled much more easily than the huge if statement.

One catch is that the 3 ranges originally specified 2 are exclusive and one is inclusive of the maximum, so applying the above approach the inclusive bound needs to be incremented to make it an exclusive bound first.  Alternatively all numbers less or equal to n can be considered by using the same set of patterns above and adding the exact pattern of n as well.  This is equivalent to setting the bit one past the end of the number to 0 and so can be treated as one extra loop, with appropriate special case handling.

Q3) Given a set of locations with distinct ‘values’ and a list of connections between locations (all locations are connected), determine a order of visiting which doesn’t arrive at any node more than once, but can backtrack as much as you like, which visits every node such that the sequence of first visits is ‘smallest’.  Smallest is defined by comparing the first elements of two sequences, then the second elements, and so on until one is smaller than the other.

A3)  The small input only had 8 nodes, so brute force was plausible.  Just had to make sure you don’t infinite loop.

First thing to realise is that the values of the locations do not matter, so first step is to renumber the graph so that the smallest value is node 0, the second smallest is node 1, and so on.  Then the walk of the graph desired is one which prefers smaller node indexes.

The solution seems to be a greedy approach.  First thing to realise is that node 0 (in the renumbered scenario) is always the correct place to start – since you can start from anywhere, and it is always possible to walk to every node of a connected graph, starting from any given node even with the additional restrictions specified by the problem.

From that first location you will always walk to the smallest index connected to node 0.  But after that it becomes slightly more tricky.  More generally the problem becomes deciding what to do when you have visited a set of nodes V and have a current path from node 0 to current node N.  For each of the nodes on the path, you can go to any of the nodes not in V that are connected to the node in the path.  This leads to a set of scenarios which need to be prioritized for the greedy approach.  The obvious priority is the index of the destination.  It is always preferable to go to a smaller index if possible.  However we need to break ties as the graph can be fully connected, in which case it is possible to reach every node from every node.  Here the idea is that deeper in the path is better.  Every time you backtrack you lose the possibility of going from that node in future, so it is kind of an opportunity cost.

So that seems pretty simple – but just choosing the best value from the set of scenarios, doesn’t always work.  The problem is that it may cause you to back track to take a link to a low index node, but there might be nodes that can only be reached through the node you backtracked from, and there is no way to get back to that node any more.  Therefore before taking any option in the priority order of options, you must perform a reachability check. A recursive walk from each node in the potential future path, out in to the world not visiting anywhere already visited, do you visit every node not already visited?  If the reachability check fails, that option must be discarded and the next highest priority considered.

The running time of this algorithm is a bit complicated, but an easy upper bound can be found.  There are O(N) steps.  Worst of these steps has O(N) nodes on the current path.  This results in O(N^2) possible scenarios to consider.  Each of which could require a reachability analysis which takes O(E) or O(N^2).  Thus an easy upper bound is O(N^5) which with 50 nodes is fine.  In practice the cases with high number of scenarios at a given step also tend to pass reachability easily, so the code runs within time with ease.