## TCO17 R1B

Less than 500 people turned up for this round, so of course positive scores advanced, of which there was 312.  Almost 100 of those solved the 1000pt question, with the 250pt question seeming to have a higher failure rate amongst the top competitors.

Q1) Given two amounts of two different types, and two rates of consumption and the ability to convert type 1 in to type 2 at rate of 2:1, determine the maximum time before one or the other type is exhausted.

A1) So there are two scenarios.  First determine the baseline time of exhaustion by dividing the amount by rate of consumption for each type.  If type 1 runs out first, there is nothing to be done, return that time.  Otherwise the aim is to find an exchange level which has type 1 and type 2 run out at the same time.  Since each increment we convert will lengthen the shortest and shorten the longest until that point.

Thus we need to solve (A1-x)/C1=(A2+x/2)/C2 to determine x, and then recalculate either side.  x =(C2A1-C1A2)/(C2+C1/2).  Note that x looks like it could be negative, but that is the same case as type 1 running out first.

Probable cause of failure is failing to account correctly for the input range.  C2*A1 will overflow 32 bit int if you haven’t converted to floating point.  C2A1-C1A2 being calculated with small precision float could be an issue as well, but I think using a 64 bit float might be sufficient, definitely fine with 80 bit float.  Obviously it could all be calculated using BigInteger fractions, but I’m pretty sure that is excessive.

Q2) Determine a set of positive integers such that the number of distinct sums which can be constructed from them (using each number at most once) is exactly k.  Maximum number of values in the set is 20 and k might be up to 1 million

A2) k=(2^N)-1 is an easy first scenario, just use powers of two between 0 and N-1 inclusive.

As an extension of this observation, it can be seen that given an existing set S, you can create a new set S2 which has double + 1 as many distinct output values, by adding a new value which is one more than the sum of the existing values.  If instead we add a new value which is equal to the sum, the number of distinct outputs doubles.  Now if we can work out a way to go from one set to another such that only 1 new value is added it would seem we are done.

If we can assume that the existing set of outputs when sorted forms the range 1 to x, then by adding the value 1, the new output is 1 to x+1, and we’ve added exactly one new value.  Helpfully the proposed doubling scheme above when given the range 1 to x creates 1 to 2x.

Thus we have increment and double operators, which lets us create any number in logarithmic number of steps.  However, with a target size of up to 1 million, its easy to construct a scenario where just double and increment take 38 values.  But we’re limited to 20.

So we need to be more greedy.  Note that we can actually extend our logic above to say that given a set of size x we can create any new size between x+1 and 2x+1, just by adjusting the new value added.  Or in reverse, given a current size x, we create it from any size between x-1 and floor(x/2).  So rather than building up to the target number, we could instead tear it down.  If the target is odd, choose a value for your output set which is half rounding up and solve for half rounding down, if the target is even choose the value equal to half and solve for half.  Repeat until target is 0.  Now you are done and its obvious that 20 values is sufficient!.

Q3) Given a tree with weighted nodes, determine the set of all sub-tree weights, then determine the sum of x raised to the power of each member of that set.  Return the answer modulo 1 billion and 7.  Weights of each node may be large, x itself may be large, but there is at most 50 nodes.

A3) Since the weight of each node can be large, and the tree could be very flat, the number of distinct subtree weights could be huge.  So enumerating them all is out of  the question.  So it would appear that the problem needs to be built up from the leaves.

As a simple starting point, consider a node with one child and an unknown number of sub children.  The answer consists of the answer ignoring the parent plus the answer including the parent added together.  Assume we have the answer for ignoring the parent (since we are planning on building up from leaves) the answer including the parent consists of, the parent on its own  (x^(parent weight)) and, the parent plus every subtree including the child (x^(parent weight + subtree weight) summed over every subtree).  We already have x^(sub tree weight) summed over every subtree, so this is (1+subtree answer)*x^(parent weight).

So we can now answer a linear tree, but that isn’t very interesting.  We need to handle branching.  So parent plus two children. (1+subtree1 answer + subtree 2 answer)*x*(parent weight) handles the scenarios of only choosing to include one of the children, but what about some subset of both.  Thinking for a bit this should be (subtree1 answer * subtree 2 answer)*x^(parent weight).  Subtree 1 is a sum of all the different subtrees, sub tree 2 answer is the sum of all those sub trees, so every combination of them can be formed by calculating the product, and each term of the product is correct as its x^weight we are trying to calculate so x^(weight1+weight2)=x^weight1*x^weight2.

This can obviously be expanded to an arbitrary number of children, but as written its got 2^(number of children) terms – which is far too many given tree branching factor can be as high as 49.  So we need to simplify.  1 + subtree1 + subtree2 + subtree1*subtree2 = (1+subtree1)*(1+subtree2).  And this pattern holds as number of children increases.  So now its a product of a linear number of terms – which can be easily calculated.

Finally once we build all the way up the tree, we then need to sum all the values over the tree, as each node is only calculating the sum for subtrees which are rooted at that location.

Of course the answer can be huge, so we need to consider how the modulo comes in to play.  Obviously adding two values is easy.  Multiplying two values needs care to avoid 32 bit overflow.  But the raw values of x^(current node weight) where both x and weight might be large requires both care for 32 bit overflow, and to perform accelerated exponentiation.  But those 3 scenarios cover all the operations needed.  No division or subtraction to be handled.  Pretty easy as 1000pt questions go.

## TCO17 R1A

I got to see 2 am twice staying up for this round, since it was scheduled for the hour when DST ended…

Positive scores advanced, only about 1000 people registered.  Only 17 people got all 3 questions out.  I think I was close on solving all 3, just ran out of time on the fiddly implementation for Q3, but given how many submissions for Q3 failed system tests, maybe I wasn’t as close as I thought…

Q1) For a table tennis tournament with people with distinct skill levels where greater skill always wins and only one table at which everyone queues, and after n consecutive wins the winner joins the end of the queue after the loser.  Determine who wins/loses in the Kth round.

A1) K was only at most 1000, so this was a simple simulation, even if you don’t have a queue data structure handy it’ll trivially run in time.  Just need some care tracking how many wins the players have had and be sure to add the retiring winner to the queue after that rounds loser.  I guess maybe input sizes of 2 and 3 would also present a potential corner case to some implementations.

Q2) Determine the maximum sum of products of the longer 2 dimensions of rectangular prisms which are made from an original rectangular prism of integer dimensions A,B,C by slicing parallel to a face to create 2 rectangular prisms of integer dimensions, where the pieces have to have a minimum dimension of each S in order to count.

A2) A,B,C could all be up to 100, and S could be 1, so the number of ways to break it down rules out brute force.  Instinct suggested a greedy solution was the way forward.  Options to consider are slicing in half, or slicing off a slice of size S.  Slicing in half gives you two smaller problems, but you can clearly see that you get less slices than slicing off S at a time if you repeatedly try to slice in half.  Once you decide to slice off S at a time, you could switch to dynamic programming – state space is at worst size 100^3, with 3 options to check from each state, that will easy run in time.

However I still liked this problem for greedy, so with a little bit of math, you can show that slicing from the smallest dimension is strictly better than slicing from a larger dimension.  But if the smallest dimension is less than 2*S, slicing on that dimension has no gain, and slicing in a the next longest dimension is actually a win.  Again once the two smallest dimensions are less than 2*S, you slice the longest dimension instead.  So first loop summing the product of the longest two dimensions until you get the shortest to less than 2*S, then repeat for until the second shortest is also less than 2*S, then again for the final dimension, and then finally add the product of the two longest remaining dimensions.

Simple mistake that could be made here is that when you are slicing the second or third longest dimensions to get them down to less than 2*S, you may end up with your 3 dimensions changing sort orders.  If you don’t resort your array after each run of slicing, you might end up start summing the product of the shortest and longest sides, rather than the two longest sides.  I sorted every time I changed the value, just to be safe, sorting 3 values is cheap 😛

Q3) Given a strictly convex polygon with the maximum and minimum y valued coordinates on the y axis, determine the volume of revolution around the y axis.

A3) This question was quite unclear, since given the polygon in most cases crosses the y axis, a full 360 degree rotation will cause the shape to sweep over itself.  So I did for a moment wonder if they meant the volume of 180 degree rotation – but the first example created a full cylinder out of a square with one edge on the y axis.  So I presume that the answer is the union volume from the 360 degree sweep.  I think I might have seen an answer which calculated volume of each half, doubled and added together and subtracted the intersection volume – so I think I was right about that.

So the final shape will be a stack of truncated cones, and the formula for volume of a truncated code is pretty simple.  The trick becomes determining the coordinates where each truncated code outside edge starts and stops.  Since its a union shape we’re trying to determine the volume of, take each segment on the left of the y-axis and reflect it to the right, and then we need to determine the outside edge of the combined set of segments.  Walking the segments in pairs advancing the segment which ends first, there are 3 possibilities for the overlap y range of the two segments.  Either one or the other is completely to the left or right of the other, or they could cross.  Turn each segment in to a line equation, allows the substitution of the y range min and max to check the maximum x-coordinates for the top and bottom of the range, if they belong to different segments do a segment intersection calculation to find the cross point.  Then you get 2 truncated cones instead of one by adding the cross point in between the 2 calculated maximum x coordinates for the start and end of the range..

Corner case – horizontal segments.  There can be up to 2 horizontal segments which are at the top and bottom since its convex.  These two segments can be ignored, and should be since otherwise you can have both horizontal and vertical segments which makes line equations difficult (without them you can construct the line equations as x = ay+b, otherwise vertical lines are annoying), and you also have segment overlaps with 0 extent in y axis which is confusing.

## Distributed Code Jam 2016 R2

15 to the world finals and at the moment the top 12 solved all the problems. To get through required a good time and only failing one of the larges, and not the one worth the most points.

I think I could have pretty easily gotten a top 200 placing, with the 2 easiest larges and the other two smalls.  None of the problems seem ‘too’ difficult, but writing the correct solution to 3 larges in the time available is probably a stretch for me.

Q1) Given some inefficient code, write equivalent but efficient code to solve the problem.

A1) Like last time the detail is of course in the code given.  This time it was a bit more deceptive.  It appeared to be calculating as a sum the all pair product between a set A and a set B, but subtly the central server doesn’t do any calculating, so its actually the all pair product where the pairs don’t sum to 0 mod 20.

The trick here is to realise the all pair product is sum of all elements in set A times the sum of all elements in set B.  Removing the pairs which sum to 0 mod 20 is a bit more tricky, but its the product of the sum of the 0 mod 20 in set A and set B plus the product of the sum of the items which are 1 mod 20 in set A and the items which are 19 mod 20 in set B, and so on.

To simplify the code slightly rather than calculating the total and the removing, you can just calculate the sum of items that are x mod 20 in set A and y mod 20 in set B and sum all pair product where x+y mod 20 != 0

All that remains is to solve the problem in a distributed scenario where there are 100 million elements in each of set A and set B.  Summing things is easily distributed, so each node creates to sum tables, one for each of set A and set B with an entry for each mod 20.  Then it processes its chunk of the two inputs calculating the mod and summing it into the correct chunk.  Once done, send both tables to the central server, which sums all the tables together and then performs the final pair product step.

One thing is that the answer can easily be too big for 64 bit, so the problem actually wants it modulo 1 billion and 7.  As with many problems with this requirement, its important to remember to apply the modulo after every calculation.  And even though the modulo and all individual inputs are valid for 32 bit, calculating products will have intermediary values which need 64 bit precision.

Q2) Determine the longest prefix of a sequence of brackets which can be extended in to a balanced sequence.  Return -1 if the entire sequence is already a balanced sequence.

A2) So the question is worded a bit more vaguely than above, but if you’ve had some exposure to bracket systems you’ll know that the description is just equivalent to any balanced sequence of brackets.  At first it might seem this problem will be difficult to do distributed because each segment’s validity clearly depends on the one before, but one of the details of a balanced sequence of brackets is that any sequence of brackets can be made balanced, so long as the running sum of open (+1) and closed (-1) never goes negative.  Therefore each segment just depends on the running sum of the previous segments.

So each node calculates its total for the local running sum, sends them to the central server which accumulates those running sums and sends back the correct starting total for each node, to that node.  Now each node can check whether the true running sum goes negative at any point.  If it does, it sends that position to central server, otherwise it sends -1.  The central server then returns the first non-negative result.  Otherwise it checks if the grand total is 0, if so it returns -1, otherwise it returns the length of the full data set.  (This last conditional I think is probably the easiest thing to miss in this problem…)

Since the data is only 1 character per value, memory use isn’t a high risk, and the cost per read of value is low enough that you could probably read each value assigned to the node twice without running out of time if you don’t want to read in to a local array.  The usual gotchas around splitting data amongst the nodes apply.

Q3) Given a game where you can move left or right 1 unit or stay still in any given turn, and the rest of the game map moves down one unit after your move.  Determine the maximum number of points you can get given any choice of starting locations.  Each non-wall containing cell is worth between 0 and 9 points which you earn just by entering it.

A3)  The small input can be easily done on a single node, and the solution is a dynamic program on the maximum number of points you can get in total once you arrive at position x.  From each row from bottom to top you try each non-wall position, then there are 3 possibilities, if the position in front isn’t a wall max its current value with the sum of the local current value and the value of that cell.  Then if the right and right forward diagonal aren’t walls, similar with the right diagonal cell, but also add in the value of the right cell.  Similarly with left diagonal.  This is seeded by the value of each non-wall position in the bottom row, and the final result is the maximum value in the row above the top of the input.  This DP is O(1) per cell, so can be completed very quickly in the small input scenario where the maximum size is 1000 by 1000.  The large input of 30000 by 30000 would be almost processed in time by a single node, if it wasn’t for the memory limits and the time to read the raw input.

So the question becomes, how to break the problem up to pass the large input across multiple servers.  Horizontal stripes are a bad idea, because the mathematical operation is a max of sums and it mixes values from a lot of locations so it doesn’t distribute well trying to chain it with results that arrive after you’ve started.  So the solution is vertical stripes.  However the edges of each row of a stripe are dependent on one cell to the right and left of the stripe of  the row below.  If each server was to communicate its edges with every row, that would be 30 thousand messages, which given the estimated 5ms latency per message is far far far too long to run in time.

In order to send less messages each node needs to do more work.  Because the game is limited to one move left or right per row, if we want to only send a message every 300 rows (which brings in a total of half a second expected latency instead of 150 seconds) we need to start with an extra 600 width on our base section, then 598, 596…   This doubles the number of values which has to be read by each server if they would usually be dealing with a 300 wide stripe (and doubles the number of DP calculations, but the reading of the input dominates).

In order to continue after we’ve done 300 rows, we need to share the correct values for the full set of 300 neighbours on each side – which we can get from each of the neighbouring servers if we’ve set it up right…  Once we’ve got that information we can continue for another 300 rows.

So, does this solution scale well enough?  Estimated latency waiting for network is 500ms.  Time to ready the values for the stripe and half of each neighbour is 30000*600*0.05us = 0.9 seconds.  Actual computation time is much lower.  So 5s should be plenty of time.  Memory usage is 18MB minimum, but quadruple that if you use 16bit chars and a resizing array without calling reserve.  Still fits.  In fact it fits well enough you might as well make your life simpler and just use a full 3 widths allocation even though you’ll only use half of each of the outside areas.

Each message sent is 1200 bytes, you send up to 100 messages to 2 servers, this isn’t even a whole megabyte…  And the 100 messages is well below the 1000 limit.

Gotchas:

1. Using your normal logic for distributing the columns to servers can result in server with much less than 300 width.  If your code mistakenly decides to send in height chunks equal to the width allocated to the server, you can easily become too message spammy…   If you don’t, you need to start requesting results from more than one server over, and again you become too spammy. You want to explicitly allocate exactly 300 main columns to each server to make this simpler.  If you don’t use all the servers, that’s okay, it will still easily run in time.  More generically (if you don’t like writing your code to the specific problem constraints…) you should divide it in to equal chunks of width depending on the longest of the two dimensions.
2. One server has a different width, so its neighbouring server gets complicated…  Solution is to pretend that the total width is actually exactly a multiple of 300, and fill the extra columns with walls.  Makes the code for reading values in to the array slightly more difficult, but makes everything else much simpler…

Q4) Determine the cheapest way to get from a to b if there is a petrol station every km and you use one litre per km, and have a maximum fuel tank size T.  Each petrol station has its own price per litre.

A4) So I struggled for a while just to get the logic right for the small input, but eventually I realised its actually quite simple.  For each km, the cost you paid for that km, is the cheapest of the previous T petrol stations.  So if you keep a running minimum you just add the minimum value for each node as you go.

As I mentioned for the first problem, the sum operation distributes well.  A running min on the other hand…

So, the solution is all about calculating a running min.  If T is small, this is easy, just do it locally.  More specifically, if T is less than the width you allocate to each node, the problem might as well be solved locally, each node potentially doubles the number of values it has to read in, but that is still easily managed against the time and memory limits with some care.  Each node calculates its sum and sends it to the central server which creates the grand total.

However there is no such limit on T, it could be very large…  However, if T is larger than the width you allocate to each machine, you don’t have to do a running min on that machine, its just a simple min of the values seen so far.  Any value added is never removed.  However you also need to consider the minimum of some of the values allocated to other nodes, and that you do have to remove values from.  But before we get into that detail.. Consider if T is larger than double the width.  In that case there is an entire server’s worth of values which will always be in the running min for any given server other than the first.  Here we find the opportunity to calculate something distributed which will help the other nodes.

So the first step is for each node to find its minimum price of its allocated section, and broadcast that to every node to the right of it.

So, now for each node the running min consists of 3 (or 4) values of which we take the minimum.

1. The minimum seen so far in this chunk that we are calculating the sum for.
2. The minimum of the previous servers which are always completely covered by the fuel tank size T.
3. The minimum of the previous server which is currently completely covered (but won’t be later once we’ve advanced more).
4. The minimum of the partial server section needed to fill out T.

The first two of these are easy, the third is easy if its present.  The fourth presents a bit of a challenge.  I propose that the simplest way is for the local server to read the values allocated to that partial server, in reverse order from the end of that server allocation.  As it finds a new local minimum, it records it and its position.  Then as you advance pointer you check if the current far minimum is still valid, if not, pop it off the stack and use the one underneath.  Worst case this stack contains the entire contents of a servers allocation, at 8 bytes a pop, 4 bytes for the minimum (since they all fit under 1 billion) and 4 bytes for the index.

So, does it scale well enough.  Well the worst case tank size is when it causes some servers to create the stack for almost one entire server, then again for another server because advancing the width just drops over into the next server, and in order to calculate that minimum you have to process the entire server’s width.  The processing time itself is cheap, the memory allocations will fit so long as you reuse your reserved size stack when switching servers at the boundary… (Should be under 80MB if you do it right.) Network latency is not significant…  The big concern is the data reading time.  5 million positions allocated per server, reading time 0.75 seconds just for its own.  Triple that and its getting close to half of the allowed time limit just reading the data.

So, it might be good enough… but can we do better?  The answer is kind of yes.  At the beginning we calculated a single min per server and sent that around.  If instead we calculate 100 mins per server, we don’t make calculating the global minimum for the trivially covered section much slower, but it reduces the size of the worst case stack by a factor of 100.  On the other hand it increases the worst case number of stacks we have to create from 2 to 101, but because those stacks are all 1/100th the size, we just reduced the overhead from 2 extra servers worth of reading to 1.01 extra servers worth.  This saves you almost 0.75 seconds of reading time in the worst case.  The reduction to the worst case stack size also makes it much much easier to stay under the memory limit.  It increases the network traffic size by a factor of 100, but its still under 100KB sent per node.

Gotchas:

1. I said it was easy for the case where T is small – but for T around the size of 1 width its actually a bit tricky.  Doing a running min the natural data structure is a multiset (an ordered set which allows repetition), but a multiset with 5 million entries can have an an unexpectedly large memory cost, well beyond the 40MB you might expect.  Without adequate knowledge of the implementation details of the multiset included in your programming language, you definitely run risk of running out of ram…  This can be solved by the same extension proposed to reduce the read worst size in the very large T case.  Using the locally calculated 100 mins, the size of T where you need to do a true running min is reduced to only 50thousand elements, which a multiset will easily fit in to memory for.

## GCJ 2016 R3

Unlike Round 2 where I think I would have struggled to make the top 500, this round I think I might have done much better if I had been competing. Possibly even top 120.

Advancing to the world finals was definitely beyond me, that would have required solving the large of the problem worth the most points as well as the other problems I consider within my reach, which I struggle to even comprehend how the solution verifier could work…

Q1) Given a sequence of moods which you can either make a request or submission against, and the constraint that you can only submit your last unsubmitted request, determine the maximum score you can get if you get 10 points for requesting in the same mood as a submission, and 5 points otherwise.

A1) The actual problem describes the possibility to get 0 points, but that would require you to request the wrong thing for the current mood and then submit it later also against the wrong mood.  Any such scenario can trivially be improved by changing your request to not be the wrong thing, since the later submission would then also not be the wrong match, so you get 10 more points.

So, this problem is a bit deceptive, but a little playing around with scenarios should give you a good guess that greedy is the way forward.  If the input sequence of moods has equal two in a row, you’ll do a request submit to get the 10 points.  Having removed those, you might now have moods which are neighbouring and equal, so you can align those up and get 10 points there as well.  Keep repeating until there is no pairings left.  The remains is a simply alternating sequence, there is no way to get 10 points in a simply alternating sequence regardless of the moves you perform.  So just take each pair as it comes and get 5 points each.

Having discovered this greedy solution, the only problem remaining is that the large is up to 20k, so an O(N^2) solution is going to be too slow…

As it turns out, its possible to do in linear time.  The simplest approach I have for doing this is a bit unusual.  It uses a linked list!

Convert the input in to a linked list, then while you have a current node and and a next node, if they are equal, remove those nodes and leave current point to either just before current or just after next if there is nothing before current.  If they aren’t equal, advance one.  Every time you remove a pair add 5 points to your total.  Then add 5 points for half the size of the original input.

A more complicated approach which doesn’t technically require a linked list, would instead involve an array of starts.  When you find a pair you repeat outwards to find the largest simple chunk which can be matched up from the inside out.  Then you set the start array for the last member of that chunk to point to the first.  Then you move on.  Whenever you finish creating a chunk, check if its left neighbour has a start value.  If so use that value to conceptually O(1) merge this chunk with its neighbour and start considering whether the ones outside that can be made in to a pair.  If so, keep going and write a start value once you get stuck again.  This basically simulates the process of the linked list algorithm…

Q2) Determine the fraction of strings which contain certain words where the strings are generated by all the possible ways to select nodes of a forest such that you only ever select a child after its parent and each node has a single character label.

A2) This question was probably the one I was least likely to get out even though I could solve it.  It just looks too hard…

In practice you just need a small number of scenarios to convince yourself that a simple answer is in fact correct.

Because the required accuracy is only 3 parts in 100, randomly generating 10000 of the strings should be plenty.  (The exact detail escapes me, but I seem to recall that if you require an accuracy of 1 part in x for something that has a specific random chance, you need to run x^2 simulations.)

Given the input size is only up to 100, O(N^2) should be ‘okay’ to generate 10k strings if you are quick about it.  The trick is just how to ensure that your randomly selected strings are representative.

To work this out, consider a simple case where one node is on its own, and 10 others are in a chain.  There are 11 possible outputs, for each of the different possible locations of inserting the one node in the chain.  If you were to select with equal likelihood from the available options at any point, half of your generate strings will start with the label of the one node on its own, when it should only be ~9%.  The next option I considered was randomly selecting a node and adding it and all of its parents.  However if we do that the probability of the one node on its own being the last value is far higher than 9%.

The third option I considered was, weighting each available node to select from, by additionally including the count of all of its children.  This means the first selection is 10 parts the chain, and 1 part the node on its own.  Which gets us the correct percentage.  Following all the way through the options shows it generates things correctly for this scenario.  Which is promising.  We already know it also generates the right values for a forest entirely of single nodes.  So that is two data points in its favour…  I tried one more scenario to convince myself.  A single node and a simple 3 node tree.  The tree has 2 orders, and the single node has 4 insertion points, this gives 8 possible scenarios.  Of those 8 scenarios 2 start with the single node.  Again the weighted selection works 3:1 corresponds to 8:2.  And walking through the rest of the scenarios shows the percentages work out.

Good thing about this question is that its just a high valued small input, so if this weighting thing is wrong, we can find out pretty quickly…  but its just fine as it happens.

So calculate the weights of the tree, then add the roots, randomly select based on total weight, remove that node, add its children, repeat.  This is O(N^2).

A slightly nicer to implement solution is to recognise that the weights are effectively a place holder for allowing random selection of any node and then putting the deepest not yet placed parent of that node instead of the selected node.  This way you don’t have to do any tree constructing.  Just have a boolean array representing what has been selected so far, generate a random number the size of the remnants and walk to find the nth not selected item.  Then walk its parents while they exist and aren’t selected.  Finally select that node and then repeat the process until all nodes are selected.

Q3) Determine the smallest maximum jump size to get from one asteroid to a second asteroid, if there are a bunch of asteroids moving at linear speeds and you can jump between any two at any moment so long as you don’t stay anywhere more than S seconds.

A3) This problem reminds me of another I’ve done before, but that was in 1 dimension, this is in three…

However, the small input is easy, since all the linear speeds are zero.  Therefore there is no point to waiting anywhere.  Therefore it just becomes a question of for a given maximum jump size x, is there a path between asteroid 0 and asteroid 1 then solving for the smallest maximum jump size x.  Once a maximum jump size is set, the problem reduces to a simple search (DFS is one option).  To solve for smallest, you can do a binary search.

Contest analysis isn’t up yet, and I’ve not ready anyone else’s solutions, so I don’t know how to do the large.  I suspect it has a similar structure.  Once you have a set maximum jump size you can determine the time ranges that a given edge is open or closed.  This appears to cause a combinatoric explosion if you clone each node based time range that a given combination of edges are open/closed with edges between the clones depending on the size of S, but maybe you don’t need to, you can instead just have a separate clone per status of a single edge, connected to clones representing different arrival time ranges and the separate arrival and departure clones are connected depending on the times and size of S…  I don’t quite think it works though.

Q4) Given a simple single bit single register computer and two programs that run simultaneously made of three atomic instructions, set 0, set 1 and print register, determine a program pair which can potentially print any of the ‘good’ values, but never prints the ‘bad’ value.

A4) Again I don’t know how to solve the large (at least I don’t think I do…), but the small is surprisingly trivial.

For the small input the bad value is always a sequence of ones.  So as long as the bad value isn’t also a good value (which is something to just check for in general…) one option is to write a program which can print any sequence of X digits except for all ones.  One way to do this, is to have one program that continuously sets the register to 0, then prints the register, x times.  Then the second program consists of exactly X-1 calls to set the register to 1.  Because of arbitrary interleaving, this program pair can print anything except for X 1’s.  Since the first program always sets to 0 before printing, there would need to be an interleave before every print, but there are only X-1 instructions from the second program to interleave.

This is suggestive of possible approaches for the large, but the open question in my mind is whether they cover all possible inputs correctly, or just a subset.  For instance an input where the bad value is all zeros, can be solved using an inverted version of the program.  If the bad value has one one digit, and all the good values have either more or all less than one one digit similar constructions work.  In general I can solve if the number of set bits is either strictly greater or strictly less than the bad value, but not otherwise.  For specific cases if there is only one good value, I can solve that too…  If there are multiple good values and they have x set bits in common, but the bad value ‘further away’ from the commonality then the good values or doesn’t have all of those values set. That can be solved too.  Or inverted for x unset bits in common.

In general the things I can solve are where one program does all the printing, and the second program is a sequence of ones or zeros of some length.  Its not clear to me that such a program pair is always the solution if there is such a solution, but it doesn’t seem unlikely…

## Distributed Code Jam 2016 Round 1

So this year, the structure is a bit different 2 rounds before the grand final for distributed and more people eligible to compete in DCJ.

As usual I’ve been busy, so Round 2 is almost about to happen, I figured it was time to write up my analysis of Round 1.

Unlike round 2 of the main code jam, I think I would have had a chance of advancing to round 2 of DCJ this year if I was competing.  I usually say there is practically no chance of me writing up 4 solutions in 3 hours, but this round the questions seemed pretty easy (except for the large of the last one), so I think there would have been a real chance of me getting a top 200 placing.

Q1) Given some inefficient source code, write an equivalent, but efficient version.

A1) So the detail is of course in the provided source code.  It is of course a bit convoluted, but the net result is its calculating the largest difference between any two values in the input.

Of course the best way to do that is to calculate the global maximum and global minimum and take their difference.  The large input there is too much data to do this all in one node (of course), but if you break the input in to approximately equal chunks per server, each server calculates min/max – sends to central server which calculates the min of mins and max of maxs.  Then the difference of those two is the largest difference and that central server can print it out.

I suspect the biggest chance of failing this question is in breaking up the input incorrectly.  Its a very common task in DCJ, you always want to keep the code around to do this, you need to handle cases where there is less input then there are nodes, or if you try to allocate equal size to each node that some nodes won’t get any.  If you want to get tricky and try dividing up the input by modulo rather than blocks (which can seem easier) remember to write your loop using += modulus rather than checking every number against the mod.  1 billion divisions is going to time out…

Another risky move would be to read all the values you need into memory rather than just calculating the min/max on the stream of values as you read them in.  This should only take 80MB of ram per node in the worst case, but if you are using a doubling resizeable data structure like vector, you will be very likely to hit 128MB instead, maybe even higher…  Reserve the required space to avoid this.

Q2) Determine who wins a giant rock paper scissors competition if everyone always plays a specific move, and in case of draws the person with lower id wins.

A2) The large input size is only 2^28 unlike the 1 billion for the first problem, but its still easily large enough for you to time out if you tried to read all of the values on a single node.  Breaking the input up in to approximately equal chunks across all servers is not a good strategy this time, a much better option is to break it up into a number of exactly equal chunks which is the largest power of 2 which is less than the number of servers.  This means each server is running its own isolated competition with a single winner.  Each single winner can then be passed to the central server and it can continue from there.

This one doesn’t seem very risky so long as you break your input up by powers of two.  Only 4 million characters, you can read them all in to memory, and even not bother reserving, or reusing vectors as you run the contest (even if you happen to be using 16bit characters…).  Run time should be pretty fast too.

Q3) Determine the minimum number of moves to restructure a bunch of piles of crates to have the specified almost uniform height across all piles. (All equal, or one higher for some of the earlier stacks if the total isn’t exactly dividable amongst the number of stacks.

A3) Definite step up in difficulty here.  The contest analysis describes one approach, I’m going to describe a slightly different one which I think is a bit simpler logically, although its a bit more difficult to code to be fast enough…

First phase we need to know how high each stack should be.  To know this we need to know how many crates there are in total.  So every node takes a chunk and sums it up and sends to central server which sums the sums, and then rebroadcasts that grand total back to everyone…

The core trick the problem is that because the crates can only be moved one at a time to a neighbouring stack, the problem can be solved left to right one stack at a time.  First stack knows how high it should be, how high it actually is.  If its too high, k moves to move the excess to the next.  If its too low, it needs to steal from its neighbours to the right, so walk right stealing until you have enough.  Steal from the closest first, and as you get further away you each steal costs multiple moves to move through all the intervening stacks.

For the distributed version each node needs to solve in isolation to be fast enough, so at some point you are going to dump a bunch of crates on to the next server, or you are going to need a bunch of crates from the next server.  For the purposes of stealing once you get the edge you just assume the other server will have moved them in to your last stack and you can count the cost of moving them from that stack to the right locations.

So, since each server will need to know the flow at the serve edge to calculate the correct value, the central server might as well calculate that while its calculating the grand total.  Since all it needs to work that out is know what the desired height total for each server is, and the total.  Since it just worked out the grand total, the desired height total for each server isn’t much harder, with the one corner case of the server where the desired height changes.  Then going from left to right comparing the sum to the desired total it can work out the flow to the next server, and assuming the flow and next sum and compared to the next desired total it can determine the next flow after that….

It then shares the flow to server before and server after to every server.  The initial and final flows are of course 0.

Each server node then can process left to right.  First deal with the flow.  Positive flow in to this node you just add to the corresponding edge node.  Negative flow you start the standard stealing process to calculate.  Remembering to track the cost.

Once the flows are fixed you can just solve the problem locally given the desired height calculated from the grand total.

Gotchas to watch out for.

1. Stealing from neighbours can cause you to run too slow if you try to check the same set of neighbours over and over.  You need to keep track of where you have already picked clean, and start one right of that any time you need to steal to ensure that the local calculation part is linear in the size of the section assigned.
2. Even though each stack size is only up to 1 billion, the flows calculated can be in the hundreds of millions of billions.  Therefore you must use 64 bit entries.  Given each node is dealing with up to 10 million entries, this means you must reserve to avoid the risk of hitting 128MB.
3. Because of the large flows that can travel across nodes, when stealing the cost calculated from a single steal can overflow 64 bits.  Either do the product with 128bit types, or (more sensibly) ensure that you mod the number of crates being moved before you multiply by the distance moved.
4. Mod everywhere – the product I mentioned above is the biggest individual risk, but there are plenty of other ways to overflow if you don’t remember to apply the mod during the process and not just at the end.

Q4) Determine the smallest positive integer in a set which is unique.

A4) The small can be run on a single node, and is easy to write.  Just read the values in, sort them, return the first which is not equal to one of its neighbours.  Or zero if you get to the end of the array.

The large is only 35 Million, which if it wasn’t for the memory limit is almost able to be processed in the 4 second time limit…  Well, maybe it would need 7-8 seconds…

The contest analysis solution is much safer, rather than trying to do a distributed merge sort, break the problem in to chunks and then combine the results.  Unlike the other problems where the chunks are segments of the input stream, (or stripes if you like for the first problem) the appropriate chunks depends on the input.  In order for each section of input to be processed independently, we need to be sure that all duplicates of any given number are processed by the same node.  This can be achieved by a hash function.  Each node processes a segment, and after locally separating the dupes from the uniques, forwards values to other servers based on their hash.  Each server then receives a two sets of values from other servers, the uniques and the dupes.  Since it now knows all copies of a specific value are together, it can make a call as to the lowest unique which has a specific hash assigned to this server.  It sends these locally lowest definitely globally unique values to the central server, which can then calculate the global lowest.

With the relatively small input size, if you are using all 100 nodes it would seem that there is no risk of running out of ram following the general strategy described above.  But if you don’t reduce the input sent to specific servers by hash down to the two sets and instead send all the values, a single server can easily be overwhelmed by the input case where all the values are the same.  If your hash is terrible (ie mod 100), similar problems can be had.  But if your hash is decent, and you do correctly filter the data down before sending it on, it should be pretty safe.

## GCJ16 R2

Busy as usual, so this analysis is quite late.  I’m not sure I would have advanced if I was competing in this round, the top 500 cutoff was both the first two questions and one of the other smalls.  The first two problems weren’t especially challenging, but writing correct solutions to 3 problems (even if one is a small) in the time available is not really something I am reliable at.

Top 1000 for the t-shirt was much more managable, the first two problems, or even failing one of those larges you pass if you have one of the other smalls as a backup.

Q1) Determine the minimal ordering for a set of rock, paper, scissors players such that there will be no draws in the resulting tournament.  Or return impossible if such a scenario doesn’t exist.

A1) If you solve the small version by brute force you can see that for any given total number of players, there are only 3 possible scenarios in terms of the number of players who like each of rock, paper or scissors.   This makes a lot of sense once you try and build the problem from the end result.  The tournament either ends in a winner who plays R or P or S.  For each you know that they had to have played the type which they can beat.  This continues until you get to the right total number of players.  The trick is to determine the minimal ordering.  A mistake I made was to assume that if when you are generating the set of players you always choose to generate the next level in alphabetical order where possible, the final result will be optimal.  If you don’t have sharp eyes you might miss that this doesn’t generate the same result as your brute force solution.  Moral of the story, use an actual diff to compare the output of your brute force and optimal solutions…

An example of how this is wrong is if you start with R – next is RS, third tier is RSPS, but the ideal is PSRS.  The solution is to perform a more general optimization step.  Rather than sorting just the neighbours, sort the pairs, and then quadruples, the octuples, etc…  You might as well just do this at the final stage, unless you’ve decided to pre-generate all the answers for all the different lengths, which is easily managed for the given constraints.  Such a sort maintains the tree structure of the competition, but (fairly obviously) minimizes the ordering.

Q2) Given a set of probabilities of voting yes or no, determine a subset of size K which maximises the chance of a tie.

A2) I think this question is actually quite tricky, but if you do the brute force for the small input you can pretty easily see a pattern.  The extremes of the probability spectrum are the best choices.  Having made that observation you might try a few options, like greedily adding either the highest two, or the lowest two, or one high and one low, whichever is best.  This doesn’t work.  What does work is trying every combination of size K which has the N minimum values and K-N maximum values.  I don’t really see how you could come to this solution except by observation  My intuition mistakenly suggests it should be the middle ones.

However that isn’t the whole problem – for a given set of K probabilities, you need to determine the probability of a tie.  Even for the small input, brute forcing this is unreasonable.  However, this sub-problem is a straight forward dynamic programming problem.  Given x people voting yes or no, there is between 0 and x yes votes.  So you can have a dynamic program on the probability of the first x people having voted y yes votes.  Start with 0 people having 0 votes as probability one.  Then use f(x,y) = f(x-1, y)*P(x votes no) + f(x-1, y-1)*P(x votes yes) to fill in the half triangle of the K by K grid of scenarios and return f(K, K/2).

Q3) Given a grid which is to contain a hedge maze constructed of diagonal walls, determine if (and if so how) to construct the hedge such that there is a specific set of paths between edges where these paths never join or cross.

A3) The first observation is that the diagonal hedge maze structure can only create paths that are linear, so you don’t have to worry about the joining or crossing part.  You just need to concern yourself with the paths getting to the right edge destination for each possible starting edge location.

The small brute force is small enough you can construct every possible hedge maze and then trace the paths.  The main difficulty with this O(2^(R*C) * R * C) solution is tracing the paths.  One option (which I investigated) is to store current location as cell x,y and whether its the top or left edge of that cell, with virtual cells beyond the edges to do the bottom edges and right edges of the bottom and right most cells.  Then given that edge and diagonals in the neighbouring cells, determine the two possible new locations.  Rule out the one which is either outside or the same as the previous location you were at.  This works, but its far more cumbersome than the method suggested in the analysis, which is to store position as cell x,y and which of the 4 directions you are entering this cell from.  By increasing the size of the third component from 2 to 4 options, you get rid of the need to possibly generate two locations, or track where your previous position was.

As an aside, these kind of grid problems which work with edges are something I feel I could do something nice about in TMD.Algo – hopefully when I get some time I’ll make such a convenience class.

The large can’t be brute-forced, but some iterative thinking comes to the conclusion that it can be solved greedily.  If two neighbouring edge locations need to join, the minimum number of cells their path can go through is those exactly neighbouring them.  And such a path can always be constructed.  If you want to connect things which are further apart, the optimal solution is to hug the wall of the paths of things closer together which have already been connected.  This wall hugging is clearly in some sense minimal. Its compactness means it maximises the region remaining for other paths to be constructed through.  Thus the conclusion that this greedy wall following will either work or the problem isn’t solvable.

I feel that writing the code to add walls to create the wall following is something I would have found awkward and difficult under a time constraint, especially with my x,y,(left or top) position notation.  However the contest analysis makes the excellent point that the wall following is either flowing through a cell which already has a wall, or you want to turn left if you are going between two locations that have shortest length in the clockwise direction.  With the x,y, (direction of entry) formulation, its very easy to determine what wall you have to add in order to turn left.

Q4) Given a set of people who have a different skill set, determine the minimum number of additional skills to give to some of these people to ensure that regardless of order of arrival and selection of tasks to work on (from those not already taken), everyone ends up with something to work on. (Number of tasks and their corresponding required skills is equal to the size of the set of people.)

A4) So this question deceived me in to think it was easier than it actually is.  Some reasoning about the problem comes to the conclusion that if two people know the same skill, they are ‘connected’, and if a group of people are all somehow connected, they must be fully connected, or there exists some arrival order which is wrong.  From that I jumped to a completely incorrect solution of computing the connected components of people and just summing the minimum number of skills needed to make those connected components fully connected.  This doesn’t even work for the samples included…  Since it doesn’t necessarily ensure that everyone knows anything at all.

The small input size is only 4 people, so there is only 2^16 possible skill set scenarios.  The criterion for whether a scenario is valid is that for each column there must be at least one bit set, for each row as well.  And if there are more than one bit set in a row, those columns must be identical and vice versa.  Choose the ones which are a superset of the starting details, and find the one with the minimal set of differences.  That minimum is the answer.

The large input is far from being susceptible to brute force.  2^50 skill set scenarios.  However, greedy approaches like my first attempt are equally problematic.  It comes down to avoiding the full 2^50 explosion while considering every possibility.

The first key observation which is given by the contest analysis is that a connected component which has p people who know q skills, it doesn’t matter how they are connected or what the specific labels of those people or skills are.  The final answer is groups of x people knowing x skills with x^2 edges, the sums of these x^2’s minus the original number of edges needs to be minimised, but that answer isn’t affected by any of those details.

The second key observation is that a connected component that has p people knowing q skills is further interchangeable with any other component with the same number of people and skills.

So a given scenario consists of counts of distinct pairs.  The number new of scenarios that can be generated is quadratic in the number of distinct pairs, but it will always have one less total number of pairs.  So if we consider every possible scenario generated so far of total number of pairs k, use it to generate new scenarios with k-1 pairs, and just keep generating from the distinct results.  If you see a scenario with only balanced values, calculate the sum of squares to be a new potential minimum.  This approach is a bit different to as described in contest analysis, and may not actually run fast enough – I’ll need to check…  Writing the appropriate hash functions for a multi-set of pairs would be interesting.

## GCJ16 R1B

So, with 1000 having already advanced there probably wasn’t quite so much competition this round, but it definitely seemed to be a slightly harder round, with only 320 perfect scores.  Specifically the third problem large input was a bit difficult if you didn’t have experience with that kind of problem before.

In the end the cutoff was the first 2 problems in a decent time.

Q1) Given a set of characters made out of the the letters in the English words for the digits zero through nine, determine what the original digits were, in non-descending order.

A1) This was a fairly straight forward, if potentially tedious to code problem – you just have to identify the correct order to try removing the words from the distinct count of each letter in order to avoid getting into a dead end.  One such order is (Z)ERO, T(W)O, FO(U)R, SI(X), EI(G)HT, (O)NE, (T)HREE. (F)IVE, SE(V)EN, N(I)NE, .  Each of the letters in brackets is the last time that letter occurs reading left to right, so removing in that order obvious works.  I like this particular ordering because its the even digits followed by the odd digits, each in ascending order.

Q2) Given a pair of numbers that have been obscured by replacing some of the digits with question marks, and may also be zero padded, determine the way to replace the question marks which results in the minimal difference, breaking ties by minimizing the first value and further by minimizing the second value.

A2) So the contest analysis mentions that this can be done in O(N), which I found interesting, but I’ve not managed to solve yet.  The O(N^2) solution is consider the replacement of question marks to have 3 phases.  Phase 1 you try to make the numbers equal, Phase 2 you introduce a minimal single digit difference, Phase 3 you try to maximize the difference in the opposite direction to the difference introduced in Phase 2.  You just iterate over all of the possible places for Phase 2, including not every reaching it.  For each possible phase 2 location you can either try and make the second larger, or the first larger, but that only doubles the total number of passes, so its still O(N^2).  You can try and optimize further by aborting when phase 1 fails to make equal, but its not trivial to know when not to bother trying with phase 2 – I think you need  some kind of pre-processing step to work that out in order to get to the O(N) solution.

Q3) Given ordered pairs of words, determine the maximum subset which can be made entirely out of the first and second word sets of the inverse of that subset.

A3) I immediately recognized this as a graph problem.  I mistakenly first assumed that the answer was number of edges minus the minimum spanning tree, but the minimum spanning tree was the wrong concept.  Not too long after that I realized it was a bipartite graph problem.  With that I simply assumed that TMD.Algo had the solution and that the answer was the number of edges minus something to do with calculating the bipartite matching.   In the end its number of edges minus the bipartite maximal independent set.  Which doesn’t make a lot of sense at first glance, since you want number of edges minus the minimal edge cover size – but for a bipartite graph the maximal independent set is equal to the minimal edge cover size.  The minimal edge cover is equal to the size of the maximum matching plus one for each vertex not in the maximum matching.  Number of vertexes not in the maximum matching is to number of vertexes minus twice the maximum matching.  Add on the size of the maximum matching and you get number of vertexes minus the maximum matching, which is the same formula as the size of the maximum independent set.

## TCO16 R1C

Last round 1 for this years topcoder, 750 to advance, but only 615 registered.  So positive scores to advance.  In the end 500 people managed to get a positive score.

Q1) Determine if a set of integers is closed under addition.

A1) This is a very simple problem with an optimization which I think I might have tried, but screwed up.  Lucky I already got through in round 1A.  Just try all pairs, add and check for membership.  Even the O(N^3) solution is fast enough.

With all the inputs between +/-50 its a trivial O(N) to perform a distinct count, then return false if there are more than 3 distinct values.  If any non-zero number has more than 1, return false.  Otherwise check all pairs on the 3 values.

Q2) Determine if a sequence of C’s B’s and A’s can be reordered so that no B’s are adjacent and no C’s are within 2 of each other.

A2) So I think there should be some kind of pseudo-greedy approach to solve this, but all the solutions I’ve seen are dynamic programming.  The state is the number of A’s remaining, number of B’s remaining, number of C’s remaining and whether the last letter is a B or which of the last 2 letters is a C.  So for each possible state if there is an A remaining you can reduce the A count by force last letter to not be B and shift which of the last 2 letters is a C back one.  If there is a B remaining and last letter is not B, decrement B, mark last as B and shift which of the last 2 letters is a C back one.  Similarly for the C.  If the destination state already has a string assigned to it, skip, else use the current string and append the new letter and assign it to the new state – put it on a queue to consider.

If you ever get to a state of no more A, B, or C – return the string associated with the state.  Number of states is O(N^3) and each state has a string creation cost.  Total O(N^4) is fast enough.  Can be done in O(N^3) by not actually constructing full strings, just storing pointers to the come-from state.

Q3) Determine the number of integers between 1 and X when represented in binary for which the largest non-empty subset of at most length – K digits is less than Y.

A3) X and Y can be huge – so this problem is challenging.  I’ve not groked any solution yet.  Obviously all numbers less than Y are trivially a pass, but that is where it stops being easy.

## TCO16 R1B

So I wasn’t in this round since I already advanced to round 2, and I was too late getting back to the topcoder website to see the scoreboards, so this will just be a problem discussion.

Q1) Determine the first prime found in a sequence starting with N, or return -1 if not found in N steps.  Sequence is to replace X with the sum of the squares of its digits.

A1)  One terminal is if you get to the value 1 you can immediately return -1, because it will never change again.  However more generally it is not difficult to observe that once X is less than 200, it never goes over 200 again, and regardless of X (at least for up to 1 billion), it will be less than 200 in at most 3 steps.  Therefore if there is any repetitive loop other than reaching one, the number of steps to identify the cycle is no more than 200.  So just keep a hashset of values seen so far, and if you see something you’ve seen before, return -1.  Otherwise its just testing primality and breaking up digits and summing their squares in a loop.  Possibly also be careful for small starting N that you don’t run your loop more than N times looking for a cycle, but given the density of small primes, its not obvious there is such a scenario to worry about.

Q2) Given the ability to replace digits using a pool of specific counts of digits (no zeros) and an original list of numbers, determine the maximum sum that can be created by modifying the original list of numbers using the pool of replacement digits.

A2) This is a straight greedy problem.  The final sum depends on which digits are replaced with what values in what positions in the numbers, but not the order you make the changes.  Therefore for each digit available you replace them with the highest available digit that increase the number, and you prioritize the ones which provide the greatest increase in local value, since that increase is the same increase applied to the final sum.  As there are only 350 digits at most to consider, you could just do an O(N^2) algorithm to find the greatest increase, apply it, then try again until the pool runs dry or there are no more possible increases to the sum.

For an improvement to the run time you can sort the candidate digits by their place, descending power of ten, breaking ties by their digit, ascending.  Then you just walk through them replacing each with the best available improvement.

So I kind of glossed over why this works, it isn’t perfectly obvious since if your pool consists of only 9’s and 2’s and you use up all the 9’s on large numbers starting with 1, and the rest of the numbers don’t have any 1’s in them, those 2’s are going to waste.  However when you use a number, the number of digits which go to waste is at most 1.  Because the wastage is further down the sorted sequence, it either has a lower place (and hence even at 0-9 its 90% of the improvement that just increasing the current place one higher that is the minimum you get when invoking the wastage) or the same place but with a smaller difference, in which case the ideal is to take the 2 highest digits which is exactly what happens in the wastage scenario.

Q3) Given a sequence of values and multiple ranges that it costs 1 unit to increase value by 1 and one range that covers the full sequence that it costs T units to increase the value by 1, determine the minimum cost to create a sequence of values which is never less than the original sequence.

A3) The number of ranges, length of the sequence, and potential range of values in each spot are all at least up to 10^5, so its clear that you need a very efficient solution.  This problem is a clear step up in difficulty from the previous two.

So due to the high data input sizes my mind immediately went to a minimum find search on the number of units to spend on the range that costs T, then for each potential number of units spent under consideration we need to solve the minimum cost to spend on the other ranges in linear time.

As it happens solving the inner problem isn’t trivial either, to run in linear time you need some pre-computation.  Sort the ranges by their start points, breaking ties by the end point descending.  Then walk through and discard any ranges that don’t increase the end point, these ranges are fully contained by the current element, so are useless.  Now that we’ve got a more useful set of ranges (in O(N log N) time), we still need to do more pre-computation to allow for the inner loop of the search to be linear time.  The aim is to compute for each sequence position the end of the right most range which covers it.  (The reason this is useful I’ll explain below.)  First fill the list with -1 to represent places that aren’t covered at all.  With our sorted list the covered bits are easy, if the next element’s start is lower than the current end point, fill from this start to that start with the current elements end point, otherwise fill from start to end with that end point.  Then move onto the next element and repeat.  This is linear because no place is written more than twice.

So now that we’ve got this computed table minimum cost isn’t so hard to calculate in linear time.  Consider the left most position which isn’t already satisfied – everything to its left is all good, so there is no point increasing values in that area any more.  So the ideal range to increase is the one which covers this point, and covers at much to the right as possible.  Conveniently that’s what we already calculated.  So we determine the number of points to spend then apply that till the end point?  Not so fast, if all the ranges are about half the length of the sequence one step after another, you’ll suddenly have an O(N^2) algorithm.  Instead you need to keep track of how much you are currently adding, and mark the end point to decrement that addition by how much you just added.  Then you just walk along, if there is more needed, increase the spend, apply to the current location, mark when to decrease it, if there was a mark on this position, decrease the spend.  Accumulate the total spend increases to give a final total.  If there are any places where you are short and there is no covering range (as indicated by -1) abort and return infinity as the cost.

So all that remains is the minimum find search.  A ternary search is one option to consider, but care is required because ternary search won’t survive a plateau which is not the minimum, and there could be multiple values where you return infinity for.  So you need to special case infinity to always cut off the low third even if both of the probe points are equal.  Another alternative to the ternary search is to use a binary search on slope, looking for zero, if you don’t find zero, you can take the first element with positive slope to the right.  Again with the binary search on slope you have to take care of the infinities, they appear to have zero slope when next to each other, but should be considered to have negative slope.

## GCJ16 Round 1A

So this round was blitz’d – cut-off was top 1000, and all of them finished all of the problems.  I’m not confident I could have solved them all in time, the key observation for the second problem doesn’t seem something I would have caught.

Contest analysis was posted almost immediately again, so my analysis is kind of redundant, but I’ll do it anyway!

Q1) Given a sequence of characters where for each you can choose to put it at the start or end of a word you are building, determine the lexicographically largest word you can make.

A1) So this is a greedy problem. At every point whichever action makes local sense, is also the path to the global maximum.  The contest analysis shows how to solve the large in 4 lines of python, I think it can be done in one (rather long) line of c# using the Aggregate function for enumerables.

However both of these solutions are quadratic in the size of the input (which is fine given a maximum input size of 1000).  Its possible to do this problem in linear time using a dequeue.  This is because the larger of adding the new letter as a prefix or postfix is the same as comparing the new letter to the first letter and putting it at the back if the new letter is smaller.  This reduction of the comparison to just the first character works because the generated word will always go down after any first sequence of repeated letters, never up, so extending the sequence of repetition is always an improvement.  Pretty sure this property can pretty easily be proved, but I’ve not done it formally.

Q2) Given 2N-1 sequences of length N which are all strictly ascending and each correspond distinctly to one of the rows or columns of NxN grid where every row and column is strictly ascending, return the missing row or column values in ascending order.

A2) Brute force for the small input isn’t exactly trivial to write but it is at least obvious.  Contest analysis tells me that the trick here for the large input is that every missing value will have an odd frequency due to there only being one missing row/column.  This leads to a very straightforward solution – which would be one (very long) line of C# if there was a method to Sort and return an enumerable…  I think I might add a ToSortedArray extension method to TMD.Algo.  This is O(N^2) since the data to be sorted is length N.

However since I feel that trick is too hard to spot reliably, I’ll now present an alternative implementation which doesn’t rely on it, but is still O(N^2).  Its just a bit more complicated and hence not one I’m confident I could design in the time available.

So this other method also has a trick, but one I think is more obvious.  The smallest value in the grid is in the top left corner.  Therefore the smallest value in the first spot of any of the inputs, is the value of the top left corner.  Further more any sequence which starts with that value, must be either the top row or the left most column.  Therefore there will be at most 2 of these.  This identification takes O(N) time, since we only look at the first value of each and find min, then a second pass to tag those that are min.  The trick is that now that we’ve identified the input data which could potentially be in the first row or column, the problem has been reduced.  Ignoring the data we’ve already tagged, the smallest value in the second spot of any of the rest of the inputs is the value from the second position of the diagonal.  Again there are at most two which match and we can tag those as being the second row or column.  We can repeat this until we’ve now identified the entire diagonal of the final grid and at most two options for any row or column pair passing through each position in the diagonal.

So far we’ve spent O(N^2) time, but we’re not yet done, we’ve only identified a fraction of the full grid.  Luckily we’re actually almost done.  Chose one of the elements of the diagonal that has 2 input data’s associated with it.  Since the grid is currently empty you can place these in any orientation without loss of generality, so make one the row and the other the column and place them into the grid.  Now consider the two arrays you just placed, for any indexes where they aren’t identical, the corresponding row/column pair which crosses perpendicular through those indexes is an interesting place to think about.  Since the values are not the same, if you have two options for that row/column pair, only one can be the row, and the other must be the column.  Thus they can be placed.  And so on for every time you place a row/column pair if any of the elements are different you have another row/column pair that can be placed if its not already placed.  So use a queue and a mask to know what you’ve put in the queue already, and this cycle of placing and finding new things to place is linear in the number of values in the arrays placed so far.  The one thing is that if there isn’t anything different, or if the difference is for the row/column where one is missing, your queue will flush, but you won’t be finished yet.

Here is the not quite so obviously true bit, but one I’m confident is fine.  Since you’ve gotten to a point where there is no difference in the rows or column pairs that are yet to be placed, you can simply choose arbitrarily again, just like you did for the first pair.  The known cells are guaranteed to match, and the unknown cells are guaranteed not to be influenced by anything you’ve already placed before.  Thus we can just throw a new index on to the queue and keep going.

Finally there is just the row/column pair passing through the diagonal where only one data set was tagged.  Check if the one data set matches the column, if so return the row, otherwise return the column.  Although before you return, do make sure you’ve set the diagonal value itself, it won’t have been populated yet unless you populated it at the very beginning while the diagonal values were being determined.

Every step along the way to fill in the values is linear in the number of values filled in, so O(N^2) just like the much simpler solution proposed by the contest analysis.

Q3) Given a directed graph where every node has exactly one outgoing edge, determine the maximum number of nodes that can be placed in a circle such that every element in the circle is directly connected to one of its neighbouring elements.

A3) Despite this problem being worth the most number of points – I found it to be conceptually the easiest.  Yes even easier than the first problem.  That being said the implementation is a pain, even worse than my extra complicated option for Q2 large input size.

So the problem has two pretty straight forward cases.

Option 1, the circle is fully connected.  Since each node has exactly one out going edge, this implies a cycle of some length.  So step one is to find the largest cycle.  That’s one possible answer.  Having found all cycles we’ll throw away them or any nodes that are connected to them if the cycle length is greater than 2.

All of the remaining nodes form pairs of trees pointing to a cycle of length 2.  Option 2 is to take the longest path to each side of that cycle, sum those together, then sum across all such connected components, this is the alternative answer.  This represents taking all the longest fully connected lines and placing them into a circle. Return the largest of Option 1 and Option 2.

Depth first search will identify all the connected components in linear time, it will also identify the longest cycle in linear time.  A second depth first search on each connected component with cycle length of 2 will find the longest path to the cycle in linear time too if you keep track of path depth and destination to avoid recalculation when your depth first search finds somewhere you’ve searched before when starting from a new possible deepest node.  Or reverse the direction of the arrows and just do a standard height of tree calculation.