## Distributed Code Jam 2016 R2

15 advance to the world finals, and at the moment the top 12 solved all the problems. Getting through required a good time and failing only one of the larges, and not the one worth the most points.

I think I could have pretty easily gotten a top 200 placing, with the 2 easiest larges and the other two smalls.  None of the problems seem ‘too’ difficult, but writing the correct solution to 3 larges in the time available is probably a stretch for me.

Q1) Given some inefficient code, write equivalent but efficient code to solve the problem.

A1) Like last time, the detail is of course in the code given.  This time it was a bit more deceptive.  It appeared to be calculating the sum of the all-pair products between a set A and a set B, but subtly the central server doesn’t do any calculating, so it’s actually the all-pair product sum over pairs which don’t sum to 0 mod 20.

The trick here is to realise the all-pair product sum is the sum of all elements in set A times the sum of all elements in set B.  Removing the pairs which sum to 0 mod 20 is a bit more tricky, but it’s the product of the sum of the 0 mod 20 items in set A and those in set B, plus the product of the sum of the items which are 1 mod 20 in set A and the items which are 19 mod 20 in set B, and so on.

To simplify the code slightly, rather than calculating the total and then removing, you can just calculate the sum of items that are x mod 20 in set A and y mod 20 in set B, and sum the pair products where (x + y) mod 20 != 0.

All that remains is to solve the problem in a distributed scenario where there are 100 million elements in each of set A and set B.  Summing things is easily distributed, so each node creates two sum tables, one for each of set A and set B, with an entry for each residue mod 20.  Then it processes its chunk of the two inputs, calculating the mod and summing into the correct bucket.  Once done, send both tables to the central server, which sums all the tables together and then performs the final pair-product step.

One thing is that the answer can easily be too big for 64 bit, so the problem actually wants it modulo 1,000,000,007.  As with many problems with this requirement, it’s important to remember to apply the modulo after every calculation.  And even though the modulo and all individual inputs fit in 32 bits, calculating products will have intermediate values which need 64-bit precision.
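Putting the bucketed sums and the modulo handling together, here is a single-node Python sketch (the function name is mine, not from the contest code; the real solution splits the bucket-building across nodes and merges the tables on the master):

```python
MOD = 10**9 + 7

def pair_product_sum(a, b):
    # Bucket the sums of each set by residue mod 20.
    sa = [0] * 20
    sb = [0] * 20
    for x in a:
        sa[x % 20] = (sa[x % 20] + x) % MOD
    for y in b:
        sb[y % 20] = (sb[y % 20] + y) % MOD
    # Combine buckets whose residues don't cancel to 0 mod 20,
    # applying the modulo after every multiply-accumulate.
    total = 0
    for i in range(20):
        for j in range(20):
            if (i + j) % 20 != 0:
                total = (total + sa[i] * sb[j]) % MOD
    return total
```

Each node would build its own pair of 20-entry tables over its chunk and send them in; the master then only has to add 20-entry tables together and run the final double loop.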

Q2) Determine the longest prefix of a sequence of brackets which can be extended in to a balanced sequence.  Return -1 if the entire sequence is already a balanced sequence.

A2) So the question is worded a bit more vaguely than above, but if you’ve had some exposure to bracket sequences you’ll know that the description is just equivalent to any balanced sequence of brackets.  At first it might seem this problem will be difficult to do distributed, because each segment’s validity clearly depends on the one before, but one of the properties of balanced bracket sequences is that any sequence of brackets can be extended to a balanced one so long as the running sum of open (+1) and close (-1) never goes negative.  Therefore each segment just depends on the running sum of the previous segments.

So each node calculates the total of its chunk for the running sum and sends it to the central server, which accumulates those totals and sends the correct starting total back to each node.  Now each node can check whether the true running sum goes negative at any point.  If it does, it sends that position to the central server; otherwise it sends -1.  The central server then returns the first non-negative result.  Otherwise it checks if the grand total is 0: if so it returns -1, otherwise it returns the length of the full data set.  (This last conditional is probably the easiest thing to miss in this problem…)

Since the data is only 1 character per value, memory use isn’t a high risk, and the per-value read cost is low enough that you could probably read each value assigned to the node twice without running out of time, if you don’t want to read into a local array.  The usual gotchas around splitting data amongst the nodes apply.
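The per-node check is just a running sum; a single-node Python sketch of the whole logic (treating ‘(’ as +1 and ‘)’ as -1; the function name is mine):

```python
def longest_extendable_prefix(s):
    # Longest prefix that can be extended into a balanced sequence:
    # the prefix length just before the running sum first goes negative.
    # Returns -1 if the whole sequence is already balanced.
    total = 0
    for i, c in enumerate(s):
        total += 1 if c == '(' else -1
        if total < 0:
            return i  # prefix of length i is the longest extendable one
    return -1 if total == 0 else len(s)
```

In the distributed version, the value of `total` at a node’s chunk boundary is exactly the starting total the central server sends back to the next node.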

Q3) Given a game where you can move left or right 1 unit or stay still in any given turn, and the rest of the game map moves down one unit after your move.  Determine the maximum number of points you can get given any choice of starting locations.  Each non-wall containing cell is worth between 0 and 9 points which you earn just by entering it.

A3)  The small input can easily be done on a single node, and the solution is a dynamic program on the maximum number of points you can have in total once you arrive at position x.  For each row from bottom to top you try each non-wall position; then there are 3 possibilities.  If the position in front isn’t a wall, max its current value with the sum of the local current value and the value of that cell.  Then if the right cell and the right-forward diagonal aren’t walls, do similar with the right diagonal cell, but also add in the value of the right cell.  Similarly with the left diagonal.  This is seeded by the value of each non-wall position in the bottom row, and the final result is the maximum value in the row above the top of the input.  This DP is O(1) per cell, so it can be completed very quickly in the small input scenario where the maximum size is 1000 by 1000.  The large input of 30000 by 30000 could almost be processed in time by a single node, if it wasn’t for the memory limits and the time to read the raw input.
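The DP can also be written in the pull direction, which I find slightly easier to state: the value of arriving at a cell is the best of staying, or stepping sideways into its column first (earning that cell too) and then shifting up.  A single-node Python sketch for the small input (the name and grid encoding are my own assumptions: rows top to bottom, ‘#’ for walls, digits for points):

```python
def best_score(grid):
    # grid rows run top to bottom; grid[-1] is the bottom (starting) row.
    NEG = float('-inf')
    h, w = len(grid), len(grid[0])
    val = lambda r, c: int(grid[r][c])
    # Seed: you may start on any non-wall cell of the bottom row.
    dp = [val(h - 1, c) if grid[h - 1][c] != '#' else NEG for c in range(w)]
    for r in range(h - 2, -1, -1):
        ndp = [NEG] * w
        for c in range(w):
            if grid[r][c] == '#':
                continue
            best = dp[c]  # stay put; the map then shifts down into (r, c)
            if grid[r + 1][c] != '#':
                # step sideways into (r+1, c) first, earning that cell too
                if c > 0:
                    best = max(best, dp[c - 1] + val(r + 1, c))
                if c + 1 < w:
                    best = max(best, dp[c + 1] + val(r + 1, c))
            if best != NEG:
                ndp[c] = best + val(r, c)
        dp = ndp
    return max(dp)
```

For example `best_score(["12", "34"])` starts on the 3, steps right through the 4, and shifts up into the 2, for 9 points.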

So the question becomes: how to break the problem up to pass the large input across multiple servers.  Horizontal stripes are a bad idea, because the mathematical operation is a max of sums and it mixes values from a lot of locations, so it doesn’t distribute well if you try to chain it with results that arrive after you’ve started.  So the solution is vertical stripes.  However, the edges of each row of a stripe depend on one cell to the left and right of the stripe in the row below.  If each server was to communicate its edges for every row, that would be 30 thousand messages, which given the estimated 5ms latency per message is far, far too long to run in time.

In order to send fewer messages, each node needs to do more work.  Because the game is limited to one move left or right per row, if we want to only send a message every 300 rows (which brings the total expected latency down to half a second instead of 150 seconds), we need to start with an extra 600 width on our base section, then 598, 596…   This doubles the number of values which have to be read by each server if they would usually be dealing with a 300-wide stripe (and doubles the number of DP calculations, but the reading of the input dominates).

In order to continue after we’ve done 300 rows, we need to share the correct values for the full set of 300 neighbours on each side – which we can get from each of the neighbouring servers if we’ve set it up right…  Once we’ve got that information we can continue for another 300 rows.

So, does this solution scale well enough?  Estimated latency waiting for the network is 500ms.  Time to read the values for the stripe and half of each neighbour is 30000*600*0.05us = 0.9 seconds.  Actual computation time is much lower.  So 5s should be plenty of time.  Memory usage is 18MB minimum, but quadruple that if you use 16-bit chars and a resizing array without calling reserve.  Still fits.  In fact it fits well enough you might as well make your life simpler and just allocate a full 3 stripe-widths, even though you’ll only use half of each of the outside areas.

Each message sent is 1200 bytes, and you send up to 100 messages to each of 2 servers; this isn’t even a whole megabyte…  And the 100 messages is well below the 1000 limit.

Gotchas:

1. Using your normal logic for distributing the columns to servers can result in servers with much less than 300 width.  If your code mistakenly decides to send in height chunks equal to the width allocated to the server, you can easily become too message-spammy…   If you don’t, you need to start requesting results from more than one server over, and again you become too spammy.  You want to explicitly allocate exactly 300 main columns to each server to make this simpler.  If you don’t use all the servers, that’s okay; it will still easily run in time.  More generically (if you don’t like writing your code to the specific problem constraints…) you should divide into equal chunks of width depending on the larger of the two dimensions.
2. One server has a different width, so its neighbouring server gets complicated…  The solution is to pretend that the total width is actually exactly a multiple of 300, and fill the extra columns with walls.  This makes the code for reading values into the array slightly more difficult, but makes everything else much simpler…

Q4) Determine the cheapest way to get from a to b if there is a petrol station every km, you use one litre per km, and you have a maximum fuel tank size T.  Each petrol station has its own price per litre.

A4) So I struggled for a while just to get the logic right for the small input, but eventually I realised it’s actually quite simple.  For each km, the cost you pay for that km is the cheapest price among the previous T petrol stations.  So if you keep a running minimum you just add the minimum value for each position as you go.

As I mentioned for the first problem, the sum operation distributes well.  A running min on the other hand…

So, the solution is all about calculating a running min.  If T is small, this is easy: just do it locally.  More specifically, if T is less than the width you allocate to each node, the problem might as well be solved locally; each node potentially doubles the number of values it has to read in, but that is still easily managed against the time and memory limits with some care.  Each node calculates its sum and sends it to the central server, which creates the grand total.

However there is no such limit on T; it could be very large…  But if T is larger than the width you allocate to each machine, you don’t have to do a running min on that machine’s own values: it’s just a simple min of the values seen so far, since any value added is never removed.  However you also need to consider the minimum of some of the values allocated to other nodes, and those you do have to remove values from.  But before we get into that detail, consider if T is larger than double the width.  In that case there is an entire server’s worth of values which will always be in the running min for any given server other than the first.  Here we find the opportunity to calculate something distributed which will help the other nodes.

So the first step is for each node to find its minimum price of its allocated section, and broadcast that to every node to the right of it.

So, now for each node the running min consists of 3 (or 4) values of which we take the minimum.

1. The minimum seen so far in this chunk that we are calculating the sum for.
2. The minimum of the previous servers which are always completely covered by the fuel tank size T.
3. The minimum of the previous server which is currently completely covered (but won’t be later once we’ve advanced more).
4. The minimum of the partial server section needed to fill out T.

The first two of these are easy, and the third is easy if it’s present.  The fourth presents a bit of a challenge.  I propose that the simplest way is for the local server to read the values allocated to that partial server, in reverse order from the end of that server’s allocation.  As it finds a new local minimum, it records the value and its position on a stack.  Then as you advance the pointer you check if the current far minimum is still valid; if not, pop it off the stack and use the one underneath.  Worst case this stack contains the entire contents of a server’s allocation, at 8 bytes a pop: 4 bytes for the minimum (since they all fit under 1 billion) and 4 bytes for the index.
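That reverse scan builds a monotonic stack of suffix minima; a small Python sketch (names mine), with the caveat that queries must come with a non-decreasing left edge, since pops are destructive:

```python
def suffix_min_stack(vals):
    # Scan from the end; record (index, value) each time a new
    # minimum appears.  The top entry is the chunk's overall minimum.
    stack = []
    cur = float('inf')
    for i in range(len(vals) - 1, -1, -1):
        if vals[i] < cur:
            cur = vals[i]
            stack.append((i, vals[i]))
    return stack

def suffix_min(stack, left):
    # Minimum of vals[left:]; entries the advancing window has passed
    # are popped and never needed again.
    while stack and stack[-1][0] < left:
        stack.pop()
    return stack[-1][1] if stack else float('inf')
```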

So, does it scale well enough?  The worst case tank size is when it causes some servers to create the stack for almost one entire server, then again for another server, because advancing the window just drops over into the next server, and in order to calculate that minimum you have to process the entire server’s width.  The processing time itself is cheap, and the memory allocations will fit so long as you reuse your reserved-size stack when switching servers at the boundary… (Should be under 80MB if you do it right.)  Network latency is not significant…  The big concern is the data reading time.  5 million positions allocated per server means a reading time of 0.75 seconds just for its own data.  Triple that and it’s getting close to half of the allowed time limit just reading the data.

So, it might be good enough… but can we do better?  The answer is kind of yes.  At the beginning we calculated a single min per server and sent that around.  If instead we calculate 100 mins per server, we don’t make calculating the global minimum for the trivially covered section much slower, but it reduces the size of the worst case stack by a factor of 100.  On the other hand it increases the worst case number of stacks we have to create from 2 to 101, but because those stacks are all 1/100th the size, we just reduced the overhead from 2 extra servers’ worth of reading to 1.01 extra servers’ worth.  This saves you almost 0.75 seconds of reading time in the worst case.  The reduction to the worst case stack size also makes it much easier to stay under the memory limit.  It increases the network traffic size by a factor of 100, but it’s still under 100KB sent per node.

Gotchas:

1. I said it was easy for the case where T is small – but for T around the size of 1 width it’s actually a bit tricky.  For a running min the natural data structure is a multiset (an ordered set which allows repetition), but a multiset with 5 million entries can have an unexpectedly large memory cost, well beyond the 40MB you might expect.  Without adequate knowledge of the implementation details of the multiset included in your programming language, you definitely run the risk of running out of RAM…  This can be solved by the same extension proposed to reduce the worst-case read size in the very large T case.  Using the locally calculated 100 mins, the size of T where you need to do a true running min is reduced to only 50 thousand elements, which a multiset will easily fit into memory for.

## GCJ 2016 R3

Unlike Round 2 where I think I would have struggled to make the top 500, this round I think I might have done much better if I had been competing. Possibly even top 120.

Advancing to the world finals was definitely beyond me; that would have required solving the large of the problem worth the most points as well as the other problems I consider within my reach, and I struggle to even comprehend how the solution verifier for it could work…

Q1) Given a sequence of moods which you can either make a request or submission against, and the constraint that you can only submit your last unsubmitted request, determine the maximum score you can get if you get 10 points for requesting in the same mood as a submission, and 5 points otherwise.

A1) The actual problem describes the possibility of getting 0 points, but that would require you to request the wrong thing for the current mood and then submit it later, also against the wrong mood.  Any such scenario can trivially be improved by changing your request to not be the wrong thing, since the later submission would then also not be a wrong match, so you get 10 more points.

So, this problem is a bit deceptive, but a little playing around with scenarios should give you a good guess that greedy is the way forward.  If the input sequence of moods has two equal moods in a row, you’ll do a request/submit pair to get the 10 points.  Having removed those, you might now have neighbouring moods which are equal, so you can line those up and get 10 points there as well.  Keep repeating until there are no pairings left.  What remains is a simple alternating sequence, and there is no way to get 10 points in a simple alternating sequence regardless of the moves you perform.  So just take each pair as it comes and get 5 points each.

Having discovered this greedy solution, the only problem remaining is that the large is up to 20k, so an O(N^2) solution is going to be too slow…

As it turns out, it’s possible to do in linear time.  The simplest approach I have for doing this is a bit unusual: it uses a linked list!

Convert the input into a linked list; then while you have a current node and a next node, if they are equal, remove both nodes and leave current pointing to the node just before the old current, or just after the old next if there was nothing before current.  If they aren’t equal, advance one.  Every time you remove a pair, add 5 points to your total.  Then add 5 points times half the size of the original input.
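The same linear pass can be written with a plain stack instead of an explicit linked list, since removing adjacent equal pairs always reduces to the same alternating remainder.  A Python sketch (the mood encoding as a string and the `max_score` name are mine):

```python
def max_score(moods):
    # A mood equal to the stack top removes the pair: that's a matched
    # request/submit worth 10.  What survives is a simple alternating
    # sequence, worth 5 per pair.
    stack = []
    pairs10 = 0
    for m in moods:
        if stack and stack[-1] == m:
            stack.pop()
            pairs10 += 1
        else:
            stack.append(m)
    return 10 * pairs10 + 5 * (len(stack) // 2)
```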

A more complicated approach, which doesn’t technically require a linked list, would instead involve an array of starts.  When you find a pair you expand outwards to find the largest chunk which can be matched up from the inside out.  Then you set the start-array entry for the last member of that chunk to point to the first.  Then you move on.  Whenever you finish creating a chunk, check if its left neighbour has a start value.  If so, use that value to conceptually O(1)-merge this chunk with its neighbour, and start considering whether the ones outside that can be made into a pair.  If so, keep going, and write a start value once you get stuck again.  This basically simulates the process of the linked-list algorithm…

Q2) Determine the fraction of strings which contain certain words, where the strings are generated by all the possible ways to select nodes of a forest such that you only ever select a child after its parent, and each node has a single-character label.

A2) This question was probably the one I was least likely to get out even though I could solve it.  It just looks too hard…

In practice you just need a small number of scenarios to convince yourself that a simple answer is in fact correct.

Because the required accuracy is only 3 parts in 100, randomly generating 10000 of the strings should be plenty.  (The exact detail escapes me, but I seem to recall that if you require an accuracy of 1 part in x for something that has a specific random chance, you need to run x^2 simulations.)

Given the input size is only up to 100, O(N^2) should be ‘okay’ to generate 10k strings if you are quick about it.  The trick is just how to ensure that your randomly selected strings are representative.

To work this out, consider a simple case where one node is on its own, and 10 others are in a chain.  There are 11 possible outputs, one for each of the possible locations of inserting the lone node in the chain.  If you were to select with equal likelihood from the available options at any point, half of your generated strings would start with the label of the lone node, when it should only be ~9%.  The next option I considered was randomly selecting a node and adding it and all of its parents.  However, if we do that, the probability of the lone node being the last value is far higher than 9%.

The third option I considered was weighting each available node by the size of its subtree (itself plus all of its descendants).  This means the first selection is 10 parts the chain, and 1 part the node on its own, which gets us the correct percentage.  Following all the way through the options shows it generates things correctly for this scenario, which is promising.  We already know it also generates the right values for a forest entirely of single nodes.  So that is two data points in its favour…  I tried one more scenario to convince myself: a single node and a simple 3-node tree.  The tree has 2 orders, and the single node has 4 insertion points, which gives 8 possible scenarios.  Of those 8 scenarios, 2 start with the single node.  Again the weighted selection works: 3:1 corresponds to 6:2.  And walking through the rest of the scenarios shows the percentages work out.

The good thing about this question is that it’s just a high-valued small input, so if this weighting idea is wrong, we can find out pretty quickly…  but it’s just fine as it happens.

So calculate the weights of the tree, then add the roots to the candidate set, randomly select based on total weight, remove that node, add its children, and repeat.  This is O(N^2).

A slightly nicer-to-implement solution is to recognise that the weights are effectively a placeholder for allowing random selection of any node and then placing the deepest not-yet-placed ancestor of that node instead of the selected node.  This way you don’t have to do any tree construction.  Just have a boolean array representing what has been selected so far, generate a random number the size of the remnants and walk to find the nth not-selected item.  Then walk up its ancestors while they exist and aren’t selected.  Finally select that node, and repeat the process until all nodes are selected.
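A Python sketch of that selection process (names mine; `parent[i]` is -1 for roots, and this is the O(N^2) form, which is fine for N up to 100):

```python
import random

def random_order(parent, labels, rng=random):
    # Uniformly pick any unselected node, then actually place its
    # deepest not-yet-placed ancestor (possibly the node itself).
    n = len(parent)
    selected = [False] * n
    out = []
    for _ in range(n):
        remaining = [i for i in range(n) if not selected[i]]
        node = rng.choice(remaining)
        while parent[node] != -1 and not selected[parent[node]]:
            node = parent[node]
        selected[node] = True
        out.append(labels[node])
    return ''.join(out)
```

For the lone node plus a 2-chain, the three valid interleavings come out close to 1/3 each over many trials, matching the weighted argument.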

Q3) Determine the smallest maximum jump size to get from one asteroid to a second asteroid, if there are a bunch of asteroids moving at linear speeds and you can jump between any two at any moment so long as you don’t stay anywhere more than S seconds.

A3) This problem reminds me of another I’ve done before, but that was in 1 dimension, this is in three…

However, the small input is easy, since all the linear speeds are zero, so there is no point in waiting anywhere.  Therefore it just becomes a question of, for a given maximum jump size x, is there a path between asteroid 0 and asteroid 1, and then solving for the smallest such x.  Once a maximum jump size is fixed, the problem reduces to a simple search (DFS is one option).  To solve for the smallest, you can do a binary search.
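A Python sketch of that small-input solution (stationary asteroids; names mine), binary searching the jump size with a DFS feasibility check:

```python
import math

def smallest_max_jump(points, eps=1e-9):
    # points: list of (x, y, z) asteroid positions; travel from 0 to 1.
    n = len(points)
    d = lambda i, j: math.dist(points[i], points[j])

    def reachable(limit):
        # DFS over jumps of length <= limit.
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            if u == 1:
                return True
            for v in range(n):
                if v not in seen and d(u, v) <= limit:
                    seen.add(v)
                    stack.append(v)
        return False

    lo, hi = 0.0, max(d(0, j) for j in range(1, n))  # direct jump always works
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if reachable(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

(The optimal answer is always one of the pairwise distances, so binary searching over those instead of the reals would also work.)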

Contest analysis isn’t up yet, and I’ve not read anyone else’s solutions, so I don’t know how to do the large.  I suspect it has a similar structure.  Once you have a fixed maximum jump size you can determine the time ranges during which a given edge is open or closed.  This appears to cause a combinatorial explosion if you clone each node based on the time range that a given combination of edges is open/closed, with edges between the clones depending on the size of S, but maybe you don’t need to: you could instead have a separate clone per status of a single edge, connected to clones representing different arrival time ranges, with the separate arrival and departure clones connected depending on the times and the size of S…  I don’t quite think it works though.

Q4) Given a simple single bit single register computer and two programs that run simultaneously made of three atomic instructions, set 0, set 1 and print register, determine a program pair which can potentially print any of the ‘good’ values, but never prints the ‘bad’ value.

A4) Again I don’t know how to solve the large (at least I don’t think I do…), but the small is surprisingly trivial.

For the small input the bad value is always a sequence of ones.  So as long as the bad value isn’t also a good value (which is something to check for in general…) one option is to write a program pair which can print any sequence of X digits except all ones.  One way to do this is to have one program that continuously sets the register to 0 then prints the register, X times.  The second program consists of exactly X-1 calls to set the register to 1.  Because of arbitrary interleaving, this program pair can print anything except X 1s: since the first program always sets to 0 before printing, printing all 1s would need an interleave before every print, but there are only X-1 instructions in the second program to interleave.

This is suggestive of possible approaches for the large, but the open question in my mind is whether they cover all possible inputs, or just a subset.  For instance, an input where the bad value is all zeros can be solved using an inverted version of the program.  If the bad value has one set bit, and the good values all have either more or fewer set bits, similar constructions work.  In general I can solve the case where the number of set bits in every good value is either strictly greater or strictly less than in the bad value, but not otherwise.  For specific cases: if there is only one good value, I can solve that too…  If there are multiple good values and they have x set bits in common, but the bad value is ‘further away’ from that commonality than the good values, or doesn’t have all of those bits set, that can be solved too.  Or inverted for x unset bits in common.

In general the cases I can solve are those where one program does all the printing, and the second program is a sequence of ones or zeros of some length.  It’s not clear to me that such a program pair always exists whenever any solution exists, but it doesn’t seem unlikely…

## Distributed Code Jam 2016 Round 1

So this year the structure is a bit different: 2 rounds before the grand final for distributed, and more people eligible to compete in DCJ.

As usual I’ve been busy, so with Round 2 almost about to happen, I figured it was time to write up my analysis of Round 1.

Unlike round 2 of the main code jam, I think I would have had a chance of advancing to round 2 of DCJ this year if I had been competing.  I usually say there is practically no chance of me writing up 4 solutions in 3 hours, but this round the questions seemed pretty easy (except for the large of the last one), so I think there would have been a real chance of me getting a top 200 placing.

Q1) Given some inefficient source code, write an equivalent, but efficient version.

A1) So the detail is of course in the provided source code.  It is a bit convoluted, but the net result is that it calculates the largest difference between any two values in the input.

Of course the best way to do that is to calculate the global maximum and global minimum and take their difference.  For the large input there is too much data to do this all on one node (of course), but if you break the input into approximately equal chunks per server, each server calculates its min/max and sends them to a central server, which calculates the min of mins and max of maxes.  The difference of those two is the largest difference, and the central server can print it out.

I suspect the biggest chance of failing this question is in breaking up the input incorrectly.  It’s a very common task in DCJ, so you always want to keep the code around to do this; you need to handle cases where there is less input than there are nodes, or where, if you try to allocate equal sizes to each node, some nodes won’t get any.  If you want to get tricky and try dividing up the input by modulo rather than blocks (which can seem easier), remember to write your loop using += modulus rather than checking every number against the mod.  1 billion divisions is going to time out…
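For the record, the block-splitting helper I keep around looks something like this (a Python sketch, names mine): it spreads the remainder over the first few nodes and gives late nodes an empty range when the data runs out.

```python
def chunk_range(node_id, num_nodes, total):
    # Contiguous [start, end) slice of the input for one node.
    base = total // num_nodes
    extra = total % num_nodes       # first `extra` nodes get one more item
    start = node_id * base + min(node_id, extra)
    end = start + base + (1 if node_id < extra else 0)
    return start, end
```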

Another risky move would be to read all the values you need into memory rather than just calculating the min/max on the stream of values as you read them in.  This should only take 80MB of RAM per node in the worst case, but if you are using a doubling resizeable data structure like vector, you are very likely to hit 128MB instead, maybe even higher…  Reserve the required space to avoid this.

Q2) Determine who wins a giant rock paper scissors competition if everyone always plays a specific move, and in case of draws the person with lower id wins.

A2) The large input size is only 2^28, unlike the 1 billion for the first problem, but it’s still easily large enough for you to time out if you tried to read all of the values on a single node.  Breaking the input up into approximately equal chunks across all servers is not a good strategy this time; a much better option is to break it into a number of exactly equal chunks equal to the largest power of 2 which is at most the number of servers.  This means each server runs its own isolated competition with a single winner.  Each single winner can then be passed to the central server, which can continue from there.
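The per-node competition itself is tiny; a Python sketch of the bracket reduction (names mine, and I’ve assumed R/P/S move letters with draws going to the lower id):

```python
def beats(a, b):
    # True if move a beats move b in rock-paper-scissors.
    return (a, b) in {('R', 'S'), ('S', 'P'), ('P', 'R')}

def winner(moves):
    # moves[i] is player i's fixed move; len(moves) is a power of two.
    players = list(range(len(moves)))
    while len(players) > 1:
        nxt = []
        for a, b in zip(players[::2], players[1::2]):
            # a < b always holds here, so a keeps draws (lower id wins).
            nxt.append(b if beats(moves[b], moves[a]) else a)
        players = nxt
    return players[0]
```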

This one doesn’t seem very risky so long as you break your input up by powers of two.  With only 4 million characters per node, you can read them all into memory, and not even bother reserving or reusing vectors as you run the contest (even if you happen to be using 16-bit characters…).  Run time should be pretty fast too.

Q3) Determine the minimum number of moves to restructure a bunch of piles of crates to have a specified almost-uniform height across all piles (all equal, or one higher for some of the earlier stacks if the total isn’t exactly divisible amongst the number of stacks).

A3) Definite step up in difficulty here.  The contest analysis describes one approach; I’m going to describe a slightly different one which I think is a bit simpler logically, although it’s a bit more difficult to code to be fast enough…

In the first phase we need to know how high each stack should be.  To know this we need to know how many crates there are in total.  So every node takes a chunk, sums it up and sends the sum to the central server, which sums the sums and then rebroadcasts the grand total back to everyone…

The core trick of the problem is that because the crates can only be moved one at a time to a neighbouring stack, the problem can be solved left to right, one stack at a time.  The first stack knows how high it should be and how high it actually is.  If it’s too high, it takes k moves to move an excess of k crates to the next stack.  If it’s too low, it needs to steal from its neighbours to the right, so walk right stealing until you have enough.  Steal from the closest first; as you get further away, each steal costs multiple moves to pass through all the intervening stacks.
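On a single node, that walk collapses to a prefix-sum identity: the net number of crates that must cross the boundary after stack i is the difference of the actual and desired prefix sums there, and each crossing is one move.  A Python sketch under that reformulation (names mine; I’ve assumed the usual 10^9+7 modulus on the answer):

```python
def min_moves(stacks, mod=10**9 + 7):
    n = len(stacks)
    total = sum(stacks)
    q, r = divmod(total, n)
    # All stacks equal, or one higher for the first (total mod n) stacks.
    target = [q + 1 if i < r else q for i in range(n)]
    moves = 0
    pa = pt = 0  # prefix sums of actual and target heights
    for i in range(n - 1):
        pa += stacks[i]
        pt += target[i]
        moves = (moves + abs(pa - pt)) % mod
    return moves
```

For example `[0, 0, 4]` needs targets `[2, 1, 1]`, with 2 crates crossing the first boundary and 3 crossing the second: 5 moves.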

For the distributed version each node needs to solve in isolation to be fast enough, so at some point you are going to dump a bunch of crates onto the next server, or you are going to need a bunch of crates from the next server.  For the purposes of stealing, once you get to the edge you just assume the other server will have moved the crates into your last stack, and you can count the cost of moving them from that stack to the right locations.

So, since each server needs to know the flow at the server edge to calculate the correct value, the central server might as well calculate that while it’s calculating the grand total.  All it needs to work that out is the desired height total for each server, and the grand total.  Since it just worked out the grand total, the desired height total for each server isn’t much harder, with the one corner case of the server where the desired height changes.  Then going from left to right, comparing each server’s sum to its desired total, it can work out the flow to the next server; adding that flow to the next sum and comparing against the next desired total gives the next flow, and so on…

It then shares the flows to the server before and the server after with every server.  The initial and final flows are of course 0.

Each server node can then process left to right.  First deal with the flow: positive flow into this node you just add to the corresponding edge stack; negative flow you satisfy with the standard stealing process, remembering to track the cost.

Once the flows are fixed you can just solve the problem locally given the desired height calculated from the grand total.

Gotchas to watch out for.

1. Stealing from neighbours can cause you to run too slowly if you check the same set of neighbours over and over.  You need to keep track of where you have already picked clean, and start one to the right of that any time you need to steal, to ensure that the local calculation part is linear in the size of the section assigned.
2. Even though each stack size is only up to 1 billion, the flows calculated can be in the hundreds of millions of billions.  Therefore you must use 64 bit entries.  Given each node is dealing with up to 10 million entries, this means you must reserve to avoid the risk of hitting 128MB.
3. Because of the large flows that can travel across nodes, when stealing the cost calculated from a single steal can overflow 64 bits.  Either do the product with 128bit types, or (more sensibly) ensure that you mod the number of crates being moved before you multiply by the distance moved.
4. Mod everywhere – the product I mentioned above is the biggest individual risk, but there are plenty of other ways to overflow if you don’t remember to apply the mod during the process and not just at the end.
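Gotcha 3 rests on the identity that reducing each factor first doesn’t change the result. Python integers don’t overflow, so this sketch only demonstrates the algebra that makes the 64-bit version safe (the modulus is an assumption):

```python
MOD = 10**9 + 7  # assumed modulus

def steal_cost(crates, distance):
    # Reduce the (potentially huge) crate count before multiplying; in a
    # 64-bit language both factors are then below 10^9+7, so the product
    # stays comfortably under 2^63.
    return (crates % MOD) * (distance % MOD) % MOD

# Same answer as the naive product-then-mod, without the huge intermediate.
crates, distance = 3 * 10**17, 10**7
assert steal_cost(crates, distance) == (crates * distance) % MOD
```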

Q4) Determine the smallest positive integer in a set which is unique.

A4) The small can be run on a single node, and is easy to write.  Just read the values in, sort them, and return the first which is not equal to one of its neighbours, or zero if you get to the end of the array.

The large is only 35 million values which, if it weren’t for the memory limit, could almost be processed on a single node in the 4 second time limit…  Well, maybe it would need 7-8 seconds…

The contest analysis solution is much safer: rather than trying to do a distributed merge sort, break the problem into chunks and then combine the results.  Unlike the other problems, where the chunks are segments of the input stream (or stripes, if you like, for the first problem), the appropriate chunks depend on the input.  For each section of input to be processed independently, we need to be sure that all duplicates of any given number are processed by the same node.  This can be achieved with a hash function.  Each node processes a segment and, after locally separating the dupes from the uniques, forwards values to other servers based on their hash.  Each server then receives two sets of values from other servers: the uniques and the dupes.  Since it now knows all copies of a specific value are together, it can make a call as to the lowest unique among the values whose hash is assigned to this server.  It sends these locally-lowest, definitely-globally-unique values to the central server, which can then calculate the global lowest.
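A single-process simulation of this plan (node count is arbitrary, and the mod-based owner assignment is a toy stand-in for a real hash):

```python
from collections import Counter

def lowest_unique(values, num_nodes=4):
    """Simulate the distributed strategy: each node condenses its segment
    into (value, count capped at 2) pairs, forwards each pair to the
    value's hash owner, and the owners report their lowest globally-unique
    value to the central server."""
    chunk = max(1, -(-len(values) // num_nodes))  # ceiling division
    inbox = [Counter() for _ in range(num_nodes)]
    for n in range(num_nodes):
        segment = values[n * chunk:(n + 1) * chunk]
        for v, k in Counter(segment).items():
            owner = v % num_nodes      # toy hash; a real one should scatter better
            inbox[owner][v] += min(k, 2)  # a dupe is a dupe: counts past 2 add nothing
    # each owner now sees every copy of its values together
    candidates = [min((v for v, k in box.items() if k == 1), default=None)
                  for box in inbox]
    candidates = [c for c in candidates if c is not None]
    return min(candidates) if candidates else 0  # 0 when nothing is unique
```

For example, `lowest_unique([3, 1, 1, 2, 2, 5])` returns 3: the 1s and 2s collapse to dupes at their owners, leaving 3 and 5 as candidates.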

With the relatively small input size, if you are using all 100 nodes there would seem to be no risk of running out of RAM following the general strategy described above.  But if you don’t reduce the input sent to specific servers by hash down to the two sets and instead send all the values, a single server can easily be overwhelmed by the input case where all the values are the same.  If your hash is terrible (i.e. mod 100), similar problems can occur.  But if your hash is decent, and you do correctly filter the data down before sending it on, it should be pretty safe.

## GCJ16 R2

Busy as usual, so this analysis is quite late.  I’m not sure I would have advanced if I was competing in this round, the top 500 cutoff was both the first two questions and one of the other smalls.  The first two problems weren’t especially challenging, but writing correct solutions to 3 problems (even if one is a small) in the time available is not really something I am reliable at.

Top 1000 for the t-shirt was much more manageable: the first two problems, or, even failing one of those larges, you could still pass with one of the other smalls as a backup.

Q1) Determine the minimal ordering for a set of rock, paper, scissors players such that there will be no draws in the resulting tournament.  Or return impossible if such a scenario doesn’t exist.

A1) If you solve the small version by brute force you can see that for any given total number of players, there are only 3 possible scenarios in terms of the number of players who like each of rock, paper or scissors.  This makes a lot of sense once you try to build the problem from the end result.  The tournament ends in a winner who plays R, P or S.  For each, you know that they had to have played the type which they can beat.  This continues until you get to the right total number of players.  The trick is to determine the minimal ordering.  A mistake I made was to assume that if, when generating the set of players, you always generate the next level in alphabetical order where possible, the final result will be optimal.  If you don’t have sharp eyes you might miss that this doesn’t generate the same result as your brute force solution.  Moral of the story: use an actual diff to compare the output of your brute force and optimal solutions…

An example of how this is wrong: if you start with R, the next tier is RS and the third tier is RSPS, but the ideal is PSRS.  The solution is to perform a more general optimization step.  Rather than sorting just the neighbours, sort the pairs, then the quadruples, then the octuples, etc…  You might as well just do this at the final stage, unless you’ve decided to pre-generate all the answers for all the different lengths, which is easily managed for the given constraints.  Such a sort maintains the tree structure of the competition, but (fairly obviously) minimizes the ordering.
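That generalized sort falls out naturally if you build the answer bottom-up: at each merge, take whichever concatenation of the two already-minimal halves is smaller. A sketch for a single candidate winner (the full answer would take the minimum over the feasible winners):

```python
def minimal_tournament(winner, rounds):
    """Lexicographically smallest line-up of 2**rounds players whose
    draw-free bracket is won by `winner` ('R', 'P' or 'S')."""
    beats = {'R': 'S', 'P': 'R', 'S': 'P'}  # winner -> the type it beat

    def build(c, k):
        if k == 0:
            return c
        a = build(c, k - 1)          # minimal half won by c
        b = build(beats[c], k - 1)   # minimal half won by the type c beat
        return min(a + b, b + a)     # merge step: pick the smaller concatenation

    return build(winner, rounds)
```

For the example above, `minimal_tournament('R', 2)` gives `'PSRS'` rather than the naive `'RSPS'`.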

Q2) Given a set of probabilities of voting yes or no, determine a subset of size K which maximises the chance of a tie.

A2) I think this question is actually quite tricky, but if you do the brute force for the small input you can pretty easily see a pattern.  The extremes of the probability spectrum are the best choices.  Having made that observation you might try a few options, like greedily adding either the highest two, or the lowest two, or one high and one low, whichever is best.  This doesn’t work.  What does work is trying every combination of size K which has the N minimum values and K-N maximum values.  I don’t really see how you could come to this solution except by observation.  My intuition mistakenly suggests it should be the middle ones.

However that isn’t the whole problem – for a given set of K probabilities, you need to determine the probability of a tie.  Even for the small input, brute forcing this is unreasonable.  However, this sub-problem is a straightforward dynamic programming problem.  Given x people voting yes or no, there are between 0 and x yes votes.  So you can have a dynamic program on the probability of the first x people having cast y yes votes.  Start with 0 people having cast 0 votes as probability one.  Then use f(x,y) = f(x-1, y)*P(x votes no) + f(x-1, y-1)*P(x votes yes) to fill in the half triangle of the K by K grid of scenarios and return f(K, K/2).
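A sketch of that DP, keeping only one row of f at a time (K is assumed even, as a tie requires):

```python
def tie_probability(yes_probs):
    """P(exactly half the voters say yes), via the f(x, y) recurrence."""
    K = len(yes_probs)
    f = [1.0] + [0.0] * K              # 0 voters -> certainly 0 yes votes
    for p in yes_probs:
        g = [0.0] * (K + 1)
        for y in range(K + 1):
            g[y] = f[y] * (1 - p)          # this voter says no
            if y > 0:
                g[y] += f[y - 1] * p       # this voter says yes
        f = g
    return f[K // 2]
```

For example, `tie_probability([0.5, 0.5])` is 0.5, and a certain-yes paired with a certain-no voter, `tie_probability([1.0, 0.0])`, gives a guaranteed tie.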

Q3) Given a grid which is to contain a hedge maze constructed of diagonal walls, determine if (and if so how) to construct the hedge such that there is a specific set of paths between edges where these paths never join or cross.

A3) The first observation is that the diagonal hedge maze structure can only create paths that are linear, so you don’t have to worry about the joining or crossing part.  You just need to concern yourself with the paths getting to the right edge destination for each possible starting edge location.

The small brute force is small enough you can construct every possible hedge maze and then trace the paths.  The main difficulty with this O(2^(R*C) * R * C) solution is tracing the paths.  One option (which I investigated) is to store current location as cell x,y and whether its the top or left edge of that cell, with virtual cells beyond the edges to do the bottom edges and right edges of the bottom and right most cells.  Then given that edge and diagonals in the neighbouring cells, determine the two possible new locations.  Rule out the one which is either outside or the same as the previous location you were at.  This works, but its far more cumbersome than the method suggested in the analysis, which is to store position as cell x,y and which of the 4 directions you are entering this cell from.  By increasing the size of the third component from 2 to 4 options, you get rid of the need to possibly generate two locations, or track where your previous position was.
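A sketch of that entry-direction tracer. Each diagonal splits its cell into two triangles and simply pairs up two sides; representing the walls as '/' and '\' characters is my assumption about the input format:

```python
def trace(grid, r, c, entry):
    """Follow a path from cell (r, c), entered through side `entry`
    ('N', 'S', 'E' or 'W'), until it leaves the grid.  Returns the first
    out-of-bounds cell and the side it would have been entered through."""
    # '/' joins the N+W triangle to itself and S+E likewise; '\' joins N+E and S+W.
    exit_side = {'/':  {'W': 'N', 'N': 'W', 'S': 'E', 'E': 'S'},
                 '\\': {'W': 'S', 'S': 'W', 'N': 'E', 'E': 'N'}}
    # leaving through a side -> neighbour cell and the side entered there
    step = {'N': (-1, 0, 'S'), 'S': (1, 0, 'N'),
            'E': (0, 1, 'W'), 'W': (0, -1, 'E')}
    while 0 <= r < len(grid) and 0 <= c < len(grid[0]):
        out = exit_side[grid[r][c]][entry]
        dr, dc, entry = step[out]
        r, c = r + dr, c + dc
    return r, c, entry
```

On a 1x1 grid containing '/', entering the west side exits through the north: `trace(['/'], 0, 0, 'W')` gives `(-1, 0, 'S')`. There is never a choice to rule out, which is exactly the simplification the analysis formulation buys.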

As an aside, these kind of grid problems which work with edges are something I feel I could do something nice about in TMD.Algo – hopefully when I get some time I’ll make such a convenience class.

The large can’t be brute-forced, but some iterative thinking comes to the conclusion that it can be solved greedily.  If two neighbouring edge locations need to join, the minimum number of cells their path can go through is those exactly neighbouring them.  And such a path can always be constructed.  If you want to connect things which are further apart, the optimal solution is to hug the wall of the paths of things closer together which have already been connected.  This wall hugging is clearly in some sense minimal. Its compactness means it maximises the region remaining for other paths to be constructed through.  Thus the conclusion that this greedy wall following will either work or the problem isn’t solvable.

I feel that writing the code to add walls to create the wall following is something I would have found awkward and difficult under a time constraint, especially with my x,y,(left or top) position notation.  However the contest analysis makes the excellent point that the wall following is either flowing through a cell which already has a wall, or you want to turn left if you are going between two locations that have shortest length in the clockwise direction.  With the x,y, (direction of entry) formulation, its very easy to determine what wall you have to add in order to turn left.

Q4) Given a set of people who have a different skill set, determine the minimum number of additional skills to give to some of these people to ensure that regardless of order of arrival and selection of tasks to work on (from those not already taken), everyone ends up with something to work on. (Number of tasks and their corresponding required skills is equal to the size of the set of people.)

A4) So this question deceived me into thinking it was easier than it actually is.  Some reasoning about the problem leads to the conclusion that if two people know the same skill, they are ‘connected’, and if a group of people are all somehow connected, they must be fully connected, or there exists some arrival order which fails.  From that I jumped to a completely incorrect solution of computing the connected components of people and just summing the minimum number of skills needed to make those connected components fully connected.  This doesn’t even work for the samples included… since it doesn’t ensure that everyone knows anything at all.

The small input size is only 4 people, so there are only 2^16 possible skill set scenarios.  The criterion for whether a scenario is valid is that for each column there must be at least one bit set, and for each row as well; and if there is more than one bit set in a row, the corresponding columns must be identical, and vice versa.  Choose the scenarios which are a superset of the starting details, and find the one with the minimal set of differences.  That minimum is the answer.
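A sketch of that brute force, packing the whole 4-people-by-4-skills grid into a 16-bit mask. The validity test uses the equivalent formulation that the final graph must be a disjoint union of full x-people-by-x-skills blocks (which also forces every column to be covered, since 4 people in disjoint square blocks use all 4 skills):

```python
def min_new_skills(knows):
    """knows: 4 bitmasks; bit j of knows[i] set iff person i knows skill j.
    Returns the minimum number of skills to add to make every arrival
    order work out."""
    base = sum(row << (4 * i) for i, row in enumerate(knows))
    best = None
    for m in range(1 << 16):
        if m & base != base:           # must keep everything already known
            continue
        rows = [(m >> (4 * i)) & 15 for i in range(4)]
        if 0 in rows:                  # every person must know something
            continue
        # rows must be pairwise identical or disjoint ...
        if any(rows[i] & rows[j] and rows[i] != rows[j]
               for i in range(4) for j in range(i + 1, 4)):
            continue
        # ... and each block of identical rows must be square: x people, x skills
        if any(rows.count(r) != bin(r).count('1') for r in set(rows)):
            continue
        added = bin(m ^ base).count('1')
        best = added if best is None else min(best, added)
    return best
```

For example, if one person knows two skills and the other three know nothing, `min_new_skills([3, 0, 0, 0])` is 4: a partner joins the 2x2 block (2 new skills) and the remaining two people each learn one leftover skill.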

The large input is far from susceptible to brute force – 2^50 skill set scenarios.  However, greedy approaches like my first attempt are equally problematic.  It comes down to avoiding the full 2^50 explosion while still considering every possibility.

The first key observation, which is given by the contest analysis, is that for a connected component which has p people who know q skills, it doesn’t matter how they are connected or what the specific labels of those people or skills are.  The final answer is groups of x people knowing x skills with x^2 edges; the sum of these x^2’s minus the original number of edges needs to be minimised, and that answer isn’t affected by any of those details.

The second key observation is that a connected component that has p people knowing q skills is further interchangeable with any other component with the same number of people and skills.

So a given scenario consists of counts of distinct (people, skills) pairs.  The number of new scenarios that can be generated is quadratic in the number of distinct pairs, but each will have one less pair in total.  So consider every possible scenario generated so far with a total of k pairs, use it to generate new scenarios with k-1 pairs, and just keep generating from the distinct results.  If you see a scenario with only balanced values, calculate the sum of squares as a new potential minimum.  This approach is a bit different to the one described in the contest analysis, and may not actually run fast enough – I’ll need to check…  Writing the appropriate hash function for a multi-set of pairs would be interesting.