# GCJ16 R2

Busy as usual, so this analysis is quite late.  I’m not sure I would have advanced if I was competing in this round, the top 500 cutoff was both the first two questions and one of the other smalls.  The first two problems weren’t especially challenging, but writing correct solutions to 3 problems (even if one is a small) in the time available is not really something I am reliable at.

Top 1000 for the t-shirt was much more managable, the first two problems, or even failing one of those larges you pass if you have one of the other smalls as a backup.

Q1) Determine the minimal ordering for a set of rock, paper, scissors players such that there will be no draws in the resulting tournament.  Or return impossible if such a scenario doesn’t exist.

A1) If you solve the small version by brute force you can see that for any given total number of players, there are only 3 possible scenarios in terms of the number of players who like each of rock, paper or scissors.   This makes a lot of sense once you try and build the problem from the end result.  The tournament either ends in a winner who plays R or P or S.  For each you know that they had to have played the type which they can beat.  This continues until you get to the right total number of players.  The trick is to determine the minimal ordering.  A mistake I made was to assume that if when you are generating the set of players you always choose to generate the next level in alphabetical order where possible, the final result will be optimal.  If you don’t have sharp eyes you might miss that this doesn’t generate the same result as your brute force solution.  Moral of the story, use an actual diff to compare the output of your brute force and optimal solutions…

An example of how this is wrong is if you start with R – next is RS, third tier is RSPS, but the ideal is PSRS.  The solution is to perform a more general optimization step.  Rather than sorting just the neighbours, sort the pairs, and then quadruples, the octuples, etc…  You might as well just do this at the final stage, unless you’ve decided to pre-generate all the answers for all the different lengths, which is easily managed for the given constraints.  Such a sort maintains the tree structure of the competition, but (fairly obviously) minimizes the ordering.

Q2) Given a set of probabilities of voting yes or no, determine a subset of size K which maximises the chance of a tie.

A2) I think this question is actually quite tricky, but if you do the brute force for the small input you can pretty easily see a pattern.  The extremes of the probability spectrum are the best choices.  Having made that observation you might try a few options, like greedily adding either the highest two, or the lowest two, or one high and one low, whichever is best.  This doesn’t work.  What does work is trying every combination of size K which has the N minimum values and K-N maximum values.  I don’t really see how you could come to this solution except by observation  My intuition mistakenly suggests it should be the middle ones.

However that isn’t the whole problem – for a given set of K probabilities, you need to determine the probability of a tie.  Even for the small input, brute forcing this is unreasonable.  However, this sub-problem is a straight forward dynamic programming problem.  Given x people voting yes or no, there is between 0 and x yes votes.  So you can have a dynamic program on the probability of the first x people having voted y yes votes.  Start with 0 people having 0 votes as probability one.  Then use f(x,y) = f(x-1, y)*P(x votes no) + f(x-1, y-1)*P(x votes yes) to fill in the half triangle of the K by K grid of scenarios and return f(K, K/2).

Q3) Given a grid which is to contain a hedge maze constructed of diagonal walls, determine if (and if so how) to construct the hedge such that there is a specific set of paths between edges where these paths never join or cross.

A3) The first observation is that the diagonal hedge maze structure can only create paths that are linear, so you don’t have to worry about the joining or crossing part.  You just need to concern yourself with the paths getting to the right edge destination for each possible starting edge location.

The small brute force is small enough you can construct every possible hedge maze and then trace the paths.  The main difficulty with this O(2^(R*C) * R * C) solution is tracing the paths.  One option (which I investigated) is to store current location as cell x,y and whether its the top or left edge of that cell, with virtual cells beyond the edges to do the bottom edges and right edges of the bottom and right most cells.  Then given that edge and diagonals in the neighbouring cells, determine the two possible new locations.  Rule out the one which is either outside or the same as the previous location you were at.  This works, but its far more cumbersome than the method suggested in the analysis, which is to store position as cell x,y and which of the 4 directions you are entering this cell from.  By increasing the size of the third component from 2 to 4 options, you get rid of the need to possibly generate two locations, or track where your previous position was.

As an aside, these kind of grid problems which work with edges are something I feel I could do something nice about in TMD.Algo – hopefully when I get some time I’ll make such a convenience class.

The large can’t be brute-forced, but some iterative thinking comes to the conclusion that it can be solved greedily.  If two neighbouring edge locations need to join, the minimum number of cells their path can go through is those exactly neighbouring them.  And such a path can always be constructed.  If you want to connect things which are further apart, the optimal solution is to hug the wall of the paths of things closer together which have already been connected.  This wall hugging is clearly in some sense minimal. Its compactness means it maximises the region remaining for other paths to be constructed through.  Thus the conclusion that this greedy wall following will either work or the problem isn’t solvable.

I feel that writing the code to add walls to create the wall following is something I would have found awkward and difficult under a time constraint, especially with my x,y,(left or top) position notation.  However the contest analysis makes the excellent point that the wall following is either flowing through a cell which already has a wall, or you want to turn left if you are going between two locations that have shortest length in the clockwise direction.  With the x,y, (direction of entry) formulation, its very easy to determine what wall you have to add in order to turn left.

Q4) Given a set of people who have a different skill set, determine the minimum number of additional skills to give to some of these people to ensure that regardless of order of arrival and selection of tasks to work on (from those not already taken), everyone ends up with something to work on. (Number of tasks and their corresponding required skills is equal to the size of the set of people.)

A4) So this question deceived me in to think it was easier than it actually is.  Some reasoning about the problem comes to the conclusion that if two people know the same skill, they are ‘connected’, and if a group of people are all somehow connected, they must be fully connected, or there exists some arrival order which is wrong.  From that I jumped to a completely incorrect solution of computing the connected components of people and just summing the minimum number of skills needed to make those connected components fully connected.  This doesn’t even work for the samples included…  Since it doesn’t necessarily ensure that everyone knows anything at all.

The small input size is only 4 people, so there is only 2^16 possible skill set scenarios.  The criterion for whether a scenario is valid is that for each column there must be at least one bit set, for each row as well.  And if there are more than one bit set in a row, those columns must be identical and vice versa.  Choose the ones which are a superset of the starting details, and find the one with the minimal set of differences.  That minimum is the answer.

The large input is far from being susceptible to brute force.  2^50 skill set scenarios.  However, greedy approaches like my first attempt are equally problematic.  It comes down to avoiding the full 2^50 explosion while considering every possibility.

The first key observation which is given by the contest analysis is that a connected component which has p people who know q skills, it doesn’t matter how they are connected or what the specific labels of those people or skills are.  The final answer is groups of x people knowing x skills with x^2 edges, the sums of these x^2’s minus the original number of edges needs to be minimised, but that answer isn’t affected by any of those details.

The second key observation is that a connected component that has p people knowing q skills is further interchangeable with any other component with the same number of people and skills.

So a given scenario consists of counts of distinct pairs.  The number new of scenarios that can be generated is quadratic in the number of distinct pairs, but it will always have one less total number of pairs.  So if we consider every possible scenario generated so far of total number of pairs k, use it to generate new scenarios with k-1 pairs, and just keep generating from the distinct results.  If you see a scenario with only balanced values, calculate the sum of squares to be a new potential minimum.  This approach is a bit different to as described in contest analysis, and may not actually run fast enough – I’ll need to check…  Writing the appropriate hash functions for a multi-set of pairs would be interesting.