GCJ12 R1B – The Missing Docs .Net

So, I didn’t have to do this round, but looking at the scores it was very competitive. Probably harder to advance than R1A. Looking at the problems I think there were all theoretically in my grasp, except for Q3 large input, I might have missed the key insight, and probably wouldn’t have been game to try a solution based on random numbers.

Q1) If each item receives 2 scores – 1 fixed, and 1 a variable fraction of the total of the fixed scores, what is the minimum fraction for each item to guarantee that regardless of how the other items receive their fractional scores, that item will not have the distinctly lowest total.

A1) I see a couple of options here. But first some analysis of the question.

If we give a specific fraction to an item, what is the worst case competition for it? Giving points to everything lower than it until they all equal the new value, and still having a non-zero fraction left. (Since that non-zero fraction can be divided amongst all equals to push them above.) So, if we have the items sorted by their fixed values, we can go from smallest to largest, skipping the current item under consideration, subtracting from the available percentage as needed until we get to an item which is above the target sum. This is O(N).

The above suggests we can use a floating point number binary search to find the threshold, giving about a 20*O(N^2) to get sufficient accuracy for all of them. This would be fast enough given the maximum constraint of N=200.

A more direct approach solve numerically. In between each step from one distinct fixed score to the next, the problem is a linear equation. If the answer for the linear equation lies outside the range, you move to the next step. Again this is trivially O(N), for each item, making O(N^2). In fact we can do a binary search on these segments, giving O(log N) for a total of O(N log N) for that component, but only if you can change the segments from one scenario to the next in O(log N) time or better as simply constructing the segments is O(N). If you construct the segments for every item, then the segments for each individual scenario is either the same with the counts above a given point reduced by 1 or two adjacent segments are joined. This later scenario can actually be ignored and treated like the first, the two linear equations will be the same, but testing them both is no significant loss. It is easy to embed the former logic into the linear equation, so we can indeed reuse the same segment list, giving us our O(N log N) running time.

Q2) Given a rectangular grid of cells where each cell has a floor and ceiling height, and a water level, and a requirement that to get from one cell to the next you can’t pass through an edge without at least a 50cm gap (either current ceiling/destination floor/water or current floor/water/destination ceiling or destination ceiling/floor) you want to get to the south east corner of the grid as fast as possible once the water starts dropping. Every second at some future point the water level will drop 10cm. It takes 1 second to move from cell to cell if there is 20cm of water above the ground in the current cell, otherwise it takes 10 seconds. Starting from wherever is currently reachable from the north west corner of the grid, what is this shortest path? (No diagonal movement.)

A2) Ahh, interesting problem. Especially since the large input makes the grid up to 100×100 in size. Suggests a shortest path algorithm, but the edge cost changes as a function of arrival time. Lets do some more thinking.

Firstly, it should be apparent that we never go back the way we came once we start. More specifically because the best travel times only ever get worse the later we arrive, there is never any benefit to being somewhere at a later point in time then the earliest arrival. This suggests shortest path will be fine, with some small modifications. Finally there are no negative travel times, so we don’t have to worry about that complexity when it comes to shortest paths.

We can reach anywhere reachable from the start at time 0, so simply depth first search to determine what is reachable and initialise a grid with 0 at those locations and maxvalue at all others. Add all reachable locations to an updateable priority queue. Now we perform an earliest first search, at each location we determine how long to reach each of its 4 neighbours relative to the earliest arrival time for that cell which will either be wait+1 second or wait + 10seconds, where wait is how long until the water drops low enough to make it passable. Wait could be 0 seconds if already passable, or infinite if it will never be passable. If the new arrival time for the destination is lower than the current value, update the priority queue.

The code to calculate the wait isn’t exactly complicated, but it does have a number of criterion and would be a likely place to make a mistake. It is also where you have to handle not leaving the edges of the grid.

This gives us the all points earliest arrival time, including the south east cell, which now contains the answer. So what is the running time? Initial search for starting points is O(N*M) or O(V) (V is total cells, or ‘vertexes’ in the graph.) From there we run an earliest first search using a priority queue. Lets assume we use the LookupHeap from TMD.Algo, this has O(log N) add/remove of any element. The number of priority queue updates/adds/removes will be at most O(E). E is the number of edges, which has an exclusive upper bound of 4V, effectively making it O(V) priority queue manipulations for a total of O(V log V). Despite V being 10k this is quite reasonable. The other component of running time the determination of wait, is easily O(1) per edge, so it is not significant.

Q3) Given a set of distinct positive integers, determine if there are 2 distinct subsets which have the same sum.

A3) Pretty simple problem statement, but the large input has 500 integers each up to 10^12 in size. Small input is more reasonable, 20 integers with 10^5 each. Only up to 10 test cases in both scenarios, so plenty of cpu time available for each test case.

Small input approach. The most(?) obvious brute force is going to be too slow, dividing into each possible set of 2 groups, then determining the intersection of every possible sum of those groups is something like 2^40. However almost as obvious an approach is to break it into 3 groups, if the sum of all elements in group 1 equals all elements in group 2, we’ve found the solution. Group 3 is ignored. This can have a running time of O(3^N). Which is quite high, but given only 10 test cases, and 4minutes to submit, it is plausible if optimised. Important part of the implementation is to not calculate the 2 sums directly for every scenario, instead, mutate them as the scenarios change, adding and subtracting. This ensures the implementation isn’t O(N*3^N) which would almost certainly be too slow.

Large input approach… hrmm. Well caching on the small input implementation gives us O(N*distinct number of sums) which doesn’t immediately give us anything, but we know the range of the sums which gives us an easy upper bound of O(N*(kN)^2) where k is the maximum value. For the large input this is a considerable improvement over the O(3^N), but still longer than life of universe type processing times expected… But it does tell us that as we get large inputs, the probability of there being a common sum approaches 1, maybe even is 1 by pigeon hole principle. Maybe this will be useful.

So I looked up an answer to get a view into this question and I found a very important detail which I overlooked. If you have 2 sets which are not equal which have the same sum, you can remove all the elements in common to get 2 new sets which are distinct and have the same sum! Now we just have to find 2 distinct sets with the same sum. This insight also allows the small input to be done in O(2^N) by using O(2^N) space as well as time. It also lets us prove that the probability of common sum is 1. 2^500 is greater than 500*10^12, so there are at least 2 different subsets which have the same sum, and hence 2 disjoint non-empty subsets with same sum by removing any common elements.

Here is where the certainty comes in, if we do random guesses, and store the sums in memory until we find a collision we can get a very high likelihood of passing tests just by making enough guesses. Birthday paradox means that despite the potential 10^14 different sums, the probability of collision is quite high with only 10million repetitions, and extremely high at 100million. (And even higher considering the difficulty of constructing a set of numbers which evenly spreads sums across the entire range.)

Another approach I saw in solutions with less random in it… If we sort the input then break it into sections of length 20, then determine every subset sum, then choose the 2 closest sums we end up with 25 sets of distinct subsets where we can use the sum difference as individual values to calculate every subset sum. If we find a collision then we can construct the subset. The idea here appears to be that we’ll end up working with a much smaller size numbers by finding the smallest difference in each section, so the probability of 0 sum is increased significantly in the second set of summations. Again this really seems to be a probabilistic approach, but apparently good enough that you don’t need to randomise the data.

Leave a Reply Cancel reply