So, I think everyone qualified for round 3 was eligible to do the practice round. 202 positive scores, 14 perfect scores. I hope everyone who intends to participate in the distributed online round had a go in the practice round, as the distributed format is new and quite different – I think people who spent a good long time on the practice round will have a big advantage.
Even with 50 hours to think things over, and a testrun feature which lets you check if your solution is too slow on the large input, for test cases of your own design, pass rates for large input were quite low. 2 of the quests had less than 50% pass rates.
The top 14 perfect scores was 12 c++ and 2 java. No python. Small sample size, but given the ratio of language use in the group that advanced to round 3, its absence is slightly suspicious.
Q1) Find the largest prefix/postfix sum for a list of numbers if the prefix/postfix are not allowed to overlap. 5×10^8 members in the list.
A1) On a single machine this problem boils down to what is the largest drop. Once you’ve found the largest drop (by calculating running sum and keeping track of the previous maxima vs the current value) the answer is the sum of all numbers – the largest drop (which will be negative, giving you a bigger total).
Splitting this to multiple machines isn’t too difficult. Give each machine a subsequence of approximately equal length, calculate the minima, the maxima, the ‘local’ biggest drop and the total. These can all be calculated in a single pass of O(N) time. Send all these values back to the primary machine and exit.
The global biggest drop is either the largest of the local biggest drops, or it is the difference between one of the maxima’s and a subsequent minima. First the maxima’s and minima’s need to be corrected since they are local, so add the running total of the previous totals to each local minima and maxima. Then calculate the biggest drop, and finally subtract that from the overall total of totals. These steps are O(Nodes) so will easily run in time. Remember to use 64 bit integers everywhere, as the numbers can sum to 5*10^17.
Q2) Calculate who, if anyone, is a majority in a vote. 10^9 voters and potentially 10^9 candidates.
A2) So I have 2 approaches for this problem, which have different scaling functions, but are surprisingly similar in run time for the problem size.
The ‘cool’ solution is to count the frequency of each of the 32 bits in the candidate vote identifier over all votes. If there is a majority, the set bits of the majority candidate will have over half the vote, and the unset bits will not. So simply send the totals to central server and accumulate and compare to number of votes. This lets you create the bit pattern of the majority, if there is a majority. Send this majority holder back out to everyone, who then counts the frequency of exactly that candidate. Send these new counts to central server, accumulate and check if its a majority.
This solution is linear, but requires two passes. Given how long it takes to call GetVote you will lose 500ms of your time just doing that if you don’t store the values, and if you do the store the values you should be careful you don’t run low on memory, but it will save you 250ms… Each value also has 32 bit extract’s to perform to potentially increment the 32 counters.
The alternative solution is to sort the segments you assign to each server. This is O(N log N), but with at most 10million entries, the log N is only a factor of 24. You must be sure your sort function is in place, because otherwise you will run out of memory. Having sorted them you take every entry which got more than 20% of the vote (which is easily found now that they are sorted). There are at most 4 of these, send their identifiers to central server. Also send the size of the threshold you used to select them, which is 20% of the number assigned to this node.
The central server now has up to 400 candidates, which it can easily total and sort. Any total which when added to the sum of thresholds sent back (which should be close to 20% of the overall, but might be slightly off due to rounding) gets a majority is a candidate. There is at most 3 of these as they each have to have at least 30% of the vote. Send these back and do a second pass counting totals for each of the 3. Finally the majority holder either appears, or does not.
This solution has the advantage that rather than doing 32 operations per entry to start and 1 more to find the majority candidate total, it does 24 (for the sort), 1 to count run lengths and then 3 more for the totals. In practice there is more overhead, sorting operations are compare and swap where as bit extract and increment is cheaper – memory locality of the operations is also much better in the first solution. It also has the advantage of potentially not doing the second pass. If you get a straight majority or none get to 30%, the second phase can be skipped. This bonus has an interesting correlation – it is more difficult to construct a fast running input which has slow sorting time and also has 30% of votes going to one person.
Q3) Find the shortest distance between two specific nodes in a circular doubly linked list. 10^9 nodes.
A3) So the single machine version is you just walk left from start until you find the other target, compare the number of steps to the total number of nodes and you are done. The distributed version is a bit more awkward… you need to break the input set into sections, but given the nodes are in random order, how do you do such a thing?
One option is to select evenly spaced nodes and call your section done if you ever see another node which is x mod k. This is however fragile, its not hard to construct inputs which means regardless of your choice of x and k, one of your nodes is going to do a lot more work than the rest.
Solution is to use random nodes instead, that keeps you safer. To be extra safe, choose 10 random nodes per server, that way if one of its segments is double length, that only adds 10% extra running time. However you now have to compare every value you get while walking against 1k potential terminals. This is way too slow. One option is to sort the selected nodes and binary search. This has potential to run fast enough. Another option is to not use truly randomly selected nodes. Instead divide the nodes into groups of k, and for each group select a random terminal position. Then you can check if you are terminal by checking node value mod k vs random_nodes[nod value / k] mod k.
Potential little tricks for avoiding overhead. Assume each machine runs the same version of libc – rather than communicating the random node selection from central server to each server, use a hard coded seed to rand and calculate it on each server.
In any case, each node returns the length of its segment, the start, stop nodes, and whether it saw either or both of the start and end nodes. The central server can then easily and quickly construct the path from start to end node.
Q4) Determine whether it is possible to partition a set of numbers into two equal halves. 52 numbers, each of which have a large range of possible values.
A4) Unlike all the other questions, this doesn’t bombard us with lots of data. But the question is NP-Complete, so it doesn’t really have to. Wikipedia (see subset-sum problem) tells us that it can be solved in O(2^(N/2)) time, which is actually almost really close to fast enough for a single machine – but it would use more than the allotted memory limit.
The simple fix is to use 64 of the 100 machines, and tell each to ‘fix’ 6 of the numbers as either left or right, then run subset sum on the remaining numbers with an adjusted target based on the ‘fixed’ values. This saves a factor of 8 in memory usage and running time, which is just about enough. Each machine returns whether it finds the partition, and central server thus has the answer.
This only has Sqrt(Nodes) scaling, unlike the other problems which each have linear scaling – but getting linear scaling for this problem seems likely to be quite a bit more difficult…