The Missing Docs .Net – Page 5 – Finding the holes in the Net

GCJ14 Qualification Round

So, I didn’t compete in this, but plenty of people did.

20595 people qualified with the minimum 25 out of 90 points. 1737 people got 90 out of 90. ~25500 people submitted at least one solution.

Q1) Given 16 distinct numbers arranged in a 4×4 grid, and a selected row, and the numbers after a reshuffle and another selected row that was picked, determine either the one number in common, or return ‘Bad magician’ if there is more than 1, or ‘Bad volunteer’ if there is less than one.

A1) I remember learning this magic trick as a child, although it was in the context of a deck of playing cards rather than 16 numbers. But that seems neither here nor there – the code looks like it should be pretty simple either way. You don’t even have to parse the 4×4 grids into 2 dimensional arrays, since the input gives you the selected row before the grid, so you can just only read in the one row which matters. Then given the 2 4 element arrays, you can calculate the result using a pretty straight forward nested loop – or you can be lazy and just add the arrays to standard Set data types and ask it for the intersection.

Q2) If you have a base earn rate of 2 cookies per second, and buying a cookie farm costs C cookies and increases your earn rate by F cookies per second, and have a target of X cookies, what is the shortest possible time to get to your target. Earn rates and buy times are not based on integer time, can be any real number.

A2) So my first instinct looking at this question was pretty poor – I saw finding a minimum on a function which seems to fairly obviously only have a single turning point in the range of positive numbers of cookie farms bought. This naturally made me think ternary search. Which would work, but is completely pointless – because calculating the time to win for a large number of farms does almost all the same work as calculating the time to win for every smaller number of farms, so performing a ternary search is just a waste. Also the upper bound of the ternary search needs to be defined, which is not trivially obvious.

So the simple solution is to have a running total of how long it takes to build n farms, and for each one, compute how much additional time to get to the target. Once the total starts getting worse, just return the best found. The running time of this is O(B) B is the number of farms which need to be built.

The size of B is an interesting question itself. We can determine when the turning point is going to be by an equilibrium equation. X/(BF+2) = C/(BF + 2) + X/(BF+ 2 + F) For B smaller than this, it is faster to build another one, and for B greater, it is slower (for constraints of X/C/F all being positive at least…)

Solving for B:

X (BF + 2 + F - BF - 2)/(BF + 2 + F) = C
BF + 2 + F = XF/C
B = X/C - 1 - 2/F

Interesting here is how insignificant F is. This is because a high F lets you build everything quicker, not just the final target, so if the starting rate had of been F rather than 2, it would have been completely irrelevant except as a scaling factor of the total time.

Since C is >= 1 in the problem statement, O(B) is practically O(X).

However, now that we have exact number of farms to build, interest turns to whether the solution can be done in O(1). This becomes a problem of evaluating the sum of 1/(ax+b) for x = 0 to x = B. This seemed vaguely related to the Harmonic sequence sum (a=1, b=1), which I provide a reasonably fast implementation in TMD.Algo, so I suspected it was possible. Wolfram alpha told me that it was (digamma(b/a + B + 1) – digamma(b/a))/a. So if you happen to be programming in a language which provides digamma evaluation, you are set. If not, you will need to write it yourself. Wolfram alpha provides formulas for small and large X, the large X one bearing a striking resemblance to the Harmonic sequence sum I implemented for TMD.Algo. The trick here is that digamma(b/a) is digamma(2/F) which can easily be in the range of 1 to 2 which is probably its most tricky area to approximate. This is especially difficult because the digamma passes through 0 in this vicinity, so the recurrence relation digamma(x+1) = 1/x + digamma(x) is going to end up subtracting very similar numbers, resulting in significant relative error. But since that is only near 0, and if small B is handled separately the digamma(b/a + B + 1) term should be dominant enough to swamp the error. So yes, O(1) is plausible, if you are happy considering digamma as O(1) when its implementation will probably involve at least 5th order approximation functions to get sufficient places to fill a double precision float.

Q3) Given an RxC board and M mines, determine if they can be placed such there exists a location where a single click reveals all non-mine squares, using classic minesweeper rules where 0’s auto expand in a cascading fashion.

A3) This problem by far had the lowest success rates. Which is interesting because the a maximum of 5×5 it is actually plausible to brute force the small input. Try each of the 2^25 patterns, throw away the ones which don’t have M mines (to avoid implementing a combination generator), then for each of the rest, calculate the numbers and ensure all numbers are adjacent a 0 and all 0’s are connected (flood fill count vs total 0 count for instance), return the first one you find, with any one of the O’s marked as the click location. This is worst case >125million instructions, which is quite a few given the 200 test cases, but given the memory locality it should be plausible for any language without excessive overhead.

More interesting is doing the problem without brute force, as is needed for the large input. This isn’t hugely difficult, but the number of corner cases is significant. It is convenient to write a transpose function so you only have to deal with the cases of R >= C. (And remember to transpose back if needed when you are done.)

From there there are several cases which need special handling.

M = RC – 1. Here the solution is, fill in everything and overwrite one of the mines with the click location. Handling this scenario first is useful.
C = 1. Fill all the rows until you reach M, and then put click location in the last row (or anywhere not next to the mines).
C = 2. Here M must be even and RC-M must be at least 4. If those are true, then simply fill every row until you have M mines, and place click location in the last row (or again anywhere not next to the mines).

Which leaves the general case C >= 3. C = 3 itself has some specialness but it can be treated under a single code path with the rest easily enough as far as I can tell. For convenience, define L as RC-M – the left overs. If leftovers (L) is >= 3C it is almost as simple as just filling from the top, left to right. The one case which needs handling is L % C = 1, in this case the one left over on the right is not adjacent to a 0. This can be solved by moving the mine next to it, down one row and all the way to the left. This newly placed mine is fine so long as it is on the 3rd last row or earlier, hence the condition of >= 3C. In fact we can extend this condition to L >= 2C + 2, since the first problematic scenario is L = 2C + 1.

In fact, L=2C is trivial, so our first step is to fill everything except the last 2 rows. If L = 2C+ 1, we remove 3 from the 3rd last row, and put 1 in each of the first column of the last 2 rows. This pattern of switching 3 for 2 is one of the basic steps we can use from here on. However, this only makes sense if C >= 4 which brings us to how this code path can be made to work with C = 3.

So, what values of L are clearly impossible? L=1 is fine, we covered it first up. L=2 is bad, there is no way that only 2 spots can exist with either of them being a 0 (excluding C = 1 as all of this discussion here will). L=3, same. L=4, this is possible in a corner so long as C >= 2. L=5, again no way. L=6 works with an edge which allows the 3×2 empty rectangle. L=7, again impossible. L=8 has two options 4×2 on an edge or a 3×3 in a corner with the opposite corner taken out of it. So why is this important? Because if C = 3, L = 2C + 1 = 7, which is a known impossibility, so we can test for known impossible values of L before starting the general case logic.

So, this leaves handling of L < 2C and L != 2, 3, 5 or 7. Starting from having filled in all but the last 2 rows, the first operation is remove 3 from row 3 and add 2 each to the last 2 rows. This gives 2C – 1, although it only is in a valid state if C is at least 5, but again C = 3 this is 5 and C = 4 this is 7, both of which have been excluded. Now put the 3 back in the 3rd row, and remove the right most one for each of the last 2 rows. This moves us to 2C – 2, and this time it is valid for C >= 2, which is all those under consideration. Repeating the first step gets to 2C – 3, but this time only valid for C is at least 6, but again smaller C values are excluded by the L being 3 or 5 or 7. These steps can continue to be alternated, decreasing L by 1 until the desired target is reached. In every case the click target is the bottom right corner.

Q4) There are 2N distinct numbers selected from 0-1 exclusive, half the numbers are given to player 1 and half to player 2. A turn consists of player 1 announcing a number, player 2 selecting a number, and player 1 winning a point if their number is actually greater than player 2’s. Player 1 is it a disadvantage obviously, so they are considering cheating. They would cheat by first looking at all of player 2’s numbers at the start of each game, and then potentially lying when announcing the number, knowing that in this specific game the number comparison is confidential, only the result is public, not the specific values that went in, and that player 2 will be playing to maximise their score.

A4) This questions problem statement is a bit of a mouthful, you would be better to just go and read the original…

So this problem is interesting, in that what I think is the hardest part of this problem is also the easiest, and that is ‘what is the ideal strategy’ for the non-cheating version. Its the hardest in that I can’t come up with a viable proof in the time I allotted to considering it, but it is seemingly pretty easy to guess. So first off player 2’s strategy is pretty obvious. If they can win, they will play the smallest number which wins, if they can’t win, they will play their smallest number remaining. Player 1’s strategy, I don’t know, does it even matter? Just playing them in sorted order seems reasonable.

Speaking of sorted order, sorting all the numbers for each player is a good starting point. For calculating the non cheating version consider each of player 1’s numbers in order, and walk up player 2’s as needed. Once you find a player 2 value greater than player 1, player 1 loses that point and increment both indexes. Once you run out of player 2 values, player 1 wins all remaining plays.

For the cheating version first we need to identify how you can cheat. There are 2 ways, one which is more obvious to me than the other. The first is that you can increase the number you announce until it is just smaller than the opponents greatest number, to force them to beat you using their largest value, rather than potentially a smaller value. The second is that if you can beat their smallest value you can announce a value of 1 minus epsilon (since the values are all 0 to 1 exclusive, you can always announce a value higher than their actual highest value) and your opponent then plays their smallest value, which you beat, just as you stated you would. They are both important, but this second one has the most obvious consequences.

To calculate the cheating win, take the sorted lists and compare the first values. If player 1 is greater than player 2, player 1 wins this round and both indexes advance (by using cheat method 2). Otherwise player 2 wins this round, but by using cheat method 1, player 2’s largest value is the one which will be discarded, so only player 1’s index advances. Repeat until all player 1 numbers have been considered.

TCO14 R1A

So the reschedule actually happened this time. For a moment I suspected it might not as my client broke at room assignment time, had to log out and in again. I have been sick for the last 4 days, so I wasn’t looking forward to even more 4am nights if this one had been cancelled as well.

750 to advance… Turned out to be pretty easy looking questions with 750th before challenge phase being 500points, almost 200 people submitted something for all 3 problems, including myself. Been a long time since I submitted a 1000pt problem, although this one did seem pretty easy…

Challenge phase had a few successes, but not a huge number. 750th shifted slightly to ~495 points, and the number of people with all 3 still submitted was almost 140.

System testing took out another chunk, but I survived unscathed. 59th. Even assuming all 250 byes would have beaten me, that is a decent placement… Just over 100 submitted all 3 problems successfully. 750th place dropped down a tiny bit to 484 points.

Ratings took forever due to a glitch with the display of the results (seems the results were not affected much at a basic glance at least…) but I am now 1882 – which is a new personal best.

Q1) Given the ability to sort L characters and discard any characters after those sorted, return the lexicographically smallest result given an input string and any number of uses of this ability.

A1) So, I was a bit slow here, spent too long trying to decide whether the greedy approach was right. Turns out it was. So sorting the last L characters then moving one character forward, repeat until getting to the front, carries as many alphabetically early characters forwards as is possible, and since shorter strings compare earlier, truncating the rest is always a win.

Q2) Given a string S and the ability to permute it such that the index of a character doesn’t move by more than D, determine the lexicographically smallest result possible.

A2) Again greedy, but this time I was more confident. Greedy approach is to take the smallest character, at the earliest index available, to use as the new letter. Exception being if there is a letter that is still unused which is D characters earlier than the current index, in which case it can’t be put off any longer, and must be used now. I used a simple search the 2*D+1 locations for each output index, but I saw one solution which was O(N log D) rather than O(ND) by using a sorted set for the letters under consideration.

Q3) Given a set of lamps controlled by switches, which might also potentially toggle one or both of their neighbours, but always toggle their target lamp, determine the worst case minimum number of lamps that must be on. Lamps are in a line, not circular.

A3) So, the description of this question is a bit awkward, but the examples provided were surprisingly useful. They clearly showed than a sequence of YN(on off) or NY(off on) had a worst case of 1 light on. A sequence of NN or YY had a worst case of 0. So a greedy approach of trying to get as many YN, NY pairings as possible seemed sensible. Except the final example didn’t work.

I took longer to get it than most, but I did get there in the end. The missing case is ‘YYY’, This can be wired up such that at least one of the lights is always on. Seemingly a greedy approach may still work from here if written carefully, but at this point I decided enough was enough with the greedy solutions, so I wrote a basic DP where the best ‘worst case’ for the first n lamps was either that of n-1 lamps, or n-2 lamps + 1 if NY or YN or n-3 lamps + 1 if YYY.

The fact that there are no other scenarios which need considering is not exactly obvious. But I did prove to myself that YYYY worst case is 1 and YYYYY is also worst case of 1, so I was pretty confident.

GCJ 2014 Qualification Has Started

Registration is open until the qualification round is finished, but you will probably want to leave yourself some time to actually do the problems, so I suggest taking a look sooner rather than later…

TCO14 R1A failed to happen last weekend, so it was rescheduled for tonight – I’m not sure what the registration time requirements are for TCO this year, but it might not be too late…

Google Code Jam 2014 and Top Coder Open 2014

Just a quick note – GCJ14 registration is now open. Qualification round start on April 11th (in some time zones at least…) and registration is open until any time before the round ends (round is available for 25 hours).

Google is still employing me so I won’t be able to compete again this year, but the TCO algorithm schedule has finally been announced so I now know my sleepless nights in April won’t only be because of my baby daughter… Yes, yet again every round is at 2am Sydney time. Interestingly it seems GCJ and TCO schedules were not quite so neatly organized this time around, there are a couple of occasions where they are scheduled on the same day. Although one of those is the qualification round for GCJ, so people should be able to schedule around that pretty easily and the other is at least 15 hours between start times.

TCO14

Registration is open already. I think this is why I almost forgot to register for this years competition…

.Net 4.5.1 Preview

A new .Net preview is out, this time a minor point release, but I decided to do a reference assembly comparison anyway.

Absolutely nothing removed (as expected), but also only a handful of additions to the API.

Precision/Scale properties for DbParameter.
SqlConnectionStringBuilder gets connection retry count and interval properties.
EventSource (Diagnostics.Tracing) gets ConstructionException and CurrentThreadActivityId read-only properties and a couple of methods to set the current thread activity ID. I think the current thread activity ID concept may be new to Windows 8.1 – if something else I was reading is associated with this.
EventWrittenEventArgs (Diagnostics.Tracing) gets ActivityId and RelatedActivityId read-only properties.
ActiveDirectory has new enum values for 2012 R2 Server domains/forests.
MemoryMappedViewAccessor and MemoryMappedViewStream both now have PointerOffset read-only properties.
GCLargeObjectHeapCompactionMode enum for use with the new LargeObjectHeapCompactionMode property on GCSettings. 2 values, Default and CompactOnce. I guess this property auto-resets back to default on the next full GC if set to compact once. (Also in Core.)
A bunch of new generic methods on Marshal – looks like a mixture of convenience and avoiding boxing of structs. (Also in Core.)
SignedXml – SafeCanonicalizationMethods read-only property.
TransactionScope now supports the new TransactionScopeAsyncFlowOption enum in several new constructors. I presume this controls how transaction scopes interact with async, which I presume lets you improve performance when you know a certain async section can safely be performed outside transaction scope.
Web.Hosting.CustomLoaderAttribute
HostingEnvironment.StopListening event.
IStopListeningRegisteredObject and ISuspendibleRegisteredObject interfaces in Web.Hosting – but I haven’t found what uses them yet.
Core only: AsRandomAccessStream extension method for streams.
Core only: System.Threading.Timer arrives.

I have listed everything I found, but if you want to look at my raw processed output files for some reason (which are much smaller than last time, due to a far smaller number of changes) there is v4.5.1 to v4.5 comparison and v4.5.1 core to v4.5 core comparison.

Binary Functions

There are 16 binary functions over 2 variables. This is a simple consequence of there being 4 input possibilities and only 2 output possibilities for each. In computer programming we frequently use ‘and’ and ‘or’, less often ‘xor’, but that still leaves 13 more.

Lets take a closer look at all 16. For brevity I will label each by their outputs for the inputs F,F T,F F,T T,T – so ‘and’ is FFFT.

TTTT – ‘always true’. Not very interesting, but it exists.
FFFF – ‘always false’. Again not very interesting.
FFFT – ‘and’. Well known and used in programming frequently.
TTTF – ‘not and’. Simple combination of the negation function with the and.
FTTT – ‘or’. Well known and used in programming frequently.
TFFF – ‘not or’. Simple combination of negation and ‘or’.
FTTF – ‘xor’. Not quite as well known as ‘or’, but simply ‘or’ where it can’t be both.
TFFT – ‘not xor’. Again a simple combination of negation and ‘xor’. But it is also the A == B scenario and thus xor can also be described as A != B.
FFTT – ‘B is true’. First parameter is ignored, and second is just passed through.
TTFF – ‘B is not true’. Again just the negation of the one before.
FTFT – ‘A is true’. This time the second parameter is ignored.
TFTF – ‘A is not true’. Yet another simple negation.
TFTT – ‘A implies B’. If A is true, B must be true, but otherwise anything goes.
FTFF – ‘A and (not B)’. This is also the negation of ‘A implies B’ but programming sees combinations of ‘and’ and ‘not’ far more frequently.
TTFT – ‘B implies A’. If B is true, A must be true.
FFTF – ‘(not A) and B’. As for 14, but reversed.

So, why did I list all these? So I could categorize them.

Really, really boring: TTTT, FFFF – No interest in the inputs.
Boring: FFTT, TTFF, FTFT, TFTF – Only one input is used.
‘And’ variants: FFFT, TFFF, FFTF, FTFF – Only one scenario is true.
‘Or’ variants: FTTT, TTTF, TFTT, TTFT – Three scenarios are true.
Strongly linked: TFFT, FTTF – Two scenarios are true, and both inputs are used.

I am now going to state that I think ‘and’ variants are actually boring as well. Since there is only one possible true state, they all result in you being able to say something about A or something about B, in isolation. The problem is separable. Which is just the nature of ‘and’, but it makes it less interesting to me.

This leaves 6 ‘interesting’ functions. A is the same as B. A is the opposite of B (xor). A implies B. B implies A. Not both false (or). Not both true (nand).

Of these, ‘or’ is obviously the most commonly used in programming. Interestingly, despite being a fundamental element of predicate logic, I have not yet worked with a programming language where A implies B has its own operator, you have to do B or not A, which is a bit clumsy.

Where does this thought train lead me… well maybe I will have more to say on this topic another time.

TCO13 R3B

Last on-line round of the TCO/GCJ ‘season’. 138 people eligible to compete, 125 turned up. Unlike last time the first 2 problems were completed by a significant proportion, so advancing required a fast time for both the first 2 problems. Only one person solved all 3, the same person also had the only solution to the 3rd problem.

Q1) Given a sequence of numbers where each element is between 1 and n, determine how many sequences of equal length with each element also between 1 and n are a ‘match’ with the original sequence if you remove one element from each sequence. Since the answer is large, return modulo large prime.

A1) At first glance this doesn’t seem too complicated, consider the original sequence, remove each element in turn, and count how many sequences you can create by adding one element. The problem is of course that this double counts. Even just considering a single case, if you insert the same value before/after a given number, the sequences are the same. So either we need to determine how many double counts occur and subtract them off, or use a different approach entirely.

Looking at a solution (because I feel lazy today) scenarios can be divided in to 2 cases. Where the number we insert is a member of the original sequence, and where it is not. If it is in the original sequence, we have complicated possibility of double counting, if it is not, then it is much simpler. So for each element removed (up to 50 cases) for each insertion position ( up to 50 places) for each distinct original value (up to 50 values) we have something which needs checking for duplicates. This is a fairly small sample size, so we can easily use a set. For each of the element removed, for each insertion position, we can also insert the n-distinct_count_options other values. There is still the ability to double count here, but these cases can be grouped by what the inserted value is (regardless of whatever that value is), so we can use a place holder (0 for instance) and the same set, and each time we get a non-match we add n-distinct_count_options to the final result.

Q2) Given pairs of values, determine the minimum number of individual value swaps (swapping one value from one pair with one value from another pair) is required such that all pairs have a maximum difference of k. If it is impossible, return -1.

A2) There is only a maximum of 16 pairs here, which suggests bit-masks may play a role.

But again, I am feeling lazy today, so I went straight to a solution. The first step is to determine whether the problem is possible at all. It seems that if you take all the values from the set of pairs, put them together, and sort them, they must fall out pairwise into values which are close enough. The logic behind this seems to be rather straight forward – if you consider the smallest value you need a value close enough, and if there are multiple which work, choosing anything other than the smallest of those, can only potentially make it harder to form the next pair.

To find the minimum number of swaps we can look at this problem from a different angle. Any solvable situation can easily be done in at most n-1 swaps, if we look at the sorted order pairing, we can swap the second to be next to the first, the 4th to be next to the 3rd, and so on, until the final number is already in the right place, the question is how much can we short cut that. If we have a solvable scenario which can be divided into 2 solvable subsets, worst case we can do the first in k-1 steps and the second in n-k-1 steps, giving n – 2 steps. What remains is to determine the most number of steps which can be saved, by following this down recursively. The maximum number of steps which can be saved by a scenario is the best sum of the maximum number of steps that can be saved over each partition into 2 solvable subsets. With a minimum of 1. Since we already have worked out how to determine if a situation is solvable, we can perform that for every subset, then considering every subset, of every subset (which is not a small number, but is achievable), determine if that subset and its compliment are both solvable, and use that as the framework to build up a recursive cache of maximum number of steps to be saved for a given subset. Then the final answer is n – maximum number of steps saved for the full set, assuming it is solvable. If it isn’t the answer is of course -1.

Q3) Determine the minimum number of jumps to get from 0,0 to x,y if each jump can’t be more than a distance sqrt(d) in a straight line, and jumps may only be to/from integer coordinates.

A3) Unusual bit here is that rather than just asking this question once, each test case can be up to 50 of these questions, which you have to answer all at once. For d=1 the answer is |x| + |y|. For d=2 the answer is max(|x|,|y|). From there the question that first needs to be asked I think is whether a greedy solution is correct. If so we should be able to find a pattern in the jumps which repeats and use that to ‘accelerate’ to the finish.

Looking at the solution again, it appears some fancy maths comes into play. Given d, you start off by determining the pair of maximal distance jumps which are either side of the angle connecting 0,0 to |x|,|y|, do this by constructing each possible maximum distance jump and watching the crossproduct change signs as we go from one side to the other. (May want to take care of the x= 0 or y = 0 cases separately just to be sure you don’t screw that up.) Once you have this pair of vectors it just becomes a question of how many of one and how many of the other to add together to give the right destination. This is where the maths seems to get a bit fancy. Taking directly from the solution, the answer is crossproduct((x,y),V2-V1) / crossproduct(V1, V2-V1), rounded up. I’m not quite sure how to derive this exactly, but at first glance I wouldn’t say it was wrong.

GCJ13 R3

So, out of 500 qualifiers, only 311 positive scores. And no perfect scores.

Cut-off to advance to the world finals was having solved the first and last problems. But given only 3 people fully solved the last problem, a safer grouping was having solved first, 3rd and the small input of the last.

Interestingly solving Q1 and small inputs of the rest got between 29th and 54th, which was a reasonably achievable score if you gave up and didn’t get bogged down on any of the tougher large inputs.

Q1) In a rigged game of roulette where the ball always lands on one of the values with minimum payout, given how everyone else has already played, determine your maximum profit you can make with your money.

A1) So for this question it seems likely that we can enumerate the scenarios of maximum winnings. The small input certainly is trivial, with a maximum budget of 1000, you can use the strategy that placing money on an outcome which is not currently one of the lowest payout outcomes, is a wasted bet. Furthermore when there are multiple equal, you choose the one which has the least of your money on it. Then just apply that strategy by placing each bet of size 1, one at a time, and find which is best. The large input however we need to enumerate the ideal scenarios more efficiently as you could have up to 10^12 to spend.

So, rather than going 1 by 1, we need to skip ahead by chunks at a time. Specifically, any scenario which is profitable, will be more so, if we can raise all the minimum payout outcomes by 1, without increasing the number of minimum payout scenarios. So we can seed with a few scenarios, and push them as high as we can, without reaching equality with another players bet or running out of budget. Each time we reach another players bet, we start again assuming having brought everything up equal to that. I think that the scenarios to seed with are the 37 equal or above equality with a bet (or equal or above 0 if there are less than 37 player bets). Running time is thus 37*37 divisions/minimum/maximum calculations – easily running in time.

I think this would work, but I see solutions instead using binary search to find the best scenario rather than division. It seems that there is a more aggressive strategy which always spends as much of the budget as possible for each of 37 scenarios, rather than 37*37 scenarios spending up to the next price threshold.

Q2) Given a set of vertices construct a non-self intersecting polygon which uses all the vertices with at least half the area of the convex hull.

A2) Just like Round 2, another problem where it doesn’t state maximum area, just at least half, and any satisfying result can be returned. Being a computational geometry question, it isn’t my favourite. So I think the approach would be to start by calculating the convex hull. Then it becomes a question of where to insert the contained points into the order described by the convex hull. Small input however is limited to 10 vertices, so it can be brute forced, just consider every polygon, build them up to rule out self-intersection, then calculate the area that remains. Large input has 1000 vertices, and I want to say a greedy approach will work, where for each vertex not in the current border you calculate its minimum cut-out location to add it to the list, and perform that cut-out. Problem is doing this in better than O(N^3) time seems like it would require some fancy spatial data structures.

Looking at the answers there is a far simpler approach which avoids the whole question of maximum and just dives straight at the ‘at least half’ criterion instead. It is not at all obvious though… If you calculate the convex hull, and consider connecting all the internal points in a left-to-right sequence to the outside of the hull (choosing an orientation to ensure that internal points can be strictly ordered left-to-right, you create 2 areas (both of which are valid non-self-intersecting polygons), which together add to give the area of the convex hull. Hence one of them must be at least half. So if you take half the convex hull, and all the other points ordered left-to-right, that can only be larger than the previous area, so one of those 2 must be more than half the area. Since all coordinates are integer, you can select a very slight fractional angle, which will ensure they all have a distinct left-to-right order. Calculate the convex-hull, and the left/right most coordinates of the convex hull under this order then serve as the split points to divide the hull into 2 parts, try both versions, calculate the maximum area. Since the polygons are known to be non-self intersecting, the running cost is O(N log N) for sorting on a slant, and then O(N) to implement convex hull on sorted data and O(N) to calculate the area, so 1000 is easy.

Q3) Given a set of one direction edges on a graph and an ordered subset of them which represents a path from vertex 1 to vertex 2, determine the first edge of that subset which is definitely not on the shortest path, if any, given a range of possible costs for each edge. When considering each edge for whether it can be on the shortest path, the shortest path must include all previous edges.

A3) This is a very strange question, on my first reading I didn’t understand it very well and thought it was about ensuring each edge specified was on a shortest path, not only on a shortest path prefixed by a given set of edges. I think the small input can be brute forced by trying each combination given by choosing one of the extreme values for each edge and calculating the shortest path for each of these 2^20 scenarios. Whichever one goes furthest down the specified path will define the answer. Large input certainly needs a different approach. I am thinking that we start with every edge at minimum value, then apply all source shortest path, then for each edge on the route we take the specified previous edges, and from the end of the edge to vertex 2 we use the previously calculated shortest path to the destination, then every edge not on that list has its cost set to maximum, and we determine if the shortest path goes through our selected edge. Running time should be okay, O(N^2 log N) or so, but I am not certain it is actually correct.

Looking at the solution I am still not entirely sure, but I am pretty sure there are corner cases it breaks where the shortest path back from end of prefix allows for a short-cut that skips the selected prefix edges but there exists another ‘short path’ back which doesn’t cause that problem. The solution does a simultaneous 2 starting point shortest path by priority first search. One starting point is vertex 1 with initial cost 0, the other is the end of the current prefix being considered, with a cost implied by the minimum for each edge in that prefix (minus a half to give it priority since equally fast is still on the shortest path). Each of the prefix edges are forced to have weight minimal. If we are at a vertex which has a fractional best cost, we use minimal weight for outgoing edges, because our prefix shortest path got there first, so its always good. Otherwise we use maximal weight for outgoing edges, since the raw search got there first, which is bad. If the destination vertex ends up being fractional, shortest path has succeeded with the prefix.

Q4) Given a starting list of boolean values, determine the average sum of the number of consecutive true values starting from randomly selected locations until the first false to the right (wrapping back to the beginning if needed), if each found false location becomes true, and the process repeats until every value is true.

A4) Again this is a problem where the small input is quite easily achieved. With only 20 entries in the list, the problem can easily be recursively defined with a cache over the possible states in the list. Each state has 20 random approaches which have to be averaged to give the final value of that state.

Large input, I had no clue, so I went straight to the solutions. The solution is an O(N^3) approach by calculating the probability of filling a consecutive subsequence (including wrapping) of values with a specific member being last, and the expectation value of the same. Each of these calculations can be expressed in terms the items before the specific last entry to fill, and the ones after it and using some fancy combinatorics for the probability calculation. I won’t go into it here, you can always read the official contest analysis for more details.

TCO13 R3A

So, having been knocked out in Round 2, I didn’t have to wake up at 2am to do this round, which is good because I’m not sure I would have gotten any points anyway… As usual, there were 3 questions, but no one solved question 3 and only 5 people successfully solved question 2, meaning you could advance to the finals with just by solving Q1 quickly and a successful challenge or 2.

Q1) Given a set of 32bit numbers, determine whether each of a second set of 32bit numbers is in the resultant of K rounds of expanding the the set using the bitwise and operation. Specifically, every pair in the set is anded and the result is added to the new set, the input and output sets can contain duplicates.

A1) So, my initial thoughts were around that the and operation is rather limiting, so the total count of numbers will never get very large, so you could just simulate the problem, collecting duplicate outputs into counts. But that would be mistaken. using an appropriate set of initial numbers you can generate a very large set of outputs even once you apply distinct, and as a result simulation will take too long.

So, K=1 is easy, you just consider every pair. K=2 there are 2 cases, the 2 input numbers are either from distinct original pairs (N choose 4) or they are from overlapping original pairs (N choose 3). K=3 there are more scenarios. 8 distinct original numbers anded together is the maximum, you can obviously do scenarios for 7, 6, 5, 4. What is interesting is that you can still do 3. In the K=2 case, each scenario of N choose 3 can be created 3 different ways. ((1,2) (2,3)) ((1,3) (2,3)) ((1,2) (1,3)). Since all duplicates are kept this means that these would be anded together pairwise to create another 3 new copies of each case of (N choose 3). From there on it follows that every possible combination involving at least 3 and at most 2^K distinct original numbers will be in the output. And since the original input is at most 16 numbers, we can consider every subset easily enough. We just need to remember to special case K=1, as it is the only case where you can choose pairs.

Q2) Given a NxN array with 3 specified positions on the edge being important, determine the minimum number of filled in positions which need to be added (some may already be filled in) such that there is no path of connection between any pair of the 3 original specified points (path consists of horizontal and vertical only, no diagonals). If there is no way to do this is less than 100 positions, return 100.

A2) The ‘100’ limit is a misdirection, other than when any 2 of the specified positions are adjacent (and hence cannot be blocked) the maximum answer is 6, 3 for each of 2 of the original positions required to surround them against an edge – the 3rd position doesn’t matter of course once 2 of them are blocked in. However, considering the cases of placing 1-5 options on a 20×20 grid, and for each case performing a connection check is 400^6 operations in the worst case, which is far too many to complete in the few second time limit. It is quite feasible to check each single location, or even each pair, but beyond that is pushing your luck.

Looking at the 5 successful solutions they all seem to use the same kind of approach. The idea is to think about the minimum cost to create a wall between 2 points. Then the final result is either 2 walls which each block in 1 edge location, or 3 walls which meet in the middle. The minimum cost to create a wall between 2 points is a graph problem where a block already filled in creates 0 cost edges for the paths it blocks, a block not filled in creates 1 cost edges instead. The net result is you can run all pairs shortest path. Once you have the all pairs shortest path, you can then consider each centre location, and for each of those consider the shortest paths to create three walls from there to any location on the edge between each pair of the 3 selected edge locations. Finally you can choose each pair of edge positions and choose the cheapest 2 walls which go from one side of a selected location to the other. (The 2 cheapest selected must obviously be for different selected locations.) Choose the best result out of all of these scenarios and, presto you have the answer. In the worst case of a large empty map with 3 spread out selected edge locations the 2 walls will always win out, but on a map which is already filled in much more significantly, the 3 walls meeting is more likely.

Q3) There is a set of m positive integers , n of which are between 1 and t inclusive, and the sum of all m is less than or equal to s. Determine how many sets satisfy these requirements (returning count modulus large prime). Constraints, n is at most 100 less than m, t is at most 10^5, m is at most 10^9 and s is at most 10^18 and is at least n*t.

A3) So I don’t have any answers to fall back on here, what I find most interesting is the constraints, they are rather detailed – s being at least n*t for instance is suspicious. Looking at this problem I see possible approaches with running times of at least O(s) which is obviously out of the question. The combinatorics certainly seems beyond me at the moment.