May 2010 – Page 2 – The Missing Docs .Net

I’ll only cover the take 2 questions here, even though I thought the take 1 questions were better…

Q1) This question was a convoluted way of asking ‘what is the minimum number of neighbour swaps to sort a list containing 2 unique (but potentially repeated) elements’. It was phrased a bit to confuse, but that is what it boiled down to.

I implemented this by hand writing a bubble sort and counting every swap, which was simple and for the input size, easily going to perform in time. A more optimal solution (which takes advantage of there only being 2 entry types) is to step through each entry, keeping a counter of how many of the type of element you want to be first you have seen so far. Each time you encounter an element of that type, increase your result field by the difference between the current position and the number of elements of that type already seen. This computes the result in O(N) rather than O(N^2). Oddly enough I swear I’ve seen this question somewhere else recently…

Q2) This one was a bit more tricky. The question was, given a short (up to 10) list of UDLR direction commands, and a number of times to repeat them, and assuming every move is unit size, how many unique locations will you end up visiting? The trick being the number of times to repeat them was up to 100 million…

The solution to this comes from realising that after the first few repetitions of the direction commands the number of new unique spots per repetition will always be the same. I didn’t bother proving a minimum number of repetitions before it stabilises, which I suspect may be quite low, I just chose 20 to be on the safe side (current estimate is that 4 repetitions is sufficient for a command list maximum length of 10). Use a set (dictionary since topcoder still only supports .Net 2.0) to store the locations you have visited, and simulate the the first few replications, storing how many new are added for every attempt. Then simply multiply the last number of new spots by the number of repetitions left, and add that to receive your final total. I simplified my code by storing my location as a single integer rather than a pair, using 15 bits for one axis and the rest for the other. Since the number of repetitions simulated is only 20 and the number of steps per reptition is 10, 15bits is overkill, but whatever…

Q3) This of course was the hardest question, and I didn’t get it completed in time. This question was a bit complicated, but it involved up to 50 different ‘lists’ being unioned together and sorted. The lists themselves could be distinct values (stored in 50 characters, so at most 24 numbers) or a description of an arithmetic or geometric sequence. The sequences were defined with start number, step/multiplier, number of elements in the sequence. The program had to take this sorted union and return what element was at a given position in the list (for up to 50 positions). Where that position was anything up to 1billion. To simplify things a bit if the value at a position was over 1 billion, you were to return -1, and if the position asked was beyond the end of the sorted union, also return -1. However to annoyingly complicate things, the input numbers were not gauranteed to fit in 64bit integers, although it was guaranteed that they would all be greater than 0.

Parsing the input was really much of the problem. The restriction that values over 1billion return as -1 and positions beyond the end of the sorted union return -1 means that because it is sorted, you can ignore every value over 1billion and pretend it is not in the list. Therefore int.TryParse and a check that the result if valid was less than or equal to 1 billion was sufficient for each field of the input. Depending on which field what to do with the fact that the value is > 1billion you did have to treat it differently. For the exact lists you could just throw each excessive element away. For the sequences if it was the starting value, you could throw the entire sequence away. If it was the multiplier/increment then you could replace the sequence with an exact list containing just one value. However if it was a sequence length, you need to treat it as at least 1 billion.

Another side step which can be performed to reduce the problem space a bit is to realise that geometric sequence with a multiplier greater than 1 will all exceed 1 billion in 30 repetitions or less – so you can expand them into exact element lists. With a multiplier of 1, you can replace them with an arithmetic sequence with step 0. Therefore the rest of the code only has to handle arithmetic sequences and exact lists. All of the exact lists can be combined and sorted so all that is left is a single exact list (might be empty) and up to 50 arithmetic sequences. If you wanted all arithmetic sequences with length less than 30 or so could be treated as exact lists as well, but it doesn’t really help anything.

Finally now that the data is all prepared you have to find the actual values at given indexes. This cannot be done by generating the first 1 billion entries, for one thing it would exceed the time limit, but for the other it most certainly would exceed the memory limit. The solution comes from performing a binary search. A binary search on what? A binary search on the ranges of positions which have a given value. If the position we are after is in a range, it has the value for that range. The function we need to implement the binary search is ‘how many numbers are less than n’. We then binary search on n over the range 0 to 1billion asking the function for n and n+1. If the answer to the first question is greater than our required index, the answer must be smaller than n. If the answer to the second question is less than or equal to our required index the answer must be greater than n. Otherwise the answer is n.

Answering how many numbers are less than n for the exact list is trivial, since we sorted it. You can even use a binary search, since the number of elements may be as high as 1500. For the arithmetic sequences you can take n+1 subtract the start value, if result is negative return 0 else divide by the step size (using integer division) and add 1. Unless the step size is 0, which we introduced when getting rid of the geometric sequences, in which case the answer is infinity (aka 1 billion) if n+1 is greater than or equal to the start value. Whatever answer found has to be capped by the maximum number of entries for the sequence as defined previously. Finally the answer for each sequence and the exact list are summed, being careful not to overflow integer (since you could be adding the number 1 billion 50 times). For an added performance kick this function can be memotized in case the different position queries end up passing in the same values. It won’t make a huge difference to performance since the input values can be spaced such that only 5-6 out of up to 30 binary search steps will overlap for each query.

I ran out of time when I was just about to start implementing that final paragraph – but there was plenty of opportunity for off by one errors which I probably would have wasted a good 10 minutes fixing.

So this is where I would usually post my post match analysis … (and given the contest only started 50minutes ago, this post is way too soon…)

Topcoder servers have ‘crashed’ multiple times. I managed to solve the first problem, but not before the system became so unresponsive that I couldn’t submit it…

I did however discover a cool new feature in google as a result. Since the topcoder website was down I thought I would try a google search to see if someone had posted something which was already indexed… and it appears that google now indexes twitter in real time. You get a section in the search results which updates dynamically pushing new twitter posts which correspond to your search times. This is how I discovered that everyone else was also suffering from the crashing issue which was affecting me…

Maybe they will reschedule, otherwise qualifying round 2 is my only chance to qualify – qualifying round 3 is during work hours.

Edit: Well they finally got the servers going, and I managed to submit my answer. Currently pending system tests, but assuming I pass those I am in the 600 who theoretically advance if they don’t declare the entire event invalid…

Edit2: Systests complete, I’m 385th of 474 people who got a > 0 score (out of ~1200 contestants) – still no word on whether the event is being marked invalid.

Edit3: Apparently just before they got the servers running again they posted a broadcast which I didn’t get because I was crashed, which declared the event invalid. It is going to be rescheduled at some time to be announced in the future – I’ll probably write up an analysis of the problems tomorrow anyway…

Month: May 2010

TMD Algo 0.0.4.0

TCO10 QR1 Analysis

TCO QR1 Take 2

TCO10 Qualifying round 1