GCJ13 R1B

This time around the top 1000 advancement cut-off was Q1 + Q3 small in fairly fast time, or Q1 and Q2 small.

Q1) Given a game where you can absorb other smaller objects to become the total size of the 2, a starting size and a set of other objects, determine the minimum number of changes that have to be made to the set in order to be able to absorb every object into one big super object.  When adding to the set you can choose whatever size object you like.

A1) So this problem is somewhat greedy in nature.  It should be fairly obvious that there is never a downside to absorbing an object you can currently absorb.  So you start off by sorting the objects by ascending size, and absorb from the start until you can’t absorb any more.  At this point you have 2 choices – remove every larger object, or add a new object to make yourself larger in the hopes of continuing.  Again greedy comes into play, there is never any value in adding an object any larger than yourself (right now, you can always do it later when you are bigger) and no point making it any smaller than the largest you can absorb.  So add an object exactly equal to your current size minus one, absorb that, and repeat from the beginning.   You can continue this process until they are all done.

Along the way at each point you have to add an item, check to see whether your current number of added items + the number of remaining items, is better than your best result.  If it is, replace the best result.  Finally check whether always adding was better still at the very end.  Return this result.  Since you almost double in size every time, the answer is always going to be no more than about 21, so everything will easily run in time.  There is one special case that I have ignored so far.  If the initial object size is 1, you have to remove everything, because adding an object never helps in that case.

Q2) In a scenario where diamonds fall down and stack up (with random choice in fall direction when there isn’t an obvious choice) what is the probability that after n diamonds fall a given location has a diamond at it.

A2)  The important thing to notice with this question is that the falling diamonds occur in phases. Until the mound of diamonds reaches height n, it never goes wider than n either side of 0.  So the answer is either 1 or 0, unless the location being asked for is in the current phase.  Each phase adds 1, 5, 9, 13, 17 … diamonds.  This is of course an arithmetic series, so it has a quadratic sum formula, so the large input is obviously no worse than ~1000 phases, so it is safe to walk through them all until we get to the one of interest.  If the point of interest is in the final shell, we need to work out the probability.  For the small input there are only a handful of scenarios, you can probably work them out by hand and just hard code them, but for the large input it can get a bit more complicated.

If the final shell is full, obviously the answer is 1, otherwise we have 4n locations to fill with k objects where the 4n locations form 2 stacks of height at most 2n.  If k is <= 2n then each of those k had a 50% chance of falling to either side and the probability of reaching a certain location of height h is the probability of getting h or more heads from k coin tosses, which is standard probability theory (although some care will be needed for the calculations in the large input, 1600 choose h can be a very large number, as is 2^1600 – if you create your pascal triangle cache pre-scaled by 2^-depth and use double precision that should be sufficient accuracy).  If k > 2n, then it is possible to fill up an entire side, after which every diamond is guaranteed to fall on the other side.  I think this means that k-2n on both sides are guaranteed and we can treat the remaining (k – 2k+4n)=(4n-k) diamonds exactly as before, but with the height of the target location being reduced by k-2n – but I can’t say that I’ve double checked that carefully for probability theory mistakes.  If that isn’t true you can instead formulate a separate DP for each problem case (rather than reusing the same pascal triangle cache over and over) and build it up for each diamond dropped.

Q3) Given a string of ‘words’ where the spaces are removed and then someone introduces errors no more than once every 5 characters, determine the best cost match to a starting scenario based off a dictionary.

A3)  First thing to realise is that while the dictionary is quite large (500k words) the words themselves have a maximum length of 10.  This means that there can be no more than 2 mistakes per word.  This problem then boils down to a fairly standard dynamic programming problem.  The problem state is what character are we up to in the input, and how many characters since the last mistake (capped at 5).  From each problem state we consider each word length (1 to 10), and each valid scenario of adding errors to that next section (no errors, 1 error after the minimum gap, 2 errors after the minimum gap with minimum gap), check if the result exists, if so produce a new potential minimum for the final state (length and characters since the last mistake).

Only question is will this run fast enough for the large input.  Large input is 4000 characters, which gives 20k states.  Each state has 10 potential word lengths, and the worst case of those word lengths with has ~9500 cases, and the exists check requires operation cost at least equal to word length.  This gives an estimate cost magnitude of 20k*900k.  Or 18billion memory ops.  Even with only 4 test cases, that is very scary time wise.  We can probably trim quite a bit by skipping any scenarios where we can’t possibly improve the target cost scenario, and that cost magnitude is a bit of an exaggeration due to taking easy approximations.  So release build, no debugging, cross fingers… (or test run time before you submit…)

However, we can probably do better.  One option which seems a likely candidate is to use a prefix tree rather than a hashset to store the original dictionary contents – that way while we are generating our potential cases, we are going to get early failure quite often.  I would estimate this probably gives at least a factor of 20 improvement in practice.  Another option is to pre-compute separate dictionaries for each single character error position, then use them as a basis for the exists check, to avoid having to generate 2 character error scenarios which are the vast majority of the cases to check during the DP.  Memory usage will be quite large in this last case, but it should also definitely give a factor 20 improvement if implemented correctly.

I originally considered doing a once off creation of every 2 character error scenario from the dictionary, but the memory requirements can easily start heading towards 50+GB which defeats the point since I don’t have a home super computer.  Oh well.

Leave a Reply

Your email address will not be published. Required fields are marked *