TCO14 R2C

Last chance to get through to round 3 or get a t-shirt, but it seems I was never really in the running for either this year.  Final ranking of 420th, with a fairly slow submission time on the first question; a faster solve probably could have gotten me into the top 300, but the trick to solving the second problem was clearly out of my reach at 3am with 45 minutes remaining.

Q1) Determine the range which, if reversed, produces the lexicographically smallest string.  If multiple ranges give equal results, prefer the smallest first index, then the smallest second index, to break ties.

A1) This question was rated at 300 points, so I thought there was going to be some real difficulty to it, and I ended up going slowly and carefully.

Pure brute force is O(N^3), which for length 2500 is too slow, but it is pretty easy to get this down to O(N^2).  Given two candidates for the lexicographically smallest result, unless the first change they make is at the same position, you can trivially choose the one whose change comes earlier.  This means that once you have identified the first index which can be improved by reversing a smaller later character into it, the question reduces to which later index is best.  Finding that first index can be done in O(N) time with a suffix-minimum scan.  But since dealing with all the possible second indexes is going to generate O(N) candidates, each with a cost of O(N) to check against the current best, you might as well just brute force the search for the first index, which is O(N^2).
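For concreteness, here is a minimal sketch of that O(N^2) approach (a reconstruction, not my actual submission; it returns the chosen range, and assumes reversing a single character is the fallback when the string is already minimal):

    import java.util.Arrays;

    public class Reversal {
        // Returns {first, last} for the reversal giving the smallest string,
        // tie-breaking on smallest first index and then smallest last index.
        static int[] bestRange(String s) {
            int n = s.length();
            // First index that some strictly smaller later character can improve.
            int first = -1;
            for (int i = 0; i < n && first < 0; i++)
                for (int j = i + 1; j < n; j++)
                    if (s.charAt(j) < s.charAt(i)) { first = i; break; }
            if (first < 0) return new int[] {0, 0}; // already minimal (assumed fallback)
            String best = null;
            int bestLast = -1;
            for (int last = first + 1; last < n; last++) {
                char[] c = s.toCharArray();
                for (int a = first, b = last; a < b; a++, b--) {
                    char t = c[a]; c[a] = c[b]; c[b] = t;
                }
                String cand = new String(c);
                // Strict '<' keeps the smallest last index among equal strings.
                if (best == null || cand.compareTo(best) < 0) {
                    best = cand;
                    bestLast = last;
                }
            }
            return new int[] {first, bestLast};
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(bestRange("dcabcab"))); // [0, 5] -> "acbacdb"
        }
    }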

There are a couple of corner cases that can come up.  If you actually construct the candidate strings with segments reversed, it is best to avoid gathering them all up and sorting them; O(N^2 log N) is pretty high and might be too slow.  However, doing an in-place comparison (like I did) requires care: the earlier of the two possible last indexes runs out of reversed area first, and past that point you need to compare the still-reversed part of the larger range against the non-reversed part after the smaller range, not accidentally against the non-reversed part before it.
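The index arithmetic that needs the care looks roughly like this, a hypothetical helper (same representation as the sketch above) deciding whether a later last index j2 beats an earlier j1 for the same first index:

    // After reversing first..j, position p holds s[first + j - p] inside the
    // range and the untouched s[p] outside it.  Once p passes j1 the shorter
    // candidate has run out of reversed area, so its plain s[p] is compared
    // against the still-reversed part of the longer range; nothing before
    // 'first' is ever consulted.
    static boolean laterEndIsSmaller(String s, int first, int j1, int j2) {
        for (int p = first; p <= j2; p++) { // beyond j2 both candidates agree
            char a = p <= j1 ? s.charAt(first + j1 - p) : s.charAt(p);
            char b = s.charAt(first + j2 - p);
            if (a != b) return b < a;
        }
        return false; // the two reversals produce identical strings
    }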

One corner case which I thought existed, but it seems does not: if the range which when reversed gives the best lexicographical string has identical characters immediately before and after it, its ranking could apparently be improved, since reversing the larger range gives the same string but with an earlier first index.  I wrote code to handle this case, both for the final result and as a tie-breaker when considering two possible last indexes.  However, it cannot happen.  Presume that it could, for example that the best lexicographical ordering comes from …ej……ge… where the j-to-g range has been selected for reversal.  This is immediately invalid, as the j-to-e range produces a better result.  So instead try …ej……ee…, where the j-to-first-e range has been selected.  Here we know that there is no letter earlier than e anywhere after that first e, because otherwise reversing out to that letter would provide a better reversal.  But that means j-to-the-first-e of the ending pair is a worse option than j-to-the-second-e, because the longer reversal puts a longer run of e’s at the front before anything higher is encountered.  Either way the assumed range contradicts its own optimality.
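Since I only convinced myself of this after the event, here is a quick randomized check of the claim (an after-the-fact verification harness, not contest code) over short strings on a 3-letter alphabet: brute force the optimal range under the problem’s tie-breaking and confirm it always starts at the first improvable index, which is exactly what an equal-neighbour extension would violate:

    import java.util.Random;

    public class CornerCaseCheck {
        public static void main(String[] args) {
            Random rng = new Random(1);
            for (int iter = 0; iter < 100000; iter++) {
                int n = 2 + rng.nextInt(7);
                char[] raw = new char[n];
                for (int i = 0; i < n; i++) raw[i] = (char) ('a' + rng.nextInt(3));
                String s = new String(raw);
                // First index improvable by a strictly smaller later character.
                int firstImp = -1;
                for (int i = 0; i < n && firstImp < 0; i++)
                    for (int j = i + 1; j < n; j++)
                        if (s.charAt(j) < s.charAt(i)) { firstImp = i; break; }
                if (firstImp < 0) continue; // already minimal, nothing to test
                // Brute force every range; scan order plus strict '<' gives the
                // smallest-string, smallest-first, smallest-last tie-breaking.
                String best = null;
                int bestFirst = -1;
                for (int i = 0; i < n; i++)
                    for (int j = i; j < n; j++) {
                        char[] c = s.toCharArray();
                        for (int a = i, b = j; a < b; a++, b--) {
                            char t = c[a]; c[a] = c[b]; c[b] = t;
                        }
                        String cand = new String(c);
                        if (best == null || cand.compareTo(best) < 0) {
                            best = cand;
                            bestFirst = i;
                        }
                    }
                if (bestFirst != firstImp)
                    throw new AssertionError("counterexample: " + s);
            }
            System.out.println("claim held on all random cases");
        }
    }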

Q2) Given a connected undirected graph specified as clumps of vertexes, where the vertexes within each clump are all connected to each other (the clumps are not disjoint; they may share vertexes, and together they connect the whole graph).  Determine the sum of the all-pairs shortest-path distances.

A2) The sum of all-pairs shortest-path distances can be calculated in O(VE) for an undirected connected graph with all edges of equal length, using a simple BFS from each vertex.  The problem is that with 2500 vertexes, and the graph potentially being fully connected, this is O(N^3), which is too slow.
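As a baseline, the O(VE) computation is just this (a sketch, with the graph already in adjacency-list form):

    import java.util.ArrayDeque;
    import java.util.Arrays;
    import java.util.List;

    public class AllPairsSum {
        // One BFS per source over an unweighted, connected, undirected graph,
        // summing the distance to every reached vertex.
        static long sumDistances(List<List<Integer>> adj) {
            int n = adj.size();
            long total = 0;
            for (int src = 0; src < n; src++) {
                int[] dist = new int[n];
                Arrays.fill(dist, -1);
                dist[src] = 0;
                ArrayDeque<Integer> queue = new ArrayDeque<>();
                queue.add(src);
                while (!queue.isEmpty()) {
                    int v = queue.poll();
                    total += dist[v];
                    for (int w : adj.get(v))
                        if (dist[w] < 0) { dist[w] = dist[v] + 1; queue.add(w); }
                }
            }
            return total / 2; // every pair was counted from both endpoints
        }

        public static void main(String[] args) {
            // Path 0-1-2: distances 1 + 1 + 2 = 4.
            List<List<Integer>> adj = List.of(List.of(1), List.of(0, 2), List.of(1));
            System.out.println(sumDistances(adj)); // 4
        }
    }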

I thought the trick here was to treat vertexes which are only in one clump specially, but on a closer look the problem statement doesn’t actually restrict how many such vertexes there are, even allowing the potentially silly case where every vertex is listed in 2 identical, completely overlapping clumps.  Even if subsets were disallowed, there would still be the case of two clumps sharing all but 2 vertexes, which is still O(N^3).

The actual solution appears to be to introduce more vertexes: one representing each clump.  Rather than connecting the members of a clump to each other, each is connected to its clump vertex.  This doesn’t change the connectedness of the graph; it only doubles the distance between everything.  Calculating the all-pairs distance sum (excluding the clump vertexes) is still a simple BFS, just with a conditional for whether to add a value based on whether you are at a fake vertex or not, and at the end you take the result and divide it by 2.  This new graph has a number of edges equal to the sum of the number of clumps each vertex is in, which by the definition of the problem is limited to double the number of vertexes.  The number of clumps is also limited to the number of vertexes, so O(VE) becomes O(N^2) with a reasonable constant.
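A sketch of that construction, assuming the clumps arrive as an int[][] with real vertexes numbered 0 to n-1 (clump c becomes the extra vertex n + c):

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class ClumpDistances {
        static long sumDistances(int n, int[][] clumps) {
            int total = n + clumps.length;
            List<List<Integer>> adj = new ArrayList<>();
            for (int i = 0; i < total; i++) adj.add(new ArrayList<>());
            // Star edges only: each vertex connects to its clump vertexes.
            for (int c = 0; c < clumps.length; c++)
                for (int v : clumps[c]) {
                    adj.get(v).add(n + c);
                    adj.get(n + c).add(v);
                }
            long sum = 0;
            for (int src = 0; src < n; src++) { // BFS from real vertexes only
                int[] dist = new int[total];
                Arrays.fill(dist, -1);
                dist[src] = 0;
                ArrayDeque<Integer> queue = new ArrayDeque<>();
                queue.add(src);
                while (!queue.isEmpty()) {
                    int v = queue.poll();
                    if (v < n) sum += dist[v]; // skip the fake clump vertexes
                    for (int w : adj.get(v))
                        if (dist[w] < 0) { dist[w] = dist[v] + 1; queue.add(w); }
                }
            }
            // Each distance is doubled by the construction and each pair was
            // counted from both endpoints, hence the division by 4.
            return sum / 4;
        }

        public static void main(String[] args) {
            // Clumps {0,1,2} and {2,3}: pair distances 1+1+1+2+2+1 = 8.
            System.out.println(sumDistances(4, new int[][] {{0, 1, 2}, {2, 3}})); // 8
        }
    }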

Q3) Given a set of ranges of indices and the maximum value in each range, determine whether there exists a permutation of the numbers 1 to n which can satisfy all the defined constraints.

A3) With up to 50 constraints, considering how every subset interacts is going to be too slow.  However, it might be possible to introduce each new constraint one at a time and check it against implied constraints calculated from previously checked pairings.  Information can only be inferred over sections which start or stop at a constraint boundary, so there can be at most 2500 implied constraints.  The running time should therefore be O(N^4), provided constraint interactions need only be considered pairwise in O(1) time each.  Point 4 below looks the most problematic for this, but it might be addressable by sorting regions by their implied maximum or similar.

There are several things to consider at a minimum (a sketch of the simplest of these checks follows the list).

    1. Two disjoint ranges cannot have the same maximum.
    2. A subset cannot have a higher maximum.
    3. A range cannot have a maximum value smaller than its width.
    4. Multiple disjoint ranges cannot all have maximum values smaller than the sum of their widths.
    5. Two ranges which overlap define a union range with the larger maximum.
    6. Two ranges which overlap with equal maximums define an intersection region with the same maximum, and the maximums of the non-overlapping parts are strictly less than that value.
    7. Two ranges which overlap with different maximums imply that the larger maximum is not in the overlap.
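To make the first few concrete, a hypothetical sketch with each constraint held as (lo, hi, max) over inclusive indices; it applies only the immediate rejections 1 to 3, since the aggregate check 4 and the inference rules 5 to 7 are where the real work lies:

    // Rejects on the immediate pairwise contradictions; inferred constraints
    // from rules 5-7 would have to be generated and fed back through these
    // same tests, which this sketch does not attempt.
    static boolean pairwiseConsistent(int[][] cons) {
        for (int[] c : cons)
            if (c[2] < c[1] - c[0] + 1) return false; // 3: max at least the width
        for (int[] p : cons)
            for (int[] q : cons) {
                if (p == q) continue;
                boolean disjoint = p[1] < q[0] || q[1] < p[0];
                if (disjoint && p[2] == q[2]) return false; // 1: one value, two homes
                boolean subset = p[0] >= q[0] && p[1] <= q[1];
                if (subset && p[2] > q[2]) return false; // 2: subset exceeding superset
            }
        return true;
    }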

But getting from there to a working solution is not obvious to me.