TC SRM 471

This SRM was cancelled due to some issues, but the questions are still there, so here I go…

Q1) Determine the largest prime less than or equal to n, which when divided by 2 (each time rounding down) D-1 times, results in D primes being generated. Limits: n is up to 10million and D is up to 10.

A1) Generating all primes less than or equal to n is easy, the only thing to be careful is that if your programming language of choice stores bool in 4 bytes, you’ll exceed the memory limit.  So use byte, or bitarray, if needed.  Once primes are acquired, start from n and keep walking down, when you find a prime check the dividing sequence, if it works return the current prime you started with.  If you get to less than 2^D, break and return -1.

Q2) Given a directed graph with edges which each have a positive length if they exist (but the reverse edge may not have the same length as its matching forward edge), determine the shortest path (if one exists) which gets from vertex 0 to vertex n-1 where no sum of consecutive edges is 0 mod 13. Limits: n is up to 25.

A2) This is a dp over the graph, shortest path to get to vertex i where the current modulus for the sum of each of the last k edges is represented as a bitset.  From each value you generate more by considering each edge away from vertex i which itself does not have a length of 0 mod 13.  The new bitset for the destination is the current bitset rotated by the edge length mod 13, unioned with the bit indicating the edge length mod 13.  If the rotation causes the 0th bit to be set, do not add the new destination to the set. If we do add it,we minimize length between the new length total of current state + edge length, and any existing path found to the new vertex with the new bitset.

Because before the rotate bit 0 is not set, and after the rotate bit 0 is now in bit edge length mod 13), every step adds a new bit.  This means that the depth of search is at most 13.  It also means that the appropriate dp order is by number of set bits. By using the dp, we ensure we ensure that rather than worst case ~24^13 it is at most 2^13*24, which is much more reasonable.  Once the dp is generated, we walk all bitsets for the destination node and choose the smallest.

As an aside, since all cases where the 0th bit is set are discarded, you can alternatively use a 12 bit bitset, but you get added shifts since the rotate operation needs to occur on a 13bit representation.

Q3) Given up to 50 line segments which have integer dx/dy/dz, what is the maximum distance (squared so answer is an integer)  between the start and end you can achieve by joining them end to end such that the created line does not self-intersect.

A3) I managed to get some ideas for this but wasn’t sure if I was on the right track, so I took a look at some solutions.  Probably the most important bit which I did get was that the self-intersect restriction is irrelevant, any self-intersecting arrangement can be made non-self intersecting and at the same time larger, by fairly simple rearrangement of the connections.  The other important point is that the order of the edges is irrelevant, it only matters which end you choose to connect on to the current line.  (Still with up to 2^50 combinations, you don’t want to try them all…) Most of the solutions I saw were along the lines of ‘choose a random direction, and choose to connect each edge such that it has a positive or 0 dot product with that direction.’  Repeat while clock says less than 1.8 seconds past, and hope the best one found is the winner.  The solution in my head before I started looking at solutions was similar, only rather than random directions, it was try each edge direction, I gather that this would have not been sufficient as some other solutions were like that,but then threw in extra random trials, or tried directions perpendicular to pairs of edges, and added some random trials as well.  I found 2 solutions which did not use a random number generator, but I haven’t actually copy pasted them and run them through system check to ensure they are right…

Option 1, Firstly, join all parallel edge to make them into a single edge with length equal to sum of the components.  Then for every pair of edges, calculate the cross product.  Use dot product to select order in the direction of the cross product.  However, some edges will have a 0 dot product.  Try to maximise these in 4 different directions (using dot product), each of the positive and negative cross products of the incoming cross product and one of the first two edges.  One of these maximisation attempts will be largest, return it.

Option 2, Consider each pair of edges, and calculate the cross product.  Perform straight dot product optimisation like before, but don’t bother trying to optimise the 0 dot product cases.  For edges, also try the ‘jittering’ the endpoints +/- 1 in each direction, to create 26 variants, and repeat considering each pair of edges including these jittered versions.  This jittering tilts the created cross products, sort of simulating the different secondary optimisations which were applied to the 0 dot product cases.  Seems kind of dodgy to me, but then so does option 1.

The only bit that I didn’t really get was using cross products to choose the direction of primary maximisation.  The only thing I can see for sure is that it results in a lot more directions being tried. But you would think that trying each pair-wise sum would achieve the same goal.  Not sure on this.  With the SRM being cancelled, I guess there probably isn’t an official write up either for me to check on.  If every line segment was in the same plane, the cross product approach is completely useless since it actually only results in 1 direction, and every line has a 0 dot product with that direction.

TC SRM 472

Q1) 2 people take turns to remove powers of 4 from a number which starts with n. Where each player tries to be the last person to remove a number determine who succeeds.  n is up to 1 billion.

A1) This problem seems to be one of ‘solve by inspection’. Simulate the first few thousand values of n using dp and you’ll quickly see that the results form a regular pattern.  2 out of every 5 return player 2, the rest return player 1.  Mathematical induction can be used to prove that the pattern continues by noticing that all powers of 4 are either 1 or 4 mod 5.  In the end the code boils down to k = n mod 5; return (k == 2 || k == 0) ? “player 2” : “player 1”;

Q2) Given n cards which each have a number on the front and back such that numbers 1 to n each occur exactly once on the front and exactly once on the back, determine the number of different lists of n numbers which can be formed by reordering the cards in any order, with any face up. n is up to 50 and the answer should be returned modulo 1 billion and 7.

A2) This seemed a tricky question, and inspection of round statistics showed I was not alone in finding it hard.  The most important trick is to realise that if each card is consider as a link between the number on the front and the back, and each case of the same number is linked together, the full set of cards can be divided into cycles which each are independent of the others.  If the cycle consists of a single card, it has no options, only possible locations in the final list.  If it is longer, it both has a number of options which is dependent on the cycle length, and the number of ways that cycle can be interspersed into overall list.  The answer becomes the product of the number of ways each cycle length can be chosen from n reduced by the length of cycles already considered times the number of ways a pattern can be made in the cycle itself.

The number of ways a pattern can be made in a cycle can be calculated a number of ways, dp is one option, although it seems a little tricky.  An analytical approach is easier.  Each number on the front of the cards which are in a cycle can either be present 1 2 or 0 times.  Each 2 implies a 0, so it seems helpful to consider each case of the number of 2’s separately.  The number of ways to select which numbers are available to use with k doubles is the tricky bit, once you have that can boost that to get the full answer by multiplying by the number of orderings, which is n!/2^k if there are k 2’s.  While each 2 implies a 0, it doesn’t imply exactly where that 0 is, other than it is somewhere before the next 2…  So we need to reserve 2k spots for the 2’s and 0’s and then we need to alternate 2’s and 0’s, which gives 2 options.  So mathematically it is C(n, 2k)*2.  However in the case of k=0 there is only 1 option, so we need to get rid of the multiplication by 2. Which gives C(n, 2k)*(k==0 ? 1 : 2).

And that’s all the components, all that remains is to sum things up and multiply it all out and perform a lot of modulo operations everywhere.  (Pre-calculate the choose function using DP as usual.)

Q3) Consider a h x w rectangle of tiles where each tile is coloured with one of 4 colours.  Determine the number of ways to make every tile a different colour from its (up to 8) neighbours, while only changing at most k tiles. Limits: h and w up to 50, k up to h*w.

A3) After problem 2, I’m surprised there was even 12 correct solutions to this one.  First up there are a couple of special cases.  If height or width is 1, simple dynamic programming provides the solution.  In fact the rest of the approach used to solve the problem requires height and width greater than 2, so it we have to solve that first.

For height and width greater than 2 consider each possible set of 4 colours for the top left corner.  Playing around with the possibilities which satisfy the different colour to neighbours requirement you discover that every second row is identical or every second column is identical, or both. So assume the rows are identical, each column after the first 2 has 2 choices for patterns.  Then assume the columns are identical, each row after the first 2 has 2 choices for patterns.  This gives us 2 DP (very similar, in fact you can do a diagonal flip of the input and use the same DP for each).  In each DP you have number of ways to make the first n rows work (columns) using j changes.  Then sum all values of j from 0 to k for the all h row case, and presto you have the number of ways for column (row) alternation for a given 2×2 top left starting point.  Sum column and row together and … wait a second.  As mentioned earlier there is a case where both rows and columns alternate.  For a given 2×2 starting point, there is only one such case, but if it can be achieved in k changes or less, we’ve counted it twice.  So check that quickly and subtract it off.  Sum over all 2×2 starting points and presto you are done.  Not an easy problem, but I almost wonder if it was actually easier than question 2.

TC SRM 473

Q1) Given a sequence of step, turn left and turn right commands, which are repeated infinitely, determine whether you will stay in a constrained (but not specified) area, or drift off in to the distance.  Maximum command sequence length is 50.

A1) There are 4 scenarios.  1. You come back to where you started. 2. You end up facing the opposite way you started, and so the second run you end up back where you started. 3. You end up facing 90 degrees to where you started, in which case after 4 runs you trace a ‘square’ and end up where you started. 4. You end up facing the same direction as where you started, but not where you started, in this case you drift away.  In any case, simulate 4 runs and if you are back where you started, you don’t drift.  Case 1 will hit the same place 4 times, case 2 will come back twice, case 3 will come back once, but all 3 will be at the starting position after 4 runs.

Q2) Given n points equally distributed around the edge of a circle, and a ‘pseudo randomly’ generated subset k of those points which are marked, determine how many right angled triangles can be created using only the marked points. n is up to 1 million but k is limited to 100 thousand.  Also the input is guaranteed not to result in an answer greater than 2^64.

A2) Seems kind of tough for a second question at first glance.  First we need to identify how to create right triangles.  If n is even, this is easy, possible pair of edges corresponds to a mirror pair, and either of those mirror pair will make a right triangle.  If n is odd, no right triangles can be created… (A right angled triangle has its hypotenuse as a diameter of a circle, and 2 different circles can only have at most 2 crossing points, not 3.) However even with O(1) dictionary lookup and considering every pair of marked points to determine if either of the matching mirror locations is way too much.  So we need a solution which comes from considering the points one at a time.  The trick is in the logic we used to get rid of all the cases where n is odd.  The hypotenuse must be a diameter, so from each marked point we can determine immediately the location across the diameter and verify that it is a marked point.  If it is a marked point we then have a second trick, every single other marked point other than the 2 consumed will make a valid right triangle with the diameter.  So for every marked point in one half of the circle, if there exists the matching point then add k-2 to the sum.  Not so hard after all… (Although implementing the pseudo randomness itself had risks…)

Q3) Given a n x m board and k varieties of pieces each with its own count, determine how many ways exist to layout the pieces such that every row and column has at most 1 variety. n and m have a maximum of 30 and k has a maximum of 10. Return result modulo 1 billion and 9.

A3) Hmm, as usual Q3 seems beyond me. Even just trying to generate canonical arrangements which can then be boosted to the actual answer applying combinatorial multipliers, seems difficult.  After thinking about it for a while I gave up.   Turns out canonical arrangement approach is no good, you need to boost as you go along.

From a solution.  Define dynamic program over number of ways to populate the board using i rows and j columns to contain the first l varieties.  This is equal to the number of ways to populate the board with l-1 varieties in i-di rows and j-dj columns, times n-i-di choose di, times m-j-dj choose dj times the number of ways to arrange count of l in di rows and dj columns.  That second arrangement can be done using dynamic programming up front, or memotized, since it will be repeated a lot.  Firstly l has to fit into the area of di*dj, and must be greater than or equal to both di and dj in order for there to be an arrangement. I would have thought x*y choose c would have been the answer at this point but I would have been wrong, as it doesn’t consider the case where a row or column is left entirely empty. So we subtract off the number of ways to arrange c in to each possible scenario of less rows or columns, each multiplied by the number of ways to arrange the rows or columns into the full x and y size.

Not as bad as I first thought, but still pretty tricky.

TC SRM 474

Q1) Given a set of unit steps in N dimensional space, each in a specific direction of a specific dimension, determine if the path ever self intersects.  N is up to 1 billion and there are up to 50 steps.

A1) As I was reading this question I was thinking, this is kind of trivial, even for question 1.  Then I saw the 1 billion.  However, that only shocked me for a moment, since 50 steps means only at most 50 dimensions are actually moved along…

So anyway, maintain a dictionary of locations.  First walk the input to see which dimensions are actually used, then map those to the first k integers.  Then a location is simply an array of length k, and array equality can be used for the dictionary. (Or don’t bother with the dictionary, only 50 entries…).  Perform the steps, if any location equals any previous location return invalid otherwise return valid.

Q2) How many ways can a connected graph (with edge lengths) be reduced to a tree such that the shortest distance from all nodes to 0 is the same for both the graph and the tree. Graph vertex count is up to 50.  Return result modulo 1 billion and 7.

A2) So first step, calculate the shortest path for all nodes to node 0.  Choose your favourite shortest path algorithm… Then we sort the nodes from closest to furthest.  Then we consider each node in that order, creating a running product (with modulo after each multiplication and using long long numbers to avoid overflow) of the number of options each new node has to connect in to the previous nodes in a ways which equals the previously determined shortest path.  Presto, done.

Q3) Given a tree with n nodes, and a connected graph also with n nodes, how many ways can the tree be overlaid on the connected graph such that every edge in the tree matches an edge in the graph?  n is at most 14. Return result modulo 1 billion and 7.

A3) I swear I’ve seen this problem before… no idea where though.  I also thought maybe I would be able to solve it, but I eventually gave up and looked at an answer.

Step 1, determine a dfs order for the tree.  Then considering each possible starting position in the graph for node 0 of the tree, memotize over the dfs options in the graph which match the tree.  Memotization is an array of dfs position (source destination pair, but since its a tree, can reduce that to an edge or null for leaves – since the answer for a properly matched leaf is always 1), the set of nodes in the graph unused, and the node in the graph which the source is being matched to.  At each recursion, unless it is a leaf position in which case the answer is 1 as long as there are no available graph nodes, sum over each partition of the available graph nodes, where one of those partitions is connected to the node currently being matched to the source. Each element of the sum is the product of the number of ways the first partition works with a given mapping for destination times how many times the rest of the partition works for the next edge of the dfs from the currently being matched node.

The above on its own is going to be slow, too slow.  The inner loops for the top level recursion is 14*2^14 and the size of the number of potential positions where the loop has to be run is also quite high.  A major speed boost is to only consider partitions which have exactly the right number of nodes to satisfy the dfs branch being considered.  The number of nodes down a given dfs branch can be precalculated, and a popcount lookup table for every bitset will make the comparison fast.  Alternatively rather than walking every subset of the bitset and checking its popcount, the patterned combination walk bit tricks function can be used.  This reduces the inner loop to a worst case of 14*Choose(14,7), but far more importantly makes use of the memotization much sparser rather than being quite dense.

Hopefully next time I see this question I can remember all of that…

TC SRM 475

Another past SRM.

Q1) Given a list of n spots, where each spot is one of 3 types, and k markers, which are initially randomly placed such that no 2 markers are on the same spot, follow the rules where markers in the left most spot are moved right, markers in the right 2 most spots are moved left, otherwise markers on type 0 are moved left, type 1 move right, and type 2 moves left on the first move, but otherwise moves back to where they came from in the previous turn.  At the end of each turn the right most spot is removed from play, and any spot with 2 markers is cleared.  Determine the ‘expectation value’ for the number of markers left when there is only 2 spots left. n is up to 17 as is k.

A1) This is a question of details.  The important realisation is that determining the expectation value can be done by performing every possible starting scenario.  Worst case is n of 17 and k of 9 which has 24thousand possible starting points.  Each simulation is O(n^2).  After that it is a matter of implementing the simulation correctly.  Type 2 moves would be tricky, but the requirement to remove all cases with 2 markers on the same spot means a simple bool array ‘camefromright’ (initialised to false) is sufficient to make that work.

Q2) You start with 1 pair of an animal which is 1 year old. At age 2 the animals can breed to produce a new pair which will be age 0.  All pairs which are age 2 or higher will breed.  In some years after breeding half of the entire population of pairs will ‘leave’ never to be considered in the problem again.  If the number of pairs in the population is odd, the number of pairs which leave is half rounding upwards.  The specific years are provided in a list which has maximum length 50.  How many pairs of animals will there be after breeding (and possibly leaving) for n years?  n is up to 10 million so the numbers may be very large and the results should be returned modulo 1 billion and 9.

A2) This is the Fibonacci sequence with a twist.  The leaving years are the tricky part.  With a maximum of 10 million, you don’t even have to implement accelerated Fibonacci using matrix powering.  Leaving years aren’t too tricky – they remove all of the old population and half of the 1 year olds, leaving all the babies.  The halving can be done modulo 1 billion and 9 by multiplying by 500 million and 5.  The problem is the case where the number of pairs is odd.  Because of the modulo 1 billion and 9 (which is odd) you don’t know which numbers are odd or even once they overflow.  So the real trick to the problem is knowing when something is odd.  First thought is we can perform the same set of actions, also modulo 2.  However there is no inverse of a number which is not co-prime with the modulus.  So when we divide by 2, our modulo 2 is broken.

So it would seem we are at an impasse.  I certainly would have been.  However there is a trick (which I learnt from reading solutions :P), if you use modulo 4, you can perform a literal divide by 2 and only the high bit of the modulo will be broken. The low bit will still be accurate.  With up to 50 leaving years, modulo 2^51 would be sufficient to allow us 50 divide by 2’s and still have an accurate low bit.  Nice trick.

You can take that trick one step further, in programming languages where integer overflow is defined to wrap (which is not necessarily true in c++), addition and subtraction are implicitly modulo 2^(the number of bits in the number) so you can skip the whole modulo thing entirely and just use 64bit numbers for working out whether the numbers are even or odd.

Q3) There are up to 50 people, who answer up to 50 questions.  Each question has a point value, and some questions have been marked and others have not.  For questions which have been marked we know whether each person is correct or not, otherwise we only know if they answered the question or not.  We are going to select k people from the top n scores once we finish marking.  Equal scores are broken by order of person in the input.  Looking forward from this intermediate stage, how many different possible groups of k people are there?

A3) I didn’t get far on this question before giving up and reading solutions.  My favourite approach is to start off by caching choose function for up to 50×50.  Then resolve the problem by rephrasing it as ‘how many different groups of k people are there where person ‘i’ has the lowest maximum possible score’, and summing for each possible ‘i’.  These sub problems can be resolved by considering that people with a higher minimum score than the lowest maximum possible score ‘must’ be in the top n scores, and people with lower maximum possible scores cannot be (due to the requirement that ‘i’ is the lowest maximum possible score), the rest are entirely optional, we can toggle them in and out in any combination.  Then so long as we don’t have more than n-1 ‘must be in’ and we don’t have less than n-1 must and optionals, we can consider each partition between musts and optionals to make the group of k up until we choose so many optionals than the top n criterion would be overflown.  This is the sum of the product of the 2 choose functions, 1 for choosing from the optionals, and the other to choose from the ‘musts’.

In case the above is not convincing enough, a few pointers to make it clearer why it works.  Firstly ‘i’ being the lowest maximum possible score ensures there is no overlap between considered scenarios.  Then the fact that optionals can be independently toggled in and out means we aren’t counting cases which are impossible in the simple product of choose functions.

TC SRM 476

So I didn’t go in this single round match, but as part of my lead up to my being knocked out in round 4 I am going to write up 1 or maybe 2 past SRM’s a day as a bit of practice.

Q1) Given fixed and variable cost for each element where variable cost is proportional to the number of elements selected, what is the maximum number of elements which can be selected with a limit to maximum total cost. Up to 50 elements to select from.

A1) I was a bit disappointed at the time it took me to work out the solution to this, it was at least a minute… With only up to 50 elements to select from there are only a maximum of 50 answers, so you can check to see whether each answer is possible.  Can even binary search to find the switch point between possible and not-possible if you were wanted extra efficiency… which is completely unnecessary.  To check whether an answer is possible, calculate the total cost for each element assuming that there are n elements selected, and sum the smallest n costs to check if that is less than the maximum cost.

Q2) Given a directed graph connecting n nodes, what is the probability you can visit all nodes connected from node 0, starting at node 0, if when you arrive at any given node a random subset of size k of the available exiting edges is available for selection and you only visit nodes which are connected to from node 0 and never visit a node twice. Assume you know the entire graph and when presented with multiple options you select the one which will maximise your chance of success.  n is up to 36, k is up to n, and a textual space separated list of the outgoing destinations for each node is up to 36 characters long.

A2) I had to look up the answer to this because I was stumped.  However it turned out that the solution was one that I had considered and thrown away, because I mistakenly read the condition for the list of destinations from each node as being up to 36 entries, not up to 36 characters.  This was the killer point, because 36 characters can only hold at most 15 distinct numbers space separated when 1 is the minimum number.  With only at most 16 destinations  2^15 * 16 is an easily manageable state space, and another factor of 15 on top of that is an easily manageable running time.  What I am describing is a memotized recursion calculating the probability of success given having visited some of the required nodes, and currently being on a specific node.  Using the probability returned from recursion for each possible destination choice, you choose the highest probability.  However this has to be modulated by the chance of being able to select that choice, add in the second highest probability modulated by the chance of not being able to select the first choice, the third by chance of not the first 2, etc, until all choices are accounted for or probability of not being able to choose from the top n is 0.  Chances can be calculated using basic combinatorics… and assuming some caching up front, should be O(1).

Q3) Given a bi-directed graph (each directed edge always has its opposite edge present) with at most 1 cycle (when considered as an undirected graph) where each edge has a non-zero capacity, determine the minimum total capacity increase to ensure that a capacity n flow exists to node 0, regardless of the arrangement of sources (for total of n flow) feeding in to nodes other than 0. Maximum number of vertexes is 50, n is up to 100000. Maximum number of edges is also 100 (50 pairs).

A3) I actually think I stood a better chance with this question than question 2.  I am pretty certain that the arrangement of sources can be ignored, we just have to ensure the graph can handle a flow of n from any single node to node 0.  At that point I would think I would implement a maximum flow algorithm, then find ‘least number of constrictor’ paths and augment them until the number of constrictors increases, then re find.  Of course you can bail out as soon as you reach your target.  Least number of constrictor paths is like a shortest path search where edges with spare capacity have weight 0 and those that don’t have weight 1.  The low number of edges makes this fast.  Also since there is only at most 1 cycle there is a maximum of 2 paths between any 2 nodes, so implementing a proper maximum flow algorithm is overkill.  The 2 paths may have common sections, but other than that limitation, the maximum flow is the sum of the maximum flow for the  2 arcs of the cycle.  Even the least constrictor thing can probably be optimised due to the limit of a single cycle.  You can even reused the modified graph as you move on to each other potential starting node, many of them will then pass straight after the maximum flow test.  I think running time is worst case of about O(V * E^2) in an ideal implementation.

TCO10 R3

Well, looks like I am through… was a very strange round.  Apparently several people got through with no successful solutions.  Only 174 people had positive scores!

119th – although my code for Q1 seems to succeed by fluke rather than me having correctly solved the problem in my head…  Q2 on the other hand I thought my solution was dodgy, but then it looks the same as passing solutions, so I am not yet sure why it failed… (probably a ‘thinko’ which just happened to pass all the provided test cases… …)

3rd problem was evil.  I’ll leave the write ups until after I’ve had some sleep…

Edit: Some sleep had… probably not enough… but here we go with the rest.

If I hadn’t of broken Q2 I was looking at a top 40 finish, which is enough to make me think maybe I can make it to round 5 yet….

Q1) Given a number, return the last number less than or equal to that number which is removed while implementing the standard Sieve of Eratosthenes.

A1) So its morning now and I remember some more of my thought processes that happened at 2am, that I had already forgotten by 4am…  It was less by fluke that I solved this after all, just insufficient consideration of the ways that I could have screwed up my implementation using over-optimisation…

In uni there was an assignment to implement fast prime listing, and I got a bit competitive micro-optimising Sieve of Eratosthenes to shave precious seconds off of generating all prime numbers less than 2billion.  So I had plenty of experience to make solving this problem easy.  I pretty quickly wrote down the 2 sentence solution in a comment – find the largest prime p less than or equal to the square root of n, find the largest multiple of that prime less than or equal to n which is co-prime with all primes less than p.  First part can be implemented with … the Sieve of Eratosthenes … then the second part can be done by dividing n by the largest prime to get the largest multiple and stepping down through each possible multiple while the multiple is not co-prime with all smaller primes.  This is what I implemented and had work.

The reason why I was so confused, was immediately after the contest (which had had a lot of challenges succeed) the test case of n=8 came up in discussion. My sleep-deprived brain couldn’t work out why my solution worked immediately and then upon realising my stupidity at thinking largest prime less than or equal to square root of 8 was 3 (?!?), was worried that there was some case where the second largest prime less than the square root would be involved.  Again this was 4am stupidity as obviously the largest prime less than the square root, squared, will be among the last to be removed.

However it is probably worth discussing the case of n=8.  This is a special case because the answer is not the product of 2 prime numbers, it is instead a prime number cubed, 8 itself.  I suspect that for all n > 8 the answer is the product of 2 prime numbers, and over-optimisation could lead to making this assumption as part of your answer, resulting in 8 being wrongly returned as 6.

Q2) Given a set of starting points, and associated destinations, a travel speed of 1 distance unit per time unit and instantaneous but not simultaneous teleportation of any one person to another, what is the minimum time for all people to arrive at their destinations if there is 1 person at each starting point, at time 0.

A2) I was quite happy with myself for this question.  I solved it.  Unfortunately my solution in my head didn’t make it into the code correctly and hence I failed.  Not really surprising it was 3am and solution in my head was so foggy I could hardly make it out.

First observation is that there is no point teleporting at any time other than t=0.  This observation is pretty easy to self-justify, but a formal proof would probably take more than a couple of lines.

Second observation is that no one can teleport to the location the first person to teleport, teleports from.  This is trivial stuff, but it leads us to the 3rd observation, after one person teleports the remaining people can teleport however they want, any rearrangement is possible, so long as they don’t want to go where the first person teleported from.  I made this observation which is the key to solving the problem, but I failed to fully understand why it was true and that was probably my undoing.  The reason this is true is because after the first person teleports there are 2 people in the one location. This 2 people in one location scenario allows for trading, any person can switch locations with one of those 2 people (specifically not the first teleporter since that person is where they want to be).  After sufficient swaps, everyone will end up where they want to be.

So now for implementation. There are 2 cases (which is what I failed on, my solution tried to merge both cases together and inadvertently allowed the return of answers which are smaller than possible).  Case 1, no one teleports, everyone goes straight to their destination.  Determine the latest arrival, this is the benchmark.  Case 2, one person teleports first.  For each possible first teleporter consider the latest earliest arrival if no one is allowed to travel to their destination from the first teleporters starting location but can travel from any other starting location.  This *includes* the first teleporter.  That specifically was my mistake – I thought that by allowing the ‘first teleporter’ to not teleport I was covering case 1 at the same time as implementing case 2, but I wasn’t forcing *everyone* to not teleport, so I was ending up with erroneous simulations producing the possibility of wrong answer (although not in the provided sample test cases).  If any scenario in case 2 improved upon the benchmark, update it with the best and return it as the result.

Q3) Modulo 1 billion and 9 (large prime), how many unique alphanumeric passwords are there of length N which have at least D digits, U upper case letters and L lower case letters? All of N, D, U and L may be up to 200000.

A3) My best effort at solving this had a running time in 10’s of billions of memory lookups (using an inclusion/exclusion principle approach), so I gave up and looked at how other people solved it.

First get the trivial out of the way… If D+U+L > N return 0.

Then also obvious, generate tables of powers of 10, 26, factorials and inverse factorials mod P.  We are going to be doing a lot of choose function calls, so the factorial/inverse factorial cache will be valuable…

Perform a summation over each possible number of digits d = N-U-L down to D.

For each of these the number of passwords is N choose d times 10 to the d times 26 to the (N-d) times the sum of N-d choose u for u=N-d-L down to U.

The problem is the inner summation is too expensive (just like my inclusion/exclusion approach).  The trick is to relate one inner summation to the next as d decreases, so we don’t have to recompute it fully for each outer loop.  The other part to this trick is to realise the only reason we can do this easily is because the size of the set of lower case letters is the same as the size of the set of upper case letters (which would not have worked for my inclusion/exclusion approach).  As d decreases the inner sum gets one extra new term and the size of the choice goes up by one.  This can be thought of in terms of Pascal’s triangle, which leads to the result that the new sum is double the old one, plus the immediate outside neighbours of the previous summation.  This can be done in O(1) time, allowing for efficient solution.

This pattern, that the sum of a part of a row of Pascal’s triangle can be used to efficiently generate the sum of a similar part of the next row is something I haven’t seen before (or at least not as far as I can remember), looks useful.

TCO10 R2

So I get a t-shirt this year 🙂  not a google one, but a t-shirt none-the-less.

Round 2 was a very strange round, I thought I was slow on Q1, but apparently I was unusually fast on Q2 – which is very much not how I usually perform in these competitions, even less so at 2am start times…

As far as I can tell out of 8 Aussies only 2 of us got t-shirts and are through to round 3.  In fact despite it not being a google code competition, and hence John Dethridge being eligable to compete, I was the highest ranked Aussie at 133rd place.  I think that is my personal second best placing ever in a coding competition. (Although I guess it doesn’t really count until its a placing which gets me eliminated…)  My best ever I believe was 115th or so in one of the first google code jams.  One of the ones where 100 people were going to be flown to the US to compete on-site…

Q3 had a decent number of submissions, but they did not include me.  Almost all of those submissions failed though, so I have to suspect there is something I am missing about this question (even if I did only work out how to solve it at the start of intermission – I hadn’t been trying that hard since I felt relatively secure that I would get through given my current score and with only 30 minutes left I felt my chances of writing up a solution were negligible… and on second thoughts I am not sure I actually have solved it…).  1 of the 5 people to solve Q3 didn’t even get through, because both their first 2 answers failed.

I might as well write up some analysis before I get 3 hours sleep…

Q1) Given a set of places and multilane roads which connect them (each multilane road has the same number of lanes in each direction) and a single snow plow which starts at place 0 and takes 1 minute to traverse a single lane between any 2 places.  What is the minimum time to clear all the snow and return to place 0?

A1) I wasted a couple of minutes convincing myself that my very easy solution was correct.

Step 1, verify the graph is connected to all places which have roads – I used breadth first search from place 0 and then checked the two ends of every road were at places reachable from place 0.If not reachable return -1.

Step 2, sum the number of lanes, return that value.  This step was the one which bothered me, but I eventually convinced myself that since depth first search visits every edge exactly once, and multilane highways can always be processed with all but 1 lane on the way in, and the final lane on the way back, that it is indeed this trivial.

Q2) Given a number less than or equal to 100 million, find the smallest number equal to or greater than that number, which is the sum of 2 numbers made up entirely of odd digits in base 10.

A2) The key to this question is to recognise that the number of numbers which consist of only odd digits less than 100 million is only half a million.  100 million itself can be expressed as 1 + 99999999, so there is no need for anything more than 8 digits.

So, generate all odd digit only numbers up to and including 99999999, in order.  Then starting i and j equal to the first and last positions in the list, decrement j while the sum of the two positions is still greater or equal to the input.  Do not go past it.  Record that sum. Increment i and start repeating the j decrement again.  When you can’t decrement j any further, compare sum with best so far, if better, update.  Keep repeating this until i is greater than j.

There is a much more efficient method, where the solution can be found in time order of the number of digits in n, rather than a time order of n/log n, but taking the simpler to reason/code approach was probably the only reason I scored so well.  If they had of wanted to make the question harder they could have simply increased the limit from 100million to 1 billion.  This would have caused memory usage of the simpler implementation to jump, I believe past the memory limit.

Q3) Given a ‘large’ grid of squares, and up to 40 of those squares which ‘interesting’ determine the minimum number of grid splits required to get each square on its own.  A grid split is dividing one section of grid in to 2 smaller sections, either along a horizontal or vertical line.  And ‘on its own’ means no other squares, interesting or otherwise, may be in its section.

A3) I am still not sure why a greedy approach of selecting the edge which has the most number of interesting squares touching it for each of the current set of sections and choosing to divide on it might fail (which is where I spent my first 25 minutes of thought), but a memotization or dynamic programming approach is going to be the right approach…

The problem with DP is it does appear it will run too slow.  The worst case of the number of potential sub problems is 80x80x80x80 and each of those cases will require processing time, with up to 160 different children for each of the highest level problems. 2 seconds processing time is going to be pushing it, and memory usage is going to be close to the limit.

Apparently some very good coding in c++ means the direct DP approach can run in time and in memory.  And after hearing one of the other coders say 40^5 as their execution time, I think I may have worked out the trick.  If you sort special squares along an axis, there are places where you get 2 potential edge splits in a row, where there are no special squares inside.  I believe the optimisation is to realise that if you split one of those edges, you can split both and treat it as a single operation, but with a count of 2 rather than 1.  This makes it 41x41x41x41 worst case, with 82 potential children at the highest (every?) level.  I’m not 100% on this optimisation yet but I’m pretty sure its the one.

Actually I have subsequently convinced myself that this optimisation is fine, and the same logic allows us to immediately trim off any outermost edge split locations before we start the dp.  This drops the dp table to 39x39x39x39 worst case with 78 potential children at every level (not just highest, but there could be some potential to speed it up with a binary search to find one which is in the current location and walk outwards, other methods seem to have a higher overhead than the time they save…)

TCO10 R1

So TCO10 round 1 finally came along, of course I’ve been sick all week and TCO starts their competitions at 2am, so I wasn’t feeling in the best of shape for getting through in  the top 850.

But I did, about 750th, entirely due to a successful challenge during the challenge phase.  Been a long time since I’ve done one of those, and its the first time it has made a real difference.  I only got the first question out successfully, I realised I had been an idiot for about an hour with 2 minutes left on the clock to do the second question, and I made some stupid mistakes in my implementation, which I submitted without testing knowing my time on the first question was likely insufficient to get through.  Interestingly no one challenged it despite the fact it failed even the most basic tests!

Its 4am, but I am a little energised from having snuck through, so I’ll write up the first 2 questions.  Question 3 will have to wait – I haven’t really looked at it. (Although at first glance I think I would have made better progress on it than Q2 – although I probably would not have solved it).

Q1) Given 2 strings of equal length, find the smallest string which you can both make them equal to in the minimum number of moves.  A move is changing one letter in either string to either the letter above or below (where z can wrap to a and vice versa).  Minimum number of moves has priority over ‘smallest’ string. Smallest being lexicographically earliest, alphabetical sorting.

A1) The examples pretty much covered how to do this, although I imagine that skipped the one tricky corner case.  In any case, the solution is to check each letter in turn between the 2 strings.  If the absolute difference ignoring wrapping is less than 13, add the smallest of the letters to the result, otherwise add the letter a.  I was very concerned that at the 2am start I might have gotten the less than 13 condition wrong, implementing an off by one.  A simple mistake would have been to used less than or equal to.  If the non-wrap distance is 13, the wrap distance is also 13, but you have to remember that wrapping always takes you through a, which results in a lexicographically smaller answer.

Q2) Given a computer with 2 registers both initialised to 1, and 2 instructions, one which sets register 2 (Y) to the sum of the 2 registers, and the other which sets register 1 (X) to the sum of the two registers, determine the shortest program set register 1 to the value ‘R’, where ‘R’ is between 1 and 1 million inclusive.  If there is more than one program, return the program which is lexicographically smallest, where the second instruction as described above is called ‘X’ and the first one is called ‘Y’. (Just to confuse, the instructions have the same names as the registers!)

A2) I felt pretty stupid at 2minutes to go when I realised that the approach I had discounted immediately, was the answer.  That approach was trying each combination of register 2 with register 1 equal to R and where register 2 is less than or equal to R.

Instead I recognised the similarities with Fibonacci numbers, and wasted time thinking the golden ratio might be involved in the solution.  I tried breadth first generating under the assumption that the answer string would be short, but I soon proved that it would not always be.  Despite that I somehow went back to golden ratio thinking…

What I should have realised was the association with GCD, a combination of registers can only be reached if their GCD is 1.  But more importantly, I should have realised that for a given pair of X and Y, there is only one X’, Y’ pair which can produce it. And following this back to X=1, Y=1 (if it does indeed reach that) can be done very quickly (since it is Euclid’s algorithm).  So running it 1million times is acceptable. (This is what I realised with 2 minutes to go…)

The subtlety is the requirement to return the program.  If you try and build up the program for every execution of Euclid’s algorithm you will find making the generated strings will consume more cpu time than Euclid’s algorithm…  So the trick is to only generate the strings if needed.  Solving the problem in 2 phases is one approach, executing the second phase only for the cases which phase 1 indicated were smallest.  Another is assuming that the shortest answer will have X’ and Y’ near each other, iterating in an order which does that first and then capping the Euclid algorithm recursion to whatever your best previously seen depth is.  This doesn’t actually seem a guaranteed success, but it did seem to be good enough.

The best answer I saw calculated the programs as a single phase, but expressed them as a run length encoding, allowing for the division acceleration of GCD to be maintained.  Better yet the run length encoding was a list which when compared to another list, actually maintained the lexicographical sorting of the non-encoded string, so the only point at which a large string might be constructed, was creating the final answer.

Finally if you were using c++ and 2 pre-allocated 1meg char arrays, it appears you can run in under 2 seconds cpu time even if you don’t use division acceleration. Sneaky…

Q3) Given a fully connected directed cost graph, determine maximum profit from tours of the graph which all have the same income (regardless of where the tour goes), all start and end at vertex 0, but cannot visit the same non-zero vertex as each other.

A3) I initially suspected something like enumerating subsets of the destinations, determining cheapest tour for each, and sticking them in a maxflow graph in a way which ensures no 2 which overlap destination wise can be selected, then running max flow graph algorithm.  Suspect the creating of the graph is the bit which I would get stuck on, not enough practice.

… Apparently I couldn’t sleep until I solved this one, at least I didn’t feel the need to come and write up the solution straight away.

Create a max-flow/min-cost graph as follows. For every vertex in the original graph, create 2 vertexes, destination enter, and destination exit.  Between each pair (exception for vertex 0) create an edge with no cost and capacity 1.  For each edge in the original graph, add an edge between destination exit for the original edge start, and destination enter for the original edge end, with cost equal to original cost and a positive integer capacity (might as well make it 1).  Connect vertex 0 enter to the sink with infinite capacity and no cost.  Connect vertex 0 exit to the source with no cost and capacity C.

Run max-flow/min-cost for each C equal 0 to number of vertexes -1.  Return the best of flow times income – cost, out of each simulation.

I was happy I managed to solve this without looking at any solutions (at 5am no less), I think that is the first time I’ve reasoned through a max-flow/min-cost problem from scratch.  Interestingly I thought I had implemented a max-flow/min-cost in TMD.Algo, but it appears not.  I really should get to that!

GCJ10 Round 3

So round 3 was last weekend, and no exceptional circumstances were declared so just the top 500 went through (which didn’t include me).

With only the top 25 going onwards, the questions were always going to be tough, and for the 4 questions, getting all the small inputs and 2 of the large inputs in a decent time was sufficient to get through.

Only 370 people out of the top 500 got a positive score, and having an initial look through the questions I doubt I would have done very well – maybe a couple of small inputs out.  One of the 2 remaining Australians got 66th place, quite impressive.

On to the questions I guess, although I’m going to have to refer to the post contest analysis this time for sure!

Q1) Given a sequence of numbers which are at most D digits in size, determine whether the sequence is predictable if it is governed by S(n) = A*S(n-1)+B mod P and if so return the next number.  P is prime and less than 10^D, A and B are non-negative.

A1) The small input D was at most 4, meaning that a brute force on P and A, solving for B, verifying the rest of the numbers fit, and determining whether the set of next numbers was unique, is easily done.  That would have been easy.  The large input however D is at most 6, making a brute force of P and A impossible.  What we need to do is consider each P and the set of numbers to determine both A and B simultaneously.  If we have 3 numbers, we have 2 simultaneous linear equations, Y=A*X+B mod P and Z=A*Y+B mod P (if we don’t have 3 numbers, the sequence is not predictable… for every A there exists a B which gives the result).  If we subtract these equations we get Y-Z = A*(X-Y) mod P.  and so long as X-Y is non-zero since P is prime, this equation has a unique solution which we can solve using extended GCD (which is in TMD.Algo).  This still leaves a few special cases.  Specifically the X=Y case.  If X=Y then Y will also equal Z, due to symmetry, regardless of A and B and P.  So any case of repeated numbers is predictable. This includes the 2 number case, despite what I said just before…

Looking at the contest analysis seems I was on the money here. So, worst case I would have come 250th… lets see if I can do better…

Q2) Given up to 100 unique integer lengths of fencing, determine the minimum number of fencing pieces required to create a fence with a length of exactly L.  L is between 10^10 and 10^18.

A2) For the small input the maximum length of each piece of fencing was 100.  For the large that jumped to 100000.  I think that a key point to this question would have to be that L has a very large minimum value.   For the small input I would posit that it can be solved using dynamic programming on the lengths of fence to create lengths less than 10000, then for each of positions up to 10k prior to the target L using only the largest piece, check if there exists a way to reach L using the DP table, and find the minimum amongst those options.  When the fence size jumps to 100k, my approach becomes infeasible, because the DP needs to go to 100k squared.  Going to have to check the solutions for this one.

Okay, so I would probably not have gotten the large out, although the intuitive leap does appear within my level.  Take the largest board, use it until we get just short of L (or exactly L).  Then we need to determine how to make the remainder using smaller fences, or to make the remainder + largest fence, or +2 largest fence…  We can do this by creating a graph, starting with nodes 0 to largest fence-1, and adding edges as follows.  For each fence length less than max, add an edge from each node to node + length of weight 1 if node+ length < max or to node + length -max of weight 0 otherwise.  Then given this potentially 100k node graph with, maybe 10million edges, find shortest path from 0 to remainder (This can be done in O(E) time).  The length of that shortest path + number of max length boards to reach 0 is the answer.  This only works because the shortest path will never have more than 100k weight 0 edges, and L is greater than 100k squared.

Based on my reading of the answer, I believe my small input answer is correct, so that would bring me to 21points or 187th place.

Q3) Given up to 200 locations at integer coordinates on a (very long) line each containing 1 or more items, you want to move them so that there is at most 1 at any spot.  However your only option for moving them is if there are 2 or more on a spot you can move 1 to the left and 1 to the right.

A3) Small input is up to 200 items, large input is up to 100k items.  Even the small input seems tricky here… would seem to need some kind of strategy.  Thinking about each move, it doesn’t change the centre of mass, so if we can find the centre of mass, the number of moves is minimally limited to those which spread them to being every spot filled with 1 about that centre of mass. That however may not be possible.  Movements at the centre of mass are more useful than other movements since they are the only ones which actually separate things.  The problem may also be separable into sub problems which don’t interact.  In that case the centre of mass of each sub problem seems important.  Small input I guess can be done via simulation of a strategy of centre of mass first or maybe largest first.  I probably would have just tried one to see if it solved it, since you get answers for the small input immediately.  Large input however requires better than straight simulation…  Time to check the answers again…

Well, turns out strategy isn’t needed for the small input, so I would have solved it ‘by accident’ – the actual series of moves is irrelevant, the final configuration is always the same and takes the same number of moves to get to regardless of strategy.  Apparently this is a ‘famous theorem’ which probably means there are a few thousand people in the world who know it 😛  With this observation (and one other leap I would have been unlikely to get) the problem becomes quite simple.  Simply construct the final answer and from there determine how many moves were required to get to it.  Constructing the final answer can be easily done by simply building up the input.  Adding each stack one item at a time, it either goes into an empty spot, or it causes the current end stack to split in 2.  If you keep a stack of segments, the possible scenarios are, top 2 segments join, top segment splits or top segment splits and the bottom part joins with the one below it.  Each operation is therefore O(1), giving an O(n) simulation to get the final position.  Now the trick to get the number of moves is to take the sum of squares of the original positions and compare them to the sum of squares of the final positions.  What will be found is a difference which when divided by 2 is the number of moves. This can be proven by considering how the sum of squares changes with each move.  I suspect that if I had of known the theorem I would have tried to count the number of moves while simulating building up the stack.  Each addition adds a number of moves related to the lengths of the splits it causes in a segment. Doubt I would have solved that in a competitive scenario anyway…

I’m willing to give myself the small input though, bringing me to 27 points and 158th in my imaginary competition where I hadn’t flunked in the previous round.

Q4) How many ways can you add 1 or more different numbers to give k, subject to a rather complex restriction…  The restriction is, if the numbers are written in base B, the dth digit (counting from the right) of each number will be unique across all the numbers. Different orderings of the same numbers all count as the same way.

A4) This question strikes me as pure evil, but interestingly enough slightly more people got out the large input for this than for the previous question…  The small input limits B to be binary to decimal and k to be no more than 100.  The large input however raises B to being anything up to base 70 and k to be up to 10^18…  I think the key to do even just the small input is to realise that we can reorganise the digits freely between numbers, without changing the sum.  But even with that it seems a bit tricky, dp over sums s less than k, with the number for the largest digit rearrangement being n, also less than k.   Then we need to work out how to multiply the total to get the actual answer.  I think we may need to break the dp down into number of sums of n numbers to get to s, since I think the reordering count depends on that.  However the fact that 0’s on the front don’t need to be distinct is a problem.  At this point I give up and look at the solutions.  Its not like I would have had any time left to solve this question in the competition anyway…

Apparently I was complicating things too much – the small input can be done by simple brute force, the number of partitions of 100 with distinct integers is only a few hundred thousand, you can just generate them all.

However I was starting to get on the right track for the large input.  It is a dynamic programming problem, and the order of digits is not relevant for each column, but the needed approach is to build up one digit at a time, and using a nested DP to count the number of reorderings for each column.  So you need to keep track of the carry, and how many of the numbers are finished and whether there is a 0.  The zero allows for the termination of a number, but I am not sure why just the fact of a 0 rather than the number of 0’s is needed to be known.  It almost makes sense, but I would need to implement it to be sure.

Anyway 0 points for me here so I’m stuck with my theoretical 158th place.

158th would have been close to my best ever placing, which was 115th in the first GCJ when it was still run by top coder… (and 100 people were going to be flown to the US) and the depth of field has definitely increased since back then… (In a super imaginary world where I had made the intuitive leap for Q2 large input, I would have gotten 51st… ahh I can dream :P)