Random Musings – Page 2 – The Missing Docs .Net

TCO11 QR2

So this one was at a more reasonable time, so I participated. 281st, not as good as I might like, but good enough to get through.

Q1) Given a scrambled list of two item types with a total count of N, and the ability to at most once for each i in 1 to N switch two elements that far apart, determine the minimum number of swaps to order the list, if it is possible.

A1) So, the answer to this question was so obvious that it takes a while to see. In the case of there only being 2 element types a single pivot sort pass will result in an ordered list. If you consider a pivot sort where you walk from the outside towards the middle, swapping if both are on the wrong side, walking a specific end if it is on the right side – you can see that it will perform a minimal number of swaps and that every swap will be closer than the last.

Hence it is always possible to sort, and the answer is just the count of the number of items not in the right spot to start with/2. Or you can simulate the pivot sort, like I did.

Q2) Given a ‘map’ (1 dimensional array) which shows the position of two critter types, which are either grumpy at time t % K = 0 or at t % C != 0, determine the minimum time to get from the left to the right of the map if you cannot by in a cell at a time when its occupant is grumpy, if it is even possible. Maximum value for K,C and length of array are all 50.

A2) So this is really pretty easy, its a shortest path algorithm, with a small twist. Because the behaviour of a critter depends on time, each cell needs to be replicated in the graph to represent how time plays a part. Obviously replicating once for every time step defeats the whole purpose of a shortest path algorithm… but since behaviours are dependent on mod K and mod C, it is pretty obvious that t mod K and t mod C are the only things that matter for a given cells possible future paths. So each cell is replicated K*C times (or LCM(K,C) if you want to get really fancy, but given the availability of primes near 50, worst case for K*C and LCM(K*C) are pretty close…

So 50*50*50 vertexes in the graph, and 3 edges from each vertex. From there you just use a priority queue based Dijkstra algorithm and presto, successful solution done.

I on the other hand used a non-priority queue – which for a while I had convinced myself should perform much slower in the worst case. Shows how rusty I am! graphs with equal edge distances don’t need a priority queue as a normal queue already walks them in breadth first search order. Using a priority queue would only make things slower by adding the logarithmic insert/remove cost overhead. So yeah, in hindsight my algorithm works pretty well… If .Net had of had a built in priority queue I probably would have used it and only made things marginally worse!

Q3) Define S(x) to be the sum of the digits in x. Define D(x) to be the recursive application of S until the input equals the output. Given a range, determine how many of the contained numbers can be represented as y*D(y). Range constraints: less than 10 billion.

A3) This problem was easy! – unfortunately I only realized as much with 10minutes on the clock, and I just wasn’t fast enough to write it all up. I was 4 lines of code from being complete. (Although said 4 lines require a little bit of thought each…) If I had of rushed rather than taking everything at an easy pace I might have saved those extra 2-3 minutes I needed to finish those 4 lines and perform the final debugging. That would have gotten me in the top 40, which is where I know I am capable of reaching.

D(y) = 9 if y is a multiple of 9 else y mod 9. If you’ve played around with digit sums before you would know this, if not a short exploration should make it fairly obvious. So y*D(y) can be decomposed into 9 linear functions, which sometimes overlap. Includes/Exclusion principle applies, so you can sum, subtract, sum, subtract your way through every possible n choose 9. That is quite a few equations to write out, but everyone of them is pretty simple – if you could be bothered…

However! Every one of the 9 equations is of the form a*9k+b. Note that these equations all have constant ‘mod 9’ values and they aren’t all the same. Hence many pairs don’t have any overlap and so the inclusion/exclusion can be simplified significantly. It decomposes into 3 pairs and 1 triple. This means you only need 9 base equations, 5 pair equations, and 1 triple equation. I was half way through the pair equations when time ran out. Interestingly I found one of the pair equations cancels with one of the base equations, and apparently there are more cancellations because looking at once of the successful answers, they only needed 9 equations and 7 of those were base equations. (So technically I was 2 lines of code away from being finished :P, 1 add and 1 delete…)

TCO11 QR1

So TCO11 kind of crept up on me, wasn’t paying much attention compared to Google code jam. But here it is qualification rounds 1 2 3 intermixing with the schedule for GCJ Round 1.

I was thinking of entering this round, probably would have been tricky to get through though, QR1 is not usually agreeable to my skill balance. I signed up for it, but by the time it hit midnight here I decided that I was too tired to stay up another 2 hours until the competition started. QR2 and 3 are much better times for me, although QR3 is during a work day so, hopefully I get through on QR2.

Almost 1600 contestants competed in QR1 – cut off for 600th place to advance to round 1 appears to be 235 points. So a ‘fast’ question 1, or question 1 + challenges, or question 2 was the target. Only 1 Australian through this round – but given the time zone it is hardly surprising…

Q1) There is a group of up to 50 people and each knows exactly how many of them are compulsive liars. Each makes a claim ‘there are at least n liars’, for a compulsive liar this means there are ‘less than n liars’. Determine the minimum number of liars, or return -1 if there is no possible correct answer.

A1) So this isn’t a difficult question, although there might be 2^50 possible scenarios for consideration they can easily be excluded in bulk, and exact scenario is not required. We just need to walk through each possible liar count and determine whether it is satisfiable.

So what is the requirement for satisfiability? For a trial n, there must be n claims greater than n, and the rest must be less than or equal to n. This corresponds to the n liars and the N-n truth tellers. Trivial enough really… probably trivial enough that I would have been too slow.

Some corner cutting options which are totally pointless and just serve to make your answer more complicated include counting the 0 and 1 entries – as those cannot be liars, and counting the N and greater entries – as those must be liars, then performing the walk in between. If the range had of been larger, sorting entries would allow for an O(n log n) implementation rather than O(n^2) – but it was just pointless in this small input size scenario.

Q2) Given a set of up to 50 integer music track lengths, and random selection from them (where repetition is possible) determine the probabilities that a given time (T) will be playing each track.

A2) Looks like an interesting question on the surface… seems like it calls for dynamic programming, but with T being up to 80000, need to be careful not to invoke O(T^2) on the DP array population.

At T=0, all probabilities are equal and it stays that way up until minimum track length. At that point probability of that track drops and the others go up. So it looks like maybe the probabilities can be linked to when tracks last ended.

At this point in my investigations into the solution I went down a long path of counting the number of ways to be ending a given song at time T. This was a problem in 2 ways. 1) It produced Huge numbers which were just too large to handle, and 2) it was useless because it didn’t account for the probability of the starting point of each song. Counting is not a good option, probabilities are better. I doubt I would have realized this during the competition soon enough, so highly likely I would have failed to get through.

So the correct approach is to ask “what is the probability that a song can start playing at time t?” This is even easier than the counting question. P(0) = 1. P(t) = Sum(P(t-ti)/N). The probability is the sum of the probability at the previous potential start time for each song divided by the number of song options for playing at that time. From there the answer is just the sum for each track over each possible start time that would have it still playing at the finish time. (Remembering to divide by the number of tracks that could have started playing at that time.)

Q3) Given a pattern of ‘B’, ‘W’ and ‘?’, where there is always exactly 1 question mark and 0 or more of the other characters, determine the shortest and lexicographically minimal replacement of ‘?’ which makes the sequence have a value of N, using the following rules. Sequence of length 1 has value 1. A letter alternation increases value by 1, a letter repetition decreases by 1. The sequence as evaluated left to right must never decrease below value 1. Further conditions, input pattern is at most 50 characters, and the final value of N is at most 100.

A3) Initially looking at this question I am thinking it must be a joke…

Four scenarios. 1) Question mark is on the end. Evaluate the pattern start for viability, and then proceed from evaluation end value to target value with directness – empty, alternates, or continuous repeats. 2) Question mark is on the start. Slightly more tricky, Check for empty, or add alternating on the front until it works (if it is going to at all). 3) Question mark is in the middle. Hardest case. Similar to case 2, but target to get back to isn’t empty. 4) Question mark is on its own. Answer is alternation starting from B until target value reached.

Scenario 3 is the only tricky part. Given a start and end character, and a target end value and a given start, without ever going below 0, what is the lexicographically smallest approach that yields the correct result. I suspect this could be done using a list of maybe a dozen or so scenarios which are simply thought through for the optimal strategy, but it can also be turned into a recursion, but one which needs to be evaluated in a fashion which is greedy so as to avoid considering solutions which are known to be way too long.

So with a known start character and start value, we can look at it as a shortest path to specific value ending in specific character. Adding a character the new state is determined by the old value, the old last character, and the new last character, with the outputs being the new last character and new value. If the assumption is made that the value never needs to exceed a certain threshold in order to achieve the shortest path, this state space is limited and easily fully explored (only two possible transitions from each spot, which we only need to consider if we find a better answer for that position). A simple threshold to assume is double the maximum target, or 200 – but I suspect the true threshold is much lower.

So, not a joke after all… still I think I was more likely to solve it than question 2…

GCJ11 QR Post-Round Analysis

So looks like a pretty big turn out – 10566 qualifiers. 1855 perfect scores – which I was a part of – although in some ways I don’t think I deserved to be.

The 4 problems was definitely more than I could have handled in a 2.5 hour competition (not without less huge insane mistakes at least!) – I actually took more than 5 hours to submit my answers – although I had a lot of distractions and took a few hour break in the middle. So sure it took less than 2.5 hours to write up the code, but I cannot ignore the value of that break allowing to the answers to mull over in my head.

Q1) Given starting positions for 2 robots and that they can both move at 1 unit per second, and that it takes 1 second to press a button, and a list of integer button positions and who has to press them in exactly the given order, what is the minimum time to complete the required work?

A1) This puzzle was actually portal themed – there was mention of possible cake. As usual they provided a sample example of ‘one way’ to solve a sample optimally. I almost wonder whether I should start to assume that any time they say that it is an example of a sub-optimal strategy which just happens to be optimal.

Anyway the question itself is a fairly basic simulation using a greedy algorithm. At any given point in time the two robots have a target position, which is based on the next button which needs pushing by them. For a tiny optimization, this only ever changes when they push a button, but given the constraints even on the large input you could probably have gotten away with updating the target every timestep. Then it just comes down to simulating each timestep. If a robot is at its target, it doesn’t move. If it is supposed to push a button it pushes the button and doesn’t move. A likely trivial mistake to make would have been to accidentally let the second robot push its button in the same turn as the first robot because you’ve updated which is the next button to press but haven’t yet ended the current turn. Maximum number of turns is something like 100k for the large input, no problems given the 8minutes you have to submit the answer.

Q2) Given a list of pairs which if matched consecutively are replaced by a third (which will never be found in the list of pairs), and a list of pairs that if they are ever found anywhere in a list (except if they are instantly replaced by something in the first list) cause the entire list to be cleared. Process a list of input characters adding each one to the output list in turn, subject to these rules, print the output list in a pretty format.

A2) At 82% this problem had the lowest small to large conversion ratio – and it isn’t surprising because the small problem constrains the list of pairs to be at most 1 for each case and the large allows up to 30. This isn’t a performance problem at all, but it does mean that several corner cases just cannot occur. Writing up my solution for this problem probably took longest of all of them, although I found it easy to solve in theory – I made copious errors during implementation – several of which would have crept past the small inputs if I hadn’t of noticed them in time. (Keeping with the computer gaming theme, this problem is inspired by Magika)

So again this is a straightforward simulation. If the list is empty you add the character. If not first check whether the incoming character and the existing last character are a pair in the first list of pairs. Performance is not a problem, just walk them all, checking both orders (scale optimization is to add pairs a dictionary of dictionaries – but not needed here). If a match is found, replace the current last character with the replacement and continue loop onto next character. If not it is time to check whether there is a need to clear the entire list.

Again because the scale constraints are so small there is no need to optimize this using dictionaries, you can just walk every entry already in the list, consider it with the incoming character against every pair provided in both orders. Obviously this is O(n*k) but k is at most 30 in the large input. If you do optimize this scenario you would probably do what I did and add a set to keep track of the unique elements in the input list so you can dictionary lookup the next character, and do a set intersection between the dictionary results and the set of unique elements in the input list. The one thing to be careful of here is that a set is not a good implementation for keeping track of the set of unique elements in the input list, you need a count of each unique. Without a count, the pair transformations which remove the last element can cause you to incorrectly claim that element is no longer in the list. In any case, if you find a match, you clear the list and do not add the incoming character.

Finally if neither transformation nor clearing occurs, add the character. At the end of the simulation loop the nice formatting was trivially achieved using string.Join – but a simple loop with a first loop condition does the job.

Q3) Two boys have a set of items which each have a value. One boy can’t add – any time he adds two numbers he always forgets to carry the 1. He also adds in binary… The other boy however can add just fine and wants to create two piles of these items which the first boy will think have the same value, but hopefully do not. In fact he wants to minimize the value of one non-empty pile, and being a mean boy, give that to the first boy and take the rest for himself. For a given set of values, is it even possible, and if so what is the maximum value the second boy can take for himself.

A3) Looking at the questions, I would say this one gives the first question a run for its money as the easiest question – but the statistics don’t back me up. I was bemused to see people asking the Admins questions about this question during the competition. ‘Does the first boy add the items left to right?’ – ‘Yes’. Addition without carrying the 1 is both commutative and associative, so the order is irrelevant.

First off we note that the first boy thinks A+A=0. This combined with commutative and associative, quickly shows that the sum of all items must be 0 if there are 2 piles which have the same sum. The next leap is to see that A+A is the only way to get 0, hence again because of commutative and associative natures any bi-partition of such a pile with sum 0, has equal values.

So, from there it is trivial. Second boy gives the single smallest item to the first boy and takes everything else.

Implementation involves proving the no-carry sum is 0. Afterwards I realized that I could have implemented this using simple xor on the whole number, but I broke the integers into arrays of bools and reimplemented bitwise no-carry sum (xor!) by hand. Once you prove that the xor of all inputs is 0, sort, and sum the top n-1 elements and return that as result. Simple.

Q4) Given a permutation of the first N integers, and a single operation which is ‘randomly permute a selected subset’ and acting ideally – what is the expectation value for the number of operations until the list is sorted.

A4) This was by far both the hardest and easiest problem. The provided sample provided a hint which was both useful and useless at the same-time… Random simulation looked liable to time-out before convergence, especially for large input and presumed you knew what the ideal strategy was.

So, what is the ideal strategy? The sample provided a hint that if you have n disordered elements, acting on each sub-permutation cycle independently appeared a good strategy. (Permutation cycle in disordered elements is the smallest set of entries which can be reordered on their own to end up completely ordered correctly. For example 2341 is a permutation cycle of 1234, as is 2413, but 2143 and 4321 are not as they can be divided into pairs which when switched are in the correct positions. This may not be the correct definition of a permutation cycle, but I can’t think of a better term to use in this discussion…)

Indeed I could see that it seemed reasonably obvious, that not randomizing a subset of a permutation cycle was inefficient. I didn’t prove it, but a trial with small permutation cycles agreed with the hypothesis. So I started down the path of dividing the input into a count of the number of each permutation cycle length, under the assumption that the ideal strategy was to randomize each permutation cycle in turn until it was ordered. At this point I tried some examples – in fact the first example I tried was a cycle of length 3, 312. However due to some incredibly stupid mistakes (I made 3 in quick succession, but only noticed 2) I incorrectly calculated an average of 2.5 operations to order this cycle. I then made a completely mistaken leap from this mistaken value to presume that this was further evidence that working with cycles was better than working at random.

So at this point I start breaking out some serious combinatorics deriving the formula for the average number of operations for a cycle of length n, assuming the ideal behaviour of working after each operation was to work with one cycle at a time. I’m not great with combinatorics, and it took me a while to realize that most of the combinations and permutations and factorials in my formula would actually cancel out, and I wouldn’t be needing to calculate the ratio of 3000 digit numbers to 7 significant figures. I then made a mistake doing the cancellations and ended up with something again, completely wrong. It was obvious though, so I started some trials again. This time on the core piece of my formula which is ‘how many different ways is there for n randomly permuted entries to have a length k cycle which includes the first element. I was bemused to find that my trials didn’t depend on k, it was always (n-1)! – or 1/nth of the total permutations. I could see where I screwed up my cancellations now and ploughed ahead with my formula. I implemented mutual recursion memotization, and hit go. It was fast, it produced the correct answers for the sample.

At this point I decided to have a look at my memotization tables – I laughed because there, accurate to 7 or more significant figures, was a table which contained ‘n’ at index ‘n’. I had successfully demonstrated that for ‘n’ disordered elements the sum of the times for each subcycle is equal to ‘n’, which is tantalizingly close to suggesting that advanced strategy was irrelevant. The only thing you have to do, is make sure you don’t randomize something which is already in the correct place (and make sure you do randomize every member of any given cycle, which comes from letting everything else randomize).

So at this point I ripped out my memotization tables and replaced them with ‘if passed n, return n’. I then realized the obvious, that I didn’t actually need almost any logic any more at all. The answer was equal to the number of disordered elements in the input. Counter, for loop, if statement, increment, return result.

I probably would have solved this problem quickly, if it wasn’t for that mistake of calculating 2.5 as the expectation value for cycle length 3… Instead I wasted almost 3 hours on it alone. 3 hours for 4 lines of code… (ignoring the file parsing code of course!)

GCJ11 Qualification Round

So its almost halfway over and over 4500 people currently have the requisite 25 points to their name – of course the final testing could still play havoc with a few of them.

A significant change in format this year – previously there was no way to qualify solving just the small input files, solve one problem in full and you are through.

This year there are 4 questions, and you could solve just small input files to get through – indeed I’ve solved them all so baring some catastrophic event there is no way I am not through to round 1. (I also solved the large inputs, we’ll see how the final testing treats those…) Also, the first question isn’t worth enough points to get through.

So I plan on doing a write up tomorrow once it is all finished, but it is looking to be rather a busy day, so it might get delayed…

GCJ11 is coming

Just 2 and a bit days till GCJ11 qualification round now. Registration is open on the day I believe, so no rush.

Democratic meritocracy or merit-based democracy?

Woo, dangerous topic detected…

Anyway I was just thinking random thoughts while reading about how popularity had increased for some elected official because of how they had handled themselves during a natural disaster. My thoughts strayed to the disconnect between who people think is best for them and who is best for them when it comes to politics. After all we are asked to make judgements based on limited information presented to us via system which is not designed to ensure completeness or lack of bias – we just have to hope that by looking at all available sources we can get sufficient completeness, detect biases, compensate for them, and we have to hope that everyone else can do that too. I was wondering what options might improve things without switching from the democratic way that is so entrenched…

Voting systems. I’ve always been interested in voting systems because of how you end up with different winners depending on which voting system you choose, even though they might all seem reasonable on the surface. The idea which came in to my head was probably one which grew from a seed planted by knowing of a website where you can enter your policy beliefs and it tells you the party with the best policy belief match. What if rather than a straight democratic vote system it was crossed with an exam.

First you hold a referendum to choose questions – and based on the referendum chosen questions also get given weightings based on population percentage which thought it important (possible fancy mathematics involved here…). Then all political candidates have to do this exam, this list of questions. The answers are a matter of public record – legally binding, if an elected politician clearly acts against their own answers it is grounds to have their position revoked. Finally the exam is graded. How?

Well questions can have 2 forms, opinions or facts. An example question could be what percentage of your electorate is currently living in a retirement village? These questions are factual, verifiable – but also somewhat useless because the questions will be known in advance so all it tests is that the politician researched factual questions that the population thinks it is important for them to know the answer to. Also as they are factual there isn’t the possibility of a politician acting contrary to their answer. They might serve a limited purpose, but all they really test is the politicians capability of recall. A different question is ‘do you think 10 years jail is sufficient punishment for armed robbery?’ These kinds of questions are opinions, and to get the ‘correct’ answer where better to go than democracy. So after the politicians answers are published, the populous goes back to the polls and does the exam themselves.

Using the peoples ‘votes’ on each question, each politician can be graded. Another opportunity for some fancy voting mathematics research is to work out the fair way to do this, whether you give partial marks based on percentage of population which matched your answer, or only mark correct based on majority preference. Maximum grade equals elected. (Just imagine the ‘how to vote’ cards…) Also to consider here, margin of victory also describes the wriggle room – how much a politician can violate the published answers and still expect to get re-elected under this system.

Going back to the factual questions for a moment, maybe they can be removed from the election and get added to a ‘politicians exam’. Much like an exam you have to pass to be a lawyer, or to train to be a doctor. If you can’t pass the politician’s exam, you can’t run for office. Maybe more targeted questions could be required to be passed in order to become a minister for a specific portfolio – like specialist exams for doctors. Would have to look at what happens to the government who forms and has no one who can pass the ministerial exam for science…

I think the above ideas have some merit (…), although I do not think for one second I will see them implemented. They also have some downsides – after all is rigidity of conviction actually a good thing?

Civ 5 AI

Been ages since my last post – definitely appears that I have given up doing the top coder problem analysis for a while – I’ll probably get back into the swing of it when this year’s comps start up again….

So I’ve been playing a bit of Civilization 5 – I played the original a bit and some of the intervening editions but I was never the addict that many people I knew were. Civ 5 has several things about it which I like a lot better than previous versions, but the AI is not one of them.

I’ve been slowly working my way up the difficulty levels – finding that they weren’t that hard. I know I’m not a very good player when it comes to these games, but I won my first game on each difficulty level up to emperor (excluding the games where I quit in the first 50 turns because I inadvertently taunted the AI and it decided to steam roll me with units early on – while I stupidly had no army or defence).
It took me a couple of goes to get Immortal difficulty level, in the end I explicitly chose a map that was large with only 4 players, ensuring I had room to work with at the start before the AI found me, time for a half decent strategy to catch up with the advantages given to the AI at the beginning. In pretty much every game at these high difficulty levels I did the science victory because I am no good at the war side of the game (despite the AI being even worse, at high difficulty it has a lot more units), and can be converted to diplomatic or domination victory if it becomes problematic. On Immortal the space race was very close, I had to pay other countries to attack whoever was going for space victory in order to slow them down that little bit extra I needed to win.

So when it came to my first deity level difficulty match (the highest difficulty level), I decided I wanted to try something different. I had done a single city culture based victory on the easiest difficulty level for one of the achievements, and I remember that I won quite early on in the game. So I thought maybe that would alleviate some of the problems with the science victory approach as I would be done before the AI. I also wanted to minimize tweaking of the game parameters since it feels a bit like cheating, choosing options which you know you are good at, or the AI is weaker at. So I decided the standard Duel size on the standard Continents map should be acceptable – At least I wouldn’t start next to the AI, but small map so I wouldn’t have much room to grow and the AI would likely turn up and smack me fairly early on. Random opponent and I chose India for myself since its good for a small number of large cities which the cultural victory type favours.

In the end I didn’t end up finishing the game any sooner than my science victory on immortal difficulty level, indeed the culture victory took me longer. But I won all the same – my first game on deity level – no retries, nothing.

I built a single city. I had intended on building maybe 3, but the duel map was even smaller than I expected, there was only really room for 2 large cities on my ‘continent’ and there was a city state taking up a few squares at one end, so I decided (at the time fully expecting to lose imminently) to see how I went with a single city.
I focused on culture buildings (obviously!), but with a single city expanding quickly you quite easily keep up with the technology tree so there wasn’t much thinking involved in choosing what to build next – I basically just built everything. I had a couple of units to explore, which quickly showed the map had no shallow water to the other AI, and 3 city states within my reach. I quickly grabbed 2 to ally level, but the AI got the third, spending alot of money on it, made it beyond my means to get ally with immediately. As the game progressed, my single worker upgrading tiles as they became available and me raking in money, production and having such a high happy rating that I was in a golden age more often than not, it became apparent that I was doing okay, but not that great. Cultural victory was definitely going to take longer than the Immortal AI would win a science victory. The AI I was playing against this time was the French, who get a cultural advantage and early on looked highly likely to be ahead in that respect, as well as being far, far ahead in science, and based on the trading screen having enough money to buy all of the city states off me 4 times over. Not to mention I had a single city defended by a single unit, no walls and the AI had an army capable of destroying a 20 town continent with a half decent army.

So it did indeed look like I was going to lose, as I had thought all along. But as the game went on, the inevitable ‘the French have built the Apollo program’ or ‘united nations’ or ‘has declared war on you’ just didn’t happen. Indeed I almost built the united nations myself, since by then I had amassed enough money to steal the city states away from the AI and the AI didn’t attempt to use its huge reserves of cash to fight me for them. (As an aside my single city with its 4 friends and certain cultural policies was almost generating science as fast as my first failed attempt at Immortal difficulty level where I had a dozen towns – just a sign of how badly I play really…)
Finally, I just sort of won by default. I kept playing after game over to see what the AI had in mind – and ~40 turns later it built the UN and purchased all of the city states in an aggressive fashion. Up until that point I assumed that I had just had a lucky run and ended up with an AI who had decided to try and win on points (the AI had a huge score), but losing on points at turn 500 is a sure sign you weren’t actually trying to win… but here lies the problem.
As far as it can be determined it seems the AI picks a single strategy and sticks to it. There doesn’t appear to be any comparative evaluation to decide which strategy is most likely to result in victory at any given time. The game is slow enough as it is, the original Civ ran on ~386?, and the game logic doesn’t seem to have really become that much more complicated. Even when you switch from the 3d graphics to the old 2d display Civ 5 chugs on a moderate size map in the late game, the large maps are quite painful. To think how it would be with real strategic evaluation … horrifying. But without it a game is mostly about luck of the draw as to whether the AI will use a strategy you are terrible against, or one which is practically useless. Without the dynamic strategic evaluation there is no real challenge, no competitive force which helps you improve. You increase the difficulty level because half the time its too easy, but then half the time you get a game which is too hard given you’ve never learnt how to cope with the AI’s strategy.

Anyway, enough rant, I guess I’ll go play another round… 😛

TC SRM 478

Been a while since I last posted… I’ve fallen behind as was to be expected.

Q1) Given carrots are placed at locations 0 mod 1billion 7, and from any given location there are 2 options to go to, 4x+3 or 8x+7 and at most 100thousand movements are allowed, what is the minimum number of movements to get from starting position x, to a carrot.

A1) Obviously this problem is about transitions modulo 1 billion and 7. The trick is needing to rule out trying all 2^100000 options. 1 billion and 7 is too large to compute all transitions, so a simple memotization won’t work.

The first step is to realise that order of application doesn’t matter. A(B(x)) == B(A(x)). That gets rid of almost all of the solution space. Furthermore A(A(A(x)))=B(B(x)) and since we are trying to minimise moves the ideal solution will always choose 2 8x rather than 3 4x. From there it is a simple matter of repeated application of B and trying 0 1 or 2 applications of A. (Technically you need to verify that another application of B isn’t equal of 2 appliations of A are equal given the statements made so far, but A(A(x))!=B(x) for all x so that is not a problem)

Another solution is to notice that all transition states available are of the form 2^(n)*(x+1)-1, and that every possible n is reachable except n=1. Furthermore since the order of application doesn’t matter, and a greedy approach of using 8x as often as possible therefore is the ideal scenario. Therefore the answer is n/3 if n is a multiple of 3, or n/3+1 otherwise, for the first n which results in a 0 mod 1 billion and 7 answer. (if 2 mod 3 then apply 4x, if 1 mod 3 then apply 4x twice from one less 8x.)

Q2) Given up to 15 bottles each with capacity C and a list of initial contents levels (all integer), and a price list for how much a bottle with any specific content level is worth determine maximum profit given the ability to pour 1 bottle into another until it is empty or the other is full. Maximum C is 50 and maximum price is 1million.

A2) Each pouring operation either results in a bottle which is empty or one which is full (potentially both). Empty and full bottles are of no interest for future pouring operations, so there are at most 14 pouring operations that will be performed. Giving a search space of 14! – which is too large.

A dynamic programming approach might seem like a good idea, all we have to do is determine the maximum for each subset and combine them. But a closer look shows that it doesn’t make sense because we don’t know the contents levels in bottles as they may be neither full, empty nor their initial value.

So a different approach is required. If a subset of the bottles all pour, one into another in some order. They all end up either full or empty except for possibly 1. Given we know the total contents and the capacity of each one, div gives you the number of full bottles, and mod gives you the capacity of the last possibly non-empty one, the rest are empty. Presto we can trivially calculate the maximum for each subset where they all pour. Therefore we can determine the maximum for a specific subset, it is the maximum for all partitions of pouring. Number of partitions of 15 elements is still ~1billion so we can’t brute force those.

The maximum of n elements is therefore found by considering each possible non-empty subset as all pour, and the maximum of the rest. This has a running time of less than 2^29, but it seems likely to time-out in the worst case.

Next approach – For each subset one of the elements will have content j, the rest are already sold. You can generate a subset 1 larger by combining the unsold one with an unused one, or selling the unsold one and keeping the new one as unsold. This has a state space of 2^15*50, but only an average of about 15 options to consider for each spot. This gives running time <2^25 which is clearly going to run in time.

Q3) Given a list of the count of each type of apple in up to 50 boxes, determine the probability of selecting each specific apple type at random if you first randomly select a non-empty subset of the boxes to further select from randomly. (Up to 50 types of apple and each count is <200, no box is empty)

A3) The dual random selection is kind of annoying as the number of subsets is up to 2^50. So obviously we can’t consider each scenario and average.

Dynamic programming might be a possibility. Maybe determine the probabilities using the first n boxes? Except how do you expand from n boxes to n+1. This appears to be impossible. To modify the probabilities you need to know how many apples there were before. Hmm, that <200 limitation is pretty small, gives a maximum of 500k apples in any subset. Each transition only depends on the one before, so we don’t need to keep everything in memory. So probabilities using the first n boxes with a total of k apples. That is 2meg ram for just one probabilitiy, so we’ll have to do each probability separately, which I think is fine…

Well maybe a bit slow.

A faster way to do it is to break down the problem differently so you can accumulate the probabilities for each item simultaneously without requiring any additional ram. One breakdown is to consider the answer as being the sum of count of apples of specific type in a box divided by total apples in subset, times number of ways to create that total with a subset that contains the specific box divided by total number of subsets, for each possible box and total number of apples in a subset. Efficient calculation of number of ways to create total with a subset that contains the box means that running time is proportional to 500k times 50 worst case, as each apple type can be accumulated at the same time, because the part which requires the giant array is constant over each apple type. Unlike the previous calculation where the giant array is specific to the apple type, which is 50 times slower. (Although apparently an efficient implementation can pass system tests… without this further optimization)

TC SRM 477

As is usual with SRM I didn’t do this one. However, I figure I might keep writing up blogs until next years TCO.

Q1) Given a boolean rectangular array which maps to a hexagonal grid by shifting every second row half an entry to the right, determine the number of edges between true and false exist.

A1) This is easy enough, its a simple case of walking each cell and checking its 6 ‘neighbours’ and incrementing a counter if its not the same. Once complete divide answer by 2. The only difficulty is making sure you don’t screw up the determination of the 6 neighbours, which is different depending on whether the current cell is in an odd or even row.

Q2) Given a list of numbers, determine the greatest number of distinct pairs which can be selected such that all such pairs are the shorter sides of a integer sided right triangle and have a gcd of 1. Numbers are all less than 1 million and worst case there might be 1250 numbers (less if they aren’t all single digit – maximum of 350 6 digit numbers).

A2) The constraints on the pairs are pretty significant, and could easily rule out most of the numbers entirely. However 600 3s and 600 4s are a real possibility. We can easily go through each pair of numbers and work out if they are compatible. We can turn this into a graph with counts available for each number. Not certain whether the graph can contain cycles. If the graph can’t contain cycles, it is bipartite and we can perform a max flow to determine the answer. If it can contain cycles we have a problem…

So I checked out some answers – and yes it does seem max matching is now considered a question 2 level algorithm, not question 3 as I thought it used to be. However, whether there are cycles or not is not relevant, as 2 odd numbers apparently cannot satisfy the requirements (since the sum of their squares will be 2 mod 4), and 2 even numbers is ruled out by the gcd(a, b) == 1 requirement. Hence the pairs are bipartite based on even or odd.

Q3) Given a weighted tree with up to 200 nodes, determine the minimum weight sum of a tour which starts and ends at node 0, visits every edge at least once, and can teleport for cost L up to K times.

A3) If K is 0, the answer is 2 times the sum of the edge weights as every edge will be traversed twice. If K is 1, we can subtract the longest path and add L, and choose which is best. That is to say if the depth of the deepest node from any other node is > L we will teleport between those nodes. However once K is > 1, it becomes a lot trickier…

Again looking at solutions. The problem is a dynamic programming problem. Recurse over the minimum tour length within a subtree if there are i teleport endpoints in that subtree. This can be determined by considering that if a subtree has an even number of teleport endpoints, it will be both exited and entered, otherwise it will only be exited or entered, not both. So there are n nodes, and up to 2k teleport endpoints, so the DP has O(nk) states. Determining each recursion requires considering how to distribute the i teleport endpoints between a nodes children. Considering every combination is too much, so we need to switch up the DP a bit. If instead of thinking about each node, we consider each edge, we realise we only need to consider each possibility for i down a given edge, not in combinations with other edges out of a given node. So we can instead either recurse to the left most child and all its outgoing edges, or to the ‘set of edges to the right of the left most child’ Since the number of edges equals number of nodes -1 in a tree, the DP states is still O(nk). Number of possibilities to try for each case is O(k) giving O(nk^2) running time to populate the DP table. Once the table is populated all we have to do is check for root of the tree, each even number of end points, modified by the cost of teleporting, and determine the best result.

The real trick here is disassociating the end points, realising the problem can be solved without caring about specific locations teleporting to specific others.

This is the second time I’ve seen the ‘dfs edge walk on a tree’ DP problem. I should probably write up a solution to practice it.

TCO10 R4

So, as predicted by the odds I am out ;). I did at least score some points, so I came 125th. The question structure didn’t favour me, an easy first question jumping to a hard second question. A fast time on the first question was sufficient to get through, or a couple of challenges and a terrible time. However the first problem was so easy the challenges were hard to come by, and the second problem’s input requirements so obtuse that constructing a failing test case was next to impossible.

My time for the first problem was terrible – I had the problem solved in the first minute, but a plus sign where I wanted a minus sign sent me down a warren of verifying things by hand, and quickly half an hour had vanished. That being said I probably would not have gotten through even if I hadn’t screwed up badly, had to be fast. Although an extra 20minutes and I might have actually had a serious attempt at the second question.

For a while I thought maybe I could do the 3rd question, thinking it was a well known computational geometry problem which I could just grab an algorithm off a textbook for. However the closest related problem in the textbook told me the running time would be too slow.

Q1) Given a set of bank accounts with known values, where 1 dollar gets you one ticket in a lottery draw each week for a fixed amount, what is the expectation value of bank account 0 after n weeks. Limits: n up to 1000, winnings per week up to 1000, up to 50 bank accounts, each with a starting balance of up to 1million.

A1) I implemented this as a DP of probabilities of number of wins in a given number of weeks. Note that the total amount of tickets in a given week stays the same regardless of who wins in the previous weeks, so only bank account 0 and the initial total of the incoming bank accounts is of interest. Once DP is formed, simply multiply each probability in the final week by account balance for that many wins and return.

As it turns out the problem can be solved without the DP. It appears that simply using the expectation value of one week can be used to generate the expectation value of the next week. Reducing O(N^2) down to O(N).

Q2) Given a sequence of numbers formed by multiplying the digits of each of up to 50 consecutive integers, determine the location of the first of those integers, assuming the sequence refers to the earliest such match. Limits: the input numbers will be less than 1 billion, output will be less than 2^63.

A2) After the competition I realised that this problem is not as hard as it looks. I really might have worked this out if I had of made a serious attempt (and had that extra half hour :P). If the input is all 0’s you can return 10,100, 1000 depending on input length quickly. In fact brute forcing for possible answers of 1-1000 takes care of some corner cases. Find the first non-zero number, brute force the last 3 non-zero digits and solve for the rest by trying as many 9s then 8s then 7s down to 2s going from right to left, then check against all other numbers in the input to see if it is right. Return the smallest result. For kicks I wrote this up and it failed test cases – but only because of a mistake in my code, fixed that and it now passes system tests. My first thought was brute force 2 digits (well my real first thoughts were more complicated than that), but input size > 12 allows some control over the 3rd digit. A number followed by 12 0s indicates that the last 3 digits were 9s for instance. You can look at the input set and break it down in to all sorts of combinations, but in the end, simply brute forcing all last 3 digit scenarios is easier…

As an aside I was left wondering what checks the challenge system used, since the input limitations are significant. Other than 0 no two consecutive numbers in the input can be equal. No input number can have a prime factor greater than 7. At least 5 of the inputs must be 0, and after the last consecutive 0 every 10th number must be a 0. Two consecutive non-zero numbers must be increasing and the difference must divide them.

Q3) Given up to 1200 points, where no 3 are co-linear and no 2 are coincident, determine the number of line segment intersections from choosing any 4 points of the points.

A3) Brute force is obviously no good. A look on Wikipedia indicates that even the best algorithm for enumerating all intersections of the complete set of line segments from n points is in the order of the number of intersections, and if the 1200 points form a convex hull the number of intersections is well over N^3 (Close to N^4 even I think).

Therefore we can’t enumerate the actual intersections we have to count them in batches. Specifically the time needed needs to be about O(1) for every line segment. O(log n) would be sufficient.

I had a quick look at one solution, and it considers each point, and sorts the other points by angle relative to that point, then for each of those segments in order, mystically calculates the intersection count really quickly. … going to have to read it again after some sleep.

Edit: Post sleep update.

I had a look at some more solutions. General approach seems to be assume everything crosses. Then for every point, sort the other points around it by angle. If you can get 180 degree separation in the sorting, all the points on the left will not form crosses with the points on the right. So you subtract these scenarios away. However there is over-counting which you have to adjust for. Looks like the over-counting adjustment is probably the hardest bit.