Buffer of Doom

A simple pattern I have implemented myself, seen in many places and often results in problems is the following.

We need a buffer to hold some data before we process it, so we allocate an array large enough to hold the data.  To avoid continual garbage we keep this array and reuse it.  If the data we need to buffer is too large, we throw away the current buffer and make a new larger one.  We might even go fancy and ensure that the new buffer is greater in size than the current one by a minimum percentage margin, to ensure we don’t suffer from mass garbage from a 1,2,3,4,5,6 attack.

Where the problem comes in is the spike.  A single extraordinarily large piece of data is sent.  In response we successfully allocate a huge buffer, maybe taking up almost half of the entire available memory space.  Because of our design we never let this go.  Now if we are multi-threaded and have multiple processing queues each with their own buffer, a well crafted scenario will take up almost all available memory just with these buffers, until rather than failing due to an OOM exception somewhere we can catch it, the runtime itself throws OOM and kills the process.

Solution? Well we could simply outlaw all large data sets, but if you have multiple processing queues the threshold you have to place can be very low, to the point where normal processing is no longer feasible.  Another idea is a bit of a compromise, have a high maximum, and a lower ‘spike threshold’.  Any data set over the spike threshold is considered an anomaly and the buffer is not kept for reuse.

A bit more complicated is the reverse extension of the minimum margin of growth.  If more than ‘k’ buffer uses in a row are less than the current size reduced by a percentage ‘p’, throw the current buffer away and use a smaller buffer.  Question in my mind here is whether k is a constant, or a number which decreases as the buffer gets larger.  I think probably the later results in best behaviour, but some theoretical analysis for random and worst case scenarios would be in order.

And the reason I brought this topic up at all?  System.Data.SqlClient.  Large string values to or from the database are internally handled using a buffer of doom.  There is one buffer per database connection.  You have to be careful with large data or you end up allocating almost all available memory to database connection buffers…

DPS Evil 1.6

I had a couple of requests to update my DPS parsing client/server to support the new combat messages on Bullroarer.  The new messages seem to parse much more reliably with much less code, almost like they were designed to be nice 😛 (Well they could be nicer in terms of parse-ability, there is still potential ambiguity, but unlikely…)

I’ve uploaded a new version and it is available from the same link as before on the ‘My Software/LOTRO Software’ page. New to this version is a requirement to enter your character’s name into the settings.  This is because the log messages no longer use ‘You’.  Because of this requirement, the settings page now automatically opens on start.

At some point I will probably upgrade LOTROChatNarrator as well, but I’ve not had any reason to use its damage parsing functionality recently, so it probably won’t be in a hurry.

GC.SuppressFinalize and ContextBoundObject

I think this is a bit of an interesting combination – SuppressFinalize works on a specific object telling the garbage collector not to call the finalize method and context bound object means that references to that object are remoting proxies, so there are technically multiple objects which could be the one that the runtime calls finalize on.

Edit: Turns out I made a mistake in my original analysis (rather of silly of me, running different pieces of code on the different versions of .Net runtime), apparently GC.SuppressFinalize has never worked in a way which is useful with ContextBoundObject.  However the Windows 7 SP1 change now means that the finalizer is called via the proxy, not magically directly on the underlying object.  So if you had code for intercepting calls via the proxy and it didn’t handle the finalizer (because previously it didn’t have to), you have a new code path to which could go horribly wrong…

Democratic meritocracy or merit-based democracy?

Woo, dangerous topic detected…

Anyway I was just thinking random thoughts while reading about how popularity had increased for some elected official because of how they had handled themselves during a natural disaster.  My thoughts strayed to the disconnect between who people think is best for them and who is best for them when it comes to politics.  After all we are asked to make judgements based on limited information presented to us via system which is not designed to ensure completeness or lack of bias – we just have to hope that by looking at all available sources we can get sufficient completeness, detect biases, compensate for them, and we have to hope that everyone else can do that too.  I was wondering what options might improve things without switching from the democratic way that is so entrenched…

Voting systems.  I’ve always been interested in voting systems because of how you end up with different winners depending on which voting system you choose, even though they might all seem reasonable on the surface.  The idea which came in to my head was probably one which grew from a seed planted by knowing of a website where you can enter your policy beliefs and it tells you the party with the best policy belief match.  What if rather than a straight democratic vote system it was crossed with an exam.

First you hold a referendum to choose questions – and based on the referendum chosen questions also get given weightings based on population percentage which thought it important (possible fancy mathematics involved here…). Then all political candidates have to do this exam, this list of questions.  The answers are a matter of public record – legally binding, if an elected politician clearly acts against their own answers it is grounds to have their position revoked.  Finally the exam is graded.  How?

Well questions can have 2 forms, opinions or facts.  An example question could be what percentage of your electorate is currently living in a retirement village?  These questions are factual, verifiable – but also somewhat useless because the questions will be known in advance so all it tests is that the politician researched factual questions that the population thinks it is important for them to know the answer to.  Also as they are factual there isn’t the possibility of a politician acting contrary to their answer.  They might serve a limited purpose, but all they really test is the politicians capability of recall. A different question is ‘do you think 10 years jail is sufficient punishment for armed robbery?’  These kinds of questions are opinions, and to get the ‘correct’ answer where better to go than democracy.  So after the politicians answers are published, the populous goes back to the polls and does the exam themselves.

Using the peoples ‘votes’ on each question, each politician can be graded.  Another opportunity for some fancy voting mathematics research is to work out the fair way to do this, whether you give partial marks based on percentage of population which matched your answer, or only mark correct based on majority preference.  Maximum grade equals elected. (Just imagine the ‘how to vote’ cards…)  Also to consider here, margin of victory also describes the wriggle room – how much a politician can violate the published answers and still expect to get re-elected under this system.

Going back to the factual questions for a moment, maybe they can be removed from the election and get added to a ‘politicians exam’.  Much like an exam you have to pass to be a lawyer, or to train to be a doctor.  If you can’t pass the politician’s exam, you can’t run for office.  Maybe more targeted questions could be required to be passed in order to become a minister for a specific portfolio – like specialist exams for doctors.  Would have to look at what happens to the government who forms and has no one who can pass the ministerial exam for science…

I think the above ideas have some merit (…), although I do not think for one second I will see them implemented.  They also have some downsides – after all is rigidity of conviction actually a good thing?

Civ 5 AI

Been ages since my last post – definitely appears that I have given up doing the top coder problem analysis for a while – I’ll probably get back into the swing of it when this year’s comps start up again….

So I’ve been playing a bit of Civilization 5 – I played the original a bit and some of the intervening editions but I was never the addict that many people I knew were. Civ 5 has several things about it which I like a lot better than previous versions, but the AI is not one of them.

I’ve been slowly working my way up the difficulty levels – finding that they weren’t that hard. I know I’m not a very good player when it comes to these games, but I won my first game on each difficulty level up to emperor (excluding the games where I quit in the first 50 turns because I inadvertently taunted the AI and it decided to steam roll me with units early on – while I stupidly had no army or defence).
It took me a couple of goes to get Immortal difficulty level, in the end I explicitly chose a map that was large with only 4 players, ensuring I had room to work with at the start before the AI found me, time for a half decent strategy to catch up with the advantages given to the AI at the beginning. In pretty much every game at these high difficulty levels I did the science victory because I am no good at the war side of the game (despite the AI being even worse, at high difficulty it has a lot more units), and can be converted to diplomatic or domination victory if it becomes problematic. On Immortal the space race was very close, I had to pay other countries to attack whoever was going for space victory in order to slow them down that little bit extra I needed to win.

So when it came to my first deity level difficulty match (the highest difficulty level), I decided I wanted to try something different. I had done a single city culture based victory on the easiest difficulty level for one of the achievements, and I remember that I won quite early on in the game. So I thought maybe that would alleviate some of the problems with the science victory approach as I would be done before the AI. I also wanted to minimize tweaking of the game parameters since it feels a bit like cheating, choosing options which you know you are good at, or the AI is weaker at. So I decided the standard Duel size on the standard Continents map should be acceptable – At least I wouldn’t start next to the AI, but small map so I wouldn’t have much room to grow and the AI would likely turn up and smack me fairly early on. Random opponent and I chose India for myself since its good for a small number of large cities which the cultural victory type favours.

In the end I didn’t end up finishing the game any sooner than my science victory on immortal difficulty level, indeed the culture victory took me longer. But I won all the same – my first game on deity level – no retries, nothing.

I built a single city. I had intended on building maybe 3, but the duel map was even smaller than I expected, there was only really room for 2 large cities on my ‘continent’ and there was a city state taking up a few squares at one end, so I decided (at the time fully expecting to lose imminently) to see how I went with a single city.
I focused on culture buildings (obviously!), but with a single city expanding quickly you quite easily keep up with the technology tree so there wasn’t much thinking involved in choosing what to build next – I basically just built everything. I had a couple of units to explore, which quickly showed the map had no shallow water to the other AI, and 3 city states within my reach. I quickly grabbed 2 to ally level, but the AI got the third, spending alot of money on it, made it beyond my means to get ally with immediately. As the game progressed, my single worker upgrading tiles as they became available and me raking in money, production and having such a high happy rating that I was in a golden age more often than not, it became apparent that I was doing okay, but not that great. Cultural victory was definitely going to take longer than the Immortal AI would win a science victory. The AI I was playing against this time was the French, who get a cultural advantage and early on looked highly likely to be ahead in that respect, as well as being far, far ahead in science, and based on the trading screen having enough money to buy all of the city states off me 4 times over. Not to mention I had a single city defended by a single unit, no walls and the AI had an army capable of destroying a 20 town continent with a half decent army.

So it did indeed look like I was going to lose, as I had thought all along. But as the game went on, the inevitable ‘the French have built the Apollo program’ or ‘united nations’ or ‘has declared war on you’ just didn’t happen. Indeed I almost built the united nations myself, since by then I had amassed enough money to steal the city states away from the AI and the AI didn’t attempt to use its huge reserves of cash to fight me for them. (As an aside my single city with its 4 friends and certain cultural policies was almost generating science as fast as my first failed attempt at Immortal difficulty level where I had a dozen towns – just a sign of how badly I play really…)
Finally, I just sort of won by default. I kept playing after game over to see what the AI had in mind – and ~40 turns later it built the UN and purchased all of the city states in an aggressive fashion. Up until that point I assumed that I had just had a lucky run and ended up with an AI who had decided to try and win on points (the AI had a huge score), but losing on points at turn 500 is a sure sign you weren’t actually trying to win… but here lies the problem.
As far as it can be determined it seems the AI picks a single strategy and sticks to it. There doesn’t appear to be any comparative evaluation to decide which strategy is most likely to result in victory at any given time. The game is slow enough as it is, the original Civ ran on ~386?, and the game logic doesn’t seem to have really become that much more complicated. Even when you switch from the 3d graphics to the old 2d display Civ 5 chugs on a moderate size map in the late game, the large maps are quite painful. To think how it would be with real strategic evaluation … horrifying. But without it a game is mostly about luck of the draw as to whether the AI will use a strategy you are terrible against, or one which is practically useless. Without the dynamic strategic evaluation there is no real challenge, no competitive force which helps you improve. You increase the difficulty level because half the time its too easy, but then half the time you get a game which is too hard given you’ve never learnt how to cope with the AI’s strategy.

Anyway, enough rant, I guess I’ll go play another round… 😛

TC SRM 478

Been a while since I last posted… I’ve fallen behind as was to be expected.

Q1) Given carrots are placed at locations 0 mod 1billion 7, and from any given location there are 2 options to go to, 4x+3 or 8x+7 and at most 100thousand movements are allowed, what is the minimum number of movements to get from starting position x, to a carrot.

A1) Obviously this problem is about transitions modulo 1 billion and 7.  The trick is needing to rule out trying all 2^100000 options.   1 billion and 7 is too large to compute all transitions, so a simple memotization won’t work.

The first step is to realise that order of application doesn’t matter. A(B(x)) == B(A(x)). That gets rid of almost all of the solution space.  Furthermore A(A(A(x)))=B(B(x)) and since we are trying to minimise moves the ideal solution will always choose 2 8x rather than 3 4x.  From there it is a simple matter of repeated application of B and trying 0 1 or 2 applications of A. (Technically you need to verify that another application of B isn’t equal of 2 appliations of A are equal given the statements made so far, but A(A(x))!=B(x) for all x so that is not a problem)

Another solution is to notice that all transition states available are of the form 2^(n)*(x+1)-1, and that every possible n is reachable except n=1.  Furthermore since the order of application doesn’t matter, and a greedy approach of using 8x as often as possible therefore is the ideal scenario. Therefore the answer is n/3 if n is a multiple of 3, or n/3+1 otherwise, for the first n which results in a 0 mod 1 billion and 7 answer. (if 2 mod 3 then apply 4x, if 1 mod 3 then apply 4x twice from one less 8x.)

Q2) Given up to 15 bottles each with capacity C and a list of initial contents levels (all integer), and a price list for how much a bottle with any specific content level is worth determine maximum profit given the ability to pour 1 bottle into another until it is empty or the other is full. Maximum C is 50 and maximum price is 1million.

A2) Each pouring operation either results in a bottle which is empty or one which is full (potentially both).  Empty and full bottles are of no interest for future pouring operations, so there are at most 14 pouring operations that will be performed. Giving a search space of 14! – which is too large.

A dynamic programming approach might seem like a good idea, all we have to do is determine the maximum for each subset and combine them.  But a closer look shows that it doesn’t make sense because we don’t know the contents levels in bottles as they may be neither full, empty nor their initial value.

So a different approach is required.  If a subset of the bottles all pour, one into another in some order.  They all end up either full or empty except for possibly 1.  Given we know the total contents and the capacity of each one, div gives you the number of full bottles, and mod gives you the capacity of the last possibly non-empty one, the rest are empty.  Presto we can trivially calculate the maximum for each subset where they all pour.  Therefore we can determine the maximum for a specific subset, it is the maximum for all partitions of pouring.  Number of partitions of 15 elements is still ~1billion so we can’t brute force those.

The maximum of n elements is therefore found by considering each possible non-empty subset as all pour, and the maximum of the rest.  This has a running time of less than 2^29, but it seems likely to time-out in the worst case.

Next approach – For each subset one of the elements will have content j, the rest are already sold.  You can generate a subset 1 larger by combining the unsold one with an unused one, or selling the unsold one and keeping the new one as unsold.  This has a state space of 2^15*50, but only an average of about 15 options to consider for each spot.  This gives running time <2^25 which is clearly going to run in time.

Q3) Given a list of the count of each type of apple in up to 50 boxes, determine the probability of selecting each specific apple type at random if you first randomly select a non-empty subset of the boxes to further select from randomly. (Up to 50 types of apple and each count is <200, no box is empty)

A3) The dual random selection is kind of annoying as the number of subsets is up to 2^50.  So obviously we can’t consider each scenario and average.

Dynamic programming might be a possibility. Maybe determine the probabilities using the first n boxes?  Except how do you expand from n boxes to n+1.  This appears to be impossible.  To modify the probabilities you need to know how many apples there were before.  Hmm, that <200 limitation is pretty small, gives a maximum of 500k apples in any subset.  Each transition only depends on the one before, so we don’t need to keep everything in memory.  So probabilities using the first n boxes with a total of k apples.  That is 2meg ram for just one probabilitiy, so we’ll have to do each probability separately, which I think is fine…

Well maybe a bit slow.

A faster way to do it is to break down the problem differently so you can accumulate the probabilities for each item simultaneously without requiring any additional ram.  One breakdown is to consider the answer as being the sum of count of apples of specific type in a box divided by total apples in subset, times number of ways to create that total with a subset that contains the specific box divided by total number of subsets, for each possible box and total number of apples in a subset.  Efficient calculation of number of ways to create total with a subset that contains the box means that running time is proportional to 500k times 50 worst case, as each apple type can be accumulated at the same time, because the part which requires the giant array is constant over each apple type.  Unlike the previous calculation where the giant array is specific to the apple type, which is 50 times slower. (Although apparently an efficient implementation can pass system tests… without this further optimization)

TC SRM 477

As is usual with SRM I didn’t do this one.  However, I figure I might keep writing up blogs until next years TCO.

Q1) Given a boolean rectangular array which maps to a hexagonal grid by shifting every second row half an entry to the right, determine the number of edges between true and false exist.

A1) This is easy enough, its a simple case of walking each cell and checking its 6 ‘neighbours’ and incrementing a counter if its not the same.  Once complete divide answer by 2.  The only difficulty is making sure you don’t screw up the determination of the 6 neighbours, which is different depending on whether the current cell is in an odd or even row.

Q2) Given a list of numbers, determine the greatest number of distinct pairs which can be selected such that all such pairs are the shorter sides of a integer sided right triangle and have a gcd of 1.  Numbers are all less than 1 million and worst case there might be 1250 numbers (less if they aren’t all single digit – maximum of 350 6 digit numbers).

A2) The constraints on the pairs are pretty significant, and could easily rule out most of the numbers entirely.  However 600 3s and 600 4s are a real possibility.  We can easily go through each pair of numbers and work out if they are compatible.  We can turn this into a graph with counts available for each number.  Not certain whether the graph can contain cycles.  If the graph can’t contain cycles, it is bipartite and we can perform a max flow to determine the answer.  If it can contain cycles we have a problem…

So I checked out some answers – and yes it does seem max matching is now considered a question 2 level algorithm, not question 3 as I thought it used to be.  However, whether there are cycles or not is not relevant, as 2 odd numbers apparently cannot satisfy the requirements (since the sum of their squares will be 2 mod 4), and 2 even numbers is ruled out by the gcd(a, b) == 1 requirement.  Hence the pairs are bipartite based on even or odd.

Q3) Given a weighted tree with up to 200 nodes, determine the minimum weight sum of a tour which starts and ends at node 0, visits every edge at least once, and can teleport for cost L up to K times.

A3) If K is 0, the answer is 2 times the sum of the edge weights as every edge will be traversed twice.  If K is 1, we can subtract the longest path and add L, and choose which is best. That is to say if the depth of the deepest node from any other node is > L we will teleport between those nodes.  However once K is > 1, it becomes a lot trickier…

Again looking at solutions.  The problem is a dynamic programming problem.  Recurse over the minimum tour length within a subtree if there are i teleport endpoints in that subtree.  This can be determined by considering that if a subtree has an even number of teleport endpoints, it will be both exited and entered, otherwise it will only be exited or entered, not both.  So there are n nodes, and up to 2k teleport endpoints, so the DP has O(nk) states.  Determining each recursion requires considering how to distribute the i teleport endpoints between a nodes children.  Considering every combination is too much, so we need to switch up the DP a bit.  If instead of thinking about each node, we consider each edge, we realise we only need to consider each possibility for i down a given edge, not in combinations with other edges out of a given node.  So we can instead either recurse to the left most child and all its outgoing edges, or to the ‘set of edges to the right of the left most child’  Since the number of edges equals number of nodes -1 in a tree, the DP states is still O(nk).  Number of possibilities to try for each case is O(k) giving O(nk^2) running time to populate the DP table.  Once the table is populated all we have to do is check for root of the tree, each even number of end points, modified by the cost of teleporting, and determine the best result.

The real trick here is disassociating the end points, realising the problem can be solved without caring about specific locations teleporting to specific others.

This is the second time I’ve seen the ‘dfs edge walk on a tree’ DP problem.  I should probably write up a solution to practice it.

TCO10 R4

So, as predicted by the odds I am out ;). I did at least score some points, so I came 125th. The question structure didn’t favour me, an easy first question jumping to a hard second question.  A fast time on the first question was sufficient to get through, or a couple of challenges and a terrible time.  However the first problem was so easy the challenges were hard to come by, and the second problem’s input requirements so obtuse that constructing a failing test case was next to impossible.

My time for the first problem was terrible  – I had the problem solved in the first minute, but a plus sign where I wanted a minus sign sent me down a warren of verifying things by hand, and quickly half an hour had vanished.  That being said I probably would not have gotten through even if I hadn’t screwed up badly, had to be fast.  Although an extra 20minutes and I might have actually had a serious attempt at the second question.

For a while I thought maybe I could do the 3rd question, thinking it was a well known computational geometry problem which I could just grab an algorithm off a textbook for.  However the closest related problem in the textbook told me the running time would be too slow.

Q1) Given a set of bank accounts with known values, where 1 dollar gets you one ticket in a lottery draw each week for a fixed amount, what is the expectation value of bank account 0 after n weeks. Limits: n up to 1000, winnings per week up to 1000, up to 50 bank accounts, each with a starting balance of up to 1million.

A1) I implemented this as a DP of probabilities of number of wins in a given number of weeks.  Note that the total amount of tickets in a given week stays the same regardless of who wins in the previous weeks, so only bank account 0 and the initial total of the incoming bank accounts is of interest.  Once DP is formed, simply multiply each probability in the final week by account balance for that many wins and return.

As it turns out the problem can be solved without the DP. It appears that simply using the expectation value of one week can be used to generate the expectation value of the next week.  Reducing O(N^2) down to O(N).

Q2) Given a sequence of numbers formed by multiplying the digits of each of up to 50 consecutive integers, determine the location of the first of those integers, assuming the sequence refers to the earliest such match. Limits: the input numbers will be less than 1 billion, output will be less than 2^63.

A2) After the competition I realised that this problem is not as hard as it looks.  I really might have worked this out if I had of made a serious attempt (and had that extra half hour :P). If the input is all 0’s you can return 10,100, 1000 depending on input length quickly.   In fact brute forcing for possible answers of 1-1000 takes care of some corner cases. Find the first non-zero number, brute force the last 3 non-zero digits and solve for the rest by trying as many 9s then 8s then 7s down to 2s going from right to left, then check against all other numbers in the input to see if it is right.  Return the smallest result.  For kicks I wrote this up and it failed test cases – but only because of a mistake in my code, fixed that and it now passes system tests.  My first thought was brute force 2 digits (well my real first thoughts were more complicated than that), but input size > 12 allows some control over the 3rd digit.  A number followed by 12 0s indicates that the last 3 digits were 9s for instance.  You can look at the input set and break it down in to all sorts of combinations, but in the end, simply brute forcing all last 3 digit scenarios is easier…

As an aside I was left wondering what checks the challenge system used, since the input limitations are significant.  Other than 0 no two consecutive numbers in the input can be equal. No input number can have a prime factor greater than 7. At least 5 of the inputs must be 0, and after the last consecutive 0 every 10th number must be a 0. Two consecutive non-zero numbers must be increasing and the difference must divide them.

Q3) Given up to 1200 points, where no 3 are co-linear and no 2 are coincident, determine the number of line segment intersections from choosing any 4 points of the points.

A3) Brute force is obviously no good. A look on Wikipedia indicates that even the best algorithm for enumerating all intersections of the complete set of line segments from n points is in the order of the number of intersections, and if the 1200 points form a convex hull the number of intersections is well over N^3 (Close to N^4 even I think).

Therefore we can’t enumerate the actual intersections we have to count them in batches.  Specifically the time needed needs to be about O(1) for every line segment. O(log n) would be sufficient.

I had a quick look at one solution, and it considers each point, and sorts the other points by angle relative to that point, then for each of those segments in order, mystically calculates the intersection count really quickly. … going to have to read it again after some sleep.

Edit: Post sleep update.

I had a look at some more solutions.  General approach seems to be assume everything crosses.  Then for every point, sort the other points around it by angle.  If you can get 180 degree separation in the sorting, all the points on the left will not form crosses with the points on the right. So you subtract these scenarios away.  However there is over-counting which you have to adjust for.  Looks like the over-counting adjustment is probably the hardest bit.

TC SRM 468

Q1) Given a dictionary and a mapping of numbers to sets of letters, and an input sequence consisting of numbers, ‘try alternate’ button pushes and ‘spaces’, determine the output.  Assume dictionary is processed in alphabetical order.

A1) Easy enough, make a dictionary of number sequences to list of words that match, sort each list, then process input to produce output…

Q2) Given a graph with a line of nodes each connecting from one to the next, determine the minimum time to get from the first node to the last.  Subject to some special conditions.  Each edge has two time options for its travel.  The first is always available, but the second can only be used if there are less than k contiguous sections where second option has already been used, or you used the second option for the last edge. The number of nodes n is up to 400thousand, and k is up to n or 40, whichever is lower.  The first and second times for each travel option are generated using a pseudo-random number generator with specified parameters.

A2) While writing my question text above I may have made this question even easier due to my phraseology.  Its a DP problem, 3 parameters, number of contiguous sections of second option so far, boolean if last was second option and how many nodes passed already.  Very easy, only problem is that it requires 64meg ram in the default dp table.  Luckily every entry only depends on the ones at one less nodes passed.  Lucky because that means we can get rid of the memory problem, and also because if it depended on all of them the processing time would have been excessive.

Q3) Given a k-partite graph where each ‘layer’ only connects to its neighbours (arranging the k layers in a circle), determine the maximum independent set.  Limits: Maximum vertices is 100, k between 10 and 50.

A3) When I first saw this problem I was shocked, a question 3 which maybe I can actually do.  (The original question isn’t phrased as given, but still it was obvious this was the real question.)  I unfortunately pretty quickly came to the realisation that the graph itself didn’t have any limitations which guaranteed it was bipartite. If k was even, I was good to go, but when k is odd, I had a problem.  I thought about it for quite a while, and along the way I had the idea that I could try all the possibilities for one of the layers, and do the rest pretending it was a bipartite graph.  Unfortunately I threw this thought away because I didn’t remember that k was at least 10, and so I decided that worst case would be 33 vertices as the smallest layer and 2^33 scenarios is too many to consider.  With k at least 10 (or we might as well say 11 since 10 is trivial and a maximum vertices of 100, the worst case scenario is when the 100 vertices are distributed in layers of 10 or 9.  Select one of the rows with only 9 in it, and try all 2^9 scenarios.  For a given layer if we are trialling a vertex as being in the independent set, we can remove it, and any immediate neighbour vertexes.  If not we just remove it.  This removes the entire layer, leaving us with a bipartite graph which we can solve via the dual maximum matching problem using simple maxflow on a graph with capacities which are all 1.  Add in the number of ‘independent’ vertexes in the current trial and choose maximum over all trials and done.  Probably one of the easier question 3’s I’ve seen recently.

TC SRM 469

Q1) Given a n rows of m seats, and a list of specific taken seats, return how many ways 2 people can sit next to each other. Limits: n and m up to 1 billion and number of taken seats up to 50.

A1) Very very easy.  Use long long to avoid overflow.  For each row with taken seats get the sorted taken seats in that row, then for each gap greater than 1, add gap-1 to the total.  Finally add number of empty rows times m-1.

Not my favourite kind of question, simply because I am not uber-speed code writer.

Q2) Given a counter which ticks down every minute, and a set of k items which each have a time length, but increase the counter by j at a specific time within that time length, determine the maximum number of items which can be processed completely before the counter drops below 0.  If there are multiple ways to achieve this goal, return the order which prefers lower numbered items first. Limits: k up to 20.

A2) Each item has a minimum requirement, and a net effect. If there is a non-negative net effect which we can meet the minimum requirement for, we want to use it. If there are only negative net effects, which we can meet the minimum requirements for, we will never be able to use anything which is currently not at minimum requirement.  However, the order of the things left is non-trivial.

With a limit on k of only 20, every possible subset is a valid consideration if we can process it quickly. This provided a hint, but I needed to look at a solution to get the rest. If we divide the input into start and end sections, we can determine which of the end section can be placed immediately after the start section.  This lets us create a DP of maximum number which can be processed from a set which appear after the rest not in the set (regardless of whether the rest not in the set can actually all be processed before it).  That final condition may sound crazy, but once you get to the full set,  the not-in set becomes empty set and the condition is fine, so the rest doesn’t matter.

Once the DP is done, you just have to do the ‘reverse dp walk’ to construct the output.

Q3) Given a list of items and 2 queues to put them in, where each item has a processing time, which depends which queue it is in, and after processing is placed into the other queue, unless it has already been processed by that queue, determine how many different ways to place the items in the 2 queues subject to 2 conditions.  1) The items have to be added to the queues in the same order as the input. 2) Neither queue must ever be empty before the other queue has finished its initial list. Limits: up to 47 items and each processing time is between 1 and 20.

A3) No clue – the processing time limit seems suspiciously small though. Maybe a dp on the number of solutions with an initial queue time of n?  Let me look at a solution.

Rather than solving the problem as given, solve a different problem, number of ways for queue 1 to run out before queue 2 is done its initial list.  Then switch queues and use the same logic again.  Subtract both from number of possible distributions.

So, how to count the number of ways for it to run out. Each time we put something on to queue 1, it makes it harder to run out, each time we put something on queue 2 it is easier to run out.  Some analysis is required to work out the DP.  The definition of fail is if total queue 1 time + total time of first i-1 from queue 2 processed at queue 1 speed – total time of first i from queue 2 processed at queue 2 speed < 0 for any i.  We want to know what the minimum of the left for any i is, for each possible distribution in order to know whether it has failed, if that minimum is less than 0, then it is a fail.  As each new item is added to the pile this function changes.  If we add one to queue 1, all possible left hand sides increase by the processing time, thus also increasing the minimum.  If we add one to queue 2, the inequality doesn’t change except for the new i=new length of queue 2 case.  In that case the inequality equals the previous scenario where all were processed minus the new addition as a possible new minimum.  So we can minimize between that and the old minimum.  The only problem is we need to know that previous scenario as well.  So ‘previous scenario’ and ‘minimum’ are 2 parameters to the DP, the 3rd being number of items en-queued.

Now that we have this new parameter we need to know how it changes.  If we add to the first pile, it goes up exactly the same as the minimum does, adding the queue 1 processing time.  If its added to the second queue, that causes a subtraction of the queue 2 processing time, and an addition of the queue 1 processing time.

Potential range on previous scenario and minimum are pretty large so memory is an issue if the entire dp was kept in memory.  Helpfully each case only depends on the previous number of items en-queued, and the final summation is only on the full list of items en-queued, so we can throw away memory as we go. (Avoid garbage collection issues and alternate arrays, don’t keep allocating.)  Additionally with the inequality range being +/- ~900, that is close to 200million dp locations filled without any optimisation.  Running with c++ this is probably reasonable, but with Java further trimming helped.  For instance minimums only ever get smaller and we are only interested in the negative ones, so treat all positive minimums as 0, and we save half the state space.  Furthermore the area of the DP increases by +/- 20 in both dimensions with each step, taking this into account can save another factor of ~3 rather than blindly checking every spot.

Again out of my range, but useful to sit down and understand how it works.