.Net 4.6 RC Diff

So it's been a while since I last did a framework surface diff, but it seems my program still works.  This time I diffed 4.6 RC vs 4.5.1.

A few small things, but hiding in the middle is a bunch of new classes in System.Numerics which is nice.

As usual this list is not complete; I skip things which I don't think are worth mentioning.  But that isn't much this time.

  • System.Array.Empty<T>() – does what it says on the box.
  • System.Buffer.MemoryCopy – like BlockCopy, but for raw memory pointers rather than arrays.
  • Usual set of SQL Server database connection additions – column encryption, authentication types.
  • DateTimeOffset supports conversion to/from unix time in seconds/milliseconds.
  • System.Diagnostics.ProcessStartInfo now has an Environment property.  Not sure what it does yet, given there is already the EnvironmentVariables property.
  • Bunch of Event tracing stuff I have no idea about…
  • System.FormattableString – seems to bundle a format string and its arguments together – but has no public constructor… There is however a static Create method on System.Runtime.CompilerServices.FormattableStringFactory (see the sketch after this list).
  • System.GC – new ‘no GC’ region methods which ensure there is enough memory before they start.  Can also force a small object heap compaction.
  • System.Globalization.CompareInfo.GetHashCode – can get a culture-specific hash code.
  • System.IO.MemoryStream.TryGetBuffer – can get back an array segment when memory stream is constructed with offset/length.
  • Async methods for NamedPipeStream connection.
  • Async methods for read/write/flush on UnmanagedMemoryStream.
  • Socket options for reusing ports.
  • System.Numerics adds Matrix3x2, Matrix4x4, Plane, Quaternion, Vector2, Vector3, Vector4.  They are unfortunately all single precision, but the reasoning is justified – they are all hardware accelerated using SIMD.  Interestingly the pre-release 4.6 documentation appears newer than the actual released code here.  It documents many useful static methods which are not yet in the actual library.  Also there is apparently a System.Numerics.Vector<T> coming which works for any primitive type T, with a length dependent on SIMD register width.
  • New assembly level attribute – DisablePrivateReflection – apparently does what it says…
  • Asymmetric padding mode options for crypto.  CNG supports RSA.  Crypto random number generator can write random values to a subset of an array.
  • WindowsIdentity.RunImpersonated – can now execute a delegate rather than having to set up an impersonation context.
  • string.Format with format provider now has the small arg count optimization of explicit methods for 1, 2 and 3 format args.
  • System.Text.Encoding has dropped default support for code page based encodings; you have to manually call RegisterProvider to have them supported??
  • System.Text.Encoding has a GetString which takes a byte pointer and length rather than just an array.
  • StringBuilder has an Append which takes a char pointer and length rather than just an array.  Also the small arg count optimization for AppendFormat with a format provider.
  • System.Threading.AsyncLocal – interesting, not 100% clear what it does yet, but it sounds like thread local variables except with a value per ‘async flow’ rather than per thread.
  • Task creation methods for immediate cancelled/exception states.
  • Extension methods to get a safe wait handle given a wait handle.
  • System.Uri.IdnHost – can get punycode version of international domain names.
  • WPF diagnostics hook event for visible tree changes.
  • System.Xml.XmlNode gets a PreviousText property.
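
A few of these are easy to show off.  A quick sketch of roughly how I'd expect the more interesting ones to be used (based on the pre-release documentation, so exact details may shift before RTM):

    using System;
    using System.Runtime.CompilerServices;
    using System.Threading;

    class NewApiSketch
    {
        // AsyncLocal flows with the async execution context rather than the thread.
        static readonly AsyncLocal<int> RequestId = new AsyncLocal<int>();

        static void Main()
        {
            // Avoids allocating a fresh zero length array on every call.
            int[] none = Array.Empty<int>();

            // Unix time conversions, finally built in.
            DateTimeOffset dayOne = DateTimeOffset.FromUnixTimeSeconds(86400);
            long millis = dayOne.ToUnixTimeMilliseconds();

            // FormattableString has no public constructor; the factory builds one.
            FormattableString fs = FormattableStringFactory.Create("{0} + {1}", 1, 2);

            // 'No GC' region - only starts if the requested budget can be reserved.
            if (GC.TryStartNoGCRegion(16 * 1024 * 1024))
            {
                try { /* latency sensitive work here */ }
                finally { GC.EndNoGCRegion(); }
            }

            RequestId.Value = 42;
            Console.WriteLine("{0} {1} {2} {3}", none.Length, millis, fs.Format, RequestId.Value);
        }
    }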

TCO15 R1B

Having passed through already I didn’t compete in this round, but I was interested to see what the turnout was like without a Google Code Jam happening at the same time.  Only 1023 people registered, which after allowing for the 700 advancers last time is only ~400 new people (assuming everyone who competed last time and didn’t advance competed again) made available by the code jam not running.  Looks like round 2 will probably fall quite short of its 2500 target if this keeps up.

The positive score criterion appears to have been the cut-off again; only 591 advanced.

Q1) Find the size of the largest sub range of an array where at least half of the numbers have a common divisor other than 1.

A1) With the array size limited to 50, and the size of the values limited to 1000, this seems a trivial brute force.  There are 1275 sub arrays of average length ~25 and (obviously) fewer than 1000 prime numbers to test for divisibility.  There are apparently 168 prime numbers less than 1000, so it's not a huge win, but it does help a little if you happen to be working with a particularly slow programming language I guess.
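
A rough sketch of that brute force (the names and layout are mine, not the contest's):

    using System;
    using System.Collections.Generic;

    static class SubRanges
    {
        // Largest subrange where at least half the numbers share a prime divisor.
        public static int LargestHalfDivisibleRange(int[] a)
        {
            // Sieve of Eratosthenes for the 168 primes below 1000.
            var composite = new bool[1001];
            var primes = new List<int>();
            for (int p = 2; p <= 1000; p++)
            {
                if (composite[p]) continue;
                primes.Add(p);
                for (int q = p * p; q <= 1000; q += p) composite[q] = true;
            }

            int best = 0;
            for (int i = 0; i < a.Length; i++)
                for (int j = i; j < a.Length; j++)
                {
                    int len = j - i + 1;
                    foreach (int p in primes)
                    {
                        int count = 0;
                        for (int k = i; k <= j; k++)
                            if (a[k] % p == 0) count++;
                        if (count * 2 >= len) { best = Math.Max(best, len); break; }
                    }
                }
            return best;
        }
    }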

Q2) Determine the expected count of objects found if each object has a probability of being found on its own, and, if found, a list of other objects that will definitely be found (which expands transitively).

A2) Up to 50 objects, so brute forcing every combination of ‘found on its own’ outcomes is out of the question.

Each object can be expanded out transitively easily, but how to combine the probabilities is not obvious.  If an object is in isolation and nothing connects to it, the probability is simple.  If the object is part of a cycle which has no external connections, the probability is 1 – product(probability of each member of the cycle not being selected).  It's the external connections which are trickier.

I think all cycles can be collapsed and replaced with a single node with a ‘base’ probability as described above.  Once the cycles are removed, the graph is a directed acyclic forest.  The roots are obvious: they have their base probability and that’s it.  Children then have a probability of being chosen of 1 – product(base probability of not being chosen, and total probability of not being chosen for each parent) – remembering that each node can have multiple parents and we have to have calculated the total probability for each parent before we can start.

Apparently I’ve made the problem more complicated than required though – in effect the probability of an item being found is 1 – product(base probability of not being chosen), taken over every node that is transitively connected to the item in question.  I can see how the math works out to be the same, but it wasn’t an obvious starting point.
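
A minimal sketch of that simpler formulation, taking ‘transitively connected’ to mean every node whose own find would eventually reveal the item (including the item itself); p and leads are my own names:

    using System;
    using System.Collections.Generic;

    static class ExpectedObjects
    {
        // p[i] = chance object i is found on its own, leads[i] = objects it reveals.
        public static double ExpectedFound(double[] p, List<int>[] leads)
        {
            int n = p.Length;
            // reaches[i, j] = finding i on its own eventually reveals j.
            var reaches = new bool[n, n];
            for (int i = 0; i < n; i++)
            {
                var stack = new Stack<int>();
                stack.Push(i);
                reaches[i, i] = true;
                while (stack.Count > 0)
                {
                    int cur = stack.Pop();
                    foreach (int next in leads[cur])
                        if (!reaches[i, next]) { reaches[i, next] = true; stack.Push(next); }
                }
            }

            double expectation = 0.0;
            for (int j = 0; j < n; j++)
            {
                double notFound = 1.0;
                for (int i = 0; i < n; i++)
                    if (reaches[i, j]) notFound *= 1.0 - p[i];
                expectation += 1.0 - notFound;
            }
            return expectation;
        }
    }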

Q3) Given a tree, determine the number of distinct subsets of 7 vertices which are connected such that the second has the first as a parent/grandparent, the 3rd, 4th and 5th have the second, and the 6th and 7th have the 5th.  Subsets are distinct only if they have different members; order doesn’t matter.

A3) The number of vertices in the tree is up to 2000, so brute force is trivially out of the question.  It would seem a multiple pass dynamic program is in order.

First pass is number of transitive children for each vertex.  Second pass is number of ways to select 2 children of a given vertex.  Third pass is sum of products of the second pass for each child and the number of ways to choose 2 other children which are not the selected child or its descendants.  Fourth pass is sum of the 3rd pass for each child.  Final answer is the total of all values in the 4th pass.

The 3rd and 4th passes enumerate all transitive children for each vertex; this will only perform acceptably if you store the children as an adjacency list rather than checking every potential vertex as a child in an adjacency matrix.  Additionally the number of ways to select 2 children needs to be computed as N*(N-1)/2 – not by enumerating.  Calculating N in the second pass is trivial from the first pass; N in the 3rd pass is the first pass value minus the first pass value for the currently selected child, minus 1 more for the currently selected child itself.

TMD.Algo now on GitHub (also 0.0.6.0)

I’ve migrated TMD.Algo from my internal source control to a GitHub public repository.  It can be found here.

As part of the migration I have dropped the signing key, so a simple download of the project will actually compile, but adding the built result to the GAC (if you so wish) will involve a bit more work.

Also along with the move to GitHub comes some work I’ve done on the library over the last couple of years, so I’ve upped the version number to 0.0.6.0.  The major new feature is the GCJ class under TMD.Algo.Competitions.  This class is designed to be the main entry point of a program parsing Google Code Jam style input and producing Google Code Jam style output.  It simplifies the basic parsing logic.  It also optionally supports running test cases in parallel, for those times your code just doesn’t quite optimize fast enough to otherwise solve in time.

The majority of the work for 0.0.6.0 however was in starting a new set of integration tests.  These integration tests take custom written solutions to past GCJ problems, which attempt to use the TMD.Algo library as much as vaguely makes sense, and ensure that they can handle the practice sample/small/large inputs to produce outputs that the GCJ practice website considers passing.

Other smaller changes can be found in the commit description.

GCJ15 R1A

Over 12000 qualifiers could have turned up for R1A, but given time zones that was always unlikely.  5032 positive scores, over 300 perfect scores, and the advancing cut-off was a fast time on the first 2 problems completely solved.  If I had been doing this round I think I would have advanced, but maybe not having completed the easiest question – as I think the wording is incomplete, leaving the question open to being interpreted as a much harder question.

Q1) Given a list of observations of the number of items on a plate at 10 second intervals, and knowledge that there is someone who can add items at any time, and another person who removes them either whenever they like, or at a constant rate while the plate is not empty (but not both), what is the minimum number of items the second person removes under each mode of item removal?  Presume that constant rate removal is a continuum: if removing 1 per second, there is half of the item still on the plate at 0.5 seconds.

A1) So I’ve changed the wording for this question to be what was actually marked correct; there is no statement regarding whether items are removed from the plate discretely or gradually across the period of constant rate removal.  Removing discretely is a much different question, which is quite a bit harder.  (And given the person's ability to remove arbitrary numbers of items at any point in time in the first mode, my mind gravitated towards discrete.)

To see where this is different consider the example 1 1 1 0.  Under the continuum removal case the number of items removed per 10 second period must be an integer, as all observations are integers.  Hence 1 item per 10 seconds is the minimum removal rate, and 3 items are removed.  But if removing is a discrete event, you can choose a removal rate of 1 every 25 seconds, in which case only 1 item is removed. (Assuming that the discrete removals happen at the end of each removal period rather than the start – at the start you have to remove one immediately, so the total is 2.)

Given the continuum restriction this problem is trivial.  In the first mode of removal you remove items only if the count goes down from observation to observation, so just sum the neighbour differences where they go down.  In the second mode the minimum number removed is determined by the minimum rate – and the minimum rate is given by the largest drop.  Once the largest drop is found the answer is simply a sum over all the values (excluding the last): if a value is above the largest drop you add the size of the largest drop, otherwise you add the value itself.
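
A sketch of both modes under the continuum reading (obs holds the ten-second observations):

    using System;

    static class MushroomCounts
    {
        // Mode 1: remove items at any time - just sum the observed drops.
        public static long AnyTimeMinimum(long[] obs)
        {
            long total = 0;
            for (int i = 1; i < obs.Length; i++)
                total += Math.Max(0, obs[i - 1] - obs[i]);
            return total;
        }

        // Mode 2: constant rate - the rate is fixed by the largest drop, then each
        // interval removes min(value, largest drop).
        public static long ConstantRateMinimum(long[] obs)
        {
            long maxDrop = 0;
            for (int i = 1; i < obs.Length; i++)
                maxDrop = Math.Max(maxDrop, obs[i - 1] - obs[i]);

            long total = 0;
            for (int i = 0; i < obs.Length - 1; i++)
                total += Math.Min(obs[i], maxDrop);
            return total;
        }
    }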

The non-continuum problem is much harder as you have a search space which is not limited to integer values per 10 seconds, and it's not clear that it is even a single transition from too slow to fast enough, which would allow it to be binary searched.

Q2) A list of barbers each has a well defined customer processing rate.  When will the nth customer be served, and by which barber?  Assume that if multiple barbers are free, the earliest in the list is filled first.

A2) Nice problem.  The size of N may be huge meaning simulation is out of the question, even for the small input.  For the small input it might be possible to find a period of repetition of patterns of when barbers become available and skip ahead as appropriate, but the large input clearly rules that approach out as well.

While a simulation is not feasible, it is possible, given a time T, to determine quickly how many people each barber has started serving by the end of that minute: it's just (T / barbers_time_to_cut) + 1.  The same formula given T – 1 gives how many people had started being served before this time.  This gives a range, which if it contains N means you’ve found the minute N starts being served, and all that remains is to determine which barbers changed state between T – 1 and T (which comes from the same formula), and choose between them based on how far N is from the start of the range.  To find which T – 1 to T contains N, binary search on T.  If N is at or before the value at T – 1, T needs to be lower; if it's greater than the value at T, T needs to be higher; otherwise you’ve found the value.
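
A sketch of that binary search, with M[k] being barber k's minutes per customer and n the customer we want (my own parameter names); it returns the 1-based barber number:

    using System;
    using System.Linq;

    static class Haircut
    {
        public static int BarberForCustomer(long[] M, long n)
        {
            // started(t) = customers that have started being served by the end of minute t.
            Func<long, long> started = t =>
            {
                if (t < 0) return 0;
                long total = 0;
                foreach (long m in M) total += t / m + 1;
                return total;
            };

            // Binary search for the first minute where started(t) >= n.
            long lo = 0, hi = n * M.Max();
            while (lo < hi)
            {
                long mid = lo + (hi - lo) / 2;
                if (started(mid) >= n) hi = mid; else lo = mid + 1;
            }

            long offset = n - started(lo - 1);   // which of this minute's fresh starts is ours
            for (int k = 0; k < M.Length; k++)
                if (lo % M[k] == 0 && --offset == 0) return k + 1;
            return -1;   // unreachable for valid input
        }
    }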

Q3) Determine, for each vertex, how many other vertexes need to be removed, for the specific vertex to be on the convex hull.  Consider co-linear points to form a convex hull which touches all of them.  All input vertexes will be distinct integer coordinates.

A3) The small input only has 15 points, so a brute force trial of every subset using the standard O(N log N) convex hull algorithm (like in TMD.Algo) would work.  However the one in TMD.Algo throws exceptions if you give it a subset smaller than 3 or perfectly co-linear points, so those cases have to be handled separately.  Each node on the convex hull of each subset gets a new potential minimum based on the size of the subset vs the original input size.

The large input is much harder – 3000 obviously can’t be done with an exponential cost algorithm.  However with a fast computer and lots and lots of cores… an O(N^3) approach might work, if it has a sufficiently low constant.  Every pair of nodes gives N(N-1)/2 options.  These 2 nodes define a line of the form ay + bx + c = 0 (or a dx,dy vector).  Now every other point can be substituted into the formula (or cross product with the dx2,dy2 vector formed by subtracting the point from the original point used to create the dx,dy vector).  Count the number of positive, negative and 0 outcomes (from either case).  Since we are considering every possible point pair, one of those pairs will be an edge on the ideal convex hull for one of those points.  For the edge to be part of the convex hull, either all the positive, or all the negative points must be removed.  Thus the size of the smaller of the two sides is a potential minimum for both of the original two points.  Calculating the line formula is 2 products and 2 additions, then 2 conditional increments to classify.  That gives a constant for N^3 of 1 multiply, 1 add and 1 conditional increment, each of which is potentially only a clock cycle on a modern cpu.  The cross product approach is 3 subtractions instead of 2 additions, but should also be reasonable.
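
A sketch of that O(N^3) counting loop using the cross product form (coordinates as parallel arrays, my own layout):

    using System;

    static class HullRemovals
    {
        // best[i] = minimum removals so that point i lies on the convex hull.
        public static int[] MinRemovals(long[] xs, long[] ys)
        {
            int n = xs.Length;
            var best = new int[n];
            if (n <= 2) return best;                 // 0 removals: everything is already on the hull
            for (int i = 0; i < n; i++) best[i] = n;

            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++)
                {
                    long dx = xs[j] - xs[i], dy = ys[j] - ys[i];
                    int pos = 0, neg = 0;
                    for (int k = 0; k < n; k++)
                    {
                        long cross = dx * (ys[k] - ys[i]) - dy * (xs[k] - xs[i]);
                        if (cross > 0) pos++;
                        else if (cross < 0) neg++;
                    }
                    int removal = Math.Min(pos, neg);
                    if (removal < best[i]) best[i] = removal;
                    if (removal < best[j]) best[j] = removal;
                }
            return best;
        }
    }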

But if your programming language doesn’t optimize well, or you don’t have a bunch of cores available there is an O(N^2 log N) approach, which can be considered slightly inspired by the standard fast convex hull algorithm.

For each point, sort all other points by angle relative to that point.  This can be done by first dividing the points into left/right and then using a standard sort algorithm on each side, with the cross product of each point’s vector relative to the starting point as the comparator function.  Once the points are sorted in O(N log N), the minimum removal for the original point can be calculated in O(N) time.  Each sorted point defines an edge with the starting point, and the rest of the sorted points are either left of, right of, or collinear with that edge.  The counts of left/right/equal can be efficiently calculated in general by first doing a linear scan of the sorted list for the first candidate edge; then for the rest of the candidate edges you just have to advance the indexes in the sorted list that mark the interesting areas: left, right, equal but on the same side of the starting point as the candidate point, and equal but on the opposite side of the starting point as the candidate point.  In the process of going through all candidate edges each marked index never needs to backtrack more than one index per step, so they each take at most O(N) operations to maintain, giving a total of O(N) operations.  (An alternative to all this state maintenance is, for each candidate edge, to binary search for each of the 4 transition points.  That is O(log N) rather than O(1) per candidate edge, but given the original sort is O(N log N), it is not significant.)

The above algorithm is simply repeated for every input point, giving a total of O(N^2 log N).

GCJ15 QR

20 points to advance meant that a full solution to the simplest problem wasn’t sufficient.  Even so 12438 people advanced to round 1.  23296 people got a positive score.  That makes the 1371 people who turned up to topcoder open look pretty tiny, although maybe the simultaneous scheduling of the code jam qualifying round and topcoder open was part of why topcoder open’s numbers were so much lower than usual.

348 perfect scores, despite the problem worth the most points having significant scenarios in the large input that were not even close to covered in the small input.

Q1) Given a count of the number of people who will stand up for each threshold of people already standing up, determine the minimum number of extra people (of any mixture of thresholds) required to make everyone stand up.

A1) This is a basic greedy problem: consider each threshold with non-zero population in order, and if there aren’t enough people already standing, make up the shortfall by adding the difference between how many are standing and how many need to be standing to the already-standing set.  Accumulate these differences and return the total.  This works because larger thresholds won’t stand up before earlier thresholds, so you have to add enough to trigger each earlier threshold.  The large input is only 1000, and the solution is linear anyway, so no problems with running time.
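
A sketch of the greedy, where counts[s] is the number of audience members with shyness s:

    using System;

    static class StandingOvation
    {
        public static int MinFriendsNeeded(int[] counts)
        {
            int standing = 0, friends = 0;
            for (int s = 0; s < counts.Length; s++)
            {
                if (standing < s)
                {
                    friends += s - standing;   // invite friends with shyness 0 to cover the gap
                    standing = s;
                }
                standing += counts[s];
            }
            return friends;
        }
    }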

Q2) Given stacks that each go down by one every minute, except that you can instead spend a minute to pause everything and move a subset of one stack to anywhere else (including making a new stack), what is the minimum time for all stacks to become empty?

A2) So my first instincts here were wrong: a single stack of size 9 can be done in 5 minutes, but my first instinct was to only ever divide stacks into two, which gives 6 minutes as the best effort here.  The correct approach is to take 6 (or 3) off, then 3 off the remaining stack of 6, then allow 3 minutes for the 3 stacks of size 3 to run their course.  Interestingly the small input includes stacks of size 9, so at least this mistake wouldn’t slip past to the large input.  The high percentage failure rate on the small input here suggests I may not have been alone in missing this key point at first.

To get the large input done in time it is sufficient to solve the problem in O(NK) time, so one option is to consider the best time where you split tall stacks down to a maximum height of m before allowing time to run.  Heights to consider range from 1 to the tallest stack, and the number of turns to split a stack down to at most height m is given by simple division, so the total over all stacks can be done in O(N) time.  That gives a total of O(NK), N being the number of stacks and K the height of the tallest stack.
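
A sketch of that O(NK) search over the target maximum height m:

    using System;
    using System.Linq;

    static class Pancakes
    {
        public static int MinMinutes(int[] stacks)
        {
            int tallest = stacks.Max();
            int best = tallest;                     // m = tallest: no splits, just wait
            for (int m = 1; m < tallest; m++)
            {
                int splits = 0;
                foreach (int s in stacks)
                    splits += (s - 1) / m;          // splits to get a stack of s down to <= m
                best = Math.Min(best, splits + m);
            }
            return best;
        }
    }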

The assumption at play here is that it is always better to split first before letting time run.  This is pretty obvious: if a split is worth doing, doing it sooner means more pancakes per minute are removed later.

I think there is an approach with better runtime characteristics, which involves considering only a subset of all heights – something like O(N*Sqrt(K)*Log(N*Sqrt(K))).  For each stack generate the set of heights for splits down to the square root of its height, sort all of these in decreasing order and consider the ones greater than or equal to the square root of the largest height.  Each step through this list of heights corresponds to one extra minute doing a split, and defines a sequence of the highest remaining stack after n splits.  Then just linear search for the best.  This approach presumes it is never any better to divide a pile beyond its square root, which seems straightforwardly true – dividing things further takes more than one extra split per unit of height reduced.

Q3) Given a potentially repeating sequence of i’s, j’s and k’s, determine if it is possible to subdivide it into sections which evaluate to i, j and k respectively, assuming that the i, j and k represent the standard quaternion units and are combined using standard quaternion multiplication.

A3) The problem helpfully explains that quaternions satisfy A*(B*C) = (A*B)*C.

One approach (which is a bit slow, but just sufficient for the large input) is to determine all prefixes which can create i, all the postfixes which can create k, and determine if the gap in between creates j.  The fact that the sequence can be repeating (and the repeat count can be huge) appears to be a problem at first, but a close inspection of quaternion powers reveals this is not going to be a problem.  A fairly quick inspection finds that any unit quaternion to the power 4 is 1.  Hence repeat counts mod 4 are the only cases needing to be considered.  So, calculate the value of the entire repeating sequence, and consider each of its 4 powers as a pre-multiple of any prefix of the repeating sequence.  If the answer is i, store the pairing of the mod 4 power and the prefix length.  Similarly for k, but post-multiply a postfix of the sequence.  Both of these sets can be generated in O(N), the size of the repeating sequence, which is at most 10k.  Each set is also O(N) in size, so we have to be able to determine whether there exists a j in between in O(1) time, as anything over O(N^2) is clearly too slow.

Given a candidate i and k prefix/postfix there are 2 possibilities: the prefix and postfix meet at the same central location and the j is the middle section of that repeated segment, or they have a larger gap in between, and j is a postfix + (optional repeats) + prefix sequence.  The latter is easiest: the postfix length is defined by the prefix length of the candidate i, the prefix length is defined by the postfix length of the k candidate, and the number of repeats mod 4 is defined by the mod 4 of each of the i and k candidates and the total number of repeats.  Care must be taken if the number of repeats is small, as the only mod 4 value which satisfies the criterion might be negative.  If you have a cache of prefix/postfix values and powers of the repeated section, this becomes 3 quaternion multiplies.

The first case, where the 2 touch, is more difficult.  You need to check the mod sum vs the total repeats works out, since the j section doesn’t span any repeats this time.  But unless you pre-cache the product of every subsection of the repeated section, it can’t be done in O(1) time.  That pre-caching is another O(N^2) cost, and a lot of memory, so nice if we can avoid it.  The solution is that quaternion division (pre/post) is well defined.  Each quaternion multiplication is a permutation, so for a result and one of the input multipliers, there is only one possible value for the other.  We know the value of the product of the entire section, and the value of the prefix and postfix, so we can do a prefix division and then a postfix division to determine the value of the middle part.  This is O(1) time, as we need.

However, there is a better way!  If the sequence can be broken into i, j and k subsections, its total product must be the same as the product of i, j and k, which is -1.  So if the total product of the entire sequence (including repeats) is not -1, we can exit out early.  If it is, there is no need to verify the value of j; instead all that is required is to verify that there exist i and k prefix/postfixes, and that those do not overlap.  So generate the i and k prefix/postfixes as before, but select the shortest of each.  Now there is no longer an O(N^2) inner loop, just a check that the 2 shortest prefix/postfixes do not overlap.  If they don’t, then the inner section is known to be j – given by the fact that quaternion division is well defined and knowing the total product.
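
A sketch of that shortcut.  The quaternion units form a small group, so a (sign, basis) pair and a couple of lookup tables cover the multiplication; s is the base string and X the repeat count (my own names):

    using System;

    struct Quat
    {
        public int Sign, Basis;                       // basis 0..3 means 1, i, j, k
        public Quat(int sign, int basis) { Sign = sign; Basis = basis; }

        static readonly int[,] BasisTable =
        {
            { 0, 1, 2, 3 },   // 1 * {1,i,j,k}
            { 1, 0, 3, 2 },   // i * {1,i,j,k} = i, -1, k, -j
            { 2, 3, 0, 1 },   // j * {1,i,j,k} = j, -k, -1, i
            { 3, 2, 1, 0 },   // k * {1,i,j,k} = k, j, -i, -1
        };
        static readonly int[,] SignTable =
        {
            { 1, 1, 1, 1 },
            { 1, -1, 1, -1 },
            { 1, -1, -1, 1 },
            { 1, 1, -1, -1 },
        };

        public static Quat operator *(Quat a, Quat b)
        {
            return new Quat(a.Sign * b.Sign * SignTable[a.Basis, b.Basis],
                            BasisTable[a.Basis, b.Basis]);
        }

        public bool Is(int sign, int basis) { return Sign == sign && Basis == basis; }
    }

    static class QuaternionSplit
    {
        public static bool CanSplit(string s, long X)
        {
            Func<char, Quat> unit = c => new Quat(1, c - 'i' + 1);
            Quat one = new Quat(1, 0);

            Quat single = one;                          // product of one copy of the base string
            foreach (char c in s) single = single * unit(c);

            Quat total = one;                           // product of all X copies; powers have period 4
            for (long r = 0; r < X % 4; r++) total = total * single;
            if (!total.Is(-1, 0)) return false;         // whole thing must multiply out to -1

            int len = s.Length;
            long limit = Math.Min(X, 4) * len;          // scanning 4 repeats finds any shortest prefix/suffix

            long iEnd = -1;                             // end of the shortest prefix with product i
            Quat run = one;
            for (long p = 0; p < limit && iEnd < 0; p++)
            {
                run = run * unit(s[(int)(p % len)]);
                if (run.Is(1, 1)) iEnd = p + 1;
            }

            long kStart = -1;                           // start of the shortest suffix with product k
            run = one;
            for (long p = 0; p < limit && kStart < 0; p++)
            {
                run = unit(s[len - 1 - (int)(p % len)]) * run;
                if (run.Is(1, 3)) kStart = len * X - (p + 1);
            }

            // Both must exist and leave a non-empty middle, which is then forced to be j.
            return iEnd > 0 && kStart > 0 && iEnd < kStart;
        }
    }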

Q4) In a game where an opponent gets to choose one n-omino, and you have to place it in an RxC sized container and then fill the remaining space with n-ominoes of the same size (but not any specific shapes), determine whether for a given RxC and n the game is always winnable by the opponent (as in they can always choose an n-omino such that you cannot place it and fill the surroundings).

A4) The small input is interesting here because the number of test cases is exactly 64.  This corresponds to the number of possible small inputs! – R, C and n are all limited to the range 1-4.  In the large they are limited to 20, meaning there are only 8000 possible inputs.  You could theoretically write a less efficient program, brute force them all and hard code the results.

There is one main early out.  If R*C % n is non-zero, it doesn’t matter what the opponent does, they always win by default.

Another easy scope reduction is if R is larger than C, switch so R is the smaller one and C is the larger.  Then if n is larger than 2R it can be made into an L shape that cannot possibly fit, so the opponent wins.  This is actually almost sufficient for the small input; the one remaining case is that the t piece can be selected in 2×4 for n=4, which splits the space into areas of 1 and 3 which can’t be filled.

The large input covers a lot more scenarios.  But they can be brute forced by hand, if you are careful.  The 17% pass rate suggests cases are easy to miss…  First there is a hint in the problem itself: it shows some of the 7-ominoes, and one of them has a hole.  Obviously the opponent can always choose this piece and make tiling impossible.  This trivially extends to all larger n-ominoes.

Before I cover the remaining scenarios it is worth mentioning that if the opponent can’t win for AxB, anything larger than AxB that satisfies the mod n condition also can’t be won by the opponent.  So we just have to start small and work our way out; as soon as we get a case where we win, everything larger is a given.  It is fairly trivial to see that some simple zig-zag filling can be used to fill any L shape or rectangle that you might add to an AxB to make it a larger case, so long as these L/rectangle shapes themselves are a multiple of n in area.

So let's cover things in order (a sketch of the combined rules follows the list):

  • n = 1 – trivially no opponent win.
  • n = 2 – trivially no opponent win if mod satisfied
  • n = 3 – no opponent win if R > 1 and mod satisfied, otherwise opponent win – again pretty trivial.
  • n = 4 – opponent win for R = 1 by the L shape.  As mentioned before the t shape is an opponent win for R = 2 and C = 4, but this extends to R = 2 in general, as it leaves both sides of wherever you place it with an odd number of cells, which can’t be tiled using pieces of size 4.  R = 3 and higher always works because 3×4 can easily be tiled regardless of opponent choice. (There aren’t many choices for tetrominoes so they are easily worked through.)
  • n = 5 – here is where it gets tricky… the base is an opponent win for R <= 2 using the L shape.  R >= 4 is no opponent win; not trivial to work out, but you can consider all 12 pentominoes by hand for 4×5, and then larger falls out.
    R = 3 is the gotcha case.  The tricky piece is the diagonal step shape – also known as ‘W’ under standard pentomino naming.  Regardless of rotation, in an R=3 space this divides the space into two parts.  For C = 5, the sizes of these parts are 3 and 7, or 6 and 4, depending on placement.  Neither of these can be tiled.  But for C = 10 or higher, it can be placed so the spaces are 15 and 10 (in the C = 10 case, similar in larger), which can be tiled easily.
  • n = 6 – again the L shape gives an opponent win for R <= 2.  R = 3 is an opponent win by the dagger shape, a 4×3 cross, which divides the space into parts of size 2 mod 3 and 1 mod 3, which can’t be tiled by size 6 shapes.  It also can’t be tiled due to the 3×3 ‘space invader’, despite being able to place that in any rotation, which I think is cool…
    R = 4 – this time there are 35 different 6-ominoes to brute force, but none of them cause trouble in 4×6, and hence all larger is no opponent win if the mod condition is satisfied.
  • n >= 7 – the piece with a hole gives guaranteed win to opponent.
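
Put together, the whole decision procedure collapses to a handful of comparisons.  A sketch of the rules above (true means the opponent can always win):

    using System;

    static class Ominoes
    {
        public static bool OpponentWins(int n, int r, int c)
        {
            if (r > c) { int t = r; r = c; c = t; }        // make r the smaller dimension
            if ((r * c) % n != 0) return true;             // area is not a multiple of n
            if (n >= 7) return true;                       // there is an n-omino with a hole
            switch (n)
            {
                case 1:
                case 2: return false;
                case 3: return r < 2;
                case 4: return r < 3;
                case 5: return r < 3 || (r == 3 && c == 5);
                case 6: return r < 4;
                default: return true;                      // not reached for valid n
            }
        }
    }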

I missed the W shape in n = 5 when I tried to work these things out by hand; I wonder what the most common mistake was for the large input.

I also wonder how many people tried to actually solve the puzzle programmatically rather than hard coding rules from a by-hand brute force.  This is not trivial, as it involves determining whether you can tile a connected space or not, and it is possible to create a ‘t’ junction – which means that even if the space to fill is a multiple of the n in question, whether it can be tiled depends on how many cells are on each side of the ‘t’ junction.  It's also tedious generating all the n-ominoes (if you don’t hard code them) and writing reflection/rotation generation and sliding them all around.  And for the largest sizes you might risk a time-out, unless you take advantage of the expansion proof I mentioned, where smaller cases answer larger cases, and pre-calculate the answer for all possible inputs (from small to large) before running the 100 test cases.

TCO15 R1A

So round 1 looks like it is going to be a bit silly this year, 750 to advance per round, 3 rounds, 250 byes, but only 1371 people registered for R1A.  Unless a bunch of people forgot about R1A, everyone with a positive score in R1B will advance, let alone R1C…

I had a poorly written solution to Q1 and a decent solution to Q2, but ran out of time to really think through Q3.  Both my submitted solutions passed, regardless of how wrong my solution to Q1 was in theory…

In the end the positive score criterion came into play: only 700 people advanced, the rest got 0 points or lower.  I was in 221st place – not an amazing effort considering the 250 byes, but not bad.  My rating even went up a tiny bit, to 1805.

Q1) Given an inclusive range, determine the greatest number of distinct common digits between any pair of numbers in the range.

A1) I wrote a brute force solution, which I limited to running if the range was 200 in length or less; for the rest I checked each number against single digit-pair switches – if the resulting number was in the range, that was a pair to consider, and if not I presumed that it could be paired with one of its neighbours, to give a count of its own distinct digits minus 1.  There are numerous flaws in this approach, but I couldn’t work out how to actually exploit any of them, as they only overestimated in scenarios where another valid option was better anyway, and seemingly never underestimated.

The better approach (far better) is to create a 10 bit mask out of each input number, for the distinct digits present.  Then tabulate the masks.  The result is then the best bit count out of any mask with a count > 1, or of the ‘&’ of any two bit masks both with counts > 0.
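
A sketch of that tabulation (assuming the range is small enough to walk directly, which the constraints allowed):

    using System;

    static class CommonDigits
    {
        public static int MaxCommonDigits(int lo, int hi)
        {
            var count = new int[1 << 10];                  // how many numbers map to each digit mask
            for (int v = lo; v <= hi; v++)
            {
                int mask = 0;
                for (int d = v; d > 0; d /= 10) mask |= 1 << (d % 10);
                count[mask]++;
            }

            int best = 0;
            for (int a = 0; a < count.Length; a++)
            {
                if (count[a] == 0) continue;
                if (count[a] > 1) best = Math.Max(best, BitCount(a));
                for (int b = a + 1; b < count.Length; b++)
                    if (count[b] > 0) best = Math.Max(best, BitCount(a & b));
            }
            return best;
        }

        static int BitCount(int x)
        {
            int c = 0;
            while (x > 0) { c += x & 1; x >>= 1; }
            return c;
        }
    }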

Q2) Given a directed graph with exactly one edge leaving each node (which can return to self), determine the number of subsets of the graph which can follow their edges up to K times without the new subset ever being any smaller.

A2) Two tricks here.  1) K can be huge, but the graph only has at most 50 nodes.  So it is fairly obvious that K > 2500 can just be considered as K = 2500, as any pair of nodes which might eventually map together will each end up in a cycle of length at most 50, and hence will either never meet, or meet in under 2500 steps.  Indeed this seems like a very conservative analysis; I think they either never meet, or since they end up in the same cycle it's at most 50 ‘pre-cycle’ links before they meet.  2) We just need to consider whether any pair of nodes eventually maps to the same place.  Once these pairs are found, they transitively form into groups in which every pair collapses.

Together this means there are at most 2500 scenarios needing at most 2500 (or 50) steps of simulation, which will run quickly.

Once the groups are identified by simulation, the result is the product over groups of (group size + 1).  This is because for each group at most one element can be selected – group size choices plus the empty choice.  Any node which doesn’t pair with any other node is in a group by itself, contributing a factor of 2, which leads to the worst case value of 2^50 as expected in the case where every node is independent.
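
A sketch of the whole thing: pairwise simulation to find which nodes collapse together, then the product (next is the single outgoing edge per node; names are mine):

    using System;
    using System.Collections.Generic;

    static class MappingSubsets
    {
        public static long CountSubsets(int[] next, long K)
        {
            int n = next.Length;
            long steps = Math.Min(K, (long)n * n);        // beyond n^2 steps nothing new can merge

            var parent = new int[n];                      // simple union-find over colliding pairs
            for (int i = 0; i < n; i++) parent[i] = i;
            Func<int, int> find = null;
            find = x => parent[x] == x ? x : (parent[x] = find(parent[x]));

            for (int a = 0; a < n; a++)
                for (int b = a + 1; b < n; b++)
                {
                    int x = a, y = b;
                    for (long s = 0; s < steps && x != y; s++) { x = next[x]; y = next[y]; }
                    if (x == y) parent[find(a)] = find(b);
                }

            var sizes = new Dictionary<int, long>();
            for (int i = 0; i < n; i++)
            {
                int root = find(i);
                long size;
                sizes.TryGetValue(root, out size);
                sizes[root] = size + 1;
            }

            long result = 1;
            foreach (long size in sizes.Values) result *= size + 1;   // empty choice or one member
            return result;
        }
    }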

Q3) Given a weighted bipartite graph, determine the minimum edge removal cost to ensure there is no perfect matching.

A3) So I’m still trying to understand the solutions I’ve read here, but basically they consider every non-empty subset of one side of the graph, determine the cost to disconnect each node on the other side of the graph from that subset, and consider as a possible minimum the cost of disconnecting the cheapest x nodes, where x is (the number of nodes on one side of the graph) – (the size of the subset) + 1.  For the case where the subset is everything, this corresponds to disconnecting one node entirely.  In the case where the subset is a single node, it corresponds to disconnecting everything from that node.  In between it covers the other cases, like if 2 nodes are disconnected from everything except 1, they can’t be perfectly matched due to the pigeon hole principle.
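
A sketch of what I think that solution looks like, assuming both sides have n nodes and w[l, r] is the cost to remove the edge between left node l and right node r (0 for no edge):

    using System;
    using System.Collections.Generic;

    static class BlockMatching
    {
        public static long MinCostToBlockPerfectMatching(long[,] w)
        {
            int n = w.GetLength(0);
            long best = long.MaxValue;
            for (int subset = 1; subset < (1 << n); subset++)
            {
                int size = 0;
                for (int l = 0; l < n; l++)
                    if ((subset & (1 << l)) != 0) size++;

                // Cost to fully disconnect each right node from the chosen left subset.
                var costs = new List<long>();
                for (int r = 0; r < n; r++)
                {
                    long c = 0;
                    for (int l = 0; l < n; l++)
                        if ((subset & (1 << l)) != 0) c += w[l, r];
                    costs.Add(c);
                }
                costs.Sort();

                // Disconnect the cheapest n - size + 1 right nodes; the subset then has
                // fewer neighbours than members, so no perfect matching can exist.
                long total = 0;
                for (int i = 0; i < n - size + 1; i++) total += costs[i];
                best = Math.Min(best, total);
            }
            return best;
        }
    }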

The only thing I’m not clear on is the proof that every non-perfect-matching scenario can be reduced to x nodes disconnected from n-x+1 nodes.  The reverse is obvious – every such scenario has no perfect matching – but the equivalence is not so much…  (I suspect this is just Hall's marriage theorem: a perfect matching fails to exist exactly when some subset of size s on one side has fewer than s neighbours, i.e. at least n-s+1 nodes on the other side are disconnected from it.)