BindingList does not scale…

So I’ve been doing some UI work in the last 6 months or so – experimenting with MVVM and the like.  Mostly with WPF, but also some win forms.

Until the last few days I had managed to avoid working with BindingList, for one reason or another.  Either because I was using ObservableCollection in WPF or because my lists were static, or they always updated in bulk.  As part of my investigation into BindingList I found that it supports something which ObservableCollection does not. That is, cascade notification of changes to elements.  So if an element type supports INotifyPropertyChanged, then BindingList will raise events when an element raises its PropertyChanged event, saying which element changed and more specifically which property on that element changed.  Seems like a kind of nifty feature, at first glance…

But the implementation does not scale, it is slow, it performs terribly with larger lists.  If your element type supports INotifyPropertyChanged, every time one of those elements raises the property changed event the entire list is walked to work out the index in the list of the item which raised the event!  I was in shock when I first realised this.  You see BindingList is truly just a rather thin wrapper over Collection<T>, so there is no metadata associated with each entry, all of the binding of the element PropertyChanged event is directed to a single handler, and all it gets given is the source and the name of the changed property, so there is no way to include the NewIndex parameter in ListChangedEventArgs without doing a search.  (By default this search even uses the default object comparator, so if you happen to have two different but sometimes equal objects in your list, enjoy the results…)

Another side note – AddNew, the other feature which BindingList has which Collection<T> does not – also does not scale.  It has to use IndexOf to find out where in the list the newly added item ended up  in case it needs to cancel the add, because it supports auto sorting in derived types. (BindingList does not support auto sorting itself…)

Morale of the story… don’t use BindingList for more than a hundred or so items which support INotifyPropertyChanged – write your own.  If you do write your own, consider just not supporting the cascading modify events at all (even though you can do it efficiently if you have to).  Item presentation should bind to the items, not to the list which holds them.  But alas, not every control agrees…

(PS – BindingSource internally creates a BindingList in many scenarios…)

(PPS – Heh, I just realized this is immediately after my Contains rant, and is effectively an IndexOf rant, which I said I hadn’t seen misused much…)

Posted in .Net Stuff | Leave a comment

List.Contains Considered Harmful

So this post isn’t going to add much value to the universe – it is just a rant.

The number of times where I have seen the word ‘Contains’ and cringed is far, far more than I would have ever expected.  I can’t say the same for IndexOf, but it certainly carries a similar risk.

foreach (int newValue in input)
    if (!list.Contains(newValue))
        list.Add(newValue);

This little snippet and similar versions are the most common cringe worthy occasion, but there are certainly plenty of others.  Of course it works just fine, until input contains a hundred thousand entries, or more (or less…).

I think a major part of the problem is that Contains is so innocuous, just one little word, so easy.  If we forced everyone to write the double nested loop, maybe there would be less cringe-worthy moments.

Most of the time this is a case of poorly chosen data structure – they don’t actually want a list, they want a set.  SortedSet/HashSet either one will probably do.  If that isn’t it they might sometimes want something like multiset from c++, or Dictionary<T, int> aka a counting set.  Rarely they may even want an OrderedSet – elements have specific (non-sorted) order, no repeats and hence want a fast contains check – although I’ve never seen such a collection in use.  (I see a few future additions to TMD.Algo…)

Sometimes changing the data structure is not an option. (Legacy code, I am looking at you!)  While this is not a happy place to be when it comes to performance, temporarily placing the data into the correct data structure during the important code points is still a huge win in many scenarios.

But simply knowing the above is probably not enough, I suspect people are going to fall into the same trap time and time again.  Maybe I should open a Connect ticket asking Contains to be renamed CheckEachExistingEntryForTheGivenInput…

Posted in .Net Stuff | 3 Comments

Google+ Page

Not entirely sure what I am doing here but presenting…

My new Google+ Page

I guess I shall see how it goes…

Posted in Random Musings | Leave a comment

.Net Runtime core improvements

Looking through the slides for this presentation.  I found a few things interesting.

‘Server GC’ in .Net 4.5 now supports background garbage collection, so it is now more useful for desktop apps as well as pure-throughput apps.  It also supports better distributing the mark phase of mark-sweep across multiple cores allowing garbage collection to scale better.

Profile guided optimization can now be done with managed code, and the resultant information annotated into IL for improved startup-time and reduced memory usage.  Apparently.

On Windows 8 only, .Net 4.5 will ‘auto-NGen’.  No need to think about whether you should NGen or not, its all automatically done in the background based on observed assembly usage.

And finally quite possibly my favourite.  Profiler hooks can now rewrite methods and request a rejit.  So you can inject instrumentation to methods which have already been jitted, or disable instrumentation without a process re-start.

Posted in .Net Stuff | Leave a comment

WinRT and the return of the memory leak…

WinRT, so much new and interesting… so much old and broken?  I’ve read a few articles now, watched a couple of talk recordings – nothing great and in-depth but I think I’ve picked up a few details.

WinRT is effectively the combination of ‘COM done better’ and ‘OS API done using the result’.

On the COM done better side we get, nicer integration into more languages.  Along with that is a metadata format which is shipped with the code.  Metadata which includes versioning information to try and avoid some of the COM interface versioning hell.  You get better runtime reflection – every WinRT type can tell you what interfaces it implements and provide some kind of name.  Apparently there is type inheritance (although apparently only the WinRT core is allowed to use it…), and generics and better handling of events and asynchronous callbacks.  A new better type of string, simple structs and collection types built into the core that everyone can consistently use as part of their own APIs.  Each app apparently even has its own COM(I mean WinRT) registry, so no conflict issues.

OS API done using the result gives opportunities to do things the way they should be done.  It also allows for namespacing and componentization, which when mixed with the metadata mean much improved discoverability and general developer experience.  No need to call clean-up methods, because they are automatically called when the associated type is released…

Which brings us to the ‘old and broken’ part of this story.  WinRT shows its COM roots, everything is IUnknown, the same IUnknown from COM.  This means AddRef and Release.  The real bonus here however is that this is all hidden away from c#/VB/javascript and even c++ if you use the c++/cx extension approach to dealing with WinRT.  I say bonus somewhat dubiously as fully-automatic reference counting results in fully-automatic memory leaks.

Take for example the PropertySet in WinRT.  Using this from c# in VS11 the only vaguely apparent suggestion that this isn’t a .Net type is its namespace.  I can easily see developers thinking that this was a replacement for StringDictionary (which has conveniently been removed from the .Net 4.5 core runtime).  But you will see the difference if you create a circle of references between PropertySets.  You have to manually break the circle or the PropertySet memory will never be released!

As a bonus to this post, I’ll end by mentioning that WinRT apparently does not support exceptions!  This may seem strange to anyone who has used it, since calling WinRT APIs certainly throws exceptions, but this is entirely language integration, under the hood an error is limited to a 32bit integer – a HRESULT.  This means WinRT components cannot provide useful error details directly through their APIs without mangling those APIs to provide a bunch of output parameters that are normally never used.

I was watching one of the WinRT talks where they decided to describe a ‘feature’ that if an error occurs in the WinRT core libraries, it can directly tell the attached debugger additional details, since they cannot be returned to the calling code! (One can only hope that they provide a way for custom written components to do this too, it is going to be bad enough that when ComponentArt 19 for Metro comes out every error returned to your code is ‘COMException’ at runtime, not being able to find out useful details while debugging would be a nightmare.)  Almost makes one wish for the return of GetLastError…

Posted in .Net Stuff | Leave a comment

.Net 4.5 – Changes I find interesting

So I went through the 17meg file looking for the changes I found most interesting.  So here they are… in no particular order.

  • SQL Server connection string now supports an option called ‘application intention’ – lets you specify whether you intend on performing any modifications.  All the documentation I could find on this topic talks about primary/secondary replicas and denying access to secondary replicas if the intention isn’t read-only – but it seems like a great opportunity for a simple approach to reducing attack-surface.
  • MethodImpl attribute has gained a new option ‘AggressiveInlining’ – documentation says the method will be inlined if possible.
  • New assembly mode Preferred32bit – documentation seems to suggest that this is the new Require32bit, only it won’t break when Microsoft releases an OS with no WOW64 mode… or maybe that if you invoke it from a 64bit context it will still load…
  • Reflection.TypeInfo – this is a bit of a strange one.  In .Net 4.5 Core, it is practically ‘the’ place to go to get all your type reflection information needs, it has most of the APIs they removed from Type, and some simpler methods and properties for working via reflection.  In .Net 4.5 Full, it still has those simpler methods/properties, but all the APIs are still back on Type instead.
  • Regular expressions now have an option for ‘match timeout’ to stop runaway regular expressions with infinite recursive backtracking or similar.
  • Threading.Tasks.DataFlow – a higher-level approach to defining how threads send data to each other, rather than writing your own queue or thread-pool.  Apparently it has been released as a CTP before and is based off of an existing part of the c++ TPL.
  • Threading.Volatile.Read/Write – Documentation here is that this provides volatile semantics to visual basic, but apparently also for array elements in c#.  I think there might be some use for it in crazy corner cases to implement ‘partial’ volatile as well to improve performance.
  • WeakReference<T> – typed weak references.  Does not support trying to take a reference to a struct (oddly enough)!  But it does provide a combined test and retrieve method to avoid calling IsAlive only to receive nothing from GetValue.
  • Windows.Shell.WindowChrome – WPF gets access to custom Aero UI features.  Now I too can create really annoying glass windows…
  • Runtime.ProfileOptimization.StartProfile – this could be my new favourite.  Putting this at the start of application causes method JITs to be recorded, and if the application has been run more than once, it also causes a background thread to spin up and JIT methods from the previous recording, speeding up the application start-up so long as the application behaviour is relatively predictable.
  • IReadOnlyList – a read only interface for List<T> (but not IList<T>…).  This interface is covariant so it can be cast, but it also cannot have the Contains or IndexOf methods. (The CopyTo method is missing for no apparent reason, except that maybe it might as well be an extension method.)
  • IReadOnlyDictionary – a read only interface for Dictionary<TKey, TValue> (but not IDictionary<TKey,TValue>).  This interface is not covariant or contravariant in any way, despite only ever taking the key as an input parameter and the value as an output parameter.  I wonder if that is an oversight.  There is also an IDictionary wrapper which presents this interface – presumably to ensure someone doesn’t just cast away your read-only-ness.
  • CultureInfo.DefaultThreadCurrentCulture – appears you can change which culture new threads get given – very nice in some areas I can imagine.
  • Zip file support through Compression.ZipArchive, and supposedly gzip compression which is significantly improved.
  • IProgress<T> – little interface with just one method Report(T progress). Progress<T>, a default implementation which raises an event which report is called. Odd little thing, not sure what it is doing…
  • New interop types – IInspectable and HString.  These are apparently windows runtime types…
  • Monitor.IsEntered – just in case you forgot? (Guess threading code can get complicated at times…)
  • Threading.Timeout.InfiniteTimeSpan – finally, no more converting -1ms and hoping it works?
  • Improved virtualization support in WPF. (Pixel level scrolling and pre-post caching.) (Much more interesting than the ribbon!)
  • Async support is everywhere… not just streams/web requests/dns queries/database access – it is also in xml parsing (presumably to support the scenario where the underlying stream is a file).

Edit: watching some talks I discover that something that didn’t seem interesting at the time actually is.  ThreadLocal<T> had an IEnumerable<T> GetValues method added to it.  This actually lets you see all of the values currently allocated across all threads.  Not sure how often I’ll use it, but it is cool.

Posted in .Net Stuff | Leave a comment

.Net 4.5 Developer Preview

So I, like apparently many others, downloaded the windows 8 developer preview.

Plenty of interesting stuff under the hood, but I decided to jump into .Net 4.5.  In the dev preview there is a VS 11 Express edition, for Metro apps. I quickly discovered that .Net 4.5 ‘Core’ Edition, which is what you can use in Metro, is quite a bit different. (How many hours did I spend trying to write the word hello to a file…)

So I decided to do a version comparison. Enter RuntimeVersionCompare.html.  Note this is a 17MB html document, so it may take a little while to download – and I may have to find somewhere else to host it if it becomes popular…

There are 4 colours used.  Red means ‘Only in .Net 4′, Green means ‘Only in .Net 4.5′, Yellow means ‘Missing from .Net 4.5 Core’ and white is whatever else is left over.  With 5 flags (since I included the client profiles), there are potentially 31 scenarios for colouring, but I was lazy.  Each entry does contain numbers mapping to which version contains the specific API, so if you don’t like the colours as they stand, a simple re-parse and output should fix that.  (You could even risk it and try and do such a change using regular expressions…)

You will see a lot of yellow – in part this is because of the absence of WinRT API in the output which Metro can use in addition to .Net 4.5 Core.  WinRT API is missing mostly because it is described using winmd files, which I haven’t yet attempted to reverse-engineer. (Hopefully they are actually .Net metadata-only dll files in disguise and ildasm will process them just fine…)

Also note that this analysis was all done using custom code which parses the output of IlDasm – as such it almost certainly contains errors, strange syntax in the output and many other flaws (not the least of which is that I didn’t use good CSS style in my output HTML).

Feel free to do whatever you like with this document, although attribution would be nice, and please try to avoid downloading it more than once…

Posted in .Net Stuff | Leave a comment

Google+

So I got a Google+ invitation today.  Admittedly I am not a big social network user (not even close) but I was definitely interested to see the differences between Google+ and Facebook.  People have talked about ‘circles’ in the media a lot, and they certainly are an excellent feature but they were not top of my thoughts.  Far more than circles I think the key difference between Google+ and Facebook is the fact that I don’t need your permission to add you to one of my circles.  Admittedly unless you share a lot of items with ‘public’ this doesn’t do me much good but conceptually it just feels so much nicer… and there is something to be said for conceptual nicety.

There are definitely a few niggles – for example you can secure your links section on your profile, but there doesn’t appear to be a way to customize it on a per-link basis.  Currently I only have one link, so that is hardly an issue, but I could see it being significant to some people.  Another case which I think is probably much more interesting is the ‘people who are in my circles’ list on your profile.  You can select which circles to include, and you can select which circles can see it at all, but I could see a strong case for options like ‘anyone who is in a circle can see who else is in that circle, but not other circles’ or even much deeper customization.

It would probably be nice to define circles using other circles.  So if there is a couple of circles that I frequently want to share with, I could create a ‘union’ circle, when either dependent circle updates, the union circle membership would update automatically, reducing my mistakes.  Similarly, an except circle like ‘all my friends except …’ again, reducing mistakes.  Then again, I currently only have 1 person in my circles…

I would be interested to see whether Google+ ends up with a strong integration with the existing multiple accounts feature of Google – I definitely see that as being a potential win for people with ‘Gamer Identities’ among other scenarios.  I would test it, but somehow I think that sending an invite to yourself is not the intention!

Posted in Random Musings | Leave a comment

TCO11 R2

No t-shirts for me this year…

Only 760 out of 850 potential contestants turned up, and I came 443rd, short of the required top 350 which advance to shirt round.  I think 7 Aussies qualified for this round, but only 6 registered.  Out of those 6, 3 failed to get any points (possibly fell asleep!), 2 of us around the 440th place, and John Dethridge advances 315th place.

I could have made it through to get a t-shirt, but I took too long to realise I had an overflow bug in my count number of powers of k less than or equal to n loop.  I really should have implemented it using a divide, but also I should stop using 32bit integers – except where absolutely needed.  One day I will learn!

I had time enough to do question 2, but it was not easy and I was very tired.  I looked for corner cases for Q1 hoping to get the single challenge I needed to get through, but the given examples covered all of them, so there was basically no one to challenge.

Q1) How many boolean sequences of length n are are plausible, if the ith entry is the answer to whether a number is divisible by i.  Return answer mod 1000000007 since input can be up to 1000000.

A1) I was a bit slow in realizing the correct approach here, but it wasn’t too long before I realized the answer was related purely based on the primes less than or equal to n.  Specifically each prime doubled the answer at a minimum, more if different powers of that prime were less than n as well.  More specifically if the maximum power of a prime p less than or equal to n is p^k then the prime independently contributes (k+1) ways to build the final sequence.  So, multiply all these factors mod 1 billion and 7 and you have your answer.

What kept me too slow to advance was it took a long time to realize that my loop for determining max power was potentially squaring numbers just less than 1 million, which doesn’t fit into 32bit integer.  I had correctly identified that I needed to use 64bit integers for the result as small powers go more than squared and so mod 1 billion and 7 is not sufficient to keep you under 32bit limit, but I should have used them more liberally from the start.

Q2) Given an square array of strings which are built up by concatenating the smallest of the vertically and horizontally previous cells in front of the largest, where the first row and column are provided (and each have a single character in each cell), return up to 50 characters starting at a specific index of the string in the far bottom right corner (the largest string).  Note: no character will be repeated in the first row or column.

A2) Pascals triangle gives the length of each cell – and with a worst case of 30×30 the length of the largest string can be very large!

Having built up pascals triangle, all that is really needed is to know for each cell, which of the two vertically and horizontally previous cells is lexically smallest.  As usual, ‘all’ is of course the whole actual problem.  I thought I was on the right track for this, but I suspect its performance is too slow, the trick being to define a memotizable recursion which doesn’t require an offset and length for comparison for both cells involved.

Edit: Before I managed to fall asleep I think I solved this.  The memotizable recursion is ‘which is smaller for 2 cells on a common diagonal?’.  If the 2 cells are neighbours on a diagonal, they are built out of the 3 cells on the previous diagonal. From there it can be seen that the middle cell can be ignored as it is either the common start, the common end or transitive property applies and you can just compare the outside pair.  If they are not neighbours, you get 4 cells which are in neighbouring pairs.  By using the memotized recursion on each neighbouring pair you can then reapply the recursion on the smallest of each pair.  Memotization ensures that the algorithm runs in O(n^3), both in space and time.

Now if you can read between the lines there you’ll notice that I have failed to take in to account an important special case.  Once the recursion hits an edge we need to compare that edge value to a cell on the same diagonal.  Unless that other cell is on the opposite edge of the diagonal, this doesn’t appear trivial.  However the original problem was ‘determine the nth character in the bottom right hand corner’, we can trivially expand that to also ‘determine the first character in every cell’.  Since the answer to these new questions depends on the which is smaller question, it might seem we are at an impasse – but because each question is entirely answered by considering the previous diagonal alone, we can invoke mutual memotized recursion safely.

All that remains is to prove that this method works.  Something which is obvious is that each cell is only made up of characters which appear on the edge in the same or lesser row, or same or lesser column.  Thus since every comparison in the algorithm involves two different cells on the same diagonal, it can be seen that the termination comparison is never equal, as an edge on a diagonal compared to a different cell in the diagonal cannot be equal as the cells on the diagonal cannot be made out of that edges character and all edge characters are distinct.

Q3) Given 2 arrays of positions, and a set of ‘connections’ from one array to another, determine the number of orderings of the two arrays such that no connections cross. (Return mod 1000000007 as with arrays of length 50, the answer might be very large.)

A3) Edit: Also solved this one before going to sleep.  I think it is easier than Q2 even.

First step, DFS the connections to form them into groups.  If you discover a loop, return 0.  Now we have our first factor in the answer, n! where n is the number of groups.  Each group is going to be independent, since there is no way they can cross each other without causing a crossed connection so the n! comes from the orderings of the n groups.

Each group now contributes an independent factor.  There are 2 scenarios, either every edge in the group joins at a common endpoint, or there are 2 or more locations in the group where 2 or more edges join.  If it is a single common endpoint the answer is k! where k is the number of edges in the group.  If it is more than two common endpoints we can either arrange said endpoints left to right, or right to left – so that gives a factor of 2.  Then for each endpoint we also get a factor of (m-2)! where m is the number of edges which join to that endpoint.

Finally we have additional factors for the array locations which do not participate in a connection.  These can be added in any place in any order, so if there are n spots in the array and k of them are endpoints of 1 or more connections, the additional factor is n!/k!.  Since there are 2 arrays, this gives 2 more factors total.  Multiply everything together using repeated modulo and 64 bit integers… and we should be done.

All in all it makes me wish I was less tired when I did the competition.  I think I could have solved that second question during the competition if I had of been able to concentrate at all.  Third one, well I’m pretty sure I would have run out of time having only opened it with a few minutes to go.

Posted in Code Competitions | Leave a comment

TCO11 R1

With 2000 qualifiers able to sign up, only 1550 actually did.

I started writing this post while the solutions were going up, I wasn’t very impressed with my effort (even allowing for the 2am start time and the remnants of the head-cold I’ve had over the last few days), and fully expected that I had failed to make it through to the next round.  Scores were still updating but I was 852 and very few re-orderings were occurring due to system test failures.  However, at that point I also thought that the cut-off was the top 600.  I was wrong – top 850 go through – and final results have me at 845th!

I guess that means I get the privilege of waking up at 2am next week, along with the 6 other Australian’s who progressed. Only 8 Australians bothered to wake up (including myself), I’m sure more qualified than that…

Q1) Given a sequence of characters X and O, you want to reorder them into a different sequence.  4 operations are available, move the front to the end of either of 2  separate storage locations, or move the front of either of the 2 separate storage locations to the end of the original.  Return the minimum number of operations required.

A1) Simple! – although at 2am nursing a muddled head it certainly took me long enough – a few seconds longer and I would have been eliminated this round.  The answer is 2 times the minimum number of characters to remove from the front before the rest is equal to the start of the desired sequence.

Obviously there is no way it can be smaller than this, all that remains is to justify that it can always be achieved in this number of turns.  To do this, all it takes is to realize that there are 2 storage locations and 2 character types.  So every X you move to storage 1 and every 0 you move to storage 2.  Then you can put them back on to the end of the initial location in whatever order you like, no restrictions since each storage is all the same character.

Q2) Given a set of probabilities that a road segment is muddy, and the ability to jump over any single road segment you like, what is the expectation value for the minimum number of muddy road segments you have to stand in to get from start to end? (Start and end are both guaranteed to be dry.)

A2) As probability problems go, this one isn’t as bad as some.  However, my first write up I tried to build from start to finish, rather than from finish back to start – which was completely wrong, so I had to scrap that and start again.  I then made about a dozen mistakes getting 0 and 1 mixed up.  No idea why but I had decided to choose 1 as dry and 0 as muddy and half way through I switched that all backwards…  I should have realised that 0 as dry and 1 as muddy was a more natural assignment from the start.  I finally sorted them all out, but only a couple of minutes after the end of the available coding time!

So I wrote the solution as a recursion of expectation value for a given position depending on whether the position closer to the finish is muddy or not as well as the current position is muddy or not. (Apparently it can even be defined as a straight recursion, like a complicated version of the Fibonacci series – but I am too tired to see the path to that.)  From there the answer is the sum of the two scenarios for position 0(start) where they are not muddy, weighted average by the probability of each scenario which is given by the probability of position 1 being muddy.  Memotized recursion or DP to the answer.

Details of the recursion are rather long to fully write out, but can be summarized as a couple of conditions. If the next position is dry, take expectation value of that position for the dry case (2 scenarios weighted average), and then add 1 if the current position is muddy.  If the next position is wet, take expectation value of the position after in either dry or wet case (4 scenarios weighted average) and add 1 if the current position is muddy.

This works because if the next cell is dry jumping is not required, and worst case you jump into mud, so you never try jumping.  If next cell is muddy, you always jump, because if the cell after is dry its a winner, if it is muddy then you have stepped in the same number of muddy cells as if you had not jumped, but you are closer to the finish (that is not a formal proof, but hopefully obvious).

Q3) Imagine IP addresses where each of the 4 components are valued 0 to 999.  Then imagine that you effectively own all of them, and that people are willing to pay for each IP address.  Up to 50 people make a request for 1 or more IP addresses and give a price they are willing to pay per address.  Each request is in the dotted form, with wild cards allowed for any/all of the 4 segments(no partial wild cards, each segment is either a number or a match anything wild card).  Determine the maximum profit to be made.

A3) Interesting question.  With up to 50 trillion operations in brute force, there definitely needs to be some accelerated counting.

One failed solution I saw calculated all intersections, to gather each section where there is multiple possibilities. Sort them by size largest to smallest, then do inclusion/exclusion counting to get the answer.  I think the general principle was sound, but the implementation was lacking.  The trick comes in determining inclusion/exclusion counting.  Worst case for unique intersections is less than 70000, but for each one it could be the result of many different ways of combining, some of which might be needing an inclusion, while others need an exclusion (I think?).  I am thinking forming them into a potentially interlinking forest, which is depth first searched from each root with some kind of memotization to produce the result without a performance explosion.

5am has arrived, I’ll have to come back to this one. (And maybe QR3, which I never wrote up…)

Posted in Code Competitions | Leave a comment