.Net 4.6 RC Diff

So it’s been a while since I last did a framework surface diff, but it seems my program still works.  This time I diffed 4.6 RC vs 4.5.1.

A few small things, but hiding in the middle is a bunch of new types in System.Numerics, which is nice.

As usual this list is not complete; I skip things which I don’t think are worth mentioning.  But that isn’t many things this time.

  • System.Array.Empty<T>() – does what it says on the box.
  • System.Buffer.MemoryCopy – like BlockCopy, but for raw memory pointers rather than arrays.
  • Usual set of SQL Server database connection additions – column encryption, authentication types.
  • DateTimeOffset supports conversion to/from unix time in seconds/milliseconds.
  • System.Diagnostics.ProcessStartInfo now has an Environment property.  Not sure what it does yet, given there is already the EnvironmentVariables property.
  • Bunch of Event tracing stuff I have no idea about…
  • System.FormattableString – seems to bundle a format string and its arguments together – but has no public constructor… But there is a static Create method on System.Runtime.CompilerServices.FormattableStringFactory.
  • System.GC – new ‘no GC’ region methods which ensure there is enough memory before they start.  Can also force a small object heap compaction.
  • System.Globalization.CompareInfo.GetHashCode – can get a culture specific hashcode.
  • System.IO.MemoryStream.TryGetBuffer – can get back an array segment when memory stream is constructed with offset/length.
  • Async methods for named pipe stream connection (NamedPipeClientStream and NamedPipeServerStream).
  • Async methods for read/write/flush on UnmanagedMemoryStream
  • Socket options for reusing ports.
  • System.Numerics adds Matrix3x2, Matrix4x4, Plane, Quaternion, Vector2, Vector3, Vector4.  They are unfortunately all single precision, but the reasoning is justified – they are all hardware accelerated using SIMD.  Interestingly the pre-release 4.6 documentation appears newer than the actual released code here.  It documents many useful static methods which are not yet in the actual library.  Also there is apparently a System.Numerics.Vector<T> coming which works for any primitive numeric type T, with an element count dependent on SIMD register width.
  • New assembly level attribute – DisablePrivateReflection – apparently does what it says…
  • Asymmetric padding mode options for crypto.  CNG supports RSA.  Crypto random number generator can write random values to a subset of an array.
  • WindowsIdentity.RunImpersonated – can now execute a delegate rather than having to set up an impersonation context.
  • string.Format with format provider now has the small arg count optimization of explicit methods for 1, 2 and 3 format args.
  • System.Text.Encoding has dropped default support for code page based encodings, you have to manually call RegisterProvider to have them supported??
  • System.Text.Encoding has a GetString which takes a byte pointer and length rather than just an array.
  • StringBuilder has an Append which takes a char pointer and length rather than just an array.  Also small arg count optimization with AppendFormat with format provider.
  • System.Threading.AsyncLocal – interesting, not 100% clear what it does yet, but it sounds like thread local variables except with a value per ‘async flow’ rather than per thread.
  • Task creation methods for immediate cancelled/exception states.
  • Extension methods to get a safe wait handle given a wait handle.
  • System.Uri.IdnHost – can get punycode version of international domain names.
  • WPF diagnostics hook event for visible tree changes.
  • System.Xml.XmlNode gets a PreviousText property.
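
A few of the additions above are easiest to judge by trying them.  The following is only a minimal sketch, assuming a project targeting 4.6 or later; the class and method names (Net46Sampler and friends) are made up for illustration, while the framework calls are the ones named in the list.

```csharp
using System;
using System.Globalization;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class Net46Sampler
{
    // Round-trips a Unix timestamp (in seconds) through DateTimeOffset.
    public static long RoundTripUnixSeconds(long seconds)
    {
        DateTimeOffset when = DateTimeOffset.FromUnixTimeSeconds(seconds);
        return when.ToUnixTimeSeconds();
    }

    // Bundles a format string with its arguments, then formats the bundle
    // with the invariant culture.
    public static string FormatInvariant(string format, params object[] args)
    {
        FormattableString bundled = FormattableStringFactory.Create(format, args);
        return bundled.ToString(CultureInfo.InvariantCulture);
    }

    // Creates a task that is already in the Canceled state.
    // The token passed in must itself already be cancelled.
    public static Task<int> AlreadyCanceled()
    {
        return Task.FromCanceled<int>(new CancellationToken(true));
    }

    // Creates a task that is already in the Faulted state.
    public static Task<int> AlreadyFaulted()
    {
        return Task.FromException<int>(new InvalidOperationException("example failure"));
    }

    public static void Main()
    {
        Console.WriteLine(Array.Empty<int>().Length);
        Console.WriteLine(RoundTripUnixSeconds(1430438400));
        Console.WriteLine(FormatInvariant("{0:N2} items", 1234.5));
        Console.WriteLine(AlreadyCanceled().Status);
        Console.WriteLine(AlreadyFaulted().Status);
    }
}
```

Nothing here is clever; it just confirms that the pre-completed tasks and the format bundle behave the way their names suggest.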

.Net 4.5.1 Preview

A new .Net preview is out, this time a minor point release, but I decided to do a reference assembly comparison anyway.

Absolutely nothing removed (as expected), but also only a handful of additions to the API.

  • Precision/Scale properties for DbParameter.
  • SqlConnectionStringBuilder gets connection retry count and interval properties.
  • EventSource (Diagnostics.Tracing) gets ConstructionException and CurrentThreadActivityId read-only properties and a couple of methods to set the current thread activity ID.  I think the current thread activity ID concept may be new to Windows 8.1 – if something else I was reading is associated with this.
  • EventWrittenEventArgs (Diagnostics.Tracing) gets ActivityId and RelatedActivityId read-only properties.
  • ActiveDirectory has new enum values for 2012 R2 Server domains/forests.
  • MemoryMappedViewAccessor and MemoryMappedViewStream both now have PointerOffset read-only properties.
  • GCLargeObjectHeapCompactionMode enum for use with the new LargeObjectHeapCompactionMode property on GCSettings.  2 values, Default and CompactOnce.  I guess this property auto-resets back to default on the next full GC if set to compact once. (Also in Core.)
  • A bunch of new generic methods on Marshal – looks like a mixture of convenience and avoiding boxing of structs. (Also in Core.)
  • SignedXml – SafeCanonicalizationMethods read-only property.
  • TransactionScope now supports the new TransactionScopeAsyncFlowOption enum in several new constructors.  I presume this controls how transaction scopes interact with async, which I presume lets you improve performance when you know a certain async section can safely be performed outside transaction scope.
  • Web.Hosting.CustomLoaderAttribute
  • HostingEnvironment.StopListening event.
  • IStopListeningRegisteredObject and ISuspendibleRegisteredObject interfaces in Web.Hosting – but I haven’t found what uses them yet.
  • Core only: AsRandomAccessStream extension method for streams.
  • Core only: System.Threading.Timer arrives.
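
The large object heap compaction setting is the one item here I would actually try straight away.  A minimal sketch, assuming the usual pattern of setting the mode and then forcing a blocking collection; the LohCompaction class name is hypothetical, and whether the mode has reverted by the time it is read back is something to observe rather than rely on.

```csharp
using System;
using System.Runtime;

public static class LohCompaction
{
    // Requests a one-off large object heap compaction, forces a full
    // collection, and returns the mode observed afterwards.
    public static GCLargeObjectHeapCompactionMode CompactNow()
    {
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
        return GCSettings.LargeObjectHeapCompactionMode;
    }

    public static void Main()
    {
        Console.WriteLine("Before: {0}", GCSettings.LargeObjectHeapCompactionMode);
        Console.WriteLine("After:  {0}", CompactNow());
    }
}
```

If the guess about auto-reset is right, the second line should report Default again.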

I have listed everything I found, but if you want to look at my raw processed output files for some reason (which are much smaller than last time, due to a far smaller number of changes) there is a v4.5.1 to v4.5 comparison and a v4.5.1 core to v4.5 core comparison.

Socket.ReceiveAsync – broken or broken?

I ran into another .Net runtime bug today: Socket.ReceiveAsync.  This method promises to improve performance compared to using Socket.BeginReceive/EndReceive, and I would say it manages that.  However, if you are receiving data into a continuously moving buffer, ReceiveAsync is dangerous.  Specifically, if you call the BufferList setter or SetBuffer after each receive before starting the next, the number of bytes transferred that it reports is seemingly random compared to what is actually written in the buffer.

After many hours of reflection and debugging I think I may finally understand what is going wrong.  It appears that ReceiveAsync is ‘too fast’: an asynchronous completion can be triggered before ReceiveAsync exits, presumably while WSARecv is still running.  Then when BufferList gets set it modifies state that WSARecv is still using, and something goes horribly wrong…

I think BeginReceive/EndReceive don’t suffer from this problem because they construct a new set of parameters for WSARecv every time rather than attempting to reuse stuff to avoid memory allocations, so there is no potential for a still active WSARecv to mess around with the data about to be given to the next WSARecv.  Either that or EndReceive ends up waiting in what would be the failure case for ReceiveAsync – but the reflected code doesn’t seem to indicate that.

The ‘solution’ I have for ReceiveAsync relies on waiting to reset BufferList until a ManualResetEventSlim gets set when the previous call to ReceiveAsync returns.  This solves the problems, but even ManualResetEventSlim is expensive enough that it seems the performance gains are lost.

Hacky Ugly Example Code:  (Note that removing the comment marks on the lines regarding the ManualResetEventSlim ensures this program never prints anything.)

using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

namespace ConsoleApplication8
{
    class Program
    {
        static void Main(string[] args)
        {
            TcpListener listener = new TcpListener(IPAddress.Any, 5678);
            listener.Start();
            client = new TcpClient("localhost", 5678);
            var server = listener.AcceptTcpClient();
            ThreadPool.QueueUserWorkItem(callback =>
                {
                    byte[] bufferInner = new byte[8192];
                    while (true)
                    {
                        try
                        {
                            int bytes = server.Client.Receive(bufferInner);
                            if (bytes == 0)
                                return;
                            server.Client.Send(bufferInner, 0, bytes, SocketFlags.None);
                        }
                        catch
                        {
                            break;
                        }
                    }

                });
            ThreadPool.QueueUserWorkItem(callback =>
                {
                    byte[] bufferInner = new byte[24];
                    for (int i = 0; i < bufferInner.Length; i++)
                        bufferInner[i] = (byte) (i+1);
                    for (int i = 0; i < 10000000; i++)
                        client.Client.Send(bufferInner);
                });
            asyncEventArgs1 = new SocketAsyncEventArgs();
            asyncEventArgs1.Completed += AsyncEventArgsOnCompleted;
            asyncEventArgs2 = new SocketAsyncEventArgs();
            asyncEventArgs2.Completed += AsyncEventArgsOnCompleted;
            buffer = new byte[bigBufferLength];
            BeginReceive();
            while (true)
            {
                Console.ReadKey();
            }
        }

        private const int bigBufferLength = 1024*1024;

        private static void BeginReceive()
        {
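            // NOTE: both branches select asyncEventArgs2, so after the first call
            // the same SocketAsyncEventArgs instance is reused for every receive
            // and asyncEventArgs1 is never used.
            if (asyncEventArgs == asyncEventArgs1)
                asyncEventArgs = asyncEventArgs2;
            else
                asyncEventArgs = asyncEventArgs2;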
            int length = rnd.Next(2) == 0 ? 8192 : 100;
            prevLastWrapped = lastWrapped;
            lastWrapped = false;
            if (lastWriteIndex + length <= bigBufferLength)
                asyncEventArgs.BufferList = new ArraySegment<byte>[]
                    {new ArraySegment<byte>(buffer, lastWriteIndex, length),};
            else
            {
                asyncEventArgs.BufferList = new ArraySegment<byte>[]
                    {
                        new ArraySegment<byte>(buffer, lastWriteIndex, bigBufferLength - lastWriteIndex),
                        new ArraySegment<byte>(buffer, 0, lastWriteIndex + length - bigBufferLength),
                    };
                lastWrapped = true;
            }
            Clear(length);
            prevLastWriteLength = lastWriteLength;
            lastWriteLength = length;
            prevWasSync = wasSync;
            wasSync = false;
            int newVal = Interlocked.Decrement(ref counter);
            if (newVal != -1)
            {
                Console.WriteLine("BeginReceive called wrong number of times. {0}", newVal);
                throw new Exception("Fail!");
            }
            //callerExited.Reset();
            if (!client.Client.ReceiveAsync(asyncEventArgs))
            {
                wasSync = true;
                //callerExited.Set();
                AsyncEventArgsOnCompleted(null, asyncEventArgs);
            }
            else
            {
                //callerExited.Set();
            }
        }
        //private static ManualResetEventSlim callerExited = new ManualResetEventSlim(false);

        private static bool lastWrapped = false;
        private static bool prevLastWrapped = false;

        private static bool wasSync = false;
        private static bool prevWasSync = false;

        private static volatile int counter = 0;

        private static void Clear(int length)
        {
            if (lastWriteIndex + length <= bigBufferLength)
                Array.Clear(buffer, lastWriteIndex, length);
            else
            {
                Array.Clear(buffer, lastWriteIndex, bigBufferLength - lastWriteIndex);
                Array.Clear(buffer, 0, lastWriteIndex + length - bigBufferLength);                
            }
        }

        private static void AsyncEventArgsOnCompleted(object sender, SocketAsyncEventArgs socketAsyncEventArgs)
        {
            int newVal = Interlocked.Increment(ref counter);
            if (newVal != 0)
            {
                Console.WriteLine("OnCompleted called wrong number of times. {0}", newVal);
                throw new Exception("Fail!");
            }

            if (socketAsyncEventArgs.SocketError != SocketError.Success)
                return;
            if (socketAsyncEventArgs.BytesTransferred == 0)
                return;
            if (socketAsyncEventArgs.BytesTransferred > lastWriteLength)
            {
                Console.WriteLine("Receive too long, Sync:{0}:{1} LastWrapped:{2}:{3} Length:{4}:{5} ClaimedLength:{6}", wasSync, prevWasSync, lastWrapped, prevLastWrapped, lastWriteLength, prevLastWriteLength, socketAsyncEventArgs.BytesTransferred);
            }
            if (buffer[(socketAsyncEventArgs.BytesTransferred-1 + lastWriteIndex) % bigBufferLength] == 0)
            {
                Console.WriteLine("Receive too short-quickcheck , Sync:{0}:{1} LastWrapped:{2}:{3} Length:{4}:{5}", wasSync, prevWasSync, lastWrapped, prevLastWrapped, lastWriteLength, prevLastWriteLength);
            }
            //for (int i = 0; i < socketAsyncEventArgs.BytesTransferred; i++)
            //{
            //    if (buffer[(i + lastWriteIndex) % bigBufferLength] == 0)
            //    {
            //        Console.WriteLine("Receive too short , Sync:{0}:{1} LastWrapped:{2}:{3} Length:{4}:{5} ActualLength:{6}", wasSync, prevWasSync, lastWrapped, prevLastWrapped, lastWriteLength, prevLastWriteLength, i);
            //        break;
            //    }
            //}
            lastWriteIndex = (lastWriteIndex + socketAsyncEventArgs.BytesTransferred)%bigBufferLength;
            //callerExited.Wait();
            BeginReceive();
        }

        private static SocketAsyncEventArgs asyncEventArgs;
        private static SocketAsyncEventArgs asyncEventArgs1;
        private static SocketAsyncEventArgs asyncEventArgs2;
        private static byte[] buffer;
        private static int lastWriteIndex;
        private static int lastWriteLength;
        private static int prevLastWriteLength = 0;
        private static Random rnd =new Random();
        private static TcpClient client;
    }
}

The not so Random random

Just a quick one here – while I was working on some graphs for a suggestion I made for LOTRO, I discovered a peculiar anomaly in Random.NextDouble.

Plotting a graph of the length of runs where NextDouble returns greater or equal to 0.01, there is a significant dip at a run length of about 53.  Switching to RNGCryptoServiceProvider makes this dip disappear and the dip is clearly reproducible even with sample sizes in the hundreds of millions.

Quite bizarre – although not quite as bizarre as this deviation mentioned at stackoverflow.
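
For anyone wanting to reproduce the graph, the measurement can be sketched roughly as below.  This is not the code I used at the time, just a minimal reconstruction; the RunLengthHistogram class, the seed, the sample count and the cap of 200 are all arbitrary assumptions.  A run is counted each time a value below the threshold ends it.

```csharp
using System;

public static class RunLengthHistogram
{
    // Counts runs of consecutive samples >= threshold.
    // histogram[n] is the number of runs of length n; longer runs are
    // lumped into the last bucket.
    public static long[] Build(Func<double> nextDouble, long samples,
                               double threshold, int maxRun)
    {
        var histogram = new long[maxRun + 1];
        int run = 0;
        for (long i = 0; i < samples; i++)
        {
            if (nextDouble() >= threshold)
            {
                run++;
            }
            else
            {
                // A value below the threshold terminates the current run.
                histogram[Math.Min(run, maxRun)]++;
                run = 0;
            }
        }
        return histogram;
    }

    public static void Main()
    {
        var rnd = new Random(12345);
        long[] histogram = Build(rnd.NextDouble, 100000000, 0.01, 200);
        // Print the neighbourhood of the reported dip.
        for (int n = 40; n <= 65; n++)
            Console.WriteLine("{0}\t{1}", n, histogram[n]);
    }
}
```

Passing a delegate rather than a Random makes it easy to swap in a different generator for comparison.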

Enums in dictionary keys.

I was doing some random reflection walking to remind myself whether using a struct as a dictionary key triggered any boxing operations using the default equality comparer.  Even if you implement IEquatable<T>, it still executes a box instruction in IL, but my attempts at creating an equality comparer which trivially does not execute boxing operations aren’t any faster.  So it seems the optimiser can properly eliminate explicit null checks in generics where the type parameter ends up being a struct, and remove the box operation as dead code.  But that is not what this post is about.

Along the way in my reflection I saw that the default equality comparer special cases enums.  I immediately thought this was great, because I had recently discovered that GetHashCode called on an enum is ‘expensive’, it is *much* cheaper if you cast it to the underlying type and call GetHashCode on that instead. (Specific case was an enum member of a type which overrode equality, so the default equality comparer special case didn’t apply.)  So I had a look at the reflected code for this special case enum equality comparer and found an ‘interesting’ thing.

The implementation called this thing called JitHelper.UnsafeEnumCast<T> to cast its parameters to int.  What makes it interesting is that the ‘implementation’ of this method is ‘throw new InvalidOperationException()’.  Wait what…

Now obviously this method implementation isn’t actually called by EnumEqualityComparer, since using enums as dictionary keys works just fine.  So it seems that the JIT magically substitutes a cast if T is an enum type, and only uses the implementation as a default fallback scenario.

The things they do to get around the limited type constraints in generics…

And this led me to look closer at JitHelper.  Here I find JitHelper.UnsafeCast<T>…

Have you ever wanted to cast Generic<Specific> to Generic<T> when you know at runtime that T is Specific?  It seems that this little piece of magic lets you do that. (It is apparently used in async stuff to allow for cached result tasks to be returned for a bunch of basic types from a generic result task building function.  For bonus points I now know that true, false, 0 in a dozen types, null and the integers -1 through 9(?!?), each have their own special cached task instances to be shared…)
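
For reference, the enum hash code workaround mentioned earlier (casting to the underlying type inside a type which overrides equality) looks something like the following.  It is a minimal sketch with hypothetical names (ItemKind, ItemKey), and the hash combining constant is just a common choice, not anything taken from the framework.

```csharp
using System;
using System.Collections.Generic;

public enum ItemKind { Small, Medium, Large }

public struct ItemKey : IEquatable<ItemKey>
{
    public readonly ItemKind Kind;
    public readonly int Id;

    public ItemKey(ItemKind kind, int id) { Kind = kind; Id = id; }

    public bool Equals(ItemKey other)
    {
        return Kind == other.Kind && Id == other.Id;
    }

    public override bool Equals(object obj)
    {
        return obj is ItemKey && Equals((ItemKey)obj);
    }

    public override int GetHashCode()
    {
        // Cast the enum to its underlying int instead of calling
        // Kind.GetHashCode(), which goes through System.Enum.
        unchecked { return ((int)Kind * 397) ^ Id; }
    }
}

public static class EnumKeyDemo
{
    // Stores a value under one key and reads it back via an equal key.
    public static string Lookup()
    {
        var map = new Dictionary<ItemKey, string>();
        map[new ItemKey(ItemKind.Large, 7)] = "found";
        return map[new ItemKey(ItemKind.Large, 7)];
    }

    public static void Main() { Console.WriteLine(Lookup()); }
}
```

Whether the cast still matters will depend on the runtime version, so measure before bothering.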

.Net 4.5 Beta

So it’s out, I’ve got it installed, but I’m only just starting to do a comparison between the surface area of the developer preview and the beta.

A few tidbits I found before I fell asleep…

Looks like Linq to SQL is getting support for geometry types and more functions.

DataReader – IsDBNullAsync, GetFieldValueAsync (interesting, I guess large columns can still be loading while we know we’ve got a new row), GetStream/GetTextReader (a way to get at large columns without consuming all available memory?)

Async BulkCopy (makes sense!)

SqlException has a ClientConnectionId – not sure in what scenario you might get confused about which connection a SQL exception comes from, so I guess this is something other than what it seems at first glance.

The HResult getter is public on Exception now…

Some of the new http stuff had non-async methods removed rather than provide both. (Interesting, I wonder if that will be the pattern going forwards…)

Maybe some TLS1.1/1.2 support in SslStream.

UnsafeOnCompleted on TaskAwaiter – (Makes me think of critical finalizer… wonder if it’s even vaguely related…)

GCLatencyMode.SustainedLowLatency – sounds exciting if it lives up to its name… (Apparently there is something called .Net 4.0.3 beta which already contains this…)

WaitAsync on SemaphoreSlim … could be interesting…

WPF gets the concept of ‘InactiveSelection’… wonder what that means…

(And that’s enough for now…)

Treemap Maker

So I’ve added a new little utility to the web site.  It’s under the default software page, not one of the sub-pages.

This utility is a bit like Sequoia View, but not actually as good right now.  The only advantage it has over Sequoia View is that it can do file count based layout as well as file size based layout.  I find this of interest when backing up a system, as the raw file count can often slow down the process as much as large data, so this lets me target where to trim/create archives.

It has left click to zoom in one level (even down to the one file taking up your entire screen level) and right click to undo a zoom.  The layout algorithm is one I found on-line, something called Pivot-by-Middle, which does a decent job and wasn’t too complex to implement.  It has a black rectangle to indicate the top-level directory your mouse is currently over, which is very expensive to draw in the current implementation, so it will lag behind if you have a large number of rectangles in the window.  It also shows whatever file your mouse is over up the top.

The colouring is one of the big steps backwards compared to Sequoia – I use a random colour blending approach, where the colour of a square is the blending of the colour of the file and all of its parent directories, and all colours are initialized at random.  It’s better than nothing, but that is about all I can say for it…

Caveats: It ignores all errors while trying to walk the file/directory hierarchy – so if you don’t have read access to a large directory, it won’t tell you (it won’t know) and your graph may not be as interesting as it first appears.

Expression is Interesting

I was playing about with Expression trees today for practically the first time and I wrote something like this.

object a = objectDictionary["somekey"];
Expression constExpr = Expression.Constant(a);

Eventually this expression went into a bunch more expressions and finally compiled into a delegate.

It wasn’t until afterwards that I went… wait… what?  I had just passed a reference type to be a ‘constant’ in a delegate and it wasn’t even a string.  Surely it will fail when I run it.  But no, it works just fine.

I may not have written a lot of IL, but I did write a parser for the output of ildasm, so I like to pretend that I know a bit about what you can do in IL and I certainly couldn’t see any way you could embed a reference into some IL.  And it certainly can’t be capturing a, since that rewrites the parent method and Expressions can’t do that…

So I turned to the trusty internet and found ILVisualizer, which I suspect I will be keeping on hand at all times from now on… and lo, the compiled delegate is closed, with the this parameter bound to an instance of CompilerServices.Closure. This type has an object array field called Constants, where my trusty reference has been stored, and the generated delegate loads this and accesses the first element.

In part what I find interesting about this is that you can’t trivially write a lambda to do the same thing, since that will capture the variable, and subsequent changes will affect the lambda.  Here we are guaranteed to get the value of the reference at the point in time the expression is constructed.
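
The difference is easy to see side by side.  A minimal sketch, with a hypothetical ConstantVersusCapture class and a Version object standing in for whatever came out of my dictionary:

```csharp
using System;
using System.Linq.Expressions;

public static class ConstantVersusCapture
{
    public static string[] Run()
    {
        object a = new Version(1, 0);

        // Expression.Constant stores the reference that 'a' holds right now.
        Func<object> fromExpression =
            Expression.Lambda<Func<object>>(Expression.Constant(a, typeof(object)))
                      .Compile();

        // A C# lambda captures the variable 'a' itself, not its current value.
        Func<object> fromLambda = () => a;

        // Reassign the variable after both delegates have been built.
        a = new Version(2, 0);

        return new[] { fromExpression().ToString(), fromLambda().ToString() };
    }

    public static void Main()
    {
        string[] results = Run();
        Console.WriteLine("Expression: {0}", results[0]); // prints "Expression: 1.0"
        Console.WriteLine("Lambda:     {0}", results[1]); // prints "Lambda:     2.0"
    }
}
```

To get the expression behaviour from a plain lambda you would have to copy the value into a fresh local first and capture that instead.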

BindingList does not scale…

So I’ve been doing some UI work in the last 6 months or so – experimenting with MVVM and the like.  Mostly with WPF, but also some win forms.

Until the last few days I had managed to avoid working with BindingList, for one reason or another.  Either because I was using ObservableCollection in WPF or because my lists were static, or they always updated in bulk.  As part of my investigation into BindingList I found that it supports something which ObservableCollection does not. That is, cascade notification of changes to elements.  So if an element type supports INotifyPropertyChanged, then BindingList will raise events when an element raises its PropertyChanged event, saying which element changed and more specifically which property on that element changed.  Seems like a kind of nifty feature, at first glance…

But the implementation does not scale, it is slow, it performs terribly with larger lists.  If your element type supports INotifyPropertyChanged, every time one of those elements raises the property changed event the entire list is walked to work out the index in the list of the item which raised the event!  I was in shock when I first realised this.  You see BindingList is truly just a rather thin wrapper over Collection<T>, so there is no metadata associated with each entry, all of the binding of the element PropertyChanged event is directed to a single handler, and all it gets given is the source and the name of the changed property, so there is no way to include the NewIndex parameter in ListChangedEventArgs without doing a search.  (By default this search even uses the default object comparator, so if you happen to have two different but sometimes equal objects in your list, enjoy the results…)
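
To see the cascade in action, something like the following will do.  It is only a sketch, with hypothetical Item and CascadeDemo types; the point is that the handler is told the index of the changed element, which BindingList can only know by searching for it.

```csharp
using System;
using System.ComponentModel;

public class Item : INotifyPropertyChanged
{
    private string name;

    public event PropertyChangedEventHandler PropertyChanged;

    public string Name
    {
        get { return name; }
        set
        {
            name = value;
            var handler = PropertyChanged;
            if (handler != null)
                handler(this, new PropertyChangedEventArgs("Name"));
        }
    }
}

public static class CascadeDemo
{
    // Changes one element's property and returns the index that the list
    // reports through its ListChanged event.
    public static int ChangeAndReportIndex(int count, int target)
    {
        var list = new BindingList<Item>();
        for (int i = 0; i < count; i++)
            list.Add(new Item { Name = "item" + i });

        int reported = -1;
        list.ListChanged += (sender, e) =>
        {
            if (e.ListChangedType == ListChangedType.ItemChanged)
                reported = e.NewIndex;
        };

        // The element raises PropertyChanged; the list turns that into
        // an ItemChanged notification carrying the element's index.
        list[target].Name = "changed";
        return reported;
    }

    public static void Main()
    {
        Console.WriteLine(ChangeAndReportIndex(1000, 742));
    }
}
```

Time this with a few hundred thousand items and changes spread across the whole list, and the cost of that search becomes rather obvious.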

Another side note – AddNew, the other feature which BindingList has which Collection<T> does not – also does not scale.  It has to use IndexOf to find out where in the list the newly added item ended up in case it needs to cancel the add, because it supports auto sorting in derived types. (BindingList does not support auto sorting itself…)

Moral of the story… don’t use BindingList for more than a hundred or so items which support INotifyPropertyChanged – write your own.  If you do write your own, consider just not supporting the cascading modify events at all (even though you can do it efficiently if you have to).  Item presentation should bind to the items, not to the list which holds them.  But alas, not every control agrees…

(PS – BindingSource internally creates a BindingList in many scenarios…)

(PPS – Heh, I just realized this is immediately after my Contains rant, and is effectively an IndexOf rant, which I said I hadn’t seen misused much…)

List.Contains Considered Harmful

So this post isn’t going to add much value to the universe – it is just a rant.

The number of times where I have seen the word ‘Contains’ and cringed is far, far more than I would have ever expected.  I can’t say the same for IndexOf, but it certainly carries a similar risk.

foreach (int newValue in input)
    if (!list.Contains(newValue))
        list.Add(newValue);

This little snippet and similar versions are the most common cringe-worthy occasion, but there are certainly plenty of others.  Of course it works just fine, until input contains a hundred thousand entries, or more (or less…).

I think a major part of the problem is that Contains is so innocuous, just one little word, so easy.  If we forced everyone to write the double nested loop, maybe there would be fewer cringe-worthy moments.

Most of the time this is a case of a poorly chosen data structure – they don’t actually want a list, they want a set.  SortedSet/HashSet, either one will probably do.  If that isn’t it they might sometimes want something like multiset from C++, or Dictionary<T, int> aka a counting set.  Rarely they may even want an OrderedSet – elements have specific (non-sorted) order, no repeats and hence want a fast contains check – although I’ve never seen such a collection in use.  (I see a few future additions to TMD.Algo…)

Sometimes changing the data structure is not an option. (Legacy code, I am looking at you!)  While this is not a happy place to be when it comes to performance, temporarily placing the data into the correct data structure during the important code points is still a huge win in many scenarios.
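
As a sketch of that last point, here is the earlier snippet with a temporary HashSet doing the membership checks while the list itself stays a list.  The DistinctAppend class and AddMissing method are hypothetical names, nothing more.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctAppend
{
    // Appends the values from input which are not already in list,
    // preserving the list's existing order.
    public static void AddMissing(List<int> list, IEnumerable<int> input)
    {
        // Temporary set built once from the existing contents.
        var seen = new HashSet<int>(list);
        foreach (int newValue in input)
        {
            // HashSet.Add returns false when the value is already present.
            if (seen.Add(newValue))
                list.Add(newValue);
        }
    }

    public static void Main()
    {
        var list = new List<int> { 3, 1 };
        AddMissing(list, new[] { 1, 2, 3, 2, 4 });
        Console.WriteLine(string.Join(", ", list)); // prints "3, 1, 2, 4"
    }
}
```

The set costs some extra memory for the duration of the call, which is usually a fair trade for not walking the list once per input value.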

But simply knowing the above is probably not enough, I suspect people are going to fall into the same trap time and time again.  Maybe I should open a Connect ticket asking Contains to be renamed CheckEachExistingEntryForTheGivenInput…