List.Contains Considered Harmful

So this post isn’t going to add much value to the universe – it is just a rant.

The number of times where I have seen the word ‘Contains’ and cringed is far, far more than I would have ever expected.  I can’t say the same for IndexOf, but it certainly carries a similar risk.

foreach (int newValue in input)
    if (!list.Contains(newValue))

This little snippet and similar versions are the most common cringe worthy occasion, but there are certainly plenty of others.  Of course it works just fine, until input contains a hundred thousand entries, or more (or less…).

I think a major part of the problem is that Contains is so innocuous, just one little word, so easy.  If we forced everyone to write the double nested loop, maybe there would be less cringe-worthy moments.

Most of the time this is a case of poorly chosen data structure – they don’t actually want a list, they want a set.  SortedSet/HashSet either one will probably do.  If that isn’t it they might sometimes want something like multiset from c++, or Dictionary<T, int> aka a counting set.  Rarely they may even want an OrderedSet – elements have specific (non-sorted) order, no repeats and hence want a fast contains check – although I’ve never seen such a collection in use.  (I see a few future additions to TMD.Algo…)

Sometimes changing the data structure is not an option. (Legacy code, I am looking at you!)  While this is not a happy place to be when it comes to performance, temporarily placing the data into the correct data structure during the important code points is still a huge win in many scenarios.

But simply knowing the above is probably not enough, I suspect people are going to fall into the same trap time and time again.  Maybe I should open a Connect ticket asking Contains to be renamed CheckEachExistingEntryForTheGivenInput…