“If all you have is a hammer…

…everything looks like a nail” says the proverb.

I’ll illustrate it with a little story about a common mistake I see sometimes.

(Code samples are in C# but use only basic features so C++ and Java developers should not be too disoriented.)

Static arrays

When you started programming chances are the first collection you used was a fixed-length array so you’ve become accustomed, not to say addicted, to its brackets [] indexing.
And for good reason: this is a very simple and intuitive interface for accessing the elements of a collection, a good hammer.

So when you wanted to compute the sum of all the elements of a nail, uh I mean an int[] array, you used code like this:

long sum = 0;
for (int i = 0; i < array.Length; ++i)
{
    sum += array[i];
}

Simple and efficient!

Dynamic arrays

Later you were faced with situations where you could’nt know in advance the final length of your collection and you discovered dynamic arrays, e.g. List<T>.
You naturally copy-pasted your old code, and quicly discovered that the only thing you needed to change was replacing the “Length” property by the “Count” property:

long sum = 0;
for (int i = 0; i < list.Count; ++i)
{
    sum += list[i];
}

And all was working fine, no issue on the horizon even with big set of data.
Seems like the list is a nail too after all, so far so good…

Associative arrays

Later again you’ve found that sometimes indexing with continuous integers is not enough and you needed to associate a richer identifier to each element of your collection.
Google revealed you that what you needed is named an associative array, map or dictionary and that C# offers some implementations like Dictionary<TKey, TValue>.

But this time this is more involved: you can still use the Count property but you’ve lost the sexy bracket indexing :( !
This is quite frustrating, you have a hammer, a really cool and powerful hammer that has worked for years with tons of situations, but now you have to deal with something that at first sight doesn’t really look like a nail.
But by looking again at the Dictionary type you see that thanks to Linq and its ElementAt() extension method it could ressemble a nail, no it’s become a nail, no no it’s a nail, a nail, yes a nail!

And you do this :

long sum = 0;
for (int i = 0; i < dictionary.Count; ++i)
{
    sum += dictionary.ElementAt(i).Value;
}

Woot! Woot! It’s working! :)

Well at least with a toy program where you have to treat only a small set of data…

It almost worked

Because later when your boss asks you to treat a huge 1G file you feel that something is going wrong because it’s dead slow.
Later your boss comes back and starts looking over your shoulder waiting for the results eagerly as he needs them for the 4pm metting.
When you have exhausted all your jokes and that it’s still running you understand it might not end on time and feel well … embarrassed.

So what happens?

The answer is simple: a dictionary is not a nail, you can’t index it quickly as you would with an array or a list that are both based on continuous memory areas where elements can be accessed in constant time aka O(1).
So when you ask Linq to get the ith element of a dictionary it will simply enumerate the elements and count until it reaches i and returns the current element.
The complexity of this enumeration is O(n), not so crazy.
But remember you use it each time you loop in your main for loop iteration => the whole complexity of your sum is O(n2), ouch!

Reality knocking...

Reality knocking…

The solution

We’ve seen that the integer indexing was not a good fit for the dictionary so we will use another abstraction: a simple iteration over the elements:

long sum = 0;
foreach (KeyValuePair<int, int> pair in dictionary)
{
    sum += pair.Value;
}

This leads to a quite different code, not that complex, but this time you obtain decent performance.

The benchmark

Here is the benchmark I’ve used :

using System;
using System.Collections.Generic;
using System.Linq;
using System.Diagnostics;

class Test
{
	static void Main()
	{
		const int N = (int)5e4;
		
		int[] array = new int[N];
		List<int> list = new List<int>(N);
		Dictionary<int, int> dictionary = new Dictionary<int, int>(N);
		
		Random rand = new Random();
		
		for (int i = 0; i < N; ++i)
		{
			int n = rand.Next() % 100;
		
			array[i] = n;
			list.Add(n);
			dictionary[i] = n;
		}
		
		long sumA = 0, sumL = 0, sumD = 0, sumD2 = 0;
		
		Stopwatch sw = new Stopwatch();
		
		sw.Start();
		for (int i = 0; i < array.Length; ++i)
		{
			sumA += array[i];
		}
		sw.Stop();
		
		long arrayDuration = sw.ElapsedTicks;
		
		sw.Restart();
		for (int i = 0; i < list.Count; ++i)
		{
			sumL += list[i];
		}
		sw.Stop();
		
		long listDuration = sw.ElapsedTicks;
		
		sw.Restart();
		for (int i = 0; i < dictionary.Count; ++i)
		{
			sumD += dictionary.ElementAt(i).Value;
		}
		sw.Stop();
		
		long dictionaryDuration = sw.ElapsedTicks;
		
		sw.Restart();
		foreach (KeyValuePair<int, int> pair in dictionary)
		{
			sumD2 += pair.Value;
		}
		sw.Stop();
		
		long dictionaryDuration2 = sw.ElapsedTicks;

		string outputTemplate = @"+----------------------+
|TYPE  |TICKS   |FACTOR|
+----------------------+
|array |{0,8}|{4,6}|
+----------------------+
|list  |{1,8}|{5,6}|
+----------------------+
|dico1 |{2,8}|{6,6}|
+----------------------+
|dico2 |{3,8}|{7,6}|
+----------------------+";
		
		Console.WriteLine(outputTemplate, arrayDuration, listDuration, dictionaryDuration, dictionaryDuration2,
									      1, listDuration / arrayDuration, dictionaryDuration / arrayDuration, dictionaryDuration2 / arrayDuration);
	}
}

And here are some results :

+----------------------+
|TYPE  |TICKS   |FACTOR|
+----------------------+
|array |     240|     1|
+----------------------+
|list  |     704|     2|
+----------------------+
|dico1 |53844427|224351|
+----------------------+
|dico2 |    2367|     9|
+----------------------+

As you can see the naive ElementAt iteration is 20000 times slower than the foreach iteration and 200000 times slower than the reference fixed-length array implementation!

Conclusion

The bottom line is simple: stop to see nails everywhere, and try to adapt to each situation, here to each collection to obtain the best of it. :)

2 thoughts on ““If all you have is a hammer…

  1. There is no comment section available in the Code Project version of this article. I was going to point out that you can use indexing with a Dictionary. In your example sumD+=dictionary[i] works and is much quicker than your test. You can also use sum=list.Sum() in your List example. Regards, George.

    • Thanks for your comment George.
      My goal was to illustrate the “ElementAt” issue only, avoiding any distraction.
      - “dictionary[i]” works in this case because I’ve used a continuous set of integers as keys (to avoid a more contrived keys set), but in the general case the keys can be anything,
      - there is some extension methods for basic operations like “Sum” but I wanted to be generic; in the general case you can use “Aggregate” but it is specific to “LINQ” and can be harder to read for complex aggregation where a “foreach” loop is preferable.
      Regards :)

Leave a Reply

Your email address will not be published. Required fields are marked *


two × 7 =

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>