Shorter code ≠ better code

Oliver Oliver • Published 2 years ago Updated one year ago


There's an interesting post about how clever code is bad. It outlines some bad practices, and some good ones to use in their stead, such as the snippet:

// bad
while (*d++ = *s++);

// good
strcpy(destination, source);

But I'm here to hammer that point home and show to you that shorter code can, and often does, perform a lot worse than a readable alternative.

Oftentimes, when using C#, developers like to shorten their code with the mindset of “fewer lines to code = less CPU instructions = faster.” This is not always the case.

Let's take this a simple example of having a bool array, where we wish to count how many elements within it are true.

The method may look something like this:

private readonly bool[] _array = {true, false, true, true};

public int HowManyAreTrue()
{
    int count = 0;

    foreach (bool item in _array)
    {
        if (item)
        {
            count++;
        }
    }

    return count;
}

Rider, by default, is configured to suggest you transform this method to a LINQ expression. It will refactor the code to something like this:

private readonly bool[] _array = {true, false, true, true};

public int HowManyAreTrue()
{
    return _array.Count(item => item);
}

Now the IDE has stopped complaining and you call it a day. You may settle on this implementation because “there are fewer lines of code”, it must be faster right? This is the trap that many people fall into.

To really demonstrate why this is bad, let's install the trusty BenchmarkDotNet package and compare these two approaches:

private readonly bool[] _array = {true, false, true, true};

[Benchmark]
public int CustomLoop()
{
    int count = 0;

    foreach (bool item in _array)
    {
        if (item)
        {
            count++;
        }
    }

    return count;
}

[Benchmark]
public int CountWithLinq()
{
    return _array.Count(item => item);
}

If we run these benchmarks, and look at the results table, we can see just how bad CountWithLinq really performs:

Method Mean Error StdDev Allocated
CustomLoop 3.422 ns 0.016 0.015 -
CountWithLinq 39.894 ns 0.483 0.404 32 B

Using LINQ has rewarded us with a 1000% increase in runtime, and tops it off with a 32 byte allocation. Why?

Well, what you may neglect to realise, is that the delegate passed to Count is increasing run time by quite a subtantial differential because it's leading to a heap allocation of Func<T, bool>. This means Rider has stabbed us in the back and suggested we use “shorter code” that is demonstrably worse.

Why do delegates allocate?

Delegates are quite an amazing feature of C#. But the truth is, they are nothing more than syntax sugar. A delegate compiles to an entire class - and since classes allocate on the heap, so does any instance of a delegate.

If you don't believe me, you can see this in action by looking at the CIL for a delegate, and a variable of that delegate type.

MyDelegate someMethod = Foo;
someMethod();

void Foo()
{
}

<mark>delegate void MyDelegate();</mark>

MyDelegate is parameterless and returns void. It seems so insignificant in code. But look how it compiles:

.class private auto ansi sealed MyDelegate
    extends [System.Runtime]System.MulticastDelegate
{
    // Methods
    .method public hidebysig specialname rtspecialname 
        instance void .ctor (
            object 'object',
            native int 'method'
        ) runtime managed 
    {
    } // end of method MyDelegate::.ctor

    .method public hidebysig newslot virtual 
        instance void Invoke () runtime managed 
    {
    } // end of method MyDelegate::Invoke

    .method public hidebysig newslot virtual 
        instance class [System.Runtime]System.IAsyncResult BeginInvoke (
            class [System.Runtime]System.AsyncCallback callback,
            object 'object'
        ) runtime managed 
    {
    } // end of method MyDelegate::BeginInvoke

    .method public hidebysig newslot virtual 
        instance void EndInvoke (
            class [System.Runtime]System.IAsyncResult result
        ) runtime managed 
    {
    } // end of method MyDelegate::EndInvoke

} // end of class MyDelegate

That one innocent line compiled to an entire class, one which inherits System.MulticastDelegate. This means when we create a variable of that type, and assign it the result of Foo as a method group, it compiled to a newobj instruction - which creates a heap allocation. And then when we invoke the delegate, it performs a virtual call with the callvirt instruction - which also bites us in the ass because virtual calls are not that performant.

ldnull
ldftn void Program::'<<Main>$>g__Foo|0_0'()
newobj instance void MyDelegate::.ctor(object, native int)

<mark>callvirt instance void MyDelegate::Invoke()</mark>

This is what slows LINQ down. This is why it is bad and it should feel bad. LINQ may let you write shorter code, but because LINQ is littered with methods that accept delegates as parameters, actually using LINQ can put you at a significant disadvantage.

Now, I am not saying to completely avoid LINQ. There are certain situations where LINQ is necessary - such as dealing with a dynamic data source like a database, or some asynchronous IEnumerable. There are also cases where it may feel adaquate because you have a monster of mammoth LINQ chains involving a few SelectMany and ToDictionary calls, and refactoring those out would cost you more time and effort than it's worth.

But for mundane tasks, simple operations like Where or Count or Sum, I can assure you there is a better way to approach the problem.

Your source code may result in more lines, but those lines will perform so much better - and you will thank yourself later that you did it.




2 legacy comments

Legacy comments are comments that were posted using a commenting system that I no longer use. This exists for posterity.

Evan Sanders Evan Sanders • 2 years ago

Hmmm I wonder why they have it in the first place…

Oliver Booth Oliver Booth • 2 years ago

LINQ was originally introduced to prepare for Entity Framework, and that's where it really shines the most. With systems like LINQ-to-SQL (thanks to Expression<T>), it's a fantastic way to avoid baking SQL queries in your code.

But over the years, it's just been used and abused in places where it really doesn't belong. I guess that comes with the territory. You introduce some tech, people will always use it poorly.