Monday, 19 November 2007

Laziness in C#/LINQ

Perhaps this post should be called "The influence of Haskell".

One thing that people often don't realize is that when people (e.g. my buddy Erik Meijer) say that IEnumerable is a lazy list, they *really* mean it! Consider the following code in C# 3.0. It queries a database (in this case an array) by filtering out those elements that are not less than thirty. It then takes this resulting array and filters out those elements that are not more than twenty. The result is then printed out.

var MyDatabase = new int[] { 1, 25, 40, 5, 23 };

var query3 =
from x in MyDatabase
where LessThanThirty(x)
select x;

var query4 =
from x in query3
where MoreThanTwenty(x)
select x;

foreach (var r in query4) Console.WriteLine(r);

In fact, this is NOT the semantics. It's lazy lists a la Haskell, not strict lists a la ML. To see the difference, let me give you the code for pne of the filter tests (I'll make it side-effecting so you can see when it happens...)

static bool LessThanThirty(int x)
Console.WriteLine(x+"? Less than 30");
return x < 30;

Now try running the code. Not what you expected? That's because this is lazy programming!!

If you're a Haskell programmer, this is natural; although I suspect that programmers from more conventional languages might find it a little tricky.

POSTSCRIPT: Okay, a number of people have asked me to give the result of running the code. It's as follows (where I've added some formatting to help you see what's going on).

1? Less than 30
1? More Than 20

25? Less than 30
25? More Than 20

40? Less than 30

5? Less than 30
5? More Than 20

23? Less than 30
23? More Than 20

So you can see the laziness working here, in that we get alternating Less-than-30, more-than-20 calls. Cool!


Tim said...

This isn't lazy evaluation, it's something much worse.

Lazy evaluation implies referential transparency: that the end result of the computation is independent of the order of evaluation. This way, the only aspect of execution that differs between lazy and ordinary (eager) evaluation is divergence.

But here, you have lazyness without referential transparency.

This is the defect of Algol 60 that gained infamy over 40 years ago: The result of your effectful computation is intimately tied to the order in which lazy items are accessed! An expression that looks like it should do the same on every invocation actually behaves differently on the first and subsequent runs.

Tim Sweeney

Gavin Bierman said...

If you mean that lazy evaluation is only an implementation technique for call-by-name and should not produce observably different results then you're right.

I'm not suggesting, by the way, that I think this is the best state of affairs. Merely that this is what happens.

(It all arises from a round-the-lunch-table chat with Erik Meijer and Daan Leijen. We came up with this example and asked other people at the lunch-table what the result was. No one got it right. Hence I wanted to write this down.)