torsdag 22 maj 2008

LINQ to SQL: On null values in coalesce expressions

The T-SQL function COALESCE is a convenient way of using optional parameters in an SQL query. It accepts a number of parameters, and returns the first non-null value in the set. When all the values in the set are null, the function returns a null value. For those more acquainted with SQL CASE, the following two expressions are exactly identical:

SELECT COALESCE(@param1, @param2, 'Default value')

SELECT
   CASE WHEN @param1 IS NOT NULL THEN @param1 ELSE
      CASE WHEN @param2 IS NOT NULL THEN @param2 ELSE
         'Default value'
      END
   END

C# provides a convenient shorthand - expr1 ?? expr2 - for coalesce expressions, which allows us to create a function as such:
public int Coalesce(int? var, int defaultValue)
{
   return var ?? defaultValue;
}

In LINQ to SQL, the ?? shorthand is always translated into COALESCE when the SQL query is generated, which equips us with the ability to use optional parameters as seen in figure 1:

Figure 1

public IEnumerable GetItems(int? parentId, int? typeId)
{
   myDataContext db = new myDataContext();
   return from item in db.Items
          where
             item.ParentId == (parentId ?? item.ParentId)
             && item.TypeId == (typeId ?? item.TypeId)
          select item;
}

As a side-remark, note the necessity of brackets in the above code. This code will render the following SQL query:
SELECT
   *
FROM
   Items
WHERE
   ParentId = COALESCE(@parentId, ParentId)
   AND TypeId = COALESCE(@typeId, TypeId)

So far so good, but this, in fact, poses a problem when using nullable database fields. Consider the following table structure:
Items
-----------------------------------
Field name     Type      Allow null
-----------------------------------
Id             int       false
ParentId       int       true
TypeId         int       false

Now, assume we want to find all items of TypeId 5, regardless of their parent items; indeed, regardless of whether they have a parent at all. The following request seems intuitive enough: GetItems(null, 5);. This will result in the C# expression item.ParentId == (null ?? item.ParentId), which is completely valid, but it will translate to the following SQL:
SELECT
   *
FROM
   Items
WHERE
   ParentId = COALESCE(null, ParentId)
   AND TypeId = COALESCE(5, TypeId)

For a record of ParentId 1 and TypeId 5, this will be just the way to do things, but if we consider the fact that there could exist an item with ParentId null and TypeId 5, the first COALESCE would return null, as both of its parameters would be of a null value. In SQL, this results in a null = null comparison, which will return false (hence the IS NULL syntax). Consequently, items with a ParentId of null will not be returned, even though the intention was to specify that the query should disregard of ParentId completely.

A workaround for the problem would be to rewrite a method accordingly:

Figure 2
public IEnumerable GetItems(int? parentId, int? typeId)
{
   myDataContext db = new myDataContext();
   return from item in db.Items
          where
             ((item.ParentId == null && parentId == null) || item.ParentId == (parentId ?? item.ParentId))
             && item.TypeId == (typeId ?? item.TypeId)
          select item;
}

Now, you don't have to look at figure 2 to realize that there's probably a better way to solve this problem. Expression trees would do the trick, for sure, but for our simple implementation, it turns out there's a less complex solution:

Figure 3
public IEnumerable GetItems(int? parentId, int? typeId)
{
   myDataContext db = new myDataContext();
   var items = from item in db.Items select item;

   if(parentId != null)
      items = from item in items where item.ParentItemID == parentId select item;

   if(typeId != null)
      items = from item in items where item.ItemTypeID == typeId select item;

return items;
}

Beginning by selecting the entire set of items available, we then proceed to filter the results to match any of the criterias that may or may not be provided. At first glance, this would appear to begin by executing a plain SELECT * FROM Items query, returning all items in the table, even when we know that's not always requested, but as the result set - items - is never enumerated before the return statement, LINQ to SQL optimizes the above code into one query at runtime, depending on what values have been specified.

An SQL Profiler trace gives us the answers in plain text:
Figure 1
exec sp_executesql N'SELECT Id, TypeId, ParentId FROM Items WHERE (ParentId = (COALESCE(@p0,ParentId))) AND (TypeId = (COALESCE(@p1,TypeId)))',N'@p0 int,@p1 int',@p0=NULL,@p1=5

The query generated in Figure 1 returns 0 rows, which is not what we expect, given the data in the table.

Figure 2
exec sp_executesql N'SELECT Id, TypeId, ParentId FROM Items WHERE ((ParentId IS NULL) OR (ParentId = (COALESCE(@p0,ParentId)))) AND (TypeId = (COALESCE(@p1,TypeId)))',N'@p0 int,@p1 int',@p0=NULL,@p1=5

The query from Figure 2 gives us the result we want, but in a verbose and inconvenient query

Figure 3
exec sp_executesql N'SELECT Id, TypeId, ParentId FROM Items WHERE TypeId = @p0',N'@p0 int',@p0=5

From Figure 3, only one query is executed (although we appear to have begun by selecting everything in the table), and only the non-null parameter is passed.

Inga kommentarer: