colinramsay.co.uk

NHibernate: Optimising Queries with Projections

15 Jan 2008

As I mentioned previously, I am a only just beginning to get a feel for NHibernate's querying abilities. Yesterday I had a situation where I wanted to retrieve particular records in a table based on values from another table. Here's an example, with names changed to protect the innocent:

Category
--Has Many Posts

I wanted to select all categories where the Post date was within a certain range. Here's what I started with:

DetachedCriteria criteria = DetachedCriteria.For(typeof (Category))
	.CreateCriteria("Posts")
	.Add(Expression.Between("PostDate", startDate, endDate));

The Category object has a collection property on it called Posts, so when I CreateCriteria("Posts"), I am effectively telling NH I want to drill down through that property and begin filtering on the properties of Post, which is what I do with the next line.

This was pretty straightforward, once I'd thought it through. However, running the query gave me hundreds of matches instead of the few I knew I should be getting. A bit of digging showed me that this was because (in SQL terms) this wasn't running a DISTINCT query, so I was getting a Category back for every matching Post. After a quick Google search, I found out how to replicate the DISTINCT keyword that I thought I was after:

DetachedCriteria criteria = DetachedCriteria.For(typeof (Category))
	.CreateCriteria("Posts")
	.Add(Expression.Between("PostDate", startDate, endDate))
        .SetResultTransformer(new NHibernate.Transform.DistinctRootEntityResultTransformer());

I tell NH to run a DISTINCT on the root entity, in this case Category, and hey presto, I get the number of records I was expecting. I sat down to test this out a little bit more and think about the solution in my head. It started getting a bit slow when I increased the date range, so I fired up SQL Server Profiler to see what was happening under the hood.

What I saw made lots of sense. It was pulling back all of the duplicate records I saw before I used the result transformer. It was also pulling back all of the data for the Post entity. The result transformer must work on the client side, so I was pulling back far too much information, and my test run was taking 10 seconds to complete.

There were two clear areas for optimisation here, and they were linked. I didn't need all of the Post data, and it was that which was causing me to get duplicate rows, so I rethought my code:

DetachedCriteria postList = DetachedCriteria.For(typeof(Post))
	.SetProjection(Projections.Property("Category.CategoryUID"))
	.Add(Expression.Between("PostDate", startDate, endDate));

DetachedCriteria criteria = DetachedCriteria.For(typeof (Category))
	.Add(Subqueries.PropertyIn("CategoryUID", postList));

Firstly I create a criteria to get the posts which I need, but with one important difference: SetProjection. In this example, I am saying that I only want to retrieve the CategoryUID on the Category property of Post. In other words, despite this being a Criteria for typeof(Post), I will actually pull back a list of integers representing the CategoryUIDs which match. I am narrowing down the columns which I am returning using Projections.

Secondly, I pass those results to a subquery, which is pretty much like using IN (SELECT) in your WHERE clause in a SQL statement. Using the PropertyIn method took me a while to discover, but when I did it worked like a charm. My test run now takes zero seconds to complete, and I am a happy developer.

Feedback or questions on this post? Create an issue on GitHub.