Greatest-Per-Group Query in Rails Using Custom Query Methods
This is part 2 of 4 in a series on how to use Greatest Per Group queries in Ruby on Rails to avoid the N + 1 query problem:
- When and how to use Greatest Per Group queries (with PostgreSQL)
- Rails implementation using custom query methods
- Rails implementation using a scoped
has_one
association - Rails implementation using a view backed model
Everything has been tested with Rails 4.2 and PostgreSQL. See demonstration app and it's source code
In Part 1 of this series, we covered when Greatest Per Group queries can help avoid the N + 1 query problem and how to create them with PostgreSQL's DISTINCT ON
feature. Here we'll learn how to take our Greatest Per Group query and fit it into our Rails app using custom query methods.
Taking advantage of ActiveRecord's dynamic model attribute behavior
One of the things about Ruby on Rails that I found most confusing when I first got started is the way that you don't have to define the attributes of your models in your models. Instead, you just add columns to the underlying database table – usually with migrations, but it doesn't actually matter how they get there – and ActiveRecord will automatically add those attributes to your model instances and create accessors for them.
Though this confused me when I was newer to Rails, I have come to appreciate it as I've gained experience. It strikes a nice balance between having an easy default and allowing you to roll up your sleeves and get your hands dirty when necessary. And getting our hands dirty is exactly what we'll do here.
Because Rails dynamically creates attributes on our model for any columns returned by the database, all we have to do is get Rails to execute our query from part 1 when loading our collection of authors. Then each author will have an attribute called newest_post_title
with the title of the newest post.
Here's that query again:
SELECT authors.*, newest_posts.title AS newest_post_title
FROM authors
LEFT JOIN (SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC)
AS newest_posts
ON newest_posts.author_id = authors.id
Part of the magic of Rails is that it's intelligent naming defaults allow us to avoid writing SQL most of the time. But the tools to get our hands dirty with SQL are right there, waiting for us when we need them. Here's how we get Rails to execute that query:
@authors = Author
.select("authors.*,
newest_posts.title AS newest_post_title")
.joins("LEFT JOIN (SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC)
AS newest_posts ON newest_posts.author_id = authors.id")
It's actually incredibly straightforward. We're using the select
method to specify the SELECT
statement and then chaining on the joins
method to specify the LEFT JOIN
. If you do this in a Rails console, you'll see that ActiveRecord executes the exact SQL query we want.
Now, we can access the newest_post_title
when we iterate over our list of authors like this:
@authors.each do |author|
puts "Newest post for #{author.name}: #{author.newest_post_title}"
end
And it only generates a single query, rather than the N + 1 queries we would generate otherwise.
Taking advantage of the way ActiveRecord dynamically creates an attribute for every column present in the query result is an incredibly powerful technique. It allows us to essentially add any attribute we need to any model, all on the fly, without having to set anything up in our model class ahead of time.
All we have to do is craft the right SQL statement.
Some shortcomings with this approach
The most obvious shortcoming with this approach is that it is rather verbose and ugly. You've got a lot of SQL inside those chained select
and joins
methods and, unless you wrote the code yourself, it takes a bit of puzzling to figure out what's going on. This shortcoming is easy to overcome, though. We can encapsulate all that logic in a named scope:
scope :with_newest_post_title, -> {
.select("authors.*,
newest_posts.title AS newest_post_title")
.joins("LEFT JOIN (SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC)
AS newest_posts ON newest_posts.author_id = authors.id")
}
This named scope allows you to just do Author.with_newest_post_title
instead of typing that huge long thing out each time, which is definitely a win.
A less obvious shortcoming with this approach is that it is not composable. Let's say you also want to display how long each author has been writing for the site. Now you need the date of each author's oldest post as well as the title of each author's newest post. You could create another named scope for the date of the author's first post:
scope :with_oldest_post_date, -> {
select("authors.*,
oldest_posts.date AS oldest_post_title")
.joins("LEFT JOIN (SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at ASC)
AS oldest_posts ON oldest_posts.author_id = authors.id")
}
But when you try to chain the two named scopes together – that is, when you try to compose them – with something like:
Author.with_newest_post_title.with_oldest_post_date
It won't work. That's because the SELECT
statements from each named scope conflict with each other. Instead, you would need a third named scope that combines the two:
scope :with_newest_post_title_and_oldest_post_date, -> {
select("authors.*,
newest_posts.title AS newest_post_title,
oldest_posts.date AS oldest_post_title")
.joins("LEFT JOIN (SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC)
AS newest_posts ON newest_posts.author_id = authors.id")
.joins("LEFT JOIN (SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at ASC)
AS oldest_posts ON oldest_posts.author_id = authors.id")
}
Does that look like a hot mess to you? Because it looks like a hot mess to me. We've got three named scopes to account for all the possible combinations of the two attributes we might want to include in our query. That is literally the definition of combinatorial explosion. And anyway, who wants big blocks of SQL mixed in with their Ruby?
But this approach is a pretty handy tool to have in your back pocket when you know exactly how to achieve what you want in SQL and you just need to get it done fast. The next two parts will cover approaches that improve on the shortcomings of this approach, plus bring other benefits to the table:
- Using a scoped
has_one
association (clean and elegant) - Using a view backed model (when all else fails)