Greatest-Per-Group Query in Rails Using a Scoped has_one Assocation
This is part 3 of 4 in a series on how to use Greatest Per Group queries in Ruby on Rails to avoid the N + 1 query problem:
- When and how to use Greatest Per Group queries (with PostgreSQL)
- Rails implementation using custom query methods
- Rails implementation using a scoped
has_one
relation - Rails implementation using a view backed model
Everything has been tested with Rails 4.2 and PostgreSQL. See demonstration app and it's source code
In Part 2 of this series, we covered how to use a Greatest Per Group query in Rails using custom query methods, as well as some of the limitations of that approach. Here we'll cover an approach that makes up for those limitations and brings some extra benefits to the table.
A scoped has_one
association
We'll need our Greatest Per Group query from Part 1 fresh in our heads for this next part, so here it is again:
SELECT authors.*, newest_posts.title AS newest_post_title
FROM authors
LEFT JOIN (SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC)
AS newest_posts
ON newest_posts.author_id = authors.id
There are three things happening in this query:
- We're creating a temporary
newest_posts
table that returns a single post – the newest one – for each author. - Then, we're joining this temporary table to our
authors
table onauthors.id = newest_posts.author_id
. - Finally, we're selecting some data from that
newest_posts
table to return along with our author data.
Let's forget about the third item – selecting the data – for now and instead focus on the first two.
We're creating a table that has a single row per author and we're joining it to the authors
table on the author_id
foreign key. If that feels eerily familiar to you, it's because this is essentially exactly what happens when we have a has_one
association defined on a model. What we need to do is somehow get Rails to use our subquery for a has_one
we define. Here's our subquery again:
SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC
Luckily, Rails allows us to customize – to scope – the query used for a has_one
association. Here's how we do that with our subquery:
class Author
has_many :posts
has_one :newest_post, -> {
select("DISTINCT ON (author_id) *")
.order(:author_id, created_at: :desc)
}, class_name: "Post"
This basically tells Rails that 1) we have a has_one
called newest_post
, 2) it is essentially a Post
, and 3) to use .select("SELECT DISTINCT ON (author_id) *").order(:author_id, created_at: :desc)
when generating the query. And because it is just like any other has_one
association, you can eager load it:
@authors = Author.includes(:newest_post).all
@authors.each do |author|
puts "Newest post for #{author.full_name}: #{author.newest_post.title}"
end
And it only fires off two queries.
Why this approach is awesome
Let's compare and contrast. Here's how we achieved this with custom query methods:
@authors = Author
.select("authors.*,
newest_posts.title AS newest_post_title")
.joins("LEFT JOIN (SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC)
AS newest_posts ON newest_posts.author_id = authors.id")
And here's what we just did:
class Author
has_many :posts
has_one :newest_post, -> {
select("DISTINCT ON (author_id) *")
.order(:author_id, created_at: :desc)
}, class_name: "Post"
Already this feels like a big improvement. It eliminates a lot of the inline SQL of the customer query methods, it succinctly describes – has_one :newest_post
– what is going on, and it's just more intuitive. The idea that an author has a newest post makes sense. It is easy idea to hold in your head. All that SQL stuffed inline into your Ruby code is not easy to hold in your head.
But there are other benefits to this approach as well.
One of the problems we identified with using custom query methods is that they aren't very composable. If you abstract them away behind named scopes, you can't combine the named scopes because the SELECT
clauses interfere with each other. But scoped has_one
associations are as composable as you can get.
Let's say we also wanted an oldest_post
association:
class Author
has_many :posts
has_one :newest_post, -> {
select("DISTINCT ON (author_id) *")
.order(:author_id, created_at: :desc)
}, class_name: "Post"
has_one :oldest_post, -> {
select("DISTINCT ON (author_id) *")
.order(:author_id, created_at: :asc)
}, class_name: "Post"
This allows us to do something like this:
@authors = Author.includes(:newest_post, :oldest_post)
@authors.each do |author|
puts "Newest post for #{author.name}: #{author.newest_post.title}"
puts "#{author.name} first posted on #{author.oldest_post.date}"
end
Compared to the mess of SQL and named scopes that we would need if we were using custom query methods, this is a breath of fresh air. Not only is it composable, it's way less code. And it works the way Rails wants to work. What do I mean by that? Well, because we've set these up as has_one
assocations – and that's really what they were all along; an author's newest or oldest post is a single post that is related to the author – we get a bunch of stuff for free.
For instance, let's say we want to show a list of authors with the title of their newest post, but not include any authors who have not posted yet. Now that's simple. We can use our new has_one
relations with joins
:
@authors = Author.joins(:newest_post)
@authors.each do |author|
puts "Newest post for #{author.name}: #{author.newest_post.title}"
end
And, since these are associations, they'll return full-fledged Post
instances. So if we decide that we want to display other attributes of the newest post, there's no need to modify any SQL. We simply call the attribute we need:
<%= pluralize(@author.newest_post.word_count, "word") %>
And if we want to have a link to the newest post, that's simple too:
<%= link_to @author.newest_post.title, post_path(@author.newest_post) %>
Or we can use it in a form builder:
<%= form_for @author.newest_post do |f| %>
<%= f.text_field :title %>
<%= f.submit %>
<% end %>
Or we can access it's assocations:
<%= pluralize(@author.newest_post.comments.count, "comment") %>
You get the idea. You can do whatever you want with it, because it's an instance of one of your models. But each one of these would have required different custom SQL if we were just using custom query methods to add attributes to our Author
instances.
How this approach works
So that covers why this approach is better, but I want to dive into how this approach works because, even though it might not look like it, we're getting ActiveRecord to use pretty much the same SQL query in both approaches. The main difference is in where we put the parts of that query.
Let's revisit our original query to return a list of the newest posts for each author:
SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC
In the first approach, we put this into a subquery, joined the result of that subquery to our authors
table, then selected the column we needed:
SELECT authors.*, newest_posts.title AS newest_post_title
FROM authors
LEFT JOIN (SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC)
AS newest_posts
ON newest_posts.author_id = authors.id
In this approach we don't have to do all of that because Rails already understands how to join has_one
relations. Instead, all we have to do is create a has_one
relation that executes our query and then Rails will know what to do with it. And if you take a closer look at the has_one
we defined above, you'll see that it looks remarkably like that original query of ours:
has_one :newest_post, -> {
select("DISTINCT ON (author_id) *")
.order(:author_id, created_at: :desc)
}, class_name: "Post"
In the last argument, we're telling Rails that this has_one
relation should return an instance of a Post
. Normally, Rails will figure this out automatically based on the name we give our has_one
relation, but that won't work in this case, so we have to specify the model class. And Rails knows that the table for the Post
class is posts
. Then, in the lambda passed as the second argument, we're specifying the SELECT DISTINCT ON
and ORDER
clauses that we want Rails to use when querying the posts
table.
And the rest, as they say, is magic.
Well, almost
In my opinion, the approach we covered here – using a scoped has_one
association – really hits the sweet spot for implementing Greatest Per Group queries in Rails. It works for most use cases and, because it does things in a Rails-y way, it's intuitive and composable. But it breaks down with more complicated stuff that's at the boundary of how much ActiveRecord "understands" SQL. Like, for instance, if you try to use joins in the custom scope on the has_one
relation, or in more complicated uses of the has_one
relation when doing eager loading.
That's okay, though, because View Backed Models, which we'll cover in Part 4, are the knife that cuts through any knot.