This is part 3 of 4 in a series on how to use Greatest Per Group queries in Ruby on Rails to avoid the N + 1 query problem:

  1. When and how to use Greatest Per Group queries (with PostgreSQL)
  2. Rails implementation using custom query methods
  3. Rails implementation using a scoped has_one relation
  4. Rails implementation using a view backed model

Everything has been tested with Rails 4.2 and PostgreSQL. See demonstration app and it's source code


In Part 2 of this series, we covered how to use a Greatest Per Group query in Rails using custom query methods, as well as some of the limitations of that approach. Here we'll cover an approach that makes up for those limitations and brings some extra benefits to the table.

A scoped has_one association

We'll need our Greatest Per Group query from Part 1 fresh in our heads for this next part, so here it is again:

SELECT authors.*, newest_posts.title AS newest_post_title
FROM authors
LEFT JOIN (SELECT DISTINCT ON (author_id) *
           FROM posts
           ORDER BY author_id, created_at DESC)
           AS newest_posts
ON newest_posts.author_id = authors.id

There are three things happening in this query:

  1. We're creating a temporary newest_posts table that returns a single post – the newest one – for each author.
  2. Then, we're joining this temporary table to our authors table on authors.id = newest_posts.author_id.
  3. Finally, we're selecting some data from that newest_posts table to return along with our author data.

Let's forget about the third item – selecting the data – for now and instead focus on the first two.

We're creating a table that has a single row per author and we're joining it to the authors table on the author_id foreign key. If that feels eerily familiar to you, it's because this is essentially exactly what happens when we have a has_one association defined on a model. What we need to do is somehow get Rails to use our subquery for a has_one we define. Here's our subquery again:

SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC

Luckily, Rails allows us to customize – to scope – the query used for a has_one association. Here's how we do that with our subquery:

class Author
  has_many :posts
  has_one :newest_post, -> {
    select("DISTINCT ON (author_id) *")
    .order(:author_id, created_at: :desc)
  }, class_name: "Post"

This basically tells Rails that 1) we have a has_one called newest_post, 2) it is essentially a Post, and 3) to use .select("SELECT DISTINCT ON (author_id) *").order(:author_id, created_at: :desc) when generating the query. And because it is just like any other has_one association, you can eager load it:

@authors = Author.includes(:newest_post).all
@authors.each do |author|
  puts "Newest post for #{author.full_name}: #{author.newest_post.title}"
end

And it only fires off two queries.

Why this approach is awesome

Let's compare and contrast. Here's how we achieved this with custom query methods:

@authors = Author
  .select("authors.*,
           newest_posts.title AS newest_post_title")
  .joins("LEFT JOIN (SELECT DISTINCT ON (author_id) *
                     FROM posts
                     ORDER BY author_id, created_at DESC)
          AS newest_posts ON newest_posts.author_id = authors.id")

And here's what we just did:

class Author
  has_many :posts
  has_one :newest_post, -> {
    select("DISTINCT ON (author_id) *")
    .order(:author_id, created_at: :desc)
  }, class_name: "Post"

Already this feels like a big improvement. It eliminates a lot of the inline SQL of the customer query methods, it succinctly describes – has_one :newest_post – what is going on, and it's just more intuitive. The idea that an author has a newest post makes sense. It is easy idea to hold in your head. All that SQL stuffed inline into your Ruby code is not easy to hold in your head.

But there are other benefits to this approach as well.

One of the problems we identified with using custom query methods is that they aren't very composable. If you abstract them away behind named scopes, you can't combine the named scopes because the SELECT clauses interfere with each other. But scoped has_one associations are as composable as you can get.

Let's say we also wanted an oldest_post association:

class Author
  has_many :posts
  has_one :newest_post, -> {
    select("DISTINCT ON (author_id) *")
    .order(:author_id, created_at: :desc)
  }, class_name: "Post"
  has_one :oldest_post, -> {
    select("DISTINCT ON (author_id) *")
    .order(:author_id, created_at: :asc)
  }, class_name: "Post"

This allows us to do something like this:

@authors = Author.includes(:newest_post, :oldest_post)
@authors.each do |author|
  puts "Newest post for #{author.name}: #{author.newest_post.title}"
  puts "#{author.name} first posted on #{author.oldest_post.date}"
end

Compared to the mess of SQL and named scopes that we would need if we were using custom query methods, this is a breath of fresh air. Not only is it composable, it's way less code. And it works the way Rails wants to work. What do I mean by that? Well, because we've set these up as has_one assocations – and that's really what they were all along; an author's newest or oldest post is a single post that is related to the author – we get a bunch of stuff for free.

For instance, let's say we want to show a list of authors with the title of their newest post, but not include any authors who have not posted yet. Now that's simple. We can use our new has_one relations with joins:

@authors = Author.joins(:newest_post)
@authors.each do |author|
  puts "Newest post for #{author.name}: #{author.newest_post.title}"
end

And, since these are associations, they'll return full-fledged Post instances. So if we decide that we want to display other attributes of the newest post, there's no need to modify any SQL. We simply call the attribute we need:

<%= pluralize(@author.newest_post.word_count, "word") %>

And if we want to have a link to the newest post, that's simple too:

<%= link_to @author.newest_post.title, post_path(@author.newest_post) %>

Or we can use it in a form builder:

<%= form_for @author.newest_post do |f| %>
  <%= f.text_field :title %>
  <%= f.submit %>
<% end %>

Or we can access it's assocations:

<%= pluralize(@author.newest_post.comments.count, "comment") %>

You get the idea. You can do whatever you want with it, because it's an instance of one of your models. But each one of these would have required different custom SQL if we were just using custom query methods to add attributes to our Author instances.

How this approach works

So that covers why this approach is better, but I want to dive into how this approach works because, even though it might not look like it, we're getting ActiveRecord to use pretty much the same SQL query in both approaches. The main difference is in where we put the parts of that query.

Let's revisit our original query to return a list of the newest posts for each author:

SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC

In the first approach, we put this into a subquery, joined the result of that subquery to our authors table, then selected the column we needed:

SELECT authors.*, newest_posts.title AS newest_post_title
FROM authors
LEFT JOIN (SELECT DISTINCT ON (author_id) *
           FROM posts
           ORDER BY author_id, created_at DESC)
           AS newest_posts
ON newest_posts.author_id = authors.id

In this approach we don't have to do all of that because Rails already understands how to join has_one relations. Instead, all we have to do is create a has_one relation that executes our query and then Rails will know what to do with it. And if you take a closer look at the has_one we defined above, you'll see that it looks remarkably like that original query of ours:

   has_one :newest_post, -> {
    select("DISTINCT ON (author_id) *")
    .order(:author_id, created_at: :desc)
  }, class_name: "Post"

In the last argument, we're telling Rails that this has_one relation should return an instance of a Post. Normally, Rails will figure this out automatically based on the name we give our has_one relation, but that won't work in this case, so we have to specify the model class. And Rails knows that the table for the Post class is posts. Then, in the lambda passed as the second argument, we're specifying the SELECT DISTINCT ON and ORDER clauses that we want Rails to use when querying the posts table.

And the rest, as they say, is magic.

Well, almost

In my opinion, the approach we covered here – using a scoped has_one association – really hits the sweet spot for implementing Greatest Per Group queries in Rails. It works for most use cases and, because it does things in a Rails-y way, it's intuitive and composable. But it breaks down with more complicated stuff that's at the boundary of how much ActiveRecord "understands" SQL. Like, for instance, if you try to use joins in the custom scope on the has_one relation, or in more complicated uses of the has_one relation when doing eager loading.

That's okay, though, because View Backed Models, which we'll cover in Part 4, are the knife that cuts through any knot.