Greatest-Per-Group Query in Rails Using a View Backed Model

This is part 4 of 4 in a series on how to use Greatest Per Group queries in Ruby on Rails to avoid the N + 1 query problem:

  1. When and how to use Greatest Per Group queries (with PostgreSQL)
  2. Rails implementation using custom query methods
  3. Rails implementation using a scoped has_one relation
  4. Rails implementation using a view backed model

Everything has been tested with Rails 4.2 and PostgreSQL. See demonstration app and it's source code


In Part 3 of this series, we covered how to implement a Greatest Per Group query in Rails using a scoped has_one relation – my preferred method – but we left off with the caveat that there are some cases where that approach will not work. Here we'll cover an approach that requires a bit more setup, but always works: using a View Backed Model.

A View Backed Model

Let's start by unpacking the term View Backed Model. In Rails, each model is backed by a table in the underlying database. So, our Author model is backed by the authors table and our Post model is backed by the posts table. A view backed model is a model that is backed by a database view, instead of by a normal table.

Great, so what exactly is a database view? Well, you could go read the Postgres documentation on views, but the short version is that views allow you to encapsulate a query behind a table interface. Basically, a database view lets you assign a name to a query and then interact with the result that query generates as if it were any other table in your database (filter it, join to it, etc.). Don't worry if that seems a little fuzzy to you, because we're about to make it more concrete by applying it to our Greatest Per Group problem.

Let's start by looking at the query we came up with back in Part 1 of this series:

SELECT authors.*, newest_posts.title AS newest_post_title
FROM authors
LEFT JOIN (SELECT DISTINCT ON (author_id) *
           FROM posts
           ORDER BY author_id, created_at DESC)
           AS newest_posts
ON newest_posts.author_id = authors.id

Working from the inside out, what we're doing in this query is:

  1. Using a subquery to creating a temporary table.
  2. Naming the temporary table newest_posts.
  3. And then joining it to the authors table.

A database view allows us to encapsulate the subquery that makes up our temporary newest_posts table and make it permanently available throughout our database. Here's how we create that database view:

CREATE VIEW newest_posts AS
  SELECT DISTINCT ON (author_id) *
  FROM posts
  ORDER BY author_id, created_at DESC

As you can see, it looks remarkably similar to our subquery from above, just rearranged a little bit. Now we have a view called newest_posts that we can treat like a table. And every time we refer to it, PostgreSQL will run this query and treat the results as a table.

So if we want the newest post for each author, we can do:

SELECT * FROM newest_posts

And to get the newest post for a specific author, we can do:

SELECT * FROM newest_posts WHERE author_id = 42

Or, if we want to accomplish our original query from above, we can do:

SELECT authors.*, newest_posts.title AS newest_post_title
FROM authors
LEFT JOIN newest_posts
ON newest_posts.author_id = authors.id

As you can see, our view is a drop in replacement for the original subquery. Which makes sense, because PostgreSQL is literally running the exact same query in the spot where it says newest_posts.

Database views are a powerful way to encapsulate commonly used or complicated queries. And they can be an incredibly useful tool in a Rails app, because it's fairly easy to hit the edge of how well ActiveRecord maps to what you're trying to accomplish with a query. But ActiveRecord understands tables pretty well and, since a database view acts like a table, they are a powerful way to force ActiveRecord to just do what you need it to do.

Great, so how do we use database views in a Rails app?

Database Views in Rails

Since a database view is something that exists in the database, we'll create ours in a migration:

class CreateNewestPostsView < ActiveRecord::Migration
  def up
    execute <<-sql
      CREATE VIEW newest_posts AS
        SELECT DISTINCT ON (author_id) *
        FROM posts
        ORDER BY author_id, created_at DESC
    sql
  end

  def down
    execute <<-sql
      DROP VIEW newest_posts
    sql
  end
end

And since a database view behaves like a table, all we need to do is create a model that follows the Rails naming conventions and things will just work (well, mostly, but we'll get to that in a bit):

class NewestPost < ActiveRecord::Base
end

Of course, Rails has no idea that the newest_posts table that it expects to back the NewestPost model is not actually a table at all, but is instead a dynamic query result comprised of rows from the posts table. PostgreSQL takes care of all of that behind the scenes. All Rails knows is that there is a table-like thing called newest_posts which returns one row per Author and has an author_id column that it can use to join to the authors table. So, if we do:

NewestPost.all

Rails will generate SQL like:

SELECT * FROM newest_posts;

But PostgreSQL will actually do:

SELECT *
FROM (SELECT DISTINCT ON (author_id) *
      FROM posts
      ORDER BY author_id, created_at DESC)

This allows us to use use our View Backed Model just like any other model in our app. For instance, we can set up associations:

class NewestPost < ActiveRecord::Base
  belongs_to :author
end

Then we can create the corresponding has_one association in our Author model:

class Author < ActiveRecord::Base
  has_many :posts
  has_one :newest_post
end

And now we have a normal ActiveRecord association that we can use for eager loading:

@authors = Author.includes(:newest_post)
@authors.each do |author|
  puts "#{author.name}'s latest post: #{author.newest_post.title}"
end

And Rails only generates two queries: one to load the authors and another to load the newest posts.

A few things to be aware of

If you'll recall, we created our database view in a Rails migration, which is all fine and dandy. Except that we had to execute raw SQL instead of using one of ActiveRecord::Migration's handy methods (like create_table or add_index). That's because Rails doesn't really know what views are. And a side effect of this is that, if you take a look at your schema.rb after creating a view, you won't see your view in there.

So if somebody tries to get their development environment up and running with rake db:create, they won't have any database views. Which is probably not what you want if you're trying to use views in your Rails app.

That's okay, though, because there are two ways around this.

The first option is to configure your app to use SQL instead of Ruby for schema dumps. By default, Rails is configured to do Ruby schema dumps. That's why schema.rb ends in .rb and consists of a series of ActiveRecord::Migration methods designed to recreate your schema from scratch.

But you can configure Rails to do SQL schema dumps instead. In which case you'll get a structure.sql file filled with a bunch of SQL statements that look like the output of pg_dump. Critically, this type of schema dump will include statements to recreate any database views (and extensions) you've created.

You can read the documentation on how to do this, but it's pretty simple. In config/application.rb, you change this line:

config.active_record.schema_format = :ruby

to:

config.active_record.schema_format = :sql

You'll probably want to delete your schema.rb file, too.

And that's all there is to it.

There is another, more comprehensive approach to solving the schema dump problem, which is to use the Scenic gem. Scenic actually adds ActiveRecord::Migration methods which allow you to avoid using SQL formatted schema dumps. Plus it has a bunch of other convenient tools for working with database views in your Rails app. I haven't used it, personally, but I plan to.

The other major consideration with View Backed Models is a little bit more subtle. And it is that instances of a View Backed Model are not instances of the model that is backed by the table they're derived from. That's a bit much to digest, so let me explain it with some code.

Let's grab a NewestPost:

@newest_post = NewestPost.first

Because the newest_posts database view selects all of the columns from the posts table that it is derived from (that little * at the end of the second line of our view):

CREATE VIEW newest_posts AS
  SELECT DISTINCT ON (author_id) *
  FROM posts
  ORDER BY author_id, created_at DESC

The data from those columns is available on instances of NewestPost. So we can do:

@newest_post.title
@newest_post.body
@newest_post.created_at

And they all work as we would expect them to. But any methods or associations defined on Post will not work. So if Post has_many :comments and you try to do something like:

@newest_post.comments

It will not work.

It's easy to start thinking of instances of NewestPost as instances of Post, since that's what they are in a logical sense, but to the dumb machine, NewestPost and Post are just two completely unrelated model classes.

So, around now you might be thinking to yourself, "Aha, but this is easy to solve. I'll just make NewestPost inherit from Post and then it'll have access to all of the methods and associations defined on Post."

In a sane world, that would be a perfectly reasonable way to proceed. But, unfortunately, we exist in a world where Rails has Single Table Inheritance (STI). I'll spare you a deep dive on STI, but the gist of it is that, for descendants of ActiveRecord::Base, Rails stomps all over your ability to have one class inherit from another in a way that would be useful to us here.

But, as STI taketh away, so STI giveth.

Because STI is such a convoluted mess, it necessitates the existence of the very tool that will allow us to escape the conundrum that it has created for us. And that tool is the becomes method. You can read the API docs, but becomes uses the attributes of the object you call it on to create an instance of the class you pass to it. Yet another thing that is easier understood with code. Here it is in action:

@newest_post.becomes(Post)

Which is more or less the equivalent of doing:

Post.new(@newest_post.attributes)

And since we now have an instance of Post, we can use all of the methods and associations defined on Post without having to redefine them. Which allows us to stuff like this:

@newest_post.becomes(Post).comments

So there you go. It's a bit of a kludge and I wish it were more elegant, but it works. And, in my opinion, the power and utility of View Backed Models are worth a modest sacrifice in elegance.


And that's it. Now you know way more than you ever wanted to about this strange little corner of the Rails world.