Greatest-Per-Group Query in Rails Using a View Backed Model
This is part 4 of 4 in a series on how to use Greatest Per Group queries in Ruby on Rails to avoid the N + 1 query problem:
- When and how to use Greatest Per Group queries (with PostgreSQL)
- Rails implementation using custom query methods
- Rails implementation using a scoped
has_one
relation - Rails implementation using a view backed model
Everything has been tested with Rails 4.2 and PostgreSQL. See demonstration app and it's source code
In Part 3 of this series, we covered how to implement a Greatest Per Group query in Rails using a scoped has_one
relation – my preferred method – but we left off with the caveat that there are some cases where that approach will not work. Here we'll cover an approach that requires a bit more setup, but always works: using a View Backed Model.
A View Backed Model
Let's start by unpacking the term View Backed Model. In Rails, each model is backed by a table in the underlying database. So, our Author
model is backed by the authors
table and our Post
model is backed by the posts
table. A view backed model is a model that is backed by a database view, instead of by a normal table.
Great, so what exactly is a database view? Well, you could go read the Postgres documentation on views, but the short version is that views allow you to encapsulate a query behind a table interface. Basically, a database view lets you assign a name to a query and then interact with the result that query generates as if it were any other table in your database (filter it, join to it, etc.). Don't worry if that seems a little fuzzy to you, because we're about to make it more concrete by applying it to our Greatest Per Group problem.
Let's start by looking at the query we came up with back in Part 1 of this series:
SELECT authors.*, newest_posts.title AS newest_post_title
FROM authors
LEFT JOIN (SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC)
AS newest_posts
ON newest_posts.author_id = authors.id
Working from the inside out, what we're doing in this query is:
- Using a subquery to creating a temporary table.
- Naming the temporary table
newest_posts
. - And then joining it to the
authors
table.
A database view allows us to encapsulate the subquery that makes up our temporary newest_posts
table and make it permanently available throughout our database. Here's how we create that database view:
CREATE VIEW newest_posts AS
SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC
As you can see, it looks remarkably similar to our subquery from above, just rearranged a little bit. Now we have a view called newest_posts
that we can treat like a table. And every time we refer to it, PostgreSQL will run this query and treat the results as a table.
So if we want the newest post for each author, we can do:
SELECT * FROM newest_posts
And to get the newest post for a specific author, we can do:
SELECT * FROM newest_posts WHERE author_id = 42
Or, if we want to accomplish our original query from above, we can do:
SELECT authors.*, newest_posts.title AS newest_post_title
FROM authors
LEFT JOIN newest_posts
ON newest_posts.author_id = authors.id
As you can see, our view is a drop in replacement for the original subquery. Which makes sense, because PostgreSQL is literally running the exact same query in the spot where it says newest_posts
.
Database views are a powerful way to encapsulate commonly used or complicated queries. And they can be an incredibly useful tool in a Rails app, because it's fairly easy to hit the edge of how well ActiveRecord maps to what you're trying to accomplish with a query. But ActiveRecord understands tables pretty well and, since a database view acts like a table, they are a powerful way to force ActiveRecord to just do what you need it to do.
Great, so how do we use database views in a Rails app?
Database Views in Rails
Since a database view is something that exists in the database, we'll create ours in a migration:
class CreateNewestPostsView < ActiveRecord::Migration
def up
execute <<-sql
CREATE VIEW newest_posts AS
SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC
sql
end
def down
execute <<-sql
DROP VIEW newest_posts
sql
end
end
And since a database view behaves like a table, all we need to do is create a model that follows the Rails naming conventions and things will just work (well, mostly, but we'll get to that in a bit):
class NewestPost < ActiveRecord::Base
end
Of course, Rails has no idea that the newest_posts
table that it expects to back the NewestPost
model is not actually a table at all, but is instead a dynamic query result comprised of rows from the posts
table. PostgreSQL takes care of all of that behind the scenes. All Rails knows is that there is a table-like thing called newest_posts
which returns one row per Author
and has an author_id
column that it can use to join to the authors
table. So, if we do:
NewestPost.all
Rails will generate SQL like:
SELECT * FROM newest_posts;
But PostgreSQL will actually do:
SELECT *
FROM (SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC)
This allows us to use use our View Backed Model just like any other model in our app. For instance, we can set up associations:
class NewestPost < ActiveRecord::Base
belongs_to :author
end
Then we can create the corresponding has_one
association in our Author
model:
class Author < ActiveRecord::Base
has_many :posts
has_one :newest_post
end
And now we have a normal ActiveRecord association that we can use for eager loading:
@authors = Author.includes(:newest_post)
@authors.each do |author|
puts "#{author.name}'s latest post: #{author.newest_post.title}"
end
And Rails only generates two queries: one to load the authors and another to load the newest posts.
A few things to be aware of
If you'll recall, we created our database view in a Rails migration, which is all fine and dandy. Except that we had to execute raw SQL instead of using one of ActiveRecord::Migration
's handy methods (like create_table
or add_index
). That's because Rails doesn't really know what views are. And a side effect of this is that, if you take a look at your schema.rb
after creating a view, you won't see your view in there.
So if somebody tries to get their development environment up and running with rake db:create
, they won't have any database views. Which is probably not what you want if you're trying to use views in your Rails app.
That's okay, though, because there are two ways around this.
The first option is to configure your app to use SQL instead of Ruby for schema dumps. By default, Rails is configured to do Ruby schema dumps. That's why schema.rb
ends in .rb
and consists of a series of ActiveRecord::Migration
methods designed to recreate your schema from scratch.
But you can configure Rails to do SQL schema dumps instead. In which case you'll get a structure.sql
file filled with a bunch of SQL statements that look like the output of pg_dump
. Critically, this type of schema dump will include statements to recreate any database views (and extensions) you've created.
You can read the documentation on how to do this, but it's pretty simple. In config/application.rb
, you change this line:
config.active_record.schema_format = :ruby
to:
config.active_record.schema_format = :sql
You'll probably want to delete your schema.rb
file, too.
And that's all there is to it.
There is another, more comprehensive approach to solving the schema dump problem, which is to use the Scenic gem. Scenic actually adds ActiveRecord::Migration
methods which allow you to avoid using SQL formatted schema dumps. Plus it has a bunch of other convenient tools for working with database views in your Rails app. I haven't used it, personally, but I plan to.
The other major consideration with View Backed Models is a little bit more subtle. And it is that instances of a View Backed Model are not instances of the model that is backed by the table they're derived from. That's a bit much to digest, so let me explain it with some code.
Let's grab a NewestPost
:
@newest_post = NewestPost.first
Because the newest_posts
database view selects all of the columns from the posts
table that it is derived from (that little *
at the end of the second line of our view):
CREATE VIEW newest_posts AS
SELECT DISTINCT ON (author_id) *
FROM posts
ORDER BY author_id, created_at DESC
The data from those columns is available on instances of NewestPost
. So we can do:
@newest_post.title
@newest_post.body
@newest_post.created_at
And they all work as we would expect them to. But any methods or associations defined on Post
will not work. So if Post
has_many :comments
and you try to do something like:
@newest_post.comments
It will not work.
It's easy to start thinking of instances of NewestPost
as instances of Post
, since that's what they are in a logical sense, but to the dumb machine, NewestPost
and Post
are just two completely unrelated model classes.
So, around now you might be thinking to yourself, "Aha, but this is easy to solve. I'll just make NewestPost
inherit from Post
and then it'll have access to all of the methods and associations defined on Post
."
In a sane world, that would be a perfectly reasonable way to proceed. But, unfortunately, we exist in a world where Rails has Single Table Inheritance (STI). I'll spare you a deep dive on STI, but the gist of it is that, for descendants of ActiveRecord::Base
, Rails stomps all over your ability to have one class inherit from another in a way that would be useful to us here.
But, as STI taketh away, so STI giveth.
Because STI is such a convoluted mess, it necessitates the existence of the very tool that will allow us to escape the conundrum that it has created for us. And that tool is the becomes
method. You can read the API docs, but becomes
uses the attributes of the object you call it on to create an instance of the class you pass to it. Yet another thing that is easier understood with code. Here it is in action:
@newest_post.becomes(Post)
Which is more or less the equivalent of doing:
Post.new(@newest_post.attributes)
And since we now have an instance of Post
, we can use all of the methods and associations defined on Post
without having to redefine them. Which allows us to stuff like this:
@newest_post.becomes(Post).comments
So there you go. It's a bit of a kludge and I wish it were more elegant, but it works. And, in my opinion, the power and utility of View Backed Models are worth a modest sacrifice in elegance.
And that's it. Now you know way more than you ever wanted to about this strange little corner of the Rails world.