Materialized views are not something widely used in Ruby on Rails applications. However, I have recently tried to use it and the results were very satisfying.
In this case study, I would like to present a simple application using Ruby 2.4.1, Rails 5.1.4, PostgreSQL 10 and the scenic gem.
What are database views?
From PostgreSQL documentation:
Suppose the combined listing of weather records and city location is of particular interest to your application, but you do not want to type the query each time you need it. You can create a view over the query, which gives a name to the query that you can refer to like an ordinary table:
Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces.
Views can be used in almost any place a real table can be used. Building views upon other views is not uncommon.
This is just a very handy method not to write complicated queries. It does not help with performance – the complicated query is still executed every time.
Materialized views
Fortunately, the views can be materialized. Again, let’s take a look into PostgreSQL documentation:
Materialized views in PostgreSQL use the rule system like views do, but persist the results in a table-like form. The main differences between:
CREATE
MATERIALIZED VIEW
mymatview AS
SELECT
* FROM
mytab;
and:
CREATE
TABLE
mymatview AS
SELECT
* FROM
mytab;
are that the materialized view cannot subsequently be directly updated and that the query used to create the materialized view is stored in exactly the same way that a view’s query is stored, so that fresh data can be generated for the materialized view with:
REFRESH MATERIALIZED VIEW
mymatview;
Simply speaking, the results of the view query are stored in the database – just like any other table. The only difference is that we can not update the view directly but it can be refreshed using records from its source tables.
This way, instead of executing expensive queries, we can use data organized in a more simple way in a materialized view.
Case study
Here you can find a very simple application I have created for the purpose of the blog post. It is an example of a real-life problem I have stumbled upon and extracted the essence into the repository.
The idea is simple. Users have permissions. However, permissions are not assigned directly to users. Users belong to groups and the groups have multiple permissions.
There are 5 models:
It is very probable you have already seen a similar pattern – grouping records with a similar purpose. There is nothing complicated with this design. However, getting information can require a few SQL joins.
Test data
To have some data to work on and test results I have prepared seeds.rb. It creates all permissions, 100 users, each belonging to 3 permission groups. There are 10 permission groups and each of them has 5 permissions.
Just run rails db:seed
to get your database filled with data.
Solution #1 – naive approach
Suppose we want to check whether a given user has :a
and :b
permissions:
This translates to the following SQL query:
What do we get with EXPLAIN
?
As we can see most scans are using indices so there is not much we can do to make it faster.
I have written a simple benchmark as a rake task. Using the seeded data I’m checking 1000 times whether a user has 1 permission, then 2 random permissions, 3, 4 and finally 5.
rake benchmark_permissions
user system total real
joins - 1 permission 1.750000 0.050000 1.800000 ( 2.266476)
joins - 2 permissions 1.840000 0.060000 1.900000 ( 3.427442)
joins - 3 permissions 1.860000 0.070000 1.930000 ( 3.497956)
joins - 4 permissions 1.880000 0.060000 1.940000 ( 3.568906)
Ok, it does not tell us much. Can we make it faster?
Solution #2 – materialized views approach
This is the moment when it gets interesting. Guys from thoughtbot have created a very nice gem called scenic. It makes managing database views in Rails easy. I recommend you try it.
For our problem we could create the following solution:
rails generate scenic:view permissions_check_results
This generates two files. One is db/views/permissions_check_results_v01.sql
, which I have filled with the code below:
What’s happening here? We are creating a simple view with two columns: user_id
and permission_name
(note that we are using enum – permission_name
is stored as an integer).
uder_id |
permission_name |
1 |
1 |
1 |
2 |
2 |
3 |
In the example above user with id 1
has two permissions: 1
and 2
, user with id 2
has only one permission: 3
. Instead of joining multiple tables and checking for permissions through groups we now have a very simple, two-column table (the view).
The other file generated by the scenic task is a database migration: db/migrate/[TIMESTAMP]_create_permissions_check_results.rb
The important part is adding the materialized: true
option. We also need to create a model.
It’s time to compare our previous benchmark results with new ones. Using the code from the beginning of the article and adding a few new lines:
rake benchmark_permissions
user system total real
joins - 1 permission 1.840000 0.050000 1.890000 ( 2.328792)
joins - 2 permissions 1.860000 0.070000 1.930000 ( 3.452119)
joins - 3 permissions 1.880000 0.060000 1.940000 ( 3.519821)
joins - 4 permissions 1.950000 0.070000 2.020000 ( 3.664984)
view - 1 permission 0.780000 0.060000 0.840000 ( 1.272834)
view - 2 permissions 0.770000 0.050000 0.820000 ( 1.310658)
view - 3 permissions 0.780000 0.060000 0.840000 ( 1.291651)
view - 4 permissions 0.800000 0.050000 0.850000 ( 1.329006)
The new results are 3x times faster. That’s a pretty good result. Some of you might have doubts:
Hey, this is only reading. What about updating records in database?
You are right – I have not mentioned this nor have I tested it. The reason is simple – I don’t care. The permissions in my case are updated very rarely. On the other hand, I need to get them very often. I care much more about reading than updating time.
Summary
If you need to write a lot of joins to get information from your database you may consider creating a materialized view.
- You can think of database views as kind of an API or public methods. You get a consistent way of accessing your records. The view remains the same while the source tables may change.
- It’s faster (if you read more often than write).