This is the fourth and final article in the series reviewing the GTD application Tracks.
Rails is a wonderful productivity enhancer. It can sometimes seem almost like magic: you write a few lines of code here and there, and Rails just “does the right thing.� What’s not to love about that?
Unfortunately, that magic often tricks newcomers to Rails into thinking that the right thing will be done, even when there is no way for Rails to do so. I’m speaking, specifically, of database indexes (or rather, the lack thereof).
The Tracks application is certainly not the only one to do this, so this is in no way directed solely at them. I’ve encountered far too many Rails applications that seem to assume that defining the database tables is sufficient, and that Rails will magically take care of the rest. This, sadly, is not true, and has resulted in a whole generation of web applications with terrible database performance. The only indexes that Rails adds are on your tables’ primary keys. All other indexes must be defined explicitly, by you.
Even veterans can make this mistake. At 37signals, we were missing indexes on four (smaller) tables in Campfire until recently. It wasn’t until we started wondering why the load on that database was so high that I went looking, and discovered some full-table scans occuring in some frequent queries. Adding the necessary indexes caused the load to drop by over half!
I’ve addressed the question of when to add indexes in a post on my own blog. Rather than repost that article here, you can read it there: Indexing for DB performance.
In general, you’ll want indexes on your foreign keys, but if it were only that simple, Rails could figure that out for you. Sadly, the indexes you need depend largely upon how your application will query the database. Consider the following table definition:
1 2 3 4 5 6 7
create_table "notes", :force => true do |t| t.column "user_id", :integer, :default => 0, :null => false t.column "project_id", :integer, :default => 0, :null => false t.column "body", :text t.column "created_at", :datetime t.column "updated_at", :datetime end
The foreign keys here are user_id and project_id. Should there be indexes on those? It depends.
Consider the SQL generated for the following call:
note = Note.find(1234, :include => :user)
You should see SQL something like the following in your log:
1 2 3 4
SELECT * FROM notes, users WHERE users.id = notes.user_id AND notes.id = 1234
The database, upon receiving this query, has to determine how to satisfy it. Without getting into the nitty-gritty (mostly because the nitty-gritty differs between databases, but also because I’m only superficially familiar with the nitty-gritty, anyway), the DB decides that it will start with grabbing the note, since it was given a constant for that primary key.
So, it knows the note. That means it knows the user_id, and can then satisfy that part of the query via the primary key on users. After that, it’s done, since it has found all the necessary rows.
In this case, it was able to satisfy the query via the primary keys on the two tables. No other indexes were necessary, and since Rails defines our primary keys for us, that kind of query works efficiently “out of the box�.
Now, consider the SQL generated for something like this:
1 2
# user = User.find(1234) notes = user.notes
You’ll see something like this in your logs:
1 2 3
SELECT * FROM notes WHERE user_id = 1234
Ok, in this case, we know the user_id, since we know the user the notes belong to. We want to find all notes with that user_id. The database, however, cannot satisfy that query using any of the defined indexes, so it has to resort to the dreaded “full table scan�. It has to look at every row in the notes table. For small tables, even up to a few thousand rows, that won’t be noticable. But when you start getting tens or hundreds of thousands of rows, the difference becomes significant. If you’re on a shared host, people will start hating you.
So, let’s assume you recognize that your application is querying like this, looking records up by foreign key. Rails, fortunately, makes it simple to add new indexes:
1 2 3 4 5 6 7 8 9 10
# script/generate migration AddIndexToUser
class AddIndexToUser < ActiveRecord::Migration
def self.up
add_index :notes, :user_id
end
def self.down
remove_index :notes, :user_id
end
end
In general, you’ll want an index on any combination of columns that you are using to look up records. For Tracks, “Find all notes for this user� needs an index on the foreign key, but “Find the user for this note� does not.
I really cannot stress enough the importance of getting to know how your DBMS of choice works. I’ve only scratched the surface in this article, and in the post on my own blog. Find a good book on SQL and indexes. Learn specifically how your database uses indexes to satisfy queries. You cannot hope to scale your application, in any framework, if your database queries are doing full table scans all the time.
No comments yet.
You must be logged in to add your own comment.