For many of us, databases are no fun… surely not as fun as ActiveRecord or Rails in general. Every railers I know love the “dot notation” offered by ActiveRecord associations immensely… while refering to the vast majority of raw SQL queries as “Some ugly non-sense gibberish that we would all prefer not know the existence”.
Our hate towards the SQL language and databases is not a valid excuse to not assume our responsabilities as application developers. Recently I was working on a Rails plugin when the sentiment of being incompetent struck my body exactly like the lightning strikes an adventurous golfer who happily plays golf during a thunderstorm.
It all happened when I had a query that was generating a lot of results… and was slow as hell to process… even though I was using the :include option for Eager Loading.
@stuff = ParentStuff.find(:all, :include => :child_stuff, :order => “created_at DESC”)
Then, in my view, I had something like :
@stuff.each do |parent_stuff|
bla bla bla <%=parent_stuff.name%>
parent_stuff.child_stuff.each do |stuff|
bla bla bla <%=stuff.title%>
end
end
It was awfully slow! I started looking at the logs and saw this typical monstruous sql query. Look at it… you have to look at it. I know you don’t want to… but it won’t go away :
SELECT parent_stuff.`id` AS t0_r0, parent_stuff.`field1` AS t0_r1, parent_stuff.`field2` AS t0_r2, parent_stuff.`field3` AS t0_r3, parent_stuff.`field4` AS t0_r4, child_stuff.`id` AS t1_r0, child_stuff.`field5` AS t1_r1, child_stuff.`field6` AS t1_r2, child_stuff.`field7` AS t1_r3, child_stuff.`field8` AS t1_r4, child_stuff.`field9` AS t1_r5, child_stuff.`field10` AS t1_r6, child_stuff.`field11` AS t1_r7 FROM parent_stuff LEFT OUTER JOIN child_stuff ON child_stuff.the_foreign_key = parent_stuff.id WHERE (parent_stuff.created_at >= ‘2008-04-24 00:00:00’ AND parent_stuff.created_at <= '2008-04-24 09:44:56') ORDER BY parent_stuff.created_at
What's the problem with this? It's just one query... cannot take that much time. I thought it was a "Rails problem" until I ran the query directly in a MySql web interface. It took the same amount of time, meaning that the problem was within the query itself. Oh sh**... it was a database problem! Like many Rails Developers, my database skills are, while not inexistant, inadequate.
I tried to figure out what could be the problem with this query. I replaced every LEFT OUTER JOIN with INNER JOIN and Boom! the query executed in less than half a second.
My urgent desire of leaving the MySql web interface made me go back to my Rails code. I then tried some stupid random stuff like :
@stuff = ParentStuff.find(:all, :include => :child_stuff, :joins => “INNER JOIN child_stuff ON child_stuff.parent_stuff_id=id” :order => “created_at DESC”) but the slow LEFT OUTER JOINs remain in the query instead of being replace by the faster INNER JOIN. Then, I learned that in Rails 2.0, you could do Eager Loading with INNER JOIN by passing an association name to the :joins option, like that :
@stuff = ParentStuff.find(:all, :joins => :child_stuff, :order => “created_at DESC”)
Problem is I was working on a plugin… and I’d like it to be compatible with older versions of Rails. So I did what I was scared of doing since the beginning : Going to some MySql forums to read the advice of some DBA’s …
Then I found the answer to my problem… and this answer proved me that I needed to stop playing blindly with ActiveRecord like if the backend database was none of my business…. because it is. The answer is so obvious… but given the fact that I never wanted to care about databases since Rails exists, I didn’t think about it :
Answer : Add an index to the foreign key in the child table!
This is just one case that shows the dangers of not being aware of what’s going on in the database. It’s sad, but you cannot blindly let ActiveRecord manage everything that is Database related for you. It’s only a question of being in control of the entire application… and the relational database is a part of it.
UPDATE
This post has kind of turned against me. The only thing I wanted to say with this article is that ActiveRecord needs our supervision… not that I am a moron who can’t understand databases. I know what joins are, I know what indexes are… but in most situations I don’t have to “care” about them with ActiveRecord. Sorry but I felt the need to defend myself about this issue 🙂
How can we be so unrespectful about databases? I’ve been working in Rails since 1 year and never I looked at the indexes that are created (or not). Shame on me…
Nice article. I’m working with databases full time so I definitely appreciate indexes and using correct joins. Jamis Buck covered this issue a while back here (should be mandatory reading for rails developers) http://www.therailsway.com/2006/11/21/tracks-part-4
Some related reading, I did a presentation on ActiveRecord gotchas with include/select/find_by_sql. Not only joins, but limiting your selects are really important here http://work.rowanhick.com/2008/02/12/how-to-avoid-hanging-yourself-with-rails/
It’s very true that ActiveRecord causes too many Rails coders to not think about the database. ActiveRecord can be painfully slow
One question: Is it right that in many cases, switching an outer join to an inner join could cause different data to be selected from the database?
@Robin: yes, they can provide different results. Here’s a pretty good write up on the subject: http://www.wellho.net/mouth/158_MySQL-LEFT-JOIN-and-RIGHT-JOIN-INNER-JOIN-and-OUTER-JOIN.html
This is why I like the stuff that RedHill on Rails has. Allowing me to add my indexes and foreign key associations in my migrations.
You seem to have run into the same problem like many other developers. Just because you’re doing Rails, that doesn’t mean you don’t have to understand the works _behind_ Rails. Every web developer working with any programming language or framework should be at least familiar with the basics of joins (i.e. differences between outer and inner joins), basic database optimization (like using indexes) and how to track inefficient queries (using EXPLAIN in MySQL, for example).
Read. Learn. Remember. 😉
On the other hand, you lose a lot of the ActiveRecord advantages if you pay *too much* attention to the database. Don’t optimize prematurely. But yeah — if there’s slowness, tail the db log and look for inefficiencies.
After reading this post and seeing the comment “don’t optimize prematurely”, many database developers and seasoned programmers are shaking their heads right now.
I’ve seen many a code (in rails, python, or any language for that matter) that uses an ORM library which doesn’t generate efficient SQL. This is fine, as long as those developer using the ORM know what the generated SQL is and how to make it more efficent. The guys from iBATIS have a good idea of how to interface with a DB, but it’s missing the glamour of ActiveRecord. Sometimes custom SQL is the best option, regardless of how ugly it makes the ORM/ActiveRecord code look.
Rails DOES care about the database, in fact Rails cares more than most other web frameworks out there. ActiveRecord is designed to keep you CLOSE to SQL, so you never forget that in the end that’s what being executed.
You should think of ActiveRecord as something to help you construct and issue SQL commands that would be otherwise complex and troublesome to write, not as an excuse to not know anything about your database.
Rails is a calculator, and the database is math. Teach a child math before giving them a calculator!
I got exactly the opposite from you I do a great merise relationship model (i try …) but a sh***s rails models…
I learn merise with a teacher that doesn’t know ruby nor rails …(and who think c# THE programming language ..)so i start with complicated diagrams but poor(inefficient) relation in my apps.
well sometimes it’s hard to make rails/relationship diagrams match…
Indexes are good for more than speeding up SELECTs, as I’m sure you’re aware. One ActiveRecord feature bit me pretty nicely. The validates_uniqueness_of constraint is definitely not guaranteed as the skateboard book says on p. 369. Further, it does a SELECT every time it does an INSERT. This can be costly. A “guaranteed” way to enforce uniqueness and to help with efficiency is to create a unique index on the field(s) in question at the database level. You’ll need to trap the database error yourself though. YMMV.
Thanks everyone for your comments
The morale of the story for me is that you cannot blindly delegate everything that is “database related” to ActiveRecord or any other O/R mapping tool. Maybe one day tools like ActiveRecord will be so advanced that playing at the database level will not be a necessity anymore … but this time hasn’t come yet. Until then we have to keep an eye on the generated queries from time to time as well as do some optimizing when our app ask for it.
You know what? Before switching to Rails I was a convinced .NET (1.1) developer that used to write all his SQL queries by hand. I remember how much it was repetitive and tedious to do this for every single project. I hated it so much that I made the mistake of embracing ActiveRecord with too much passion when I made the switch to Rails. There is an old saying that love is blind… I should have think about it.
You should ignore the database, until you can’t ignore it any longer. I’ve seen the SQL that Rails has generated and figured I could write better by hand. But then I’ve seen the assembler generated by C compilers and reckoned that I could write better by hand. That is not the point though. Ultimately what you are missing here is a tool on the database that *tells* you when you should put an index on something. I’ve alwasy been surpirsed at how poor the tooling is at the database level. Perhaps there is too much ‘hand optimisation’ going on, not too ittle
Do you really know the difference between ‘inner’ and ‘outer’ joins? They are quite striking and you can’t replace one for the other with no adequate reasoning. Rails jas both :include and :joins keyword (and :joins accept an array or strings as well) and you have to do the reasoning on what do you want. It seems to me that you don’t know those differences and what do you really want.
@Akita,
*sigh*… enough with this already… this is getting on my nerves. I know the difference between INNER and OUTER. A left outer join includes the rows in the right tables even if there are no match (when it happens the columns in the right table will all be filled with NULLs) while an INNER join only returns the rows “that matches”. In my specific case it just could not happen that a row from the right table does not match a row from the left.
Is this explanation convince you that I know what joins are or do you still need more clarifications?
This is so very true: Rails is pretty damned magic, but sometimes the rubber meets the road.
SQL makes me laugh since a design goal was to be usable by typical business people … yeah, right. I consider myself a black-belt SQLer (arguable, but invite smackdowns from any interested combatant), and despite my >20 years experience with the language, I still find SQL to be the most infuriating and obtuse syntax on the face of the earth (this includes C and C++, as well as perl). Rails insulates us in perhaps 90% of normal cases from the gnarly details. The problem is those 10% of the rest.
I wonder if there’s a need for a Rails “sniff check” of a given controller action, or query? Wouldn’t it be cool if Rails warned “this query takes a really long time — try these things before hiring a $250/hour consultant: 1) use eager loading, 2) check to make sure these columns are indexed (“foo”, “bar”, “fubar”), and 3) learn SQL, you lazy, incompetent so-called software engineer. OK, maybe just 2 things.
We dealt with this issue in the last few days for a really complicated polymorphic query. It is as real as it gets. Rails doesn’t solve all problems … yet.
I second the motion for wishing rails would warn when it encountered a potentially bad query 🙂
Any one who is unaware of SQL and is creating an application should not be in IT. It is that simple. I have been in this business for 20years and this makes me sick. All that is see is garbage and idiots that create stuff that does not work. I consult to clients who hire you idiots and you will find that with the little knowledge you have your stuff will not work.
njdba, stfu and go back to Oracle developper 2000 you arrogant ass.
It’s not my style but damn it feels good to let it go once in a while.
He, he,… I’ve been on IT business for 17 years, and must admit that Rails make me nervous when I think about performance in general – and DB in particular. But, we always have this very same argument with our DBA: times are changing, development cycles are meant to be “agile”, and not *everything* can be fine-tuned just-in-case.
After all, this is the reason for having a testing / QA process. When a slow query/transaction shows up… dig for it and fix it!
Nice article.
This gadget is easy to clean up and use, and you will save all kinds
of money in the long run. You may wonder what they can offer
you that a regular oven doesn’t have, and that would be a greater precision in what you’re cooking, meaning no failed recipes or undercooked food.
This can be a nifty little device that squeezes whole potatoes into perfectly cut fries which are instantly ready for frying.