Rails, like many other frameworks, has a lot of magic (and a few surprises too). Last month, while doing some performance tuning around ActiveRecord's #find_in_batches and #in_batches, I found an interesting difference between these two similar methods that's worth sharing.
But first, here's a quick intro for anyone who hasn't heard of either method.
Here's the API doc for find_in_batches:
Yields each batch of records that was found by the find options as an array.
And for in_batches:
Yields ActiveRecord::Relation objects to work with a batch of records.
Both descriptions are easy to understand, but they also sound very similar. The difference is subtle enough to miss unless you dig into the details. In short, find_in_batches yields each batch as an array of already-loaded records, while in_batches yields an ActiveRecord::Relation for each batch.
So the following code:
Post.find_in_batches do |group|
  group.each { |post| puts post.title }
end
will send only one query per batch to the database, retrieving all of the posts' data for that batch:
SELECT "posts".* FROM "posts" WHERE ...
However:
Post.in_batches do |group|
  group.each { |post| puts post.title }
end
will send two queries per batch to the database. The first query gets the posts' ids for the batch:
SELECT "posts"."id" FROM "posts" WHERE ...
And the second query gets all of the posts' data for the batch:
SELECT "posts".* FROM "posts" WHERE ...
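If you want the relation that in_batches yields but still plan to iterate over the records, passing load: true should avoid the second query. Without load, the relation is a better fit for bulk operations such as update_all, which never instantiate records. Here is a minimal sketch. The helper names and the archived column are hypothetical, and model stands for any Active Record class such as Post:

```ruby
# Hypothetical helpers for illustration; `model` is assumed to be an
# Active Record class (for example Post) that responds to in_batches.

# Iterates over records with one query per batch: load: true makes
# in_batches hand over a relation whose records are already loaded.
def titles_with_one_query_per_batch(model, batch_size: 1000)
  titles = []
  model.in_batches(of: batch_size, load: true) do |relation|
    relation.each { |record| titles << record.title }
  end
  titles
end

# Bulk update that never instantiates records, so the default
# load: false is fine here. `archived` is an assumed column name.
def archive_in_batches(model, batch_size: 1000)
  model.in_batches(of: batch_size) do |relation|
    relation.update_all(archived: true)
  end
end
```

Calling titles_with_one_query_per_batch(Post) would behave like the find_in_batches example above, except that each batch is a relation rather than a plain array.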
More details:
If you look into the source code for those two methods, you will see that find_in_batches actually calls in_batches with load: true passed as an argument. However, the default value of load in in_batches is false.
And if you look further into in_batches, the part that uses the value of load looks like this:
if load
  records = batch_relation.records
  ids = records.map(&:id)
  yielded_relation = where(primary_key => ids)
  yielded_relation.load_records(records)
else
  ids = batch_relation.pluck(primary_key)
  yielded_relation = where(primary_key => ids)
end
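To make the effect of that branch concrete, here is a toy model in plain Ruby that counts queries. It is not Rails code: ToyTable, ToyRelation and query_count are invented for this sketch, and only the shape of the load branch follows the snippet above.

```ruby
# Toy stand-in for a table. NOT Rails code: every block passed to `query`
# represents one round trip to the database.
class ToyTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows # array of hashes, each with an :id key
    @query_count = 0
  end

  # Same shape as the `if load` branch: either fetch full records up front,
  # or fetch only the ids and let the yielded relation load lazily.
  def in_batches(of:, load: false)
    @rows.each_slice(of) do |slice|
      if load
        records = query { slice.dup }                # SELECT "posts".*
        ids = records.map { |row| row[:id] }
        yield ToyRelation.new(self, ids, records)
      else
        ids = query { slice.map { |row| row[:id] } } # SELECT "posts"."id"
        yield ToyRelation.new(self, ids, nil)
      end
    end
  end

  # Delegates to in_batches with load: true and yields plain arrays.
  def find_in_batches(of:)
    in_batches(of: of, load: true) { |relation| yield relation.to_a }
  end

  # Used by a relation that was not preloaded: one more SELECT "posts".*
  def rows_for(ids)
    query { @rows.select { |row| ids.include?(row[:id]) } }
  end

  private

  def query
    @query_count += 1
    yield
  end
end

# Toy relation: costs a query on first access unless records were preloaded.
class ToyRelation
  def initialize(table, ids, records)
    @table = table
    @ids = ids
    @records = records
  end

  def to_a
    @records ||= @table.rows_for(@ids)
  end

  def each(&block)
    to_a.each(&block)
  end
end

rows = (1..5).map { |i| { id: i, title: "Post #{i}" } }

eager = ToyTable.new(rows)
eager.find_in_batches(of: 2) { |group| group.each { |post| post[:title] } }

lazy = ToyTable.new(rows)
lazy.in_batches(of: 2) { |group| group.each { |post| post[:title] } }

puts eager.query_count # 3 batches, one query each
puts lazy.query_count  # 3 batches, two queries each
```

In this toy, the second query only happens when the block actually touches the records. If the block never iterates over the relation, in_batches with the default load: false costs one id query per batch.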
I hope this post clears things up for anyone trying to understand the differences between find_in_batches and in_batches. Knowing these differences will help developers use Rails' Active Record more efficiently.