ActiveRecord count vs length vs size and what will happen if you use it the way you shouldn’t

One of the most common and most deadly errors you can make: using length instead of count. You can repeat this multiple times, but you will always find someone who'll use it the way it shouldn't be used.

So, first just to make it clear:

#count - collection.count

  • Counts number of elements using SQL query (SELECT COUNT(*) FROM...)
  • #count result is not stored internally during object life cycle, which means, that each time we invoke this method, SQL query is performed again
  • count is really fast comparing to length
2.1.2 :048 > collection = User.all; nil
 => nil
2.1.2 :049 > collection.count
   (0.7ms)  SELECT COUNT(*) FROM `users`
 => 16053
2.1.2 :050 > collection.count
 => 16053

#length - collection.length

  • Returns length of a collecion without performing additional queries... as long as collection is loaded
  • When we have lazy loaded collection, length will load whole colletion into memory and then will return length of it
  • Might use all of your memory when used in a bad way
  • Really fast when having a eagerly loaded collection
2.1.2 :055 > collection = User.all; nil
 => nil
2.1.2 :056 > collection.length
  User Load (122.9ms)  SELECT `users`.* FROM `users`
 => 16053
2.1.2 :057 > collection = User.all; nil
 => nil
2.1.2 :058 > collection.to_a; nil
  User Load (140.9ms)  SELECT `users`.* FROM `users`
 => nil
2.1.2 :059 > collection.length
 => 16053
2.1.2 :060 > collection.length
 => 16053

#size - collection.size

  • Combines abilities of both previous methods;
  • If collection is loaded, will count it's elements (no additional query)
  • If collection is not loaded, will perform additional query
2.1.2 :034 > collection = User.all; nil
 => nil 
2.1.2 :035 > collection.count
   (0.3ms)  SELECT COUNT(*) FROM `users`
 => 16053 
2.1.2 :036 > collection.count
   (0.3ms)  SELECT COUNT(*) FROM `users`
 => 16053 
2.1.2 :037 > collection.size
   (0.2ms)  SELECT COUNT(*) FROM `users`
 => 16053 
2.1.2 :038 > collection.to_a; nil
  User Load (64.2ms)  SELECT `users`.* FROM `users`
 => nil 
2.1.2 :039 > collection.size
 => 16053 

Why would you even care?

Well it might have a huge impact on your apps performance (and resource consumption). In general if you don't want to care at all and you want to delegate this responsibility to someone else, use #size. If you want to care, then play with it and understand how it works, otherwise you might end up doing something like this:

print "We have #{User.all.length} users!"

And this is the performance difference on my computer (with only 16k users):

       user     system      total        real
count     0.010000   0.000000   0.010000 (  0.002989)
length    0.730000   0.060000   0.790000 (  0.846671)

Nearly 1 second to perform such simple task. And this could have a serious impact on your web app! Keep that in mind.

Categories: Education, Rails, Ruby, Software

7 Comments

  1. I always use #size since I know this.
    Thanks for the tip!

  2. Good post. Thanks!

  3. You should test SQL performance too.

    When you do User.all.length AR fetches all records from database and then works with plain ruby array.

    When you do User.all.count AR adds count(*) and delegates the count logic to SQL server. In 90% cases it will be faster, but if you do a complex query with joins, the count() function can raise unexpected performance issues.

  4. It should be mentioned that `size` also integrates seamlessly with counter_cache columns. That is, it will not perform COUNT query if counter_cache column exists for a collection.

  5. +1 for testing SQL performance. With large tables, of course loading the entire data set will be slower (or even impossible), but that does NOT mean count(*) will be fast – even if there is no where condition. Google for your specific MySQL version, but in general “naked” (SELECT COUNT(*) FROM table) will use the primary (clustered) index, which isn’t fast at all.

    (MyISAM excels at this, because it stores the row count on the database header, so this operation is always constant time and extremely fast. But I see less and less installations using MyISAM nowadays)

  6. I have recently seen all those three returning different results when used with `joins`: 100, nil and some kind of exception. Careful!

  7. I just ran into a unique situation where all three of these methods returned different results and I had to go through all of them until I found the answer I wanted. I needed to find out how many unique searches have been formed with combination of keywords and location fields:

    Search.select(:keywords, :location).distinct

    Which generates the following SQL, as desired:

    SELECT DISTINCT `searches`.`keywords`, `searches`.`location` FROM `searches`

    For the purposes of this example let’s say we had performed 10 total searches and there were 3 unique combinations of keywords and locations.

    Trying to get the total via #count returns 1:

    Search.select(:keywords, :location).distinct
    SELECT DISTINCT COUNT(DISTINCT keywords, location) FROM `searches`

    Trying to get the total via #size returned 10:

    Search.select(:keywords, :location).size
    SELECT DISTINCT COUNT(DISTINCT `searches`.`id`) FROM `searches`

    And finally #length, which was the correct answer of 3:

    Search.select(:keywords, :location).length
    SELECT DISTINCT `searches`.`keywords`, `searches`.`location` FROM `searches`

Leave a Reply

Your email address will not be published. Required fields are marked *

*

Copyright © 2024 Closer to Code

Theme by Anders NorenUp ↑