Tag: hash

Ruby: Hash default value – be cautious when you use it

A few weeks ago, a friend asked me why this Ruby example behaves so strangely:

hash = Hash.new([])
puts hash #=> {}
hash['foo'] << 1 << 2 << 3
puts hash['foo'] #=> [1, 2, 3]
puts hash #=> {}
hash.delete('foo') #=> nil
puts hash['foo'] #=> [1, 2, 3]

You may ask why a hash that clearly has some values under the 'foo' key is empty when we print it. Furthermore, why are the values still present after we delete this key?

It all comes down to the ::new method and the way Hash deals with the default value. Most of the programmers I know assumed that when they pass an empty array to the hash initializer, each key without a value would be initialized with a new empty array:

# What many people expect (this is NOT what Ruby actually does):
hash = Hash.new([])
puts hash #=> {}
puts hash['foo'] #=> []
puts hash['bar'] #=> []
puts hash #=> { 'foo' => [], 'bar' => [] }

However, Ruby does not work like that. Under the hood, when the ::new method is invoked, Ruby stores the default value inside the newly created hash instance and returns it each time we request the value of a key that is not present. The internal implementation of this lookup works in a way similar to this:

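# Conceptual sketch only: the real Hash#[] is written in C, and @_entries is a hypothetical name.
# (Hash#fetch, by contrast, ignores the default value and raises KeyError for a missing key.)
def [](key)
  return @_entries[key] if @_entries.key?(key) # key present: return its stored value
  @_default # key missing: return the one shared default object, without storing it
end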
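To make the lookup rule concrete, here is a minimal runnable sketch. SketchHash and its @entries and @default variables are hypothetical names invented for illustration; the real Hash is implemented in C and does not look like this. The sketch only mimics the observable behaviour described above.

```ruby
# Hypothetical toy class that mimics how Hash.new(default) answers lookups.
class SketchHash
  def initialize(default = nil)
    @entries = {}       # stand-in for the real internal storage
    @default = default  # one object, kept for the lifetime of the hash
  end

  def []=(key, value)
    @entries[key] = value
  end

  def [](key)
    # A missing key returns the shared default and stores nothing.
    @entries.key?(key) ? @entries[key] : @default
  end

  def size
    @entries.size
  end
end

sketch = SketchHash.new([])
sketch['foo'] << 1 << 2 << 3   # mutates the default array, no key is stored
p sketch['bar']                # [1, 2, 3]
p sketch.size                  # 0
```

Note that nothing in the sketch ever calls []= on a read, which is exactly why the values show up under every key while the hash itself stays empty.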

This means that when you provide a default object, it is always one and the same object. OK, but does that explain why the hash appears to be empty when we print it? Well... it does. All we did in our examples was modify the contents of the default array itself. An expression like hash['foo'] << 1 never assigns anything to the 'foo' key; it fetches the default array and appends to it. That is why Ruby sees nothing new in the hash. In fact, there is nothing new: from Ruby's perspective, the hash is still empty. We were reusing the default value all the time.
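The shared default can be checked directly on a real Hash. This is a small sketch using only standard Hash methods (default, key?, size) and BasicObject#equal? for object identity; the key names are arbitrary.

```ruby
hash = Hash.new([])
hash['foo'] << 1

# Every missing key returns the very same object that was passed to Hash.new.
p hash['foo'].equal?(hash['bar'])   # true
p hash.default                      # [1]
p hash.key?('foo')                  # false, nothing was ever assigned
p hash.size                         # 0

# Only an explicit assignment creates a key.
hash['baz'] += [2]                  # same as hash['baz'] = hash['baz'] + [2]
p hash['baz']                       # [1, 2]
p hash.keys                         # ["baz"]
p hash.default                      # [1], because Array#+ built a new array
```

The += form works because it ends in an assignment and because Array#+ returns a new array instead of changing the default in place.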

If you decide to use a Hash default value other than nil and you don't understand this concept, you might get into trouble. That's why it is good practice to initialize hashes that need a mutable default value with a block:

hash = Hash.new { |hash, key| hash[key] = [] }
puts hash #=> {}
hash['foo'] << 1 << 2 << 3
puts hash['foo'] #=> [1, 2, 3]
puts hash #=> { 'foo' => [1, 2, 3] }
hash.delete('foo') #=> [1, 2, 3]
puts hash['foo'] #=> []
puts hash #=> { 'foo' => [] }
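The last two lines of the example above show a side effect worth keeping in mind: with this block, merely reading a missing key through [] stores it. A short sketch of how to look at a key without creating it, using the standard key? and fetch methods (the variable and key names are arbitrary):

```ruby
lists = Hash.new { |h, key| h[key] = [] }

lists['seen'] << 1
p lists.key?('missing')        # false, key? never runs the block
p lists.fetch('missing', [])   # [], fetch ignores the default block
p lists.keys                   # ["seen"]

lists['missing']               # a plain read through [] runs the block...
p lists.keys                   # ["seen", "missing"], ...and stores a new key
```

If you only want to inspect the hash, key? or fetch with an explicit fallback avoids filling it with empty arrays.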

Ruby hash initializing – why do you think you have a hash, but you have an array

We all use hashes, and it seems that there's nothing special about them. They are like dictionaries in Python or similar structures available in many other languages. And that's more or less true. However, Ruby hashes are really special, because from time to time they really are... arrays.

You won't see that clearly until you either look into the Ruby source code or benchmark it. Here are the code and the results of my benchmark:

Benchmark

require 'benchmark'

GC.disable

ar = nil

100.times do |steps|
  d = Benchmark.measure {
    100000.times { ar = { a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9, j: 10 } }
  }

  system("echo '#{ar.count}, #{d.real}' >> #{ar.count}.csv")
end

Benchmark results

[Image: hasharray]

There's a huge difference between 6 and 7 elements. That's the point where a Ruby "hashy array" becomes a real hash. To explain this, I will quote funny-falcon:

Investigation shows, that typical rails application allocates tons of small hashes. Up to 40% of whole allocated hashes never grows bigger than 1 element size.

That's the primary reason for this patch. Small arrays are faster and use less memory than hashes. If you want to see the code, here's a pull request with the code that optimizes hashes of up to 6 elements by storing them as arrays.
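The memory side of this can be observed without a benchmark. The sketch below assumes CRuby and its bundled objspace extension; ObjectSpace.memsize_of reports an implementation-specific estimate, and both the numbers and the element count at which they jump depend on the Ruby version, so no expected output is given here.

```ruby
require 'objspace'

# Build hashes with 0..12 symbol keys and record the memory CRuby reports for each.
sizes = (0..12).map do |n|
  hash = (1..n).each_with_object({}) { |i, h| h[:"k#{i}"] = i }
  [n, ObjectSpace.memsize_of(hash)]
end

sizes.each { |n, bytes| puts "#{n} elements: #{bytes} bytes" }
```

On a given Ruby version, a visible step in the reported size is a hint at where the compact representation ends and the full hash table begins.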

Copyright © 2024 Closer to Code
