Ruby: Hash default value – be cautious when you use it

A few weeks ago, a friend asked me why this Ruby example behaves so strangely:

hash = Hash.new([])
p hash #=> {}
hash['foo'] << 1 << 2 << 3 # reads the default array and appends to it
p hash['foo'] #=> [1, 2, 3]
p hash #=> {}
hash.delete('foo') #=> nil
p hash['foo'] #=> [1, 2, 3]

You may ask why a hash that clearly holds some values under the 'foo' key is empty when we print it. And why are those values still there after we delete the key?

It all comes down to the ::new method and the way Hash handles its default value. Most programmers I know assumed that when they pass an empty array to the Hash initializer, every missing key will be initialized with its own empty array:

# What people expect (this is NOT what Ruby actually does):
hash = Hash.new([])
p hash #=> {}
p hash['foo'] #=> []
p hash['bar'] #=> []
p hash #=> { 'foo' => [], 'bar' => [] }

However, Ruby does not work like that. Under the hood, when ::new is invoked, Ruby stores the default value inside the newly created Hash instance and returns it every time we ask for a key that is not present. Conceptually, the lookup behaves like this (a sketch of the behavior, not the actual implementation):

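# Conceptual sketch only; @values and @default are illustrative names.
def [](key)
  return @values[key] if @values.key?(key) # key present: return the stored value
  @default                                 # key absent: the same default object, every time
end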
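To make this lookup rule tangible, here is a small runnable model of it. Treat it as a sketch: DefaultingStore and its instance variables are hypothetical names made up for this illustration, and Ruby's real Hash is implemented quite differently. It only mimics the one behavior discussed here, namely that reading a missing key returns the stored default without creating the key.

```ruby
# Hypothetical toy class, for illustration only; not Ruby's real Hash.
class DefaultingStore
  def initialize(default)
    @default = default # one object, kept for the lifetime of the store
    @values = {}       # only keys that were explicitly assigned
  end

  # Reading never stores anything; a missing key yields the shared default.
  def [](key)
    @values.key?(key) ? @values[key] : @default
  end

  # Only assignment creates a key.
  def []=(key, value)
    @values[key] = value
  end

  def keys
    @values.keys
  end
end

store = DefaultingStore.new([])
store['foo'] << 1   # mutates the shared default; no key is created
p store.keys        # prints []
p store['bar']      # prints [1]
store['baz'] = [42] # explicit assignment does create a key
p store.keys        # prints ["baz"]
```

Note that 'bar' already "contains" 1 even though we only ever touched 'foo'. Both reads hand back the same default array.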

This means that when you provide a default object, it is always one and the same object. OK, but that does not explain why the hash appears to be empty when we print it! Well... it does. All we did in our examples was modify the default array itself. The expression hash['foo'] << 1 reads the default (because 'foo' is missing) and appends to it; it never assigns anything to the 'foo' key. That is why Ruby sees no keys in the hash: from Ruby's perspective, the hash really is empty. We were reusing the default value the whole time, which is also why deleting 'foo' changed nothing.
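You can verify all of this on a real Hash. The snippet below is a minimal check that uses only standard methods (Hash#default, Hash#key?, Hash#size and Object#equal?); the same_object variable is just a name picked for this example.

```ruby
hash = Hash.new([])
hash['foo'] << 1 << 2 << 3

# Every missing key returns the very same Array object.
same_object = hash['foo'].equal?(hash['bar'])
p same_object      # prints true

# That object is the hash's default, and it now holds our values.
p hash.default     # prints [1, 2, 3]

# No key was ever assigned, so the hash itself is still empty.
p hash.key?('foo') # prints false
p hash.size        # prints 0

# Assigning with []= is what actually creates a key.
hash['foo'] += [4] # builds a new array and stores it under 'foo'
p hash.keys        # prints ["foo"]
p hash['foo']      # prints [1, 2, 3, 4]
p hash.default     # prints [1, 2, 3]
```

The last three lines show the difference between mutating and assigning: += expands to hash['foo'] = hash['foo'] + [4], so a new array gets stored under the key and the default is left alone.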

If you use a Hash default value other than nil without understanding this concept, you might get into trouble. That's why it is a really good practice to use a block whenever a hash needs a mutable default such as an array:

hash = Hash.new { |h, key| h[key] = [] } # runs for each missing key and stores a fresh array
p hash #=> {}
hash['foo'] << 1 << 2 << 3
p hash['foo'] #=> [1, 2, 3]
p hash #=> { 'foo' => [1, 2, 3] }
hash.delete('foo') #=> [1, 2, 3]
p hash['foo'] #=> []
p hash #=> { 'foo' => [] }
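A few consequences of the block form are worth seeing side by side. The sketch below uses made-up variable names (lists, counts, value) and only standard Hash methods. It shows that each key gets its own array, that merely reading a missing key stores it (as the last two lines of the example above already hint), that Hash#fetch reads without triggering the block, and that an immutable default such as 0 is safe to pass directly to ::new.

```ruby
lists = Hash.new { |h, key| h[key] = [] }
lists['a'] << 1
lists['b'] << 2

# Each key owns a separate array.
p lists['a'].equal?(lists['b']) # prints false
p lists.keys                    # prints ["a", "b"]

# Caveat: with this block, merely reading a missing key stores it.
lists['c']
p lists.keys                    # prints ["a", "b", "c"]

# fetch ignores the default block, so it reads without storing.
value = lists.fetch('d', [])
p value                         # prints []
p lists.key?('d')               # prints false

# An immutable default such as an Integer can be passed directly,
# because += assigns a new value instead of mutating the default.
counts = Hash.new(0)
%w[x y x].each { |word| counts[word] += 1 }
p counts.fetch('x')             # prints 2
p counts.keys                   # prints ["x", "y"]
```

Whether reads should create keys depends on your use case. If they should not, fetch with an explicit fallback is one way to avoid it.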

Categories: Ruby, Software

2 Comments

  1. Great explanation. I really like that you included the printing quirk, which comes about since you only *used*, never *set*, hash['foo']. This is one of the most common and useful gotchas I cover in my presentation on Ruby Gotchas, for which the slides are available at http://bit.ly/RubyGotchas . :-)

  2. Maciej Mensfeld

    October 27, 2016 — 12:19

    Glad I could help!


Copyright © 2024 Closer to Code
