Ruby: Hash default value – be cautious when you use it

A few weeks ago a friend asked me why this Ruby example behaves so strangely:

hash = Hash.new([])
p hash #=> {}
hash['foo'] << 1 << 2 << 3
p hash['foo'] #=> [1, 2, 3]
p hash #=> {}
hash.delete('foo') #=> nil
p hash['foo'] #=> [1, 2, 3]

You may ask: why is a hash that clearly has some values under the 'foo' key empty when we print it? And why are those values still there after we delete the key?

It all comes down to the ::new method and the way Hash deals with the default value. Most programmers I know assumed that when they pass an empty array to the hash initializer, each missing key is initialized with its own empty array:

# What many people expect (this is NOT how Ruby behaves):
hash = Hash.new([])
p hash #=> {}
p hash['foo'] #=> []
p hash['bar'] #=> []
p hash #=> { 'foo' => [], 'bar' => [] }

However, Ruby does not work like that. Under the hood, when the ::new method is invoked, Ruby stores the default value inside the newly created hash instance and returns it every time we request a key that is not present. It does not copy the default and it does not assign it to the key. Conceptually, the lookup works like this (a simplification, not the actual implementation):

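# Conceptual sketch of Hash#[], not Ruby's actual (C) implementation;
# @values and @default are hypothetical names
def [](key)
  return @values[key] if @values.key?(key)
  @default # the same object every time, never copied or stored
end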

This means that when you provide a default object, it is always one and the same object. But why does the hash appear empty when we print it? Because we never stored anything in it. hash['foo'] returned the default array without assigning it to the 'foo' key, and << then modified that default array in place. From Ruby's perspective the hash has no keys, so it is empty, and deleting 'foo' changes nothing: every missing key keeps returning the same, now non-empty, default array.
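The sharing is easy to confirm by comparing object identity. The snippet below is a small sketch that uses only core Hash and Object methods (equal?, default, size); the variable names are arbitrary.

```ruby
hash = Hash.new([])

# Two different missing keys hand back the very same Array instance.
first  = hash['foo']
second = hash['bar']
same_object = first.equal?(second)

# Mutating it through one key changes what every other missing key returns.
hash['foo'] << 1
seen_via_bar = hash['bar']

# That instance is the object stored as the hash's default.
default_is_shared = hash.default.equal?(first)

# No key was ever assigned, so the hash itself is still empty.
key_count = hash.size

p same_object       #=> true
p seen_via_bar      #=> [1]
p default_is_shared #=> true
p key_count         #=> 0
```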

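If you use a Hash default value other than nil without understanding this, you can get into trouble, especially when the default is a mutable object such as an array, a hash or a string. That is why it is good practice to give such hashes a default block instead, which runs for every missing key and can store a fresh object under it: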
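What causes the trouble is in-place mutation of the shared default, not the non-nil default as such. As a rough illustration (variable names are arbitrary), a default that is only ever combined with assignment, such as += on an integer or an array, behaves as expected, because each += stores a new object under the key:

```ruby
# Integer default: += computes a new value and assigns it to the key.
counts = Hash.new(0)
%w[a b a c a].each { |word| counts[word] += 1 }

# Array default used with += (not <<): a new array is built and assigned,
# so the shared default array is never modified.
lists = Hash.new([])
lists['x'] += [1]
lists['y'] += [2]

p counts        #=> {"a"=>3, "b"=>1, "c"=>1} (exact formatting varies by Ruby version)
p lists         #=> {"x"=>[1], "y"=>[2]}
p lists.default #=> []
```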

hash = Hash.new { |h, key| h[key] = [] }
p hash #=> {}
hash['foo'] << 1 << 2 << 3
p hash['foo'] #=> [1, 2, 3]
p hash #=> { 'foo' => [1, 2, 3] }
hash.delete('foo') #=> [1, 2, 3]
p hash['foo'] #=> []
p hash #=> { 'foo' => [] }
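As the last two lines show, with the block form merely reading a missing key stores a new entry. If you only want to check or read without creating keys, key? and fetch do not run the default block. A small sketch with arbitrary names:

```ruby
groups = Hash.new { |h, key| h[key] = [] }
groups['a'] << 1

# Neither of these runs the default block or inserts a key.
has_b    = groups.key?('b')
fallback = groups.fetch('b', [])
keys_before = groups.keys

# A plain lookup does run the block and stores a fresh array under 'b'.
groups['b']
keys_after = groups.keys

p has_b       #=> false
p fallback    #=> []
p keys_before #=> ["a"]
p keys_after  #=> ["a", "b"]
```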
Posted 17 Sep 2016 in Ruby, Software

Apache Zookeeper + Apache Kafka start / restart script

During my work on the Karafka framework I have to start, stop and restart Apache Kafka and Zookeeper quite often. Here's a short script that pulls the Kafka and Zookeeper Docker images, runs them in containers and prints their IPs.

Note that it also stops and removes the existing kafka and zookeeper containers if there are any.

ZOOKEEPER_CHROOT='/kafka'

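# Stop and remove any previous containers (errors are harmless if none exist)
docker stop zookeeper
docker stop kafka
docker rm zookeeper
docker rm kafka

# Pull the images that are run below
docker pull jplock/zookeeper:3.4.6
docker pull ches/kafka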

docker run \
  -d \
  -e ZOOKEEPER_CHROOT=$ZOOKEEPER_CHROOT \
  --name zookeeper \
  jplock/zookeeper:3.4.6

docker run \
  -d \
  -e ZOOKEEPER_CHROOT=$ZOOKEEPER_CHROOT \
  --name kafka \
  --link zookeeper:zookeeper \
  ches/kafka

ZOOKEEPER_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' zookeeper)
KAFKA_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' kafka)

echo "Zookeeper chroot: $ZOOKEEPER_CHROOT"
echo "Zookeeper: $ZOOKEEPER_IP"
echo "Kafka: $KAFKA_IP"
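A client usually needs host:port addresses rather than bare IPs. Here is a minimal Ruby sketch of turning the printed values into connection strings. It assumes the containers listen on the default ports (2181 for Zookeeper, 9092 for Kafka); the helper names and the sample IPs are hypothetical, not part of the script or of any library.

```ruby
# Hypothetical helpers; the ports are the Zookeeper/Kafka defaults and are
# an assumption about how the containers are configured.
def zookeeper_connect(ip, chroot: '', port: 2181)
  "#{ip}:#{port}#{chroot}"
end

def kafka_broker(ip, port: 9092)
  "#{ip}:#{port}"
end

# Sample values standing in for the script's output.
zk     = zookeeper_connect('172.17.0.2', chroot: '/kafka')
broker = kafka_broker('172.17.0.3')

puts zk     # 172.17.0.2:2181/kafka
puts broker # 172.17.0.3:9092
```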
Posted 01 Aug 2016 in Linux, Other, Software