Hash to OpenStruct (#81)

by Hans Fugal

More than a few times I've wished I could get a nice nested OpenStruct out of YAML data, instead of the more unwieldy nested hashes. It's mostly a matter of style. It's a straightforward task to convert a nested hash structure into a nested OpenStruct, but it's the sort of task that you can do a lot of ways, and I'll bet some of you can come up with more elegant and/or more efficient ways than I have so far.

Here's a sample YAML document to get you started:

---
foo: 1
bar:
  baz: [1, 2, 3]
  quux: 42
  doctors:
    - William Hartnell
    - Patrick Troughton
    - Jon Pertwee
    - Tom Baker
    - Peter Davison
    - Colin Baker
    - Sylvester McCoy
    - Paul McGann
    - Christopher Eccleston
    - David Tennant
  a: {x: 1, y: 2, z: 3}
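
To make the "unwieldy" part concrete, here is a small sketch of what you get today. It uses a trimmed copy of the sample (the indentation shown is my assumption) and shows that a plain OpenStruct.new() only converts the top level, which is the gap the quiz asks you to close:

```ruby
require "yaml"
require "ostruct"

# A trimmed copy of the sample document; the nesting is an assumption.
sample = <<~DOC
  foo: 1
  bar:
    baz: [1, 2, 3]
    quux: 42
DOC

data = YAML.load(sample)

# Nested hashes: every level needs a bracketed string key.
quux = data["bar"]["quux"]

# OpenStruct.new converts only the top level; the value stored under
# "bar" is still a plain Hash, so shallow.bar.quux would not work.
shallow = OpenStruct.new(data)
top_level = shallow.foo
inner_class = shallow.bar.class
```

The goal is to be able to write something like data.bar.quux all the way down.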


Quiz Summary

The solutions for this quiz are bite size and filled-to-bursting with clever tricks, so let's check out several. First, here's the easiest way I saw to not solve the problem, by Ilmari Heikkinen:

ruby
class Hash
  def method_missing(mn,*a)
    mn = mn.to_s
    if mn =~ /=$/
      super if a.size > 1
      self[mn[0...-1]] = a[0]
    else
      super unless has_key?(mn) and a.empty?
      self[mn]
    end
  end
end

Ilmari doesn't bother fiddling with conversions here, deciding instead to just implement an OpenStructish interface directly in Hash. Any method call Hash doesn't recognize will be sent to this method_missing() callback. Inside the method, Ilmari checks to see if the call ended with an equals sign and had no more than one argument. If so, the assignment is made under the String key of the same name. Otherwise, if the call had no arguments and a matching key can be found, it is fetched and returned.

Of course, this kind of wholesale modification of Hash is quite dangerous. If some code is counting on the normal failure chain of an unknown method called on Hash, this code could easily break it.
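
If you like the interface but not the global patch, one way to limit the damage is to confine the same trick to a subclass. The following is only a sketch of that idea, not something from the submitted solutions; the DotHash name is hypothetical:

```ruby
# Hypothetical subclass: same method_missing() trick as above, but plain
# Hash objects elsewhere in the program keep their normal behavior.
class DotHash < Hash
  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?("=")
      super unless args.size == 1      # wrong arity: fail as usual
      self[key.chomp("=")] = args.first
    elsif args.empty? && key?(key)
      self[key]
    else
      super                            # unknown key: normal NoMethodError
    end
  end

  # Keep respond_to? consistent with what method_missing will accept.
  def respond_to_missing?(name, include_private = false)
    key = name.to_s
    key.end_with?("=") || key?(key) || super
  end
end

h = DotHash.new
h.foo = 1        # stored under the String key "foo"
h["bar"] = 2
```

Keys that collide with real Hash methods (size, count, and so on) are still shadowed by those methods, so this remains a convenience rather than a full conversion.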

To get away from that, we're going to need to build the converters, as Jacob Fugal does here:

ruby
require "yaml"
require "ostruct"

class Object
  def to_openstruct
    self
  end
end

class Array
  def to_openstruct
    map{ |el| el.to_openstruct }
  end
end

class Hash
  def to_openstruct
    mapped = {}
    each{ |key,value| mapped[key] = value.to_openstruct }
    OpenStruct.new(mapped)
  end
end

module YAML
  def self.load_openstruct(source)
    self.load(source).to_openstruct
  end
end

p YAML.load_openstruct(File.read("sample.yml"))

This was a popular style of solution. People found ways to shorten it quite a bit, but it's easy to see the goal with this one. Any normal Object is given a to_openstruct() method that has no effect. Array's version calls to_openstruct() on each member and Hash passes the call down to each value. Finally, a method is added to YAML that does the load(), then starts the chain reaction at the top of the constructed object tree. This call gets passed down by Arrays and Hashes as we just saw and converts most Hashes to OpenStructs. (It doesn't convert Hashes in the instance variables of custom objects that have been serialized.)
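
For comparison, the same walk can be written as a single function that touches no core classes. This is my own condensed sketch, not one of the submissions; the deep_openstruct name and the sample data are hypothetical:

```ruby
require "ostruct"

# Hash values and Array members are converted recursively; everything
# else is returned untouched, like Object#to_openstruct above.
def deep_openstruct(obj)
  case obj
  when Hash  then OpenStruct.new(obj.transform_values { |v| deep_openstruct(v) })
  when Array then obj.map { |el| deep_openstruct(el) }
  else obj
  end
end

# Hypothetical data, shaped like what YAML.load returns for nested maps.
data = {
  "foo" => 1,
  "bar" => { "baz" => [1, 2, 3], "a" => { "x" => 1 } },
  "cast" => [{ "actor" => "Tom Baker" }]
}

tree = deep_openstruct(data)
nested_value = tree.bar.a.x          # Hash inside a Hash
in_array     = tree.cast.first.actor # Hash inside an Array
```

Like Jacob's version, this assumes the data is a tree; it has no protection against a structure that refers back to itself.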

Now MenTaLguY threw a big monkey wrench into solutions like this when he posted this alternate test data:

---
&verily
lemurs:
  unite: *verily
  beneath:
    - patagonian
    - bread
    - products
thusly: [1, 2, 3, 4]

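As you can see, YAML is allowed to have recursive data structures: the *verily alias points back at the top-level mapping that contains it. When I run Jacob's solution on this, the conversion chases that reference forever and I get the odd error: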

Illegal instruction
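
On the Ruby versions I have used more recently, the same runaway recursion usually surfaces as a rescuable SystemStackError rather than a crash. The sketch below builds the cyclic structure by hand (so it does not depend on how a particular YAML loader treats aliases) and runs a plain recursive converter over it; naive_openstruct is a hypothetical stand-in for the solutions of this style:

```ruby
require "ostruct"

# The shape of MenTaLguY's document, built by hand: the value under
# "unite" is the very Hash that contains it.
verily = { "lemurs" => { "beneath" => %w[patagonian bread products] },
           "thusly" => [1, 2, 3, 4] }
verily["lemurs"]["unite"] = verily

# A plain recursive converter with no memory of what it has visited.
def naive_openstruct(obj)
  case obj
  when Hash
    mapped = {}
    obj.each { |key, value| mapped[key] = naive_openstruct(value) }
    OpenStruct.new(mapped)
  when Array
    obj.map { |el| naive_openstruct(el) }
  else
    obj
  end
end

# The walk never reaches a leaf on the "unite" branch, so the stack fills up.
outcome =
  begin
    naive_openstruct(verily)
    :finished
  rescue SystemStackError
    :stack_overflow
  end
```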

One of MenTaLguY's own solutions to this was to use some lazy evaluation and memoization:

ruby
require 'ostruct'
require 'lazy'

def hashes_to_openstructs( obj, memo={} )
  return obj unless Hash === obj
  memo[obj.object_id] ||= promise {
    OpenStruct.new( Hash[
      *obj.inject( [] ) { |a, (k, v)|
        a.push k, hashes_to_openstructs( v, memo )
      }
    ] )
  }
end

This is a recursive solution like Jacob's, though trimmed down and less complete: it converts Hashes but does not descend into Arrays. The trick here is that instead of walking the whole object tree immediately, the code just promises to do it when needed. A promise() is just a magic object that springs to life when it is actually used for the first time (constructing itself by calling the block).
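
If you haven't met promises before, the idea fits in a few lines. This is a toy illustration of the concept only, not the implementation in the lazy library; TinyPromise and __force__ are hypothetical names:

```ruby
# A toy promise: holds a block, runs it on first use, then forwards
# every method call to the cached result.
class TinyPromise < BasicObject
  def initialize(&block)
    @block  = block
    @forced = false
  end

  # Run the block once and remember the value.
  def __force__
    unless @forced
      @value  = @block.call
      @forced = true
      @block  = nil
    end
    @value
  end

  # BasicObject defines almost nothing, so nearly every call lands here.
  def method_missing(name, *args, **opts, &blk)
    __force__.__send__(name, *args, **opts, &blk)
  end
end

calls = 0
lazy_list = TinyPromise.new { calls += 1; [1, 2, 3] }

calls_before = calls        # the block has not run yet
size         = lazy_list.size
first        = lazy_list.first
calls_after  = calls        # the block ran exactly once
```

Because nothing runs until a method is called on the promise, a structure that refers to itself can be described without being walked up front.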

The other trick of this method is the memoization. Using the memo Hash and the ||= operator, the conversion process caches each converted object under the object_id() of the original Hash. Any future calls for the same object just get the already-constructed version straight from the cache.
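
The memoization half of the idea also works without promises. The sketch below is a different, eager technique and not MenTaLguY's code: it registers an empty OpenStruct in the memo before descending, so a reference back to a Hash already in progress finds its OpenStruct instead of recursing again. The memoized_openstruct name is hypothetical:

```ruby
require "ostruct"

# memo maps each original Hash (compared by identity, not contents)
# to the OpenStruct that replaces it.
def memoized_openstruct(obj, memo = {}.compare_by_identity)
  case obj
  when Hash
    return memo[obj] if memo.key?(obj)
    os = memo[obj] = OpenStruct.new   # register first, fill in afterwards
    obj.each { |key, value| os[key] = memoized_openstruct(value, memo) }
    os
  when Array
    obj.map { |el| memoized_openstruct(el, memo) }
  else
    obj
  end
end

# The cyclic structure from the alternate test data, built by hand.
verily = { "lemurs" => { "beneath" => %w[patagonian bread products] },
           "thusly" => [1, 2, 3, 4] }
verily["lemurs"]["unite"] = verily

converted = memoized_openstruct(verily)
loops_back = converted.lemurs.unite.equal?(converted)
```

This sketch only guards against cycles that pass through a Hash; an Array that contains itself would still recurse without end.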

Then TRANS pointed out an interesting fact: YAML already has to understand all this recursive data structure stuff, so we really want to let it do all the hard work and just change what it loads. TRANS poked around in the innards of YAML and did get a working solution, but why the lucky stiff was drawn into the challenge and suggested this version:

ruby
require 'yaml'
require 'ostruct'

class << YAML::DefaultResolver
  alias_method :_node_import, :node_import
  def node_import(node)
    o = _node_import(node)
    o.is_a?(Hash) ? OpenStruct.new(o) : o
  end
end

This just overrides a piece of YAML behavior to check if the object just loaded was a Hash. When it is, it is replaced with an OpenStruct. Simple and very effective.

Now this YAML solution will convert Hashes inside the instance variables of objects. That's probably a bad thing, since those classes likely weren't designed with that in mind. You always have to weigh the tradeoffs and choose a solution that will best meet your current needs.

A big thanks to all who couldn't help but fiddle with Hans's fun little challenge. I don't believe I've ever seen such variation in the solutions before.

Tomorrow's quiz is to help Benjohn Barnes get into shape...