Ruby Quiz - SerializableProc (#38)

SerializableProc (#38)

I'm a Proc addict. I use them all over the place in my code. Because of that, whenever I end up needing persistence and I call Marshal.dump() or YAML.dump() on some object hierarchy, I get to watch everything explode (since Procs cannot be serialized).

This week's Ruby Quiz is to build a Proc that can be serialized.

I'm not aware of any possible way to add serialization capabilities to Ruby's core Proc, which rules out a complete solution. However, even if what we build is a hack, at least one person finds it super useful.

The task then is to build SerializableProc. It should support being serialized by Marshal, PStore, and YAML and otherwise behave as close to a Proc as possible. Put another way, make the following code run for your creation:

ruby

require "pstore"
require "yaml"

code = # Build your SerializableProc here!

File.open("proc.marshalled", "w") { |file| Marshal.dump(code, file) }
code = File.open("proc.marshalled") { |file| Marshal.load(file) }

code.call

store = PStore.new("proc.pstore")
store.transaction do
  store["proc"] = code
end
store.transaction do
  code = store["proc"]
end

code.call

File.open("proc.yaml", "w") { |file| YAML.dump(code, file) }
code = File.open("proc.yaml") { |file| YAML.load(file) }

code.call

Quiz Summary

The solutions this time show some interesting differences in approach, so I want to walk through a handful of them below. The very first solution was from Robin Stocker and that's a fine place to start. Here's the class:

ruby

class SerializableProc

  def initialize( block )
    @block = block
    # Test if block is valid.
    to_proc
  end

  def to_proc
    # Raises exception if block isn't valid, e.g. SyntaxError.
    eval "Proc.new{ #{@block} }"
  end

  def method_missing( *args )
    to_proc.send( *args )
  end

end

It can't get much simpler than that. The main idea here, and in all the solutions, is that we need to capture the source of the Proc. The source is just a String so we can serialize that with ease and we can always create a new Proc if we have the source. In other words, Robin's main idea is to go (syntactically) from this:

ruby

Proc.new {
  puts "Hello world!"
}

To this:

ruby

SerializableProc.new %q{
  puts "Hello world!"
}

In the first, pure Ruby version, we're building a Proc with a block of code that defines the body. In the second, SerializableProc version, we're just passing the constructor a String that can be used to build a block. Christian Neukirchen had something very interesting to say about the change:

Obvious problems of this approach are the lack of closures and editor
support (depending on the inverse quality of your editor :P)...

We'll get back to the lack of closures issue later, but I found the "inverse quality of your editor" claim interesting. The meaning is that a poor editor may not consider %q{...} equivalent to '...'. If it doesn't realize a String is being entered, it may continue to syntax highlight the code inside. Of course, you could always remove the %q whenever you want to see the code highlighting, but that's tedious.

Getting back to Robin's class, initialize() just stores the String and immediately builds a Proc from it, so an Exception is raised at construction time if it is fed invalid code. The method to_proc() is what builds the Proc object by wrapping the String in "Proc.new { ... }" and calling eval(). Finally, method_missing() makes SerializableProc behave close to a Proc. Anytime it sees a method call that isn't initialize() or to_proc(), it creates a Proc object and forwards the message.

We don't see anything specific to serialization in Robin's code, because both Marshal (PStore uses Marshal) and YAML can handle a custom class with String instance data. Like magic, it all just works.
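To make that "it all just works" claim concrete, here is a minimal sketch of a round trip through both serializers. The class name SourceProc and its contents are hypothetical stand-ins for Robin's class, not his code. The YAML.safe_load call with permitted_classes is an assumption about a recent Ruby whose YAML engine is Psych; the Ruby of this quiz's era would simply have used YAML.load.

```ruby
require "yaml"

# Hypothetical stand-in for Robin's class: the only state is a String.
class SourceProc
  def initialize(source)
    @source = source
  end

  # Rebuild a real Proc from the stored source on demand.
  def to_proc
    eval "Proc.new { #{@source} }"
  end

  def call(*args)
    to_proc.call(*args)
  end
end

original = SourceProc.new(%q{|name| "Hello " + name})

# Marshal needs no help: it just dumps the String instance variable.
from_marshal = Marshal.load(Marshal.dump(original))

# Assumption: recent Psych versions want custom classes listed explicitly.
from_yaml = YAML.safe_load(YAML.dump(original), permitted_classes: [SourceProc])

puts from_marshal.call("Marshal")
puts from_yaml.call("YAML")
```

Neither serializer ever sees a Proc here, only the String, which is why no custom dump methods are needed.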

Robin had a complaint though:

I imagine my solution is not very fast, as each time a method on the
SerializableProc is called, a new Proc object is created.

The object could be saved in an instance variable @proc so that speed is
only low on the first execution. But that would require the definition of
custom dump methods for each Dumper so that it would not attempt to dump
@proc.

My own solution (like some others) does cache the Proc and defines some custom dump methods. Let's have a look at how something like that comes out:

ruby

class SerializableProc
  def self._load( proc_string )
    new(proc_string)
  end

  def initialize( proc_string )
    @code = proc_string
    @proc = nil
  end

  def _dump( depth )
    @code
  end

  def method_missing( method, *args )
    if to_proc.respond_to? method
      @proc.send(method, *args)
    else
      super
    end
  end

  def to_proc( )
    return @proc unless @proc.nil?

    if @code =~ /\A\s*(?:lambda|proc)(?:\s*\{|\s+do).*(?:\}|end)\s*\Z/
      @proc = eval @code
    elsif @code =~ /\A\s*(?:\{|do).*(?:\}|end)\s*\Z/
      @proc = eval "lambda #{@code}"
    else
      @proc = eval "lambda { #{@code} }"
    end
  end

  def to_yaml( )
    @proc = nil
    super
  end
end

My initialize() is the same, save that I create a variable to hold the Proc object and I wasn't clever enough to trigger the early Exception when the code is bad. My to_proc() looks scary but I just try to accept a wider range of Strings, wrapping them in only what they need. The end result is the same. Note that any Proc created is cached. My method_missing() is also very similar. If the Proc object responds to the method, it is forwarded. The first line of method_missing() calls to_proc() to ensure we've created one. After that, it can safely use the @proc variable.

The _load() class method and _dump() instance method are what it takes to support Marshal. First, _dump() is expected to return a String that could be used to rebuild the instance. Then, _load() is passed that String on reload and expected to return the recreated instance. The String choice is simple in this case, since we're using the source.
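The _dump() and _load() handshake can be seen in isolation with a small sketch. CachedProc is a hypothetical class written for this illustration, not the solution above; it keeps a cached Proc in an instance variable to show that Marshal never touches it once _dump() is defined.

```ruby
# Hypothetical class: a source String plus a cached Proc that must not be dumped.
class CachedProc
  # Marshal calls this on load with the String that _dump returned.
  def self._load(source)
    new(source)
  end

  def initialize(source)
    @source = source
    @proc   = nil
  end

  # Marshal calls this instead of walking the instance variables,
  # so the cached Proc in @proc is never serialized.
  def _dump(_depth)
    @source
  end

  def call(*args)
    @proc ||= eval("lambda { #{@source} }")
    @proc.call(*args)
  end
end

code = CachedProc.new("|a, b| a + b")
code.call(1, 2)                          # fills the cache with a real Proc
copy = Marshal.load(Marshal.dump(code))  # dumping still works with the cache set
puts copy.call(20, 22)
```

Without the _dump() method, the same Marshal.dump call would fail, because Marshal would try to dump the Proc held in @proc.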

There are multiple ways to support YAML serialization, but I opted for the super simple cheat. YAML can't serialize a Proc, but it's just a cache that can always be restored. I just override to_yaml() and clear the cache before handing serialization back to the default method. My code is unaffected by the Proc's absence and it will recreate it when needed.
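The to_yaml() override belongs to the YAML library of the time. On a current Ruby, where YAML is provided by Psych, a comparable effect might be had with the encode_with() and init_with() hooks. The sketch below is an assumption about how that could look, with a hypothetical class name, and is not part of any submitted solution.

```ruby
require "yaml"

# Hypothetical class; assumes Psych, the YAML engine in current Rubies.
class YamlFriendlyProc
  def initialize(source)
    @source = source
    @proc   = nil
  end

  # Psych calls this when dumping: emit only the source, never the cache.
  def encode_with(coder)
    coder["source"] = @source
  end

  # Psych calls this on load instead of initialize.
  def init_with(coder)
    @source = coder["source"]
    @proc   = nil
  end

  def call(*args)
    @proc ||= eval("lambda { #{@source} }")
    @proc.call(*args)
  end
end

code = YamlFriendlyProc.new("|n| n * 2")
code.call(1)   # populate the cache before dumping
text = YAML.dump(code)
copy = YAML.safe_load(text, permitted_classes: [YamlFriendlyProc])

puts text
puts copy.call(21)
```

As with the cache-clearing trick, the Proc is treated as disposable: only the source travels, and the Proc is rebuilt the first time it is needed.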

Taking one more step, Dominik Bathon builds the Proc in the constructor and never has to recreate it:

ruby

require "delegate"
require "yaml"

class SProc < DelegateClass(Proc)

  attr_reader :proc_src

  def initialize(proc_src)
    super(eval("Proc.new { #{proc_src} }"))
    @proc_src = proc_src
  end

  def ==(other)
    @proc_src == other.proc_src rescue false
  end

  def inspect
    "#<SProc: #{@proc_src.inspect}>"
  end
  alias :to_s :inspect

  def marshal_dump
    @proc_src
  end

  def marshal_load(proc_src)
    initialize(proc_src)
  end

  def to_yaml(opts = {})
    YAML::quick_emit(self.object_id, opts) { |out|
      out.map("!rubyquiz.com,2005/SProc" ) { |map|
        map.add("proc_src", @proc_src)
      }
    }
  end

end

YAML.add_domain_type("rubyquiz.com,2005", "SProc") { |type, val|
  SProc.new(val["proc_src"])
}

Dominik uses the delegate library instead of the method_missing() trick. That's a two-step process. You can see the first step when SProc is defined to inherit from DelegateClass(Proc), which sets a type for the object so delegate knows which messages to forward. The second step is the first line of the constructor, which passes the delegate object to the DelegateClass. That's the instance that will receive forwarded messages. Dominik also defined a custom ==(), "because that doesn't really work with method_missing/delegate."
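The two delegation steps can be tried on their own with a much smaller class. Doubler below is a hypothetical example invented for this sketch; it only shows the mechanics and leaves out everything about serialization.

```ruby
require "delegate"

# Step one: DelegateClass(Proc) builds a superclass that forwards
# Proc's public methods. Doubler is a hypothetical example class.
class Doubler < DelegateClass(Proc)
  def initialize
    # Step two: hand super the actual object that receives the messages.
    super(Proc.new { |n| n * 2 })
  end
end

doubler = Doubler.new
puts doubler.call(21)           # forwarded to the wrapped Proc
puts doubler.arity              # so is every other public Proc method
puts doubler.__getobj__.class   # the wrapped object is still reachable
```

Compared with method_missing(), the forwarding methods really exist on the class, so calls such as respond_to?(:call) answer truthfully.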

Dominik's code uses a different interface to support Marshal, but does the same thing I did, as you can see. The YAML support is different. SProc.to_yaml() spits out a new YAML type, that basically just emits the source. The code outside of the class adds the YAML support to read this type back in, whenever it is encountered. Here's what the class looks like when it's resting in a YAML file:
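For comparison with _dump() and _load(), here is a minimal sketch of the marshal_dump() and marshal_load() pair by itself. StoredProc is a hypothetical class for illustration only. The detail worth noticing is that marshal_load() runs on a freshly allocated object whose initialize() was never called, which is presumably why Dominik calls initialize() from it.

```ruby
# Hypothetical class showing the marshal_dump / marshal_load pair.
class StoredProc
  attr_reader :source

  def initialize(source)
    @source = source
    @proc   = eval("Proc.new { #{source} }")
  end

  def call(*args)
    @proc.call(*args)
  end

  # May return any dumpable object, not only a String.
  def marshal_dump
    @source
  end

  # Runs on a freshly allocated instance; initialize was never called,
  # so reuse it to rebuild the Proc from the source.
  def marshal_load(source)
    initialize(source)
  end
end

copy = Marshal.load(Marshal.dump(StoredProc.new("|x| x.to_s.reverse")))
puts copy.call(123)
```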

!rubyquiz.com,2005/SProc
proc_src: |2-
  |*args|
  puts "Hello world"
  print "Args: "
  p args

The advantage here is that the YAML export procedure never touches the Proc so it doesn't need to be hidden or removed and rebuilt.

Florian's solution is also worth mentioning, though it takes a completely different road to solving the problem. Time and space don't allow me to recreate and annotate the code here, but Florian described the premise well in the submission message:

I wrote this a while ago and it works by extracting a proc's origin file
name and line number from its .inspect string and using the source code
(which usually does not have to be read from disc) -- it works with
procs generated in IRB, eval() calls and regular files. It does not work
from ruby -e and stuff like "foo".instance_eval "lambda {}".source
probably doesn't work either.

Usage:

code = lambda { puts "Hello World" }
puts code.source
Marshal.load(Marshal.dump(code)).call
YAML.load(code.to_yaml).call

The code itself is a fascinating read. It uses the relatively unknown SCRIPT_LINES__ Hash, has great tricks like overriding eval() to capture that source, and even implements a partial Ruby parser with standard libraries. I'm telling you, that code reads like a good mystery novel for programmers. Don't miss it!
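Florian's code is not reproduced here, but the basic premise of tracing a Proc back to its file and line can be sketched with a swapped-in technique: Proc#source_location, which later Ruby versions provide, in place of parsing the .inspect string and SCRIPT_LINES__. The temporary file, its contents, and the $quiz_proc global are all made up for this demo, and the sketch only recovers the whole line, not the exact block text that a real solution would need to parse out.

```ruby
require "tempfile"

# Write a tiny script to disk so the Proc has a real file behind it.
# The file contents and the $quiz_proc global are invented for this demo.
script = Tempfile.new(["quiz_proc", ".rb"])
script.write("# a comment line\n$quiz_proc = lambda { 6 * 7 }\n")
script.close
load script.path

# source_location reports where the Proc was defined.
path, line = $quiz_proc.source_location
source_line = File.readlines(path)[line - 1]

puts "defined at line #{line}"
puts source_line
```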

One last point. I said in the quiz that all of this is just a hack, no matter how useful it is. Dave Burt sent a message to Ruby Talk along these lines:

Proc's documentation tells us that "Proc objects are blocks of code that
have been bound to a set of local variables." (That is, they are "closures"
with "bindings".) Do any of the proposed solutions so far store local
variables?

# That is, can the following Proc be serialized?
local_var = 42
code = proc { local_var += 1 } # <= what should that look like in YAML?
code.call #=> 43

An excellent point. These toys we're creating have serious limitations to be sure. I assume this is the very reason Ruby's Procs cannot be serialized. Using binding() might make it possible to work around this problem in some instances, but there are clearly some Procs that cannot be cleanly serialized.
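A small sketch may make both the limitation and the binding() idea easier to see. The helper names are hypothetical. The first helper rebuilds a Proc from source with no access to the original scope, which is the situation a deserialized String is in. The second evaluates the same source against a Binding supplied by the caller. Note that this only helps when the variables exist again at rebuild time; the Binding itself is not something Marshal can dump.

```ruby
# Hypothetical helper: rebuilds a Proc from source inside a fresh method
# scope, which is all a deserialized source String has to work with.
def rebuild_in_isolation(source)
  eval("proc { #{source} }")
end

# Hypothetical helper: rebuilds the Proc against a caller-supplied Binding.
def rebuild_with_binding(source, scope)
  eval("proc { #{source} }", scope)
end

def closure_demo
  local_var = 42
  source = "local_var += 1"

  isolated = rebuild_in_isolation(source)
  isolated_result =
    begin
      isolated.call
    rescue NameError => error   # NoMethodError is a NameError subclass
      error.class
    end

  rebound = rebuild_with_binding(source, binding)
  rebound.call                  # updates the real local_var

  [isolated_result, local_var]
end

p closure_demo
```

In isolation the rebuilt Proc sees a brand new, unset local_var and fails; with the Binding it increments the caller's variable, just as the original closure would.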

My thanks to all who committed such wonderful code and discussion to this week's quiz. I know I learned multiple new things and I hope others did too.

Tomorrow we have a quiz to sample some algorithmic fun...