Code to S-Exp (#95)

by Ken Bloom

S-expressions are a useful way of representing functional expressions in many aspects of computing. Lisp's syntax is based heavily on s-expressions, and the fact that Lisp uses them to represent both code and data allows many interesting libraries (such as CLSQL: http://clsql.b9.com/) which do things with functions besides simply evaluating them. While working on building a SQL generation library, I found that it would be nice to be able to generate s-expressions programmatically with Ruby.

An s-expression is a nested list structure where the first element of each list is the name of the function to be called, and the remaining elements of the list are the arguments to that function. (Binary operators are converted to prefix notation). For example the s-expression (in LISP syntax)

(max (count field))

would correspond to

max(count(field))

in ordinary functional notation. Likewise,

(roots x (+ (+ (* x x) x) 1 ))

would correspond to

roots(x, ((x*x) + x) + 1)

since we treat binary operators by converting them to prefix notation.

Your mission: Create a function named sxp() that can take a block (not a string), and create an s-expression representing the code in the block.

Since my goal is to post-process the s-expressions to create SQL code, there is some special behavior that I will allow to make this easier. If your code evaluates (rather than parsing) purely numerical expressions that don't contain functions or field names (represented by Symbols here), then this is satisfactory behavior since it shouldn't matter whether Ruby evaluates them or the SQL database evaluates them. This means, for example, that sxp{3+5} can give you 8 as an s-expression, but for extra credit, try to eliminate this behavior as well and return [:+, 3, 5].

It is very important to avoid breaking the normal semantics of Ruby when used outside of a code block being passed to sxp.

Here are some examples and their expected result:

sxp{max(count(:name))} => [:max, [:count, :name]]
sxp{count(3+7)} => [:count, 10] or [:count, [:+, 3, 7]]
sxp{3+:symbol} => [:+, 3, :symbol]
sxp{3+count(:field)} => [:+, 3, [:count, :field]]
sxp{7/:field} => [:/, 7, :field]
sxp{:field > 5} => [:>, :field, 5]
sxp{8} => 8
sxp{:field1 == :field2} => [:==, :field1, :field2]
7/:field => throws TypeError
7+count(:field) => throws NoMethodError
5+6 => 11
:field > 5 => throws NoMethodError

(In code for this concept, I returned my s-expression as an object which had inspect() modified to appear as an array. You may return any convenient object representation of an s-expression.)


Quiz Summary

This quiz is a little more challenging than it looks. The real trick is to satisfy the quiz conditions without breaking the Ruby interpreter in the process. Let me show you what I mean.

It's easy enough to get the first few quiz tests passing, with code like my own submission:

ruby
class SXP
instance_methods.each do |meth|
undef_method(meth) unless meth =~ /\A__/ or meth == "instance_eval"
end

def initialize(&block)
@code = block
end

def method_missing(meth, *args, &block)
if args.any? { |e| e.is_a? Array }
[meth, args.inject(Array.new) { |arr, a| arr.push(*a) }]
else
[meth, *args]
end
end

def result
instance_eval(&@code)
end
end

def sxp(&block)
SXP.new(&block).result
end

This works by creating a new class and stripping it of as many methods as possible, so method_missing() can catch just about any message received. The code block passed to initialize() is tucked away for later use in the result() method, which triggers the block and returns the Array built-up by the calls to method_missing(). Finally, you can see the trivial method used to wrap this class and support the quiz interface.

Then what though?

Some of the quiz test don't directly call methods we can field with my magic object, instead relying on behaviors of the Fixnum and Symbol classes:

sxp{3+:symbol} => [:+, 3, :symbol]
sxp{3+count(:field)} => [:+, 3, [:count, :field]]
sxp{7/:field} => [:/, 7, :field]
sxp{:field > 5} => [:>, :field, 5]
sxp{:field1 == :field2} => [:==, :field1, :field2]

If we want to support these with pure Ruby, we need to start ripping up and rebuilding Ruby's core classes. A few of solutions did just that. Here's Robin Stocker's code as an example:

ruby
class SxpGenerator

def method_missing(meth, *args)
[meth, *args]
end

BINARY_METHODS = [:+, :-, :*, :/, :%, :**, :^, :<, :>, :<=, :>=, :==]

def self.overwrite_methods(mod)
BINARY_METHODS.each do |method|
mod.module_eval do
if method_defined? method
alias_method "__orig_#{method}__", method
end
define_method method do |arg|
[method, self, arg]
end
end
end
end

def self.restore_methods(mod)
BINARY_METHODS.each do |method|
mod.module_eval do
orig_method = "__orig_#{method}__"
if method_defined? orig_method
alias_method method, orig_method
remove_method orig_method
else
remove_method method
end
end
end
end

end


def sxp(&block)
klasses = [Fixnum, Bignum, Symbol, Array, Float, String]
klasses.each do |klass|
SxpGenerator.overwrite_methods(klass)
end
begin
result = SxpGenerator.new.instance_eval &block
rescue Exception
result = nil
end
klasses.each do |klass|
SxpGenerator.restore_methods(klass)
end
result
end

Robin's solution is a similar pattern to my own, with two additional steps: methods for core classes like Fixnum and Symbol are modified just before the instance_eval() and restored after. The modification is done using alias_method(), which makes a duplicate of the code, to save the old functionality and define_method() to replace the default. Afterwards the old methods are just copied back or removed, if they never existed to begin with.

This gets all of the quiz tests passing, but Boris Prinz summed up my opinion of doing something like this when he said of his own similar solution, "I wouldn't use this code in a production environment, because methods of built-in classes are redefined." Don't get me wrong, both Boris and Robin did an impressive job, but this kind of wholesale hacking of the core just seems too risky to rely on.

A lot can go wrong when you change Ruby's behaviors. What if some external libraries, counting on core functionality, are loaded when you make the change? The first reaction is probably to assume they won't be running while we are evaluating the expression, but can we be sure of that? What if they are running in background Threads? I guess we could set Thread.critical before we make the change, but this is quickly getting to be an ugly problem.

Here's another point to consider: why use a block at all? The primary point of methods that take blocks is to allow user code access to the power of Ruby at the time of the call. Well, if we have to change Ruby to make the call work, that idea goes out the window. If the method took a String instead, we could parse the expression ourselves or even hand it off to a forked copy of the Ruby interpreter that modifies itself, evaluates, and sends us back the result.

Ross Bamford used this safe hand-off strategy by calling on Why the Lucky Stiff's Sandbox extension. Let's see how that turns out:

ruby
require 'sandbox'

module Sxp
class << self
def sxp(&blk)
sb = Sandbox.new
sb.main.instance_variable_set(:@blk,
blk || raise(LocalJumpError, "No block given"))
sb.load("#{File.dirname(__FILE__)}/sandboxed.rb")
end
end
end

Here you can see that Ross creates a new Sandbox, passes the provided block into it by setting an instance variable inside the Sandbox, and triggers the execution of another file. Here's that second file:

ruby
class Object ; alias :__instance_eval :instance_eval ; end
class Array ; alias :__each :each ; end

[Object, Kernel, Symbol, Fixnum, Bignum, Float, NilClass, FalseClass,
TrueClass, Hash, Array, String].__each do |clz|
clz.class_eval do
instance_methods.__each do |m|
undef_method m unless /^__|^inspect$|^to_(s(?:tr)?|a(?:ry)?)$/.match(m)
end
def method_missing(sym, *args); [sym, self, *args]; end
def to_ary; [self]; end # needed by every class in this world
end
end

# A special method_missing on the main object handles 'function' calls
class << self; def method_missing(sym, *args); [sym, *args]; end; end

__instance_eval &@blk

This is another version of core class modification, but this one is brutal. Most of the methods of the core classes are just outright eliminated by undef_method() calls. Pretty much everything is replaced with the now-familiar method_missing() trick. On the last line, you can see the block invoked in this new not-really-Ruby interpreter.

What makes this solution different is that all of the damage was done only to the Sandbox. All of the core classes are alive and well in the original Ruby interpreter. This makes the code safer to rely on, in my opinion.

Definitely take some time to look through the solutions from Ryan Davis and Dominik Bathon as well. They make use of ParseTree (a library for Ruby to s-expression conversion!) and RubyNode to transform the Ruby code into an s-expression. These are very effective and don't need to modify the core so heavily, if at all.

My thanks to all Ruby/Lisp hackers who gave this problem a shot. You really showed off a wide range of possibilities.

Tomorrow, we will return to weaving tall tales, but this time take a different approach from Markov Chains...