Ruby Quiz - Port a Library (#64)

Port a Library (#64)

Twice this week, I've gone looking for the Ruby equivalent to a simple Perl module and had trouble finding what I was after. Both times I've peeked inside the source and been surprised at how trivial the operations are. "I could port that in no time," I thought. This quiz is my thinly disguised attempt to pass my homework on to others. :)

Seriously, this quiz is *not* intended to be a lot of work. Don't underestimate the power of a simple library. (See the "Rethinking Memoization" thread where we are trying to improve a very helpful library that is literally 10 lines of code, in one of the forms presented.)

Given all that, this is a build-it-yourself Ruby Quiz. Most of us are familiar with another language. Go into their libraries and find something you like, that is also simple, and port the library to Ruby. (You might want to search the RAA and RubyForge first, just to make sure someone hasn't done similar work already.) If a library is over 200 lines, forget it. This one is for the little guys!

If you'll allow a brief aside here, it can be interesting to consider what the word "port" means. Obviously, the goal of this is to build a library that does the same things for Ruby. Don't think that means you should copy every method, verbatim though. If you don't think a method is needed, leave it out. See a better way to do something, use your way. Most important though, remember to Rubyize the interface. It's fine to port your favorite Java library, but Ruby programmers don't want to call methodsNamedLikeThis(). Watch for chances to use blocks and jump on them *when they lead to a better experience*. Just remember the adage, "If it ain't broke, don't fix it."

A few more details: Please tell us what your library does and show an example of simple usage in your submission email. Be kind to your quiz summarizer. ;) Also, please credit the original library and author who worked so hard to give you something cool to play with!

Now, if you have no idea what to port, here are two suggestions. (Please feel free to post other suggestions to Ruby Talk. These are *not* spoilers!)

File::ReadBackwards

This is a Perl module (by Uri Guttman) for reading a file in reverse, line-by-line. This can often be helpful for things like log files, where the interesting information is usually at the end.

Don't worry about the Perl interface on this one, copy Ruby's File instead. Heck, all I really want is a foreach() iterator. Anything else is extra.

This module is so well commented, you should be able to understand how it works, even if you aren't familiar with Perl. Here's a link straight to the source:

File::ReadBackwards Source on the CPAN

WWW::RobotRules

This is another Perl module (by Gisle Aas) and it is actually over the 200 line limit. Trust me though, it doesn't need to be. :)

The idea here is that many web sites provide a /robots.txt file, telling spider programs which pages they should not visit. This module gives you a way to parse these rules and make queries about what you are allowed to visit. You can learn all about the interface and even the file format of /robots.txt at:

WWW::RobotRules on the CPAN

Quiz Summary

The great element of porting a library is that you get examine another programmer's ideas. If you're lucky, that may teach you a new trick or two. I'll use my experience as an example.

Using Buffers

When I decided to port File::ReadBackwards, the first question I asked myself was, how do you read a file backwards? I decided that you would need to put the read head at the end of the file, then work it backwards bit by bit. You can't return a line until you have the whole thing, so I knew I would need to buffer the reads. I guessed I would be sure I had a line when I had seen two line endings (whatever is between them is a line) or run into the beginning of the file (no more data). That actually turns out to be the rough process, but, luckily for me, Uri Guttman is smarter than me and reading Uri's code taught me a couple of tricks.

Here's a simplified version of my port, showing only the interesting methods:

ruby

# Based on Perl's File::ReadBackwards module, by Uri Guttman.
class Elif
MAX_READ_SIZE = 1 << 10 # 1024

def initialize( *args )
@file = File.new(*args)
@file.seek(0, IO::SEEK_END)

@current_pos = @file.pos

@read_size = @file.pos % MAX_READ_SIZE
@read_size = MAX_READ_SIZE if @read_size.zero?

@line_buffer = Array.new
end

def gets( sep_string = $/ )
return @line_buffer.pop if @line_buffer.size > 2 or @current_pos.zero?

@current_pos -= @read_size
@file.seek(@current_pos, IO::SEEK_SET)

@line_buffer[0] = "#{@file.read(@read_size)}#{@line_buffer[0]}"
@read_size = MAX_READ_SIZE # Set a size for the next read.

@line_buffer[0] =
@line_buffer[0].scan(/.*?#{Regexp.escape(sep_string)}|.+/)
@line_buffer.flatten!

gets(sep_string)
end
end

You can see the first trick Uri taught me in initialize(). Working the pointer backwards can be messy. You have to keep track of where you are, shift the read pointer back, read some, but always make sure you have that many bytes left. There's an easier way.

If you pick some number of bytes you are going to read, you can consider the file to be in chunks that size. For example, if we are going to read ten bytes at a time and have a twenty-four byte file, we can deal with it in chunks of ten, ten, and four. The only odd chunk size is at the end, where we start, so we can deal with that immediately and then all future reads can be whatever chunk size we selected.

That's what initialize() is doing: Open the file, jump to the end, and set that initial read size to handle the dangling partial chunk.

Now we can tackle gets() and I'll tell you about the other lesson Uri taught me. First, the function of gets() is pretty basic: If we know we have a line or that we are out of lines, return it or nil. Otherwise, read some more data, trying to make one of those two exit conditions true, and recurse. The only hard part is deciding when we have a line.

I was dreaming up a complicated solution to this, when reading Uri's code showed me the light. You can store the data in the buffer exactly like it is in the file (or whatever we've read of it so far). We will break it at lines though, because that's what we're interested in reading. At any given time, it's very likely the buffer holds a partial line (because we haven't seen the rest). However, if we have at least two lines buffered, we can return one immediately. One of them is likely a partial sure, but the last one, the one the user wants, must be full now. This is easy to code, as you can see above.

In gets(), I prepend each read to the first (likely partial) line. Then I use scan() to find all the lines in there, creating an Array, then flatten!() to fold those lines back into the buffer, discarding the extra Array.

Thanks for the lessons Uri!

You really pick these insights up from reading the code of others, which is why I think reading code is important. This is one of the big perks of running the Ruby Quiz. I get to read a lot of code and the submissions are always teaching me things. This week was no different, so now let me tell you what I learned from others.

Lazy Evaluation

This next library came at just the right time for me. I'm reading Higher-Order Perl, by Mark Jason Dominus, and trying to apply the ideas I am learning there to my Ruby usage. One of the big concepts in the book is "lazy" code, which is just a fancy way of saying, I want to run this... later.

There are a lot of advantages to something like this. If an operation is expensive in computational terms, we can assure that it doesn't happen until it is needed. The advantage to that is that it may not be needed at all, which keeps us from wasting time.

Another less-used example is that we can delay evaluation until we have more information. The standard PStore library is a great example here. You pass it the path to the cache file in initialize(), but it waits until transaction() to actually open the file. The reason is that you can start a "read-only" transaction, or a normal transaction that allows reading and writing. By waiting to open the file, PStore knows the right mode to use, the right level of file locking to apply, etc.

The tricky part to lazy evaluation, for me at least, is just getting your head around it. When you decide you're ready though, here is an excellent first step:

ruby

module Trampoline
# Instance methods
class Bounce
def initialize(cons, klass, *args)
@klass, @cons, @args = klass, cons, args
end

def method_missing(method, *args)
@obj = @klass.send(@cons, *@args) unless @obj
@obj.send(method, *args)
end
end

# Class methods
class << Bounce
alias_method :old_new, :new

def new(*args)
old_new(:new, *args)
end

def method_missing(method, *args)
old_new(method, *args)
end
end
end

That's Matthew Moss's port of the Perl Trampoline module, by Steven Lembark. Let's break down what this does.

First, the instance methods. It seems that initialize() doesn't do anything except store some information. We will find out what for in method_missing().

Remember that method_missing() will be called for any message we haven't defined. In this case, that's pretty much everything. (More on that in a minute...) When called, method_missing() makes sure @obj is defined. If it's not, it is created by calling the proper cons(tructor) with the args initialize() tucked away. Then the message is just forwarded to @obj. That means the object is built just before the first method call, then reused to handle all future method calls.

The only problem with the above strategy is that Bounce includes some default Ruby methods inherited from Object. This means that something like a to_s() call isn't forwarded. You can fix this by adding something like the following to Bounce:

ruby

instance_methods.each { |m| undef_method m unless m =~ /^__/ }

Now let's look at the class methods. You can see that new() is moved and redefined, to change its interface for calling code. Now we see another method_missing(), this time for the class itself. It works just like the redefined new(), forwarding the message to the constructor. Remember, not all objects are constructed with new(). Singleton objects often use instance(), for example. This method allows for that. Whatever is called will later be used to build the object.

That's a great introduction to lazy evaluation. When that sinks in a bit and you're ready for more, see the excellent lazy.rb library by MenTaLguY:

lazy.rb

Currying

Another interesting technique discussed in Higher-Order Perl was also represented in the solutions to this quiz. Ross Bamford ported Perl's Sub::Curry module by Johan Lodin. Currying is basically the process of using functions to manufacture other functions, as seen in these simple examples:

ruby

require "curry"

scale = lambda { |size, object| object * size }

puts "3 scaleded to a size of 10 is #{scale[10, 3]}." # 30
puts

# curry some functions
double = scale.curry(2)
triple = scale.curry(3)
halve = scale.curry(0.5)

puts "4 doubled is #{double[4]}." # 8
puts "1 tripled is #{triple[1]}." # 3
puts "Half of 10 is #{halve[10]}." # 5.0

The great side of this library is that it can handle much more complicated argument setups than this. For example, what if the arguments to scale had been reversed? No problem:

ruby

scale = lambda { |object, size| object * size }

puts "3 scaleded to a size of 10 is #{scale[10, 3]}." # 30
puts

# we can leave "holes" in the argument list
double = scale.curry(Curry::HOLE, 2)
triple = scale.curry(Curry::HOLE, 3)
halve = scale.curry(Curry::HOLE, 0.5)

puts "4 doubled is #{double[4]}." # 8
puts "1 tripled is #{triple[1]}." # 3
puts "Half of 10 is #{halve[10]}." # 5.0

That works exactly the same, but if you're not impressed yet we can get even fancier. The library already supports a bunch of "spices", like HOLE above, but you can also add your own:

ruby

# Lazy Evaluation meets Currying...
class LazySpice < Curry::SpiceArg
def initialize( &promise )
super("LAZYSPICE")

@promise = promise
end

def spice_arg( args ) # called to provide the missing argument
[@promise.call]
end
end

logger = lambda do |time, message|
puts "[#{time.strftime('%I:%M:%S %p %m/%d/%y')}] #{message}"
end

log_now = logger.curry(LazySpice.new { Time.now })

log_now["First Message."] # => [12:47:53 PM 02/01/06] First Message.
sleep 3
log_now["Second Message."] # => [12:47:56 PM 02/01/06] Second Message.

Notice how the LazySpice isn't evaluated until the time of the call. That lazy execution makes sure our message is stamped with the time it was actually logged.

I'm not going to show the library here, since I've gone on long enough, but I hope you will agree that it's worth a look.

Wrap Up

Please take some time to look into the other solutions I didn't cover here. All of them were interesting ideas and I hope their authors will consider packaging them up for all to use.

Myself, and others, were worried that this would not be a popular quiz. It far exceeded my expectations though and I owe a big thank you to all who made that happen! You are all so clever it makes even me look good.

We'll go back to a more traditional Ruby Quiz problem tomorrow, I promise. This time we will borrow a problem from the Code Katas...