Secret Agent 00111 (#94)

by Mauricio Fernandez

"Secret Agent 00111 is back at the Casino again, playing a game of chance, while the fate of mankind hangs in the balance. Each game consists of a series of favorable events (probability p), terminated by the first occurrence of an unfavorable event (probability q = 1 - p). More specifically, the game is roulette, and the unfavorable event is the occurrence of 0, which has a probability of q = 1 / 37. No one seriously doubts that 00111 will come through again, but the Secret Service is quite concerned about communicating the blow-by-blow description to Whitehall.

The bartender, who is a free-lance agent, has a binary channel available, but he charges a stiff fee for each bit sent. The problem perplexing the Service is how to encode the vicissitudes of the wheel[1] so as to place the least strain on the Royal Exchequer."

00111 will be needing the communication device in 48H. Q has decided to give him a programmable gizmo, with a built-in Ruby interpreter, in the hope that 00111 will code/play with it on his own and do his best to return it in the original condition, very unlike all the other gadgets he's been given before.

Therefore, Q can enjoy the luxury of coding in Ruby, and he's confident he will be able to complete the task in time. Q's first thought of using Zlib::Deflate to compress the data as shown in the following example:

ruby
# EVENTS will be a large sequence of boolean events:
# true favorable event (00111 wins)
# false unfavorable event (00111 loses)
# we generate a sample one as follows:
EVENTS = (0..1000).map{ rand() > 1.0 / 37 }

# compress
require 'zlib'
string = EVENTS.map{|x| x ? "0" : "1"}.join("")
string.size # => 1001
compressed = Zlib::Deflate.deflate(string)
compressed.size # => 69

# now 'compressed' is transmitted by the bartender

However, Q has been wary of Zlib since he ran into some sort of BufferError when he was installing Mongrel using RubyGems, and he still mourns the sad demise of 00110, which was the direct result of a binary incompatibility between a Ruby build and a third-party extension.

Moreover, the MI6 has been spying on the management of the Casino, which faced a similar problem and solved it long ago, and intercepted the following message at the time the Casino's system was being developed:

TEST RUN 43
-----------
Original size: 10000
Compressed to: 242 bytes
Result using deflate: 462 bytes

Q keeps a very detailed log of his activities, so you know he has been reading up on Information Theory, and has found a paper that addresses the very problem he needs to solve:

Run-length encodings--S. W. Golomb (1966); IEEE Trans Info Theory 12(3):399[1]

He also found some information in the Wikipedia[2], and wrote the following unit tests before feeling indisposed while reading RailsBlob[3]:

Quiz Tests

Q has also wrote some code to exert the SecretAgent00111CommunicationGizmo:

Quiz Gimzo

Finally, Q left a note with the expected (approximate) output one should get when running 00111.rb:

Dear myself,

ruby 00111.rb
writes something like this to stdout before you hand your beautiful creation
to the brutish 00111:

Probability : 0.972972972972973
Approx. with: 0.957603280698574
Exponent: 4 (p**4 = 1/2; m = 2 ** e = 16)
Decoded correctly? yes
Original size: 1000
Compressed to: 23 bytes (2.3%)
Compare to 57 bytes using deflate...
=========================================
Testing buffered encoder/decoder
Decoded correctly? (b) yes
Decoded correctly? (c) yes

Sincerely,

Q

Unfortunately, it seems Q is not doing any better, but management is sure you will be able to complete Q's code (that is, write 00111_gizmo.rb such that the above unit tests pass and the sample program works) in time, given all the material he left.

00111's mission will require _at least_ the following tests to pass:

* test_rle
* test_unrle
* test_basic_encoding
* test_basic_decoding
* test_round_trip_probabilistic

The other tests are optional (you can choose to comment them, but the sample program won't execute correctly if they fail, though), but you should probably do your best to make them pass: this is an excellent opportunity to prove your skills, and it seems Q's retirement is not too far away...

Good luck.

PS: you owe me one, man. Such opportunities are rare.

----
[1] http://urchin.earth.li/~twic/Golombs_Original_Paper/
[2] http://en.wikipedia.org/wiki/Golomb_coding
[3] http://railsblob.blogspot.com/

Quiz Summary

Mission Report

Once again Secret Agent 00111 has saved the world from certain doom. As usual the damage report was gratuitous: destroyed vehicles worth my salary for the next five years, the usual string of sexual harassment and unwanted pregnancy lawsuits we will need to address, and one missing flock of sheep according to a local farmer. (Don't ask!) Full details to follow under separate cover.

00111's secret to success really was rather brilliant, though please don't let on that we know that. The agent has more than enough ego as it is. Recognizing that Q was not going to come through this time, 00111 took the prototype device to a hacker operating under the alias Boris Prinz. (We're still trying to link Boris to a real identity, but we suspect him of numerous famous hacker incidents.) This turned out to be the key choice for 00111.

I'm sure you recall the brutal demise of 00100 when forced to rely on a Java programmer's abilities while under severe time constraints. Desperate for quicker results, 00111 gambled on a not-yet-mainstream community of Ruby programmers. It turns out these Ruby guys are a hidden community of elite hackers who seem to do this kind of this for fun. It's rather bizarre, but the results are undeniable.

Naturally, Q's gizmo was utterly destroyed during 00111's daring escape through the casino's canal drainage system. However, we're now monitoring all Boris Prinz communications and have managed to intercept an email message where he is bragging to the Ruby community about his success. The message contains a prototype of the code 00111 loaded into the gizmo. I'm going to examine that code in this next section so we have it on file.

Ruby Code Analysis

We will look at this code slightly out of order, so that we might better understand the thinking of Boris in his design. First, here is the heart of how boris managed to keep the communications so short and to the point:

ruby
module SecretAgent00111CommunicationGizmo
class UndefinedRLE < Exception
end

def self.rle events
raise UndefinedRLE.new if events.size == 0
rl = []
truevals = 0
events.each do |result|
if result
truevals += 1
else
rl << truevals
truevals = 0
end
end
raise UndefinedRLE.new if truevals != 0
rl
end
end

What you have here is a method that accepts a series of favorable or unfavorable events, represented by Ruby's boolean values true and false. The method returns a Run Length Encoding for these events, counting the number of truths that occur in a row. In order for this to be possible for any size stream, a trailing false must be appended after each each series of truths.

Here's a sample usage, so you can see what I mean here:

>> SecretAgent00111CommunicationGizmo.rle(
?> [true] * 10 + # 10 favorable events
?> [false] + # the trailing false
?> [false] + # 0 favorable events and trailing false
?> [true] * 10 + # 10 more favorable events
?> [false] # the trailing false
>> )
=> [10, 0, 10]

This method wasn't used in the gizmo's actual solution, though it helped us make sense of the process. Boriz claimed it was part of some "Test Driven Development (TDD)" process. Since none of our programmers are familiar with the concept, we assume he made it up.

This leads us to the closely related unencode method:

ruby
module SecretAgent00111CommunicationGizmo
def self.unrle rl
events = []
rl.each {|n| events += [true]*n + [false]}
events
end
end

Here we are examining the reverse operation, which unencodes the counts and adds back the trailing false markers.

The gizmo's actual encoding process was driven by the following method:

ruby
require 'stringio'

module SecretAgent00111CommunicationGizmo
def self.encode events, exponent
io = StringIO.new
e = Encoder.new(exponent, io)
events.each do |result|
e << result
end
e.finish
io.string
end
end

Here again we have the events and you can see that we also receive an exponent, which we will discuss in a bit. The events are fed to a custom Encoder object which encodes them and writes them to the specified IO parameter. A StringIO object is used as a stand-in IO here to collect the results in a Ruby String.

The Encoder is the source of most of the magic. Here is that code:

ruby
module SecretAgent00111CommunicationGizmo
class Encoder
def initialize exponent, io
@exponent = exponent
@io = io
@events = []
@bits = ''
@io << exponent.chr
end

def << result
@events << result
if result == false
encode_events
end
end

def encode_events
n = @events.size - 1
@events = []
@bits << "1" * (n / 2**@exponent)
@bits << "%0#{@exponent+1}B" % (n % 2**@exponent)
flush_bytes
end

def flush_bytes
# write every 8 bits to the stream:
@bits.gsub!(/[01]{8}/) do |byte|
@io << byte.to_i(2).chr
''
end
end

def finish
@events << false
encode_events
# fill up remaining byte with "1"s:
rest = @bits.size % 8
if rest != 0
@bits << "1" * (8 - rest)
end
flush_bytes
end
end
end

The Encoder begins by setting up variables to hold the exponent, IO stream, events, and a bit encoding. You can see that it sends the exponent through the stream right away. That's required by the encoding, which works out to be this:

1. A byte representing the exponent used to encode events
2. One or more encoded numbers, each consisting of the following:
a. A group of one bits number / 2 ** exponent in length
b. A zero bit
c. Exponent bits representing number % 2 ** exponent
3. Any one bits needed to pad the stream length to a multiple of eight

That list, really defines the rest of the methods we are looking at here. First, <<() is used to add events and it always triggers the call to encode_events() when a false is encountered. That method turns the event list into the bits described by 2a, 2b, and 2c. encode_events() then triggers a call to flush_bytes(), which sends those bits to the stream whenever we have at least eight (so they can be converted into binary byte data). Finally, finish() handles step 3 by adding the extra one bits needed.

Let's look at how that turns out when received at the other end. Again, we have a simple method driving the process:

ruby
require 'stringio'

module SecretAgent00111CommunicationGizmo
def self.decode data
events.pop # remove last false
events
end
end

This method wraps the stream in a custom Decoder object and reads the encoded events. The last false is removed, because it's just the marker that came with the final series of true events, and the event list is returned.

The Decoder is the final piece of the puzzle:

ruby
module SecretAgent00111CommunicationGizmo
class ExponentUndefined < Exception
end

class Decoder
def bits(string)
string.unpack("B*")[0]
end

def initialize io
@io = io
@exponent = nil
@bits = ''
@t = @n = nil
end

def exponent
return @exponent if @exponent
e = @io.getc
if e
@exponent = e
else
raise ExponentUndefined.new
end
@exponent
end

n = @n
n += @t * 2**exponent if @t
@t = @n = nil
@events += SecretAgent00111CommunicationGizmo.unrle([n])
end

def decode_bits
loop do
found = false
if @bits =~ /^(1+)0/
@t = \$1.size
@bits.sub!(/^(1+)/, '')
found = true
end
re = Regexp.new("^0([01]{#{exponent}})")
if @bits =~ re
@n = \$1.to_i(2)
@bits.sub!(re, '')
found = true
end
return unless found
end
end

@events = []
begin
exponent
@bits += bits(data)
decode_bits
rescue ExponentUndefined
# exponent not available yet in stream, try again later
end
@events
end
end
end

Start with initialize(), which is a setup very similar to Encoder. The @t and @n variables are used to track the sequence of true events found and the encoded number itself. Next skip to read() which is the primary work method. It begins by using exponent() to find the magic number, if present. If the exponent isn't yet available, read() returns zero events and the code can try the stream again later. Next, bits() is a trivial wrapper to unpack all of the bytes (most significant first) into a String of ones and zeros.

The real work is done in decode_bits(). This method loops over the bits looking first for the run of one bits and then for the exponent length remainder. When both of those are located, a call to add_events() rebuilds the original encoded number which unrle() reverts to a series of events.

Summary

As you can see, 00111 survived thanks the the cunning of the hacker Boris Prinz. Had a less skilled coder been involved we might be inducting 01000 right now. Please send a care package to Boris's team: Daniel Martin, Karl Czisch, Kero, and Pierre Barbier de Reuille. We own them all our gratitude.

Don't let 00111 get too comfortable. We'll be sending him back out on assignment tomorrow morning, this time into the villainous Lisp domains to hijack their secrets...