Code Cleaning (#26)

I'm always very vocal about how Ruby Quiz isn't interested in golf and obfuscation. It's my own private fight for clean code.

To be fair though, you can really learn a lot from practices like golf and obfuscation. They'll teach you a surprising number of details about the inner workings of your language of choice. Still, principles are principles, and if I bend, word will quickly get out that I've given up the fight. Can't allow that!

Here's my compromise.

This week's challenge is to utterly clean some famous examples of compressed Ruby code. Refactor the code until it's as readable as possible, whatever that means to you.

For those that faint at the sight of dense code, I offer an "easier" challenge. Try this code by Mauricio Fernández:

ruby
#!/usr/bin/ruby -rcgi
H,B=%w'HomePage w7.cgi?n=%s';c=CGI.new'html4';n,d=c['n']!=''?c['n']:H,c['d'];t=`
cat #{n}`;d!=''&&`echo #{t=CGI.escapeHTML(d)} >#{n}`;c.instance_eval{out{h1{n}+
a(B%H){H}+pre{t.gsub(/([A-Z]\w+){2}/){a(B%$&){$&}}}+form("get"){textarea('d'){t
}+hidden('n',n)+submit}}}

If you prefer a "trickier" challenge, I offer this famous code from Florian Gross:

ruby
#!/usr/bin/ruby
# Server: ruby p2p.rb password server public-uri private-uri merge-servers
# Sample: ruby p2p.rb foobar server druby://123.123.123.123:1337
# druby://:1337 druby://foo.bar:1337
# Client: ruby p2p.rb password client server-uri download-pattern [list-only]
# Sample: ruby p2p.rb foobar client druby://localhost:1337 *.rb
################################################################################
# You are not allowed to use this application for anything illegal unless you
# live inside a sane place. Insane places currently include California (see
# link) and might soon include the complete USA. People using this software are
# responsible for themselves. I can't prevent them from doing illegal stuff for
# obvious reasons. So have fun and do whatever you can get away with for now.
#
# http://info.sen.ca.gov/pub/bill/sen/sb_0051-0100/sb_96_bill_20050114_introduced.html
################################################################################
require'drb';F=File;def c(u)DRbObject.new((),u)end;def x(u)[P,u].hash;end;def s(
p)F.basename p[/[^|]+/]end;P,M,U,V,*O=$*;M["s"]?(DRb.start_service V,Class.new{
def p(z=O)O.push(*z).uniq;end;new.methods.map{|m|m[/_[_t]/]||private(m)};def y;(
p(U)+p).map{|u|u!=U&&c(u).f(x(u),p(U))};self;end;def f(c,a=O,t=2)x(U)==c&&t<1?
Dir[s(a)]:t<2?[*open(s(a),"rb")]:p(a)end}.new.y;sleep):c(U).f(x(U)).map{|n|c(n).
f(x(n),V,0).map{|f|s f}.map{|f|O[0]?p(f):open(f,"wb")<<c(n).f(x(n),f,1)}}

This is a little different from the traditional Ruby Quiz, but I encourage all to play and learn. I promise to go back to normal challenges next week...


Quiz Summary

No takers for this idea, eh? We seem to like uglying up code better than cleaning it! Well, I'm a firm believer in eating my own dog food, so...

Solving this quiz isn't really about the end result. It's more the process involved. Here's a stroll through my process for the first script.

Timothy Byrd asked the right first question on Ruby Talk, which basically amounts to, "What does this sucker do?" The programs used are semi-famous and if you follow Redhanded, you probably already know:

Batsman's 5-Line Wiki

If you didn't, the -rcgi in the first line is a really big hint. -r is the command-line shortcut for requiring a library, in this case cgi. From there, it's pretty easy to assume that the script is a CGI script, and that told me I needed to get it behind a Web server to play with it.

I could have put it behind Apache and worked with it that way, but I chose to use Ruby's standard WEBrick server instead. I'm glad I did too, because I ran into a few issues while getting it running that were super easy to see by watching WEBrick's responses in my terminal. Here's the WEBrick script I wrote to serve it up:

ruby
#!/usr/local/bin/ruby

require "webrick"

server = WEBrick::HTTPServer.new( :Port         => 8080,
                                  :DocumentRoot => "cgi-bin" )

['INT', 'TERM'].each do |signal|
  trap(signal) { server.shutdown }
end
server.start

That's super basic WEBrick in action. Pull in the library, initialize a server with a port and document directory, set signal handlers for shutting down, and start it up. This server can handle HTML, ERb templates, and, most importantly here, CGI. Perfect.

I created the referenced "cgi-bin" directory right next to my server.rb script and dropped in a file with the code to test, named "wiki.cgi". I browsed over to http://localhost:8080/wiki.cgi and got to watch all my clever work go up in flames. Luckily, bug hunting was pretty easy by watching WEBrick's output:

ERROR CGIHandler: /Users/james/Desktop/cgi-bin/wiki.cgi:
/usr/lib/ruby/1.6/cgi.rb:259:in `escapeHTML': private method `gsub'
called for []:Array (NameError) from /Users/james/Desktop/cgi-bin/wiki.cgi:3

Okay, the error message isn't perfect, but it did get me thinking. Wasn't CGI's handling of parameters changed somewhere around Ruby 1.8? A quick test with the constant RUBY_VERSION did show that I was running an old version of Ruby (1.6.8). Changing the shebang line got me back in business:

ruby
#!/usr/local/bin/ruby -rcgi

And I was greeted by a Wiki HomePage. Nifty. Now that I had it running, I felt like I could start dealing with the code and see what it was doing.
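The RUBY_VERSION check mentioned above can be sketched roughly like this. The helper names and the 1.8.0 threshold are my own assumptions for illustration; the only thing taken from the story is that the constant RUBY_VERSION reports which interpreter is running.

```ruby
# MINIMUM is an assumed threshold, chosen only for this illustration.
MINIMUM = "1.8.0"

# Hypothetical helper: turn "1.6.8" into [1, 6, 8] so versions compare
# numerically instead of as strings.
def version_parts(version)
  version.split(".").map { |part| part.to_i }
end

# Hypothetical helper: true when `current` is older than `minimum`.
def old_ruby?(current = RUBY_VERSION, minimum = MINIMUM)
  (version_parts(current) <=> version_parts(minimum)) < 0
end

puts "Running Ruby #{RUBY_VERSION}"
puts "This interpreter predates #{MINIMUM}" if old_ruby?
```

Comparing arrays of integers avoids the trap of comparing version strings character by character.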

The first thing I like to do with any code I can't read is to inject a lot of whitespace. It helps me identify the sections of code. A cool trick to get started with this in golfed/obfuscated Ruby code is a global find and replace of ";" with "\n". Then season with space, tab and return to taste. Here's my spaced-out version:

ruby
#!/usr/local/bin/ruby -rcgi

H, B = %w'HomePage w7.cgi?n=%s'

c = CGI.new 'html4'

n, d = c['n'] != '' ? c['n'] : H, c['d']

t = `cat #{n}`

d != '' && `echo #{t = CGI.escapeHTML(d)} > #{n}`

c.instance_eval {
  out {
    h1 { n } +
    a(B % H) { H } +
    pre { t.gsub(/([A-Z]\w+){2}/) { a(B % $&) { $& } } } +
    form("get") {
      textarea('d') { t } +
      hidden('n', n) +
      submit
    }
  }
}

Now we're getting somewhere. I can see what's going on. This silly little change opened my eyes to another problem immediately. Take a look at that second line:

ruby
H, B = %w'HomePage w7.cgi?n=%s'

I now know what the original script was called: "w7.cgi". (The seventh Wiki? Batsman is an animal!) I modified the line to play nice with my version:

ruby
H, B = %w'HomePage wiki.cgi?n=%s'
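If you'd rather not reach for your editor, the find-and-replace trick from earlier can be sketched in Ruby itself. The method name split_statements is hypothetical, and this is a deliberately naive pass: it doesn't parse Ruby, so a ";" inside a string or regex literal would get split too.

```ruby
# Naive first pass at spacing out golfed code: put every
# semicolon-separated statement on its own line.
# Assumption: no ";" appears inside string or regex literals.
def split_statements(source)
  source.gsub(";", "\n")
end

golfed = "H,B=%w'HomePage w7.cgi?n=%s';c=CGI.new'html4';d=c['d']"
puts split_statements(golfed)
```

The rest of the whitespace still has to be added by hand, which is where most of the understanding comes from anyway.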

On to the next step. Let's clean up some of the language constructs used here. We can spell out -rcgi, make those assignments slightly more obvious, eliminate the ternary operator, clarify the use of the && operator, remove the dependency on the ugly $& variable, and swap a few { ... } pairs to do ... end pairs. I thought about removing the instance_eval() call, but to be honest I like that better than typing "c." 10 times. Let's see how the code looks now:

ruby
#!/usr/local/bin/ruby

require 'cgi'

H = 'HomePage'
B = 'wiki.cgi?n=%s'

c = CGI.new 'html4'

n = if c['n'] == '' then H else c['n'] end
d = c['d']

t = `cat #{n}`

`echo #{t = CGI.escapeHTML(d)} > #{n}` unless d == ''

c.instance_eval do
  out do
    h1 { n } +
    a(B % H) { H } +
    pre do
      t.gsub(/([A-Z]\w+){2}/) { |match| a(B % match) { match } }
    end +
    form("get") do
      textarea('d') { t } +
      hidden('n', n) +
      submit
    end
  end
end
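The $& change is easy to try in isolation. Here's a small sketch, using the same regular expression but a made-up sample string and bracket markup of my own, showing that the block parameter carries the same text as the global:

```ruby
WIKI_WORD = /([A-Z]\w+){2}/
text      = "See HomePage and FooBar for details."  # sample input, my own

# Golfed style: read the matched text from the global $& inside the block.
with_global = text.gsub(WIKI_WORD) { "[#{$&}]" }

# Cleaner style: gsub hands the matched text to the block as a parameter.
with_param = text.gsub(WIKI_WORD) { |match| "[#{match}]" }

puts with_param
```

Both calls produce the same string; the second just says where the value comes from.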

The whole time I'm working with this code, I'm running it in my WEBrick server, checking my changes and learning more about how it functions. One thing I'm noticing is an occasional error message from cat:

cat: HomePage: No such file or directory

Sometimes it's being called on files that don't exist, probably before we add content to a given Wiki page. It still works (returning no content), but we can silence the warning. In fact, we should just remove the external dependencies altogether, making the code more portable in the process:

ruby
#!/usr/local/bin/ruby

require 'cgi'

H = 'HomePage'
B = 'wiki.cgi?n=%s'

c = CGI.new 'html4'

n = if c['n'] == '' then H else c['n'] end
d = c['d']

t = File.read(n) rescue t = ''

unless d == ''
  t = CGI.escapeHTML(d)
  File.open(n, "w") { |f| f.write t }
end

c.instance_eval do
  out do
    h1 { n } +
    a(B % H) { H } +
    pre do
      t.gsub(/([A-Z]\w+){2}/) { |match| a(B % match) { match } }
    end +
    form("get") do
      textarea('d') { t } +
      hidden('n', n) +
      submit
    end
  end
end
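To poke at the file handling away from the Web server, something like the following sketch works. The read_page and write_page helpers are hypothetical names of mine, and unlike the rescue modifier used in the script (which swallows any StandardError), this version only treats a missing file as an empty page.

```ruby
require "tmpdir"

# Hypothetical helper: a page's text, or "" when the file doesn't exist yet.
def read_page(path)
  File.read(path)
rescue Errno::ENOENT
  ""
end

# Hypothetical helper: overwrite the page file with new text.
def write_page(path, text)
  File.open(path, "w") { |file| file.write(text) }
end

# Work in a throwaway directory so no real files are touched.
Dir.mktmpdir do |dir|
  page = File.join(dir, "HomePage")
  puts read_page(page).inspect            # the page doesn't exist yet
  write_page(page, "Welcome to the wiki")
  puts read_page(page)
end
```

Rescuing just Errno::ENOENT is a design choice on my part: a permissions problem, for example, would still surface as an error instead of quietly looking like a blank page.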

At this point, I understand the code well enough to extend the variable names and add some comments, which should make its function pretty clear to others:

ruby
#!/usr/local/bin/ruby

# wiki.cgi

require 'cgi'

HOME = 'HomePage'
LINK = 'wiki.cgi?name=%s'

query = CGI.new 'html4'

# fetch query data
page_name = if query['name'] == '' then HOME else query['name'] end
page_changes = query['changes']

# fetch file content for this page, unless it's a new page
content = File.read(page_name) rescue content = ''

# save page changes, if needed
unless page_changes == ''
  content = CGI.escapeHTML(page_changes)
  File.open(page_name, 'w') { |f| f.write content }
end

# output requested page
query.instance_eval do
  out do
    h1 { page_name } +
    a(LINK % HOME) { HOME } +
    pre do                                    # content area
      content.gsub(/([A-Z]\w+){2}/) do |match|
        a(LINK % match) { match }
      end
    end +
    form('get') do                            # update form
      textarea('changes') { content } +
      hidden('name', page_name) +
      submit
    end
  end
end

That's probably as far as I would take that code, without trying to make any fundamental changes. The functionality is still pretty much the same (including limitations!), but it's much easier to follow how the code works.
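The linking logic is the heart of the script, so here's a rough sketch of it pulled out on its own. The anchor method is a hypothetical stand-in for CGI's a() helper (its markup is not CGI's exact output), and link_wiki_words is a name I made up; HOME, LINK and the regular expression come from the script.

```ruby
HOME      = "HomePage"
LINK      = "wiki.cgi?name=%s"
WIKI_WORD = /([A-Z]\w+){2}/

# Hypothetical stand-in for the CGI library's a(href) { text } helper.
def anchor(href, text)
  %(<a href="#{href}">#{text}</a>)
end

# Hypothetical helper: wrap every wiki word in a link to its own page.
# LINK % match fills the %s placeholder with the matched page name.
def link_wiki_words(content)
  content.gsub(WIKI_WORD) { |match| anchor(LINK % match, match) }
end

puts LINK % HOME
puts link_wiki_words("Start at HomePage.")
```

The String#% call is what turns the LINK template into a real URL for each page name.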

I used pretty much the same process to decrypt Florian's code, so I won't bore you with a repeat. One additional tip that did help me through the complex renamings is worth mentioning here though. When you need to rename a much-used method or variable, just do it and try to run the code. The resulting errors will often give you the exact line numbers that need updating.
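Here's a minimal sketch of that renaming tip. The names f and fetch_listing are made up for the example; the point is only that the error raised by a stale call site carries both the old name and the line it was called from.

```ruby
# Pretend this method used to be called f and has just been renamed.
# Both names are hypothetical, chosen only for this illustration.
def fetch_listing(pattern)
  "listing for #{pattern}"
end

begin
  f("*.rb")  # a stale call site that still uses the old name
rescue NoMethodError => error
  stale_name = error.name            # the name Ruby could not find
  location   = error.backtrace.first # file and line of the stale call
  puts "undefined: #{stale_name} at #{location}"
end
```

Normally you wouldn't rescue anything; you'd just run the script, read the error, fix that line and repeat until it runs clean.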

One more interesting tidbit. When I entered the International Obfuscated Ruby Code Contest, I used pretty much the opposite approach. I wrote a clean version of my Ruby Quiz Loader, save that I tried to be a little more terse than usual. Once I had that working, I just kept beating on it with the ugly stick until I couldn't read it any more. For the curious, here's the original script:

ruby
#!/usr/bin/env ruby

require "open-uri"

puts "\nLoading...\n\n"

u, m, a = "http://www.rubyquiz.com/",
          { "nbsp" => " ", "lt" => :<, "gt" => :>, "amp" => :& },
          [[/^\s+<\/div>.+/m, ""], [/^\s+/, ""], [/\n/, "\n\n"],
           [/<br \/>/, "\n"], [/<hr \/>/, "-=" * 40], [/<[^>]+>/, ""],
           [/^ruby/, ""], [/\n{3,}/, "\n\n"]]

open(u) { |w|
  $F = w.read.scan(/li>.+?"([^"]+)..([^<]+)/)
}

puts "Ruby Quiz\n\n"
$F.each { |e|
  i = e[0][/\d+/]
  s = "%2s. %s" % [i, e[1]]
  i.to_i % 2 == 0 ? puts(s) : print("%-38s " % s)
}

print "\n? "
n = gets.chomp.to_i

puts "\nLoading...\n\n"

open(u + $F[n-1][0]) { |q|
  $_ = q.read[/^\s+<span.+/m]
  a.each { |(s, r)| gsub!(s, r) }
  gsub!(/&(\w+);/) { |e| m.key?($1) ? m[$1] : e }
  while $_ =~ /([^\n]{81,})/
    s = $1.dup
    r = $1.dup
    r[r.rindex(" ", 80), 1] = "\n"
    r.sub!(/\n[ \t]+/, "\n")
    sub!(/#{Regexp.escape(s)}/, r)
  end
}

while sub!(/^(?:[^\n]*\n){20}/, "")
  puts "#$&\n--MORE--"
  g = $_
  gets
  exit if $_[0] == ?q
  $_ = g
end
puts $_

That became:

Ruby Quiz Loader

Tomorrow, we're back to normal Ruby Quiz material, this time from Jason Bailey...