Numbers Can Be Words (#133)

by Morton Goldberg

When working with hexadecimal numbers it is likely that you've noticed some hex numbers are also words. For example, 'bad' and 'face' are both English words and valid hex numbers (2989 and 64206, respectively, in decimal). I got to thinking that it would be interesting to find out how many and which hex numbers were also valid English words. Of course, almost immediately I started to think of generalizations. What about other bases? What about languages other than English?
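You can check the decimal values above with Ruby's String#to_i, which accepts a radix. This is only a quick sketch. Note that to_i quietly stops at the first character that isn't a valid digit, so it cannot tell you on its own whether a word is a number. Integer() with a base is strict and raises instead.

```ruby
# String#to_i(radix) parses the string as a numeral in that base.
p "bad".to_i(16)    # prints 2989
p "face".to_i(16)   # prints 64206

# to_i stops at the first invalid digit: "bog" is read as just "b" (11).
p "bog".to_i(16)    # prints 11

# Integer() with an explicit base rejects the whole string instead.
begin
  Integer("bog", 16)
rescue ArgumentError
  puts "bog is not a base-16 numeral"
end
```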

Your mission is to pick a word list in some language (it will have to be one that uses roman letters) and write Ruby code to filter the list to extract all the words which are valid numbers in a given base. For many bases this isn't an interesting task: for bases 2-10, the filter comes up empty; for bases 11-13, the filter output is uninteresting (IMO); for bases approaching 36, the filter passes almost everything (also uninteresting IMO). However, for bases in the range from 14 to about 22, the results can be interesting and even surprising, especially if one constrains the filter to accept only words of some length.

I used `/usr/share/dict/words` for my word list. Participants who don't have that list on their system or want a different one can go to Kevin's Word List Page (http://wordlist.sourceforge.net/) as a source of other word lists.

Some points you might want to consider: Do you want to omit short words like 'a' and 'ad'? (I made word length a parameter.) Do you want to allow capitalized words? (I prohibited them.) Do you want to restrict the bases allowed? (I didn't.)
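If you prefer a plain method to command-line switches, here is a minimal sketch of the filter with those three choices as parameters. The name number_words and its parameters are hypothetical and not taken from any posted solution. It assumes bases 2 through 36, the range Integer#to_s accepts, and it rejects capitalized words simply because Ruby's digit characters are lowercase.

```ruby
# Hypothetical helper: select the words that are valid numerals in `base`.
# Assumes 2 <= base <= 36, the range Integer#to_s(base) accepts.
def number_words(words, base, min_length = 1)
  # Digit characters for this base, e.g. "0123456789abcd" for base 14.
  digits = (0...base).map { |d| d.to_s(base) }.join
  words.map(&:chomp).select do |word|
    # Uppercase letters are not in `digits`, so capitalized words are rejected.
    word.length >= min_length && word.each_char.all? { |c| digits.include?(c) }
  end
end

p number_words(%w[a ad bad Bad face zoo cafe], 16, 3)  # prints ["bad", "face", "cafe"]
```

Against a real list it might be called as number_words(File.readlines("/usr/share/dict/words"), 16, 4), assuming that file exists on your system.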


Quiz Summary

We all know this problem isn't tough at all. Several others and I solved it with a one-liner. We all enjoy a good one-liner, right?

Actually, I built this solution before the quiz was posted and when I shared it with Morton, he, very politely, mentioned the G-word. The fact is that I wasn't really trying to "golf" though. I was using a strategy of problem solving I call Thinking With The Command-Line. Let me take you through my process to my solution and beyond to show you what I mean.

First, it's important to remember that Ruby has a lot of command-line switches that help with these quick tasks. It's not a sin to use these tools. They make short work of jobs like this because the easy things should be easy. You're not golfing when you use these, you're just telling Ruby this is a quick job and you trust her to handle the details on this one.

Let's start with the basics. Obviously, we want to read over the dictionary and print some words in it. Let's begin with just that much:

$ ruby -pe 1 /usr/share/dict/words

The -p switch asks Ruby to wrap your program in a read and print loop over the files given as arguments (or STDIN). That was really all I wanted, so I just needed a program to be wrapped. That's where -e comes in. It lets you give the code on the command-line, and I provided the most trivial code I could think of. It does nothing, of course. It just gives Ruby something to wrap and lets her do all of the work for me.
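To make the wrapping concrete, here is a rough, hand-written approximation of the loop -p builds around `1`. It is a sketch, not the code Ruby actually generates. The real loop uses Kernel#gets and the $_ variable, while this hypothetical p_loop takes explicit input and output objects (StringIO here) so it can run on its own.

```ruby
require "stringio"

# Approximates `ruby -pe 1`: read a line, run the -e code, print the line.
def p_loop(input, output)
  while (line = input.gets)
    # The -e code would run here; `1` does nothing, so the line is unchanged.
    output.print line   # -p prints the current line after the code runs
  end
end

out = StringIO.new
p_loop(StringIO.new("bad\nface\n"), out)
p out.string   # prints "bad\nface\n"
```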

OK, so I'm now printing the dictionary, but I really want to print just some words of the dictionary. I need to introduce some conditional that only prints when I say it's OK to do so. For that, we move to -p's twin -n and actually resort to writing a little code:

$ ruby -ne 'print if true' /usr/share/dict/words

The -n switch gives us the same loop around our code, just minus the print() statement. This lets me choose when I want to print() something.

The read loops that Ruby creates for us always stick the current line in $_. By default, that's exactly what print() spits out.
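Both facts are easy to check in isolation. The sketch below fakes the read by assigning $_ directly, then calls print with no arguments and captures what it writes in a StringIO. The method name print_last_line is hypothetical.

```ruby
require "stringio"

# Hypothetical demo: a bare `print` writes whatever is in $_.
def print_last_line(line)
  captured = StringIO.new
  original_stdout = $stdout
  $stdout = captured
  $_ = line   # stand-in for the line the -n loop's gets would have read
  print       # no arguments, so this prints $_
  captured.string
ensure
  $stdout = original_stdout
end

p print_last_line("face\n")   # prints "face\n"
```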

Great. That's about half of this task. Now I just need the if condition and I'm done. Before I figure that out though, let's examine one other command-line switch. I want to set a base for the code to use. It's true that I could just drop a number in the code and change it as needed, but it would be better if the number was separate. Ruby has a shortcut switch for that too:

$ ruby -se 'p $base' -- -base=14

The -s switch adds some rudimentary variable processing to switches passed to the program. Note that I said switches passed to the program, not to Ruby. I used the -- switch above to end Ruby's switch processing and switch into the program context. You can then see that the switch just sets a global for us: $base now holds the String "14". That's fine for our purposes.
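Roughly, -s scans the program's arguments until it reaches -- or a non-switch. It turns -name=value into a global holding the String value, and a bare -name into true. The sketch below approximates that with a hypothetical parse_s_switches method that returns a Hash instead of setting globals. It ignores the finer details of Ruby's real implementation. Because the value arrives as a String, the code that follows has to call $base.to_i.

```ruby
# Hypothetical approximation of -s: returns [switches, remaining_args].
def parse_s_switches(args)
  switches = {}
  rest = args.dup
  while (arg = rest.first) && arg.start_with?("-")
    rest.shift
    break if arg == "--"                          # "--" ends switch processing
    name, value = arg[1..-1].split("=", 2)
    switches[name] = value.nil? ? true : value    # -flag => true, -k=v => "v"
  end
  [switches, rest]
end

switches, files = parse_s_switches(["-base=14", "/usr/share/dict/words"])
puts switches["base"].to_i + 1   # prints 15; the value itself is the String "14"
```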

That means all we need is a Regexp that selects the words we want. I came up with:

/\A[\d\s#{("a".."z").to_a.join[0...($base.to_i - 10)]}]+\Z/i

That's really just one big character class describing the accepted characters. The Ruby code inside it creates a String of the alphabet and pulls enough letters off the front of it to match the current base. Note that we also allow for digits and the whitespace that will be at the end of each line.
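Here is the same character class pulled out into a small method so you can inspect it for a given base. The method name base_regexp is hypothetical. It also makes one change from the one-liner: the letter count is clamped at zero. In the one-liner, a base below 10 makes the slice end negative (for base 9, join[0...-1] keeps "a" through "y"), so the filter would accept nearly every word instead of none.

```ruby
# Hypothetical helper: the one-liner's Regexp for a given base.
def base_regexp(base)
  # Clamp at 0 so bases below 10 add no letters (a change from the one-liner).
  letters = ("a".."z").first([base - 10, 0].max).join
  # Digits, whitespace (the trailing newline), and the allowed letters.
  /\A[\d\s#{letters}]+\Z/i
end

p base_regexp(14)               # prints /\A[\d\sabcd]+\Z/i
p "face\n" =~ base_regexp(16)   # prints 0 (a match starting at index 0)
p "zoo\n" =~ base_regexp(16)    # prints nil
```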

If we put all of that together, we pretty much have my solution:

$ ruby -sne \
'print if $_ =~ /\A[\d\s#{("a".."z").to_a.join[0...($base.to_i - 10)]}]+\Z/' \
-- -base=12 /usr/share/dict/words

If you're not fond of the Regexp, we could remove it. That involves two changes:

1. Convert our base into an Array of acceptable characters. We only want
such code to run one time, so we will place it in a BEGIN { ... } block.
2. Bring the characters in as an Array so that we can ease the testing of
letters. Ruby has switches for that too. We can use -a to split()
the line of input and -F to provide the pattern to split() on. We will
also add -l to remove the line ending for us.
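You can try the BEGIN block's work on its own. This sketch shows the Array it builds for -base=14 (remember that $base arrives as a String). It also shows one behavior worth knowing: unlike the String slice in the Regexp version, Array#first raises on a negative count, so this variant fails loudly for bases below 10.

```ruby
base = "14"   # what -s would store in $base for -base=14
hex = ("a".."z").to_a.first(base.to_i - 10)
p hex         # prints ["a", "b", "c", "d"]

# A base below 10 gives a negative count, which Array#first rejects.
begin
  ("a".."z").to_a.first(9 - 10)
rescue ArgumentError => e
  puts "base 9 raises #{e.class}"
end
```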

Here's the code:

$ ruby -slap \
-F'\b|\B' \
-e 'BEGIN { $hex = ("a".."z").to_a.first($base.to_i - 10) }' \
-e 'next unless $F.all? { |l| $hex.include? l.downcase }' \
-- -base=14 /usr/share/dict/words

Note that I snuck in another change in addition to those described. I switched back to -p and just skipped words that aren't numbers.

The trick in this version is that -a causes each line of input to be split() into the variable $F. I also added the -F switch with a pattern that will match between each character to control how the split() works.
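Here is what -a and -F'\b|\B' hand to the code for a single line. Every position in a string is either a word boundary (\b) or not (\B), so the pattern matches between every pair of characters and the split yields one character per element. The sketch hard-codes base 16 for illustration, and its last line shows why -l matters.

```ruby
hex = ("a".."z").to_a.first(16 - 10)   # base 16: ["a", "b", "c", "d", "e", "f"]

fields = "Face".split(/\b|\B/)         # what -a would store in $F
p fields                               # prints ["F", "a", "c", "e"]
p fields.all? { |l| hex.include?(l.downcase) }   # prints true

# Without -l the newline survives as its own field, and it is never a digit.
p "face\n".split(/\b|\B/).all? { |l| hex.include?(l.downcase) }   # prints false
```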

Of course, we could argue about whether this is still a one-liner, since I'm now passing Ruby two lines of code, but I try not to lose a lot of sleep over such things.

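A final weakness of this solution is that it keeps reading the whole file even after no more matches are possible. Assuming the word list is sorted alphabetically, we can stop at the first word whose leading letter isn't one of our allowed letters. We can easily add that if you can tolerate one more line: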

$ ruby -slap \
-F'\b|\B' \
-e 'BEGIN { $hex = ("a".."z").to_a.first($base.to_i - 10) }' \
-e 'break unless $hex.include? $F.first.downcase' \
-e 'next unless $F.all? { |l| $hex.include? l.downcase }' \
-- -base=14 /usr/share/dict/words
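Outside the command line, the same early exit can be written as a take_while in front of the filter. This is a sketch with a hypothetical method name. Like the break above, it assumes the list is sorted alphabetically, and it also assumes the list contains no empty lines.

```ruby
# Hypothetical helper: stop scanning at the first word whose leading letter
# is not an allowed letter for `base`, then keep words made only of those letters.
# Assumes `sorted_words` is sorted alphabetically and has no empty entries.
def early_exit_filter(sorted_words, base)
  hex = ("a".."z").to_a.first(base - 10)
  sorted_words
    .take_while { |w| hex.include?(w[0].downcase) }
    .select { |w| w.each_char.all? { |c| hex.include?(c.downcase) } }
end

p early_exit_filter(%w[abbe bad cab dog face zebra zoo], 16)
# prints ["abbe", "bad", "cab", "face"]
```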

Now might be the right time to consider putting all of this in a file, especially if we wanted to do more with it. I'm done though, so I'll leave that as an exercise for the interested reader.

My thanks to all who showed how easy this can really be.

Tomorrow we're back to simulations and drawing pretty pictures...