1-800-THE-QUIZ (#20)

Many companies like to list their phone numbers using the letters printed on most telephones. This makes the number easier to remember for customers. A famous example being 1-800-PICK-UPS.

This week's quiz is to write a program that will show a user possible matches for a list of provided phone numbers.

Your script should behave as a standard Unix filter, reading from files specified as command-line arguments or STDIN when no files are given. Each line of these files will contain a single phone number.

For each phone number read, your filter should output all possible word replacements from a dictionary. Your script should try to replace every digit of the provided phone number with a letter from a dictionary word; however, if no match can be made, a single digit can be left as is at that point. No two consecutive digits can remain unchanged and the program should skip over a number (producing no output) if a match cannot be made.

Your script should allow the user to set a dictionary with the -d command-line option, but it's fine to use a reasonable default for your system. The dictionary is expected to have one word per line.

All punctuation and whitespace should be ignored in both phone numbers and the dictionary file. The program should not be case sensative, letting "a" == "A". Output should be capital letters and digits separated at word boundaries with a single dash (-), one possible word encoding per line. For example, if your program is fed the number:

873.7829

One possible line of output is

USE-RUBY

According to my dictionary.

The number encoding on my phone is:

2 = A B C
3 = D E F
4 = G H I
5 = J K L
6 = M N O
7 = P Q R S
8 = T U V
9 = W X Y Z

Feel free to use that, or the encoding on your own phone.


Quiz Summary

Here's an interesting quote from Brian Schroeder's solution page:

To make it more difficult JEGII introduced the possibility to skip letters,
but in my opinion this leads to bad results (Though it made me think, so it
was a good idea).

I'm not sure exactly what qualifies as "bad results," but here's a thought I had while reading this:

CAR-4-YOU

That's not the reason I added it though. As usual, the real truth is far less interesting. It was pointed out to me, by Tobias Peters, that this problem originated as a means to compare various programming languages. I examined the original description of the problem, forwarded to me by Tobias, and tried to stay close to the original challenge. Allowing single numbers comes from there.

One last interesting quote from Brian's page:

-e, --encoding ENCODING
How the alphabet is encoded to phonenumbers. james or logic are supported.

Does this suggest that "james" and "logic" are opposites? Food for thought...

On to the solutions.

I'll show Brian's solution below, because I ran into some minor issues with the other two. However, do take a look at Jannis Harder's WEBrick servlet. It's very little work to offer this service to the whole world through a Web interface and that program shows it off nicely. Lee Marlow's solution was also short and very straight forward, though the running time was a little high.

Let's inspect Brian's code:

ruby
# Nodes in the Dictionary.
class DictionaryNode < Array
# Terminal info
attr_reader :words

def initialize
super(10)
@words = []
end
end

This first piece of the puzzle is a node to be used by a tree class we'll meet shortly. DictionaryNode is an Array that contains exactly 10 members. Why 10? Because that's how many digits our encoding has. DictionaryNode also contains an Array of words, that will be filled from the dictionary file.

Here's the start of Brian's tree class:

ruby
# A tree-indexed version of the dictionary that allows
# efficent searching by number 2 alphabet mapping.
class Dictionary
def initialize(encoding)
super()
@encoding = {}
@inverse_encoding = {}

encoding.each do | k, v |
@encoding[k] = v.split(/\s+/).map{|c| c[0]}
end

# Create map from characters to numbers
@inverse_encoding = @encoding.inject({}) { | r, (k, v) |
v.each do | l | r[l] = k end
r
}
@root = DictionaryNode.new
end

# ...

That's pretty easy to follow. This setup work creates maps for the encoding of this Dictionary object. The maps go both ways, numbers to letters and the inverse letters to numbers. Finally, the root of the tree is created from a new DictionaryNode.

The following methods add words to the tree from a dictionary file:

ruby
# ...

# Helper method for rekursive adding of words to the dictionary
private
def add_recursive(node, word, position)
if word.length == position
node.words << word
return node
end
add_recursive(
node[@inverse_encoding[word[position]]] ||= DictionaryNode.new,
word,
position + 1 )
end

# Add words to the dictionary
public
def add(word)
add_recursive(@root, word, 0)
self
end

# Load a wordlist from a file, which contains one word per line.
# Ignores punctuation and whitespace.
def load_wordlist(file, options)
$stderr.print "Loading dictionary... " if options.verbose
start = Time.new
file.read.gsub(/[^A-Za-z\n]/, '').upcase!.split($/).uniq!.each do |w|
w.chomp!
next if w.empty? or w.length <= options.min_length
self.add(w)
end
if options.verbose
$stderr.puts "built dictionary in %f seconds" %
(Time.new-start).to_f
end
self
end

# ...

The dictionary reading process starts in the last method, load_wordlist(). This method slurps the file, discards illegal characters, normalizes case, breaks up the list on line boundaries, and eliminates repetition all on the third line. Each member of the list of words that creates is then sent on to the add() method. Note the nice use of options.verbose to show a build time.

Before I move on to add(), let me point out that the above method does have a few minor issues. When I was playing with this code to figure out how it works, I fed it a five word dictionary and was surprised when it crashed. The cause? No duplicates. That causes uniq!() to return nil (a hot topic on Ruby Talk lately) and since nil doesn't support an each() call, the code blew up. upcase!() has similar problems.

One more minor issue. Here's a tip: When you normalize case, it's generally better to go down than up. The reason is international support. Some languages distinguish between things like uppercase and titlecase. That means that a bunch of uppercase conversions might not be consistent, based on certain local settings. The best way to avoid such problems is to lowercase content instead. This isn't much of a problem, but it's a good habit to build.

Back to the code. add(), as you can see, is just a shell over add_recursive(). It passes the word on with the root node and a starting position of 0.

add_recursive() is pretty clever. It digs down into the tree until finding the right spot to place the word in the Dictionary. This digging happens at the end of the method with recursive calls. The current letter in the word is examined and a branch of the tree is created to handle that encoded letter, if it didn't already exist. The algorithm then moves to that node, examining the next letter in line. When all the letters have been branched off, we're at the right place to insert the word. The if at the beginning of the method handles that end condition.

The last thing a Dictionary object requires is a way to hunt for words. Here are those methods:

ruby
# ...

private
# Search words and return (in the block) words and the unmatched rest
# of the number
def sub_find(node, number, &block)
# Return words found so far
block[node.words.map{|w|w.dup}, number] unless node.words.empty?
# No more digits, so stop searching here
return node if number.empty?
# Search for longer words
sub_find(node[number[0]], number[1..-1], &block) if node[number[0]]
end

private
# Calculate all allowed skip patterns for a number of a given length
def skips(s, length)
return [s] if length == 0
result = skips(s + [false], length-1)
result.concat(skips(s + [true], length-1)) unless s[-1]
result
end

public

def find_noskip(number)
result = []
sub_find(@root, number) do | words, rest_number |
if rest_number.empty?
result.concat(words)
else
find_noskip(rest_number).each do | sentence |
words.each do | w |
result << w + '-' + sentence
end
end
end
end
result
end

# Skipping makes this a bit ugly
def find(number)
result = []
skips([], number.length).each do | skipped |

# Create the injector that can inject the skipped numbers
# back into the word
injector = []
skipped.zip(number).each_with_index do |(s,n), i|
injector << [n.to_s, i] if s
end

# We search for words built from the unskipped digits
unskipped_digits =
number.zip(skipped).select{|(d, s)| !s}.map{|(d,s)|d}
sentences = find_noskip(unskipped_digits)
# Inject the skipped digits back into the found sentences
sentences.each do | s |
injector.each do | (n, i) | s.insert(i, n) end
end

result.concat(sentences)
end
result
end
end

Start with the sub_find() method. It's the key to the search and easy enough to digest. sub_find() takes a node to search, the number to use in that search, and a block to pass results to. The first line passes all matching words from this node to the block, if there are any. The root node, where the algorithm begins, won't have any since most dictionaries don't include 0 length words. The second line finishes the process, if we've examined all the numbers. The third line recurses, moving to the node for the next digit at the head of the number variable. That's half of the picture.

The find_noskip() method is the public face for that. It calls sub_find(), passing a block of code that fills the local results Array as matches are found. When a word matches in the number, find_noskip() recurses looking other words to finish off the number. Of course, as the name implies, this version of the process does not skip digits.

For skipping, you need the find() method. find() first calls skip() to calculate all possible skip patterns for this number. Then, one skip at a time, find() removes the skipped digits and calls find_noskip() on the remainder. After results are generated, the skips are reinserted back into their original locations. That's pretty tricky.

To be clear, this does not function as I intended the quiz to work (and I now understand why Brian thinks I allowed "bad results"). Numbers were only to be allowed at word boundaries, while Brian's algorithm will reinsert them into the middle of words. Looking back, I did not make this very clear in the quiz and it's certainly my error. Brian's code is still a very nice implementation of his interpretation of the rules.

Finally, there's an interface that puts all that code to use. Let's look at that:

ruby
encodings = {
:james => {
2 => 'A B C',
3 => 'D E F',
4 => 'G H I',
5 => 'J K L',
6 => 'M N O',
7 => 'P Q R S',
8 => 'T U V',
9 => 'W X Y Z'},

:logic => {
0 => 'A B',
1 => 'C D',
2 => 'E F',
3 => 'G H',
4 => 'I J K',
5 => 'L M N',
6 => 'O P Q',
7 => 'R S T',
8 => 'U V W',
9 => 'X Y Z'
}
}

require 'optparse'

class PhonewordOptions < OptionParser
attr_reader :dictionary, :encoding, :format, :allow_skips, :help,
:encoding_help, :verbose, :min_length
def initialize
super()
@dictionary = '/usr/share/dict/words'
@encoding = :james
@format = :plain
@allow_skips = true
@help = false
@encoding_help = false
@verbose = false
@ignore_non_alpha = false
@min_length = 1
self.on("-d", "--dictionary DICTIONARY", String) { | v |
@dictionary = v
}
self.on("-e", "--encoding ENCODING", String,
"How the alphabet is encoded to phonenumbers. " +
"james or logic are supported.") { | v |
@encoding = v.downcase.to_sym
}
self.on("-p", "--plain",
'One result per found number, ' +
'no other information. (Default)') { @format = :plain }
self.on("-f", "--full", 'Prefix the result with the number') {
@format = :full
}
self.on("-v", "--verbose", 'Make more noise') { @verbose = true }
self.on("-s", "--skips", "--allow_skips", "--allow-skips",
'Allow to skip one adjacent number while matching. (Default)',
'Gives lots of ugly results, but james asked for it.') {
@allow_skips = true
}
self.on("-c", "--no-skips",
"Don't leave numbers in the detected words") {
@allow_skips = false
}
self.on("-m" "--min-length", "Minimum length of accepted words.",
"Use this to ignore one-letter words that make " +
"the output quite uninteresting.", Integer) { | v |
@min_length = v
}
self.on("-?", "--help") { @help = true }
self.on("--supported-encodings", "--encoding-help",
"List the supported encodings") { @encoding_help = true }
end
end

options = PhonewordOptions.new
options.parse!(ARGV)

if options.help
puts options
exit
end

if options.encoding_help or !encodings[options.encoding]
puts "Possible encodings:"
puts encodings.to_a.sort_by{|(k,v)|k.to_s}.map{|(k,v)|
"#{k}:\n"+v.map{|(n,e)|" #{n}: #{e}"}.sort.join("\n")
}
exit
end

dictionary = Dictionary.new(encodings[options.encoding]).load_wordlist(
File.open(options.dictionary), options
)

output = {
:plain => lambda do | number, sentence | sentence end,
:full => lambda do | number, sentence |
"#{number.ljust(15)}: #{sentence}"
end
}

method = {true => :find, false => :find_noskip }

ARGF.each do | number |
number.strip!
number = number.gsub(/[^0-9]/, '').unpack('C*').map{|n|n - ?0}
$stderr.puts "Searching for #{number}" if options.verbose
dictionary.send(
method[options.allow_skips], number
).each do | sentence |
puts output[options.format][number, sentence]
end
end

Most of that code is option handling. Brian creates his own PhonewordOptions object, which inherits from OptionParser. In the setup for that object, defaults are established and option parsing in defined with several calls to on(). From there, reader methods are provided for all the defined options. This makes for a pretty self-contained bundle of option parsing and reading. You can see the options object put to good use, after the class.

That final block is what actually kicks off the program. Each number is read from ARGF, cleaned up, and passed to the find methods of the dictionary object. Results from that find are printed, creating a complete solution.

My thanks go out to all three quiz workers. It was nice to have a few people playing along again.

Tomorrows quiz will stay with the topic of phones and how we use them...