Hangman (#130)

by Brian Candler

Most people are probably familiar with the game of Hangman. The first player picks a word or phrase, and the second player has to guess it a letter at a time. If they make six wrong guesses (i.e. the target word does not contain the guessed letter), they lose. If they guess the entire word before then, they win.

This quiz is to make a Hangman guessing player in Ruby. Play should proceed as follows:

1. The program requests a word or phrase pattern, e.g. "-------".

2. The program suggests a letter, or may guess the entire word or phrase.

3. The user indicates which letter positions, if any, match that letter.
If none match, a life is lost. If six (or configurable) lives are lost,
the program loses.

The specification is otherwise open-ended to allow you to focus on whatever part of the problem interests you. For example:

* You may choose the form of user interface (e.g. terminal, GUI toolkit,
web).

* You can just show the number of wrong guesses made, or you can actually
draw the hangman.

* You may concentrate on improving the play, for example by using a
dictionary to improve the guesses made at each stage. A suitable file
is /usr/share/dict/words on many Linux systems.

* A dynamic solution could start with an empty dictionary, and guess the
answer by chance. If it fails, it would prompt the user for the word
or phrase they were thinking of. It would add new words or phrases to
its dictionary so as to become a better player over time.

* You could investigate ways of precomputing a hangman decision tree,
optimizing it for the minimum number of wrong guesses along each branch.
The aim is to produce an unbeatable guesser for a given dictionary.

* You may wish to consider how best to decouple the UI from the guessing
logic, to enable different UI's to work with the same guessing engine,
or vice versa.


Quiz Summary

People wrote quite a bit of code for this quiz. I'm guessing that's because it can be challenging to see how your code is doing and to refine that.

I'm going to show my own solution this time, not because it's any better than the others, but because I can talk you through my thinking in building it.

The first thing in my code is a tiny library that handles dictionary loading for the guessing script and the test script. I extracted this common functionality when I built my testing script:

ruby
#!/usr/bin/env ruby -wKU

WORDS_CASH_FILE = "words.cache"

if File.exist? WORDS_CASH_FILE
WORDS = File.open(WORDS_CASH_FILE) { |file| Marshal.load(file) }
else
WORDS = File.open( ARGV.find { |arg| arg =~ /\A[^-]/ } ||
"/usr/share/dict/words" ) do |dict|
dict.inject(Hash.new) do |all, word|
all.update(word.delete("^A-Za-z").downcase => true)
end.keys.sort_by { |w| [w.length, w] }
end
File.open(WORDS_CASH_FILE, "w") { |file| Marshal.dump(WORDS, file) }
end

There's nothing too fancy here. Originally I just reread the dictionary every time, but that slowed down testing too much and I found caching it with Marshal helped. I also started sorting the dictionary, eventually, to make it easier to see where my testing was, "I've tried everything beneath four letters and I'm in the K's on those."

On to the guessing script. Here's how that begins:

ruby
#!/usr/bin/env ruby -wKU

puts "One moment..."
puts
require "words"

def frequency(words)
freq = Hash.new(0)
words.each do |word|
word.split("").each { |letter| freq[letter] += word.count(letter) }
end
freq
end
FREQ = frequency(WORDS).sort_by { |_, count| -count }.
map { |letter, _| letter }

choices = WORDS
guesses = Array.new

# ...

This code prints a warning that loading the dictionary might take a moment, then requires the mini-library I just showed. Once loaded, it defines a method for counting letter frequencies and runs it on the dictionary.

After that it initializes some state variables for the guessing. The script decreases the word list as it works, so that is copied into a variable to reflect that it will change. We then setup a way to keep track of our guesses.

The next bit of code begins the main event loop:

ruby
# ...

loop do
puts guesses.empty? ?
"Please enter a word pattern (_ _ _ _ for example):" :
"Please update your pattern according to my guess " +
"(_ i _ _ for example):"
$stdout.flush
pattern = $stdin.gets.to_s.delete("^A-Za-z_")

# ...

This code just asks the user for a pattern and retrieves what they give us. The call to flush() is to support the test script, which we will see in a bit.

ruby
# ...

bad_guesses = guesses - pattern.delete("_").split("")
if bad_guesses.size > 5 and pattern.include? "_"
puts "I'm out of guesses. You win."
elsif not pattern.include? "_"
puts "I guessed your word. Pretty smart, huh?"
else
choices = choices.grep(
bad_guesses.empty? ?
/\A#{pattern.tr("_", ".")}\Z/i :
/\A(?!.*[#{bad_guesses.join}])#{pattern.tr("_", ".")}\Z/i
)
guess = frequency(choices).
reject { |letter, _| guesses.include? letter }.
sort_by { |letter, count| [-count, FREQ.index(letter)] }.
first.first rescue nil

guesses << guess
puts "I guess the letter '#{guess}'."
puts
next
end

# ...

This is the main body of this guessing script. The first two branches just check for win and loss conditions, but the else branch is where the logic is.

Before a selection is made, the dictionary is reduced by all words that couldn't possibly match. After that, a frequency count is made for all remaining possibilities. Those possible letters are then sorted by that count and how common they are in this dictionary. The letter that bubbles to the top is our choice.

This selection process works decently on larger words. It really starts to show decent success rates at words four letters and longer. Sadly, it fairs poorly with smaller words.

Once chosen a guess is printed for the user and the loop cycles to the next iteration.

When we fall through the above code we have reached an end condition. The following code sorts that out:

ruby
# ...

puts
if ARGV.include? "--loop"
choices = WORDS
guesses = Array.new
else
break
end
end

Here I watch for a special command-line switch that puts the program into a continuous loop for testing. This avoids reloading the dictionary and thus is faster.

I landed on my word choice strategy mainly by trial and error. That was made possible by being able to run the algorithm over some words and watching how accurately it could pick them. I did that by writing a simple script that pretend to be me. It runs the game, feeds it words, and keeps score of its progress. Here's the start of that code:

ruby
#!/usr/bin/env ruby -wKU

require "words"

results = Hash.new(0)
at_exit do
results[:total] = results[:right] + results[:wrong]
puts
puts "Words: #{results[:total]}"
puts "Guessed: #{results[:right]}"
puts "Missed: #{results[:wrong]}"
printf "Accuracy: %.2f%%\n",
results[:right] / results[:total].to_f * 100
puts
end
trap("INT") { exit }

# ...

Here I load the dictionary using the same code the guesser relies on. Then I build a Hash to hold guess counts and setup result printing for when the program exits. This allows me to cancel test runs at anytime, but still see results thus far. That was nice for spot checking results. Sometimes I could tell right away that a change was better or worse.

Here's the actual testing code:

ruby
# ...

IO.popen( File.join(File.dirname(__FILE__), "hangman.rb --loop"),
"r+" ) do |hangman|
WORDS.each do |word|
pattern = word.tr("a-z", "_")
loop do
input = String.new
hangman.each do |line|
input << line
break if input =~ /^(?:I'm out|I guessed)|:\Z/
end

if input =~ /^I'm out/
puts "It missed '#{word}'."
results[:wrong] += 1
break
elsif input =~ /^I guessed/
puts "It guessed '#{word}'."
results[:right] += 1
break
elsif input =~ /^I guess the letter '(.)'/
guess = $1
word.split("").each_with_index do |letter, i|
pattern[i, 1] = letter if letter == guess
end
end

hangman.puts pattern
end
end
end

This code just runs the guessing script just as a human would do. It feeds the script words and watches the responses to see what happens. You can think of this process as a poor man's expect.

Many of the other solutions achieved better results than my own, so be sure to look through them. Some were quite long but most had decent documentation about their process.

My thanks to all who found the time to dig into this problem.

Tomorrow we have a classic computer science exercise and it's an easy one, so warm up your impress-people-with-a-concise-solution skills...