Scrabble Stems (#12)

by Martin DeMello

In Scrabble parlance, a 'bingo stem' is defined as a set of six letters that combines with a large fraction of the alphabet to anagram to valid seven letter words (a 'bingo' is a move using all seven tiles on your rack). For instance, one of the more prolific stems, SATIRE, combines with twenty letters, A, B, C, D, E, F, G, H, I, K, L, M, N, O, P, R, S, T, V, W (forming words like ASTERIA, BAITERS, RACIEST ...).

Write a program that, given a word list and a cutoff n, finds all 6 letter stems that combine with n or more letters, sorted in order of how many distinct letters they combine with.


Quiz Summary

This quiz wasn't too tough compared to last week's quiz, but I think that's a plus. It's good to have an easy problem now and then.

What makes this problem simple is the usual algorithm tradeoff, we can sacrifice memory (in some cases it's speed, or even both) for an easy to code solution. There are a lot of stems, but we can generate them all and store them with 50 MBs of RAM, which is not too rare these days.

All submitted solutions made this trade. Here's an easy to follow version by Carlos:

ruby
DICT = "/usr/share/dict/words"
CUTOFF = ARGV[0].to_i

STEMS = {}

File.open(DICT) do |f|
f.each do |word|
word.chomp!
next if word.length != 7
word.downcase!
letters = word.split(//).sort!
uniques = letters.uniq
word = letters.join
uniques.each do |letter|
stem = word.sub(letter, "")
(STEMS[stem] ||= {})[letter] = 1
end
end
end

result = STEMS.delete_if { |k,v| v.size < CUTOFF }.
sort_by { |k,v| v.size }.
reverse!.
collect! { |k,v| [k, v.size] }

result.each do |stem, combining| puts "#{stem} #{combining}" end

The code starts by setting up a DICTIONARY, CUTOFF, and a place to store the STEMS we find.

After that, we have a line by line read of the dictionary, which is where most of the work is done. The code inside "each" drops the newline, checks to make sure we only keep seven letter words, and normalizes case. The next step is to split the words into letters and sort them to ease comparisons. Then we generate all the unique letters and remove those from the word one at a time to find all the stems, adding each of those to STEMs.

At this point we've found all the stems in the dictionary for all seven letter words. The next chunk of code removes results below the cutoff we wanted, sorts the results and prepares them for display.

The final line of the program writes out the results, line by line.

Of course, we can't always afford to sacrifice the RAM. And this problem could be solved without as much memory. The trick for that is to generate and verify stems one at a time. This might be needed if the search space was larger.

As always, the solutions were varied and educational. Some highlights:

Glenn Parker's solution focuses on the more general problem
of generating stems, instead of just the Scrabble usage of
finding bingos.

Jannis Harder submitted a solution for finding "nonsense"
stems, not in the dictionary.

Dennis Ranke sent in a tricky solution that uses bit shifting
to track stem counts while it works.

But they're all worth a look, I think.

My thanks the stem solvers and to everyone who is helping Ruby Quiz stay so darn interesting. I could never keep up if I wasn't getting such great help from the community.

No quiz this week, as I'll be spending a little time off with the family enjoying the holidays. I hope everyone else can do the same, holiday or no. The quiz will be back next week to kick off a cryptic new year...