Panagrams (#86)

by Darren Kirby

One thing that interests me are word puzzles and language oddities. One such example is the self-documenting panagram. If a panagram is a sentence that uses every letter in the alphabet, then a self-documenting panagram is a sentence that enumerates its own letter count. Simple enough, but what if we state that the letter count must be spelled ie: 'twenty-seven' instead of '27'. Now we have a challenge.

A while back I wrote a script in Python that finds these sentences. Today I rewrote it in Ruby and it found me this sentence:

Darren's ruby panagram program found this sentence which contains exactly
nine 'a's, two 'b's, five 'c's, four 'd's, thirty-five 'e's, nine 'f's,
three 'g's, nine 'h's, sixteen 'i's, one 'j', one 'k', two 'l's, three 'm's,
twenty-seven 'n's, fourteen 'o's, three 'p's, one 'q', fifteen 'r's,
thirty-four 's's, twenty-two 't's, six 'u's, six 'v's, seven 'w's, six 'x's,
seven 'y's, and one 'z'.

My script does have its problems, and I would love to see what kind of code the Ruby experts could come up with to find self-documenting panagrams.

There is a lot more info on self-documenting panagrams at this address:


Quiz Summary

First, let me clear up the naming issue, since I missed it when created the quiz. The actual term for sentences that contain all the letters of the alphabet is pangram (or Swallowsgram) as discussed in the linked article. The quiz name is in error.

Now that we know what to call them, the question becomes how do we generate self documenting pangrams? The linked article described a technique called "Robbinsoning," which is a simple process. The idea is that you start with some random distribution of letter counts, build the sentence, adjust the counts to reflect the actual sentence counts, rebuild, adjust, etc. You can zero in on a solution in this fashion and most of the submitted solutions used something along these lines.

I want to have a look at Danial Martin's code below, but before we get into that you need to know how Daniel's code tracks letter frequencies. Here's Daniel's own description of the technique:

# I represented the letter frequencies of letters in a sentence
# as one huge bignum, such that if "freq" was a variable containing
# the number, then "freq & 0xFF" would be the number of "a"s in the
# sentence, "(freq>>8) & 0xFF" would be the number of "b"s, etc.
# This means that when I adjust a guess, changing the actual frequency
# is as simple as adding and subtracting from a single variable.

Now that we know what to expect, here's the first bit of solution code (reformatted slighty):

def find_sentence(prefix, suffix, initial = {})
letterre ='(?i:[a-z])');
letters = ('a'..'z').to_a
letterpower = {|h,k| h[k] = 1 << ((k[0]-?a)*8)}
lettershifts = {|x| ((x[0]-?a)*8)}
basesentence = prefix + {|x|
(x == 'z'? 'and ' : '') + "_ '#{x}'"
}.join(', ') + suffix
basefreq = 0
basesentence.scan(letterre) {|x| basefreq += letterpower[x.downcase]}

# ...

Obviously, we have a lot of setup work here. Let's take it line by line, because there's a lot going on. This method is invoked with a sentence prefix and suffix, and optionally the initial letter counts to try. When invoked, the first line defines a letter Regexp that will match individual letters in upper or lower case. The next line generates an Array of letters the code can iterate over.

The next two variables are helpers for working with the Bignum frequencies. letterpower will give you the number needed to add one count for the keyed letter to the frequencies and letter shifts is the offset a given letter is shifted into the Bigum. (Note that letterpower calculates its own shift, in the same way lettershifts does.)

The next three lines are easier to swallow. First, the sentence is constructed using the prefix, placeholders for counts, the word "and" as needed, and the sentence suffix. The next two lines then calculate the letter frequencies of this baseline using the Regexp to iterate over the letters and letterpower to adjust the count.

Let's tackle the next chunk of code:

# ...

# enfreq holds the letter counts that spelling out that number adds to
# the sentence.
# E.g. enfreq[1] == letterpower['o'] + letterpower['n'] + letterpower['e']
enfreq = {|h,k|
if k > 255 then
h[k] = h[k >> 8]
h[k] = 0
k.to_en.scan(letterre) {|x| h[k] += letterpower[x.downcase]}
h[k] += letterpower['s'] if k != 1

# ...

This Hash is a typical memoization idiom in Ruby. Give the Hash the code to calculate values from keys which it will invoke on the first call, then all future calls use a simple Hash lookup. The lookup is much faster of course, since it doesn't need to rerun the code to build it.

In this case, the code builds letter counts, to add to the overall frequency counts, for the English word equivalents to the passed number. The else statement is where that happens. The process is basically what we saw for counting the base sentence frequency before. Note that the code accounts for the s needed, should the count be plural.

The to_en() call in this code is provided by Glenn Parker's solution to Ruby Quiz #25. I've discussed that code multiple times now, so I left it out of this summary for brevity.

Time for a little more code (minus a not needed variable removed by me):

# ...

guessfreq = 0
guessfreq += (initial[x]||0) * letterpower[x]
guessfreq = basefreq if guessfreq == 0
actualfreq = 0

# ...

Here we have the last bit of all this setup work. First, an initial guessfreq is built for the letters based on the passed Hash. If no starting point was given, the sentence uses the basefreq calculated earlier. Finally, a variable is allocated to hold the actualfreq of the generated sentence.

OK, grab a deep breath and let's finally tackle the actual guessing loop that zeros in on solutions (debugging code and comments removed by me):

# ...

until guessfreq == actualfreq do
if actualfreq > 0 then
lettershifts.each{ |y|
g = 0xFF & (guessfreq >> y)
a = 0xFF & (actualfreq >> y)
if (g != a)
d = (g-a).abs
r1 = rand(d+1)
r2 = rand(d+1)
r1=r2 if r1 < r2
r1=-r1 if a<g
if (r1 != 0) then
guessfreq += r1 << y
actualfreq += enfreq[g+r1] - enfreq[g]
actualfreq = basefreq
lettershifts.each {|y|
g = 0xFF & (guessfreq >> y)
actualfreq += enfreq[g]

# ...

This code cycles until our latest guess matches the actual count for the sentence. On the first pass actualfreq will be zero, so the else clause is executed. This sets actualfreq to the baseline and adds in our guess. After that, each iteration should hit the if branch.

Each time through the if branch, every letter is compared for a distance from its guess value and its actual value. A random number is selected (well two actually with the high one favored) and added to our guess. Then the actual is updated to reflect the change. The net effect is that they close in on each other until our guess matches reality.

When they match, the final sentence can be constructed:

prefix + ('a'..'z').map {|x|
g = (guessfreq >> ((x[0]-?a)*8))%256
(x == 'z'? 'and ' : '') + "#{g.to_en} '#{x}'" +
(g==1 ? '' : 's')}.join(', ') + suffix

That works like the original sentence construction, but real numbers instead of actual placeholders this time. That's returned as our final result.

Here's the code that starts the process, passing the initial prefix and suffix:

puts find_sentence(
"Daniel Martin's sallowsgram program produced a sentence with ", "."

Interestingly, it seems that some inputs never resolve to a solution. I have not investigated this too deeply, but multiple quiz solvers reported the same issue.

My thanks to all the clever solvers who found all those pangrams so quickly. I learned a lot from reading the solutions, including how to use NArray from Simon Kroeger (worth a look).

Tomorrow's quiz has us inventing time travel for Ruby, so the summary for that could show up at any moment now...