English Numerals (#25)

by Timothy Byrd

While we normally write numbers using Arabic (or since Quiz #22, Roman) numerals, numbers can also be written out as English phrases.

For example:

7 == seven (the hard way)
42 == forty-two (a very important number)
2001 == two thousand and one (a space odyssey)
1999 == (party like it's) nineteen hundred and ninety-nine

So the quiz is a problem from a Pi Mu Epsilon (US national math club) newsletter:

"When the integers 1 to 10_000_000_000 are written in the English language, then sorted as strings, which odd number appears first in the list?"

Your mission, should you choose to accept it, is to:

- Create Ruby code to translate a number to it's English language form. (My sample version works with integers up to 10**72-1.)

- Determine programmatically which odd number in 1..10_000_000_000 would sort first if written in English. (Brute force is the obvious solution, but the computer may have to think about it...)

- Would the answer change for a larger range of values, say 10**30?

- Do French and German Rubyists get a different answer than the Americans?

Extra credit:

-Add a Pinyin translator.


Quiz Summary

by Timothy Byrd

When writing numbers using Arabic numerals, we customarily group them into phrases or clauses of three digits each. These clauses are identical and are separated by scaling factors such as 'thousand' and 'billion'. All the solutions took advantage of this repetition, though I'd given irregular examples such as 'twelve hundred' or using 'and' to separate the final tens and ones from the rest of the number.

Most of the solutions (though not mine, infamously) extended the integer class to have a method for translating the number to English.

Eliah Hecht submitted the first solution. He extended the Integer with a to_en method. Unlike the others, he only had an array for the numbers 1..9 and defined methods to build items like seventeen or sixty from their component syllables. His general method was to convert the number to a string and split off the initial clause, generate the English form of that with a degree suffix and concatenate the result of a recursive call on the remaining part of the number.

He put in the 'and's, and his reasoned out answer of "eight billion and eighty-five" was - I'm chagrined to say - more correct than my "eight billion eight hundred and eighty-five". (Including 'and' really screws up a recursive or iterative approach - or at least my approach - to building the number, since you don't include the 'and' without something in front of it that you may want to replace later.)

Glenn Parker gave the next solution, noting: "I'm afraid I could not bring myself to code up some random ill-defined method of expressing numbers in English, so I did it the way I was taught in school, using hypens and absolutely no "ands" or commas. I think I've got Strunk & White on my side. Regardless, I'll be somewhat surprised if my answer comes out very different from others."

A spirited defense of the purity of the English language. And his thoughts on finessing around the brute force approach are worth quoting:

'Every number from 1 to 10**10 can be expressed in English as a concatenation of one or more independent phrases. Each phrase expresses the value of a triplet of numerals from the original number. For example, 56_106_021 uses three phrases: "fifty-six million" for the millions triplet (56), "one hundred six thousand" for the thousands triplet (106), and "twenty-one" for the ones triplet (021). I don't allow "twelve hundred" for 1200, but I think it would only complicate the logic slightly to handle this.

'I proceed to find the lowest possible phrase for each of the four triplets in the range of interest. It must be possible to express the lowest number name using a subset of the lowest possible phrases for each triplet. And since the desired number must be odd, valid subsets will all end with the ones triplet phrase.

'The ones triplet range has only 500 phrases: the odd numbers from 1 to 999. The thousands triplet has 1000 phrases: the multiples of 1000 from 1000 to 999_999. Likewise, the millions triplet has 1000 phrases. The billions triplet has only ten phrases because we stop caring after 10_000_000_000 (instead of going to 999_999_999_999). Finding the four lowest phrases for each triplet requires examining only 2510 numbers. There are only eight combinations of the resulting four phrases that represent an odd number.

'The result: "eight billion eight hundred eight million eight hundred eight thousand eight hundred eighty-five".'

His code is concise. He effectively splits up the number into clauses or phrases in a single line by splitting it into an array of individual digits, and then processes them by grabbing three at a time from the bottom.

Matthew D Moss did a Japanese translator in addition to the English one. He implemented his Integer.to_spoken to accept a translator as an optional argument - used as a function object.

Dave Burt used code from "[his] translation of the Perl modules Number::Spell and Lingua::EN::Numericalize (see CPAN)". He also comments:

'My answer is that "a baker's dozen" comes immediately before "a billion, a hundred and eight thousand, a hundred and eighty five". I haven't yet enhanced my program to give me this result :)'

Patrick Hurley applied some logic to his solution, pruning his search space down to: "numbers ending in 5 and having a combination of 0 and 8's".

Nikolai Weibull commented "My Lisp module proves yet again to be capable of answering a Ruby Quiz".

As for me, I discovered how horribly slow my solution was. I was able to improve it by an order of magnitude - from just over five seconds to just over half a second. Most of the gain was from changing one routine:

ruby
def self.to_English(val, eu_names = true, include_and = true)
v = val.to_i.abs
return "zero" if v == 0
include_and = false if v <= 100
exp_hash = eu_names ? EurExponents : AmExponents

a = []
(a.push v % 1000; v /= 1000) while v > 0

r = ''

# allow for numbers like 'twelve hundred'
#
if a[1] == 1 and a[0] >= 100 then
a[1] = 0
a[0] += 1000
end

a.each_with_index {|obj, i|
next if obj == 0
r = "#{to_English_base(obj, include_and && \
i == 0)} #{exp_hash[i]}"
.strip + " #{r}"
}

r.strip
end

I used to counting down through the entire exponent table for each number testing if the number were larger than 10**exponent. I now chop the number up into three digit chunks and iterate up through the chunks. (Note that I've also changed all my hashes to arrays, but they only helped by a few percentage points.) At first I thought this code wouldn't support numbers like 'twelve hundred', but it turn out to take just a few lines. It's a textbook example that one well-placed optimization at the end can make a world of difference.