Intro
I listen to the NPR puzzle every Sunday morning. I’m not particularly good at solving them, however; I usually don’t manage it. But I always consider whether I could get a little help from my friendly Linux server, i.e., whether the puzzle lends itself to solution by programming. As soon as I heard this week’s challenge I felt that it was a good candidate. I was not disappointed…
The details
So Will Shortz says: think of a common word with four letters. Now add O, H and M to that word and scramble the letters to make another common word of seven letters. The words are both things you use daily, and these things might be next to each other.
My thought pattern on that is that, great, we can look through a dictionary of seven-letter words which contain O, H and M. That already might be sufficiently limiting.
This reminded me of using the built-in Linux dictionary to give me some great tips when playing Words with Friends, which I document here.
On my CentOS system the dictionary is /usr/share/dict/linux.words. It has 479,829 words:
$ cd /usr/share/dict; wc linux.words
That’s a lot. So of course most of them are garbagey words. Here’s the beginning of the list:
$ more linux.words
1080 10-point 10th 11-point 12-point 16-point 18-point 1st 2 20-point 2,4,5-t 2,4-d 2D 2nd 30-30 3-D 3-d 3D 3M 3rd 48-point 4-D 4GL 4H 4th 5-point 5-T 5th 6-point 6th 7-point 7th 8-point 8th 9-point 9th -a A A. a a' a- a. A-1 A1 a1 A4 A5 AA aa A.A.A. AAA aaa AAAA AAAAAA ...
You see my point? But amongst the garbage are real words, so it’ll be fine for our purpose.
What I like to do is to build up to increasingly complex constructions. Mind you, I am no command-line expert. I am an experimentalist through-and-through. My development cycle is Try, Demonstrate, Fix, Try, Demonstrate, Improve. The whole process can sometimes be finished in under a minute, so it must have merit.
First try:
$ grep o linux.words|wc
230908 230908 2597289
OK. Looks like we got some work to do, yet.
Next (using up-arrow key to recall previous command, of course):
$ grep o linux.words|grep m|wc
60483 60483 724857
Next:
$ grep o linux.words|grep m|grep h|wc
15379 15379 199724
Drat. Still too many. But what are we actually producing?
$ grep o linux.words|grep m|grep h|more
abbroachment abdominohysterectomy abdominohysterotomy abdominothoracic Abelmoschus abhominable abmho abmhos abohm abohms abolishment abolishments abouchement absmho absohm Acantholimon acanthoma acanthomas Acanthomeridae acanthopomatous accompliceship accomplish accomplishable accomplished accomplisher accomplishers accomplishes accomplishing accomplishment accomplishments accomplisht accouchement accouchements accroachment Acetaminophen acetaminophen acetoamidophenol acetomorphin acetomorphine acetylmethylcarbinol acetylthymol Achamoth achenodium achlamydeous Achomawi ...
Of course, words with capitalizations, words longer and shorter than seven letters – there’s lots of tools left to cut this down to manageable size.
With this expression we can simultaneously require exactly seven letters in our words and require only lowercase alphabetical letters: egrep '^[a-z]{7}$'. This is an extended regular expression that anchors the match at the beginning (^) and end ($) of the line, allows only the characters a-z, and requires exactly seven of them ({7}).
With that vast improvement, we’re down to 352 entries, a list small enough to browse by hand. But the solution still didn’t pop out at me. Most of the words are obscure ones, which should automatically be excluded because we are looking for common words. We have:
$ grep o linux.words|grep m|grep h|egrep '^[a-z]{7}$'|more
achroma alamoth almohad amchoor amolish amorpha amorphi amorphy amphion amphora amphore apothem apothgm armhole armhoop bemouth bimorph bioherm bochism bohemia bohmite camooch camphol camphor chagoma chamiso chamois chamoix chefdom chemizo chessom chiloma chomage chomped chomper chorism chrisom chromas chromed chromes chromic chromid chromos chromyl ...
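To try the filter chain without the full dictionary, here is a minimal sketch that runs it on a tiny made-up word list. The function names sample_words and filter_candidates are hypothetical, and the sample words are chosen by hand. Two small departures from the command above are my own assumptions: grep -E is used as the equivalent of egrep, and LC_ALL=C is set so that [a-z] matches only lowercase ASCII letters regardless of locale.

```shell
# Hypothetical sample list standing in for /usr/share/dict/linux.words.
sample_words() {
  printf '%s\n' shampoo Shampoo abmho camphor accomplish chamois
}

# The same filter chain as above, reading stdin instead of the dictionary.
# LC_ALL=C is an assumption: it keeps [a-z] limited to lowercase ASCII.
filter_candidates() {
  grep o | grep m | grep h | LC_ALL=C grep -E '^[a-z]{7}$'
}

sample_words | filter_candidates
```

Only shampoo, camphor and chamois survive: Shampoo is rejected for its capital letter, abmho is too short and accomplish is too long.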
So I thought it might be inspiring to put the four letters you would have if you take away the O, H and M next to each word, right?
I probably ought to use xargs but never got used to it. I’ve memorized this other way:
$ grep o linux.words |grep m|grep h|egrep '^[a-z]{7}$'|while read -r line; do
> s=$(echo "$line"|sed s/o//|sed s/h//|sed s/m//)
> echo "$line" "$s"
> done|more
sed is an old standard used to do substitutions. sed s/o// for example is a filter which removes the first occurrence of the letter O.
I could almost use the tr command, as in
> …|tr -d '[ohm]'
in place of all those sed statements, but tr deletes every occurrence of the letters O, H and M rather than just the first of each, and a way around that didn’t jump out at me.
So until I figure that out, use sed. That gives:
achroma acra
alamoth alat
almohad alad
amchoor acor
amolish alis
amorpha arpa
amorphi arpi
amorphy arpy
amphion apin
amphora apra
amphore apre
apothem apte
apothgm aptg
armhole arle
armhoop arop
bemouth beut
bimorph birp
bioherm bier
bochism bcis
bohemia beia
bohmite bite
camooch caoc
camphol capl
camphor capr
chagoma caga
chamiso cais
chamois cais
chamoix caix
chefdom cefd
chemizo ceiz
chessom cess
chiloma cila
chomage cage
chomped cped
chomper cper
chorism cris
chrisom cris
chromas cras
chromed cred
chromes cres
chromic cric
chromid crid
chromos cros
chromyl cryl
...
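The difference between sed and tr matters for words that contain a letter twice. As a minimal sketch, the three substitutions can also be given to a single sed invocation with repeated -e options. The helper name strip_ohm is hypothetical and the three words are hand-picked sample input.

```shell
# Remove only the first o, the first h and the first m from each input line.
strip_ohm() {
  sed -e 's/o//' -e 's/h//' -e 's/m//'
}

# Print each sample word next to its four leftover letters.
printf '%s\n' shampoo camphor chamois | while read -r word; do
  printf '%s %s\n' "$word" "$(printf '%s\n' "$word" | strip_ohm)"
done

# For comparison: tr -d removes every o, h and m, including the second o.
printf '%s\n' shampoo | tr -d ohm
```

The loop prints "shampoo sapo", "camphor capr" and "chamois cais", while the tr line prints only "sap", one letter short.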
Friday update
Now that the submission deadline has passed, I can show the section of the listing that contains the answer. It’s here:
...
schmoes sces
schmoos scos
semihot seit
shahdom sahd
shaloms sals
shamalo saal
shammos sams
shamois sais
shamoys says
shampoo sapo
shimose sise
shmooze soze
shoeman sean
sholoms slos
shopman span
shopmen spen
shotman stan
...
See it? I think it leaps out at you:
shampoo sapo
becomes of course:
soap
shampoo
!
They’re common words found next to each other that obey the rules of the challenge. You can probably tell I’m proud of solving this one. I rarely do. I hope they don’t call on me because I also don’t even play well against the radio on Sunday mornings.
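I spotted the scramble from sapo to soap by eye. A rough sketch of how that last step could be checked mechanically is to sort the letters of both words and compare the results. The helpers sort_letters and is_anagram are hypothetical names, and the candidate list is made up for illustration rather than pulled from the dictionary.

```shell
# Print the letters of a word in alphabetical order (a canonical anagram key).
sort_letters() {
  printf '%s\n' "$1" | fold -w1 | LC_ALL=C sort | tr -d '\n'
}

# Succeed when the two words contain exactly the same letters.
is_anagram() {
  [ "$(sort_letters "$1")" = "$(sort_letters "$2")" ]
}

# Hypothetical hand-picked four-letter words to test against the leftover "sapo".
for candidate in soap bier cage span; do
  if is_anagram "$candidate" sapo; then
    echo "sapo -> $candidate"
  fi
done
```

With this candidate list the loop prints only "sapo -> soap".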
Conclusion
As I write this I can’t give out the answer, because the submission deadline is a few days away. But I will say that the answer pretty much pops out at you when you review the full listing generated with the above sequence of commands. There is no doubt whatsoever.
I have shown how a person with modest command-line familiarity can solve a word problem that was put out on NPR. I don’t think people are so much interested in learning a command line because there is no instant gratification and the learning curve is steep, but for some it is still worth the effort. I use it, well, all the time. Documenting the solution took a lot longer than finding it, which probably needed only about 30 minutes of actual tinkering.