egrep – Dr John's Tech Talk

Solution to this week’s NPR puzzle using simple Linux commands

aaronic aarrghh abacate abacaxi abacist abactor abaculi abaddon abadejo abaised abaiser abaisse abalone abandon abandum abasers abashed abashes abasias abasing abatage abaters abating abators abattis abattue abature abaxial ...

abilities abnormity aboiteaus aboiteaux abolition abrazitic absurdity academite acanthite accipiter acclivity accredits accubitum accubitus acetosity acidities acinacity aconitine aconitums acquisita ...

academe academes Pleuracanthea vacanthearted vacantheartedness aconine japaconine pseudaconine adderspit affidavit affidavits affidavy preaffidavit divagating extravagating indagating propagating self-propagating ...

academe aconine alumine alveole ammonic barbary barbone basting bauxite berline bethank boraces bullion capella carbone carmele carmine cascade catline cavated celeste ceruses chloric chondre chromes cations claries cockney coenobe compose ...

Solving this week’s NPR weekend puzzle with a few Linux commands

Intro
I listen to the NPR puzzle every Sunday morning. I’m not particularly good at solving them, however – I usually don’t. But I always consider if I could get a little help from my friendly Linux server, i.e., if it lends itself to solution by programming. As soon as I heard this week’s challenge I felt that it was a good candidate. I was not disappointed…

The details
So Will Shortz says think of a common word with four letters. Now add O, H and M to that word, scramble the letters to make another common word in seven letters. The words are both things you use daily, and these things might be next to each other.

My thought pattern on that is that, great, we can look through a dictionary of seven-letter words which contain O, H and M. That already might be sufficiently limiting.

This reminded me of using the built-in Linux dictionary to give me some great tips when playing Words with Friends, which I document here.

In my CentOS my dictionary is /unix/share/dict/linux.words. It has 479,829 words:

$ cd /usr/share/dict; wc linux.words

That’s a lot. So of course most of them are garbagey words. Here’s the beginning of the list:

$ more linux.words

1080
10-point
10th
11-point
12-point
16-point
18-point
1st
2
20-point
2,4,5-t
2,4-d
2D
2nd
30-30
3-D
3-d
3D
3M
3rd
48-point
4-D
4GL
4H
4th
5-point
5-T
5th
6-point
6th
7-point
7th
8-point
8th
9-point
9th
-a
A
A.
a
a'
a-
a.
A-1
A1
a1
A4
A5
AA
aa
A.A.A.
AAA
aaa
AAAA
AAAAAA
...

You see my point? But amongst the garbage are real words, so it’ll be fine for our purpose.

What I like to do is to build up to increasingly complex constructions. Mind you, I am no command-line expert. I am an experimentalist through-and-through. My development cycle is Try, Demonstrate, Fix, Try Demonstrate, Improve. The whole process can sometimes be finished in under a minute, so it must have merit.

First try:

$ grep o linux.words|wc

 230908  230908 2597289

OK. Looks like we got some work to do, yet.

Next (using up-arrow key to recall previous command, of course):

$ grep o linux.words|grep m|wc

  60483   60483  724857

$ grep o linux.words|grep m|grep h|wc

  15379   15379  199724

Drat. Still too many. But what are we actually producing?

$ grep o linux.words|grep m|grep h|more

abbroachment
abdominohysterectomy
abdominohysterotomy
abdominothoracic
Abelmoschus
abhominable
abmho
abmhos
abohm
abohms
abolishment
abolishments
abouchement
absmho
absohm
Acantholimon
acanthoma
acanthomas
Acanthomeridae
acanthopomatous
accompliceship
accomplish
accomplishable
accomplished
accomplisher
accomplishers
accomplishes
accomplishing
accomplishment
accomplishments
accomplisht
accouchement
accouchements
accroachment
Acetaminophen
acetaminophen
acetoamidophenol
acetomorphin
acetomorphine
acetylmethylcarbinol
acetylthymol
Achamoth
achenodium
achlamydeous
Achomawi
...

Of course, words with capitalizations, words longer and shorter than seven letters – there’s lots of tools left to cut this down to manageable size.

With this expression we can simultaneously require exactly seven letters in our words and require only lowercase alphabetical letters: egrep ′^[a-z]{7}$′. This is an extended regular expression that matches the beginning (^) and end ($) of the string, only characters a-z, and exactly seven of them ({7}).

With that vast improvement, we’re down to 352 entries, a list small enough to browse by hand. But the solution still didn’t pop out at me. Most of the words are obscure ones, which should automatically be excluded because we are looking for common words. We have:

$ grep o linux.words|grep m|grep h|egrep ′^[a-z]{7}$′|more

achroma
alamoth
almohad
amchoor
amolish
amorpha
amorphi
amorphy
amphion
amphora
amphore
apothem
apothgm
armhole
armhoop
bemouth
bimorph
bioherm
bochism
bohemia
bohmite
camooch
camphol
camphor
chagoma
chamiso
chamois
chamoix
chefdom
chemizo
chessom
chiloma
chomage
chomped
chomper
chorism
chrisom
chromas
chromed
chromes
chromic
chromid
chromos
chromyl
...

So I thought it might be inspiring to put the four letters you would have if you take away the O, H and M next to each word, right?

I probably ought to use xargs but never got used to it. I’ve memorized this other way:

$ grep o linux.words |grep m|grep h|egrep ′^[a-z]{7}$′|while read line; do
> s=`echo $line|sed s/o//|sed s/h//|sed s/m//`
> echo $line $s
> done|more

sed is an old standard used to do substitutions. sed s/o// for example is a filter which removes the first occurrence of the letter O.

I could almost use the tr command, as in

> …|tr -d ′[ohm]′

in place of all those sed statements, but I couldn’t solve the problem of tr deleting all occurrences of the letters O, H and M. And the solution didn’t jump out at me.

So until I figure that out, use sed. That gives:

achroma acra
alamoth alat
almohad alad
amchoor acor
amolish alis
amorpha arpa
amorphi arpi
amorphy arpy
amphion apin
amphora apra
amphore apre
apothem apte
apothgm aptg
armhole arle
armhoop arop
bemouth beut
bimorph birp
bioherm bier
bochism bcis
bohemia beia
bohmite bite
camooch caoc
camphol capl
camphor capr
chagoma caga
chamiso cais
chamois cais
chamoix caix
chefdom cefd
chemizo ceiz
chessom cess
chiloma cila
chomage cage
chomped cped
chomper cper
chorism cris
chrisom cris
chromas cras
chromed cred
chromes cres
chromic cric
chromid crid
chromos cros
chromyl cryl
...

Friday update
I can now reveal the section listing that reveals the answer because the submission deadline has passed. It’s here:

...
schmoes sces
schmoos scos
semihot seit
shahdom sahd
shaloms sals
shamalo saal
shammos sams
shamois sais
shamoys says
shampoo sapo
shimose sise
shmooze soze
shoeman sean
sholoms slos
shopman span
shopmen spen
shotman stan
...

See it? I think it leaps out at you:

shampoo sapo

becomes of course:

soap
shampoo

They’re common words found next to each other that obey the rules of the challenge. You can probably tell I’m proud of solving this one. I rarely do. I hope they don’t call on me because I also don’t even play well against the radio on Sunday mornings.

Conclusion
Now I can’t give out the answer right now because the submission deadline is a few days from now. But I will say that the answer pretty much pops out at you when you review the full listing generated with the above sequence of commands. There is no doubt whatsoever.

I have shown how a person with modest command-line familiarity can solve a word problem that was put out on NPR. I don’t think people are so much interested in learning a command line because there is no instant gratification and th learning curve is steep, but for some it is still worth the effort. I use it, well, all the time. Solving the puzzle this way took a lot longer to document, but probably only about 30 minutes of actual tinkering.