Categories
Linux Raspberry Pi

Solution to this week’s NPR puzzle using simple Linux commands

Intro
Every now and then the weekend puzzle is particularly amenable to partial solution by use of simple Linux commands. I suspected such was the case for this week’s, and I was right.

The challenge for this week
Take a seven letter word of one syllable, add the consecutive letters “it” somewhere in the middle to create a nine letter word of four syllables.

The Linux command-line method of solution
On a CentOS system there is a file with words, lots of words. It’s /usr/share/dict/linux.words:

$ cd /usr/share/dict; wc linux.words

479829  479829 4953699 linux.words

So, 479829 words! A lot of them are junk words however, but it has the real ones in there too. This file comes from the RPM package words-3.0-17.el6.noarch.

So here’s a sort of stream-of-consciousness of a Unix person solving the puzzle without doing too much work or too much thinking:

How many seven-letter words are there? First what’s an expression that can answer that? I think this is it but let’s check:

$ egrep '^[a‐z]{7}$' linux.words|more

aaronic
aarrghh
abacate
abacaxi
abacist
abactor
abaculi
abaddon
abadejo
abaised
abaiser
abaisse
abalone
abandon
abandum
abasers
abashed
abashes
abasias
abasing
abatage
abaters
abating
abators
abattis
abattue
abature
abaxial
...

OK, that egrep expresison is right. So the seven-letter word count is then:

$ egrep '^[a‐z]{7}$' linux.words|wc ‐l

40230

That’s a lot – too many to eyeball. OK, so how many nine-letter words are there?

$ egrep '^[a‐z]{9}$' linux.words|wc ‐l

50748

Wow, even more.

OK, we have an idea, based not on what may be the best approach, but based on which Linux commands we know inside and out. The idea is to start from the nine-letter words which contain “it”, remove the “it” and then match the resulting seven-letter character strings against our dictionary to see which are actually words. We know how to do that. The hope is the resulting list will be small enough we can review by hand.

How many nine-letter words contain the consecutive characters “it”?

$ egrep '^[a‐z]{9}$' linux.words|grep it|wc ‐l

3245

They look like this:

abilities
abnormity
aboiteaus
aboiteaux
abolition
abrazitic
absurdity
academite
acanthite
accipiter
acclivity
accredits
accubitum
accubitus
acetosity
acidities
acinacity
aconitine
aconitums
acquisita
...

so it would take forever to go through. If we had a dictionary with the syllable count we coul really narrow it down. I think I’ve seen that, but I’d have to dig that up. We introduce the sed operator to remove the “it” from these words:

$ egrep '^[a‐z]{9}$' linux.words|grep it|sed 's/it//'|more

abilies
abnormy
aboeaus
aboeaux
abolion
abrazic
absurdy
academe
acanthe
acciper
acclivy
accreds
accubum
accubus
acetosy
acidies
acinacy
aconine
aconums
acquisa
...

There are more efficient ways to loop through these results using xargs, but I’m old school and have memorized this older construct which I use:

$ egrep '^[a‐z]{9}$' linux.words|grep it|sed 's/it//'|while read line; do
> grep $line linux.words >> /tmp/lw
> done

We look at the resulting file and found we made a little goof – we didn’t limit the resulting matches to seven characters:

$ more /tmp/lw

academe
academes
Pleuracanthea
vacanthearted
vacantheartedness
aconine
japaconine
pseudaconine
adderspit
affidavit
affidavits
affidavy
preaffidavit
divagating
extravagating
indagating
propagating
self-propagating
...

But that’s easily corrected:

$ cd /tmp; egrep '^[a‐z]{7}$' lw > lw2
$ wc -l lw2

376 lw2

Now that’s a number we can review by hand. Very few of these have only one syllable:

$ more lw2

academe
aconine
alumine
alveole
ammonic
barbary
barbone
basting
bauxite
berline
bethank
boraces
bullion
capella
carbone
carmele
carmine
cascade
catline
cavated
celeste
ceruses
chloric
chondre
chromes
cations
claries
cockney
coenobe
compose
...

I quickly reviewed the list and the answer popped out, somewhere towards the end – you can’t miss it.

Friday update – the solution
The 7-letter word that pops out at you? Reigned, which you immediately see becomes reignited – nine letters and four syllables!

Want to do this on your Raspberry Pi?
The dictionary file there is /usr/share/dictd/wn.index, but you probably don’t have it by default so you’ll need to install a few packages to get it which is simple enough. This post about Words with Friends explains the packages I used to provide that dictionary. Aside from the location of the dictionary, and that it contains fewer(?) words, everything else should be the same.

Conclusion
We have solved this week’s NPR puzzle without any complex programming just by using some simple Linux commands.

References and related
This link is nice because it has a transcription of the puzzle so you don’t have to waste time listening to the whole six-minute segment.
Another NPR puzzle we solved in a similar way.

Categories
Linux

Solving this week’s NPR weekend puzzle with a few Linux commands

Intro
I listen to the NPR puzzle every Sunday morning. I’m not particularly good at solving them, however – I usually don’t. But I always consider if I could get a little help from my friendly Linux server, i.e., if it lends itself to solution by programming. As soon as I heard this week’s challenge I felt that it was a good candidate. I was not disappointed…

The details
So Will Shortz says think of a common word with four letters. Now add O, H and M to that word, scramble the letters to make another common word in seven letters. The words are both things you use daily, and these things might be next to each other.

My thought pattern on that is that, great, we can look through a dictionary of seven-letter words which contain O, H and M. That already might be sufficiently limiting.

This reminded me of using the built-in Linux dictionary to give me some great tips when playing Words with Friends, which I document here.

In my CentOS my dictionary is /unix/share/dict/linux.words. It has 479,829 words:

$ cd /usr/share/dict; wc linux.words

That’s a lot. So of course most of them are garbagey words. Here’s the beginning of the list:

$ more linux.words

1080
10-point
10th
11-point
12-point
16-point
18-point
1st
2
20-point
2,4,5-t
2,4-d
2D
2nd
30-30
3-D
3-d
3D
3M
3rd
48-point
4-D
4GL
4H
4th
5-point
5-T
5th
6-point
6th
7-point
7th
8-point
8th
9-point
9th
-a
A
A.
a
a'
a-
a.
A-1
A1
a1
A4
A5
AA
aa
A.A.A.
AAA
aaa
AAAA
AAAAAA
...

You see my point? But amongst the garbage are real words, so it’ll be fine for our purpose.

What I like to do is to build up to increasingly complex constructions. Mind you, I am no command-line expert. I am an experimentalist through-and-through. My development cycle is Try, Demonstrate, Fix, Try Demonstrate, Improve. The whole process can sometimes be finished in under a minute, so it must have merit.

First try:

$ grep o linux.words|wc

 230908  230908 2597289

OK. Looks like we got some work to do, yet.

Next (using up-arrow key to recall previous command, of course):

$ grep o linux.words|grep m|wc

  60483   60483  724857

Next:

$ grep o linux.words|grep m|grep h|wc

  15379   15379  199724

Drat. Still too many. But what are we actually producing?

$ grep o linux.words|grep m|grep h|more

abbroachment
abdominohysterectomy
abdominohysterotomy
abdominothoracic
Abelmoschus
abhominable
abmho
abmhos
abohm
abohms
abolishment
abolishments
abouchement
absmho
absohm
Acantholimon
acanthoma
acanthomas
Acanthomeridae
acanthopomatous
accompliceship
accomplish
accomplishable
accomplished
accomplisher
accomplishers
accomplishes
accomplishing
accomplishment
accomplishments
accomplisht
accouchement
accouchements
accroachment
Acetaminophen
acetaminophen
acetoamidophenol
acetomorphin
acetomorphine
acetylmethylcarbinol
acetylthymol
Achamoth
achenodium
achlamydeous
Achomawi
...

Of course, words with capitalizations, words longer and shorter than seven letters – there’s lots of tools left to cut this down to manageable size.

With this expression we can simultaneously require exactly seven letters in our words and require only lowercase alphabetical letters: egrep ′^[a-z]{7}$′. This is an extended regular expression that matches the beginning (^) and end ($) of the string, only characters a-z, and exactly seven of them ({7}).

With that vast improvement, we’re down to 352 entries, a list small enough to browse by hand. But the solution still didn’t pop out at me. Most of the words are obscure ones, which should automatically be excluded because we are looking for common words. We have:

$ grep o linux.words|grep m|grep h|egrep ′^[a-z]{7}$′|more

achroma
alamoth
almohad
amchoor
amolish
amorpha
amorphi
amorphy
amphion
amphora
amphore
apothem
apothgm
armhole
armhoop
bemouth
bimorph
bioherm
bochism
bohemia
bohmite
camooch
camphol
camphor
chagoma
chamiso
chamois
chamoix
chefdom
chemizo
chessom
chiloma
chomage
chomped
chomper
chorism
chrisom
chromas
chromed
chromes
chromic
chromid
chromos
chromyl
...

So I thought it might be inspiring to put the four letters you would have if you take away the O, H and M next to each word, right?

I probably ought to use xargs but never got used to it. I’ve memorized this other way:

$ grep o linux.words |grep m|grep h|egrep ′^[a-z]{7}$′|while read line; do
> s=`echo $line|sed s/o//|sed s/h//|sed s/m//`
> echo $line $s
> done|more

sed is an old standard used to do substitutions. sed s/o// for example is a filter which removes the first occurrence of the letter O.

I could almost use the tr command, as in

> …|tr -d ′[ohm]′

in place of all those sed statements, but I couldn’t solve the problem of tr deleting all occurrences of the letters O, H and M. And the solution didn’t jump out at me.

So until I figure that out, use sed. That gives:

achroma acra
alamoth alat
almohad alad
amchoor acor
amolish alis
amorpha arpa
amorphi arpi
amorphy arpy
amphion apin
amphora apra
amphore apre
apothem apte
apothgm aptg
armhole arle
armhoop arop
bemouth beut
bimorph birp
bioherm bier
bochism bcis
bohemia beia
bohmite bite
camooch caoc
camphol capl
camphor capr
chagoma caga
chamiso cais
chamois cais
chamoix caix
chefdom cefd
chemizo ceiz
chessom cess
chiloma cila
chomage cage
chomped cped
chomper cper
chorism cris
chrisom cris
chromas cras
chromed cred
chromes cres
chromic cric
chromid crid
chromos cros
chromyl cryl
...

Friday update
I can now reveal the section listing that reveals the answer because the submission deadline has passed. It’s here:

...
schmoes sces
schmoos scos
semihot seit
shahdom sahd
shaloms sals
shamalo saal
shammos sams
shamois sais
shamoys says
shampoo sapo
shimose sise
shmooze soze
shoeman sean
sholoms slos
shopman span
shopmen spen
shotman stan
...

See it? I think it leaps out at you:

shampoo sapo

becomes of course:

soap
shampoo

!

They’re common words found next to each other that obey the rules of the challenge. You can probably tell I’m proud of solving this one. I rarely do. I hope they don’t call on me because I also don’t even play well against the radio on Sunday mornings.

Conclusion
Now I can’t give out the answer right now because the submission deadline is a few days from now. But I will say that the answer pretty much pops out at you when you review the full listing generated with the above sequence of commands. There is no doubt whatsoever.

I have shown how a person with modest command-line familiarity can solve a word problem that was put out on NPR. I don’t think people are so much interested in learning a command line because there is no instant gratification and th learning curve is steep, but for some it is still worth the effort. I use it, well, all the time. Solving the puzzle this way took a lot longer to document, but probably only about 30 minutes of actual tinkering.