Categories
Linux Perl

Words with Friends Gentle Word Hints

Intro
My friend and I are in a perpetual game of Words With Friends on our smartphones these days. It suits me to a T because I like to take a looong time to come up with just the right move (although as time has passed you’ll see I’ve become less enamored with the app – you’ll see this if you have the patience to read through the whole article, which was written progressively as I continued to play more games). And you can’t knock over the board and lose all your played tiles! Yet you can still get advice from other friends by showing the board on your smartphone.

I have a good vocabulary, my friend a little less so. But he takes advantage of a peculiar quirk of playing the game in this way – he makes up words and sees if they’re accepted by the program. And…sometimes they are! So Words with Friends uses a ridiculous dictionary or dictionaries. That’s how he came up with hila. Later I turned the tables on him. I played a “word” that I didn’t believe to be a word, but rather one that I believed Words with Friends might believe to be a word: carbo. Sure enough. Accepted. 59 points. The “o” allowed me to stretch to reach the triple word tile and intersect a “T” to make “to” across. Check the American Heritage Dictionary: not there, except as a prefix. My next best idea would only have been about 21 points.

I tried one of those cheating programs, www.lexicalwordfinder.com. I had the letters e, e, u, m, n, a and, I think, another a. What does it suggest? Neume. Hah? Who in the world ever heard of that? So it’s obviously plugged into those ridiculous dictionaries. You might as well let computers play other computers if you’re going to use cheats like that. My idea is to make gentle suggestions that you, the educated person, would have thought of on your own if you had enough time. Or maybe, like me, it’s on the tip of your tongue but you can’t quite find it in your brain. So I am writing a simple program which draws on a common dictionary – words every well-educated person ought to know. That’s also easier to program! Since I can’t compete with the big boys, my contribution will be to show the steps of how I am writing such a program.

I’m running Ubuntu server, which is a Debian Linux variant. Initially I wasn’t sure what all I would need so I did:

# sudo apt-get install dictd dict dict-wn dict-gcide

in the hopes of getting a words file! dict-wn is the WordNet dictionary; dict-gcide is the GNU Collaborative International Dictionary of English.

I found the goods in /usr/share/dictd. wn.index has 87,924 unique words:

# awk '{print $1}' wn.index|uniq|wc

March, 2013 update – CentOS
I’ve since switched my hosting platform to CentOS. There the dictionary is /usr/share/dict/linux.words. If you don’t have it, install the package words-3.0-17.el6. It has 480,000 “words!” I’m not sure why the huge discrepancy, but I know that’s a lot more words than are traditionally cited as the number of words in the English language.
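To see the exact count on your own system (assuming the package installed the file in the usual place):

# wc -l /usr/share/dict/linux.words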

Back to the WordNet index: after removing numbers and punctuation marks it’s at 80,724:

# awk '{print $1}' wn.index|uniq|egrep -v "[0-9'.-]"|wc

Here are the first few words now:

# awk '{print $1}' wn.index|uniq|egrep -v "[0-9'.-]"|head -25
a
aa
aaa
aachen
aah
aaland
aalborg
aalii
aalst
aalto
aar
aardvark
aardwolf
aare
aarhus
aaron
aarp
aas
aave
ab
aba
abaca
abacinate
aback
abactinal

Many of these look suspicious. I want to at least throw out the proper names. I don’t care if Words with Friends accepts them or not. I don’t believe they should be used. For instance:

# dict aar
1 definition found

From WordNet (r) 3.0 (2006) [wn]:

  Aar
      n 1: a river in north central Switzerland that runs northeast
           into the Rhine [syn: {Aare}, {Aar}, {Aare River}]

So how are we going to get rid of the words with capitals? Unfortunately in the index they are all in lower case. The only possibility I see is to do a dictionary lookup on each and every one. Here’s what I came up with for that. I’m sure some wiser guy could write this as a one-liner, but hey, it is what it is:

#!/usr/bin/perl
# DrJohn, 11/2011
# check if words are upper or lower case
# we can only learn when doing a dict lookup
# input are proposed words
$DEBUG = 0;
while(<STDIN>) {
  chomp;
  $word = $_;
  $cnt = 0;
  open(DEF,"dict -d wn $word|");
  while(<DEF>) {
    $cnt++;
    if ($cnt == 5) {
# this is the line that repeats the word
      ($wordagain) = $_ =~ /(\w+)/;
      print $wordagain if $DEBUG;
      print "$word\n" if $wordagain eq $word;
      last;
    } # end line five condition stuff
  } # end loop over definition
  close(DEF);
} # end STDIN

Run it, and the first few results now look like this:

aa
aah
No definitions found for "aaland"
aalii
aardvark
aardwolf
aba
abaca
abacinate
aback
abactinal

We got rid of Aar and many others, but we also got rid of one of the most common words – “a.” Not that it matters for this game, but this points to a possibility: the choice of case in the definition is somewhat arbitrary, and a word with several meanings, any of which requires upper case, say like English, is going to be written upper case. Indeed that is the case for English, which of course is a nice word and perfectly acceptable when used with the meaning (sports) the spin given to a ball by striking it on one side. Sigh, nothing’s ever easy. So let’s modify our program to make a separate list of rejected words so we can review it by hand and add back in words which can be used in lower case. Whenever you do something by eye you want to reduce the task as much as possible. Here we can take advantage of another fact: a capitalized word with a single definition is not of interest to us because that single definition is the one for which capitalization is required, like Aar. So we only need to consider the cases of capitalized words with multiple definitions, one of which may typically be used with the word spelled in lower case, like English.

We showed the output for a word with a single definition above, for Aar. Here’s an example of a capitalized word with multiple definitions:

# dict -d wn english
1 definition found

From WordNet (r) 3.0 (2006) [wn]:

  English
      adj 1: of or relating to or characteristic of England or its
             culture or people; "English history"; "the English landed
             aristocracy"; "English literature"
      2: of or relating to the English language
      n 1: an Indo-European language belonging to the West Germanic
           branch; the official language of Britain and the United
           States and most of the commonwealth countries [syn:
           {English}, {English language}]
      2: the people of England [syn: {English}, {English people}]
      3: the discipline that studies the English language and
         literature
      4: (sports) the spin given to a ball by striking it on one side
         or releasing it with a sharp twist [syn: {English}, {side}]

There are different characteristics we could use as markers. I propose to look for a digit immediately followed by a colon as a definition marker. More than one occurrence probably means the word has multiple definitions and we should consider it. So here’s a reworked version of our program to accomplish that:

#!/usr/bin/perl
# DrJohn, 11/2011
# check if words are upper or lower case
# we can only learn when doing a dict lookup
# input are proposed words
$DEBUG = 0;
open(CAND,">/tmp/candidates") || die "cannot open /tmp/candidates!!\n";
while(<STDIN>) {
  chomp;
  $word = $_;
  $cnt = 0;
  $cand = 0;
  $cntdef = 0;
  open(DEF,"dict -d wn $word|");
  while(<DEF>) {
    $cnt++;
    if ($cnt == 5) {
# this is the line that repeats the word
      ($wordagain) = $_ =~ /(\w+)/;
      print $wordagain if $DEBUG;
      if ($wordagain eq $word) {
        print "$word\n";
        last;
      } else {
# maybe there are multiple definitions
        $cand = 1;
      }
    } elsif ($cand) {
      $cntdef++ if /\d:/;
    } # end line five condition stuff
  } # end loop over definition
  close(DEF);
# print candidate rejected word if there were multiple definitions
  print CAND "$word\n" if $cntdef > 1;
} # end STDIN

Note the regex \d: that we use to detect a definition.

Running this on the first few words, we have a, aaron and ab to consider. Ab is interesting because it’s the name of a degree (in fact I have that degree), as well as shorthand for a muscle of the abdomen. So a lower-case usage exists!

Now the program is running kind of slow, and since we don’t want to run it multiple times, perhaps it’s time to turn our attention to this other problem: all the lines like No definitions found for “aaland” that we are seeing in the meantime:

# grep ^aaland wn.index
aaland islands  OgB     Cz

So these are due to compound words, which we don’t want anyways because they won’t be accepted.
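Rather than let dict complain about each one, we could also prune the compound entries from the index up front. A hedged one-liner, assuming the index is tab-delimited with the headword in the first column as it appears above:

# awk -F'\t' '$1 !~ / /{print $1}' wn.index|uniq > singlewords

Then feed singlewords to the case-checking program instead.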

So running the modified program produces an output with 63,185 words, and another 2,170 words to be considered for review. The first few are as follows:

aa
aah
aalii
aardvark
aardwolf
aba
abaca
abacinate
aback
abactinal
abacus
abaft
abalone
abamp
abampere
abandon

Let’s check one:

# dict aalii
1 definition found

From WordNet (r) 3.0 (2006) [wn]:

  aalii
      n 1: a small Hawaiian tree with hard dark wood

Now check the American Heritage Dictionary, which I consider the Bible. Not there. Yet I believe it would be accepted by Words with Friends, judging by the Lexical Word Finder cheating program. Just enter your tiles as A A L I I A A to see for yourself.

And here are the first few rejected words which are to be reviewed:

a
aaron
ab
abdias
abel
aberdeen
abilene
abkhas
abkhasian
abkhaz
abkhazian
abnaki
aboriginal
ac
achaean
achomawi
actinia
actium
ad
adalia
adam
adams
add

Add? How did it get there? Well, you know, ADD, the medical condition? Yes, it’s exasperating. But that’s what you get with free. Two thousand or so is not too many to review, however.

I’m having some doubts about the whole project now. I had the letters D E E G R I U. An open D was on the board where there was room for a couple tiles above it and three tiles below. Using the three tiles below would make the play triple word. I initially came up with drug. Then edger, which works out to the same number of points because in WWF the U is two points for some reason. But I wanted to use another tile so I thought and thought. Edgier? Nope, doesn’t fit. Ridge? Fits, but too short. Then the Aha moment: ridged. And I felt a moment of pleasure realizing that most people would not have come up with it. But a word suggestion program? It would have spit it out first thing, taking away from that aspect of the game.

But then there is the other side. Successful play depends on rote memorization of all two-letter words. And what passes for a WWF word is very questionable, as I’ve said above. How about jo? Yup, accepted. Not in standard dictionaries, though.

I’m still thinking about what algorithm to use for the actual program. I can see that potentially it will be expensive, computationally speaking, given all the variants that must be tested. So I thought of making the task even easier: throw out words with more than eight letters. To see how many long words we’re going to toss:

# egrep '\w{9}' betterdict|wc

where betterdict is the result of all my pruning described above. It’s 32,386 words we’ll toss. That ought to help a lot. And that’s 464 fewer candidate words we’ll have to consider. We’ll do something like # egrep -v '\w{9}' betterdict > newbetterdict to build our even more slimmed-down dictionary.

Our First Match Program
Here’s a first stab at a matching program. It actually is pretty good (meaning there isn’t an overwhelming number of results) if you have the typical combination of unruly letters. Note we use the awesome power of regular expressions to do all the heavy lifting.

#!/usr/bin/perl
$m = $ARGV[0];
$DEBUG = 0;
$cnt = 0;
print "match: $m\n";
$dict = "/usr/share/dictd/newbetterdict";
open(DICT,"$dict") || die "cannot open dict $dict!!\n";
while(<DICT>) {
  chomp;
  $word = $_;
  print "word: $word\n" if $DEBUG;
  if ($word =~ /^[$m]{2,}$/) {
# we have the beginning of a match
    $cnt++;
    print "match: word: $word\n";
  }
}
print "matched: $cnt\n";

You run it from the command line with the letters you want to try as argument:

 # match.pl fxitau
match: fxitau
match: word: aa
match: word: affix
match: word: aft
match: word: ataxia
match: word: ax
match: word: fa
match: word: fat
match: word: faux
match: word: fax
match: word: fiat
match: word: fit
match: word: fix
match: word: ft
match: word: ii
match: word: iii
match: word: ix
match: word: tat
match: word: tatu
match: word: tau
match: word: taut
match: word: tax
match: word: taxi
match: word: tiff
match: word: tit
match: word: titi
match: word: tufa
match: word: tuff
match: word: tuft
match: word: tut
match: word: tux
match: word: xi
match: word: xii
match: word: xiii
match: word: xix
match: word: xx
match: word: xxi
...

What’s wrong, of course, is that while we are requiring matched words to be formed from our letters and only those letters, we have not taken care to avoid duplicate use of the same letter. That regular expression does a lot, but it’s not quite doing everything at this point. Now we start having to get creative to take it to the next level. I’m not sure myself how I’m going to do it.

More on that game. Now we’re getting into the groove, and by that I mean you make up words. Our final score was 381 to 347. Mind you, I’m not claiming any kind of expertise in the game. This is just a sad reflection on the liberalness of what WWF calls a “word.” Here are our made-up words from that one game: carbo, qi, ne, deni, oi, jo, wo and da. Of these, wo is the only one in the American Heritage Dictionary, as a variant spelling of woe. Sad, right? It completely changes the strategy of the game.

Revised Program: Almost There
I thought for a while I could use the transliteration operator tr, but alas, it does not do interpolation, so I had to go with something a bit clumsier. But the whole program, which is basically functional, now returns only those words which match the letters provided, which is already a great help. Here is the warts-and-all version:

#!/usr/bin/perl
$m = $ARGV[0];
$DEBUG = 0;
print "match: $m\n";
# split up match
for($i=0;$i<length($m);$i++) {
  $ltr = substr($m,$i,1);
  print "ltr: $ltr\n" if $DEBUG;
# count frequency of this letter
  $ltrhash{$ltr}++;
}
$dict = "/usr/share/dictd/newbetterdict";
open(DICT,"$dict") || die "cannot open dict $dict!!\n";
while(<DICT>) {
  chomp;
  $word = $_;
  $bad = 0;
  %ltrwordhash = ();
  print "word,m: $word,$m\n" if $DEBUG;
  if ($word =~ /^[$m]{2,}$/) {
    print "Begin word analysis\n" if $DEBUG;
# we have the beginning of a match
    for($i=0;$i<length($word);$i++) {
      $ltr = substr($word,$i,1);
      $ltrwordhash{$ltr}++;
      print "ltr,cnt: $ltr,$ltrwordhash{$ltr}\n" if $DEBUG;
# throw out words with too many letter occurences
      if ($ltrwordhash{$ltr} > $ltrhash{$ltr}) {
        print "word tossed due to excess letters. max: $ltrhash{$ltr}\n" if $DEBUG;
        $bad = 1;
        last;
      }
    }
    next if $bad;
# what remains are the good words!
    print "matched word: $word\n";
  }
}

The DEBUG statements helped me find coding errors. I set DEBUG = 1 and run the program. I had initially forgotten the %ltrwordhash = (); statement to clear out that hash for each new word. That was not good, but a review of the debug output quickly showed what was going on. Now we run it again with my current letters plus a free one (“l”) I want to use from the board:

# match.pl liaeinpu
match: liaeinpu
matched word: ail
matched word: ain
matched word: ale
matched word: alien
matched word: aline
matched word: alp
matched word: alpine
matched word: ane
matched word: ani
matched word: anil
matched word: anile
matched word: ape
matched word: elan
matched word: en
matched word: ie
matched word: ii
matched word: il
matched word: in
matched word: inula
matched word: lane
matched word: lap
matched word: lapin
matched word: lea
matched word: lean
matched word: leap
matched word: lei
matched word: leu
matched word: li
matched word: lie
matched word: lien
matched word: lieu
matched word: lii
matched word: line
matched word: lineup
matched word: lip
matched word: lupin
matched word: lupine
matched word: nail
matched word: nap
matched word: nape
matched word: napu
matched word: neap
matched word: nil
matched word: nip
matched word: nu
matched word: pa
matched word: pail
matched word: pain
matched word: pal
matched word: pale
matched word: pan
matched word: pane
matched word: panel
matched word: pe
matched word: pea
matched word: peal
matched word: pean
matched word: pel
matched word: pen
matched word: penal
matched word: penial
matched word: pi
matched word: pia
matched word: pie
matched word: pilau
matched word: pile
matched word: pin
matched word: pine
matched word: pineal
matched word: plain
matched word: plan
matched word: plane
matched word: plea
matched word: pul
matched word: pula
matched word: pule
matched word: pun
matched word: ulna
matched word: unai
matched word: up

Cool, huh? I wanted to get a long word starting with “l” to pick up a double-word. I was coming up short. Maybe I eventually would have thought of it, but before I ran the program I was not coming up with long matches. So lineup and lupine are really helpful suggestions. Even though these are gentle hints, it still feels like cheating!

And only 38 lines of code, including comments and DEBUG statements.

Next we’ll add some features. Let’s allow a starting/middle/ending letter to be specified. To support optional command-line arguments it’s nice to have more sophisticated argument parsing.

We started a new game. It’s just getting ridiculous as my friend gravitates towards a style of play which is a lot less about word knowledge than about trying all possible letter combinations to maximize the total, regardless of whether or not it seems like a word. And in order to remain competitive I have to play that way as well, to a degree. So far our made-up “words” in this game include: qi, noh (used twice), oho, obe, fe, mm, noo, jo, oxo and deva. That deva really killed me as it was used to make a triple word play. Now we’ve played a total of 14 vertical words and 12 horizontal words. So 11/26 of the words we’ve so far used are fabricated nonsense words!
I think it’s a mixed blessing. I welcome additional two-letter words. They’re readily memorized and only a finite number can exist (26² = 676 at most, of course). And they really help with the play. For three-letter made-up words I’m on the edge, but inclined not to encourage their use. Definitely not for four-letter words and higher. The universe of such words is just too great.

Some Nice Touches
Here’s the program which permits optional beginning letter, end letter and middle letter. I’ve introduced argument parsing with the Getopt::Std module.

#!/usr/bin/perl
use Getopt::Std;
getopts('b:m:e:l:');
usage() unless $opt_l;
$m = $opt_l;
$begltr = $opt_b ? $opt_b: ".";
$endltr = $opt_e ? $opt_e: ".";
$midltr = $opt_m ? $opt_m: ".";
$DEBUG = 0;
print "match: $m\n";
# split up match
for($i=0;$i<length($m);$i++) {
  $ltr = substr($m,$i,1);
  print "ltr: $ltr\n" if $DEBUG;
# count frequency of this letter
  $ltrhash{$ltr}++;
}
$dict = "/usr/share/dictd/newbetterdict";
open(DICT,"$dict") || die "cannot open dict $dict!!\n";
while(<DICT>) {
  chomp;
  $word = $_;
  $bad = 0;
  %ltrwordhash = ();
  print "word,m: $word,$m\n" if $DEBUG;
  if ($word =~ /^[$m]{2,}$/) {
# meet begin/middle/end conditions
    next unless $word =~ /^$begltr.*$midltr.*$endltr$/;
    print "Begin word analysis\n" if $DEBUG;
# we have the beginning of a match
    for($i=0;$i<length($word);$i++) {
      $ltr = substr($word,$i,1);
      $ltrwordhash{$ltr}++;
      print "ltr,cnt: $ltr,$ltrwordhash{$ltr}\n" if $DEBUG;
# throw out words with too many letter occurences
      if ($ltrwordhash{$ltr} > $ltrhash{$ltr}) {
        print "word tossed due to excess letters. max: $ltrhash{$ltr}\n" if $DEBUG;
        $bad = 1;
        last;
      }
    }
    next if $bad;
# what remains are the good words!
    print "matched word: $word\n";
  }
}
sub usage {
print "Usage: $0 [-b beginning_letter] [-e end_letter] [-m middle_letter] -l letters\n";
exit(1);
}

I called it match2.pl. Here’s an example using it with my awful letters (and is it just me, or does WWF have a definite proclivity to throw horrible letters at you for many turns in a row?):

# match2.pl -b b -l duttdsb
match: duttdsb
matched word: bud
matched word: bus
matched word: bust
matched word: but
matched word: butt

Note that my simplistic dictionary does not seem to have plurals. It also does not seem to have verb tenses besides the present tense. Oh, well. If you realize that, it’s no big deal. It’s supposed to be gentle hints, after all (which I can hide behind any time the task of doing a complete job becomes too taxing!).
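If the missing plurals ever bother you enough, a crude workaround – and I stress this is a hedged sketch that will happily generate non-words like fishs – would be to append naive “s” forms to the dictionary:

# perl -lne 'print; print $_."s"' newbetterdict|sort -u > pluralsdict

I haven’t bothered, since gentle hints remain the goal.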

How We Can Put This on the Web
The typical time it takes to run match2.pl is about 55 msec:

# time ./match2.pl -l zatrd
match: zatrd
matched word: adz
matched word: art
matched word: dart
matched word: rad
matched word: rat
matched word: tad
matched word: tar
matched word: trad
matched word: tzar

real    0m0.055s
user    0m0.050s
sys     0m0.000s

That’s pretty fast. So, anyhow, I’m thinking to take the next step and make this available on the web, at least until it becomes popular, in the form of an Ajax/Perl program. Now I’ve never done an Ajax program, but I’ve been looking for an excuse to do one and I think this fits the bill. I’m envisioning a web page where you punch in your letters and the possible word matches appear on the same page. Another way to go is to write the whole thing as a Javascript program. The dictionary isn’t that large, after all. I’ve also never done anything this ambitious in Javascript, so that might take some doing. I think we’ll tackle Ajax first.
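To give an idea of the server side, here’s a minimal CGI sketch – the parameter name and paths are my assumptions for illustration, not a finished design:

#!/usr/bin/perl
# hypothetical CGI front-end for match2.pl - parameter name and paths are illustrative
use CGI;
my $q = new CGI;
print $q->header('text/plain');
my $letters = $q->param('letters');
# untaint: only accept one to eight lowercase letters before shelling out
if (defined $letters && $letters =~ /^([a-z]{1,8})$/) {
  print `/usr/local/bin/match2.pl -l $1`;
} else {
  print "Please supply between one and eight lowercase letters\n";
}

The Ajax part would then just be a bit of Javascript that calls this and drops the text into the page.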

Now gcide.index has 149,682 words, but it’s a mess and includes lots of proper names, so I will not use it.

To be continued…

Categories
Admin Apache IT Operational Excellence Linux Security Web Site Technologies

Apache Tips in Light of Security Problems

Intro
I am far from an expert in Apache. But I have a good knowledge of general best practices which I apply when running Apache web server. None of my tips are particularly insightful – they all can be found elsewhere, but this will be a single place to help find them all together.

To Compile or Not
As of this writing the current version is 2.2.21. The version supplied with the current version of SLES, SLES 11, is 2.2.10. To find your version, run httpd -v.
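The output will look something like the following – the exact strings vary by build, so take this as illustrative:

# httpd -v
Server version: Apache/2.2.10 (Linux/SUSE)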

I think it’s fairly typical for vendors to be that many versions behind. I recommend compiling your own version. But pay attention to security advisories and check every quarter to see what the latest release is. You’ll have to keep up with it on your own or you’ll actually be in worse shape than if you had used the vendor version and applied patches regularly.

What You’ll Need to Know for the Range DOS Vulnerability
When you get the source you might try a simple ./configure, followed by make and finally make install. And it would all seem to work. You can fetch the home page with a curl localhost. Then you remember that recent Range header denial of service vulnerability described here. If you test whether you support the Range header, you’ll see that you do. I like to test for this as follows:

$ curl -H "Range: bytes=1-2" localhost

If before you saw something like

<html><body><h1>It works!</h1>

now it becomes

ht

i.e., it grabbed bytes one and two (counting from zero) from <html>…

Now there are options and opinions about what to do about this. I think turning off Range header support is the best option. But if you try that you will fail. Why? Because you did not compile in the mod_headers module. To turn off Range header support, add these lines to the global part of your configuration:

RequestHeader unset Range
RequestHeader unset Request-Range

To see what modules you have available in your apache binary you do

/usr/local/apache2/bin/httpd -l

which should look like the following if you have taken all the defaults:

Compiled in modules:
  core.c
  mod_authn_file.c
  mod_authn_default.c
  mod_authz_host.c
  mod_authz_groupfile.c
  mod_authz_user.c
  mod_authz_default.c
  mod_auth_basic.c
  mod_include.c
  mod_filter.c
  mod_log_config.c
  mod_env.c
  mod_setenvif.c
  mod_version.c
  prefork.c
  http_core.c
  mod_mime.c
  mod_status.c
  mod_autoindex.c
  mod_asis.c
  mod_cgi.c
  mod_negotiation.c
  mod_dir.c
  mod_actions.c
  mod_userdir.c
  mod_alias.c
  mod_so.c

Notice there is no mod_headers.c which means there is no mod_headers module. And in fact when you restart your apache web server you are likely to see this error:

Syntax error on line 360 of /usr/local/apache2/conf/httpd.conf:
Invalid command 'RequestHeader', perhaps misspelled or defined by a module not included in the server configuration

So you need to compile in mod_headers. Begin by cleaning your slate by running make clean in your source directory; then run configure as follows:

./configure --enable-headers --enable-rewrite

I’ve thrown in the --enable-rewrite qualifier because I like to be able to use mod_rewrite. It is not actually used for the security problems being discussed in this article.

Side note for those using the system-provided apache2 package on SLES
As an alternative to compiling yourself, you may be using an apache package. I have only tested this for SLES (so it would probably be the same for openSUSE). There you can edit the /etc/sysconfig/apache2 file and add additional modules to load. In particular the line

APACHE_MODULES="actions alias auth_basic authn_file authz_host authz_groupfile authz_default authz_user authn_dbm autoindex
 cgi dir env expires include log_config mime negotiation setenvif ssl suexec userdir php5 reqtimeout"

can be changed to

APACHE_MODULES="actions alias auth_basic authn_file authz_host authz_groupfile authz_default authz_user authn_dbm autoindex
 cgi dir env expires include log_config mime negotiation setenvif ssl suexec userdir php5 reqtimeout headers"

Back to compiling. Note that ./configure --help gives you some idea of all the options available, but it doesn’t exactly link the options to the precise module names, though the descriptions give you a good idea.

Then run make followed by make install as before. You should be good to go!

A Built-in Contradiction
You may have successfully suppressed use of range-headers, but on my web server, I noticed a contradictory HTTP Response header was still being issued after all that:

Accept-Ranges: bytes

I use a simple

curl -i localhost

to look at the HTTP Response headers. The contradiction is that your server is not accepting ranges while it’s sending out the message that it is!

So turn that off to be consistent. This is what I did.

# need the following line to not send Accept-Ranges header
Header unset Accept-Ranges
#

Don’t Give Away the Keys
Don’t reveal too much about your server version such as OS and patch level of your web server. I suppose it is OK to reveal your web server type and its major version. Here is what I did:

# don't reveal too much about the server version - just web server and major version
# see http://www.ducea.com/2006/06/15/apache-tips-tricks-hide-apache-software-version/
ServerTokens Major

After all these changes curl -i localhost output looks as follows:

HTTP/1.1 200 OK
Date: Fri, 04 Nov 2011 20:39:02 GMT
Server: Apache/2
Last-Modified: Fri, 14 Oct 2011 15:37:41 GMT
ETag: "12005-a-4af4409a09b40"
Content-Length: 10
Content-Type: text/html

See? I’ve gotten rid of the Accept-Ranges and provide only sketchy information about the server.

I put these security-related measures into a single file I call security.conf, which I include from the global configuration file httpd.conf. To put it all together, at this point my security.conf looks like this:

# 11/2011
# prevent DOS attack.  
# See http://mail-archives.apache.org/mod_mbox/httpd-announce/201108.mbox/%3C20110824161640.122D387DD@minotaur.apache.org%3E - JH 8/31/11
# a good explanation of how to test it: 
# http://devcentral.f5.com/weblogs/macvittie/archive/2011/08/26/f5-friday-zero-day-apache-exploit-zero-problem.aspx
# looks like we do have this vulnerability, 
# trying curl -i -H 'Range:bytes=1-5' http://bsm2.com/index.html
# note that I had to compile with ./configure --enable-headers to be able to use these directives
RequestHeader unset Range
RequestHeader unset Request-Range
#
# need the following line to not send Accept-Ranges header
Header unset Accept-Ranges
#
# don't reveal too much about the server version - just web server and major version
# see http://www.ducea.com/2006/06/15/apache-tips-tricks-hide-apache-software-version/
ServerTokens Major

SSL (added December, 2014)
Search engines are encouraging web site operators to switch to using SSL for the obvious added security. If you’re going to use SSL you’ll also need to do that responsibly or you could get a false sense of security. I document it in my post on working with cipher settings.

Disable folder browsing/directory listing
I recently got caught out on this rookie mistake: the web directories listing vulnerability. The solution is simple. Inside your main HTDOCS section of the configuration you may have a line that looks like:

Options Indexes FollowSymLinks ExecCGI

Get rid of that Indexes – that’s what permits folder browsing. So this is better:

Options FollowSymLinks ExecCGI
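For context, that Options line typically lives inside the Directory block for your document root, so the end result looks something like this (the path is illustrative):

<Directory "/usr/local/apache2/htdocs">
    Options FollowSymLinks ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>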

Turn off php version listing, December 2016 update
Oops. I read about how 47% of the top million web sites have security issues. One basis for the judgment is to see what version of PHP is running, based on the headers. So I checked my https server, and, oops:

$ curl -s -i -k https://drjohnstechtalk.com/blog/|head -22

HTTP/1.1 200 OK
Date: Fri, 16 Dec 2016 20:00:09 GMT
Server: Apache/2
Strict-Transport-Security: max-age=15811200; includeSubDomains; preload
Vary: Cookie,Accept-Encoding
X-Powered-By: PHP/5.4.43
X-Pingback: https://drjohnstechtalk.com/blog/xmlrpc.php
Last-Modified: Fri, 16 Dec 2016 20:00:10 GMT
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
 
<!DOCTYPE html>
<html lang="en-US">
<head>
...

So there it was, hanging out for all to see: PHP version 5.4.43. I’d rather not publicly admit that. So I turned it off by adding the following to my php.ini file and restarting apache:

expose_php = off

After this my HTTP response headers show only this:

HTTP/1.1 200 OK
Date: Fri, 16 Dec 2016 20:00:55 GMT
Server: Apache/2
Strict-Transport-Security: max-age=15811200; includeSubDomains; preload
Vary: Cookie,Accept-Encoding
X-Pingback: https://drjohnstechtalk.com/blog/xmlrpc.php
Last-Modified: Fri, 16 Dec 2016 20:00:57 GMT
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

I must have overlooked this when I compiled my own apache v 2.4 and used it to run my principal web server over https.

June 2017 update
PCI compliance will ding you for lack of an X-Frame-Options header. So for a simple web site like mine I can always safely send one out by adding this to my apache.conf file (or whichever apache conf file you deem most appropriate – I have a special security file in conf.d where I actually put it):

# don't permit framing from other sources, DrJ 6/16/17
# https://www.simonholywell.com/post/2013/04/three-things-i-set-on-new-servers/
Header always append X-Frame-Options SAMEORIGIN

PCI compliance will also ding you if TRACE method is enabled. In that security file of my configuration I disable it thusly:

TraceEnable Off

Test both those things in one fell swoop
$ curl -X TRACE -i -k https://drjohnstechtalk.com/

HTTP/1.1 405 Method Not Allowed
Date: Fri, 16 Jun 2017 18:20:24 GMT
Server: Apache/2
X-Frame-Options: SAMEORIGIN
Strict-Transport-Security: max-age=15811200; includeSubDomains; preload
Allow:
Content-Length: 295
Content-Type: text/html; charset=iso-8859-1
 
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>405 Method Not Allowed</title>
</head><body>
<h1>Method Not Allowed</h1>
<p>The requested method TRACE is not allowed for the URL /.</p>
<hr>
<address>Apache/2 Server at drjohnstechtalk.com Port 443</address>
</body></html>

See? X-Frame-Options header now comes out with desired value. TRACE method was disallowed. All good.

Conclusion
Make sure you are taking some precautions against known security problems in Apache2. For information on running multiple web server instances under SLES see my next post Running Multiple Web Server Instances under SLES.

References and related
Remember, for handling the apache SSL hardening go here.
Compiling apache 2.4
drjohnstechtalk is now an HTTPS site!
TRACE method sounds useful for debugging, but I guess there are exploits so it needs to be disabled. Wikipedia documents it: https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods. Don’t forget that curl -v also shows you your request headers!

Categories
Admin IT Operational Excellence Linux

Splitting a Text File Into Two Lines with Awk

Intro
How do you split each line of a text file into two lines of output? Of course there are zillions of ways, with shell, xargs, Perl, your favorite tool, etc. But I decided to revisit that old standard awk to see if it might not just be the best (most compact and intelligible) way to do it!

The Challenge
I was provided a spreadsheet concerning printers in a new building, which I was to use to create access table entries for sendmail, i.e., so that they would be permitted to relay mail (these days it seems all printers are also scanners).

I wanted to have a comment line with the native printer name, with format

# Printer_Name

Then the appropriate access table entry, which has format

IP_ADDRESS   RELAY

As an additional wrinkle the spreadsheet had columns with variable amounts of whitespace! It was very similar to the input below, which I had in a file called tmp:

PA01-USCVI-B52_160-P137C              Bldg 52 Plant 1st 160           10.12.210.161
PA02-Y-B53_160-D220                 Blag 53 Plant 1st 160               10.13.209.162
PA03-UIT-B54_COPY1-D645C         Bldg 54 Plant 1st Copy Rm   10.208.211.163
PA04-RUITY-B55-P235                 Bldg 55 Plant Basement Off         10.14.205.169
PA05-THY-675            John Tollesin    Bldg 53 Plant 2nd 220          10.13.204.156 
 

Fortunately I was interested in the first and last fields, which kept things simple. Here’s what I came up with:
 

awk '{print "# "$1"\n"$NF"\tRELAY"}' tmp

Not bad, eh? In addition to being relatively few characters, it makes sense to me, so I will remember this trick for the next time, which is a timesaver.

I have to get myself to a Unix or Cygwin session to show live output, but it is as I described. I guess the biggest trick is that awk allowed me to conveniently write out two lines in one statement by embedding an ASCII newline character with the “\n”. It’s probably better known that $1 stands for the first field and $NF (NF = number of fields) stands for the last field of a line.
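Reconstructing it from the command rather than from a live session, the first input line above should yield:

# PA01-USCVI-B52_160-P137C
10.12.210.161   RELAY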

Conclusion
Sexier tools have come along, but don’t give up on our old friend awk – basic knowledge of what it does can be a real timesaver.

Categories
Linux

An SSH Terminal App for the HP Touchpad

October, 2016 Update
Needless to say, the HP Touchpad never caught on and mine is collecting dust.

What I just got is the new 8″ Amazon Fire HD tablet. You can get a free, good-quality ssh client for it called serverauditor. The keyboard emulation is good (the Linux CLI needs all those unusual keys pretty badly). The battery life is genuinely good – better than the Touchpad. It’s $89.

I’ll keep the blog post below online for historical purposes.

Updated Version
My previous post got out-of-date so rapidly that I have to start this topic all over! DO NOT follow my previous advice.

The Bluetooth Keyboard – It’s Worth It
My Bluetooth keyboard came in. It’s really awesome. I advise you to get it if you want to treat your Touchpad (the cognoscenti prefer TP) like a netbook from time to time, namely, by having the ability to type rapidly and comfortably. Get the HP one made for the Touchpad because:
- it’s small like you’d expect as a companion for a small tablet computer, yet the keys are full size
- it has some really convenient shortcut keys so you’re not spending too much time shifting your hand from keyboard to screen, namely:
-- volume controls
-- screen on/off
-- even a key that shows your cards
- plus some keys that do stuff that’s harder with just a TP:
-- Ctrl (control) key, yeah!
-- arrow keys
-- mute key
-- screen brightness/dimmer keys
-- plus other keys I haven’t tested yet
- and the : and / keys are primary keys like they should be

So far I’m missing:
- a Home/End key
- and, if I ever get my terminal working again, an Esc (escape) key

All-in-all I’d say the Bluetooth keyboard is an obviously well-engineered product – a perfect pairing for the TP.

It’s $45 at Amazon. And yes, I am writing this blog entry on my new keyboard!

I also bought an off-brand display stand, by Mivizu. It’s better than NOT having one, but it’s kind of flimsy and awkward – in no way a fun and beautifully engineered companion to the TP, unlike the iPad case that everyone likes to play with.

What About that SSH Terminal?
I probably messed things up with Preware alpha/beta software. They have released an Xterm, but I arrived at it from various previous upgrades and either Xecutah or the XServer is not working for me. The XServer does not launch a new card like it should. See below.

I will probably have to Web Dr my device (start from a factory install state), which they warn you should be prepared to do when using test software. Live and learn. I have not had time to do that yet, but I wanted to delete my old post and get the new facts out here before others went down the wrong path.

So, briefly: ssh, bash and an xterm to the underlying Linux on your TP are all available from the webos-internals.org site.

Sep 29th I saw an upgrade for Xecutah, Servers and xterm – to v 0.9.3. I did the upgrade and, to my surprise, I am back in business again! The xterm launches once again and so I do not have to Web Dr my Touchpad.

I thought I owed it to the community to experiment, so I decided to change root’s shell to bash! That’s right, the shell is that old /bin/sh by default. Once you’ve installed it, bash appears in /opt/bin/bash. Well…that worked too. I now have a comfortable shell that launches for me when I fire up my xterm, or xterms. Of course I brought over my .bashrc file – using sftp of course – with its familiar prompt definition and convenience aliases such as the universal “ll” for ls -l. To make really sure I hadn’t blown up my Touchpad, I rebooted. Yes, reboot from your shell really does work to reboot your TP! And yes, it came back with flying colors.
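If you’re wondering how to change the shell on a stripped-down system with no chsh utility, editing root’s entry in /etc/passwd so that the final (shell) field reads /opt/bin/bash does the trick. Something like this – the line is illustrative and the other fields will differ on the TP:

root:x:0:0:root:/:/opt/bin/bash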

Esc key in the xterm for real Keyboard Users
I don’t think the HP keyboard has an escape key, not that I can find. So you’re in a bit of a bind if you use it for your xterm during a vi editing session. What you can do is momentarily bring up the virtual keyboard by hitting the, um, keyboard key. Xecutah now comes with instructions on how to generate the escape key on the virtual keyboard (hold t, choose the right-most character, then “[” as your next character), which work. Then, when you’ve got your Esc, which you don’t need too often anyways, hit that keyboard key again to resume using your comfy real keyboard.

So I am a happier camper once again. I even contributed to webos-internals. You should, too, if you think they’re providing a valued service as I do.

To be continued.

Categories
IT Operational Excellence Linux

Grep is Slow as a Snail in SLES 11 – Solved

I had written earlier about the performance problems of Suse Linux Enterprise Server v 11  Service Pack 1 (SLES 11 SP1)  under VMWare: http://drjohnstechtalk.com/blog/2011/06/performance-degradation-with-sles-11-sp1-under-vmware/.  What I hadn’t fully appreciated at that time is that part of the problem could be with the command grep itself.  Further investigation has convinced me that grep as implemented under SLES 11 SP 1 X86_64 is horrible.  It is seriously broken. The following results are invariant under both a VM and a physical server.

Methodology 1

A cksum shows that grep has changed between SLES 10 SP 3 and SLES 11 SP 1.  I’m not sure what the changes are.  So I performed an strace while grep’ing a short file to see if there are any extra system calls which occur under SLES 11 SP 1.  There are not.

I copied the grep binary from SLES 10 SP3 to a SLES 11 SP1 system.  I was afraid this wouldn’t work because it might rely on dynamic libraries which also could have changed.  However this appears not to be the case, and the grep binary from the SLES 10 system is about 17 times faster, running on the same SLES 11 system!
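By the way, checking the dynamic library dependencies directly would have settled that fear up front – compare the output of ldd on both systems:

# ldd /usr/bin/grep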

Methodology 2

I figure that I am a completely amateur programmer.  If with all my limitations I can implement a search utility that does considerably better than the shell command grep, I can fairly decisively conclude that grep is broken.  Recall that we already have comparisons that show that grep under SLES 10 SP 3 is many times faster than under SLES 11 SP 1.

Results

The table summarizes the findings. All tests were on a 109 MB file which has 460,000 lines.

OS            Type of grep               Time (s)
SLES 11 SP 1  built-in                   42.6
SLES 11 SP 1  SLES 10 SP 3 grep binary   2.5
SLES 11 SP 1  Perl grep                  1.1
SLES 10 SP 3  built-in                   1.2
SLES 10 SP 3  Perl grep                  0.35

The Code for Perl Grep

Hey, I don’t know about you, but I only use a fraction of the features in grep. The switches i and v cover about 99% of what I do with it. Well, come to think of it I do use alternate expressions in egrep (w/ the “|” character), and the C switch (provides context by including surrounding lines) can sometimes be really helpful. The l (list filenames only) and n (include line numbers) switches look useful on paper, but you almost never end up needing them. Anyways, I simply didn’t program those things, to keep it simple. Maybe later. To make it as fast as possible I avoided anything I thought the interpreter might trip over, at the expense of repeating code snippets multiple times. At some point (allowing another switch or two) my approach would become ludicrous, as there would be too many combinations to consider. But at least in my testing it functions just like grep, only, as you see from the table above, much faster. If I had written it in a compiled language like C it should go even faster still. Perl is an interpreted language so there should always be a performance penalty in using it. The advantage is of course that it is so darn easy to write useful code.

#!/usr/bin/perl
# J.Hilgart, 6/2011
# model grep implementation in Perl
# feel free to borrow or use this, but it will not be supported
use Getopt::Std;
$DEBUG = 0;
# Get the command line options.
getopts('iv');
# the search string has to be present
$mstr = shift @ARGV;
usage() unless $mstr;
$mstr =~ s/\./\\./g;
# the remaining arguments are the files to be searched
$nofiles = @ARGV;
print "nofiles: $nofiles\n" if $DEBUG;
# filename prefix - set per-file in the subroutines below; empty for STDIN
$filePrefix = "";
 
# call subroutine based on arguments present
optiv() if $opt_i && $opt_v;
opti()  if $opt_i;
optv()  if $opt_v;
normal();
################################
sub normal {
foreach (@ARGV) {
  open(FILE,"$_") || die "Cannot open $_!!\n";
  while(<FILE>) {
# print filename if there is more than one file being searched
    print "$filePrefix$_" if /$mstr/;
  }
  close(FILE);
}
if (! $nofiles) {
# no files specified, use STDIN
while(<STDIN>) {
  print if /$mstr/;
}
}
exit;
} # end sub normal
###############################
sub opti {
foreach (@ARGV) {
  open(FILE,"$_") || die "Cannot open $_!!\n";
  while(<FILE>) {
    print "$filePrefix$_" if /$mstr/i;
  }
  close(FILE);
}
if (! $nofiles) {
# no files specified, use STDIN
while(<STDIN>) {
  print if /$mstr/i;
}
}
exit;
} # end sub opti
#################################
sub optv {
foreach (@ARGV) {
  open(FILE,"$_") || die "Cannot open $_!!\n";
  while(<FILE>) {
    print "$filePrefix$_" unless /$mstr/;
  }
  close(FILE);
}
if (! $nofiles) {
# no files specified, use STDIN
while(<STDIN>) {
  print unless /$mstr/;
}
}
exit;
} # end sub optv
##############################
sub optiv {
foreach (@ARGV) {
  open(FILE,"$_") || die "Cannot open $_!!\n";
  while(<FILE>) {
    print "$filePrefix$_" unless /$mstr/i;
  }
  close(FILE);
}
if (! $nofiles) {
# no files specified, use STDIN
while(<STDIN>) {
  print unless /$mstr/i;
}
}
exit;
} # end sub optiv
sub usage {
# I never did finish this... here's the bare minimum
print "Usage: $0 [-i] [-v] search_string [file ...]\n";
exit(1);
}
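Assuming you save it as, say, pgrep.pl (my name here, nothing official) and make it executable, usage mirrors the grep switches it implements:

# ./pgrep.pl -i 192.168.23.34 logfile1 logfile2
# zcat file.gz|./pgrep.pl -v 192.168.23.34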

Conclusion
So built-in grep performs horribly on SLES 11 SP 1, about 17 times slower than the SLES 10 SP 3 grep. I wonder what an examination of the source code would reveal? But who has time for that? So I’ve shown a way to avoid it entirely: use a Perl grep instead – modify it to suit your needs. It’s considerably faster than what the system provides, which is really sad since it’s an amateur, two-hour effort compared to the decade+ (?) of professional development on POSIX grep. What has me more concerned is what I haven’t found yet that also performs horribly under SLES 11 SP 1. It’s like deer on the side of the road in New Jersey – where there’s one there are likely to be more lurking nearby : ) .

Follow Up
We will probably open a support case with Novell. I am not very optimistic about our prospects. This will not be an easy problem for them to resolve – the code may be contributed, for instance. So, this is where it gets interesting. Is the much-vaunted rapid bug-fixing of open source really going to make a substantial difference? I would have to look to OpenSUSE to find out (where I suppose the fixed code would first be released), which I may do. I am skeptical this will be fixed this year. With luck, in a year’s time.

7/15 Update
There is a newer version of grep available. Old version: grep-2.5.2-90.18.41; new version: grep-2.6.3-90.18.41. Did it fix the problem? Depends how low you want to set the bar. It’s a lot better, yes. But it’s still three times slower than grep from SLES 10 SP3. So…still a long way to go.

9/7 Update – The Solution
Novell came through today, three months later. I guess that’s better than I pessimistically predicted, but hardly anything to brag about.

Turns out that things get dramatically better if you simply define the environment variable LC_ALL=POSIX. They do expect a better fix with SLES 11 SP2, but there’s no release date for that yet. Being a curious sort, I revisited SLES 10 SP3 with this environment variable defined and it considerably improved performance there as well! This variable has to do with locale and language support. Here’s a table with some recent results. Unfortunately the SLES 11 SP1 box is a VM and the SLES 10 SP3 box is a physical server, although the same file was used. So the thing to concentrate on is the improvement in performance of grep with vs. without LC_ALL defined.

OS            LC_ALL=POSIX defined?  Time (s)
SLES 11 SP 1  no                     6.9
SLES 11 SP 1  yes                    0.36
SLES 10 SP 3  no                     0.35
SLES 10 SP 3  yes                    0.19

So if you use SLES 10/11, make sure you have a

export LC_ALL=POSIX

defined somewhere in your profile if you plan to use grep very often. It makes a 19x performance improvement in SLES 11 and almost a 2x performance improvement under SLES 10 SP3.
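You can also try it out on a single command before committing it to your profile:

# time LC_ALL=POSIX grep 192.168.23.34 bigfile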

Related
If you like the idea of grep but want a friendlier interface, I was thinking I ought to mention Splunk. A Google search will lead you to it. It started with a noble concept – all the features of grep, plus a convenient web interface so you never have to get your hands dirty and actually log into a Linux/Unix system. It was like grep on steroids. But then in my opinion they ruined a simple utility and blew it up with so many features that it’ll take hours just to scratch the surface of its capabilities. And I’m not even sure a free version is still available. Still, it might be worth a look in some cases. In my case it also slowed down searches, though supposedly it should have sped them up.

And to save for last what should have come first, grep is a search utility that’s great for looking at unstructured (not in a relational database) data.

Categories
Linux

Gnu Parallel Really Helps With Zcat

I happened across what I think will turn out to be a very useful open source application: Gnu Parallel.  Let me explain how I got to that point.  I was uncompressing gzip’d files, of which I have lots.  So many, in fact, it can take the better part of a day to go through them all.  The server I was using is four years old but I don’t have access to anything better.  In fact it seems processor speed has sort of plateaued.  Moore’s Law only applies if you count the number of processors, which are multiplying like rabbits.  So I start to think that maybe my single-threaded approach is wrong, see? I need to scale horizontally, not vertically.

It just happens that in my former life I put quite some effort into parallelizing a compute-intensive program I wrote.  I can’t believe that it’s still hanging around on the Internet!  I haven’t looked for it for at least 17 years: my FERMISV Monte Carlo.  So I thought, what if I could parallelize my zcat|grep?  Writing such a thing from scratch would be too time-consuming, but after searching for parallelize zcat I quickly found Gnu Parallel.  It’s one thing to identify such a program in a Google search, another to get it working.  Believe me, I’ve put in my time with some of these WordPress plugins.

The upshot is that, yes, Gnu Parallel actually works and it’s not hard to learn to use.  They have a nice Youtube video.  For me I employed syntax like:

ls *gz|time parallel -k "zcat {}|grep 192.168.23.34" > /tmp/matches

This, of course, is to be compared with the normal operation I have been doing for eons:

time zcat *gz|grep 192.168.23.34 > /tmp/matches

The “time” is just for benchmarking and normally wouldn’t be present.

The results are that the matches files produced in the two different ways are identical.  That’s good!  The -k switch ensured the order is preserved in the output, and anyways, it didn’t cost me any time; otherwise the order is not guaranteed. I tested on a server with four cores.  Gnu Parallel reported that it used 375% of my CPU!  It finished in less than half the time of my normal way of doing things.  I need to put it through more real-world exercises, but it really looks promising.  I never got onto the xargs bandwagon (perhaps I should be ashamed to admit it?), but Gnu Parallel can do a lot of that same type of thing, and in my opinion is more intuitive.  Take a look!
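P.S. One more switch worth knowing if you’d rather not let it gobble every core: -j caps the number of simultaneous jobs, e.g.:

ls *gz|parallel -j 2 -k "zcat {}|grep 192.168.23.34" > /tmp/matches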

Categories
Linux

Performance Degradation With SLES 11 SP1 Under VMWare

Some friends and I found severe performance degradation using SLES 11 SP1 under VMWare.  That’s Suse Linux Enterprise Server 11, Service Pack 1.  I’m still trying to get my head around it – there are lots of variables to control for. Later update: for my analysis of the specific problem with grep on SLES 11 SP1, please go here: http://drjohnstechtalk.com/blog/2011/06/grep-is-slow-as-a-snail-in-sles-11/.

The Method

I start with a 20 MB gzip’d file.  Let’s call it file.gz.  Uncompressed it is 107 MB.  I run

time zcat file.gz > /dev/null

I run it several times and report the lowest number which I am consistently able to reproduce. That feels the fairest way to me to benchmark in this case.
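In practice that just means a quick loop and eyeballing the timings:

# for i in 1 2 3 4 5; do time zcat file.gz > /dev/null; done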

On a fast, 3 GHz physical server running SLES 10 SP3 it takes 0.63 s. Let’s call that server Physical.

Then I add in something to make it useful: grep’ing for an IP address:

time zcat file.gz|grep 192.168.23.34 > /dev/null

On Physical that takes 0.65 s.

My Amazon EC2 image – let’s call it Amazon-EC2 – runs ubuntu 10.10 on a 2.6 GHz VM. To learn CPU speed in ubuntu:

cat /proc/cpuinfo|grep MHz

My SLES 11 SP1 is a guest VM on VMWare ESX. It has a 2.4 GHz processor. Let’s call it SLES11SP1. For CPU speed in SLES*:

dmesg|grep -i cpu

and look for the line that says Intel … GHz.

* 7/1/2011 Correction: The correct way to get the processor speed is

grep -i mhz /proc/cpuinfo

The cpu MHz line shows the running speed. This also seems to work in RHEL (Redhat) and Debian distributions such as Ubuntu. I’ve looked at several models. Usually the model name – which is what you get from the dmesg way – lists a speed that is the same as the cpu MHz, allowing for the 1000x difference between GHz and MHz, but not always! I have come across a recently purchased server with a surprisingly slow clock speed, one quite different from the CPU name:

model name      : Intel(R) Xeon(R) CPU           E7520  @ 1.87GHz
cpu MHz         : 1064.000

Who knew you could even buy a server-class CPU that slow these days?

For comparison I have a SLES 10 SP3 VM under VMWare. It also has a 2.4 GHz CPU. Let’s call it SLES10SP3. All servers are x86_64.

The Results
The amazing results:

Server zcat time zcat|grep IP time
Physical 0.63 s 0.65 s
Amazon-EC2 0.73 s 1.06 s
SLES11SP1 0.99 s 5.8 s
SLES10SP3 0.78 s 0.93 s

Analysis
I’ve tried many more variants than displayed in that table, but you get the idea. All VMs are slower than all physical servers tested, in all tests. Most discouragingly, SLES 11 SP1 is five or six times slower than a comparable physical server for the real-life test involving grep. I used the same file in all tests.

Conclusions
Virtualization is not a panacea. It has its role, but also its limitations. And either something is wrong with SLES 11 SP1, which I doubt, or something is wrong with the way I set it up, despite the fact that I’ve tried a few variants. We’re going to try it on a physical server next. I’ll update this post if anything noteworthy happens.

Update
I got it tested on a physical server now, too. A HP G6 w/ 4 Gb RAM. SLES 11 SP1. The CPU identified itself as Xeon E5504 @ 2.0 GHz. Here are the awful timings:

Server zcat time zcat|grep IP time
SLES 11 SP1 Physical 0.90 s 4.8 s

That shows that SLES 11 SP1 itself is causing my performance degradation. Not every instance of SLES 11 is faulty, however. It turns out that IBM’s Watson is running it! http://whatis.techtarget.com/definition/ibm-watson-supercomputer.html

For the record, the kernel in SLES 11 SP1 is 2.6.32.12-0.7, while in SLES 10 SP3 it is 2.6.16.60-0.68.1. On a RHEL 5.6 VM (I did not bother to list its results), where the kernel was 2.6.18-238.1.1, the degradation was not nearly so bad either. I wonder if the kernel version is having a major impact? Perhaps not. The kernel in ubuntu 10.10 looks pretty new – 2.6.35-24. I am running

uname -a

to see the kernel version.

Further Update
We also got to test with SLES 10 SP4 VM. You see from the table that it is well-behaved. It had 2 CPUs which identified themselves as X6550 @ 2.0 GHz. 4 GB RAM. Kernel 2.6.16.60-0.85.1

Server zcat time zcat|grep IP time
SLES 10 SP4 VM 0.96 s 1.0 s