Categories
Admin Perl Web Site Technologies

Turning HP SiteScope into SiteScope Classic with Perl

Intro
HP siteScope is a terrific web application tool and not too expensive for those who have any kind of a budget. The built-in monitor types are a bit limited, but since it allows calls to user-provided scripts your imagination is the only real limitation. For those with too many responsibilities and too little time on their hands it is a real productivity enhancer.

I’ve been using the product for 12 years now – since it was Freshwater SiteScope. I still have misgivings about the interface change introduced some years ago when it was part of Mercury. It went from simple and reliable to Java, complicated and flaky. To this day I have to re-start a SiteScope screen in my browser on a daily basis as the browser cannot recover from a server restart or who knows what other failures.

So I longed for the days of SiteScope Classic. We kept it running for as long as possible, years in fact. But at some point there were no more releases created for the classic view. So I investigated the feasibility of creating my own conversion tool. And…partially succeeded. Succeeded to the point where I can pull up the web page on my Blackberry and get the statuses and history. Think you can do that with regular HP SiteScope? I can’t. Maybe there’s an upgrade for it, but still. It’s nice to have the classic interface when you want to pull up the statuses as quickly as possible, regardless of the Blackberry display issue.

Looking back at my code, I obviously decided to try my hand at OO (object oriented) programming in Perl, with mixed results. Perl’s OO syntax isn’t the best, which addles comprehension. Without further ado, let’s jump into it.

The Details
It relies on something I noticed, that this URL on your HP SiteScope server, http://localhost:8080/SiteScope/services/APIConfigurationImpl?method=getConfigurationSnapshot, contains a tree of relationships of all the monitors. Cool, right? But it’s not a tree like you or I would design. Between parent and child is an intermediate layer. I suppose you need that because a group can contain monitors (my only focus in this exercise), but it can also contain alerts and maybe some other properties as well. So I guess the intermediate layer gives them the flexibility to represent all that, though it certainly added to my complication in parsing it. That’s why you’ll see the concern over “grandkids.” I developed a recursive, web-enabled Perl program to parse through this xml. That gives me the tools to build the nice hierarchical groupings. But it does not give me the statuses.

For the status of each monitor I wrote a separate scraper script that simply reads the entire daily SiteScope log every minute! Crude, but it works. I use it for an installation with hundreds of monitors and a log file that grows to 9 MB by the end of the day so I know it scales to that size. Beyond that it’s untested.

In addition to giving only the relationships, the xml also changes with every invocation. It attaches ID numbers to the monitors which initially you think is a nice unique identifier, but they change from invocation to invocation! So an additional challenge was to match up the names of the monitors in the xml output to the names as recorded in the SiteScope log. Also a bit tricky, but in general doable.

So without further ado, here’s the source code for the xml parser and main program which gets called from the web:

#!/usr/bin/perl
# Copyright work under the Artistic License, http://www.opensource.org/licenses/Artistic-2.0
# build v.simple SiteScope web GUI appropriate for smartphones
# 7/2010
#
# Id is our package which defines th Id class
use Id;
use CGI::Pretty;
my $cgi=new CGI;
$DEBUG = 0;
# GIF location on SiteScope classic
$ssgifs = "/artwork/";
$health{good} = qq(<img src="${ssgifs}okay.gif">);
$health{error} = qq(<img src="${ssgifs}error.gif">);
$health{warning} = qq(<img src="${ssgifs}warning.gif">);
# report CGI
$rprt = "/SS/rprt";
# the frustrating thing is that this xml output changes almost every time you call it
$url = 'http://localhost:8080/SiteScope/services/APIConfigurationImpl?method=getConfigurationSnapshot';
# get current health of all monitors - which is scraped from the log every minute by a hilgarj cron job
$monitorstats = "/tmp/monitorstats.txt";
print "Content-type: text/plain\n\n" if $DEBUG;
open(MONITORSTATS,"$monitorstats") || die "Cannot open monitor stats file $monitorstats!!";
while(<MONITORSTATS>) {
  chomp;
  ($monitor,$status,$value) = /([^\t]+)\t([^\t]+)\t([^\t]+)/;
  $monitors{"$monitor"} = $status;
  $monitorv{"$monitor"} = $value;
}
open(CURL,"curl $url 2>/dev/null|") || die "cannot open $url for reading!!\n";
my %myobjs = ();
# the xml is one long line!
@lines = <CURL>;
#print "xml line: $lines[0]\n" if $DEBUG;
@multiRefs = split "<multiRef",$lines[0];
#parse multiRefs
# create top-level object
my $id = Id->new (
      id => "id0");
# hash of this object with id as key
$myobjs{"id0"} = $id;
 
# first build our objects...
foreach $mref (@multiRefs) {
  next unless $mref =~ /\sid=/;
#  id="id0" ...
  ($parentid) =  $mref =~ /id=\"(id\d+)/;
  print "parentid: $parentid\n" if $DEBUG;
# watch out for <item><key xsi:type="soapenc:string">groupSnapshotChildren</key><value href="#id3 ...
# vs <item><key xsi:type="soapenc:string">Network</key><value href="#id40"/>
  print "mref: $mref\n" if $DEBUG;
  @ids = split /<item><key/, $mref;
# then loop over ids mentioned in this mref
  foreach $myid (@ids) {
    next unless $myid =~ /href="#(id\d+)/;
    next unless $myobjs{"$parentid"};
# types include group, monitor, alert
    ($typebyregex) = $myid =~ />snapshot_(\w+)SnapshotChildren</;
    $parenttype = $myobjs{"$parentid"}->type();
    $type = $typebyregex ? $typebyregex : $parenttype;
    print "type: $type\n" if $DEBUG;
# skip alert definitions
    next if $type eq "alert";
    print "myid: $myid\n" if $DEBUG;
    ($actualid) = $myid =~ /href="#(id\d+)/;
    print "actualid: $actualid\n" if $DEBUG;
# construct object
    my $id = Id->new (
      id => $actualid,
      type => $type,
      parentid => $parentid );
# build hash of these objects with actualid as key
    $myobjs{$actualid} = $id;
# addchild to parent. note that parent should already have been encountered
    $myobjs{"$parentid"}->addchild($actualid);
    if ($myid !~ /groupSnapshotChildren/) {
# interesting child - has name (every other generation has no name!)
      ($name) = $myid =~ /string\">(.+?)<\/key/;  # use non-greedy operator
      print "name: $name\n" if $DEBUG;
# some names are not of interest to us: alerts, which end in "error" or "good"
      if ($name !~ /(error|good)$/) {
# name may not be unique - get extended name which include all parents
        if (defined $myobjs{"$parentid"}->parentid()) {
          $gdparid = $myobjs{"$parentid"}->parentid();
          $gdparname = $myobjs{"$gdparid"}->extname();
# extname -> extended, or distinguished name.  Should be unique
          $extname = $gdparname. '/' . $name;
        } else {
# 1st generation
          print "1st generation\n" if $DEBUG;
          $extname = $name;
        }
        print "extname: $extname\n" if $DEBUG;
        $id->name($name);
        $id->extname($extname);
        $id->isanamedid(1);
        $myobjs{"$parentid"}->hasnamedkids(1); # want to mark its parent as "special"
# we also need our hash to reference objects by extended name since id changes with each extract and name
may not be unique
        $myobjs{"$extname"} = $id;
      } # end conditional over desirable name check
    } else {
      $id->isanamedid(0);
    }
  }
}
#
# now it's all parsed and our objects are alive. Let's build a web site!
#
# build a cookie containing path
my $pi = $ENV{PATH_INFO};
$script = $ENV{SCRIPT_NAME};
$ua = $ENV{HTTP_USER_AGENT};
# Blackberry browser test
$BB = $ua =~ /^BlackBerry/i ? 1 : 0;
$MSIE = $ua =~ /MSIE /;
# font-size depends on browser
$FS = "font-size: x-small;" if $MSIE;
$cookie = $cgi->cookie("pathinfo");
$uri = $script . $pi;
$cookie=$cgi->cookie(-name=>"pathinfo", -value=>"$uri");
print $cgi->header(-type=>"text/html",-cookie=>$cookie);
($url) = $pi =~ m#([^/]+)$#;
#  -title=>'SmartPhone View',
# this doesn't work, sigh...
#print $cgi->start_html(-head=>meta({-http_equiv=>'Refresh'}));
print qq( <HEAD>
<meta http-equiv="Expires" content="0">
<meta http-equiv="Pragma" content="no-cache">
<meta HTTP-EQUIV="Refresh" CONTENT="60; URL=$url">
<TITLE>SiteScope Classic $url Detail</TITLE>
<style type="text/css">
a.good {color: green; }
a.warning {color: green; }
a.error {color: red; }
td {font-family: Arial, Helvetica, sans-serif; $FS}
p.ss {font-family: Arial, Helvetica, sans-serif;}
</style>
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />
<script type=text/javascript>
function changeme(elemid,longvalue)
{
document.getElementById(elemid).innerText=longvalue;
}
function restoreme(elemid,truncvalue)
{
document.getElementById(elemid).innerText=truncvalue;
}
</script>
</HEAD><body>
);
 
#print $cgi->h1("This is the heading");
# parse path
# top lvl name:2nd lvl name:3rd lvl name
$altpi = $cgi->path_info();
print $cgi->p("pi is $pi") if $DEBUG;
#print $cgi->p("altpi is $altpi");
# relative url
$rurl = $cgi->url(-relative=>1);
if ($pi eq "") {
# the top
# top id is id3
  print qq(<p class="ss">);
  $myid = "id3";
  foreach $kid ($myobjs{"$myid"}->get_children()) {
    my $kidname = $myobjs{"$kid"}->name();
# kids can be subgroups or standalone monitors
    my $health = recurse("/$kidname");
    print "$health{$health} <a href=\"$rurl/$kidname\">$kidname</a><br>\n";
    $prodtest = $kid if $kidname eq "Production";
  }
  print "</p>\n";
} else {
  $extname = $pi;
  print "pi,name,extname,script: $pi,$name,$extname,$script\n" if $DEBUG;
# print where we are
  $uriname = $pi;
  $uriname =~ s#^/##;
  #print $cgi->p("name is $name");
  #print $cgi->p("uriname is $uriname");
  $uricompositepart = "/";
  @uriparts = split('/',$uriname);
  $lastpart = pop @uriparts;
  print qq(<p class="ss"><a href="$script"><b>Sitescope</b></a><br>);
  print qq(<b>Monitors in: );
  foreach $uripart (@uriparts) {
    my $healthp = recurse("$uricompositepart$uripart");
# build valid link
    ##$link = qq(<a class="good" href="$script$uricompositepart$uripart">$uripart</a>: );
    $link = qq(<a class="$healthp" href="$script$uricompositepart$uripart">$uripart</a>: );
    $uricompositepart .= "$uripart/";
    print $link;
  }
  my $healthp = recurse("$uricompositepart$lastpart");
  $color = $healthp eq "error" ? "red" : "green";
  print qq(<font color="$color">$lastpart</font></b></p>\n);
  print qq(<table border="1" cellspacing="0">);
  #print qq(<table>);
  %hashtrs = ();
  foreach $kid ($myobjs{"$extname"}->get_children()) {
    print "kid id: " . $myobjs{"$kid"}->id() . "\n" if $DEBUG;
    next unless $myobjs{"$kid"}->hasnamedkids();
    foreach $gdkid ($myobjs{"$kid"}->get_children()) {
      print "gdkid id: " . $myobjs{"$gdkid"}->id() . "\n" if $DEBUG;
      $gdkidname = $myobjs{"$gdkid"}->name();
      $gdkidextname = $myobjs{"$gdkid"}->extname();
      my $health = recurse("$gdkidextname");
      my $type = $myobjs{"$gdkid"}->type();
# dig deeper to learn health of the grankid's grandkids
      $objct = $healthct{good} = $healthct{error} = $healthct{warning} = 0;
      foreach $ggkid ($myobjs{"$gdkidextname"}->get_children()) {
        print "ggkid id: " . $myobjs{"$ggkid"}->id() . "\n" if $DEBUG;
        next unless $myobjs{"$ggkid"}->hasnamedkids();
        foreach $gggdkid ($myobjs{"$ggkid"}->get_children()) {
          print "gggdkid id: " . $myobjs{"$gggdkid"}->id() . "\n" if $DEBUG;
          $gggdkidname = $myobjs{"$gggdkid"}->name();
          $gggdkidextname = $myobjs{"$gggdkid"}->extname();
          my $health = recurse("$gggdkidextname");
          $objct++;
          $healthct{$health}++;
        }
      }
      $elemct++;
      $elemid = "elemid" . $elemct;
# groups should have distinctive cell background color to set them apart from monitors
      if ($type eq "group") {
        $bgcolor = "#F0F0F0";
        $celllink = "$lastpart/$gdkidname";
        $truncvalue = qq(<font color="red">$healthct{error}</font>/$objct);
        $tdval = $truncvalue;
      } else {
        $bgcolor = "#FFFFFF";
        $celllink = "$rprt?$gdkidname";
# truncate monitor value to save display space
        $longvalue = $monitorv{"$gdkidname"};
        (my $truncvalue) = $monitorv{"$gdkidname"} =~ /^(.{7,9})/;
        $truncvalue = $truncvalue? $truncvalue : "&nbsp;";
        $tdval = qq(<span id="$elemid" onmouseover="changeme('$elemid','$longvalue')" onmouseout="restorem
e('$elemid','$truncvalue')">$truncvalue</span>);
      }
      $hashtrs{"$gdkidname"} = qq(<tr><td bgcolor="#000000">$health{$health} </td><td>$tdval</td><td bgcol
or="$bgcolor"><a href="$celllink">$gdkidname</a></td></tr>\n);
# for health we're going to have to recurse
    }
  }
# print out in alphabetical order
  foreach $key (sort(keys %hashtrs)) {
    print $hashtrs{"$key"};
  }
  print "</table>";
}
print $cgi->end_html();
#######################################
sub recurse {
# to get the union of health of all ancestors
my $moniext = shift;
my ($moni) = $moniext =~ m#/([^/]+)$#;
# don't bother recursing and all that unless we have to...
return $myobjs{"$moniext"}->health() if defined $myobjs{"$moniext"}->health();
print "moni,moniext: $moni, $moniext\n" if $DEBUG;
my ($kid,$gdkidextname,$health,$cumhealth);
$cumhealth = $health = $monitors{"$moni"} ? $monitors{"$moni"} : "good";
foreach $kid ($myobjs{"$moniext"}->get_children()) {
    if ($myobjs{"$kid"}->hasnamedkids()) {
      foreach $gdkid ($myobjs{"$kid"}->get_children()) {
        $gdkidextname = $myobjs{"$gdkid"}->extname();
# for health we're going to have to recurse
        $health = recurse("$gdkidextname");
        if ($health eq "error" || $cumhealth eq "error") {
          $cumhealth = "error";
        } elsif ($health eq "warning" || $cumhealth eq "warning") {
          $cumhealth = "warning";
        }
      }
    } else {
# this kid is end of line
      $health = $monitors{"$kid"} ? $monitors{"$kid"} : "good";
        if ($health eq "error" || $cumhealth eq "error") {
          $cumhealth = "error";
        } elsif ($health eq "warning" || $cumhealth eq "warning") {
          $cumhealth = "warning";
        }
    }
}
$myobjs{"$moniext"}->health("$cumhealth");
return $cumhealth;
} # end sub recurse

I call it simply “ss” to minimize the typing required. You see it uses a package called Id.pm which I wrote to encapsulate the class and methods. Here is Id.pm:

package Id;
# Copyright work under the Artistic License, http://www.opensource.org/licenses/Artistic-2.0
# class for storing data about an id
# URL (not currently protected): http://localhost:8080/SiteScope/services/APIConfigurationImpl?method=getC
onfigurationSnapshot
# class for storing data about a group
use warnings;
use strict;
use Carp;
#group methods
# constructor
# get_members
# get_name
# get_id
# addmember
#
# member methods
# constructor
# get_id
# get_name
# get_type
# get_gp
# set_gp
 
sub new {
  my $class = shift;
  my $self = {@_};
  bless($self, "Id");
  return $self;
}
# get-set methods, p. 355
sub parentid { $_[0]->{parentid}=$_[1] if defined $_[1]; $_[0]->{parentid} }
sub isanamedid { $_[0]->{isanamedid}=$_[1] if defined $_[1]; $_[0]->{isanamedid} }
sub id { $_[0]->{id}=$_[1] if defined $_[1]; $_[0]->{id} }
sub name { $_[0]->{name}=$_[1] if defined $_[1]; $_[0]->{name} }
sub extname { $_[0]->{extname}=$_[1] if defined $_[1]; $_[0]->{extname} }
sub type { $_[0]->{type}=$_[1] if defined $_[1]; $_[0]->{type} }
sub health { $_[0]->{health}=$_[1] if defined $_[1]; $_[0]->{health} }
sub hasnamedkids { $_[0]->{hasnamedkids}=$_[1] if defined $_[1]; $_[0]->{hasnamedkids} }
 
# get children - use anonymous array, book p. 221-222
sub get_children {
# return empty array if arrary hasn't been defined...
  defined @{$_[0]->{children}} ? @{$_[0]->{children}} : ();
}
# adding children
sub addchild {
  $_[0]->{children} = [] unless defined  $_[0]->{children};
  push @{$_[0]->{children}},$_[1];
}
 
1;

ss also assumes the existence of just a few of the images from SiteScope classic – the green circle for good, red diamond for error and yellow warning, etc.. I borrowed them SiteScope classic.

Here is the code for the log scraper:

#!/usr/bin/perl
# analyze SiteScope log file
# Copyright work under the Artistic License, http://www.opensource.org/licenses/Artistic-2.0
# 8/2010
$DEBUG = 0;
$logdir = "/opt/SiteScope/logs";
$monitorstats = "/tmp/monitorstats.txt";
$monitorstatshis = "/tmp/monitorstats-his.txt";
$date = `date +%Y_%m_%d`;
chomp($date);
$file = "$logdir/SiteScope$date.log";
open(LOG,"$file") || die "Cannot open SiteScope log file: $file!!\n";
# example lines:
# 16:51:07 08/02/2010     good    LDAPServers     LDAP SSL test : ldapsrv.drj.com exit: 0, 0.502 sec    1:
3481  0       502
#16:51:22 08/02/2010     good    Network DNS: (AMEAST) ns2  0.033 sec   2:3459      200     33      ok
#16:51:49 08/02/2010     good    Proxy   proxy.pac script on iwww    0.055 sec   2:12467 200     55   ok
     4288    1280782309      0    0  55      0       0      200  0
#16:52:04 08/02/2010     good    Proxy   Disk Space: earth /logs   66% full, 13862MB free, 41921MB total
 3:3598      66      139862
#16:52:09 08/02/2010     good    DrjExtranet  URL: wwwsecure.drj.com     0.364 sec    1:3604      200
364  ok 26125   1280782328     0    0   358     4       2       200  0
while(<LOG>) {
  ($time,$date,$status,$group,$monitor,$value) = /(\S+)\s(\S+)\t(\S+)\t(\S+)\t([^\t]+)\t([^\t]+)/;
  print '$time,$date,$status,$group,$monitor,$value' . "$time,$date,$status,$group,$monitor,$value\n" if $DEBUG;
  next if $group =~ /__health__/; # don't care about these lines
  $mons{"$monitor"} = 1;
  push @{$mont{"$monitor"}} , $time;
  push @{$mond{"$monitor"}} , $date;
  push @{$monh{"$monitor"}} , $status;
  push @{$monv{"$monitor"}} , $value;
}
# open output at last moment to minimize chances of reading while locked for writing
open(MONITORSTATS,">$monitorstats") || die "Cannot open monitor stats file $monitorstats!!\n";
open(MONITORSTATSHIS,">$monitorstatshis") || die "Cannot open monitor stats file $monitorstatshis!!\n";
# write it all out - will always print the latest values
foreach $monitor (keys %mons) {
# dereference our anonymous arrays
  @times = @{$mont{"$monitor"}};
  @dates = @{$mond{"$monitor"}};
  @status = @{$monh{"$monitor"}};
  @value = @{$monv{"$monitor"}};
# last element is the latest measured status and value
  print MONITORSTATS "$monitor\t$status[-1]\t$value[-1]\n";
  print MONITORSTATSHIS "$monitor\n";
  #for ($i=-11;$i<0;$i++) {
# put latest measure on top
  for ($i=-1;$i>-13;$i--) {
    $time = defined $times[$i] ? $times[$i] : "NA";
    $date = defined $dates[$i] ? $dates[$i] : "NA";
    $stat = defined $status[$i] ? $status[$i] : "NA";
    $val = defined $value[$i] ? $value[$i] : "NA";
    print MONITORSTATSHIS "\t$time\t$date\t$stat\t$val\n";
  }
}

As I said it gets called every minute by cron.

That’s it! I enter the url sitescope.drj.com/SS/ss to access the main program which gets executed because I made /SS a CGI-BIN directory.

This gives you a read-only, Java-free view into your SiteScope status and hierarchy which beckons back to the good old days of Freshwater SiteScope.

Know your limits
What it does not do, unfortunately, is allow you to run a monitor – that seems like the next most simple thing which I should have been able to do but couldn’t figure out – much less define new monitors (never going to happen) or alerts.

I use this successfully against my HP SiteScope instance of roughly 400 monitors which itself is on a VM and there is no apparent strain. At some point this simple-minded script would no longer scale to suit the task at hand, but it might be good for up to a few thousand monitors.

And now a word about open source alternatives
Since I was so enamored with SiteScope Classic there seemed to be no compelling reason to shell out the dough for HP SiteScope with its unwanted interface, so I briefly looked around at free alternatives. Free sounds good, right? Not so much in practice. Out there in Cyberspace there is an enthusiast for a product called Zabbix. I just want to go on the record that Zabbix is the most confused piece of junk I have run across. You are getting less than what you paid for ($0) because you will be wasting a lot of time with it, and in the end it isn’t all that capable. Nagios also had its limits – I can’t remember the exact reason I didn’t go down that route, but there were definite reasons.

HP SiteScope is no panacea. “HP” and “stifling bureaucracy” need to be mentioned in the same sentence. Every time we renew support it is the most confusing mess of line items. Every time there’s a new cast of characters over at HP who nothing about the account’s history. You practically have to beg them to accept your money for a low-budget item like SiteScope because they really don’t pursue it in any way. Then their SAID and contract numbers stuff is confusing if you only see it once every few years.

Conclusion
A conversion program does exist for turning the finicky HP SiteScope Java-encumbered view into pure SiteScope Classic because I wrote it! But it’s a limited read-only view. Still, it’s helpful in a pinch and can even be viewed on the Blackberry’s browser.

Another problem is that HP has threatened to completely change the API so this tool, which is designed for HP SiteScope v 10.12, will probably completely break for newer versions. Oh, well.

References
This post shows some silly mistakes to avoid when doing a minor upgrade in version 11.

Categories
Internet Mail

How to run sendmail in queue-only mode

Intro
I guess I’ve ragged on sendmail before. Incredibly powerful program. Finding out how to do that simple thing you want to do may not be so easy, even with the bible at your side. So to that end I’m making an effort to document those simple things which I’ve found I’ve struggled with.

The Details
Today I wanted to capture all email coming into my sendmail daemon. Well, actually it’s a little more complicated. I didn’t want to disturb production email, but I wanted to capture a spam sample. Today there was a hugely effective spam campaign purporting to be email from the Better Business Bureau (BBB). All the emails however actually came from various senders @aicpa.org. Postini put a filter in place but I knew more were getting through. But they weren’t coming to me. How to get capture them without disturbing users?

In this post I gave some obscure but useful tips for sendmail admins, including the ever-useful smarttable add-on. To reprise, smarttable allows you to make delivery decisions based on sender! That’s totally antithetical to your run-of-the-mill sendmail admin, but it’s really useful… Like now. So I quickly put up a sendmail instance, copying a working config I use in production. But I changed the listener to IP address 127.0.0.2 (which I fortunately had already set up for some other reason I can no longer recall). That one’s pretty standard. That’s just:

DAEMON_OPTIONS(`Name=sm-cap, Addr=127.0.0.2')dnl

Of course you want to create a new queue directory just for the captured emails. I created /mqueue/c0 and put in this line into my .mc file:

define(QUEUE_DIR, `/mqueue/c*')dnl

And here’s the main point, how to defer delivery of all emails. Sendmail actually distinguishes between defer and queueonly. I chose queueonly thusly:

define(`confDELIVERY_MODE',`queueonly')dnl

If by chance you happen to misspell DELIVERY_MODE, like, let’s say, DELIERY_MODE, you don’t seem to get a whole lot of errors. Not that that would ever happen to us, mind you, I’m just saying. That’s why it’s good to also know about the command-line option. Keep reading for that.

It’s simple enough to test once you have it running (which I do with this line: sudo sendmail -bd -q -C/etc/mail/capture.cf).

> telnet 127.0.0.2 25
Trying 127.0.0.2…
Connected to 127.0.0.2.
Escape character is ‘^]’.
220 drj.com ESMTP server ready at Fri, 24 Feb 2012 15:16:40 -0500
helo localhost
250 drjemgw2.drj.com Hello [127.0.0.2], pleased to meet you
mail from: [email protected]
250 2.1.0 [email protected]… Sender ok
rcpt to: [email protected]
250 2.1.5 [email protected]… Recipient ok
data
354 Enter mail, end with “.” on a line by itself
subject: test of the capture-only sendmail instance

Just a test!
-Dr J
.

250 2.0.0 q1OKGet2008636 Message accepted for delivery
quit
221 2.0.0 drj.com closing connection
Connection closed by foreign host.

Is the message there, queued up the way we’d like? You bet:

> ls -l /mqueue/c0

total 16
-rw------- 1 root root  19 2012-02-24 15:17 dfq1OKGet2008636
-rw------- 1 root root 542 2012-02-24 15:17 qfq1OKGet2008636

There also seems to be a second way to run sendmail in queue-only fashion. I got it to work from the command-line like this:

> sudo sendmail -odqueueonly -bd -C/etc/mail/capture.cf

The book says this is deprecrated usage, however. But let’s see, that’s O’Reilly’s Sendmail 3rd edition, published in 2003, we’re in 2012, so, hmm, they still haven’t cut us off…

One last thing, that smarttable entry for my main sendmail daemon. I added the line:

@aicpa.org relay:[127.0.0.2]

Conclusion
It can be useful to queue all incoming emails for various reasons. It’s a little hard to find out how to do this precisely. We found a way to do this without stopping/starting our main sendmail process. This post shows a couple ways to do it, and why you might need to.

May 2012 Update
Just wanted to mention about BBB email how I handle it now. They told me they maintain an accurate SPF record. Sure enough, they do. Now we only accept bbb.org email when the SPF record is a match. But I don’t use sendmail for that, I use Postini’s (OK, Google’s, technically) mail hygiene service. Postini rocks!

My most recent post on how to tame the confounding sendmail log is here.

Categories
Apache

Running CGI Scripts from any Directory with Apache

Intro
This is a really basic issue, but the documentation out there often doesn’t speak directly to this single issue – other things get thrown into the mix. This document is to show how to enable the running of CGI scripts from any directory under htdocs with the Apache2 web server.

The Details
Let’s get into it. CGI – common gateway interface – is a great environment for simple web server programs. But if you’ve got a fairly generic apache2 configuration file and are trying CGI you might encounter a few different errors before getting it right. Here’s the contents of my test.cgi file:

#!/bin/sh
echo "Content-type: text/html"
echo "Location: http://drjohnstechtalk.com"
echo ""

Not tuning my dflt.conf apache2 configuration file in any way, I find to my horror that the cgi file contents gets returned as is initially, just as if it were no more special than a random test file:

> curl -i localhost/test/test.cgi

HTTP/1.1 200 OK
Date: Fri, 24 Feb 2012 14:07:37 GMT
Server: Apache/2
Last-Modified: Fri, 24 Feb 2012 14:05:29 GMT
ETag: "56d8-5c-4b9b640c9dc40"
Content-Length: 92
Content-Type: text/plain
 
#!/bin/sh
echo "Content-type: text/html"
echo "Location: http://drjohnstechtalk.com"
echo ""

That’s no good.

Now I do some research and realize the following line should be added. Here I show it in its context:

<Directory "/usr/local/apache2/htdocs">
# make .cgi and .pl extensions valid CGI types
    AddHandler cgi-script .cgi .pl
    ...

and we get…
> curl -i localhost/test/test.cgi

HTTP/1.1 403 Forbidden
Date: Fri, 24 Feb 2012 14:11:55 GMT
Server: Apache/2
Content-Length: 215
Content-Type: text/html; charset=iso-8859-1
 
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /test/test.cgi
on this server.</p>
</body></html>

Hmmm. Better, perhaps, but definitely not working. What did we miss? Chances are we had something like this in our dflt.conf:

<Directory "/usr/local/apache2/htdocs">
# make .cgi and .pl extensions valid CGI types
    AddHandler cgi-script .cgi .pl
    Options Indexes FollowSymLinks
    ...

but what we need is this:

<Directory "/usr/local/apache2/htdocs">
# make .cgi and .pl extensions valid CGI types
    AddHandler cgi-script .cgi .pl
    Options Indexes FollowSymLinks ExecCGI
    ...

Note the addition of the ExecCGI to the Options statement.

With that tweak to our apache2 configuration we get the desired result after restarting:

> curl -i localhost/test/test.cgi

HTTP/1.1 302 Found
Date: Fri, 24 Feb 2012 14:15:47 GMT
Server: Apache/2
Location: http://drjohnstechtalk.com
Content-Length: 209
Content-Type: text/html; charset=iso-8859-1
 
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://drjohnstechtalk.com">here</a>.</p>
</body></html>

But what if we want our CGI script to be our default page? What I did for that is to add this statement:

DirectoryIndex index.html index.cgi

so now we have:

<Directory "/usr/local/apache2/htdocs">
# make .cgi and .pl extensions valid CGI types
    AddHandler cgi-script .cgi .pl
    Options Indexes FollowSymLinks ExecCGI
    DirectoryIndex index.html index.cgi
    ...

I create an index.cgi and delete the index.html and index.cgi is run from the top-level URL:

> curl -i johnstechtalk.com

HTTP/1.1 302 Found
Date: Fri, 29 Apr 2013 14:15:47 GMT
Server: Apache/2
Location: http://drjohnstechtalk.com/blog/
Content-Length: 209
Content-Type: text/html; charset=iso-8859-1
 
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://drjohnstechtalk.com/blog/">here</a>.</p>
</body></html>

Conclusion
We have shown the two most common problems that occur when trying to enable CGI execution with the apache2 web server. We have shown how to fix the configuration.

Categories
TCP/IP

C++ TCP Socket Program

Intro
I was looking around for a sample TCP socket program written in C++ that might make working with TCP sockets less mysterious. I expected to find a flood of things to pick from, but that really wasn’t the case.

The Details
OK, I only looked for a few minutes, to be honest. The one I did settle on seems adequate. It’s sufficiently old, however, that it doesn’t actually work as-is. Probably if it did I wouldn’t even mention it. So I thought it was worth repeating here, with some tiny semantic updates.

What I used is from this web page: http://cs.baylor.edu/~donahoo/practical/CSockets/practical/. I was really only interested in the TCP echo client. It’s a good stand-ion for any TCP client I think.

Here’s TCPechoClient.cpp:

/*
 *   C++ sockets on Unix and Windows
 *   Copyright (C) 2002
 *
 *   This program is free software; you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation; either version 2 of the License, or
 *   (at your option) any later version.
 *
 *   This program is distributed in the hope that it will be useful,
 *   but WITHOUT ANY WARRANTY; without even the implied warranty of
 *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *   GNU General Public License for more details.
 *
 *   You should have received a copy of the GNU General Public License
 *   along with this program; if not, write to the Free Software
 *   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */
 
// taken from http://cs.baylor.edu/~donahoo/practical/CSockets/practical/TCPEchoClient.cpp
 
#include "PracticalSocket.h"  // For Socket and SocketException
#include <iostream>           // For cerr and cout
#include <cstdlib>            // For atoi()
#include <cstring>            // author forgot this
 
using namespace std;
 
const int RCVBUFSIZE = 32;    // Size of receive buffer
 
int main(int argc, char *argv[]) {
  if ((argc < 3) || (argc > 4)) {     // Test for correct number of arguments
    cerr << "Usage: " << argv[0]
         << " <Server> <Echo String> [<Server Port>]" << endl;
    exit(1);
  }
 
  string servAddress = argv[1]; // First arg: server address
  char *echoString = argv[2];   // Second arg: string to echo
// DrJ test
//  echoString = "GET / HTTP/1.0\n\n";
  int echoStringLen = strlen(echoString);   // Determine input length
  unsigned short echoServPort = (argc == 4) ? atoi(argv[3]) : 7;
 
  try {
    // Establish connection with the echo server
    TCPSocket sock(servAddress, echoServPort);
 
    // Send the string to the echo server
    sock.send(echoString, echoStringLen);
 
    char echoBuffer[RCVBUFSIZE + 1];    // Buffer for echo string + \0
    int bytesReceived = 0;              // Bytes read on each recv()
    int totalBytesReceived = 0;         // Total bytes read
    // Receive the same string back from the server
    cout << "Received: ";               // Setup to print the echoed string
    while (totalBytesReceived < echoStringLen) {
      // Receive up to the buffer size bytes from the sender
      if ((bytesReceived = (sock.recv(echoBuffer, RCVBUFSIZE))) <= 0) {
        cerr << "Unable to read";
        exit(1);
      }
      totalBytesReceived += bytesReceived;     // Keep tally of total bytes
      echoBuffer[bytesReceived] = '\0';        // Terminate the string!
      cout << echoBuffer;                      // Print the echo buffer
    }
    cout << endl;
 
    // Destructor closes the socket
 
  } catch(SocketException &e) {
    cerr << e.what() << endl;
    exit(1);
  }
 
  return 0;
}

Note the cstring header file I needed to include. The standard must have changed to require this since the original code was published.

Then I neeed PracticalSocket.h, but that has no changes from the original version: “http://cs.baylor.edu/~donahoo/practical/CSockets/practical/PracticalSocket.h, and his Makefile is also just fine: http://cs.baylor.edu/~donahoo/practical/CSockets/practical/Makefile. For the fun of it I also set up the TCP Echo Server: http://cs.baylor.edu/~donahoo/practical/CSockets/practical/TCPEchoServer.cpp.

Run

make TCPEchoclient

and you should be good to go. How to test this TCPEchoClient against your web server? I found that the following works:

~/TCPEchoClient drjohnstechtalk.com 'GET / HTTP/1.0
Host: drjohnstechtalk.com
 
' 80

which gives this output:

Received: HTTP/1.1 301 Moved Permanently
Date: Thu, 23 Feb 2012 17:19:02

which, now that I analyze it, looks cut-off. Hmm. Because with curl I have:

curl -i drjohnstechtalk.com

HTTP/1.1 301 Moved Permanently
Date: Thu, 23 Feb 2012 17:19:43 GMT
Server: Apache/2.2.16 (Ubuntu)
X-Powered-By: PHP/5.3.3-1ubuntu9.5
Location: http://www.drjohnstechtalk.com/blog/
Vary: Accept-Encoding
Content-Length: 2
Content-Type: text/html

I guess that’s what you get for demo code. At this point I don’t have a need to sort it out so I won’t. Perhaps we’ll come back to it later. Looking at it, I see the received buffer size is quite small, 32 bytes. I tried to set that to a reasonable value, 200 MBytes, but get a segmentation fault. The largest I could manage, after experiementation, is 10000000 bytes:

//const int RCVBUFSIZE = 32;    // Size of receive buffer
const int RCVBUFSIZE = 10000000;    // Size of receive buffer - why is 10 MB the max. value??

and this does indeed give us the complete output from our web server home page now.

Conclusion
There is some demo C++ code which creates a useable class for dealing with TCP sockets. There might be some work to do before it could be used in a serious application, however.

Categories
Apache Linux Web Site Technologies

Turning Apache into a Redirect Factory

Intro
I’m getting a little more used to Apache. It’s a strange web server with all sorts of bolt-on pieces. The official documentation is horrible so you really need sites like this to explain how to actually do useful things. You needs real, working examples. In this example I’m going to show how to use the mod_rewrite engine of Apache to build a powerful and convenient web server whose sole purpose in life is for all types of redirects. I call it a redirect factory.

Which Redirects Will it Handle
The redirects will be read in from a file with an easy, editable format. So we never have to touch our running web server. We’ll build in support for the types of redirect requests that I have actually encountered. We don’t care what kind of crazy stuff Apache might permit. You’ll pull your hair out trying to understand it all. All redirects I have ever encountered fall into a relatively small handful of use cases. Ordered by most to least common:

  1. host -> new_url
  2. host/uri[Suffix] -> new_fixed_url (this can be a case-sensitive or case-insensitive match to the uri)
  3. host/uri[Suffix] -> new_prefix_uri[Suffix] (also either case-sensitive or not)

So some examples (not the best examples because I don’t manage drj.com or drj.net, but pretend I did):

  1. drj.com/WHATEVER -> http://drjohnstechtalk.com/
  2. www.drj.com -> http://drjohnstechtalk.com/
  3. drj.com/abcPATH/Preserve -> http://drjohnstechtalk.com/abcPATH/Preserve
  4. drj.com/defPATH/Preserve -> http://drjohnstechtalk.com/ghiPATH/Preserve
  5. drj.com/path/with/slash -> http://drjohnstechtalk.com/other/path
  6. drj.com/path/with/prefix -> http://drjohnstechtalk.com/other/path
  7. drj.net/pAtH/whatever -> https://drjohnstechtalk.com/straightpath
  8. drj.net/2pAtH/stuff?hi=there http://drjohnstechtalk.com/2straightpath/stuff?hi=there
  9. my.host -> http://regular-redirect.com/
  10. whatever-host.whatever-domain/whatever-URI -> http://whatever-new-host.whatever-new-domain/whatever-new-URI

All these different cases can be handled with one config file. I’ve named it redirs.txt. It looks like this:

# redirs file
# The default target has to be listed first
defaultTarget   D       http://www.drjohnstechtalk.com/blog/
# hosts with URI-matching grouped together
# available flags: "P" - preserve part after match
#                  "C" - exact case match of URI
 
# Begin host: drj.com:www.drj.com - ":"-separated list of applicable hostnames
/                       http://drjohnstechtalk.com/
/abc    P       http://drjohnstechtalk.com/abc
/def    P       http://drjohnstechtalk.com/ghi
/path/with/slash https://drjohnstechtalk.com/other/path
/path/with/prefix P  https://drjohnstechtalk.com/other/path
# end host drj.com:www.drj.com
 
# this syntax - host/URI - is also OK...
drj.net/ter             http://drjohnstechtalk.com/terminalredirect
drj.net/pAtH    C       http://drjohnstechtalk.com/straightpath
drj.net/2pAtH   CP      http://drjohnstechtalk.com/2straightpath
 
# hosts with only host-name matching
my.host                 http://regular-redirect.com/
www.drj.edu             http://education-redirect.edu/edu-path

The Apache configuration file piece is this:

# I really don't think this does anything other than chase away a scary warning in the error log...
RewriteLock ${APACHE_LOCK_DIR}/rewrite_lock
 
# Inspired by the dreadful documentation on http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html
RewriteEngine on
RewriteMap  redirectMap prg:conf/vhosts/redirect.pl
#RewriteCond ${lowercase:%{HTTP_HOST}} ^(.+)$
RewriteCond ${redirectMap:%{HTTP_HOST}%{REQUEST_URI}} ^(.+)$
# %N are backreferences to RewriteCond matches, and $N are backreferences to RewriteRule matches
RewriteRule ^/.* %1 [R=301,L]

Remember I split up apache configuration into smaller files. So that’s why you don’t see the lines about logging and what port to listen on, etc. And the APACHE_LOCK_DIR is an environment variable I set up elsewhere. This file is called redirect.conf and is in my conf/vhosts directory.

In my main httpd.conf file I extended the logging to prefix the lines in the access log with the host name (since this redirect server handles many host names this is the only way to get an idea of which hosts are popular):

...
    LogFormat "%{Host}i %h %l %u %t \"%r\" %&gt;s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
...

So a typical log line looks something like the following:

drj.com 201.212.205.11 - - [10/Feb/2012:09:09:07 -0500] "GET /abc HTTP/1.1" 301 238 "http://www.google.com.br/url?sa=t&amp;rct=j&amp;q=drjsearch" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET4.0E)"

I had to re-compile apache because originally my version did not have mod_rewrite compiled in. My description of compiling Apache with this module is here.

The directives themselves I figured out based on the lousy documentation at their official site: http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html. The heavy lifting is done in the Perl script because there you have some freedom (yeah!) and are not constrained to understand all their silly flags. One trick that does not seem documented is that you can send the full URL to your mapping program. Note the %{HTTP_HOST}%{REQUEST_URI} after the “:”.

I tried to keep redirect.pl brief and simple. Considering the many different cases it isn’t too bad. It weighs in at 70 lines. Here it is:

#!/usr/bin/perl
# Copyright work under the Artistic License, http://www.opensource.org/licenses/Artistic-2.0
# input is $HTTP_HOST$REQUEST_URI
$redirs = "redirs.txt";
# here I only want the actual script name
$working_directory = $script_name = $0;
$script_name =~ s/.*\///g;
$working_directory =~ s/\/$script_name$//g;
$finalType = "";
$DEBUG = 0;
$|=1;
while () {
  chomp;
  ($host,$uri) = /^([^\/]+)\/(.*)/;
  $host = lc $host;
# use generic redirect file
  open(REDIRS,"$working_directory/$redirs") || die "Cannot open redirs file $redirs!!\n";
  $lenmatchmax = -1;
  while() {
# look for alternate names section
    if (/#\s*Begin host\s*:\s*(\S+)/i) {
      @hostnames = split /:/,$1;
      $pathsection = 1;
    } elsif (/#\s*End host/i) {
      $pathsection = 0;
    }
    @hostnames = () unless $pathsection;
    next if /^#/ || /^\s*$/; # ignore comments and blank lines
    chomp;
    $type = "";
# take out trailing spaces after the target URL
    s/\s+$//;
    if (/^(\S+)\s+(\S{1,2})\s+(\S+)$/) {
      ($redirsURL,$type,$targetURL) = ($1,$2,$3);
    } else {
       ($redirsURL,$targetURL) = /^(\S+)\s+(\S+)$/;
    }
# set default target if specified. It has to come at beginning of file
    $finalURL = $targetURL if $type =~ /D/;
    $redirsHost = $redirsURI = $redirsURIesc = "";
    ($redirsHost,$redirsURI) = $redirsURL =~ /^([^\/]*)\/?(.*)/;
    $redirsURIesc = $redirsURI;
    $redirsURIesc =~ s/([\/\?\.])/\\$1/g;
    print "redirsHost,redirsURI,redirsURIesc,targetURL,type: $redirsHost,$redirsURI,$redirsURIesc,$targetURL,$type\n" if $DEBUG;
    push @hostnames,$redirsHost unless $pathsection;
    foreach $redirsHost (@hostnames) {
    if ($host eq $redirsHost) {
# assume case-insensitive match by default.  Use type of 'C' to demand exact case match
# also note this matches even if uri and redirsURI are both empty
      if ($uri =~ /^$redirsURIesc/ || ($type !~ /C/ &amp;&amp; $uri =~ /^$redirsURIesc/i)) {
# find longest match
        $lenmatch = length($redirsURI);
        if ($lenmatch &gt; $lenmatchmax) {
          $finalURL = $targetURL;
          $finalType = $type;
          $lenmatchmax = $lenmatch;
          if ($type =~ /P/) {
# prefix redirect
            if ($uri =~ /^$redirsURIesc(.+)/ || ($type !~ /C/ &amp;&amp; $uri =~ /^$redirsURIesc(.+)/i)) {
              $finalURL .= $1;
             }
          }
        }
      }
    } # end condition over input host matching host from redirs file
    } # end loop over hostnames list
  } # end loop over lines in redirs file
  close(REDIRS);
# non-prefix re-direct. This is bizarre, but you have to end URI with "?" to kill off the query string, unless the target already contains a "?", in which case you must NOT add it! Gotta love Apache...
  $finalURL .= '?' unless $finalType =~ /P/ || $finalURL =~ /\?/;
  print "$finalURL\n";
} # end loop over STDIN

The nice thing here is that there are a couple of ways to test it, which gives you a sort of cross-check capability. Of course I made lots of mistakes in programming it, but I worked through all the cases until they were all right, using rapid testing.

For instance, let’s see what happens for www.drj.com. We run this test from the development server as follows:

> curl -i -H ‘Host: www.drj.com’ ‘localhost:90’

HTTP/1.1 301 Moved Permanently
Date: Thu, 09 Feb 2012 15:24:25 GMT
Server: Apache/2
Location: http://drjohnstechtalk.com/
Content-Length: 235
Content-Type: text/html; charset=iso-8859-1

Moved Permanently

The document has moved here.

 

And from the command line I test redirect.pl as follows:

> echo “www.drj.com/”|./redirect.pl

http://drjohnstechtalk.com/?

That terminal “?” is unfortunate, but apparently you need it to kill off any possible query_string.

You want some more? OK. How about matching a host and the initial path in a case-insensitive manner? No problem, we’re up to the challenge:

> curl -i -H ‘Host: DRJ.COM’ ‘localhost:90/PATH/WITH/SLASH/stuff?hi=there’

HTTP/1.1 301 Moved Permanently
Date: Thu, 09 Feb 2012 15:38:12 GMT
Server: Apache/2
Location: https://drjohnstechtalk.com/other/path
Content-Length: 246
Content-Type: text/html; charset=iso-8859-1

Moved Permanently

The document has moved here.

 

Refer back to the redirs file and you see this is the desired behaviour.

We could go on with an example for each case, but we’ll conclude with one last one:

> curl -i -H ‘Host: DRJ.NET’ ‘localhost:90/2pAtHstuff?hi=there’

HTTP/1.1 301 Moved Permanently
Date: Thu, 09 Feb 2012 15:44:37 GMT
Server: Apache/2
Location: http://drjohnstechtalk.com/2straightpathstuff?hi=there
Content-Length: 262
Content-Type: text/html; charset=iso-8859-1

Moved Permanently

The document has moved here.

 

A case-sensitive, preserve match. Change “pAtH” to “path” and there is no matching line in redirs.txt so you will get the default URL.

Creating exceptions

Eventually I wanted to have an exception – a URI which should be served with a 200 status rather than redirected. How to handle?

# Inspired by the dreadful documentation on http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html
        RewriteEngine on
# just this one page should NOT be redirected
        Rewriterule ^/dontredirectThisPage.php - [L]
        RewriteMap  redirectMap prg:redirect.pl
        ... etc ...

The above apache configuration snippet shows that I had to put the page which shouldn’t be redirected at the top of the ruleset and set the target to “-“, which turns off redirection for that match, and make this the last executed Rewrite rule. I think this is better than a negated match (!) which always gets complicated.

Conclusion
A powerful redirect factory was constructed from Apache and Perl. We suffered quite a bit during development because of incomprehensible documentation. But hopefully we’ve saved someone else this travail.

References and related

2022 update. This is a very nice commercial service for redirects which I have just learned about: https://www.easyredir.com/
This post describes how to massage Apache so that it always returns a maintenance page no matter what URI was originally requested.
I have since learned that another term used in the industry for rediect server is persistent URL (PURL). It’s explained in Wikipedia by this article: https://en.wikipedia.org/wiki/Persistent_uniform_resource_locator