Categories
Internet Mail

Analyzing the sendmail log

Intro
If you’ve read any of my posts you’ll see that I believe sendmail is a technical marvel, but that’ snot to say it’s without its flaws.

One of the annoying things is that the From line and To line are recorded spearately, in defiance of common logic. I present a simple program I wrote to join these lines.

The details

Without further ado, here is the program, which I call combine.pl:

#!/usr/bin/perl
# combine lines from stat.log
# Copyright work under the Artistic License, http://www.opensource.org/licenses/Artistic-2.0
# DrJ 6/2013 - make more readable based on this format:
# Date|Time|Size[Bytes]|Direction|Sender|Recipient|Relay-MTA
#
#
# From= usually has address surrounded by <>, but not always
#
# input of
#Jun 20 10:00:21 drjemgw sm-mta[24318]: r5KE0K1U024318: [email protected], size=5331, class=0, nrcpts=1, msgid=<15936355.7941275772268.JavaMail.easypaynet@Z32C1GBSZ4AN2>, proto=SMTP, daemon=sm-mta, relay=amx139.postini.com [20.26.143.12]
#Jun 20 10:00:22 drjemgw sm-mta[24367]: r5KE0K1U024318: to=<[email protected]>, delay=00:00:02, xdelay=00:00:01, mailer=esmtp, pri=125331, relay=drjinternal.com [50.17.188.196], dsn=2.0.0, stat=Sent (r5KE0M6E027784 Message accepted for delivery)
# produces
#20.6.2013|10:00:21|5331|IN|[email protected]|[email protected]|amx139.postini.com
#
use Getopt::Std;
getopts('s:f'); # -s search_string -f for "full" version of output
$DEBUG = 0;
 
print "$relay{$ID}, $lines{$ID}, $sender{$ID}, $size{$ID}\n";
$year = `date +%Y`;
chomp($year);
while(<>) {
  chomp;
  print $_ ."\n" if $DEBUG;
#
# get ID
  ($ID) = /\[\d{2,10}\]:\s+(\w+):\s+/;
#print "ID: $ID\n";
  if ($lines{"$ID"} && / stat=Sent /) {
    if ($opt_f) {
      $lines{"$ID"} .= '**to**line**'.$_;
    } else {
      ($recip,$relay) = /:\sto=<(.+)>,\s.*\srelay=(\S+)\s/;
# there can be multiple recipients listed
      $recip =~ s/[\<\>]//g;
# disposition of email.  This needs customization for your situation, but it only determines IN vs OUT vs INTERNAL so it's not critical...
# In this example coding we get all our inbound email from postini.com, and outbound mail comes from drjinternal
      if ($relay{$ID} =~ /postini\.com/) {
        $disp = "IN";
      } else {
        $disp = $relay =~ /drjinternal/ ? "INTERNAL" : "OUT";
      }
      $lines = "$lines{$ID}|$size{$ID}|$disp|$sender{$ID}|$recip|$relay{$ID}";
      if ( ($lines =~ /$opt_s/ || ! $opt_s) && ($sender{$ID} || $recip) ) {
        $lines .= "|$ID" if $DEBUG;
#        push @lines, $lines; # why bother?  just spit it out immediately
         print "$lines\n";
      }
# save memory, hopefully? - can't do this. sometimes we have multiple To lines
#      undef $relay{$ID}, $lines{$ID}, $sender{$ID}, $size{$ID};
      print "$recip\n" if $DEBUG;
    }
  } else {
    if ($opt_f) {
      $lines{"$ID"} .= '**from**line**'.$_;
    } else {
      ($mon,$date,$time,$sender,$size,$relay) = /^(\w+)\s+(\d+)\s+([\d:]+)\s.+\sfrom=<?([^<][^,]*\w)>?,\ssize=(\d+).*relay=(\S+)/;
# convert month name to month number
      $monno = index('JanFebMarAprMayJunJulAugSepOctNovDec',$mon)/3 + 1;
# the year is faked - it's not actually recorded in the logs so we assume it's the current year...
      $lines{$ID} = "$date.$monno.$year|$time";
      $size{$ID} = $size;
      $sender{$ID} = $sender;
      $relay{$ID} = $relay;
 
      print "$mon,$date,$time,$sender,$size,$relay\n" if $DEBUG;
    }
  }
}
 
# now start matching
if ($opt_f) {
  foreach (@lines) {
    print $_."\n"
  }
}

What it does is combine the From and To lines based on the message ID which is unique to a message.

Usage
I usually use it to suck in an entire day’s log (I call my sendmail log stat.log) and grep the output to look for a particular string. For instance today there was a spam blast where ADP’s identity was phished. The sending domains all contained some variant of adp: adp.net, adp.org, adpmail.com, adp.biz, etc. So I wanted to find the answer to the question who’s received any of these ADP phishing emails today? Here’s how you use the program, to do that:

$ combine.pl<stat.log|grep adp.com|more

The input lines look like this:

Jun 20 10:00:21 drjemgw sm-mta[24318]: r5KE0K1U024318: [email protected], size=5331, class=0, nrcpts=1, msgid=<15936355.7941275772268.JavaMail.easypaynet@Z32C1GBSZ4AN2>, proto=SMTP, daemon=sm-mta, relay=amx139.postini.com [27.16.14.22]
Jun 20 10:00:22 drjemgw sm-mta[24367]: r5KE0K1U024318: to=<[email protected]>, delay=00:00:02, xdelay=00:00:01, mailer=esmtp, pri=125331, relay=drjinternal.com. [50.17.188.196], dsn=2.0.0, stat=Sent (r5KE0M6E027784 Message accepted for delivery)

The output from combine.pl looks like this:

20.6.2013|10:00:21|5331|IN|[email protected]|[email protected]|amx139.postini.com

Yeah, I got that ADP spam by the way…

Conclusion
A useful Perl script has been presented which helps mail admins combine separate output lines into a single entry, preserving the most important meta-data from the mail.

Other interesting sendmail posts are also available here and here.

Leave a Reply

Your email address will not be published. Required fields are marked *