Categories
Admin Internet Mail Linux

The IT Detective Agency: generated email goofs attachments

Intro
Today was a busy day! A rather expert user asks for my advice about emails being generated from his own Unix system, by his own processes, that occasionally come out with the encoding of the attachments showing up in the body of the message.

I am not a message formatting expert, or at least I wasn’t prior to this question today. Bbut if a sendmail expert can’t provide an answer, who will? So anyways, this user, let’s call him Rob forwards me this email:

Dear DrJohn,
 
I was wondering if you have some idea as to why sometimes the attachment 
is getting included in the text of the email instead of being recognized as attachment, 
see example below.
 
Any pointers would be helpful, as this makes it at the least cumbersome 
to open the attachment by cutting , pasting the text attachment part 
as file and uudecoding it into binary before it can be opened for content view.
 
Regards,
Rob
 
Mime-Version: 1.0
Content-Type: multipart/mixed;
                 boundary="----=_Part_168946_477699193.1322837415283"
X-Mailer: sendmsg
 
------=_Part_168946_477699193.1322837415283
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
 
Dear Team;
 
While executing the flow, In.process:RapidProductMovementReport_New, the following exception occurred.
 
java.lang.Exception: java.sql.SQLException:ORA-12899: value too large for column "B2B_RAPID"."P_MOVEMENT_DETAIL"."UNIT_OF_MEASURE_TYP" (actual: 3, maximum: 2)
 
 
Error Dump :
com.wm.lang.flow.FlowException:java.lang.Exception: java.sql.SQLException:ORA-12899: value too large for column ... (actual: 3, maximum: 2)
 
 
Pipeline values (see attachment)
 
Caller: Rapid_In.process:RapidRouter
 
Stack: %serviceStack%
------=_Part_168946_477699193.1322837415283
Content-Type: application/gzip; name=pipeline.xml.gz
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename=pipeline.xml.gz
 
H4sIAAAAAAAAAOy9bXPiSJr3+/rMp8hwbEzMHbFFKVOpp57umpVBBp0SkkcS5WHeVLhtutr32sbH
dtV0f/uTEmDLIMBISErBf2O3to2SJJUP18PvujLz53/8cXdLfkwen26m97+c0I5yQib3V9Prm/tv
v5yM4rMP5sk/Pv3l5y+Xt98nT68FmSj46S+E/PwjeUDuL+8mv5xcT6++303un+M/HyYnn84fp9ff
r56H0x+T5MNw8jB9fP75Y/qFla/eX/72fOlNvzl3lze39vX14+Tp6eTTKTu1r+9u7v/n18un3zpX
07t1X7++eZxcPYuWnXxy/dNg5PfWlbydfvvt5naS/HHy6eP04fnjfya/3k2ef59eP328fHi4vbm6
TCoS5Z4.......

Hmm. So what do you think? I have often seen these Mime-type headers and paid absolutely no attention to them. When things work what does it matter? But now they’re not working and it does matter.

First I tried to reproduce the problem using a technique I had gleaned looking over the shoulders of some Unix admins. I knew they had an easy method to send attachments from the command line of their Unix systems so I walked over and asked them how they did it. Sure enough, it was dead easy. It goes something like this:

# uuencode file.txt file.txt|mailx -s “here is your attachment” recipient_address

I rolled up my sleeves, set recipient_address to a valid SMTP mail address. Sure enough. I got it as an attachment in Lotus Notes. But it’s an ugly attachment and doesn’t have any of the nice MIME formatting about it. so it’s probably a bit of luck that my MUA (mail user agent) understands that I mean to create an attachment. I don’t think all MUAs will do that, unless it’s following a more obscure RFC which I’m not aware of.

The original source of the message looks like this (my attachment is called cogstartup.xml.gz):

Date: Fri, 2 Dec 2011 14:54:34 -0500
From: ...
Message-Id: ...
To: ...
Subject: test using uuencode
X-MIMETrack: Itemize by SMTP Server on ... (Release 7.0.4|March 23, 2009) at
 02/12/2011 20:54:36,
		 Serialize by Notes Client ... (Release 8.5.1|September
 28, 2009) at 12/02/2011 04:12:40 PM,
		 Serialize complete at 12/02/2011 04:12:40 PM
X-TNEFEvaluated: 1
 
begin 644 cogstartup.xml.gz
M'XL(""57L4X``V-O9W-T87)T=7`N>&UL`.U]VW;;R+'H^WP%HA?-G$5)MN<2
MCW?&>].2/:/$NL24,\EY\0(!D$0,`@P`2N9\_:E+W]$@`(J2LG-&*QE+)-!=
M755=MZZ...

Very different from what we saw above, right? None of those MIME-related headers seem to be present. So I decided that uuencode is too primitive to reproduce the problem.

I was next going to try to generate an email with the help of a PERL module, like MIME::Lite. But I decided that was too much trouble as I would need to download it and install it first!

So I ventured to see if I could get lucky and figure out the problem by educating myself on the standard, without wasting too much time. The relevant RFC seems to be 1341. I prefer the older RFCs because I suppose they’re shorter – easier to understand because life was simpler in those days! Once yuo parse through the verbiage and repitition, there wasn’t much to it. In particular, it mentioned that the Header

Mime-Version: 1.0

has to be included amongst the header fields. If it is not, the correct behaviour for a MUA is to interpret all the encodings and stuff as just part of a regular body text, which it will display to the user.

I ran a test using sendmail as my sending MUA from a Linux server. With the sendmail agent you can add headers, at least that’s how I remember it:

# sudo sendmail -v recipient< tst where tst is a file I created that starts with the line Mime-Version: 1.0 Yes, indeedy. I received it and my MUA interpreted the attachment as an attachment, displaying the attachment name and the appropriate icon type for it! Now put just a blank line at the top of my tst file, pushing all the rest of the stuff down by a line, and the behaviour is completely different. Then my MUA treats everything as literal body text, just as the old RFC says it must, and it looks just the way it did when Rob forwarded it to me. Conclusion
I explained to Rob that he must sometimes be introducing an extra line above the Mime-Verison header, which would cause this problem.

He thanked me.

Case closed!

Categories
Admin Internet Mail

Obscure Tips for Sendmail Admins

Intro
Sendmail is an amazing program. The O’Reilly Sendmail book is its equal, coming in well over 1000 pages. I constantly marvel at how it was possible to pack so much knowledge into one book written by one person. Having run sendmail for over 10 years, I’ve built up a few inside tips that can be extremely hard to find out by yourself, even with the book’s help. I just learned one today, in fact, so I thought I’d put it plus some others in one place where their chances of being useful is slightly greater.

Tip 1: Multiple IPs in a Mailertable Entry, No MX Record Required
Today I learned that you can specify multiple domains in a mailertable entry even when you’re using IP addresses, as in this example:

drjohnstechblog.com       smtp:[50.17.188.196]:[10.10.10.11]

I tested it by putting 50.17.188.196 behind a firewall where it was unreachable. Sure enough, the smtp mail delivery agent of sendmail tried 10.10.10.11 next. You can continue to extend this with additional IPs

Why is this important? If someone has provided you a private IP to forward mail to, say because of a company-to-company VPN, you cannot rely on the usual DNS lookups to do the routing. And a big outfit may have two MTAs reachable in this way. Now you’ve got redundancy built-in to your delivery methods. Just as you have for organizations with multiple MX records. I paged through the book this morning and did not find it. Maybe it’s there. But it’s in an obscure spot if it is.

Tip 2: Error message Containing Punctuation
I also don’t think it’s obvious how to include multiple punctuation marks in a custom error message, even after reading the book. Here’s an example for your access table:

To:hotimail.com   ERROR:"550 You sent an email to hotimail.com.  You probably meant hotmail.com?"

So it’s the quotes that allow you to include the several punctuation marks. The 550 at the beginning will be seen as the error number.

Tip 3: Smarttable for Sender-Based Routing Decisions
Have you ever wanted to make routing decisions based on sender address rather than recipient address? Well, you can! The key is to use smarttable. In my MC file I have:

dnl Define an enhancement, smarttable, from Andrzej Filip
dnl now at http://jmaimon.com/sendmail/anfi.homeunix.net/sendmail/smarttab.html
FEATURE(`smarttable',`hash -o /etc/mail/smarttable')dnl

It’s sufficiently well documented at that page. You need his smarttable.m4 file. So this is not for beginners, but it’s not that hard, either. Although it looks like smarttable hasn’t been updated since 2002, I want to mention that it still works with the latest versions of sendmail. You can route based on the sender domain, or an individual sender address. I use it to send some messages to an encryption gateway. My smarttable entries tend to look like this:

[email protected]          relay:[192.168.12.34]

What’s First: Routing Based on Sender or Recipient??
What if your recipient’s domain is in the mailertable and your sender’s address is in the smarttable? What takes precedence in that case? The mailertable entry does. I do not know a way to change that. I actually did experience that conflict and found one way around it.

In my case I had some mailertable entries like this one:

drjohnstechblog.com            relay:drjohnstechtalk.com

with my smarttable entry as above. So I get into this conflict when [email protected] wants to send email to [email protected]. What I did is run a private BIND DNS server and remove the mailertable entry. My private DNS server is mostly a cache-only server with the usual Internet root servers. But since the public Internet value for the MX record for drjohnstechblog.com is not what I wanted for mail delivery purposes, I created a zone for drjohnstechblog.com on my private DNS server and created the MX record

drjohnstechblog.com            IN   MX   0 drjohnstechtalk.com.

thus overwriting the public MX value for drjohnstechblog.com. Then, of course, I have my server where I am running sendmail set to use my private DNS server as nameserver in /etc/resolv.conf, i.e.,

nameserver 127.0.0.1

since I ran my private DNS server on the same box. Without the mailertable entry sendmail uses DNS to determine how to deliver email unless of course the sender matches a smarttable entry! If my server relies on resolving other resource records within drjohnstechblog.com for other purposes then I have to redefine them, too.

This trick works for individual domains. What if you feel the need for an “everythnig else” entry in your mailertable, i.e.,

.                     relay:relayhost.drjohnstechtalk.com

Well, you’re stuck! I don’t have a solution for you. My DNS trick above could be extended to work for mail with some wildcard entries, but it will break so many other things that you don’t want to go there.

Tip 4: How to send the same email to two (or more) different servers
Someone claimed to need this unusual feature. See the discussion in the comment section about how I believe this is possible to do and an outline of how I would do it.
The blog posting I reference about running sendmail in queue-only mode is here.

Conclusion
Hopefully these sendmail tips will make your life as a sendmail admin toiling away in obscurity (not that I know anyone like that : ) ), just a little easier.

Resources
The sendmail book is the one by Bryan Costales. At Amazon: http://www.amazon.com/sendmail-4th-Bryan-Costales/dp/0596510292/ref=sr_1_1?s=books&ie=UTF8&qid=1316630255&sr=1-1.

My most recent post on how to tame the confounding sendmail log is here.

Using smarttable with a catch-all mailertable entry, plus virtusertable and more, is described in my latest sendmail post.

Categories
Internet Mail IT Operational Excellence Spam

How to Stop Chinese Spam – for Mail Admins, w/ June 2014 update

(Updated 12/19/2011 and 6/2014 with additional character sets)
(updated 9/2012 with additional signature)
Intro
I have been a target for random Chinese language spam in my various email accounts, but the problem has really gotten worse in the past few months.

The thing about these messages is that at first Postini (a Google spam filtering service used mostly by businesses), wasn’t very good at catching them. Postini is about the best in the business, and they’re competently catching just about every other type of spam. But these Chinese character messages kept slipping through…

Their support tech gave me some advice which turned out to be incorrect, but led me in the right direction. Their tech told told me to create a content manager rule, but the actual rule he provided was only going to catch Russian and Ukranian spam!

This is the rule he provided:

Rule Name: Non_English_spam
"Match Any"
Header - matches regex

koi8-r|koi8-u|koi7|koi8
Disposition: delete (blackhole)
Set quarantine to Recipient

I had no idea what that was doing, so I looked up koi8-r, koi8, etc and found that it had to do with the Cyrillic alphabet. So I wondered if the Chinese language spams have something similar, but for Chinese. Indeed they do: gb2312. Looking at a few of my Chinese spams, almost all contain this string in the headers. It’s not always in the exact same place, but it’s there. To be concrete, here’s an example (some headers have been obfuscated to prevent the bad guys from trying to reverse engineer Postini’s scoring algorithms):

Received: from websmtp.sohu.com ([61.135.132.136]) by eu1sys200amx108.postini.com ([207.126.147.10]) with SMTP;
		 Sun, 28 Aug 2011 18:41:21 GMT
Received: from omlbw (unknown [110.53.27.141])
		 by websmtp.sohu.com (Postfix) with ESMTPA id 9B3C6720CEA;
		 Sun, 28 Aug 2011 23:55:04 +0800 (CST)
Message-ID: <[email protected]>
From: =?gb2312?B?y7O1wsf4xu/A1rbguabE3NfU0NCztdPQz965q8u+?= <[email protected]>
To: 
Subject: =?gb2312?B?d3Azz/ogytsg1vcgudwg1/Yg0KkgIMqyIMO0IA==?=
		 =?gb2312?B?uaQg1/cgssUgxNwgzOEgIMn9INK1ILyoIKO/LS0=?=
		 =?gb2312?B?qIk=?=
Date: Sun, 28 Aug 2011 23:55:37 +0800
MIME-Version: 1.0
X-mailer: Lzke 2
X-SOHU-Antispam-Bayes: 0
X-pstn-levels:     omitted
X-pstn-settings: omitted
X-pstn-addresses: from <[email protected]> [49/2] 

Content-Type: multipart/mixed;
		 boundary="----=_NextPart_000_015A_013AC9FA.1A2D5A60"

------=_NextPart_000_015A_013AC9FA.1A2D5A60
Content-Transfer-Encoding: base64
Content-Type: text/html;
		 charset="gb2312"

See it? charset=”gb2312″ appears in the content-type header and =?gb2312? appears in both the Subject and From fields.

That message looks like this as displayed in my mail client:

How do I know this is Chinese? I pasted the characters into translate.google.com and it auto-detected it. That’s a convenient tool!

How do I know it is spam? I am open-minded. Perhaps it is a legitimate business proposition that just happens to be written in Chinese? It does sort of read that way from the translation of any one such message. On the other side are some stronger pieces of evidence. The empty To: header is a strong hint, but some legitimate messages could contain that undesirable feature, so that is merely an indicator but not definitive. Most important is the fact that I get these messages, all showing similar patterns in appearance, and most telling always coming from a different sender tells me unambiguously that this is really, truly spam.

So the actual Postini Content Manager rule to capture Chinese spam is this:

Rule Name: Chinese_spam
"Match Any"
Header matches regex (charset="gb2312"|=\?GB2312\?)

Disposition: delete (blackhole)
Set quarantine to Recipient

Obviously this type of rule is a bit dangerous. What if you are expecting something written in Chinese? It will be subject to the same treatment as the spam. That is why the suggestion is to Set quarantine to recipient so that these messages could be delivered from the user quarantine.

And over the course of a couple months Postini has gotten much better about capturing this type of spam. That is the best thing – to let the experts handle it. They just needed to train their algorithms. I was quite concerned at first that this spam is so different from the usual, recognizable spam campaigns that they might have a hard time spotting it while simultaneously allowing the good Chinese email through. But they’re almost there…

12/19 UpdateThe filter described above has been working extremely well for me. Essentially perfectly, in fact, as I can see when I look in my quarantine. But not today. Today I got some suspected Chinese spam in and examing the headers showed something slightly different. The subject looks like this:

Subject: =?GBK?B?bnZ2dyAyMDExLjEyLTIwMTItMDEgvqsgxrcgzcYgz/ogIGZkZXI=?=

And the Mime header also had that string:

Content-Type: text/plain;
		 charset=GBK

Looking up GBK character set you’ll immediately see it is simplified Chinese, extended. So I think we better add that character set to our expression. It makes our content manager rule only a little more complicated. Now we would have:

Rule Name: Chinese_spam
"Match Any"
Header matches regex (charset="gb(k|2312)"|=\?GB(K|2312)\?)

Disposition: delete (blackhole)
Set quarantine to Recipient

For the complete prescription see the summary in the Conclusion.

If you happened upon this article and don’t have the Postini service is there any relevance? Yes, I think so. You should be able to filter on the message headers to look for the string =?gb2312? or =?gbk? in the beginning of the subject line. To speak about mailers with which I have some experience, in sendmail you could do this with a milter. In PureMessage it would be possible to concoct an appropriate rule as well.

9/2012 Update
My filter was working so well these past few months I essentially forgot about the problem, but the occasional Chinese spam slipped through. How? It used a different encoding. Here is an example subject line:

Subject: =?utf-8?B?6K+35p+l5pS277yB?=

This is displayed by my mail client as three Chinese characters followed by “!” They used a different encoding. This one drove me to do a little research. This is an Encoded-Word, according to Wikipedia’s excellent MIME writeup. The “?B?” in the front means base64 encoding. I had previously written a mimedecoder in perl, which I put to use:

> mimedecode 6K+35p+l5pS277yB

which produces:

???!

which is pretty much garbage. So I decided to analyze the output with unix utility od:

> mimedecode 6K+35p+l5pS277yB|od -x

which gives

0000000 e8af b7e6 9fa5 e694 b6ef bc81

Next, I needed a UTF-8 converter, which I found at this Swiss site.

I used it with input type hexadecimal.

The results reproduced exactly the Chinese characters my mail client displayed to me! It also gives a lot of other descriptions for these characters (such as Cangjie). The first few lines begin:

As character names:

U+8BF7 CJK UNIFIED IDEOGRAPH character (请)
U+67E5 CJK UNIFIED IDEOGRAPH character (查)
U+6536 CJK UNIFIED IDEOGRAPH character (收)
U+FF01 FULLWIDTH EXCLAMATION MARK character (!)

As raw characters:

请查收!

Well, that was an interesting exercise, but I’m not sure we’ve learned anything that can be put to use in a RegEx on the original expression. Unless there’s a way to uniquely identify Chinese characters by the beginning of the encoded-word sequence following the ?B?. I have my doubts, but since I don’t seem to get thee UTF-8 emails from other sources, and I have a sample size of about five emails that fooled the other filter to work with, I have developed a content filter which would capture all of them!

Check for a header containing the RegEx:

=\?utf-8\?B\?[56]

More specifically sometimes the utf-8 string is used in the From header, sometimes it is in the subject. Most of my samples would have been caught by the simpler RegEx =\?utf-8\?B\?5, and I mention that in case you want to be more specific, but there was one recent one that had a “6” instead of a “5.”

For the record here’s that mimedecode “program”

#!/usr/bin/perl
# base64 MIME decoding
# example:
# mimedecode Nz84QGxhdGU=
# =&gt; 7?8@late
use MIME::Base64;
 
foreach (@ARGV) {
#      $encoded = encode_base64($_);
      $decoded = decode_base64($_);
#print "enc,dec: $encoded, $decoded\n";
        print $decoded;
}

And its sister program, which I call mimeencode:

#!/usr/bin/perl
# base64 MIME decoding
# DrJ, 6/2004
# example:
# mimedecode Nz84QGxhdGU=
# =&gt; 7?8@late
use MIME::Base64;
 
foreach (@ARGV) {
      $encoded = encode_base64($_);
#      $decoded = decode_base64($_);
#print "enc,dec: $encoded, $decoded\n";
        print $encoded;
}

There’s probably a built-in linux utility which does the same thing, I just don’t know what that is.

2022(!) update

Well, I finally ran across it. The built-in program to do mimeencode/mimedecode is base64. Oh well, better late than never…

Conclusion
Your users needn’t suffer from Chinese Spam. The vast majority are characterized by, um, Chinese characters, of course, whose presence is almost always indicated by the string gb2312 in the message headers. You can take advantage of that fact and build an appropriate rule for Postini or your mailer. But beware of throwing out the baby with the bathwater! In other words, make sure you only subject your users to this rule unless you either have a good quarantine, or they are sure they should never receive this type of email.

There are some spam types which evade the gb2312 rule mentioned above, however. And this part is not as well tested, frankly. The exceptions, which are still a minority of my Chinese spam, are characterized by a subject line or sender that contains =?utf-8?B?5… or =?utf-8?B?6… (see summary below). My honest expectation is that a rule this broad and coarse will also catch a few other languages (Portuguese?, Urdu?, etc.) so be careful! If you are expecting to get non-english email more testing is in order before implementing the utf-8 filter. But it will certainly help to eliminate even more Chinese spam.

4/2013 update
Summary, including 6/2014 update
My filter has worked very well for me and has withstood the test of time. I catch at least a dozen Chinese spams each day. One got through in 6/2014 however, with character set gb18030. I realize reading the above write-up is confusing because I’ve mixed my love of telling a good IT mystery with my desire to convey useful information. So, to summarize, the new combined rule is:

Match Any:

Header matches RegEx:
(charset=”gb(k|2312|18030)”|=\?GB(K|2312|18030)\?)

Header matches RegEx:
=\?utf-8\?B\?[56]

References
A spate of spam from enom-registered domains is described here.
A disappointing case where Google is not operating their Gmail service as a white-glove service is described here.

Categories
Admin Internet Mail IT Operational Excellence

The IT Detective Agency: The Case of Slow Sendmail Performance Finally Cracked

I’ve been running sendmail for years and years. It’s a very solid MTA, though perhaps not fashionable these days. At one point I even made the leap from running on Sun/solaris to SLES. I’ve always had a particular problem on a couple of these servers: they do not react gracefully to mail storms. An application running on another server sends out a daily mail blast to 2000 users, all at once. Hey I’m not running Gmail here, but normal volume is several messages per second nonetheless, and that is handled fairly well.

But this mail blast actually knocks the system offline for a few minutes. The load average rockets up to 160. It’s essentially a self-inflicted denial-of-service attack. In my gut I always felt the situation could be improved, but was too busy to look into it.

When it was time to buy a replacement server, I had to consider and justify what to get. A “screaming server” is a little hard for a hardware vendor to turn into an order! So where are the bottlenecks? I decided to capture output of uptime, which provides load averages, and iostat, an optional package which analyzes I/O usage, at five secon intervals throughout the day. Here’s the iostat job:

nohup iostat -t -c  -m -x 3 > /tmp/iostat &

and the uptime was a tiny script I called cpu-loop.sh:

#!/bin/sh
while /bin/true; do
sleep 5
date
uptime
done

called from the command line as:

nohup ~/cpu-loop.sh > /tmp/cpu &

Strange thing is that though load average shoots the roof, cpu usage isn’t all that high.

If I have this right, load average shows the number of processes scheduled by the scheduler. Sendmail forks a process for each incoming email, so the number of sendmail processes climbs dramatically during a mail storm.

The fundamental issue is are we thirsting for more CPU or more I/O? Then there are the peripheral concerns like speed of pci bus, size of level two cache and number of cpus. The standard profiling tools don’t quite give you enough information.

Here’s actual output of three consecutive iostat executions:

Time: 05:11:56 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.92    0.00    5.36   21.74    0.00   66.99

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    10.00    0.00    3.00     0.00     0.05    37.33     0.03    8.53   5.33   1.60
sdb               0.00   788.40    0.00  181.40     0.00     3.91    44.12     4.62   25.35   5.46  98.96
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    2.40     0.00     0.01     8.00     0.02    8.00   1.33   0.32
dm-3              0.00     0.00    0.00    2.40     0.00     0.01     8.00     0.01    5.67   2.33   0.56
dm-4              0.00     0.00    0.00    0.80     0.00     0.00     8.00     0.01   12.00   6.00   0.48
dm-5              0.00     0.00    0.00    7.60     0.00     0.03     8.00     0.08   10.32   1.05   0.80
hda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-6              0.00     0.00    0.00  975.00     0.00     3.81     8.00    20.93   21.39   1.01  98.96
dm-7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Time: 05:12:01 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.05    0.00    4.34   19.98    0.00   70.64

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    10.80    0.00    2.80     0.00     0.05    40.00     0.03   10.57   6.86   1.92
sdb               0.00   730.60    0.00  164.80     0.00     3.64    45.20     3.37   20.56   5.47  90.16
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    2.60     0.00     0.01     8.00     0.03   12.31   2.15   0.56
dm-3              0.00     0.00    0.00    2.40     0.00     0.01     8.00     0.02    6.33   3.33   0.80
dm-4              0.00     0.00    0.00    0.80     0.00     0.00     8.00     0.01    9.00   5.00   0.40
dm-5              0.00     0.00    0.00    7.60     0.00     0.03     8.00     0.10   13.37   1.16   0.88
hda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-6              0.00     0.00    0.00  899.60     0.00     3.51     8.00    16.18   18.03   1.00  90.24
dm-7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Time: 05:12:06 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.91    0.00    1.36   10.83    0.00   85.89

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     6.40    0.00    3.40     0.00     0.04    25.88     0.04   12.94   5.18   1.76
sdb               0.00   303.40    0.00   88.20     0.00     1.59    36.95     1.83   20.30   5.48  48.32
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    2.60     0.00     0.01     8.00     0.04   14.77   2.46   0.64
dm-3              0.00     0.00    0.00    0.60     0.00     0.00     8.00     0.00   12.00   5.33   0.32
dm-4              0.00     0.00    0.00    0.80     0.00     0.00     8.00     0.01   11.00   5.00   0.40
dm-5              0.00     0.00    0.00    5.80     0.00     0.02     8.00     0.08   12.97   1.66   0.96
hda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-6              0.00     0.00    0.00  393.00     0.00     1.54     8.00     6.46   16.03   1.23  48.32
dm-7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device sdb has reached crazy high utilization levels – 98% before dropping back down to 48%. An average queue size of 4.62 in the first run means a lot of queued up processes awaiting I/O. Write requests (merged) per second of 788 seems respectable. All this, while the CPU is 67% idle!

The conclusion: a solid state drive is in order. We are dying thirsting for I/O more than for CPU. But solid state drives cost money and have to be justified which takes time. Can we do something which proves it will bear out our hypothesis and really alleviate the problem? Yes! SSD is like accessing memory. So let’s build a virtual partition from our memory. tmpfs has made this sinfully easy:

mount -t tmpfs none /mqueue -o size=8192m

We set this to be sendmail’s queue directory. The sendmail mc command looks like this:

define(`QUEUE_DIR',`/mqueue/q*')dnl

which I need to further explain at some point.

Now it’s interesting that this tmpfs filesystem doesn’t even show up in iostat! I guess its usage all counts as cpu usage.

I now have to send my mail blast to the system with this tmpfs setup. I’m expecting to have essentially converted my lack of I/O into better usage of spare CPU, resulting in a higher-performance system.

The Results
The results are in and they are dramatic. Previous results using traditional 15K rotating drive:

- disk device became 98% busy
- cpu idle time only dropped as low as 69%
- load average peaked at 37
- SMTP port shut down for some minutes
- 2030 messages accepted in 187 seconds
- 11 messages/second

and now using tmpfs virtual filesystem:

- the load average rose to 3.1 - a much more tolerable result
- the cpu idle time dropped to 32% during the busiest time
- most imporantly, the server stayed open for business - the SMTP port did not shut down for the first time!!
- the 2000 messages were accepted in 34 seconds.  
- that's a record 59 messages/second!

Conclusion
Disk I/O was definitely the bottleneck for sendmail. tmpfs rocks! sendmail becomes five times faster using it, and is better behaved. The drawback of this filesystem type is that it is completely volatile and I stand to lose messages if the power ever goes out!

Case Closed!