I’ve been running sendmail for years and years. It’s a very solid MTA, though perhaps not fashionable these days. At one point I even made the leap from running on Sun/Solaris to SLES. I’ve always had a particular problem on a couple of these servers: they do not react gracefully to mail storms. An application running on another server sends out a daily mail blast to 2000 users, all at once. Hey, I’m not running Gmail here, but normal volume is several messages per second nonetheless, and that is handled fairly well.
But this mail blast actually knocks the system offline for a few minutes. The load average rockets up to 160. It’s essentially a self-inflicted denial-of-service attack. In my gut I always felt the situation could be improved, but was too busy to look into it.
When it was time to buy a replacement server, I had to consider and justify what to get. A “screaming server” is a little hard for a hardware vendor to turn into an order! So where are the bottlenecks? I decided to capture the output of uptime, which provides load averages, and of iostat, an optional package which analyzes I/O usage, at five-second intervals throughout the day. Here’s the iostat job:
nohup iostat -t -c -m -x 3 > /tmp/iostat &
and the uptime capture was a tiny script I called cpu-loop.sh:
#!/bin/sh
while /bin/true; do
    sleep 5
    date
    uptime
done
called from the command line as:
nohup ~/cpu-loop.sh > /tmp/cpu &
The strange thing is that although the load average shoots through the roof, CPU usage isn’t all that high.
If I have this right, the load average reflects the number of processes in the run queue: those running or waiting to run, plus (on Linux) those blocked in uninterruptible I/O wait. Sendmail forks a process for each incoming email, so the number of sendmail processes climbs dramatically during a mail storm.
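As a quick sanity check during the next storm, the two numbers can be watched side by side. This is just a sketch along the lines of cpu-loop.sh; it assumes a typical Linux box where /proc/loadavg exists and the daemon’s processes are all named sendmail:

#!/bin/sh
# watch the load average alongside the number of live sendmail processes
while /bin/true; do
    date
    cat /proc/loadavg                      # 1-, 5- and 15-minute load averages
    ps -C sendmail --no-headers | wc -l    # count of sendmail processes
    sleep 5
done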
The fundamental question is: are we starved for CPU or for I/O? Then there are the peripheral concerns like the speed of the PCI bus, the size of the level-two cache and the number of CPUs. The standard profiling tools don’t quite give you enough information.
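One quick first impression can come from plain old vmstat (part of the standard procps package, so nothing exotic): a high wa column alongside a high id column already hints that the disk, not the CPU, is the constraint.

vmstat 5
# columns of interest: r = run queue, b = blocked on I/O, wa = % time waiting on I/O, id = % idle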
Here’s actual output of three consecutive iostat executions:
Time: 05:11:56 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.92    0.00    5.36   21.74    0.00   66.99

Device:  rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        0.00   10.00   0.00    3.00    0.00    0.05    37.33     0.03    8.53   5.33   1.60
sdb        0.00  788.40   0.00  181.40    0.00    3.91    44.12     4.62   25.35   5.46  98.96
dm-0       0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
dm-1       0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
dm-2       0.00    0.00   0.00    2.40    0.00    0.01     8.00     0.02    8.00   1.33   0.32
dm-3       0.00    0.00   0.00    2.40    0.00    0.01     8.00     0.01    5.67   2.33   0.56
dm-4       0.00    0.00   0.00    0.80    0.00    0.00     8.00     0.01   12.00   6.00   0.48
dm-5       0.00    0.00   0.00    7.60    0.00    0.03     8.00     0.08   10.32   1.05   0.80
hda        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
dm-6       0.00    0.00   0.00  975.00    0.00    3.81     8.00    20.93   21.39   1.01  98.96
dm-7       0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00

Time: 05:12:01 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.05    0.00    4.34   19.98    0.00   70.64

Device:  rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        0.00   10.80   0.00    2.80    0.00    0.05    40.00     0.03   10.57   6.86   1.92
sdb        0.00  730.60   0.00  164.80    0.00    3.64    45.20     3.37   20.56   5.47  90.16
dm-0       0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
dm-1       0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
dm-2       0.00    0.00   0.00    2.60    0.00    0.01     8.00     0.03   12.31   2.15   0.56
dm-3       0.00    0.00   0.00    2.40    0.00    0.01     8.00     0.02    6.33   3.33   0.80
dm-4       0.00    0.00   0.00    0.80    0.00    0.00     8.00     0.01    9.00   5.00   0.40
dm-5       0.00    0.00   0.00    7.60    0.00    0.03     8.00     0.10   13.37   1.16   0.88
hda        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
dm-6       0.00    0.00   0.00  899.60    0.00    3.51     8.00    16.18   18.03   1.00  90.24
dm-7       0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00

Time: 05:12:06 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.91    0.00    1.36   10.83    0.00   85.89

Device:  rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        0.00    6.40   0.00    3.40    0.00    0.04    25.88     0.04   12.94   5.18   1.76
sdb        0.00  303.40   0.00   88.20    0.00    1.59    36.95     1.83   20.30   5.48  48.32
dm-0       0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
dm-1       0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
dm-2       0.00    0.00   0.00    2.60    0.00    0.01     8.00     0.04   14.77   2.46   0.64
dm-3       0.00    0.00   0.00    0.60    0.00    0.00     8.00     0.00   12.00   5.33   0.32
dm-4       0.00    0.00   0.00    0.80    0.00    0.00     8.00     0.01   11.00   5.00   0.40
dm-5       0.00    0.00   0.00    5.80    0.00    0.02     8.00     0.08   12.97   1.66   0.96
hda        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
dm-6       0.00    0.00   0.00  393.00    0.00    1.54     8.00     6.46   16.03   1.23  48.32
dm-7       0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00
Device sdb reached crazy high utilization levels, 98% busy, before dropping back down to 48%. An average queue size (avgqu-sz) of 4.62 in the first sample means a lot of write requests queued up and waiting for the disk. 788 merged write requests per second (wrqm/s) seems respectable. All this while the CPU is 67% idle!
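As a sanity check, the queue length, request rate and wait time should hang together (Little’s law: queue ≈ rate × wait), and for the first sdb sample they do. Assuming bc is handy for the arithmetic:

echo "181.40 * 25.35 / 1000" | bc -l    # 181.40 writes/s at 25.35 ms await ≈ 4.60, close to the reported avgqu-sz of 4.62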
The conclusion: a solid state drive is in order. We are starved for I/O, not for CPU. But solid state drives cost money and have to be justified, which takes time. Can we do something in the meantime that both tests the hypothesis and actually alleviates the problem? Yes! An SSD behaves much more like memory than like a spinning disk, so let’s build a filesystem out of memory itself. tmpfs makes this sinfully easy:
mount -t tmpfs none /mqueue -o size=8192m
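That mount vanishes at the next reboot, so if the experiment sticks, an /etc/fstab entry along these lines (same size as above) would bring it back automatically. A sketch only; adjust to taste:

# /etc/fstab: recreate the 8 GB in-memory queue filesystem at boot
none   /mqueue   tmpfs   size=8192m   0 0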
We point sendmail’s queue directory at /mqueue. The relevant line in the sendmail .mc configuration file looks like this:
define(`QUEUE_DIR',`/mqueue/q*')dnl
which deserves a bit more explanation.
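Briefly, as I understand the sendmail documentation, the q* wildcard makes sendmail spread its queue across every directory under /mqueue whose name begins with q, so several queue runners can work in parallel. And since a tmpfs filesystem comes up empty on every mount, those subdirectories have to be recreated at boot. A sketch, with the names q0 through q3 purely illustrative; owner and permissions should match whatever your existing queue directory uses:

#!/bin/sh
# mount the in-memory filesystem and recreate sendmail's queue directories
mount -t tmpfs none /mqueue -o size=8192m
for d in q0 q1 q2 q3; do
    mkdir -p /mqueue/$d
    chmod 700 /mqueue/$d    # typical queue-directory permissions; adjust to your setup
done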
Now it’s interesting that this tmpfs filesystem doesn’t even show up in iostat. That makes sense: tmpfs lives in the page cache, so there is no block device for iostat to report on, and the work shows up as CPU and memory usage instead.
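Since iostat is blind to it, I can keep an eye on the in-memory queue with ordinary filesystem and memory tools instead:

df -h /mqueue    # how full the in-memory queue filesystem is
free -m          # overall memory picture; tmpfs pages show up as cache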
I now have to send my mail blast to the system with this tmpfs setup. I’m expecting to have essentially converted my I/O starvation into use of the spare CPU, resulting in a higher-performance system.
The Results
The results are in and they are dramatic. Previous results using traditional 15K rotating drive:
- disk device became 98% busy
- cpu idle time only dropped as low as 69%
- load average peaked at 37
- SMTP port shut down for some minutes
- 2030 messages accepted in 187 seconds
- 11 messages/second
and now using tmpfs virtual filesystem:
- the load average rose to 3.1, a much more tolerable result
- the cpu idle time dropped to 32% during the busiest time
- most importantly, the server stayed open for business: the SMTP port did not shut down for the first time!!
- the 2000 messages were accepted in 34 seconds
- that's a record 59 messages/second!
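For what it’s worth, the messages-per-second figures fall straight out of the counts and elapsed times:

echo "2030 / 187" | bc -l    # about 10.9 messages/second on the 15K rotating drive
echo "2000 / 34" | bc -l     # about 58.8 messages/second on tmpfs, roughly a fivefold improvement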
Conclusion
Disk I/O was definitely the bottleneck for sendmail. tmpfs rocks! sendmail becomes five times faster using it, and is better behaved. The drawback of this filesystem type is that it is completely volatile and I stand to lose messages if the power ever goes out!
Case Closed!