Categories
Linux

Performance Degradation With SLES 11 SP1 Under VMWare

Some friends and I found severe performance degradation using SLES 11 SP1 under VMWare.  That’s Suse Linux Enterprise Server Service Pack 1.  I’m still trying to get my head around it – there are lots of variables to control for. Later update For my analysis of the specific problem with grep on SLES 11 SP1, please go here: http://drjohnstechtalk.com/blog/2011/06/grep-is-slow-as-a-snail-in-sles-11/.

The Method

I start with a 20 Mb gzip’d file.  Let’s call it file.gz.  Uncompressed it is 107 MB.  I run

time zcat file.gz > /dev/null

I run it several times and report the lowest number which I am consistently able to reproduce. That feels the fairest way to me to benchmark in this case.

On a fast, 3 GHz physical server running SLES 10 SP3 it takes 0.63 s. Let’s call that server Physical.

Then I add in something to make it useful: grep’ing for an IP address:

time zcat file.gz|grep 192.168.23.34 > /dev/null

On Physical that takes 0.65 s.

My Amazon EC2 image – let’s call it Amazon-EC2 – runs ubuntu 10.10 on a 2.6 GHz VM. To learn CPU speed in ubuntu:

cat /proc/cpuinfo|grep MHz

My SLES 11 SP1 is a guest VM on VMWare ESX. It has a 2.4 GHz processor. Let’s call it SLES11SP1. For CPU speed in SLES*:

dmesg|grep -i cpu

and look for the line that says Intel … GHz.

* 7/1/2011 Correction The correct way to get the processor speed is

grep -i /proc/cpuinfo

The cpu Mhz line shows the running speed. this also seems to work in RHEL (Redhat) and Debian distributions such as Ubuntu. I’ve looked at several models. Usually the model name – which is what you get from the dmesg way – lists a speed that is the same as the cpu MHz given the 1000x difference between GHz and MHz, but not always! I have come across a recently purchased server with a surprisingly slow clock speed, and one that is quite different from the CPU name:

model name      : Intel(R) Xeon(R) CPU           E7520  @ 1.87GHz
cpu MHz         : 1064.000

Who knew you could even buy a server-class CPU that slow these days?

For comparison I have a SLES 10 SP3 VM under VMWare. It also has a 2.4 GHz CPU. SLES10SP3. All servers are X86_64.

The Results
The amazing results:

Server zcat time zcat|grep IP time
Physical 0.63 s 0.65 s
Amazon-EC2 0.73 s 1.06 s
SLES11SP1 0.99 s 5.8 s
SLES10SP3 0.78 s 0.93 s

Analysis
I’ve tried many more variants than displayed in that table, but you get the idea. All VMs are slower than all physical servers tested in all tests. Most discouragingly, the SLES11 SP3 is five or six times slower than a comparable physical server for the real-life test involving grep. I used the same file in all tests.

Conclusions
Virtualization is not a panacea. It has its role, but also its limitations. And either something is wrong with SLES 11 SP1, which I doubt, or something is wrong with the way I set it up, despite the fact that I’ve tried a few variants. We’re going to try it on a physical server next. I’ll update this post if anything noteworthy happens.

Update
I got it tested on a physical server now, too. A HP G6 w/ 4 Gb RAM. SLES 11 SP1. The CPU identified itself as Xeon E5504 @ 2.0 GHz. Here are the awful timings:

Server zcat time zcat|grep IP time
SLES 11 SP1 Physical 0.90 s 4.8 s

That shows that SLES 11 SP1 itself is causing my performance degradation. Not every instance of SLES 11 is faulty, however. It turns out that IBM’s Watson is running it! http://whatis.techtarget.com/definition/ibm-watson-supercomputer.html

For the record, the kernel in SLES 11 SP1 is 2.6.32.12-0.7, while in SLES 10 SP3 it is 2.6.16.60-0.68.1. On a RHEL 5.6 VM (I did not bother to list its results), where the kernel was 2.6.18-238.1.1, the degradation was not nearly so bad either. I wonder if the kernel version is having a major impact? Perhaps not. The kernel in ubuntu 10.10 looks pretty new – 2.6.35-24. I am running

uname -a

to see the kernel version.

Further Update
We also got to test with SLES 10 SP4 VM. You see from the table that it is well-behaved. It had 2 CPUs which identified themselves as X6550 @ 2.0 GHz. 4 GB RAM. Kernel 2.6.16.60-0.85.1

Server zcat time zcat|grep IP time
SLES 10 SP4 VM 0.96 s 1.0 s