First, let me say that these results should be taken with a grain of salt. These are rather informal tests, and are geared to my environment and my way of working.
Still, I tried to be somewhat rigorous by using the excellent Phoronix Test Suite, and keeping conditions as uniform as possible across the various virtualization environments.
Testing with multiple VMs running concurrently
Initially, I wanted to test under three conditions:
* One VM running alone within the host
* Two VMs running concurrently within the host
* Three VMs running concurrently within the host
My HP MicroServer has one CPU with two cores, so I wanted to see if I would start getting contention issues when the number of VMs exceeded the number of cores.
My testing found that, at least for the CPU-intensive tests, running two VMs on the host was fine. In other words, performance "scores" (e.g., MFLOPS) were pretty stable whether one or two VMs were running.
I tested three concurrent VMs in only two environments: Windows 2008 R2 Hyper-V and VirtualBox. Because I had limited on-server storage, and because ESXi seemed to limit my ability to micromanage that storage, I didn't have enough space to create three VMs under ESXi.
Anyway, the results for three concurrent VMs under Hyper-V were what I expected. For example, here are the results (time in seconds) for LAME MP3 encoding under one, two, and three concurrent instances:
Note that under Hyper-V, the MP3 encoding time was 51 seconds for one or two concurrent VMs, but went up to about 71 seconds when three VMs were running concurrently.
Times for VirtualBox with one and three VMs followed a comparable pattern, but were slightly slower.
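To put the Hyper-V numbers above in perspective, here is a small Python sketch that computes the slowdown and compares it with a naive contention model. The model is my own assumption, not something measured: it supposes that purely CPU-bound VMs share the two cores perfectly evenly, so run time scales with the ratio of VMs to cores once the VMs outnumber the cores. The function names are made up for illustration.

```python
def slowdown_percent(baseline_s, loaded_s):
    """Percent increase in run time relative to the baseline run."""
    return (loaded_s - baseline_s) / baseline_s * 100.0


def ideal_time(baseline_s, vms, cores):
    """Run time under a naive model (an assumption, not a measurement):
    CPU-bound VMs split the cores evenly once VMs outnumber cores."""
    return baseline_s * max(1.0, vms / cores)


CORES = 2          # the HP MicroServer has one dual-core CPU
BASELINE_S = 51.0  # Hyper-V, one or two concurrent VMs
THREE_VM_S = 71.0  # Hyper-V, three concurrent VMs (approximate)

if __name__ == "__main__":
    print(f"Observed slowdown: {slowdown_percent(BASELINE_S, THREE_VM_S):.1f}%")
    print(f"Naive model for 3 VMs: {ideal_time(BASELINE_S, 3, CORES):.1f} s")
```

With these inputs the observed slowdown is about 39 percent, while the naive model predicts 76.5 seconds for three VMs. The measured 71 seconds is a little better than that, which may simply mean the encoding runs were not perfectly CPU-bound or did not overlap completely.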
Little variation in CPU-intensive tasks
Most of my testing involved either one or two VMs running concurrently. In these tests, for CPU-intensive tasks, there was little variation in results across the four virtualization technologies.
For example, here are results for LAME MP3 encoding when running one or two VMs concurrently. Time is in seconds, so lower numbers are better.
Note that ESXi performance was a bit better than Hyper-V, which was a bit better than VirtualBox, with KVM having the worst times on average.
This is for one test, MP3 encoding, but the pattern tended to repeat across the CPU-intensive tests. In other words, ESXi tended to have slightly better performance than Hyper-V, Hyper-V slightly better than VirtualBox, with KVM being the worst by a modest amount (maybe 10 percent slower than the best). Of course, the ordering wasn't the same in every test; in some, Hyper-V had the best performance.
Dramatic change in I/O intensive tests
I ran a suite of 26 tests in Phoronix. From the looks of the tests, the majority were CPU intensive. Some were a mixture (7-zip or BZIP compression and PostgreSQL transactions per second), and some were I/O intensive.
The I/O intensive tasks had the greatest variation.
For example, here are some results from a "Threaded I/O Write" test. Numbers are MB/sec, so higher is better:
Note the wide variation in numbers. In all cases, I tried to create thin volumes, which could explain some of the variation. In other words, with thin volumes, the first writes of the test may be allocating new disk space, an overhead that won't be seen on subsequent writes.
However, the Phoronix Test Suite has built-in guards that run a test additional times when it sees variation in the results, so I'm not sure this overhead had that much of an effect.
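To illustrate the general idea of re-running a test when results vary, here is a minimal Python sketch. This is not the Phoronix Test Suite's actual code. The function name, the run limits, and the 3.5 percent relative standard deviation threshold are all assumptions picked for illustration.

```python
import statistics


def run_until_stable(run_once, min_runs=3, max_runs=10, max_rel_stddev=0.035):
    """Call run_once() repeatedly until the relative standard deviation of
    the results drops to max_rel_stddev or below, or max_runs is reached.

    All defaults are hypothetical. min_runs must be at least 2 so that a
    sample standard deviation can be computed.
    """
    results = [run_once() for _ in range(min_runs)]
    while len(results) < max_runs:
        mean = statistics.mean(results)
        rel_stddev = statistics.stdev(results) / mean if mean else 0.0
        if rel_stddev <= max_rel_stddev:
            break  # results agree closely enough, stop re-running
        results.append(run_once())
    return results


if __name__ == "__main__":
    # Stand-in for a real benchmark run: a steady throughput in MB/sec.
    steady = run_until_stable(lambda: 100.0)
    print(len(steady), "runs, mean", statistics.mean(steady))
```

A guard like this only helps with a one-time cost such as thin-volume allocation if the extra runs are enough to outweigh the slow first run. A single outlier can keep the spread above the threshold until the run limit is hit.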
For Hyper-V, KVM, and VirtualBox, I placed the concurrent VM disks on the same physical disks, so I expected and did see slow-downs in concurrent tests.
As I mentioned for ESXi, I had questions about how ESXi actually allocated storage, but I thought that ESXi was using RAID 0 in my setup, so I expected that ESXi would deliver better performance. However, that wasn't the case.
Despite the wide variation in I/O results, it seemed like Hyper-V had the best results, followed by VirtualBox, then ESXi, with KVM last.
However, I'm sure with some tweaking you could get any kind of I/O performance you wanted. Suffice it to say that if you're disappointed with your VM's performance, you should concentrate your tuning efforts on I/O tweaks.