This is a selection of relatively straightforward optimizations that can increase performance and reduce the overhead of guest operating systems running inside KVM. Some of these notes are distribution specific; where they are, they apply to an Ubuntu 10.04 LTS host and Ubuntu 12.04 LTS guests. The principles are generic though and should translate easily to other systems. The host Ubuntu 10.04 LTS ships a relatively old QEMU/KVM 0.12.3. Some of the parameters may have been moved or renamed since, so adapt as needed.
Use the host CPU for full feature pass-through. This allows the guest to use all host CPU features (higher SSE variants, for example). By default, the guest sees a restricted, more compatible subset of features, which allows for seamless migration between hosts of different CPU types. When performance is key, or when you're reasonably sure that any potential other/new host has at least the same set of CPU features available, using all host CPU features makes sense.
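With the command-line QEMU/KVM of that era this is done through the -cpu switch; a minimal sketch (the kvm binary name is what Ubuntu ships, yours may be qemu-system-x86_64):

```shell
# Pass all host CPU features through to the guest. This trades away
# migration compatibility between heterogeneous hosts.
kvm -cpu host ...

# List the CPU models this QEMU/KVM build knows about:
kvm -cpu ?
```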
The guest obviously needs to have virtio drivers available. The disk devices will change from /dev/sd* to /dev/vd*, which may require updating /etc/fstab. If the filesystems are referenced by UUID inside /etc/fstab, no change is required. Ideally, raw volumes or dedicated partitions are used for guest storage, as they avoid the extra filesystem overhead that image files incur. Raw images are the next best thing. When using sparse files, make sure a reasonable amount of space is preallocated for metadata so that the file can grow with less overhead.
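Assuming an LVM volume /dev/vg0/guest0 as backing storage (the path is just an example), attaching it through virtio could look like this:

```shell
# Attach the raw block device through the paravirtual virtio bus;
# cache=none bypasses the host page cache for more predictable I/O.
kvm -drive file=/dev/vg0/guest0,if=virtio,cache=none ...
```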
The virtio based network driver reduces CPU overhead due to not having to emulate an actual network interface card.
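With the old-style -net syntax of QEMU/KVM 0.12 that looks roughly like this (the MAC address is a placeholder, pick a unique one per guest):

```shell
# Emulate a virtio NIC instead of a full hardware model like e1000.
kvm -net nic,model=virtio,macaddr=52:54:00:12:34:56 ...
```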
TAP interfaces on the host allow for greater performance, less overhead and more flexible configuration.
If you start the VM through a wrapper script anyway, you can do the network setup there. Otherwise you can set the script parameter to a script that prepares the network for the given TAP interface.
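Attaching the guest to a TAP interface that the wrapper already configured, with QEMU's own setup/teardown scripts disabled, might look like this ($TAP_DEVICE is whatever name you chose):

```shell
kvm -net nic,model=virtio \
    -net tap,ifname=$TAP_DEVICE,script=no,downscript=no ...
```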
Create a persistent TAP interface owned by the user who runs the VMs.
tunctl -t $TAP_DEVICE -u $VM_ADMIN
Configure the TAP interface with the host IP and netmask resulting in a virtual private network between the host and the guest.
ifconfig $TAP_DEVICE $HOST_IP netmask $NETMASK up
Also enable IP forwarding to allow routing between the guest network(s) and the host network(s). Setting up a firewall through iptables is advisable.
sysctl -w net.ipv4.ip_forward=1
Using dnsmasq on the host allows you to configure the guest dynamically from the outside through DHCP, and provides DNS caching.
dnsmasq --strict-order --except-interface=lo --interface=$TAP_DEVICE \
    --listen-address=$HOST_IP --bind-interfaces \
    --dhcp-range=$GUEST_IP,$GUEST_IP --conf-file="" \
    --pid-file=/var/run/kvm-dnsmasq-$TAP_DEVICE.pid \
    --dhcp-leasefile=/var/run/kvm-dnsmasq-$TAP_DEVICE.lease \
    --dhcp-no-override
Disable graphics output when running on headless servers.
Remove the virtual graphics card as it's then not needed.
When you need to interact with the guest graphically, you can always enable VNC and forward the port through. The -nographic and -vga none switches need to be removed temporarily for this to output anything.
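Put together, the headless configuration and the temporary VNC variant for troubleshooting could look like this (VNC display 0 means TCP port 5900):

```shell
# Headless operation: no graphical output, no emulated graphics card.
kvm -nographic -vga none ...

# Temporary troubleshooting setup: default VGA card again, VNC bound
# to localhost only so it can be tunneled through SSH.
kvm -vnc 127.0.0.1:0 ...
```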
The QEMU monitor allows you to interact with the virtual hardware. It lets you initiate an ACPI shutdown, commit snapshotted drives, attach new storage, migrate the guest to another KVM host and much more.
After starting the VM, daemonize the KVM instance. Alternatively it could be run through screen or as a background job.
Give the VM a name. This makes it easier to discern the VMs in the process list. According to the documentation it should also alter the process name itself; it did not do that for me, however. Putting the name at the top of the argument list will make it easy to spot in most circumstances.
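A sketch combining these switches; the monitor is put on a UNIX socket (the path is just an example) so it stays reachable after daemonizing, and socat is one of several tools that can talk to it:

```shell
kvm -name guest0 \
    -monitor unix:/var/run/kvm-guest0.monitor,server,nowait \
    -daemonize ...

# Talk to the monitor later, e.g. to trigger an ACPI shutdown:
echo system_powerdown | socat - UNIX-CONNECT:/var/run/kvm-guest0.monitor
```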
Note that it is possible that the KVM process, even though it is supposed to daemonize, still captures the console you are running on. It then disables echo on the TTY so that you no longer see what you type. This can be very irritating, but it can be restored easily using the following command:
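Typed blindly at the prompt, stty restores sane terminal settings, echo included:

```shell
stty sane
```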
When there is a specialized storage controller that abstracts access to the actual disks, e.g. any RAID controller, you don't want to waste resources on optimizing disk scheduling. In some cases it will also introduce unnecessary delays, as that work is duplicated by the storage controller. Selecting a simpler elevator is therefore good for latency and throughput. The deadline elevator is sensible for a VM host, but the noop elevator, which pretty much only merges adjacent block operations, may make even more sense depending on the storage controller. HP, for example, suggests using noop with Smart Array controllers.
cat /sys/block/<block device>/queue/scheduler
This tells you which elevator is currently enabled and which others are available. Note that these settings are per block device. Virtual devices like a software RAID using MD don't have a separate scheduler; the block devices they run on top of do.
echo noop | sudo tee /sys/block/<block device>/queue/scheduler
To make the selection permanent, add the elevator argument to your bootloader. For GRUB2 this can be specified in /etc/default/grub with the variable GRUB_CMDLINE_LINUX_DEFAULT. Just add "elevator=noop" to the argument list.
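On Ubuntu that boils down to something like the following ("quiet splash" stands in for whatever the variable already contains on your system; keep the existing contents):

```shell
# /etc/default/grub -- append the elevator choice to the kernel cmdline:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=noop"

# Regenerate /boot/grub/grub.cfg afterwards:
sudo update-grub
```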
Virtual kernels are kernels tuned to run as guests in virtual machines. This may or may not include specific performance tweaks (I didn't check). What it does though is radically reduce kernel size by excluding drivers that are not commonly found inside VMs. The size is more in the range of ~30-40MiB instead of over 100MiB. This makes kernel updates quicker and frees some storage space. Ubuntu packages those under linux-image-virtual.
sudo apt-get install linux-image-virtual
sudo apt-get purge 'linux-image-.*-generic'
sudo apt-get autoremove
Note that this installs linux-image-virtual specifically, not the broader linux-virtual, which would also pull in the headers that go with that kernel version. When not compiling any kernel modules, these headers aren't really needed and can be left out to further reduce disk space and update overhead.
Since the disk is virtualized, using a spindle specific disk scheduler doesn't really make sense. The hypervisor, the host OS or the host storage controller will already do the scheduling, so no need to duplicate any work and waste CPU resources. See the section in the host configuration for how to switch the elevator permanently.
When moving the VM around, changing from one virtualization solution to another, or simply switching emulated NIC models, it can happen that the MAC address of the virtual NIC changes. If the guest isn't specifically configured to handle this, it may not bring up the new interface, rendering the VM unreachable. You can recognize this by boot messages like "waiting for network configuration" and eventually something along the lines of "booting without network".
If you disabled graphical output and did not configure an otherwise reachable interface (like a serial TTY) you will have to enable some means to interact with the guest. One possibility is configuring VNC as mentioned in the section about graphics output. This is convenient as it allows you to tunnel the connection through SSH.
Once logged into the guest, check for network interfaces:
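Either of these lists all interfaces, including unconfigured ones (which one is available depends on what's installed in the guest):

```shell
ifconfig -a
# or, with iproute2:
ip link show
```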
If an unexpected interface (like eth1 instead of eth0) shows up, then there are two options. You could change the interface config in /etc/network/interfaces to include configuration for the new interface:
auto eth1
iface eth1 inet dhcp
But this will still cause the above-mentioned messages during boot, and it will delay the startup of services by over a minute. This is due to DHCP on the original, no longer present, interface having to time out first. Instead you should simply adjust the udev rules that create the interfaces:
sudo vim /etc/udev/rules.d/70-persistent-net.rules
Remove the section that creates interfaces for any non-existing MAC address and renumber the interface names accordingly so that you get (just) an eth0 for the existing virtual MAC address.
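The rules file contains one line per known MAC address; a hypothetical example (the MAC addresses are made up, and your file may carry additional match keys like ATTR{type}). Here the first, stale line would be deleted and the remaining one renamed to eth0:

```shell
# /etc/udev/rules.d/70-persistent-net.rules (example content)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:aa:bb:01", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:aa:bb:02", NAME="eth1"
```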