VFIO: Tuning your Windows gaming VM for optimal performance
This will be a guide on advanced tuning for a VFIO gaming VM. If you’re starting from scratch, read through the Arch Wiki guide on PCI passthrough via OVMF. It is a great starting point and covers all of the basics. I’d recommend using libvirt instead of straight QEMU.
Host hardware configuration
Before we begin, it’s best I show and explain my host specs. If you only want configuration info, jump to Host Configurations.
- Distro: Arch Linux
- Motherboard: X399 AORUS Gaming 7
- DE: Plasma on X11
- CPU: AMD Ryzen Threadripper 2950X OC’d to 4.0GHz
- GPUs: Radeon RX 480, Radeon RX 6900 XT, Radeon RX 550X
- NIC: Intel X520-DA2
- SSDs: Samsung 970 Pro 512GB (LUKS-encrypted, BTRFS), Team Group MP34 1TB
- HDDs: 4x 4TB HGST Deskstar NAS drives in a ZFS RAID-Z1
- Memory: 32GB (4x8GB) G.Skill Samsung B-die kit running at 3200 MT/s XMP
For the distro, I chose Manjaro Linux because it’s basically a turnkey Arch Linux delayed by a few months. This mattered because when my rig was built, Threadripper had issues with older Linux kernels, and Manjaro lets the user easily pick a kernel from a nice, easy-to-use GUI, which was very important for new-to-desktop-Linux me. Since it’s based on Arch, it also runs recent packages for QEMU and OVMF, which makes deploying new features easy, and with the AUR it’s really easy to get custom kernels and packages.
For the platform, I went with Threadripper because I wanted 64 PCIe lanes for lots of add-in cards, with the added benefit of extra memory slots and NVMe slots. Threadripper is simultaneously a gift with all of its cores and a downside because of the NUMA architecture in its 1st and 2nd gen iterations. NUMA adds extra complexity to the VM, but these days it’s not too hard to deal with, with tons of guides working around it. I chose the X399 AORUS Gaming 7 motherboard because I found it for 100 bucks on Amazon, since its integrated sound card was broken.
For the GPUs, I initially had a GTX 1080 but swapped it out for an RX 6900 XT because I wanted to use it in both Linux and Windows. Despite Arch making Nvidia drivers easy, they’re still a pain to work with in Linux. This can be ignored if you only have the GPU for passthrough and nothing else. The other two GPUs are there for display outputs to my array of 6 monitors.
The NIC was chosen to take advantage of SR-IOV and Virtual Function NICs for use with the VM. A virtual function NIC completely bypasses the virtio network stack, giving the VM what is essentially a real PCIe NIC. This decreases latency and avoids the weird kernel lag the virtio drivers in Windows caused at 10Gbit speeds. I think that issue could have been fixed, but I had the NIC anyway, so I wanted to try it out, and it works great. The way SR-IOV works, you can even set a virtual function device to a specific VLAN, either in the host or in the VM.
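For example, tagging one of the virtual functions with a VLAN on the host side looks something like this (enp7s0f1 is the interface name from my setup; yours will differ):

sudo ip link set enp7s0f1 vf 1 vlan 69   # tag all of VF 1's traffic with VLAN 69
ip link show enp7s0f1                    # the PF lists each VF with its MAC and VLAN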
The storage for the host is the Samsung 970. It’s fast and reliable; my only regret was not getting the 1TB version. For the guest, I have a Team Group SSD passed through. The HDD array uses ZFS because it’s just the best file system in the world, but more importantly it has zvols, encryption, compression, and a really good cache system with ARC.
Host Configurations
BIOS/UEFI
Here are the settings I’d turn on if possible:
- Resizable BAR set to on. This allows the CPU to access the GPU’s VRAM in blocks larger than the usual 256MB window.
- Above 4G decoding set to on. This allows PCIe devices to map their BARs above the 4GB address boundary, which Resizable BAR needs.
- IOMMU set to on. This turns on IOMMU grouping, a must-have.
- Only turn on “ACS override” if your motherboard’s IOMMU groups are not usable. It does work, but sometimes causes lag.
- Make sure you boot via UEFI and disable CSM. Graphics cards, storage, and networking should all be set to UEFI only.
- Make sure the initial display is not set to the GPU you are passing through.
- If you’re overclocking, make sure it is very stable; micro stutters and unstable voltages can cause crashes in the VM but not the host.
Boot and Kernel Parameters
Here are the kernel parameters that I use. Append what I have to the end of your boot line in /etc/default/grub.
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 iommu=pt"
The ones that are different from a stock setup are allow_unsafe_interrupts and ignore_msrs. These allow more leniency with interrupt timings and fix a Windows bug with EPYC. iommu=pt sets the IOMMU to passthrough mode, which skips DMA translation for host devices and makes passthrough work much better.
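Since this is GRUB, remember to regenerate the config after editing the file:

sudo grub-mkconfig -o /boot/grub/grub.cfg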
For mkinitcpio options, I added the following to /etc/mkinitcpio.conf:
MODULES="amdgpu vfio_pci vfio vfio_iommu_type1 vfio_virqfd"
The vfio_pci entry loads the vfio_pci driver early so it can claim any passed-through device, important for GPUs if you don’t want issues. As for the other VFIO modules, I just know they help with passed-through devices, so I enabled them.
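After changing /etc/mkinitcpio.conf, rebuild the initramfs so the new MODULES line actually takes effect:

sudo mkinitcpio -P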
For modprobe, I added an SR-IOV option to create virtual NICs on my Intel X520 in /etc/modprobe.d/ixgbe.conf:
options ixgbe max_vfs=2
This just makes 2 virtual function NICs per physical NIC on the card.
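After a reboot (or reloading the ixgbe module), you can sanity-check that the virtual functions actually showed up:

lspci | grep -i "virtual function"   # the VFs appear as their own PCI devices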
ZFS config
For my array, I have a VMs dataset with ZSTD-fast compression turned on, access time off, 16KiB record size (since I have 4x 4KiB-sector drives), and AES-256-GCM encryption. I also have a zvol in this dataset for the VM to use as a storage drive, inheriting the dataset config above. Finally, I have ARC capped at a max size of 4GB to avoid having ZFS eat all of my host’s memory.
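For reference, here is roughly what creating such a dataset and zvol looks like. The pool and dataset names match my libvirt XML below, but the zvol size here is a placeholder, so treat this as a sketch rather than my exact commands:

# VM dataset: zstd-fast compression, no atime, 16KiB records, native encryption
sudo zfs create -o compression=zstd-fast -o atime=off -o recordsize=16K \
    -o encryption=aes-256-gcm -o keyformat=passphrase RustTank/VirtualMachines
# zvol for the guest's games drive; zvols use volblocksize instead of recordsize
sudo zfs create -V 500G -o volblocksize=16K RustTank/VirtualMachines/VFIO_Games_Drive
# cap ARC at 4GiB (bytes), persistently, in /etc/modprobe.d/zfs.conf:
# options zfs zfs_arc_max=4294967296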
Start VM script
I have a long VM start script that I run every time I use my virtual machine. There are many ways to have this done, but I just sh it every time I want to use my VM.
vfio-start script:
#!/bin/sh
#==========Sudo Check=========
# harmless sudo command up front so the password prompt happens now, not mid-script
sudo cat /etc/resolv.conf
#========================================================================
#=============== Pre Commands ===========================================
#========================================================================
echo 'set frequency governor'
sudo cpupower frequency-set -g performance
echo Setting up monitors
xrandr --output HDMI-A-0 --primary
sleep 2
xrandr --output DisplayPort-2-3 --off
sleep 2
echo waiting for x to catch up
sleep 15
echo Done!
sleep 1
#=============== SR-IOV Functions =================
# set virtual nic to VLAN 69 (DMZ)
sudo ip link set enp7s0f1 vf 1 vlan 69
#================ PCIe Crap =======================================
# run UnBind script for RX 6900xt
sudo sh /root/remove_6900xt.sh
#================ interrupts =========================================
#grep vfio /proc/interrupts | cut -b 3-4 | while read -r i ; do
#  echo "set mask fcfc to irq $i"
#  echo fcfc >/proc/irq/$i/smp_affinity
#done
#============ Barrier =========================================
# Start Barrier
echo Starting Barrier
barrier --config /home/grassyloki/barrierconfig.conf </dev/null >/dev/null 2>&1 &
#============= VFIO-Isolate ==============
sudo vfio-isolate cpuset-create --cpus N0 --mems N0 -mm /host.slice move-tasks / /host.slice
sudo vfio-isolate -u /tmp/undo_irq irq-affinity mask C8-15,24-31
#=============================================================
#====================== Start VM =============================
#=============================================================
#echo Allocating Huge Pages!
#sudo sh /lib/systemd/hugetlb-reserve-pages.sh
echo Starting Gaming VM
sudo virsh start VFIO-NoHide
echo
echo Verify Affinity of CPUs
sudo virsh vcpuinfo VFIO-NoHide | grep Affinity
echo Press any key to end VM
read dummy
clear
#========================================================================
#=========================Stop procedure=================================
#========================================================================
echo Shutting down Gaming VM
sudo virsh shutdown VFIO-NoHide
#### Undo VFIO-Isolate
sudo vfio-isolate cpuset-delete /host.slice
sudo vfio-isolate restore /tmp/undo_irq
echo 'Setting CPU governor'
sudo cpupower frequency-set -g ondemand
echo Killing barrier
#kill $(ps -e | grep barrier | awk '{print $1}')
ps -ef | grep barrier | grep -v grep | awk '{print $2}' | xargs kill
echo "Re-init'ing 6900xt"
sudo sh /root/reinit_6900xt.sh
echo 'initdisplays running....'
sh /home/grassyloki/initdisplays.sh
echo 'script done!'
sleep .5
Pre-Commands
These are the commands that I run before the VM is started. The first one sets the CPU frequency governor to performance; this is important to get max FPS since the guest can’t really control the CPU frequency. The next ones disconnect my main monitor and set another monitor as the primary in X. Since my GPU is getting passed through, I need to do this so X does not crash when I yoink the GPU from the host. Next, I set one of the virtual function NICs to use a specific VLAN instead of trunking all of them to the host. The next one is the fun one: removing the 6900 XT and preparing it for use in the VM. I’ll talk about that more in another section. Below that is an inline script to map interrupts to different cores and NUMA nodes; this is now handled by vfio-isolate, so it’s commented out. Next, Barrier starts. Barrier is the software I use to send my mouse and keyboard to the VM; it is basically a FOSS version of Synergy with some KVM enhancements. Finally, vfio-isolate. I’ll cover this in its own section below.
Removing the GPU for use in the VM
For dynamic loading and unloading of a GPU in use by the system, some configurations need to be changed. Make sure the GPU you are passing through to the VM is NOT the primary. This can be checked with xrandr --listproviders; provider 0 is the primary GPU. You can change this by physically changing slots on your motherboard, through settings in the UEFI, or, worst case, via the kernel command line in your bootloader.
Providers: number : 3
Provider 0: id: 0x56 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 5 outputs: 3 associated providers: 2 name:AMD Radeon RX 550 / 550 Series @ pci:0000:09:00.0
Provider 1: id: 0xce cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 6 outputs: 4 associated providers: 1 name:AMD Radeon RX 6900 XT @ pci:0000:45:00.0
Provider 2: id: 0x8e cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 6 outputs: 5 associated providers: 1 name:AMD Radeon RX 480 Graphics @ pci:0000:0a:00.0
From my output, you can see that my primary is an RX 550. This means I can dynamically rip the 6900 XT from X without any issues.
What follows is a very sketchy method for removing the GPU from the host, switching its kernel driver to vfio_pci, then praying X does not crash. I’ll go through it line by line.
#!/bin/sh echo "unbind 6900xt gpu from amdgpu (1002:73bf)" echo 0000:45:00.0 > /sys/bus/pci/drivers/amdgpu/unbind
This tells the kernel to unbind the GPU’s PCIe address from amdgpu. I’m still unsure if it is a good idea to do this for all of the PCIe devices in the IOMMU group, but it seems to work this way.
sleep 2
echo 1002 73bf > /sys/bus/pci/drivers/vfio-pci/new_id || echo -n "0000:45:00.0" > /sys/bus/pci/drivers/vfio-pci/bind
echo done
This section binds the card to the vfio_pci kernel driver by giving vfio-pci a new PCI device ID it can claim (or, if that fails because the ID is already registered, binding the address directly). Now do this for all of the rest of the GPU’s functions.
echo "unbind gpu sound card (1002:ab28)" echo 0000:45:00.1 > /sys/bus/pci/drivers/snd_hda_intel/unbind sleep 2 echo 1002 ab28 > /sys/bus/pci/drivers/vfio-pci/new_id || echo -n "0000:45:00.1" > /sys/bus/pci/drivers/vfio-pci/bind echo done sleep 1 echo "unbind gpu usb card (1002:73a6)" echo 0000:45:00.2 > /sys/bus/pci/drivers/xhci_hcd/unbind sleep 2 echo 1002 73a6 > /sys/bus/pci/drivers/vfio-pci/new_id || echo -n "0000:45:00.2" > /sys/bus/pci/drivers/vfio-pci/bind echo done sleep 1 echo "unbind gpu serial card (1002:73a4)" echo 0000:45:00.3 > /sys/bus/pci/drivers/i2c-designware-pci/unbind sleep 2 echo 1002 73a4 > /sys/bus/pci/drivers/vfio-pci/new_id || echo -n "0000:45:00.3" > /sys/bus/pci/drivers/vfio-pci/bind sleep 1 echo "script done"
After this script finishes, the GPU should be ready for passthrough. Either that or X crashed; sometimes it likes to do that. Xorg does not support hot-removal of GPUs, so it kind of panics. The key thing is to make sure the GPU getting removed is not the primary render GPU for X, which you can check with xrandr --listproviders as above. If it is the primary, I think it will just crash when it’s yanked regardless. Wayland supports both hot-add and hot-remove, so if you can use Wayland, use it for a better experience. If you’ve got a way to make X happy, please post below.
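A quick way to confirm where each function ended up is lspci’s -k flag, which prints the driver in use:

lspci -k -s 45:00
# every function should now report "Kernel driver in use: vfio-pci"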
vfio-isolate
vfio-isolate is a crazy good project for moving interrupts and host CPU load off the cores the VM uses. This is important because host interrupts and CPU usage will cause high latency, stutters, or even crashes in the VM. My setup has 2 NUMA nodes, with one basically dedicated to the VM. Use tools like lstopo to make sure that A) your GPU is on the NUMA node of the CPUs the VM is using, and B) the CPUs for the host are all on the same node; do not mix physical cores and SMT/hyperthreaded cores of other nodes. In my setup, node 0 is CPUs 0-7, 16-23 and node 1 is CPUs 8-15, 24-31. The first vfio-isolate command creates a CPU “slice” on node 0 and moves all host tasks to it. The second command sets the IRQ affinity mask so host interrupts stay off the VM’s CPUs. This really helps with micro stutters and weird latency issues / game crashes.
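If you want to double-check the layout on your own system, something like this works (0000:45:00.0 is my GPU’s address; substitute yours):

numactl --hardware                                # which CPUs belong to which node
cat /sys/bus/pci/devices/0000:45:00.0/numa_node   # which node the GPU hangs off of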
Start VM
This part just starts the VM. I originally had static huge pages, but I’ve since moved to dynamic pages, so that part is no longer needed and is commented out. Next the VM starts, and any errors are shown. Then I dump the affinity mappings of the VM’s vCPUs. It’s important that each vCPU is pinned correctly so that there are proper L1, L2, and L3 cache hits; this improves performance and decreases latency and stutters.
Stop procedure
To start off, we issue the shutdown command to the VM. Next we remove the restrictions on the other CPU cores and memory so that all programs can use all cores and all memory. The next line removes the interrupt mappings. After that, I kill the Barrier program. Next is the fun one: re-adding the GPU to the host, which has its own section below. Lastly, we give X a few seconds to find the GPU, then turn the displays back on.
Re-init RX 6900 XT
This script is still kind of a work in progress and does not fully work. It is basically the remove-GPU script, but reversed.
echo "unbind gpu serial card from vfio-pci to i2c (1002:73a4)" echo 0000:45:00.3 > /sys/bus/pci/drivers/vfio-pci/unbind
This unbinds the card from the vfio_pci driver. This will work ONLY AFTER the VM has fully turned off and a grace period of 5 seconds has passed.
sleep 2
echo 0000:45:00.3 > /sys/bus/pci/drivers/i2c-designware-pci/bind
echo done
This binds the GPU’s serial port back to its original driver. We are not using new_id because the ID is already registered with the driver; we just send the bind command. Now, repeat this for all of the other non-GPU functions of the card.
echo "unbind gpu usb card from vfio-pci to xhci_hcd (1002:73a6)" echo 0000:45:00.2 > /sys/bus/pci/drivers/vfio-pci/unbind sleep 2 echo 0000:45:00.2 > /sys/bus/pci/drivers/xhci_hcd/bind echo done echo "unbind gpu sound card from vfio-pci to snd_hda_intel (1002:ab28)" echo 0000:45:00.1 > /sys/bus/pci/drivers/vfio-pci/unbind sleep 2 echo 0000:45:00.1 > /sys/bus/pci/drivers/snd_hda_intel/bind echo done
Now that all of the other parts of the GPU have been re-added to the host with the proper drivers, you can attempt to add the GPU itself back to the system:
echo "unbind gpu from vfio-pci to amdgpu (1002:73bf)" echo 0000:45:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind sleep 2 echo 0000:45:00.0 > /sys/bus/pci/drivers/amdgpu/bind echo script done
Unfortunately, mine does not fully re-add to the computer after this… I think it might have something to do with the amdgpu GPU reset bug, but I’m not sure. This should “just werk”, but it doesn’t. If you manage to get it working with an AMD card, please post how below so I can update this part of the script.
Virtual Machine Configuration
My virtual machine is a bit crazy. It comes from 4 years of learning, tuning, and figuring out what seems to work best. Some of the things I’ve done seem a bit overkill (because they are), but doing VFIO was the whole purpose of this rig, so some design decisions were made with that in mind. That does NOT mean these optimizations can’t be used on, say, a laptop or a normal desktop.
Libvirt Configuration
Here is my libvirt XML:
<domain type="kvm"> <name>VFIO-NoHide</name> <uuid>9694362e-5fd3-4add-876e-e28a2e509bb6</uuid> <metadata> <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0"> <libosinfo:os id="http://microsoft.com/win/10"/> </libosinfo:libosinfo> </metadata> <memory unit="KiB">16777216</memory> <currentMemory unit="KiB">16777216</currentMemory> <vcpu placement="static">16</vcpu> <iothreads>2</iothreads> <iothreadids> <iothread id="1"/> <iothread id="2"/> </iothreadids> <cputune> <vcpupin vcpu="0" cpuset="8"/> <vcpupin vcpu="1" cpuset="9"/> <vcpupin vcpu="2" cpuset="10"/> <vcpupin vcpu="3" cpuset="11"/> <vcpupin vcpu="4" cpuset="12"/> <vcpupin vcpu="5" cpuset="13"/> <vcpupin vcpu="6" cpuset="14"/> <vcpupin vcpu="7" cpuset="15"/> <vcpupin vcpu="8" cpuset="24"/> <vcpupin vcpu="9" cpuset="25"/> <vcpupin vcpu="10" cpuset="26"/> <vcpupin vcpu="11" cpuset="27"/> <vcpupin vcpu="12" cpuset="28"/> <vcpupin vcpu="13" cpuset="29"/> <vcpupin vcpu="14" cpuset="30"/> <vcpupin vcpu="15" cpuset="31"/> <emulatorpin cpuset="2"/> <iothreadpin iothread="1" cpuset="4"/> <iothreadpin iothread="2" cpuset="5"/> </cputune> <os> <type arch="x86_64" machine="pc-q35-7.0">hvm</type> <loader readonly="yes" type="pflash">/usr/share/edk2-ovmf/x64/OVMF_CODE.fd</loader> <nvram>/var/lib/libvirt/qemu/nvram/VFIO_VARS.fd</nvram> <smbios mode="host"/> </os> <features> <acpi/> <apic/> <hyperv mode="passthrough"> <relaxed state="on"/> <vapic state="on"/> <spinlocks state="on" retries="8191"/> <vpindex state="on"/> <runtime state="on"/> <synic state="on"/> <stimer state="on"/> <reset state="off"/> <vendor_id state="on" value="7ba845ec2647"/> <frequencies state="on"/> <reenlightenment state="off"/> <tlbflush state="on"/> <ipi state="on"/> <evmcs state="off"/> </hyperv> <kvm> <hidden state="off"/> </kvm> <vmport state="off"/> <ioapic driver="kvm"/> </features> <cpu mode="host-passthrough" check="none" migratable="on"> <topology sockets="1" dies="1" cores="8" threads="2"/> <feature policy="require" name="topoext"/> </cpu> <clock offset="localtime"> <timer name="rtc" tickpolicy="catchup"/> <timer name="pit" tickpolicy="delay"/> <timer name="hypervclock" present="yes"/> <timer name="hpet" present="yes"/> <timer name="tsc" present="yes" mode="native"/> </clock> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>destroy</on_crash> <pm> <suspend-to-mem enabled="no"/> <suspend-to-disk enabled="no"/> </pm> <devices> <emulator>/usr/bin/qemu-system-x86_64</emulator> <disk type="file" device="disk"> <driver name="qemu" type="qcow2" io="threads" iothread="1"/> <source file="/var/lib/libvirt/images/vfio.qcow2"/> <target dev="vda" bus="virtio"/> <serial>HUS6588D984332</serial> <boot order="1"/> <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/> </disk> <disk type="block" device="disk"> <driver name="qemu" type="raw" cache="none" io="threads" discard="unmap" iothread="2"/> <source dev="/dev/RustTank/VirtualMachines/VFIO_Games_Drive"/> <target dev="vdc" bus="virtio"/> <address type="pci" domain="0x0000" bus="0x0d" slot="0x00" function="0x0"/> </disk> <disk type="file" device="disk"> <driver name="qemu" type="qcow2"/> <source file="/var/lib/libvirt/images/win10.qcow2"/> <target dev="vdd" bus="virtio"/> <readonly/> <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/> </disk> <disk type="file" device="cdrom"> <driver name="qemu" type="raw"/> <source file="/var/lib/libvirt/images/virtio-win.iso"/> <target dev="sda" bus="sata"/> <readonly/> <address type="drive" 
controller="0" bus="0" target="0" unit="0"/> </disk> <controller type="usb" index="0" model="qemu-xhci" ports="15"> <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/> </controller> <controller type="sata" index="0"> <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/> </controller> <controller type="pci" index="0" model="pcie-root"/> <controller type="pci" index="1" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="1" port="0x10"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/> </controller> <controller type="pci" index="2" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="2" port="0x11"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/> </controller> <controller type="pci" index="3" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="3" port="0x12"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/> </controller> <controller type="pci" index="4" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="4" port="0x13"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/> </controller> <controller type="pci" index="5" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="5" port="0x14"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/> </controller> <controller type="pci" index="6" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="6" port="0x15"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x5"/> </controller> <controller type="pci" index="7" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="7" port="0x16"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x6"/> </controller> <controller type="pci" index="8" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="8" port="0x17"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x7"/> </controller> <controller type="pci" index="9" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="9" port="0x18"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0" multifunction="on"/> </controller> <controller type="pci" index="10" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="10" port="0x19"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x1"/> </controller> <controller type="pci" index="11" model="pcie-to-pci-bridge"> <model name="pcie-pci-bridge"/> <address type="pci" domain="0x0000" bus="0x0a" slot="0x00" function="0x0"/> </controller> <controller type="pci" index="12" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="12" port="0x1a"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x2"/> </controller> <controller type="pci" index="13" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="13" port="0x1b"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x3"/> </controller> <controller type="pci" index="14" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="14" port="0x1c"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x4"/> </controller> <controller type="pci" index="15" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="15" port="0x1d"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x03" 
function="0x5"/> </controller> <controller type="pci" index="16" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="16" port="0x1e"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x6"/> </controller> <controller type="pci" index="17" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="17" port="0x1f"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x7"/> </controller> <controller type="pci" index="18" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="18" port="0x20"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x04" function="0x0" multifunction="on"/> </controller> <controller type="pci" index="19" model="pcie-root-port"> <model name="pcie-root-port"/> <target chassis="19" port="0x21"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x04" function="0x1"/> </controller> <controller type="virtio-serial" index="0"> <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/> </controller> <interface type="network"> <mac address="52:54:00:dd:82:2e"/> <source network="vnet_internal0"/> <model type="virtio"/> <address type="pci" domain="0x0000" bus="0x0c" slot="0x00" function="0x0"/> </interface> <serial type="pty"> <target type="isa-serial" port="0"> <model name="isa-serial"/> </target> </serial> <console type="pty"> <target type="serial" port="0"/> </console> <channel type="spicevmc"> <target type="virtio" name="com.redhat.spice.0"/> <address type="virtio-serial" controller="0" bus="0" port="1"/> </channel> <input type="tablet" bus="usb"> <address type="usb" bus="0" port="1"/> </input> <input type="mouse" bus="ps2"/> <input type="keyboard" bus="ps2"/> <tpm model="tpm-crb"> <backend type="emulator" version="2.0"/> </tpm> <graphics type="spice" autoport="yes"> <listen type="address"/> <image compression="off"/> <gl enable="no"/> </graphics> <sound model="ich9"> <address type="pci" domain="0x0000" bus="0x00" slot="0x1b" function="0x0"/> </sound> <audio id="1" type="spice"/> <video> <model type="qxl" ram="65536" vram="65536" vgamem="16384" heads="1" primary="yes"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/> </video> <hostdev mode="subsystem" type="pci" managed="yes"> <source> <address domain="0x0000" bus="0x0b" slot="0x00" function="0x3"/> </source> <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/> </hostdev> <hostdev mode="subsystem" type="pci" managed="yes"> <source> <address domain="0x0000" bus="0x07" slot="0x10" function="0x1"/> </source> <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/> </hostdev> <hostdev mode="subsystem" type="pci" managed="yes"> <source> <address domain="0x0000" bus="0x42" slot="0x00" function="0x0"/> </source> <address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/> </hostdev> <hostdev mode="subsystem" type="pci" managed="yes"> <source> <address domain="0x0000" bus="0x45" slot="0x00" function="0x3"/> </source> <address type="pci" domain="0x0000" bus="0x09" slot="0x00" function="0x0"/> </hostdev> <hostdev mode="subsystem" type="pci" managed="yes"> <source> <address domain="0x0000" bus="0x45" slot="0x00" function="0x2"/> </source> <address type="pci" domain="0x0000" bus="0x0e" slot="0x00" function="0x0"/> </hostdev> <hostdev mode="subsystem" type="pci" managed="yes"> <source> <address domain="0x0000" bus="0x45" slot="0x00" function="0x1"/> </source> <address type="pci" domain="0x0000" bus="0x10" slot="0x00" function="0x0"/> 
</hostdev> <hostdev mode="subsystem" type="pci" managed="yes"> <source> <address domain="0x0000" bus="0x45" slot="0x00" function="0x0"/> </source> <address type="pci" domain="0x0000" bus="0x11" slot="0x00" function="0x0"/> </hostdev> <hostdev mode="subsystem" type="pci" managed="yes"> <source> <address domain="0x0000" bus="0x07" slot="0x10" function="0x3"/> </source> <address type="pci" domain="0x0000" bus="0x12" slot="0x00" function="0x0"/> </hostdev> <memballoon model="virtio"> <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/> </memballoon> </devices> </domain>
I’m only going to go over the things I think are relevant… so here we go:
CPU Pinning
Here I make sure to pin a physical core and its hyperthreaded sibling together so that the L1 cache Windows thinks is there is accurate. From my testing, Windows expects a 1-to-1 match with what Linux is reporting. Make sure to double-check with lstopo and other tools that this is correct!
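lscpu can show you the sibling pairs too; on my 2950X, host CPUs 8 and 24 share a physical core, which is why vCPU 0 and vCPU 8 are pinned to them in the XML above:

lscpu -e=CPU,CORE,NODE   # CPUs that share a CORE value are SMT siblings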
CPU mode
I’m using host-passthrough, so Windows thinks it is the same CPU as my host. I do this so that all the CPU extensions are properly utilized; also, some games complain about the generic KVM CPU or the generic EPYC one. I have Spectre and Meltdown mitigations enabled with migratable="on". This does make performance slightly worse, but for my use case I can’t take any chances. The topology matches the topology of NUMA node 1 on my system that is pinned, so 1 NUMA node, 8 cores with 2 threads each. The only feature policy I have force-enabled is topoext; I could add more, but this works. If I should add more, please post what and why.
Clock offset
This depends on how hidden you want your system to be from anti-cheat programs. One of the ways they detect whether a machine is a VM is the RTC differing from the machine boot time. For mine, I don’t care to be hidden (at least in this VFIO VM), so I have options enabled that make the RTC more forgiving of lag, stutters, and whatnot.
IO Threads and IO Thread Pin
This is critical if you have fast storage. In my case, I have the boot drive on my Linux root SSD, and a zvol mapped to another drive. Since these speeds can get crazy fast, they can cause an interrupt or a compute task to land on a random core. To fix this, I made 2 IO threads, each statically assigned to a single device and given a core, to minimize cross-core tasks.
Emulator Pin
This sets the QEMU emulation tasks to be done on core #2. This keeps them from getting in the way of other processes and minimizes latency so the emulator can utilize its L1 cache properly.
OS Section
In the first line, I have my machine type set to Q35 version 7.0, which at the time of writing is the latest machine type on the latest QEMU version; this should pull in the latest defaults for everything, I think. The next two lines are just OVMF UEFI stuff. Last is smbios mode="host". This passes the host’s DMI/SMBIOS data through to the VM. It’s a no-cost bit of obfuscation that bypasses some basic anti-cheat VM detection. You can manually set these strings if you want, but I just pass through my host’s values.
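If you’re curious what the guest will inherit, you can peek at the host’s strings with dmidecode:

sudo dmidecode -s system-manufacturer
sudo dmidecode -s system-product-name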
Hyper-V stuff
These Hyper-V tunables are critical for decreasing lag in the VM by turning on Hyper-V guest features that make the VM easier to virtualize. The first line, hyperv mode="passthrough", makes it try to enable all of the features. I’m not going to go through every one, but you can read more about them in the libvirt domain XML documentation. Generally the more enabled the better, but not always; if you want to hide the fact that you are running a VM, you’ll want to disable some of these and other things in the features section. This part gets updated regularly, so check back on the official libvirt documentation for things to enable/disable.
Virtio-Block tuning
For each block device, I have manual IO threading on with io="threads" iothread="1". I also have a fake serial number, <serial>HUS6588D984332</serial>, to somewhat hide that this is a VM, where the serial is some random real-ish-sounding thing. For my ZFS array, I disabled caching for the virtual disk and set the discard mode to unmap. This helps with weird speed-spiking issues. It is slower, but much more stable.
Audio
Audio on my setup is handled by a passed-through USB controller with a Sound Blaster X3 on it. I dump this into a mixer to combine the audio from the host and guest. Audio was a pain for me with this project, but I was lucky to find that my USB controllers are in their own IOMMU groups, so I passed one through.
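To check whether one of your USB controllers is similarly isolated, the standard IOMMU group listing loop (the same one the Arch Wiki uses) will show you:

for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU Group ${g##*/}:"
  for d in "$g"/devices/*; do
    echo -e "\t$(lspci -nns "${d##*/}")"
  done
done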
Windows Settings
Page file
For some reason, some games freak out when the page file is small. I had a long-running issue with Call of Duty: Warzone where the game would crash with a memory error; it turned out to be the page file not matching system memory. I set mine to a static 16384MB on my passed-through NVMe SSD.
PCIe Device interrupts
There are many ways for a PCIe device to request resources from storage or memory. The old way was line-based interrupts, where all devices share the same IRQ. The newer method is MSI (Message Signaled Interrupts). More info can be found in this thread on guru3d. The short version is that old line-based interrupts can cause unnecessary latency, so devices should be set to MSI, or even better, MSI-X. There is a tool in that thread called MSI utility v3 that makes enabling these options easy in a nice GUI. I have my GPU forced to MSI-X and its priority set to high. Make sure every device that supports it has it enabled, as it will greatly help with small stutters.
Resizable BAR and Above 4G decoding
Resizable BAR is a setting that lets the CPU access the GPU’s memory in large blocks instead of small 256MB windows, which helps dramatically with latency. You need to make sure your OVMF firmware has the feature enabled and your host UEFI has both settings enabled for this to work. Check with GPU-Z in the guest to see whether these features are active.
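You can also eyeball this from the host before passthrough; with Resizable BAR active, the GPU’s memory region should be close to full VRAM size instead of 256MB (45:00.0 is my card’s address):

sudo lspci -vv -s 45:00.0 | grep -i "region\|resizable"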
Diagnosing issues
You will run across weird issues. These are the tools I’ve used to help me detect issues.
Latencymon
LatencyMon is a tool for seeing what is causing stuttering in Windows. It is great for finding which driver is causing lag. Load the system with something like a Steam download, an iperf test, or a benchmark and see which driver is causing issues.
dmesg
On Linux, run dmesg as root and see what the latest errors are, if there are any.
journalctl
Look at the logs for libvirt or other system functions.
numastat
Shows NUMA hits and misses for allocations, interrupts, and other things. Useful on a NUMA system.
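For example, to check whether the running VM’s memory allocations are actually staying on one node:

numastat -p "$(pidof qemu-system-x86_64)"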
glances, bashtop, htop
All of these are terminal tools that show what is using CPU, memory, disk, etc. Very useful to see if something else is taking resources.
Conclusion
I hope you were able to improve your VM gaming experience with these tweaks. If you have any suggested tweaks of your own, please post them! Congrats if you got to the end of this. If you have questions, I might be able to help, but no guarantees…
Thanks for the write-up. I gotta try some of these tunings to see if I can get better performance. I’m using a Threadripper 1950X and it seems like all my cores are in one NUMA node. Probably can’t make any improvements on that front.
The hyper-v “passthrough” mode makes the following lines in this section less relevant. Try leaving them out and post your results.