Add PCI Troubleshooting

This commit is contained in:
PROTechThor 2020-10-18 10:08:03 +01:00
parent f1e6afeb14
commit 57d66a7146
3 changed files with 143 additions and 27 deletions

1
doc.md
View File

@ -122,6 +122,7 @@ Core documentation for Qubes users.
* [Installation Troubleshooting](/doc/installation-troubleshooting)
* [UEFI Troubleshooting](/doc/uefi-troubleshooting/)
* [PCI Troubleshooting](/doc/pci-troubleshooting/)
* [Home directory is out of disk space error](/doc/out-of-memory/)
* [Installing on system with new AMD GPU (missing firmware problem)](https://groups.google.com/group/qubes-devel/browse_thread/thread/e27a57b0eda62f76)
* [How to install an Nvidia driver in dom0](/doc/install-nvidia-driver/)

View File

@ -81,30 +81,7 @@ For example, if `00_1a.0` is the BDF of the device you want to attach to the "wo
## Possible Issues ##
### DMA Buffer Size ###
VMs with attached PCI devices in Qubes have allocated a small buffer for DMA operations (called swiotlb).
By default it is 2MB, but some devices need a larger buffer.
To change this allocation, edit VM's kernel parameters (this is expressed in 512B chunks):
# qvm-prefs netvm |grep kernelopts
kernelopts : iommu=soft swiotlb=2048 (default)
# qvm-prefs -s netvm kernelopts "iommu=soft swiotlb=8192"
This is [known to be needed][ml1] for the Realtek RTL8111DL Gigabit Ethernet Controller.
### PCI Passthrough Issues ###
Sometimes the PCI arbitrator is too strict.
There is a way to enable permissive mode for it.
See also: [this thread][ml2] and the Xen wiki's [PCI passthrough] page.
At other times, you may instead need to disable the FLR requirement on a device.
Both can be achieved during attachment with `qvm-pci` as described below.
Visit the [PCI Troubleshooting guide](pci-troubleshoot) to see issues that may arise due to PCI devices and how to troubleshoot them.
## Additional Attach Options ##
@ -166,9 +143,7 @@ or
[USB]:/doc/usb-devices/
[appmenu]: /attachment/wiki/Devices/qubes-appmenu-select.png
[domain manager icon]: /attachment/wiki/Devices/qubes-logo-icon.png
[pci-troubleshoot]:/doc/pci-troubleshooting
[qvm-device]: /doc/device-handling/#general-qubes-device-widget-behavior-and-handling
[side channel attacks]: https://en.wikipedia.org/wiki/Side-channel_attack
[ml1]: https://groups.google.com/group/qubes-devel/browse_thread/thread/631c4a3a9d1186e3
[ml2]: https://groups.google.com/forum/#!topic/qubes-users/Fs94QAc3vQI
[PCI passthrough]: https://wiki.xen.org/wiki/Xen_PCI_Passthrough

View File

@ -0,0 +1,140 @@
---
layout: doc
title: PCI Troubleshooting
permalink: /doc/pci-troubleshooting/
---
# PCI troubleshooting #
## DMA errors ##
VMs with attached PCI devices in Qubes have allocated a small buffer for DMA operations (called swiotlb).
By default, it is 2MB, but some devices (such as the [Realtek RTL8111DL Gigabit Ethernet Controller](https://groups.google.com/group/qubes-devel/browse_thread/thread/631c4a3a9d1186e3)) need a larger DMA buffer size.
Without a larger buffer, you will face DMA errors such as `Failed to map TX DMA`.
To change this allocation, edit VM's kernel parameters (this is expressed in 512B chunks) by running the following in a dom0 terminal:
# qvm-prefs netvm |grep kernelopts
kernelopts : iommu=soft swiotlb=2048 (default)
# qvm-prefs -s netvm kernelopts "iommu=soft swiotlb=8192"
## PCI Passthrough Issues ##
Sometimes the PCI arbitrator is too strict, which may cause errors such as `Unable to reset PCI device` and other PCI-related errors.
There is a way to enable permissive mode for it.
See also: [this thread](https://groups.google.com/forum/#!topic/qubes-users/Fs94QAc3vQI) and the Xen wiki's [PCI passthrough](https://wiki.xen.org/wiki/Xen_PCI_Passthrough) page.
Other times, you may instead need to disable the FLR requirement on a device.
Both can be achieved during attachment with `qvm-pci` as described [PCI Devices documentaton](/doc/pci-devices/#additional-attach-options).
## "Unable to reset PCI device" errors ##
### libvirt.libvirtError: internal error: Unable to reset PCI device [...]: internal error: Active [...] devices on bus with [...], not doing bus reset ###
After running `qvm-start sys-net`, you may encounter an error message which begins with `libvirt.libvirtError: internal error: Unable to reset PCI device`.
This issue is likely to occur if you have the same device assigned to more than one
VM.
When you try to start sys-net with the `qvm-start sys-net` command, there is already a VM running (e.g., autostarting) with one or more of the same devices as those assigned to sys-net.
To fix the error, remove the offending PCI device.
#### Using the Qubes interface ####
From the "Selected" panel in sys-net, navigate to VM Settings, then Devices. There, you can remove the offending PCI device(s) and keep the desired PCI device.
#### Using the command line ####
1. To see all the PCI available devices, enter the `lspci` command into the dom0 terminal. Each device will be listed on a line, for example:
~~~
0000:03:00.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 0b)
~~~
In the above output, the BDF (Bus Device Function) of the device is `0000:03:00.0`
2. Now that you can see all the PCI devices and their BDFs, you can decide which to remove and which to keep.
Imagine we faced the following error message:
~~~
libvirt.libvirtError: internal error: Unable to reset PCI device 0000:03:00.1: internal error: Active 0000:03:00.0 devices on bus with 0000:03:00.1, not doing bus reset
~~~
In the above case, the device `0000:03:00.1` is the device which we want to use. But we are facing the `Unable to reset PCI device` error because another device, `0000:03:00.0`, is active.
To fix this error and get device `0000:03:00.1` to work, we must first remove the offending device `0000:03:00.0`
~~~
sudo su
echo -n "1" > /sys/bus/pci/devices/0000:03:00.0/remove
~~~
3. In order to make this change persistent, create a file `/etc/systemd/system/qubes-pre-netvm.service` and add the following:
~~~
[Unit]
Description=Netvm fixup
Before=qubes-netvm.service
[Service]
ExecStart=/bin/sh -c 'echo -n "1" > /sys/bus/pci/devices/0000:03:00.0/remove'
Type=oneshot
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
~~~
Finally, run `systemctl enable qubes-pre-netvm.service` and it will now persist between reboots.
### Domain [...] has failed to start: internal error: Unable to reset PCI device [...]: no FLR, PM reset or bus reset available ###
This is a [PCI passthrough issue](/doc/pci-troubleshooting/#pci-passthrough-issues), which occurs when PCI arbitrator is too strict.
There is a way to enable permissive mode for it.
Sometimes, you may instead need to disable the FLR requirement on a device.
Both can be achieved during attachment with `qvm-pci` as described below.
NOTE: The `permissive` flag increases attack surface and possibility of [side channel attacks](https://en.wikipedia.org/wiki/Side-channel_attack).
While using the `no-strict-reset` flag, do not require PCI device to be reset before attaching it to another VM. This may leak usage data even without malicious intent. Both `permissive` and `no-strict-reset` options may not be necessary and you should try one first, then the other, before using both.
~~~
qvm-pci attach --persistent --option permissive=true --option no-strict-reset=true sys-usb dom0:<BDF_OF_DEVICE>
~~~
Be sure to replace `<BDF_OF_DEVICE>` with the BDF of your PCI device, which can be be obtained from running `qvm-pci`.
You can also configure strict reset directly from the Qubes interface by following these steps:
1. Go to the sys-net VM settings
2. Go to Devices
3. Make sure the device is in the right field
4. Click "Configure strict reset for PCI devices"
5. Select the device, click OK and apply
## Broadcom BCM43602 Wi-Fi card causes system freeze ##
You may face the problem where the BCM43602 Wi-Fi chip causes a system freeze whenever it is attached to a VM. To fix this problem on a Macbook, follow the steps in [Macbook Troubleshooting](/doc/macbook-troubleshooting/#7-fix-system-freezes-due-to-broadcom-bcm43602).
For other non-Macbook machines, it is advisable to replace the Broadcom BCM43602 with one known to work on Qubes, such as the Atheros AR9462.
Note that your computer manufacturer may have added a Wi-Fi card whitelist in your BIOS, which will prevent booting your computer if you have a non-listed wireless card.
It is possible bypass this limitation by removing the whitelist, disabling a check for it or modifying the whitelist to replace device ID of a whitelisted WiFi card with device ID of your new WiFi card.
## Wireless card stops working after dom0 update ##
There have been many instances where a Wi-Fi card stops working after a dom0 update.
If you run `sudo dmesg` in sys-net, you may see errors beginning with `iwlwifi`.
You can fix the problem by going to the sys-net VM's settings and changing the VM kernel to the previous version.
## Attached devices in Windows HVM stop working on suspend/resume ##
After the whole system gets suspended into S3 sleep and subsequently resumed, some attached devices may stop working.
Refer to [Suspend/Resume Troubleshooting](/doc/suspend-resume-troubleshooting/#attached-devices-in-Windows-HVM-stop-working-on-suspendresume) for a solution.
## PCI device not available in dom0 after unassigning from a qube ##
After assigning a PCI device to a qube, then unassigning it/shutting down the qube, the device is not available in dom0.
This is an intended feature.
A device which was previously assigned to a less trusted qube could attack dom0 if it were automatically reassigned there.
Look at the [FAQs](/faq/#i-assigned-a-pci-device-to-a-qube-then-unassigned-itshut-down-the-qube-why-isnt-the-device-available-in-dom0) to learn how to re-enable the device in dom0.