# Incident of March 18, 2026
## Environment
- Beelink SEi12 12450H
- CPU: i5-12450H
- OS: Proxmox Virtual Environment 8.4.17
- Windows VM: Windows 11 Pro 24H2
## Main Problem
- The Intel iGPU SR-IOV patch failed to build:
    - Release matching kernel v6.8: [strongtz/i915-sriov-dkms at 2025.07.22](https://github.com/strongtz/i915-sriov-dkms/tree/2025.07.22)
    - Modified files: `/etc/default/grub` `/etc/sysfs.conf`
## Incident Timeline
``` bash
apt install net-tools lshw ethtool dkms pve-headers
```
``` hl_lines="4"
Building module:
Cleaning build area...
export LEX=flex; export YACC=bison; cp defconfigs/i915_only .config; 'make' -j12 KLIB=/lib/modules/6.8.12-20-pve olddefconfig; 'make' -j12 KLIB=/lib/modules/6.8.12-20-pve BUILD_CONFIG=nodrm.....(bad exit status: 2)
Error! Bad return status for module build on kernel: 6.8.12-20-pve (x86_64)
Consult /var/lib/dkms/intel-i915-dkms/1.24.1.19.240119.1.nodrm/build/make.log for more information.
Error! One or more modules failed to install during autoinstall.
Refer to previous errors for more information.
dkms: autoinstall for kernel: 6.8.12-20-pve failed!
run-parts: /etc/kernel/postinst.d/dkms exited with return code 11
Failed to process /etc/kernel/postinst.d at /var/lib/dpkg/info/proxmox-kernel-6.8.12-20-pve-signed.postinst line 20.
dpkg: error processing package proxmox-kernel-6.8.12-20-pve-signed (--configure):
installed proxmox-kernel-6.8.12-20-pve-signed package post-installation script subprocess returned error exit status 2
Setting up lshw (02.19.git.2021.06.19.996aaad9c7-2+b1) ...
Setting up usb.ids (2025.07.26-0+deb12u1) ...
Setting up proxmox-headers-6.8.12-20-pve (6.8.12-20) ...
dpkg: dependency problems prevent configuration of proxmox-kernel-6.8:
proxmox-kernel-6.8 depends on proxmox-kernel-6.8.12-20-pve-signed | proxmox-kernel-6.8.12-20-pve; however:
Package proxmox-kernel-6.8.12-20-pve-signed is not configured yet.
Package proxmox-kernel-6.8.12-20-pve is not installed.
Package proxmox-kernel-6.8.12-20-pve-signed which provides proxmox-kernel-6.8.12-20-pve is not configured yet.
dpkg: error processing package proxmox-kernel-6.8 (--configure):
dependency problems - leaving unconfigured
Setting up proxmox-headers-6.8 (6.8.12-20) ...
Setting up proxmox-default-headers (1.1.0) ...
Setting up pve-headers (8.4.0) ...
Processing triggers for man-db (2.11.2-2) ...
Errors were encountered while processing:
proxmox-kernel-6.8.12-20-pve-signed
proxmox-kernel-6.8
E: Sub-process /usr/bin/dpkg returned an error code (1)
```
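The output shows that the iGPU SR-IOV DKMS module (listed in the log as `intel-i915-dkms`) failed to build for kernel `6.8.12-20-pve`, which left the `proxmox-kernel-6.8.12-20-pve-signed` package unconfigured.
Next, I tried reinstalling the Intel iGPU SR-IOV patch following the project README.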
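When an `apt` run produces this much output, the failing module, version, and kernel can be pulled out mechanically. Below is a minimal sketch; `dkms_failed_builds` is a hypothetical helper, and it assumes the two message formats seen in the logs in this post (`Error! Bad return status ... kernel: <kver> (<arch>)` followed by `Consult /var/lib/dkms/<module>/<version>/build/make.log ...`). Other DKMS versions may word these messages differently.

``` bash
# Hypothetical helper: list failed DKMS builds found in apt/dpkg/dkms output.
# Reads the log on stdin and prints "<module> <version> <kernel>" for each
# "Consult /var/lib/dkms/<module>/<version>/build/make.log" line, using the
# kernel named in the most recent "Bad return status" line.
dkms_failed_builds() {
    local line rest module version kernel=""
    while IFS= read -r line; do
        case "$line" in
            'Error! Bad return status for module build on kernel: '*)
                kernel=${line#*kernel: }   # e.g. "6.8.12-20-pve (x86_64)"
                kernel=${kernel%% *}       # keep only "6.8.12-20-pve"
                ;;
            'Consult /var/lib/dkms/'*'/build/make.log'*)
                rest=${line#"Consult /var/lib/dkms/"}  # "<module>/<version>/build/..."
                module=${rest%%/*}
                rest=${rest#*/}
                version=${rest%%/*}
                printf '%s %s %s\n' "$module" "$version" "$kernel"
                ;;
        esac
    done
}
```

For example, after saving the apt output to a file (say `apt.log`, a hypothetical name, e.g. via `2>&1 | tee apt.log`), running `dkms_failed_builds < apt.log` on the log above should print `intel-i915-dkms 1.24.1.19.240119.1.nodrm 6.8.12-20-pve`.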
``` bash
dpkg -i i915-sriov-dkms_2025.07.22_amd64.deb
```
``` hl_lines="14"
Selecting previously unselected package i915-sriov-dkms.
(Reading database ... 146949 files and directories currently installed.)
Preparing to unpack i915-sriov-dkms_2025.07.22_amd64.deb ...
Unpacking i915-sriov-dkms (2025.07.22) ...
Setting up i915-sriov-dkms (2025.07.22) ...
install dkms modules for all kernels
Loading new i915-sriov-dkms-2025.07.22 DKMS files...
Building for 6.1.10-1-pve, 6.8.8-2-pve, 6.8.12-19-pve and 6.8.12-20-pve
Building initial module for 6.1.10-1-pve
Error! The /var/lib/dkms/i915-sriov-dkms/2025.07.22/6.1.10-1-pve/x86_64/dkms.conf for module i915-sriov-dkms includes a BUILD_EXCLUSIVE directive which does not match this kernel/arch/config.
This indicates that it should not be built.
Skipped.
Building initial module for 6.8.8-2-pve
Error! Bad return status for module build on kernel: 6.8.8-2-pve (x86_64)
Consult /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/make.log for more information.
update-initramfs: deferring update (trigger activated)
Processing triggers for initramfs-tools (0.142+deb12u3) ...
update-initramfs: Generating /boot/initrd.img-6.8.12-20-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
```
This time the build failed for kernel `6.8.8-2-pve`. Note that the kernel running at this point was `6.8.8-2-pve`, which was not a good sign.
Next, I tried building the patch only for the running kernel.
``` bash
dkms install -m i915-sriov-dkms -v 2025.07.22 -k $(uname -r)
```
``` hl_lines="7"
Sign command: /lib/modules/6.8.8-2-pve/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub
Building module:
Cleaning build area...
make -j12 KERNELRELEASE=6.8.8-2-pve -C /lib/modules/6.8.8-2-pve/build M=/var/lib/dkms/i915-sriov-dkms/2025.07.22/build...(bad exit status: 2)
Error! Bad return status for module build on kernel: 6.8.8-2-pve (x86_64)
Consult /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/make.log for more information.
```
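Rather than scrolling through the whole `make.log`, the first compiler error is usually the one that matters. A minimal sketch, assuming GNU `grep` (for `-m` combined with trailing context) and gcc's usual `file:line:col: error: ...` format; `first_build_error` is a hypothetical name:

``` bash
# Hypothetical helper: print the first compiler error in a DKMS make.log,
# plus the two lines gcc prints after it (quoted source line and caret marker).
# Usage: first_build_error /var/lib/dkms/<module>/<version>/build/make.log
first_build_error() {
    local log=$1
    # -m 1 stops after the first match; GNU grep still prints the
    # -A 2 trailing context lines for that match.
    grep -m 1 -A 2 -E ': error: ' -- "$log"
}
```

Run against the `make.log` below, it should print the `implicit declaration of function drm_gem_object_is_shared_for_memory_stats` error together with the two source lines that follow it.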
``` title="/var/lib/dkms/i915-sriov-dkms/2025.07.22/build/make.log"
DKMS make.log for i915-sriov-dkms-2025.07.22 for kernel 6.8.8-2-pve (x86_64)
Wed Mar 18 01:01:42 PM CST 2026
make: Entering directory '/usr/src/linux-headers-6.8.8-2-pve'
warning: the compiler differs from the one used to build the kernel
The kernel was built by: gcc (Debian 12.2.0-14) 12.2.0
You are using: gcc (Debian 12.2.0-14+deb12u1) 12.2.0
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/compat/backport-6.9.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/compat/backport-6.9/drm_dp_tunnel.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/compat/backport-6.10.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/compat/backport-6.11.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/compat/backport-6.12.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_config.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_driver.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_drm_client.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_getparam.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_ioctl.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_irq.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_mitigations.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_module.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_params.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_pci.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_scatterlist.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_suspend.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_switcheroo.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_sysfs.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_utils.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/intel_clock_gating.o
/var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_drm_client.c: In function obj_meminfo:
/var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_drm_client.c:56:13: error: implicit declaration of function drm_gem_object_is_shared_for_memory_stats [-Werror=implicit-function-declaration]
56 | if (drm_gem_object_is_shared_for_memory_stats(&obj->base))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/intel_device_info.o
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:243: /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_drm_client.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [/usr/src/linux-headers-6.8.8-2-pve/Makefile:1926: /var/lib/dkms/i915-sriov-dkms/2025.07.22/build] Error 2
make: *** [Makefile:240: __sub-make] Error 2
make: Leaving directory '/usr/src/linux-headers-6.8.8-2-pve'
```
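This confirmed that the patch cannot be built for kernel `6.8.8-2-pve`: `i915_drm_client.c` calls `drm_gem_object_is_shared_for_memory_stats`, which is apparently not declared in this kernel's headers, and `-Werror` turns the resulting implicit-declaration warning into an error.
Next, I tried removing the installed `i915-sriov-dkms` package and its DKMS build directories.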
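To check whether a kernel's headers know about `drm_gem_object_is_shared_for_memory_stats` at all, one can search them directly. A rough sketch; `headers_declare` is a hypothetical helper, and a plain text match only hints at a declaration (it could also hit a comment), while no match suggests the function really is absent. The headers path is the one `make` entered in the log above.

``` bash
# Hypothetical helper: succeed if the given name appears as a whole word
# anywhere under <headers_dir>/include.
# Usage: headers_declare /usr/src/linux-headers-6.8.8-2-pve drm_gem_object_is_shared_for_memory_stats
headers_declare() {
    local headers_dir=$1 symbol=$2
    # -r: recurse, -q: quiet (exit status only), -w: whole-word match
    grep -rqw -- "$symbol" "$headers_dir/include"
}
```

For example, `headers_declare /usr/src/linux-headers-6.8.8-2-pve drm_gem_object_is_shared_for_memory_stats || echo "not found"`; comparing the result with the headers of a kernel the patch does build for (here `6.8.12-20-pve`) makes the difference visible.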
``` bash
dpkg --purge i915-sriov-dkms
dkms remove i915-sriov-dkms/2025.07.22 --all
apt --fix-broken install
dpkg --configure -a
```
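I forgot to save the output of this part, but I don't recall anything unusual.
Next, invoking the almighty power of "I reckon", I tried removing some unused kernels.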
``` bash
dpkg --list | grep proxmox-kernel
```
```
rc proxmox-kernel-6.1.10
...
ii proxmox-kernel-6.8.12-20-pve-signed
ii proxmox-kernel-6.8.12-19-pve-signed
ii proxmox-kernel-6.8
ii proxmox-headers-6.8.12-20-pve
ii proxmox-headers-6.8
ii proxmox-kernel-6.8.8-2-pve
...
```
``` bash
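# Purge the kernel packages marked "rc" (removed, config files still present)
dpkg --list | awk '$1 == "rc" && $2 ~ /^proxmox-kernel/ {print $2}' | xargs -r dpkg --purge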
```
``` bash
proxmox-boot-tool refresh
```
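Sorry: I forgot to run this command, and that may have been the key oversight that left the server stuck at boot...
> Once `apt --fix-broken install` finishes without any further errors, you can reboot the server. It will automatically boot into the new kernel (6.8.12-20), and your system will then be fully recovered.
>
> Gemini
Following Professor Gemini's guidance, I hit reboot without a second thought. After the reboot, the server dropped off the network...
Here is what the capture card showed:
![A sea of green...](https://static.cattom.site/new_blog_image/2026-3-18-1.png?x-oss-process=style/webp)
> Although the image is blurry, the green background and the little text visible suggest a typical kernel panic, or the initramfs failing to mount the root partition.
>
> Gemini
All I can say is that Gemini is impressive: it managed to read something out of a screen full of green...
Unfortunately, my KVM is self-built: its CH9329-based HID backend is unstable, and it has no power control or image mounting at all...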
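In hindsight, a cheap check before rebooting is to make sure the kernel you expect to boot has both its image and its initramfs in `/boot`. The sketch below uses a hypothetical helper, `boot_files_present`, and assumes the Debian naming `vmlinuz-<version>` / `initrd.img-<version>` (the latter appears in the `update-initramfs` output above). It cannot tell whether the initramfs contents are sane, nor which kernel GRUB will actually pick.

``` bash
# Hypothetical pre-reboot check: does the boot directory contain a non-empty
# kernel image and initramfs for the given kernel release?
# Usage: boot_files_present /boot 6.8.12-20-pve
boot_files_present() {
    local boot_dir=$1 kver=$2 f ok=0
    for f in "vmlinuz-$kver" "initrd.img-$kver"; do
        # -s: file exists and has a size greater than zero
        if [ ! -s "$boot_dir/$f" ]; then
            echo "missing or empty: $boot_dir/$f" >&2
            ok=1
        fi
    done
    return "$ok"
}
```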
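## Recovery
The recovery plan had three steps:
1. Try to boot into a working kernel and get the system running again
2. Reinstall the Intel iGPU SR-IOV patch and restore iGPU passthrough to the Windows VM
3. If the system is beyond repair or iGPU passthrough to the Windows VM cannot be restored, reinstall the system and upgrade to Proxmox 9 while at it
### Booting into a working kernel
First, I asked ~~the on-site ops team to fit the server with a remotely controllable power switch~~ my dad to go into the room and plug the server into a spare Mijia smart plug.
Then I powered the server on. Unsurprisingly, it failed to boot; the old kernel `proxmox-kernel-6.8.8-2-pve` was most likely broken.
![No surprise there...](https://static.cattom.site/new_blog_image/2026-3-18-2.png?x-oss-process=style/webp)
So I power-cycled it again, picked GNU GRUB > Advanced options for Proxmox VE GNU/Linux > `proxmox-kernel-6.8.12-20-pve`, and the system booted normally. Step one done.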
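### Restoring iGPU passthrough to the Windows VM
First, I removed the Intel iGPU SR-IOV patch settings from `/etc/default/grub` and `/etc/sysfs.conf`, then listed the kernels managed by `proxmox-boot-tool`.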
``` bash
proxmox-boot-tool kernel list
```
```
Manually selected kernels:
None.
Automatically selected kernels:
6.8.8-2-pve
6.8.12-19-pve
6.8.12-20-pve
Pinned kernel:
6.8.8-2-pve
```
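Huh? Why is the kernel pinned to `6.8.8-2-pve`? That pin likely explains why the reboot went back into the broken kernel instead of `6.8.12-20-pve`. So I removed the pin and let the system pick the newest kernel automatically.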
``` bash
proxmox-boot-tool kernel unpin
proxmox-boot-tool refresh
update-grub
```
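A pin is easy to miss before a reboot. A small sketch that extracts it from the `proxmox-boot-tool kernel list` output shown above; `pinned_kernel` is a hypothetical helper, and it assumes the `Pinned kernel:` heading is followed by the version on the next non-empty line, as in this post's output (whitespace is trimmed in case the real output is indented).

``` bash
# Hypothetical helper: read `proxmox-boot-tool kernel list` output on stdin
# and print the pinned kernel, or nothing if no pin is listed.
# Usage: pin=$(proxmox-boot-tool kernel list | pinned_kernel)
pinned_kernel() {
    awk '
        { gsub(/^[ \t]+|[ \t]+$/, "") }          # trim the line
        /^Pinned kernel:/ { in_pin = 1; next }   # the pin follows this heading
        in_pin && NF { if ($0 !~ /:$/) print; exit }
    '
}
```

On the output above it prints `6.8.8-2-pve`; comparing that with the kernel you intend to boot before rebooting would have flagged the problem.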
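After rebooting, it got stuck on the blue GNU GRUB screen, which I suspect was caused by the CH9329-based HID crashing. So I cut the power, waited a while, powered it back on, and the system booted normally with the `proxmox-kernel-6.8.12-20-pve` kernel.
Then I tried reinstalling the Intel iGPU SR-IOV patch following the project README.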
``` bash
wget -O /tmp/i915-sriov-dkms_2025.07.22_amd64.deb "https://github.com/strongtz/i915-sriov-dkms/releases/download/2025.07.22/i915-sriov-dkms_2025.07.22_amd64.deb"
dpkg -i /tmp/i915-sriov-dkms_2025.07.22_amd64.deb
```
```
Selecting previously unselected package i915-sriov-dkms.
(Reading database ... 146949 files and directories currently installed.)
Preparing to unpack .../i915-sriov-dkms_2025.07.22_amd64.deb ...
Unpacking i915-sriov-dkms (2025.07.22) ...
Setting up i915-sriov-dkms (2025.07.22) ...
install dkms modules for all kernels
Loading new i915-sriov-dkms-2025.07.22 DKMS files...
Building for 6.1.10-1-pve, 6.8.8-2-pve, 6.8.12-19-pve and 6.8.12-20-pve
Building initial module for 6.1.10-1-pve
Error! The /var/lib/dkms/i915-sriov-dkms/2025.07.22/6.1.10-1-pve/x86_64/dkms.conf for module i915-sriov-dkms includes a BUILD_EXCLUSIVE directive which does not match this kernel/arch/config.
This indicates that it should not be built.
Skipped.
Building initial module for 6.8.8-2-pve
Error! Bad return status for module build on kernel: 6.8.8-2-pve (x86_64)
Consult /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/make.log for more information.
update-initramfs: deferring update (trigger activated)
Processing triggers for initramfs-tools (0.142+deb12u3) ...
update-initramfs: Generating /boot/initrd.img-6.8.12-20-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
```
The output says the patch failed to build for kernel `6.8.8-2-pve`. Wait, why is `6.8.8-2-pve` still involved? Never mind; first, a few checks to confirm where things stand.
``` bash
dkms status
```
```
i915-sriov-dkms/2025.07.22: added
```
``` bash
ls -l /lib/modules/$(uname -r)/kernel/drivers/gpu/drm/i915/
```
```
total 8760
-rw-r--r-- 1 root root 8114737 Mar 13 16:15 i915.ko
-rw-r--r-- 1 root root 849561 Mar 13 16:15 kvmgt.ko
```
``` bash
cat /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/make.log
```
``` title="/var/lib/dkms/i915-sriov-dkms/2025.07.22/build/make.log"
DKMS make.log for i915-sriov-dkms-2025.07.22 for kernel 6.8.8-2-pve (x86_64)
Fri Mar 20 12:15:58 AM CST 2026
make: Entering directory '/usr/src/linux-headers-6.8.8-2-pve'
warning: the compiler differs from the one used to build the kernel
The kernel was built by: gcc (Debian 12.2.0-14) 12.2.0
You are using: gcc (Debian 12.2.0-14+deb12u1) 12.2.0
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/compat/backport-6.9.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/compat/backport-6.9/drm_dp_tunnel.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/compat/backport-6.10.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/compat/backport-6.11.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/compat/backport-6.12.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_config.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_driver.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_drm_client.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_getparam.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_ioctl.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_irq.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_mitigations.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_module.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_params.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_pci.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_scatterlist.o
/var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_drm_client.c: In function obj_meminfo:
/var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_drm_client.c:56:13: error: implicit declaration of function drm_gem_object_is_shared_for_memory_stats [-Werror=implicit-function-declaration]
56 | if (drm_gem_object_is_shared_for_memory_stats(&obj->base))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_suspend.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_switcheroo.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_sysfs.o
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_utils.o
cc1: some warnings being treated as errors
CC [M] /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/intel_clock_gating.o
make[2]: *** [scripts/Makefile.build:243: /var/lib/dkms/i915-sriov-dkms/2025.07.22/build/drivers/gpu/drm/i915/i915_drm_client.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [/usr/src/linux-headers-6.8.8-2-pve/Makefile:1926: /var/lib/dkms/i915-sriov-dkms/2025.07.22/build] Error 2
make: *** [Makefile:240: __sub-make] Error 2
make: Leaving directory '/usr/src/linux-headers-6.8.8-2-pve'
```
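These outputs confirm that the patch has been added to DKMS (the `added` state means the source is registered but has not been built or installed for any kernel), that the build for kernel `6.8.8-2-pve` failed, and that its build status for the running kernel `6.8.12-20-pve` is unknown.
Next, I had DKMS build and install the module only for the running kernel.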
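As for why `6.8.8-2-pve` keeps coming up: the make.log shows the build entering `/usr/src/linux-headers-6.8.8-2-pve`, so that kernel's headers were evidently still installed, and the earlier `dkms install` ran `make -C /lib/modules/6.8.8-2-pve/build`. My guess (not verified against the package's install script) is that its "install dkms modules for all kernels" step simply tries every kernel under `/lib/modules`. The hypothetical helper below lists those kernels and whether each has a `build` entry DKMS could compile against:

``` bash
# Hypothetical helper: list kernel releases under a modules root and whether
# each one has a "build" entry (usually a symlink to its headers).
# Usage: kernels_with_headers /lib/modules
kernels_with_headers() {
    local root=${1:-/lib/modules} dir kver
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue        # the glob matched nothing
        kver=$(basename "$dir")
        if [ -d "$dir/build" ]; then     # -d follows symlinks
            echo "$kver headers"
        else
            echo "$kver no-headers"
        fi
    done
}
```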
``` bash
dkms remove i915-sriov-dkms/2025.07.22 --all
dkms add i915-sriov-dkms/2025.07.22
dkms install i915-sriov-dkms/2025.07.22 -k $(uname -r)
```
``` bash
dkms status
```
```
i915-sriov-dkms/2025.07.22, 6.8.12-20-pve, x86_64: installed
```
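For scripts (for example, a check run before rebooting), the same `dkms status` test can be turned into an exit code. A sketch, assuming the `<module>/<version>, <kernel>, <arch>: <state>` line format shown above; `dkms_installed_for` is a hypothetical helper:

``` bash
# Hypothetical helper: succeed only if `dkms status` output (stdin) shows
# <module>/<version> as installed for <kernel>. A bare "added" line has no
# kernel field, so it never matches.
# Usage: dkms status | dkms_installed_for i915-sriov-dkms/2025.07.22 "$(uname -r)"
dkms_installed_for() {
    local modver=$1 kver=$2
    awk -v mv="$modver" -v k="$kver" '
        index($0, mv ", " k ", ") == 1 && $0 ~ /: installed/ { found = 1 }
        END { exit !found }
    '
}
```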
The Intel iGPU SR-IOV patch was installed successfully for the running kernel `6.8.12-20-pve`. Next, I reconfigured the Intel iGPU SR-IOV feature following the project README.
``` bash
nano /etc/default/grub
# In the editor, set the kernel command line to:
# GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on i915.enable_guc=3 i915.max_vfs=7 module_blacklist=xe"
```
``` bash
update-grub
update-initramfs -u
```
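After the next reboot, it is worth confirming that these parameters actually reached the running kernel, since `/proc/cmdline` shows the command line the kernel was booted with. A minimal sketch; `cmdline_has_params` is a hypothetical helper, and the parameter list is the one set in `/etc/default/grub` above:

``` bash
# Hypothetical helper: check that every expected parameter appears as a
# whole space-separated token in the given kernel command line.
# Usage: cmdline_has_params "$(cat /proc/cmdline)" intel_iommu=on i915.enable_guc=3 i915.max_vfs=7 module_blacklist=xe
cmdline_has_params() {
    local cmdline=" $1 " param missing=0
    shift
    for param in "$@"; do
        # Padding with spaces keeps "i915.max_vfs=7" from matching "i915.max_vfs=70"
        case "$cmdline" in
            *" $param "*) ;;
            *) echo "missing: $param" >&2; missing=1 ;;
        esac
    done
    return "$missing"
}
```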
``` bash
proxmox-boot-tool kernel pin 6.8.12-20-pve
proxmox-boot-tool refresh
```
``` bash
apt install sysfsutils
echo "devices/pci0000:00/0000:00:02.0/sriov_numvfs = 7" > /etc/sysfs.conf
```
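Entries in `/etc/sysfs.conf` are paths relative to `/sys`, so the setting above corresponds to `/sys/devices/pci0000:00/0000:00:02.0/sriov_numvfs`. After a reboot, reading that file back shows whether the VFs were actually created. A sketch with a hypothetical helper, `check_numvfs`:

``` bash
# Hypothetical helper: compare the VF count reported by sysfs with the
# expected value. Returns 0 on match, 1 on mismatch, 2 if unreadable.
# Usage: check_numvfs /sys/devices/pci0000:00/0000:00:02.0/sriov_numvfs 7
check_numvfs() {
    local path=$1 want=$2 have
    if ! have=$(cat -- "$path" 2>/dev/null); then
        echo "cannot read $path" >&2
        return 2
    fi
    if [ "$have" != "$want" ]; then
        echo "sriov_numvfs is $have, expected $want" >&2
        return 1
    fi
}
```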
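Finally, I rebooted the server, started the Windows VM, and reinstalled the Intel graphics driver. Jellyfin video transcoding worked normally again, so iGPU passthrough to the Windows VM was restored.
For setting up GPU passthrough in the Windows VM, see [the relevant section of the patch project's README](https://github.com/strongtz/i915-sriov-dkms/tree/2025.07.22#windows-guest-tested-with-proxmox-83--windows-11-24h2--intel-driver-32010164603201016259); the step that extracts the GPU's EFI firmware is not necessarily required.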