Description
System information
Type | Version/Name |
---|---|
Distribution Name | Ubuntu |
Distribution Version | 24.04 LTS |
Kernel Version | 6.8.0-64-generic |
Architecture | x86_64 |
OpenZFS Version | 2.3.3-1 |
Describe the problem you're observing
During a `zpool scrub` on a raidz1 pool with 8 SATA HDDs, the system experiences a kernel crash. The crash consistently occurs early in the scrub process and leads to the `txg_sync` thread becoming blocked indefinitely. The system becomes partially unresponsive and requires a hard reboot to recover.
Describe how to reproduce the problem
- Boot into a system with ZFS 2.3.3 and Linux 6.8.0.
- Have a pool configured with raidz1 using 8 physical drives (WWN-based paths).
- Start a `zpool scrub` on the pool: `zpool scrub storage`
- Monitor with: `watch zpool status -v`
- After a few GB have been scanned, observe the system lock up and crash in the kernel (the same commands are consolidated in the sketch below).
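For convenience, a minimal shell sequence covering the same steps (pool name `storage` as in this report; the 5-second watch interval is just an illustrative choice):

```sh
# Start the scrub on the affected pool (returns immediately; the scrub runs in the background).
zpool scrub storage

# Watch progress; the crash typically appears within the first few GB scanned.
watch -n 5 zpool status -v storage

# In a second terminal, follow kernel messages to capture the backtrace live.
journalctl -k -f
```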
Include any warning/errors/backtraces from the system logs
🔧 Live kernel messages captured:
jul 21 13:07:03 gresint-server kernel: BUG: unable to handle page fault for address: 00007970a8dcc605
jul 21 13:07:03 gresint-server kernel: RIP: 0010:zio_vdev_io_done+0x6e/0x240 [zfs]
jul 21 13:07:03 gresint-server kernel: ? zio_vdev_io_done+0x6e/0x240 [zfs]
jul 21 13:07:03 gresint-server kernel: ? zio_vdev_io_done+0x4e/0x240 [zfs]
jul 21 13:07:03 gresint-server kernel: zio_execute+0x94/0x170 [zfs]
jul 21 13:07:03 gresint-server kernel: ? __pfx_zio_execute+0x10/0x10 [zfs]
jul 21 13:07:03 gresint-server kernel: RIP: 0010:zio_vdev_io_done+0x6e/0x240 [zfs]
Additional Information
zpool status -v
pool: storage
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub in progress since Sun Jul 20 23:49:14 2025
7.19T / 125T scanned at 1.13G/s, 2.96T / 125T issued at 474M/s
884K repaired, 2.37% done, 3 days 02:55:32 to go
config:
    NAME                        STATE     READ WRITE CKSUM
    storage                     ONLINE       0     0     0
      raidz1-0                  ONLINE       0     0     0
        wwn-0x5000c500e8a8c504  ONLINE       0     0     9  (repairing)
        wwn-0x5000c500e8a8a1f6  ONLINE       0     0     4  (repairing)
        wwn-0x5000c500f6e5f0ab  ONLINE       0     0     7  (repairing)
        wwn-0x5000c500e8496d51  ONLINE       0     0     2  (repairing)
        wwn-0x5000c500f6ee1532  ONLINE       0     0     3  (repairing)
        wwn-0x5000c500e8b3a6f9  ONLINE       0     0     7  (repairing)
        wwn-0x5000c500e88ed746  ONLINE       0     0     8  (repairing)
        wwn-0x5000c500e8a8a0aa  ONLINE       0     0     6  (repairing)
    logs
      ubuntu-vg/slog-lv         ONLINE       0     0     0
    cache
      ubuntu--vg-l2arc--lv      ONLINE       0     0     0
errors: No known data errors
zpool get all storage
NAME PROPERTY VALUE SOURCE
storage size 146T -
storage capacity 85% -
storage altroot - default
storage health ONLINE -
storage guid 13753470766290521828 -
storage version - default
storage bootfs - default
storage delegation on default
storage autoreplace off default
storage cachefile - default
storage failmode wait default
storage listsnapshots off default
storage autoexpand on local
storage dedupratio 1.00x -
storage free 20.7T -
storage allocated 125T -
storage readonly off -
storage ashift 12 local
storage comment - default
storage expandsize - -
storage freeing 0 -
storage fragmentation 36% -
storage leaked 0 -
storage multihost off default
storage checkpoint - -
storage load_guid 4324850870775088562 -
storage autotrim off default
storage compatibility off default
storage bcloneused 0 -
storage bclonesaved 0 -
storage bcloneratio 1.00x -
storage dedup_table_size 0 -
storage dedup_table_quota auto default
storage last_scrubbed_txg 0 -
storage feature@async_destroy enabled local
storage feature@empty_bpobj enabled local
storage feature@lz4_compress active local
storage feature@multi_vdev_crash_dump enabled local
storage feature@spacemap_histogram active local
storage feature@enabled_txg active local
storage feature@hole_birth active local
storage feature@extensible_dataset active local
storage feature@embedded_data active local
storage feature@bookmarks enabled local
storage feature@filesystem_limits enabled local
storage feature@large_blocks enabled local
storage feature@large_dnode enabled local
storage feature@sha512 enabled local
storage feature@skein enabled local
storage feature@edonr enabled local
storage feature@userobj_accounting active local
storage feature@encryption enabled local
storage feature@project_quota active local
storage feature@device_removal enabled local
storage feature@obsolete_counts enabled local
storage feature@zpool_checkpoint enabled local
storage feature@spacemap_v2 active local
storage feature@allocation_classes enabled local
storage feature@resilver_defer enabled local
storage feature@bookmark_v2 enabled local
storage feature@redaction_bookmarks enabled local
storage feature@redacted_datasets enabled local
storage feature@bookmark_written enabled local
storage feature@log_spacemap active local
storage feature@livelist enabled local
storage feature@device_rebuild enabled local
storage feature@zstd_compress enabled local
storage feature@draid enabled local
storage feature@zilsaxattr enabled local
storage feature@head_errlog active local
storage feature@blake3 enabled local
storage feature@block_cloning enabled local
storage feature@vdev_zaps_v2 active local
storage feature@redaction_list_spill enabled local
storage feature@raidz_expansion enabled local
storage feature@fast_dedup enabled local
storage feature@longname enabled local
storage feature@large_microzap enabled local
SMART data (excerpt)
Getting all disks in the ZFS pool...
The following disks were found:
- wwn-0x5000c500e8a8c504
- wwn-0x5000c500e8a8a1f6
- wwn-0x5000c500f6e5f0ab
- wwn-0x5000c500e8496d51
- wwn-0x5000c500f6ee1532
- wwn-0x5000c500e8b3a6f9
- wwn-0x5000c500e88ed746
- wwn-0x5000c500e8a8a0aa
---------------------------------------------------------
SMART info for wwn-0x5000c500e8a8c504 (/dev/sda):
---------------------------------------------------------
Device Model: ST20000NM007D-3DJ103
Serial Number:
SMART overall-health self-assessment test result: PASSED
Reallocated_Sector_Ct: 0
Current_Pending_Sector: 0
Offline_Uncorrectable: 0
Temperature: 36°C
---------------------------------------------------------
SMART info for wwn-0x5000c500e8a8a1f6 (/dev/sdb):
---------------------------------------------------------
Device Model: ST20000NM007D-3DJ103
Serial Number:
SMART overall-health self-assessment test result: PASSED
Reallocated_Sector_Ct: 0
Current_Pending_Sector: 0
Offline_Uncorrectable: 0
Temperature: 36°C
---------------------------------------------------------
SMART info for wwn-0x5000c500f6e5f0ab (/dev/sdc):
---------------------------------------------------------
Device Model: ST20000NM007D-3DJ103
Serial Number:
SMART overall-health self-assessment test result: PASSED
Reallocated_Sector_Ct: 2
Current_Pending_Sector: 0
Offline_Uncorrectable: 0
Temperature: 36°C
⚠️ Reallocated sectors detected
-----------------------------------
... (truncated)
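The excerpt above comes from a small helper script. A minimal sketch of the approach, assuming the script resolves the pool's WWN links under `/dev/disk/by-id` and filters a few SMART attributes per drive (the attribute filter and loop are assumptions, not the exact script used):

```sh
#!/bin/sh
# Resolve every wwn-* device in the pool to its /dev/sdX node and print a short SMART summary.
for wwn in $(zpool status -P storage | grep -o '/dev/disk/by-id/wwn-[^ ]*'); do
    dev=$(readlink -f "$wwn")
    echo "SMART info for $(basename "$wwn") ($dev):"
    smartctl -a "$dev" | grep -E 'Device Model|overall-health|Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Temperature_Celsius'
done
```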
Troubleshooting steps attempted:
- Tested with `zfs_deadman_failmode` set to `panic`, `wait`, and `continue` (see the sketch after this list)
- Adjusted `zfs_vdev_scrub_max_active` and `zfs_vdev_scrub_min_active`
- Monitored `journalctl -k` live during the scrub
- Verified all disks are SMART clean
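These tunables are ZFS module parameters, so a plausible way to apply them at runtime is via `/sys/module/zfs/parameters` (the specific values below are illustrative, not the exact ones used in the failing runs):

```sh
# Switch the deadman behaviour between panic / wait / continue.
echo panic > /sys/module/zfs/parameters/zfs_deadman_failmode

# Throttle scrub I/O by lowering the per-vdev scrub queue depths.
echo 1 > /sys/module/zfs/parameters/zfs_vdev_scrub_min_active
echo 2 > /sys/module/zfs/parameters/zfs_vdev_scrub_max_active

# Follow kernel messages live while the scrub runs.
journalctl -k -f
```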
Final Notes
This appears to be a kernel-space memory access bug triggered by ZIO completion under scrub load.
I'm available to test debug builds or apply custom patches if required.