
#18172: Detect and heal orphaned DTL entries on pool import #18173

Draft
nmarasoiu wants to merge 1 commit into openzfs:master from nmarasoiu:fix/orphaned-dtl-healing

@nmarasoiu nmarasoiu commented Feb 1, 2026

#18172

When a system crashes during a vdev detach operation, the on-disk state can become inconsistent: the vdev tree shows the device as a "hole" or "missing", but DTL (dirty time log) entries from the detached device remain on disk.

On pool import, these orphaned DTL entries can trigger a phantom resilver that scans the entire pool with no valid target device, causing severe I/O load and system instability.

This patch adds:

  • vdev_dtl_check_orphaned(): detect orphaned DTL on hole/missing vdevs
  • New import flag ZFS_IMPORT_HEAL_ORPHANED_DTL (0x100)
  • CLI option: zpool import -o heal_orphaned_dtl=on poolname
  • Clear warning messages to inform the administrator

By default, import fails with a helpful error message when orphaned DTL entries are detected, directing the user to re-import with the healing option. This ensures administrators are explicitly aware of and consent to the recovery action.
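
The detect-then-refuse-or-heal policy described above can be sketched as a small standalone model. This is an illustration only, not the code in this patch: the `toy_*` types and functions are hypothetical, the DTL is reduced to a simple entry count, and "healing" is assumed here to mean discarding the orphaned entries. Only the flag name and its value (0x100) come from the description.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag name and value as given in the PR description. */
#define ZFS_IMPORT_HEAL_ORPHANED_DTL 0x100

/* Hypothetical, simplified stand-ins for the real vdev structures. */
typedef enum {
    TOY_VDEV_DISK,
    TOY_VDEV_HOLE,
    TOY_VDEV_MISSING
} toy_vdev_type_t;

typedef struct toy_vdev {
    toy_vdev_type_t type;
    size_t dtl_entries; /* number of DTL ranges still recorded */
} toy_vdev_t;

typedef enum {
    TOY_IMPORT_OK,      /* no orphaned DTL found */
    TOY_IMPORT_HEALED,  /* orphaned DTL found and discarded */
    TOY_IMPORT_REFUSED  /* orphaned DTL found, healing not requested */
} toy_import_result_t;

/* A DTL is "orphaned" when it is non-empty but its vdev is a hole or missing. */
static bool
toy_dtl_is_orphaned(const toy_vdev_t *vd)
{
    bool no_device = (vd->type == TOY_VDEV_HOLE ||
        vd->type == TOY_VDEV_MISSING);
    return (no_device && vd->dtl_entries > 0);
}

/*
 * Model of the import decision: refuse by default, heal only when the
 * administrator passed the healing flag. Nothing is modified on refusal.
 */
static toy_import_result_t
toy_import(toy_vdev_t *vdevs, size_t n, uint64_t flags)
{
    size_t orphaned = 0;

    for (size_t i = 0; i < n; i++) {
        if (toy_dtl_is_orphaned(&vdevs[i]))
            orphaned++;
    }

    if (orphaned == 0)
        return (TOY_IMPORT_OK);

    if (!(flags & ZFS_IMPORT_HEAL_ORPHANED_DTL))
        return (TOY_IMPORT_REFUSED);

    /* Assumption: healing discards only the orphaned entries. */
    for (size_t i = 0; i < n; i++) {
        if (toy_dtl_is_orphaned(&vdevs[i]))
            vdevs[i].dtl_entries = 0;
    }
    return (TOY_IMPORT_HEALED);
}
```

In this model a healthy disk with a non-empty DTL is left untouched, since it still has a valid resilver target; only entries attached to a hole or missing vdev are treated as orphaned.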


Signed-off-by: Nicola Remasoiu <dumitru.nicolae.marasoiu@outlook.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
@nmarasoiu nmarasoiu marked this pull request as draft February 1, 2026 10:11
@github-actions github-actions bot added the "Status: Work in Progress" (Not yet ready for general review) label Feb 1, 2026
@amotin (Member) commented Mar 5, 2026

Would we have transactions to handle atomic operations... ;)
