From e19ff7162fab8cfa5a09598f86ff87f36d7367ce Mon Sep 17 00:00:00 2001
From: PUVVADA BHASKAR <2400030295@kluniversity.in>
Date: Tue, 4 Nov 2025 15:19:21 +0530
Subject: [PATCH 1/2] Updated deduplication section in zfsconcepts.7 for clarity

---
 man/man7/zfsconcepts.7 | 49 +++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/man/man7/zfsconcepts.7 b/man/man7/zfsconcepts.7
index bb2178d85bcd..1671eedd0f01 100644
--- a/man/man7/zfsconcepts.7
+++ b/man/man7/zfsconcepts.7
@@ -181,32 +181,33 @@ See
 .Xr systemd.mount 5
 for details.
 .Ss Deduplication
-Deduplication is the process for removing redundant data at the block level,
-reducing the total amount of data stored.
-If a file system has the
+Deduplication is the process of eliminating redundant data blocks at the storage
+level, so that only one copy of each unique block is kept. When the
 .Sy dedup
-property enabled, duplicate data blocks are removed synchronously.
-The result
-is that only unique data is stored and common components are shared among files.
-.Pp
-Deduplicating data is a very resource-intensive operation.
-It is generally recommended that you have at least 1.25 GiB of RAM
-per 1 TiB of storage when you enable deduplication.
-Calculating the exact requirement depends heavily
-on the type of data stored in the pool.
-.Pp
-Enabling deduplication on an improperly-designed system can result in
-performance issues (slow I/O and administrative operations).
-It can potentially lead to problems importing a pool due to memory exhaustion.
-Deduplication can consume significant processing power (CPU) and memory as well
-as generate additional disk I/O.
-.Pp
-Before creating a pool with deduplication enabled, ensure that you have planned
-your hardware requirements appropriately and implemented appropriate recovery
-practices, such as regular backups.
-Consider using the
+property is enabled on a dataset, ZFS compares new data to existing blocks and
+stores references instead of duplicate copies.
+
+.Pp
+While this can reduce storage usage when large amounts of identical data exist,
+deduplication is a very resource-intensive feature. It maintains a
+deduplication table (DDT) in memory, which can grow significantly depending on
+the amount of stored data. As a general guideline, at least 1.25 GiB of RAM per
+1 TiB of pool storage is recommended, though the actual requirement varies with
+workload and data type.
+
+.Pp
+Enabling deduplication without sufficient system resources can lead to slow I/O,
+excessive memory and CPU use, and in extreme cases, difficulty importing the
+pool due to memory exhaustion. For these reasons, deduplication is not generally
+recommended unless there is a clear need for it—such as virtual machine images
+or backup datasets containing highly duplicated data.
+
+.Pp
+For most users, the
 .Sy compression
-property as a less resource-intensive alternative.
+property offers a more efficient and safer way to save space with far less
+performance impact. Always test and verify system performance before enabling
+deduplication in a production environment.
 .Ss Block cloning
 Block cloning is a facility that allows a file (or parts of a file) to be
 .Qq cloned ,

From 765ec9df6dfd11e56a98158ffb21e2d5f3767727 Mon Sep 17 00:00:00 2001
From: PUVVADA BHASKAR <2400030295@kluniversity.in>
Date: Tue, 4 Nov 2025 16:21:19 +0530
Subject: [PATCH 2/2] docs: clarify deduplication and add block cloning details in zfsconcepts.7

---
 man/man7/zfsconcepts.7 | 43 +++++++++++++++++++++---------------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/man/man7/zfsconcepts.7 b/man/man7/zfsconcepts.7
index 1671eedd0f01..afe925dd63c1 100644
--- a/man/man7/zfsconcepts.7
+++ b/man/man7/zfsconcepts.7
@@ -181,35 +181,36 @@ See
 .Xr systemd.mount 5
 for details.
 .Ss Deduplication
-Deduplication is the process of eliminating redundant data blocks at the storage
-level, so that only one copy of each unique block is kept. When the
+Deduplication is the process of eliminating redundant data blocks at the
+storage level so that only one copy of each unique block is kept.
+When the
 .Sy dedup
 property is enabled on a dataset, ZFS compares new data to existing blocks and
 stores references instead of duplicate copies.
-
 .Pp
 While this can reduce storage usage when large amounts of identical data exist,
-deduplication is a very resource-intensive feature. It maintains a
+deduplication is a very resource-intensive feature.
+It maintains a
 deduplication table (DDT) in memory, which can grow significantly depending on
-the amount of stored data. As a general guideline, at least 1.25 GiB of RAM per
-1 TiB of pool storage is recommended, though the actual requirement varies with
-workload and data type.
-
+the amount of stored data.
+As a general guideline, at least 1.25 GiB of RAM per 1 TiB of pool storage is
+recommended, though the actual requirement varies with workload and data type.
 .Pp
 Enabling deduplication without sufficient system resources can lead to slow I/O,
 excessive memory and CPU use, and in extreme cases, difficulty importing the
-pool due to memory exhaustion. For these reasons, deduplication is not generally
-recommended unless there is a clear need for it—such as virtual machine images
-or backup datasets containing highly duplicated data.
-
+pool due to memory exhaustion.
+For these reasons, deduplication is not generally recommended unless there is a
+clear need for it, such as virtual machine images or backup datasets containing
+highly duplicated data.
 .Pp
 For most users, the
 .Sy compression
 property offers a more efficient and safer way to save space with far less
-performance impact. Always test and verify system performance before enabling
-deduplication in a production environment.
+performance impact.
+Always test and verify system performance before enabling deduplication in a
+production environment.
 .Ss Block cloning
-Block cloning is a facility that allows a file (or parts of a file) to be
+Block cloning is a facility that allows a file, or parts of a file, to be
 .Qq cloned ,
 that is, a shallow copy made where the existing data blocks are referenced
 rather than copied.
@@ -224,8 +225,8 @@ Cloned blocks are tracked in a special on-disk structure called the Block
 Reference Table
 .Po BRT
 .Pc .
-Unlike deduplication, this table has minimal overhead, so can be enabled at all
-times.
+Unlike deduplication, this table has minimal overhead, so it can be enabled at
+all times.
 .Pp
 Also unlike deduplication, cloning must be requested by a user program.
 Many common file copying programs, including newer versions of
@@ -233,15 +234,15 @@ Many common file copying programs, including newer versions of
 will try to create clones automatically.
 Look for
 .Qq clone ,
-.Qq dedupe
+.Qq dedupe ,
 or
 .Qq reflink
 in the documentation for more information.
 .Pp
 There are some limitations to block cloning.
-Only whole blocks can be cloned, and blocks can not be cloned if they are not
-yet written to disk, or if they are encrypted, or the source and destination
+Only whole blocks can be cloned, and blocks cannot be cloned if they are not yet
+written to disk, or if they are encrypted, or if the source and destination
 .Sy recordsize
 properties differ.
-The OS may add additional restrictions;
+The operating system may add additional restrictions;
 for example, most versions of Linux will not allow clones across datasets.
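The properties and behaviors this series documents can be exercised with standard ZFS administration commands. The sketch below is illustrative only, not part of the patch: it assumes a hypothetical pool `tank` with a dataset `tank/data`, requires root privileges and an existing pool, and so is not a runnable script.

```shell
# Enable deduplication on a single dataset (hypothetical names;
# needs an existing pool and root privileges):
zfs set dedup=on tank/data

# Inspect the deduplication table (DDT) histogram and dedup ratio,
# the in-memory structure whose RAM cost the section warns about:
zpool status -D tank

# The less resource-intensive alternative the section recommends:
zfs set compression=lz4 tank/data

# Block cloning: newer GNU coreutils cp can request a clone (reflink)
# instead of copying data blocks:
cp --reflink=auto source.img clone.img
```

Comparing `zfs get used,compressratio tank/data` before and after is a simple way to verify the space savings on a test system before enabling either feature in production.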