btrfs-progs: offline filesystem resize feature #1007

Open
wants to merge 16 commits into
base: devel
from

Conversation

loemraw
Contributor

@loemraw loemraw commented Jul 29, 2025

This patch introduces the ability to resize a btrfs filesystem while it is not mounted via a new --offline flag. Currently only increasing the size of the filesystem is supported, though I believe it would be possible to implement shrinking the filesystem to the end of the last device extent.

This is a more general, and hopefully more useful, solution to the problem I was trying to solve with the ("btrfs-progs: add slack space for mkfs --shrink") patch. This patch should enable users to resize a filesystem without the higher capabilities needed for mounting a filesystem.

SidongYang and others added 15 commits July 23, 2025 18:21
Use SUBVOL_SYNC_WAIT ioctl for 'btrfs subvolume sync' command before
checking periodically and add an option to not use sync wait ioctl call
and force to check periodically. This patch calls a new function
wait_for_subvolume_sync() that calls BTRFS_IOC_SUBVOL_SYNC_WAIT for each
subvol.

Issue: kdave#953
Pull-request: kdave#989
Signed-off-by: Sidong Yang <[email protected]>
Signed-off-by: David Sterba <[email protected]>
The 'btrfs rescue zero-log' text says that it's for old bugs with log
replay but there were some recent ones in 6.15.x, the command will stay
just in case. The rest are minor updates.

Issue: kdave#1000
Signed-off-by: David Sterba <[email protected]>
As reported, some of the information is not up to date regarding status.

Issue: kdave#996
Signed-off-by: David Sterba <[email protected]>
Make the message a bit more clear that it's related to the send/receive
use case where the parent subvolume needs to match.

Issue: kdave#1003
Pull-request: kdave#1005
Signed-off-by: David Sterba <[email protected]>
Pull-request: kdave#999
Signed-off-by: Diego Viola <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Add new option --nocomp to set flag which will tell kernel to defragment
file extents without compression and decompress existing extents if
needed. The defrag setting will override any current compression
settings like mount options or file properties.  The option is separate
from '-c' so it's more obvious it's mutually exclusive.

Signed-off-by: David Sterba <[email protected]>
Partial sync.

Signed-off-by: David Sterba <[email protected]>
GCC 15 is available, add it as default compiler to tumbleweed image to
catch new warnings.

Signed-off-by: David Sterba <[email protected]>
It's been a year since the last update (2024-06-30) and the docker image
does not build anymore due to missing packages and unreachable archives.

Signed-off-by: David Sterba <[email protected]>
Previous Android compatibility was removed in 51f15d3
("btrfs-progs: build: remove incomplete android support").

- add pthread_cancel() API stubs and pthread_setcanceltype() emulation,
  since Android NDK does not support it, use pthread_kill() instead

- add manual thread state tracking to 'rescue chunk-recover' command
  using atomics; code is from termux/termux-packages

- add stub for qsort_r() using thread-local storage

The compatibility code is in common/compat.h and should be included unless
the internal APIs include that already (like sort-utils.h or
task-utils.h).

The CI does not yet verify the Android build.

Pull-request: kdave#982
Signed-off-by: Shadichy <[email protected]>
Signed-off-by: David Sterba <[email protected]>
- reformat or reflow nested lists
- links in () without description

[ci skip]

Signed-off-by: David Sterba <[email protected]>
@Forza-tng
Contributor

Will this allow for resizing a fs that would normally go ro due to enospc during mount?

@loemraw
Contributor Author

loemraw commented Jul 29, 2025

Will this allow for resizing a fs that would normally go ro due to enospc during mount?

Yeah, I don't see why not!

@adam900710
Collaborator

Will this allow for resizing a fs that would normally go ro due to enospc during mount?
Yeah, I don't see why not!

Unfortunately it's not that simple. If the fs's metadata is full, it may still fail.
As this still uses regular transaction protection, metadata still needs space for COW.

Thankfully this one mostly modifies only the chunk tree, and it's not that common to exhaust system chunks.
But your btrfs_start_transaction() is using the fs root, which is the wrong target and can lead to an unexpected ENOSPC error.

Please use chunk root instead since you're only modifying chunk tree.

Furthermore there are a lot of code style problems, like unexpected new lines splitting variable definitions (e.g. between new_size and old_total inside offline_resize()).
IIRC you can use checkpatch.pl from the kernel, as we mostly follow the same code style here.

And the parameter list for that function is also pretty bad. Normally we put the more important parameters (like path, which specifies a whole fs) first, and more specific parameters like the new size after that.
Furthermore, passing the amount parameter as a char string doesn't make much sense; why not just parse the value and pass an s64 directly?

@adam900710
Collaborator

And a lot of error handling is missing.

E.g. if btrfs_update_device() fails, you should abort the current transaction rather than continue on to commit it.
This also gets rid of the unnecessary ret2 variable.
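
The control flow being asked for here can be sketched with a self-contained mock. Everything below is hypothetical: the `mock_*` names and the `mock_trans` struct only stand in for the transaction and device-update steps named in the review and are not the btrfs-progs API. The point is just the shape: bail out through an abort path on the first failure, and reach commit only on success, so a single `ret` is enough.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mock transaction handle; a stand-in, not the btrfs-progs struct. */
struct mock_trans {
	bool committed;
	bool aborted;
};

/* Stand-in for the device update step; fails only when asked to. */
int mock_update_device(struct mock_trans *trans, bool inject_failure)
{
	(void)trans;
	return inject_failure ? -EIO : 0;
}

/* Stand-in for aborting the transaction. */
void mock_abort(struct mock_trans *trans)
{
	trans->aborted = true;
}

/* Stand-in for committing the transaction. */
int mock_commit(struct mock_trans *trans)
{
	trans->committed = true;
	return 0;
}

/*
 * Abort and return the first error if the update fails;
 * commit only when every earlier step succeeded.
 */
int mock_offline_resize(struct mock_trans *trans, bool inject_failure)
{
	int ret;

	ret = mock_update_device(trans, inject_failure);
	if (ret < 0) {
		mock_abort(trans);
		return ret;
	}
	return mock_commit(trans);
}
```

With this shape there is no second return-code variable to juggle, which is presumably what the remark about ret2 refers to.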

@loemraw
Contributor Author

loemraw commented Jul 30, 2025

Will this allow for resizing a fs that would normally go ro due to enospc during mount?
Yeah, I don't see why not!

Unfortunately it's not that simple. If the fs's metadata is full, it may still fail. As this still uses regular transaction protection, metadata still needs space for COW.

Ahh yeah, that makes sense.

Thankfully this one mostly modifies only the chunk tree, and it's not that common to exhaust system chunks. But your btrfs_start_transaction() is using the fs root, which is the wrong target and can lead to an unexpected ENOSPC error.

Good catch...

I don't have the best understanding of btrfs transactions. When I was testing this patch, the filesystem appeared to be resized correctly. Why would that be the case if I'm passing the wrong target? Was it making changes to the wrong tree? Was I just getting lucky?

Please use chunk root instead since you're only modifying chunk tree.

Will do.

Furthermore there are a lot of code style problems, like unexpected new lines splitting variable definitions (e.g. between new_size and old_total inside offline_resize()). IIRC you can use checkpatch.pl from the kernel, as we mostly follow the same code style here.

Will check this in v2.

And the parameter list for that function is also pretty bad. Normally we put the more important parameters (like path, which specifies a whole fs) first, and more specific parameters like the new size after that. Furthermore, passing the amount parameter as a char string doesn't make much sense; why not just parse the value and pass an s64 directly?

I'll rethink the parameter ordering.

I pass the amount parameter as a char because I call check_offline_resize_args inside offline_resize and I wanted to keep all of the argument logic together.

@loemraw
Contributor Author

loemraw commented Jul 30, 2025

And a lot of error handling is missing.

E.g. if btrfs_update_device() fails, you should abort the current transaction rather than continue on to commit it. This also gets rid of the unnecessary ret2 variable.

OK, makes sense, will fix in v2.

@adam900710
Collaborator

I don't have the best understanding of btrfs transactions. When I was testing this patch, the filesystem appeared to be resized correctly. Why would that be the case if I'm passing the wrong target? Was it making changes to the wrong tree? Was I just getting lucky?

The target root for btrfs_start_transaction() is only used for space reservation. In btrfs-progs, it only checks whether we have enough space. So a wrong root here won't cause a problem until the metadata space is almost exhausted.
In that case, btrfs_start_transaction() will return ENOSPC.

I pass the amount parameter as a char because I call check_offline_resize_args inside offline_resize and I wanted to keep all of the argument logic together.

You can check how other code in btrfs-progs does it.
In most cases, if we want to pass a size (or a size difference), we pass a u64 or s64 and do the parsing before calling the final function.

There are exceptions like the existing check_resize_args(), where amount can be the string "cancel".
But in most cases, the string parsing is done as early as possible.
E.g. the size variable inside cmd_filesystem_mkswapfile().

In your particular case, we won't support anything other than a number, so parsing it early is the more common solution.

BTW, for your update, you can just force push the same branch. No need to create a new PR.

@loemraw
Contributor Author

loemraw commented Jul 31, 2025

I pass the amount parameter as a char because I call check_offline_resize_args inside offline_resize and I wanted to keep all of the argument logic together.

You can check how other code in btrfs-progs does it. In most cases, if we want to pass a size (or a size difference), we pass a u64 or s64 and do the parsing before calling the final function.

There are exceptions like the existing check_resize_args(), where amount can be the string "cancel". But in most cases, the string parsing is done as early as possible. E.g. the size variable inside cmd_filesystem_mkswapfile().

In your particular case, we won't support anything other than a number, so parsing it early is the more common solution.

In this case there is a difference between passing "+1G" and "1G" that would be lost if I just passed a u64. "+1G" increases the filesystem size by 1G and "1G" sets the filesystem size to 1G. I could pass a u64 along with a boolean to indicate whether the size is relative to the existing filesystem size, but it feels like this logic would be better encapsulated inside check_offline_resize_args.
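
One way to reconcile the two positions is to parse the argument early but keep the "relative or absolute" bit next to the value. The following is only a sketch under stated assumptions: `struct resize_amount` and `parse_resize_amount()` are hypothetical names that do not exist in btrfs-progs, only the k/m/g/t binary suffixes are handled, the `devid:` prefix accepted by the real resize syntax is ignored, and a leading `-` is rejected because only growing is supported here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parsed form of a resize argument such as "+1G" or "1G". */
struct resize_amount {
	bool relative;	/* true for "+1G" (grow by), false for "1G" (set to) */
	uint64_t bytes;
};

/* Parse "[+]<number>[k|m|g|t]" into *out; returns 0 or a negative errno. */
int parse_resize_amount(const char *arg, struct resize_amount *out)
{
	bool relative = (*arg == '+');
	unsigned long long val;
	unsigned int shift = 0;
	char *end;

	if (relative)
		arg++;
	/* strtoull() would accept a sign or whitespace; require a digit. */
	if (*arg < '0' || *arg > '9')
		return -EINVAL;

	errno = 0;
	val = strtoull(arg, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;

	switch (*end) {
	case 'k': case 'K': shift = 10; end++; break;
	case 'm': case 'M': shift = 20; end++; break;
	case 'g': case 'G': shift = 30; end++; break;
	case 't': case 'T': shift = 40; end++; break;
	default: break;
	}
	if (*end != '\0')
		return -EINVAL;
	/* Reject values that would overflow once the suffix is applied. */
	if (shift && val > (UINT64_MAX >> shift))
		return -ERANGE;

	out->relative = relative;
	out->bytes = (uint64_t)val << shift;
	return 0;
}
```

A caller would parse once in the command handler and hand the struct (or its two fields) to the resize function, so the string never travels further than argument handling while "+1G" and "1G" stay distinguishable.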

@Zygo

Zygo commented Jul 31, 2025

Will this allow for resizing a fs that would normally go ro due to enospc during mount?

An offline tool that increases a device's size--and does nothing else--is much more likely to succeed in that situation than mount + fi-resize, but it can still fail in one really specific case.

mount does a number of things that can result in enospc failure. Off the top of my head: orphan inode reclaim, the snapshot cleaner thread, tearing down an incomplete reloc tree, and resuming a balance (this is the only one that can be turned off by a mount option). Avoiding these other tasks makes success far more likely.

The chunk tree is stored in the system chunk, which is a dedicated contiguous storage area that is usually already large enough for two copies of the chunk tree. To run out of space there, you'd need hundreds of thousands of chunks, like a single profile 100 TiB filesystem that has no unallocated space at the moment that the chunk tree needs to be enlarged, with a system chunk that still has the original 32MB size from mkfs. On the other hand, while that's a rare case, it would be an especially painful one that you can't fix with this method.

Users can run into this case fairly often when they have filesystems that are close to this threshold size (100 TiB or multiples thereof). A smaller filesystem doesn't allocate enough chunks to require a new system chunk, while a bigger filesystem would have unallocated space when it needs a new system chunk. It also comes up if you're running any striped profile (raid0, raid10, raid5, or raid6) and you don't do something to reduce dev_extent fragmentation--I hit this with a 26T filesystem once, that had 250k chunks after several drive upgrades and naive balances.

If you're willing to accept overwriting a metadata page in place, you can resize a device by finding the page in the chunk tree where the device item is, changing the size field in the dev item, and updating the csum (in addition to the superblock changes, which are already overwrite-in-place, and repeated across all mirrors). Increasing or decreasing a device size (without relocating any chunks) doesn't require any new allocations.
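
The overwrite-in-place idea can be illustrated on a toy block. This is a sketch under loud assumptions: the layout (a 4-byte checksum at offset 0 covering the rest of the block, and a 64-bit little-endian size field at a caller-supplied offset) is invented for the example and is not the btrfs on-disk format, all `toy_*` names are hypothetical, and CRC-32C is used only as an example checksum. It shows the two steps from the comment: change the size field, then recompute the checksum so the block still verifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CSUM_SIZE 4	/* toy layout: 4-byte checksum at offset 0 */

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t toy_crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Store a 32-bit value in little-endian byte order. */
void toy_put_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Store a 64-bit value in little-endian byte order. */
void toy_put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Load a 64-bit little-endian value. */
uint64_t toy_get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* Recompute the checksum over everything after the checksum field. */
void toy_update_csum(uint8_t *block, size_t block_size)
{
	toy_put_le32(block, toy_crc32c(block + TOY_CSUM_SIZE,
				       block_size - TOY_CSUM_SIZE));
}

/* Return 1 if the stored checksum matches the block contents. */
int toy_csum_ok(const uint8_t *block, size_t block_size)
{
	uint8_t want[TOY_CSUM_SIZE];

	toy_put_le32(want, toy_crc32c(block + TOY_CSUM_SIZE,
				      block_size - TOY_CSUM_SIZE));
	return memcmp(want, block, TOY_CSUM_SIZE) == 0;
}

/* Overwrite one 64-bit size field in place, then fix up the checksum. */
void toy_set_size_in_place(uint8_t *block, size_t block_size,
			   size_t field_off, uint64_t new_size)
{
	toy_put_le64(block + field_off, new_size);
	toy_update_csum(block, block_size);
}
```

Nothing is allocated in this flow, which is the property the comment relies on; the cost is that the block is not copy-on-write protected, so an interrupted write could leave a block whose checksum no longer matches.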

@adam900710
Collaborator

In this case there is a difference between passing "+1G" and "1G" that would be lost if I just passed a u64. "+1G" increases the filesystem size by 1G and "1G" sets the filesystem size to 1G.

Just use s64

This patch introduces the ability to resize a btrfs filesystem while it
is not mounted via a new `--offline` flag. Currently only increasing the
size of the filesystem is supported, though I believe it would be
possible to implement shrinking the filesystem to the end of the last
device extent.

This is a more general, and hopefully more useful, solution to the
problem I was trying to solve with the
("btrfs-progs: add slack space for mkfs --shrink") patch. This patch
should enable users to resize a filesystem without the higher
capabilities needed for mounting a filesystem.

Signed-off-by: Leo Martins <[email protected]>
---
Changelog:

v1->v2:
- use chunk root instead of fs root
- fix offline resize error handling
- fix variable declarations to not have newlines
- fix parameter list to have more important arguments first
@loemraw loemraw force-pushed the feature-offline-resize branch from d7e49b7 to 44e653a Compare August 6, 2025 21:27
@loemraw
Contributor Author

loemraw commented Aug 6, 2025

Force pushed with a v2

  • use chunk root instead of fs root
  • fix offline resize error handling
  • fix variable declarations to not have newlines
  • fix parameter list to have more important arguments first

In this case there is a difference between passing "+1G" and "1G" that would be lost if I just passed a u64. "+1G" increases the filesystem size by 1G and "1G" sets the filesystem size to 1G.

Just use s64

I'm still passing the amount as a cstring because a s64 doesn't convey the difference between "+1G" and "1G".

/* For target sizes without +/- sign prefix (e.g. 1:150g) */
if (mod == 0) {
	new_size = diff;
} else if (mod > 0) {
Check warning (Code scanning / CodeQL): Comparison result is always the same.

Comparison is always true because mod >= 1.
@kdave kdave force-pushed the devel branch 2 times, most recently from f660864 to a452b1e Compare August 8, 2025 17:39
@loemraw
Contributor Author

loemraw commented Aug 13, 2025

@adam900710 ping for v2 review

9 participants