PXB-3747: report backup size and uncompressed size#1737

Open
satya-bodapati wants to merge 4 commits into percona:8.4 from satya-bodapati:PXB-3747-8.4

Conversation


@satya-bodapati commented Apr 16, 2026

Summary

xtrabackup now embeds two byte-accurate size fields in xtrabackup_info
and mirrors them to the error log:

  • backup_size (bytes) — final bytes the backup wrote to disk
    (or to the stream). Always reported.
  • uncompressed_backup_size (bytes) — logical pre-compression
    bytes (data + redo + metadata + server-encrypted InnoDB files +
    RocksDB SSTs that skip the compressor). Reported only when
    --compress is used.

Both values are byte-perfect against the actual on-disk artifact for
every combination of --compress{,-lz4,-zstd} / --encrypt /
--stream=xbstream / --extra-lsndir / incremental / sparse.
Strict-equality asserts in t/backup_size_basic.sh and
t/backup_size_compress.sh cover 15+ scenarios end-to-end.

JIRA: https://perconadev.atlassian.net/browse/PXB-3747

How to use this feature

No new flags or options — the behavior is enabled for every backup.

  1. Final backup size (backup_size) — always present in
    xtrabackup_info, e.g.:

    backup_size = 129456789
    

    and logged:

    Backup size: 123.45 MiB (129456789 bytes)
    
  2. Uncompressed size under --compress (uncompressed_backup_size)
    added only when --compress{,-lz4,-zstd} is used:

    uncompressed_backup_size = 479123456
    

    along with two extra log lines:

    Uncompressed backup size: 456.78 MiB (479123456 bytes)
    Compression ratio: 3.70x
    

    This helps operators estimate the disk space required to extract
    a compressed backup before running xbstream -x /
    xtrabackup --decompress, and makes backups taken at different
    dates comparable regardless of compressibility.

  3. Where xtrabackup_info lives — the copy inside the backup
    directory follows the transport the user picked: it is compressed
    under --compress, encrypted under --encrypt, and wrapped in
    the xbstream under --stream=xbstream. To always obtain an
    unencrypted, uncompressed copy of xtrabackup_info, pass
    --extra-lsndir=<dir> — the file written there is plaintext
    regardless of --compress / --encrypt.

  4. Error-log output — the two (or three, under --compress)
    lines above appear at the end of the backup, just before the
    "completed OK!" banner.
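The log lines above pair a human-readable figure with the exact byte count, and the ratio line divides the logical size by the on-disk size. As an illustration only (the helper names and rounding here are assumptions, not xtrabackup's actual formatting code), the computation could look like:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a byte count as "123.46 MiB"-style text
// using 1024-based units, two decimal places.
std::string human_readable(uint64_t bytes) {
  static const char *units[] = {"bytes", "KiB", "MiB", "GiB", "TiB"};
  double v = static_cast<double>(bytes);
  int u = 0;
  while (v >= 1024.0 && u < 4) {
    v /= 1024.0;
    ++u;
  }
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.2f %s", v, units[u]);
  return buf;
}

// Compression ratio = logical (pre-compression) bytes / on-disk bytes.
double compression_ratio(uint64_t uncompressed, uint64_t compressed) {
  return compressed == 0 ? 0.0
                         : static_cast<double>(uncompressed) / compressed;
}
```

With the example values above, 479123456 / 129456789 comes out to roughly 3.70, matching the logged ratio.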

Commits

Split into two focused, bisect-friendly commits:

  1. PXB-3747 : Report final on-disk backup_size in xtrabackup_info
    — adds backup_size independent of --compress; includes
    the shared strict-size test infrastructure.
  2. PXB-3747 : Report uncompressed_backup_size for --compress backups
    — adds uncompressed_backup_size, the "Uncompressed backup size"
    log line, and the "Compression ratio" log line, wired up only
    for --compress runs.

Commit 1 design — backup_size

The datasink pipeline has exactly one authoritative component: the
leaf. Every byte that lands in the backup artifact passes
through ds_local / ds_stdout / ds_fifo. Wrapper datasinks
(encrypt, xbstream, buffer, tmpfile, compress) only transform and
forward bytes.

    ds_data (head)
         |  ds_write()
         v
    [ wrappers: xbstream / encrypt / compress / ... ]
         |  pipe_ctxt
         v
    leaf (ds_local | ds_stdout | ds_fifo)   authoritative byte count
         |
         v
    my_write() / fwrite()

Rather than adding one-off get_foo() vtable slots per metric, the
patch introduces a tiny per-datasink metrics API: each datasink
can publish named {name, value} pairs via an optional
report_metrics slot, and callers read one off a specific node
with ds_find_metric. The three leaves publish an atomic
bytes_written counter bumped on every successful write.
get_final_backup_size() walks to the leaf and reads it;
backup_size = N is written to xtrabackup_info and logged.
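To make the shape concrete, here is a minimal self-contained sketch of such a metrics API. Every name and signature below is an illustrative assumption modeled on the description, not the actual datasink.h declarations:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// A published metric: a {name, value} pair.
struct ds_metric {
  std::string name;
  uint64_t value;
};

// A leaf context owns the authoritative counter, bumped once per
// successful write.
struct leaf_ctxt {
  std::atomic<uint64_t> bytes_written{0};
};

// The optional "report_metrics" slot: the leaf publishes its counter.
void leaf_report_metrics(leaf_ctxt &ctxt, std::vector<ds_metric> &out) {
  out.push_back({"bytes_written", ctxt.bytes_written.load()});
}

// Caller-side lookup, analogous to ds_find_metric(): ask one node for
// its metrics and pick out the named one.
std::optional<uint64_t> find_metric(leaf_ctxt &ctxt, const std::string &name) {
  std::vector<ds_metric> metrics;
  leaf_report_metrics(ctxt, metrics);
  for (const auto &m : metrics)
    if (m.name == name) return m.value;
  return std::nullopt;
}

// Simulates two successful writes landing at the leaf, then reads the
// total back the way get_final_backup_size() is described to.
uint64_t demo_final_backup_size() {
  leaf_ctxt leaf;
  leaf.bytes_written += 4096;  // e.g. one data page copied
  leaf.bytes_written += 512;   // e.g. a small metadata file
  return find_metric(leaf, "bytes_written").value_or(0);
}
```

The named-pair design is what lets future metrics (checksums, stats) reuse the same slot instead of growing one vtable getter per metric.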

Tablespace_map::serialize() and the transition-key dumper are
moved ahead of backup_finish() so their bytes are included in
the reported backup_size.

Commit 2 design — uncompressed_backup_size

Why two data pipelines exist

xtrabackup maintains two top-level data pipelines because not
every file benefits from compression:

  • ds_data — normal InnoDB tablespaces, redo, metadata.
    Compressed when --compress is on.
  • ds_uncompressed_data — server-encrypted InnoDB tablespaces
    (re-compressing ciphertext is wasteful) and RocksDB SST files
    (already compressed internally). Bypasses the compress wrapper.

Both pipelines share their encrypt / xbstream / leaf tail as an
init-time context-sharing optimization:

    ds_data              --> buffer --> compress --> encrypt --> leaf
                                                                  ^
                                                                  |
    ds_uncompressed_data -----------------------> encrypt --------+

Why the counter cannot live on any ds_ctxt_t

The shared tail makes "attach a counter to the ctxt" broken no
matter which ctxt is picked:

  • Leaf ctxt. Under --stream=xbstream, ds_data, ds_redo,
    ds_meta, and ds_uncompressed_data all terminate at the same
    leaf. A leaf-ctxt counter would sum bytes from every pipeline,
    and for the compressed pipeline those bytes are post-compression
    — neither the logical total nor the on-disk total.
  • Encrypt / xbstream / buffer ctxts. ds_data and
    ds_uncompressed_data share these too. Counting there mixes
    post-compression bytes (from ds_data) with raw bytes (from
    ds_uncompressed_data); the sum is meaningless.

Per-file ds_file_t handle is the right anchor

The counter attaches where a single logical write unambiguously
enters exactly one pipeline — on the per-file handle returned by
the top-level ds_open(). Each top-level open creates a fresh
ds_file_t, the caller decides whether that file's bytes count,
and ds_write / ds_write_sparse bump the counter with the
length supplied at the entry, before any wrapper transforms it:

    xtrabackup top-level open
         |  ds_open_track_uncomp(..., xb_global_track_uncomp_bytes())
         |                             |
         |                             +-- nullptr when !--compress
         v
    ds_file_t { uncomp_bytes -> xb_uncomp_bytes_counter }
         |  ds_write(len) / ds_write_sparse(packed_len)
         |     if (file->uncomp_bytes) uncomp_bytes->add_uncompressed(len)
         v
    [ compress / encrypt / buffer / xbstream wrappers ]
         |
         v
    leaf  (already counted as backup_size by commit 1)

    xb_uncomp_bytes_counter.get_uncompressed_backup_size()
         |
         v
    uncompressed_backup_size  (xtrabackup_info + error log)

Tracking is opt-in and scoped to the top-level opens on ds_data
/ ds_redo / ds_meta / ds_uncompressed_data, so the hot-path
cost is one atomic add per write on --compress runs; non-compress
runs skip even that. The same counter aggregates both pipelines,
so server-encrypted IBDs and RocksDB SSTs are counted exactly
once, at their logical size.

xb_global_track_uncomp_bytes() returns a real counter only when
--compress is active, so non-compress runs emit neither the
field nor the log lines.
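A compact sketch of this per-file, opt-in tracking follows; the identifiers are assumptions based on the description above, not the real code:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Backup-run aggregate for pre-compression logical bytes.
struct uncomp_counter {
  std::atomic<uint64_t> total{0};
  void add_uncompressed(size_t len) { total += len; }
  uint64_t get_uncompressed_backup_size() const { return total.load(); }
};

// Per-file handle: tracking is enabled only when the top-level open
// attached a counter (i.e. only under --compress).
struct file_handle {
  uncomp_counter *uncomp_bytes = nullptr;
};

// Entry-point write: count the logical length before any wrapper
// (compress / encrypt / xbstream) transforms the bytes, then forward.
void write_counted(file_handle *file, size_t len) {
  if (file->uncomp_bytes != nullptr) {
    file->uncomp_bytes->add_uncompressed(len);
  }
  // ... forward into the wrapper chain here ...
}

// Two tracked files (one per pipeline) and one untracked file, as in
// a mixed backup: both pipelines feed the same run-level counter, so
// each logical byte is counted exactly once.
uint64_t demo_uncompressed_size() {
  uncomp_counter run_counter;
  file_handle regular{&run_counter};        // ds_data: goes through compress
  file_handle encrypted_ibd{&run_counter};  // ds_uncompressed_data: bypasses it
  file_handle untracked{};                  // non-compress run: no counter
  write_counted(&regular, 16384);
  write_counted(&encrypted_ibd, 4096);
  write_counted(&untracked, 9999);          // not counted
  return run_counter.get_uncompressed_backup_size();
}
```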

Test plan

  • t/backup_size_basic.sh — --target-dir, --stream=xbstream,
    --encrypt on both, incremental chains, sparse tablespaces, full
    restore round-trip. Every scenario asserts backup_size matches
    the on-disk / on-stream byte count exactly; uncompressed_backup_size
    must not be present.
  • t/backup_size_compress.sh — --compress target-dir,
    --compress xbstream, --compress + --encrypt, incremental
    chains with compression, RocksDB, server-encrypted InnoDB,
    redo-log encryption, sparse + --compress, and --extra-lsndir.
    Every scenario asserts backup_size matches the compressed
    on-disk size and, after decompression, uncompressed_backup_size
    matches the decompressed tree size byte-for-byte.
  • suites/reducedlock/backup_size_reduced.sh — reduced-lock
    concurrent TRUNCATE edge case (zero-size tablespace).
  • -s compression — 12/12 runnable tests pass; shared
    strict-size helpers live in inc/common.sh.

@azure-pipelines

Azure Pipelines:
1 pipeline(s) were filtered out due to trigger conditions.


@satya-bodapati changed the title from "PXB-3747: Display backup size in error log and xtrabackup_info" to "PXB-3747: Report backup size in xtrabackup error log and xtrabackup_info" on Apr 16, 2026
@satya-bodapati changed the title from "PXB-3747: Report backup size in xtrabackup error log and xtrabackup_info" to "PXB-3747: report backup size and uncompressed size" on Apr 20, 2026
@satya-bodapati force-pushed the PXB-3747-8.4 branch 3 times, most recently from 1172be9 to abcbaa3 on April 20, 2026 at 21:56

Copilot AI left a comment


Pull request overview

This PR adds byte-accurate backup size reporting to xtrabackup by plumbing size counters through the datasink pipeline and emitting the results in xtrabackup_info and the error log. It also introduces end-to-end shell tests that assert strict equality between reported sizes and on-disk/on-stream artifacts across compressed/uncompressed, encrypted, streamed, incremental, and sparse scenarios.

Changes:

  • Add backup_size reporting based on a leaf datasink byte counter (local/stdout/fifo) and log it at backup completion.
  • Add uncompressed_backup_size reporting for --compress via per-file metrics tracked at top-level ds_open() entrypoints (pre-compression logical bytes).
  • Add shared test helpers and new test suites validating strict byte-perfect invariants across many backup configurations.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
storage/innobase/xtrabackup/test/t/backup_size_compress.sh New end-to-end coverage validating backup_size + uncompressed_backup_size under --compress across multiple scenarios.
storage/innobase/xtrabackup/test/t/backup_size_basic.sh New end-to-end coverage validating backup_size for non-compress backups and asserting absence of uncompressed_backup_size.
storage/innobase/xtrabackup/test/suites/reducedlock/backup_size_reduced.sh New reduced-lock TRUNCATE regression test for byte-perfect backup_size.
storage/innobase/xtrabackup/test/inc/common.sh Adds shared helpers for parsing xtrabackup_info and performing strict size assertions.
storage/innobase/xtrabackup/src/xtrabackup.h Declares size-reporting APIs and introduces xb_get_metrics() helper for conditional tracking under --compress.
storage/innobase/xtrabackup/src/xtrabackup.cc Implements size getters, ensures tracked opens for key backup artifacts, and reorders operations so size sampling includes late-written files.
storage/innobase/xtrabackup/src/space_map.cc Switches tablespace map output to tracked open for inclusion in uncompressed metrics when compressing.
storage/innobase/xtrabackup/src/redo_log.cc Switches redo logfile creation to tracked open for uncompressed metrics.
storage/innobase/xtrabackup/src/keyring_plugins.cc Switches key dump output to tracked open for uncompressed metrics.
storage/innobase/xtrabackup/src/ds_xbstream.cc Updates datasink vtable for new API slot and ensures ds_init_file() is called.
storage/innobase/xtrabackup/src/ds_tmpfile.cc Updates datasink vtable for new API slot and ensures ds_init_file() is called.
storage/innobase/xtrabackup/src/ds_stdout.cc Adds leaf byte counting + get_bytes_written implementation for stdout leaf and ensures file initialization.
storage/innobase/xtrabackup/src/ds_local.cc Adds leaf byte counting + get_bytes_written implementation for local leaf (including sparse trailing-byte fix accounting).
storage/innobase/xtrabackup/src/ds_fifo.cc Adds leaf byte counting + get_bytes_written implementation for FIFO leaf.
storage/innobase/xtrabackup/src/ds_encrypt.cc Updates datasink vtable for new API slot and ensures ds_init_file() is called.
storage/innobase/xtrabackup/src/ds_decrypt.cc Updates datasink vtable for new API slot and ensures ds_init_file() is called.
storage/innobase/xtrabackup/src/ds_decompress_zstd.cc Updates datasink vtable for new API slot and ensures ds_init_file() is called.
storage/innobase/xtrabackup/src/ds_decompress_lz4.cc Updates datasink vtable for new API slot and ensures ds_init_file() is called.
storage/innobase/xtrabackup/src/ds_decompress.cc Updates datasink vtable for new API slot and ensures ds_init_file() is called.
storage/innobase/xtrabackup/src/ds_compress_zstd.cc Updates datasink vtable for new API slot and ensures ds_init_file() is called.
storage/innobase/xtrabackup/src/ds_compress_lz4.cc Updates datasink vtable for new API slot and ensures ds_init_file() is called.
storage/innobase/xtrabackup/src/ds_compress.cc Updates datasink vtable for new API slot and ensures ds_init_file() is called.
storage/innobase/xtrabackup/src/ds_buffer.cc Updates datasink vtable for new API slot and ensures ds_init_file() is called.
storage/innobase/xtrabackup/src/datasink.h Extends datasink interface with get_bytes_written, introduces metrics plumbing on ds_file_t, and adds tracked open helpers.
storage/innobase/xtrabackup/src/datasink.cc Implements tracked open + metrics accumulation, adds ds_leaf(), and adjusts ds_create initialization.
storage/innobase/xtrabackup/src/backup_mysql.cc Embeds backup_size and conditionally appends uncompressed_backup_size into xtrabackup_info.
storage/innobase/xtrabackup/src/backup_copy.cc Switches top-level opens to tracked opens and logs human-readable size summaries + compression ratio.


Comment thread storage/innobase/xtrabackup/src/datasink.cc Outdated
Comment thread storage/innobase/xtrabackup/src/backup_copy.cc Outdated
Comment thread storage/innobase/xtrabackup/src/datasink.cc Outdated
Comment thread storage/innobase/xtrabackup/src/datasink.cc Outdated

Copilot AI left a comment


Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 4 comments.



Comment thread storage/innobase/xtrabackup/src/datasink.cc Outdated
Comment thread storage/innobase/xtrabackup/src/ds_local.cc
Comment thread storage/innobase/xtrabackup/src/backup_copy.cc Outdated
Comment thread storage/innobase/xtrabackup/test/inc/common.sh Outdated
https://perconadev.atlassian.net/browse/PXB-3747

Problem:
--------
xtrabackup never tells the operator how large the backup it just
produced actually is on disk (or on the stream).  Automation,
retention, and capacity tooling have had to measure the artifact
out-of-band, which is fragile across --target-dir, --stream, and
--encrypt transports.

Analysis:
---------
The datasink pipeline already has exactly one authoritative
component: the leaf.  Every byte that lands in the backup artifact
passes through ds_local / ds_stdout / ds_fifo on its way out.
Wrapper datasinks (encrypt, xbstream, buffer, tmpfile, compress*)
only transform and forward bytes.

    ds_data (head)
         |  ds_write()
         v
    [ wrappers: xbstream / encrypt / compress / ... ]
         |  pipe_ctxt
         v
    leaf (ds_local | ds_stdout | ds_fifo)   authoritative byte count
         |
         v
    my_write() / fwrite()

There is no framework API for the top of the pipeline to read a
metric off the leaf, and adding one-off getters per metric would
not generalize to future needs (per-file checksums, stats, ...).

Fix:
----
Add a small per-datasink metrics API: each datasink may publish
named {name, value} pairs via an optional report_metrics vtable
slot, and callers read one off a specific node with ds_find_metric.
The three leaves publish an atomic "bytes_written" counter bumped
on successful writes.  get_final_backup_size() reads it off the
leaf; backup_size=N is written into xtrabackup_info and logged.

Move Tablespace_map::serialize() and the transition-key dumper
ahead of backup_finish() so their bytes are included in the count.

Tests:
------
t/backup_size_basic.sh covers --target-dir, --stream=xbstream,
--encrypt on both, incremental chains, sparse tablespaces, and a
full restore round-trip, with strict byte-for-byte checks.

How to use this feature:
------------------------
No flags required.  On every successful backup, xtrabackup now
embeds:

  backup_size = <bytes>

as an extra line in the xtrabackup_info file, and logs the same
value to the error log, e.g.:

  Backup size: 123.45 MiB (129456789 bytes)

xtrabackup_info inside the backup directory follows the transport
the user selected: it is compressed under --compress, encrypted
under --encrypt, and packed into the stream under --stream=xbstream.
To always obtain an unencrypted, uncompressed copy of
xtrabackup_info, pass --extra-lsndir=<dir>; the file written there
is the plaintext version.
https://perconadev.atlassian.net/browse/PXB-3747

Problem:
--------
For --compress backups, backup_size reports post-compression bytes
on disk, which says nothing about the logical volume that was
backed up.  Two artifacts of the same database taken on different
dates with different compressibility are only comparable at the
uncompressed layer, and the compression ratio is only computable
if both numbers are reported.

Analysis:
---------
Under --compress xtrabackup builds TWO pipelines, because some
files must skip compression (they are already encrypted or
pre-compressed -- re-compressing them is wasteful):

    ds_data head --> buffer --> compress --> encrypt --> leaf    [regular files]
                                                         ^
                                                         |  (shared)
                                                         v
    ds_uncompressed_data head ---------> encrypt ------> leaf    [encrypted ibd, SST]

(buffer and encrypt are optional per invocation; diagram shows a
representative full chain.)

Counting in ds_ctxt_t would double-count: the leaf sees every byte
once per head, and the leaf ctxt is shared across both heads.
Counting on the head has the opposite problem -- ds_meta aliases
the ds_data head for bookkeeping, and ds_uncompressed_data can
alias a mid-chain node under --stream, so head-level counters miss
some writes and over-count others.

The right place is ds_file_t.  Each logical top-level open produces
exactly one ds_file_t on the head of a pipeline, and that file
object sees every byte of its file exactly once, whatever the
downstream shape is.

Fix:
----
Add a backup-run aggregate (xb_uncomp_bytes_counter) and enable it
on each top-level ds_file_t via a small ds_open_track_uncomp()
wrapper.  ds_write / ds_write_sparse bump the counter when enabled;
internal wrapper opens stay on plain ds_open() so each logical byte
is counted exactly once at the top.  xtrabackup_info gains
uncompressed_backup_size=N under --compress, and the compression
ratio is logged alongside.

Tests:
------
t/backup_size_compress.sh covers compress, compress+encrypt,
compress+stream=xbstream, compress+stream+encrypt, and
--extra-lsndir variants; each scenario decrypts/decompresses in
place and asserts uncompressed_backup_size matches the plaintext
sum byte for byte.

How to use this feature:
------------------------
No flags required beyond --compress.  Under --compress, xtrabackup
additionally embeds:

  uncompressed_backup_size = <bytes>

in xtrabackup_info, and emits two extra error-log lines:

  Uncompressed backup size: 456.78 MiB (479123456 bytes)
  Compression ratio: 3.70

uncompressed_backup_size is the pre-compression logical volume
(tablespaces + redo + metadata + server-encrypted IBDs + RocksDB
SSTs that skip the compressor).  It lets operators estimate the
disk space required to extract a compressed backup before running
xbstream -x / xtrabackup --decompress, and makes backups comparable
across dates regardless of compressibility.

The field and log lines are omitted for non-compress runs since
backup_size already captures the on-disk volume in that case.
@satya-bodapati
Contributor Author

Pushed follow-up commit 7cc89789e5b — PXB-3747 : Addressing review comments. Will squash before merge.

What it addresses:

  1. fs_support_punch_hole initialization — ds_ctxt_t::pipe_ctxt now defaults to nullptr in the struct, and the five my_malloc-based sinks (local, stdout, xbstream, buffer, tmpfile) use MY_FAE | MY_ZEROFILL. xbstream_init() advertises fs_support_punch_hole = true since its wire format round-trips sparse chunks natively. ds_create() no longer overrides the capability bit.
  2. local_close() counts the 1-byte trailing-hole write into bytes_written, so sparse files under --target-dir no longer under-report by 1 byte per file.
  3. "Finsh" → "Finish" in backup_copy.cc.
  4. Reflowed the report_backup_size() doc comment so the "Compression ratio: inf" example reads as a single sentence.
  5. sum_file_bytes() in inc/common.sh early-returns 0 when the argument is not a directory, silencing the stray find(1) stderr message.

Verified by rebuilding in Debug and running t/backup_size_basic.sh and t/backup_size_compress.sh — both pass.

Comment thread storage/innobase/xtrabackup/src/datasink.cc Outdated
Comment thread storage/innobase/xtrabackup/src/ds_buffer.cc Outdated
Comment thread storage/innobase/xtrabackup/src/ds_local.cc Outdated
https://perconadev.atlassian.net/browse/PXB-3747

Follow-up to the two commits on this branch that introduced per-datasink
metrics reporting and uncompressed-backup-size tracking.  Will be squashed
before merge.

Problem:
--------------

Review raised five issues on the initial split, grouped below:

1. fs_support_punch_hole was left uninitialized for every non-local
   datasink.  ds_local_init() probes the filesystem and sets it, but
   every other init() allocated ds_ctxt_t with my_malloc() without
   MY_ZEROFILL, so the field held whichever bytes my_malloc returned.
   ds_create() previously masked this by unconditionally clobbering
   the field to false after init, which also discarded the local
   datasink's probe result (the regression we already fixed).
   That left xbstream - which natively encodes sparse chunks in its
   wire format and therefore does support punch-hole - advertising
   whatever garbage happened to be there.

2. local_close() writes one trailing zero byte to stamp out the
   final page of a sparse file, but it did not add that byte to
   ds_local_ctxt_t::bytes_written.  Backups with sparse files
   therefore under-reported backup_size by 1 byte per sparse file.

3. backup_copy.cc had a "Finsh" typo and an awkwardly reflowed doc
   comment above report_backup_size().

4. test/inc/common.sh::sum_file_bytes() ran find without a -d guard,
   so callers probing a directory that might not exist got a stray
   "No such file or directory" on stderr.

Fix/Implementation:
-------------------

1. Framework-field initialization:
   - ds_ctxt_t::pipe_ctxt gets an in-class default of nullptr (matches
     the existing fs_support_punch_hole = false default), so every
     new ds_ctxt_t site starts with a sane pipe_ctxt.
   - The sinks that allocate ds_ctxt_t via my_malloc (local, stdout,
     xbstream, buffer, tmpfile) now pass MY_FAE | MY_ZEROFILL, so
     those framework fields start at zero/false too.
   - xbstream_init() then sets fs_support_punch_hole = true because
     the xbstream wire format carries sparse chunks natively and can
     faithfully round-trip holes regardless of the extraction
     filesystem.
   - ds_create()'s normalization block is replaced with a short
     comment explaining the new contract: every init() is responsible
     for advertising its own capabilities; the framework no longer
     overrides them.

2. local_close() adds 1 to ds_local_ctxt_t::bytes_written right
   after the successful 1-byte trailing write, so backup_size stays
   authoritative for sparse files under --target-dir.

3. "Finsh" -> "Finish"; the report_backup_size() doc comment is
   reflowed so the closing parenthesis lands on the same line as
   the example it belongs to.

4. sum_file_bytes() early-returns 0 when its argument is not a
   directory, eliminating the stray find(1) error and matching the
   "0 if the directory is empty or does not exist" contract in the
   helper's own comment.

Test:
-----

- bld/ (Debug) rebuild clean.
- t/backup_size_basic.sh: passed.
- t/backup_size_compress.sh: passed.
@satya-bodapati
Contributor Author

Follow-up on c935a9d (force-pushed): dropped the redundant packed_len loop in ds_write_sparse(). The len parameter is already the packed (hole-excluded) payload size — callers pre-pack the buffer (dst_pos in write_ibd_buffer, chunk.length in xbstream.cc) and local_write_sparse walks the sparse_map writing exactly that many bytes in total. Now we just add_uncompressed(len). Verified with both t/backup_size_basic.sh and t/backup_size_compress.sh (byte-exact size assertions).


Copilot AI left a comment


Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.



Comment thread storage/innobase/xtrabackup/src/backup_copy.cc
Comment thread storage/innobase/xtrabackup/src/backup_mysql.cc

@jakub-nowakowski-percona jakub-nowakowski-percona left a comment


Approved with optional remarks.

Comment thread storage/innobase/xtrabackup/src/datasink.h
Comment thread storage/innobase/xtrabackup/src/datasink.h
Comment thread storage/innobase/xtrabackup/src/datasink.h Outdated
Comment thread storage/innobase/xtrabackup/src/datasink.h Outdated
Comment thread storage/innobase/xtrabackup/src/datasink.h Outdated
Comment thread storage/innobase/xtrabackup/src/ds_buffer.cc
https://perconadev.atlassian.net/browse/PXB-3747

Follow-up on c935a9d ("PXB-3747 : Addressing review comments").
Will be squashed before merge.

Problem:
--------
Review feedback on the datasink metrics interface:

1. ds_ctxt_t and ds_file_t had some pointer members defaulted to
   nullptr and others left raw, which relied on every allocation site
   knowing to use MY_ZEROFILL or initialise each field explicitly.

2. ds_metric::name was a bare const char *, and the non-owning
   "must-outlive-consumer" contract lived only in a prose comment.

3. ds_find_metric's @param doc leaked an implementation detail
   ("strcmp comparison"), and its name parameter was const char *.

4. report_metrics' out parameter was a pointer for "fill this
   vector", but it must never be null -- the only caller
   (ds_find_metric) always passes the address of a local stack vector.

Fix:
----
1. Default every remaining pointer in ds_ctxt_t (datasink, root, ptr)
   and ds_file_t (ptr, path, datasink, ctxt, uncomp_bytes) to
   nullptr in-class, so newly constructed ctxt/file handles are
   never observed with garbage regardless of allocation path.

2. ds_metric::name becomes std::string_view.  The ownership sentence
   is replaced with a short note that the view is non-owning (and
   typically backed by a string literal in the producer).  Producers
   do not change -- string literals implicitly convert.

3. ds_find_metric's @param doc drops "(strcmp comparison)"; its
   name parameter becomes std::string_view; the body swaps
   strcmp(m.name, name) == 0 for m.name == name.  <cstring> is no
   longer needed in datasink.cc.

4. report_metrics' out parameter becomes std::vector<ds_metric> &.
   The three leaf implementations (local, stdout, fifo) switch
   from out->push_back(...) to out.push_back(...).  The "nullptr"
   placeholders in other vtables are unaffected (a null function
   pointer is type-agnostic at assignment).
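A minimal sketch of the revised lookup described in points 2 to 4; the names come from the commit message, but the exact signatures are assumptions:

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Metric with a non-owning name, typically backed by a string literal
// in the producing datasink.
struct metric {
  std::string_view name;
  uint64_t value;
};

// The report_metrics slot now takes a reference out-parameter, since
// the only caller always supplies a local vector (never null).
using report_metrics_fn = void (*)(std::vector<metric> &out);

// A leaf implementation publishing one counter value.
void leaf_report(std::vector<metric> &out) {
  out.push_back({"bytes_written", 4608});
}

// Lookup: string_view equality replaces strcmp(), so <cstring> is no
// longer needed. A null slot means the datasink publishes no metrics.
bool find_metric(report_metrics_fn fn, std::string_view name,
                 uint64_t &value) {
  if (fn == nullptr) return false;
  std::vector<metric> metrics;
  fn(metrics);
  for (const auto &m : metrics) {
    if (m.name == name) {
      value = m.value;
      return true;
    }
  }
  return false;
}

uint64_t demo_lookup() {
  uint64_t v = 0;
  return find_metric(leaf_report, "bytes_written", v) ? v : 0;
}
```

Note how the literal `"bytes_written"` converts implicitly to string_view on both the producer and consumer sides, which is why no producer code had to change.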

Test:
-----
- bld/ (Debug) rebuild clean.
- t/backup_size_basic.sh: passed.
- t/backup_size_compress.sh: passed.
- No strcmp left in datasink.cc.
@satya-bodapati
Contributor Author

Follow-up commit 8d20d4b — PXB-3747 : Address Jakub's review comments (will squash before merge).

What it addresses from Jakub's review round:

  • J2 (datasink.h pointer init) — every remaining pointer member in ds_ctxt_t (datasink, root, ptr) and all five in ds_file_t (ptr, path, datasink, ctxt, uncomp_bytes) is now default-initialised to nullptr in-class. Newly constructed handles no longer depend on the allocation site remembering MY_ZEROFILL / explicit field init.
  • J3 + J5 (ds_metric::name and ds_find_metric doc) — ds_metric::name and ds_find_metric's name parameter are now std::string_view. The ownership comment and the "(strcmp comparison)" implementation-leak are gone. ds_find_metric uses m.name == name; strcmp / <cstring> removed from datasink.cc. Producers stay the same — string literals convert implicitly.
  • J4 (report_metrics out param) — the vtable slot is now std::vector<ds_metric> &out. The three leaf implementations (ds_local, ds_stdout, ds_fifo) use out.push_back(...) accordingly; the single caller (ds_find_metric) passes a stack-local vector. Other vtables' nullptr /* report_metrics */ placeholders are unaffected.

Left as-is with the existing replies standing: J1 (ALWAYS_INLINE) and J6 (MY_ZEROFILL).

C1 (Compression ratio 3.70 vs 3.70x): the PR description already shows 3.70x; the Commit-2 message example will be fixed at squash time.

Verified: Debug rebuild clean; t/backup_size_basic.sh and t/backup_size_compress.sh both pass (byte-exact size assertions).
