
write_derived: skip metadata regeneration unless csv/spec-list actually change #255

Open
andersone1 wants to merge 12 commits into main from check-if-need-update

andersone1 (Collaborator) commented Feb 13, 2026

Summary

  • Refactored write_derived() to avoid unnecessary metadata regeneration and reduce noisy VCS diffs.
  • Replaced legacy diff-file behavior with a structured last-run-summary.txt output (also mirrored to console).
  • Added spec-diff reporting (added/removed variables and changed spec fields).
  • Added legacy metadata migration handling when diffs.csv is detected.
  • Expanded test coverage substantially across new and existing behavior.
  • Bumped package version to 0.13.0.9001.

Key Changes

  • write_derived() now always writes CSV + spec-list.yml, but only regenerates XPT/define when CSV/spec-list content actually changes (MD5-based .needs_update).
  • Data diffs are only recomputed when .needs_update is TRUE.
  • New summary output file: last-run-summary.txt, containing:
    • generated timestamp/user
    • data-change section
    • spec-change section
  • If no data/spec diffs are found, the summary file is not updated.
  • Legacy migration path:
    • if metadata folder contains diffs.csv, warn and recreate metadata folder
    • preserve prior spec baseline so spec diffs still report correctly after migration
  • Added internal helpers:
    • execute_spec_diffs() for spec-list comparison
    • build_run_summary_lines() for consistent summary formatting
  • Updated execute_data_diffs() with .print_output to support reuse without duplicate console printing.
  • Updated docs for write_derived() and execute_data_diffs().
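
For reviewers, the spec-diff reporting (added/removed variables and changed spec fields) can be pictured roughly as follows. This is a minimal base-R sketch, not the code in this PR: `spec_diff()` and the example specs are hypothetical, and the real `execute_spec_diffs()` may differ in signature and output.

```r
# Hypothetical helper: compare two spec lists, each a named list of
# per-variable field lists (as one might get from reading spec-list.yml).
spec_diff <- function(old_spec, new_spec) {
  added   <- setdiff(names(new_spec), names(old_spec))
  removed <- setdiff(names(old_spec), names(new_spec))
  common  <- intersect(names(old_spec), names(new_spec))

  # For each variable present in both specs, keep the fields that differ.
  changed <- lapply(common, function(v) {
    fields <- union(names(old_spec[[v]]), names(new_spec[[v]]))
    Filter(function(f) !identical(old_spec[[v]][[f]], new_spec[[v]][[f]]), fields)
  })
  names(changed) <- common
  changed <- changed[lengths(changed) > 0]

  list(added = added, removed = removed, changed = changed)
}

# Made-up example specs for illustration only.
old_spec <- list(
  ID  = list(type = "integer", label = "Subject ID"),
  DV  = list(type = "numeric", label = "Concentration"),
  SEX = list(type = "integer", label = "Sex")
)
new_spec <- list(
  ID = list(type = "integer", label = "Subject ID"),
  DV = list(type = "numeric", label = "Observed concentration"),
  WT = list(type = "numeric", label = "Weight")
)

result <- spec_diff(old_spec, new_spec)
```

Here `result$added` holds the new variable, `result$removed` the dropped one, and `result$changed` maps each shared variable to the names of the fields that differ.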

…ly change

  - Always write the output CSV and `spec-list.yml`.
  - Determine `.needs_update` from old vs new MD5 of:
    - `.file` (CSV)
    - `<meta>/spec-list.yml`

  `.needs_update` is TRUE in these cases:
  1. CSV did not exist before this call.
  2. `spec-list.yml` did not exist before this call.
  3. CSV content changed.
  4. `spec-list.yml` content changed.
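
  The four cases above reduce to "checksum was missing before, or differs after". A minimal sketch of that check, assuming base R's `tools::md5sum()`; `file_md5()` and `needs_update()` are hypothetical names and the file contents are made up, so this is not the implementation in this commit:

```r
# Hypothetical helper: MD5 of a file, or NA if the file does not exist yet.
file_md5 <- function(path) {
  if (!file.exists(path)) return(NA_character_)
  unname(tools::md5sum(path))
}

# Hypothetical helper: TRUE when any file was missing before the write
# (cases 1 and 2) or any checksum differs afterwards (cases 3 and 4).
needs_update <- function(old_md5, new_md5) {
  any(is.na(old_md5)) || any(old_md5 != new_md5)
}

# Temporary files stand in for the CSV and <meta>/spec-list.yml.
out_dir <- tempfile("derived-")
dir.create(out_dir)
csv  <- file.path(out_dir, "pk.csv")
spec <- file.path(out_dir, "spec-list.yml")

# First run: neither file exists, so both old checksums are NA.
old <- c(file_md5(csv), file_md5(spec))
writeLines("ID,DV\n1,0.5", csv)
writeLines("ID:\n  type: integer", spec)
new <- c(file_md5(csv), file_md5(spec))
first_run <- needs_update(old, new)

# Rerun with identical content: the files are rewritten but the MD5s match.
old <- new
writeLines("ID,DV\n1,0.5", csv)
writeLines("ID:\n  type: integer", spec)
rerun <- needs_update(old, c(file_md5(csv), file_md5(spec)))

# Rerun with changed CSV content.
writeLines("ID,DV\n1,0.6", csv)
changed <- needs_update(old, c(file_md5(csv), file_md5(spec)))
```

  Comparing checksums rather than modification times matters here because the CSV and `spec-list.yml` are always rewritten, so their mtimes change on every call.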

  When `.needs_update` is TRUE:
  - regenerate XPT
  - rerender define output
  - run diff generation only if `base_df` exists and `.execute_diffs` is TRUE
  - write `diffs.csv` only if generated diffs are non-empty (no delete/overwrite when empty)
  - as a result, `diffs.csv` reflects the most recent run that produced at least one diff

  When `.needs_update` is FALSE:
  - skip XPT and define regeneration
  - skip diff generation (even if `.prev_file` changes)
  - keep any existing `diffs.csv` unchanged (no delete/overwrite)

  tests:
  - assert `diffs.csv` is not created when no diffs are found
  - assert XPT is not rewritten when rerunning with unchanged CSV/spec-list
  - assert existing `diffs.csv` is preserved (mtime unchanged) when no new diffs are generated
  - assert diff generation is skipped when `.needs_update` is FALSE
…leanup

  - write human-readable latest-data-diff.txt (timestamp/user header + same diff rows)
  - keep skip-if-unchanged behavior intact
  - if metadata contains legacy diffs.csv, show clear cli warning and recreate folder
  - update docs and tests for new output + backward-compatibility migration
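
  The legacy migration step could look roughly like the sketch below. It is an illustration under assumptions: `migrate_legacy_meta()` is a hypothetical name, base `warning()` stands in for the cli warning, and keeping the prior `spec-list.yml` lines in memory is just one way to preserve the spec baseline.

```r
# Hypothetical helper: if a legacy diffs.csv is present, warn, remember the
# previous spec-list.yml contents, and recreate the metadata folder.
migrate_legacy_meta <- function(meta_dir) {
  if (!file.exists(file.path(meta_dir, "diffs.csv"))) {
    return(invisible(NULL))  # nothing to migrate
  }

  spec_path  <- file.path(meta_dir, "spec-list.yml")
  prior_spec <- if (file.exists(spec_path)) readLines(spec_path) else NULL

  warning("Legacy diffs.csv found in ", meta_dir,
          "; recreating metadata folder.", call. = FALSE)
  unlink(meta_dir, recursive = TRUE)
  dir.create(meta_dir, recursive = TRUE)

  invisible(prior_spec)  # baseline for the later spec comparison
}

# Usage with a throwaway folder containing made-up legacy files.
meta <- file.path(tempfile("meta-"), "pk")
dir.create(meta, recursive = TRUE)
writeLines("name,value", file.path(meta, "diffs.csv"))
writeLines("ID:\n  type: integer", file.path(meta, "spec-list.yml"))

baseline <- suppressWarnings(migrate_legacy_meta(meta))
```

  After the call the folder exists but is empty, and `baseline` still holds the old spec lines so a spec diff can be reported against them.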