
feat: Refactor monolith into PyPI package with DazzleTreeLib integration #10

@djdarcy

feat: Refactor dazzlesum monolith into PyPI package with modular structure

Problem

dazzlesum.py is a 4,359-line monolithic file that handles CLI parsing, hashing, verification, shadow directories, progress tracking, directory walking, resume state, monolithic file writing, symlink handling, and more. As a result, the project is:

  • Hard to navigate and maintain
  • Difficult to test individual components
  • Not installable as a proper Python package (pip install dazzlesum)
  • Unable to leverage shared DazzleTools libraries (DazzleTreeLib, dazzle-filekit, unctools)
  • Missing a scripts/ subtree pointing to git-repokit-common for versioning

Proposed solution

Phase 1: PyPI package structure

Refactor the monolith into a proper package layout:

dazzlesum/
  __init__.py
  __main__.py           # python -m dazzlesum entry point
  _version.py           # canonical version (sync-versions.py compatible)
  cli.py                # argparse setup, command routing
  core/
    __init__.py
    generator.py        # ChecksumGenerator class
    verifier.py         # verification logic (currently mixed into generator)
    calculator.py       # DazzleHashCalculator + native tool wrappers
    monolithic.py       # MonolithicWriter
  tree/
    __init__.py
    walker.py           # FIFODirectoryWalker
    counter.py          # count_dirs_and_files
    cache.py            # SQLite tree cache (from #9)
    shadow.py           # ShadowPathResolver
  progress/
    __init__.py
    tracker.py          # ProgressTracker (with throughput display)
    summary.py          # SummaryCollector
  utils/
    __init__.py
    symlinks.py         # SymlinkHandler, junction detection
    patterns.py         # include/exclude pattern matching
    encoding.py         # UTF-8 subprocess handling
  pyproject.toml        # PEP 621 metadata, entry points
  scripts/              # git-repokit-common subtree
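
As a rough illustration of how cli.py and __main__.py could fit together in this layout, here is a minimal sketch. The subcommand names (create, verify) and their arguments are placeholders rather than the actual dazzlesum CLI surface, and the dispatch is stubbed out. The same main() could also be exposed as a console script via a [project.scripts] entry such as dazzlesum = "dazzlesum.cli:main" in pyproject.toml.

```python
"""Sketch of dazzlesum/cli.py (subcommands and arguments are illustrative)."""
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the top-level parser with one sub-parser per command."""
    parser = argparse.ArgumentParser(prog="dazzlesum")
    sub = parser.add_subparsers(dest="command", required=True)

    # Placeholder commands: the real tool's commands and flags may differ.
    create = sub.add_parser("create", help="generate checksums")
    create.add_argument("path", nargs="?", default=".")

    verify = sub.add_parser("verify", help="verify existing checksums")
    verify.add_argument("path", nargs="?", default=".")
    return parser


def main(argv=None) -> int:
    """Parse arguments and route to a command; returns a process exit code."""
    args = build_parser().parse_args(argv)
    # A real cli.py would dispatch to core.generator / core.verifier here.
    print(f"{args.command}: {args.path}")
    return 0


# dazzlesum/__main__.py would then only need:
#     from dazzlesum.cli import main
#     raise SystemExit(main())
```

Keeping main(argv) free of direct sys.argv access makes the CLI callable from tests, which helps with the "all existing CLI commands work identically" criterion.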

Phase 2: git-repokit-common integration

Add scripts/ subtree from git-repokit-common for:

  • sync-versions.py and version management
  • Pre-commit/post-commit hooks
  • GitHub tools (gh_issue_full.py, gh_sub_issues.py)

Phase 3: DazzleTreeLib integration

Replace the internal FIFODirectoryWalker and count_dirs_and_files with DazzleTreeLib (C:\code\DazzleTreeLib):

  • Universal adapter system for tree traversal
  • 4-5x caching speedup for repeated traversals
  • Async support for parallel directory enumeration
  • Memory-efficient streaming iterators for large trees (3.3M+ files)

This is particularly relevant for the SQLite cache feature (#9) -- DazzleTreeLib's caching layer could provide the fast-load tree that dazzlesum needs.
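
One way to keep the swap low-risk is to put the walker behind a small interface first, so the existing FIFO walker and a later DazzleTreeLib-backed adapter are interchangeable. The sketch below is an assumption about how that seam could look: TreeWalker and FifoWalker are hypothetical names, the count_dirs_and_files signature is invented for illustration, and no DazzleTreeLib API is shown because its interface is not described here.

```python
import os
from collections import deque
from typing import Iterator, List, Protocol, Tuple

# (directory path, sub-directory names, file names)
WalkEntry = Tuple[str, List[str], List[str]]


class TreeWalker(Protocol):
    """Hypothetical seam: anything that can enumerate a tree level by level."""

    def walk(self, root: str) -> Iterator[WalkEntry]: ...


class FifoWalker:
    """Breadth-first walk driven by a FIFO queue (stand-in for the current walker)."""

    def walk(self, root: str) -> Iterator[WalkEntry]:
        queue = deque([root])
        while queue:
            current = queue.popleft()
            dirs: List[str] = []
            files: List[str] = []
            try:
                with os.scandir(current) as entries:
                    for entry in entries:
                        # Symlinked directories are not followed; they are
                        # listed with the files in this sketch.
                        if entry.is_dir(follow_symlinks=False):
                            dirs.append(entry.name)
                        else:
                            files.append(entry.name)
            except OSError:
                continue  # unreadable directory: skip it
            dirs.sort()
            files.sort()
            queue.extend(os.path.join(current, name) for name in dirs)
            yield current, dirs, files


def count_dirs_and_files(walker: TreeWalker, root: str) -> Tuple[int, int]:
    """Count directories and files below root using any TreeWalker."""
    n_dirs = n_files = 0
    for _, dirs, files in walker.walk(root):
        n_dirs += len(dirs)
        n_files += len(files)
    return n_dirs, n_files
```

A DazzleTreeLib adapter would then only need to provide the same walk() method, and callers such as the counter would not change.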

Phase 4: Shared library integration

Use existing DazzleTools shared libraries instead of rolling our own:

  • dazzle-filekit (C:\code\filetoolkit / C:\proj\dazzlelib\dazzle-filekit): Path handling, normalization
  • unctools (C:\code\unctools): UNC path conversion, drive letter mapping

Performance testing needed

  • Benchmark DazzleTreeLib vs current FIFODirectoryWalker on D:\M (180K dirs, 3.3M files)
  • Benchmark DazzleTreeLib async vs sync traversal on network shares
  • Benchmark native tools (certutil, sha256sum) vs Python hashlib vs DazzleTreeLib cached hashes
  • Profile the current monolith to identify actual bottlenecks (is it I/O bound or CPU bound?)
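
For the I/O-bound versus CPU-bound question, a first approximation is to time reading a file with and without hashing it. The helper names below are hypothetical and this is only a sketch: it uses hashlib's SHA-256 and time.perf_counter, and it does not control for the OS page cache, so a second read of the same file will usually look much faster than the first.

```python
import hashlib
import time


def hash_file(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def time_read_only(path: str, chunk_size: int = 1 << 20) -> float:
    """Seconds spent reading the file without hashing (the I/O share)."""
    start = time.perf_counter()
    with open(path, "rb") as fh:
        while fh.read(chunk_size):
            pass
    return time.perf_counter() - start


def time_read_and_hash(path: str, chunk_size: int = 1 << 20) -> float:
    """Seconds spent reading and hashing the file (I/O plus CPU)."""
    start = time.perf_counter()
    hash_file(path, chunk_size)
    return time.perf_counter() - start


def compare(path: str) -> dict:
    """Return both timings so callers can see how much hashing adds."""
    return {
        "read_only_s": time_read_only(path),
        "read_and_hash_s": time_read_and_hash(path),
    }
```

If the two timings are close on a cold cache, the workload is likely I/O bound and faster hashing will not help much; if hashing adds a large share, native tools or parallel hashing are worth benchmarking.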

Where should library-sha-snapshot.py live?

It currently lives in D:\M\Software\...\__Scripts and DeDrm\. Options:

  • Keep as D:\M-specific wrapper (current location)
  • Move into dazzlesum as an example/contrib script
  • Generalize into a dazzlesum profile or dazzlesum preset feature

Acceptance criteria

  • Installable via pip install dazzlesum (or pip install -e . for development)
  • python -m dazzlesum works as entry point
  • All existing CLI commands work identically after refactoring
  • scripts/ subtree from git-repokit-common with working hooks
  • DazzleTreeLib used for tree traversal (or clear path to integration)
  • Existing tests pass against modular structure
  • PyPI-ready pyproject.toml with proper metadata

Related

Analysis

See 2026-04-07__15-29-07__dev-workflow-process_cached-tree-and-incremental-hashing.md for tree caching design.
See 2026-04-07__08-23-05__dev-workflow-process_verify-exclude-and-shadow-path-fix.md for recent bug fix context.

Metadata

    Labels

    enhancement (New feature or request)
