feat: Refactor dazzlesum monolith into PyPI package with modular structure
Problem
dazzlesum.py is a 4,359-line monolithic file that handles: CLI parsing, hashing, verification, shadow directories, progress tracking, directory walking, resume state, monolithic file writing, symlink handling, and more. This makes it:
- Hard to navigate and maintain
- Difficult to test individual components
- Not installable as a proper Python package (pip install dazzlesum)
- Unable to leverage shared DazzleTools libraries (DazzleTreeLib, dazzle-filekit, unctools)
- Missing a scripts/ subtree pointing to git-repokit-common for versioning
Proposed solution
Phase 1: PyPI package structure
Refactor the monolith into a proper package layout:
dazzlesum/
    __init__.py
    __main__.py  # python -m dazzlesum entry point
    _version.py  # canonical version (sync-versions.py compatible)
    cli.py  # argparse setup, command routing
    core/
        __init__.py
        generator.py  # ChecksumGenerator class
        verifier.py  # verification logic (currently mixed into generator)
        calculator.py  # DazzleHashCalculator + native tool wrappers
        monolithic.py  # MonolithicWriter
    tree/
        __init__.py
        walker.py  # FIFODirectoryWalker
        counter.py  # count_dirs_and_files
        cache.py  # SQLite tree cache (from #9)
        shadow.py  # ShadowPathResolver
    progress/
        __init__.py
        tracker.py  # ProgressTracker (with throughput display)
        summary.py  # SummaryCollector
    utils/
        __init__.py
        symlinks.py  # SymlinkHandler, junction detection
        patterns.py  # include/exclude pattern matching
        encoding.py  # UTF-8 subprocess handling
pyproject.toml  # PEP 621 metadata, entry points
scripts/  # git-repokit-common subtree
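As a rough illustration of how cli.py and __main__.py could fit together, here is a minimal sketch. The subcommand names (create, verify), the path argument, and the placeholder version string are assumptions for illustration only and are not taken from the current dazzlesum.py.

```python
# Sketch of what dazzlesum/cli.py might contain (hypothetical subcommands).
import argparse

__version__ = "0.0.0"  # placeholder; the real value would come from _version.py


def build_parser() -> argparse.ArgumentParser:
    """Build the top-level parser with one sub-parser per command."""
    parser = argparse.ArgumentParser(prog="dazzlesum")
    parser.add_argument(
        "--version", action="version", version=f"%(prog)s {__version__}"
    )
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create", help="generate checksums (hypothetical)")
    create.add_argument("path", nargs="?", default=".")

    verify = sub.add_parser("verify", help="verify checksums (hypothetical)")
    verify.add_argument("path", nargs="?", default=".")
    return parser


def main(argv=None) -> int:
    """Parse arguments and route to a command; returns a process exit code."""
    args = build_parser().parse_args(argv)
    # Real code would dispatch to core.generator / core.verifier here.
    print(f"{args.command}: {args.path}")
    return 0


# dazzlesum/__main__.py would then only need:
#     import sys
#     from dazzlesum.cli import main
#     sys.exit(main())
```

Keeping main(argv) free of direct sys.argv access makes the command routing testable without spawning a subprocess, and the same function can be referenced from a console-script entry point in pyproject.toml.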
Phase 2: git-repokit-common integration
Add scripts/ subtree from git-repokit-common for:
- sync-versions.py and version management
- Pre-commit/post-commit hooks
- GitHub tools (gh_issue_full.py, gh_sub_issues.py)
Phase 3: DazzleTreeLib integration
Replace the internal FIFODirectoryWalker and count_dirs_and_files with DazzleTreeLib (C:\code\DazzleTreeLib):
- Universal adapter system for tree traversal
- 4-5x caching speedup for repeated traversals
- Async support for parallel directory enumeration
- Memory-efficient streaming iterators for large trees (3.3M+ files)
This is particularly relevant for the SQLite cache feature (#9) -- DazzleTreeLib's caching layer could provide the fast-load tree that dazzlesum needs.
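For comparison purposes, the behaviour that any replacement has to match can be approximated with a small standard-library baseline. The sketch below is not the existing FIFODirectoryWalker and does not use the DazzleTreeLib API; walk_fifo and count_tree are hypothetical names, and the choice to skip symlinks entirely is an assumption.

```python
import os
from collections import deque
from typing import Iterator, List, Tuple


def walk_fifo(root: str) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (directory, subdir names, file names) in breadth-first (FIFO) order."""
    queue = deque([root])
    while queue:
        current = queue.popleft()
        dirs: List[str] = []
        files: List[str] = []
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # follow_symlinks=False: symlinked dirs and files are skipped
                    if entry.is_dir(follow_symlinks=False):
                        dirs.append(entry.name)
                    elif entry.is_file(follow_symlinks=False):
                        files.append(entry.name)
        except OSError:
            continue  # unreadable directory: skip it rather than abort the walk
        dirs.sort()
        files.sort()
        yield current, dirs, files
        # Children go to the back of the queue, so siblings are visited first.
        queue.extend(os.path.join(current, d) for d in dirs)


def count_tree(root: str) -> Tuple[int, int]:
    """Return (directory count including root, regular file count)."""
    n_dirs = n_files = 0
    for _, _, files in walk_fifo(root):
        n_dirs += 1
        n_files += len(files)
    return n_dirs, n_files
```

A baseline like this gives the benchmarks something stable to compare against, and the generator interface is close to what an adapter around a library walker would need to expose.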
Phase 4: Shared library integration
Use existing DazzleTools shared libraries instead of rolling our own:
- dazzle-filekit (C:\code\filetoolkit / C:\proj\dazzlelib\dazzle-filekit): Path handling, normalization
- unctools (C:\code\unctools): UNC path conversion, drive letter mapping
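To make the scope of the path handling concrete, the sketch below shows the kind of classification and normalization involved, using only pathlib from the standard library. It does not use the dazzle-filekit or unctools APIs; classify_windows_path and normalize_separators are hypothetical helper names.

```python
from pathlib import PureWindowsPath


def classify_windows_path(path: str) -> str:
    """Classify a Windows-style path as 'unc', 'drive', or 'relative'."""
    drive = PureWindowsPath(path).drive
    if drive.startswith("\\\\"):
        return "unc"  # e.g. \\server\share\...
    if len(drive) == 2 and drive[1] == ":":
        return "drive"  # e.g. D:\...
    return "relative"


def normalize_separators(path: str) -> str:
    """Return the path with backslash separators, as pathlib renders it."""
    return str(PureWindowsPath(path))
```

PureWindowsPath works on any host OS, so helpers like these can be unit-tested on Linux CI even though the paths they describe are Windows-specific. Mapping a drive letter to its UNC target needs OS calls and is the part where unctools would actually be required.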
Performance testing needed
- Benchmark DazzleTreeLib vs current FIFODirectoryWalker on D:\M (180K dirs, 3.3M files)
- Benchmark DazzleTreeLib async vs sync traversal on network shares
- Benchmark native tools (certutil, sha256sum) vs Python hashlib vs DazzleTreeLib cached hashes
- Profile the current monolith to identify actual bottlenecks (is it I/O bound or CPU bound?)
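A small timing harness along these lines could drive the comparisons above. It is only a sketch: sha256_file, best_time, compare, and cpu_fraction are hypothetical helpers, the 1 MiB chunk size is an arbitrary assumption, and the CPU fraction is a rough indicator that does not account for time spent in child processes such as certutil or sha256sum.

```python
import hashlib
import time
from typing import Callable, Dict


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file with hashlib, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def best_time(func: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(candidates: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Time each named candidate and return {name: best seconds}."""
    return {name: best_time(func, repeats) for name, func in candidates.items()}


def cpu_fraction(func: Callable[[], object]) -> float:
    """Ratio of this process's CPU time to wall time for one call of func."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall if wall > 0 else 0.0
```

A cpu_fraction close to 1.0 suggests the work is CPU bound, while a value well below 1.0 suggests the process is mostly waiting on I/O. Repeated runs over the same files will also hit the OS file cache, so first-run and warm-run numbers should be recorded separately.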
Where does library-sha-snapshot.py live?
Currently in D:\M\Software\...\__Scripts and DeDrm\. Options:
- Keep as D:\M-specific wrapper (current location)
- Move into dazzlesum as an example/contrib script
- Generalize into a dazzlesum profile or dazzlesum preset feature
Acceptance criteria
- pip install dazzlesum (or pip install -e . for development)
- python -m dazzlesum works as entry point
- scripts/ subtree from git-repokit-common with working hooks
- pyproject.toml with proper metadata
Related
Analysis
See 2026-04-07__15-29-07__dev-workflow-process_cached-tree-and-incremental-hashing.md for tree caching design.
See 2026-04-07__08-23-05__dev-workflow-process_verify-exclude-and-shadow-path-fix.md for recent bug fix context.