refactor: modularize large files across codebase #188

Merged
Goldziher merged 7 commits into main from refactor-large-files
Jan 19, 2026
@Goldziher
Collaborator

Summary

  • Split files over 500 lines into smaller, focused modules across the codebase
  • Targets ~500 lines per file with single-responsibility modules
  • No functional changes - pure refactoring for maintainability

Phase 1 - Binding crates

  • Node.js: Split lib.rs into enums, handles, options, types/, visitor/
  • Python: Split lib.rs into conversion/, handles, helpers, options, types/, visitor/
  • PHP: Split lib.rs into build, enums, options, types
  • WASM: Split lib.rs into options, types
  • Ruby: Split lib.rs into conversion/, options, types, visitor/
  • Elixir: Split lib.rs into options, types

Phase 2 - Core Rust engine

  • Split converter/inline/semantic.rs into marks.rs and typography.rs
  • Split options.rs into conversion.rs, inline_image.rs, preprocessing.rs
  • Split metadata.rs into collector.rs, config.rs, extraction.rs
  • Split hocr/converter.rs into core.rs, elements.rs, hierarchy.rs
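A split like the options.rs one above can leave the public API unchanged if the parent module re-exports the moved types. The following is a minimal sketch under that assumption: the struct names and fields are hypothetical, and inline `mod` blocks stand in for what would be separate files (for example options/conversion.rs and options/preprocessing.rs).

```rust
// Hypothetical sketch: type names and fields are illustrative, not taken
// from the PR. Each inline `mod` stands in for a separate file.
pub mod options {
    pub mod conversion {
        use super::preprocessing::PreprocessingOptions;

        /// Top-level options; kept in its own focused module.
        #[derive(Debug, Clone, Default, PartialEq)]
        pub struct ConversionOptions {
            pub wrap_width: usize,
            pub preprocessing: PreprocessingOptions,
        }
    }

    pub mod preprocessing {
        /// Preprocessing settings; split out of the former single file.
        #[derive(Debug, Clone, Default, PartialEq)]
        pub struct PreprocessingOptions {
            pub enabled: bool,
        }
    }

    // Re-exports keep the pre-split paths working, so callers can still
    // write `options::ConversionOptions` after the move.
    pub use self::conversion::ConversionOptions;
    pub use self::preprocessing::PreprocessingOptions;
}
```

With re-exports in place, existing call sites that name `options::ConversionOptions` compile without edits, which is what makes such a split a pure refactor.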

Phase 3 - FFI and CLI

  • Split FFI visitor into callbacks_core.rs, lifecycle.rs, visitor_impl.rs
  • Split CLI main.rs into args.rs, convert.rs, output.rs, validators.rs

Test plan

  • All pre-commit hooks pass (prek run --all-files)
  • Cargo check passes
  • Clippy passes with no warnings
  • CI tests pass

🤖 Generated with Claude Code

Split files over 500 lines into smaller, focused modules:

Phase 1 - Binding crates:
- Node.js: Split lib.rs into enums, handles, options, types/, visitor/
- Python: Split lib.rs into conversion/, handles, helpers, options,
  types/, visitor/
- PHP: Split lib.rs into build, enums, options, types
- WASM: Split lib.rs into options, types
- Ruby: Split lib.rs into conversion/, options, types, visitor/
- Elixir: Split lib.rs into conversion/, options, types, visitor/

Phase 2 - Core Rust engine:
- Split converter/inline/semantic.rs into marks.rs and typography.rs
- Split visitor_helpers/callbacks.rs into macros.rs, bridge.rs
- Split options.rs into conversion.rs, inline_image.rs, preprocessing.rs
- Split metadata.rs into collector.rs, config.rs, extraction.rs
- Split hocr/converter.rs into core.rs, elements.rs, hierarchy.rs

Phase 3 - FFI and CLI:
- Split FFI visitor into callbacks_core.rs, lifecycle.rs, visitor_impl.rs
- Split CLI main.rs into args.rs, convert.rs, output.rs, validators.rs

All files now target ~500 lines max with single-responsibility modules.

Fixes the "RefCell already borrowed" panic at text_node.rs:219:42 by
releasing visitor borrows before match statements. This prevents
conflicts when walk_node recursively processes children that need to
borrow the same visitor.
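
The fix described above can be sketched as follows. This is an illustration, not the PR's code: `Node`, `Decision`, and `Visitor` are hypothetical types, and only the `walk_node` name comes from the commit message. One common way this panic arises is calling `borrow_mut()` directly in a match scrutinee, because that temporary guard lives until the end of the whole match, including any recursion inside its arms.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical tree and visitor types, for illustration only.
pub enum Node {
    Text(String),
    Element(Vec<Node>),
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Continue,
    Skip,
}

#[derive(Default)]
pub struct Visitor {
    pub seen: Vec<String>,
}

impl Visitor {
    /// Records the node and decides whether the walk should descend.
    pub fn visit(&mut self, node: &Node) -> Decision {
        match node {
            Node::Text(t) if t.is_empty() => Decision::Skip,
            Node::Text(t) => {
                self.seen.push(t.clone());
                Decision::Continue
            }
            Node::Element(_) => {
                self.seen.push("<element>".to_string());
                Decision::Continue
            }
        }
    }
}

pub fn walk_node(node: &Node, visitor: &Rc<RefCell<Visitor>>) {
    // Scoped block: the `RefMut` guard is dropped at the closing brace,
    // so no borrow is alive while the match below recurses.
    // Writing `match visitor.borrow_mut().visit(node) { ... }` instead
    // would keep the guard alive for the whole match and panic on the
    // first nested `borrow_mut()`.
    let decision = {
        let mut guard = visitor.borrow_mut();
        guard.visit(node)
    };

    match decision {
        Decision::Skip => {}
        Decision::Continue => {
            if let Node::Element(children) = node {
                for child in children {
                    // Each recursive call takes its own short-lived borrow.
                    walk_node(child, visitor);
                }
            }
        }
    }
}
```

Binding the result to a local first means the match only inspects an owned value, so the recursion depth no longer matters for borrow safety.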

Pattern applied across 15 files:
- Release borrow in scoped block before match
- Prevents nested borrows during recursive tree walking

The c_char import is only used in tests behind #[cfg(feature = "metadata")],
so the import itself needs the same feature gate.
- Gate Arc import behind async-visitor feature in handles.rs
- Gate metadata type imports behind metadata feature in types/mod.rs
- Gate metadata module imports behind metadata feature in metadata.rs
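
The gating described above can be sketched like this. The `metadata` feature name and the `c_char` import come from the commit message; the two functions are hypothetical stand-ins for the real users of the import.

```rust
// The import carries the same gate as its only user, so a build without
// the feature has no unused-import warning.
#[cfg(feature = "metadata")]
use std::os::raw::c_char;

// Hypothetical user of the gated import; compiled only with the feature.
#[cfg(feature = "metadata")]
pub fn ascii_to_c_char(byte: u8) -> c_char {
    byte as c_char
}

/// Always compiled; reports whether the gated items above were built in.
pub fn metadata_enabled() -> bool {
    cfg!(feature = "metadata")
}
```

In a Cargo project the feature would also be declared in the `[features]` table of Cargo.toml, and the gated path would be exercised with `cargo check --features metadata`.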
@Goldziher force-pushed the refactor-large-files branch from 6f3ff05 to f68ac36 on January 19, 2026 15:41
@Goldziher merged commit 789ae3b into main on Jan 19, 2026
38 of 47 checks passed
@Goldziher deleted the refactor-large-files branch on January 19, 2026 16:28