nightshift: tech-debt-classify — test gaps, god modules, missing docs #27

@nightshift-micr

nightshift: tech-debt-classify — Microck/traccia

Automated tech debt classification by nightshift.

Summary

Analyzed 19 source modules (6,710 lines) and 8 test files. The codebase is well structured and return type annotations are nearly complete, but it has significant gaps in three areas: test coverage (10 of 17 modules lack tests), function documentation (almost no docstrings outside cli.py), and function size (several oversized functions should be decomposed).


🔴 High Severity

1. Critical test coverage gaps — 10 modules untested

Effort: High | Severity: High

The following source modules have no corresponding test file:

| Module | Lines | Functions |
| --- | --- | --- |
| rendering.py | 930 | 26 |
| storage.py | 517 | 28 |
| llm.py | 506 | 33 |
| family_normalizer.py | 449 | 18 |
| cli.py | 432 | 31 |
| bootstrap.py | 390 | 4 |
| document_normalizer.py | 287 | 11 |
| extraction.py | 192 | 8 |
| pipeline_support.py | 163 | 8 |
| taxonomy.py | 123 | 1 |
| config.py | 96 | 4 |
| utils.py | 45 | 8 |

Recommendation: Prioritize storage.py, llm.py, and rendering.py — they are the largest untested modules with critical I/O and data transformation logic.
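As a starting point, a first storage test could be a simple round trip through a temporary directory. The sketch below is illustrative only: `save_graph` and `load_graph` are hypothetical stand-ins defined inline, not the real storage.py API (which this report does not show), and the JSON format is an assumption made to keep the example self-contained.

```python
import json
from pathlib import Path


# Hypothetical stand-ins: replace with imports from traccia.storage
# once the real function names and signatures are confirmed.
def save_graph(path: Path, graph: dict) -> None:
    """Write the graph to disk as JSON (assumed format, for illustration)."""
    path.write_text(json.dumps(graph, sort_keys=True), encoding="utf-8")


def load_graph(path: Path) -> dict:
    """Read a graph previously written by save_graph."""
    return json.loads(path.read_text(encoding="utf-8"))


def test_graph_round_trip(tmp_path: Path) -> None:
    # tmp_path is pytest's built-in per-test temporary directory fixture.
    graph = {"nodes": ["python", "sql"], "edges": [["python", "sql"]]}
    target = tmp_path / "graph.json"

    save_graph(target, graph)

    assert load_graph(target) == graph
```

Round-trip tests like this are cheap to write and tend to catch serialization and path-handling regressions early, which is why they are a reasonable first target for an untested I/O module.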

2. Oversized functions needing decomposition

Effort: Medium | Severity: High

Several functions exceed 50 lines significantly, making them hard to test and maintain:

| File | Function | Lines |
| --- | --- | --- |
| rendering.py | _write_viewer | 222 |
| pipeline.py | recompute_graph | 179 |
| pipeline.py | ingest_directory | 135 |
| storage.py | replace_graph | 98 |
| rendering.py | _write_node_pages | 86 |
| rendering.py | _write_obsidian_skill_notes | 87 |
| rendering.py | _write_obsidian_evidence_notes | 54 |
| rendering.py | _write_profile | 51 |
| pipeline.py | _build_person_skill_state | 80 |
| parsers.py | _parse_reddit_export | 72 |
| parsers.py | _parse_source_content | 70 |
| cli.py | doctor | 52 |

Recommendation: Break recompute_graph (179 lines) and _write_viewer (222 lines) into smaller composable functions. Each sub-function should be independently testable.
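The shape of such a decomposition might look like the sketch below. It does not reflect the actual body of recompute_graph, which this report does not show. The `Evidence` type and the three step functions are hypothetical and only illustrate the pattern: each step is a small pure function, and the orchestrator just wires them together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical input record; the real data model may differ."""
    skill: str
    weight: float


def group_by_skill(items: list[Evidence]) -> dict[str, list[float]]:
    """Step 1: collect evidence weights per skill."""
    grouped: dict[str, list[float]] = {}
    for item in items:
        grouped.setdefault(item.skill, []).append(item.weight)
    return grouped


def score_skills(grouped: dict[str, list[float]]) -> dict[str, float]:
    """Step 2: reduce each skill's weights to a single score."""
    return {skill: sum(weights) for skill, weights in grouped.items()}


def rank_skills(scores: dict[str, float]) -> list[str]:
    """Step 3: order skills by descending score, then by name for stability."""
    return sorted(scores, key=lambda skill: (-scores[skill], skill))


def recompute(items: list[Evidence]) -> list[str]:
    """Thin orchestrator: no logic of its own beyond composing the steps."""
    return rank_skills(score_skills(group_by_skill(items)))
```

With this structure, each step can be unit tested with a handful of literal inputs, and the orchestrator needs only one or two end-to-end tests.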


🟡 Medium Severity

3. Near-zero docstring coverage

Effort: Low-Medium | Severity: Medium

| File | Functions | Docstrings |
| --- | --- | --- |
| pipeline.py | 59 | 0 |
| llm.py | 33 | 0 |
| rendering.py | 26 | 0 |
| storage.py | 28 | 0 |
| parsers.py | 27 | 0 |
| family_normalizer.py | 18 | 0 |
| cli.py | 31 | 3 |

Only cli.py has any docstrings (3/31). This makes it hard for new contributors (or the CLAUDE.md agent) to understand intent.

Recommendation: Add docstrings to all public functions, starting with storage.py and pipeline.py which define the core data flow.
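To track progress on this, docstring coverage can be measured with the standard library's `ast` module. The sketch below is one way to do it and is not necessarily how nightshift produced the numbers above: it counts every function, including methods and nested functions, so its totals may differ slightly from the table.

```python
import ast


def docstring_coverage(source: str) -> tuple[int, int]:
    """Return (total functions, functions with a docstring) for Python source."""
    tree = ast.parse(source)
    functions = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    documented = sum(1 for fn in functions if ast.get_docstring(fn) is not None)
    return len(functions), documented


# Small inline sample so the example runs without touching the repository.
SAMPLE = '''
def documented():
    """Explain intent."""
    return 1

def undocumented():
    return 2
'''

total, documented = docstring_coverage(SAMPLE)
print(f"{documented}/{total} functions documented")
```

To apply it to a real module, pass in the file contents, for example `docstring_coverage(Path("src/traccia/pipeline.py").read_text())`.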

4. Broad exception handling in pipeline.py

Effort: Low | Severity: Medium

```python
# src/traccia/pipeline.py:225
except Exception as exc:
```

A single broad except Exception in the pipeline module could silently swallow errors from parsing, graph manipulation, or I/O operations.

Recommendation: Replace with specific exception types (e.g., ParserError, StorageError, GraphError). At minimum, log the full traceback before re-raising.
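A minimal sketch of the narrower handling follows. The code around pipeline.py:225 is not shown here, so `ingest_one`, its `parse` and `store` callables, and the choice to skip on parse failure are assumptions for illustration. `ParserError` and `StorageError` are the example names from the recommendation and are defined locally.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("traccia.pipeline")


class ParserError(Exception):
    """Raised when a source file cannot be parsed (illustrative)."""


class StorageError(Exception):
    """Raised when persisting results fails (illustrative)."""


def ingest_one(
    path: str,
    parse: Callable[[str], Any],
    store: Callable[[Any], None],
) -> bool:
    """Parse and store one source; return False if it was skipped."""
    try:
        store(parse(path))
    except ParserError:
        # Recoverable: record the traceback and move on to the next file.
        logger.warning("skipping %s: could not parse", path, exc_info=True)
        return False
    except StorageError:
        # Not recoverable here: log the full traceback, then re-raise.
        logger.exception("storage failed for %s", path)
        raise
    return True
```

Any exception type not listed, such as an unexpected `ValueError`, now propagates to the caller instead of being absorbed, which makes genuine bugs visible.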

5. rendering.py is a god module (930 lines, 26 functions)

Effort: Medium | Severity: Medium

rendering.py handles viewer generation, node pages, profile writing, and Obsidian export all in one file. These are distinct concerns.

Recommendation: Split into rendering/viewer.py, rendering/nodes.py, rendering/profile.py, and rendering/obsidian.py.


🟢 Low Severity

6. Return type annotations are good, but parameter types could improve

Effort: Low | Severity: Low

Return type annotations are nearly complete across all modules (231 of 232 functions), which is excellent. The single gap is in llm.py (32 of 33). Parameter type annotations, however, should be audited for completeness, starting with llm.py.
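One way to run the parameter audit is a short `ast` pass that lists every parameter without an annotation. This is a sketch, not part of the nightshift tooling. It skips `self` and `cls` by name, which is a simplifying assumption. Ruff's flake8-annotations rules (the `ANN` set) cover similar ground if a linter-based check is preferred.

```python
import ast


def unannotated_parameters(source: str) -> list[tuple[str, str]]:
    """Return (function name, parameter name) pairs lacking a type annotation."""
    missing: list[tuple[str, str]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        params = [*args.posonlyargs, *args.args, *args.kwonlyargs]
        if args.vararg is not None:
            params.append(args.vararg)  # *args
        if args.kwarg is not None:
            params.append(args.kwarg)  # **kwargs
        for param in params:
            if param.annotation is None and param.arg not in {"self", "cls"}:
                missing.append((node.name, param.arg))
    return missing


# Inline sample: only `partial` has unannotated parameters.
SAMPLE = '''
def ok(a: int, b: str = "x") -> None: ...
def partial(a, *rest, flag: bool = False, **extra): ...
'''

for function_name, parameter_name in unannotated_parameters(SAMPLE):
    print(f"{function_name}: {parameter_name}")
```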

7. No linting beyond F + I rules

Effort: Low | Severity: Low

The ruff config only enables F (pyflakes) and I (isort) rules. Consider enabling additional rule sets:

```toml
[tool.ruff.lint]
select = ["F", "I", "UP", "B", "SIM", "TCH", "RUF"]
```

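This would catch likely bugs (B, flake8-bugbear), suggest simpler code patterns (SIM), modernize outdated syntax (UP, pyupgrade), flag imports used only for typing that can move into TYPE_CHECKING blocks (TCH), and apply Ruff-specific checks (RUF).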

8. pipeline.py complexity

Effort: Medium | Severity: Low

At 1,251 lines with 59 functions, pipeline.py is the largest module. Most functions are reasonably sized, but the orchestration logic (ingest_directory, recompute_graph) is long, as noted in item 2, and creates deep call stacks. Consider extracting a pipeline/ package with separate orchestrators.


Classification Summary

| Category | Count | Key Examples |
| --- | --- | --- |
| Test gaps | 10 modules | rendering, storage, llm, family_normalizer |
| Code smells | 12 functions >50 lines | _write_viewer (222), recompute_graph (179) |
| Documentation gaps | 225 undocumented functions | pipeline, llm, rendering, storage |
| Architecture | 2 god modules | rendering.py (930), pipeline.py (1251) |
| Error handling | 1 broad except | pipeline.py:225 |
| Linting | Minimal rules | F+I only |

Priority Recommendations

  1. Add tests for storage.py and llm.py — core I/O and backend modules with zero coverage
  2. Decompose _write_viewer (222 lines) — largest single function in the codebase
  3. Decompose recompute_graph (179 lines) — critical graph computation logic
  4. Add docstrings to public functions in pipeline.py and storage.py
  5. Split rendering.py into focused submodules
  6. Replace broad except Exception with specific exception types
  7. Expand ruff linting rules to catch more code quality issues
