Source resolution and deep link provenance (v0.2.0)#1

Merged
Markus-Doc merged 8 commits into main from feat/source-resolution
Jun 12, 2026
@Markus-Doc

What

RESYNTH 0.2.0 adds a source resolution stage verb and richer provenance.

  • resynth resolve follows links and file references found inside ingested sources and registers each as a first-class source: HTML articles reduced to clean text, PDFs (remote and local), local files, and Vimeo or YouTube caption transcripts. Videos without public captions become pending transcript stubs that a later run upgrades in place, keeping their source id.
  • Schema v2 source frontmatter: source_type, url, resolved_from, transcript_status. Old projects keep loading and gate with a warning only.
  • Claims gain an optional structured source_locator (url, page, timestamp, anchor) validated by extract-verify and surfaced in the claims index and master outputs.
  • MASTER.json moves to resynth-master/2 with a sources array. New load_master reads /1 and /2.
  • resynth migrate upgrades a project explicitly. Re-sealing stays an operator act.
  • resynth --version is added, and the wizard offers resolution straight after intake.

Stdlib networking only, so the four dependency decision holds. robots.txt is respected, requests are rate limited to one second per host, with a 30 second timeout and a 10 MiB cap. No live network in tests (62 new tests, 131 total).
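
For readers who want a feel for the etiquette rules above, here is a minimal stdlib-only sketch. It is not the resolver's actual code: `HostThrottle`, `read_capped` and `allowed_by_robots` are hypothetical names, and reading "one second per host" as a minimum gap between requests to the same host is an assumption.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

MAX_BYTES = 10 * 1024 * 1024  # 10 MiB cap
MIN_INTERVAL = 1.0            # assumed meaning: at least one second between hits on a host
TIMEOUT = 30                  # seconds; would be passed as urlopen(..., timeout=TIMEOUT)


class HostThrottle:
    """Track the last request time per host and report how long to wait."""

    def __init__(self, min_interval=MIN_INTERVAL, clock=time.monotonic):
        self._min_interval = min_interval
        self._clock = clock
        self._last = {}

    def delay_for(self, url):
        """Return the seconds to sleep before requesting url."""
        host = urlsplit(url).netloc.lower()
        now = self._clock()
        last = self._last.get(host)
        wait = 0.0 if last is None else max(0.0, self._min_interval - (now - last))
        # Record when the request will actually go out.
        self._last[host] = now + wait
        return wait


def read_capped(stream, limit=MAX_BYTES, chunk_size=64 * 1024):
    """Read a body in chunks, raising ValueError once it exceeds the cap."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise ValueError(f"response exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against robots.txt content that was already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Injecting the clock and passing robots.txt as text keeps the sketch testable without live network, in line with the test policy stated above.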

Docs

  • docs/SOURCE-RESOLUTION.md (flow, manifest, schema reference, migration guide)
  • CHANGELOG 0.2.0 cut, six new DECISIONS.md records, README workflow + CLI reference

Live acceptance (already run on the dev host)

secure_ai migrated idempotently, both cybercx pages fetched as html-article, three Vimeo webinars created as pending stubs (no public captions), the Teams event failed cleanly, all five gates re-passed and the project re-sealed as resynth-secure_ai-v2 with a resynth-master/2 export. Log captured.

Known follow-up

A manually pasted transcript leaves the stored sha256 stale until the operator updates it (documented in docs/SOURCE-RESOLUTION.md). A resynth rehash helper is a candidate for 0.2.1.
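
Until such a helper exists, a stale hash could be detected along these lines. This is a sketch under assumptions, not the planned `resynth rehash`: it assumes `---` delimited frontmatter, a `sha256` key, and a hash taken over the UTF-8 body only. The function names are hypothetical.

```python
import hashlib


def split_frontmatter(text):
    """Split '---' delimited frontmatter from the body (assumed file layout)."""
    if not text.startswith("---\n"):
        return "", text
    end = text.find("\n---\n", 3)
    if end == -1:
        return "", text
    return text[4:end], text[end + 5:]


def body_sha256(text):
    """Hash the body only, so frontmatter edits do not change the digest."""
    _, body = split_frontmatter(text)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def is_stale(text):
    """Return True when the stored sha256 no longer matches the body."""
    front, _ = split_frontmatter(text)
    stored = None
    for line in front.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "sha256":  # assumed key name
            stored = value.strip()
    return stored != body_sha256(text)
```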

🤖 Generated with Claude Code

Markus-Doc and others added 8 commits June 12, 2026 20:57
Sources gain schema_version, source_type, url, resolved_from and
transcript_status. register_source extracts the numbering, dedup and
frontmatter write path for reuse by the resolver, with max-SID+1
numbering so deleted sources never cause id reuse. The intake gate
validates v2 fields and warns (never fails) on pre 0.2.0 sources.
Version bumped to 0.2.0.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
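
The max-SID+1 numbering in the commit above can be illustrated with a short sketch. The `S001` id shape and the `next_source_id` name are assumptions made for illustration; the real id format is not shown in this PR.

```python
import re

SID_PATTERN = re.compile(r"^S(\d+)$")  # assumed id shape, e.g. "S007"


def next_source_id(existing_ids, width=3):
    """Return the highest existing number plus one, never refilling gaps."""
    highest = 0
    for sid in existing_ids:
        match = SID_PATTERN.match(sid)
        if match:
            highest = max(highest, int(match.group(1)))
    return f"S{highest + 1:0{width}d}"
```

Counting files instead would hand out `S003` again after `S002` is deleted from a three-source project; taking the maximum leaves the gap alone.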
resynth resolve turns links and file references inside sources into
first class fetched sources: html articles reduced to clean text, pdfs
via pdftotext, local files, and vimeo or youtube caption transcripts
with a transcript pending stub that later upgrades in place. Stdlib
networking only, robots respected, rate limited, 10 MiB cap, with an
idempotent manifest at index/resolution.jsonl. 31 tests, no live
network.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claims may carry a structured locator (url, page, timestamp, anchor)
validated by extract-verify, which also warns on video claims without
timestamps. The claims index surfaces locator hints for the synthesis
operator. Fully backward compatible, existing claims files stay valid.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
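
As an illustration of what validating the locator described above might involve, here is a hedged sketch. Only the four field names come from this PR; every rule below (http(s) URLs, pages starting at 1, `MM:SS` or `H:MM:SS` timestamps) and the `validate_locator` name are assumptions, not the checks extract-verify performs.

```python
import re
from urllib.parse import urlsplit

ALLOWED_KEYS = {"url", "page", "timestamp", "anchor"}
TIMESTAMP_RE = re.compile(r"^(\d+:)?[0-5]?\d:[0-5]\d$")  # assumed MM:SS or H:MM:SS


def validate_locator(locator):
    """Return a list of problems; an empty list means the locator passed."""
    if not isinstance(locator, dict):
        return ["source_locator must be a mapping"]
    errors = []
    unknown = sorted(str(key) for key in set(locator) - ALLOWED_KEYS)
    if unknown:
        errors.append("unknown keys: " + ", ".join(unknown))
    url = locator.get("url")
    if url is not None:
        if not isinstance(url, str) or urlsplit(url).scheme not in ("http", "https"):
            errors.append("url must be an http(s) URL")
    page = locator.get("page")
    if page is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(page, bool) or not isinstance(page, int) or page < 1:
            errors.append("page must be a positive integer")
    timestamp = locator.get("timestamp")
    if timestamp is not None:
        if not isinstance(timestamp, str) or not TIMESTAMP_RE.match(timestamp):
            errors.append("timestamp must look like MM:SS or H:MM:SS")
    anchor = locator.get("anchor")
    if anchor is not None:
        if not isinstance(anchor, str) or not anchor.strip():
            errors.append("anchor must be a non-empty string")
    return errors
```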
MASTER.json now embeds normalised source metadata including type and
url, and the source register in MASTER.md gains Type and Link columns.
New load_master reads both resynth-master/1 and /2.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
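
A reader that accepts both master formats could look roughly like the sketch below. It is not the shipped `load_master`: the `format` key holding the version string is an assumption, as is normalising a /1 document to an empty `sources` list.

```python
import json

SUPPORTED_FORMATS = ("resynth-master/1", "resynth-master/2")


def load_master_sketch(text):
    """Parse MASTER.json text and normalise /1 documents to the /2 shape."""
    data = json.loads(text)
    fmt = data.get("format")  # assumed key name for the format marker
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported master format: {fmt!r}")
    # /1 has no sources array; give callers one shape to work with.
    data.setdefault("sources", [])
    return data
```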
resynth migrate adds the v2 frontmatter keys to older sources with the
body untouched so stored hashes stay valid, re-evaluates gate 01 and
prints re-seal guidance. Re-sealing stays an operator act.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The wizard offers to fetch discovered links right after intake and
hints when video transcripts are pending.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adds docs/SOURCE-RESOLUTION.md covering the resolution flow, network
etiquette, manifest semantics, schema v2 reference, source_locator,
master format /2 and the migration guide. Changelog cut for 0.2.0,
six new decision records, readme gains the resolve workflow and CLI
entries, docstring pass over the new modules.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@Markus-Doc Markus-Doc merged commit f3032bf into main Jun 12, 2026
6 checks passed
@Markus-Doc Markus-Doc deleted the feat/source-resolution branch June 12, 2026 11:10