Source resolution and deep link provenance (v0.2.0) #1
Merged
Sources gain schema_version, source_type, url, resolved_from and transcript_status. register_source extracts the numbering, dedup and frontmatter write path for reuse by the resolver, with max-SID+1 numbering so deleted sources never cause id reuse. The intake gate validates v2 fields and warns (never fails) on pre 0.2.0 sources. Version bumped to 0.2.0. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
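The max-SID+1 numbering mentioned above can be sketched as follows. This is only an illustration: the `S001`-style id format and the `next_sid` name are assumptions, not taken from the resynth code. It shows why taking the maximum, rather than counting sources, avoids handing out an id that is still in use when an earlier source has been deleted.

```python
import re

# Hypothetical id format ("S001", "S002", ...); the real format may differ.
SID_PATTERN = re.compile(r"^S(\d+)$")


def next_sid(existing_ids, width=3):
    """Return the next source id as (highest existing number + 1)."""
    numbers = [
        int(match.group(1))
        for sid in existing_ids
        if (match := SID_PATTERN.match(sid))
    ]
    return f"S{max(numbers, default=0) + 1:0{width}d}"


# S002 was deleted. Counting the remaining sources (2) and adding one would
# produce S003 again; max + 1 skips past it.
print(next_sid(["S001", "S003"]))  # prints S004
```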
resynth resolve turns links and file references inside sources into first class fetched sources: html articles reduced to clean text, pdfs via pdftotext, local files, and vimeo or youtube caption transcripts with a transcript pending stub that later upgrades in place. Stdlib networking only, robots respected, rate limited, 10 MiB cap, with an idempotent manifest at index/resolution.jsonl. 31 tests, no live network. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
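One way an idempotent JSONL manifest such as index/resolution.jsonl could behave is sketched below. The record fields (`target`, `status`) and the function names are hypothetical; the actual manifest schema is described in docs/SOURCE-RESOLUTION.md and may differ.

```python
import json
from pathlib import Path


def load_manifest(path):
    """Map each recorded target to its latest manifest entry."""
    entries = {}
    if path.exists():
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                record = json.loads(line)
                entries[record["target"]] = record
    return entries


def record_once(path, target, status):
    """Append an entry unless the target already has that status.

    Returns True when a line was written, False when the call was a no-op.
    """
    known = load_manifest(path).get(target)
    if known is not None and known["status"] == status:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps({"target": target, "status": status}) + "\n")
    return True
```

Under these assumptions, re-running a resolve pass over the same links leaves the file unchanged, while a pending stub that later gains a transcript appends one new line with the new status.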
Claims may carry a structured locator (url, page, timestamp, anchor) validated by extract-verify, which also warns on video claims without timestamps. The claims index surfaces locator hints for the synthesis operator. Fully backward compatible, existing claims files stay valid. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
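A rough sketch of what locator validation could look like, separating hard errors from the video timestamp warning. The field types, the `[H:]MM:SS` timestamp form, the `"video"` type name and the `check_locator` function are all assumptions made for illustration; extract-verify's real rules are not shown in this PR.

```python
import re
from urllib.parse import urlparse

ALLOWED_KEYS = {"url", "page", "timestamp", "anchor"}
# Assumed timestamp form: MM:SS or H:MM:SS.
TIMESTAMP = re.compile(r"^(\d+:)?[0-5]?\d:[0-5]\d$")


def check_locator(locator, source_type):
    """Return (errors, warnings) for one claim locator dict."""
    errors, warnings = [], []

    unknown = sorted(set(locator) - ALLOWED_KEYS)
    if unknown:
        errors.append(f"unknown locator keys: {unknown}")

    url = locator.get("url")
    if url is not None:
        if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https"):
            errors.append("url must be an http(s) URL")

    page = locator.get("page")
    if page is not None:
        if isinstance(page, bool) or not isinstance(page, int) or page < 1:
            errors.append("page must be a positive integer")

    timestamp = locator.get("timestamp")
    if timestamp is not None and not TIMESTAMP.match(str(timestamp)):
        errors.append("timestamp must look like MM:SS or H:MM:SS")

    # Missing timestamps on video claims only warn; they do not fail.
    if source_type == "video" and timestamp is None:
        warnings.append("video claim has no timestamp")

    return errors, warnings
```

An empty locator passes with no errors, which matches the backward compatibility stated above: existing claims without a locator stay valid.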
MASTER.json now embeds normalised source metadata including type and url, and the source register in MASTER.md gains Type and Link columns. New load_master reads both resynth-master/1 and /2. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
resynth migrate adds the v2 frontmatter keys to older sources with the body untouched so stored hashes stay valid, re-evaluates gate 01 and prints re-seal guidance. Re-sealing stays an operator act. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
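To illustrate why leaving the body untouched keeps stored hashes valid, here is a minimal sketch. It assumes `---` delimited frontmatter and a hash computed over the body only; both are guesses about the storage format, and the helper names and default values are hypothetical.

```python
import hashlib


def split_frontmatter(text):
    """Split '---' delimited frontmatter from the body (assumed layout)."""
    if text.startswith("---\n"):
        head, sep, body = text[4:].partition("\n---\n")
        if sep:
            return head, body
    return "", text


def add_missing_keys(text, defaults):
    """Append frontmatter keys that are absent; never rewrite the body."""
    head, body = split_frontmatter(text)
    present = {
        line.split(":", 1)[0].strip()
        for line in head.splitlines()
        if ":" in line
    }
    extra = [f"{key}: {value}" for key, value in defaults.items() if key not in present]
    new_head = "\n".join(filter(None, [head, *extra]))
    return f"---\n{new_head}\n---\n{body}"


def body_sha256(text):
    """Hash only the body, so frontmatter edits do not change the digest."""
    return hashlib.sha256(split_frontmatter(text)[1].encode("utf-8")).hexdigest()


old = "---\ntitle: Demo\n---\nBody text\n"
new = add_missing_keys(old, {"schema_version": 2, "source_type": "html-article"})
assert body_sha256(old) == body_sha256(new)
```

Running `add_missing_keys` a second time on `new` returns it unchanged in this sketch, which mirrors the idempotent migration described in the acceptance notes.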
The wizard offers to fetch discovered links right after intake and hints when video transcripts are pending. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adds docs/SOURCE-RESOLUTION.md covering the resolution flow, network etiquette, manifest semantics, schema v2 reference, source_locator, master format /2 and the migration guide. Changelog cut for 0.2.0, six new decision records, readme gains the resolve workflow and CLI entries, docstring pass over the new modules. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
What
RESYNTH 0.2.0 adds a source resolution stage verb and richer provenance.
- resynth resolve follows links and file references found inside ingested sources and registers each as a first-class source: HTML articles reduced to clean text, PDFs (remote and local), local files, and Vimeo or YouTube caption transcripts. Videos without public captions become pending transcript stubs that a later run upgrades in place, keeping their source id.
- Source schema v2 adds source_type, url, resolved_from, transcript_status. Old projects keep loading and gate with a warning only.
- Claims may carry a source_locator (url, page, timestamp, anchor), validated by extract-verify and surfaced in the claims index and master outputs.
- Master format resynth-master/2 with a sources array. New load_master reads /1 and /2.
- resynth migrate upgrades a project explicitly. Re-sealing stays an operator act.
- resynth --version, and the wizard offers resolution straight after intake.

Stdlib networking only, so the four-dependency decision holds. Robots respected, one second per host, 30 second timeout, 10 MiB cap. No live network in tests (62 new tests, 131 total).
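The one second per host spacing and the 10 MiB cap could be enforced along these lines. This is a sketch under assumptions, not the resolver's code: `HostThrottle` and `read_capped` are hypothetical names, and the clock and sleep functions are injectable so the behaviour can be tested without a live network or real delays.

```python
import time
from urllib.parse import urlparse

MAX_BYTES = 10 * 1024 * 1024  # 10 MiB cap


class HostThrottle:
    """Space out requests so each host is hit at most once per interval."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> time of the most recent permitted request

    def wait(self, url):
        """Block until the URL's host may be contacted; return the delay used."""
        host = urlparse(url).netloc.lower()
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self._min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay


def read_capped(stream, limit=MAX_BYTES):
    """Read from a file-like object, refusing bodies larger than the limit."""
    data = stream.read(limit + 1)  # one extra byte reveals an oversized body
    if len(data) > limit:
        raise ValueError(f"response exceeds {limit} bytes")
    return data
```

In a real fetch, `throttle.wait(url)` would be called before opening the URL (for example with `urllib.request.urlopen(url, timeout=30)`), and the response object would be passed to `read_capped`.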
Docs
Live acceptance (already run on the dev host)
secure_ai migrated idempotently, both cybercx pages fetched as html-article, three Vimeo webinars created as pending stubs (no public captions), the Teams event failed cleanly, all five gates re-passed and the project re-sealed as resynth-secure_ai-v2 with a resynth-master/2 export. Log captured.
Known follow-up
A manually pasted transcript leaves the stored sha256 stale until the operator updates it (documented in docs/SOURCE-RESOLUTION.md). A resynth rehash helper is a candidate for 0.2.1.

🤖 Generated with Claude Code