JSON-LD Graph Manager #5829
Group the flat 17-section layout into five titled parts (Motivation, Architecture, Data Model & Validation, Operations, Reference) with short intros, add a design-spec status banner, add TL;DR leads to the densest sections, de-duplicate canonical-identity and producer-contract discussion, and add a manager-vs-cohort comparison table. Add five Operations sections promised but not previously specified: Testing Strategy, Performance Considerations, Rollback And Coexistence, Direct-Push API Surface, and Security Considerations. Open questions are marked inline so reviewers can react to concrete text. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
Adds a design-specification document for a planned JsonLdGraphManager runtime, describing motivation, architecture/lifecycle, canonical graph/merge rules, operational concerns (logging/testing/perf/rollback), and reference examples.
Changes:
- Introduces a comprehensive JSON-LD graph-manager design spec (feature-flagging, lifecycle, data contracts).
- Defines normalization/merge/dedupe and provenance conventions for multi-producer JSON-LD aggregation.
- Documents operational strategy (observability via Lana, testing levels, performance envelope, rollback).
… remove self-import from push example
Second pass on the JsonLdGraphManager design doc focused on readability and presentation flow for a broader audience.
- Restructure into 6 parts (add Part II Rollout) with italic dicta under each section heading to anchor the key idea
- Add Quickstart, "Who this is for" audience matrix, and Glossary
- Add Mermaid diagrams: 3-beat architecture flowchart, before/after comparison, initialization and mutation sequence diagrams, canonical editorial and product page graph shapes
- Annotate Appendix A examples with "What to notice" callouts
- Consolidate all Open Questions into Appendix B table
Reorganize JsonLdGraphManager spec so the reading order follows the systems/design-paper convention (why -> what -> does-it-work -> how-we-ship -> caveats) instead of interleaving deployment before design.
- Part I Introduction (Abstract, Scope, Problem, Before/After, Contributions)
- Part II Design (Decision, Architecture, Lifecycle, DOM & Output Contracts, Producer Integration, Direct-Push API, Normalization, Canonical Graph Model)
- Part III Evaluation (Validation Cohort, Testing, Performance)
- Part IV Deployment (Feature Flag, Rollout, Rollback, Observability)
- Part V Security Considerations (promoted to top-level, RFC convention)
- Part VI Related Work & Reference (Authoring Catalog, References, Appendices A-D; Glossary moved to appendix)
Specific moves:
- Design Decision moves from Motivation to opener of Design
- Before/After moves from Architecture to Introduction (motivation device)
- Direct-Push API moves from Operations to Design (it's a public interface)
- Validation Cohort + Testing + Performance grouped in Evaluation
- Security promoted from Operations subsection to top-level part
- Glossary moves to Appendix D
- Rename "Data Model And Contracts" -> "DOM And Output Contracts" to eliminate name collision with the data-model material in Part II
- Add bulleted Contributions list in Introduction
No content changes; only section relocations, one rename, and the new Contributions list.
This PR has not been updated recently and will be closed in 7 days if no action is taken. Please ensure all checks are passing, https://github.com/orgs/adobecom/discussions/997 provides instructions. If the PR is ready to be merged, please mark it with the "Ready for Stage" label. |
Reframe the spec to point at the requirements sheet in structured-data-json-ld.json as the machine-readable source of truth and keep the markdown doc as rationale and contract. Remove sections that restated rules now owned by the JSON sheet; remove provenance entirely (debug mode is the appropriate place to surface per-source origin).
- Externalize: drop "DOM And Output Contracts" subsections, identity policy table, dedupe policy, governing-rules bullets, and the "Manager guarantees vs. cohort expectations" table; replace each with a one-line pointer to the requirements sheet.
- Provenance: remove the provenance contract subsection, the Provenance preservation security bullet, the Provenance glossary entry, and all producerName/producerType/ingestMode/discoveryPhase references in the Producer Integration Model, Direct-Push API, runtime lifecycle, sequence diagram, and testing strategy. Reframe observability so debug mode logs the original captured payload and DOM location rather than persisting a provenance record.
- Naming: rename section 3 from "Evaluation" to "Conformance" -- the doc covers conformance to the requirements spec, not empirical evaluation. Rename section 4 from "Deployment" to "Operations" so feature flagging and observability sit naturally together.
- Section numbering: collapse the 2.1->2.2->2.3->2.6 gap to a contiguous 2.1->2.6 sequence after the renames; add 3.1, 3.2.
- Out of scope: add a 3.2 "Out Of Scope" note clarifying that search-engine effectiveness measurement (bot-traffic logs, GSC URL Inspection API) is not gated by this spec.
- Cross-references: drop the broken anchor link on the canonical-graph section (target was renumbered); drop "direct graph-manager push" from the merge priority since the direct-push API is no longer specified in this doc; drop BreadcrumbList from Article.hasPart in the editorial diagram and Example 1 since it isn't a supplemental per the supplemental-linkage rule.
- Typos and grammar: paramater, eachother, this these, fo this, compelete, it's complexity, on on, speadsheet, awkward "JSON-LD on page meets" wording in the e2e testing bullet.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
…pace, feature-flag default
Add a single-file ES module at libs/features/jsonld-graph-manager.js that collects all per-page JSON-LD emitted by existing producers and rewrites it as one canonical, linked @graph. Disabled by default; enabled per page via the jsonld-graph-manager metadata flag or URL query parameter (string 'true', case-insensitive).
The implementation is organized as pure helper functions plus a class, all in one file, with named exports for unit-testability:
- RULES table encodes the requirements sheet (WebPage, Organization, Article, BreadcrumbList, SoftwareApplication, HowTo, FAQPage, VideoObject, Event, Product): identity fragments, singleton flags, and default linkage edges.
- parsePayload: accepts object | array | { @graph } shapes; logs a Lana warning on parse failure.
- normalizeNode: strips per-node @context; rewrites @id to canonical page-scoped fragment (e.g. #article) or site-wide id (Organization).
- mergeNodes: resolves scalar conflicts by source priority (bootDom < runtime); unions reference arrays (hasPart, mainEntity, itemListElement) by @id.
- injectLinks: derives WebPage.mainEntity/breadcrumb/publisher and Article.isPartOf/mainEntityOfPage/publisher from the RULES table.
- JsonLdGraphManager class: boot scan of existing unmanaged scripts, MutationObserver on documentElement (childList + subtree), debounced rebuild queue, and rewrite() that synthesizes a minimal WebPage root when producers haven't provided one.
- init() default export: idempotent singleton stored on window.__jsonLdGraphManager.
Boot wiring added to documentPostSectionLoading in libs/utils/utils.js, placed before seotech/richresults so the MutationObserver is attached before those producers append their scripts.
Tests (37, all passing) cover: flattenPayload, parsePayload (valid shapes + invalid JSON → Lana warning), normalizeNode (canonical ids, context strip, unknown type retention), unionByRef, mergeNodes (priority resolution, field union, reference array union), injectLinks (forward/back links, no-overwrite), boot scan, singleton enforcement, output contract (one managed script, no per-node @context, WebPage-first ordering), MutationObserver pickup, and three e2e pipeline fixtures (editorial, product, multi-producer priority). What v1 does not include: direct-push producer API, runtime fetch of the requirements sheet, provenance persistence, e2e cohort tests against live URLs, search-effectiveness measurement.
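The merge rule described above (scalar conflicts resolved by source priority, reference arrays unioned by @id) can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the PRIORITY table, the entry shape `{ source, node }`, and the helper bodies are assumptions made for the example.

```javascript
// Illustrative sketch only; names mirror the commit message but the bodies are assumed.
// Hypothetical priority table: the commit states only that bootDom < runtime.
const PRIORITY = { bootDom: 0, runtime: 1 };
// Reference-array properties named in the commit message.
const REF_ARRAYS = ['hasPart', 'mainEntity', 'itemListElement'];

function toArray(value) {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Union two reference lists, de-duplicating by @id (or by JSON text when no @id exists).
function unionByRef(a, b) {
  const seen = new Set();
  const out = [];
  for (const item of [...toArray(a), ...toArray(b)]) {
    const key = item && typeof item === 'object' && item['@id']
      ? item['@id']
      : JSON.stringify(item);
    if (!seen.has(key)) {
      seen.add(key);
      out.push(item);
    }
  }
  return out;
}

// entries: [{ source: 'bootDom' | 'runtime', node: {...} }], all describing one @id.
function mergeNodes(entries) {
  // Apply lowest priority first so higher-priority scalars overwrite later.
  const ordered = [...entries].sort((x, y) => PRIORITY[x.source] - PRIORITY[y.source]);
  const merged = {};
  for (const { node } of ordered) {
    for (const [key, value] of Object.entries(node)) {
      merged[key] = REF_ARRAYS.includes(key) ? unionByRef(merged[key], value) : value;
    }
  }
  return merged;
}

const example = mergeNodes([
  { source: 'runtime', node: { '@id': '#article', headline: 'Live', hasPart: [{ '@id': '#video' }] } },
  { source: 'bootDom', node: { '@id': '#article', headline: 'Static', author: 'A', hasPart: [{ '@id': '#video' }, { '@id': '#faq' }] } },
]);
console.log(example.headline, example.hasPart.length);
```

In this sketch the runtime headline wins the scalar conflict, the bootDom-only author field survives, and hasPart keeps one entry per @id.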
This PR does not qualify for the zero-impact label as it touches code outside of the allowed areas. The label is auto applied, do not manually apply the label. |
…le logging
Add ?jsonld-graph-manager-debug=true URL flag that emits console.debug output at each queue lifecycle event: enqueue (source, DOM location, original payload), rebuild (batch size, graph size), parsed (types, node count), removed from DOM, and rewrite (node count, full expandable graph object). The graph object logged on rewrite is the canonical @graph as produced, inspectable in DevTools without a separate console snippet. Debug output is gated entirely on the URL param and is independent of lanadebug and the Lana endpoint -- these are high-volume success-path events that should never be sent to Lana.
…ug flag doc
Organization synthesis:
- Always ensure a canonical Organization node is present in the graph. rewrite() synthesizes a minimal default if none is provided, or merges the default at graph-manager-generated priority (weight 2) so baseline fields (name, url, logo) always win over producer-supplied values while producer-only fields (e.g. sameAs) are preserved.
- Domain-aware: siteRoot() returns https://business.adobe.com for hostnames matching /business|bacom/i; defaults to https://www.adobe.com. defaultOrg() derives name ("Adobe" / "Adobe for Business"), url, logo, and @id from the site root. Both accept an optional hostname override for testability.
- 3-tier merge priority: generated (2) > runtime (1) > bootDom (0).
Inline entity extraction:
- extractInlineEntities() walks publisher, author, creator, provider, brand properties; hoists any inline typed object that lacks @id to a top-level graph node (via normalizeNode) and replaces the property value with an @id reference. Called during rebuild() after each node is normalized.
Doc (libs/utils/json-ld.md):
- Summary: add one-line mention of jsonld-graph-manager-debug=true.
- §4.1: add debug flag entry alongside the feature flag.
- §4.2: replace vague "debug logging conventions" bullets with a concrete description of the five lifecycle events logged by the debug param; remove stale lanadebug reference.
Tests: 45 passing (8 new cases covering synthesis, precedence, domain selection for www/business/bacom, inline extraction, and integration).
- Turn off no-continue globally in .eslintrc.js
- Add file-level no-use-before-define disable (lanaLog hoisted above parsePayload)
- Add inline no-nested-ternary disables for unionByRef coercions
- Add missing no-console disables for console.error/warn in lanaLog
- Rename _collect → collect (private method, underscore convention unnecessary)
- Rename window.__jsonLdGraphManager → window.miloJsonLdGraphManager
- Remove unused canonicalUrl import from test file
- Add no-promise-executor-return disable for test microtask flush
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
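The inline entity extraction step can be pictured with a small sketch. It assumes a hypothetical assignId() helper standing in for normalizeNode (here it simply derives a page-scoped fragment from the lowercased @type) and returns the hoisted nodes alongside the rewritten host; the actual function in this PR may differ in both shape and @id policy.

```javascript
// Illustrative sketch only; not the implementation in this PR.
const ENTITY_PROPS = ['publisher', 'author', 'creator', 'provider', 'brand'];

// Hypothetical stand-in for normalizeNode: page-scoped @id from the lowercased @type.
function assignId(node, pageUrl) {
  return { ...node, '@id': `${pageUrl}#${String(node['@type']).toLowerCase()}` };
}

// Hoist inline typed objects that lack an @id and leave an @id reference behind.
function extractInlineEntities(node, pageUrl) {
  const hoisted = [];
  const out = { ...node };
  for (const prop of ENTITY_PROPS) {
    const value = out[prop];
    const isInlineTyped = Boolean(value)
      && typeof value === 'object'
      && !Array.isArray(value)
      && Boolean(value['@type'])
      && !value['@id'];
    if (isInlineTyped) {
      const topLevel = assignId(value, pageUrl);
      hoisted.push(topLevel);
      out[prop] = { '@id': topLevel['@id'] };
    }
  }
  return { node: out, hoisted };
}

const result = extractInlineEntities(
  { '@type': 'Article', publisher: { '@type': 'Organization', name: 'Adobe' } },
  'https://www.adobe.com/page',
);
console.log(result.node.publisher, result.hoisted.length);
```

Values that already carry an @id are left alone in this sketch, since they are references rather than inline entities.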
Introduced by a907365: when mergeNodes promoted @type to a SoftwareApplication subtype (WebApplication / MobileApplication / VideoGame), injectLinks() failed two ways:
1. WebPage.mainEntity was never set, because the byType index keyed on the exact @type and the lookup was 'byType.Article ?? byType.SoftwareApplication'. With @type=WebApplication, byType.SoftwareApplication is undefined.
2. provider / isPartOf weren't auto-injected on the SA-subtype node, because the linksBack rule lookup was 'RULES[node["@type"]]' and RULES has no entry for the subtype.
Fix: introduce effectiveType(t) that maps SA subtypes to 'SoftwareApplication', and apply it in two places:
- byType build: index the node under both its exact @type AND its effective parent (so byType.SoftwareApplication is populated when the node is a subtype)
- linksBack lookup: RULES[effectiveType(node['@type'])] so SA's linksBack rules apply to subtypes
Also extend the WebPage.mainEntity primary-type fallback to include NewsArticle (richresults emits this and it should attach as mainEntity the same way Article does).
Tests: 71/71 passing (68 + 3 new) covering mainEntity for WebApplication, auto-provider on WebApplication, and mainEntity for NewsArticle.
Lint: clean.
Add AggregateRating to the canonical graph as its own top-level node:
- New requirement aggregaterating-singleton (error): at most one AggregateRating per page, at the canonical @id '{canonicalPageURL}#aggregaterating'.
- New requirement aggregaterating-extraction (info): inline aggregateRating values on host entities (SoftwareApplication, Article, Product, etc.) are hoisted to the top-level @graph and replaced with { @id } references.
- New section 4.10 AggregateRating: schema.org hierarchy, Google rich-result citations (Software App, Product, Course, Review snippet), manager handling, known producers (review flow).
Implementation:
- Add AggregateRating: { idFragment: '#aggregaterating', singleton: true } to RULES so normalizeNode rewrites the @id.
- Add 'aggregateRating' to ENTITY_PROPS so extractInlineEntities hoists it.
Why singleton: every Adobe.com primary entity that exposes ratings has exactly one canonical rating; multi-producer contributions describe the same product (team-hardcoded snapshot vs. live review-block fetch) and should merge. Source priority resolves freshness: runtime (review block) wins over bootDom (team hardcode), so the freshest counts surface to Google's software-app rich result.
Tests: 73/73 passing (71 + 2 new: extractInlineEntities hoisting, end-to-end merge with bootDom + runtime contributions). One existing end-to-end assertion updated to expect '{ @id }' instead of inline body.
Lint: clean.
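A minimal sketch of the effectiveType() idea follows, assuming a simplified byType index that stores one node per type and a hypothetical primaryEntity() helper; the fallback order in primaryEntity() is an assumption made for the example, not taken from the PR.

```javascript
// Illustrative sketch only; the real index and RULES lookup may be shaped differently.
const SA_SUBTYPES = new Set(['WebApplication', 'MobileApplication', 'VideoGame']);

// Map SoftwareApplication subtypes to their parent; leave other types unchanged.
const effectiveType = (type) => (SA_SUBTYPES.has(type) ? 'SoftwareApplication' : type);

// Index each node under its exact @type and, for subtypes, under the parent too.
function indexByType(nodes) {
  const byType = {};
  for (const node of nodes) {
    const exact = node['@type'];
    byType[exact] = node;
    const parent = effectiveType(exact);
    if (parent !== exact && !byType[parent]) byType[parent] = node;
  }
  return byType;
}

// Hypothetical helper: pick the page's primary entity. The order is assumed.
function primaryEntity(byType) {
  return byType.Article ?? byType.NewsArticle ?? byType.SoftwareApplication;
}

const byType = indexByType([
  { '@type': 'WebPage', '@id': '#webpage' },
  { '@type': 'WebApplication', '@id': '#softwareapplication' },
]);
console.log(primaryEntity(byType)['@id']);
```

With the double indexing, a WebApplication node is reachable through byType.SoftwareApplication, which is the lookup that previously came back undefined.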
This pull request is not passing all required checks. Please see this discussion for information on how to get all checks passing. Inconsistent checks can be manually retried. If a test absolutely can not pass for a good reason, please add a comment with an explanation to the PR. |
Suppress weak rating signal so consumers (Google rich-results, LLMs, search) do not surface low-quality ratings under the Adobe brand:
- aggregaterating-min-rating-value (error): ratingValue MUST be >= 3.2
- aggregaterating-min-rating-count (error): ratingCount MUST be >= 100
These are Milo policy thresholds, not Google requirements. Google publishes no documented minimums for ratingValue or ratingCount on Software App or Review snippet rich results. The thresholds protect the brand from publishing poor-but-real ratings (2.x stars) or noisy small-sample ratings (<100 reviewers, statistical accidents).
Implementation: rewrite() checks the canonical AggregateRating node before serialization. If below threshold, removes the node from the graph AND deletes any 'aggregateRating' reference from host entities so consumers do not see a dangling @id.
Also capture softwareapplication-default-offer as an info-severity TODO under section 3.8. The original framing ('inject default free Offer when AggregateRating is displayed') is too narrow: Google's Software App rich-result spec requires offers.price *unconditionally*, plus one of aggregateRating or review. The TODO is widened to: 'synthesize a default free Offer on any primary-entity SoftwareApplication that lacks one.' This matches Google's actual rule and captures the AR case as a subset.
Tests: 80/80 passing (73 + 7 new): aggregateRatingMeetsThresholds unit tests (pass case, low value, low count, missing/non-numeric, null) plus three end-to-end cases (low-value drops + reference cleanup, low-count drops, threshold-meeting rating emits normally).
Lint: clean.
…ication
Complete softwareapplication-default-offer (promoted from info TODO to error severity): when the page's primary SoftwareApplication (or subtype) has no offers (missing property or empty array), the manager synthesizes a default Offer at the canonical @id with price='0', priceCurrency='USD', availability='https://schema.org/InStock', source='generated'. SA.offers is set to [{ @id }] reference to the synthesized node.
Why this rule: Google's Software App rich result requires offers.price *unconditionally*, plus one of (aggregateRating, review). The earlier framing 'inject when AggregateRating is displayed' is too narrow: it only fixed the AR-conditional cell of Google's actual rule. Broader framing also subsumes the SA-with-only-review case and matches the Adobe.com norm (products are gateway-free with paid tiers; producers needing non-free Offer supply their own).
Also fix a contradiction surfaced by the new test fixture: normalizeNode was rewriting ALL Offer @id values to canonical '#offer', even when the producer supplied a distinct fragment ('#paid', '#free-trial'). This contradicted repeatable-types ('distinct @id values from producers are required to materialize multiple instances') and Appendix A.2 (which shows two offers with distinct fragments). Fix: for repeatable types, when the producer-supplied fragment differs from the rule's default fragment, preserve the producer fragment but canonicalize the URL prefix to the current canonical page URL.
Tests: 86/86 passing (80 + 6 new):
- synthesis on bare SoftwareApplication
- synthesis on bare WebApplication (SA subtype)
- no synthesis when SA already has offers
- no synthesis when no SA on page
- synthesis when offers is empty array
- distinct producer fragments (#paid, #free-trial) preserved (codifies the repeatable-types fix)
Lint: clean.
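The threshold check and the dangling-reference cleanup could look roughly like the sketch below. The constants come from the commit message; accepting numeric strings via Number() and the dropWeakRating() helper name are assumptions made for the example.

```javascript
// Illustrative sketch only; helper names and coercion rules are assumed.
const MIN_RATING_VALUE = 3.2;
const MIN_RATING_COUNT = 100;

// True only when both fields are numeric (or numeric strings) and meet the policy minimums.
function aggregateRatingMeetsThresholds(rating) {
  if (!rating || typeof rating !== 'object') return false;
  const value = Number(rating.ratingValue);
  const count = Number(rating.ratingCount);
  if (!Number.isFinite(value) || !Number.isFinite(count)) return false;
  return value >= MIN_RATING_VALUE && count >= MIN_RATING_COUNT;
}

// Hypothetical helper: drop a weak rating and strip references to it from host entities.
function dropWeakRating(graph) {
  const rating = graph.find((node) => node['@type'] === 'AggregateRating');
  if (!rating || aggregateRatingMeetsThresholds(rating)) return graph;
  return graph
    .filter((node) => node !== rating)
    .map((node) => {
      if (!('aggregateRating' in node)) return node;
      const { aggregateRating, ...rest } = node; // discard the dangling reference
      return rest;
    });
}

const cleaned = dropWeakRating([
  { '@type': 'SoftwareApplication', aggregateRating: { '@id': '#aggregaterating' } },
  { '@type': 'AggregateRating', '@id': '#aggregaterating', ratingValue: 2.9, ratingCount: 500 },
]);
console.log(cleaned.length, 'aggregateRating' in cleaned[0]);
```

Removing the reference together with the node is what keeps consumers from seeing an @id that points at nothing.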
Extend softwareapplication-default-offer: the synthesized free Offer now carries category: 'Free Trial' so the node is self-describing and disambiguates from any future producer-supplied paid Offers. Reflects the Adobe.com norm — primary entry to a product is a free trial of the paid tier. Spec, implementation, and test updated in lockstep. Tests: 86/86 passing. Lint: clean.
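The default-Offer synthesis described in the two commits above can be sketched as a pure function over the graph. The field values are the ones listed in the commit messages; treating the first SoftwareApplication-family node as the primary entity, the `#offer` fragment on the page URL, and the ensureDefaultOffer() name are assumptions for illustration. The internal source='generated' marker is omitted here.

```javascript
// Illustrative sketch only; selection of the "primary" node is simplified.
const SA_TYPES = new Set(['SoftwareApplication', 'WebApplication', 'MobileApplication', 'VideoGame']);

// Hypothetical helper: add a free default Offer when the application node has none.
function ensureDefaultOffer(graph, pageUrl) {
  const app = graph.find((node) => SA_TYPES.has(node['@type']));
  if (!app) return graph; // no SoftwareApplication on the page
  const { offers } = app;
  const hasOffers = Array.isArray(offers) ? offers.length > 0 : Boolean(offers);
  if (hasOffers) return graph; // producer-supplied offers are left alone
  const offer = {
    '@type': 'Offer',
    '@id': `${pageUrl}#offer`,
    price: '0',
    priceCurrency: 'USD',
    availability: 'https://schema.org/InStock',
    category: 'Free Trial',
  };
  const linked = { ...app, offers: [{ '@id': offer['@id'] }] };
  return [...graph.map((node) => (node === app ? linked : node)), offer];
}

const out = ensureDefaultOffer(
  [{ '@type': 'WebApplication', name: 'Example App', offers: [] }],
  'https://www.adobe.com/page',
);
console.log(out.length, out[0].offers);
```

An empty offers array is treated the same as a missing property, matching the "missing property or empty array" wording above.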
The last breadcrumb is the current page and conventionally has no <a> tag (UX convention: don't link to the page you're on). The SEO generator's fallback at that branch used window.location.href, which includes query parameters and #hash. On a managed-graph audit URL this leaked our debug params (?milolibs=...&jsonld-graph-manager=...&...) into structured data. In production it would still leak session params (?cmd=, ?segment=, etc.) and any current #hash. The fix prefers <link rel="canonical">, falling back to the bare origin+pathname of the current URL (query and hash stripped). Every other ListItem.item in the BreadcrumbList already resolves to the AEM-authored canonical via <a href>, so the last item now matches. This is the gnav-side complement to the changes in this branch — the JSON-LD Graph Manager rewrites cross-page WebPage references to the canonical page id; this commit fixes the producer that was emitting non-canonical URLs in the first place. Tests: 'should create a breadcrumb SEO element' was unintentionally validating the bug (it asserted item === window.location.href, which matched whatever the runtime URL happened to be including the wtr session id query param). Updated both assertions to expect the stripped origin+pathname form. Pre-existing test failure 'should localize breadcrumb links' is unrelated to this change (port 2000 vs 8000 fixture mismatch in test harness; reproduces on origin code).
…oduction origin
The previous fix only handled the last crumb. Looking at a real audit
(express qr-code-generator on aem.live) revealed all crumbs were aem.live
URLs, not just the last — because the authored <a href> values in
express breadcrumbs are relative ('/express', '/express/feature'), so
link.href resolves against the current document base and gives aem.live.
To mimic production rendering in the structured data regardless of
environment, rewrite every item URL whose hostname matches the current
rendering origin to use the canonical link's origin instead. Always strip
query string and #hash too. External URLs (different hostname) are
preserved but still get query/hash stripped.
Specifically:
- Read <link rel="canonical"> once; derive its origin as the production
origin.
- For each ListItem: if the URL is same-origin as the current page,
swap origin to production and strip query/hash. Otherwise keep origin,
strip query/hash.
- Last-crumb fallback (no <a>): canonical URL when available, otherwise
current-page origin+pathname (stripped).
On the express qr-code-generator audit URL, every ListItem.item is now
https://www.adobe.com/... with no debug params, ratings token, or
session noise.
Tests: 13 passing (11 prior + 2 new — canonical origin rewrite, external
URL preservation). The pre-existing 'should localize breadcrumb links'
failure is unrelated (port 2000 vs 8000 fixture mismatch; reproduces on
origin code).
Lint: clean.
Revert 129bc95 and 7e16408. Canonicalization of breadcrumb item URLs belongs in the JSON-LD Graph Manager, not in the gnav producer: it matches the existing defensive-normalization pattern (e.g., '#org' -> '#organization', cross-page WebPage rewriting), and it keeps the producer-side blast radius scoped to the manager's feature flag. A follow-up commit adds canonicalizeBreadcrumbItems to the manager along with a normative requirement in section 3.7 of the design doc. This reverts commit 129bc95fb1d8a92dca6e1bf4ee44a8a4e8db8ddc and commit 7e1640880b5fa1f1ec1fbc05923f1a3b80e0a2b3. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Defensive normalization that replaces the (now-reverted) gnav-side fix. Producers emit what's natural: relative <a href> values that resolve to the current rendering origin (aem.live, aem.page, branch URLs); on the last crumb, window.location.href with full query string and #hash. The manager normalizes these to the canonical production form when ingesting.
New requirement breadcrumblist-items-canonical-origin (info, section 3.7): for each BreadcrumbList.itemListElement[*].item, the manager rewrites same-origin URLs to the origin of <link rel='canonical'> and strips query strings and #hash from every item URL. External-host URLs preserve their origin. The rewrite is skipped when no canonical link is present.
Why this lives in the manager rather than the producer:
- Same architectural pattern as #org -> #organization canonicalization and cross-page WebPage rewriting.
- Behavior gated by the existing jsonld-graph-manager feature flag; no blast radius on pages where the manager is off.
- Producer-side fix-up is still preferred long-term; this is defensive.
Implementation:
- canonicalizeBreadcrumbItems(node) helper exported alongside the other canonicalize* functions.
- Called per-node in rebuild() alongside rewriteCrossPageRefs and canonicalizeReferences.
Tests: 91 passing (86 + 5 new: non-BC no-op, missing-canonical no-op, same-origin rewrite with query/hash strip, external-host preservation, end-to-end with a producer-emitted BreadcrumbList on a non-prod hostname). Section 4.3 updated to mention this manager handling.
Lint: clean.
…uery param
Add an escape hatch so callers can leave specific producer JSON-LD untouched. When a producer script matches an entry on the ignore list, the manager does not parse it for ingestion, does not remove it from the DOM, and does not let it contribute to the managed graph. Off by default.
Spec (sections 3.7, 3.4, 6.1):
- New normative requirement ignore-types-bypass (info) describing the match rules: case-insensitive lowercase comparison against schema.org @type values, plus a special pseudo-type 'graph' that matches any script whose top-level shape is { '@graph': [...] } regardless of its contents.
- Added a rule-interaction aside under section 3.4 documenting that webpage-canonical-singleton, organization-singleton, and breadcrumblist-singleton remain satisfied via baseline synthesis or 'when applicable' semantics; required-primary-type is at risk if callers ignore their sole primary type.
- Section 6.1 extended with the new ignore-types flag and example.
- Section 6.2 lists a new 'ignored' debug event.
Implementation (libs/features/jsonld-graph-manager/jsonld-graph-manager.js):
- parseIgnoreParam(search) reads the comma-separated query parameter, trims, lowercases, and drops empty entries. Exported.
- shouldIgnoreScript(scriptEl, ignoreTypes) parses the script JSON, detects the @graph pseudo-type, walks top-level @type values, and returns true if any match. When a script has mixed types and only some match, the whole script is bypassed and a Lana warn is emitted recommending split-or-use-'graph'. Exported.
- enqueue() gates on shouldIgnoreScript before queueing. Applies to both bootDom and runtime entry paths (both end at enqueue).
- JsonLdGraphManager constructor accepts ignoreTypes option for testability; default falls back to module-level IGNORE_TYPES parsed from the URL.
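A sketch of the breadcrumb rewrite, written as pure functions so it runs without a DOM. In the PR the helper takes only the node and presumably reads the canonical link and current origin from the page; passing currentOrigin and canonicalHref as parameters here is an assumption for illustration, as is handling only string-valued item fields.

```javascript
// Illustrative sketch only; signatures differ from the exported helper.
// Strip query and hash; swap same-origin URLs onto the canonical (production) origin.
function canonicalizeItemUrl(itemUrl, currentOrigin, canonicalHref) {
  const parsed = new URL(itemUrl, currentOrigin);
  const origin = parsed.origin === currentOrigin
    ? new URL(canonicalHref).origin
    : parsed.origin; // external hosts keep their own origin
  return `${origin}${parsed.pathname}`;
}

function canonicalizeBreadcrumbItems(node, currentOrigin, canonicalHref) {
  // No-op for other types, and when the page has no canonical link.
  if (node['@type'] !== 'BreadcrumbList' || !canonicalHref) return node;
  const items = (node.itemListElement ?? []).map((listItem) => (
    typeof listItem.item === 'string'
      ? { ...listItem, item: canonicalizeItemUrl(listItem.item, currentOrigin, canonicalHref) }
      : listItem
  ));
  return { ...node, itemListElement: items };
}

const previewOrigin = 'https://main--express--adobecom.aem.live';
const crumbs = canonicalizeBreadcrumbItems(
  {
    '@type': 'BreadcrumbList',
    itemListElement: [
      { '@type': 'ListItem', position: 1, item: `${previewOrigin}/express?milolibs=x#top` },
      { '@type': 'ListItem', position: 2, item: 'https://example.com/docs?a=1' },
    ],
  },
  previewOrigin,
  'https://www.adobe.com/express/feature',
);
console.log(crumbs.itemListElement.map((listItem) => listItem.item));
```

Rebuilding the URL from origin plus pathname drops the query string and hash in one step, for same-origin and external items alike.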
Tests: 102 passing (91 + 11 new) — parseIgnoreParam empty/whitespace, case-insensitive matching, @graph pseudo-type, unparseable JSON, end-to-end script-in-DOM preservation, sibling-not-affected, @graph bypass, mixed-type Lana warn, runtime/MutationObserver path. Lint: clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
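For reference, the parsing behavior described for parseIgnoreParam (comma-separated, trimmed, lowercased, empty entries dropped) can be sketched as below. The parameter name jsonld-graph-manager-ignore is the one used later in this PR; returning a plain array is an assumption, and the exported function may return a different collection type.

```javascript
// Illustrative sketch only; return type is assumed.
function parseIgnoreParam(search) {
  const raw = new URLSearchParams(search).get('jsonld-graph-manager-ignore') ?? '';
  return raw
    .split(',')
    .map((entry) => entry.trim().toLowerCase())
    .filter(Boolean); // drop empty entries such as ",," or trailing commas
}

console.log(parseIgnoreParam('?jsonld-graph-manager-ignore=BreadcrumbList, ,GRAPH'));
```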
flattenPayload only handled the common case where a script's entire
content was { '@context', '@graph': [...] }. Two edge cases leaked
producer-side wrapper objects (or lost typed-wrapper fields) into the
managed output:
- Case A: a script whose top-level content is an array containing a
graph wrapper, e.g. [{Article}, {'@graph': [Video]}]. The wrapper
passed through normalizeNode without an @type, got keyed by
JSON.stringify, and appeared as a node in the managed @graph carrying
a residual '@graph' property. The Video inside was stranded.
- Case B: a typed object that also carried '@graph', e.g.
{ '@type': 'WebPage', name: 'X', '@graph': [...] }. The old code
returned only the inner '@graph' contents and silently dropped the
WebPage's name and other top-level fields.
Fix: make flattenPayload recursive. Arrays flatMap through flattenPayload.
Objects with '@graph' yield their inner contents (flattened), plus the
wrapper-minus-'@graph'-and-'@context' as a sibling iff it has '@type'.
Nested wrappers flatten to any depth. The managed @graph is now
guaranteed to contain no node carrying its own '@graph' property.
Spec section 2.3 updated to make 'recursively flattened' explicit and
to document the typed-wrapper split.
Tests: 107 passing (102 + 5 new) — array-with-wrapper, typed-with-graph
preserving fields, nested wrappers, pure wrapper, plus an end-to-end
managed-graph assertion confirming no embedded '@graph' property leaks.
Lint: clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
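The recursive flattening rule from the commit above can be sketched as follows. This is a reading of the description, not the shipped code; in particular, the order in which the typed wrapper and its inner nodes are returned is an assumption.

```javascript
// Illustrative sketch only; output ordering is assumed.
function flattenPayload(payload) {
  // Arrays: flatten every element.
  if (Array.isArray(payload)) return payload.flatMap(flattenPayload);
  if (!payload || typeof payload !== 'object') return [];
  // Plain nodes pass through unchanged.
  if (!('@graph' in payload)) return [payload];
  // Wrappers: emit the inner nodes, plus the wrapper's own fields when it is typed.
  const { '@graph': inner, '@context': ignoredContext, ...rest } = payload;
  const nodes = flattenPayload(inner);
  return rest['@type'] ? [...nodes, rest] : nodes;
}

// Case A: array containing a graph wrapper.
console.log(flattenPayload([{ '@type': 'Article' }, { '@graph': [{ '@type': 'VideoObject' }] }]));
// Case B: typed object that also carries '@graph'.
console.log(flattenPayload({ '@type': 'WebPage', name: 'X', '@graph': [{ '@type': 'FAQPage' }] }));
```

Because every wrapper is unpacked before its contents are returned, no node in the result carries its own '@graph' property, at any nesting depth.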
Real-world repro from the express qr-code-generator page: the team ships
a pre-baked graph as [{ @context, @graph: [WebPage, SoftwareApplication,
BreadcrumbList, FAQPage, ... ] }] — an array wrapping a single graph
container. With jsonld-graph-manager-ignore=graph the user expected the
whole script to be bypassed; instead, the manager ingested every node
inside the wrapper.
Cause: shouldIgnoreScript only treated the script as a graph container
when the parsed JSON itself was { '@graph': [...] }. The array branch
walked each element looking for a string @type, found none on the
wrapper, and returned false. The script entered the queue and the
recursive flatten unpacked every inner node into the managed graph.
Fix: rewrite shouldIgnoreScript around a unified 'match ids' model.
Each top-level item (the parsed content itself if it's an object, or
each element if it's an array) contributes up to two ids:
- 'graph' if the item has an @graph property
- lowercase @type if the item carries a string @type
A script is bypassed when any id is on the ignore list. Mixed cases —
some ids match, some don't — still bypass the whole script and emit the
existing Lana warning. The data-milo-jsonld='graph' attribute on the
manager's output script ensures consumers can always distinguish the
manager-emitted graph from a bypassed producer script in DOM.
Semantic refinement: the @graph wrapper is no longer 'transparent' for
type matching — its inner @types do not satisfy a type-name ignore on
their own. Callers who want to bypass a wrapped graph by inner type
should include 'graph' in the ignore list. This brings shouldIgnoreScript
in line with the existing spec language ('a match is the pseudo-type
graph') rather than the previous implementation's leakier behavior. One
test asserted the leaky behavior (line 1401) but its inline comment
already described the new correct behavior — assertion now matches the
comment.
Spec section 3.7 updated to describe the match-ids model and the
data-milo-jsonld='graph' marker.
Tests: 110 passing (107 + 3 new) — array-wrapped @graph match, @type+@graph
single object with mixed warning, end-to-end DA Express scenario with the
exact shape from the production repro.
Lint: clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous fix added array-wrapped-wrapper detection for the 'graph' pseudo-type but only inspected top-level @type values for name-based matching. That fell short of the user's intent: if 'breadcrumblist' is on the ignore list and a producer's pre-baked graph contains a BreadcrumbList nested inside an @graph wrapper, the script should be bypassed — same as a free-standing BreadcrumbList script. Fix: type-name matching now uses the same recursive flattenPayload pass that ingestion uses. Every @type discovered at any depth (top level or inside @graph wrappers, even nested wrappers) is considered for the match. The 'graph' pseudo-type retains its short-circuit semantics — if a wrapper exists at the top level and 'graph' is on the ignore list, the script is bypassed immediately with no further analysis (and no mixed-types warning). When the recursive type set contains both matched and unmatched ids, the whole script is still bypassed but a Lana warning is logged with the kept ids surfaced. Example: producer ships [{@graph: [WebPage, BreadcrumbList]}] and user passes ignore=breadcrumblist — the script is bypassed and a warning lists 'webpage' as also dropped. Two earlier test assertions were renamed and flipped to match the new recursive semantics — they had been validating the prior leaky behavior. Spec section 3.7 updated: the description now distinguishes the 'graph' short-circuit from recursive type-name matching and explicitly mentions that nested @types in @graph wrappers count. Tests: 111 passing (110 + 1 new) — added an end-to-end test that reproduces the user's stated case (pre-baked array-wrapped graph + ignore on a nested type) and verifies the bypass plus the mixed-types warning. Lint: clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
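The final matching semantics (a 'graph' short-circuit for top-level wrappers, then recursive type-name matching) can be sketched over already-parsed JSON. The evaluateIgnore() name and its `{ bypass, alsoDropped }` return shape are hypothetical; the real shouldIgnoreScript takes a script element, returns a boolean, and logs the warning through Lana.

```javascript
// Illustrative sketch only; operates on parsed JSON rather than a script element.
// Collect every lowercase string @type at any depth, descending through @graph wrappers.
function collectTypes(payload, found = new Set()) {
  if (Array.isArray(payload)) {
    payload.forEach((item) => collectTypes(item, found));
  } else if (payload && typeof payload === 'object') {
    if (typeof payload['@type'] === 'string') found.add(payload['@type'].toLowerCase());
    if ('@graph' in payload) collectTypes(payload['@graph'], found);
  }
  return found;
}

// Hypothetical helper: decide whether a script is bypassed and which types go with it.
function evaluateIgnore(parsed, ignoreTypes) {
  const topLevel = Array.isArray(parsed) ? parsed : [parsed];
  const hasWrapper = topLevel.some((item) => item && typeof item === 'object' && '@graph' in item);
  // 'graph' short-circuits: no further analysis and no mixed-types warning.
  if (hasWrapper && ignoreTypes.includes('graph')) return { bypass: true, alsoDropped: [] };
  const types = [...collectTypes(parsed)];
  const matched = types.filter((type) => ignoreTypes.includes(type));
  if (matched.length === 0) return { bypass: false, alsoDropped: [] };
  // Mixed case: the whole script is bypassed; unmatched types are surfaced for a warning.
  return { bypass: true, alsoDropped: types.filter((type) => !ignoreTypes.includes(type)) };
}

const prebaked = [{ '@graph': [{ '@type': 'WebPage' }, { '@type': 'BreadcrumbList' }] }];
console.log(evaluateIgnore(prebaked, ['breadcrumblist']));
```

For the pre-baked example from the commit message, ignoring 'breadcrumblist' bypasses the whole script and reports 'webpage' as also dropped, while ignoring 'graph' bypasses it without any warning data.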
The manager's previous 'bootDom' source label only meant 'present when the manager initialized'; by that point, block decoration had already emitted runtime scripts that looked indistinguishable from HTML-authored ones. Future policy work (e.g., 'leave HTML-authored scripts alone, only manage runtime-emitted ones') needs an authoritative HTML-vs-runtime signal.
Take a WeakSet snapshot of every JSON-LD script in <head> at the very top of loadArea(document): before setCountry, before checkForPageMods, before any section decoration runs. Anything in the snapshot was in the raw HTML; anything that arrives later was emitted by a Milo block or feature.
Implementation:
- New libs/utils/jsonld-ns.js holds the small read-side API for the shared 'window.miloJsonLd' namespace: jsonLdNs(), snapshotHtmlJsonLd(), isHtmlJsonLd(). Idempotent snapshot; repeated calls are no-ops.
- libs/utils/utils.js: inline three-line snapshot at the top of loadArea (the file deliberately avoids static imports; dynamic-import would introduce a microtask gap during which producers could sneak scripts in, so the snapshot is inlined).
- libs/features/jsonld-graph-manager/jsonld-graph-manager.js: rename the singleton handle 'window.miloJsonLdGraphManager' to 'window.miloJsonLd.manager' so the namespace holds both authored scripts and the manager instance under one key instead of two.
- Test reset helper updated for the new namespace.
No behavior change yet: the manager doesn't consume isHtmlJsonLd() anywhere. This commit just establishes the signal so subsequent work (e.g., a 'don't manage HTML-authored scripts' policy) has reliable data.
Tests: 117 passing (111 manager + 6 new for jsonld-ns covering snapshot capture, runtime exclusion, idempotency, explicit-root, and pre-snapshot no-op).
Lint: clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Lets local agent skill folders (now linked under .agents/skills/) stay out of git status without tracking the symlinks.
…TML JSON-LD
Initialize JsonLdGraphManager near the top of loadArea(document), after canonical URL finalization and before any block/feature decoration, instead of in documentPostSectionLoading. The manager's boot scan already captures the JSON-LD present in the DOM as `bootDom` and its MutationObserver captures everything emitted afterward as `runtime`, so it is its own HTML-vs-runtime signal.
This makes the separate snapshot mechanism redundant: remove libs/utils/jsonld-ns.js (jsonLdNs/snapshotHtmlJsonLd/isHtmlJsonLd), the inline WeakSet snapshot in loadArea, and the htmlJsonLd reset in the manager test helper. No consumer read isHtmlJsonLd(), so dropping the always-on global state and unused signal is a net simplification.
Tradeoff: init now runs after checkForPageMods (canonical URL must be final for page-scoped @id derivation), so MEP-injected JSON-LD lands in the boot scan as `bootDom` rather than `runtime`. The signal only feeds merge priority today, which is rarely contested for these producers.
Tests: 111 manager + 161 utils passing. Lint clean.
Summary
The JSON-LD Graph Manager is a Milo feature that collects all the JSON-LD on a page and rewrites it as one canonical, linked @graph. This centralization enables the manager to automatically apply JSON-LD graph features that may improve search engine and LLM visibility, such as cross-entity @id linking and singleton enforcement for certain types.
Specification
See libs/utils/json-ld.md.
Testing
You can use the following URL query parameters with any AEM URL:
- milolibs=hgpa-jsonld-graph-manager to load Milo from this branch
- jsonld-graph-manager=true to enable the feature (off by default). This can also be done via page metadata.
- jsonld-graph-manager-debug=true to enable console.debug logging. Remember to add 'Verbose' to Console levels to view.
Example URLs:
Use the following JavaScript snippet to quickly parse available JSON-LD content: