fix(core): exclude drafts and trashed entries from taxonomy term counts #822
r2sake wants to merge 8 commits into emdash-cms:main from
Conversation
getTaxonomyTerms and getTerm counted rows directly from
content_taxonomies, ignoring the entry's status or deleted_at.
Drafts and soft-deleted entries inflated the Categories / Tags
widget counts on public sites.
Join the underlying ec_{collection} table for each collection
declared on the taxonomy definition and filter to status='published'
AND deleted_at IS NULL. Dynamic identifiers are validated and bound
via sql.ref() per the project's SQL conventions, with a defensive
fallback that skips collections whose ec_* table doesn't exist
(pre-migration databases or drift).
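As a sketch of what that identifier validation might look like (the `validateIdentifier` name and the exact rules here are assumptions for illustration; the project's real helper may differ):

```typescript
// Hypothetical sketch: validate a collection slug before it is interpolated
// into a table name. Only the shape is shown here; the project's actual
// validateIdentifier may enforce different rules.
function validateIdentifier(name: string): string {
  // Allow lowercase alphanumerics and underscores, starting with a letter.
  if (!/^[a-z][a-z0-9_]*$/.test(name)) {
    throw new Error(`Unsafe SQL identifier: ${name}`);
  }
  return name;
}

// The validated slug can then be bound as a table reference, e.g. via
// sql.ref(`ec_${validateIdentifier(collection)}`) in Kysely.
const table = `ec_${validateIdentifier("posts")}`;
console.log(table); // "ec_posts"
```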
🦋 Changeset detected — latest commit: 1b220d5. The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages:
@emdash-cms/admin
@emdash-cms/auth
@emdash-cms/blocks
@emdash-cms/cloudflare
emdash
create-emdash
@emdash-cms/gutenberg-to-portable-text
@emdash-cms/x402
@emdash-cms/plugin-ai-moderation
@emdash-cms/plugin-atproto
@emdash-cms/plugin-audit-log
@emdash-cms/plugin-color
@emdash-cms/plugin-embeds
@emdash-cms/plugin-forms
@emdash-cms/plugin-webhook-notifier
Query-count snapshot changes: 7 routes changed, total Δ +4 queries (SQLite, D1). Comparing snapshot files between base and head. Updated automatically on each push.
Thanks @r2sake. Correctness-wise this is exactly the right fix, and I want to merge it. But I'd like us to address the per-page query cost before it lands. Two concerns:
If we keep the per-collection loop, we should at least merge them into a single query:

```ts
const unionParts = collections.map((c) => sql`
  SELECT ct.taxonomy_id, COUNT(*) AS count
  FROM content_taxonomies ct
  INNER JOIN ${sql.ref(`ec_${c}`)} e ON e.id = ct.entry_id
  WHERE ct.collection = ${c}
    AND e.status = 'published'
    AND e.deleted_at IS NULL
  ${termsFilter}
  GROUP BY ct.taxonomy_id
`);

const result = await sql<{ taxonomy_id: string; count: number }>`
  SELECT taxonomy_id, SUM(count) AS count
  FROM (${sql.join(unionParts, sql` UNION ALL `)})
  GROUP BY taxonomy_id
`.execute(db);
```

That keeps the cost at one round-trip regardless of collection count. We'd still need the
If we put the term-counts map behind a worker-lifetime holder and wire invalidation into it:

```ts
const TERM_COUNTS_KEY = Symbol.for("@emdash-cms/core/term-counts@1");

interface TermCountsHolder {
  byTaxonomy: Map<string, Map<string, number>>;
}

function holder() {
  const g = globalThis as Record<symbol, TermCountsHolder | undefined>;
  if (!g[TERM_COUNTS_KEY]) g[TERM_COUNTS_KEY] = { byTaxonomy: new Map() };
  return g[TERM_COUNTS_KEY]!;
}

// inside getTaxonomyTerms, after fetching `def`:
const cached = holder().byTaxonomy.get(taxonomyName);
if (cached) {
  // build from cache
} else {
  const counts = await countPublishedTermAssignments(db, def.collections);
  holder().byTaxonomy.set(taxonomyName, counts);
}

// invalidateTermCache becomes:
export function invalidateTermCache(): void {
  holder().byTaxonomy.clear();
}
```

This makes content writes responsible for clearing the cache, which is exactly what the existing invalidation call sites already do. The stale-read window across colos exists but is bounded by the eventual-consistency window of the write itself. What I'd love to see in this PR:
Overlapping PRs: this PR modifies files that are also changed by other open PRs. This may cause merge conflicts or duplicated work. A maintainer will coordinate.
Addresses review feedback on emdash-cms#822. countTermAssignmentsForTaxonomy now combines the per-collection joins into one UNION ALL round-trip, so a `tags` taxonomy spanning `posts` and `products` costs one query instead of N. Pre-migration databases or drift where an `ec_{collection}` table is missing fall back to a per-collection loop so the rest of the count is still correct. Layers a worker-lifetime cache on top via a `globalThis` symbol holder (same pattern as secrets.ts), keyed by `(db, taxonomyName)` so playground / per-DO / per-test databases each maintain isolation. The in-flight promise is cached so concurrent callers share one query; errors drop the entry so the next caller retries. invalidateTermCache() — previously a no-op left over from an old "has any assignments?" probe — now clears the holder. The taxonomy and term-assignment write paths already called it; this commit also wires the call into handleContentPublish / handleContentUnpublish / handleContentDelete / handleContentRestore / handleContentPermanentDelete so the cache is dropped any time visibility changes. Tests: existing emdash-cms#581 cases still pass; two new cases pin the worker-lifetime cache (stale until invalidate, fresh after) and the multi-collection aggregation behavior.
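The in-flight promise sharing described in that commit can be sketched as follows (all names here are hypothetical stand-ins, not the project's actual helpers):

```typescript
// Hypothetical sketch of the in-flight promise cache described above:
// concurrent callers share one pending query, and a failure evicts the
// cache entry so the next caller retries instead of observing a cached
// rejection forever.
const inFlight = new Map<string, Promise<number>>();
let queries = 0;

async function fakeCountQuery(taxonomy: string): Promise<number> {
  queries++; // stands in for the real UNION ALL round-trip
  return 42;
}

function getCount(taxonomy: string): Promise<number> {
  const cached = inFlight.get(taxonomy);
  if (cached) return cached;
  const promise = fakeCountQuery(taxonomy).catch((error) => {
    inFlight.delete(taxonomy); // drop the entry so the next caller retries
    throw error;
  });
  inFlight.set(taxonomy, promise);
  return promise;
}

// Two callers issued in the same tick share one promise and one "query".
const p1 = getCount("tags");
const p2 = getCount("tags");
console.log(p1 === p2, queries); // true 1
```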
Pull request overview
Fixes incorrect taxonomy term counts on public sites by ensuring term assignment counts only include published, non-trashed entries, and adds caching + invalidation so counts remain performant and correct across requests.
Changes:
- Update taxonomy term counting to join against `ec_{collection}` tables and filter to `status = 'published' AND deleted_at IS NULL`.
- Add a worker-lifetime term-counts cache (globalThis singleton) with explicit invalidation hooks on publish/unpublish/delete/restore/permanent delete.
- Add unit tests covering draft/trashed exclusion, cache invalidation behavior, and multi-collection aggregation.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/taxonomies/index.ts | Implements published-only term counting with dynamic ec_* joins and worker-lifetime cached results. |
| packages/core/src/api/handlers/content.ts | Invalidates the term-count cache on content lifecycle operations that affect visibility. |
| packages/core/tests/unit/taxonomies/term-counts.test.ts | Adds regression tests for #581 and validates cache behavior. |
| .changeset/fix-taxonomy-term-counts.md | Patch changeset documenting the bugfix. |
| scripts/query-counts.snapshot.sqlite.json | Updates SQLite query-count snapshot reflecting the new query behavior. |
| scripts/query-counts.snapshot.d1.json | Updates D1 query-count snapshot reflecting the new query behavior. |
…optimization

Bundles two unrelated cleanups for the same file so they ship in one push:

1. Restore the `peekRequestCache` early-return in `getTaxonomyDef` that PR emdash-cms#840 (64bf5b9) added on main. The earlier merge resolution in 5b2791e silently took "ours" and dropped it; without this fix emdash-cms#822 would regress that per-render dedupe on merge. Apologies for the slip.
2. Address Copilot's review comments on the term-counts helper:
   - Header / `invalidateTermCache` doc comments now spell out that counts also change on term-assignment writes (`setTermsForEntry`, `attachToEntry`, `detachFromEntry`, `clearEntryTerms`) and on term create/update/delete, not only on entry-visibility flips. Future write paths can audit the list instead of guessing.
   - `countTermAssignmentsForTaxonomy` now actually scopes its SQL to the target taxonomy by joining `taxonomies` and filtering `t.name = ${taxonomyName}`. Previously the docstring claimed single-taxonomy semantics but the query aggregated every term living in the same collection — harmless because the per-taxonomy cache key dropped the extras, but wasted bytes and misleading.
   - `safeCollections` is deduped with `new Set(collections)` before the UNION ALL is built, so a `_emdash_taxonomy_defs.collections` JSON value with duplicates can't double-count assignments.
   - `getTerm` now carries a comment explaining why it deliberately reads the full per-taxonomy counts map from the shared worker-lifetime cache rather than running a narrower single-term query. A separate `WHERE taxonomy_id = ?` path would either bypass the cache or fragment it; the shared map turns the second call into a Map lookup with no extra round-trip on the typical widget+detail render.

Tests still 48/48 in `tests/unit/taxonomies/`. No public API change.
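The dedupe step described in that commit amounts to a one-liner; a minimal sketch (variable names assumed from the commit message):

```typescript
// Hypothetical sketch of the dedupe described above: collapse duplicate
// collection slugs from the stored JSON before building UNION ALL branches,
// so a duplicated entry cannot double-count assignments. Set preserves
// insertion order, so the branch order stays stable.
function dedupeCollections(collections: string[]): string[] {
  return [...new Set(collections)];
}

const safeCollections = dedupeCollections(["posts", "products", "posts"]);
console.log(safeCollections); // ["posts", "products"]
```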
/review
Correctness fix is solid and the tests cover the four bug shapes well. Two things worth pushing on before merge:
- The worker-lifetime cache is effectively a no-op on D1. Keying the cache by `Kysely` instance means it misses on every request when D1 Sessions are enabled, because `createRequestScopedDb` (in `@emdash-cms/cloudflare`) builds a fresh `Kysely` per request and stashes it in ALS. The query-count snapshot in this PR confirms it: D1 warm went +1 on `/category/development` and `/tag/webdev`, while SQLite warm (singleton db) stayed flat. That's the exact opposite of the warm-cost-zero outcome ascorbic asked for; on the actual deployment target the cache only pays back within a single render. Inline comment with a suggested fix below.
- PR description references a `termIds` option that isn't in the diff. The shipped implementation took a different (and reasonable) tack — share the per-taxonomy counts map between `getTaxonomyTerms` and `getTerm` via the cache, so `getTerm` is a Map lookup once the widget has primed it. The body just needs to be updated to match. (Copilot's other three comments don't apply — the `t.name` filter, dedupe, and assignment-side invalidation are all already in the diff.)
Nothing else stuck out. SQL builder usage is right, validateIdentifier + sql.ref() for the dynamic table name is the correct shape, the isMissingTableError fallback covers both pre-migration and orphaned-table drift, and the invalidation call sites cover every write path that exists today (no handleTaxonomyDefUpdate, so the missing collections-array invalidation isn't reachable).
```ts
function getCachedTermCounts(
  db: Kysely<any>,
  taxonomyName: string,
  collections: string[],
): Promise<Map<string, number>> {
  const { cache } = getTermCountsHolder();
  let perDb = cache.get(db);
  if (!perDb) {
    perDb = new Map();
    cache.set(db, perDb);
  }
  const cached = perDb.get(taxonomyName);
  if (cached) return cached;
  const promise = countTermAssignmentsForTaxonomy(db, taxonomyName, collections).catch((error) => {
    perDb.delete(taxonomyName);
    throw error;
  });
  perDb.set(taxonomyName, promise);
  return promise;
}
```
Cache misses on every D1 request. The WeakMap<Kysely, ...> layer means the cache is keyed by the per-request session-bound Kysely that createRequestScopedDb (in @emdash-cms/cloudflare/db/d1.ts:165) creates anew on every request. As a result, the second request to /category/x gets a different db reference than the first and reruns the count query — which is exactly what the query-count snapshot diff shows on this PR (D1 warm +1 for /category/development and /tag/webdev, while SQLite warm — using the module singleton — stays flat).
The secrets.ts precedent has the same shape but is only on edit/preview paths, so the regression there is invisible. Public taxonomy widget renders are the hot path that ascorbic's request was specifically targeting.
Simplest fix: drop the per-db indirection. The globalThis holder + Symbol.for keying is enough for SSR-bundle module-duplication safety, and invalidateTermCache() already swaps the holder wholesale on writes. Tests can keep working because they call invalidateTermCache() between cases (and could also be tightened to clear in beforeEach if cross-test isolation matters).
Suggested change — replace the per-db indirection above with:

```ts
interface TermCountsHolder {
  byTaxonomy: Map<string, Promise<Map<string, number>>>;
}

function getTermCountsHolder(): TermCountsHolder {
  // eslint-disable-next-line typescript-eslint(no-unsafe-type-assertion) -- globalThis singleton pattern
  const holder = globalThis as Record<symbol, TermCountsHolder | undefined>;
  let entry = holder[TERM_COUNTS_CACHE_KEY];
  if (!entry) {
    entry = { byTaxonomy: new Map() };
    holder[TERM_COUNTS_CACHE_KEY] = entry;
  }
  return entry;
}
```
And correspondingly in getCachedTermCounts:
```ts
function getCachedTermCounts(
  db: Kysely<any>,
  taxonomyName: string,
  collections: string[],
): Promise<Map<string, number>> {
  const { byTaxonomy } = getTermCountsHolder();
  const cached = byTaxonomy.get(taxonomyName);
  if (cached) return cached;
  const promise = countTermAssignmentsForTaxonomy(db, taxonomyName, collections).catch((error) => {
    byTaxonomy.delete(taxonomyName);
    throw error;
  });
  byTaxonomy.set(taxonomyName, promise);
  return promise;
}
```

The `db` only needs to be passed for the actual query execution, not for cache scoping — the cache lives at the worker level, which is logically per-deployment-per-database. (If you do want per-DB scoping for playground / DO previews, key by something stable like `dbConfig.entrypoint` rather than the `Kysely` instance — but in practice playgrounds run in their own isolates anyway, so the `globalThis` already provides isolation.)
With this change the warm path on D1 should drop to 0 queries for the term-counts read, matching the SQLite warm baseline.
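The stable-key alternative mentioned in passing can be sketched like this (assuming a configuration string such as `dbConfig.entrypoint` is available; all names here are hypothetical):

```typescript
// Hypothetical sketch: keying a cache by a stable configuration string
// survives per-request object recreation, whereas keying by the Kysely
// object itself (WeakMap) would miss whenever a fresh instance is built
// per request, as createRequestScopedDb does on D1.
const byEntrypoint = new Map<string, Map<string, number>>();

function countsFor(entrypoint: string): Map<string, number> {
  let counts = byEntrypoint.get(entrypoint);
  if (!counts) {
    counts = new Map();
    byEntrypoint.set(entrypoint, counts);
  }
  return counts;
}

// Two "requests" with distinct db objects but the same entrypoint string
// share one counts map.
const first = countsFor("main.db");
first.set("tags", 3);
const second = countsFor("main.db");
console.log(second.get("tags")); // 3
```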
```ts
// the shared cache turns the second call into a Map lookup with no
// extra round-trip. A separate `WHERE taxonomy_id = ?` path would
// either bypass the cache (extra query on every render) or fragment
// it (separate cache keys for full vs. single-term reads).
```
Minor: PR description mentions a termIds option on a shared countPublishedTermAssignments helper so the single-term path doesn't scan the whole taxonomy. The shipped code went with a different (better) design — share the full per-taxonomy counts map between getTaxonomyTerms and getTerm via the cache, as the comment here explains. Worth updating the PR body to match what's actually implemented so reviewers don't go looking for the option.
Review posted successfully. Summary of what I flagged:

Major finding: the worker-lifetime cache misses on every D1 request because it is keyed by the per-request `Kysely` instance; the query-count snapshot confirms the warm-path regression.

Minor finding: the PR description mentions a `termIds` option that isn't in the diff; the body should be updated to match the shipped cache-sharing design.

Otherwise the correctness fix is good — proper SQL builder usage, dedupe, identifier validation, drift handling, and all current write paths invalidate.
What does this PR do?
`getTaxonomyTerms` and `getTerm` counted assignments straight from `content_taxonomies` with no filter on entry status or `deleted_at`. As a result, drafts and soft-deleted entries showed up in the Categories / Tags sidebar widget counts on public sites — the post count next to a category name didn't match the number of posts you'd actually find when clicking through.

This joins the underlying `ec_{collection}` table for each collection declared on the taxonomy definition and restricts counts to `status = 'published' AND deleted_at IS NULL`. Dynamic table names are validated via `validateIdentifier` and bound through `sql.ref()` per the project's SQL conventions. A new helper `countPublishedTermAssignments` is shared between `getTaxonomyTerms` and `getTerm`, with a `termIds` option so the single-term path doesn't scan the whole taxonomy. Pre-migration databases (or drift where a referenced `ec_*` table doesn't exist) contribute zero rows via the existing `isMissingTableError` branch.

Closes #581
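A rough sketch of that missing-table fallback (the message check here is an assumption based on SQLite's "no such table" wording; the project's real `isMissingTableError` may inspect driver-specific error codes instead):

```typescript
// Hypothetical sketch: treat a "no such table" failure as zero assignments
// for one collection instead of failing the whole count. SQLite reports a
// missing table as "no such table: <name>"; the real helper may check
// error codes rather than the message text.
function isMissingTableError(error: unknown): boolean {
  return error instanceof Error && /no such table/i.test(error.message);
}

async function countOrZero(run: () => Promise<number>): Promise<number> {
  try {
    return await run();
  } catch (error) {
    if (isMissingTableError(error)) return 0; // pre-migration db or drift
    throw error; // anything else is a real failure
  }
}

console.log(isMissingTableError(new Error("no such table: ec_products"))); // true
console.log(isMissingTableError(new Error("UNIQUE constraint failed"))); // false
```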
Type of change
Checklist
- `pnpm typecheck` passes
- `pnpm lint` passes
- `pnpm test` passes (or targeted tests for my change)
- `pnpm format` has been run
- No `messages.po` changes except in translation PRs — a workflow extracts catalogs on merge to `main`.

AI-generated code disclosure
Screenshots / test output
The 4 new test cases in `packages/core/tests/unit/taxonomies/term-counts.test.ts` cover:

- `getTaxonomyTerms` counts
- `getTerm` count 0