fix(core): exclude drafts and trashed entries from taxonomy term counts #822
r2sake wants to merge 8 commits into emdash-cms:main from
Conversation
getTaxonomyTerms and getTerm counted rows directly from
content_taxonomies, ignoring the entry's status or deleted_at.
Drafts and soft-deleted entries inflated the Categories / Tags
widget counts on public sites.
Join the underlying ec_{collection} table for each collection
declared on the taxonomy definition and filter to status='published'
AND deleted_at IS NULL. Dynamic identifiers are validated and bound
via sql.ref() per the project's SQL conventions, with a defensive
fallback that skips collections whose ec_* table doesn't exist
(pre-migration databases or drift).
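As a sketch of what that identifier validation might look like (the `validateIdentifier` name and the exact rules here are assumptions for illustration; the project's real helper may differ):

```typescript
// Hypothetical sketch: validate a collection slug before it is interpolated
// into a table name. Only the shape is shown here; the project's actual
// validateIdentifier may enforce different rules.
function validateIdentifier(name: string): string {
  // Allow lowercase alphanumerics and underscores, starting with a letter.
  if (!/^[a-z][a-z0-9_]*$/.test(name)) {
    throw new Error(`Unsafe SQL identifier: ${name}`);
  }
  return name;
}

// The validated slug can then be bound as a table reference, e.g. via
// sql.ref(`ec_${validateIdentifier(collection)}`) in Kysely.
const table = `ec_${validateIdentifier("posts")}`;
console.log(table); // "ec_posts"
```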
🦋 Changeset detected — latest commit: 1b220d5. The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages:
@emdash-cms/admin
@emdash-cms/auth
@emdash-cms/blocks
@emdash-cms/cloudflare
emdash
create-emdash
@emdash-cms/gutenberg-to-portable-text
@emdash-cms/x402
@emdash-cms/plugin-ai-moderation
@emdash-cms/plugin-atproto
@emdash-cms/plugin-audit-log
@emdash-cms/plugin-color
@emdash-cms/plugin-embeds
@emdash-cms/plugin-forms
@emdash-cms/plugin-webhook-notifier
Query-count snapshot changes: 7 routes changed, total Δ +4 queries (SQLite, D1). Comparing snapshot files between base and head. Updated automatically on each push.
Thanks @r2sake. Correctness-wise this is exactly the right fix, and I want to merge it. But I'd like us to address the per-page query cost before it lands. Two concerns:
If we keep the per-collection loop, we should at least merge them into a single query:

```ts
const unionParts = collections.map((c) => sql`
  SELECT ct.taxonomy_id, COUNT(*) AS count
  FROM content_taxonomies ct
  INNER JOIN ${sql.ref(`ec_${c}`)} e ON e.id = ct.entry_id
  WHERE ct.collection = ${c}
    AND e.status = 'published'
    AND e.deleted_at IS NULL
  ${termsFilter}
  GROUP BY ct.taxonomy_id
`);

const result = await sql<{ taxonomy_id: string; count: number }>`
  SELECT taxonomy_id, SUM(count) AS count
  FROM (${sql.join(unionParts, sql` UNION ALL `)})
  GROUP BY taxonomy_id
`.execute(db);
```

That keeps the cost at one round-trip regardless of collection count. We'd still need the
If we put the term-counts map behind a worker-lifetime holder and wire invalidation into it:

```ts
const TERM_COUNTS_KEY = Symbol.for("@emdash-cms/core/term-counts@1");

interface TermCountsHolder {
  byTaxonomy: Map<string, Map<string, number>>;
}

function holder() {
  const g = globalThis as Record<symbol, TermCountsHolder | undefined>;
  if (!g[TERM_COUNTS_KEY]) g[TERM_COUNTS_KEY] = { byTaxonomy: new Map() };
  return g[TERM_COUNTS_KEY]!;
}

// inside getTaxonomyTerms, after fetching `def`:
const cached = holder().byTaxonomy.get(taxonomyName);
if (cached) {
  // build from cache
} else {
  const counts = await countPublishedTermAssignments(db, def.collections);
  holder().byTaxonomy.set(taxonomyName, counts);
}

// invalidateTermCache becomes:
export function invalidateTermCache(): void {
  holder().byTaxonomy.clear();
}
```

This makes content writes responsible for clearing the cache, which is exactly what the existing invalidation call sites already do. The stale-read window across colos exists but is bounded by the eventual-consistency window of the write itself. What I'd love to see in this PR:
Overlapping PRs: this PR modifies files that are also changed by other open PRs. This may cause merge conflicts or duplicated work. A maintainer will coordinate.
Addresses review feedback on emdash-cms#822. countTermAssignmentsForTaxonomy now combines the per-collection joins into one UNION ALL round-trip, so a `tags` taxonomy spanning `posts` and `products` costs one query instead of N. Pre-migration databases or drift where an `ec_{collection}` table is missing fall back to a per-collection loop so the rest of the count is still correct. Layers a worker-lifetime cache on top via a `globalThis` symbol holder (same pattern as secrets.ts), keyed by `(db, taxonomyName)` so playground / per-DO / per-test databases each maintain isolation. The in-flight promise is cached so concurrent callers share one query; errors drop the entry so the next caller retries. invalidateTermCache() — previously a no-op left over from an old "has any assignments?" probe — now clears the holder. The taxonomy and term-assignment write paths already called it; this commit also wires the call into handleContentPublish / handleContentUnpublish / handleContentDelete / handleContentRestore / handleContentPermanentDelete so the cache is dropped any time visibility changes. Tests: existing emdash-cms#581 cases still pass; two new cases pin the worker-lifetime cache (stale until invalidate, fresh after) and the multi-collection aggregation behavior.
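The in-flight promise sharing described in that commit can be sketched as follows (all names here are hypothetical stand-ins, not the project's actual helpers):

```typescript
// Hypothetical sketch of the in-flight promise cache described above:
// concurrent callers share one pending query, and a failure evicts the
// cache entry so the next caller retries instead of observing a cached
// rejection forever.
const inFlight = new Map<string, Promise<number>>();
let queries = 0;

async function fakeCountQuery(taxonomy: string): Promise<number> {
  queries++; // stands in for the real UNION ALL round-trip
  return 42;
}

function getCount(taxonomy: string): Promise<number> {
  const cached = inFlight.get(taxonomy);
  if (cached) return cached;
  const promise = fakeCountQuery(taxonomy).catch((error) => {
    inFlight.delete(taxonomy); // drop the entry so the next caller retries
    throw error;
  });
  inFlight.set(taxonomy, promise);
  return promise;
}

// Two callers issued in the same tick share one promise and one "query".
const p1 = getCount("tags");
const p2 = getCount("tags");
console.log(p1 === p2, queries); // true 1
```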
Pull request overview
Fixes incorrect taxonomy term counts on public sites by ensuring term assignment counts only include published, non-trashed entries, and adds caching + invalidation so counts remain performant and correct across requests.
Changes:
- Update taxonomy term counting to join against `ec_{collection}` tables and filter to `status = 'published' AND deleted_at IS NULL`.
- Add a worker-lifetime term-counts cache (globalThis singleton) with explicit invalidation hooks on publish/unpublish/delete/restore/permanent delete.
- Add unit tests covering draft/trashed exclusion, cache invalidation behavior, and multi-collection aggregation.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/taxonomies/index.ts | Implements published-only term counting with dynamic ec_* joins and worker-lifetime cached results. |
| packages/core/src/api/handlers/content.ts | Invalidates the term-count cache on content lifecycle operations that affect visibility. |
| packages/core/tests/unit/taxonomies/term-counts.test.ts | Adds regression tests for #581 and validates cache behavior. |
| .changeset/fix-taxonomy-term-counts.md | Patch changeset documenting the bugfix. |
| scripts/query-counts.snapshot.sqlite.json | Updates SQLite query-count snapshot reflecting the new query behavior. |
| scripts/query-counts.snapshot.d1.json | Updates D1 query-count snapshot reflecting the new query behavior. |
…optimization

Bundles two unrelated cleanups for the same file so they ship in one push:

1. Restore the `peekRequestCache` early-return in `getTaxonomyDef` that PR emdash-cms#840 (64bf5b9) added on main. The earlier merge resolution in 5b2791e silently took "ours" and dropped it; without this fix emdash-cms#822 would regress that per-render dedupe on merge. Apologies for the slip.
2. Address Copilot's review comments on the term-counts helper:
   - Header / `invalidateTermCache` doc comments now spell out that counts also change on term-assignment writes (`setTermsForEntry`, `attachToEntry`, `detachFromEntry`, `clearEntryTerms`) and on term create/update/delete, not only on entry-visibility flips. Future write paths can audit the list instead of guessing.
   - `countTermAssignmentsForTaxonomy` now actually scopes its SQL to the target taxonomy by joining `taxonomies` and filtering `t.name = ${taxonomyName}`. Previously the docstring claimed single-taxonomy semantics but the query aggregated every term living in the same collection — harmless because the per-taxonomy cache key dropped the extras, but wasted bytes and misleading.
   - `safeCollections` is deduped with `new Set(collections)` before the UNION ALL is built, so a `_emdash_taxonomy_defs.collections` JSON value with duplicates can't double-count assignments.
   - `getTerm` now carries a comment explaining why it deliberately reads the full per-taxonomy counts map from the shared worker-lifetime cache rather than running a narrower single-term query. A separate `WHERE taxonomy_id = ?` path would either bypass the cache or fragment it; the shared map turns the second call into a Map lookup with no extra round-trip on the typical widget+detail render.

Tests still 48/48 in `tests/unit/taxonomies/`. No public API change.
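The dedupe step described in that commit amounts to a one-liner; a minimal sketch (variable names assumed from the commit message):

```typescript
// Hypothetical sketch of the dedupe described above: collapse duplicate
// collection slugs from the stored JSON before building UNION ALL branches,
// so a duplicated entry cannot double-count assignments. Set preserves
// insertion order, so the branch order stays stable.
function dedupeCollections(collections: string[]): string[] {
  return [...new Set(collections)];
}

const safeCollections = dedupeCollections(["posts", "products", "posts"]);
console.log(safeCollections); // ["posts", "products"]
```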
/review
Correctness fix is solid and the tests cover the four bug shapes well. Two things worth pushing on before merge:
- The worker-lifetime cache is effectively a no-op on D1. Keying the cache by `Kysely` instance means it misses on every request when D1 Sessions are enabled, because `createRequestScopedDb` (in `@emdash-cms/cloudflare`) builds a fresh `Kysely` per request and stashes it in ALS. The query-count snapshot in this PR confirms it: D1 warm went +1 on `/category/development` and `/tag/webdev`, while SQLite warm (singleton db) stayed flat. That's the exact opposite of the warm-cost-zero outcome ascorbic asked for; on the actual deployment target the cache only pays back within a single render. Inline comment with a suggested fix below.
- PR description references a `termIds` option that isn't in the diff. The shipped implementation took a different (and reasonable) tack — share the per-taxonomy counts map between `getTaxonomyTerms` and `getTerm` via the cache, so `getTerm` is a Map lookup once the widget has primed it. The body just needs to be updated to match. (Copilot's other three comments don't apply — the `t.name` filter, dedupe, and assignment-side invalidation are all already in the diff.)
Nothing else stuck out. SQL builder usage is right, validateIdentifier + sql.ref() for the dynamic table name is the correct shape, the isMissingTableError fallback covers both pre-migration and orphaned-table drift, and the invalidation call sites cover every write path that exists today (no handleTaxonomyDefUpdate, so the missing collections-array invalidation isn't reachable).
```ts
function getCachedTermCounts(
  db: Kysely<any>,
  taxonomyName: string,
  collections: string[],
): Promise<Map<string, number>> {
  const { cache } = getTermCountsHolder();
  let perDb = cache.get(db);
  if (!perDb) {
    perDb = new Map();
    cache.set(db, perDb);
  }
  const cached = perDb.get(taxonomyName);
  if (cached) return cached;
  const promise = countTermAssignmentsForTaxonomy(db, taxonomyName, collections).catch((error) => {
    perDb.delete(taxonomyName);
    throw error;
  });
  perDb.set(taxonomyName, promise);
  return promise;
}
```
Cache misses on every D1 request. The WeakMap<Kysely, ...> layer means the cache is keyed by the per-request session-bound Kysely that createRequestScopedDb (in @emdash-cms/cloudflare/db/d1.ts:165) creates anew on every request. As a result, the second request to /category/x gets a different db reference than the first and reruns the count query — which is exactly what the query-count snapshot diff shows on this PR (D1 warm +1 for /category/development and /tag/webdev, while SQLite warm — using the module singleton — stays flat).
The secrets.ts precedent has the same shape but is only on edit/preview paths, so the regression there is invisible. Public taxonomy widget renders are the hot path that ascorbic's request was specifically targeting.
Simplest fix: drop the per-db indirection. The globalThis holder + Symbol.for keying is enough for SSR-bundle module-duplication safety, and invalidateTermCache() already swaps the holder wholesale on writes. Tests can keep working because they call invalidateTermCache() between cases (and could also be tightened to clear in beforeEach if cross-test isolation matters).
Suggested change — replace the per-db indirection above with:

```ts
interface TermCountsHolder {
  byTaxonomy: Map<string, Promise<Map<string, number>>>;
}

function getTermCountsHolder(): TermCountsHolder {
  // eslint-disable-next-line typescript-eslint(no-unsafe-type-assertion) -- globalThis singleton pattern
  const holder = globalThis as Record<symbol, TermCountsHolder | undefined>;
  let entry = holder[TERM_COUNTS_CACHE_KEY];
  if (!entry) {
    entry = { byTaxonomy: new Map() };
    holder[TERM_COUNTS_CACHE_KEY] = entry;
  }
  return entry;
}
```
And correspondingly in getCachedTermCounts:
```ts
function getCachedTermCounts(
  db: Kysely<any>,
  taxonomyName: string,
  collections: string[],
): Promise<Map<string, number>> {
  const { byTaxonomy } = getTermCountsHolder();
  const cached = byTaxonomy.get(taxonomyName);
  if (cached) return cached;
  const promise = countTermAssignmentsForTaxonomy(db, taxonomyName, collections).catch((error) => {
    byTaxonomy.delete(taxonomyName);
    throw error;
  });
  byTaxonomy.set(taxonomyName, promise);
  return promise;
}
```

The `db` only needs to be passed for the actual query execution, not for cache scoping — the cache lives at the worker level, which is logically per-deployment-per-database. (If you do want per-DB scoping for playground / DO previews, key by something stable like `dbConfig.entrypoint` rather than the `Kysely` instance — but in practice playgrounds run in their own isolates anyway, so the `globalThis` already provides isolation.)
With this change the warm path on D1 should drop to 0 queries for the term-counts read, matching the SQLite warm baseline.
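The stable-key alternative mentioned in passing can be sketched like this (assuming a configuration string such as `dbConfig.entrypoint` is available; all names here are hypothetical):

```typescript
// Hypothetical sketch: keying a cache by a stable configuration string
// survives per-request object recreation, whereas keying by the Kysely
// object itself (WeakMap) would miss whenever a fresh instance is built
// per request, as createRequestScopedDb does on D1.
const byEntrypoint = new Map<string, Map<string, number>>();

function countsFor(entrypoint: string): Map<string, number> {
  let counts = byEntrypoint.get(entrypoint);
  if (!counts) {
    counts = new Map();
    byEntrypoint.set(entrypoint, counts);
  }
  return counts;
}

// Two "requests" with distinct db objects but the same entrypoint string
// share one counts map.
const first = countsFor("main.db");
first.set("tags", 3);
const second = countsFor("main.db");
console.log(second.get("tags")); // 3
```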
```ts
// the shared cache turns the second call into a Map lookup with no
// extra round-trip. A separate `WHERE taxonomy_id = ?` path would
// either bypass the cache (extra query on every render) or fragment
// it (separate cache keys for full vs. single-term reads).
```
Minor: PR description mentions a termIds option on a shared countPublishedTermAssignments helper so the single-term path doesn't scan the whole taxonomy. The shipped code went with a different (better) design — share the full per-taxonomy counts map between getTaxonomyTerms and getTerm via the cache, as the comment here explains. Worth updating the PR body to match what's actually implemented so reviewers don't go looking for the option.
Review posted successfully. Summary of what I flagged:

Major finding: the worker-lifetime cache misses on every D1 request because it is keyed by the per-request `Kysely` instance; the query-count snapshot confirms the warm-path regression.

Minor finding: the PR description mentions a `termIds` option that isn't in the diff; the body should be updated to match the shipped cache-sharing design.

Otherwise the correctness fix is good — proper SQL builder usage, dedupe, identifier validation, drift handling, and all current write paths invalidate.
What does this PR do?
`getTaxonomyTerms` and `getTerm` counted assignments straight from `content_taxonomies` with no filter on entry status or `deleted_at`. As a result, drafts and soft-deleted entries showed up in the Categories / Tags sidebar widget counts on public sites — the post count next to a category name didn't match the number of posts you'd actually find when clicking through.

This joins the underlying `ec_{collection}` table for each collection declared on the taxonomy definition and restricts counts to `status = 'published' AND deleted_at IS NULL`. Dynamic table names are validated via `validateIdentifier` and bound through `sql.ref()` per the project's SQL conventions. A new helper `countPublishedTermAssignments` is shared between `getTaxonomyTerms` and `getTerm`, with a `termIds` option so the single-term path doesn't scan the whole taxonomy. Pre-migration databases (or drift where a referenced `ec_*` table doesn't exist) contribute zero rows via the existing `isMissingTableError` branch.

Closes #581
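A rough sketch of that missing-table fallback (the message check here is an assumption based on SQLite's "no such table" wording; the project's real `isMissingTableError` may inspect driver-specific error codes instead):

```typescript
// Hypothetical sketch: treat a "no such table" failure as zero assignments
// for one collection instead of failing the whole count. SQLite reports a
// missing table as "no such table: <name>"; the real helper may check
// error codes rather than the message text.
function isMissingTableError(error: unknown): boolean {
  return error instanceof Error && /no such table/i.test(error.message);
}

async function countOrZero(run: () => Promise<number>): Promise<number> {
  try {
    return await run();
  } catch (error) {
    if (isMissingTableError(error)) return 0; // pre-migration db or drift
    throw error; // anything else is a real failure
  }
}

console.log(isMissingTableError(new Error("no such table: ec_products"))); // true
console.log(isMissingTableError(new Error("UNIQUE constraint failed"))); // false
```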
Type of change
Checklist
- `pnpm typecheck` passes
- `pnpm lint` passes
- `pnpm test` passes (or targeted tests for my change)
- `pnpm format` has been run
- No `messages.po` changes except in translation PRs — a workflow extracts catalogs on merge to `main`.

AI-generated code disclosure
Screenshots / test output
The 4 new test cases in `packages/core/tests/unit/taxonomies/term-counts.test.ts` cover:

- `getTaxonomyTerms` counts
- `getTerm` count 0