Improved URL service and sitemap boot performance by rob-ghost · Pull Request #26676 · TryGhost/Ghost

rob-ghost · 2026-03-03T21:54:47Z

The URL service is one of the slowest parts of Ghost's boot sequence. It fetches every published resource from the database, generates URLs, and then notifies the sitemap service one resource at a time via events. This PR tackles four independent bottlenecks in that pipeline.

First, the Urls data store was coupled to event emission — every call to Urls.add() fired a url.added event, which the sitemap service consumed one-by-one during boot. This meant N event dispatches, N moment.js date parses, and N XML node constructions before the server could start. The store is now a pure data structure, and the URL service emits a single bulk url.init event after all URLs are generated. The sitemap manager handles this bulk event to initialize all entries at once, while runtime additions/removals still use per-resource events emitted by the UrlGenerator.

Second, the resource config used exclude lists that grew with every new schema column and still fetched ~20 columns per post when only ~10 are needed for URL generation, sitemap XML, and change detection. These are now explicit include lists, making the contract between the URL service and the database self-documenting and reducing query payload.

Third, raw-knex.js existed specifically to bypass Bookshelf ORM overhead, yet still called toJSON(), fixBools(), and fixDatesWhenFetch() through the Bookshelf prototype on every row. toJSON/serialize just shallow-copy attributes with no meaningful transformation in this context. fixDatesWhenFetch parses dates with moment.js redundantly since knex already returns JavaScript Date objects. The only necessary operation — boolean coercion — is now a pre-computed column loop without Bookshelf prototype lookups.

Fourth, relation queries (tags, authors) used WHERE IN with every post ID materialized as a literal. For 10k posts that's 10k string values the query planner must parse. This is replaced with a subquery that mirrors the main query's NQL filter, letting the database use an index scan instead of parsing a literal list.

An integration test suite (url-service-and-sitemap.test.js) was added first as a safety net, covering both default routing and custom multi-collection routing with custom fixtures. It verifies URL generation, sitemap content, canonical exclusions, and draft filtering end-to-end.

Safety-net test that boots the URL service with custom fixtures and verifies both URL resolution and sitemap XML output from the same entrypoint. Tests outcomes: correct paths per collection, canonical_url exclusion, draft exclusion, orphan tag exclusion, feature_image in sitemap image nodes, and multi-collection routing.

The Urls class emitted url.added/url.removed events on every add/remove, coupling the data structure to notification. During boot this fires once per resource — the only consumer is the sitemap service, which does per-row moment() parsing and XML node construction for each event. Urls is now a pure data structure. UrlGenerator emits url.added and url.removed for runtime changes (post published, updated, deleted). UrlService emits a single url.init event after boot completes, and SiteMapManager handles it in bulk. This eliminates N event emissions during init, replacing them with one.
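The split between a silent store and a single bulk notification can be sketched roughly as follows. This is a minimal illustration, not Ghost's code: `UrlStore`, `SitemapIndex`, and the event payload shapes are hypothetical, and only the event names `url.init`, `url.added`, and `url.removed` come from the description above.

```javascript
// Illustrative sketch only: class names and payload shapes are assumptions.
const {EventEmitter} = require('node:events');

// A pure data store: add/remove only mutate state and never emit events.
class UrlStore {
    constructor() {
        this.byResourceId = new Map();
    }

    add(entry) {
        this.byResourceId.set(entry.resource.id, entry);
    }

    remove(resourceId) {
        this.byResourceId.delete(resourceId);
    }

    getAll() {
        return [...this.byResourceId.values()];
    }
}

// A consumer that handles one bulk event at boot and per-resource events later.
class SitemapIndex {
    constructor(events) {
        this.entries = new Map();
        this.handledEvents = 0;

        events.on('url.init', (allEntries) => {
            this.handledEvents += 1;
            for (const entry of allEntries) {
                this.entries.set(entry.resource.id, entry.url);
            }
        });

        events.on('url.added', (entry) => {
            this.handledEvents += 1;
            this.entries.set(entry.resource.id, entry.url);
        });

        events.on('url.removed', (entry) => {
            this.handledEvents += 1;
            this.entries.delete(entry.resource.id);
        });
    }
}

const events = new EventEmitter();
const store = new UrlStore();
const sitemap = new SitemapIndex(events);

// Boot: generate every URL without notifying anyone, then emit once.
for (let i = 1; i <= 1000; i++) {
    store.add({url: `/post-${i}/`, resource: {id: i}});
}
events.emit('url.init', store.getAll());

// Runtime: a single post is published after boot.
const published = {url: '/new-post/', resource: {id: 1001}};
store.add(published);
events.emit('url.added', published);
```

In this sketch the consumer handles two events for 1001 entries, rather than one event per entry.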

The old config used exclude lists that grew with every new column added to the schema. Include lists are explicit about what the URL service needs: only fields used for URL generation (permalink patterns, NQL filter evaluation), sitemap XML (dates, images, canonical_url), and runtime change detection. This reduces the query payload for posts from ~20 columns to 10, and similarly for other resource types. Also updated raw-knex.js to support an `include` option for column selection, and resources.js to derive the ignored properties for change detection from the include list rather than the exclude list.
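As a rough sketch of the include-list idea, the snippet below derives both the ignored properties and a change check from one list. The column names and helper functions (`ignoredProperties`, `hasRelevantChange`) are hypothetical and chosen for illustration; the actual lists in the PR may differ.

```javascript
// Hypothetical column set; not the real posts schema.
const postColumns = [
    'id', 'uuid', 'title', 'slug', 'html', 'plaintext', 'feature_image',
    'featured', 'type', 'status', 'visibility', 'canonical_url',
    'published_at', 'created_at', 'updated_at'
];

// Hypothetical include list: only what URL generation, sitemap XML,
// and change detection would need in this sketch.
const postResourceConfig = {
    type: 'posts',
    include: [
        'id', 'slug', 'type', 'status', 'visibility', 'featured',
        'feature_image', 'canonical_url', 'published_at', 'updated_at'
    ]
};

// Everything outside the include list is ignored for change detection.
function ignoredProperties(allColumns, include) {
    const wanted = new Set(include);
    return allColumns.filter(column => !wanted.has(column));
}

// A change only matters if it touches an included field.
// Values are compared as primitives here to keep the sketch simple.
function hasRelevantChange(before, after, include) {
    return include.some(key => before[key] !== after[key]);
}

const ignored = ignoredProperties(postColumns, postResourceConfig.include);

const before = {id: '1', slug: 'hello', title: 'Hello', status: 'published'};
const titleEdit = {...before, title: 'Hello again'};
const slugEdit = {...before, slug: 'hello-again'};

const titleEditMatters = hasRelevantChange(before, titleEdit, postResourceConfig.include);
const slugEditMatters = hasRelevantChange(before, slugEdit, postResourceConfig.include);
```

With an include list, a new schema column is ignored by default, whereas an exclude list has to be updated for every column the service does not need.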

raw-knex.js existed specifically to bypass Bookshelf's per-row overhead, but still called toJSON(), fixBools(), and fixDatesWhenFetch() through the Bookshelf prototype on every row. toJSON/serialize just shallow-copy attributes with no meaningful transformation. fixDatesWhenFetch parses dates with moment.js but knex already returns JavaScript Date objects. fixBools is the only necessary operation — replaced with a pre-computed boolean column loop that runs without Bookshelf prototype lookups or moment.js overhead.
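A pre-computed boolean column loop might look something like the sketch below. The schema fragment and the `buildBooleanCoercer` helper are assumptions made for illustration, and the exact coercion rules in raw-knex.js may differ (for example in how null values are treated).

```javascript
// Hypothetical schema fragment; only the column types matter here.
const schema = {
    posts: {
        id: {type: 'string'},
        slug: {type: 'string'},
        featured: {type: 'boolean'},
        published_at: {type: 'dateTime'}
    }
};

// Work out the boolean columns once per table, not once per row.
function buildBooleanCoercer(tableSchema) {
    const booleanColumns = Object.keys(tableSchema).filter(
        column => tableSchema[column].type === 'boolean'
    );

    // The returned function touches only the pre-computed columns.
    return function coerceBooleans(row) {
        for (const column of booleanColumns) {
            if (Object.prototype.hasOwnProperty.call(row, column)) {
                // Drivers commonly return 0/1 for boolean columns.
                row[column] = Boolean(row[column]);
            }
        }
        return row;
    };
}

const coercePostBooleans = buildBooleanCoercer(schema.posts);
const publishedAt = new Date('2026-03-03T21:54:47Z');

const rows = [
    {id: 'a1', slug: 'hello', featured: 1, published_at: publishedAt},
    {id: 'b2', slug: 'world', featured: 0, published_at: publishedAt}
].map(row => coercePostBooleans(row));
```

Dates pass through untouched in this sketch, which reflects the point above that the driver already returns Date objects.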

Relation queries (tags, authors) used WHERE IN with every post ID materialized as a literal — for 10k posts that's 10k string values the query planner must parse and optimize. Replaced with a subquery that mirrors the main query's NQL filter and shouldHavePosts conditions, letting the database use an index scan instead.
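The difference between the two query shapes can be illustrated with plain SQL strings. This is a sketch under stated assumptions: the table and column names, the helper functions, and the SQL form of the filter are hypothetical, and the PR builds its queries through knex rather than by string concatenation.

```javascript
// Old shape (illustrative): the statement grows with the number of posts,
// one value per post id.
function relationQueryWithIdList(postIds) {
    const placeholders = postIds.map(() => '?').join(', ');
    return {
        sql: 'SELECT posts_tags.post_id, posts_tags.tag_id FROM posts_tags ' +
            `WHERE posts_tags.post_id IN (${placeholders})`,
        bindings: postIds
    };
}

// New shape (illustrative): repeat the main query's filter as a subquery,
// so the statement size is independent of the number of posts.
function relationQueryWithSubquery(filter) {
    return {
        sql: 'SELECT posts_tags.post_id, posts_tags.tag_id FROM posts_tags ' +
            `WHERE posts_tags.post_id IN (SELECT posts.id FROM posts WHERE ${filter.sql})`,
        bindings: filter.bindings
    };
}

// Hypothetical SQL translation of the main query's filter.
const mainFilter = {
    sql: 'posts.status = ? AND posts.type = ?',
    bindings: ['published', 'post']
};

const tenThousandIds = Array.from({length: 10000}, (_, i) => `id-${i}`);

const oldQuery = relationQueryWithIdList(tenThousandIds);
const newQuery = relationQueryWithSubquery(mainFilter);
```

For 10k posts the first query carries 10k values, while the second carries only the filter's own bindings.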

coderabbitai · 2026-03-03T21:54:57Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


ErisDS · 2026-03-03T22:22:32Z

🤖 Velo CI Failure Analysis

Classification: 🟠 SOFT FAIL

Workflow: CI
Failed Step: Legacy tests
Run: View failed run
What failed: Test assertion failure: expected 301 to equal 200 in frontend behavior tests
Why: The root cause is a test assertion failure. Going by the assertion message, the test expected an HTTP status of 200 but the response returned 301. This is a code issue, since the test validates the application's behavior and the failure indicates a problem in the code under test.
Action:
The author should investigate the frontend behavior tests and fix the issue that is causing the assertion failure. This is likely a bug in the application code that needs to be addressed.

vikaspotluri123 · 2026-03-04T07:03:59Z

ghost/core/core/server/models/base/plugins/raw-knex.js

-                // exclude fields if provided
-                if (exclude) {
+                // select only the fields needed
+                if (include) {


Should there be a debug assertion that errors when both include and exclude are provided?

rob-ghost · 2026-03-04

FYI this was a PoC which is being decomposed into more reviewable chunks. The first of these is here, and it removes exclude as it's not used by anything after this change.

rob-ghost · 2026-03-04T19:33:27Z

Closing in favour of decomposing into more reviewable chunks, first is here: #26689

rob-ghost added 6 commits March 3, 2026 21:52

Cleaned up unused imports in URL service integration test

ff1434c

vikaspotluri123 reviewed Mar 4, 2026

rob-ghost closed this Mar 4, 2026

rob-ghost wants to merge 6 commits into main from chore/url-service-boot-optimisations
