
feat: track bookmark indexing time and improve progress report #2278

Closed
MohamedBassem wants to merge 8 commits into main from claude/bookmark-indexing-progress-QwZSI

Conversation

@MohamedBassem
Collaborator

No description provided.

This commit adds tracking of when bookmarks are indexed for search and
improves the import session progress reporting to show detailed stages
of bookmark processing.

Changes:
- Add `lastIndexedAt` field to bookmarks table to track when each
  bookmark was last indexed by the search worker
- Update search worker to set `lastIndexedAt` timestamp after indexing
- Extend import session stats to separately track crawling, tagging,
  and indexing progress
- Update ImportSessionCard UI to display detailed progress breakdown
  with user-friendly explanations:
  * Crawling (Fetching webpage content)
  * Tagging (AI generating tags)
  * Indexing (Making searchable)
- Generate database migration for the new field
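The per-stage stats extension described above can be sketched as pure accumulation logic. This is an illustrative sketch only — the names (`StageRow`, `accumulate`) are hypothetical and not the actual implementation in packages/trpc/models/importSessions.ts:

```typescript
// Hypothetical shape of one aggregated row from the stats query.
interface StageRow {
  crawlStatus: "pending" | "success" | "failure" | null; // null: text bookmark, nothing to crawl
  taggingStatus: "pending" | "success" | "failure" | null;
  isIndexed: boolean; // derived from lastIndexedAt being non-null
  count: number; // group size from the aggregated query
}

interface StageStats {
  crawlingPending: number;
  crawlingCompleted: number;
  crawlingFailed: number;
  taggingPending: number;
  taggingCompleted: number;
  taggingFailed: number;
  indexingPending: number;
  indexingCompleted: number;
}

function accumulate(rows: StageRow[]): StageStats {
  const s: StageStats = {
    crawlingPending: 0,
    crawlingCompleted: 0,
    crawlingFailed: 0,
    taggingPending: 0,
    taggingCompleted: 0,
    taggingFailed: 0,
    indexingPending: 0,
    indexingCompleted: 0,
  };
  for (const r of rows) {
    // Crawling: null means "no crawl needed", so it counts as completed.
    if (r.crawlStatus === "pending") s.crawlingPending += r.count;
    else if (r.crawlStatus === "failure") s.crawlingFailed += r.count;
    else s.crawlingCompleted += r.count;
    // Tagging (note: taggingStatus === null falls into no bucket here,
    // an inconsistency the reviewers flag later in this thread).
    if (r.taggingStatus === "pending") s.taggingPending += r.count;
    else if (r.taggingStatus === "success") s.taggingCompleted += r.count;
    else if (r.taggingStatus === "failure") s.taggingFailed += r.count;
    // Indexing: a timestamp either exists or it doesn't, so only two buckets.
    if (r.isIndexed) s.indexingCompleted += r.count;
    else s.indexingPending += r.count;
  }
  return s;
}
```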

Address performance and best practice issues with indexing tracking:

- Use SQL CASE expression in groupBy to check if indexed (null vs not null)
  instead of grouping by exact timestamp values for better query performance
- Add backfill query in migration to set lastIndexedAt to createdAt for
  existing bookmarks
- Move lastIndexedAt update to onComplete handler in search worker instead
  of within the indexing function for better separation of concerns
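Assuming SQLite and the column names summarized in this PR, the migration plus backfill described above might look like the following hypothetical SQL (the real file is packages/db/drizzle/0073_add_last_indexed_at_to_bookmarks.sql, which may differ):

```sql
-- Add the nullable timestamp column (stored as an integer, per the schema summary).
ALTER TABLE bookmarks ADD COLUMN lastIndexedAt INTEGER;

-- Backfill: assume existing bookmarks were already indexed, using creation
-- time as the best available approximation (a caveat the review notes).
UPDATE bookmarks SET lastIndexedAt = createdAt WHERE lastIndexedAt IS NULL;
```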

Add tests to verify the new detailed progress breakdown functionality:

- Test that crawling, tagging, and indexing stats are tracked separately
- Test progression through all three stages with proper counts
- Test that failed states are correctly tracked for each stage
- Test that bookmarks are only marked completed when all three stages
  (crawling, tagging, and indexing) are successfully completed
- Verify that indexing status (via lastIndexedAt) is properly considered
  in overall completion status
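The completion rule these tests verify — a bookmark counts as completed only when crawling (where applicable), tagging, and indexing have all succeeded — can be sketched as a standalone function. Names and the exact fallback behavior are assumptions, not the real code:

```typescript
type Status = "pending" | "success" | "failure" | null;

// Derives a bookmark's overall state from its three stages.
// crawlStatus is null for text bookmarks, which need no crawl.
function bookmarkState(
  crawlStatus: Status,
  taggingStatus: Status,
  lastIndexedAt: Date | null,
): "completed" | "failed" | "pending" {
  if (crawlStatus === "failure" || taggingStatus === "failure") {
    return "failed";
  }
  const crawled = crawlStatus === "success" || crawlStatus === null;
  const tagged = taggingStatus === "success";
  const indexed = lastIndexedAt !== null;
  if (crawled && tagged && indexed) {
    return "completed";
  }
  // Anything in-flight (or in an unexpected state) stays pending.
  return "pending";
}
```

Note how tagging success alone is not enough: without a `lastIndexedAt` timestamp the bookmark stays pending, which is the behavioral change the third test exercises.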
@coderabbitai
Contributor

coderabbitai bot commented Dec 20, 2025

Walkthrough

This change extends the import sessions feature by adding indexing-aware progress tracking. A new lastIndexedAt timestamp column is introduced to the bookmarks table, updated by the search worker upon successful indexing. Statistics calculations are expanded to track per-stage progress (crawling, tagging, indexing), and UI components display detailed progress breakdowns with status badges. Translations are added to support the new UI elements.

Changes

Cohort / File(s) — Summary

  • Frontend UI & Translations (apps/web/components/settings/ImportSessionCard.tsx, apps/web/lib/i18n/locales/en/translation.json)
    Added "Detailed Progress Breakdown" and "Overall Status Badges" UI sections displaying per-stage progress (crawling, tagging, indexing) with completion counters and status indicators. Extended translation keys to support the new stage-specific UI labels and status descriptions.
  • Database Schema & Migrations (packages/db/schema.ts, packages/db/drizzle/0073_add_last_indexed_at_to_bookmarks.sql, packages/db/drizzle/meta/_journal.json)
    Added a lastIndexedAt integer timestamp column to the bookmarks table. The new migration backfills existing records by setting lastIndexedAt to createdAt. The journal entry is updated to reflect migration version 73.
  • Type Definitions (packages/shared/types/importSessions.ts)
    Extended the ZImportSessionWithStats schema with per-stage progress fields: crawlingPending, crawlingCompleted, crawlingFailed, taggingPending, taggingCompleted, taggingFailed, indexingPending, indexingCompleted.
  • Backend Processing Logic (apps/workers/workers/searchWorker.ts, packages/trpc/models/importSessions.ts)
    The search worker's onComplete handler now asynchronously updates bookmarks.lastIndexedAt on successful indexing. The tRPC model is extended with indexing-aware statistics: a computed isIndexed flag in the aggregated query, adjusted groupBy and per-status counting logic to track indexed vs. non-indexed bookmarks across all three stages, and completion criteria that now require indexing success alongside crawling and tagging.
  • Tests (packages/trpc/routers/importSessions.test.ts)
    Added three new test blocks covering detailed progress tracking across crawling/tagging/indexing state transitions, failed-state scenarios, and completion logic requiring all three stages to finish. Imported bookmarkLinks from db/schema for table updates.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Key areas requiring extra attention:

  • packages/trpc/models/importSessions.ts: Complex SQL query logic with new groupBy dimensions, isIndexed computation, and multi-stage pending/completed conditions. Verify correctness of per-status counting and completion classification logic, especially the interaction between crawling, tagging, and indexing states.
  • apps/workers/workers/searchWorker.ts: Async handler now performs DB writes with side-effects. Confirm error handling, transaction safety, and that the timestamp update doesn't conflict with concurrent operations.
  • packages/trpc/routers/importSessions.test.ts: Three new test scenarios covering state transitions and edge cases. Verify test data setup and assertions align with updated completion logic.

Possibly related PRs

  • feat: Revamp import experience #2001: Foundational PR that introduced the Import Sessions feature; this PR extends it by adding per-stage progress tracking and indexing awareness to the same UI components and type schemas.

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Description check — ❓ Inconclusive: No pull request description was provided by the author, making it impossible to evaluate whether the description relates to the changeset. Add a description explaining the motivation, implementation details, and impact of tracking bookmark indexing time and the new progress breakdown UI.

✅ Passed checks (1 passed)

  • Title check — ✅ Passed: The title accurately summarizes the main changes: adding bookmark indexing time tracking and improving the progress report UI with a detailed per-stage breakdown.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
apps/workers/workers/searchWorker.ts (1)

29-41: Consider error handling for the DB update.

The implementation correctly updates lastIndexedAt only for "index" operations after successful completion. However, if the database update fails, the error would propagate and potentially cause issues. Consider wrapping in a try-catch to ensure indexing completion isn't affected by timestamp update failures.

🔎 Suggested defensive error handling
          onComplete: async (job) => {
            workerStatsCounter.labels("search", "completed").inc();
            const jobId = job.id;
            logger.info(`[search][${jobId}] Completed successfully`);

            // Update the lastIndexedAt timestamp after successful indexing
            const request = zSearchIndexingRequestSchema.safeParse(job.data);
            if (request.success && request.data.type === "index") {
+             try {
                await db
                  .update(bookmarks)
                  .set({ lastIndexedAt: new Date() })
                  .where(eq(bookmarks.id, request.data.bookmarkId));
+             } catch (error) {
+               logger.warn(
+                 `[search][${jobId}] Failed to update lastIndexedAt: ${error}`,
+               );
+             }
            }
          },
packages/trpc/models/importSessions.ts (1)

154-161: Consider handling taggingStatus === null for consistency.

Unlike crawling (line 148) where null is counted as completed, taggingStatus === null isn't counted in any detailed tagging bucket. This could cause the sum of taggingPending + taggingCompleted + taggingFailed to be less than totalBookmarks in edge cases (e.g., if the LEFT JOIN yields no matching bookmark).

🔎 Proposed fix for null handling consistency
      // Track tagging status
      if (taggingStatus === "pending") {
        stats.taggingPending += count;
-     } else if (taggingStatus === "success") {
+     } else if (taggingStatus === "success" || taggingStatus === null) {
        stats.taggingCompleted += count;
      } else if (taggingStatus === "failure") {
        stats.taggingFailed += count;
      }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bdba54 and ddf7278.

📒 Files selected for processing (10)
  • apps/web/components/settings/ImportSessionCard.tsx
  • apps/web/lib/i18n/locales/en/translation.json
  • apps/workers/workers/searchWorker.ts
  • packages/db/drizzle/0073_add_last_indexed_at_to_bookmarks.sql
  • packages/db/drizzle/meta/0073_snapshot.json
  • packages/db/drizzle/meta/_journal.json
  • packages/db/schema.ts
  • packages/shared/types/importSessions.ts
  • packages/trpc/models/importSessions.ts
  • packages/trpc/routers/importSessions.test.ts
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{ts,tsx,js,jsx,json,css,md}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier according to project standards

Files:

  • packages/db/drizzle/meta/_journal.json
  • packages/shared/types/importSessions.ts
  • apps/web/lib/i18n/locales/en/translation.json
  • apps/web/components/settings/ImportSessionCard.tsx
  • packages/db/schema.ts
  • packages/trpc/models/importSessions.ts
  • packages/trpc/routers/importSessions.test.ts
  • apps/workers/workers/searchWorker.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use TypeScript for type safety in all source files

Files:

  • packages/shared/types/importSessions.ts
  • apps/web/components/settings/ImportSessionCard.tsx
  • packages/db/schema.ts
  • packages/trpc/models/importSessions.ts
  • packages/trpc/routers/importSessions.test.ts
  • apps/workers/workers/searchWorker.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Lint code using oxlint and fix issues with pnpm lint:fix

Files:

  • packages/shared/types/importSessions.ts
  • apps/web/components/settings/ImportSessionCard.tsx
  • packages/db/schema.ts
  • packages/trpc/models/importSessions.ts
  • packages/trpc/routers/importSessions.test.ts
  • apps/workers/workers/searchWorker.ts
packages/shared/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Organize shared code and types in the packages/shared directory for use across packages

Files:

  • packages/shared/types/importSessions.ts
apps/web/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

apps/web/**/*.{ts,tsx}: Use Tailwind CSS for styling in the web application
Use Next.js for building the main web application

Files:

  • apps/web/components/settings/ImportSessionCard.tsx
packages/db/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Drizzle ORM for database schema and migrations in the db package

Files:

  • packages/db/schema.ts
packages/trpc/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Organize business logic in the tRPC router and procedures located in packages/trpc

Files:

  • packages/trpc/models/importSessions.ts
  • packages/trpc/routers/importSessions.test.ts
**/*.{test,spec}.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Vitest for writing and running tests

Files:

  • packages/trpc/routers/importSessions.test.ts
🧠 Learnings (4)
📚 Learning: 2025-10-04T10:37:57.828Z
Learnt from: MohamedBassem
Repo: karakeep-app/karakeep PR: 2001
File: packages/trpc/models/importSessions.ts:51-57
Timestamp: 2025-10-04T10:37:57.828Z
Learning: In projects using Drizzle ORM with `createdAtField()` and `modifiedAtField()` helper functions in the schema definition (as seen in packages/db/schema.ts), Drizzle automatically populates the `createdAt` and `modifiedAt` fields during insert operations, so they do not need to be set explicitly in the insert values.

Applied to files:

  • packages/db/schema.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/db/**/*.{ts,tsx} : Use Drizzle ORM for database schema and migrations in the db package

Applied to files:

  • packages/trpc/models/importSessions.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/trpc/**/*.{ts,tsx} : Organize business logic in the tRPC router and procedures located in `packages/trpc`

Applied to files:

  • packages/trpc/models/importSessions.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to **/*.{test,spec}.{ts,tsx,js,jsx} : Use Vitest for writing and running tests

Applied to files:

  • packages/trpc/routers/importSessions.test.ts
🧬 Code graph analysis (3)
packages/trpc/models/importSessions.ts (1)
packages/db/schema.ts (2)
  • bookmarkLinks (230-258)
  • bookmarks (168-228)
packages/trpc/routers/importSessions.test.ts (2)
packages/trpc/testUtils.ts (1)
  • CustomTestContext (59-63)
packages/db/schema.ts (2)
  • bookmarkLinks (230-258)
  • bookmarks (168-228)
apps/workers/workers/searchWorker.ts (2)
packages/shared-server/src/queues.ts (1)
  • zSearchIndexingRequestSchema (55-58)
packages/db/schema.ts (1)
  • bookmarks (168-228)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: typecheck
🔇 Additional comments (12)
packages/db/drizzle/meta/_journal.json (1)

516-522: LGTM!

The migration journal entry follows the established pattern with correct idx sequencing, version, and format.

apps/web/lib/i18n/locales/en/translation.json (1)

399-419: LGTM!

The translation strings are well-structured and consistent with the UI implementation. The absence of a "failed" key for the indexing stage aligns with the component that doesn't render a failed state for indexing.

packages/db/drizzle/0073_add_last_indexed_at_to_bookmarks.sql (1)

1-5: Reasonable backfill strategy with a caveat.

The backfill sets lastIndexedAt = createdAt for existing bookmarks, which is a practical assumption. However, if search indexing was ever disabled or failed for existing bookmarks, they would incorrectly appear as indexed. This is likely acceptable given:

  1. It's a reasonable default for most deployments
  2. Users can trigger re-indexing if needed

The comment documenting the assumption is helpful.

packages/db/schema.ts (1)

208-208: LGTM!

The lastIndexedAt column is correctly defined as a nullable timestamp. The nullable design is appropriate since null indicates the bookmark hasn't been indexed yet, which aligns with the isIndexed computation in the stats logic.

apps/web/components/settings/ImportSessionCard.tsx (1)

185-309: LGTM!

The detailed progress breakdown UI is well-structured with consistent patterns across all three stages. The implementation correctly:

  • Uses translation keys that match the added i18n strings
  • Conditionally renders pending/failed messages only when counts are non-zero
  • Omits "failed" display for indexing (consistent with the data model)
  • Follows Tailwind CSS styling conventions per coding guidelines
packages/trpc/routers/importSessions.test.ts (3)

235-364: LGTM! Comprehensive test coverage for detailed progress tracking.

This test thoroughly validates the state transitions through all three stages (crawling → tagging → indexing) and verifies the final completion status. The test correctly validates:

  • Initial pending states for text vs link bookmarks (text doesn't need crawling)
  • Incremental completion of each stage
  • Final "completed" status when all stages finish

366-417: LGTM! Good failure scenario coverage.

Tests correctly verify that crawling and tagging failures are tracked in the detailed progress breakdown and reflected in failedBookmarks count.


419-461: LGTM! Critical test for the new completion criteria.

This test validates the key behavioral change: a bookmark is only considered "completed" when all three stages (crawling, tagging, and indexing) are finished. It correctly asserts that taggingStatus: "success" alone is insufficient—the lastIndexedAt timestamp must also be set.

packages/shared/types/importSessions.ts (1)

39-47: LGTM! Schema extension for detailed progress tracking looks correct.

The asymmetry between indexing (pending/completed only) and crawling/tagging (pending/completed/failed) is appropriate since lastIndexedAt is a timestamp rather than a status enum—indexing either happened or it hasn't.

packages/trpc/models/importSessions.ts (3)

92-121: LGTM! Query structure is correct.

The isIndexed CASE expression is properly duplicated in both SELECT and GROUP BY, which is required SQL semantics. Using Drizzle's sql template literal ensures proper parameterization.
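Illustratively, duplicating the CASE expression in both clauses corresponds to SQL along these lines. Table, join, and column names are assumed from the review summaries — this is not the exact Drizzle-generated query:

```sql
SELECT
  bl.crawlStatus,
  bl.taggingStatus,
  CASE WHEN b.lastIndexedAt IS NOT NULL THEN 1 ELSE 0 END AS isIndexed,
  COUNT(*) AS count
FROM bookmarks b
LEFT JOIN bookmarkLinks bl ON bl.id = b.id
GROUP BY
  bl.crawlStatus,
  bl.taggingStatus,
  CASE WHEN b.lastIndexedAt IS NOT NULL THEN 1 ELSE 0 END;
```

Grouping on the boolean CASE rather than on raw lastIndexedAt values keeps the result set to a handful of rows instead of one group per distinct timestamp.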


170-201: LGTM! Overall status calculation correctly incorporates indexing requirement.

The logic properly ensures a bookmark is only considered complete when all three stages (crawling, tagging, indexing) are successful. The fallback to pending on lines 199-200 is good defensive programming.


123-138: LGTM! Stats object initialization is correct and aligns with the schema.

@cubic-dev-ai cubic-dev-ai bot left a comment

4 issues found across 10 files



<file name="packages/shared/types/importSessions.ts">

<violation number="1" location="packages/shared/types/importSessions.ts:47">
P2: Missing `indexingFailed` field. The crawling and tagging progress breakdowns each have Pending, Completed, and Failed states, but indexing is missing the Failed state. This appears to be an oversight.</violation>
</file>

<file name="packages/trpc/routers/importSessions.test.ts">

<violation number="1" location="packages/trpc/routers/importSessions.test.ts:419">
P1: This test contradicts the existing test &#39;marks text-only imports as completed when tagging succeeds&#39; (line 131) which expects `completedBookmarks: 1` when only tagging is successful. If the new behavior requires indexing for completion, the existing test should be updated or removed to reflect the new requirements. These tests cannot both pass simultaneously.</violation>
</file>

<file name="packages/trpc/models/importSessions.ts">

<violation number="1" location="packages/trpc/models/importSessions.ts:157">
P2: Inconsistent null handling: `taggingStatus === null` is not handled in the detailed tagging tracking, unlike the crawling status tracking which includes `null` in `crawlingCompleted`. This could cause detailed tagging stats to not sum to `totalBookmarks`.</violation>
</file>

<file name="apps/workers/workers/searchWorker.ts">

<violation number="1" location="apps/workers/workers/searchWorker.ts:37">
P2: The async database update lacks error handling. If updating `lastIndexedAt` fails, it could cause an unhandled promise rejection even though the indexing itself completed successfully. Consider wrapping this in a try/catch to log the error without affecting the job completion status.</violation>
</file>


taggingCompleted: z.number(),
taggingFailed: z.number(),
indexingPending: z.number(),
indexingCompleted: z.number(),
@cubic-dev-ai cubic-dev-ai bot Dec 25, 2025

P2: Missing indexingFailed field. The crawling and tagging progress breakdowns each have Pending, Completed, and Failed states, but indexing is missing the Failed state. This appears to be an oversight.

Suggested change
indexingCompleted: z.number(),
indexingCompleted: z.number(),
indexingFailed: z.number(),

});
});

test<CustomTestContext>("considers bookmark completed only when crawled, tagged, and indexed", async ({
@cubic-dev-ai cubic-dev-ai bot Dec 25, 2025

P1: This test contradicts the existing test 'marks text-only imports as completed when tagging succeeds' (line 131) which expects completedBookmarks: 1 when only tagging is successful. If the new behavior requires indexing for completion, the existing test should be updated or removed to reflect the new requirements. These tests cannot both pass simultaneously.


// Track tagging status
if (taggingStatus === "pending") {
stats.taggingPending += count;
} else if (taggingStatus === "success") {
@cubic-dev-ai cubic-dev-ai bot Dec 25, 2025

P2: Inconsistent null handling: taggingStatus === null is not handled in the detailed tagging tracking, unlike the crawling status tracking which includes null in crawlingCompleted. This could cause detailed tagging stats to not sum to totalBookmarks.

Suggested change
} else if (taggingStatus === "success") {
} else if (taggingStatus === "success" || taggingStatus === null) {

// Update the lastIndexedAt timestamp after successful indexing
const request = zSearchIndexingRequestSchema.safeParse(job.data);
if (request.success && request.data.type === "index") {
await db
@cubic-dev-ai cubic-dev-ai bot Dec 25, 2025

P2: The async database update lacks error handling. If updating lastIndexedAt fails, it could cause an unhandled promise rejection even though the indexing itself completed successfully. Consider wrapping this in a try/catch to log the error without affecting the job completion status.

