feat: track bookmark indexing time and improve progress report #2278
MohamedBassem wants to merge 8 commits into main from
Conversation
This commit adds tracking of when bookmarks are indexed for search and improves the import session progress reporting to show detailed stages of bookmark processing.

Changes:
- Add `lastIndexedAt` field to the bookmarks table to track when each bookmark was last indexed by the search worker
- Update the search worker to set the `lastIndexedAt` timestamp after indexing
- Extend import session stats to separately track crawling, tagging, and indexing progress
- Update the ImportSessionCard UI to display a detailed progress breakdown with user-friendly explanations:
  * Crawling (Fetching webpage content)
  * Tagging (AI generating tags)
  * Indexing (Making searchable)
- Generate a database migration for the new field
Address performance and best practice issues with indexing tracking:
- Use a SQL CASE expression in the groupBy to check whether a bookmark is indexed (null vs. not null) instead of grouping by exact timestamp values, for better query performance
- Add a backfill query in the migration to set lastIndexedAt to createdAt for existing bookmarks
- Move the lastIndexedAt update to the onComplete handler in the search worker instead of within the indexing function, for better separation of concerns
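The grouping change can be illustrated with a plain-TypeScript sketch (the row shape and field names here are hypothetical; the real query builds the CASE expression with Drizzle's `sql` template): deriving a boolean from `lastIndexedAt` collapses all indexed bookmarks into a single group regardless of their individual timestamps.

```typescript
// Hypothetical row shape mirroring the PR description, not the actual query result.
interface BookmarkRow {
  crawlStatus: string | null;
  taggingStatus: string | null;
  lastIndexedAt: Date | null;
}

// Group by a derived flag (the SQL equivalent of
// `CASE WHEN lastIndexedAt IS NULL THEN 0 ELSE 1 END`)
// instead of the raw timestamp, so indexed bookmarks share one bucket.
function groupCounts(rows: BookmarkRow[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    const isIndexed = row.lastIndexedAt !== null ? 1 : 0;
    const key = `${row.crawlStatus}|${row.taggingStatus}|${isIndexed}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const rows: BookmarkRow[] = [
  { crawlStatus: "success", taggingStatus: "success", lastIndexedAt: new Date("2024-01-01") },
  { crawlStatus: "success", taggingStatus: "success", lastIndexedAt: new Date("2024-06-01") },
  { crawlStatus: "success", taggingStatus: "pending", lastIndexedAt: null },
];

// Both indexed bookmarks fall into the same group despite different timestamps.
console.log(groupCounts(rows).get("success|success|1")); // 2
```

Grouping by the raw timestamp would instead produce one group per distinct `lastIndexedAt` value, which is what the commit avoids.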
Add tests to verify the new detailed progress breakdown functionality:
- Test that crawling, tagging, and indexing stats are tracked separately
- Test progression through all three stages with proper counts
- Test that failed states are correctly tracked for each stage
- Test that bookmarks are only marked completed when all three stages (crawling, tagging, and indexing) are successfully completed
- Verify that indexing status (via lastIndexedAt) is properly considered in overall completion status
Walkthrough
This change extends the import sessions feature by adding indexing-aware progress tracking.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes. Key areas requiring extra attention:
Possibly related PRs
Pre-merge checks: ❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
apps/workers/workers/searchWorker.ts (1)
29-41: Consider error handling for the DB update.

The implementation correctly updates `lastIndexedAt` only for "index" operations after successful completion. However, if the database update fails, the error would propagate and potentially cause issues. Consider wrapping the update in a try-catch to ensure indexing completion isn't affected by timestamp update failures.

🔎 Suggested defensive error handling

```diff
 onComplete: async (job) => {
   workerStatsCounter.labels("search", "completed").inc();
   const jobId = job.id;
   logger.info(`[search][${jobId}] Completed successfully`);
   // Update the lastIndexedAt timestamp after successful indexing
   const request = zSearchIndexingRequestSchema.safeParse(job.data);
   if (request.success && request.data.type === "index") {
+    try {
       await db
         .update(bookmarks)
         .set({ lastIndexedAt: new Date() })
         .where(eq(bookmarks.id, request.data.bookmarkId));
+    } catch (error) {
+      logger.warn(
+        `[search][${jobId}] Failed to update lastIndexedAt: ${error}`,
+      );
+    }
   }
 },
```

packages/trpc/models/importSessions.ts (1)
154-161: Consider handling `taggingStatus === null` for consistency.

Unlike crawling (line 148), where `null` is counted as completed, `taggingStatus === null` isn't counted in any detailed tagging bucket. This could cause the sum of `taggingPending + taggingCompleted + taggingFailed` to be less than `totalBookmarks` in edge cases (e.g., if the LEFT JOIN yields no matching bookmark).

🔎 Proposed fix for null handling consistency

```diff
 // Track tagging status
 if (taggingStatus === "pending") {
   stats.taggingPending += count;
-} else if (taggingStatus === "success") {
+} else if (taggingStatus === "success" || taggingStatus === null) {
   stats.taggingCompleted += count;
 } else if (taggingStatus === "failure") {
   stats.taggingFailed += count;
 }
```
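A minimal plain-TypeScript sketch of this proposal (the row shape is hypothetical, not the actual query result type) shows why bucketing `null` as completed keeps the three buckets summing to the total:

```typescript
type TagStatus = "pending" | "success" | "failure" | null;

interface TaggingStats {
  taggingPending: number;
  taggingCompleted: number;
  taggingFailed: number;
}

// With the proposed fix, null rows (e.g., from a LEFT JOIN with no match)
// land in the completed bucket, so the buckets always sum to the row total.
function trackTagging(
  rows: { taggingStatus: TagStatus; count: number }[],
): TaggingStats {
  const stats: TaggingStats = {
    taggingPending: 0,
    taggingCompleted: 0,
    taggingFailed: 0,
  };
  for (const { taggingStatus, count } of rows) {
    if (taggingStatus === "pending") {
      stats.taggingPending += count;
    } else if (taggingStatus === "success" || taggingStatus === null) {
      stats.taggingCompleted += count;
    } else if (taggingStatus === "failure") {
      stats.taggingFailed += count;
    }
  }
  return stats;
}

const stats = trackTagging([
  { taggingStatus: "pending", count: 2 },
  { taggingStatus: "success", count: 3 },
  { taggingStatus: null, count: 1 }, // LEFT JOIN with no matching bookmark
]);
const total =
  stats.taggingPending + stats.taggingCompleted + stats.taggingFailed;
console.log(total); // 6
```

Without the `|| taggingStatus === null` branch, the same input would sum to 5 against a true total of 6, which is exactly the mismatch the reviewer flags.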
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- apps/web/components/settings/ImportSessionCard.tsx
- apps/web/lib/i18n/locales/en/translation.json
- apps/workers/workers/searchWorker.ts
- packages/db/drizzle/0073_add_last_indexed_at_to_bookmarks.sql
- packages/db/drizzle/meta/0073_snapshot.json
- packages/db/drizzle/meta/_journal.json
- packages/db/schema.ts
- packages/shared/types/importSessions.ts
- packages/trpc/models/importSessions.ts
- packages/trpc/routers/importSessions.test.ts
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{ts,tsx,js,jsx,json,css,md}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier according to project standards
Files:
- packages/db/drizzle/meta/_journal.json
- packages/shared/types/importSessions.ts
- apps/web/lib/i18n/locales/en/translation.json
- apps/web/components/settings/ImportSessionCard.tsx
- packages/db/schema.ts
- packages/trpc/models/importSessions.ts
- packages/trpc/routers/importSessions.test.ts
- apps/workers/workers/searchWorker.ts
**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use TypeScript for type safety in all source files
Files:
- packages/shared/types/importSessions.ts
- apps/web/components/settings/ImportSessionCard.tsx
- packages/db/schema.ts
- packages/trpc/models/importSessions.ts
- packages/trpc/routers/importSessions.test.ts
- apps/workers/workers/searchWorker.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Lint code using oxlint and fix issues with `pnpm lint:fix`
Files:
- packages/shared/types/importSessions.ts
- apps/web/components/settings/ImportSessionCard.tsx
- packages/db/schema.ts
- packages/trpc/models/importSessions.ts
- packages/trpc/routers/importSessions.test.ts
- apps/workers/workers/searchWorker.ts
packages/shared/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Organize shared code and types in the `packages/shared` directory for use across packages
Files:
packages/shared/types/importSessions.ts
apps/web/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
apps/web/**/*.{ts,tsx}: Use Tailwind CSS for styling in the web application
Use Next.js for building the main web application
Files:
apps/web/components/settings/ImportSessionCard.tsx
packages/db/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use Drizzle ORM for database schema and migrations in the db package
Files:
packages/db/schema.ts
packages/trpc/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Organize business logic in the tRPC router and procedures located in `packages/trpc`
Files:
- packages/trpc/models/importSessions.ts
- packages/trpc/routers/importSessions.test.ts
**/*.{test,spec}.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use Vitest for writing and running tests
Files:
packages/trpc/routers/importSessions.test.ts
🧠 Learnings (4)
📚 Learning: 2025-10-04T10:37:57.828Z
Learnt from: MohamedBassem
Repo: karakeep-app/karakeep PR: 2001
File: packages/trpc/models/importSessions.ts:51-57
Timestamp: 2025-10-04T10:37:57.828Z
Learning: In projects using Drizzle ORM with `createdAtField()` and `modifiedAtField()` helper functions in the schema definition (as seen in packages/db/schema.ts), Drizzle automatically populates the `createdAt` and `modifiedAt` fields during insert operations, so they do not need to be set explicitly in the insert values.
Applied to files:
packages/db/schema.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/db/**/*.{ts,tsx} : Use Drizzle ORM for database schema and migrations in the db package
Applied to files:
packages/trpc/models/importSessions.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/trpc/**/*.{ts,tsx} : Organize business logic in the tRPC router and procedures located in `packages/trpc`
Applied to files:
packages/trpc/models/importSessions.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to **/*.{test,spec}.{ts,tsx,js,jsx} : Use Vitest for writing and running tests
Applied to files:
packages/trpc/routers/importSessions.test.ts
🧬 Code graph analysis (3)
packages/trpc/models/importSessions.ts (1)
packages/db/schema.ts (2)
bookmarkLinks (230-258), bookmarks (168-228)
packages/trpc/routers/importSessions.test.ts (2)
packages/trpc/testUtils.ts (1)
CustomTestContext (59-63)
packages/db/schema.ts (2)
bookmarkLinks (230-258), bookmarks (168-228)
apps/workers/workers/searchWorker.ts (2)
packages/shared-server/src/queues.ts (1)
zSearchIndexingRequestSchema (55-58)
packages/db/schema.ts (1)
bookmarks (168-228)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: typecheck
🔇 Additional comments (12)
packages/db/drizzle/meta/_journal.json (1)
516-522: LGTM! The migration journal entry follows the established pattern with correct idx sequencing, version, and format.
apps/web/lib/i18n/locales/en/translation.json (1)
399-419: LGTM! The translation strings are well-structured and consistent with the UI implementation. The absence of a "failed" key for the indexing stage aligns with the component, which doesn't render a failed state for indexing.
packages/db/drizzle/0073_add_last_indexed_at_to_bookmarks.sql (1)
1-5: Reasonable backfill strategy with a caveat.

The backfill sets `lastIndexedAt = createdAt` for existing bookmarks, which is a practical assumption. However, if search indexing was ever disabled or failed for existing bookmarks, they would incorrectly appear as indexed. This is likely acceptable given:
- It's a reasonable default for most deployments
- Users can trigger re-indexing if needed
The comment documenting the assumption is helpful.
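For reference, based on the review's description of the backfill (the actual file contents aren't shown here, and the column type is an assumption based on Drizzle's integer-mode timestamps for SQLite), the migration presumably looks roughly like:

```sql
-- Assumption: existing bookmarks were already indexed at creation time.
ALTER TABLE `bookmarks` ADD `lastIndexedAt` integer;
UPDATE `bookmarks` SET `lastIndexedAt` = `createdAt`;
```

Leaving the column nullable (no NOT NULL constraint) is what lets `null` mean "not yet indexed" for bookmarks created after the migration.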
packages/db/schema.ts (1)
208-208: LGTM!

The `lastIndexedAt` column is correctly defined as a nullable timestamp. The nullable design is appropriate since `null` indicates the bookmark hasn't been indexed yet, which aligns with the `isIndexed` computation in the stats logic.

apps/web/components/settings/ImportSessionCard.tsx (1)
185-309: LGTM!

The detailed progress breakdown UI is well-structured with consistent patterns across all three stages. The implementation correctly:
- Uses translation keys that match the added i18n strings
- Conditionally renders pending/failed messages only when counts are non-zero
- Omits "failed" display for indexing (consistent with the data model)
- Follows Tailwind CSS styling conventions per coding guidelines
packages/trpc/routers/importSessions.test.ts (3)
235-364: LGTM! Comprehensive test coverage for detailed progress tracking.

This test thoroughly validates the state transitions through all three stages (crawling → tagging → indexing) and verifies the final completion status. The test correctly validates:
- Initial pending states for text vs link bookmarks (text doesn't need crawling)
- Incremental completion of each stage
- Final "completed" status when all stages finish
366-417: LGTM! Good failure scenario coverage.

Tests correctly verify that crawling and tagging failures are tracked in the detailed progress breakdown and reflected in the `failedBookmarks` count.
419-461: LGTM! Critical test for the new completion criteria.

This test validates the key behavioral change: a bookmark is only considered "completed" when all three stages (crawling, tagging, and indexing) are finished. It correctly asserts that `taggingStatus: "success"` alone is insufficient; the `lastIndexedAt` timestamp must also be set.

packages/shared/types/importSessions.ts (1)
39-47: LGTM! Schema extension for detailed progress tracking looks correct.

The asymmetry between indexing (pending/completed only) and crawling/tagging (pending/completed/failed) is appropriate since `lastIndexedAt` is a timestamp rather than a status enum; indexing either happened or it hasn't.

packages/trpc/models/importSessions.ts (3)
92-121: LGTM! Query structure is correct.

The `isIndexed` CASE expression is properly duplicated in both SELECT and GROUP BY, as SQL semantics require. Using Drizzle's `sql` template literal ensures proper parameterization.
170-201: LGTM! Overall status calculation correctly incorporates the indexing requirement.

The logic properly ensures a bookmark is only considered complete when all three stages (crawling, tagging, indexing) are successful. The fallback to pending on lines 199-200 is good defensive programming.
123-138: LGTM! Stats object initialization is correct and aligns with the schema.
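The completion criterion described above can be sketched in plain TypeScript (the types and function name here are hypothetical; the real implementation in `packages/trpc/models/importSessions.ts` may differ in detail):

```typescript
type Status = "pending" | "success" | "failure" | null;

interface StatGroup {
  crawlStatus: Status; // null for bookmark types that don't need crawling
  taggingStatus: Status;
  isIndexed: boolean;
  count: number;
}

// A group counts as completed only when crawling (or no crawling needed),
// tagging, and indexing have all succeeded; any failure marks it failed;
// everything else falls back to pending.
function bookmarkOutcome(g: StatGroup): "completed" | "failed" | "pending" {
  if (g.crawlStatus === "failure" || g.taggingStatus === "failure") {
    return "failed";
  }
  const crawled = g.crawlStatus === "success" || g.crawlStatus === null;
  const tagged = g.taggingStatus === "success";
  if (crawled && tagged && g.isIndexed) {
    return "completed";
  }
  return "pending";
}

// Tagged but not yet indexed: still pending under the new criteria.
console.log(
  bookmarkOutcome({
    crawlStatus: null,
    taggingStatus: "success",
    isIndexed: false,
    count: 1,
  }),
); // "pending"
```

This is the behavioral change the 419-461 test pins down: tagging success alone no longer implies completion.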
4 issues found across 10 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="packages/shared/types/importSessions.ts">
<violation number="1" location="packages/shared/types/importSessions.ts:47">
P2: Missing `indexingFailed` field. The crawling and tagging progress breakdowns each have Pending, Completed, and Failed states, but indexing is missing the Failed state. This appears to be an oversight.</violation>
</file>
<file name="packages/trpc/routers/importSessions.test.ts">
<violation number="1" location="packages/trpc/routers/importSessions.test.ts:419">
P1: This test contradicts the existing test 'marks text-only imports as completed when tagging succeeds' (line 131) which expects `completedBookmarks: 1` when only tagging is successful. If the new behavior requires indexing for completion, the existing test should be updated or removed to reflect the new requirements. These tests cannot both pass simultaneously.</violation>
</file>
<file name="packages/trpc/models/importSessions.ts">
<violation number="1" location="packages/trpc/models/importSessions.ts:157">
P2: Inconsistent null handling: `taggingStatus === null` is not handled in the detailed tagging tracking, unlike the crawling status tracking which includes `null` in `crawlingCompleted`. This could cause detailed tagging stats to not sum to `totalBookmarks`.</violation>
</file>
<file name="apps/workers/workers/searchWorker.ts">
<violation number="1" location="apps/workers/workers/searchWorker.ts:37">
P2: The async database update lacks error handling. If updating `lastIndexedAt` fails, it could cause an unhandled promise rejection even though the indexing itself completed successfully. Consider wrapping this in a try/catch to log the error without affecting the job completion status.</violation>
</file>
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
```ts
  taggingCompleted: z.number(),
  taggingFailed: z.number(),
  indexingPending: z.number(),
  indexingCompleted: z.number(),
```
P2: Missing `indexingFailed` field. The crawling and tagging progress breakdowns each have pending, completed, and failed states, but indexing is missing the failed state. This appears to be an oversight.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/shared/types/importSessions.ts, line 47:
<comment>Missing `indexingFailed` field. The crawling and tagging progress breakdowns each have Pending, Completed, and Failed states, but indexing is missing the Failed state. This appears to be an oversight.</comment>
<file context>
@@ -36,6 +36,15 @@ export const zImportSessionWithStatsSchema = zImportSessionSchema.extend({
+ taggingCompleted: z.number(),
+ taggingFailed: z.number(),
+ indexingPending: z.number(),
+ indexingCompleted: z.number(),
});
export type ZImportSessionWithStats = z.infer<
</file context>
```diff
 indexingCompleted: z.number(),
+indexingFailed: z.number(),
```
```ts
  });
});

test<CustomTestContext>("considers bookmark completed only when crawled, tagged, and indexed", async ({
```
P1: This test contradicts the existing test 'marks text-only imports as completed when tagging succeeds' (line 131), which expects `completedBookmarks: 1` when only tagging is successful. If the new behavior requires indexing for completion, the existing test should be updated or removed to reflect the new requirements. These tests cannot both pass simultaneously.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/trpc/routers/importSessions.test.ts, line 419:
<comment>This test contradicts the existing test 'marks text-only imports as completed when tagging succeeds' (line 131) which expects `completedBookmarks: 1` when only tagging is successful. If the new behavior requires indexing for completion, the existing test should be updated or removed to reflect the new requirements. These tests cannot both pass simultaneously.</comment>
<file context>
@@ -231,4 +231,232 @@ describe("ImportSessions Routes", () => {
+ });
+ });
+
+ test<CustomTestContext>("considers bookmark completed only when crawled, tagged, and indexed", async ({
+ apiCallers,
+ db,
</file context>
```ts
// Track tagging status
if (taggingStatus === "pending") {
  stats.taggingPending += count;
} else if (taggingStatus === "success") {
```
P2: Inconsistent null handling: `taggingStatus === null` is not handled in the detailed tagging tracking, unlike the crawling status tracking, which includes `null` in `crawlingCompleted`. This could cause detailed tagging stats to not sum to `totalBookmarks`.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/trpc/models/importSessions.ts, line 157:
<comment>Inconsistent null handling: `taggingStatus === null` is not handled in the detailed tagging tracking, unlike the crawling status tracking which includes `null` in `crawlingCompleted`. This could cause detailed tagging stats to not sum to `totalBookmarks`.</comment>
<file context>
@@ -112,21 +114,60 @@ export class ImportSession {
+ // Track tagging status
+ if (taggingStatus === "pending") {
+ stats.taggingPending += count;
+ } else if (taggingStatus === "success") {
+ stats.taggingCompleted += count;
+ } else if (taggingStatus === "failure") {
</file context>
```diff
-} else if (taggingStatus === "success") {
+} else if (taggingStatus === "success" || taggingStatus === null) {
```
```ts
// Update the lastIndexedAt timestamp after successful indexing
const request = zSearchIndexingRequestSchema.safeParse(job.data);
if (request.success && request.data.type === "index") {
  await db
```
P2: The async database update lacks error handling. If updating `lastIndexedAt` fails, it could cause an unhandled promise rejection even though the indexing itself completed successfully. Consider wrapping this in a try/catch to log the error without affecting the job completion status.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/workers/workers/searchWorker.ts, line 37:
<comment>The async database update lacks error handling. If updating `lastIndexedAt` fails, it could cause an unhandled promise rejection even though the indexing itself completed successfully. Consider wrapping this in a try/catch to log the error without affecting the job completion status.</comment>
<file context>
@@ -26,11 +26,19 @@ export class SearchIndexingWorker {
+ // Update the lastIndexedAt timestamp after successful indexing
+ const request = zSearchIndexingRequestSchema.safeParse(job.data);
+ if (request.success && request.data.type === "index") {
+ await db
+ .update(bookmarks)
+ .set({ lastIndexedAt: new Date() })
</file context>
No description provided.