feat: track bookmark indexing time and improve progress report #2278
MohamedBassem wants to merge 8 commits into main from
Conversation
This commit adds tracking of when bookmarks are indexed for search and improves the import session progress reporting to show detailed stages of bookmark processing.

Changes:
- Add `lastIndexedAt` field to the bookmarks table to track when each bookmark was last indexed by the search worker
- Update the search worker to set the `lastIndexedAt` timestamp after indexing
- Extend import session stats to separately track crawling, tagging, and indexing progress
- Update the ImportSessionCard UI to display a detailed progress breakdown with user-friendly explanations:
  * Crawling (Fetching webpage content)
  * Tagging (AI generating tags)
  * Indexing (Making searchable)
- Generate a database migration for the new field
Address performance and best practice issues with indexing tracking:
- Use a SQL CASE expression in the groupBy to check whether a bookmark is indexed (null vs. not null) instead of grouping by exact timestamp values, for better query performance
- Add a backfill query in the migration to set lastIndexedAt to createdAt for existing bookmarks
- Move the lastIndexedAt update to the onComplete handler in the search worker instead of within the indexing function, for better separation of concerns
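The grouping change can be illustrated with a plain-TypeScript sketch (the row shape and field names here are hypothetical; the real query builds the CASE expression with Drizzle's `sql` template): deriving a boolean from `lastIndexedAt` collapses all indexed bookmarks into a single group regardless of their individual timestamps.

```typescript
// Hypothetical row shape mirroring the PR description, not the actual query result.
interface BookmarkRow {
  crawlStatus: string | null;
  taggingStatus: string | null;
  lastIndexedAt: Date | null;
}

// Group by a derived flag (the SQL equivalent of
// `CASE WHEN lastIndexedAt IS NULL THEN 0 ELSE 1 END`)
// instead of the raw timestamp, so indexed bookmarks share one bucket.
function groupCounts(rows: BookmarkRow[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    const isIndexed = row.lastIndexedAt !== null ? 1 : 0;
    const key = `${row.crawlStatus}|${row.taggingStatus}|${isIndexed}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const rows: BookmarkRow[] = [
  { crawlStatus: "success", taggingStatus: "success", lastIndexedAt: new Date("2024-01-01") },
  { crawlStatus: "success", taggingStatus: "success", lastIndexedAt: new Date("2024-06-01") },
  { crawlStatus: "success", taggingStatus: "pending", lastIndexedAt: null },
];

// Both indexed bookmarks fall into the same group despite different timestamps.
console.log(groupCounts(rows).get("success|success|1")); // 2
```

Grouping by the raw timestamp would instead produce one group per distinct `lastIndexedAt` value, which is what the commit avoids.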
Add tests to verify the new detailed progress breakdown functionality:
- Test that crawling, tagging, and indexing stats are tracked separately
- Test progression through all three stages with proper counts
- Test that failed states are correctly tracked for each stage
- Test that bookmarks are only marked completed when all three stages (crawling, tagging, and indexing) are successfully completed
- Verify that indexing status (via lastIndexedAt) is properly considered in overall completion status
Walkthrough
This change extends the import sessions feature by adding indexing-aware progress tracking.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes. Key areas requiring extra attention:
Possibly related PRs
Pre-merge checks: ❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
apps/workers/workers/searchWorker.ts (1)
29-41: Consider error handling for the DB update.

The implementation correctly updates `lastIndexedAt` only for "index" operations after successful completion. However, if the database update fails, the error would propagate and potentially cause issues. Consider wrapping the update in a try-catch to ensure indexing completion isn't affected by timestamp update failures.

🔎 Suggested defensive error handling

```diff
 onComplete: async (job) => {
   workerStatsCounter.labels("search", "completed").inc();
   const jobId = job.id;
   logger.info(`[search][${jobId}] Completed successfully`);
   // Update the lastIndexedAt timestamp after successful indexing
   const request = zSearchIndexingRequestSchema.safeParse(job.data);
   if (request.success && request.data.type === "index") {
+    try {
       await db
         .update(bookmarks)
         .set({ lastIndexedAt: new Date() })
         .where(eq(bookmarks.id, request.data.bookmarkId));
+    } catch (error) {
+      logger.warn(
+        `[search][${jobId}] Failed to update lastIndexedAt: ${error}`,
+      );
+    }
   }
 },
```

packages/trpc/models/importSessions.ts (1)
154-161: Consider handling `taggingStatus === null` for consistency.

Unlike crawling (line 148), where `null` is counted as completed, `taggingStatus === null` isn't counted in any detailed tagging bucket. This could cause the sum of `taggingPending + taggingCompleted + taggingFailed` to be less than `totalBookmarks` in edge cases (e.g., if the LEFT JOIN yields no matching bookmark).

🔎 Proposed fix for null handling consistency

```diff
 // Track tagging status
 if (taggingStatus === "pending") {
   stats.taggingPending += count;
-} else if (taggingStatus === "success") {
+} else if (taggingStatus === "success" || taggingStatus === null) {
   stats.taggingCompleted += count;
 } else if (taggingStatus === "failure") {
   stats.taggingFailed += count;
 }
```
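A minimal plain-TypeScript sketch of this proposal (the row shape is hypothetical, not the actual query result type) shows why bucketing `null` as completed keeps the three buckets summing to the total:

```typescript
type TagStatus = "pending" | "success" | "failure" | null;

interface TaggingStats {
  taggingPending: number;
  taggingCompleted: number;
  taggingFailed: number;
}

// With the proposed fix, null rows (e.g., from a LEFT JOIN with no match)
// land in the completed bucket, so the buckets always sum to the row total.
function trackTagging(
  rows: { taggingStatus: TagStatus; count: number }[],
): TaggingStats {
  const stats: TaggingStats = {
    taggingPending: 0,
    taggingCompleted: 0,
    taggingFailed: 0,
  };
  for (const { taggingStatus, count } of rows) {
    if (taggingStatus === "pending") {
      stats.taggingPending += count;
    } else if (taggingStatus === "success" || taggingStatus === null) {
      stats.taggingCompleted += count;
    } else if (taggingStatus === "failure") {
      stats.taggingFailed += count;
    }
  }
  return stats;
}

const stats = trackTagging([
  { taggingStatus: "pending", count: 2 },
  { taggingStatus: "success", count: 3 },
  { taggingStatus: null, count: 1 }, // LEFT JOIN with no matching bookmark
]);
const total =
  stats.taggingPending + stats.taggingCompleted + stats.taggingFailed;
console.log(total); // 6
```

Without the `|| taggingStatus === null` branch, the same input would sum to 5 against a true total of 6, which is exactly the mismatch the reviewer flags.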
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- apps/web/components/settings/ImportSessionCard.tsx
- apps/web/lib/i18n/locales/en/translation.json
- apps/workers/workers/searchWorker.ts
- packages/db/drizzle/0073_add_last_indexed_at_to_bookmarks.sql
- packages/db/drizzle/meta/0073_snapshot.json
- packages/db/drizzle/meta/_journal.json
- packages/db/schema.ts
- packages/shared/types/importSessions.ts
- packages/trpc/models/importSessions.ts
- packages/trpc/routers/importSessions.test.ts
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{ts,tsx,js,jsx,json,css,md}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier according to project standards
Files:
- packages/db/drizzle/meta/_journal.json
- packages/shared/types/importSessions.ts
- apps/web/lib/i18n/locales/en/translation.json
- apps/web/components/settings/ImportSessionCard.tsx
- packages/db/schema.ts
- packages/trpc/models/importSessions.ts
- packages/trpc/routers/importSessions.test.ts
- apps/workers/workers/searchWorker.ts
**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use TypeScript for type safety in all source files
Files:
- packages/shared/types/importSessions.ts
- apps/web/components/settings/ImportSessionCard.tsx
- packages/db/schema.ts
- packages/trpc/models/importSessions.ts
- packages/trpc/routers/importSessions.test.ts
- apps/workers/workers/searchWorker.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Lint code using oxlint and fix issues with `pnpm lint:fix`
Files:
- packages/shared/types/importSessions.ts
- apps/web/components/settings/ImportSessionCard.tsx
- packages/db/schema.ts
- packages/trpc/models/importSessions.ts
- packages/trpc/routers/importSessions.test.ts
- apps/workers/workers/searchWorker.ts
packages/shared/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Organize shared code and types in the `packages/shared` directory for use across packages
Files:
packages/shared/types/importSessions.ts
apps/web/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
apps/web/**/*.{ts,tsx}: Use Tailwind CSS for styling in the web application
Use Next.js for building the main web application
Files:
apps/web/components/settings/ImportSessionCard.tsx
packages/db/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use Drizzle ORM for database schema and migrations in the db package
Files:
packages/db/schema.ts
packages/trpc/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Organize business logic in the tRPC router and procedures located in `packages/trpc`
Files:
- packages/trpc/models/importSessions.ts
- packages/trpc/routers/importSessions.test.ts
**/*.{test,spec}.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use Vitest for writing and running tests
Files:
packages/trpc/routers/importSessions.test.ts
🧠 Learnings (4)
📚 Learning: 2025-10-04T10:37:57.828Z
Learnt from: MohamedBassem
Repo: karakeep-app/karakeep PR: 2001
File: packages/trpc/models/importSessions.ts:51-57
Timestamp: 2025-10-04T10:37:57.828Z
Learning: In projects using Drizzle ORM with `createdAtField()` and `modifiedAtField()` helper functions in the schema definition (as seen in packages/db/schema.ts), Drizzle automatically populates the `createdAt` and `modifiedAt` fields during insert operations, so they do not need to be set explicitly in the insert values.
Applied to files:
packages/db/schema.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/db/**/*.{ts,tsx} : Use Drizzle ORM for database schema and migrations in the db package
Applied to files:
packages/trpc/models/importSessions.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to packages/trpc/**/*.{ts,tsx} : Organize business logic in the tRPC router and procedures located in `packages/trpc`
Applied to files:
packages/trpc/models/importSessions.ts
📚 Learning: 2025-11-27T16:02:47.440Z
Learnt from: CR
Repo: karakeep-app/karakeep PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-27T16:02:47.440Z
Learning: Applies to **/*.{test,spec}.{ts,tsx,js,jsx} : Use Vitest for writing and running tests
Applied to files:
packages/trpc/routers/importSessions.test.ts
🧬 Code graph analysis (3)
packages/trpc/models/importSessions.ts (1)
packages/db/schema.ts (2)
bookmarkLinks (230-258), bookmarks (168-228)
packages/trpc/routers/importSessions.test.ts (2)
packages/trpc/testUtils.ts (1)
CustomTestContext (59-63)
packages/db/schema.ts (2)
bookmarkLinks (230-258), bookmarks (168-228)
apps/workers/workers/searchWorker.ts (2)
packages/shared-server/src/queues.ts (1)
zSearchIndexingRequestSchema (55-58)
packages/db/schema.ts (1)
bookmarks (168-228)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: typecheck
🔇 Additional comments (12)
packages/db/drizzle/meta/_journal.json (1)
516-522: LGTM! The migration journal entry follows the established pattern with correct idx sequencing, version, and format.
apps/web/lib/i18n/locales/en/translation.json (1)
399-419: LGTM! The translation strings are well-structured and consistent with the UI implementation. The absence of a "failed" key for the indexing stage aligns with the component, which doesn't render a failed state for indexing.
packages/db/drizzle/0073_add_last_indexed_at_to_bookmarks.sql (1)
1-5: Reasonable backfill strategy with a caveat.

The backfill sets `lastIndexedAt = createdAt` for existing bookmarks, which is a practical assumption. However, if search indexing was ever disabled or failed for existing bookmarks, they would incorrectly appear as indexed. This is likely acceptable given:
- It's a reasonable default for most deployments
- Users can trigger re-indexing if needed
The comment documenting the assumption is helpful.
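For reference, based on the review's description of the backfill (the actual file contents aren't shown here, and the column type is an assumption based on Drizzle's integer-mode timestamps for SQLite), the migration presumably looks roughly like:

```sql
-- Assumption: existing bookmarks were already indexed at creation time.
ALTER TABLE `bookmarks` ADD `lastIndexedAt` integer;
UPDATE `bookmarks` SET `lastIndexedAt` = `createdAt`;
```

Leaving the column nullable (no NOT NULL constraint) is what lets `null` mean "not yet indexed" for bookmarks created after the migration.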
packages/db/schema.ts (1)
208-208: LGTM!

The `lastIndexedAt` column is correctly defined as a nullable timestamp. The nullable design is appropriate since `null` indicates the bookmark hasn't been indexed yet, which aligns with the `isIndexed` computation in the stats logic.

apps/web/components/settings/ImportSessionCard.tsx (1)
185-309: LGTM!

The detailed progress breakdown UI is well-structured with consistent patterns across all three stages. The implementation correctly:
- Uses translation keys that match the added i18n strings
- Conditionally renders pending/failed messages only when counts are non-zero
- Omits "failed" display for indexing (consistent with the data model)
- Follows Tailwind CSS styling conventions per coding guidelines
packages/trpc/routers/importSessions.test.ts (3)
235-364: LGTM! Comprehensive test coverage for detailed progress tracking.

This test thoroughly validates the state transitions through all three stages (crawling → tagging → indexing) and verifies the final completion status. The test correctly validates:
- Initial pending states for text vs link bookmarks (text doesn't need crawling)
- Incremental completion of each stage
- Final "completed" status when all stages finish
366-417: LGTM! Good failure scenario coverage.

Tests correctly verify that crawling and tagging failures are tracked in the detailed progress breakdown and reflected in the `failedBookmarks` count.
419-461: LGTM! Critical test for the new completion criteria.

This test validates the key behavioral change: a bookmark is only considered "completed" when all three stages (crawling, tagging, and indexing) are finished. It correctly asserts that `taggingStatus: "success"` alone is insufficient; the `lastIndexedAt` timestamp must also be set.

packages/shared/types/importSessions.ts (1)
39-47: LGTM! Schema extension for detailed progress tracking looks correct.

The asymmetry between indexing (pending/completed only) and crawling/tagging (pending/completed/failed) is appropriate since `lastIndexedAt` is a timestamp rather than a status enum; indexing either happened or it hasn't.

packages/trpc/models/importSessions.ts (3)
92-121: LGTM! Query structure is correct.

The `isIndexed` CASE expression is properly duplicated in both SELECT and GROUP BY, as SQL semantics require. Using Drizzle's `sql` template literal ensures proper parameterization.
170-201: LGTM! Overall status calculation correctly incorporates the indexing requirement.

The logic properly ensures a bookmark is only considered complete when all three stages (crawling, tagging, indexing) are successful. The fallback to pending on lines 199-200 is good defensive programming.
123-138: LGTM! Stats object initialization is correct and aligns with the schema.
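The completion criterion described above can be sketched in plain TypeScript (the types and function name here are hypothetical; the real implementation in `packages/trpc/models/importSessions.ts` may differ in detail):

```typescript
type Status = "pending" | "success" | "failure" | null;

interface StatGroup {
  crawlStatus: Status; // null for bookmark types that don't need crawling
  taggingStatus: Status;
  isIndexed: boolean;
  count: number;
}

// A group counts as completed only when crawling (or no crawling needed),
// tagging, and indexing have all succeeded; any failure marks it failed;
// everything else falls back to pending.
function bookmarkOutcome(g: StatGroup): "completed" | "failed" | "pending" {
  if (g.crawlStatus === "failure" || g.taggingStatus === "failure") {
    return "failed";
  }
  const crawled = g.crawlStatus === "success" || g.crawlStatus === null;
  const tagged = g.taggingStatus === "success";
  if (crawled && tagged && g.isIndexed) {
    return "completed";
  }
  return "pending";
}

// Tagged but not yet indexed: still pending under the new criteria.
console.log(
  bookmarkOutcome({
    crawlStatus: null,
    taggingStatus: "success",
    isIndexed: false,
    count: 1,
  }),
); // "pending"
```

This is the behavioral change the 419-461 test pins down: tagging success alone no longer implies completion.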
4 issues found across 10 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="packages/shared/types/importSessions.ts">
<violation number="1" location="packages/shared/types/importSessions.ts:47">
P2: Missing `indexingFailed` field. The crawling and tagging progress breakdowns each have Pending, Completed, and Failed states, but indexing is missing the Failed state. This appears to be an oversight.</violation>
</file>
<file name="packages/trpc/routers/importSessions.test.ts">
<violation number="1" location="packages/trpc/routers/importSessions.test.ts:419">
P1: This test contradicts the existing test 'marks text-only imports as completed when tagging succeeds' (line 131) which expects `completedBookmarks: 1` when only tagging is successful. If the new behavior requires indexing for completion, the existing test should be updated or removed to reflect the new requirements. These tests cannot both pass simultaneously.</violation>
</file>
<file name="packages/trpc/models/importSessions.ts">
<violation number="1" location="packages/trpc/models/importSessions.ts:157">
P2: Inconsistent null handling: `taggingStatus === null` is not handled in the detailed tagging tracking, unlike the crawling status tracking which includes `null` in `crawlingCompleted`. This could cause detailed tagging stats to not sum to `totalBookmarks`.</violation>
</file>
<file name="apps/workers/workers/searchWorker.ts">
<violation number="1" location="apps/workers/workers/searchWorker.ts:37">
P2: The async database update lacks error handling. If updating `lastIndexedAt` fails, it could cause an unhandled promise rejection even though the indexing itself completed successfully. Consider wrapping this in a try/catch to log the error without affecting the job completion status.</violation>
</file>
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
```ts
  taggingCompleted: z.number(),
  taggingFailed: z.number(),
  indexingPending: z.number(),
  indexingCompleted: z.number(),
```
P2: Missing `indexingFailed` field. The crawling and tagging progress breakdowns each have pending, completed, and failed states, but indexing is missing the failed state. This appears to be an oversight.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/shared/types/importSessions.ts, line 47:
<comment>Missing `indexingFailed` field. The crawling and tagging progress breakdowns each have Pending, Completed, and Failed states, but indexing is missing the Failed state. This appears to be an oversight.</comment>
<file context>
@@ -36,6 +36,15 @@ export const zImportSessionWithStatsSchema = zImportSessionSchema.extend({
+ taggingCompleted: z.number(),
+ taggingFailed: z.number(),
+ indexingPending: z.number(),
+ indexingCompleted: z.number(),
});
export type ZImportSessionWithStats = z.infer<
</file context>
```diff
 indexingCompleted: z.number(),
+indexingFailed: z.number(),
```
```ts
  });
});

test<CustomTestContext>("considers bookmark completed only when crawled, tagged, and indexed", async ({
```
P1: This test contradicts the existing test 'marks text-only imports as completed when tagging succeeds' (line 131), which expects `completedBookmarks: 1` when only tagging is successful. If the new behavior requires indexing for completion, the existing test should be updated or removed to reflect the new requirements. These tests cannot both pass simultaneously.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/trpc/routers/importSessions.test.ts, line 419:
<comment>This test contradicts the existing test 'marks text-only imports as completed when tagging succeeds' (line 131) which expects `completedBookmarks: 1` when only tagging is successful. If the new behavior requires indexing for completion, the existing test should be updated or removed to reflect the new requirements. These tests cannot both pass simultaneously.</comment>
<file context>
@@ -231,4 +231,232 @@ describe("ImportSessions Routes", () => {
+ });
+ });
+
+ test<CustomTestContext>("considers bookmark completed only when crawled, tagged, and indexed", async ({
+ apiCallers,
+ db,
</file context>
```ts
// Track tagging status
if (taggingStatus === "pending") {
  stats.taggingPending += count;
} else if (taggingStatus === "success") {
```
P2: Inconsistent null handling: `taggingStatus === null` is not handled in the detailed tagging tracking, unlike the crawling status tracking, which includes `null` in `crawlingCompleted`. This could cause detailed tagging stats to not sum to `totalBookmarks`.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/trpc/models/importSessions.ts, line 157:
<comment>Inconsistent null handling: `taggingStatus === null` is not handled in the detailed tagging tracking, unlike the crawling status tracking which includes `null` in `crawlingCompleted`. This could cause detailed tagging stats to not sum to `totalBookmarks`.</comment>
<file context>
@@ -112,21 +114,60 @@ export class ImportSession {
+ // Track tagging status
+ if (taggingStatus === "pending") {
+ stats.taggingPending += count;
+ } else if (taggingStatus === "success") {
+ stats.taggingCompleted += count;
+ } else if (taggingStatus === "failure") {
</file context>
```diff
-} else if (taggingStatus === "success") {
+} else if (taggingStatus === "success" || taggingStatus === null) {
```
```ts
// Update the lastIndexedAt timestamp after successful indexing
const request = zSearchIndexingRequestSchema.safeParse(job.data);
if (request.success && request.data.type === "index") {
  await db
```
P2: The async database update lacks error handling. If updating `lastIndexedAt` fails, it could cause an unhandled promise rejection even though the indexing itself completed successfully. Consider wrapping this in a try/catch to log the error without affecting the job completion status.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/workers/workers/searchWorker.ts, line 37:
<comment>The async database update lacks error handling. If updating `lastIndexedAt` fails, it could cause an unhandled promise rejection even though the indexing itself completed successfully. Consider wrapping this in a try/catch to log the error without affecting the job completion status.</comment>
<file context>
@@ -26,11 +26,19 @@ export class SearchIndexingWorker {
+ // Update the lastIndexedAt timestamp after successful indexing
+ const request = zSearchIndexingRequestSchema.safeParse(job.data);
+ if (request.success && request.data.type === "index") {
+ await db
+ .update(bookmarks)
+ .set({ lastIndexedAt: new Date() })
</file context>
No description provided.