Add similarity search functions for tasks and tickets #166

Open
dolliecoder wants to merge 1 commit into AOSSIE-Org:main from dolliecoder:feat/task-ticket-vector-search

@dolliecoder dolliecoder commented Feb 13, 2026

This PR introduces SQL helper functions to enable vector similarity search for tasks and tickets using the pgvector setup added in PR1.

It is a follow-up, incremental step toward Issue #65. While PR1 introduced the description_embedding vector(768) columns for tasks and tickets (storage layer), this PR builds on that foundation by adding database-level similarity search functions (retrieval layer).
No embedding generation, indexing, or AI service integration is included here. This PR strictly enables semantic retrieval capability at the database level.

Dependency Note:
This PR depends on PR1, as it relies on the description_embedding columns introduced there. PR1 must be merged before this PR so that the functions execute against an existing schema. PR1: #160

Changes Made

Added get_similar_tasks(query_embedding, match_count) SQL function
Added get_similar_tickets(query_embedding, match_count) SQL function
Added new Supabase migration file to maintain proper migration ordering

Each function:
Computes cosine similarity as 1 - (description_embedding <=> query_embedding), where <=> is pgvector's cosine distance operator
Returns the top-k most semantically similar rows
Ignores rows without embeddings (description_embedding IS NOT NULL)
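
For reviewers who want to try the functions, a minimal usage sketch follows. It assumes the return columns shown in the review's suggested diff in this thread (task_id, title, description, similarity) and that tasks.id is a UUID. The UUID literal is a placeholder, not a real row, and the match count of 5 is arbitrary.

```sql
-- Hypothetical usage: find tasks similar to an existing task by reusing
-- its stored embedding as the query vector (avoids writing out 768 floats).
SELECT s.task_id, s.title, s.similarity
FROM tasks t
CROSS JOIN LATERAL get_similar_tasks(t.description_embedding, 5) AS s
WHERE t.id = '00000000-0000-0000-0000-000000000000'  -- placeholder UUID
  AND t.description_embedding IS NOT NULL;
```

Because the source task is itself stored in tasks, it will normally appear in the results with a similarity close to 1, so callers may want to filter it out with s.task_id <> t.id.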

✅ Checklist

I have read the contributing guidelines.

I have added tests that prove my fix is effective or that my feature works.
(Not applicable – database-level capability addition only.)

I have added necessary documentation (if applicable).
(Not required at this stage.)

Any dependent changes have been merged and published in downstream modules.
(Depends on PR1 – embedding schema changes.)

coderabbitai bot commented Feb 13, 2026

Walkthrough

This pull request adds two PostgreSQL PL/pgSQL functions for semantic similarity search. The get_similar_tasks and get_similar_tickets functions compute cosine similarity between query embeddings and stored embeddings, returning ranked results up to a configurable limit.

Changes

Cohort / File(s) | Summary
Database Vector Search Functions (supabase/migrations/20251021110000_task_ticket_vector_search.sql) | Adds two new PL/pgSQL functions (get_similar_tasks and get_similar_tickets) that perform semantic similarity search using vector embeddings. Both functions accept a query embedding (768-dimensional vector) and a match count parameter, compute cosine similarity as 1 minus the cosine distance returned by the <=> operator, and return ranked results with similarity scores.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Embeddings dance in vectors bright,
Seven sixty-eight dimensions take flight,
Cosine whispers "find what's near,"
Similar tasks and tickets appear!
Search with similarity, clear and light. ✨

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title 'Add similarity search functions for tasks and tickets' directly and clearly summarizes the main change: adding two SQL functions for vector-based similarity search on tasks and tickets.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into main.
Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@supabase/migrations/20251021110000_task_ticket_vector_search.sql`:
- Around line 17-20: Add HNSW vector indexes so the <=> similarity ORDER BY on
description_embedding uses an index: create IF NOT EXISTS idx_tasks_embedding on
tasks using hnsw for column description_embedding with vector_cosine_ops, and
create IF NOT EXISTS idx_tickets_embedding on tickets similarly for
description_embedding; ensure the new indexes are applied before running
similarity queries (also add the same index when you see other queries ordering
by description_embedding <=> query_embedding).
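
The index suggestion above could be sketched roughly as follows. This is an illustration, not part of the PR: it assumes a pgvector version with HNSW support (0.5.0 or later) and uses the index names proposed in the comment.

```sql
-- Sketch of the suggested HNSW indexes. vector_cosine_ops matches the
-- <=> (cosine distance) operator used in the functions' ORDER BY clause.
CREATE INDEX IF NOT EXISTS idx_tasks_embedding
    ON tasks USING hnsw (description_embedding vector_cosine_ops);

CREATE INDEX IF NOT EXISTS idx_tickets_embedding
    ON tickets USING hnsw (description_embedding vector_cosine_ops);
```

The operator class has to correspond to the distance operator in the query; an index built with a different operator class (for example vector_l2_ops) would not be used for ordering by <=>.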
🧹 Nitpick comments (1)
supabase/migrations/20251021110000_task_ticket_vector_search.sql (1)

1-23: Consider LANGUAGE sql and marking as STABLE.

Since the function body is a single RETURN QUERY SELECT, PL/pgSQL is unnecessary overhead; plain LANGUAGE sql avoids the PL/pgSQL executor layer. Additionally, these functions do not modify the database and return consistent results for the same inputs within a single statement, so marking them STABLE lets the planner optimize repeated calls.

♻️ Suggested diff
 CREATE OR REPLACE FUNCTION get_similar_tasks(
     query_embedding vector(768),
     match_count INT DEFAULT 3
 )
 RETURNS TABLE (
     task_id UUID,
     title TEXT,
     description TEXT,
     similarity FLOAT
-) AS $$
-BEGIN
-    RETURN QUERY
+) LANGUAGE sql STABLE AS $$
     SELECT
         t.id AS task_id,
         t.title,
         t.description,
         1 - (t.description_embedding <=> query_embedding) AS similarity
     FROM tasks t
     WHERE t.description_embedding IS NOT NULL
     ORDER BY t.description_embedding <=> query_embedding
     LIMIT match_count;
-END;
-$$ LANGUAGE plpgsql;
+$$;

Same applies to get_similar_tickets below.
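
For illustration, the same change applied to get_similar_tickets might look like the sketch below. The tickets column names (id, title, description) and the ticket_id output name are assumptions mirroring the tasks function; the actual definition in the migration is not shown in this thread and may differ.

```sql
-- Hypothetical LANGUAGE sql / STABLE version of get_similar_tickets.
-- Column names other than description_embedding are assumed, not confirmed.
CREATE OR REPLACE FUNCTION get_similar_tickets(
    query_embedding vector(768),
    match_count INT DEFAULT 3
)
RETURNS TABLE (
    ticket_id UUID,
    title TEXT,
    description TEXT,
    similarity FLOAT
) LANGUAGE sql STABLE AS $$
    SELECT
        t.id AS ticket_id,
        t.title,
        t.description,
        -- <=> is cosine distance, so similarity is 1 minus that value
        1 - (t.description_embedding <=> query_embedding) AS similarity
    FROM tickets t
    WHERE t.description_embedding IS NOT NULL
    ORDER BY t.description_embedding <=> query_embedding
    LIMIT match_count;
$$;
```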

@dolliecoder

Just checking in on this PR. I will be happy to revise based on reviews. Please let me know if there’s anything I can adjust to help move this forward.
