feat: refactor recomendations to use user taste directly #435
iFraan wants to merge 28 commits into fredrikburmester:main
Conversation
…mmendations
Replace per-request N+1 similarity searches with batch-precomputed user taste profiles. Each user now has a single unified embedding vector that combines their movie and series watch history, enabling single HNSW-indexed vector searches for recommendations.
- Add user_embeddings table to store pre-computed taste profiles
- Add nightly batch job to compute user embeddings from watch history
- Create recommendation-engine.ts with profile-based recommendation logic
- Refactor movie and series recommendations to use pre-computed profiles
- Apply recency decay and bounce penalty weights in profile computation
- Add freshness boost for recently added items in recommendations
Add migration to create user_embeddings table with vector column for storing pre-computed user preference embeddings. Includes foreign keys to users and servers tables, unique constraint on user-server pairs, and index on server_id for efficient queries.
Make the select shape internal to the module as it's only used within the recommendation engine and doesn't need to be part of the public API.
Add manual trigger for user embeddings calculation per server:
- New triggerServerUserEmbeddingsSync method in SyncScheduler class
- POST /scheduler/trigger-user-embeddings-sync endpoint
- Validates server existence before queueing job
…agement Replace maxPercentComplete metric with totalPlayDuration and expectedRuntime for more accurate engagement measurement in user embedding calculations. Also adjust recency decay half-life from ~70 days to ~200 days for more forgiving recommendations.
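A minimal sketch of what a recency decay with a ~200-day half-life could look like. The commit message only states the half-life; the exponential form, the function name `recencyWeight`, and the clamping of future timestamps are assumptions made for illustration, not the PR's actual code.

```typescript
// Assumption: exponential decay. The PR only says the half-life is "~200 days".
const HALF_LIFE_DAYS = 200;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: returns a weight in (0, 1] that halves every `halfLifeDays`.
function recencyWeight(
  lastWatchedMs: number,
  nowMs: number,
  halfLifeDays: number = HALF_LIFE_DAYS,
): number {
  // Clamp so that timestamps in the future never produce a weight above 1.
  const ageDays = Math.max(0, nowMs - lastWatchedMs) / MS_PER_DAY;
  return Math.pow(0.5, ageDays / halfLifeDays);
}
```

With this shape, a watch from 200 days ago counts half as much as one from today, and one from 400 days ago a quarter as much, so older history still contributes without dominating.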
The user embeddings feature has been implemented with the user_embeddings table, sync job, and scheduler endpoint. This planning document is no longer needed.
- Move vector math functions (`normalizeVector`, `toPgVectorLiteral`) from embedding jobs to shared `utils/vector.ts` module
- Update both job-server and nextjs-app to import from the new shared location
- Add dimension mismatch logging and configurable series engagement threshold
- Improve recommendation engine with similarity filtering and exclusion list capping
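For readers unfamiliar with the two helpers, here is a small sketch of what they plausibly do. The `toPgVectorLiteral` body mirrors the one-liner quoted in the review diff of this PR; the `normalizeVector` body (L2 normalization, returning a copy for the zero vector) is an assumption about the shared module, not a copy of it.

```typescript
// Assumed behavior: scale the vector to unit L2 length.
// Returning a copy for the zero vector is a choice made for this sketch.
function normalizeVector(vec: number[]): number[] {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  if (norm === 0) return vec.slice();
  return vec.map((v) => v / norm);
}

// Serializes a numeric array into pgvector's text form, e.g. "[1,2.5,-3]".
function toPgVectorLiteral(value: number[]): string {
  return `[${value.join(",")}]`;
}
```

Normalizing before storage keeps cosine comparisons independent of how many items went into a profile.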
Add minimum completion thresholds to avoid penalizing series and movies incorrectly. Introduce SERIES_BOUNCE_THRESHOLD (0.15) and MOVIES_BOUNCE_THRESHOLD (0.10) constants, replacing hardcoded values. For series, calculate average completion ratio using total play duration and expected runtime. Apply bounce penalty (-0.3) when average completion falls below threshold, preventing low-engagement samples from inflating user profile weights. This aligns series engagement logic with existing movie bounce detection. Query additional fields (totalPlayDuration, avgEpisodeRuntimeTicks) to support runtime calculations for weight determination.
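The bounce logic described above can be sketched roughly as follows. The three constants and their values come from the commit message; the helper names, the way `expectedRuntime` is derived from the average episode runtime, and the assumption that durations and runtimes are already in the same unit are illustrative only.

```typescript
// Values taken from the commit message.
const SERIES_BOUNCE_THRESHOLD = 0.15;
const MOVIES_BOUNCE_THRESHOLD = 0.1;
const BOUNCE_PENALTY = -0.3;

type WatchedType = "Movie" | "Series";

// Hypothetical helper: average completion across the watched episodes of a series.
// Assumes totalPlayDuration and avgEpisodeRuntime share the same unit.
function seriesCompletionRatio(
  totalPlayDuration: number,
  avgEpisodeRuntime: number,
  episodeCount: number,
): number | null {
  const expectedRuntime = avgEpisodeRuntime * episodeCount;
  return expectedRuntime > 0 ? totalPlayDuration / expectedRuntime : null;
}

// Hypothetical helper: swap the positive weight for the penalty on a bounce.
function applyBouncePenalty(
  type: WatchedType,
  completionRatio: number,
  baseWeight: number,
): number {
  const threshold =
    type === "Series" ? SERIES_BOUNCE_THRESHOLD : MOVIES_BOUNCE_THRESHOLD;
  return completionRatio < threshold ? BOUNCE_PENALTY : baseWeight;
}
```

Note that a 12% completion is a bounce for a series but not for a movie, because the two thresholds differ.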
Reviewer's Guide
Refactors personalized recommendations to use precomputed per-user taste embeddings stored in a new user_embeddings table, introduces a scheduled job and manual trigger to compute/update these embeddings from watch history, centralizes profile-based recommendation queries in a shared engine, and updates APIs, UI, and AI tools to consume the new profile-based recommendations while keeping item-to-item similarity and hiding logic intact.
Sequence diagram for profile-based personalized recommendations
sequenceDiagram
actor User
participant Client as ClientApp
participant API as NextjsAPI_recommendations
participant Stats as SimilarStatistics_getSimilarStatistics
participant Engine as RecommendationEngine_getProfileRecommendations
participant DB as Database
User->>Client: Request personalized recommendations
Client->>API: GET /api/recommendations
API->>Stats: getSimilarStatistics({ serverId, userId, type, limit, offset })
alt userId missing
Stats->>API: []
API->>Client: Empty recommendations
else userId resolved
Stats->>Engine: getProfileRecommendations(serverId, userId, type, limit, offset)
Engine->>DB: SELECT embedding FROM user_embeddings WHERE userId AND serverId
alt no user embedding profile
DB-->>Engine: []
Engine-->>Stats: []
Stats-->>API: []
API-->>Client: Empty recommendations
else profile exists
DB-->>Engine: user embedding vector
par load exclusions
Engine->>DB: SELECT hiddenRecommendations WHERE userId AND serverId
Engine->>DB: SELECT DISTINCT sessions.itemId (movies)
Engine->>DB: SELECT DISTINCT sessions.seriesId (series)
and vector search
Engine->>DB: Vector search on items.embedding using user profile
end
DB-->>Engine: Candidate items with similarity
Engine-->>Stats: RecommendationResult[]
Stats-->>API: RecommendationResult[]
API-->>Client: Recommendations with similarity and reasons
end
end
ER diagram for new user_embeddings taste profile table
erDiagram
users {
text id PK
integer serverId FK
timestamp lastActivityDate
}
servers {
integer id PK
integer embeddingDimensions
}
userEmbeddings {
serial id PK
text userId FK
integer serverId FK
vector embedding
integer itemCount
timestamp lastCalculatedAt
timestamp createdAt
timestamp updatedAt
}
items {
text id PK
integer serverId FK
vector embedding
text type
timestamp createdAt
}
sessions {
integer id PK
integer serverId FK
text userId FK
text itemId FK
text seriesId
integer playDuration
integer runtimeTicks
timestamp endTime
}
hiddenRecommendations {
integer id PK
integer serverId FK
text userId FK
text itemId FK
}
users ||--o{ sessions : has
servers ||--o{ sessions : has
users ||--o{ userEmbeddings : has
servers ||--o{ userEmbeddings : has
servers ||--o{ items : has
users ||--o{ hiddenRecommendations : has
servers ||--o{ hiddenRecommendations : has
items ||--o{ hiddenRecommendations : has
Class diagram for recommendation engine and user embedding job
classDiagram
class RecommendationCardItem {
+string id
+string name
+string type
+number productionYear
+number runtimeTicks
+string[] genres
+number communityRating
+string primaryImageTag
+string primaryImageThumbTag
+string primaryImageLogoTag
+string[] backdropImageTags
+string seriesId
+string seriesPrimaryImageTag
+string parentBackdropItemId
+string[] parentBackdropImageTags
+string parentThumbItemId
+string parentThumbImageTag
}
class RecommendationResult {
+RecommendationCardItem item
+number similarity
+RecommendationCardItem[] basedOn
}
class RecommendationEngine {
+getProfileRecommendations(serverId number, userId string, targetType string, limit number, offset number) RecommendationResult[]
}
class UserEmbeddingRecord {
+number id
+string userId
+number serverId
+number[] embedding
+number itemCount
+Date lastCalculatedAt
+Date createdAt
+Date updatedAt
}
class CalculateUserEmbeddingsJobData {
+number serverId
}
class WatchedItemForProfile {
+string itemId
+number[] embedding
+string type
+string seriesId
+number totalPlayDuration
+number expectedRuntime
+number lastWatchedMs
+number episodeCount
}
class UserEmbeddingsJob {
+calculateUserEmbeddingsJob(job PgBossJob) Promise
-ensureUserEmbeddingIndex(dimensions number) Promise
-computeUserProfile(serverId number, userId string, now number) object
}
class VectorUtils {
+normalizeVector(vec number[]) number[]
+toPgVectorLiteral(value number[]) string
}
class DatabaseTables {
+userEmbeddings
+items
+sessions
+hiddenRecommendations
}
RecommendationEngine --> RecommendationResult
RecommendationResult --> RecommendationCardItem
RecommendationEngine --> DatabaseTables
RecommendationEngine --> VectorUtils
UserEmbeddingsJob --> UserEmbeddingRecord
UserEmbeddingsJob --> WatchedItemForProfile
UserEmbeddingsJob --> DatabaseTables
UserEmbeddingsJob --> VectorUtils
DatabaseTables --> UserEmbeddingRecord
File-Level Changes
How can you compare an arbitrary "user watch" with items? Those vectors won't be similar at all, no? The point of the current system is that we pull items deterministically based on user sessions and then find matching items with embeddings, comparing items with items, which gives good results. I think the improvement should be to filter the list of user-watched items before doing the embedding comparison, excluding items not watched fully or similar.
But I'm very open to improvements, so if this actually works, I'd like to know more about how! Can you explain it in a bit more detail?
Edit: Maybe comparing user sessions to items actually works and I'm just too restrictive in what I think embeddings can do haha.
Convert freshness threshold to ISO string for proper date comparison in SQL query, ensuring consistent behavior across different database systems.
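As a rough illustration of the change described in this commit, the threshold can be computed as a timestamp and serialized once, so the database compares against an unambiguous UTC string. The function name and the `windowDays` parameter are hypothetical; the PR does not show the actual window length here.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: items created after the returned instant count as "fresh".
// toISOString() always emits UTC, e.g. "2024-01-01T00:00:00.000Z".
function freshnessThresholdIso(nowMs: number, windowDays: number): string {
  return new Date(nowMs - windowDays * DAY_MS).toISOString();
}
```

The resulting string can then be passed as a query parameter rather than interpolating a `Date` object, whose serialization can vary by driver.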
|
I was having a problem where all recommended items felt 'generic' and 'not for me'.
They have all the same
Now onto the implementation.
So, to be clear, I'm not comparing session data against items directly. The user embedding is not an exact average, since we weight different things to affect the final result, like completion and a recency decay. With that we build a unique user taste thingy. Having it this way gives us a 'true' taste, since we are not omitting any watch, and we can filter hidden recommendations or watched items in the final query only. No more post-fetch sort/limit hacks, since it's all handled by the Postgres engine. Oh, and did I mention it's super fast?
Writing this comment gave me some ideas, like hidden recommendations giving negative weight and considering an N+1 query to return the 'basedOn' items.
I would love to hear a review of how it works for you! Really.
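To make the idea of a weighted taste vector concrete, here is a minimal sketch of a weighted sum of item embeddings followed by normalization. The `WeightedItem` shape and `computeTasteProfile` name are hypothetical, and how the per-item weight is produced (completion, recency decay, bounce penalty) is left outside this sketch.

```typescript
// Hypothetical input: one watched item with its embedding and a signed weight.
interface WeightedItem {
  embedding: number[];
  weight: number; // may be negative for abandoned items
}

// Returns a unit-length profile vector, or null when nothing usable remains.
function computeTasteProfile(items: WeightedItem[]): number[] | null {
  if (items.length === 0) return null;
  const dims = items[0].embedding.length;
  const sum = new Array<number>(dims).fill(0);
  let used = 0;
  for (const { embedding, weight } of items) {
    // Assumption: embeddings with a different dimension are skipped.
    if (embedding.length !== dims) continue;
    for (let i = 0; i < dims; i++) sum[i] += weight * embedding[i];
    used++;
  }
  const norm = Math.sqrt(sum.reduce((acc, v) => acc + v * v, 0));
  if (used === 0 || norm === 0) return null;
  return sum.map((v) => v / norm);
}
```

Because weights are signed, an abandoned item pushes the profile away from its region of the embedding space instead of simply being ignored.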
|
Yeah I agree, I also feel like the recommendations are too generic. The pre-computed user profile approach seems like a solid improvement. Just a few things:
Other than that I think it's a solid improvement we can merge! |
|
@fredrikburmester What if we average all user embeddings to create a 'popular on the server' list or similar? Just for these no-user-data cases. |
Moves vector normalization and pgvector literal conversion utilities from individual apps to the shared @streamystats/database package. Also removes duplicate type definitions in similar-statistics modules by reusing types from recommendation-engine, removes unused time window parameters, and cleans up debug logging.
Adds a new database table to store user embeddings with vector data, supporting the refactored recommendation system. Includes foreign keys to users and servers tables, a unique constraint on user-server pairs, and an index on server_id for efficient queries.
…rofile
Add server-wide average embeddings fallback when user has no watch history.
The recommendation system now returns a source field ("user", "server", or "none")
to indicate whether recommendations are personalized or based on server popularity.
Components display appropriate titles and descriptions based on source.
Yeah sure, but let's put that in another PR to not bloat this one. |
Already too late haha ≥ 3 users with embeddings → they see "Popular Movies/Series on This Server" (server-average fallback) Let me just test this a bit and I'll mark the pr as ready to review |
The reason I pushed for that to be a separate PR is because it probably needs more discussion.
Might be more things I'm not thinking of rn. |
Moved recommendation-related interfaces and types from recommendation-engine.ts to a dedicated recommendation-types.ts file to improve code organization and reduce circular dependencies. This change centralizes all recommendation type definitions in one location, making the codebase more maintainable and easier to understand. The refactoring affects multiple files including the recommendation engine, similar statistics modules, and API routes, which now import types from the new centralized location.
I meant servers with ≥ 3 users, and servers with < 3 users. But you are right. Since we have it, we can also use it to show popular shows to all users. It deserves its own PR. Reverting the last bit. |
This reverts commit 035d562.
…ithout profile" This reverts commit d3ba424.
Hey - I've found 1 issue, and left some high level feedback:
- In `computeUserProfile`'s series aggregation, `avgEpisodeRuntimeTicks` is computed from `sessions.runtimeTicks`, but `runtimeTicks` is a property of `items`, so this should be switched to `AVG(items.runtimeTicks)` to avoid referencing a non-existent column and to get correct duration data.
- In `getProfileRecommendations`, capping the `NOT IN` exclusion list at `MAX_EXCLUSION_LIST_SIZE` can silently drop some hidden/watched IDs from the filter; consider switching to a `NOT EXISTS` join or a temp table approach so you can exclude all relevant IDs without truncation.
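A sketch of the `NOT EXISTS` alternative mentioned in the second point, for the movie case. The snake_case table and column names, the parameter order, and the helper name are assumptions for illustration and are not taken from the repository; `<=>` is pgvector's cosine distance operator, so `1 - distance` gives a similarity.

```typescript
// Assumed schema: items(id, server_id, type, embedding),
// hidden_recommendations(item_id, user_id, server_id),
// sessions(item_id, user_id, server_id).
const PROFILE_RECOMMENDATIONS_SQL = `
SELECT i.id, 1 - (i.embedding <=> $1::vector) AS similarity
FROM items i
WHERE i.server_id = $2
  AND i.type = $3
  AND i.embedding IS NOT NULL
  AND NOT EXISTS (
    SELECT 1 FROM hidden_recommendations h
    WHERE h.item_id = i.id AND h.user_id = $4 AND h.server_id = $2
  )
  AND NOT EXISTS (
    SELECT 1 FROM sessions s
    WHERE s.item_id = i.id AND s.user_id = $4 AND s.server_id = $2
  )
ORDER BY i.embedding <=> $1::vector
LIMIT $5 OFFSET $6`;

// Hypothetical helper: builds the positional parameters for the query above.
function buildProfileQueryParams(
  profile: number[],
  serverId: number,
  type: string,
  userId: string,
  limit: number,
  offset: number,
): Array<string | number> {
  return [`[${profile.join(",")}]`, serverId, type, userId, limit, offset];
}
```

Since the exclusions are expressed as correlated subqueries, no ID list has to be materialized or capped in application code.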
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `computeUserProfile`'s series aggregation, `avgEpisodeRuntimeTicks` is computed from `sessions.runtimeTicks`, but `runtimeTicks` is a property of `items`, so this should be switched to `AVG(items.runtimeTicks)` to avoid referencing a non-existent column and to get correct duration data.
- In `getProfileRecommendations`, capping the `NOT IN` exclusion list at `MAX_EXCLUSION_LIST_SIZE` can silently drop some hidden/watched IDs from the filter; consider switching to a `NOT EXISTS` join or a temp table approach so you can exclude all relevant IDs without truncation.
## Individual Comments
### Comment 1
<location path="apps/job-server/src/jobs/embedding-jobs.ts" line_range="149-150" />
<code_context>
};
}
-function toPgVectorLiteral(value: number[]): string {
- return `[${value.join(",")}]`;
-}
</code_context>
<issue_to_address>
**issue (bug_risk):** toPgVectorLiteral is removed here but not re-imported, which will break existing usages in this file.
The helper now lives in `@streamystats/database/vector`, but this file doesn’t import it even though it’s still used later (e.g. for embedding upserts), which will cause a compile-time error. Add the appropriate import (for example `import { toPgVectorLiteral } from "@streamystats/database";`) and rely on that instead of an inline implementation.
</issue_to_address>
|
Please view PR #455 before merging this, because I think it is relevant: this will change how the recommendation engine behaves.
Inside the SQL queries the system already does:
Meaning only the top matches per base item survive.
1 - It fixes cross-genre dilution
The real gain will come from raising the candidate threshold. So 0.1 was letting in a lot of garbage candidates.
Please review before making a drastic change to the recommendation engine. Keeping a separate table for user taste would still work even with the formula that I have proposed. I also plan to make a new PR that addresses the long-running series bias (for users who have watched a series with hundreds of episodes, like The Simpsons, which pollutes the base list with long shows). |
|
A last idea. Would it be possible to have a setting (either personal or system wide) to select which method we use to deliver recommendations? |
Do you mean pre-pr recommendations or server-wide recommendations vs user-wide recommendations? For server-wide vs user-wide we should create a 'server-embedding' to both prevent running the average every time, and to maybe weight-in different stuff like recent added content, or even an admin curated list. Maybe repurpose the user-embeddings table to hold both. For pre-pr i guess it can be done; the question is, should we? |
|
Can you try how this PR works for your case? It avoids comparing "weak similarity" items because it no longer calculates the cosine similarity of recent watches against the entire item catalog. The new system averages the entire watch history rather than limiting it to the top or last 15 entries.
To fix long-running series bias, this defines "engagement" as more than five episodes watched and uses runtime to scale weights, but it caps at five: from five to infinity, the episode count doesn't matter anymore, just runtime averages. For example, if a user watches less than 15% of an episode on average, it suggests they didn't enjoy it. Instead of just ignoring that episode, the system now actively steers recommendations away from similar series. On the other hand, if a user watches an episode twice (as an average across the series, not just a single re-watch), we amplify the weight to push even more content like it. The averaging algorithm can be tweaked even more (add admin curated lists, genre-average weight up, etc.) but I personally think moving towards a user taste is a step in the right direction. We can change some details later down the line if needed. |
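One way to read the series weighting described above, as a hedged sketch: the episode count saturates at five, a low average completion flips the weight negative, and re-watching raises it. The 15% threshold and the -0.3 penalty appear elsewhere in this PR; the multiplicative formula, the constant names `ENGAGEMENT_EPISODE_CAP` and `MAX_REWATCH_MULTIPLIER`, and the value of the latter are assumptions, not the PR's actual code.

```typescript
const ENGAGEMENT_EPISODE_CAP = 5; // episode count stops mattering past this
const SERIES_BOUNCE_THRESHOLD = 0.15; // from the PR's commit messages
const BOUNCE_PENALTY = -0.3; // from the PR's commit messages
const MAX_REWATCH_MULTIPLIER = 2; // hypothetical upper bound for re-watch amplification

// avgCompletion: total play time divided by expected runtime of the watched episodes,
// so 1 means "watched once in full on average" and 2 means "watched twice".
function seriesWeight(episodeCount: number, avgCompletion: number): number {
  if (episodeCount <= 0) return 0;
  if (avgCompletion < SERIES_BOUNCE_THRESHOLD) return BOUNCE_PENALTY;
  const episodeFactor =
    Math.min(episodeCount, ENGAGEMENT_EPISODE_CAP) / ENGAGEMENT_EPISODE_CAP;
  const runtimeFactor = Math.min(avgCompletion, MAX_REWATCH_MULTIPLIER);
  return episodeFactor * runtimeFactor;
}
```

Under this reading, a fully watched 500-episode series weighs the same as a fully watched 5-episode one, which is what keeps very long shows from dominating the profile.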
|
As mentioned before, I was working on a separate PR (#455) to fix cross-genre score dilution in the existing N+1 recommendation logic, but after reviewing your approach I'm The pre-computed taste profile with recency decay, bounce detection, and series engagement normalization is the right architectural direction. It also naturally solves the long-running series bias (Simpsons/Family Guy dominating the base list) which I was planning to address in a follow-up PR. Your One thing I noticed: if a user has no profile yet in Great work on this. |
|
One more thing: if a user clears item embeddings to switch models or This could be handled wherever the embedding reset/clear action is triggered, a simple |
When item embeddings are cleared for a server, user embeddings that derive from them must also be cleared. Otherwise they remain in a stale dimensional space, causing mismatched recommendation calculations until recalculated.
We evaluated using server-average to present "popular on server" but decided it was out of scope for this pr, but will revisit later.
You are right! There's a window until the job is re-run that will break the recommendations. Deleting the current embedding should be enough. The user-embedding-job will see the dimension mismatch and recreate the index if necessary. I think we should also put a button on settings/ai to trigger the job manually. |
Adds the ability to manually trigger user taste embeddings generation from the settings UI. This includes:
- New `triggerUserEmbeddingsSync` function in the server library to call the job server
- New UI section in EmbeddingsManager for manually generating user embeddings
- Updated label from "Movie Embeddings" to "Item Embeddings" for broader terminology
|
Some models share the same output dimensions but produce embeddings in a completely different vector space depending on their training data. In this case no error occurs, but the cosine similarity between the stale profile vector and the freshly re-embedded items will be semantically meaningless, producing nonsensical recommendations with no visible indication that anything is wrong. This is the more dangerous case. So if the item embeddings are cleared, it's better to automatically clear the user embeddings as well (in case they changed the model but kept the same dimensions). |
This will be a nice touch, but I think you are overthinking this. "Popular on server" or "trending on server" can be the most played movies and series across all users during the last x days. No need to hook the recommendation engine into it. It would be an in-house trending generator. |
|
One more issue I noticed while looking at the code:
```ts
return await getProfileRecommendations(
  serverIdNum,
  targetUserId,
  "Movie", // hardcoded
  limit,
  offset,
);
```
The
The fix needs two changes:
|
The original implementation had 'Movie' hardcoded too, the component that uses it expects it. Now that |
Add support for filtering recommendations by media type (Movie, Series, or all) across the recommendation system. This refactors the query logic in `getProfileRecommendations` and `getSimilarStatistics` to accept a type parameter instead of filtering results post-query, improving efficiency.
…s into single function Replaced the separate `getSimilarSeries` function with a unified `getSimilarStatistics` function that accepts a `type` parameter. Updated all callers across the codebase (dashboard components, API routes, and AI tools) to use the new object parameter pattern. Removed the now-redundant `similar-series-statistics` module.
Hey - I've left some high level feedback:
- In `calculateUserEmbeddingsJob.computeUserProfile` the series query uses `AVG(${sessions.runtimeTicks})`, but `runtimeTicks` appears to be an `items` field elsewhere; this likely should be `AVG(${items.runtimeTicks})` to avoid a broken query or incorrect runtime aggregation.
- The recommendation types are now treated as if `basedOn` can be absent (e.g. using `r.basedOn ?? []` and `basedOn?:` in dashboard types), but `RecommendationResult` in `recommendation-engine.ts` still types `basedOn` as a required array; consider making it optional there too to keep the type system aligned with the new behavior.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `calculateUserEmbeddingsJob.computeUserProfile` the series query uses `AVG(${sessions.runtimeTicks})`, but `runtimeTicks` appears to be an `items` field elsewhere; this likely should be `AVG(${items.runtimeTicks})` to avoid a broken query or incorrect runtime aggregation.
- The recommendation types are now treated as if `basedOn` can be absent (e.g. using `r.basedOn ?? []` and `basedOn?:` in dashboard types), but `RecommendationResult` in `recommendation-engine.ts` still types `basedOn` as a required array; consider making it optional there too to keep the type system aligned with the new behavior.
Yes, confirmed. In the current main branch, the AI chat can't recommend series via embeddings at all. When asked, it falls back to a genre based keyword search using watch history instead of the actual embedding similarity. The user just sees "embeddings not configured" message with no indication of why. |



Refactor recommendations to use new "user taste" embedding
Instead of building a pseudo user taste from the last watches, it calculates an embedding from the full session history.
Older watches still matter, just less than recent ones. Sessions can also carry negative weights if the user abandons them quickly.
It generates a user taste embedding and compares it directly to item embeddings.
For me it seems to improve recommendations, but the basedOn items are lost (since it compares embeddings directly).
Summary by Sourcery
Refactor personalized recommendations to use precomputed user taste embeddings and add infrastructure to compute, store, and schedule these user embedding profiles.
New Features:
Enhancements:
Build:
Summary by Sourcery
Refactor personalized recommendations to use precomputed per-user taste embeddings and introduce infrastructure to compute, store, and schedule these profiles while adapting APIs and UI to the new engine.
New Features:
Enhancements:
Build: