fix(summary): exclude duplicate edges from node summary generation #1223
prasmussen15 merged 2 commits into main
When resolving extracted edges, edges that match existing edges in the graph were still being passed to node summary generation, causing facts to be duplicated in summaries.

Changes:
- Update resolve_extracted_edges to return new_edges (non-duplicates)
- Update _extract_and_resolve_edges to pass through new_edges
- Pass only new_edges to extract_attributes_from_nodes in add_episode
- An edge is considered "new" if its resolved UUID matches extracted UUID

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
    invalidated_edges.extend(result[1])
    # result[2] is new_edges - not used in bulk flow since attributes
The comment says "not used in bulk flow since attributes are extracted before edge resolution", but this means the bug being fixed in add_episode (duplicate facts in summaries) could still occur in the bulk flow.
If _resolve_nodes_and_edges_bulk extracts attributes before edge resolution, and those attributes include summaries based on edges, wouldn't the same duplication problem exist? The bulk flow appears to call extract_attributes_from_nodes with all edges including potential duplicates.
Consider clarifying in the PR description whether this is a known limitation, or investigate whether the bulk flow actually has this issue.
    for extracted_edge, result in zip(extracted_edges, results, strict=True):
        resolved_edge = result[0]
        invalidated_edge_chunk = result[1]
        # result[2] is duplicate_edges list
This comment is misleading: result[2] is not a "duplicate_edges list". Checking the signature and return type of resolve_extracted_edge would clarify what it actually is; the tuple type annotation at line 433 suggests the third element is something else.
Consider either removing this comment or verifying what result[2] actually contains and documenting it accurately.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
- Update resolve_extracted_edges to return a third value: new_edges (edges that are new to the graph, not duplicates)
- Pass only new_edges to extract_attributes_from_nodes for summary generation in add_episode()

Details
When an extracted edge is resolved against existing edges in the graph, if it matches an existing edge (duplicate), the resolved edge takes on the UUID of the existing edge. Previously, all resolved edges were passed to summary generation, causing duplicate facts.
Now we track which edges are "new" by checking whether resolved_edge.uuid == extracted_edge.uuid. Only new edges (non-duplicates) are passed to the summary generation flow.

Test plan
new_edges behavior

🤖 Generated with Claude Code