feat(vector-store): Add pluggable Milvus vector store overlay for hybrid search #1228
supmo668 wants to merge 9 commits into getzep:main
Introduce a VectorStoreClient ABC and a MilvusVectorStoreClient that wraps pymilvus AsyncMilvusClient. Includes MilvusSearchInterface (vector similarity + BM25 full-text) and MilvusGraphOperationsInterface (CRUD sync), plus schema builders, serialization helpers, and filter utilities in milvus_utils.py.
- 4 Milvus collections: entity_nodes, entity_edges, episodic_nodes, community_nodes, each with HNSW/COSINE dense + BM25 sparse indexes
- group_id as partition key for multi-tenancy
- Datetimes stored as INT64 epoch ms (0 = null sentinel)
- pymilvus >= 2.5.3 as optional [milvus] extra
- 93 unit tests + 12 integration tests (Zilliz Cloud, auto-skip)
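The datetime convention above (INT64 epoch milliseconds, with 0 as the null sentinel) can be sketched as a pair of helpers. This is a minimal illustration, not the code in milvus_utils.py: the function names and the choice to treat naive datetimes as UTC are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NULL_EPOCH_MS = 0  # sentinel meaning "no datetime", per the commit message


def datetime_to_epoch_ms(value: Optional[datetime]) -> int:
    """Encode an optional datetime as integer epoch milliseconds (0 = null)."""
    if value is None:
        return NULL_EPOCH_MS
    if value.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        value = value.replace(tzinfo=timezone.utc)
    delta = value - EPOCH
    # Integer arithmetic avoids float rounding from timestamp() * 1000.
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000


def epoch_ms_to_datetime(ms: int) -> Optional[datetime]:
    """Decode epoch milliseconds back to an aware UTC datetime (0 -> None)."""
    if ms == NULL_EPOCH_MS:
        return None
    return EPOCH + timedelta(milliseconds=ms)
```

One consequence of a 0 sentinel is that the instant 1970-01-01T00:00:00Z itself cannot be stored, and precision below one millisecond is dropped.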
Add a vector_store attribute to GraphDriver and best-effort dual-write hooks so saves and deletes propagate to both the graph DB and the attached vector store. The graph DB remains the source of truth; vector store failures are logged but never block graph writes.
Hooks added in:
- EntityNode.save(), CommunityNode.save(), EntityEdge.save()
- add_nodes_and_edges_bulk_tx() for bulk writes
- Node.delete(), delete_by_group_id(), delete_by_uuids()
- EntityEdge.delete(), delete_by_uuids()
- remove_communities(), clear_data()
23 new unit tests covering save, delete, and failure tolerance paths.
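A best-effort hook of the kind described here might look like the following sketch. FakeGraphDriver, FailingVectorStore, and the dict-shaped node are hypothetical stand-ins so the example runs on its own; only the vector_store attribute, the save_entity_nodes method name, and the "log, do not raise" behaviour come from this PR.

```python
import asyncio
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


class FakeGraphDriver:
    """Hypothetical stand-in for GraphDriver; models only what the hook needs."""

    def __init__(self, vector_store: Optional[Any] = None) -> None:
        self.vector_store = vector_store
        self.saved: list = []

    async def write_node(self, node: dict) -> None:
        self.saved.append(node)


class FailingVectorStore:
    """Hypothetical vector store that is always unavailable."""

    async def save_entity_nodes(self, nodes: list) -> None:
        raise ConnectionError("vector store unavailable")


async def save_entity_node(driver: FakeGraphDriver, node: dict) -> None:
    # Graph DB is the source of truth: write first and let its errors propagate.
    await driver.write_node(node)
    # The guard keeps the hook free when no vector store is attached.
    if driver.vector_store is not None:
        try:
            await driver.vector_store.save_entity_nodes([node])
        except Exception:
            # Best-effort: record the failure, never block the graph write.
            logger.warning(
                "Vector store write failed for node %s", node.get("uuid"), exc_info=True
            )
```

With this shape, a vector store outage leaves the two stores out of sync until a later write or a backfill repairs it, which is the trade-off the best-effort design accepts.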
Accept an optional VectorStoreClient in Graphiti.__init__() and attach it to the driver for dual-write support. Auto-attaches MilvusSearchInterface when a MilvusVectorStoreClient is provided and no search_interface is already set.
- close() now also closes the vector store connection
- build_indices_and_constraints() calls ensure_ready() on vector store
- 9 new unit tests for constructor, lifecycle, and auto-attach behavior
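The constructor and lifecycle behaviour in this commit can be illustrated with stubs. Everything below is a hypothetical sketch: GraphitiSketch, StubDriver, StubVectorStore, and StubSearchInterface are invented for the example, and placing search_interface on the driver is an assumption. Only the method names close(), ensure_ready(), and build_indices_and_constraints() come from the commit message.

```python
import asyncio
from typing import Any, Optional


class StubVectorStore:
    """Hypothetical vector store that records lifecycle calls."""

    def __init__(self) -> None:
        self.ready = False
        self.closed = False

    async def ensure_ready(self) -> None:
        self.ready = True

    async def close(self) -> None:
        self.closed = True


class StubSearchInterface:
    """Hypothetical search adapter bound to a vector store."""

    def __init__(self, vector_store: Any) -> None:
        self.vector_store = vector_store


class StubDriver:
    """Hypothetical graph driver with the two attributes the sketch touches."""

    def __init__(self, search_interface: Optional[Any] = None) -> None:
        self.vector_store: Optional[Any] = None
        self.search_interface = search_interface
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class GraphitiSketch:
    def __init__(self, driver: StubDriver, vector_store: Optional[Any] = None) -> None:
        self.driver = driver
        if vector_store is not None:
            driver.vector_store = vector_store
            # Auto-attach only when the caller has not supplied a search interface.
            if driver.search_interface is None:
                driver.search_interface = StubSearchInterface(vector_store)

    async def build_indices_and_constraints(self) -> None:
        if self.driver.vector_store is not None:
            await self.driver.vector_store.ensure_ready()

    async def close(self) -> None:
        await self.driver.close()
        if self.driver.vector_store is not None:
            await self.driver.vector_store.close()
```

Keeping the auto-attach conditional means an explicitly configured search interface is never silently replaced.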
Create VectorStoreFactory with match/case pattern mirroring DatabaseDriverFactory. Add VectorStoreAppConfig and VectorStoreProvidersConfig to the config schema. Update graphiti MCP server to create vector store via factory and pass it to the Graphiti constructor instead of the previous env-var overlay approach. Auto-detects milvus provider when MILVUS_URI env var is set.
8496831 to 9835c3b
…yment
Add deployment configurations for running Graphiti MCP server with Milvus as a vector search overlay alongside either FalkorDB or Neo4j.
Docker setup:
- FalkorDB + Milvus stack (docker-compose-falkordb-milvus.yml)
- Neo4j + Milvus stack (docker-compose-neo4j-milvus.yml)
- Conditional INSTALL_MILVUS build arg in Dockerfile.standalone
- Runtime volume mount for vector_store package from source
- Config templates with vector_store section for each graph DB variant
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add backfill_vector_store() that reads all entity nodes, entity edges, episodic nodes, and community nodes from the graph DB and batch-upserts them into the vector store. Supports group_id filtering and configurable batch size. Skips records without embeddings. 7 new unit tests covering empty graph, batching, filtering, and all four collection types.
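The batching, group filtering, and skip-without-embedding behaviour described for the backfill can be sketched for a single collection. This is not backfill_vector_store() itself: backfill_collection, the dict-shaped records, and the generic "embedding" key are assumptions for the example.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Sequence


async def backfill_collection(
    records: Sequence[dict],
    upsert: Callable[[List[dict]], Awaitable[None]],
    batch_size: int = 100,
    group_ids: Optional[Sequence[str]] = None,
) -> int:
    """Upsert eligible records in batches and return how many were written."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    eligible = [
        r
        for r in records
        # Records without an embedding cannot be indexed, so they are skipped.
        if r.get("embedding")
        and (group_ids is None or r.get("group_id") in group_ids)
    ]
    written = 0
    for start in range(0, len(eligible), batch_size):
        batch = eligible[start : start + batch_size]
        await upsert(batch)
        written += len(batch)
    return written
```

A full backfill would presumably run a loop like this once per collection (entity nodes, entity edges, episodic nodes, community nodes), reading each from the graph DB.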
9835c3b to 214edd4
…ion tests
Zilliz Cloud serverless is eventually consistent for both vector indexing and BM25 full-text indexing. Add asyncio.sleep() delays after all writes before searching, and a teardown_class fixture to clean up test collections.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run `make format` per CONTRIBUTING.md checklist. All files pass `make lint` (ruff check + pyright 0 errors). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… base class
Add 10 domain-aware save/delete methods to VectorStoreClient so core files (nodes.py, edges.py, bulk_utils.py, etc.) call abstract interface methods instead of importing Milvus-specific serialization. This enables any vector DB backend to be plugged in by implementing the interface, with zero changes to core files required.
- Add save_entity_nodes, save_entity_edges, save_episodic_nodes, save_community_nodes, delete_entity_nodes, delete_entity_edges, delete_nodes_by_uuids, delete_community_nodes, delete_by_group_ids, clear_all to VectorStoreClient base class
- Implement all 10 methods in MilvusVectorStoreClient
- Replace 11 dual-write hooks across 5 core files with 3-5 line calls
- Remove all milvus_utils imports from core framework files
- Add 21 unit tests for new domain-aware methods
- Update 23 dual-write tests for new API
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
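To show what plugging in a backend through such an interface could look like, here is a reduced sketch with three of the ten method names and a toy in-memory backend. VectorStoreClientSketch and InMemoryVectorStore are hypothetical, and the parameter types (dict-shaped nodes, lists of UUID strings) are assumptions; the real signatures in client.py may differ.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class VectorStoreClientSketch(ABC):
    """Reduced stand-in for the base class: 3 of the 10 domain-aware methods."""

    @abstractmethod
    async def save_entity_nodes(self, nodes: List[Dict[str, Any]]) -> None: ...

    @abstractmethod
    async def delete_entity_nodes(self, uuids: List[str]) -> None: ...

    @abstractmethod
    async def clear_all(self) -> None: ...


class InMemoryVectorStore(VectorStoreClientSketch):
    """Toy backend: core code depends only on the interface above."""

    def __init__(self) -> None:
        self.entity_nodes: Dict[str, Dict[str, Any]] = {}

    async def save_entity_nodes(self, nodes: List[Dict[str, Any]]) -> None:
        # Upsert semantics: a repeated UUID overwrites the earlier record.
        for node in nodes:
            self.entity_nodes[node["uuid"]] = node

    async def delete_entity_nodes(self, uuids: List[str]) -> None:
        for uuid in uuids:
            self.entity_nodes.pop(uuid, None)

    async def clear_all(self) -> None:
        self.entity_nodes.clear()
```

Because the abstract methods are named after domain operations rather than vendor calls, the serialization for each backend stays inside that backend's class.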
Hi @danielchalef @prasmussen15, this PR is ready for review.
TL;DR: Adds a pluggable
What it enables:
Validation: 153 new tests (unit + integration against Zilliz Cloud),
The RFC for this feature is at #1263. Happy to address any feedback or split the PR if preferred; commits are organized as 9 reviewable units (listed in the description).
Update: This PR has been split into two parts for easier review:
Recommend reviewing #1264 first: it's the smaller, safer change that establishes the extension point. This PR builds on top of it. RFC issue updated at #1263 with the two-part structure.
This monolithic PR has been split into two focused PRs per RFC #1263:
This PR can be closed once both parts are reviewed. The split keeps each PR focused and easier to review.
Closing in favor of the split PRs:
See RFC #1263 for the full design.
Summary
Add a pluggable vector store overlay to Graphiti that enables hybrid semantic search (dense vector + BM25 full-text) via external vector databases, with Milvus/Zilliz Cloud as the reference implementation. The graph database remains primary; the vector store is a secondary search acceleration layer with best-effort dual-write.
Type of Change
Objective
Problem: Graphiti relies on graph-DB-native search indexes. Dedicated vector databases offer GPU-accelerated ANN search, native BM25, managed scaling, and hybrid retrieval that graph DBs can't match.
Solution: Add a `VectorStoreClient` plugin interface with domain-aware methods (`save_entity_nodes`, `delete_entity_edges`, etc.) so core framework files never import vendor-specific code. Each dual-write hook is ~3 lines. Adding a new vector store backend requires only implementing the interface, with zero changes to `nodes.py`, `edges.py`, or `bulk_utils.py`.
Architecture:
graph TD
    A[Graphiti Core] --> B[Graph DB<br/>Neo4j / FalkorDB]
    A --> C[VectorStoreClient<br/>optional overlay]
    B -->|Primary store| D[Graph structure<br/>Episodes & relationships<br/>Entity deduplication]
    C -->|Search acceleration| E[Vector similarity<br/>BM25 full-text<br/>Hybrid retrieval]
    C -.->|Pluggable backends| F[MilvusVectorStoreClient]
    C -.->|Pluggable backends| G[Future: Qdrant / Pinecone / ...]
    style A fill:#4a90d9,color:#fff
    style B fill:#2ecc71,color:#fff
    style C fill:#e67e22,color:#fff
    style F fill:#e67e22,color:#fff
    style G fill:#95a5a6,color:#fff,stroke-dasharray: 5 5

sequenceDiagram
    participant App
    participant Graphiti
    participant GraphDB as Graph DB (Neo4j)
    participant VS as VectorStoreClient (Milvus)
    App->>Graphiti: add_episode(content)
    Graphiti->>GraphDB: Write nodes & edges
    GraphDB-->>Graphiti: OK
    Graphiti->>VS: save_entity_nodes([node])
    Note over VS: Best-effort dual-write<br/>Failures logged, not raised
    VS-->>Graphiti: OK
    App->>Graphiti: search(query)
    Graphiti->>VS: SearchInterface.search(query)
    VS-->>Graphiti: Ranked results (dense + BM25)
    Graphiti->>GraphDB: Hydrate full graph context
    GraphDB-->>Graphiti: Nodes, edges, episodes
    Graphiti-->>App: Search results

Key Components
- `VectorStoreClient`: `graphiti_core/vector_store/client.py`
- `MilvusVectorStoreClient`: `graphiti_core/vector_store/milvus_client.py`
- `MilvusSearchInterface`: `graphiti_core/vector_store/milvus_search_interface.py`
- `MilvusGraphOperationsInterface`: `graphiti_core/vector_store/milvus_graph_operations.py`
- `nodes.py`, `edges.py`, `bulk_utils.py`, etc.
- `mcp_server/src/services/factories.py`: `VectorStoreFactory` + YAML config
- `graphiti_core/utils/vector_store_sync.py`

Key Design Decisions
- `graph_operations_interface` is either-or (replaces graph writes). A new `vector_store` attribute on `GraphDriver` enables additive writes: the graph DB always writes first, and the vector store is best-effort.
- `driver.vector_store.save_entity_nodes([node])`: no vendor imports. New backends implement the `VectorStoreClient` interface only.
- `if driver.vector_store is not None:` guards: zero overhead when not used.
- `Any` typing: avoids circular imports, matches the existing `SearchInterface` pattern.
- `pymilvus>=2.5.3` as optional extra: `pip install "graphiti-core[milvus]"`

Usage
Commit Breakdown (9 reviewable units)
1. (37b5f3d) `VectorStoreClient` ABC, `MilvusVectorStoreClient`, search/graph-ops adapters, schema builders, serialization utils
2. (899818b) `vector_store` attribute on `GraphDriver`, best-effort hooks in `nodes.py`, `edges.py`, `bulk_utils.py`, `clear_data()`, `remove_communities()`
3. (2c2781f) `vector_store` parameter on `Graphiti.__init__()`, auto-attach `MilvusSearchInterface`, lifecycle management
4. (f86f88c) `VectorStoreFactory`, `VectorStoreAppConfig` schema, factory-based creation with env var overrides
5. (a0fb9f4) Docker Compose stacks for Neo4j+Milvus and FalkorDB+Milvus
6. (214edd4) `backfill_vector_store()` for syncing existing graph data into Milvus
7. (d3a0396) Eventual consistency delays for Zilliz Cloud serverless
8. (8fd50de) `make format` pass per CONTRIBUTING.md checklist
9. (1c3ed17) Add 10 domain-aware methods to `VectorStoreClient`, remove all milvus_utils imports from core files

Testing
- `test_add_triplet` from fix(graphiti): prevent add_triplet from overwriting edges with different src/dst #1212)
- `make lint` passes (ruff + pyright: 0 errors, 0 warnings)

Breaking Changes
No breaking changes. The `vector_store` parameter is optional, so existing users are unaffected.

Checklist
- `make lint` passes)

Related Issues
Relates to #1263 (RFC: Add pluggable vector store overlay for hybrid search)
Closes #1229
Built on top of the driver operations redesign (#1232)
Files changed
New files (core library):
- `graphiti_core/vector_store/__init__.py`: package init
- `graphiti_core/vector_store/client.py`: VectorStoreClient base class (10 domain-aware methods)
- `graphiti_core/vector_store/milvus_client.py`: Milvus implementation
- `graphiti_core/vector_store/milvus_search_interface.py`: Hybrid search via Milvus
- `graphiti_core/vector_store/milvus_graph_operations.py`: Graph ops for standalone Milvus
- `graphiti_core/vector_store/milvus_utils.py`: Milvus schema, serialization, BM25 config
- `graphiti_core/utils/vector_store_sync.py`: Backfill utility

Modified files (dual-write hooks):
- `graphiti_core/nodes.py`: 5 dual-write hooks (save + delete)
- `graphiti_core/edges.py`: 3 dual-write hooks (save + delete)
- `graphiti_core/utils/bulk_utils.py`: Bulk dual-write hook
- `graphiti_core/utils/maintenance/graph_data_operations.py`: clear_data hook
- `graphiti_core/utils/maintenance/community_operations.py`: remove_communities hook
- `graphiti_core/graphiti.py`: `vector_store` constructor param, auto-attach SearchInterface

MCP server integration:
- `mcp_server/src/services/factories.py`: VectorStoreFactory
- `mcp_server/src/config/schema.py`: VectorStoreAppConfig
- `mcp_server/src/graphiti_mcp_server.py`: Wire vector store into MCP server
- `mcp_server/config/config-docker-neo4j-milvus.yaml`: Docker config
- `mcp_server/config/config-docker-falkordb-milvus.yaml`: Docker config
- `mcp_server/docker/docker-compose-neo4j-milvus.yml`: Docker compose
- `mcp_server/docker/docker-compose-falkordb-milvus.yml`: Docker compose

Tests (153 new tests):
- `tests/vector_store/test_milvus_vector_store_client.py`: 33 unit tests
- `tests/test_dual_write.py`: 23 dual-write tests
- `tests/test_graphiti_vector_store.py`: 9 Graphiti constructor tests
- `tests/test_vector_store_sync.py`: 7 backfill utility tests
- `tests/driver/test_milvus_search_interface.py`: 20 search interface tests
- `tests/driver/test_milvus_graph_operations.py`: 27 graph operations tests
- `tests/driver/test_milvus_utils.py`: 34 schema/serialization tests
- `tests/driver/test_milvus_search_int.py`: 10 Zilliz Cloud integration tests