Closed

Changes from 23 commits

33 commits
8ea27b1 Add viz dependency group (caseyclements, Sep 11, 2025)
1452cd0 Added viz to project.optional-dependencies (caseyclements, Sep 25, 2025)
f643224 Split out networkx. Cleaned up API. Testing via notebook not yet chec… (caseyclements, Sep 29, 2025)
b31faf8 Adds API TODO (caseyclements, Oct 3, 2025)
84ef46c Add .ipynb_checkpoints to .gitignore (caseyclements, Oct 7, 2025)
5d6d624 Added basic GraphRAG notebook (caseyclements, Oct 7, 2025)
1319e9a Removed viz packages only needed for advanced graph layouts (caseyclements, Oct 7, 2025)
e91b683 Removes script added by mistake (caseyclements, Oct 7, 2025)
a66050b Finishes to_network, and view API (opts kwargs) and adds documentation. (caseyclements, Oct 7, 2025)
34ef377 Adds small working GraphStore visualization notebook. Requires existi… (caseyclements, Oct 7, 2025)
24a8d7e Add large-scale knowledge graph test with 100+ entities (caseyclements, Oct 7, 2025)
b0e973f Add large-scale knowledge graph visualization notebook (caseyclements, Oct 7, 2025)
84636a0 Improve entity extraction prompts for relationship array alignment (caseyclements, Oct 8, 2025)
3be3004 Completed self-standing GraphRAG example. Removed second one. (caseyclements, Oct 8, 2025)
6a37690 Removed .gitignore in lib (caseyclements, Oct 8, 2025)
036f116 Codespell docstring (caseyclements, Oct 8, 2025)
de3aa82 Added minimum versions to viz group in pyproject (caseyclements, Oct 8, 2025)
bd08f49 Merge branch 'main' into INTPYTHON-501-GraphRAGView (caseyclements, Oct 8, 2025)
bb865b0 Fix holoviews version and update locks (caseyclements, Oct 8, 2025)
82ec17a Add viz extra in dependencies of _lint.yml (caseyclements, Oct 8, 2025)
44e2efc Revert "Add viz extra in dependencies of _lint.yml" (caseyclements, Oct 8, 2025)
3a96215 Add viz extras to lint workflow for langchain-mongodb type checking (caseyclements, Oct 8, 2025)
47b98f2 Updated install dependencies condition to use env.WORKDIR (caseyclements, Oct 8, 2025)
11535d0 Updated check in GraphRAG tests for OpenAI model by including AZURE_O… (caseyclements, Oct 9, 2025)
3222651 Removed viz logic based on env.WORKDIR. Instead, adding extra viz to … (caseyclements, Oct 9, 2025)
d32a78e Cleaned up closing of graphstore and clients in tests. (caseyclements, Oct 9, 2025)
47f2e47 Add saved html rendering of graph (caseyclements, Oct 9, 2025)
9d892b7 INTPYTHON-780 Make TestMongoDBAtlasFullTextSearchRetriever more robus… (blink1073, Oct 9, 2025)
3d7a678 INTPYTHON-785 Only list authorized collections when listing collectio… (rafaelodon, Oct 9, 2025)
ccab4f3 PYTHON-5580: Add pull request template (#213) (Jibola, Oct 10, 2025)
19def7f INTPYTHON-787 Added check for AZURE_OPENAI_ENDPOINT test_graphrag.py … (caseyclements, Oct 10, 2025)
90d0b10 Added basic tests of to_networkx and view. Added pytest mark called viz (caseyclements, Oct 13, 2025)
a9a0975 Removed test_graph_large, moved its data to examples, removed html th… (caseyclements, Oct 13, 2025)
7 changes: 6 additions & 1 deletion .github/workflows/_lint.yml
@@ -46,7 +46,12 @@
 
       - name: Install dependencies
         working-directory: ${{ inputs.working-directory }}
-        run: just install
+        run: |
+          if [[ "${{ env.WORKDIR }}" == *"langchain-mongodb"* ]]; then
+            uv sync --frozen --extra viz
+          else
+            just install
+          fi
       - name: Get .mypy_cache to speed up mypy
         uses: actions/cache@v4
1 change: 1 addition & 0 deletions .gitignore
@@ -6,6 +6,7 @@ __pycache__
 .env
 .venv*
 .local_atlas_uri
+.ipynb_checkpoints
 docs/langchain_mongodb
 docs/langgraph_checkpoint_mongodb
 docs/index.md
1 change: 0 additions & 1 deletion libs/langchain-mongodb/.gitignore

This file was deleted.

4,210 changes: 4,210 additions & 0 deletions libs/langchain-mongodb/examples/GraphRAG.ipynb

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions libs/langchain-mongodb/langchain_mongodb/graphrag/__init__.py
@@ -0,0 +1,3 @@
+from langchain_mongodb.graphrag.graph import MongoDBGraphStore
+
+__all__ = ["MongoDBGraphStore"]
Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@
 "startDate": ["2018-01-01"]
 }},
 "relationships": {{
-"targets": ["Jasbinder Kaur", "Jarnail Singh"],
+"target_ids": ["Jasbinder Kaur", "Jarnail Singh"],
 "types": ["Friend", "Friend"],
 "attributes": [
 {{ "since": ["2019-05-01"] }},
@@ -37,7 +37,7 @@
 "_id": "Jarnail Singh",
 "type": "Person",
 "relationships": {{
-"targets": ["Alice Palace"],
+"target_ids": ["Alice Palace"],
 "types": ["Friend"],
 "attributes": [{{ "since": ["2019-05-01"] }}]
 }}
@@ -46,7 +46,7 @@
 "_id": "Jasbinder Kaur",
 "type": "Person",
 "relationships": {{
-"targets": ["Alice Palace"],
+"target_ids": ["Alice Palace"],
 "types": ["Friend"],
 "attributes": [{{ "since": ["2015-05-01"], "frequency": ["weekly"] }}]
 }}
@@ -71,8 +71,9 @@
 "location": ["San Francisco"]
 }},
 "relationships": {{
-"targets": ["Elon Musk", "Sam Altman"],
-"types": ["Speaker", "Speaker"]
+"target_ids": ["Elon Musk", "Sam Altman"],
+"types": ["Speaker", "Speaker"],
+"attributes": [{{}}, {{}}]
 }}
 }},
 {{ "_id": "Elon Musk", "type": "Person" }},
@@ -92,8 +93,9 @@
 "_id": "Quantum Computing",
 "type": "Concept",
 "relationships": {{
-"targets": ["Quantum Mechanics"],
-"types": ["Based On"]
+"target_ids": ["Quantum Mechanics"],
+"types": ["Based On"],
+"attributes": [{{}}]
 }}
 }},
 {{ "_id": "Quantum Mechanics", "type": "Concept" }}
@@ -114,8 +116,9 @@
 "type": "Event",
 "attributes": {{ "date": ["2023-03-01"] }},
 "relationships": {{
-"targets": ["NASA"],
-"types": ["Managed By"]
+"target_ids": ["NASA"],
+"types": ["Managed By"],
+"attributes": [{{}}]
 }}
 }},
 {{
@@ -126,8 +129,9 @@
 "_id": "Bill Nelson",
 "type": "Person",
 "relationships": {{
-"targets": ["Artemis II Mission"],
-"types": ["Praised By"]
+"target_ids": ["Artemis II Mission"],
+"types": ["Praised By"],
+"attributes": [{{}}]
 }}
 }}
 ]
@@ -146,16 +150,18 @@
 "_id": "Rust",
 "type": "Programming Language",
 "relationships": {{
-"targets": ["Memory Safety"],
-"types": ["Ensures"]
+"target_ids": ["Memory Safety"],
+"types": ["Ensures"],
+"attributes": [{{}}]
 }}
 }},
 {{
 "_id": "Memory Safety",
 "type": "Concept",
 "relationships": {{
-"targets": ["Ownership Model"],
-"types": ["Uses"]
+"target_ids": ["Ownership Model"],
+"types": ["Uses"],
+"attributes": [{{}}]
 }}
 }},
 {{ "_id": "Ownership Model", "type": "Concept" }}
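The few-shot examples above sit inside a prompt template, which is why literal JSON braces are written as `{{` and `}}`. A minimal sketch, assuming f-string-style templating in which doubled braces render as single literal braces (as with Python's `str.format`); the fragment is copied from the Elon Musk/Sam Altman example and the variable names are illustrative. It shows that a rendered example parses as JSON and that its three relationship arrays now have matching lengths:

```python
import json

# Fragment of a few-shot example, escaped for str.format-style templating.
TEMPLATE_FRAGMENT = """{{
"target_ids": ["Elon Musk", "Sam Altman"],
"types": ["Speaker", "Speaker"],
"attributes": [{{}}, {{}}]
}}"""

# Rendering collapses each doubled brace into a single literal brace.
rendered = TEMPLATE_FRAGMENT.format()
relationships = json.loads(rendered)

# After the change, every index has a target, a type, and an attributes object.
lengths = {key: len(values) for key, values in relationships.items()}
print(lengths)  # {'target_ids': 2, 'types': 2, 'attributes': 2}
```

Before this change the same example had no `attributes` array at all, so its arrays could not line up index by index.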
146 changes: 144 additions & 2 deletions libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py
@@ -3,7 +3,7 @@
 import json
 import logging
 from copy import deepcopy
-from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
+from typing import TYPE_CHECKING, Any, Callable, Dict, List, Optional, Union
 
 from langchain_core.documents import Document
 from langchain_core.language_models.chat_models import BaseChatModel
@@ -22,13 +22,16 @@
 
 if TYPE_CHECKING:
     try:
-        from typing import TypeAlias  # type:ignore[attr-defined] # Python 3.10+
+        from typing import TypeAlias  # type:ignore[attr-defined] # Python 3.10+
     except ImportError:
         from typing_extensions import TypeAlias  # Python 3.9 fallback
 
     Entity: TypeAlias = Dict[str, Any]
     """Represents an Entity in the knowledge graph with specific schema. See .schema"""
 
+    import holoviews  # type: ignore[import-untyped]
+    import networkx
+
 logger = logging.getLogger(__name__)
 
 
@@ -544,3 +547,142 @@ def chat_response(
                 entity_schema=entity_schema,
             )
         )
+
+    def to_networkx(
+        self,
+        nx_opts: Optional[dict] = None,
+        json_opts: Optional[dict] = None,
+        **kwargs: Any,
+    ) -> networkx.DiGraph:
+        """Convert the entity collection to a `NetworkX DiGraph <https://networkx.org/documentation/stable/index.html>`_.
+
+        NOTE: Requires optional-dependency "viz", i.e. `uv sync --extra viz`
+
+        Args:
+            nx_opts: Keyword arguments passed to the ``networkx.DiGraph`` constructor.
+            json_opts: Keyword arguments passed to ``json.dumps`` when serializing
+                node and edge types and attributes.
+            **kwargs: Accepted for compatibility; currently unused.
+
+        Returns: networkx.DiGraph
+        """
+
+        try:
+            import json
+
+            import networkx as nx
+        except ImportError as e:
+            raise ImportError(
+                "Install optional-dependency `viz` for networkx or to view in Holoviews"
+            ) from e
+
+        def _safe_get(lst: list, i: int, default: Any = "") -> Any:
+            return lst[i] if i < len(lst) else default
+
+        nx_opts = {} if nx_opts is None else nx_opts
+        json_opts = {} if json_opts is None else json_opts
+
+        # First pass: Add all nodes with their attributes
+        nx_graph = nx.DiGraph(**nx_opts)
+        for doc in self.collection.find({}):
+            # Add node with its JSON-serialized type and attributes
+            node_id = doc["_id"]
+            node_attrs = {}
+            node_attrs["type"] = json.dumps(doc.get("type", ""), **json_opts)
+            node_attrs["attributes"] = json.dumps(
+                doc.get("attributes", {}), **json_opts
+            )
+            nx_graph.add_node(node_id, **node_attrs)
+
+        # Second pass: Add edges based on relationships
+        for doc in self.collection.find({}):
+            source_id = doc["_id"]
+            relationships = doc.get("relationships", {})
+            # relationships may hold several target_ids, each with a type and attributes
+            target_ids = relationships.get("target_ids", [])
+            n_targets = len(target_ids)
+            types = relationships.get("types", [])
+            attrs = relationships.get("attributes", [])
+
+            for t in range(n_targets):
+                # Add edge and attributes
+                edge_attrs = {}
+                edge_attrs["type"] = json.dumps(_safe_get(types, t), **json_opts)
+                edge_attrs["attributes"] = json.dumps(_safe_get(attrs, t), **json_opts)
+                if nx_graph.has_node(target_ids[t]):
+                    nx_graph.add_edge(source_id, target_ids[t], **edge_attrs)
+                else:
+                    logger.warning(
+                        f"{source_id=} references {target_ids[t]=} not found in collection"
+                    )
+
+        return nx_graph
+
+    def view(
+        self,
+        layout: Optional[Callable] = None,
+        nx_opts: Optional[dict] = None,
+        json_opts: Optional[dict] = None,
+        edge_opts: Optional[dict] = None,
+        node_opts: Optional[dict] = None,
+        **kwargs: Any,
+    ) -> holoviews.Overlay:
+        """Draw the knowledge graph as an interactive HoloViews/Bokeh plot.
+
+        The entity collection is first converted to a NetworkX graph,
+        which is then converted to a HoloViews Graph via ``hv.Graph.from_networkx``.
+
+        Both libraries offer extensive layout, styling, and analysis options;
+        see their documentation for details.
+
+        The default layout is ``networkx.spring_layout``, a force-directed layout
+        in which edges act as springs and nodes repel one another.
+        Because every entity has a ``type`` field, another good choice is
+        ``layout=nx.multipartite_layout`` with ``nx_opts={"subset_key": "type"}``,
+        which arranges nodes in layers grouped by type.
+
+        NOTE: Requires optional-dependency "viz", i.e. `uv sync --extra viz`
+
+        Save the view like any other HoloViews object with ``hv.save``;
+        the output format is inferred from the filename suffix
+        (e.g., ``hv.save(graph, "graph.html")``). In a Jupyter notebook,
+        you can also use the download tool on the Bokeh plot.
+
+        Args:
+            layout: `networkx layout <https://networkx.org/documentation/stable/reference/drawing.html#module-networkx.drawing.layout>`_ function.
+                Defaults to networkx.spring_layout.
+            nx_opts: Keyword arguments passed to ``to_networkx`` and to the layout function.
+            json_opts: Keyword arguments passed to ``json.dumps`` when serializing
+                node and edge types and attributes.
+            edge_opts: Keyword arguments used to draw edges.
+            node_opts: Keyword arguments used to draw nodes.
+            **kwargs: Accepted for compatibility; currently unused.
+
+        Returns: A ``holoviews.Overlay`` of the `graph <https://holoviews.org/user_guide/Network_Graphs.html>`_ and its nodes.
+        """
+        try:
+            import holoviews as hv
+            import networkx as nx
+
+            hv.extension("bokeh")
+        except ImportError as e:
+            raise ImportError("To view graph, install optional-dependency `viz`") from e
+
+        hv.opts.defaults(
+            hv.opts.Graph(xaxis=None, yaxis=None), hv.opts.Nodes(xaxis=None, yaxis=None)
+        )
+
+        if layout is None:
+            layout = nx.spring_layout
+        # Convert entity collection to NetworkX graph.
+        nx_opts = {} if nx_opts is None else nx_opts
+        json_opts = {} if json_opts is None else json_opts
+        nx_graph = self.to_networkx(nx_opts=nx_opts, json_opts=json_opts)
+        # Convert to HoloViews Graph; nx_opts are forwarded to the layout function
+        hv_graph = hv.Graph.from_networkx(nx_graph, layout, **nx_opts)
+        # Display with hover tools over edges and nodes
+        edge_opts = {} if edge_opts is None else edge_opts
+        node_opts = {} if node_opts is None else node_opts
+        return hv_graph.opts(
+            inspection_policy="edges", **edge_opts
+        ) * hv_graph.nodes.opts(**node_opts)
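The two-pass conversion in `to_networkx` can be illustrated without MongoDB or NetworkX. A minimal sketch, assuming the entity documents are already loaded into a list (standing in for `self.collection.find({})`) and using plain dicts and tuples in place of `networkx.DiGraph`; `entities_to_graph` and the sample entities are hypothetical:

```python
import json
import logging
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

Edge = Tuple[str, str, Dict[str, str]]


def _safe_get(lst: List[Any], i: int, default: Any = "") -> Any:
    # Fall back to a default when the relationship arrays are misaligned.
    return lst[i] if i < len(lst) else default


def entities_to_graph(
    entities: List[Dict[str, Any]], json_opts: Optional[Dict[str, Any]] = None
) -> Tuple[Dict[str, Dict[str, str]], List[Edge]]:
    """Hypothetical stdlib stand-in for MongoDBGraphStore.to_networkx."""
    json_opts = json_opts or {}

    # First pass: every entity becomes a node with JSON-serialized type and attributes.
    nodes: Dict[str, Dict[str, str]] = {}
    for doc in entities:
        nodes[doc["_id"]] = {
            "type": json.dumps(doc.get("type", ""), **json_opts),
            "attributes": json.dumps(doc.get("attributes", {}), **json_opts),
        }

    # Second pass: one edge per target_id, skipping targets that are not nodes.
    edges: List[Edge] = []
    for doc in entities:
        rels = doc.get("relationships", {})
        target_ids = rels.get("target_ids", [])
        types = rels.get("types", [])
        attrs = rels.get("attributes", [])
        for t, target in enumerate(target_ids):
            if target not in nodes:
                logger.warning("%s references %s not found", doc["_id"], target)
                continue
            edge_attrs = {
                "type": json.dumps(_safe_get(types, t), **json_opts),
                "attributes": json.dumps(_safe_get(attrs, t), **json_opts),
            }
            edges.append((doc["_id"], target, edge_attrs))
    return nodes, edges


# Illustrative entities; "Bob" is deliberately absent to exercise the warning path.
ENTITIES = [
    {
        "_id": "Alice Palace",
        "type": "Person",
        "relationships": {
            "target_ids": ["Jasbinder Kaur", "Jarnail Singh"],
            "types": ["Friend", "Friend"],
            "attributes": [{"since": ["2019-05-01"]}, {}],
        },
    },
    {
        "_id": "Jarnail Singh",
        "type": "Person",
        "relationships": {"target_ids": ["Alice Palace", "Bob"], "types": ["Friend"]},
    },
    {"_id": "Jasbinder Kaur", "type": "Person"},
]

nodes, edges = entities_to_graph(ENTITIES)
```

Misaligned arrays do not raise: `_safe_get` fills a missing type or attributes entry with an empty string, which is why the extraction prompt insists on equal-length arrays.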
34 changes: 33 additions & 1 deletion libs/langchain-mongodb/langchain_mongodb/graphrag/prompts.py
@@ -46,6 +46,37 @@
 Instead of using specific and momentary types such as 'worked_at', use more general and timeless relationship types
 like 'employee'. Add details as attributes. Make sure to use general and timeless relationship types!
 
+### CRITICAL: Array Length Alignment
+The relationships object contains three arrays: `target_ids`, `types`, and `attributes`.
+**These three arrays MUST have EXACTLY the same length.**
+- Each position (index) in these arrays describes ONE complete relationship.
+- Position 0 in `target_ids`, `types`, and `attributes` together describe the first relationship.
+- Position 1 in `target_ids`, `types`, and `attributes` together describe the second relationship.
+- And so on...
+
+If a relationship has no attributes, you MUST still include an empty object `{{}}` in the `attributes` array at that position.
+
+Example of CORRECT alignment:
+```json
+"relationships": {{
+"target_ids": ["Entity A", "Entity B"],
+"types": ["partners", "supplier"],
+"attributes": [
+{{"since": ["2020"]}},
+{{}}
+]
+}}
+```
+
+Example of INCORRECT (DO NOT DO THIS):
+```json
+"relationships": {{
+"target_ids": ["Entity A", "Entity B"],
+"types": ["partners"],
+"attributes": [{{"since": ["2020"]}}]
+}}
+```
+
 **Allowed Relationship Types**:
 - Extract ONLY relationships whose `type` matches one of the following: {allowed_relationship_types}.
 - If this list is empty, ANY relationship type is permitted.
@@ -64,7 +95,8 @@
 1. Validate that all extracted entities have an `_id` and `type`.
 2. Validate that all `type` values are in {allowed_entity_types}.
 3. Validate that all relationships use keys in {allowed_relationship_types}.
-4. Exclude any entities or relationships failing validation.
+4. **CRITICAL**: For each entity with relationships, verify that `target_ids`, `types`, and `attributes` arrays have EXACTLY the same length.
+5. Exclude any entities or relationships failing validation.
 
 ## Output Schema
 Output a valid JSON document with a single top-level key, `entities`, as an array of objects.
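Step 4 asks the model to check array alignment itself. If you also want a code-side guard on the model's output, a minimal sketch might look like the following. `is_aligned` and `split_by_alignment` are hypothetical helpers, not part of langchain-mongodb, and the sample data wraps the prompt's CORRECT and INCORRECT relationship objects in hypothetical entities:

```python
from typing import Any, Dict, List, Tuple

RELATIONSHIP_ARRAYS = ("target_ids", "types", "attributes")


def is_aligned(entity: Dict[str, Any]) -> bool:
    """True if the entity's relationship arrays all have the same length."""
    rels = entity.get("relationships")
    if not rels:
        # Entities without relationships trivially pass this check.
        return True
    # A missing array counts as length 0, so it fails unless all are empty.
    lengths = {len(rels.get(key, [])) for key in RELATIONSHIP_ARRAYS}
    return len(lengths) == 1


def split_by_alignment(
    entities: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Partition entities into (kept, excluded), mirroring steps 4 and 5."""
    kept = [e for e in entities if is_aligned(e)]
    excluded = [e for e in entities if not is_aligned(e)]
    return kept, excluded


ENTITIES = [
    {
        "_id": "Correct",
        "type": "Organization",
        "relationships": {
            "target_ids": ["Entity A", "Entity B"],
            "types": ["partners", "supplier"],
            "attributes": [{"since": ["2020"]}, {}],
        },
    },
    {
        "_id": "Incorrect",
        "type": "Organization",
        "relationships": {
            "target_ids": ["Entity A", "Entity B"],
            "types": ["partners"],
            "attributes": [{"since": ["2020"]}],
        },
    },
    {"_id": "Entity A", "type": "Organization"},
]

kept, excluded = split_by_alignment(ENTITIES)
print([e["_id"] for e in excluded])  # ['Incorrect']
```

Excluding the whole entity is the strictest reading of step 5; when the arrays disagree in length there is no reliable way to tell which relationship lost its type or attributes.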
8 changes: 8 additions & 0 deletions libs/langchain-mongodb/pyproject.toml
@@ -44,6 +44,14 @@ dev = [
     "typing-extensions>=4.12.2",
 ]
 
+[project.optional-dependencies]
+viz = [
+    "networkx>=3.0",
+    "holoviews>=1.19",
+    "jupyter>=1.1",
+    "pyparsing>=3.1",
+]
+
 [tool.pytest.ini_options]
 minversion = "7"
 addopts = "--snapshot-warn-unused --strict-markers --strict-config --durations=5"
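Because `networkx` and `holoviews` are only installed with the `viz` extra (e.g. `uv sync --extra viz` in a development checkout, or `pip install "langchain-mongodb[viz]"` for a release that includes this group), `to_networkx` and `view` raise `ImportError` without it. A minimal sketch of checking up front with `importlib.util.find_spec`; `missing_viz_modules` is a hypothetical helper, and the module list covers only the two libraries those methods import:

```python
import importlib.util
from typing import Iterable, List

# Modules imported lazily by MongoDBGraphStore.to_networkx and .view.
VIZ_MODULES = ("networkx", "holoviews")


def missing_viz_modules(modules: Iterable[str] = VIZ_MODULES) -> List[str]:
    """Return the names of top-level modules that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_viz_modules()
if missing:
    print(f"Missing {missing}; install the viz extra to use to_networkx/view")
else:
    print("viz extra available")
```

`find_spec` locates a module without importing it, so the check stays cheap even when the libraries are installed.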