
Commit 9a89146

Merge branch 'main' into tools-retriever-3

2 parents: 526ace9 + 09a65fc


44 files changed: +2424 / -1113 lines

CHANGELOG.md

Lines changed: 23 additions & 0 deletions

@@ -7,6 +7,29 @@
 - Added a `ToolsRetriever` retriever that uses an LLM to decide on what tools to use to find the relevant data.
 - Added `convert_to_tool` method to the `Retriever` interface to convert a Retriever to a Tool so it can be used within the ToolsRetriever. This is useful when you might want to have both a VectorRetriever and a Text2CypherRetriever as a fallback.

+### Fixed
+
+- Fixed an edge case where the LLM can output a property with type 'map', which was causing errors during import as it is not a valid property type in Neo4j.
+
+## 1.9.1
+
+### Fixed
+
+- Fixed documentation for PdfLoader
+- Fixed a bug where the `format` argument for `OllamaLLM` was not propagated to the client.
+- Fixed `AttributeError` in `SchemaFromTextExtractor` when filtering out node/relationship types with no labels.
+- Fixed an import error in `VertexAIEmbeddings`.
+
+## 1.9.0
+
+### Fixed
+
+- Fixed a bug where Session nodes were duplicated.
+
+### Added
+
+- Added automatic rate limiting with retry logic and exponential backoff for all LLM providers using tenacity. The `RateLimitHandler` interface allows for custom rate limiting strategies, including the ability to disable rate limiting entirely.
+
 ## 1.8.0
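
As context for the `convert_to_tool` entry above, here is a minimal, hypothetical sketch of exposing a retriever as a tool for the new `ToolsRetriever`. The `name` and `description` keyword arguments are assumptions for illustration only; this commit does not show the method's actual signature.

import neo4j
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.retrievers import VectorRetriever

driver = neo4j.GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
retriever = VectorRetriever(driver, index_name="my_index", embedder=OpenAIEmbeddings())

# Expose the retriever as a tool so an LLM-driven ToolsRetriever can decide when to call it.
# The keyword arguments below are assumed, not taken from this commit.
vector_tool = retriever.convert_to_tool(
    name="vector_search",
    description="Semantic search over indexed document chunks",
)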

docs/source/api.rst

Lines changed: 31 additions & 0 deletions

@@ -347,6 +347,28 @@ MistralAILLM
     :members:


+Rate Limiting
+=============
+
+RateLimitHandler
+----------------
+
+.. autoclass:: neo4j_graphrag.llm.rate_limit.RateLimitHandler
+    :members:
+
+RetryRateLimitHandler
+---------------------
+
+.. autoclass:: neo4j_graphrag.llm.rate_limit.RetryRateLimitHandler
+    :members:
+
+NoOpRateLimitHandler
+--------------------
+
+.. autoclass:: neo4j_graphrag.llm.rate_limit.NoOpRateLimitHandler
+    :members:
+
 PromptTemplate
 ==============

@@ -473,6 +495,8 @@ Errors

 * :class:`neo4j_graphrag.exceptions.LLMGenerationError`

+* :class:`neo4j_graphrag.exceptions.RateLimitError`
+
 * :class:`neo4j_graphrag.exceptions.SchemaValidationError`

 * :class:`neo4j_graphrag.exceptions.PdfLoaderError`

@@ -597,6 +621,13 @@ LLMGenerationError
     :show-inheritance:


+RateLimitError
+==============
+
+.. autoclass:: neo4j_graphrag.exceptions.RateLimitError
+    :show-inheritance:
+
 SchemaValidationError
 =====================
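
For the new `RateLimitError` entries above, a minimal sketch of catching the exception around an LLM call. That it surfaces only after the configured rate limit handler exhausts its retries is an assumption inferred from the changelog, not stated in this diff.

from neo4j_graphrag.exceptions import RateLimitError
from neo4j_graphrag.llm import OpenAILLM

llm = OpenAILLM(model_name="gpt-4o")

try:
    response = llm.invoke("Hello, world!")
    print(response.content)
except RateLimitError:
    # Assumption: raised once the RateLimitHandler has given up retrying.
    print("Provider rate limit still exceeded after retries; back off and retry later.")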

docs/source/user_guide_kg_builder.rst

Lines changed: 6 additions & 6 deletions

@@ -583,7 +583,7 @@ This package currently supports text extraction from PDFs:
     from neo4j_graphrag.experimental.components.pdf_loader import PdfLoader

     loader = PdfLoader()
-    await loader.run(path=Path("my_file.pdf"))
+    await loader.run(filepath=Path("my_file.pdf"))

 To implement your own loader, use the `DataLoader` interface:

@@ -783,16 +783,16 @@ Here is a code block illustrating these concepts:
         NodeType(
             label="Person",
             properties=[
-                SchemaProperty(name="name", type="STRING"),
-                SchemaProperty(name="place_of_birth", type="STRING"),
-                SchemaProperty(name="date_of_birth", type="DATE"),
+                PropertyType(name="name", type="STRING"),
+                PropertyType(name="place_of_birth", type="STRING"),
+                PropertyType(name="date_of_birth", type="DATE"),
             ],
         ),
         NodeType(
             label="Organization",
             properties=[
-                SchemaProperty(name="name", type="STRING"),
-                SchemaProperty(name="country", type="STRING"),
+                PropertyType(name="name", type="STRING"),
+                PropertyType(name="country", type="STRING"),
             ],
         ),
     ],
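
A short standalone sketch of the corrected `filepath` keyword shown above; the PDF path is illustrative only.

import asyncio
from pathlib import Path

from neo4j_graphrag.experimental.components.pdf_loader import PdfLoader


async def main() -> None:
    loader = PdfLoader()
    # Note the keyword is `filepath`, matching the documentation fix above.
    document = await loader.run(filepath=Path("my_file.pdf"))
    print(document)


asyncio.run(main())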

docs/source/user_guide_pipeline.rst

Lines changed: 1 addition & 1 deletion

@@ -154,7 +154,7 @@ See :ref:`pipelineevent` and :ref:`taskevent` to see what is sent in each event
     import logging

     from neo4j_graphrag.experimental.pipeline import Pipeline
-    from neo4j_graphrag.experimental.pipeline.types import Event
+    from neo4j_graphrag.experimental.pipeline.notification import Event

     logger = logging.getLogger(__name__)
     logging.basicConfig()
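
To illustrate the corrected import path above, a minimal sketch of an event callback. Passing it to the pipeline via a `callback` constructor argument is an assumption based on the surrounding user guide, which this diff only partially shows.

import logging

from neo4j_graphrag.experimental.pipeline import Pipeline
from neo4j_graphrag.experimental.pipeline.notification import Event  # new import path

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


async def on_event(event: Event) -> None:
    # Log each pipeline/task event as it is emitted.
    logger.info("pipeline event: %s", event)


# Assumption: the Pipeline accepts the callback at construction time.
pipeline = Pipeline(callback=on_event)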

docs/source/user_guide_rag.rst

Lines changed: 88 additions & 2 deletions

@@ -125,15 +125,15 @@ To use VertexAI, instantiate the `VertexAILLM` class:

     generation_config = GenerationConfig(temperature=0.0)
     llm = VertexAILLM(
-        model_name="gemini-1.5-flash-001", generation_config=generation_config
+        model_name="gemini-2.5-flash", generation_config=generation_config
     )
     llm.invoke("say something")


 .. note::

     In order to run this code, the `google-cloud-aiplatform` Python package needs to be installed:
-    `pip install "neo4j_grpahrag[vertexai]"`
+    `pip install "neo4j_graphrag[google]"`


 See :ref:`vertexaillm`.

@@ -225,6 +225,7 @@ it can be queried using the following:

     from neo4j_graphrag.llm import OllamaLLM
     llm = OllamaLLM(
         model_name="orca-mini",
+        # model_params={"options": {"temperature": 0}, "format": "json"},
         # host="...",  # when using a remote server
     )
     llm.invoke("say something")

@@ -294,6 +295,91 @@ Here's an example using the Python Ollama client:

 See :ref:`llminterface`.


+Rate Limit Handling
+===================
+
+All LLM implementations include automatic rate limiting that uses retry logic with exponential backoff by default. This feature helps handle API rate limits from LLM providers gracefully by automatically retrying failed requests with increasing wait times between attempts.
+
+Default Rate Limit Handler
+--------------------------
+
+Rate limiting is enabled by default for all LLM instances with the following configuration:
+
+- **Max attempts**: 3
+- **Min wait**: 1.0 seconds
+- **Max wait**: 60.0 seconds
+- **Multiplier**: 2.0 (exponential backoff)
+
+.. code:: python
+
+    from neo4j_graphrag.llm import OpenAILLM
+
+    # Rate limiting is automatically enabled
+    llm = OpenAILLM(model_name="gpt-4o")
+
+    # The LLM will automatically retry on rate limit errors
+    response = llm.invoke("Hello, world!")
+
+.. note::
+
+    To change the default configuration of `RetryRateLimitHandler`:
+
+    .. code:: python
+
+        from neo4j_graphrag.llm import OpenAILLM
+        from neo4j_graphrag.llm.rate_limit import RetryRateLimitHandler
+
+        # Customize rate limiting parameters
+        llm = OpenAILLM(
+            model_name="gpt-4o",
+            rate_limit_handler=RetryRateLimitHandler(
+                max_attempts=10,  # Increase max retry attempts
+                min_wait=2.0,     # Increase minimum wait time
+                max_wait=120.0,   # Increase maximum wait time
+                multiplier=3.0    # More aggressive backoff
+            )
+        )
+
+Custom Rate Limiting
+--------------------
+
+You can customize the rate limiting behavior by creating your own rate limit handler:
+
+.. code:: python
+
+    from neo4j_graphrag.llm import AnthropicLLM
+    from neo4j_graphrag.llm.rate_limit import RateLimitHandler
+
+    class CustomRateLimitHandler(RateLimitHandler):
+        """Implement your custom rate limiting strategy."""
+        # Implement required methods: handle_sync, handle_async
+        pass
+
+    # Create custom rate limit handler and pass it to the LLM interface
+    custom_handler = CustomRateLimitHandler()
+
+    llm = AnthropicLLM(
+        model_name="claude-3-sonnet-20240229",
+        rate_limit_handler=custom_handler,
+    )
+
+Disabling Rate Limiting
+-----------------------
+
+For high-throughput applications or when you handle rate limiting externally, you can disable it:
+
+.. code:: python
+
+    from neo4j_graphrag.llm import CohereLLM, NoOpRateLimitHandler
+
+    # Disable rate limiting completely
+    llm = CohereLLM(
+        model_name="command-r-plus",
+        rate_limit_handler=NoOpRateLimitHandler(),
+    )
+    llm.invoke("Hello, world!")
+
 Configuring the Prompt
 ========================

examples/customize/embeddings/vertexai_embeddings.py

Lines changed: 1 addition & 1 deletion

@@ -4,6 +4,6 @@

 from neo4j_graphrag.embeddings import VertexAIEmbeddings

-embeder = VertexAIEmbeddings(model="text-embedding-004")
+embeder = VertexAIEmbeddings(model="text-embedding-005")
 res = embeder.embed_query("my question")
 print(res[:10])

examples/customize/llms/custom_llm.py

Lines changed: 38 additions & 2 deletions

@@ -1,8 +1,13 @@
 import random
 import string
-from typing import Any, List, Optional, Union
+from typing import Any, Awaitable, Callable, List, Optional, TypeVar, Union

 from neo4j_graphrag.llm import LLMInterface, LLMResponse
+from neo4j_graphrag.llm.rate_limit import (
+    RateLimitHandler,
+    # rate_limit_handler,
+    # async_rate_limit_handler,
+)
 from neo4j_graphrag.message_history import MessageHistory
 from neo4j_graphrag.types import LLMMessage

@@ -13,6 +18,8 @@ def __init__(
     ):
         super().__init__(model_name, **kwargs)

+    # Optional: Apply rate limit handling to synchronous invoke method
+    # @rate_limit_handler
     def invoke(
         self,
         input: str,

@@ -24,6 +31,8 @@ def invoke(
         )
         return LLMResponse(content=content)

+    # Optional: Apply rate limit handling to asynchronous ainvoke method
+    # @async_rate_limit_handler
     async def ainvoke(
         self,
         input: str,

@@ -33,6 +42,33 @@ async def ainvoke(
         raise NotImplementedError()


-llm = CustomLLM("")
+llm = CustomLLM(
+    ""
+)  # if rate_limit_handler and async_rate_limit_handler decorators are used, the default rate limit handler will be applied automatically (retry with exponential backoff)
 res: LLMResponse = llm.invoke("text")
 print(res.content)
+
+# If rate_limit_handler and async_rate_limit_handler decorators are used and you want to use a custom rate limit handler
+# Type variables for function signatures used in rate limit handlers
+F = TypeVar("F", bound=Callable[..., Any])
+AF = TypeVar("AF", bound=Callable[..., Awaitable[Any]])
+
+
+class CustomRateLimitHandler(RateLimitHandler):
+    def __init__(self) -> None:
+        super().__init__()
+
+    def handle_sync(self, func: F) -> F:
+        # error handling here
+        return func
+
+    def handle_async(self, func: AF) -> AF:
+        # error handling here
+        return func
+
+
+llm_with_custom_rate_limit_handler = CustomLLM(
+    "", rate_limit_handler=CustomRateLimitHandler()
+)
+result: LLMResponse = llm_with_custom_rate_limit_handler.invoke("text")
+print(result.content)

examples/customize/llms/ollama_llm.py

Lines changed: 1 addition & 0 deletions

@@ -6,6 +6,7 @@

 llm = OllamaLLM(
     model_name="<model_name>",
+    # model_params={"options": {"temperature": 0}, "format": "json"},
     # host="...",  # if using a remote server
 )
 res: LLMResponse = llm.invoke("What is the additive color model?")

Lines changed: 6 additions & 3 deletions

@@ -1,12 +1,15 @@
 from neo4j_graphrag.llm import LLMResponse, VertexAILLM
 from vertexai.generative_models import GenerationConfig

-generation_config = GenerationConfig(temperature=0.0)
+generation_config = GenerationConfig(temperature=1.0)
 llm = VertexAILLM(
-    model_name="gemini-1.5-flash-001",
+    model_name="gemini-2.0-flash-001",
     generation_config=generation_config,
     # add here any argument that will be passed to the
     # vertexai.generative_models.GenerativeModel client
 )
-res: LLMResponse = llm.invoke("say something")
+res: LLMResponse = llm.invoke(
+    "say something",
+    system_instruction="You are living in 3000 where AI rules the world",
+)
 print(res.content)
