CHANGELOG.md
@@ -7,6 +7,29 @@
- Added a `ToolsRetriever` retriever that uses an LLM to decide which tools to use to find the relevant data.
- Added a `convert_to_tool` method to the `Retriever` interface to convert a Retriever into a Tool so it can be used within the `ToolsRetriever`. This is useful when you want to have both a `VectorRetriever` and a `Text2CypherRetriever` as a fallback (sketched below).
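A minimal sketch of how these two additions might combine; the constructor arguments and `convert_to_tool` signature below are assumptions for illustration, not the confirmed API:

```python
# Hypothetical sketch: assumes `driver`, `llm`, and the two retrievers
# (`vector_retriever`, `text2cypher_retriever`) are already constructed;
# the exact signatures of convert_to_tool and ToolsRetriever are assumptions.
from neo4j_graphrag.retrievers import ToolsRetriever  # assumed import path

vector_tool = vector_retriever.convert_to_tool(
    name="vector_search",
    description="Semantic search over embedded chunks.",
)
cypher_tool = text2cypher_retriever.convert_to_tool(
    name="text2cypher",
    description="Translate the question to Cypher when vector search is not enough.",
)

# The LLM decides which of the provided tools to call for a given query
retriever = ToolsRetriever(driver=driver, llm=llm, tools=[vector_tool, cypher_tool])
results = retriever.search(query_text="Who acted in The Matrix?")
```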

### Fixed

- Fixed an edge case where the LLM can output a property with type 'map', which was causing errors during import as it is not a valid property type in Neo4j.

## 1.9.1

### Fixed

- Fixed documentation for `PdfLoader`.
- Fixed a bug where the `format` argument for `OllamaLLM` was not propagated to the client.
- Fixed `AttributeError` in `SchemaFromTextExtractor` when filtering out node/relationship types with no labels.
- Fixed an import error in `VertexAIEmbeddings`.

## 1.9.0

### Fixed

- Fixed a bug where Session nodes were duplicated.

### Added

- Added automatic rate limiting with retry logic and exponential backoff for all LLM providers using tenacity. The `RateLimitHandler` interface allows for custom rate limiting strategies, including the ability to disable rate limiting entirely.
@@ -294,6 +295,91 @@ Here's an example using the Python Ollama client:

See :ref:`llminterface`.
Rate Limit Handling
===================

All LLM implementations include automatic rate limiting that uses retry logic with exponential backoff by default. This feature helps handle API rate limits from LLM providers gracefully by automatically retrying failed requests with increasing wait times between attempts.

Default Rate Limit Handler
--------------------------

Rate limiting is enabled by default for all LLM instances with the following configuration:

- **Max attempts**: 3
- **Min wait**: 1.0 seconds
- **Max wait**: 60.0 seconds
- **Multiplier**: 2.0 (exponential backoff)
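Roughly, these defaults mean a failed call is retried up to two more times, with exponentially growing pauses in between. The snippet below illustrates one plausible schedule; the exact formula tenacity applies internally is an assumption here, not taken from the library's source:

.. code:: python

    # Hypothetical illustration of the default backoff schedule.
    MIN_WAIT, MAX_WAIT, MULTIPLIER, MAX_ATTEMPTS = 1.0, 60.0, 2.0, 3

    for attempt in range(1, MAX_ATTEMPTS):  # waits occur between attempts
        wait = min(MAX_WAIT, max(MIN_WAIT, MULTIPLIER ** attempt))
        print(f"after failed attempt {attempt}: wait {wait:.1f}s")
    # after failed attempt 1: wait 2.0s
    # after failed attempt 2: wait 4.0s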
.. code:: python

    from neo4j_graphrag.llm import OpenAILLM

    # Rate limiting is automatically enabled
    llm = OpenAILLM(model_name="gpt-4o")

    # The LLM will automatically retry on rate limit errors
    response = llm.invoke("Hello, world!")
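Rate limiting can also be switched off entirely, as mentioned in the changelog. A minimal sketch, assuming the module exposes a `NoOpRateLimitHandler` for this purpose:

.. code:: python

    from neo4j_graphrag.llm import OpenAILLM
    from neo4j_graphrag.llm.rate_limit import NoOpRateLimitHandler  # assumed name

    # Disable retry-on-rate-limit entirely; rate limit errors surface immediately
    llm = OpenAILLM(
        model_name="gpt-4o",
        rate_limit_handler=NoOpRateLimitHandler(),
    )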
.. note::

    To change the default configuration of `RetryRateLimitHandler`:

    .. code:: python

        from neo4j_graphrag.llm import OpenAILLM
        from neo4j_graphrag.llm.rate_limit import RetryRateLimitHandler

        # Customize rate limiting parameters
        llm = OpenAILLM(
            model_name="gpt-4o",
            rate_limit_handler=RetryRateLimitHandler(
                max_attempts=10,   # Increase max retry attempts
                min_wait=2.0,      # Increase minimum wait time
                max_wait=120.0,    # Increase maximum wait time
                multiplier=3.0,    # More aggressive backoff
            ),
        )

Custom Rate Limiting
--------------------

You can customize the rate limiting behavior by creating your own rate limit handler:

.. code:: python

    from neo4j_graphrag.llm import AnthropicLLM
    from neo4j_graphrag.llm.rate_limit import RateLimitHandler

    class CustomRateLimitHandler(RateLimitHandler):
        """Implement your custom rate limiting strategy."""
    # Optional: Apply rate limit handling to synchronous invoke method
    # @rate_limit_handler
    def invoke(
        self,
        input: str,
@@ -24,6 +31,8 @@ def invoke(
    )
        return LLMResponse(content=content)

    # Optional: Apply rate limit handling to asynchronous ainvoke method
    # @async_rate_limit_handler
    async def ainvoke(
        self,
        input: str,
@@ -33,6 +42,33 @@ async def ainvoke(
        raise NotImplementedError()

llm = CustomLLM(
    ""
)  # if rate_limit_handler and async_rate_limit_handler decorators are used, the default rate limit handler will be applied automatically (retry with exponential backoff)
res: LLMResponse = llm.invoke("text")
print(res.content)

# If rate_limit_handler and async_rate_limit_handler decorators are used and you want to use a custom rate limit handler:
# Type variables for function signatures used in rate limit handlers
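# Sketch: plausible definitions for those type variables; the exact names
# and bounds are assumptions, not taken from the library's source.
from typing import Any, Awaitable, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])  # shape of a sync invoke-style function
AF = TypeVar("AF", bound=Callable[..., Awaitable[Any]])  # shape of an async ainvoke-style function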