CHANGELOG.md: 1 addition & 0 deletions
@@ -13,6 +13,7 @@
- Support for Python 3.13
- Added support for automatic schema extraction from text using LLMs. In the `SimpleKGPipeline`, when the user provides no schema, the automatic schema extraction is enabled by default.
- Added ability to return a user-defined message if context is empty in GraphRAG (which skips the LLM call).
+- Added automatic rate limiting with retry logic and exponential backoff for all LLM providers using tenacity. The `RateLimitHandler` interface allows for custom rate limiting strategies, including the ability to disable rate limiting entirely.
In order to run this code, the `google-cloud-aiplatform` Python package needs to be installed:
-`pip install "neo4j_grpahrag[vertexai]"`
+`pip install "neo4j_graphrag[google]"`
See :ref:`vertexaillm`.
@@ -294,6 +294,91 @@ Here's an example using the Python Ollama client:
See :ref:`llminterface`.
Rate Limit Handling
===================

All LLM implementations include automatic rate limiting that uses retry logic with exponential backoff by default. This feature helps handle API rate limits from LLM providers gracefully by automatically retrying failed requests with increasing wait times between attempts.

Default Rate Limit Handler
--------------------------

Rate limiting is enabled by default for all LLM instances with the following configuration:

- **Max attempts**: 3
- **Min wait**: 1.0 seconds
- **Max wait**: 60.0 seconds
- **Multiplier**: 2.0 (exponential backoff)

.. code:: python

    from neo4j_graphrag.llm import OpenAILLM

    # Rate limiting is automatically enabled
    llm = OpenAILLM(model_name="gpt-4o")

    # The LLM will automatically retry on rate limit errors
    response = llm.invoke("Hello, world!")
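
If every retry attempt is exhausted, the error is propagated to the caller. A minimal sketch of handling that case, assuming the library surfaces a dedicated `RateLimitError` (verify the exact exception name in `neo4j_graphrag.exceptions` for your installed version):

.. code:: python

    from neo4j_graphrag.exceptions import RateLimitError  # assumed exception name
    from neo4j_graphrag.llm import OpenAILLM

    llm = OpenAILLM(model_name="gpt-4o")

    try:
        response = llm.invoke("Hello, world!")
    except RateLimitError:
        # All retry attempts failed; fall back or surface the error to the user
        print("The LLM provider is still rate limiting requests, try again later.")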

.. note::

    To change the default configuration of `RetryRateLimitHandler`:

    .. code:: python

        from neo4j_graphrag.llm import OpenAILLM
        from neo4j_graphrag.llm.rate_limit import RetryRateLimitHandler

        # Customize rate limiting parameters
        llm = OpenAILLM(
            model_name="gpt-4o",
            rate_limit_handler=RetryRateLimitHandler(
                max_attempts=10,  # Increase max retry attempts
                min_wait=2.0,     # Increase minimum wait time
                max_wait=120.0,   # Increase maximum wait time
                multiplier=3.0,   # More aggressive backoff
            )
        )
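
Rate limiting can also be disabled entirely, as noted in the changelog entry above. A minimal sketch of what that could look like, assuming the `rate_limit` module ships a no-op handler (the name `NoOpRateLimitHandler` used here is an assumption to check against the installed version):

.. code:: python

    from neo4j_graphrag.llm import OpenAILLM
    from neo4j_graphrag.llm.rate_limit import NoOpRateLimitHandler  # assumed name

    # No retries: rate limit errors from the provider surface immediately
    llm = OpenAILLM(
        model_name="gpt-4o",
        rate_limit_handler=NoOpRateLimitHandler(),
    )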

Custom Rate Limiting
--------------------

You can customize the rate limiting behavior by creating your own rate limit handler:

.. code:: python

    from neo4j_graphrag.llm import AnthropicLLM
    from neo4j_graphrag.llm.rate_limit import RateLimitHandler

    class CustomRateLimitHandler(RateLimitHandler):
        """Implement your custom rate limiting strategy."""

    # Optional: Apply rate limit handling to synchronous invoke method
    # @rate_limit_handler
    def invoke(
        self,
        input: str,
@@ -24,6 +31,8 @@ def invoke(
        )
        return LLMResponse(content=content)

    # Optional: Apply rate limit handling to asynchronous ainvoke method
    # @async_rate_limit_handler
    async def ainvoke(
        self,
        input: str,
@@ -33,6 +42,33 @@ async def ainvoke(
        raise NotImplementedError()

-llm = CustomLLM("")
+llm = CustomLLM(
+    ""
+)  # if rate_limit_handler and async_rate_limit_handler decorators are used, the default rate limit handler will be applied automatically (retry with exponential backoff)
res: LLMResponse = llm.invoke("text")
print(res.content)

# If rate_limit_handler and async_rate_limit_handler decorators are used and you want to use a custom rate limit handler
# Type variables for function signatures used in rate limit handlers
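
# --- Sketch, not part of the original diff, completing the two comments above ---
# A type variable for the callables wrapped by a rate limit handler would look
# something like this (the exact definition in the truncated example is assumed):
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

# Assuming CustomLLM forwards extra keyword arguments (including
# rate_limit_handler) to LLMInterface, a non-default handler can be supplied,
# for example the RetryRateLimitHandler shown earlier with custom settings:
from neo4j_graphrag.llm.rate_limit import RetryRateLimitHandler

llm_custom = CustomLLM("", rate_limit_handler=RetryRateLimitHandler(max_attempts=5))
res_custom: LLMResponse = llm_custom.invoke("text")
print(res_custom.content)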
0 commit comments