Commit a200ff2
feat: simplify extraction pipeline and add batch entity summarization (#1224)
* feat(llm): add token usage tracking for LLM calls
Add TokenUsageTracker class to track input/output tokens by prompt type
during LLM calls. This helps analyze token costs across different
operations like extract_nodes, extract_edges, resolve_nodes, etc.
Changes:
- Add graphiti_core/llm_client/token_tracker.py with TokenUsageTracker
- Update LLMClient base class to include token_tracker instance
- Update OpenAI base client to capture and record token usage
- Add token_tracker property on Graphiti class for easy access
- Update podcast_runner.py to print token usage summary after ingestion
Usage:
    client = Graphiti(...)
    # ... run ingestion ...
    client.token_tracker.print_summary(sort_by='prompt_name')
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
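To make the per-prompt accounting described above concrete, here is a minimal sketch of a tracker that accumulates input and output tokens by prompt name. It is not the `TokenUsageTracker` added in this commit: the class name `SketchTokenTracker`, the `PromptUsage` record, and the `record` and `summary` methods are assumptions made for illustration. Only `print_summary(sort_by='prompt_name')` mirrors the usage shown in the message.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PromptUsage:
    """Running totals for one prompt type (hypothetical structure)."""

    calls: int = 0
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


class SketchTokenTracker:
    """Illustrative stand-in, not the TokenUsageTracker from this commit."""

    def __init__(self) -> None:
        # One PromptUsage entry is created lazily per prompt name.
        self._usage = defaultdict(PromptUsage)

    def record(self, prompt_name: str, input_tokens: int, output_tokens: int) -> None:
        """Add the token counts of a single LLM call to the prompt's totals."""
        entry = self._usage[prompt_name]
        entry.calls += 1
        entry.input_tokens += input_tokens
        entry.output_tokens += output_tokens

    def summary(self, sort_by: str = 'total_tokens'):
        """Return (prompt_name, PromptUsage) pairs in the requested order."""
        if sort_by == 'prompt_name':
            return sorted(self._usage.items(), key=lambda item: item[0])
        # Numeric fields are sorted from largest to smallest.
        return sorted(
            self._usage.items(),
            key=lambda item: getattr(item[1], sort_by),
            reverse=True,
        )

    def print_summary(self, sort_by: str = 'total_tokens') -> None:
        for name, usage in self.summary(sort_by):
            print(
                f'{name}: calls={usage.calls} '
                f'in={usage.input_tokens} out={usage.output_tokens}'
            )
```

An LLM client would call `record(...)` once per response, passing whatever usage counts the provider returns; the summary then shows which prompt types dominate token cost.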
* chore: temporarily disable summary early return optimization
Disable the optimization that skips the LLM call when a node's summary
plus its edge facts totals under 2000 characters. This forces every
summary to be generated via the LLM so token usage can be analyzed.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Revert "chore: temporarily disable summary early return optimization"
This restores the summary early-return optimization that the previous change disabled.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat: simplify extraction pipeline and add batch entity summarization
- Remove chunking code for entity-dense episodes (node_operations.py)
  - Delete _extract_nodes_chunked, _extract_from_chunk, _merge_extracted_entities
  - Always use single LLM call for entity extraction
- Remove chunking code for edge extraction (edge_operations.py)
  - Remove MAX_NODES constant and generate_covering_chunks usage
  - Process all nodes in single LLM call instead of covering subsets
- Add batch entity summarization (node_operations.py, extract_nodes.py)
  - New SummarizedEntity and SummarizedEntities Pydantic models
  - New extract_summaries_batch prompt for batch processing
  - New _extract_entity_summaries_batch function
  - Nodes with short summaries get edge facts appended directly (no LLM)
  - Only nodes needing LLM summarization are batched together
- Simplify edge attribute extraction (extract_edges.py, edge_operations.py)
  - Remove episode_content from context (attributes from fact only)
  - Keep reference_time for temporal resolution
  - Add existing_attributes to preserve/update existing values
- Improve edge deduplication prompt (dedupe_edges.py, edge_operations.py)
  - Use continuous indexing across duplicate and invalidation candidates
  - Deduplicate invalidation candidates against duplicate candidates
  - Allow EXISTING FACTS to be both duplicates AND contradicted
  - Consolidate to single contradicted_facts field
- Remove obsolete chunking tests (test_entity_extraction.py)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
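The "continuous indexing" and candidate deduplication described for the edge deduplication prompt can be sketched as follows. This is an illustration only: the function name `build_candidate_context` and the representation of candidates as plain fact strings are assumptions, and the real code may work with edge objects and different field names.

```python
def build_candidate_context(duplicate_candidates, invalidation_candidates):
    """Number both candidate lists with one shared, gap-free index range.

    Hypothetical helper: candidates are plain fact strings here.
    Returns two lists of {'idx': int, 'fact': str} dicts.
    """
    # Drop invalidation candidates that already appear as duplicate
    # candidates (or earlier in the invalidation list), keeping order.
    seen = set(duplicate_candidates)
    unique_invalidation = []
    for fact in invalidation_candidates:
        if fact not in seen:
            seen.add(fact)
            unique_invalidation.append(fact)

    duplicates = [
        {'idx': i, 'fact': fact} for i, fact in enumerate(duplicate_candidates)
    ]
    # Invalidation indices continue where the duplicate indices stop, so a
    # single index in the model's answer identifies exactly one fact.
    offset = len(duplicates)
    invalidations = [
        {'idx': offset + i, 'fact': fact}
        for i, fact in enumerate(unique_invalidation)
    ]
    return duplicates, invalidations
```

With one index space, a single list of contradicted indices can refer to facts from either group, which fits the consolidated contradicted_facts field mentioned above.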
* chore: bump version to 0.27.2pre1
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Add token tracking for Anthropic/Gemini clients and missing tests
- Implement token tracking in AnthropicClient._generate_response()
  and generate_response() using result.usage.input_tokens/output_tokens
- Implement token tracking in GeminiClient._generate_response()
  and generate_response() using response.usage_metadata
- Add comprehensive unit tests for TokenUsageTracker class
- Add tests for _extract_entity_summaries_batch function covering:
  - No nodes needing summarization
  - Short summaries with edge facts
  - Long summaries requiring LLM
  - Node filter (should_summarize_node)
  - Batch multiple nodes
  - Unknown entity handling
  - Missing episode and summary
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
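The "short summaries" and "long summaries" test cases correspond to a split between nodes that can be updated without an LLM call and nodes that must be sent for summarization. A rough sketch of that split is below. The helper name, the dict-based node shape, and the `MAX_SUMMARY_CHARS` constant are hypothetical; the 2000-character figure is taken from the optimization described earlier in this message, and the real comparison may differ.

```python
MAX_SUMMARY_CHARS = 2000  # assumed from the 2000-character figure in this message


def split_nodes_for_summarization(nodes, edge_facts_by_node):
    """Append edge facts to short summaries; return nodes that need the LLM.

    Hypothetical shapes: each node is a dict with 'name' and 'summary',
    and edge_facts_by_node maps a node name to a list of fact strings.
    """
    needs_llm = []
    for node in nodes:
        facts = edge_facts_by_node.get(node['name'], [])
        combined = '\n'.join([node['summary'], *facts]).strip()
        if len(combined) < MAX_SUMMARY_CHARS:
            # Short enough: store summary plus facts directly, no LLM call.
            node['summary'] = combined
        else:
            # Too long: leave untouched and queue for batch summarization.
            needs_llm.append(node)
    return needs_llm
```

Only the returned list would then be passed to the batch summarization prompt.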
* Update test_node_operations.py for batch summarization API
- Remove import of extract_attributes_from_node (function was removed)
- Add import of _extract_entity_summaries_batch
- Update tests to use new batch summarization API
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Add MAX_NODES limit for batch entity summarization
- Add MAX_NODES = 30 constant
- Partition nodes needing summarization into flights of MAX_NODES
- Extract _process_summary_flight helper for processing each flight
- Each flight makes a separate LLM call to avoid context overflow
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
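Partitioning into flights is ordinary fixed-size chunking. The sketch below assumes a helper named `partition_into_flights`, which is not taken from the commit; only the `MAX_NODES = 30` value comes from the message.

```python
MAX_NODES = 30  # value stated in the commit message


def partition_into_flights(nodes, flight_size=MAX_NODES):
    """Split nodes into consecutive flights of at most flight_size items.

    Hypothetical helper; each flight would get its own LLM call.
    """
    if flight_size <= 0:
        raise ValueError('flight_size must be positive')
    return [
        nodes[start:start + flight_size]
        for start in range(0, len(nodes), flight_size)
    ]
```

For example, 65 nodes needing summarization would produce flights of 30, 30, and 5 nodes, so three LLM calls instead of one oversized request.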
* Change default OpenAI models to gpt-5-mini
Update both DEFAULT_MODEL and DEFAULT_SMALL_MODEL to use gpt-5-mini.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Update podcast_runner.py to use default OpenAI models
Remove explicit model configuration to use the default gpt-5-mini models
from OpenAIClient.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Revert default model changes to gpt-4.1-mini/nano
Restore the original default models instead of gpt-5-mini.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Address PR review comments
- Fix unreachable code in _handle_structured_response (check response.refusal)
- Process node summary flights in parallel using semaphore_gather
- Use case-insensitive name matching for LLM summary responses
- Handle duplicate node names by applying summary to all matching nodes
- Fix edge case when both edge lists are empty in contradiction processing
- Fix potential AttributeError when episode is None in edge attributes
- Add tests for flight partitioning and case-insensitive name matching
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
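The case-insensitive matching and duplicate-name handling from the review fixes can be illustrated with a small lookup table keyed on the normalized name. The function name `apply_summaries` and the dict-based shapes are assumptions for this sketch and do not reflect the actual node or response models.

```python
from collections import defaultdict


def apply_summaries(nodes, summarized):
    """Copy LLM summaries onto every node whose name matches, ignoring case.

    Hypothetical shapes: nodes and summarized entries are dicts with
    'name' and 'summary'. Returns names the LLM produced that matched
    no node, so the caller can log or ignore them.
    """
    # Several nodes may share a name, so each key maps to a list of nodes.
    nodes_by_name = defaultdict(list)
    for node in nodes:
        nodes_by_name[node['name'].casefold()].append(node)

    unmatched = []
    for entity in summarized:
        matches = nodes_by_name.get(entity['name'].casefold())
        if not matches:
            unmatched.append(entity['name'])
            continue
        for node in matches:
            node['summary'] = entity['summary']
    return unmatched
```

Keying on a list rather than a single node is what lets one returned summary update all nodes that share a name.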
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent fe19482 commit a200ff2
File tree
17 files changed, +1108 -605 lines changed
- examples/podcast
- graphiti_core
  - llm_client
  - prompts
  - utils/maintenance
- tests
  - llm_client
  - utils/maintenance