Fix: Apply HTTP client timeouts to prevent infinite hangs #23

TheFermiSea · 2025-10-17T22:37:38Z

Fix: Apply HTTP client timeouts to prevent infinite hangs

Summary

Fixes a critical bug where octocode's GraphRAG indexing process hangs indefinitely during LLM API calls. The root cause is that reqwest::Client instances are created with Client::new(), which sets no timeout, instead of the builder pattern, so the configured batch_timeout_seconds is never applied.

Problem

When using LLM-powered GraphRAG features (use_llm = true), the indexing process hangs indefinitely at two points:

File description generation (with ai_batch_size > 1)
Architectural relationship extraction (always processes all files in one batch)

The configuration parameter graphrag.llm.batch_timeout_seconds is loaded but never applied to the HTTP client, causing requests to wait indefinitely when the LLM provider is slow to respond.
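
The mechanism behind the hang can be reproduced in miniature with the standard library alone. The sketch below is not octocode code and does not use reqwest; the function name and the local "silent server" are hypothetical stand-ins for an LLM provider that accepts a connection but never answers. It shows that a read only gives up when a timeout has been set on the connection.

```rust
use std::io::{ErrorKind, Read};
use std::net::{TcpListener, TcpStream};
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Connects to a local server that accepts the connection but never replies,
/// then reads with the given timeout. Returns the error kind and how long the
/// read blocked. (Hypothetical helper for illustration only.)
fn read_from_silent_server(timeout: Duration) -> (ErrorKind, Duration) {
    // Bind to an ephemeral port on localhost.
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind failed");
    let addr = listener.local_addr().expect("no local address");

    // Server thread: accept one connection and hold it open without writing.
    let (done_tx, done_rx) = mpsc::channel::<()>();
    let server = thread::spawn(move || {
        let (_stream, _) = listener.accept().expect("accept failed");
        // Keep the connection open until the client side is finished.
        let _ = done_rx.recv();
    });

    let mut stream = TcpStream::connect(addr).expect("connect failed");
    // Without this call the read below would block for as long as the
    // server stays silent.
    stream
        .set_read_timeout(Some(timeout))
        .expect("setting the read timeout failed");

    let started = Instant::now();
    let mut buf = [0u8; 16];
    let err = stream.read(&mut buf).expect_err("read should time out");
    let elapsed = started.elapsed();

    drop(done_tx); // unblocks the server thread so it can exit
    server.join().expect("server thread panicked");
    (err.kind(), elapsed)
}

fn main() {
    let (kind, elapsed) = read_from_silent_server(Duration::from_millis(200));
    // The kind is platform-dependent: WouldBlock on Unix, TimedOut on Windows.
    println!("read failed with {:?} after {:?}", kind, elapsed);
}
```

The client-level timeout in this PR plays the same role one layer up: it turns an unbounded wait into an error the caller can handle.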

Root Cause Analysis

Primary Bug (`src/indexer/graphrag/builder.rs:74`)

// BEFORE (buggy):
let client = Client::new();

// AFTER (fixed):
let client = Client::builder()
    .timeout(std::time::Duration::from_secs(
        config.graphrag.llm.batch_timeout_seconds,
    ))
    .build()?;

Secondary Bug (`src/embedding/provider/huggingface.rs:460`)

// BEFORE (buggy):
let client = reqwest::Client::new();

// AFTER (fixed):
let client = reqwest::Client::builder()
    .timeout(std::time::Duration::from_secs(30))
    .build()?;

Reproduction

Configure octocode with use_llm = true and ai_batch_size > 1
Run octocode index on any codebase
Process hangs at "AI analyzing X files for architectural relationships" with infinite spinner
No timeout occurs even after configured batch_timeout_seconds expires

Testing

Before Fix

Both GPT-4.1-mini (default) and GPT-5-mini exhibited infinite hangs
Workaround with ai_batch_size = 1 only helped description phase
Relationship extraction always hung (processes 72 files in single batch)

After Fix

Timeouts properly trigger after configured duration
Failed requests return error messages instead of hanging forever
Large batch processing completes or fails gracefully

Impact

This bug made LLM-powered GraphRAG features unusable for any non-trivial codebase. The fix enables:

Reliable timeout behavior for all LLM API calls
Proper error handling and recovery
Predictable indexing duration

Code Quality Improvements

In addition to the core timeout fixes, this PR includes:

Enhanced Error Handling

Added .context() error messages to both HTTP client build failures
GraphRAG client: "Failed to create HTTP client for LLM API calls"
HuggingFace client: "Failed to create HTTP client for HuggingFace downloads"

Documentation Comments

// IMPORTANT: Must use builder pattern with timeout to prevent infinite hangs
// when LLM API calls take too long. Client::new() does not apply timeouts.

These comments at both fix locations help prevent future regressions.

Code Verification

Zero clippy warnings
All existing Client::new() uses verified (3 instances in commands/ use request-level timeouts correctly)
No unwrap() issues in production code
Consistent error handling patterns

Technical Details

Why Client::new() Doesn't Apply Timeouts

The reqwest::Client::new() convenience method creates a client with default settings, which do not include any timeout. To apply one timeout to every request the client sends, use the builder pattern:

use reqwest::Client;
use std::time::Duration;

// ❌ WRONG - no timeout applied
let client = Client::new();

// ✅ CORRECT - timeout is applied to every request sent through this client
let client = Client::builder()
    .timeout(Duration::from_secs(120))
    .build()?;

Request-Level vs Client-Level Timeouts

This PR uses client-level timeouts. An alternative approach is request-level timeouts:

// Also valid, but less convenient for repeated requests
let client = Client::new();
let response = client.get(url)
    .timeout(Duration::from_secs(120))
    .send()
    .await?;

The codebase uses both patterns appropriately:

Client-level (this PR): For GraphRAG batch operations and HuggingFace downloads
Request-level: In src/commands/ where each request may need different timeouts
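
When both patterns are combined, reqwest's documentation describes the request-level timeout as overriding the client-level one for that request. The sketch below only models that precedence with plain `Option<Duration>` values; `effective_timeout` is a hypothetical helper, not a reqwest or octocode function.

```rust
use std::time::Duration;

/// Models which timeout governs a single request: a request-level value,
/// when present, takes precedence over the client-level one.
/// (Hypothetical helper; it only mirrors the documented precedence.)
fn effective_timeout(
    client_level: Option<Duration>,
    request_level: Option<Duration>,
) -> Option<Duration> {
    request_level.or(client_level)
}

fn main() {
    let client_default = Some(Duration::from_secs(120));

    // No per-request override: the client-level timeout applies.
    println!("{:?}", effective_timeout(client_default, None));

    // A per-request timeout replaces the client-level value for that call.
    println!(
        "{:?}",
        effective_timeout(client_default, Some(Duration::from_secs(10)))
    );

    // Client::new() with no per-request timeout: nothing bounds the call,
    // which is the situation this PR fixes.
    println!("{:?}", effective_timeout(None, None));
}
```

The last case is the one to avoid: a client built without a timeout is only safe if every call site sets its own.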

Test Results

Unit Tests

cargo test --all-features

Results: 93 passed, 3 failed

The 3 failures are unrelated FastEmbed lock acquisition issues:

test_fastembed_provider_creation
test_fastembed_model_validation
test_fastembed_embedding_generation

These failures are environmental (file lock contention in local cache directory), not caused by the timeout fix changes.

Integration Testing

Real-world test on rust-daq codebase (113 files):

File indexing: ✅ 113/113 files processed
GraphRAG blocks: ✅ 896 blocks created
Relationship extraction: ✅ Timeout triggered after 120s (expected behavior)
  Before fix: Infinite hang
  After fix: Graceful timeout with error message

Configuration Tested

[graphrag.llm]
batch_timeout_seconds = 120
ai_batch_size = 10
description_model = "openai/gpt-4.1-mini"
relationship_model = "google/gemini-2.0-flash-001"
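
With a timeout in place, the time spent waiting on LLM calls has a rough upper bound. The sketch below works it out for the tested values (113 files, ai_batch_size = 10, batch_timeout_seconds = 120). It rests on assumptions the PR does not confirm: description batches run sequentially, each batch is a single request, relationship extraction is one request, and nothing is retried. The function names are hypothetical.

```rust
use std::time::Duration;

/// Number of batches needed to cover `files` items, rounding up.
fn batch_count(files: u64, batch_size: u64) -> u64 {
    assert!(batch_size > 0, "batch size must be positive");
    (files + batch_size - 1) / batch_size
}

/// Upper bound on time spent waiting for LLM responses, ASSUMING sequential
/// description batches (one request each), a single relationship-extraction
/// request, and no retries.
fn worst_case_wait(files: u64, batch_size: u64, timeout_secs: u64) -> Duration {
    let description_requests = batch_count(files, batch_size);
    let relationship_requests = 1;
    Duration::from_secs(timeout_secs * (description_requests + relationship_requests))
}

fn main() {
    // Values taken from the integration test and configuration in this PR.
    let bound = worst_case_wait(113, 10, 120);
    println!("worst-case LLM wait: {} s", bound.as_secs());
}
```

Under those assumptions the bound is 13 requests at 120 seconds each, about 26 minutes, where the unpatched code had no bound at all.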

Migration Guide for Developers

If you're creating new HTTP clients in octocode:

For LLM/API Calls

use reqwest::Client;
use anyhow::Context;

let client = Client::builder()
    .timeout(std::time::Duration::from_secs(
        config.graphrag.llm.batch_timeout_seconds,
    ))
    .build()
    .context("Failed to create HTTP client")?;

For File Downloads

let client = Client::builder()
    .timeout(std::time::Duration::from_secs(30))
    .build()
    .context("Failed to create HTTP client")?;

Don't Use Client::new()

Unless every request sets its own timeout (as in src/commands/) or you have a specific reason to avoid timeouts (rare), always use the builder pattern.

Related Issues

This fix resolves the core issue documented in:

octocode_timeout_analysis.md (comprehensive technical analysis)
User reports of infinite hangs during GraphRAG indexing

Checklist

Code changes tested locally
Both timeout bugs fixed (GraphRAG + HuggingFace)
No breaking changes to API
Enhanced error messages added
Documentation comments added
Code quality verified (clippy clean)
Integration tested on real codebase
Unit tests passing (93/96 passed, 3 unrelated FastEmbed lock failures)
Ready for upstream PR

- Fix builder.rs:74: Use Client::builder() with batch_timeout_seconds
- Fix huggingface.rs:460: Use Client::builder() with 30s timeout
- Prevents infinite hangs during LLM API calls and model downloads

Resolves timeout bug documented in octocode_timeout_analysis.md

…ixes

Enhanced the timeout bug fixes with:
- Better error messages using .context() for HTTP client build failures
- Documentation comments explaining why builder pattern is required
- Clear warnings that Client::new() does not apply timeouts

These improvements help prevent future regressions and make the code more maintainable by documenting the critical timeout requirement.

Related to the fix for infinite hangs during LLM GraphRAG operations.

Added two documentation files to support PR submission:

1. PR_DESCRIPTION.md
   - Detailed technical explanation of the bug and fix
   - Code quality improvements section
   - Integration test results from rust-daq codebase
   - Migration guide for developers
   - Comprehensive checklist

2. TESTING.md
   - Complete testing procedures (unit, integration, regression)
   - Manual testing checklist
   - Real-world test results
   - Troubleshooting guide
   - CI/CD recommendations

These documents provide all information needed for PR review and merge. Ready for upstream submission.

TheFermiSea added 4 commits October 16, 2025 22:13

docs: Update PR description with test results

1257555
