Fix: /file2document/convert blocks event loop on large folders causing 504 timeout by SyedShahmeerAli12 · Pull Request #13784 · infiniflow/ragflow

SyedShahmeerAli12 · 2026-03-25T08:00:21Z

Problem

The /file2document/convert endpoint ran all file lookups, document deletions, and insertions synchronously inside the
request cycle. Linking a large folder (~1.7GB with many files) caused 504 Gateway Timeout because the blocking DB loop
held the HTTP connection open for too long.

Fix

- Extracted the heavy DB work into a plain sync function, _convert_files.
- Inputs are validated and folder file IDs are expanded upfront (fast path).
- The blocking work is dispatched to a thread pool via get_running_loop().run_in_executor(), and the endpoint returns 200 immediately.
- The frontend only checks data.code === 0, so the response change (file2documents list → True) has no impact.
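The pattern described in this fix can be sketched in isolation as follows. This is a minimal illustration, not the code from the PR: `convert_files_blocking`, `convert_handler`, the response dictionaries, and the error code 102 are hypothetical stand-ins, and the future is returned only so the demo can wait for the background work before exiting.

```python
import asyncio
import time


def convert_files_blocking(file_ids, kb_ids):
    """Hypothetical stand-in for the heavy, synchronous DB loop."""
    time.sleep(0.2)  # simulate slow delete/insert work
    return len(file_ids) * len(kb_ids)


async def convert_handler(file_ids, kb_ids):
    """Validate quickly, schedule the slow part, and respond right away."""
    if not file_ids or not kb_ids:
        # Hypothetical error payload; the real endpoint has its own helpers.
        return {"code": 102, "message": "file_ids and kb_ids are required"}, None
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    future = loop.run_in_executor(None, convert_files_blocking, file_ids, kb_ids)
    return {"code": 0, "data": True}, future


async def main():
    started = time.perf_counter()
    response, future = await convert_handler(["f1", "f2"], ["kb1"])
    elapsed = time.perf_counter() - started
    linked = await future  # awaited here only so the demo exits cleanly
    return response, elapsed, linked


response, elapsed, linked = asyncio.run(main())
```

The handler returns before the simulated work finishes, which is the property that avoids the gateway timeout. The trade-off is that the caller no longer learns whether the background work succeeded.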

Fixes #13781

When the same model name exists for multiple types (common with OpenAI-API-Compatible providers), calling get_api_key without a model_type filter could return the wrong tenant_llm_id, causing the tenant_{key} columns to reference an incorrect model type. This is the same class of bug fixed in PR infiniflow#13569 for get_model_config_by_type_and_name, now applied consistently to ensure_tenant_model_id_for_params in tenant_utils.py. Fixes infiniflow#13775 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Added `inline-block max-w-[120px] truncate align-middle` classes to prevent long usernames from wrapping to multiple lines in the UI. Fixes infiniflow#13748 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…t 504 timeout The convert endpoint executed all file lookups, document removals, and insertions synchronously in the request cycle. For large folders this caused 504 Gateway Timeout errors. Fix: validate inputs and expand folder file IDs upfront, then dispatch the blocking DB work to a thread pool via get_running_loop().run_in_executor so the HTTP response is returned immediately without waiting for completion. Fixes infiniflow#13781 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Fixes /file2document/convert request timeouts by moving the heavy synchronous DB work off the async request path so the event loop isn’t blocked during large folder conversions.

Changes:

- Dispatches convert’s delete/insert DB loop to a thread via asyncio.get_running_loop().run_in_executor(...) and returns immediately.
- Adds tenant-model lookup disambiguation by passing a specific LLMType to TenantLLMService.get_api_key(...).
- Truncates SharedBadge UI content to prevent overflow.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `api/apps/file2document_app.py` | Makes `/convert` schedule conversion work in an executor instead of doing it inline. |
| `api/utils/tenant_utils.py` | Adds key→`LLMType` mapping and passes `model_type` into tenant model lookup. |
| `web/src/components/shared-badge.tsx` | Adds truncation/width styling to prevent badge overflow. |

Copilot · 2026-03-25T15:54:15Z

api/apps/file2document_app.py

+        for kb_id in kb_ids:
+            e, kb = KnowledgebaseService.get_by_id(kb_id)
+            if not e:
+                continue
+            e, file = FileService.get_by_id(id)
+            if not e:
+                continue


_convert_files calls FileService.get_by_id(id) inside the for kb_id in kb_ids loop, so the same file is re-fetched once per KB. Fetch the file once per file_id (outside the KB loop) to avoid redundant DB calls, which can be significant for large folders and many datasets.

Suggested change

-        for kb_id in kb_ids:
-            e, kb = KnowledgebaseService.get_by_id(kb_id)
-            if not e:
-                continue
-            e, file = FileService.get_by_id(id)
-            if not e:
-                continue
+        # Fetch the file once per file_id to avoid redundant DB calls in the kb_ids loop.
+        e, file = FileService.get_by_id(id)
+        if not e:
+            continue
+        for kb_id in kb_ids:
+            e, kb = KnowledgebaseService.get_by_id(kb_id)
+            if not e:
+                continue

Copilot · 2026-03-25T15:54:15Z

api/apps/file2document_app.py

-                        return get_data_error_result(
-                            message="Database error (Document removal)!")
-                File2DocumentService.delete_by_file_id(id)



The new async behavior returns success after validating only file_ids. Previously this endpoint validated kb_ids exist (and would fail fast with "Can't find this dataset!"). With the current code, invalid kb_ids will still return 200/code==0 while the background worker silently skips inserts. Please validate all kb_ids up front before scheduling the executor job.

Suggested change

+        # Validate all knowledge bases exist before scheduling background work
+        for kb_id in kb_ids:
+            e, kb = KnowledgebaseService.get_by_id(kb_id)
+            if not e or kb is None:
+                return get_data_error_result(message="Can't find this dataset!")

Copilot · 2026-03-25T15:54:16Z

api/apps/file2document_app.py

+        asyncio.get_running_loop().run_in_executor(
+            None, _convert_files, all_file_ids, kb_ids, user_id
+        )


run_in_executor(...) is fire-and-forget here: the returned Future isn’t stored or awaited, so exceptions raised in _convert_files can become "Future exception was never retrieved" warnings and failures are effectively invisible to callers. Add a done-callback (or store the Future in a task registry) to log/handle exceptions, and consider limiting concurrency with a dedicated executor if this endpoint can be called frequently.

Copilot · 2026-03-25T15:54:16Z

api/apps/file2document_app.py

+        asyncio.get_running_loop().run_in_executor(
+            None, _convert_files, all_file_ids, kb_ids, user_id
+        )
+        return get_json_result(data=True)


This endpoint’s contract changed substantially (now returns data=True immediately and no longer surfaces per-document/per-dataset errors). There are existing unit tests that assert the old synchronous behavior and response payload (e.g., test/testcases/test_web_api/test_file_app/test_file2document_routes_unit.py::test_convert_branch_matrix_unit), which will fail and should be updated to reflect the new async semantics and response shape.

Copilot · 2026-03-25T15:54:16Z

api/utils/tenant_utils.py

+from common.constants import LLMType
 from api.db.services.tenant_llm_service import TenantLLMService

+_KEY_TO_MODEL_TYPE = {
+    "llm_id": LLMType.CHAT,
+    "embd_id": LLMType.EMBEDDING,
+    "asr_id": LLMType.SPEECH2TEXT,
+    "img2txt_id": LLMType.IMAGE2TEXT,
+    "rerank_id": LLMType.RERANK,
+    "tts_id": LLMType.TTS,
+}


This file introduces model-type mapping logic for tenant model IDs, but the PR title/description focus on /file2document/convert timeout behavior. If this change is intentional, please mention it in the PR description (or split it into a separate PR) to keep scope and review risk clear.
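For readers who have not seen the underlying bug, the effect of such a key-to-type mapping can be sketched as follows. Everything here is hypothetical: `ModelType` stands in for the project's `LLMType`, and `TENANT_MODELS`, `find_model_id`, and `resolve` are illustrative names, not functions from the repository.

```python
from enum import Enum


class ModelType(str, Enum):
    """Stand-in for the project's LLMType; the values are illustrative."""
    CHAT = "chat"
    EMBEDDING = "embedding"


KEY_TO_MODEL_TYPE = {"llm_id": ModelType.CHAT, "embd_id": ModelType.EMBEDDING}

# Hypothetical tenant rows: the same model name registered under two types.
TENANT_MODELS = [
    {"id": 1, "name": "my-model", "type": ModelType.CHAT},
    {"id": 2, "name": "my-model", "type": ModelType.EMBEDDING},
]


def find_model_id(name, model_type=None):
    """Return the id of the first row matching name and, if given, model_type."""
    for row in TENANT_MODELS:
        if row["name"] == name and (model_type is None or row["type"] == model_type):
            return row["id"]
    return None


def resolve(key, name):
    """Look up a model id using the type implied by the parameter key."""
    return find_model_id(name, KEY_TO_MODEL_TYPE.get(key))


unfiltered = find_model_id("my-model")
embedding = resolve("embd_id", "my-model")
chat = resolve("llm_id", "my-model")
```

Without the type filter, the lookup returns whichever row comes first, which is how an embedding parameter could end up pointing at a chat model.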

Copilot · 2026-03-25T15:54:16Z

api/apps/file2document_app.py

+            tenant_id = DocumentService.get_tenant_id(doc_id)
+            if tenant_id:
+                DocumentService.remove_document(doc, tenant_id)
+        File2DocumentService.delete_by_file_id(id)
+


In _convert_files, DocumentService.get_tenant_id(doc_id) can return None (e.g., when the document’s knowledgebase is not VALID). The current code skips remove_document in that case but still deletes the File2Document rows, which can leave orphan Document rows and incorrect KB counters/chunk-store state. Consider handling the None tenant case explicitly (e.g., log and still delete the document + update KB counts, or abort without deleting mappings).
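A sketch of the second option named in this comment (abort without deleting the mappings) is shown below. It is not what the PR implements; the function and its four callable parameters are hypothetical stand-ins for the service-layer calls, used so the example runs on its own.

```python
import logging

logger = logging.getLogger("file2document")  # hypothetical logger name


def remove_linked_documents(doc_ids, get_tenant_id, remove_document, delete_mappings):
    """Remove a file's documents; keep the mappings if any removal was skipped.

    All four callables are hypothetical stand-ins for the real services.
    """
    skipped = []
    for doc_id in doc_ids:
        tenant_id = get_tenant_id(doc_id)
        if tenant_id is None:
            logger.warning("no tenant for document %s; leaving it in place", doc_id)
            skipped.append(doc_id)
            continue
        remove_document(doc_id, tenant_id)
    if not skipped:
        # Only drop the file-to-document mappings once every document is gone.
        delete_mappings()
    return skipped


tenants = {"d1": "t1", "d2": None}
removed = []
mappings_deleted = []

skipped = remove_linked_documents(
    doc_ids=["d1", "d2"],
    get_tenant_id=tenants.get,
    remove_document=lambda doc_id, tenant_id: removed.append((doc_id, tenant_id)),
    delete_mappings=lambda: mappings_deleted.append(True),
)
```

Keeping the mappings when a document could not be removed avoids orphaned rows, at the cost of leaving the file linked until the underlying cause is resolved.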

- Move FileService.get_by_id() outside kb loop to avoid redundant DB calls
- Validate kb_ids upfront before scheduling background work
- Log warning when tenant_id is None instead of silently skipping
- Add done-callback to log exceptions from fire-and-forget executor future

codecov · 2026-03-25T16:45:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.72%. Comparing base (e705ac6) to head (16b76aa).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #13784   +/-   ##
=======================================
  Coverage   96.72%   96.72%           
=======================================
  Files          10       10           
  Lines         702      702           
  Branches      112      112           
=======================================
  Hits          679      679           
  Misses          5        5           
  Partials       18       18


…nature - Use files_set.get() with falsy check to catch both missing and invalid files - Update test_convert_branch_matrix_unit to reflect new async behavior: file and kb validation still synchronous, background errors no longer surfaced - Add model_type=None to get_api_key mock to match real signature

yingfeng · 2026-03-26T03:25:20Z

CI fails:

=================================== FAILURES ===================================
__________ test_tenant_info_and_set_tenant_info_exception_matrix_unit __________
test/testcases/test_web_api/test_user_app/test_user_app_unit.py:1104: in test_tenant_info_and_set_tenant_info_exception_matrix_unit
    assert "tenant update boom" in res["message"], res
E   AssertionError: {'code': 100, 'message': "TypeError('_load_dialog_module.<locals>._TenantLLMService.get_api_key() takes 2 positional arguments but 3 were given')"}
E   assert 'tenant update boom' in "TypeError('_load_dialog_module.<locals>._TenantLLMService.get_api_key() takes 2 positional arguments but 3 were given')"
=========================== short test summary info ============================
FAILED test/testcases/test_web_api/test_user_app/test_user_app_unit.py::test_tenant_info_and_set_tenant_info_exception_matrix_unit - AssertionError: {'code': 100, 'message': "TypeError('_load_dialog_module.<locals>._TenantLLMService.get_api_key() takes 2 positional arguments but 3 were given')"}
assert 'tenant update boom' in "TypeError('_load_dialog_module.<locals>._TenantLLMService.get_api_key() takes 2 positional arguments but 3 were given')"
==== 1 failed, 718 passed, 26 skipped, 203 deselected in 188.27s (0:03:08) =====
Error: Process completed with exit code 1.
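The failure above is a signature mismatch between a test stub and its caller: the stub's `get_api_key` accepts two positional arguments, while the updated code passes a third. The sketch below reproduces that class of error with hypothetical stubs (`OldStub`, `UpdatedStub`, `call_like_production`); it is not the project's test code.

```python
class OldStub:
    """Test double written before the model type argument was added."""

    @staticmethod
    def get_api_key(tenant_id, model_name):
        return {"tenant_id": tenant_id, "model_name": model_name}


class UpdatedStub:
    """Test double that accepts the optional model type argument."""

    @staticmethod
    def get_api_key(tenant_id, model_name, model_type=None):
        return {
            "tenant_id": tenant_id,
            "model_name": model_name,
            "model_type": model_type,
        }


def call_like_production(service):
    """Mimic a caller that passes the model type as a third positional argument."""
    return service.get_api_key("tenant-1", "my-model", "chat")


try:
    call_like_production(OldStub)
    old_error = None
except TypeError as exc:
    old_error = str(exc)

new_result = call_like_production(UpdatedStub)
```

Giving the stub a `model_type=None` default keeps it compatible with callers that pass two or three arguments.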

…test

SyedShahmeerAli12 and others added 3 commits March 25, 2026 12:45

Fix: prevent username line break in SharedBadge component

f1a7990


dosubot bot added the size:L, 🐖api, and 🐞 bug labels Mar 25, 2026

yingfeng added the ci Continue Integration label Mar 25, 2026

yingfeng marked this pull request as draft March 25, 2026 15:49

yingfeng marked this pull request as ready for review March 25, 2026 15:49

yingfeng requested a review from Copilot March 25, 2026 15:49

Copilot started reviewing on behalf of yingfeng March 25, 2026 15:50 View session

Copilot AI reviewed Mar 25, 2026


SyedShahmeerAli12 added 2 commits March 25, 2026 21:23

Merge branch 'main' into fix/file2document-convert-504-timeout

b593a37

SyedShahmeerAli12 added 2 commits March 26, 2026 11:48

fix(tests): add model_type=None to get_api_key mock in dialog routes …

772a3a7

…test

Merge branch 'main' into fix/file2document-convert-504-timeout

16b76aa

yingfeng merged commit ff92b55 into infiniflow:main Mar 26, 2026
1 check passed

dosubot bot mentioned this pull request Mar 27, 2026

[Bug]: 502/504 Request timeout under high concurrency #13825

Open

4 tasks

yingfeng merged 8 commits into infiniflow:main from SyedShahmeerAli12:fix/file2document-convert-504-timeout


3 participants
