Description
The _get_summaries_needing_reconciliation function in aperag/tasks/reconciler.py contains a logic bug that causes COMPLETE status summaries to be repeatedly selected for reconciliation every minute.
Current Behavior
The reconciler task runs every 1 minute via Celery Beat and logs:
Collection summary task completed for colXXXXXXXX
This happens continuously even when document uploads were completed hours ago.
Root Cause
In PR #1229 (commit 42ad173), the SQL condition was changed from and_ to or_:
Before (correct):
def _get_summaries_needing_reconciliation(self, session: Session):
    stmt = select(CollectionSummary).where(
        and_(
            CollectionSummary.version != CollectionSummary.observed_version,
            CollectionSummary.status == CollectionSummaryStatus.PENDING,
        )
    )

After (bug):
def _get_summaries_needing_reconciliation(self, session: Session):
    stmt = select(CollectionSummary).where(
        or_(
            CollectionSummary.version != CollectionSummary.observed_version,
            CollectionSummary.status != CollectionSummaryStatus.GENERATING,
        )
    )

Problem Analysis
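Because the two predicates are now joined with or_, a row is selected as soon as either one holds, regardless of the other. The condition status != GENERATING is TRUE for: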
PENDING → selected (intended)
COMPLETE → selected (unintended)
FAILED → selected (may or may not be intended)
This causes every COMPLETE summary to be selected on every reconciliation cycle.
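
The analysis above can be checked with a small truth table. This is a plain-Python sketch, not the project's code: the Status enum and the two predicate functions are hypothetical stand-ins that mirror the and_ and or_ conditions quoted in this issue.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical stand-in for CollectionSummaryStatus; only the four
    # status names mentioned in this issue are modelled.
    PENDING = "PENDING"
    GENERATING = "GENERATING"
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"


def selected_before(version, observed_version, status):
    """Mirror of the and_ condition: stale version AND status is PENDING."""
    return version != observed_version and status == Status.PENDING


def selected_after(version, observed_version, status):
    """Mirror of the or_ condition: stale version OR status is not GENERATING."""
    return version != observed_version or status != Status.GENERATING


def build_truth_table():
    """Return (status, version_mismatch, before, after) for every combination."""
    rows = []
    for status in Status:
        for version, observed in ((1, 1), (2, 1)):
            rows.append(
                (
                    status.name,
                    version != observed,
                    selected_before(version, observed, status),
                    selected_after(version, observed, status),
                )
            )
    return rows


if __name__ == "__main__":
    for name, mismatch, before, after in build_truth_table():
        print(f"{name:<10} mismatch={mismatch!s:<5} and_={before!s:<5} or_={after}")
```

Under the or_ form, the only combination that is not selected is a GENERATING row whose version already matches observed_version. Every other row, including an up-to-date COMPLETE one, is picked up on each cycle.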
Impact
Unnecessary Task Execution: Every completed summary triggers a Celery task every minute
Potential LLM API Calls: If version != observed_version, actual LLM API calls occur
Log Noise: Continuous "Collection summary task completed" messages
Resource Waste: CPU/memory overhead from repeated task scheduling
Suggested Fix
Option 1: Restore original logic (recommended)
stmt = select(CollectionSummary).where(
    and_(
        CollectionSummary.version != CollectionSummary.observed_version,
        CollectionSummary.status == CollectionSummaryStatus.PENDING,
    )
)
Option 2: Include FAILED retry
stmt = select(CollectionSummary).where(
    and_(
        CollectionSummary.version != CollectionSummary.observed_version,
        CollectionSummary.status.in_([
            CollectionSummaryStatus.PENDING,
            CollectionSummaryStatus.FAILED,
        ]),
    )
)
Option 3: Version mismatch only (minimal change)
stmt = select(CollectionSummary).where(
    CollectionSummary.version != CollectionSummary.observed_version
)
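
To compare the three options side by side, the sketch below runs equivalent WHERE clauses against an in-memory SQLite table using only the standard library. The table name, the label column, and the sample rows are assumptions made for illustration. The real model is a SQLAlchemy class, and its schema may differ.

```python
import sqlite3

# Hypothetical sample data: (label, version, observed_version, status).
SAMPLE_ROWS = [
    ("pending_stale", 2, 1, "PENDING"),
    ("failed_stale", 2, 1, "FAILED"),
    ("generating_stale", 2, 1, "GENERATING"),
    ("complete_stale", 2, 1, "COMPLETE"),
    ("complete_synced", 1, 1, "COMPLETE"),
]

# Raw-SQL equivalents of the conditions discussed in this issue.
WHERE_CLAUSES = {
    "current_or": "version != observed_version OR status != 'GENERATING'",
    "option_1": "version != observed_version AND status = 'PENDING'",
    "option_2": "version != observed_version AND status IN ('PENDING', 'FAILED')",
    "option_3": "version != observed_version",
}


def select_labels(where_clause):
    """Return the labels of the sample rows matched by the given WHERE clause."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE collection_summary ("
            "label TEXT, version INTEGER, observed_version INTEGER, status TEXT)"
        )
        conn.executemany(
            "INSERT INTO collection_summary VALUES (?, ?, ?, ?)", SAMPLE_ROWS
        )
        cursor = conn.execute(
            f"SELECT label FROM collection_summary WHERE {where_clause} ORDER BY rowid"
        )
        return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()


if __name__ == "__main__":
    for name, clause in WHERE_CLAUSES.items():
        print(f"{name}: {select_labels(clause)}")
```

On this sample data, all three options leave the up-to-date COMPLETE row alone. Option 3 also selects rows that are still GENERATING or already COMPLETE when their version is stale. Whether that is acceptable depends on how the reconciler treats summaries that are already in flight.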