chore: plans for async generators and task-queue dataset builder #347

Open
andreatgretel wants to merge 1 commit into main from andreatgretel/chore/async-generators-plan

@andreatgretel
Contributor

  • Adds plan for transforming generators to async-first and replacing the sequential column-wise builder with a dependency-aware task queue.
  • Plan only — implementation will follow in subsequent PRs.

Part of #346

@andreatgretel requested a review from a team as a code owner February 20, 2026 20:26
@greptile-apps
Contributor

greptile-apps bot commented Feb 20, 2026

Greptile Summary

This PR adds a comprehensive planning document for transforming DataDesigner's dataset builder from sequential column-by-column processing to an async task queue with dependency-aware scheduling. The plan outlines 8 implementation steps covering dependency mapping, completion tracking, async task scheduling, generator migration, buffer management, builder integration, and testing.
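To make the dependency-mapping idea concrete, here is a minimal sketch of how columns could be grouped into waves that are safe to run in parallel. It is not taken from the plan document: `ColumnConfig`, `ready_columns`, and `execution_waves` are hypothetical names, and the only detail borrowed from the PR is that each column exposes a `required_columns` property.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnConfig:
    """Hypothetical stand-in for a column config; only `required_columns` is named in the PR."""

    name: str
    required_columns: list[str] = field(default_factory=list)


def ready_columns(configs: list[ColumnConfig], completed: set[str]) -> list[str]:
    """Return not-yet-completed columns whose dependencies have all completed."""
    return [
        c.name
        for c in configs
        if c.name not in completed
        and all(dep in completed for dep in c.required_columns)
    ]


def execution_waves(configs: list[ColumnConfig]) -> list[list[str]]:
    """Group columns into waves; columns within a wave do not depend on each other."""
    completed: set[str] = set()
    waves: list[list[str]] = []
    while len(completed) < len(configs):
        wave = ready_columns(configs, completed)
        if not wave:
            # Nothing is ready but work remains: a cycle or a missing column.
            raise ValueError("cyclic or unsatisfiable column dependencies")
        waves.append(wave)
        completed.update(wave)
    return waves
```

A sequential builder would process these columns one at a time; the wave view shows which ones a dependency-aware scheduler could start together.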

Key Changes:

  • Documents architectural shift from sequential to parallel column processing
  • Proposes dynamic dependency resolution using existing required_columns property
  • Details task granularity for different generator types (from-scratch, cell-by-cell, full-column)
  • Plans concurrency control via asyncio.Semaphore
  • Includes comprehensive risk analysis (memory, record dropping, full-column ordering, pre-batch processors)
  • Maintains backward compatibility with sync path via DATA_DESIGNER_ASYNC_ENGINE flag
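The scheduling and concurrency points above can be sketched roughly as follows. This is an illustration under stated assumptions, not the plan's implementation: `run_columns`, `async_engine_enabled`, the `deps` mapping, and the accepted flag values are all hypothetical. Only `asyncio.Semaphore` and the `DATA_DESIGNER_ASYNC_ENGINE` name come from the PR.

```python
import asyncio
import os
from collections.abc import Awaitable, Callable


def async_engine_enabled() -> bool:
    # Assumption: the flag is an environment variable and "1"/"true" turn it on.
    return os.environ.get("DATA_DESIGNER_ASYNC_ENGINE", "").lower() in {"1", "true"}


async def run_columns(
    deps: dict[str, list[str]],
    generate: Callable[[str], Awaitable[None]],
    max_concurrency: int = 4,
) -> list[str]:
    """Run one task per column, starting each once its dependencies have finished.

    `deps` maps a column name to the columns it requires. The sketch assumes the
    graph is acyclic and every dependency is a key of `deps`; it has no error
    handling or cancellation logic.
    """
    semaphore = asyncio.Semaphore(max_concurrency)  # caps concurrent generate() calls
    done = {name: asyncio.Event() for name in deps}
    finished: list[str] = []

    async def worker(name: str) -> None:
        # Wait for upstream columns without holding a semaphore slot.
        for dep in deps[name]:
            await done[dep].wait()
        async with semaphore:
            await generate(name)
        finished.append(name)
        done[name].set()

    await asyncio.gather(*(worker(name) for name in deps))
    return finished
```

Waiting on dependencies before acquiring the semaphore keeps blocked tasks from occupying slots that runnable tasks could use. A caller could branch on `async_engine_enabled()` to choose between this path and the existing sync path.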

Observations:

  • Plan is thorough, well-structured, and technically sound
  • Leverages existing config infrastructure (required_columns) intelligently
  • Properly considers edge cases and risks
  • Clear delineation of what's in/out of scope

Confidence Score: 5/5

  • Safe to merge - plan-only PR with no code changes
  • This is a documentation-only PR adding a comprehensive technical plan. No code is modified, so there are no runtime risks. The plan itself is technically sound, well-researched, and demonstrates thorough understanding of the existing architecture.
  • No files require special attention

Important Files Changed

Filename | Overview
plans/346/async-generators-and-task-queue.md | New plan document detailing async generator transformation and task-queue architecture - comprehensive and well-structured with clear implementation steps

Last reviewed commit: f42bd6a
