fdelache commented Nov 3, 2025

Summary

Adds a new Parallelizable concern that enables maintenance tasks to process items in parallel using threads. This provides meaningful speedup for I/O-bound operations while maintaining all maintenance_tasks guarantees (pausable, resumable, cursor tracking).

Why this is needed

Many maintenance tasks involve I/O-bound operations (API calls, external services, file operations) where processing each item sequentially is inefficient. By processing items in parallel within batches, we can achieve significant speedup (near-linear with thread count) for these types of operations.

Implementation

The implementation is intentionally simple and leverages existing framework features:

Key Components

  1. ParallelExecutor (~50 lines): Handles thread creation, parallel execution, and error collection
  2. Parallelizable concern (~109 lines): Converts batches to arrays and delegates to executor
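
The PR summarizes the executor in ~50 lines; a minimal sketch of what such a component might look like (class and method names here are assumptions for illustration, not the PR's actual API):

```ruby
# Hypothetical sketch of a ParallelExecutor: spawns one thread per item,
# joins all threads, and re-raises the first collected error.
class ParallelExecutor
  def execute(items)
    errors = Queue.new # thread-safe error collection

    threads = items.map do |item|
      Thread.new do
        yield item
      rescue => e
        errors << e
      end
    end

    threads.each(&:join)       # wait for every item to finish
    raise errors.pop unless errors.empty? # surface the first failure
  end
end
```

Example call: `ParallelExecutor.new.execute(batch.to_a) { |user| user.update!(status: "processed") }`.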

Design Decisions

  • Users batch their own collections: instead of auto-converting collections, users explicitly call .in_batches(of: N), csv_collection(in_batches: N), or .each_slice(N)
  • Batch size = thread count: a direct, explicit relationship; no separate thread-count configuration is needed
  • Leverages existing features: no duplication of the batching logic that already exists in the framework
  • Simple conversion: each batch is converted to an array, and one thread is spawned per item
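
The "simple conversion" might look roughly like the following (a hedged sketch; the real concern also routes errors through the executor and integrates with the Task lifecycle, which is omitted here):

```ruby
# Hypothetical sketch of the Parallelizable concern: the framework calls
# process(batch) once per batch; the concern fans each element out to
# process_item on its own thread.
module Parallelizable
  def process(batch)
    items = batch.respond_to?(:to_a) ? batch.to_a : Array(batch)
    threads = items.map do |item|
      Thread.new { process_item(item) }
    end
    threads.each(&:join) # the cursor advances only after the whole batch completes
  end
end
```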

Usage

```ruby
class Maintenance::UpdateUsersTask < MaintenanceTasks::Task
  include MaintenanceTasks::Concerns::Parallelizable

  def collection
    User.where(status: 'pending').in_batches(of: 10)
  end

  def process_item(user)
    # Called in parallel: up to 10 concurrent threads per batch
    user.update!(status: 'processed')
  end
end
```

Important Requirements

  • Idempotent operations: the cursor tracks batches, not items; if a run is interrupted mid-batch, every item in that batch will be reprocessed
  • Thread-safe operations: process_item must not rely on shared mutable state between items
  • Connection pool sizing: the database connection pool must be >= the batch size
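
Because each item in a batch runs on its own thread and may check out its own database connection, the pool must cover at least one connection per thread. A sketch of the corresponding configuration, assuming a batch size of 10 (the values shown are illustrative, not from the PR):

```yaml
# config/database.yml (sketch): pool >= batch size, since each
# in-flight thread may hold its own connection.
production:
  pool: <%= ENV.fetch("RAILS_MAX_THREADS", 10) %>
```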

Test plan

  • All new tests pass (parallelizable_test.rb, parallel_executor_test.rb)
  • Tests cover AR batches, CSV batches, arrays, error handling, thread safety
  • Verified batch size directly controls thread count
  • Confirmed existing maintenance_tasks tests still pass

🤖 Generated with Claude Code
