Optimize extract_core_parallel to reduce memory usage through streaming BC calculations and worker pool management #59

@jibarozzo

Description

Problem Statement

The current extract_core_parallel function has significant memory issues that cause OOM (Out of Memory) kills on HPC systems:

  1. Each worker loads a full copy of the dataset into memory
  2. Bray-Curtis (BC) calculations create large intermediate matrices
  3. Total memory usage therefore scales linearly with the number of workers, since each worker holds its own copy

These issues prevent successful execution on large datasets, even with substantial memory allocation (e.g., 256 GB RAM with 32 cores).

Proposed Solutions

Streaming/Incremental BC Calculations

Current Memory-Heavy Approach:

# Calculate BC for each OTU addition by rebuilding the entire matrix
current_matrix <- rbind(start_matrix, t(otu[otu_ranked$otu[i], ]))  # matrix grows every iteration
current_bc <- calculate_bc(current_matrix, nReads)  # full recalculation from scratch

Proposed Incremental Approach:

  • Maintain running sums of the BC numerator components instead of storing growing matrices
  • For each new OTU, add only its incremental contribution to the existing totals
  • Calculate BC from the running totals as numerator_sum / (2 * nReads); the denominator stays constant because every sample sums to nReads reads (see the sketch below)
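
A minimal sketch of this idea, not the repository's implementation. It assumes `otu` is a numeric matrix (OTUs × samples), every sample sums to nReads, and `otu_ranked$otu` holds row identifiers in ranked order:

bc_incremental <- function(otu, otu_ranked, nReads) {
  pair_idx <- utils::combn(ncol(otu), 2)   # 2 x n_pairs matrix of sample-pair indices
  num_sum <- numeric(ncol(pair_idx))       # running sum(|x - y|) per pair
  bc_per_step <- numeric(nrow(otu_ranked))
  for (i in seq_len(nrow(otu_ranked))) {
    counts <- otu[otu_ranked$otu[i], ]     # this OTU's counts across samples
    # Add only this OTU's incremental contribution to each pair's numerator;
    # memory stays O(#pairs) no matter how many OTUs have been added
    num_sum <- num_sum +
      abs(counts[pair_idx[1, ]] - counts[pair_idx[2, ]])
    bc_per_step[i] <- mean(num_sum / (2 * nReads))  # BC from running totals
  }
  bc_per_step
}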

Benefits:

  • Memory per OTU addition stays constant instead of growing with the accumulated matrix
  • No matrix accumulation or redundant full recalculations
  • Preserves the mathematical accuracy of sequential OTU ranking

Worker Pool Management

Current Uncontrolled Parallelism:

parallel_results <- parallel::mclapply(
    2:nrow(otu_ranked),  # Could be thousands of OTUs
    bc_rank_task,
    mc.cores = ncores    # Could be 31+ cores = 31+ simultaneous workers
)

Proposed Controlled Approach:

  • Add a max_workers parameter to cap concurrent workers regardless of available cores
  • Use min(max_workers, ncores) to keep memory from scaling past the cap (see the sketch below)
  • Implement memory-aware worker scaling
  • Add periodic garbage collection during processing
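
A sketch of the capping logic under these proposals; the wrapper name run_bc_ranking and the GC interval of 500 iterations are illustrative assumptions, not existing code:

run_bc_ranking <- function(otu_ranked, bc_rank_task, ncores, max_workers = 8L) {
  n_workers <- min(max_workers, ncores)       # never exceed the configured cap
  parallel::mclapply(
    2:nrow(otu_ranked),
    function(i) {
      result <- bc_rank_task(i)
      if (i %% 500 == 0) gc(verbose = FALSE)  # periodic GC inside the worker
      result
    },
    mc.cores = n_workers
  )
}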

Benefits:

  • Controlled memory usage through limited active workers
  • Prevents OOM conditions while maintaining parallelization benefits
  • Scalable across different system configurations

Implementation Requirements

Sequential Dependency Preservation

  • Critical: The OTU ranking algorithm has sequential dependencies; each OTU's contribution depends on all previously added OTUs (made explicit below)
  • No chunking/batching: OTUs cannot be processed independently, as this breaks the ranking mathematics
  • Solution: Maintain sequential OTU addition while optimizing the BC calculations and worker management
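
To make the dependency explicit: for a single sample pair with counts x_j, y_j for the j-th ranked OTU, the BC value after the k-th addition is

    BC_k = ( |x_1 - y_1| + ... + |x_k - y_k| ) / (2 * nReads)

Every term j <= k appears in BC_k, so the k-th value cannot be computed without all earlier additions; only the per-step bookkeeping, not the addition order, can be optimized.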

Integration Points

  • Modify extract_core_parallel() in R/functions/extract_core_parallel.R
  • Maintain compatibility with existing calculate_bc() function signature
  • Preserve identical output format and mathematical results
  • Add new parameters: max_workers and optional memory monitoring (sketched below)
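
One way the optional memory monitoring could look, using base R's gc() statistics; the function name check_memory and the threshold handling are illustrative assumptions:

check_memory <- function(limit_gb) {
  # Sum the "(Mb)" used column over Ncells and Vcells, convert to GB
  used_gb <- sum(gc(verbose = FALSE)[, 2]) / 1024
  if (used_gb > limit_gb)
    warning(sprintf("R heap at %.1f GB exceeds %.1f GB limit", used_gb, limit_gb))
  invisible(used_gb)
}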

Expected Outcomes

  • Successful execution on large datasets without OOM kills
  • Reduced memory footprint for BC calculations
  • Maintained mathematical accuracy and algorithm correctness
  • Improved scalability across different hardware configurations

Testing Requirements

  • Verify identical results against the original implementation (see the test sketch below)
  • Test with various dataset sizes and worker configurations
  • Validate memory usage reduction through profiling
  • Ensure compatibility with existing downstream analyses
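
A sketch of the equivalence check using testthat; the reference function name, argument list, and fixtures (otu, otu_ranked, nReads) are placeholders for whatever the current implementation exposes:

testthat::test_that("optimized version reproduces original BC ranking", {
  reference <- extract_core_parallel_original(otu, otu_ranked, nReads)  # placeholder name
  optimized <- extract_core_parallel(otu, otu_ranked, nReads, max_workers = 4)
  testthat::expect_equal(optimized, reference, tolerance = 1e-12)
})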

This optimization is essential for processing the Inter-BRC core microbiome datasets that currently fail due to memory constraints.


Labels

Normal Priority · bug · enhancement
