Problem Statement
The current extract_core_parallel function has significant memory issues that cause OOM (Out of Memory) kills on HPC systems:
- Each worker loads the entire dataset into memory
- Bray-Curtis calculations create large intermediate matrices
- Memory usage scales multiplicatively with the number of workers
These issues prevent successful execution on large datasets, even with substantial memory allocation (e.g., 256GB RAM with 32 cores).
Proposed Solutions
Streaming/Incremental BC Calculations
Current Memory-Heavy Approach:
```r
# Calculate BC for each OTU addition by rebuilding the entire matrix
current_matrix <- rbind(start_matrix, t(otu[otu_ranked$otu[i], ]))
current_bc <- calculate_bc(current_matrix, nReads)  # Full recalculation
```
Proposed Incremental Approach:
- Maintain running sums of BC components instead of storing growing matrices
- For each new OTU, add only its incremental contribution to the existing totals
- Calculate BC from the running totals: `numerator_sum / (2 * nReads)` (see the sketch after this list)
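A minimal sketch of the incremental update, assuming `otu` is a numeric matrix (OTUs x samples) rarefied so each sample sums to `nReads`, with `otu_ranked` and `nReads` as in the snippet above; the names `bc_numerators` and `add_otu_increment()` are illustrative, not from the codebase:

```r
# Pairwise Bray-Curtis on rarefied samples: BC(j, k) = sum_i |x_ij - x_ik| / (2 * nReads),
# so only a running numerator per sample pair needs to be stored.
n_samples     <- ncol(otu)
pair_idx      <- combn(n_samples, 2)       # all sample pairs (2 x nPairs)
bc_numerators <- numeric(ncol(pair_idx))   # running |difference| sums per pair

add_otu_increment <- function(bc_numerators, otu_counts, pair_idx) {
  # otu_counts: abundance vector of the newly added OTU across samples
  bc_numerators + abs(otu_counts[pair_idx[1, ]] - otu_counts[pair_idx[2, ]])
}

for (i in seq_len(nrow(otu_ranked))) {
  counts        <- as.numeric(otu[otu_ranked$otu[i], ])
  bc_numerators <- add_otu_increment(bc_numerators, counts, pair_idx)
  current_bc    <- bc_numerators / (2 * nReads)   # BC values after adding OTU i
}
```

Memory stays constant per OTU addition: only the vector of pair numerators is kept, never the growing abundance matrix.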
Benefits:
- Constant memory per OTU addition regardless of dataset size
- No matrix accumulation or redundant calculations
- Maintains mathematical accuracy of sequential OTU ranking
Worker Pool Management
Current Uncontrolled Parallelism:
```r
parallel_results <- parallel::mclapply(
  2:nrow(otu_ranked),   # could be thousands of OTUs
  bc_rank_task,
  mc.cores = ncores     # could be 31+ cores = 31+ simultaneous workers
)
```
Proposed Controlled Approach:
- Add a `max_workers` parameter to limit concurrent workers regardless of available cores
- Use `min(max_workers, ncores)` to prevent memory explosion
- Implement memory-aware worker scaling
- Add periodic garbage collection during processing (see the sketch after this list)
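A sketch of the capped worker pool, reusing `bc_rank_task` and `otu_ranked` from the snippet above; the wrapper name, default cap, and `gc()` interval are illustrative choices, not settled design:

```r
run_bc_ranking <- function(otu_ranked, ncores, max_workers = 8) {
  # Cap concurrency: never launch more workers than max_workers,
  # no matter how many cores the node reports.
  n_workers <- min(max_workers, ncores)

  parallel::mclapply(
    2:nrow(otu_ranked),
    function(i) {
      result <- bc_rank_task(i)
      if (i %% 100 == 0) gc()   # periodic garbage collection inside the worker
      result
    },
    mc.cores = n_workers
  )
}
```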
Benefits:
- Controlled memory usage through limited active workers
- Prevents OOM conditions while maintaining parallelization benefits
- Scalable across different system configurations
Implementation Requirements
Sequential Dependency Preservation
- Critical: The OTU ranking algorithm has sequential dependencies where each OTU's contribution depends on all previously added OTUs
- No chunking/batching: Cannot process OTUs independently as this breaks the ranking mathematics
- Solution: Maintain sequential OTU addition while optimizing BC calculations and worker management
Integration Points
- Modify `extract_core_parallel()` in `R/functions/extract_core_parallel.R`
- Maintain compatibility with the existing `calculate_bc()` function signature
- Preserve identical output format and mathematical results
- Add new parameters: `max_workers`, plus optional memory monitoring (a possible signature is sketched below)
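One possible shape for the updated signature; the existing parameters shown here (`otu`, `nReads`, `ncores`) are inferred from the snippets above, and `monitor_memory` is a hypothetical name for the optional monitoring flag:

```r
extract_core_parallel <- function(otu, nReads,
                                  ncores = parallel::detectCores() - 1L,
                                  max_workers = 8,
                                  monitor_memory = FALSE) {
  n_workers <- min(max_workers, ncores)
  if (monitor_memory) message("active workers: ", n_workers)
  # ... unchanged sequential ranking logic, with incremental BC updates
  # and the capped worker pool described above ...
}
```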
Expected Outcomes
- Successful execution on large datasets without OOM kills
- Reduced memory footprint for BC calculations
- Maintained mathematical accuracy and algorithm correctness
- Improved scalability across different hardware configurations
Testing Requirements
- Verify identical results compared to the original implementation (e.g., via the equivalence test sketched after this list)
- Test with various dataset sizes and worker configurations
- Validate memory usage reduction through profiling
- Ensure compatibility with existing downstream analyses
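For instance, an equivalence test along these lines, assuming a small fixture `otu_small` and that the original implementation remains available for comparison (here under the hypothetical name `extract_core_serial()`):

```r
library(testthat)

test_that("optimized parallel implementation matches the original", {
  expected <- extract_core_serial(otu_small, nReads = 1000)
  actual   <- extract_core_parallel(otu_small, nReads = 1000,
                                    ncores = 4, max_workers = 2)
  expect_equal(actual, expected)   # identical ranking and BC values
})
```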
This optimization is essential for processing the Inter-BRC core microbiome datasets that currently fail due to memory constraints.