@konard konard commented Sep 11, 2025

Summary

Implements parallel search using rayon and the bump-herd pattern, as requested in issue #9.

Key Features

  • Parallel Tree Traversal: Added par_each_usages method to LinksTree trait
  • Work-Stealing with rayon::join: Uses rayon::join to parallelize left/right subtree traversal
  • Bump Allocator Integration: Leverages bumpalo for efficient memory management during parallel operations
  • Parallelization Heuristic: Chooses between parallel and sequential execution based on subtree size
  • API Compatibility: Maintains full backward compatibility with existing sequential APIs
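
One way to add a parallel method without breaking existing implementors is a default trait method that falls back to the sequential one. The sketch below assumes a simplified trait: `UsageTree`, `VecTree`, and both method signatures are hypothetical and do not mirror the real `LinksTree` definition.

```rust
/// Hypothetical, simplified stand-in for the `LinksTree` trait.
pub trait UsageTree {
    /// Existing sequential API: calls `handler` once per usage.
    fn each_usages(&self, handler: &mut dyn FnMut(u64));

    /// New method with a default body, so existing implementors keep compiling.
    /// The default simply forwards to the sequential traversal.
    fn par_each_usages(&self, handler: &(dyn Fn(u64) + Sync)) {
        self.each_usages(&mut |link: u64| handler(link));
    }
}

/// Toy implementor that only provides the sequential method.
pub struct VecTree(pub Vec<u64>);

impl UsageTree for VecTree {
    fn each_usages(&self, handler: &mut dyn FnMut(u64)) {
        for &link in &self.0 {
            handler(link);
        }
    }
}
```

An implementor that has a real parallel traversal would override `par_each_usages`; everything else inherits the sequential fallback unchanged.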

Implementation Details

  1. Trait Enhancement: Extended LinksTree trait with parallel methods
  2. Tree Implementation: Added parallel versions of each_usages_core in both sources and targets trees
  3. Memory Optimization: Uses bump allocators to minimize allocation overhead during parallel collection
  4. Test Coverage: Comprehensive tests ensuring parallel and sequential results are identical
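
The idea behind per-worker allocation during parallel collection can be sketched with the standard library alone. This is not the PR's code: plain `Vec` buffers stand in for bumpalo arenas, `std::thread::scope` stands in for rayon, and `collect_matches` and `SEQ_CUTOFF` are hypothetical names. Each branch fills its own buffer, so no lock or shared allocator is touched until the merge.

```rust
use std::thread;

/// Assumed cutoff below which splitting is not worth a thread.
const SEQ_CUTOFF: usize = 4096;

/// Collect every item accepted by `pred`, splitting large inputs in two.
/// Each branch owns its output buffer; buffers are merged after both finish.
pub fn collect_matches<P>(items: &[u64], pred: &P) -> Vec<u64>
where
    P: Fn(u64) -> bool + Sync,
{
    if items.len() <= SEQ_CUTOFF {
        // Small input: one sequential pass into one local buffer.
        return items.iter().copied().filter(|&v| pred(v)).collect();
    }
    let (left, right) = items.split_at(items.len() / 2);
    let (mut left_out, right_out) = thread::scope(|s| {
        // Right half runs on a scoped thread, left half on the current one.
        let handle = s.spawn(|| collect_matches(right, pred));
        let left_out = collect_matches(left, pred);
        (left_out, handle.join().expect("worker panicked"))
    });
    // Merge keeps the original left-to-right order.
    left_out.extend(right_out);
    left_out
}
```

With an arena per worker instead of a `Vec` per branch, the merge step would copy or reference arena-owned slices, but the ownership structure is the same.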

Performance Benefits

The implementation follows the exact pattern suggested in the issue:

fn par_each_impl(..., root: T) {
    if magic_appropriate_to_join(...) {
        rayon::join(
            || par_each_impl(..., left(root)),
            || par_each_impl(..., right(root)),
        )
    } else {
        seq_each_impl(..., root)
    }
}

The goal is faster traversal of large datasets: subtree work is spread across the available CPU cores, while small subtrees stay on the sequential path and synchronization keeps the results identical to a sequential run.
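
A runnable version of the pattern above, as a minimal sketch under stated assumptions. `Node`, `build`, `seq_each`, `par_each`, and `PAR_THRESHOLD` are hypothetical, and the local `join` is a standard-library stand-in with the same shape as `rayon::join` (it spawns a scoped OS thread rather than using a work-stealing pool). The subtree-size check plays the role of `magic_appropriate_to_join`.

```rust
use std::thread;

/// Hypothetical boxed tree node; `size` is the node count of its subtree.
pub struct Node {
    pub value: u64,
    pub size: usize,
    pub left: Option<Box<Node>>,
    pub right: Option<Box<Node>>,
}

/// Assumed cutoff: subtrees smaller than this are walked sequentially.
const PAR_THRESHOLD: usize = 1024;

/// Stand-in for `rayon::join`: runs both closures, returns both results.
fn join<A, B, RA, RB>(a: A, b: B) -> (RA, RB)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB + Send,
    RA: Send,
    RB: Send,
{
    thread::scope(|s| {
        let handle = s.spawn(b);
        let ra = a();
        (ra, handle.join().expect("right task panicked"))
    })
}

/// Build a balanced tree holding the values in `lo..hi`.
pub fn build(lo: u64, hi: u64) -> Option<Box<Node>> {
    if lo >= hi {
        return None;
    }
    let mid = lo + (hi - lo) / 2;
    Some(Box::new(Node {
        value: mid,
        size: (hi - lo) as usize,
        left: build(lo, mid),
        right: build(mid + 1, hi),
    }))
}

/// Sequential in-order visit.
pub fn seq_each<F: Fn(u64)>(node: &Option<Box<Node>>, f: &F) {
    if let Some(n) = node {
        seq_each(&n.left, f);
        f(n.value);
        seq_each(&n.right, f);
    }
}

/// Parallel visit: fork on large subtrees, fall back to `seq_each` otherwise.
/// Visiting order is not guaranteed once a fork happens.
pub fn par_each<F: Fn(u64) + Sync>(node: &Option<Box<Node>>, f: &F) {
    match node {
        Some(n) if n.size >= PAR_THRESHOLD => {
            f(n.value);
            join(|| par_each(&n.left, f), || par_each(&n.right, f));
        }
        _ => seq_each(node, f),
    }
}
```

With rayon available, the local `join` would be replaced by `rayon::join`, which has the same closure-pair signature and schedules both halves on its thread pool.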

Test plan

  • Add parallel iteration tests for both unit and split stores
  • Verify parallel and sequential results are identical
  • Test various query patterns (source, target, all)
  • Performance test with larger datasets
  • Ensure compilation with and without rayon feature
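
Because a parallel traversal may visit links in a different order on each run, a test comparing the two paths should compare result sets rather than sequences. A minimal sketch of such a check follows; `visit_seq`, `visit_par`, `collect_both`, and `same_results` are hypothetical helpers, not functions from the PR, and the parallel visitor uses `std::thread::scope` as a stand-in for rayon.

```rust
use std::sync::Mutex;
use std::thread;

/// Order-insensitive comparison of two result sets.
pub fn same_results(mut a: Vec<u64>, mut b: Vec<u64>) -> bool {
    a.sort_unstable();
    b.sort_unstable();
    a == b
}

/// Hypothetical sequential visitor.
pub fn visit_seq(items: &[u64], handler: &mut dyn FnMut(u64)) {
    for &v in items {
        handler(v);
    }
}

/// Hypothetical parallel visitor: two halves on two scoped threads.
pub fn visit_par(items: &[u64], handler: &(dyn Fn(u64) + Sync)) {
    let (left, right) = items.split_at(items.len() / 2);
    thread::scope(|s| {
        s.spawn(|| left.iter().for_each(|&v| handler(v)));
        s.spawn(|| right.iter().for_each(|&v| handler(v)));
    });
}

/// Run both visitors over the same input and return what each one saw.
pub fn collect_both(items: &[u64]) -> (Vec<u64>, Vec<u64>) {
    let mut seq = Vec::new();
    visit_seq(items, &mut |v: u64| seq.push(v));

    // The parallel handler is shared across threads, so it pushes through a Mutex.
    let par = Mutex::new(Vec::new());
    visit_par(items, &|v: u64| par.lock().unwrap().push(v));

    (seq, par.into_inner().unwrap())
}
```

A test would then assert `same_results(seq, par)` for each query pattern and store type.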

The implementation is feature-gated behind the existing rayon feature flag, so the parallel code paths add no overhead to builds that leave the feature off.
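
Feature gating of this kind is usually done with `#[cfg(feature = "...")]` on items. The sketch below assumes a Cargo feature named `rayon` that enables the optional `rayon` dependency; the function names, signatures, and `CUTOFF` value are hypothetical. Without the feature, the parallel function is not compiled at all and the dispatcher resolves to the sequential path.

```rust
/// Sequential traversal, always compiled.
pub fn each_usages(items: &[u64], handler: &(dyn Fn(u64) + Sync)) {
    for &v in items {
        handler(v);
    }
}

/// Parallel traversal, compiled only when the `rayon` feature is enabled.
#[cfg(feature = "rayon")]
pub fn par_each_usages(items: &[u64], handler: &(dyn Fn(u64) + Sync)) {
    const CUTOFF: usize = 1024; // assumed sequential cutoff
    if items.len() <= CUTOFF {
        return each_usages(items, handler);
    }
    let (left, right) = items.split_at(items.len() / 2);
    rayon::join(
        || par_each_usages(left, handler),
        || par_each_usages(right, handler),
    );
}

/// Dispatcher when the feature is on: use the parallel path.
#[cfg(feature = "rayon")]
pub fn each_usages_auto(items: &[u64], handler: &(dyn Fn(u64) + Sync)) {
    par_each_usages(items, handler)
}

/// Dispatcher when the feature is off: plain sequential traversal.
#[cfg(not(feature = "rayon"))]
pub fn each_usages_auto(items: &[u64], handler: &(dyn Fn(u64) + Sync)) {
    each_usages(items, handler)
}
```

Building once with `--features rayon` and once without covers the "with and without rayon feature" item in the test plan.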

🤖 Generated with Claude Code


Resolves #9

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: #9
@konard konard self-assigned this Sep 11, 2025
konard and others added 2 commits September 11, 2025 13:35
This implementation adds crazy fast parallel search capabilities to the doublets-rs library by:

- Adding `par_each_usages` method to the `LinksTree` trait for parallel tree traversal
- Implementing parallel versions of `each_usages_core` functions in both sources and targets trees
- Using `rayon::join` to parallelize left/right subtree traversal when both subtrees exist
- Leveraging `bumpalo` allocators for efficient memory management during parallel operations
- Adding comprehensive tests to ensure parallel and sequential results are consistent

The implementation follows the bump-herd pattern suggested in issue #9, using rayon::join
to divide recursive tree traversal work at each level, with intelligent heuristics to
determine when parallelization is beneficial.

Key features:
- Uses rayon::join for work-stealing parallel execution
- Bump allocators minimize allocation overhead
- Falls back to sequential execution for small subtrees
- Maintains API compatibility with existing code
- Comprehensive test coverage ensuring correctness

Fixes #9

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@konard konard changed the title from "[WIP] Use rayon and bump-herd to crazy fast parallel search" to "Implement parallel search using rayon and bump-herd pattern" Sep 11, 2025
@konard konard marked this pull request as ready for review September 11, 2025 10:42

Development

Successfully merging this pull request may close these issues.

Use rayon and bump-herd to crazy fast parallel search
