Feature: improve relationship builders for better async and reduced memory utilization #2077

Open
wants to merge 10 commits into
base: main
from

Conversation

ahgraber
Contributor

  • CosineSimilarityBuilder now uses a sharded/chunked similarity calculation to significantly reduce memory requirements
  • CosineSimilarityBuilder and JaccardSimilarityBuilder now leverage generate_execution_plan to support async iteration over tasks (for potential future multithreading or improved concurrency)
  • Added unit tests
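
To illustrate the chunked similarity idea from the first bullet, here is a minimal sketch. It is not the code from this PR: the function name `chunked_similar_pairs`, the `block_size` parameter, and the pure-Python arithmetic are all assumptions made for illustration (a real implementation would most likely use NumPy). The point is that only one small tile of scores exists at a time, rather than the full N x N similarity matrix.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def _normalize(vec: Vector) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def chunked_similar_pairs(
    embeddings: Sequence[Vector],
    threshold: float = 0.9,
    block_size: int = 2,  # hypothetical parameter name, not from the PR
) -> Iterator[Tuple[int, int, float]]:
    """Yield (i, j, similarity) for i < j with cosine similarity >= threshold.

    Scores are computed one block_size x block_size tile at a time, so the
    full N x N matrix is never held in memory.
    """
    unit = [_normalize(v) for v in embeddings]
    n = len(unit)
    for row_start in range(0, n, block_size):
        rows = unit[row_start:row_start + block_size]
        # Start at row_start: tiles below the diagonal mirror those above it.
        for col_start in range(row_start, n, block_size):
            cols = unit[col_start:col_start + block_size]
            # Dot products of unit vectors are cosine similarities.
            tile = [[sum(a * b for a, b in zip(r, c)) for c in cols] for r in rows]
            for di, scores in enumerate(tile):
                for dj, score in enumerate(scores):
                    i, j = row_start + di, col_start + dj
                    if i < j and score >= threshold:
                        yield i, j, score


vectors = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0], [1.0, 1.0]]
pairs = [(i, j) for i, j, _ in chunked_similar_pairs(vectors, threshold=0.99)]
print(pairs)
```

Peak memory for the scores is bounded by `block_size` squared regardless of how many embeddings there are, and the set of pairs returned does not depend on the block size chosen.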

dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) on Jun 13, 2025
ahgraber added 6 commits July 18, 2025 12:54
…methods

- Refactored the JaccardSimilarityBuilder to use async methods for finding similar embedding pairs.
- Introduced a new method, `generate_execution_plan`, that generates one coroutine per comparison, for better progress tracking and potential concurrency.
- Updated the `transform` method to utilize the new async functionality.
- Added comprehensive test coverage for the new features in the JaccardSimilarityBuilder.
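
The execution-plan pattern described in this commit can be sketched roughly as follows. Only the name `generate_execution_plan` comes from the PR; its signature here, the `compare` and `find_similar_pairs` helpers, and the one-coroutine-per-pair granularity are hypothetical and chosen purely for illustration.

```python
import asyncio
from typing import Coroutine, Iterator, List, Optional, Sequence, Set, Tuple

Pair = Tuple[int, int, float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: |A intersect B| / |A union B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


async def compare(
    i: int, j: int, a: Set[str], b: Set[str], threshold: float
) -> Optional[Pair]:
    """Hypothetical unit of work: score one pair, keep it if it passes."""
    await asyncio.sleep(0)  # hand control back to the event loop
    score = jaccard(a, b)
    return (i, j, score) if score >= threshold else None


def generate_execution_plan(
    items: Sequence[Set[str]], threshold: float
) -> Iterator[Coroutine]:
    """Yield one un-awaited coroutine per unordered pair (assumed signature)."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            yield compare(i, j, items[i], items[j], threshold)


async def find_similar_pairs(
    items: Sequence[Set[str]], threshold: float = 0.5
) -> List[Pair]:
    """Run the plan and keep only the pairs that met the threshold."""
    # gather returns results in the same order as the plan.
    results = await asyncio.gather(*generate_execution_plan(items, threshold))
    return [r for r in results if r is not None]


entity_sets = [{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y"}]
similar = asyncio.run(find_similar_pairs(entity_sets, threshold=0.5))
print(similar)
```

Separating plan generation from execution means the caller decides how the coroutines are run (sequentially, with `asyncio.gather`, or behind a progress bar) without changing the comparison logic.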
ahgraber force-pushed the feature/relationship-builder branch from c29fbeb to e765bae on July 18, 2025 16:54
ahgraber force-pushed the feature/relationship-builder branch from fb608d0 to d30e58d on July 18, 2025 17:21
- Improved logic for generating similar and dissimilar sets based on input constraints.
1 participant