Add pgvectorscale extension #1723

Closed
@arthikek wants to merge 20 commits

@arthikek arthikek commented Jul 27, 2025

Key Files Modified

nix/tests/sql/z_15_pgvectorscale.sql         # PostgreSQL 15 regression tests
nix/tests/sql/z_17_pgvectorscale.sql         # PostgreSQL 17 regression tests
nix/tests/expected/z_15_pgvectorscale.out    # Expected test outputs
nix/tests/expected/z_17_pgvectorscale.out
nix/tests/smoke/standard/                    # Universal smoke tests
nix/tests/smoke/no-oriol/                    # Non-OrioleDB smoke tests
nix/cargo-pgrx/buildPgrxExtension.nix        # Consolidated build function
nix/checks.nix                               # Enhanced test filtering

Testing Validation

The implementation includes comprehensive test coverage:

  • Extension installation across PostgreSQL 15 & 17

Build verification:

nix build .#checks.x86_64-linux.psql_15 -L
nix build .#checks.x86_64-linux.psql_17 -L
nix flake check

Additional context

This integration incorporates pgvectorscale from [timescale/pgvectorscale](https://github.com/timescale/pgvectorscale), providing a high-performance complement to pgvector for cost-efficient vector search on large workloads.

What kind of change does this PR introduce?

Feature addition: integration of the pgvectorscale extension, plus build system improvements.

What is the current behavior?

The Nix-based PostgreSQL builds currently include only pgvector for vector operations.

What is the new behavior?

Core Features Added

  • pgvectorscale extension included in PostgreSQL builds (15, 17)
  • StreamingDiskANN indexes with configurable parameters
  • Label-based filtering using efficient array overlap operators
  • Statistical Binary Quantization for storage optimization
  • Support for cosine, L2, and inner product distance functions
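
To make the query semantics behind the last few bullets concrete, here is a minimal brute-force sketch of a label-filtered nearest-neighbor search under the three distance functions. It is only a reference for what an approximate index is expected to return, not how StreamingDiskANN works internally. The row layout and all function names are hypothetical. The distance definitions follow pgvector's operators (`<->` for L2, `<=>` for cosine distance, `<#>` for negative inner product), and the label filter mirrors PostgreSQL's array overlap operator `&&`.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical row layout for illustration: (id, embedding, labels)
Row = Tuple[int, Vector, Sequence[int]]


def inner_product(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Vector, b: Vector) -> float:
    # Euclidean distance, as computed by pgvector's <-> operator.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    # 1 - cosine similarity, as computed by pgvector's <=> operator.
    # Raises ZeroDivisionError for a zero vector; this sketch does not guard it.
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norm


def negative_inner_product(a: Vector, b: Vector) -> float:
    # pgvector's <#> returns the negated inner product so that
    # "smaller is closer" holds for ascending ORDER BY.
    return -inner_product(a, b)


def labels_overlap(row_labels: Iterable[int], wanted: Iterable[int]) -> bool:
    # Same meaning as the SQL array overlap operator &&:
    # true when the two label sets share at least one element.
    return not set(row_labels).isdisjoint(wanted)


def filtered_knn(
    rows: Sequence[Row],
    query: Vector,
    wanted: Iterable[int],
    distance: Callable[[Vector, Vector], float],
    k: int,
) -> List[int]:
    # Exact search: filter by label overlap, then sort by distance.
    wanted_set = set(wanted)
    candidates = [row for row in rows if labels_overlap(row[2], wanted_set)]
    candidates.sort(key=lambda row: distance(row[1], query))
    return [row[0] for row in candidates[:k]]
```

Note that the ranking can differ between metrics: a longer vector pointing in the same direction as the query is farther under L2 but closer under inner product.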

Testing Infrastructure

  • Regression tests for PostgreSQL versions 15 and 17
  • Smoke tests with OrioleDB compatibility handling
  • Validation of index creation, querying, and custom parameters

Performance Characteristics

Based on the benchmark published in the pgvectorscale README (50M Cohere embeddings, 768 dimensions each), PostgreSQL with pgvector and pgvectorscale, compared to Pinecone's storage-optimized (s1) index, achieves:

  • 28x lower p95 latency
  • 16x higher query throughput for approximate nearest neighbor queries
  • 99% recall for both of the figures above
  • 75% lower cost when self-hosted on AWS EC2
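
For readers unfamiliar with the recall figure, the following is a minimal sketch of how recall at k is commonly computed for approximate nearest neighbor search: the fraction of the exact top-k results that the approximate index also returns. The PR does not describe the benchmark's exact methodology, so this is the usual definition rather than a description of that benchmark; the function name is hypothetical.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    # Compare the first k approximate results with the first k exact results.
    exact = set(exact_ids[:k])
    if not exact:
        raise ValueError("exact_ids must contain at least one result")
    approx = set(approx_ids[:k])
    # Fraction of the true nearest neighbors that the index recovered.
    return len(exact & approx) / len(exact)
```

For example, if an index returns ids `[1, 2, 3, 5]` while the exact top 4 are `[1, 2, 3, 4]`, recall at 4 is 0.75. Result order does not affect the value.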

Key Files Modified

  • Regression tests for PostgreSQL 15 and 17 with expected outputs
  • Smoke test organization with OrioleDB compatibility handling
  • Consolidated pgrx build system removing redundant wrapper files
  • Enhanced test filtering logic in checks.nix
  • Docker integration with pgvectorscale version documentation

Build System Refactoring

  • Eliminated redundant default.nix wrapper in cargo-pgrx module
  • Updated cargo-pgrx to fetch directly from GitHub with provided lock files

Additional context

Special thanks to TigerData for the performance benchmarking insights and to the pgvectorscale team at Timescale for developing this high-performance vector search solution.

Note: As someone new to both Nix and Rust ecosystems, I've approached this integration through systematic problem-solving and extensive testing. To better understand and work with the existing codebase, I also made some refactoring changes - removing debug code, consolidating functions, and simplifying where I could. These changes were primarily to help me understand the code better as I worked on the integration, not because I necessarily know the optimal structure.

I welcome feedback from the community on how to improve the implementation, particularly around:

  • Nix packaging best practices
  • Rust toolchain optimization
  • Test organization and coverage
  • Build system architecture
  • Whether my refactoring decisions were appropriate or should be reverted

@arthikek arthikek requested review from a team as code owners July 27, 2025 17:59
@soedirgo
Member

Thanks for the PR! Unfortunately we can't accept this contribution since we have no plans to support pgvectorscale. You can vote for the extension here and depending on demand we may add it in the future.

@soedirgo soedirgo closed this Jul 28, 2025