@HabebNawatha

What does this PR do?

This PR improves vector database registration by making provider_id a mandatory field, aligning with best practices across the codebase and ensuring consistent, explicit behavior.

Previously, if multiple vector_io providers were registered and no provider_id was provided, the system would arbitrarily pick the first provider or raise a less-clear error. This change eliminates ambiguity and ensures users always specify which provider to use.
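
A minimal sketch of the intended behavior, assuming a simplified stand-in for the registration logic. The class `VectorDBRegistry` and the provider names `faiss` and `chroma` are hypothetical and only illustrate the idea; this is not the project's actual code.

```python
# Hypothetical stand-in for the registration logic; all names are illustrative.
class VectorDBRegistry:
    def __init__(self, providers: dict[str, object]) -> None:
        self.providers = providers  # provider_id -> provider implementation
        self.vector_dbs: dict[str, str] = {}  # vector_db_id -> provider_id

    def register_vector_db(self, vector_db_id: str, provider_id: str) -> str:
        # provider_id has no default, so omitting it fails at call time
        # instead of silently falling back to the first configured provider.
        if provider_id not in self.providers:
            raise ValueError(
                f"Provider '{provider_id}' not found. "
                f"Available providers: {sorted(self.providers)}"
            )
        self.vector_dbs[vector_db_id] = provider_id
        return provider_id


registry = VectorDBRegistry({"faiss": object(), "chroma": object()})
registry.register_vector_db("docs", provider_id="faiss")
```

With two providers configured, the caller has to name one explicitly; there is no code path left that picks a provider on the caller's behalf.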

Closes #2834

Test Plan

  • Modified the register_vector_db method signature to require provider_id explicitly.
  • Verified that the API now returns a validation error when provider_id is missing.
  • Confirmed behavior with multiple and single provider setups to ensure proper enforcement.
[Screenshots: 2025-08-14 at 17:07:42 and 2025-08-14 at 17:08:35]

@meta-cla meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) Aug 14, 2025
@HabebNawatha HabebNawatha changed the title from "Make provider_id Mandatory in Vector DB Registration to Avoid Ambiguity" to "refactor: make provider_id mandatory in vector DB registration to avoid ambiguity" Aug 14, 2025
@leseb leseb force-pushed the fix/vector-db-mandatory-provider-id branch from 121a858 to 820ec43 September 1, 2025 12:42
@leseb leseb requested a review from yanxi0830 as a code owner September 1, 2025 12:42
Collaborator

@leseb leseb left a comment

@HabebNawatha see unit test failures, thanks

@HabebNawatha HabebNawatha requested a review from leseb September 11, 2025 09:04
        )
    else:
        raise ValueError("No provider available. Please configure a vector_io provider.")
provider_vector_db_id = provider_vector_db_id or vector_db_id
Contributor

What is provider_vector_db_id and why does it take precedence here?

Author

The fallback provider_vector_db_id = provider_vector_db_id or vector_db_id is no longer needed
since the merged logic now uses provider_vector_db_id or vector_store_id downstream. Updated.

@ehhuang
Contributor

ehhuang commented Sep 12, 2025

@franciscojavierarceo this is a breaking change? do you have a preferred strategy for rolling this out?

@franciscojavierarceo franciscojavierarceo changed the title from "refactor: make provider_id mandatory in vector DB registration to avoid ambiguity" to "refactor!: make provider_id mandatory in vector DB registration to avoid ambiguity BREAKING CHANGE" Sep 12, 2025
    provider_vector_db_id: str | None = None,
    vector_db_name: str | None = None,
) -> VectorDB:
    if provider_id is None:
Collaborator

Typically I roll out a log message first to note that the change is breaking, but I suppose it's too late for that in this change.

@franciscojavierarceo
Collaborator

> @franciscojavierarceo this is a breaking change? do you have a preferred strategy for rolling this out?

I commented on the PR, but yeah, typically I stagger releases by introducing logging first and then the break in the next release.

This one is pretty easy though so just a call out in the release notes is probably fine.
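
A rough sketch of that staged approach, assuming a hypothetical helper `resolve_provider_id` with an `enforce` flag (neither exists in the codebase; both are for illustration only). With `enforce=False` it models the release that only warns and keeps the old first-provider fallback; with `enforce=True` it models the following release that breaks.

```python
from __future__ import annotations

import warnings


def resolve_provider_id(
    provider_id: str | None, available: list[str], *, enforce: bool
) -> str:
    """Hypothetical helper showing a warn-first, break-later rollout."""
    if provider_id is not None:
        return provider_id
    if enforce:
        # Breaking release: a missing provider_id is an error.
        raise ValueError("provider_id is required when registering a vector DB.")
    if not available:
        raise ValueError("No provider available. Please configure a vector_io provider.")
    # Warning release: keep the old fallback, but tell callers it is going away.
    warnings.warn(
        "Registering a vector DB without provider_id is deprecated and will "
        "become an error in the next release.",
        DeprecationWarning,
        stacklevel=2,
    )
    return available[0]
```

Flipping `enforce` from `False` to `True` between releases gives callers one release cycle to start passing `provider_id` explicitly.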

@leseb
Collaborator

leseb commented Oct 15, 2025

@HabebNawatha @franciscojavierarceo still relevant? Thanks!

Development

Successfully merging this pull request may close these issues.

Revisit whether "provider_id" should be mandatory during resource registration for vector DB