@jairad26 jairad26 commented Jul 10, 2025

Description of changes

Previously, the Go code did not update the configuration string for collections because it was not passed as part of generateCollectionUpdatesWithoutID.

This PR fixes that and adds tests. It also cleans up the existing Go types and updates the Rust types to use a similar InternalCollectionConfiguration for updates.
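
To illustrate the class of bug described above, here is a minimal Python sketch. The real code is Go; `generate_updates_buggy`, `generate_updates_fixed`, and the column names are hypothetical stand-ins, not the actual generateCollectionUpdatesWithoutID implementation. The point is only that an update builder which copies a fixed set of fields silently drops any field it does not list.

```python
from typing import Any, Dict, Optional


def generate_updates_buggy(name: Optional[str], config_json: Optional[str]) -> Dict[str, Any]:
    """Build the column updates, but forget the configuration string."""
    updates: Dict[str, Any] = {}
    if name is not None:
        updates["name"] = name
    # Bug analog: config_json is accepted but never copied into `updates`,
    # so the stored configuration is left unchanged.
    return updates


def generate_updates_fixed(name: Optional[str], config_json: Optional[str]) -> Dict[str, Any]:
    """Build the column updates, including the configuration string."""
    updates: Dict[str, Any] = {}
    if name is not None:
        updates["name"] = name
    if config_json is not None:
        # Hypothetical column name, chosen for illustration only.
        updates["configuration_json_str"] = config_json
    return updates
```

With the buggy variant, a caller that passes a new configuration gets back an update set that does not mention it, which matches the symptom of a modify call that appears to succeed but does not persist the configuration.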

Test plan

How are these changes tested?

Tests were added to validate both that the modify call works and that the updated values persist on subsequent fetches. DB tests were also added to the DAO.

  • [x] Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@jairad26 jairad26 force-pushed the jai/cleanup-update-vector-index branch 3 times, most recently from 6b7d4ae to c561c11 Compare July 10, 2025 19:54
@jairad26 jairad26 force-pushed the jai/cleanup-update-vector-index branch 7 times, most recently from b1fde3c to 99f88de Compare July 29, 2025 01:37
@jairad26 jairad26 changed the title from "[CLN] Use InternalUpdateConfiguration in Rust, cleanup go code" to "[BUG] Use InternalUpdateConfiguration in Rust, correctly update configjsonstr in go" Jul 29, 2025
@jairad26 jairad26 changed the title from "[BUG] Use InternalUpdateConfiguration in Rust, correctly update configjsonstr in go" to "[BUG] Use InternalUpdateConfiguration in Rust, correctly update configuration in go" Jul 29, 2025
@jairad26 jairad26 force-pushed the jai/cleanup-update-vector-index branch 2 times, most recently from e62b9cc to c1e7f74 Compare July 29, 2025 02:05
@jairad26 jairad26 marked this pull request as ready for review July 29, 2025 05:36
propel-code-bot bot commented Jul 29, 2025

Fix Collection Configuration Updates and Unify Config Types Across Go and Rust

This PR addresses an issue in the Go implementation where collection configuration updates (specifically, ConfigurationJsonStr) were not persisted during modifications. It remedies this, enhances the collection update logic in Go, and brings Go/Rust type and update logic into closer alignment by introducing InternalUpdateCollectionConfiguration in Rust and refactoring related code paths. Additional tests have been added to Go and Rust to ensure that configuration updates, round-tripping, and persistence behave as expected.

Key Changes

• Go: Fixes logic in generateCollectionUpdatesWithoutID to ensure ConfigurationJsonStr is updated when modifying a collection.
• Go: Cleans up collection configuration/type structure (VectorIndexConfiguration and EmbeddingFunctionConfiguration) to match Rust conventions; type fields are removed, and configuration structurally normalized.
• Go: Expands configuration update test coverage in table_catalog_test.go and collection_test.go (multiple new tests)
• Rust: Adds type InternalUpdateCollectionConfiguration for consistent update application, deprecating direct use of UpdateCollectionConfiguration for update logic.
• Rust: Refactors the collection_configuration.rs update logic to use InternalUpdateCollectionConfiguration, ensuring update/merge semantics are clear and match the Go side.
• Rust: Updates all relevant API, sysdb, and Python bindings interfaces to use InternalUpdateCollectionConfiguration for updates.
• Rust: Adds extensive test coverage for configuration merging (including edge cases for HNSW, SPANN, and embedding functions).
• Go/Rust: Unifies configuration update semantics and aligns test coverage and field-level mapping between the two implementations.
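
The update/merge semantics mentioned in the list above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the Rust or Go code from the PR: the class names `HnswConfig` and `HnswUpdate` and the function `apply_update` are hypothetical, the default values are illustrative, and the assumed rule is that a field left as `None` in the update keeps its stored value.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class HnswConfig:
    # Illustrative values only; not claimed to be the library defaults.
    space: str = "cosine"
    ef_construction: int = 100
    ef_search: int = 100
    max_neighbors: int = 16


@dataclass(frozen=True)
class HnswUpdate:
    # None means "leave the stored value unchanged" (assumed semantics).
    ef_search: Optional[int] = None
    max_neighbors: Optional[int] = None


def apply_update(current: HnswConfig, update: HnswUpdate) -> HnswConfig:
    """Return a new config with only the explicitly set update fields applied."""
    changes = {
        f.name: getattr(update, f.name)
        for f in fields(update)
        if getattr(update, f.name) is not None
    }
    return replace(current, **changes)
```

Keeping the update type separate from the full configuration type makes "not provided" distinguishable from "set to a value", which is the property a merge needs in order to leave untouched fields alone.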

Affected Areas

• Go: sysdb/coordinator/model/collection_configuration.go
• Go: sysdb/coordinator/table_catalog.go and table_catalog_test.go
• Go: sysdb/metastore/db/dao/collection.go and collection_test.go
• Rust: types/src/collection_configuration.rs, types/src/api_types.rs, types/src/spann_configuration.rs
• Rust: sysdb (sysdb.rs, sqlite.rs), frontend/server.rs, python_bindings
• Rust/Go: test coverage in both codebases
• Python: Integration tests for configuration persistence (test_collection_configuration.py)

This summary was automatically generated by @propel-code-bot

Comment on lines +284 to +292
coll = client.get_collection(name="test_updates")
loaded_config = coll.configuration_json
if loaded_config and isinstance(loaded_config, dict):
    hnsw_config = loaded_config.get("hnsw", {})
    if isinstance(hnsw_config, dict):
        assert hnsw_config.get("ef_search") == 20
        assert hnsw_config.get("space") == "cosine"
        assert hnsw_config.get("ef_construction") == 100
        assert hnsw_config.get("max_neighbors") == 16

[BestPractice]

The logic to re-fetch a collection and verify its configuration is duplicated in this test and also in test_configuration_spann_updates and test_spann_update_from_json. To improve maintainability and reduce code duplication, consider extracting this verification logic into a helper function.

For example:

from typing import Any, Dict

from chromadb.api import ClientAPI


def _verify_config_persistence(client: ClientAPI, collection_name: str, index_type: str, expected_params: Dict[str, Any]) -> None:
    # Re-fetch the collection so the check reads what was actually persisted.
    coll = client.get_collection(name=collection_name)
    loaded_config = coll.configuration_json
    assert loaded_config and isinstance(loaded_config, dict)

    # Select the index section to check, e.g. "hnsw" or "spann".
    index_config = loaded_config.get(index_type, {})
    assert isinstance(index_config, dict)

    for key, value in expected_params.items():
        assert index_config.get(key) == value

This would make the tests cleaner and more maintainable.

}

-pub fn update(&mut self, configuration: &UpdateCollectionConfiguration) {
+pub fn update(&mut self, configuration: &InternalUpdateCollectionConfiguration) {

When is this used vs. updateCollectionConfiguration in table_catalog.go? Why do we need this if sysdb already does a read-modify-write?

@jairad26 jairad26 Jul 29, 2025

This is used for the SQLite sysdb. There, it follows the same logic: read the config, then update it if applicable.
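
The read-then-update flow described in this reply can be sketched with Python's standard `sqlite3` module. This is an illustration under assumptions, not the Rust SQLite sysdb code: the table name `collections`, the column `config_json_str`, and the function `update_config` are hypothetical, and the merge is a plain shallow dictionary update.

```python
import json
import sqlite3
from typing import Any, Dict


def update_config(conn: sqlite3.Connection, collection_id: str, changes: Dict[str, Any]) -> Dict[str, Any]:
    """Read the stored config, merge `changes` into it, and write it back."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT config_json_str FROM collections WHERE id = ?", (collection_id,)
        ).fetchone()
        if row is None:
            raise KeyError(collection_id)
        config = json.loads(row[0])
        config.update(changes)  # shallow merge, assumed for illustration
        conn.execute(
            "UPDATE collections SET config_json_str = ? WHERE id = ?",
            (json.dumps(config), collection_id),
        )
    return config


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collections (id TEXT PRIMARY KEY, config_json_str TEXT NOT NULL)")
conn.execute(
    "INSERT INTO collections VALUES (?, ?)",
    ("c1", json.dumps({"ef_search": 10, "space": "cosine"})),
)
conn.commit()

# Only ef_search is supplied, so "space" keeps its stored value.
update_config(conn, "c1", {"ef_search": 20})
```

The sketch does not attempt to guard against concurrent writers between the read and the write; a real implementation would need to decide how to handle that.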

@sanketkedia sanketkedia left a comment

Just to confirm, SQLite did not have this bug, right?

}

#[derive(Deserialize, Serialize, ToSchema, Debug, Clone)]
pub struct InternalUpdateCollectionConfiguration {
@HammadB HammadB Jul 30, 2025

What is the difference between the internal type and the non-internal one? A comment would be useful.

@HammadB HammadB left a comment

Maybe we could have more tests on the Rust and Go update paths.

@jairad26 jairad26 force-pushed the jai/cleanup-update-vector-index branch from 28890fe to e059b16 Compare July 30, 2025 20:10
@blacksmith-sh blacksmith-sh bot deleted a comment from jairad26 Jul 30, 2025
@jairad26 jairad26 merged commit 75ae50a into main Jul 30, 2025
59 checks passed
Inventrohyder pushed a commit to Inventrohyder/chroma that referenced this pull request Aug 5, 2025
…guration in go (chroma-core#5069)
