@priyankc priyankc commented May 18, 2025

Description of changes

Fixes chroma-core#4388

In chromadb/api/types.py:

  • Added a RESERVED_OPERATORS constant that lists all operator names that are reserved
  • Updated validate_metadata() to check metadata keys against this list and reject any that match
  • Added validation to prevent metadata keys starting with $ to avoid future conflicts
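The key checks described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the helper name check_metadata_keys is hypothetical, the operator set shown is an assumed, illustrative subset, and only RESERVED_OPERATORS is a name taken from the description.

```python
from typing import Any, Mapping

# Assumed, illustrative subset of operator names; the real constant
# in chromadb/api/types.py may list more or different entries.
RESERVED_OPERATORS = frozenset(
    {"$and", "$or", "$in", "$nin", "$eq", "$ne", "$gt", "$gte", "$lt", "$lte"}
)


def check_metadata_keys(metadata: Mapping[str, Any]) -> None:
    """Hypothetical helper: raise ValueError for keys that clash with operators."""
    for key in metadata:
        # Reject exact matches with a reserved operator name.
        if key in RESERVED_OPERATORS:
            raise ValueError(f"Metadata key {key!r} is a reserved operator name")
        # Reject any other '$'-prefixed key to avoid future conflicts.
        if key.startswith("$"):
            raise ValueError(f"Metadata key {key!r} must not start with '$'")


# Normal metadata passes silently.
check_metadata_keys({"category": "A", "tag": "special"})

# Metadata that reuses an operator name as a key is rejected.
try:
    check_metadata_keys({"$in": "value1", "category": "A"})
except ValueError as err:
    print(err)
```

Checking the reserved set before the generic prefix test lets the error message say which of the two rules was violated.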

In chromadb/segment/impl/metadata/sqlite.py:

  • Updated _where_map_criterion() to explicitly check for $in and $nin operators at the top level
  • Added specialized handling for these operators to ensure they're properly processed

Test plan

python test_reproduce/test_complex_operator_bug.py

  • Without the patch, the script above returns wrong results
  • With the fix, an exception is raised that rejects any metadata key listed in RESERVED_OPERATORS

Scenario Represented in the Test Case

  1. Create a collection with documents that have metadata keys matching operator names:

    import chromadb

    client = chromadb.Client()
    collection = client.create_collection("test_collection_complex")
    collection.add(
        ids=["id1", "id2", "id3", "id4"],
        documents=["doc1", "doc2", "doc3", "doc4"],
        metadatas=[
            {"$in": "value1", "category": "A"},           # Has a key that matches operator
            {"category": "A", "tag": "special"},          # Normal metadata, category A
            {"$in": "value3", "category": "B"},           # Has a key that matches operator, different category
            {"category": "B", "tag": "special"}           # Normal metadata, category B
        ]
    )
  2. Execute a complex query that uses $in both as a metadata key name and as an operator in different parts of the where clause:

    result = collection.get(
        where={"$and": [
            {"$in": "value3"},                # Using $in as a metadata key
            {"category": "B"},                # Regular filter
            {"tag": {"$in": ["special"]}}     # Using $in as an operator
        ]}
    )
  3. Observe that the query returns no results, even though there should be a match (document with id4).

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

More details:

@hesreallyhim
@priyankc FYI RCA link is saying "Report not found." not sure if just me/VPN

Development

Successfully merging this pull request may close these issues.

[Bug]: Metadata/where edge cases
