workbackai/chroma #6
Labels: bug
Description
What happened?
Related: #4346
In addition to the observation in #4346 that a document ID can be an empty string, I found some more edge cases that we may wish to disallow:
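import chromadb

# Setup assumed for reproduction (not part of the original report):
# an in-memory client and a fresh collection with a hypothetical name.
client = chromadb.Client()
collection = client.get_or_create_collection("metadata_edge_cases")

collection.upsert(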
embeddings=[
[1.1, 2.3, 3.2],
[4.5, 6.9, 4.4],
[1.1, 2.3, 3.2],
[4.5, 6.9, 4.4],
[1.7, 4.3, 3.2],
[4.7, 4.8, 3.2],
],
metadatas=[
{"uri": "img1.png", "style": "style1"},
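{"": "img2.png"},  # empty-string key
{"": "", "$nin": "uhoh"},  # empty key and value, plus a key named like an operator
{"uri": "img4.png", "computed": "style" + "1"},  # Python concatenates this to "style1"
{"uri": "img5.png", "uhoh": "$contains"},  # value that looks like an operator
{"uri": "$contains", "$nin": "$nin", "bool": True, "num": 21},  # operator-like key and values, plus bool and int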
],
documents=["doc9", "doc2", "doc3", "doc4", "doc5", "doc6"],
ids=["id1", "id2", "id3", "id4", "id5", "id6"],
)
result1 = collection.query(
query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],
n_results=15,
where={"style": {"$eq": "style1"}},
)
print("result1", result1)
result2 = collection.query(
query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],
n_results=15,
where={"$nin": "uhoh"}
)
print("result2", result2)
result3 = collection.query(
query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],
n_results=15,
where={"uhoh": "$contains"}
)
print("result3", result3)
result4 = collection.query(
query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],
n_results=15,
where={"$nin": "$nin"}
)
print("result4", result4)
result5 = collection.query(
query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],
n_results=15,
where={"bool": 5 == 5}
)
print("result5", result5)
result6 = collection.query(
query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],
n_results=15,
where={"bool": True or False}
)
print("result6", result6)
result7 = collection.query(
query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],
n_results=15,
where={"bool": False or 0 or True or 6} # chain of falsey and then match True
)
print("result7", result7)
result8 = collection.query(
query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],
n_results=15,
where={"bool": False or "truthy" or True or 6} # chain with a non-matching truthy, cuts off the real match
)
print("result8", result8)
result9 = collection.query(
query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],
n_results=15,
where={"num": {"$in": list(range(25))}}
)
print("result9", result9)
# result10 = collection.query(
# query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]],
# n_results=15,
# where={"num": {"$in": list(range(10000000000000000000))}} # DoS attack(?)
# )
# print("result10", result10)
View in this colab notebook:
https://colab.research.google.com/drive/1BKGRLM9CmuGHHFW0hBorlLN6g-Coz2U1#scrollTo=64dWyeEdKAX9
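One thing worth separating out for result5 through result8: the expressions in those `where` values are evaluated by Python when the dict is built, so the query only ever receives the resulting value. A minimal sketch in plain Python (no Chroma involved; the `filters` dict is just for illustration) shows what each filter actually contains:

```python
# Each expression is evaluated by Python when the dict literal is built,
# so the filter only ever holds the resulting value, never the expression.
filters = {
    "result5": {"bool": 5 == 5},
    "result6": {"bool": True or False},
    "result7": {"bool": False or 0 or True or 6},
    "result8": {"bool": False or "truthy" or True or 6},
}

for name, where in filters.items():
    print(name, repr(where["bool"]))
# result5 True
# result6 True
# result7 True
# result8 'truthy'
```

So result5 to result7 are all equivalent to `where={"bool": True}`, and result8 is equivalent to `where={"bool": "truthy"}`, because `or` returns the first truthy operand. The same applies to `"style" + "1"` in the upsert, which is stored as `"style1"`.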
The last query (result10, left commented out) might be a potential DoS attack against Chroma Cloud, though I am not sure (??).
I think this also means that ID filtering (the new feature) will already effectively allow for "operations" ("$gte", etc.), since you can do a lot with a list comprehension and basically simulate the same functionality.
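To make the list-comprehension point concrete, here is a rough sketch of emulating a bounded range comparison with nothing but `$in`, in the same shape as result9. The helper name `range_as_in` is made up for this example, and I have not checked how ID filtering treats such a list:

```python
def range_as_in(field, lower, upper):
    """Build a where filter emulating lower <= field < upper using only "$in".

    Hypothetical helper for illustration; it only covers bounded integer ranges.
    """
    return {field: {"$in": [n for n in range(lower, upper)]}}


where = range_as_in("num", 20, 25)
print(where)  # {'num': {'$in': [20, 21, 22, 23, 24]}}
```

Passing this as `where=` would match the `"num": 21` record from the upsert above, the same way result9 does. The list grows linearly with the width of the range, which is what makes the result10 case a concern.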
Versions
Chroma 1.0.7
Python 3.11.12