fix: FAISS vectorstore returns square of distance instead of distance #176

CuberMessenger · 2025-07-18T06:21:30Z

Summary

The FAISS score (distance) computation returns the squared L2 distance instead of the L2 distance itself.

Also, when cosine similarity is used, FAISS should build its index with inner product, because inner product equals cosine similarity as long as the vectors are normalized to unit length.
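The equivalence between inner product and cosine similarity can be checked with a small, dependency-free sketch. The toy vectors and the helper names (`l2_normalize`, `dot`, `cosine_similarity`) are illustrative assumptions, not part of the FAISS or LangChain API.

```python
import math

def l2_normalize(v):
    """Scale v to unit length (hypothetical helper for illustration)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity computed from the raw, unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors chosen only for illustration.
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

cos_raw = cosine_similarity(a, b)
ip_unit = dot(l2_normalize(a), l2_normalize(b))

print(f"cosine similarity (raw vectors):  {cos_raw}")
print(f"inner product (unit vectors):     {ip_unit}")
```

Once the vectors are normalized, the two values agree up to floating-point rounding, which is why an inner-product index is sufficient for cosine similarity.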

Description

Please check out the langchain issue to get an overview.

Here's a simple example to show the difference:

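import numpy as np
from langchain_community.vectorstores import FAISS
from langchain_community.vectorstores.utils import DistanceStrategy

# get_google_embedding_model() is the author's own helper; its definition is not shown here.
embedding_model = get_google_embedding_model()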

sents = [
    "Sidewinder has used JavaScript to drop and execute malware loaders.",
]

db_euc = FAISS.from_texts(
    sents,
    embedding_model,
    ids=range(len(sents)),
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
    normalize_L2=True,
)

query = "execute malware loaders"

embedding_doc = np.array(db_euc._embed_query(db_euc.get_by_ids([0])[0].page_content))
embedding_euc = np.array(db_euc._embed_query(query))

# Euclidean distance and similarity by hand
euc_distance = np.sqrt(((embedding_doc - embedding_euc) ** 2).sum())
euc_similarity = 1 - euc_distance / (2**0.5)

print(f"[By hand] Euclidean distance: {euc_distance}")
print(f"[By hand] Euclidean similarity: {euc_similarity}")
"""
[By hand] Euclidean distance: 0.6381204577868655
[By hand] Euclidean similarity: 0.5487806970850433
"""

# Score by FAISS
score = db_euc.similarity_search_with_score(query, k=1)[0][1]

print(f"[FAISS] Score: {score}")
"""
[FAISS] Score: 0.40719807147979736

The score is actually the square of the Euclidean distance:
(0.6381204577868655) ** 2 = 0.4071977186461187
"""

# Relevance score by FAISS
relevance_score = db_euc.similarity_search_with_relevance_scores(query, k=1)[0][1]
print(f"[FAISS] Relevance score: {relevance_score}")
"""
[FAISS] Relevance score: 0.7120674848556519

This relevance score is computed from the squared distance rather than the distance:
0.7120674848556519 = 1 - (0.40719807147979736 / (2**0.5))
"""


### Monkey patch the fix ###
...

def similarity_search_with_score_by_vector(...) -> List[Tuple[Document, float]]:
    ...
    scores, indices = self.index.search(vector, k if filter is None else fetch_k)
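    # ADDED: an L2 index returns squared distances, so take the square root.
    # Note: this is only meaningful for L2 (Euclidean) indexes, not inner product.
    scores = np.sqrt(scores)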
    ...
    return docs[:k]


FAISS.similarity_search_with_score_by_vector = similarity_search_with_score_by_vector

### Monkey patch the fix ###

# Test the fixed score by FAISS
score = db_euc.similarity_search_with_score(query, k=1)[0][1]

print(f"[Fixed FAISS] Score: {score}")
"""
[Fixed FAISS] Score: 0.638120710849762
This matches the by-hand calculation of the Euclidean distance.
"""

# Test the fixed relevance score by FAISS
score = db_euc.similarity_search_with_relevance_scores(query, k=1)[0][1]
print(f"[Fixed FAISS] Relevance score: {score}")
"""
[Fixed FAISS] Relevance score: 0.5487805008888245
This matches the by-hand calculation of the Euclidean similarity.
"""
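The same effect can be reproduced without an embedding model. The sketch below is a minimal illustration under stated assumptions: the two unit vectors are made up, `squared_l2` and `relevance` are hypothetical helpers, and the mapping `1 - d / sqrt(2)` is the one used in the by-hand calculation above. For unit vectors the squared distance also equals `2 - 2 * (a · b)`, which the sketch prints as a cross-check.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance, the quantity a FAISS L2 index reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def relevance(distance):
    """Map a distance between unit vectors to a relevance score: 1 - d / sqrt(2)."""
    return 1.0 - distance / math.sqrt(2)

# Two unit-length toy vectors, chosen only for illustration.
a = [1.0, 0.0]
b = [0.8, 0.6]

d_squared = squared_l2(a, b)   # what the unpatched score corresponds to
d = math.sqrt(d_squared)       # the actual Euclidean distance
inner = sum(x * y for x, y in zip(a, b))

print(f"squared L2 distance:        {d_squared}")
print(f"2 - 2 * inner product:      {2 - 2 * inner}")
print(f"L2 distance:                {d}")
print(f"relevance from squared L2:  {relevance(d_squared)}")
print(f"relevance from L2:          {relevance(d)}")
```

Because the distance here is below 1, squaring it makes it smaller, so feeding the squared value into the relevance mapping overstates the relevance. This is the same direction of error as in the example above.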

CuberMessenger added 2 commits July 18, 2025 14:12

fix: FAISS vectorstore returns square of distance instead of distance

5907b96

Merge branch 'main' into main

e9cffc6

mdrxy assigned eyurtsev Sep 10, 2025
