
HFDataLoader returns HuggingFace Dataset, but EvaluateRetrieval expects dict (AttributeError: 'Dataset' object has no attribute 'keys') #208

@kcambrek

Description

I’m running into an internal incompatibility when using HFDataLoader together with
EvaluateRetrieval for dense retrieval.

This is confusing because the setup closely follows BEIR’s own example using the HF
loader, but it fails at runtime due to mismatched data structures.

Code snippet

from beir.datasets.data_loader_hf import HFDataLoader
from beir.retrieval.models import SentenceBERT
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES
from beir.retrieval.evaluation import EvaluateRetrieval

corpus, queries, qrels = HFDataLoader(
    hf_repo=REPO_ID,
    hf_repo_qrels=REPO_ID,
    corpus_file="corpus.jsonl",
    query_file="queries.jsonl",
    streaming=False
).load(split="test")

beir_model = SentenceBERT(
    "sentence-transformers/static-similarity-mrl-multilingual-v1"
)

model = DRES(beir_model, batch_size=128)
retriever = EvaluateRetrieval(model, score_function="dot")

results = retriever.retrieve(corpus, queries)

Error

AttributeError: 'Dataset' object has no attribute 'keys'

Traceback points to:

# beir/retrieval/search/dense/exact_search.py
query_ids = list(queries.keys())

What’s going wrong

  • HFDataLoader.load() returns Hugging Face Dataset objects for corpus and queries
  • EvaluateRetrieval / DenseRetrievalExactSearch assumes dict-like inputs
    (queries.keys(), indexing by ID); the expected shapes are sketched below
  • This results in an immediate runtime failure
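
For reference, a minimal sketch of the dict shapes the dense search code expects;
this matches the format returned by the standard GenericDataLoader (the IDs and
texts below are made up):

queries = {
    "q1": "what is dense retrieval?",
}
corpus = {
    "doc1": {"title": "Dense retrieval", "text": "Dense retrieval encodes queries and documents as vectors ..."},
}
qrels = {
    "q1": {"doc1": 1},
}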

This is especially surprising because the HF loader is explicitly provided to reduce
RAM usage, and its usage is demonstrated in the repository itself:

https://github.com/beir-cellar/beir/blob/main/examples/retrieval/evaluation/dense/evaluate_sbert_hf_loader.py

Expected behavior

One of the following:

  • EvaluateRetrieval should natively support Hugging Face Dataset objects returned
    by HFDataLoader (a rough sketch of what that could look like is below), or
  • The HF loader example should clearly document that users must convert datasets to
    dicts before retrieval, or
  • HFDataLoader should optionally return dicts in the format expected by the
    retrieval pipeline

Right now the API contracts don’t line up, even though they are presented as
compatible (or am I missing something?).
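
As an illustration of the first option, the normalisation could live at the entry
point of the search code. This is only a sketch of what native support might look
like, not existing BEIR code, and it assumes the id/text columns used by
HFDataLoader:

from datasets import Dataset

def _to_query_dict(queries):
    # Accept either the plain dict format or a Hugging Face Dataset and
    # normalise to {query_id: query_text}; the corpus would need the same
    # treatment with its id/title/text columns.
    if isinstance(queries, Dataset):
        return {row["id"]: row["text"] for row in queries}
    return queries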

Why this matters

The HF loader is very useful for large corpora, but at the moment it is not actually
usable with the standard dense retrieval pipeline without custom glue code (a stopgap
conversion is included below). This makes the example misleading and the failure mode
non-obvious.
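
Current workaround

For reference, converting the Datasets back into plain dicts before calling
retrieve() works as a stopgap, at the cost of materialising everything in memory
(which defeats the point of the HF loader). This assumes the usual id/title/text
columns:

# Convert the HF Datasets returned by HFDataLoader into the plain-dict
# format that DenseRetrievalExactSearch expects, then retrieve as usual.
corpus_dict = {
    row["id"]: {"title": row.get("title", ""), "text": row["text"]}
    for row in corpus
}
queries_dict = {row["id"]: row["text"] for row in queries}

results = retriever.retrieve(corpus_dict, queries_dict)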
