
HFDataLoader returns HuggingFace Dataset, but EvaluateRetrieval expects dict (AttributeError: 'Dataset' object has no attribute 'keys') #208

@kcambrek

Description

I’m running into an internal incompatibility when using HFDataLoader together with
EvaluateRetrieval for dense retrieval.

This is confusing because the setup closely follows BEIR’s own example using the HF
loader, but it fails at runtime due to mismatched data structures.

Code snippet

from beir.datasets.data_loader_hf import HFDataLoader
from beir.retrieval.models import SentenceBERT
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES
from beir.retrieval.evaluation import EvaluateRetrieval

corpus, queries, qrels = HFDataLoader(
    hf_repo=REPO_ID,
    hf_repo_qrels=REPO_ID,
    corpus_file="corpus.jsonl",
    query_file="queries.jsonl",
    streaming=False
).load(split="test")

beir_model = SentenceBERT(
    "sentence-transformers/static-similarity-mrl-multilingual-v1"
)

model = DRES(beir_model, batch_size=128)
retriever = EvaluateRetrieval(model, score_function="dot")

results = retriever.retrieve(corpus, queries)

Error

AttributeError: 'Dataset' object has no attribute 'keys'

Traceback points to:

# beir/retrieval/search/dense/exact_search.py
query_ids = list(queries.keys())

What’s going wrong

  • HFDataLoader.load() returns Hugging Face Dataset objects for corpus and queries
  • EvaluateRetrieval / DenseRetrievalExactSearch assumes dict-like inputs
    (queries.keys(), indexing by ID); the expected shapes are sketched below
  • This results in an immediate runtime failure
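
For reference, a minimal sketch of the dict shapes the dense search code expects;
this matches the format returned by the standard GenericDataLoader (the IDs and
texts below are made up):

queries = {
    "q1": "what is dense retrieval?",
}
corpus = {
    "doc1": {"title": "Dense retrieval", "text": "Dense retrieval encodes queries and documents as vectors ..."},
}
qrels = {
    "q1": {"doc1": 1},
}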

This is especially surprising because the HF loader is explicitly provided to reduce
RAM usage, and its usage is demonstrated in the repository itself:

https://github.com/beir-cellar/beir/blob/main/examples/retrieval/evaluation/dense/evaluate_sbert_hf_loader.py

Expected behavior

One of the following:

  • EvaluateRetrieval should natively support Hugging Face Dataset objects returned
    by HFDataLoader (a rough sketch of what that could look like is below), or
  • The HF loader example should clearly document that users must convert datasets to
    dicts before retrieval, or
  • HFDataLoader should optionally return dicts in the format expected by the
    retrieval pipeline

Right now the API contracts don’t line up, even though they are presented as
compatible (or am I missing something?).
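
As an illustration of the first option, the normalisation could live at the entry
point of the search code. This is only a sketch of what native support might look
like, not existing BEIR code, and it assumes the id/text columns used by
HFDataLoader:

from datasets import Dataset

def _to_query_dict(queries):
    # Accept either the plain dict format or a Hugging Face Dataset and
    # normalise to {query_id: query_text}; the corpus would need the same
    # treatment with its id/title/text columns.
    if isinstance(queries, Dataset):
        return {row["id"]: row["text"] for row in queries}
    return queries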

Why this matters

The HF loader is very useful for large corpora, but at the moment it is not actually
usable with the standard dense retrieval pipeline without custom glue code (a stopgap
conversion is included below). This makes the example misleading and the failure mode
non-obvious.
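
Current workaround

For reference, converting the Datasets back into plain dicts before calling
retrieve() works as a stopgap, at the cost of materialising everything in memory
(which defeats the point of the HF loader). This assumes the usual id/title/text
columns:

# Convert the HF Datasets returned by HFDataLoader into the plain-dict
# format that DenseRetrievalExactSearch expects, then retrieve as usual.
corpus_dict = {
    row["id"]: {"title": row.get("title", ""), "text": row["text"]}
    for row in corpus
}
queries_dict = {row["id"]: row["text"] for row in queries}

results = retriever.retrieve(corpus_dict, queries_dict)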
