This work introduces FactOReS, the first publicly available dataset for evidence-based veracity prediction in Spanish, constructed from authentic Spanish-language claims sourced from Maldita.es, a leading Spanish fact-checking organization. We establish performance baselines by systematically applying In-Context Learning (ICL) with Large Language Models (LLMs) to both an established English dataset and our novel Spanish dataset.
The dataset contains verifiable claims in Spanish paired with verification questions and contextual evidence snippets extracted from online sources (571 instances in total).
It is designed for the Automatic Fact-Checking (AFC) task.
Each entry includes:
- `claim_id`: unique identifier of the claim
- `claim`: textual claim to be verified
- `question`: a question targeting aspects of the claim
- `summarized_text`: summary of the retrieved or supporting evidence
- `relevance`: binary indicator of evidence relevance
- `critical_*`: critical dimensions (what, who, where, when, how) capturing key fact-checking attributes (values: 1 or 0/null)
- `objectivity`: binary indicator of objectivity in the evidence
- `TOTAL`: aggregated score
- `STANCE`: stance of the evidence relative to the claim (`Positive`, `Negative`, `Neutral`)
- `label`: gold veracity label for fact-checking (`Supported`, `Refuted`, `Not Enough Evidence`)
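Assuming `TOTAL` aggregates `relevance`, the five `critical_*` flags, and `objectivity` (consistent with the example record below), a record can be parsed and sanity-checked like this:

```python
import json

# An abbreviated FactOReS record, following the field list above.
record = json.loads("""
{"claim_id": 6901,
 "claim": "Diario Sur tuitea que ...",
 "relevance": 1,
 "critical_what": 1, "critical_who": 0, "critical_where": 1,
 "critical_when": 0, "critical_how": 0,
 "objectivity": 1,
 "TOTAL": 4,
 "STANCE": "Positive",
 "label": "Supported"}
""")

# Assumption: TOTAL = relevance + objectivity + sum of critical_* flags.
score = record["relevance"] + record["objectivity"] + sum(
    record[f"critical_{k}"] for k in ("what", "who", "where", "when", "how")
)
assert score == record["TOTAL"]  # 1 + 1 + (1+0+1+0+0) == 4
```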
```json
{
  "claim_id": 6901,
  "claim": "Diario Sur tuitea que Málaga, Marbella y prácticamente toda la Costa del Sol cerrarán toda la actividad no esencial",
  "question": "Cuáles son las últimas medidas anunciadas oficialmente por el Ayuntamiento de Málaga sobre actividades no esenciales?",
  "summarized_text": "Málaga capital ha superado la tasa de mil contagios de COVID-19 por cada cien mil habitantes, lo que obliga al cierre de negocios no esenciales durante al menos 14 días a partir de este miércoles. Esta medida, establecida por la Junta de Andalucía, busca frenar la propagación del virus en sectores como la hostelería, comercio y cultura. Además, otros municipios como Casares, Ojén, Benaoján y otros también implementarán estas restricciones debido a la alta incidencia.",
  "relevance": 1,
  "critical_what": 1,
  "critical_who": 0,
  "critical_where": 1,
  "critical_when": 0,
  "critical_how": 0,
  "objectivity": 1,
  "TOTAL": 4,
  "STANCE": "Positive",
  "label": "Supported"
}
```

Follow these steps, in order, to recreate the dataset:
The first step is to generate the dataset: preprocess the original data, then perform Question Generation and Evidence Retrieval.
Run the `generate_dataset.ipynb` notebook and follow its steps.
- Create an environment from `requirements_AFC.txt` and activate it.
- Make sure you have all the required API keys (OpenAI, Hugging Face) with permissions for the following models:
- GPT-4o
- Qwen 2.5 (7B & 72B Instruct)
- LLaMA 3 (8B & 70B Instruct)
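Before launching the notebook, it helps to verify the keys are exported. A minimal sketch (the environment-variable names here are assumptions, not the repo's exact names):

```python
import os

def missing_keys(env, required=("OPENAI_API_KEY", "HUGGINGFACE_TOKEN")):
    """Return the names of required keys that are absent or empty."""
    return [name for name in required if not env.get(name)]

# Check the real environment before running generate_dataset.ipynb.
print("Missing:", missing_keys(os.environ))
```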
To reproduce experimental results:
The final JSON output from the notebook is filtered to keep only 50 evidence chunks per question.
- Filename must be: `unique_claims_qs_context_50_filtered.json`
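The filtering step can be sketched as follows (the `context` key holding the evidence-chunk list is an assumption about the notebook's output format):

```python
def filter_top_chunks(records, k=50):
    """Keep at most k evidence chunks per question.
    Assumes each record stores its chunks in a 'context' list."""
    for rec in records:
        rec["context"] = rec["context"][:k]
    return records

# Toy example: three chunks, keep two.
data = [{"claim_id": 1, "question": "q?", "context": ["c1", "c2", "c3"]}]
filtered = filter_top_chunks(data, k=2)
assert filtered[0]["context"] == ["c1", "c2"]
```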
For each question, only the first evidence snippet is kept and summarized.
- Execute: `summarize_1evidence.py`
- Resulting filename: `unique_claims_qs_context_50_summarized.json`
Modify the dataset so that each line contains one claim–question–evidence trio.
- Execute: `modify_factores_Evidence_per_line.py`
- Resulting filename: `unique_claims_qs_context_50_formatted_per_line.jsonl`
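The per-line reshaping can be sketched like this (the nested `questions` key is an assumption about the summarized file's layout; the actual script may differ):

```python
import json

def to_trios(record):
    """Expand one claim record into claim-question-evidence trios,
    one JSON object per output line."""
    for qa in record["questions"]:
        yield {"claim_id": record["claim_id"],
               "claim": record["claim"],
               "question": qa["question"],
               "summarized_text": qa["summarized_text"]}

# Toy record with two questions -> two JSONL lines.
record = {"claim_id": 1, "claim": "c",
          "questions": [{"question": "q1", "summarized_text": "e1"},
                        {"question": "q2", "summarized_text": "e2"}]}
lines = [json.dumps(t, ensure_ascii=False) for t in to_trios(record)]
assert len(lines) == 2
```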
Run stance and veracity prediction on the formatted dataset. For experiments with AVeriTeC, we use the baseline development split (dev.json) provided by the authors.
The file is available in the official AVeriTeC repository on Hugging Face.
- Execute: `python src/veracity/veracity_prediction_pydantic.py --dataset_type [factores or averitec] [--few_shot] [--useStance]`
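The script name suggests predictions are validated against a Pydantic schema. A minimal stand-in using stdlib dataclasses, with field names that are assumptions rather than the repo's actual schema:

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical structured prediction; the real schema in
# veracity_prediction_pydantic.py may use different fields.
@dataclass
class VeracityPrediction:
    stance: Literal["Positive", "Negative", "Neutral"]
    label: Literal["Supported", "Refuted", "Not Enough Evidence"]

pred = VeracityPrediction(stance="Positive", label="Supported")
assert pred.label == "Supported"
```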
- Note: this step is optional. To reproduce the intermediate steps behind the FactOReS `dev.json`, execute: 1. `create_annotation_csv.py` (JSON format to CSV), 2. manually annotate the dataset, 3. `analyze_annotation_file.py` (check dataset distribution), 4. `calculate_agreement.py` (Inter-Annotator Agreement), 5. `xlsxs_to_json.py` (.xlsx format to JSON).
- Execute `stance_eval_factores.py` for stance evaluation with data from the FactOReS dataset.
- Execute `veracity_evaluation.py` for veracity prediction with data from AVeriTeC or FactOReS: `python src/veracity/veracity_evaluation.py --dataset_type [averitec_dataset or factores_dataset]`
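As a toy illustration of what a veracity evaluation computes, here is a stdlib-only sketch of accuracy plus the gold-label distribution (the actual metrics reported by `veracity_evaluation.py` may differ):

```python
from collections import Counter

def per_label_accuracy(gold, pred):
    """Overall accuracy and gold-label counts for veracity predictions."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold), Counter(gold)

gold = ["Supported", "Refuted", "Supported", "Not Enough Evidence"]
pred = ["Supported", "Refuted", "Refuted", "Not Enough Evidence"]
acc, dist = per_label_accuracy(gold, pred)
assert acc == 0.75
assert dist["Supported"] == 2
```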
This repository includes both code and data under different licenses:
- Code: Apache License 2.0
- FactOReS dataset: Creative Commons Attribution 4.0 International (CC BY 4.0)