Annotation, model evaluation, and LLM-as-Judge evaluation framework for FinCriticalED.
The `Annotation` folder contains post-processing and annotation quality assessments.
- `Annotation/Highlight_Annotation.ipynb`: processes expert annotations by wrapping financially critical entities with `<"Number">` and `<"Time">` labels to produce the gold standard of the FinCriticalED dataset.
- `Annotation/calculate_agreement.py` and `Annotation/run_agreement.sh`: calculate overall and pairwise annotator agreement scores to ensure annotation quality.
- Before running models, configure `model_eval/agent.py` with your `OPEN_AI_API_KEY` and `TOGETHER_API_KEY`.
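One common way to supply these keys is via environment variables; the following is a minimal sketch (the variable names come from the setup note above, but how `agent.py` actually reads them is an assumption):

```python
import os

def load_api_keys():
    """Read the two API keys named in the setup note from the environment.

    Sketch only: the real key-loading logic lives in model_eval/agent.py.
    """
    keys = {name: os.environ.get(name)
            for name in ("OPEN_AI_API_KEY", "TOGETHER_API_KEY")}
    missing = [name for name, value in keys.items() if not value]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return keys
```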
- Run `model_eval/main.py` to generate model OCR output. To change the model, or to run it on only a small sample, update:

```python
def evaluate(
    model_name="gpt-4o",
    experiment_tag="zero-shot",
    language="en",
    local_version=True,
    local_dir="./FinCriticalED",
    sample=None
):
```
- DeepSeek-OCR and MinerU2.5 can be run separately via `model_eval/deepseekocr/batch_process_deepseek.py` and `model_eval/miner/batch_process_miner.py`, respectively.
- After running `main.py`, run `model_eval/evaluation.py` to compute ROUGE-1, ROUGE-L, and Edit Distance. To control input/output paths, or to change models, CSV names, etc., update:

```python
def main():
    models = [
        ...
        "Qwen/Qwen2.5-VL-72B-Instruct",
        "google/gemma-3n-E4B-it",
        "gpt-5",
    ]
    languages = [
        "smallocr",
        ...
    ]
```
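To make the reported metrics concrete, here is a self-contained sketch of ROUGE-1 (unigram-overlap F1, whitespace tokenisation) and Levenshtein edit distance; the actual `evaluation.py` presumably uses library implementations, so treat this only as an illustration of what the scores measure:

```python
from collections import Counter

def rouge1_f(reference, candidate):
    """ROUGE-1 F1: unigram overlap between reference and candidate (sketch)."""
    ref, cand = Counter(reference.split()), Counter(candidate.split())
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def edit_distance(a, b):
    """Levenshtein distance via a one-row dynamic programme."""
    dp = list(range(len(b) + 1))          # distances of "" vs prefixes of b
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # deletion, insertion, substitution (free if characters match)
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

print(edit_distance("kitten", "sitting"))  # → 3
```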
- In `llm-as-a-judge.ipynb`, a large language model (GPT-4o) serves as the evaluator responsible for extracting financial facts from the ground-truth HTML and verifying their presence in the model-generated HTML. The LLM Judge processes both inputs under a structured evaluation prompt, enabling it to perform normalization, contextual matching, and fine-grained fact checking.
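The normalization step the judge performs can be pictured with a plain-Python sketch (the real normalization and matching happen inside the LLM under the evaluation prompt; the rules below are illustrative assumptions, not the notebook's implementation):

```python
import re

def normalize_fact(text):
    """Canonicalise a numeric/temporal fact before matching (illustrative)."""
    t = text.lower().strip()
    t = t.replace(",", "")          # "1,234.56" -> "1234.56"
    t = re.sub(r"\$\s*", "$", t)    # "$ 5" -> "$5"
    t = re.sub(r"\s+", " ", t)      # collapse whitespace
    return t

def fact_present(fact, generated_html):
    """Check whether a ground-truth fact survives in the model output."""
    return normalize_fact(fact) in normalize_fact(generated_html)

print(fact_present("1,234.56", "<td>Revenue: 1234.56</td>"))  # → True
```

A string-containment check like this misses contextual matches (e.g. "2.5M" vs "2,500,000"), which is precisely why an LLM judge is used instead.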
- The dataset is available on Hugging Face: `TheFinAI/FinCriticalED`
- The paper is available on arXiv: 2511.14998