Skip to content

CEA-LIST/runia_core

RunIA Logo

Runtime Uncertainty estimation for AI models



Overview

RunIA-core is an open-source Python library for uncertainty estimation and Out-of-Distribution (OoD) detection in AI models. It provides comprehensive tools for evaluating and deploying uncertainty estimation methods across computer vision tasks (image classification, object detection, semantic segmentation) and natural language processing (LLM hallucination detection).

Key Features

  • Object Detection support: Beyond Image classification, RunIA-core includes modules for object detection architectures (Faster RCNN, YOLOv8, RT-DETR, Deformable DETR, OWLv2) and semantic segmentation (DeepLabv3+, U-Net)
  • Latent Space Uncertainty Estimation: LaRED (Latent Representations Density) and LaREM (Latent Representations Mahalanobis) methods for OoD detection
  • Multiple Baseline Methods: Support for 15+ baseline OoD detection methods (MSP, Energy, Mahalanobis, kNN, ViM, DDU, DICE, ReAct, and more)
  • LLM Uncertainty: Hallucination detection with methods like semantic entropy, RAUQ, perplexity, and eigen scores
  • Monte Carlo Dropout (MCD): Epistemic uncertainty estimation through MC sampling and entropy computation
  • Feature Extraction: Image-level and object-level feature extraction for various architectures
  • Flexible Inference: Production-ready inference modules for real-time OoD detection
  • Comprehensive Evaluation: Built-in metrics (AUROC, AUPR, FPR@95) and visualization tools

Table of Contents


Installation

Prerequisites

  • Python 3.9 or higher
  • CUDA-capable GPU (recommended for computer vision tasks and LLMs)
Using pip
# Clone the repository
git clone <repository-url>
cd runia_core

# Create a virtual environment (recommended)
python -m venv runia_env
source runia_env/bin/activate  # On Windows: runia_env\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install the package
pip install .
Using conda
# Create a conda environment
conda create -n runia_env python=3.9
conda activate runia_env

# Install dependencies and package
pip install -r requirements.txt
pip install .
Using uv

See uv documentation for installation instructions. Dependencies are installed on first run. Therefore, you can run any script with:

# Directly run any script (dependencies are installed on first run)
uv run your_script.py

Quick Start

Computer Vision OoD Detection with Latent Space Methods LaREx

import torch
from runia_core.evaluation import Hook, get_latent_representation_mcd_samples, get_dl_h_z
from runia_core.inference import LaRExInference, MCSamplerModule, LaREMPostprocessor
from runia_core import apply_pca_ds_split

# Setup model with dropout/dropblock layer
model = YourModel()
hooked_layer = Hook(model.dropout_layer)
model.eval()

# Extract MC samples and compute entropy
latent_samples = get_latent_representation_mcd_samples(
    model, dataloader, n_samples=16, hooked_layer
)
_, entropy_samples = get_dl_h_z(latent_samples, mcd_samples_nro=16)

# Setup OoD detector
pca_train, pca_transform = apply_pca_ds_split(entropy_samples, nro_components=256)
detector = LaREMPostprocessor()
detector.setup(pca_train)

# Inference on new images
inference_module = LaRExInference(
    dnn_model=model,
    detector=detector,
    mcd_sampler=MCSamplerModule,
    pca_transform=pca_transform,
    mcd_samples_nro=16,
    layer_type="Conv"
)

prediction, confidence_score = inference_module.get_score(test_image, layer_hook=hooked_layer)

LLM Uncertainty Estimation (White-box methods) for Hallucination Detection

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
from runia_core.llm_uncertainty import compute_uncertainties

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

gen_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=1.0)

# Define uncertainty methods
uncertainty_requests = [
    {"method_name": "semantic_entropy"},
    {"method_name": "perplexity"},
    {"method_name": "eigen_score"},
    {"method_name": "RAUQ", "token_aggregation": "original", "head_aggregation": "mean_heads"}
]

# Compute uncertainties
generated_text, scores = compute_uncertainties(
    model, tokenizer, "Your prompt here",
    uncertainty_requests, gen_config, num_samples=10
)

Supported Tasks and Architectures

Computer Vision

Task Datasets (In-Dist) Datasets (OoD) Architectures
Image Classification CIFAR10 FMNIST, SVHN, Places365, Textures, iSUN, LSUN ResNet-18, ResNet-18 + Spectral Norm
Object Detection BDD100k, Pascal VOC COCO, OpenImages Faster RCNN, YOLOv8, RT-DETR, Deformable DETR, OWLv2
Semantic Segmentation Woodscape, Cityscapes Woodscape-anomalies, Cityscapes-anomalies DeepLabv3+, U-Net

Natural Language Processing

Task Datasets (In-Dist) Datasets (OoD) Architectures
Hallucination Detection SQuADv2 TriviaQA, Natural Questions, HotpotQA Llama-3.1, DistilBERT-base

Note: For epistemic uncertainty estimation in computer vision tasks, models should include dropout or DropBlock2D layers to enable Monte Carlo Dropout sampling. However, RunIA can be used with any architecture by hooking any latent layer and extracting features for OoD detection with LaRED/LaREM, without the need for MC sampling. In this case, the latent space methods will be applied on the extracted features instead of the entropy from MC samples.


Usage Examples

OOD detection in Object Detection. Evaluation Pipeline

Computer Vision: OoD Detection in Object Detection

1. Evaluation Pipeline

Evaluate OOD detection methods on In-Distribution (InD) vs Out-of-Distribution (OoD) datasets in Object Detection. The library is focused on latent space methods but can compute 10+ other methods (MSP, Energy, Mahalanobis, kNN, ViM, DDU, DICE, ReAct, etc.):

import torch
from omegaconf import OmegaConf
from runia_core.feature_extraction import Hook, BoxFeaturesExtractor, get_aggregated_data_dict, associate_precalculated_baselines_with_raw_predictions
from runia_core.evaluation import log_evaluate_larex, calculate_all_baselines, remove_latent_features
from runia_core.inference.abstract_classes import get_baselines_thresholds

# Setup
BASELINES_NAMES = ["msp", "gen", "energy", "mdist", "knn", "ddu"]
LATENT_SPACE_POSTPROCESSORS = ["MD"]
cfg = OmegaConf.create({"ood_datasets": ['ood_dataset_name']})
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load model and hook dropout/dropblock layer
model = YourModel.load_from_checkpoint("model.pt")
hooked_layers = [Hook(model.my_latent_layer)]
model.to(device).eval()

# Instantiate BoxFeaturesExtractor for object detection architectures (e.g., Faster RCNN, YOLOv8, RT-DETR, Deformable DETR, OWLv2)
samples_extractor = BoxFeaturesExtractor(
    model=model,
    hooked_layers=hooked_layers,
    device=device,
    roi_output_sizes=[16],
    roi_sampling_ratio=-1,
    return_raw_predictions=False,
    return_stds=False,
    hook_layer_output=True,
    architecture="rcnn"
)

# Extract latent samples and features for InD and OoD datasets
ind_data_dict = {
  "train": samples_extractor.get_ls_samples(ind_train_data_loader, predict_conf=0.5),
  "valid": samples_extractor.get_ls_samples(ind_val_data_loader, predict_conf=0.5)
}
aggregated_ind_data_dict = dict()
# Track images with no objects found from varying the confidence of predictions
ind_no_obj = dict()
non_empty_preds_ind_im_ids = dict()

ood_data_dict = {"ood_dataset_name": samples_extractor.get_ls_samples(ood_data_loader, predict_conf=0.5)} 
aggregated_ood_data_dict = dict()
non_empty_preds_ood_im_ids = dict()
ood_no_obj = dict()

# Preprocess ID datasets and aggregate data in the format required for evaluation (one entry per image with aggregated features from all predicted boxes)
for split in ind_data_dict:    
    aggregated_ind_data_dict, ind_no_obj, non_empty_preds_ind_im_ids = get_aggregated_data_dict(
        data_dict=ind_data_dict,
        dataset_name=split,
        aggregated_data_dict=aggregated_ind_data_dict,
        no_obj_dict=ind_no_obj,
        non_empty_predictions_ids=non_empty_preds_ind_im_ids,
        probs_as_logits=False
    )

# Preprocess OOD datasets and aggregate data in the format required for evaluation
for ood_dataset_name in cfg.ood_datasets:
    aggregated_ood_data_dict, ood_no_obj, non_empty_preds_ood_im_ids = get_aggregated_data_dict(
        data_dict=ood_data_dict,
        dataset_name=ood_dataset_name,
        aggregated_data_dict=aggregated_ood_data_dict,
        no_obj_dict=ood_no_obj,
        non_empty_predictions_ids=non_empty_preds_ood_im_ids,
        probs_as_logits=False
    )
    
aggregated_ind_data_dict, aggregated_ood_data_dict, ood_baselines_scores_dict = calculate_all_baselines(
    baselines_names=BASELINES_NAMES,
    ind_data_dict=aggregated_ind_data_dict,
    ood_data_dict=aggregated_ood_data_dict,
    fc_params=None,
    cfg=cfg,
    num_classes=10 if cfg.ind_dataset == "bdd" else 20
)

aggregated_ind_data_dict, aggregated_ood_data_dict = remove_latent_features(
    id_data=aggregated_ind_data_dict,
    ood_data=aggregated_ood_data_dict,
    ood_names=cfg.ood_datasets
)
baselines_thresholds = get_baselines_thresholds(
    baselines_names=BASELINES_NAMES,
    baselines_scores_dict=aggregated_ind_data_dict,
    z_score_percentile=cfg.z_score_thresholds
)

# Associate calculated baselines scores with raw predictions dicts
# OOD
for ood_dataset_name in cfg.ood_datasets:
    ood_data_dict[ood_dataset_name] = associate_precalculated_baselines_with_raw_predictions(
        data_dict=ood_data_dict[ood_dataset_name],
        dataset_name=ood_dataset_name,
        ood_baselines_dict=ood_baselines_scores_dict,
        baselines_names=BASELINES_NAMES,
        non_empty_ids=non_empty_preds_ood_im_ids[ood_dataset_name],
        is_ood=True
    )
# InD
ind_data_dict["valid"] = associate_precalculated_baselines_with_raw_predictions(
    data_dict=ind_data_dict["valid"],
    dataset_name="valid",
    ood_baselines_dict=aggregated_ind_data_dict,
    baselines_names=BASELINES_NAMES,
    non_empty_ids=non_empty_preds_ind_im_ids["valid"],
    is_ood=False
)
    
metrics_df, best_postprocessors_dict, postprocessor_thresholds, aggregated_ood_data_dict = log_evaluate_larex(
    cfg=cfg,
    baselines_names=BASELINES_NAMES,
    ind_data_dict=aggregated_ind_data_dict,
    ood_data_dict=aggregated_ood_data_dict,
    ood_baselines_scores=ood_baselines_scores_dict,
    mlflow_run_name="my_run_name",
    mlflow_logging=False,
    visualize_score=LATENT_SPACE_POSTPROCESSORS[0],
    postprocessors=LATENT_SPACE_POSTPROCESSORS,
)
print(metrics_df)
OOD detection in Object Detection. Inference pipeline

2. Inference Pipeline

Deploy OoD detection in production with the inference module, using the best postprocessor from evaluation (e.g., LaREM or LaRED) or any other method from the evaluation pipeline by setting the appropriate confidence threshold for predictions to be considered in inference (can be tuned based on evaluation results):

import torch
from runia_core.feature_extraction import get_aggregated_data_dict
from runia_core.inference import postprocessors_dict, ObjectLevelInference, postprocessor_input_dict

METHOD = "energy"  # or "MD" for LaREM, "KDE" for LaRED, or any other method from the evaluation pipeline
LATENT_SPACE_METHOD = False  # Set to True if using latent space postprocessors (LaREM or LaRED), False for other methods
INFERENCE_THRESHOLD = 0.5  # Set the confidence threshold for predictions to be considered in inference (can be tuned based on evaluation results)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load pre-calculated latent space activations, features or logits for InD datasets
# Extracted using the BoxFeaturesExtractor or any other method and saved in the required format for evaluation and inference
# InD
ind_data_splits = ["train", "valid"]
ind_data_dict = dict()
aggregated_ind_data_dict = dict()
non_empty_preds_ind_im_ids = dict()
# Track images with no objects found from varying the confidence of predictions
ind_no_obj = dict()
for split in ind_data_splits:
    ind_file_name = f"my/file/name_{split}.pt"
    # Load InD latent space activations
    ind_data_dict[f"{split}"] = torch.load(ind_file_name, map_location=device)
    aggregated_ind_data_dict, ind_no_obj, non_empty_preds_ind_im_ids = get_aggregated_data_dict(
        data_dict=ind_data_dict,
        dataset_name=split,
        aggregated_data_dict=aggregated_ind_data_dict,
        no_obj_dict=ind_no_obj,
        non_empty_predictions_ids=non_empty_preds_ind_im_ids,
        probs_as_logits=False
    )

postprocessor = postprocessors_dict[METHOD](flip_sign=False)
postprocessor.setup(ind_train_data=aggregated_ind_data_dict["valid logits"])

# Load model
model = YourModel.load_from_checkpoint("model.pt")
hooked_layers = []  # Specify the layers to hook for feature extraction if using latent space methods (e.g., LaREM or LaRED), otherwise can be left empty for other methods that do not require feature extraction

inference_module = ObjectLevelInference(
    model=model,
    postprocessor=postprocessor,
    architecture="RTDETR",  # or "rcnn", "yolo", "deformable_detr", "owlv2"
    latent_space_method=LATENT_SPACE_METHOD,
    postprocessor_input=postprocessor_input_dict[METHOD] if not LATENT_SPACE_METHOD else ["latent_space_means"],
    hooked_layers=hooked_layers,
    roi_output_sizes=[16],
)

with torch.no_grad():
    # Perform inference on new images
    for idx, input_im in enumerate(my_data_loader):
        predictions, scores = inference_module.get_score(input_im, predict_conf=INFERENCE_THRESHOLD)
LLM uncertainty estimation for hallucination detection

LLM Uncertainty Estimation

Detect hallucinations and measure uncertainty in LLM outputs:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
from runia_core.llm_uncertainty import compute_uncertainties

# Load model
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

gen_config = GenerationConfig(
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    temperature=1.0
)

# Define uncertainty methods
uncertainty_methods = [
    {"method_name": "semantic_entropy"},      # Semantic uncertainty
    {"method_name": "eigen_score"},            # Eigenvalue-based score
    {"method_name": "perplexity"},             # Model perplexity
    {"method_name": "normalized_entropy"},     # Normalized entropy
    {"method_name": "generation_entropy"},     # Generation-level entropy
    {
        "method_name": "RAUQ",                 # Attention-based uncertainty
        "token_aggregation": "original",
        "head_aggregation": "mean_heads",
        "alphas": [0.2, 0.4, 0.6],
        "ablation": True
    }
]

# Compute uncertainties
text, scores = compute_uncertainties(
    model,
    tokenizer,
    prompt="What is the capital of France?",
    uncertainty_requests=uncertainty_methods,
    gen_config=gen_config,
    num_samples=10
)

print(f"Generated: {text}")
print(f"Uncertainty Scores: {scores}")

API Overview

Core Modules

Module Description
runia.evaluation MC sampling, entropy computation, baselines, OOD evaluation metrics
runia.inference Production-ready inference with LaRED/LaREM postprocessors and other baselines
runia.feature_extraction Image-level and object-level feature extraction
runia.llm_uncertainty LLM uncertainty and hallucination detection methods
runia.dimensionality_reduction PCA and other dimensionality reduction utilities

Key Classes and Functions

Evaluation:

  • Hook: Capture layer outputs during forward pass
  • BoxFeaturesExtractor: Extract latent samples and features for object detection architectures
  • FastMCDSamplesExtractor: Efficiently extract latent samples for image classification architectures
  • log_evaluate_larex(): Evaluate LaREx and baselines with metrics

Inference:

  • postprocessors_dict: Dictionary of available postprocessors for inference
  • ObjectLevelInference: Inference module for object detection architectures
  • LaRExInference: Main inference module for OoD detection using latent space methods
  • LaREMPostprocessor: Mahalanobis distance-based detector (recommended) for latent space postprocessing

LLM Uncertainty:

  • compute_uncertainties(): Compute multiple uncertainty scores for LLM outputs

Hardware Requirements

  • CPU: Supported but slow for computer vision tasks
  • GPU: Required for efficient inference on object detection and segmentation
  • Memory: Varies by model size (8GB+ GPU memory recommended)

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests by following the Contribution Guidelines.

License

See LICENSE.txt for details.

Authors


References

Publications


About

Uncertainty estimation methods for Deep Neural Networks (DNNs) in computer vision and natural language processing, with a focus on out-of-distribution (OOD) detection.

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages