
UDAN-CLIP 🌊

Afrah Shaahid, Muzammil Behzad
King Fahd University of Petroleum and Minerals · SDAIA-KFUPM Joint Research Center for Artificial Intelligence

Project Page · GitHub · arXiv · License: MIT

UDAN-CLIP (Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning) is a diffusion-based framework for underwater image enhancement. By integrating CLIP-guided semantic alignment, spatial attention mechanisms, and domain-adaptive diffusion modeling, our method restores color fidelity, contrast, and fine structures in images degraded by underwater scattering and absorption.

Underwater images suffer from complex degradations including light absorption, scattering, color casts, and artifacts—making enhancement critical for effective object detection, recognition, and scene understanding in aquatic environments. Existing methods, especially diffusion-based approaches, typically rely on synthetic paired datasets due to the scarcity of real underwater references, introducing bias and limiting generalization. Furthermore, fine-tuning these models can degrade learned priors, resulting in unrealistic enhancements due to domain shifts.

UDAN-CLIP addresses these challenges through an image-to-image diffusion framework pre-trained on synthetic underwater datasets and enhanced with a customized CLIP-based classifier, a spatial attention module, and a novel CLIP-Diffusion loss. The classifier preserves natural in-air priors and semantically guides the diffusion process, while the spatial attention module focuses on correcting localized degradations such as haze and low contrast.

Pipeline diagram of UDAN-CLIP (teaser figure).

Overview of UDAN-CLIP


UDAN-CLIP achieves high-quality underwater image enhancement through four key components working in harmony:

  1. Domain-Adaptive Diffusion Module: Learns underwater degradation distributions and progressively restores clean images through a reverse diffusion process, preserving natural in-air priors while adapting to underwater domains.

  2. CLIP-Guided Classifier: Leverages vision-language alignment to semantically guide the enhancement process, ensuring restored images maintain semantic consistency with textual descriptions of "clear underwater scenes."

  3. Spatial Attention Mechanism: Focuses computational resources on heavily degraded regions (e.g., haze, backscatter, low-contrast areas), enabling targeted correction where it matters most.

  4. CLIP-Diffusion Loss: A novel loss function that strengthens visual-textual alignment during reverse diffusion, helping maintain semantic consistency throughout the enhancement pipeline.
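
To make the CLIP-Diffusion loss concrete, here is a minimal PyTorch sketch of the underlying idea: score a denoised estimate against a text prompt with CLIP and penalize low similarity. The function name clip_diffusion_loss, the prompt text, and the preprocessing are our own illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()

# Illustrative prompt; the actual text conditioning is defined in the paper.
text_tokens = clip.tokenize(["a clear underwater scene"]).to(device)

def clip_diffusion_loss(x0_hat):
    """Penalize low CLIP similarity between a denoised estimate x0_hat
    (B, 3, H, W, values in [0, 1]) and the target text prompt."""
    # Resize to CLIP's input resolution and apply CLIP's normalization.
    x = F.interpolate(x0_hat, size=(224, 224), mode="bicubic", align_corners=False)
    mean = x.new_tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
    std = x.new_tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)
    x = ((x - mean) / std).to(clip_model.dtype)

    img = clip_model.encode_image(x)
    txt = clip_model.encode_text(text_tokens)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)

    # Higher cosine similarity to the prompt -> lower loss.
    return (1.0 - (img * txt).sum(dim=-1)).mean()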

Quick Start

Step 1: Clone the Repository

git clone https://github.com/BRAIN-Lab-AI/UDAN-CLIP.git
cd UDAN-CLIP

Step 2: Set Up Environment

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Step 3: Download Datasets

Download the following datasets and place them in the data/ directory:

  • T200 Dataset: Underwater images from turbid environments
  • Color-Checker7: Color calibration dataset
  • C60 Dataset: Comprehensive underwater image collection

Dataset links and preparation scripts will be provided soon.

Step 4: Configuration

Edit the config/config.yaml file with your model settings:

model:
  diffusion_steps: 1000
  clip_model: "ViT-B/32"
  spatial_attention: true
  
training:
  batch_size: 8
  learning_rate: 1e-4
  epochs: 100
  
data:
  dataset: "T200"
  image_size: 256
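
If you need these settings programmatically (for example, in a custom script), the file can be read with PyYAML. A minimal sketch, assuming config/config.yaml stays plain YAML as shown above:

import yaml  # pip install pyyaml

with open("config/config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["model"]["diffusion_steps"])  # 1000
# Note: YAML 1.1 parses a bare 1e-4 as a string (no decimal point), so coerce it.
lr = float(cfg["training"]["learning_rate"])
print(cfg["data"]["image_size"])  # 256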

Launch UDAN-CLIP

Inference on Single Image

python infer.py --input path/to/image.jpg --output results/

Inference Demo

python infer_demo.py

Batch Processing

python sample.py --input_dir data/test_images/ --output_dir results/

Training from Scratch

python train.py --config config/config.yaml --gpu 0

Evaluation

python eval.py
python final_calculate_metrics.py
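
The repository's own metric code lives in metrics_util.py and final_calculate_metrics.py. For a quick standalone sanity check of the full-reference metrics, here is a sketch using scikit-image; the file paths are placeholders:

from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder paths: point these at an enhanced output and its reference image.
enhanced = io.imread("results/sample.png")
reference = io.imread("data/T200/reference/sample.png")

psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")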

Project Structure

├── __pycache__/
├── clip_model/
│   └── ViT-B-32.pt
├── config/
│   └── config.yaml
├── core/
├── data/
│   ├── T200/
│   ├── Color-Checker7/
│   └── C60/
├── misc/
├── model/
│   ├── __pycache__/
│   ├── ddpm_modules/
│   ├── sr3_modules/
│   ├── __init__.py
│   ├── base_model.py
│   ├── model.py
│   └── networks.py
├── static/
│   └── images/
│       ├── architecture_fig1.png
│       ├── architecture_fig2.png
│       ├── C60_comparison.png
│       ├── T200_comparison.png
│       ├── Color-Checker_comparison.png
│       ├── heatmap.png
│       ├── intro_fig.png
│       ├── plot1_T200.png
│       ├── plot2_Color-Checker7.png
│       ├── plot3_C60.png
│       ├── updated_zoomedin1.png
│       ├── updated_zoomedin2.png
│       └── results_table.png
├── LICENSE
├── README.md
├── eval.py
├── final_calculate_metrics.py
├── index.html
├── infer.py
├── infer_demo.py
├── metrics_util.py
├── requirements.txt
├── sample.py
└── train.py

Key Features

Advanced Diffusion Architecture

  • Domain-Adaptive Pretraining: Leverages underwater datasets while preserving natural image priors
  • Progressive Restoration: Multi-step denoising for high-quality output (one generic denoising step is sketched after this list)
  • Semantic Guidance: CLIP-based conditioning ensures visually coherent results
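
For intuition on what one restoration step looks like, here is the textbook DDPM reverse update; this is a generic sketch, not necessarily the sampler UDAN-CLIP uses:

import torch

def ddpm_reverse_step(x_t, t, eps_pred, betas):
    """Generic DDPM update x_t -> x_{t-1}, given the network's noise
    prediction eps_pred and the noise schedule betas (1-D tensor)."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    mean = (x_t - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps_pred) / torch.sqrt(alphas[t])
    if t == 0:
        return mean  # final step adds no fresh noise
    return mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)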

CLIP-Guided Enhancement

  • Vision-Language Alignment: Leverages CLIP's multimodal understanding for semantic consistency
  • Textual Conditioning: Uses natural language prompts to guide enhancement direction
  • Contrastive Learning: Employs contrastive objectives to separate enhanced from degraded features
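
One common way to realize such a contrastive objective is to push the enhanced image's CLIP embedding toward a "clear" prompt and away from a "degraded" one. A hypothetical InfoNCE-style sketch, not necessarily the loss used in this repository:

import torch
import torch.nn.functional as F

def prompt_contrastive_loss(img_feat, clear_txt, degraded_txt, tau=0.07):
    """img_feat: (B, D) CLIP image embeddings; clear_txt / degraded_txt:
    (D,) CLIP text embeddings. Prefers the 'clear' prompt (class 0)."""
    img = F.normalize(img_feat, dim=-1)
    pos = F.normalize(clear_txt, dim=-1)
    neg = F.normalize(degraded_txt, dim=-1)
    logits = torch.stack([(img * pos).sum(-1), (img * neg).sum(-1)], dim=-1) / tau
    target = torch.zeros(img.shape[0], dtype=torch.long, device=img.device)
    return F.cross_entropy(logits, target)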

Spatial Attention Mechanism

  • Degradation Localization: Identifies and prioritizes heavily degraded regions
  • Adaptive Focus: Dynamically allocates computational resources based on degradation severity
  • Edge Preservation: Maintains structural integrity while removing artifacts
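
As a rough illustration of how such a module can be built, here is a CBAM-style sketch under our own assumptions, not the exact module used in UDAN-CLIP:

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Pools features across channels, then predicts a per-pixel gate in
    [0, 1] that reweights the feature map spatially."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)   # channel-average map
        mx, _ = x.max(dim=1, keepdim=True)  # channel-max map
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate  # emphasize hazy / low-contrast regions

Multiplying by the gate lets later layers spend capacity on the heavily degraded regions described in the list above.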

Comprehensive Evaluation

  • Multiple Metrics: PSNR, SSIM, UIQM, UCIQE, and CPBD for thorough assessment
  • Benchmark Datasets: Evaluated on T200, Color-Checker7, and C60
  • Visual Comparisons: Qualitative results against state-of-the-art methods

Results

Quantitative Comparison

Full results table: static/images/results_table.png

UDAN-CLIP consistently outperforms baseline methods across all evaluation metrics and datasets:

Dataset          Metric   Improvement over SOTA
T200             PSNR     +16.15
T200             SSIM     +11.38
Color-Checker7   UIQM     +0.064
C60              CPBD     +0.165

Qualitative Results

Comparison on the C60 dataset, showing superior color correction and detail recovery.

Results on turbid T200 images, demonstrating haze removal and contrast enhancement.

Color-checker evaluation, showing accurate color restoration.

Detail Enhancement

Preservation of fine textures and structural details in complex foreground regions: UDAN-CLIP recovers intricate patterns (e.g., coral formations, reef textures) that are lost or blurred in competing approaches.

Recovery of fine details in challenging low-light underwater conditions: UDAN-CLIP reveals structural elements (e.g., facial features, coin engravings, fish scales, and pool textures) that remain obscured in the CLIP-UIE baseline due to severe light absorption and scattering.

Quantitative Plots

Per-dataset metric plots for T200, Color-Checker7, and C60 (plot1_T200.png, plot2_Color-Checker7.png, plot3_C60.png in static/images/).

Performance Heatmap

Performance heatmap showing enhancement quality across different degradation levels.

Community Contributions

We welcome contributions from the community! Here are some ways you can help:

  • Report bugs: Open an issue if you encounter any problems
  • Suggest improvements: Share ideas for enhancing the model or codebase
  • Add features: Submit pull requests for new functionality
  • Share results: Showcase UDAN-CLIP applications in your research

We are particularly interested in:

  • Extending to underwater video enhancement
  • Integration with underwater robotics platforms
  • Adaptation for specific underwater environments (coral reefs, deep sea, etc.)
  • Lightweight versions for edge deployment

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you find UDAN-CLIP helpful for your research, please cite our paper:

@article{shaahid2025udanclip,
  title={Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement},
  author={Shaahid, Afrah and Behzad, Muzammil},
  journal={arXiv preprint arXiv:2505.19895},
  year={2025}
}

Acknowledgements

We thank the King Fahd University of Petroleum and Minerals and SDAIA-KFUPM JRC for Artificial Intelligence for supporting this research. We also acknowledge the developers of CLIP and the diffusion models that inspired this work.

Project Page

Visit our project website for more details, visual results, and updates.

Contact

For questions or collaborations, please contact:


⭐ If you find UDAN-CLIP useful, please consider starring the repository! ⭐
