UDAN-CLIP (Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning) is a diffusion-based framework for underwater image enhancement. By integrating CLIP-guided semantic alignment, spatial attention mechanisms, and domain-adaptive diffusion modeling, our method restores color fidelity, contrast, and fine structures in images degraded by underwater scattering and absorption.
Underwater images suffer from complex degradations including light absorption, scattering, color casts, and artifacts—making enhancement critical for effective object detection, recognition, and scene understanding in aquatic environments. Existing methods, especially diffusion-based approaches, typically rely on synthetic paired datasets due to the scarcity of real underwater references, introducing bias and limiting generalization. Furthermore, fine-tuning these models can degrade learned priors, resulting in unrealistic enhancements due to domain shifts.
UDAN-CLIP addresses these challenges through an image-to-image diffusion framework pre-trained on synthetic underwater datasets and enhanced with a customized CLIP-based classifier, a spatial attention module, and a novel CLIP-Diffusion loss. The classifier preserves natural in-air priors and semantically guides the diffusion process, while the spatial attention module focuses on correcting localized degradations such as haze and low contrast.
Here is a pipeline diagram of UDAN-CLIP:

UDAN-CLIP achieves high-quality underwater image enhancement through four key components working in harmony:
- Domain-Adaptive Diffusion Module: Learns underwater degradation distributions and progressively restores clean images through a reverse diffusion process, preserving natural in-air priors while adapting to underwater domains.
- CLIP-Guided Classifier: Leverages vision-language alignment to semantically guide the enhancement process, ensuring restored images maintain semantic consistency with textual descriptions of "clear underwater scenes."
- Spatial Attention Mechanism: Focuses computational resources on heavily degraded regions (e.g., haze, backscatter, low-contrast areas), enabling targeted correction where it matters most.
- CLIP-Diffusion Loss: A novel loss function that strengthens visual-textual alignment during reverse diffusion, helping maintain semantic consistency throughout the enhancement pipeline.
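The CLIP-Diffusion loss described above can be sketched as a weighted sum of the usual diffusion denoising objective and a CLIP-style cosine-alignment term. The snippet below is an illustrative toy only: it uses plain Python lists in place of real tensors, and `lambda_clip` is a hypothetical weighting hyperparameter, not a value from the paper.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mse(pred, target):
    # Mean squared error: the standard noise-prediction diffusion objective.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def clip_diffusion_loss(pred_noise, true_noise, image_emb, text_emb, lambda_clip=0.1):
    # Toy combined loss: diffusion MSE plus a term that pulls the enhanced
    # image's CLIP embedding toward the embedding of a prompt such as
    # "a clear underwater scene". (lambda_clip is a hypothetical weight.)
    diffusion_term = mse(pred_noise, true_noise)
    alignment_term = 1.0 - cosine_similarity(image_emb, text_emb)
    return diffusion_term + lambda_clip * alignment_term
```

With perfect noise prediction and perfectly aligned embeddings, the combined loss is zero; misalignment adds a penalty scaled by `lambda_clip`.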
```bash
# Clone the repository
git clone https://github.com/BRAIN-Lab-AI/UDAN-CLIP.git
cd UDAN-CLIP

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Download the following datasets and place them in the data/ directory:
- T200 Dataset: Underwater images from turbid environments
- Color-Checker7: Color calibration dataset
- C60 Dataset: Comprehensive underwater image collection
Dataset links and preparation scripts will be provided soon.
Edit the config/config.yaml file with your model settings:

```yaml
model:
  diffusion_steps: 1000
  clip_model: "ViT-B/32"
  spatial_attention: true
training:
  batch_size: 8
  learning_rate: 1e-4
  epochs: 100
data:
  dataset: "T200"
  image_size: 256
```

Run inference on a single image:
```bash
python infer.py --input path/to/image.jpg --output results/
```

Run the demo script:
```bash
python infer_demo.py
```

Enhance a directory of images:
```bash
python sample.py --input_dir data/test_images/ --output_dir results/
```

Train the model:
```bash
python train.py --config config/config.yaml --gpu 0
```

Evaluate and compute metrics:
```bash
python eval.py
python final_calculate_metrics.py
```

├── __pycache__/
├── clip_model/
│ └── ViT-B-32.pt
├── config/
│ └── config.yaml
├── core/
├── data/
│ ├── T200/
│ ├── Color-Checker7/
│ └── C60/
├── misc/
├── model/
│ ├── __pycache__/
│ ├── ddpm_modules/
│ ├── sr3_modules/
│ ├── __init__.py
│ ├── base_model.py
│ ├── model.py
│ └── networks.py
├── static/
│ └── images/
│ ├── architecture_fig1.png
│ ├── architecture_fig2.png
│ ├── C60_comparison.png
│ ├── T200_comparison.png
│ ├── Color-Checker_comparison.png
│ ├── heatmap.png
│ ├── intro_fig.png
│ ├── plot1_T200.png
│ ├── plot2_Color-Checker7.png
│ ├── plot3_C60.png
│ ├── updated_zoomedin1.png
│ ├── updated_zoomedin2.png
│ └── results_table.png
├── LICENSE
├── README.md
├── eval.py
├── final_calculate_metrics.py
├── index.html
├── infer.py
├── infer_demo.py
├── metrics_util.py
├── requirement.txt
├── sample.py
└── train.py
- Domain-Adaptive Pretraining: Leverages underwater datasets while preserving natural image priors
- Progressive Restoration: Multi-step denoising for high-quality output
- Semantic Guidance: CLIP-based conditioning ensures visually coherent results
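The multi-step "progressive restoration" idea above can be illustrated with a toy reverse process: rather than jumping straight to a clean estimate, each reverse step produces a slightly cleaner sample. This is a pure-Python sketch of the control flow only, not the actual DDPM/SR3 modules in this repository; `toy_denoiser` is a made-up stand-in for the learned network.

```python
def toy_reverse_diffusion(x_noisy, denoise_step, num_steps=10):
    # Progressively refine a degraded sample over num_steps reverse steps.
    # denoise_step(x, t) returns a slightly cleaner estimate at step t.
    x = x_noisy
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t)
    return x

def toy_denoiser(x, t):
    # Hypothetical "model": each step moves the sample 30% closer to a
    # clean target of 0.0, mimicking gradual multi-step convergence.
    return [v * 0.7 for v in x]
```

After 10 steps, each value shrinks by a factor of 0.7**10 (about 0.028), showing how the estimate converges gradually across the reverse chain.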
- Vision-Language Alignment: Leverages CLIP's multimodal understanding for semantic consistency
- Textual Conditioning: Uses natural language prompts to guide enhancement direction
- Contrastive Learning: Employs contrastive objectives to separate enhanced from degraded features
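The contrastive objective mentioned above can be sketched as an InfoNCE-style loss: an enhanced-image embedding (the anchor) should score high against its matching text embedding and low against degraded or mismatched negatives. This is a generic illustration of contrastive learning, not the exact loss used by UDAN-CLIP; the `temperature` value is the common CLIP default, used here only as an assumption.

```python
import math

def dot(a, b):
    # Inner product between two embedding vectors.
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchor, positive, negatives, temperature=0.07):
    # Softmax cross-entropy over similarities: the anchor should score
    # highest against its positive and low against every negative.
    sims = [dot(anchor, positive)] + [dot(anchor, n) for n in negatives]
    scaled = [s / temperature for s in sims]
    m = max(scaled)  # subtract the max for numerical stability
    exp = [math.exp(s - m) for s in scaled]
    return -math.log(exp[0] / sum(exp))
```

When the anchor matches its positive, the loss is near zero; when it matches a negative instead, the loss grows sharply, which is what separates enhanced from degraded features in embedding space.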
- Degradation Localization: Identifies and prioritizes heavily degraded regions
- Adaptive Focus: Dynamically allocates computational resources based on degradation severity
- Edge Preservation: Maintains structural integrity while removing artifacts
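The degradation-localization and adaptive-focus ideas above can be sketched as a per-pixel attention map: a degradation score is squashed through a sigmoid into a weight in (0, 1), and a correction is blended in most strongly where the weight is high. This is a toy 1-D illustration with hypothetical function names, not the repository's actual attention module.

```python
import math

def spatial_attention_map(degradation_scores):
    # Map per-pixel degradation scores to attention weights in (0, 1)
    # via a sigmoid: heavily degraded pixels get weights near 1.
    return [1.0 / (1.0 + math.exp(-s)) for s in degradation_scores]

def apply_attention(image, correction, attn):
    # Blend the correction into the image, strongest where attention is
    # high, so lightly degraded regions are left nearly untouched.
    return [x + a * (c - x) for x, c, a in zip(image, correction, attn)]
```

A heavily degraded pixel (large positive score) is pulled almost fully to the corrected value, while a clean pixel (large negative score) keeps its original value, which is how edge and structure preservation falls out of low attention weights.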
- Multiple Metrics: PSNR, SSIM, UIQM, UCIQE, and CPBD for thorough assessment
- Benchmark Datasets: Evaluated on T200, Color-Checker7, and C60
- Visual Comparisons: Qualitative results against state-of-the-art methods
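Of the metrics listed, PSNR has the simplest closed form and can be computed directly from the mean squared error between a reference and an enhanced image. The sketch below uses the standard textbook formula on flat pixel lists; the repository's own metric scripts (eval.py, final_calculate_metrics.py) may differ in details such as color-space handling.

```python
import math

def psnr(reference, enhanced, max_val=255.0):
    # Peak signal-to-noise ratio in dB; higher means closer to the reference.
    mse = sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

SSIM, UIQM, UCIQE, and CPBD are more involved (structural, colorfulness, and sharpness statistics) and are best taken from established implementations rather than re-derived.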
UDAN-CLIP consistently outperforms baseline methods across all evaluation metrics and datasets:
| Dataset | Metric | Improvement over SOTA |
|---|---|---|
| T200 | PSNR | +16.15 |
| T200 | SSIM | +11.38 |
| Color-Checker7 | UIQM | +0.064 |
| C60 | CPBD | +0.165 |
Comparison on C60 dataset showing superior color correction and detail recovery
Results on turbid T200 images demonstrating haze removal and contrast enhancement
Color checker evaluation showing accurate color restoration
Performance heatmap showing enhancement quality across different degradation levels
We welcome contributions from the community! Here are some ways you can help:
- Report bugs: Open an issue if you encounter any problems
- Suggest improvements: Share ideas for enhancing the model or codebase
- Add features: Submit pull requests for new functionality
- Share results: Showcase UDAN-CLIP applications in your research
We are particularly interested in:
- Extending to underwater video enhancement
- Integration with underwater robotics platforms
- Adaptation for specific underwater environments (coral reefs, deep sea, etc.)
- Lightweight versions for edge deployment
This project is licensed under the MIT License - see the LICENSE file for details.
If you find UDAN-CLIP helpful for your research, please cite our paper:
@article{shaahid2025udanclip,
  title={Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement},
  author={Shaahid, Afrah and Behzad, Muzammil},
  journal={arXiv preprint arXiv:2505.19895},
  year={2025}
}

We thank the King Fahd University of Petroleum and Minerals and SDAIA-KFUPM JRC for Artificial Intelligence for supporting this research. We also acknowledge the developers of CLIP and the diffusion models that inspired this work.
Visit our project website for more details, visual results, and updates.
For questions or collaborations, please contact:
- Afrah Shaahid: afrahshaahid@outlook.com
- Muzammil Behzad: muzammil.behzad@kfupm.edu.sa