A research project exploring Vision-Language-Action (VLA) models for autonomous driving. VLAD combines visual perception with language understanding to predict driving actions and trajectories in simulation, using CARLA and the Bench2Drive dataset.
VLAD integrates:
- Vision-Language Models (VLMs) — Qwen3-VL for multimodal reasoning over ego-view images
- DriveFusion — Transformer-based fusion of image embeddings with diffusion for trajectory prediction
- Diffusion Policy — Action prediction conditioned on visual and state inputs
- Bench2Drive — Large-scale driving dataset from CARLA (HuggingFace: rethinklab/Bench2Drive)
The model predicts future waypoints and actions from camera history, ego-state, and navigation commands, suitable for end-to-end autonomous driving in simulation.
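As a rough illustration of that prediction interface, the sketch below denoises a short horizon of 2-D waypoints from noise, conditioned on a fused context vector, in the style of a diffusion policy. All names and dimensions here (`predict_waypoints`, the linear "denoiser" stub) are hypothetical placeholders, not the project's actual API; the real denoiser is a learned, conditioned network.

```python
import numpy as np

def predict_waypoints(context, horizon=8, steps=10, seed=0):
    """Toy diffusion-style sampler: start from Gaussian noise and iteratively
    refine a (horizon, 2) trajectory toward a context-conditioned guess.
    Stands in for the learned denoising network."""
    rng = np.random.default_rng(seed)
    traj = rng.normal(size=(horizon, 2))              # pure noise at t = T
    # Linear "denoiser" stub: maps the context to a straight-line trajectory.
    target = np.outer(np.arange(1, horizon + 1), context[:2])
    for t in range(steps):
        alpha = (t + 1) / steps                       # simple noise schedule
        traj = (1 - alpha) * traj + alpha * target    # blend toward the guess
    return traj

# Context from a (hypothetical) fused image / ego-state embedding.
ctx = np.array([0.5, 0.1, 0.0, 0.2])
wps = predict_waypoints(ctx)
print(wps.shape)  # (8, 2): eight future (x, y) waypoints
```

The real model replaces the linear stub with a transformer conditioned on image embeddings, ego-state, and the navigation command.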
```
├── src/
│   ├── models/       # DriveFusion, diffusion policy, diffusion transformer
│   ├── dataloaders/  # Bench2Drive dataset loaders (single-frame & history)
│   ├── vlm/          # Qwen VLM wrappers, embedding cache
│   ├── driver/       # CARLA driver with VLM backbone
│   └── utils/        # Bench2Drive parsing, visualization
├── scripts/          # Setup, dataset download, testing
├── media/            # Example ego-view images for VLM testing
└── carla.sh          # CARLA server launcher (Docker)
```
Example ego-view image used for VLM queries:
On Bridges-2:

```bash
source scripts/setup_env.sh
# Then: conda activate ./conda/vlad
```

Local: create a conda env with Python 3.8 and install `src/requirements.txt` (PyTorch 2.2, CARLA 0.9.15, etc.).
Open two terminals:

Terminal 1 — start the CARLA server:

```bash
./carla.sh
```

Terminal 2 — run the client:

```bash
python3 src/CarlaClientTest.py
```

This spawns a vehicle in Town02 and runs autopilot for 60 seconds. CARLA listens on port 2000.
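A minimal client along those lines, sketched with the CARLA 0.9.x Python API (the actual `CarlaClientTest.py` may differ in details; this requires the server from `./carla.sh` to be running):

```python
import time
import carla

# Connect to the CARLA server started by ./carla.sh (default port 2000).
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.load_world("Town02")

# Spawn a vehicle at the first predefined spawn point.
blueprint = world.get_blueprint_library().filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(blueprint, spawn_point)

# Let the built-in autopilot drive for 60 seconds, then clean up.
vehicle.set_autopilot(True)
time.sleep(60)
vehicle.destroy()
```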
Bench2Drive Base (full set):

```bash
scripts/download_bench2drive_base.sh
```

Test the dataloader:

```bash
python3 scripts/test_dataloader.py
```

This loads the Bench2Drive dataset, prints statistics, and saves sample visualizations (images with overlaid waypoints) to `output/samples/`.
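The overlay step amounts to projecting ego-frame waypoints into the camera image. A simplified pinhole version is sketched below; the intrinsics, camera height, and frame conventions here are illustrative, not the dataset's actual calibration:

```python
import numpy as np

def project_waypoints(waypoints, K, cam_height=1.6):
    """Project ground-plane waypoints (x forward, y left, in meters) into
    pixel coordinates with a pinhole camera mounted `cam_height` m above
    the ground. Returns (N, 2) pixel coords for points in front of the camera."""
    pts = []
    for x, y in waypoints:
        if x <= 0:               # behind the camera plane; not visible
            continue
        # Camera frame: right = -y, down = cam_height, forward = x.
        p = K @ np.array([-y, cam_height, x])
        pts.append(p[:2] / p[2])  # perspective divide
    return np.array(pts)

# Illustrative intrinsics for a 1600x900 image (fx = fy = 800).
K = np.array([[800.0,   0.0, 800.0],
              [  0.0, 800.0, 450.0],
              [  0.0,   0.0,   1.0]])
uv = project_waypoints([(5.0, 0.0), (10.0, 1.0), (-1.0, 0.0)], K)
print(uv)  # two visible points; the waypoint behind the camera is dropped
```

Drawing small circles at these pixel coordinates (e.g. with OpenCV's `cv2.circle`) yields the kind of overlay the test script saves.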
- Diffusion Policy:

  ```bash
  python3 src/models/train_diffusion_policy.py
  ```

- DriveFusion:

  ```bash
  python3 src/models/train_drivefusion.py
  ```
Both use Hydra for config and Weights & Biases for logging.
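Configs follow the usual Hydra layout. A hypothetical fragment, with file name and keys chosen purely for illustration (the repo's actual configs may be structured differently):

```yaml
# conf/train_diffusion_policy.yaml (illustrative, not the repo's actual config)
defaults:
  - _self_

model:
  horizon: 8            # number of predicted waypoints
  diffusion_steps: 100  # denoising iterations at inference

trainer:
  max_epochs: 50
  accelerator: gpu

wandb:
  project: vlad
  log_model: false
```

Any key can then be overridden from the command line, e.g. `python3 src/models/train_diffusion_policy.py trainer.max_epochs=10`.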
Key dependencies: PyTorch 2.2, CARLA 0.9.15, PyTorch Lightning, HuggingFace Hub, Hydra, OpenCV, Pandas. See src/requirements.txt for full list.
CMU Intro to Deep Learning 11785 Project — VLA for Autonomous Driving
