LEAP offers a few bootcamps on Climate Data Science each year using the LEAP-Pangeo Jupyter Hub. Please refer to the LEAP-Pangeo Technical Documentation for more information on LEAP-Pangeo.
Please use this link to join our Slack group so that you can communicate during and after the bootcamp: BootcampSlack
Check out the LEAP-Pangeo FAQs if you run into problems.
Instructor: Qingyuan Yang, Ph.D., Department of Earth and Environmental Engineering, Columbia University
Learning Goals:
- Learn to process, analyze and visualize climate data in the cloud with scientific Python using the LEAP-Pangeo Jupyter Hub;
- Gain an understanding of fundamental climate concepts, alongside common diagnostic statistics and techniques.
The workplan for Day 1 is outlined below. We mainly follow the 2023 Train-the-Trainer Bootcamp workshop, Day 1: Climate and Geospatial Data Analysis, by Prof. Ryan Abernathey. The day is divided into lecture sessions and interactive studios, giving participants the chance to solidify their learning through hands-on practice. Students are asked to mirror each of the instructor's steps during lecture sessions using the LEAP-Pangeo Hub and are encouraged to work in groups.
🪄 Click on the links here to open the materials up directly in the LEAP-Pangeo Hub
The lectures are based on the book An Introduction to Earth and Environmental Data Science (Book GitHub repo). You can find the lecture notebooks here (navigate to `src/lectures` to find the notebooks).
8:30 a.m.: Check-in / Breakfast
9:00 a.m.: Introduction to Bootcamp + Team
9:05 a.m.: LEAP-Pangeo Login and Pre-Lecture Preparation
9:15 a.m.: Introduction of Climate Data & LEAP-Pangeo (Record)
9:25 a.m.: Session 1: Basic Xarray (Record)
10:55 a.m.: Studio 1: Analyzing Atmospheric Radiation Data
11:45 a.m.: Lunch + Break
12:45 p.m.: Session 2: Advanced Xarray (Record)
2:15 p.m.: Studio 2: Analyzing El Niño Variability in SST Data
3:05 p.m.: Break
3:15 p.m.: Session 3: Climate Data in the Cloud (Record)
4:05 p.m.: Studio 3: Multi-Model Analysis and Comparison
4:45 p.m.: Day 1 Concludes
Machine Learning for Climate Data
Instructor: Shawn Li, Ph.D., Data Science Institute, Columbia University
Day 2 builds on the data analysis foundations established on Day 1 by introducing machine learning methods tailored to climate and Earth system data. The focus is on understanding how modern neural network architectures can be applied to large-scale, spatiotemporal climate datasets, with emphasis on interpretability, workflow design, and practical implementation. Through a sequence of lectures and hands-on labs, participants will explore how machine learning models complement physical understanding in climate science, and how these tools can be deployed efficiently in a cloud-based environment.
By the end of Day 2, participants will be able to:
- Understand the basic structure and training principles of neural networks
- Apply feedforward neural networks to climate datasets in a cloud computing environment
- Use convolutional neural networks (CNNs) for spatial pattern recognition and temperature forecasting
- Understand how recurrent neural networks (RNNs) and LSTM models capture temporal dependencies in climate data
- Gain exposure to emerging architectures, including graph neural networks (GNNs), for Earth system applications
Day 2 follows a lecture–lab structure, with short conceptual sessions immediately followed by hands-on implementation. Each lab is designed to reinforce key ideas by applying them directly to real climate datasets using Python-based machine learning frameworks. Participants are encouraged to:
- Follow along with live coding demonstrations
- Experiment with model configurations and hyperparameters during labs
- Work collaboratively to discuss modeling choices, limitations, and physical interpretation
The day begins with an introduction to neural networks and their role in climate data analysis, followed by hands-on application using cloud-based workflows. The curriculum then advances to convolutional neural networks for spatial forecasting tasks, and recurrent models for temporal sequence learning. The day concludes with a forward-looking discussion on graph neural networks and their potential for representing complex Earth system interactions.
Sessions are organized to gradually move from foundational machine learning concepts toward more advanced architectures, with frequent breaks and labs to support sustained engagement. By the end of the day, participants will have implemented and evaluated multiple neural network models for climate data, and gained perspective on how these tools fit within modern climate research workflows.
All lecture notebooks, lab exercises, and supporting materials are available through the LEAP-Pangeo Jupyter Hub. Participants should open all notebooks directly in the Hub to ensure consistent computing environments and data access.
8:30 a.m.: Check-in / Breakfast
9:00 a.m.: Session 1: Introduction to Neural Networks (Record)
10:00 a.m.: Break
10:15 a.m.: Neural Networks cont’d. (Record)
10:30 a.m.: Lab: Using Neural Networks with Climate Data in the Cloud
11:30 a.m.: Lunch + Break
12:30 p.m.: Session 2: Convolutional Neural Networks (Record)
1:15 p.m.: Break
1:30 p.m.: Lab: Using CNNs to Forecast Global Temperature
2:15 p.m.: Discussion of Recurrent Neural Networks (Record)
2:45 p.m.: Break
3:00 p.m.: Lab: LSTM Model
4:15 p.m.: Final Thoughts: Graph Neural Networks (Record)
4:30 p.m.: Bootcamp concludes
5/27/25: 11 am to 1 pm
5/28/25 - 5/30/25, 6/2/25 - 6/6/25: 9:30 am to 11:30 am
Learning Goals:
- Learn to process, analyze and visualize climate data in the cloud with scientific Python using the LEAP-Pangeo Jupyter Hub;
- Gain an understanding of fundamental climate concepts, alongside common diagnostic statistics and techniques.
Day 1-4 = 5/27 - 5/30
Day 1: Climate data formats and storage; LEAP-Pangeo; from Pandas to Xarray
Day 2: Basic Xarray (read, visualize, join, merge, etc.)
Day 3: Advanced Xarray (broadcast, groupby, interpolation)
Day 4: Introduction to CMIP and tools to read and process the datasets; linear regression and non-parametric regression (Gaussian processes)
https://github.com/yiqioyang/LEAP_Bootcamp_summer_2025/tree/main
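As a minimal sketch of two of the Day 3 topics listed above (groupby and broadcasting), the example below builds a small synthetic temperature field and computes a monthly climatology, anomalies, and a latitude-weighted mean. It is not taken from the bootcamp notebooks: the data, the variable names (`temp`, `clim`, `anom`, `area_mean`), and the grid are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly "temperature" on a tiny lat/lon grid (assumed data, not a real dataset).
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.array([-10.0, 0.0, 10.0])
lon = np.array([100.0, 110.0])

# Seasonal cycle that repeats exactly each year, plus a linear offset in latitude.
month = time.month.to_numpy()
seasonal = 2.0 * np.sin(2 * np.pi * (month - 1) / 12)
data = (
    20.0
    + seasonal[:, None, None]
    + 0.1 * lat[None, :, None]
    + np.zeros((1, 1, lon.size))
)

temp = xr.DataArray(
    data,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="temp",
)

# groupby: average all Januaries, all Februaries, ... to get a monthly climatology.
clim = temp.groupby("time.month").mean("time")

# groupby arithmetic: subtract the matching month's climatology from each time step.
anom = temp.groupby("time.month") - clim

# Broadcasting: the 1-D weights (lat) are aligned by dimension name against
# the 3-D field (time, lat, lon) without any manual reshaping.
weights = np.cos(np.deg2rad(temp.lat))
area_mean = (temp * weights).sum(("lat", "lon")) / (weights.sum() * temp.sizes["lon"])

print(clim.sizes)
print(float(np.abs(anom).max()))
print(area_mean.values[:3])
```

Because the synthetic seasonal cycle is identical in both years, the anomalies are zero up to floating-point error; with real data they would carry the interannual signal, such as El Niño variability in sea surface temperature.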
Learning Goals:
Day 1: Intro to Neural Networks and CNNs
Day 2: Temporal Models: Recurrent Neural Networks & Attention
Day 3: Unsupervised Pattern Discovery & Anomaly Detection
Day 4: Surrogates & Physics-Informed Models
https://github.com/AyaLahlou/LEAP_Bootcamp_summer_2025/tree/main
Day 5: Climate Modeling
Details
| Time | Topic |
|---|---|
| Day 1 | Project Review, Getting Started (Python, Scientific Computing & Visualization) |
| Day 2 | Data Exploration and Analysis with Xarray, Dask |
| Day 3 | Advanced Xarray |
| Day 4 | Fundamentals of Climate Science |
Instructor: Yu Huang, Ph.D. candidate, Department of Earth and Environmental Engineering, Columbia University
Learning Goals:
- Learn to process, analyze and visualize climate data in the cloud with scientific Python using the LEAP-Pangeo Jupyter Hub;
- Gain an understanding of fundamental climate concepts, alongside common diagnostic statistics and techniques.
The workplan for Day 1 is outlined below. We mainly follow the 2023 Train-the-Trainer Bootcamp workshop, Day 1: Climate and Geospatial Data Analysis, by Prof. Ryan Abernathey. The day is divided into lecture sessions and interactive studios, giving participants the chance to solidify their learning through hands-on practice. Students are asked to mirror each of the instructor's steps during lecture sessions using the LEAP-Pangeo Hub and are encouraged to work in groups.
🪄 Click on the links here to open the materials up directly in the LEAP-Pangeo Hub
The lectures are based on the book An Introduction to Earth and Environmental Data Science (Book GitHub repo). You can find the lecture notebooks here (navigate to `src/lectures` to find the notebooks).
| Time | Topic |
|---|---|
| 8:30 – 9:15 | Check in / Breakfast |
| 9:15 – 9:20 | Greeting from Prof. Pierre Gentine, LEAP center director |
| 9:20 – 9:30 | LEAP-Pangeo Login and Pre-lecture Preparation |
| 9:30 – 9:40 | Introduction of Climate Data & LEAP-Pangeo |
| 9:40 – 11:10 | Session 1: Basic Xarray (Argo) |
| 11:10 – 12:00 | Studio 1: Analyzing Atmospheric Radiation (CERES) |
| 12:00 – 13:00 | Lunch + Break |
| 13:00 – 14:30 | Session 2: Advanced Xarray |
| 14:30 – 15:20 | Studio 2: Analyzing El Niño Variability (ERSST) |
| 15:20 – 15:30 | Break |
| 15:30 – 16:20 | Session 3: Climate Modeling and Simulations in the Cloud (CMIP6) |
| 16:20 – 17:00 | Studio 3: Multi-Model Analysis and Comparison |
Syllabi and links: Week 1 Schedule
Week 2 Additional Materials, Lectures, Codes
Preprocessing, visualization and climate analysis of REU dataset
Machine learning example using toy REU dataset
Bootcamp Recordings on YouTube
We will cover the Xarray Lectures from Ryan's class. Use this link to open the assignments for the studios.
Use this link to launch the material for Day 2.
Please refer to our guide on how to prepare and run a LEAP bootcamp.
