An interactive dashboard and data pipeline for exploring NASA citizen-science cloud observations from the GLOBE Observer Clouds 2022 Challenge.
Companion site: ruddro-roy.github.io/globe-cloud-insights — a publication-style overview of the dashboard, the data pipeline, and the bundled sample.
42,700+ cloud observations from 89 countries, 7 continents, and 49,450 satellite matches — collected by citizen scientists and made explorable here.
During the NASA GLOBE Cloud Challenge 2022 (January 15 – February 15, 2022), thousands of volunteers around the world looked up at the sky, identified cloud types, and submitted observations through the GLOBE Observer app. This project takes that dataset and turns it into an interactive, visual experience that anyone can explore — no coding required.
- Explore a live dashboard showing cloud observations on a world map
- Filter by date, sky condition, and cloud type to focus on what interests you
- See charts of daily observation trends, sky condition breakdowns, and top contributing countries
- Download the data as a CSV for classroom activities, homework, or your own research
- Learn about cloud classification, citizen science, and how ground observations connect to satellite data
Ever looked up at the sky and wondered what stories those clouds tell? Me too. Out of sheer curiosity, I spent May 5 – Oct 1, 2022 volunteering remotely with NASA's GLOBE Observer (Clouds) program. That experience — standing outside with a phone pointed at the sky, carefully identifying cloud genera while a satellite passed overhead — sparked this open-source project.
The response to the 2022 Cloud Challenge was remarkable: NASA reported over 42,700 cloud observations from 89 countries across all 7 continents, with more than 49,450 satellite matches (over double the 20,000 goal), 108,000+ new sky photographs, and 321,100+ CLOUD GAZE classifications.
This project makes that dataset explorable, visual, and educational.
The dashboard includes four main views, each designed to help you understand a different aspect of the data:
| View | What it shows |
|---|---|
| World Map | Every observation plotted on a globe, with clusters showing hotspots of activity |
| Timeline | Daily observation counts, revealing peaks on school days and organized events |
| Patterns & Distributions | Sky condition breakdown (donut chart), cloud cover spread (histogram), top countries (bar chart) |
| Browse & Download | Searchable table of raw observations with one-click CSV download |
The dashboard uses plain-language labels, guided interpretation callouts, and sensible defaults so that first-time visitors can understand the data without any prior training.
Run streamlit run app/streamlit_app.py to see it live, or launch instantly via Binder/Codespaces above.
| Method | Link |
|---|---|
| Binder (launches in browser) | |
| GitHub Codespaces |
git clone https://github.com/ruddro-roy/globe-cloud-insights.git
cd globe-cloud-insights
pip install -e ".[dev,notebooks]"
streamlit run app/streamlit_app.pyNo API key or data download needed. A 500-row sample dataset is committed to the repository, so the dashboard, notebooks, and tests work out of the box for first-time visitors.
docker build -t globe-cloud-insights .
docker run -p 8501:8501 globe-cloud-insights
# Open http://localhost:8501GLOBE API ──> Raw CSV ──> Quality Filters ──> Tidy Parquet ──> Dashboard
(cached) - Valid coords (analysis-ready)
- Valid timestamps
- Bounds checks
| Stage | Code | What it does |
|---|---|---|
| Fetch | src/globe_cloud_insights/fetch.py |
Downloads data from the GLOBE API in weekly chunks, with SHA-256 checksum caching |
| Clean | src/globe_cloud_insights/clean.py |
Normalizes columns, validates coordinates, derives cloud cover % from sky condition labels, exports Parquet |
| Analyze | src/globe_cloud_insights/analysis.py |
Descriptive and inferential statistics, temporal aggregation, spatial mapping, Plotly/Folium charts |
| Explore | app/streamlit_app.py |
Interactive Streamlit dashboard with filters, maps, charts, and data export |
- Rows missing latitude or longitude: dropped
- Coordinates outside [-90, 90] / [-180, 180]: dropped
- Rows without timestamps: dropped
- Cloud cover %: inferred from sky condition labels where missing (Clear = 0%, Few = 15%, Scattered = 40%, Broken = 70%, Overcast = 95%, Obscured = 100%)
All observation data comes from the GLOBE Program, accessed through the GLOBE API.
| Detail | Value |
|---|---|
| Protocol | GLOBE Clouds (sky_conditions) |
| Date range | January 15 – February 15, 2022 |
| Campaign | NASA GLOBE Cloud Challenge 2022 |
| API endpoint | api.globe.gov/search/v1/measurement/protocol/measureddate |
| Format | GeoJSON (chunked weekly to respect the 1M-record API limit) |
Global Learning and Observations to Benefit the Environment (GLOBE) Program, date data was accessed, globe.gov
Observation coordinates are publicly available through the GLOBE Program's open data policy. However, we encourage responsible use: avoid sensitive inference about individual observers, especially at high spatial resolution.
For a deeper look at the data, two Jupyter notebooks walk through the full pipeline. Both notebooks are pre-executed with saved outputs so they render immediately on GitHub and nbviewer — no local setup required.
- 01_fetch_and_clean.ipynb — Reproducible data pipeline with provenance tracking
- 02_exploratory_analysis.ipynb — Descriptive summaries, spatial maps, temporal trends, and inferential statistics (Pearson correlation, chi-square goodness-of-fit, Kruskal–Wallis test)
- Glossary — Cloud genera, sky conditions, opacity levels, and other key terms
- Tutorial — Step-by-step guide to setup, exploration, and analysis
- Data Schema — Column definitions, quality filters, and provenance details
We welcome contributions! See CONTRIBUTING.md for full guidelines.
Quick version:
- Fork the repo
- Create a feature branch
- Make changes and add tests
- Run
black,isort,flake8,mypy,pytest - Open a pull request
@software{globe_cloud_insights_2024,
author = {Roy, Ruddro},
title = {GLOBE Cloud Insights: Interactive Visualization of NASA
GLOBE Observer Clouds 2022 Challenge Data},
year = {2024},
publisher = {GitHub},
url = {https://github.com/ruddro-roy/globe-cloud-insights},
license = {MIT}
}- NASA GLOBE Program for making citizen-science data openly available
- GLOBE Observer volunteers who contributed 42,700+ cloud observations
- Researchers: Colon Robles et al. (2020), Dodson et al. (2022) for foundational work on GLOBE Observer data quality and intense observation periods
- NASA GLOBE Cloud Challenge 2022 — https://observer.globe.gov/do-globe-observer/challenges/cloudchallenge-2022
- GLOBE Clouds Protocol — https://www.globe.gov/web/atmosphere/protocols/clouds
- NASA: Clouds in a Changing Climate — https://science.nasa.gov/science-research/earth-science/clouds-in-a-changing-climate/
- GLOBE Data Access — https://observer.globe.gov/get-data
- GLOBE API — https://www.globe.gov/globe-data/globe-api
- Colon Robles et al. (2020). GLOBE Observer Data: 2016-2019. Earth and Space Science, 7(8). doi:10.1029/2020EA001175
- Dodson et al. (2022). Intense Observation Periods. Earth and Space Science, 9(3). doi:10.1029/2021EA002058
- NASA Technical Report on GLOBE-SCC — https://ntrs.nasa.gov/api/citations/20230006162/downloads/GLOBE-SCC_paper-rev-v4.pdf
This project is licensed under the MIT License.
You are free to use, modify, and distribute this work. If you use the GLOBE data in your own projects, please credit the GLOBE Program per their citation guidance above.
- Expand dataset to cover the full 2022 calendar year
- Add satellite overpass matching analysis
- Integrate CLOUD GAZE classification data
- Add machine learning notebooks (cloud type prediction from metadata)
- Deploy the dashboard to Streamlit Community Cloud
- Create classroom-ready lesson plans and worksheets
- Add multilingual support for broader global access