Statistical analysis and visualization toolkit for iNaturalist bioblitz projects. Transform your bioblitz data into charts and spatial analyses that help you and your participants share the bigger picture in a presentation.
- Comprehensive Analysis: 10+ figures covering spatial, temporal, and taxonomic patterns
- Advanced Mapping: Observation hotspots, species richness heatmaps, and effort-corrected visualizations
- Statistical Rigor: Rarefaction curves, confidence intervals, and quality thresholds
- Smart Interpolation: IDW interpolation for smooth spatial patterns
- Performance Optimized: Fast map providers and intelligent caching for large bioblitzes
- Multiple Outputs: Interactive HTML slideshow (Reveal.js) and editable PowerPoint
- Incremental Updates: Smart caching makes subsequent runs 10x faster
- Publication-Ready: High-resolution figures with customizable styling
The Data Dive script creates a comprehensive analytical presentation from your iNaturalist bioblitz data:
- Summary Statistics - Total observations, species, observers, and quality grades with photo collage
- Observation Hotspots - Spatial maps showing where observations occurred with jittered points
- Taxonomic Breakdown - Distribution across taxon groups (Plants, Insects, Birds, etc.) as treemap or bar chart
- Top Observers - Who contributed the most observations (configurable N)
- Temporal Patterns - Day vs. night activity with automatic sunrise/sunset calculation
- Species Richness Heatmaps - Three types:
- Raw species count per grid cell
- Effort-corrected richness (species per observation)
- Smooth interpolated surface (IDW)
- Rarefaction Curves - Species accumulation patterns showing:
- Overall sampling completeness
- Per-taxon accumulation rates
- Confidence intervals from 100+ permutations
- HTML Slideshow - Interactive Reveal.js presentation with:
- Fullscreen mode (press 'F')
- Navigation controls
- Speaker view (press 'S')
- PowerPoint (.pptx) - Editable presentation with:
- All figures as high-resolution images
- Ready for customization
- Compatible with Google Slides
- R (version 4.0 or higher) - Download here
- RStudio (Desktop version) - Download here
The script will automatically install all required packages on first run:
- httr2, jsonlite - iNaturalist API connection
- dplyr, tidyr, purrr, stringr - Data manipulation
- ggplot2, sf - Visualization and mapping
- maptiles, terra, osmdata - Map tiles and geographic data
- scales, viridis, patchwork - Advanced plotting
- treemapify - Treemap visualizations
- suncalc - Sunrise/sunset calculations
- quarto - HTML slideshow generation
- officer - PowerPoint generation
First-time setup may take 15-20 minutes while packages install.
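To see in advance which packages are still missing, a minimal sketch along these lines may help. The helper name `find_missing_packages` is hypothetical and not part of the script; the package names come from the list above.

```r
# Package names taken from the list above.
required_pkgs <- c("httr2", "jsonlite", "dplyr", "tidyr", "purrr", "stringr",
                   "ggplot2", "sf", "maptiles", "terra", "osmdata", "scales",
                   "viridis", "patchwork", "treemapify", "suncalc", "quarto",
                   "officer")

# Hypothetical helper: return the packages that are not installed yet.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- find_missing_packages(required_pkgs)
if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install manually
} else {
  message("All required packages are installed.")
}
```

Running this before the main script gives a rough idea of how long the first-time setup is likely to take.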
Clone or download this repository:
git clone https://github.com/yourusername/inaturalist-bioblitz-datadive.git
cd inaturalist-bioblitz-datadive

Open Bioblitz_Data_Dive.R in RStudio and edit the configuration section:
# --- Essential Settings ---
project_slug <- "your-project-name-here" # From your iNaturalist project URL
bioblitz_name <- "Your Location" # Appears on slides
bioblitz_year <- 2025 # Year of your bioblitz
# --- Event Window ---
date_min <- as.Date("2025-10-15")
date_max <- as.Date("2025-10-16")
# --- HQ Location (for maps) ---
hq_lon <- 116.634398 # Your headquarters longitude
hq_lat <- -34.992854 # Your headquarters latitude
# --- Optional: Add Your Logo ---
bioblitz_logo <- "your-logo.jpg" # Place logo file in project root

Finding your project slug:
- Go to your iNaturalist project page
- Copy everything after /projects/ in the URL
- Example: https://www.inaturalist.org/projects/city-nature-challenge-2025 → "city-nature-challenge-2025"
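If you prefer to paste the full URL, the slug can be extracted programmatically. This is a small sketch using base R only; `extract_project_slug` is a hypothetical helper, not a function in the script.

```r
# Hypothetical helper: pull the slug out of an iNaturalist project URL.
extract_project_slug <- function(project_url) {
  slug <- sub("^.*/projects/", "", project_url)  # drop everything up to /projects/
  slug <- sub("[/?#].*$", "", slug)              # drop any trailing path, query, or anchor
  slug
}

project_slug <- extract_project_slug(
  "https://www.inaturalist.org/projects/city-nature-challenge-2025"
)
print(project_slug)  # [1] "city-nature-challenge-2025"
```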
Finding your coordinates:
- Right-click your HQ location in Google Maps
- Click the coordinates to copy them
- Google Maps copies latitude first, then longitude: put the second value in hq_lon (e.g., 116.634398) and the first in hq_lat (e.g., -34.992854)
In RStudio:
- Open the R script file
- Click Source (top right of the script pane), or press Ctrl+Shift+S (Windows/Linux) or Cmd+Shift+S (Mac)
- Wait for the script to complete (15-30 minutes on the first run; progress messages appear in the Console)
- Find your analysis in the outputs/data_dive/ folder
First run takes longer:
- Package installation (if needed): ~5-10 minutes
- Data download from iNaturalist: ~2-5 minutes
- Map tile downloads: ~5-15 minutes (depends on map provider)
- Figure generation: ~5-10 minutes
Subsequent runs are much faster (~2-5 minutes) thanks to caching!
HTML Slideshow:
Open outputs/data_dive/datadive.html in your web browser:
- Press F for fullscreen
- Press Space or arrow keys to navigate
- Press S for speaker view (shows next slide)
- Press ESC to exit fullscreen
PowerPoint:
Open outputs/data_dive/datadive.pptx in PowerPoint or Google Slides for editing
For detailed instructions, configuration options, and troubleshooting, see:
- Complete Data Dive Guide - Comprehensive documentation covering:
- Detailed configuration options
- All analysis types explained
- Advanced heatmap settings
- Rarefaction curve interpretation
- Performance optimization strategies
- Troubleshooting common issues
project_slug <- "your-project-slug" # iNaturalist project identifier
bioblitz_name <- "Your Location" # Name for slides
bioblitz_year <- 2025 # Year for slides
date_min <- as.Date("2025-10-15") # Start date
date_max <- as.Date("2025-10-16") # End date

# Choose your map style (leave exactly one line uncommented):
map_provider <- "Esri.WorldImagery" # Satellite (high quality, slower)
# map_provider <- "OpenStreetMap" # Street map (fast - recommended for large areas)
# map_provider <- "CartoDB.Positron" # Minimal clean (fastest)
# map_provider <- "CartoDB.Voyager" # Balanced (fast, detailed)
# map_provider <- "Esri.WorldTopoMap" # Topographic (medium speed)
base_map_zoom <- 14 # Zoom level (13-15)
buffer_km <- 2.5 # Area around observations (km)

# Heatmap analysis
grid_cell_size_m <- 500 # Grid resolution (250-1000m)
min_obs_per_cell <- 3 # Quality threshold
use_interpolation <- TRUE # Smooth IDW surface
# Rarefaction analysis
n_permutations <- 100 # More = smoother curves (50-500)
step_size <- 10 # Sample frequency (5-20)
# Display options
n_top_observers <- 15 # Number of top observers to show
fig2_use_treemap <- TRUE # Treemap vs. bar chart for taxa

render_html <- TRUE # Generate HTML slideshow
render_powerpoint <- TRUE # Generate PowerPoint
force_rebuild <- FALSE # Regenerate all figures
use_cached_data <- TRUE # Use cached observation data

For large bioblitz areas (>50 km²), use these settings for 10-20x faster generation:
map_provider <- "OpenStreetMap" # Fast map provider
base_map_zoom <- 12 # Lower zoom = fewer tiles
buffer_km <- 1.5 # Smaller area

Speed comparison for a 100 km² area:
- With satellite imagery: 20-30 minutes
- With OpenStreetMap: 2-3 minutes
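To see why a lower zoom means fewer tiles, the sketch below counts the tiles covering a bounding box using the standard Web Mercator ("slippy map") tile indexing. It assumes the chosen provider serves tiles on that scheme; the helper names and the bounding box are illustrative, and the number of tiles the script actually downloads may differ.

```r
# Standard Web Mercator tile index for a lon/lat at a given zoom level.
lonlat_to_tile <- function(lon, lat, zoom) {
  n <- 2^zoom
  lat_rad <- lat * pi / 180
  x <- floor((lon + 180) / 360 * n)
  y <- floor((1 - log(tan(lat_rad) + 1 / cos(lat_rad)) / pi) / 2 * n)
  c(x = x, y = y)
}

# Hypothetical helper: number of tiles needed to cover a bounding box.
count_tiles <- function(lon_min, lat_min, lon_max, lat_max, zoom) {
  top_left     <- lonlat_to_tile(lon_min, lat_max, zoom)
  bottom_right <- lonlat_to_tile(lon_max, lat_min, zoom)
  n_x <- bottom_right[["x"]] - top_left[["x"]] + 1
  n_y <- bottom_right[["y"]] - top_left[["y"]] + 1
  n_x * n_y
}

# Illustrative box of roughly 10 km x 10 km around the example HQ coordinates
for (z in c(12, 13, 15)) {
  cat("zoom", z, ":", count_tiles(116.58, -35.04, 116.69, -34.95, z), "tiles\n")
}
```

Each extra zoom level roughly quadruples the tile count for the same area, which is why dropping base_map_zoom from 15 to 12 cuts download time so sharply.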
project_slug <- "your-project-2024"
bioblitz_name <- "Your Location"
bioblitz_year <- 2024
use_cached_data <- FALSE # Fresh data download
force_rebuild <- TRUE # Generate all figures

use_cached_data <- TRUE # Reuse downloaded data
force_rebuild <- FALSE # Only regenerate missing figures
# Much faster: 2-5 minutes instead of 15-30!

map_provider <- "OpenStreetMap"
base_map_zoom <- 13
buffer_km <- 5
grid_cell_size_m <- 1000 # Larger cells for sparser data

map_provider <- "Esri.WorldImagery" # Beautiful satellite
base_map_zoom <- 15 # High detail
buffer_km <- 1.5
grid_cell_size_m <- 250 # Fine spatial resolution

n_permutations <- 200 # Smoother rarefaction curves
step_size <- 5 # More detailed curves
plot_title_size <- 22 # Larger text
render_powerpoint <- TRUE # For manual editing

After running successfully, check outputs/data_dive/:
- datadive.html - Interactive HTML slideshow
- datadive.pptx - Editable PowerPoint presentation
- fig_summary_with_photos.png - Overview statistics with photo collage
- fig_observation_hotspots_jittered.png - Spatial distribution map
- fig_observations_by_taxon.png - Taxonomic breakdown (treemap or bar)
- fig_top_observers.png - Top contributors
- fig_observations_by_hour.png - Temporal patterns (day/night)
- fig_richness_raw.png - Raw species richness heatmap
- fig_richness_effort_corrected.png - Effort-corrected richness
- fig_richness_interpolated.png - Smooth interpolated surface
- fig_rarefaction_all.png - Overall rarefaction curve
- fig_rarefaction_by_taxon.png - Per-taxon accumulation curves
- obs_cache.rds - Cached observation data
- photo_cache/ - Downloaded photos for summary
- *.rds files - Individual figure caches
Important: Don't delete cache files - they make subsequent runs much faster!
1. Observation Hotspots
- Shows where observations occurred
- Jittered points prevent overplotting
- Color-coded by iconic taxon group
- Includes HQ location marker
2. Species Richness Heatmaps

Three complementary views:
- Raw Richness: Total species per grid cell
- Shows absolute biodiversity hotspots
- Not corrected for sampling effort
- Effort-Corrected: Species per observation
- Accounts for uneven sampling
- More accurate comparison across areas
- Only cells with 3+ observations
- Interpolated Surface: Smooth continuous pattern
- IDW (Inverse Distance Weighting) interpolation
- Shows biodiversity gradients
- Masked to data coverage area
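The raw and effort-corrected views can be illustrated with a minimal base R sketch. It assumes observations are already in projected coordinates (metres); the toy data and column names are hypothetical, and the script's own implementation may differ.

```r
# Toy observations in projected metres (hypothetical columns x, y, species).
obs <- data.frame(
  x       = c(100, 200, 300, 450, 600, 700, 900),
  y       = c(100, 150, 400, 480, 100, 200, 900),
  species = c("A", "B", "A", "C", "A", "A", "D")
)

grid_cell_size_m <- 500
min_obs_per_cell <- 3

# Assign each observation to a square grid cell.
obs$cell <- paste(floor(obs$x / grid_cell_size_m),
                  floor(obs$y / grid_cell_size_m), sep = "_")

# Observation count and raw richness (distinct species) per cell.
n_obs    <- tapply(obs$species, obs$cell, length)
richness <- tapply(obs$species, obs$cell, function(s) length(unique(s)))

cells <- data.frame(cell     = names(n_obs),
                    n_obs    = as.vector(n_obs),
                    richness = as.vector(richness))

# Effort-corrected richness: species per observation, only where sampling
# meets the quality threshold; other cells are left as NA.
cells$species_per_obs <- ifelse(cells$n_obs >= min_obs_per_cell,
                                cells$richness / cells$n_obs, NA)
print(cells)
```

In this toy data only cell 0_0 has enough observations, so it is the only cell that receives an effort-corrected value.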
Observations by Hour
- Automatic sunrise/sunset calculation for your location
- Day vs. night observation patterns
- Identifies peak activity times
- Shows nocturnal vs. diurnal sampling
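The day/night split amounts to comparing each observation time with that day's sunrise and sunset. In this sketch the sunrise and sunset values are hard-coded assumptions for illustration (the script derives them with suncalc), and all times are treated as local time stored under a UTC label to keep the example portable.

```r
# Illustrative sunrise/sunset for one day (hard-coded assumption, not computed).
sunrise <- as.POSIXct("2025-10-15 05:30:00", tz = "UTC")
sunset  <- as.POSIXct("2025-10-15 18:20:00", tz = "UTC")

# Hypothetical observation timestamps.
obs_times <- as.POSIXct(c("2025-10-15 04:45:00", "2025-10-15 09:10:00",
                          "2025-10-15 14:30:00", "2025-10-15 21:05:00"),
                        tz = "UTC")

# Classify each observation, then tabulate by hour of day.
period <- ifelse(obs_times >= sunrise & obs_times < sunset, "day", "night")
hour   <- as.integer(format(obs_times, "%H"))
print(table(hour, period))
```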
Rarefaction Curves
- Species accumulation with sampling effort
- Confidence intervals from 100+ permutations
- Shows if sampling was adequate
- Per-taxon comparison of diversity
Interpretation:
- Steep curve → Many species not yet found
- Flattening curve → Most species captured
- Per-taxon curves show which groups are diverse
"No observations found"
- Check project_slug is correct
- Verify date_min and date_max cover your event
- Ensure observations have photos and quality grades
"Map shows wrong area"
- Verify hq_lon and hq_lat are correct
- Remember: longitude first, latitude second
- Check if coordinates are swapped
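A quick range check can catch some swapped coordinates before a long run. `check_hq_coords` is a hypothetical helper, not part of the script, and it can only detect a swap when the longitude's absolute value exceeds 90.

```r
# Hypothetical helper: basic sanity checks on the HQ coordinates.
check_hq_coords <- function(hq_lon, hq_lat) {
  problems <- character(0)
  if (abs(hq_lon) > 180) {
    problems <- c(problems, "hq_lon is outside [-180, 180]")
  }
  if (abs(hq_lat) > 90) {
    problems <- c(problems, "hq_lat is outside [-90, 90]; lon/lat may be swapped")
  }
  problems
}

check_hq_coords(116.634398, -34.992854)  # no problems reported
check_hq_coords(-34.992854, 116.634398)  # flags the out-of-range latitude
```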
Script is very slow (>30 minutes)
- Switch to faster map provider: map_provider <- "OpenStreetMap"
- Reduce zoom: base_map_zoom <- 12
- Reduce buffer: buffer_km <- 1.5
- Set use_cached_data <- TRUE after first run
"Not enough data for heatmaps"
- Reduce min_obs_per_cell (try 2 instead of 3)
- Increase grid_cell_size_m (try 750 or 1000)
- Some analyses require minimum observation density
Figures look wrong or cut off
- Delete specific figure PNG files to regenerate
- Set force_rebuild <- TRUE to regenerate all
- Check that logo file path is correct (or set to "")
Package installation fails
- Update R and RStudio to latest versions
- Try manual install: install.packages("package_name")
- Check Console for specific error messages
For more help, see the Troubleshooting section in the complete guide.
.
├── Bioblitz_Data_Dive.R # Main analysis script
├── DATA_DIVE_GUIDE.md # Comprehensive documentation
├── README.md # This file
├── LICENSE.txt # GPL v3 license
└── outputs/ # Generated analyses (created automatically)
    └── data_dive/
        ├── datadive.html # HTML slideshow
        ├── datadive.pptx # PowerPoint
        ├── slides/ # Individual figure PNGs
        ├── styles/ # CSS styling
        ├── obs_cache.rds # Cached data
        └── photo_cache/ # Photo cache
Rarefaction Analysis
- Individual-based rarefaction (not sample-based)
- 100+ random permutations for confidence intervals
- Interpolation for unsampled points
- Asymptotic richness estimation
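The permutation approach to individual-based rarefaction can be sketched in base R as follows. The toy species vector is hypothetical, the parameter names mirror the configuration options, and asymptotic richness estimation is not covered here; treat this as an illustration rather than the script's exact method.

```r
set.seed(42)  # reproducible permutations

# Toy data: one species label per observation (hypothetical).
species <- sample(c("A", "B", "C", "D", "E"), size = 60, replace = TRUE,
                  prob = c(0.5, 0.25, 0.15, 0.07, 0.03))

n_permutations <- 100
step_size      <- 10
sample_sizes   <- seq(step_size, length(species), by = step_size)

# For each permutation, shuffle the observations and count the distinct
# species among the first n individuals, for every sample size n.
# Result: rows = sample sizes, columns = permutations.
richness_matrix <- replicate(n_permutations, {
  shuffled <- sample(species)
  vapply(sample_sizes,
         function(n) length(unique(shuffled[seq_len(n)])),
         integer(1))
})

# Mean curve with a 95% percentile interval across permutations.
rarefaction <- data.frame(
  n_obs = sample_sizes,
  mean  = rowMeans(richness_matrix),
  lower = apply(richness_matrix, 1, quantile, probs = 0.025),
  upper = apply(richness_matrix, 1, quantile, probs = 0.975)
)
print(rarefaction)
```

Raising n_permutations smooths the mean curve and stabilises the interval, at the cost of runtime; a smaller step_size adds more points along the curve.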
Spatial Analysis
- Grid-based binning for discrete cells
- IDW interpolation (power = 2) for continuous surfaces
- Distance masking to prevent extrapolation
- Effort correction using observations per cell
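IDW with power 2 and a distance mask can be written in a few lines of base R. `idw_predict` and its `max_dist` argument are hypothetical names used for illustration; the script may rely on a spatial package instead.

```r
# Hypothetical helper: inverse distance weighted prediction at one location.
idw_predict <- function(x0, y0, x, y, values, power = 2, max_dist = Inf) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)
  if (any(d == 0)) return(mean(values[d == 0]))  # exact hit on a data point
  if (min(d) > max_dist) return(NA_real_)        # distance mask: too far from data
  w <- 1 / d^power                               # closer points get larger weights
  sum(w * values) / sum(w)
}

# Toy grid-cell centres (metres) with richness values.
x <- c(0, 1000, 0, 1000)
y <- c(0, 0, 1000, 1000)
richness <- c(2, 4, 6, 8)

idw_predict(500, 500, x, y, richness)                     # equidistant from all four points
idw_predict(5000, 5000, x, y, richness, max_dist = 1500)  # NA: outside the mask
```

The mask returns NA rather than extrapolating, which matches the idea of showing the surface only where data coverage supports it.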
Quality Thresholds
- Minimum 3 observations per cell for effort-corrected maps
- Cells with 10+ observations marked as "Good" quality
- Rarefaction curves require 100+ observations for reliability
Caching Strategy
- Observation data cached as .rds
- Individual figures cached as PNGs
- Only regenerate when source data changes or force_rebuild = TRUE
- Can reduce runtime from 30 minutes to 2-5 minutes
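The caching pattern described above can be sketched with `saveRDS()` and `readRDS()`. The helper `load_or_build` and the stand-in download function are hypothetical; only the `force_rebuild` flag mirrors the script's configuration, and detecting changes in the source data is not covered here.

```r
# Hypothetical helper: return cached data if present, otherwise build and cache it.
load_or_build <- function(cache_path, build_fn, force_rebuild = FALSE) {
  if (file.exists(cache_path) && !force_rebuild) {
    return(readRDS(cache_path))      # fast path: reuse the cached object
  }
  result <- build_fn()               # slow path: download or compute
  saveRDS(result, cache_path)
  result
}

# Stand-in for a slow download (illustrative only).
slow_download <- function() {
  Sys.sleep(0.1)
  data.frame(id = 1:3)
}

cache_file <- file.path(tempdir(), "obs_cache_demo.rds")
obs <- load_or_build(cache_file, slow_download)  # builds and saves, unless a cache already exists
obs <- load_or_build(cache_file, slow_download)  # reads the cache
```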
Map Tile Management
- Automatic tile provider selection
- Zoom level optimization
- Spatial extent buffering
- Efficient tile download and caching
This Data Dive script complements the iNaturalist Bioblitz Slideshow Generator:
| Feature | Data Dive | Slideshow Generator |
|---|---|---|
| Purpose | Statistical analysis | Visual presentation |
| Output | Charts, maps, statistics | Photo slideshow |
| Best for | Research, event displays | Event displays, outreach |
| Figures | 10+ analytical figures | Random photo selection |
| Format | HTML + PowerPoint | HTML slideshow |
| Timing | One-time or periodic | Daily updates |
Use both! The slideshow for public display, the data dive for analysis and reporting.
Contributions are welcome! If you've made improvements or have suggestions:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-analysis)
- Commit your changes (git commit -m 'Add amazing analysis')
- Push to the branch (git push origin feature/amazing-analysis)
- Open a Pull Request
This project is licensed under the GNU General Public License v3.0 - see the LICENSE.txt file for details.
This means you are free to:
- Use the software for any purpose
- Change the software to suit your needs
- Share the software with others
- Share the changes you make
Under the following conditions:
- You must share your modifications under the same GPL v3 license
- You must include the original copyright notice
- You must include a copy of the GPL v3 license
Olly Berry and Claude
- Thanks to all organizers and participants in the Walpole Wilderness Bioblitzes
- iNaturalist for providing the API and platform
- The R community for excellent statistical and mapping packages
- Quarto and Reveal.js for slideshow capabilities
- Map data providers: Esri, OpenStreetMap, CartoDB
- officer package developers for PowerPoint generation
- Documentation: See DATA_DIVE_GUIDE.md
- Issues: Open an issue on GitHub
- Questions: Contact through GitHub discussions
Happy Analyzing!
If you create cool analyses with this script, please consider sharing them back with the iNaturalist community!