Data Strategy
This document outlines the standardized data and code management practices for research and advisory projects within our computational limnology team.
Each project must follow a consistent folder and documentation structure similar to this:
├── README.md                  # Project overview and objectives
├── data/                      # Organized into raw/, processed/, external/
│   └── README.md              # Data descriptions and sources
├── scripts/                   # Analysis and workflow scripts
├── outputs/                   # Results: figures, tables, model output
├── reports/                   # Reports, manuscripts
├── figures/                   # Figures, visualizations
├── targets.R                  # Workflow pipeline (e.g., {targets} in R)
├── renv/ or environment.yml   # Environment/dependency management
└── devlog.md                  # Log of decisions and progress (optional)
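As an illustration of how this layout could be created consistently, here is a minimal Python sketch. The function name `scaffold_project` and the placeholder file contents are assumptions, not an existing team tool. The pipeline script and the environment files are left out because they depend on whether the project uses R or Python.

```python
from pathlib import Path

# Folder names follow the layout above; the file contents are placeholders.
DIRECTORIES = [
    "data/raw", "data/processed", "data/external",
    "scripts", "outputs", "reports", "figures",
]
FILES = {
    "README.md": "# Project overview and objectives\n",
    "data/README.md": "# Data descriptions and sources\n",
    "devlog.md": "# Development log\n",
}


def scaffold_project(root):
    """Create the standard folders and placeholder files under `root`.

    Existing files are left untouched, so the function is safe to re-run.
    """
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for relative_path, content in FILES.items():
        target = root / relative_path
        if not target.exists():
            target.write_text(content, encoding="utf-8")
    return root
```

Calling `scaffold_project("project-2025-example")` (a made-up project name) would create the folders in the current working directory.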
- All projects must have a GitHub repository:
  - Public for research projects
  - Private for advisory/confidential projects
- Repositories must be mirrored locally on AU's O:\ drive for backup
- (Ideally) Use standardized naming (e.g., project-<year>-<topic>)
- Use {targets} (R) or snakemake (Python/bash) for all analysis pipelines
- Final results must be fully reproducible from raw data
- Use renv, virtualenv, or pipenv for dependency management
- Clearly comment all scripts and functions
- On completion, archive the GitHub repository to Zenodo to mint a DOI
- Include a CITATION.cff file for machine-readable citation metadata
- Tag final release version for reproducibility and reference
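To show what a minimal CITATION.cff might contain, the sketch below builds the file text from a few fields. The helper name `build_citation_cff` is hypothetical, and all values passed to it are placeholders. The keys follow the Citation File Format 1.2.0 schema, where `cff-version`, `message`, `title`, and `authors` are required. The sketch does not escape double quotes inside values, so a YAML library is the safer choice for real metadata.

```python
def build_citation_cff(title, authors, version, date_released):
    """Return the text of a minimal CITATION.cff file.

    `authors` is a list of (family_name, given_name) tuples and
    `date_released` is a string in YYYY-MM-DD format.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this work, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        f"date-released: {date_released}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    return "\n".join(lines) + "\n"
```

The returned string can be written to `CITATION.cff` in the repository root before the final release is tagged.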
All datasets should follow FAIR guidelines:
- Findable: Documented, discoverable via DOI
- Accessible: Public repositories or open-access archives
- Interoperable: Use open formats (CSV, NetCDF); standard units
- Reusable: Include metadata (metadata.yaml or data_dictionary.csv)
Each dataset must be accompanied by metadata:
- Description of variables (names, units, meaning)
- Sampling methods and instruments
- Collection date, location, and processing steps
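One way to check that variable descriptions keep up with the data is to compare a dataset's header against its data dictionary. The sketch below assumes a data_dictionary.csv with a `variable` column holding the variable names. That column name, the function name `undocumented_columns`, and the sample values are illustrative assumptions, not a team standard.

```python
import csv
import io


def undocumented_columns(data_csv, dictionary_csv, name_field="variable"):
    """Return the data columns that have no row in the data dictionary.

    Both arguments are CSV text; `name_field` is the dictionary column
    that holds the variable names.
    """
    header = next(csv.reader(io.StringIO(data_csv)))
    documented = {
        row[name_field].strip()
        for row in csv.DictReader(io.StringIO(dictionary_csv))
    }
    return [column for column in header if column.strip() not in documented]


# Made-up example: do_mg_l is missing from the dictionary.
data = "lake_id,temp_c,do_mg_l\nL01,14.2,9.1\n"
dictionary = (
    "variable,unit,description\n"
    "lake_id,,Lake identifier\n"
    "temp_c,degC,Water temperature\n"
)
print(undocumented_columns(data, dictionary))  # ['do_mg_l']
```

A check like this could run as a pipeline step so that an undocumented variable is caught before results are archived.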
All code must be well-commented and follow team style guides
- For R code, preferably use {roxygen2} to document functions
Project documentation includes:
- README.md: Overview
- devlog.md: Key decisions and changes
- GitHub Projects: Plan and track decisions and work packages
- GitHub Issues: Task and discussion tracking (encouraged)
Automate repetitive tasks where possible:
- Convert frequently used scripts or functions into packages/libraries (R, Python)
- GitHub repo creation, license, and structure setup
- Sync to O:\ drive using post-commit hooks or scheduled scripts
- DOI publishing via GitHub-Zenodo integration
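As a sketch of the O:\ drive sync, the functions below wrap `git push --mirror`, which pushes all refs to a target repository. The function names are hypothetical, and the mirror is assumed to be a bare repository (created with `git init --bare`) in a folder on the O:\ drive. Note that `--mirror` makes the target match the local refs exactly, so branches deleted locally are also deleted in the mirror.

```python
import subprocess


def mirror_push_command(mirror_path):
    """Build the git command that pushes all refs to a bare mirror."""
    return ["git", "push", "--mirror", mirror_path]


def sync_to_mirror(mirror_path):
    """Run the push from the current repository.

    Returns git's exit code, where 0 means success.
    """
    result = subprocess.run(mirror_push_command(mirror_path))
    return result.returncode
```

A post-commit hook or a scheduled script could call `sync_to_mirror` with the project's mirror path. Git ignores the exit status of a post-commit hook, so a failed sync would not block the commit and should be logged or reported separately.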