
Data Strategy

Robert Ladwig edited this page Jun 12, 2025 · 12 revisions

This document outlines the standardized data and code management practices for research and advisory projects within our computational limnology team.

πŸ“ Project File Structure

Each project must follow a consistent folder and documentation structure similar to the following:


├── README.md             # Project overview and objectives
├── data/                 # Organized into raw/, processed/, external/
│   └── README.md         # Data descriptions and sources
├── scripts/              # Analysis and workflow scripts
├── outputs/              # Results: figures, tables, model output
├── reports/              # Reports, manuscripts
├── figures/              # Figures, visualizations
├── _targets.R            # Workflow pipeline (e.g., {targets} in R)
├── renv/ or environment.yml # Environment/dependency management
└── devlog.md             # Log of decisions and progress (optional)
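As a concrete illustration, the data/README.md describing data sources might look like this (file names and sources are hypothetical):

```
## raw/
- buoy_temps_2024.csv — hourly thermistor-chain temperatures, exported
  from the logger without modification (source: lake monitoring buoy)

## processed/
- daily_means.csv — daily mean temperatures derived from
  raw/buoy_temps_2024.csv by scripts/aggregate.R
```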

🔄 Version Control & Backup

  • All projects must have a GitHub repository:

    • Public for research projects

    • Private for advisory/confidential projects

  • Repositories must be mirrored locally on AU's O:\ drive for backup

  • Ideally, use standardized naming (e.g., project-<year>-<topic>)
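As a quick sanity check, a repository name can be validated against this convention with a short shell snippet (the example name is hypothetical):

```shell
# Validate a repository name against the project-<year>-<topic> convention:
# "project-", a four-digit year, then one or more lowercase/numeric words.
name="project-2025-lake-mixing"   # hypothetical example
if echo "$name" | grep -Eq '^project-[0-9]{4}-[a-z0-9]+(-[a-z0-9]+)*$'; then
  echo "valid"
else
  echo "invalid"
fi
```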

🧪 Reproducibility Standards

  • Use {targets} (R) or Snakemake (Python/bash) for all analysis pipelines

  • Final results must be fully reproducible from raw data

  • Use renv (R), or virtualenv, pipenv, or conda (Python) for dependency management

  • Clearly comment all scripts and functions
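To make the pipeline requirement concrete, a minimal `_targets.R` sketch might look like the following (file names, target names, and the processing steps are hypothetical):

```r
library(targets)

# Packages available to every target
tar_option_set(packages = c("readr", "dplyr"))

list(
  # Track the raw input file so the pipeline reruns when it changes
  tar_target(raw_file, "data/raw/temperature.csv", format = "file"),
  tar_target(raw_data, readr::read_csv(raw_file)),
  # Derived daily summary; replace with the project's real processing steps
  tar_target(daily_means,
    raw_data |> dplyr::group_by(date) |> dplyr::summarise(temp = mean(temp_c))
  )
)
```

Running `targets::tar_make()` then rebuilds only the targets whose upstream inputs changed, which is what guarantees that final results stay reproducible from the raw data.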

📦 Data Publishing & Archiving

  • On completion, archive the GitHub repository to Zenodo to mint a DOI

  • Include a CITATION.cff file for machine-readable citation metadata

  • Tag final release version for reproducibility and reference
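A minimal CITATION.cff might look like this (all field values are placeholders to replace before release):

```yaml
# Placeholder values — replace before tagging the release
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "project-2025-lake-mixing"
authors:
  - family-names: "Doe"
    given-names: "Jane"
version: "1.0.0"
doi: "10.5281/zenodo.0000000"
date-released: "2025-06-12"
```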

✅ FAIR Data Principles

All datasets should follow FAIR guidelines:

  • Findable: Documented, discoverable via DOI

  • Accessible: Public repositories or open-access archives

  • Interoperable: Use open formats (CSV, NetCDF); standard units

  • Reusable: Include metadata (metadata.yaml or data_dictionary.csv)

Each dataset must be accompanied by metadata:

  • Description of variables (names, units, meaning)

  • Sampling methods and instruments

  • Collection date, location, and processing steps
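For example, a metadata.yaml recording these items might look like this (all names and values are hypothetical):

```yaml
dataset: buoy_temperature_2024        # hypothetical dataset
description: "Hourly water temperature profiles from the lake buoy"
collected: "2024-05-01 to 2024-10-31"
location: "Lake X, 56.1N 10.2E"       # placeholder coordinates
instrument: "Thermistor chain, 0.5 m sensor spacing"
processing: "Despiked; gaps shorter than 3 h linearly interpolated"
variables:
  - name: temp_c
    unit: degC
    meaning: "Water temperature"
  - name: depth_m
    unit: m
    meaning: "Sensor depth below surface"
```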

🧾 Documentation & Communication

All code must be well commented and follow team style guides:

  • Preferably use {roxygen2} for documenting R functions
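For instance, a {roxygen2} header on a (hypothetical) helper function looks like this:

```r
#' Convert water temperature from Fahrenheit to Celsius
#'
#' @param temp_f Numeric vector of temperatures in degrees Fahrenheit.
#' @return Numeric vector of temperatures in degrees Celsius.
#' @examples
#' fahrenheit_to_celsius(68)  # 20
#' @export
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}
```

Running `roxygen2::roxygenise()` during package development turns these headers into the package's help pages and NAMESPACE entries.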

Project documentation includes:

  • README.md – Overview

  • devlog.md – Key decisions and changes

  • GitHub Projects – Plan and track decisions and work packages

  • GitHub Issues – Task and discussion tracking (encouraged)

βš™οΈ Automation & Tools

Automate repetitive tasks where possible:

  • Convert frequently used scripts or functions into packages/libraries (R, Python)

  • GitHub repo creation, license, and structure setup

  • Sync to O:\ drive using post-commit hooks or scheduled scripts

  • DOI publishing via GitHub-Zenodo integration
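The O:\ sync can be wired to a git post-commit hook; a sketch is below (all paths are assumptions, and in Git Bash on Windows the O:\ drive is typically visible as /o):

```shell
#!/bin/sh
# .git/hooks/post-commit — mirror the working tree to the O:\ backup
# drive after every commit. Destination path is an assumption.
repo=$(basename "$PWD")      # hooks run from the repository root
dest="/o/backup/$repo"       # O:\ as mounted in Git Bash
mkdir -p "$dest"
rsync -a --delete --exclude '.git' ./ "$dest/"
```

Make the hook executable (`chmod +x .git/hooks/post-commit`); because hooks are not versioned, each team member has to install it in their own clone, so a scheduled script may be the more robust option.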
