
Feature: Publish Metadata Records to GitHub Repository #430

@JessyBarrette


Overview

Add functionality for reviewers and admins to publish metadata records directly to a GitHub repository. This feature will enable automated publishing of ISO19115-3 XML and YAML files to configurable GitHub repositories with support for multiple deployment environments.

Integration with CIOOS Infrastructure

Once published to the GitHub repository:

  1. GitHub Pages: The repository will host a static website that presents the metadata files in a browsable format
  2. CKAN Harvesting: The different Regional Association (RA) CKAN catalogues will automatically harvest the published metadata from the GitHub repository
  3. Catalogue Updates: The harvested metadata will be used to update the regional CKAN catalogues, making the records discoverable through the CIOOS metadata search infrastructure

This creates an automated pipeline: Metadata Entry Form → GitHub Repository → Static Website → CKAN Catalogues → Public Discovery

User Story

As a reviewer or admin
I want to publish approved metadata records to a GitHub repository
So that the metadata files are hosted on a static website and automatically harvested by Regional Association CKAN catalogues, making them discoverable through the CIOOS metadata search infrastructure

Detailed Requirements

Workflow

  1. Initiate Publishing: Reviewer/admin clicks a "Publish to GitHub" button on the reviewer page for a specific record
  2. Select Environment: A dialog appears allowing the user to select which environment(s) to publish to (e.g., dev, staging, production)
  3. Confirm and Customize: User confirms the selection and optionally provides a custom git commit message
  4. Convert to XML: The form data is converted to ISO19115-3 XML format using the existing Python Firebase conversion function
  5. Generate YAML: The form data is also converted to YAML format
  6. Upload to GitHub: Both files are committed to the configured GitHub repository using the GitHub API
  7. Confirmation: The frontend displays a success message with:
    • Link to the commit in GitHub
    • List of files uploaded
    • Confirmation that the repository was successfully updated
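The confirmation in step 7 could be built from the Cloud Function's response. The following is a minimal sketch that assumes the { success, results } shape returned by githubPublishRecord in the Technical Specifications. The helper name formatPublishConfirmation and the message wording are hypothetical.

```javascript
// Hypothetical helper: turns the githubPublishRecord response into the text shown
// to the reviewer. Assumes each result has { environment, commitSha, commitUrl, files }.
function formatPublishConfirmation(response) {
  if (!response || !response.success || !Array.isArray(response.results)) {
    return "Publishing failed: the repository was not updated.";
  }
  const sections = response.results.map((result) => {
    // One line per uploaded file, indented under its environment
    const files = result.files.map((file) => `  - ${file}`).join("\n");
    // Short (7-character) SHA plus the full commit link
    return `[${result.environment}] Commit ${result.commitSha.slice(0, 7)}: ${result.commitUrl}\n${files}`;
  });
  return ["Repository updated successfully.", ...sections].join("\n");
}
```

In the React page, the same data could be rendered as links rather than plain text. The point of the sketch is only that every field the confirmation needs is already in the response.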

Required Configurations (Admin Page)

The admin page should include a new section for GitHub Publishing Configuration:

GitHub Repository Settings

  • Repository Owner: GitHub organization or user account name
    • Default: cioos-siooc
    • The owner of the repository where metadata will be published
  • Repository Name: Repository name
    • Default: cioos-siooc-forms
    • Combined with the owner, this forms the full URL: https://github.com/{owner}/{repo}
  • GitHub Token: Personal Access Token with repo write permissions (stored securely)
  • Target Branch: Branch name for commits (default: main)

Environment Configuration

  • Environments List: Configurable list of environment names
    • Default: ["prod"]
    • Example: ["dev", "staging", "prod"]
  • Each environment represents a subdirectory path: forms/{environment}/

File Naming Configuration

  • Naming Convention Template: Configurable template for output filenames
    • Variables available: {uuid}, {filename}, {title}, {date}
    • Default: {uuid} (uses the record's identifier field)
    • Example: {filename} or record-{uuid}
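The template expansion could work as follows. This is a minimal sketch that uses the variable mapping from the generateFilename() comments later in this issue ({uuid} is identifier, {filename} is filename, {title} is title.en, {date} is the current date). The sanitization rules are assumptions, not an agreed spec: lowercase, only [a-z0-9._-], at most 100 characters, and no leading or trailing dots or hyphens. Because "/" is never allowed through, a value cannot escape the forms/{environment}/ directory.

```javascript
// Assumed sanitization rules; adjust once the naming convention is agreed.
function sanitizeSegment(value) {
  return String(value ?? "")
    .normalize("NFKD")               // split accented letters into base letter + mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks (é -> e)
    .toLowerCase()
    .replace(/[^a-z0-9._-]+/g, "-")  // any other run of characters becomes one hyphen
    .slice(0, 100)
    .replace(/^[-.]+|[-.]+$/g, "");  // no leading/trailing hyphens or dots
}

function generateFilename(template, recordData, now = new Date()) {
  const values = {
    uuid: recordData.identifier,
    filename: recordData.filename,
    title: recordData.title && recordData.title.en,
    date: now.toISOString().slice(0, 10), // YYYY-MM-DD in UTC
  };
  // Substitute each known variable with its sanitized value
  const name = (template || "{uuid}").replace(
    /\{(uuid|filename|title|date)\}/g,
    (_, key) => sanitizeSegment(values[key])
  );
  // Sanitize the whole result too, so literal template text is also safe
  const safe = sanitizeSegment(name);
  if (!safe) {
    throw new Error(`Template "${template}" produced an empty filename`);
  }
  return safe;
}
```

For example, record-{uuid} with identifier ABC_123 gives record-abc_123. A template that resolves to nothing, such as {title} on an untitled record, raises an error instead of writing a file with an empty name.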

Repository Structure

Files will be organized as:

forms/
└── prod/               # Default environment
    ├── record-uuid-1.xml
    ├── record-uuid-1.yaml
    ├── record-uuid-2.xml
    ├── record-uuid-2.yaml
    └── ...

# Optional additional environments (if configured):
forms/
├── dev/
│   ├── record-uuid-1.xml
│   └── record-uuid-1.yaml
├── staging/
│   ├── record-uuid-2.xml
│   └── record-uuid-2.yaml
└── prod/
    ├── record-uuid-3.xml
    └── record-uuid-3.yaml

Technical Specifications

1. Admin Page Modifications

File: src/components/Pages/Admin.jsx

New UI Section: "GitHub Publishing Configuration"

Firebase Database Structure:

admin/{region}/githubCredentials/
  ├── owner: string (default: "cioos-siooc")
  ├── repo: string (default: "cioos-siooc-forms")
  ├── token: string (encrypted, not hashed: the Cloud Function must recover the token to call GitHub)
  ├── branch: string (default: "main")
  ├── environments: string[] (default: ["prod"])
  └── fileNamingTemplate: string (default: "{uuid}")

Note: Stored as a sibling to the existing dataciteCredentials path to leverage similar security patterns.

Implementation Notes:

  • Follow existing patterns from dataciteCredentials for secure token storage
  • Use Firebase Security Rules to restrict access to admin role only
  • Validate owner and repo names (alphanumeric, hyphens, underscores)
  • Test token validity on save (optional)
  • Full repository URL is constructed as: https://github.com/${owner}/${repo}
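The save-time checks above could be implemented as follows. This is a minimal sketch; validateGithubConfig and repositoryUrl are hypothetical names. Owner and repo follow the rule stated here (letters, digits, hyphens, underscores). The repo pattern also accepts "." because GitHub permits periods in repository names. Environment names get the same restriction so they are safe as path segments. Requiring at least one template variable is an added assumption: without one, every record would overwrite the same file.

```javascript
const OWNER_PATTERN = /^[A-Za-z0-9_-]+$/;
const REPO_PATTERN = /^[A-Za-z0-9._-]+$/; // "." allowed, as GitHub does
const ENV_PATTERN = /^[A-Za-z0-9_-]+$/;   // no "/" or "..", so no path traversal
const TEMPLATE_VARIABLES = ["{uuid}", "{filename}", "{title}", "{date}"];

// Returns the names of the invalid fields (empty array when the config is valid)
function validateGithubConfig(config) {
  const errors = [];
  if (!OWNER_PATTERN.test(config.owner || "")) errors.push("owner");
  if (!REPO_PATTERN.test(config.repo || "") || config.repo === "." || config.repo === "..") {
    errors.push("repo");
  }
  if (!config.branch || /\s|\.\./.test(config.branch)) errors.push("branch");
  const envs = config.environments;
  if (!Array.isArray(envs) || envs.length === 0 || !envs.every((env) => ENV_PATTERN.test(env))) {
    errors.push("environments");
  }
  const template = config.fileNamingTemplate || "";
  if (!TEMPLATE_VARIABLES.some((variable) => template.includes(variable))) {
    errors.push("fileNamingTemplate");
  }
  return errors;
}

function repositoryUrl({ owner, repo }) {
  return `https://github.com/${owner}/${repo}`;
}
```

Returning field names rather than throwing lets the Admin page highlight each invalid input separately.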

2. Reviewer Page Modifications

File: src/components/Pages/Reviewer.jsx

New UI Elements:

  • Add "Publish to GitHub" button in the record actions menu (alongside existing Publish/Unpublish buttons)
  • Create a new dialog component: GitHubPublishDialog

Dialog Components:

<GitHubPublishDialog>
  <EnvironmentSelector
    environments={adminConfig.environments}
    selected={selectedEnvs}
    onChange={handleEnvChange}
  />
  <TextField
    label="Commit Message (optional)"
    placeholder="Default: Publish metadata record: {record title}"
    value={commitMessage}
    onChange={handleCommitMessageChange}
  />
  <DialogActions>
    <Button onClick={handleCancel}>Cancel</Button>
    <Button onClick={handlePublish} color="primary">Publish</Button>
  </DialogActions>
</GitHubPublishDialog>

New Functions:

  • handleGitHubPublish(recordId, environments, commitMessage): Orchestrates the publishing workflow
  • publishToGitHub(): Calls the Firebase Cloud Function
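The reviewer-side orchestration could look like the following. This is a minimal sketch that passes the publishRecordToGitHub callable (from UserProvider) in as an argument so the logic can be exercised outside React. The record fields recordId, userId and region are hypothetical stand-ins for however the Reviewer page identifies a record. Firebase client callables resolve to an object whose data property holds the function's return value.

```javascript
// Hypothetical handler; the payload matches the githubPublishRecord input parameters.
async function handleGitHubPublish(publishRecordToGitHub, record, environments, commitMessage) {
  if (!Array.isArray(environments) || environments.length === 0) {
    throw new Error("Select at least one environment");
  }
  const trimmed = (commitMessage || "").trim();
  const payload = {
    recordId: record.recordId,
    userId: record.userId,
    region: record.region,
    environments,
    // Leave commitMessage out when blank so the Cloud Function applies its default
    ...(trimmed ? { commitMessage: trimmed } : {}),
  };
  const response = await publishRecordToGitHub(payload);
  return response.data; // { success, results }
}
```

Leaving the default message to the backend keeps a single source of truth for its wording.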

3. Firebase Cloud Function

New File: firebase-functions/functions/githubPublish.js

Function Name: githubPublishRecord

Input Parameters:

{
  recordId: string,
  userId: string,
  region: string,
  environments: string[],
  commitMessage?: string
}

Function Workflow:

  1. Authenticate the caller (verify reviewer/admin role)
  2. Fetch the record data from Firebase
  3. Load GitHub configuration from admin/{region}/githubCredentials/
  4. Convert record to ISO19115-3 XML using existing Python function or external API
  5. Convert record to YAML format
  6. For each environment:
    • Generate filename using the configured template
    • Prepare file content for XML and YAML
    • Use Octokit to commit both files to GitHub
  7. Return success response with commit details

Implementation Pattern (following existing issue.js):

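const functions = require("firebase-functions");
const admin = require("firebase-admin"); // assumes admin.initializeApp() is already called in index.js
const { Octokit } = require("@octokit/rest");
const axios = require("axios"); // used by the conversion helpers below

exports.githubPublishRecord = functions.https.onCall(async (data, context) => {
  // 1. Authentication check
  if (!context.auth) {
    throw new functions.https.HttpsError("unauthenticated", "User must be authenticated");
  }

  const { recordId, userId, region, environments, commitMessage } = data;
  if (!recordId || !userId || !region || !Array.isArray(environments) || environments.length === 0) {
    throw new functions.https.HttpsError("invalid-argument", "recordId, userId, region and at least one environment are required");
  }

  // 2. Authorization check: only reviewers and admins of the region may publish.
  // Assumes reviewer emails are stored next to admin emails under admin/{region}/permissions,
  // mirroring the admins check used in database.rules.json.
  const permissions = (await admin.database().ref(`admin/${region}/permissions`).once("value")).val() || {};
  const email = context.auth.token.email;
  const isAllowed = Boolean(email) && [permissions.admins, permissions.reviewers]
    .some((list) => typeof list === "string" && list.includes(email));
  if (!isAllowed) {
    throw new functions.https.HttpsError("permission-denied", "Only reviewers and admins can publish to GitHub");
  }

  // 3. Fetch record from Firebase
  const recordSnapshot = await admin.database()
    .ref(`${region}/users/${userId}/records/${recordId}`)
    .once("value");
  const recordData = recordSnapshot.val();
  if (!recordData) {
    throw new functions.https.HttpsError("not-found", `Record ${recordId} not found`);
  }

  // 4. Load GitHub config
  const configSnapshot = await admin.database()
    .ref(`admin/${region}/githubCredentials`)
    .once("value");
  const githubConfig = configSnapshot.val();
  if (!githubConfig || !githubConfig.owner || !githubConfig.repo || !githubConfig.token) {
    throw new functions.https.HttpsError("failed-precondition", "GitHub publishing is not configured for this region");
  }

  // Only publish to environments configured by an admin; this also keeps
  // arbitrary user input out of the repository path.
  const allowedEnvs = githubConfig.environments || ["prod"];
  const unknownEnvs = environments.filter((env) => !allowedEnvs.includes(env));
  if (unknownEnvs.length > 0) {
    throw new functions.https.HttpsError("invalid-argument", `Unknown environment(s): ${unknownEnvs.join(", ")}`);
  }

  // 5. Convert to XML and YAML
  // Call existing convert_metadata Python function or external API
  const xmlContent = await convertToXML(recordData);
  const yamlContent = await convertToYAML(recordData);

  // 6. Generate filename
  const filename = generateFilename(githubConfig.fileNamingTemplate || "{uuid}", recordData);

  // 7. Initialize Octokit (decrypt the token here first if it is stored encrypted)
  const octokit = new Octokit({ auth: githubConfig.token });

  // Get repository owner, name and branch from config
  const { owner, repo } = githubConfig;
  const branch = githubConfig.branch || "main";
  const title = recordData.title && recordData.title.en;
  const message = commitMessage || `Publish metadata record: ${title || recordId}`;

  // 8. Commit both files for each environment (one commit per environment)
  const results = [];

  for (const env of environments) {
    const xmlPath = `forms/${env}/${filename}.xml`;
    const yamlPath = `forms/${env}/${filename}.yaml`;

    // Create or update files using GitHub API
    const commit = await commitFilesToGitHub(
      octokit,
      owner,
      repo,
      branch,
      [
        { path: xmlPath, content: xmlContent },
        { path: yamlPath, content: yamlContent }
      ],
      message,
      recordData
    );

    results.push({
      environment: env,
      commitSha: commit.sha,
      commitUrl: commit.html_url,
      files: [xmlPath, yamlPath]
    });
  }

  return {
    success: true,
    results
  };
});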

Helper Functions:

function generateFilename(template, recordData) {
  // Replace template variables with record data
  // {uuid} -> recordData.identifier
  // {filename} -> recordData.filename
  // {title} -> sanitized recordData.title.en
  // {date} -> current date
}

async function commitFilesToGitHub(octokit, owner, repo, branch, files, message, recordData) {
  // Use GitHub API to:
  // 1. Get current commit SHA for the branch
  // 2. Get tree SHA for current commit
  // 3. Create blobs for each file
  // 4. Create new tree with updated files
  // 5. Create new commit
  // 6. Update branch reference

  // See: https://docs.github.com/en/rest/git/commits#create-a-commit
}

async function convertToXML(recordData) {
  // Call the existing convert_metadata Python function over HTTPS (callable protocol)
  // Or use external API: https://api.forms.cioos.ca/record
}

async function convertToYAML(recordData) {
  // Call the existing convert_metadata Python function over HTTPS (callable protocol)
  // Or convert using record_json_to_yaml.py logic
}
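The commitFilesToGitHub stub above could be filled in with the Git Data API through Octokit, following its six listed steps. This is a sketch, not a tested implementation. It assumes @octokit/rest v18 or later, where these endpoints are exposed under octokit.rest.git. All files go into a single commit, paths not listed are inherited from base_tree, and files already at the same paths are overwritten (the re-publishing case). The stub's unused recordData parameter is dropped. Because JavaScript ignores extra arguments, the call in githubPublishRecord still works.

```javascript
// Sketch using GitHub's Git Data API (refs, commits, blobs, trees) via Octokit.
async function commitFilesToGitHub(octokit, owner, repo, branch, files, message) {
  const ref = `heads/${branch}`;

  // 1. Current commit SHA at the tip of the branch (404 if the branch does not exist)
  const { data: refData } = await octokit.rest.git.getRef({ owner, repo, ref });
  const parentSha = refData.object.sha;

  // 2. Tree SHA of that commit
  const { data: parentCommit } = await octokit.rest.git.getCommit({ owner, repo, commit_sha: parentSha });

  // 3. One blob per file, sent as UTF-8 text
  const tree = [];
  for (const file of files) {
    const { data: blob } = await octokit.rest.git.createBlob({ owner, repo, content: file.content, encoding: "utf-8" });
    tree.push({ path: file.path, mode: "100644", type: "blob", sha: blob.sha });
  }

  // 4. New tree layered on top of the current one
  const { data: newTree } = await octokit.rest.git.createTree({ owner, repo, base_tree: parentCommit.tree.sha, tree });

  // 5. New commit pointing at the new tree, with the previous tip as parent
  const { data: commit } = await octokit.rest.git.createCommit({
    owner, repo, message, tree: newTree.sha, parents: [parentSha],
  });

  // 6. Move the branch; force: false makes a concurrent push fail this call
  //    instead of silently discarding the other commit
  await octokit.rest.git.updateRef({ owner, repo, ref, sha: commit.sha, force: false });

  return { sha: commit.sha, html_url: commit.html_url };
}
```

Compared with the contents API (one commit per file), this produces one commit containing both the XML and the YAML file.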

4. Conversion Integration

Existing Resources:

  • Python function: convert_metadata in firebase-functions/python-functions/main.py
  • External API: https://api.forms.cioos.ca/record
  • Package: cioos-metadata-conversion

Approach Options:

Option A: Call existing Python Firebase function

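// functions.httpsCallable() belongs to the Firebase client SDK and is not available in
// firebase-functions, so from githubPublishRecord the Python callable is invoked over
// HTTPS using the callable protocol (request body { data }, response body { result }).
// CONVERT_METADATA_URL is a placeholder for the deployed convert_metadata URL; if that
// function checks auth, an Authorization header must be sent as well.
async function convertMetadata(recordData, outputFormat) {
  const response = await axios.post(CONVERT_METADATA_URL, {
    data: { record_data: recordData, output_format: outputFormat }
  });
  return response.data.result;
}
const xmlResult = await convertMetadata(recordData, "xml");
const yamlResult = await convertMetadata(recordData, "yaml");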

Option B: Call external API

const response = await axios.post("https://api.forms.cioos.ca/record", {
  record_data: recordData,
  output_format: "xml" // or "yaml"
});
const content = response.data;

Recommendation: Use Option A (Python function) for consistency and to avoid external dependencies.

5. User Context Integration

File: src/providers/UserProvider.jsx

New Function Export:

const publishRecordToGitHub = functions.httpsCallable("githubPublishRecord");

// Add to UserContext value
value={{
  // ... existing functions
  publishRecordToGitHub,
}}

6. Security Rules Update

File: firebase-functions/database.rules.json

Add Single Rule for GitHub Credentials (add as sibling to existing dataciteCredentials rule):

"githubCredentials": {
  // Allow write access to GitHub credentials if the authenticated user's email is listed as an admin in the permissions for the region.
  ".write": "root.child('admin').child($regionAdmin).child('permissions').child('admins').val().contains(auth.email)"
}

Location: Add this inside the admin/$regionAdmin object, right after the existing dataciteCredentials rule (around line 65).

Rationale:

  • Read access: Inherited from parent rule (line 35) - allows reviewers and admins to read
  • Write access: Only admins can modify GitHub credentials (matches DataCite pattern)
  • Minimal change to existing rules structure
  • Follows the exact same pattern as dataciteCredentials

7. Dependencies

New npm packages (for Firebase Functions): none required

  • @octokit/rest: Already used in issue.js, so no additional installation is needed

Firebase Functions Configuration:

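# Store GitHub token as Firebase runtime config (alternative to database storage)
firebase functions:config:set github.token="ghp_xxxxxxxxxxxxx"

# functions:config is deprecated in recent firebase-tools; the Secret Manager equivalent is:
firebase functions:secrets:set GITHUB_TOKEN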

Affected Files

New Files

  • firebase-functions/functions/githubPublish.js - New Cloud Function for GitHub integration
  • src/components/Dialogs/GitHubPublishDialog.jsx - New dialog component (optional, can be inline)

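Modified Files

  • src/components/Pages/Admin.jsx - New "GitHub Publishing Configuration" section
  • src/components/Pages/Reviewer.jsx - "Publish to GitHub" button and dialog
  • src/providers/UserProvider.jsx - New publishRecordToGitHub callable in UserContext
  • firebase-functions/database.rules.json - New githubCredentials write rule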

Implementation Phases

Phase 1: Admin Configuration UI

  1. Create GitHub config section in Admin page
  2. Add Firebase database structure for GitHub settings
  3. Implement configuration save/load functionality
  4. Update security rules

Phase 2: Reviewer Publishing UI

  1. Add "Publish to GitHub" button to Reviewer page
  2. Create environment selection dialog
  3. Add commit message input field
  4. Implement frontend validation

Phase 3: Backend Function

  1. Create githubPublish.js Cloud Function
  2. Implement authentication and authorization checks
  3. Integrate with existing conversion functions
  4. Implement GitHub API integration using Octokit
  5. Handle file naming template logic

Phase 4: Integration & Testing

  1. Connect frontend UI to backend function
  2. Test with multiple environments
  3. Test re-publishing (overwrite) behavior
  4. Error handling and user feedback
  5. Display success confirmation with commit links

Phase 5: Documentation & Deployment

  1. Update README with new feature documentation
  2. Create admin guide for GitHub configuration
  3. Deploy Cloud Functions
  4. Deploy frontend updates

Testing Checklist

Admin Page

  • Can save GitHub repository owner and name
  • Can save and retrieve GitHub token securely
  • Can configure target branch (default: main)
  • Can add/remove/edit environments list
  • Can configure file naming template
  • Configuration only accessible to admins
  • Configuration persists across sessions

Reviewer Page

  • "Publish to GitHub" button visible to reviewers and admins
  • Button disabled if GitHub not configured
  • Environment selector displays configured environments
  • Can select single or multiple environments
  • Commit message field accepts custom text
  • Default commit message displays correctly
  • Loading state shown during publishing

Backend Function

  • Authenticates user correctly
  • Verifies reviewer/admin permissions
  • Fetches record data successfully
  • Converts to XML format correctly
  • Converts to YAML format correctly
  • Applies file naming template correctly
  • Commits to correct branch
  • Creates correct directory structure (forms/{env}/)
  • Both XML and YAML files appear in repository
  • Handles multiple environments in single request
  • Overwrites existing files correctly
  • Returns commit URL and details

Error Handling

  • Invalid GitHub token shows meaningful error
  • Network errors handled gracefully
  • Missing configuration shows helpful message
  • Unauthorized users receive proper error
  • Conversion failures reported to user
  • GitHub API rate limiting handled
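The error cases above could map GitHub failures to callable error codes so the Reviewer page can show a meaningful message. This is a minimal sketch that assumes the errors are Octokit RequestErrors, which carry the HTTP status and the response headers. The function name classifyGitHubError, the status-to-code mapping and the messages are suggestions only.

```javascript
// Hypothetical mapping from a GitHub API failure to { code, message } for HttpsError.
function classifyGitHubError(error) {
  const status = error && error.status;
  const headers = (error && error.response && error.response.headers) || {};
  if (status === 401) {
    return { code: "failed-precondition", message: "The configured GitHub token is invalid or expired." };
  }
  // GitHub signals exhausted rate limits with 429, or 403 plus x-ratelimit-remaining: 0
  if (status === 429 || (status === 403 && headers["x-ratelimit-remaining"] === "0")) {
    return { code: "resource-exhausted", message: "GitHub API rate limit reached; try again later." };
  }
  if (status === 403) {
    return { code: "permission-denied", message: "The GitHub token lacks write access to the repository." };
  }
  if (status === 404) {
    return { code: "not-found", message: "Repository or branch not found; check the admin configuration." };
  }
  // e.g. a non-fast-forward ref update when the branch moved during publishing
  if (status === 409 || status === 422) {
    return { code: "aborted", message: "The branch changed during publishing; please retry." };
  }
  // No HTTP status: network failure, timeout, DNS error, ...
  return { code: "unavailable", message: "Could not reach GitHub; please retry." };
}
```

Inside githubPublishRecord, a catch block could then rethrow with new functions.https.HttpsError(code, message). All six codes are valid HttpsError codes.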

Edge Cases

  • Re-publishing same record overwrites files
  • Special characters in filenames handled correctly
  • Very long commit messages truncated appropriately
  • Empty/missing title handled in commit message
  • Record with missing required fields handled
  • Publishing to non-existent branch handled

Success Criteria

  • Admins can configure GitHub repository settings via Admin page
  • Reviewers and admins can publish records to GitHub from Reviewer page
  • Users can select one or more environments to publish to
  • Users can provide custom commit messages
  • Records are converted to both XML and YAML formats
  • Files are committed to correct paths: forms/{environment}/{filename}.{ext}
  • Success confirmation shows commit link and file list
  • Re-publishing overwrites existing files with new commit
  • All security rules properly enforced
  • Feature works across all regions

Downstream Integration

GitHub Pages Static Website

The GitHub repository hosting the published metadata files should be configured with GitHub Pages to serve the files as a static website. This enables:

  • Direct browsing of metadata files via HTTP/HTTPS
  • Version-controlled hosting with automatic updates on each commit
  • CDN-backed delivery for reliable access by harvesting systems

Configuration Requirements:

  • GitHub Pages enabled on the repository
  • Source branch matches the configured publish branch (default: main)
  • Base URL will be: https://{owner}.github.io/{repo}/
  • Example with default repo: https://cioos-siooc.github.io/cioos-siooc-forms/
  • Metadata file URLs: https://{owner}.github.io/{repo}/forms/{environment}/{filename}.xml
  • Example full URL: https://cioos-siooc.github.io/cioos-siooc-forms/forms/prod/record-123.xml
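The public URL of a published file follows directly from the admin configuration, using the URL pattern listed above. This is a minimal sketch that assumes Pages serves the configured branch from the repository root. A custom domain would change the host, and a repository named {owner}.github.io is served at the root, without the /{repo}/ segment. The name pagesUrl is hypothetical.

```javascript
// Builds https://{owner}.github.io/{repo}/forms/{environment}/{filename}.{extension}
function pagesUrl({ owner, repo }, environment, filename, extension) {
  const path = ["forms", environment, `${filename}.${extension}`]
    .map(encodeURIComponent) // each segment is escaped separately, so "/" stays a separator
    .join("/");
  return `https://${owner}.github.io/${repo}/${path}`;
}
```

The success confirmation could show this link next to the commit link, since it is the address CKAN harvesters will read.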

CKAN Catalogue Harvesting

Regional Association CKAN catalogues will be configured to harvest metadata from the GitHub Pages static website:

Harvesting Configuration:

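  • Harvest Source Type: WAF (Web Accessible Folder). CSW (Catalogue Service for the Web) is a server-side query service, so a static GitHub Pages site cannot act as a CSW endpoint. GitHub Pages also does not generate directory listings, so a WAF harvester needs an index page that lists the XML files.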
  • Harvest URL: Points to the GitHub Pages static website
  • Harvest Frequency: Configurable per CKAN instance (e.g., daily, weekly)
  • Metadata Format: ISO19115-3 XML

Regional Catalogues:

  • Pacific RA CKAN
  • St. Lawrence RA CKAN
  • Atlantic RA CKAN
  • Amundsen RA CKAN
  • CanWIN RA CKAN

Harvest Workflow:

  1. CKAN harvester periodically checks the GitHub Pages URLs for each environment
  2. Harvester detects new or updated XML files
  3. Harvester parses ISO19115-3 XML and extracts metadata fields
  4. CKAN dataset records are created/updated based on the harvested metadata
  5. Records become searchable in the regional CKAN catalogue interface

Benefits:

  • Automated synchronization between metadata entry form and public catalogues
  • Version control and audit trail via Git commits
  • No direct database integration required between systems
  • Standards-compliant metadata exchange using ISO19115-3

Future Enhancements (Out of Scope)

  • Publish history tracking in Firebase
  • Batch publishing (multiple records at once)
  • Pull request creation instead of direct commit
  • Webhook integration to notify CKAN harvesters of new content
  • Preview of XML/YAML before publishing
  • Rollback/unpublish functionality
  • Different GitHub repos per environment
  • Auto-publish on status change to "published"
  • GitHub Pages index.html generation for browsing metadata files
  • STAC (SpatioTemporal Asset Catalog) format output for harvesting

Security Considerations

  1. Token Storage: GitHub tokens should be stored securely, following existing patterns for DataCite credentials
  2. Access Control: Only reviewers and admins can publish; enforced both in UI and Cloud Function
  3. Firebase Rules: GitHub config readable by reviewers but only writable by admins
  4. Input Validation: Sanitize commit messages and filenames to prevent injection attacks
  5. Rate Limiting: Consider GitHub API rate limits (5000 requests/hour for authenticated requests)
  6. Audit Trail: Log all publishing actions for accountability

Additional Notes

  • The feature leverages existing infrastructure (Firebase Functions, Python conversion, Octokit)
  • Follows established patterns from issue.js for GitHub integration
  • Maintains consistency with existing admin configuration patterns
  • Provides flexibility through configurable environments and file naming
  • Supports reviewer workflow without requiring direct GitHub access
  • Integrates into the CIOOS metadata distribution pipeline:
    • Source: CIOOS Metadata Entry Form (this application)
    • Storage & Hosting: GitHub repository with GitHub Pages
    • Discovery: Regional CKAN catalogues harvest from GitHub Pages
    • End Users: Search and discover metadata through CKAN interfaces

Related Issues/PRs

  • Related to existing GitHub issue creation feature in firebase-functions/functions/issue.js
  • Builds on existing conversion functionality in firebase-functions/python-functions/main.py
  • Complements existing publish/unpublish workflow in Reviewer page

Labels: enhancement, feature, reviewer, admin, github-integration, backend, frontend

Priority: Medium-High

Estimated Effort: 3-5 days

Assignee: TBD

Metadata

Labels: UX Improvement (Improvements that can be made to benefit the experience of end users), enhancement (New feature or request)

Projects: Status: In Progress

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests