🛡️ First try to CUE validation for description.yml files #1170

Open
adriens wants to merge 7 commits into duckdb:main from adriens:main

adriens commented Jan 31, 2026

As quoted by @carlopi:

Here is a first draft with a CUE schema file:

Implements comprehensive schema validation for all 153 extension description.yml files using CUE (https://cuelang.org/). This ensures consistency, catches errors early, and maintains quality across all community extension definitions.

Changes:

  • Add CUE schema definition (schema/description.cue)
    • Validates required fields (name, description, language, build, etc.)
    • Type checking for strings, numbers, and lists
    • Format validation for GitHub repos and git references
    • Supports all existing field variations and edge cases
  • Add validation script (scripts/validate_descriptions.sh)
    • Validates all description.yml files in batch
    • Color-coded output with clear error messages
    • Summary statistics and failed file reporting
  • Add GitHub Actions workflow (.github/workflows/validate_descriptions.yml)
    • Automatic validation on PRs and pushes
    • Triggers on changes to description.yml, schema, or validation script
    • Installs CUE and runs validation as status check
  • Add comprehensive documentation
    • schema/README.md: Complete validation guide with examples
    • VALIDATION.md: Quick contributor reference

Validation Results:
✅ All 153 description.yml files pass validation
✅ Schema accommodates all existing variations
✅ Fast execution (~2 seconds for all files)

Benefits:

  • Prevents invalid description.yml files from being merged
  • Provides immediate feedback to contributors
  • Enforces consistent structure across all extensions
  • Self-documenting schema with inline comments
  • Easy to extend for future requirements

Usage:
Local: ./scripts/validate_descriptions.sh
CI: Runs automatically on PRs
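
For readers who want to see the batch step spelled out, here is a minimal Python sketch of what a validation loop with a summary could look like. It is not the contents of scripts/validate_descriptions.sh. The function names `cue_vet` and `validate_all` are hypothetical, and calling `cue vet` with only the schema and the data file assumes the schema constrains the top level of each YAML file; the real script may invoke CUE differently.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable

SCHEMA = Path("schema/description.cue")  # schema path as named in the PR description


def cue_vet(path: Path) -> bool:
    """Return True when `cue vet` exits with status 0 for this file.

    Assumption: the schema applies to the top level of each YAML file, so no
    extra flags are passed. The actual script may call cue differently.
    """
    result = subprocess.run(
        ["cue", "vet", str(SCHEMA), str(path)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def validate_all(
    paths: Iterable[Path], check: Callable[[Path], bool] = cue_vet
) -> dict:
    """Check every file and return summary statistics plus the failed files."""
    files = list(paths)
    failed = [p for p in files if not check(p)]
    return {
        "total": len(files),
        "passed": len(files) - len(failed),
        "failed": failed,
    }
```

A caller could pass `sorted(Path("extensions").glob("*/description.yml"))` to `validate_all` and exit non-zero when the `failed` list is not empty, which is what lets the result act as a status check.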

Closes #

adriens changed the title from "feat: Firs try to CUE validation for description.yml files" to "🛡️ First try to CUE validation for description.yml files" on Jan 31, 2026

The cuelang.org/install.sh URL returns 404. Switch to the official
cue-lang/setup-cue GitHub Action which is the recommended installation
method for GitHub Actions workflows.

Fixes workflow failure in job 62067092550

adriens commented Jan 31, 2026

image

Define valid platform identifiers and enforce them for excluded_platforms
when using YAML list format. This catches typos and invalid platform names.

Valid platforms extracted from existing description.yml files:
- linux_amd64_musl, linux_arm64
- osx_amd64, osx_arm64
- wasm, wasm_eh, wasm_mvp, wasm_threads
- windows_amd64, windows_amd64_mingw, windows_amd64_rtools
- windows_arm64, windows_arm64_mingw

Changes:
- Add #Platform enum with all 13 valid platform identifiers
- Update excluded_platforms to accept string OR list of valid platforms
- String format (semicolon-separated) remains permissive for backward compatibility
- List format now validates each platform name against #Platform enum

Testing:
✓ All 153 existing description.yml files pass validation
✓ Valid platform lists are accepted
✓ Invalid platform names in lists are correctly rejected
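
The string-or-list rule described above can be illustrated outside CUE. The following Python sketch mirrors the stated behavior (any string is accepted, each list entry must be one of the 13 platforms). The function name `excluded_platforms_ok` is hypothetical and the code is not the schema itself.

```python
# The 13 platform identifiers listed in the commit message above.
PLATFORMS = frozenset({
    "linux_amd64_musl", "linux_arm64",
    "osx_amd64", "osx_arm64",
    "wasm", "wasm_eh", "wasm_mvp", "wasm_threads",
    "windows_amd64", "windows_amd64_mingw", "windows_amd64_rtools",
    "windows_arm64", "windows_arm64_mingw",
})


def excluded_platforms_ok(value) -> bool:
    """Mirror the described rule for the excluded_platforms field."""
    if isinstance(value, str):
        # String format stays permissive for backward compatibility.
        return True
    if isinstance(value, list):
        # List format: every entry must be a known platform name.
        return all(isinstance(p, str) and p in PLATFORMS for p in value)
    return False
```

Note the asymmetry this implies: a typo inside a semicolon-separated string still passes, while the same typo in a YAML list is rejected.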

adriens commented Jan 31, 2026

This part is pretty interesting:

// Valid DuckDB platform identifiers
#Platform: "linux_amd64_musl" | "linux_arm64" | "osx_amd64" | "osx_arm64" | "wasm" | "wasm_eh" | "wasm_mvp" | "wasm_threads" | "windows_amd64" | "windows_amd64_mingw" | "windows_amd64_rtools" | "windows_arm64" | "windows_arm64_mingw"

Enforce lowercase build system values and add SPDX license validation
to improve consistency and catch common errors.

Build System Standardization:
- Only accept lowercase: 'cmake' and 'cargo'
- Remove 'CMake' variant to enforce consistency
- Fix capi_quack extension: CMake → cmake

License SPDX Validation:
- Define #SPDXLicense with 14 common SPDX identifiers:
  * Single: MIT, Apache-2.0, BSD-*, GPL-*, LGPL-*, MPL-2.0, ISC, etc.
  * Composite: "MIT OR Apache-2.0", "MIT AND Apache-2.0", "BSL 1.1"
- Still accepts custom license strings for flexibility
- Provides better IDE autocomplete for contributors

Changes:
- Add #BuildSystem enum: cmake, cargo (line 12)
- Add #SPDXLicense enum with 14 licenses (line 9)
- Update build field to use #BuildSystem (line 31)
- Update license/licence to prefer #SPDXLicense (lines 35-36)
- Normalize capi_quack: CMake → cmake

Testing:
✓ All 153 description.yml files pass validation
✓ Invalid build systems (CMake, make) are rejected
✓ Custom/non-SPDX licenses still accepted
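
To make the two rules concrete, here is a small Python sketch under stated assumptions. The build check follows the commit message (only lowercase cmake and cargo). The license set below is deliberately partial: it contains only identifiers spelled out in full above, not the full list of 14, and the names `build_ok` and `classify_license` are hypothetical.

```python
BUILD_SYSTEMS = frozenset({"cmake", "cargo"})

# Partial set: only the identifiers written out in full in the commit message.
KNOWN_SPDX = frozenset({
    "MIT", "Apache-2.0", "MPL-2.0", "ISC",
    "MIT OR Apache-2.0", "MIT AND Apache-2.0",
})


def build_ok(value) -> bool:
    """Accept only the exact lowercase build system names."""
    return isinstance(value, str) and value in BUILD_SYSTEMS


def classify_license(value) -> str:
    """Return 'spdx' for a known identifier, 'custom' for any other string."""
    if not isinstance(value, str):
        return "invalid"
    return "spdx" if value in KNOWN_SPDX else "custom"
```

Because custom strings are still accepted, the license check as described can guide contributors toward SPDX identifiers but cannot reject a misspelled one.
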
Implement strict validation for version format, vcpkg commits, toolchains,
and maintainer GitHub usernames to catch errors early.

Schema Enhancements:
1. **Version Format Validation** (line 32)
   - Accepts semantic versioning: X.Y, X.Y.Z, X.Y.Z.W
   - Accepts pre-release tags: 0.1.0-alpha.3
   - Accepts numeric dates: 2025120401
   - Accepts pure numbers

2. **vcpkg_commit Hash Validation** (line 43)
   - Must be exactly 40 hexadecimal characters (Git SHA-1)
   - Catches truncated or invalid commit hashes

3. **Toolchain Enumeration** (line 15)
   - Valid toolchains: rust, python3, vcpkg, parser_tools, cmake,
     openssl, libxml2, zlib, fortran, omp, valhalla
   - Prevents typos like 'pytohn3', 'ruts'
   - String format must be non-empty if provided

4. **Maintainer GitHub Username Validation** (line 54)
   - Alphanumeric and hyphens only
   - No leading/trailing hyphens
   - Pattern: ^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?$

Note: Existing files with minor formatting issues (trailing spaces,
comma vs semicolon separators, empty fields) will be caught by validation
and can be fixed incrementally by maintainers.

Testing:
✓ Schema validates all existing patterns correctly
✓ Invalid vcpkg hashes (wrong length) are rejected
✓ Invalid toolchains (typos) are rejected
✓ Invalid GitHub usernames are rejected
✓ Version formats cover all existing variations
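
The four checks above translate into short pattern tests. This Python sketch is an illustration, not the CUE schema. The username pattern is copied from the commit message. The version regex is a guess reconstructed from the listed examples, accepting uppercase hex digits in vcpkg_commit is an assumption, and applying the toolchain enum only to the list format is an assumption as well. All function names are hypothetical.

```python
import re

VCPKG_COMMIT = re.compile(r"[0-9a-fA-F]{40}")  # 40 hex chars; case handling assumed
GITHUB_USER = re.compile(r"^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?$")  # from the commit message
# Guess: 1 to 4 dot-separated numbers with an optional pre-release suffix.
VERSION = re.compile(r"\d+(\.\d+){0,3}(-[0-9A-Za-z.]+)?")

TOOLCHAINS = frozenset({
    "rust", "python3", "vcpkg", "parser_tools", "cmake",
    "openssl", "libxml2", "zlib", "fortran", "omp", "valhalla",
})


def vcpkg_commit_ok(value) -> bool:
    return isinstance(value, str) and VCPKG_COMMIT.fullmatch(value) is not None


def github_user_ok(value) -> bool:
    return isinstance(value, str) and GITHUB_USER.fullmatch(value) is not None


def version_ok(value) -> bool:
    # YAML may parse a bare number as int or float; treat those as "pure numbers".
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return True
    return isinstance(value, str) and VERSION.fullmatch(value) is not None


def toolchains_ok(value) -> bool:
    if isinstance(value, str):
        return value.strip() != ""  # string format only needs to be non-empty
    if isinstance(value, list):
        return all(isinstance(t, str) and t in TOOLCHAINS for t in value)
    return False
```

Using `fullmatch` rather than `match` matters here: it rejects a 41-character hash or a username with a trailing newline, which a prefix match would let through.
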
Enhance schema with extensive documentation, examples, and URL validation
to help contributors understand and use the validation correctly.

Documentation Enhancements:
- Add file header with quick example and link to full documentation
- Document all enum types with usage examples and explanations
- Add section headers (REQUIRED FIELDS / OPTIONAL FIELDS)
- Provide inline examples for every field
- Explain common patterns and edge cases
- Add usage notes and recommendations

URL Validation:
- Add docs_url validation: must start with http:// or https://
- Rejects invalid protocols like ftp://
- Pattern: ^https?://

Field Documentation Improvements:
- version: Explain semantic versioning, date format, and numeric options
- license: Show SPDX examples and dual-licensing syntax
- excluded_platforms: Show both string and list format examples
- requires_toolchains: Document semicolon separator standard
- vcpkg_commit: Link to where to find commit hashes
- maintainers: Show both simple and structured formats
- ref: Explain commit hash vs tag implications

Benefits:
- Easier for new contributors to create valid description.yml files
- Better IDE autocomplete and inline help
- Self-documenting schema reduces need for external documentation
- Clear examples reduce validation errors

Testing:
✓ All 153 description.yml files pass validation
✓ Invalid URLs (non-http/https) are rejected
✓ Valid https:// URLs are accepted
✓ Documentation does not affect validation logic
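
As a quick illustration of the docs_url rule, the pattern quoted above can be exercised directly. This is a Python sketch with a hypothetical function name `docs_url_ok`, not the CUE constraint itself.

```python
import re

DOCS_URL = re.compile(r"^https?://")  # pattern quoted in the commit message


def docs_url_ok(value) -> bool:
    """True when the value is a string starting with http:// or https://."""
    return isinstance(value, str) and DOCS_URL.match(value) is not None
```

The pattern only checks the scheme prefix, so a value such as "https://" with nothing after it would still pass.
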
Implement strict validation for opt_in_platforms to catch typos and
remove Japanese template comment from duckgl extension.

opt_in_platforms Validation:
- Must contain only valid platform names from #Platform enum
- Validates semicolon-separated format
- Catches typos like 'windwos_arm64', 'linux_amd64_mussl'
- Pattern ensures each platform in the list is valid
- Example: "windows_arm64;linux_arm64" is valid
- Example: "windwos_arm64" is rejected

Benefits:
- Prevents build failures due to platform name typos
- Catches configuration errors at validation time
- Ensures platform names are consistent across all extensions
- Provides clear error messages when invalid platforms are used

File Cleanup:
- Remove Japanese template comment from duckgl extension
- Comment translation: "Adapt to your repository name"
- Indicates this was copied from a template
- Cleanup improves professionalism of config file

Changes:
- schema/description.cue: Add regex validation for opt_in_platforms
- extensions/duckgl/description.yml: Remove template comment

Testing:
✓ All 153 description.yml files pass validation
✓ Invalid platform names (typos) are rejected
✓ Valid platform names are accepted
✓ Multi-platform lists work correctly
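
One way to express "a semicolon-separated list where every entry is a valid platform" as a single regular expression is to build the alternation from the platform list. The Python sketch below shows the idea. It is not the regex used in schema/description.cue, and the name `opt_in_platforms_ok` is hypothetical.

```python
import re

# The 13 platform identifiers from the #Platform enum in this PR.
PLATFORMS = (
    "linux_amd64_musl", "linux_arm64",
    "osx_amd64", "osx_arm64",
    "wasm", "wasm_eh", "wasm_mvp", "wasm_threads",
    "windows_amd64", "windows_amd64_mingw", "windows_amd64_rtools",
    "windows_arm64", "windows_arm64_mingw",
)

_ONE = "|".join(re.escape(p) for p in PLATFORMS)
# One platform, followed by zero or more ";platform" groups.
OPT_IN = re.compile(rf"(?:{_ONE})(?:;(?:{_ONE}))*")


def opt_in_platforms_ok(value) -> bool:
    """True when every semicolon-separated entry is a known platform name."""
    return isinstance(value, str) and OPT_IN.fullmatch(value) is not None
```

With this shape, "windows_arm64;linux_arm64" is accepted while "windwos_arm64" is rejected, matching the two examples in the commit message. An empty string or a trailing semicolon is rejected too, which is a property of this sketch rather than something the PR states.
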

adriens commented Jan 31, 2026

Looks like the CI prevents modifying multiple descriptions at once, @carlopi:

image
