Improve CTF test suite structure and debugging #85

@madebygps

Goal

Make .github/skills/ctf-testing/ easier to run, debug, and extend after the current refactor is complete. The current scripts work as an end-to-end black-box test, but when a single challenge fails it is hard to isolate the cause.

Current observations

  • deploy_and_test.sh handles deployment, SSH, reboot testing, and cleanup in one local script.
  • test_ctf_challenges.sh is one large VM-side script that solves all 18 challenges in a single run.
  • The suite has pass and fail counters, but failure messages do not always include enough command output to show quickly what changed.
  • There is no simple way to run only one challenge test, one section, or a fast smoke test after a targeted change.

Research notes

  • bats-core provides TAP-compliant Bash tests, filtering, setup and teardown hooks, and cleaner failure output.
  • TAP output can make test results easier to parse in CI and easier to summarize in future automation.
  • Even if we do not adopt bats, splitting the VM-side tests into reusable functions would make targeted runs and debugging simpler.
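To illustrate the last two notes, here is a minimal sketch of TAP-style output produced from plain Bash functions, without bats-core. Every name in it (`tap_run`, `tap_main`, `check_always_passes`, `check_always_fails`) is hypothetical and does not exist in the current scripts; the placeholder checks stand in for real challenge solves.

```shell
#!/usr/bin/env bash
# Sketch only: emit TAP result lines from plain Bash functions.
# All function names below are hypothetical, not taken from the repo.

tap_count=0
tap_failed=0

# Run one named check function and print a single TAP result line for it.
tap_run() {
  local name="$1"
  tap_count=$((tap_count + 1))
  if "$name" >/dev/null 2>&1; then
    echo "ok ${tap_count} - ${name}"
  else
    echo "not ok ${tap_count} - ${name}"
    tap_failed=$((tap_failed + 1))
  fi
}

# Placeholder checks; a real check would solve and verify one challenge.
check_always_passes() { true; }
check_always_fails() { false; }

# Print the TAP plan line, run every check given as an argument,
# and return non-zero if any check failed.
tap_main() {
  local checks=("$@")
  local c
  echo "1..${#checks[@]}"
  for c in "${checks[@]}"; do
    tap_run "$c"
  done
  [ "$tap_failed" -eq 0 ]
}

# Example invocation:
#   tap_main check_always_passes check_always_fails
```

Because each check is an ordinary function, the same functions could later be wrapped in bats tests if the dependency is adopted.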

Possible approaches to evaluate

  1. Add flags such as --challenge 10, --section verify, --section export, and --smoke.
  2. Split each challenge solve into a named function with consistent diagnostics on failure.
  3. Add a machine-readable summary, such as TAP or JSON, while keeping plain terminal output readable.
  4. Keep deploy cleanup safe, but add clearer failure artifacts, for example setup log tail, failed service status, and recent journal lines.
  5. Consider bats-core only if the dependency cost is worth it for local and CI usage.
  6. Update SKILL.md so agents know when to run smoke, one-provider, all-provider, and reboot tests.
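Approaches 1 and 2 could be combined roughly as follows. This is a sketch under assumptions: the flag names come from the list above, but `parse_args`, `select_challenges`, `run_selected`, the `solve_challenge_N` naming scheme, and the smoke subset are all hypothetical. How a section name maps to specific checks is left open here.

```shell
#!/usr/bin/env bash
# Sketch only: parse --challenge, --section, and --smoke, then select
# which challenge functions to run. Names are hypothetical.

challenge=""
section=""
smoke=0

parse_args() {
  challenge=""; section=""; smoke=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --challenge)
        [ "$#" -ge 2 ] || { echo "--challenge needs a number" >&2; return 2; }
        challenge="$2"; shift 2 ;;
      --section)
        [ "$#" -ge 2 ] || { echo "--section needs a name" >&2; return 2; }
        section="$2"; shift 2 ;;
      --smoke)
        smoke=1; shift ;;
      *)
        echo "unknown argument: $1" >&2; return 2 ;;
    esac
  done
  # Validate the challenge number (the suite currently covers 18 challenges).
  if [ -n "$challenge" ]; then
    case "$challenge" in
      *[!0-9]*) echo "invalid challenge: $challenge" >&2; return 2 ;;
    esac
    if [ "$challenge" -lt 1 ] || [ "$challenge" -gt 18 ]; then
      echo "challenge out of range: $challenge" >&2; return 2
    fi
  fi
}

# Print the selected challenge numbers, one per line.
select_challenges() {
  local i
  if [ -n "$challenge" ]; then
    echo "$challenge"
  elif [ "$smoke" -eq 1 ]; then
    printf '%s\n' 1 2 3   # hypothetical smoke subset
  else
    for ((i = 1; i <= 18; i++)); do echo "$i"; done
  fi
}

# Call solve_challenge_N for each selected challenge, if it is defined.
run_selected() {
  local n failed=0
  for n in $(select_challenges); do
    if declare -F "solve_challenge_${n}" >/dev/null; then
      "solve_challenge_${n}" || failed=1
    else
      echo "no test defined for challenge ${n}" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

With this shape, a targeted run after a small change might look like `parse_args --challenge 10 && run_selected`, while calling it with no flags would still walk all 18 challenges.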

Acceptance criteria

  • A contributor can run one challenge test without running the full suite.
  • Failures show the command or artifact that caused the failure, without leaking more than needed.
  • The full provider flow still supports --with-reboot.
  • The skill documentation explains the new test modes clearly.
  • Any new test dependency is justified and documented.
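The second criterion could be met with a small wrapper like the one below. It is only a sketch: `run_check` and `DIAG_LINES` are hypothetical names, and the 20-line default is an arbitrary choice. On failure it prints the command, the exit status, and a bounded tail of the output, so a failure is traceable without dumping everything the command produced.

```shell
#!/usr/bin/env bash
# Sketch only: run a command and, on failure, show what was run plus
# a bounded tail of its output. Names and defaults are hypothetical.

DIAG_LINES="${DIAG_LINES:-20}"

# Usage: run_check LABEL COMMAND [ARGS...]
run_check() {
  local label="$1"; shift
  local output status=0
  # Capture stdout and stderr together; remember the exit status.
  output="$("$@" 2>&1)" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "PASS: ${label}"
    return 0
  fi
  echo "FAIL: ${label}"
  echo "  command: $*"
  echo "  exit status: ${status}"
  echo "  last ${DIAG_LINES} lines of output:"
  printf '%s\n' "$output" | tail -n "$DIAG_LINES" | sed 's/^/    /'
  return 1
}

# Example invocation:
#   run_check "challenge 10" some_command --with args
```

Note that the command line itself is echoed on failure, so anything sensitive passed as an argument would still appear; a real version might need to redact or omit arguments for some checks.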

Labels: DevOps, bash, bug, documentation, enhancement
