Improve CTF test suite structure and debugging #85

## Goal
Make `.github/skills/ctf-testing/` easier to run, debug, and extend after the current refactor is complete. The current scripts work as an end-to-end black-box test, but when one challenge fails it is hard to narrow down the cause.

## Current observations
- `deploy_and_test.sh` handles deployment, SSH, reboot testing, and cleanup in one local script.
- `test_ctf_challenges.sh` is one large VM-side script that solves all 18 challenges in a single run.
- The suite has pass and fail counters, but failures do not always include enough command output to quickly see what changed.
- There is no simple way to run only one challenge test, one section, or a fast smoke test after a targeted change.

## Research notes
- bats-core provides TAP-compliant Bash tests, filtering, setup and teardown hooks, and cleaner failure output.
- TAP output can make test results easier to parse in CI and easier to summarize in future automation.
- Even if we do not adopt bats, splitting the VM-side tests into reusable functions would make targeted runs and debugging simpler.
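To show what TAP output could look like without adopting bats, here is a minimal sketch in plain Bash. It assumes each check can be expressed as a single command or shell function. The helper names `tap_plan` and `run_tap_test` are hypothetical and do not come from the existing suite.

```shell
#!/usr/bin/env bash
# Minimal TAP-style reporting in plain Bash, without a bats dependency.
# Helper names (tap_plan, run_tap_test) are hypothetical, not from the suite.
set -u

TAP_COUNT=0   # number of tests reported so far

# Print the TAP plan line, e.g. "1..18" when 18 tests will run.
tap_plan() { echo "1..$1"; }

# run_tap_test NAME CMD [ARGS...]
# Runs CMD and prints "ok N - NAME" on success or "not ok N - NAME" on failure.
# On failure, the exit status and the last 20 output lines are printed as
# "#" diagnostic lines, which TAP consumers treat as comments.
run_tap_test() {
  local name=$1
  shift
  local output rc
  TAP_COUNT=$((TAP_COUNT + 1))
  output=$("$@" 2>&1)
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "ok $TAP_COUNT - $name"
    return 0
  fi
  echo "not ok $TAP_COUNT - $name"
  echo "# exit status: $rc"
  if [ -n "$output" ]; then
    printf '%s\n' "$output" | tail -n 20 | sed 's/^/#   /'
  fi
  return "$rc"
}
```

For example, `tap_plan 2` followed by `run_tap_test "pass" true` and `run_tap_test "fail" false` prints `1..2`, `ok 1 - pass`, `not ok 2 - fail`, and `# exit status: 1`. Because passing tests print a single line and failures print a bounded tail, the same output stays readable in a terminal while remaining parseable in CI.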

## Possible approaches to evaluate
1. Add flags such as `--challenge 10`, `--section verify`, `--section export`, and `--smoke`.
2. Split each challenge solve into a named function with consistent diagnostics on failure.
3. Add a machine-readable summary, such as TAP or JSON, while keeping plain terminal output readable.
4. Keep deploy cleanup safe, but add clearer failure artifacts, such as the tail of the setup log, the status of failed services, and recent journal lines.
5. Consider bats-core only if the dependency cost is worth it for local and CI usage.
6. Update `SKILL.md` so agents know when to run smoke, one-provider, all-provider, and reboot tests.
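Approaches 1, 2, and 4 could fit together roughly as sketched below. This assumes each challenge and section is wrapped in a function named `challenge_<n>` or `section_<name>`; those names, the `ALL_CHALLENGES` and `SMOKE_CHALLENGES` lists, and the stub bodies are hypothetical. Only the `--challenge`, `--section`, and `--smoke` flags come from the proposal above.

```shell
#!/usr/bin/env bash
# Sketch of flag-driven, per-challenge test dispatch for the VM-side suite.
# All function names, the challenge lists, and the stub bodies are hypothetical.
set -u

ALL_CHALLENGES="1 10"         # hypothetical; the real suite would list 1..18
SMOKE_CHALLENGES="10"         # hypothetical fast subset for --smoke
ALL_SECTIONS="verify export"

# One function per challenge or section; real bodies would run the solve steps.
challenge_1()    { echo "solving challenge 1"; }
challenge_10()   { echo "solving challenge 10"; }
section_verify() { echo "running verify checks"; }
section_export() { echo "running export checks"; }

# run_check LABEL UNIT CMD [ARGS...]
# Runs CMD quietly. On failure, prints the last 20 output lines and, when UNIT
# is non-empty (e.g. a challenge's systemd service), its status and journal.
run_check() {
  local label=$1 unit=$2
  shift 2
  local out rc
  out=$("$@" 2>&1)
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "PASS $label"
    return 0
  fi
  echo "FAIL $label (exit $rc)"
  if [ -n "$out" ]; then
    printf '%s\n' "$out" | tail -n 20 | sed 's/^/  | /'
  fi
  if [ -n "$unit" ]; then
    systemctl status "$unit" --no-pager 2>&1 | head -n 15 | sed 's/^/  | /'
    journalctl -u "$unit" -n 20 --no-pager 2>&1 | sed 's/^/  | /'
  fi
  return "$rc"
}

main() {
  local challenges="" sections="" failed=0 n s
  while [ $# -gt 0 ]; do
    case $1 in
      --challenge) challenges="$challenges ${2:?--challenge needs a number}"; shift 2 ;;
      --section)   sections="$sections ${2:?--section needs a name}"; shift 2 ;;
      --smoke)     challenges="$challenges $SMOKE_CHALLENGES"; shift ;;
      *) echo "unknown option: $1" >&2; return 2 ;;
    esac
  done
  # With no filter flags, run everything, as the current single-run suite does.
  if [ -z "$challenges$sections" ]; then
    challenges=$ALL_CHALLENGES
    sections=$ALL_SECTIONS
  fi
  for n in $challenges; do
    if declare -F "challenge_$n" >/dev/null; then
      run_check "challenge $n" "" "challenge_$n" || failed=$((failed + 1))
    else
      echo "FAIL challenge $n (no such test)"
      failed=$((failed + 1))
    fi
  done
  for s in $sections; do
    if declare -F "section_$s" >/dev/null; then
      run_check "section $s" "" "section_$s" || failed=$((failed + 1))
    else
      echo "FAIL section $s (no such section)"
      failed=$((failed + 1))
    fi
  done
  echo "failed: $failed"
  [ "$failed" -eq 0 ]
}

# Run main only when executed directly, so the functions can also be sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  main "$@"
fi
```

With this shape, `main --challenge 10` runs a single solve and `main --smoke` runs the small subset. Capturing each command's output and printing only a bounded tail on failure keeps passing runs quiet and is one way to limit how much a failure dumps. Looking up `challenge_<n>` with `declare -F` also means an unknown challenge number is reported as a failure instead of being silently skipped.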

## Acceptance criteria
- A contributor can run one challenge test without running the full suite.
- Failures show the command or artifact that caused the failure, without leaking more than needed.
- The full provider flow still supports `--with-reboot`.
- The skill documentation explains the new test modes clearly.
- Any new test dependency is justified and documented.

## Links
- bats-core docs: https://bats-core.readthedocs.io/
- bats-core writing tests and TAP output notes: https://bats-core.readthedocs.io/en/stable/writing-tests.html
