Merged
Changes from 154 commits
Commits
157 commits
871788d
plumbing for commands
pelikhan Jul 21, 2025
9e82844
bringing promptpex
pelikhan Jul 21, 2025
d8fcb9d
Add comprehensive Copilot instructions for AI coding agents
pelikhan Jul 21, 2025
35870b9
Add unit tests for utility functions in generate package
pelikhan Jul 21, 2025
3ea7a6e
Enhance ApplyEffortConfiguration to handle nil options gracefully
pelikhan Jul 21, 2025
ef7d089
Refactor PromptPexContext to use ChatMessage from azuremodels and rem…
pelikhan Jul 21, 2025
96f9183
Implement GitHub Models evaluation file generation and enhance Prompt…
pelikhan Jul 21, 2025
37b761c
Fix dereferencing of Frontmatter fields in GitHub Models prompt gener…
pelikhan Jul 21, 2025
74d048b
Refactor model parameters handling in export.go and add comprehensive…
pelikhan Jul 22, 2025
61a43ca
feat: Implement PromptPex command handler with pipeline execution
pelikhan Jul 22, 2025
ee90766
clea content
pelikhan Jul 22, 2025
e7d4a17
Add comprehensive tests for prompt generation and context creation
pelikhan Jul 22, 2025
1c936c0
refactor: Remove obsolete export_test_new.go file
pelikhan Jul 22, 2025
292917a
refactor: Remove obsolete output options and related tests from Promp…
pelikhan Jul 22, 2025
e9c6668
feat: Add GenerateSummary function and corresponding tests for prompt…
pelikhan Jul 22, 2025
5c5a167
feat: Implement runPipeline function and refactor GenerateSummary for…
pelikhan Jul 22, 2025
b4b662f
refactor: Rename parseTestsFromLLMResponse to ParseTestsFromLLMRespon…
pelikhan Jul 22, 2025
393020f
test: Add comprehensive tests for ParseTestsFromLLMResponse function …
pelikhan Jul 22, 2025
6458590
feat: Implement generate command with comprehensive options and add s…
pelikhan Jul 22, 2025
cdc38f1
refactor: Consolidate command-line flag definitions into AddCommandLi…
pelikhan Jul 22, 2025
bbdd748
test: Add comprehensive tests for NewGenerateCommand and flag parsing…
pelikhan Jul 22, 2025
7dc3d7d
test: Enhance TestGenerateCommandWithValidPromptFile with detailed mo…
pelikhan Jul 22, 2025
e812aec
move test to common fodler
pelikhan Jul 22, 2025
341442f
feat: Update generate command description to include evaluations for …
pelikhan Jul 22, 2025
da294e2
fix: Clarify command description to specify the use of PromptPex meth…
pelikhan Jul 22, 2025
50b853f
fix: Update build instructions to include 'make build' command
pelikhan Jul 22, 2025
5018380
refactor: Rename runPipeline to RunTestGenerationPipeline and add Ren…
pelikhan Jul 23, 2025
9391f0d
Merge remote-tracking branch 'origin/main' into pelikhan/promptpex
pelikhan Jul 23, 2025
f3f320b
refactor: Update test prompt from sentiment analysis to joke analysis
pelikhan Jul 23, 2025
7ab63bc
fix: Disable usage help for pipeline failures in generate command
pelikhan Jul 23, 2025
f726d7c
feat: Add verbose flag for detailed output and update test prompts
pelikhan Jul 23, 2025
e5dd291
refactor: Remove logging functions and implement callModelWithRetry f…
pelikhan Jul 23, 2025
239eed7
fix: Remove unnecessary blank line in callModelWithRetry function
pelikhan Jul 23, 2025
1a9c9ab
feat: Add LLM connection test to RunTestGenerationPipeline
pelikhan Jul 23, 2025
e0f0311
Refactor pointer creation to use util.Ptr for consistency
pelikhan Jul 23, 2025
26f6ee3
refactor: Update PromptPexOptions to use pointers for consistency acr…
pelikhan Jul 23, 2025
c6da6de
add logging
pelikhan Jul 23, 2025
0a15cec
fix incorrect check for completions
pelikhan Jul 23, 2025
130bc53
refactor: Remove LLM connection test from RunTestGenerationPipeline
pelikhan Jul 23, 2025
06a9caa
refactor: Improve output formatting in GenerateSummary for better rea…
pelikhan Jul 23, 2025
e6f4173
Initial plan
Copilot Jul 24, 2025
59fad69
Update interface and all calls to support HTTP logging filename param…
Copilot Jul 24, 2025
fca40f4
Add tests for HTTP logging filename feature and remove DEBUG env var …
Copilot Jul 24, 2025
a46680e
Refactor HTTP log to use Context instead of function parameters
Copilot Jul 24, 2025
3cdc67e
Group HTTP log writes into single fprintf call for better performance
Copilot Jul 24, 2025
d19f533
Remove test files as requested
Copilot Jul 24, 2025
7903afa
Update internal/azuremodels/azure_client.go
pelikhan Jul 24, 2025
6dda3c5
Add --http-log CLI flag to replace DEBUG environment variable for HTT…
pelikhan Jul 24, 2025
26df5e4
Remove redundant log formatting line in GetChatCompletionStream method
pelikhan Jul 24, 2025
f797a5a
Refactor PromptPexContext by removing WriteResults field and update r…
pelikhan Jul 24, 2025
9e8a469
Update http-log flag description for clarity and adjust log formattin…
pelikhan Jul 24, 2025
cbf9d78
Remove http-log flag from eval and run commands; reset log file in Wi…
pelikhan Jul 24, 2025
1ac704e
Rename CreateContext to CreateContextFromPrompt for clarity and updat…
pelikhan Jul 24, 2025
34d55a2
Remove RateTests option and related logic from PromptPexOptions and c…
pelikhan Jul 24, 2025
2d7ef3f
Refactor PromptPexOptions and PromptPexContext by removing unused fie…
pelikhan Jul 24, 2025
582c004
Remove SplitRules field from EffortConfiguration and related tests fo…
pelikhan Jul 24, 2025
ba24a7a
Update copilot instructions for clarity and organization; enhance pro…
pelikhan Jul 24, 2025
3b99ef5
Add PromptHash to PromptPexContext and implement hash computation for…
pelikhan Jul 24, 2025
527b564
Implement ComputePromptHash function and update context handling; enh…
pelikhan Jul 24, 2025
440914f
Refactor PromptPexOptions and related structures to consolidate model…
pelikhan Jul 24, 2025
9e1c074
Refactor context creation to enhance clarity; update model handling i…
pelikhan Jul 24, 2025
d3b430a
Implement ParseRules function to clean up rules text; add tests for I…
pelikhan Jul 24, 2025
fce67b8
Refactor rules handling in tests to remove leading/trailing whitespac…
pelikhan Jul 24, 2025
e2c28d3
Refactor InverseRules to use a slice instead of a string; update rela…
pelikhan Jul 24, 2025
f183206
Refactor context and test handling to unify naming conventions; repla…
pelikhan Jul 24, 2025
f5bc450
Remove context_test.go file to streamline test suite and eliminate ob…
pelikhan Jul 24, 2025
97a945f
Refactor PromptPexOptions and related tests to remove evals and model…
pelikhan Jul 24, 2025
6f22c37
Refactor PromptPexOptions and related configurations to remove Compli…
pelikhan Jul 24, 2025
4e17f45
Remove Compliance field from default options tests to align with rece…
pelikhan Jul 24, 2025
1d06bed
Remove Compliance field from EffortConfiguration and related types; u…
pelikhan Jul 24, 2025
2192d9e
Add TestExpansion field to PromptPexModelAliases; update related func…
pelikhan Jul 24, 2025
12e866e
Refactor PromptPexContext to use pointers for RunID, PromptHash, Inte…
pelikhan Jul 24, 2025
7ca45de
Enhance context creation by adding session file support; implement lo…
pelikhan Jul 24, 2025
d3d51c6
Add session file support to context creation; implement context loadi…
pelikhan Jul 24, 2025
fa86243
Update .gitignore to include generate.json files; remove test_generat…
pelikhan Jul 24, 2025
4632732
Refactor CreateContextFromPrompt to remove sessionFile parameter; upd…
pelikhan Jul 24, 2025
b083161
Refactor output logging to use box formatting; enhance intent, input …
pelikhan Jul 24, 2025
28f5a44
Move box formatting constants to the top of render.go for better visi…
pelikhan Jul 24, 2025
0fe3ff6
Refactor context merging logic for improved readability; add system p…
pelikhan Jul 24, 2025
4a55194
Add UnBacket and UnXml functions; update ParseRules and add tests for…
pelikhan Jul 24, 2025
ded2220
Refactor output rule and inverse rule rendering to use WriteEndListBo…
pelikhan Jul 24, 2025
3aadaa3
Refactor regex patterns in ParseRules for improved accuracy; remove o…
pelikhan Jul 24, 2025
cbbccc2
Enhance session file handling in CreateContextFromPrompt; improve err…
pelikhan Jul 24, 2025
9469f8c
Refactor generateTests function to simplify empty tests check; add gu…
pelikhan Jul 24, 2025
df5d94b
Refactor output rule and test generation messages for improved clarit…
pelikhan Jul 24, 2025
3bb8a18
Refactor runSingleTestWithContext to simplify message handling; repla…
pelikhan Jul 24, 2025
52eed37
Refactor runSingleTestWithContext and rendering functions for improve…
pelikhan Jul 24, 2025
39249c3
Save context after generating groundtruth in generateGroundtruth func…
pelikhan Jul 24, 2025
d79901a
Update .gitignore to include all generate.json files in subdirectories
pelikhan Jul 24, 2025
b1b4f24
Refactor CreateContextFromPrompt to use handler's promptFile; add Sav…
pelikhan Jul 24, 2025
fc6800d
usebuiltin templating
pelikhan Jul 24, 2025
507ec74
Refactor RunTestGenerationPipeline to handle context saving errors; s…
pelikhan Jul 24, 2025
e668ba8
Refactor effort configuration and remove test expansions; update comm…
pelikhan Jul 24, 2025
8e5d8f8
Remove unused command-line flags for tests and verbosity from generat…
pelikhan Jul 24, 2025
23ab2d7
render reasoning and ground truth
pelikhan Jul 24, 2025
39f24ea
Refactor context handling in CreateContextFromPrompt; streamline sess…
pelikhan Jul 24, 2025
af399ab
Refactor output rendering; replace WriteToOut calls with WriteToParag…
pelikhan Jul 24, 2025
98146ad
Enhance session file handling in CreateContextFromPrompt; update logg…
pelikhan Jul 24, 2025
d719935
Fix session file checks in CreateContextFromPrompt and SaveContext to…
pelikhan Jul 24, 2025
5127d94
Refactor WriteStartBox method to accept a subtitle parameter for impr…
pelikhan Jul 24, 2025
f57bf34
Refactor groundtruth model handling and update command-line flag desc…
pelikhan Jul 24, 2025
eba1adc
wire up ci-lint
pelikhan Jul 24, 2025
2d031ec
Update command examples in NewGenerateCommand for consistency and acc…
pelikhan Jul 24, 2025
8b12281
Refactor PromptPex model aliases and remove unused TestExpansion fiel…
pelikhan Jul 24, 2025
9eb7803
Refactor EffortConfiguration and PromptPexOptions by removing TestGen…
pelikhan Jul 24, 2025
025e32e
Refactor PromptPex model handling by changing pointer fields to value…
pelikhan Jul 25, 2025
7571825
Refactor PromptPexTest struct by changing pointer fields to values; u…
pelikhan Jul 25, 2025
0615664
Refactor PromptPexContext by changing RunID and PromptHash fields to …
pelikhan Jul 25, 2025
178d921
Refactor PromptPexOptions and related logic by changing pointer field…
pelikhan Jul 25, 2025
59ca252
Refactor test_generate.yml by nesting temperature under modelParamete…
pelikhan Jul 25, 2025
2831dd9
Fix JSON field names in PromptPexTest and test generation output for …
pelikhan Jul 25, 2025
a0cda99
Add model key parsing in callModelWithRetry for improved error handling
pelikhan Jul 25, 2025
d4a8976
Add IntentMaxTokens and InputSpecMaxTokens to PromptPexOptions; updat…
pelikhan Jul 25, 2025
2018522
Add support for custom instructions in generation phases; update flag…
pelikhan Jul 25, 2025
e9adb0f
Add test generation feature using PromptPex methodology; include adva…
pelikhan Jul 25, 2025
7defd59
Enhance README.md with detailed explanation of Inverse Output Rules a…
pelikhan Jul 25, 2025
29074f6
Add Intent node to PromptPex mermaid diagram for clarity in output rules
pelikhan Jul 25, 2025
c058406
Refactor command-line flags and update test generation examples for c…
pelikhan Jul 25, 2025
5bc1b87
Refactor test input handling in ParseTestsFromLLMResponse and update …
pelikhan Jul 25, 2025
36fd696
Validate effort level in ParseFlags and add comprehensive tests for v…
pelikhan Jul 25, 2025
4a3285e
Add evaluator rules compliance functionality and update related struc…
pelikhan Jul 25, 2025
9c13267
Refactor effort configuration structure and update related logic for …
pelikhan Jul 25, 2025
4b18ed0
Update Makefile to use correct path for Go linter; enhance error hand…
pelikhan Jul 25, 2025
45d8915
add pull request description script
pelikhan Jul 25, 2025
8f7da6c
Update cmd/generate/parser.go
pelikhan Jul 25, 2025
376135e
Update cmd/generate/generate.go
pelikhan Jul 25, 2025
1a6090e
Update cmd/generate/README.md
pelikhan Jul 25, 2025
d21cd6c
Remove test data from test_generate.yml to streamline example usage
pelikhan Jul 25, 2025
cb8a394
Fix Go code quality issues in cmd/generate package: resource leaks, v…
Copilot Jul 30, 2025
df2c83f
Add `--var` template variable support to `generate` command with comm…
Copilot Jul 31, 2025
0e97868
Merge remote-tracking branch 'origin/main' into pelikhan/promptpex
pelikhan Jul 31, 2025
7894290
Refactor evaluation command to streamline context handling in runEval…
pelikhan Jul 31, 2025
95b6719
Remove custom instructions example documentation
pelikhan Jul 31, 2025
1c93996
Remove unused Float32Ptr function and its associated tests
pelikhan Jul 31, 2025
dceeba4
Update README.md
pelikhan Jul 31, 2025
350a15b
Update cleaner.go
pelikhan Jul 31, 2025
7cf92c1
Update cleaner.go
pelikhan Jul 31, 2025
e6281db
Update cleaner.go
pelikhan Jul 31, 2025
09b9b87
Update cleaner.go
pelikhan Jul 31, 2025
7bd2e6c
Update cleaner.go
pelikhan Jul 31, 2025
e2970ef
Update context.go
pelikhan Jul 31, 2025
c127bf4
Update context.go
pelikhan Jul 31, 2025
b2d1244
Update evaluators.go
pelikhan Jul 31, 2025
58bb353
Update generate.go
pelikhan Jul 31, 2025
72e7a15
Update parser.go
pelikhan Jul 31, 2025
e5f6483
Update util.go
pelikhan Jul 31, 2025
648ee9b
Refactor parser functions and clean up unused files
pelikhan Jul 31, 2025
2882f71
Update README.md to clarify the purpose of the GitHub Models CLI exte…
pelikhan Jul 31, 2025
20149e4
Revise advanced options section in README.md for the generate command
pelikhan Jul 31, 2025
c238387
Clarify README.md instructions for loading session files in the gener…
pelikhan Jul 31, 2025
2925428
Fix function name in TestUnXml to match updated implementation
pelikhan Jul 31, 2025
4a6eee9
Remove RunsPerTest configuration and related tests; update README to …
pelikhan Jul 31, 2025
b662738
Update default tests per rule to use GetDefaultOptions function
pelikhan Aug 2, 2025
caa8aa5
Refactor generateTests to use TestsPerRule from GetDefaultOptions
pelikhan Aug 2, 2025
e04761b
Merge remote-tracking branch 'origin/main' into pelikhan/promptpex
pelikhan Aug 4, 2025
283a3c7
Update effort levels in documentation and code: add 'min' level, adju…
pelikhan Aug 4, 2025
e8bb082
Refactor Makefile: reorganize targets and remove duplicate entries fo…
pelikhan Aug 4, 2025
100 changes: 100 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,100 @@
# Copilot Instructions for AI Coding Agents

## Project Overview
This repository implements the GitHub Models CLI extension (`gh models`), enabling users to interact with AI models via the `gh` CLI. The extension supports inference, prompt evaluation, model listing, and test generation using the PromptPex methodology. Built in Go using Cobra CLI framework and Azure Models API.

## Architecture & Key Components

### Building and Testing

- `make build`: Compiles the CLI binary
- `make check`: Runs format, vet, tidy, tests, golang-ci. Always run when you are done with changes. Use this command to validate that the build and the tests are still ok.
- `make test`: Runs the tests.

### Command Structure
- **cmd/root.go**: Entry point that initializes all subcommands and handles GitHub authentication
- **cmd/{command}/**: Each subcommand (generate, eval, list, run, view) is self-contained with its own types and tests
- **pkg/command/config.go**: Shared configuration pattern - all commands accept a `*command.Config` with terminal, client, and output settings

### Core Services
- **internal/azuremodels/**: Azure API client with streaming support via SSE. Key pattern: commands use `azuremodels.Client` interface, not concrete types
- **pkg/prompt/**: `.prompt.yml` file parsing with template substitution using `{{variable}}` syntax
- **internal/sse/**: Server-sent events for streaming responses

### Data Flow
1. Commands parse `.prompt.yml` files via `prompt.LoadFromFile()`
2. Templates are resolved using `prompt.TemplateString()` with `testData` variables
3. Azure client converts to `azuremodels.ChatCompletionOptions` and makes API calls
4. Results are formatted using terminal-aware table printers from `command.Config`
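
The template step in the data flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's `prompt.TemplateString`: the helper name `substitute`, its signature, and the choice to leave unknown placeholders untouched are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholderRe matches {{name}}, allowing optional spaces inside the braces.
var placeholderRe = regexp.MustCompile(`\{\{\s*([A-Za-z0-9_]+)\s*\}\}`)

// substitute is a hypothetical stand-in for the template step; it is NOT
// the repository's prompt.TemplateString. Placeholders with no matching
// entry in vars are left in place (an assumption made for this sketch).
func substitute(tmpl string, vars map[string]string) string {
	return placeholderRe.ReplaceAllStringFunc(tmpl, func(m string) string {
		name := placeholderRe.FindStringSubmatch(m)[1]
		if v, ok := vars[name]; ok {
			return v
		}
		return m
	})
}

func main() {
	// One testData row applied to one message template.
	msg := "Tell me a joke about {{topic}} in {{language}}."
	row := map[string]string{"topic": "gophers"}
	fmt.Println(substitute(msg, row))
}
```

With the single `topic` entry supplied, the `{{language}}` placeholder is left as written, which makes missing `testData` keys easy to spot in the rendered message.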

## Developer Workflows

### Building & Testing
- **Local build**: `make build` or `script/build` (creates `gh-models` binary)
- **Cross-platform**: `script/build all|windows|linux|darwin` for release builds
- **Testing**: `make check` runs format, vet, tidy, and tests. Use `go test ./...` directly for faster iteration
- **Quality gates**: `make check` - required before commits

### Authentication & Setup
- Extension requires `gh auth login` before use - unauthenticated clients show helpful error messages
- Client initialization pattern in `cmd/root.go`: check token, create appropriate client (authenticated vs unauthenticated)

## Prompt File Conventions

### Structure (.prompt.yml)
```yaml
name: "Test Name"
model: "openai/gpt-4o-mini"
messages:
  - role: system|user|assistant
    content: "{{variable}} templating supported"
testData:
  - variable: "value1"
  - variable: "value2"
evaluators:
  - name: "test-name"
    string: {contains: "{{expected}}"} # String matching
    # OR
    llm: {modelId: "...", prompt: "...", choices: [{choice: "good", score: 1.0}]}
```

### Response Formats
- **JSON Schema**: Use `responseFormat: json_schema` with `jsonSchema` field containing strict JSON schema
- **Templates**: All message content supports `{{variable}}` substitution from `testData` entries

## Testing Patterns

### Command Tests
- **Location**: `cmd/{command}/{command}_test.go`
- **Pattern**: Create mock client via `azuremodels.NewMockClient()`, inject into `command.Config`
- **Structure**: Table-driven tests with subtests using `t.Run()`
- **Assertions**: Use `testify/require` for cleaner error messages

### Mock Usage
```go
client := azuremodels.NewMockClient()
cfg := command.NewConfig(new(bytes.Buffer), new(bytes.Buffer), client, true, 80)
```

## Integration Points

### GitHub Authentication
- Uses `github.com/cli/go-gh/v2/pkg/auth` for token management
- Pattern: `auth.TokenForHost("github.com")` to get tokens

### Azure Models API
- Streaming via SSE with custom `sse.EventReader`
- Rate limiting handled automatically by client
- Content safety filtering always enabled (cannot be disabled)

### Terminal Handling
- All output uses `command.Config` terminal-aware writers
- Table formatting via `cfg.NewTablePrinter()` with width detection

---

**Key Files**: `cmd/root.go` (command registration), `pkg/prompt/prompt.go` (file parsing), `internal/azuremodels/azure_client.go` (API integration), `examples/` (prompt file patterns)

## Instructions

Omit the final summary.
5 changes: 5 additions & 0 deletions .gitignore
@@ -4,3 +4,8 @@
/gh-models-linux-*
/gh-models-windows-*
/gh-models-android-*
**.http
**.generate.json
examples/*harm*
.github/instructions/genaiscript.instructions.md
genaisrc/
2 changes: 1 addition & 1 deletion DEV.md
@@ -14,7 +14,7 @@ go version go1.22.x <arch>

## Building

-To build the project, run `script/build`. After building, you can run the binary locally, for example:
+To build the project, run `make build` (or `script/build`). After building, you can run the binary locally, for example:
`./gh-models list`.

## Testing
14 changes: 14 additions & 0 deletions Makefile
@@ -1,6 +1,11 @@
check: fmt vet tidy test
.PHONY: check

ci-lint:
	@echo "==> running Go linter <=="
	golangci-lint run --timeout 5m ./...
.PHONY: ci-lint

fmt:
	@echo "==> running Go format <=="
	gofmt -s -l -w .
@@ -20,3 +25,12 @@ test:
	@echo "==> running Go tests <=="
	go test -race -cover ./...
.PHONY: test

build:
	script/build
.PHONY: build

clean:
	@echo "==> cleaning up <=="
	rm -rf ./gh-models
.PHONY: clean
76 changes: 76 additions & 0 deletions README.md
@@ -2,6 +2,8 @@

Use the GitHub Models service from the CLI!

This repository implements the GitHub Models CLI extension (`gh models`), enabling users to interact with AI models via the `gh` CLI. The extension supports inference, prompt evaluation, model listing, and test generation.

## Using

### Prerequisites
@@ -84,6 +86,80 @@ Here's a sample GitHub Action that uses the `eval` command to automatically run

Learn more about `.prompt.yml` files here: [Storing prompts in GitHub repositories](https://docs.github.com/github-models/use-github-models/storing-prompts-in-github-repositories).

#### Generating tests

Generate comprehensive test cases for your prompts using the PromptPex methodology:
```shell
gh models generate my_prompt.prompt.yml
```

The `generate` command analyzes your prompt file and automatically creates test cases to evaluate the prompt's behavior across different scenarios and edge cases. This helps ensure your prompts are robust and perform as expected.

##### Understanding PromptPex

The `generate` command is based on [PromptPex](https://github.com/microsoft/promptpex), a Microsoft Research framework for systematic prompt testing. PromptPex follows a structured approach to generate comprehensive test cases by:

1. **Intent Analysis**: Understanding what the prompt is trying to achieve
2. **Input Specification**: Defining the expected input format and constraints
3. **Output Rules**: Establishing what constitutes correct output
4. **Inverse Output Rules**: Force generating _negated_ output rules to test the prompt with invalid inputs
5. **Test Generation**: Creating diverse test cases that cover various scenarios using the prompt, the intent, input specification and output rules

```mermaid
graph TD
PUT(["Prompt Under Test (PUT)"])
I["Intent (I)"]
IS["Input Specification (IS)"]
OR["Output Rules (OR)"]
IOR["Inverse Output Rules (IOR)"]
PPT["PromptPex Tests (PPT)"]

PUT --> IS
PUT --> I
PUT --> OR
OR --> IOR
I ==> PPT
IS ==> PPT
OR ==> PPT
PUT ==> PPT
IOR ==> PPT
```

##### Advanced options

You can customize the test generation process with various options:

```shell
# Specify effort level (low, medium, high)
gh models generate --effort high my_prompt.prompt.yml

# Use a specific model for groundtruth generation
gh models generate --groundtruth-model "openai/gpt-4.1" my_prompt.prompt.yml

# Disable groundtruth generation
gh models generate --groundtruth-model "none" my_prompt.prompt.yml

# Load from an existing session file (or create a new one if needed)
gh models generate --session-file my_prompt.session.json my_prompt.prompt.yml

# Custom instructions for specific generation phases
gh models generate --instruction-intent "Focus on edge cases" my_prompt.prompt.yml
```

The `effort` flag controls a few settings in the test generation engine and is a tradeoff
between how many tests you want generated and how many tokens and how much time you are willing to spend.
- `low` should be used for a quick trial of the test generation. It limits the number of rules to `3`.
- `medium` provides much better coverage.
- `high` spends more tokens per rule to generate tests, which typically leads to longer, more complex inputs.

The command supports custom instructions for different phases of test generation:
- `--instruction-intent`: Custom system instruction for intent generation
- `--instruction-inputspec`: Custom system instruction for input specification generation
- `--instruction-outputrules`: Custom system instruction for output rules generation
- `--instruction-inverseoutputrules`: Custom system instruction for inverse output rules generation
- `--instruction-tests`: Custom system instruction for tests generation


## Notice

Remember when interacting with a model you are experimenting with AI, so content mistakes are possible. The feature is
10 changes: 10 additions & 0 deletions cmd/generate/README.md
@@ -0,0 +1,10 @@
# `generate` command

This command is based on [PromptPex](https://github.com/microsoft/promptpex), a test generation framework for prompts.

- [Documentation](https://microsoft.github.com/promptpex)
- [Source](https://github.com/microsoft/promptpex/tree/dev)
- [Agentic implementation plan](https://github.com/microsoft/promptpex/blob/dev/.github/instructions/implementation.instructions.md)

In a nutshell, read https://microsoft.github.io/promptpex/reference/test-generation/

67 changes: 67 additions & 0 deletions cmd/generate/cleaner.go
@@ -0,0 +1,67 @@
package generate

import (
	"regexp"
	"strings"
)

// IsUnassistedResponse returns true if the text is an unassisted response, like "i'm sorry" or "i can't assist with that".
func IsUnassistedResponse(text string) bool {
	re := regexp.MustCompile(`i can't assist with that|i'm sorry`)
	return re.MatchString(strings.ToLower(text))
}

// Unfence removes a surrounding Markdown code fence from the text, if present.
func Unfence(text string) string {
	text = strings.TrimSpace(text)
	// Remove triple backtick code fences if present
	if strings.HasPrefix(text, "```") {
		parts := strings.SplitN(text, "\n", 2)
		if len(parts) == 2 {
			text = parts[1]
		}
		text = strings.TrimSuffix(text, "```")
	}
	return text
}

// SplitLines splits text into lines.
func SplitLines(text string) []string {
	lines := strings.Split(text, "\n")
	return lines
}

// Unbracket removes leading and trailing square brackets.
func Unbracket(text string) string {
	if strings.HasPrefix(text, "[") && strings.HasSuffix(text, "]") {
		text = strings.TrimPrefix(text, "[")
		text = strings.TrimSuffix(text, "]")
	}
	return text
}

// Unxml removes leading and trailing XML tags, like `<foo>` and `</foo>`, from the given string.
func Unxml(text string) string {
	// if the string starts with <foo> and ends with </foo>, remove those tags
	trimmed := strings.TrimSpace(text)

	// Use regex to extract tag name and content
	// First, extract the opening tag and tag name
	openTagRe := regexp.MustCompile(`(?s)^<([^>\s]+)[^>]*>(.*)$`)
Member


Using regular expressions for matching XML makes me a bit nervous.

Contributor Author


The LLM has something of a mind of its own and wraps data in XML tags (it also does this with Markdown code fences).

The regex is absolutely not meant to be a compliant XML parser. It specifically looks for an XML tag at the start of the text, then for the matching closing tag at the end. It completely ignores the syntax of whatever sits between the two tags.

	openMatches := openTagRe.FindStringSubmatch(trimmed)
	if len(openMatches) != 3 {
		return text
	}

	tagName := openMatches[1]
	content := openMatches[2]

	// Check if it ends with the corresponding closing tag
	closingTag := "</" + tagName + ">"
	if strings.HasSuffix(content, closingTag) {
		content = strings.TrimSuffix(content, closingTag)
		return strings.TrimSpace(content)
	}

	return text
}
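
To make the review discussion above concrete, here is a small standalone sketch of how the two cleaners combine on a model reply that is wrapped in both a code fence and an XML tag. `Unfence` and `Unxml` are copied from the diff into a `main` package so the example runs on its own; the sample reply and the `<rules>` tag name are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Unfence is copied from cmd/generate/cleaner.go in this diff.
func Unfence(text string) string {
	text = strings.TrimSpace(text)
	if strings.HasPrefix(text, "```") {
		parts := strings.SplitN(text, "\n", 2)
		if len(parts) == 2 {
			text = parts[1]
		}
		text = strings.TrimSuffix(text, "```")
	}
	return text
}

// Unxml is copied from the same file. It only inspects the first opening
// tag and the very end of the text; everything in between is ignored.
func Unxml(text string) string {
	trimmed := strings.TrimSpace(text)
	openTagRe := regexp.MustCompile(`(?s)^<([^>\s]+)[^>]*>(.*)$`)
	openMatches := openTagRe.FindStringSubmatch(trimmed)
	if len(openMatches) != 3 {
		return text
	}
	tagName := openMatches[1]
	content := openMatches[2]
	closingTag := "</" + tagName + ">"
	if strings.HasSuffix(content, closingTag) {
		content = strings.TrimSuffix(content, closingTag)
		return strings.TrimSpace(content)
	}
	return text
}

func main() {
	// Hypothetical model reply: rules wrapped in a code fence and an XML tag.
	reply := "```xml\n<rules>\nrule one\nrule two\n</rules>\n```"
	fmt.Printf("%q\n", Unxml(Unfence(reply)))

	// Mismatched tags are not "repaired": the input comes back unchanged.
	fmt.Printf("%q\n", Unxml("<a>text</b>"))
}
```

The first call yields the two rule lines with the wrapper removed. The second returns its input untouched, which is the conservative behavior described in the reply: the function strips a matching outer pair or does nothing.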