Use Case
When iterating on an LLM system's safety, you want to:
- Generate attacks once (expensive — uses simulator LLM)
- Re-run the same attacks against improved versions of your model over time
- Compare results across iterations
Currently this workflow is not supported end-to-end.
Current State
- RiskAssessment.save(to) exists and writes a JSON file with all test cases, scores, and metadata
- There is no load(), from_json(), or from_file() method to deserialize a saved RiskAssessment back into Python objects
- reuse_simulated_test_cases=True only works within the same Python session (reuses in-memory self.test_cases)
- The EnumEncoder used in save() has no companion decoder
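To illustrate the last point, here is a minimal sketch of why an encoder without a decoder loses type information. BiasType, SketchEnumEncoder, and decode_enum are hypothetical stand-ins, not DeepTeam names, and the sketch assumes enums are written to JSON as their .value, which may not match what EnumEncoder actually does.

```python
import json
from enum import Enum


class BiasType(Enum):
    """Hypothetical stand-in for a vulnerability-type enum."""
    GENDER = "gender"
    RACE = "race"


class SketchEnumEncoder(json.JSONEncoder):
    """Stand-in encoder; assumes enums are serialized as their .value."""

    def default(self, o):
        if isinstance(o, Enum):
            return o.value
        return super().default(o)


def decode_enum(raw, enum_cls):
    """The missing half: map a stored value back to an enum member."""
    return enum_cls(raw)


saved = json.dumps({"vulnerability_type": BiasType.GENDER}, cls=SketchEnumEncoder)
loaded = json.loads(saved)

# After a plain json.loads the field is a str, not an enum member.
type_after_load = type(loaded["vulnerability_type"]).__name__

# With an explicit decoder the enum member is recovered.
restored = decode_enum(loaded["vulnerability_type"], BiasType)
```

The decoder needs to know which enum class a field belongs to, which is why the round trip cannot be done by json.loads alone.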
Proposed Solution
Minimum: Add RiskAssessment.load(path) / RedTeamer.load_test_cases(path)
A class method that deserializes saved JSON back into RTTestCase objects with proper enum types, so they can be injected into red_teamer.test_cases for reuse.
# Save after generation
risk_assessment.save(to='./red-team-attacks/')
# Load in a new session (load_test_cases is the proposed method; it does not exist yet)
red_teamer = RedTeamer(...)
red_teamer.load_test_cases('./red-team-attacks/results_20260309.json')
# Re-run with fresh model callback
risk_assessment = red_teamer.red_team(
    model_callback=my_callback,
    reuse_simulated_test_cases=True,
)
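A rough sketch of what the loader could do internally, using only the standard library. Everything here is an assumption made for illustration: SketchTestCase stands in for RTTestCase, the "test_cases" key and field names are guesses at the saved JSON layout, and ENUMS_BY_VULNERABILITY is a hypothetical lookup table. Outputs and scores from the earlier run are dropped on purpose so the attacks can be re-run against a new model.

```python
import json
import tempfile
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class ToxicityType(Enum):
    """Hypothetical stand-in for a vulnerability-type enum."""
    INSULTS = "insults"
    THREATS = "threats"


@dataclass
class SketchTestCase:
    """Hypothetical stand-in for RTTestCase; field names are assumptions."""
    vulnerability: str
    vulnerability_type: ToxicityType
    input: str
    attack_method: Optional[str] = None
    actual_output: Optional[str] = None
    score: Optional[float] = None


# Hypothetical lookup from vulnerability name to its enum class.
ENUMS_BY_VULNERABILITY = {"Toxicity": ToxicityType}


def load_test_cases(path):
    """Rebuild test cases from saved JSON, restoring enum members."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    cases = []
    for raw in data["test_cases"]:  # assumed top-level key
        enum_cls = ENUMS_BY_VULNERABILITY[raw["vulnerability"]]
        cases.append(
            SketchTestCase(
                vulnerability=raw["vulnerability"],
                vulnerability_type=enum_cls(raw["vulnerability_type"]),
                input=raw["input"],
                attack_method=raw.get("attack_method"),
                # actual_output and score are left unset for the fresh run
            )
        )
    return cases


# Round trip through a temporary file to show the loader in use.
sample = {
    "test_cases": [
        {
            "vulnerability": "Toxicity",
            "vulnerability_type": "insults",
            "input": "example attack prompt",
            "attack_method": "Prompt Injection",
            "actual_output": "old model answer",
            "score": 0.0,
        }
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    saved_path = Path(tmp) / "results_sample.json"
    saved_path.write_text(json.dumps(sample), encoding="utf-8")
    cases = load_test_cases(saved_path)
```

Raising a KeyError or ValueError on an unknown vulnerability or enum value, as this sketch does implicitly, seems preferable to silently loading a test case with a plain string where an enum is expected.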
Ideal: First-class dataset support
A Dataset or AttackDataset class (similar to how AISafetyFramework subclasses work with _has_dataset=True) that:
Workaround
We currently:
- Save generated attacks to JSON manually (extracting fields from RTTestCase)
- Reconstruct RTTestCase objects from JSON with manual enum mapping
- Call the API ourselves for all test cases
- Use vulnerability._get_metric(type).a_measure(test_case) directly
This works but requires significant boilerplate and knowledge of DeepTeam internals.
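For context, the last two workaround steps look roughly like the sketch below. It runs without DeepTeam installed: SketchTestCase, SketchRefusalMetric, model_callback, and rescore are all hypothetical stand-ins, and the real metric returned by _get_metric(type) will score differently. The only thing taken from the actual workaround is the shape of the loop: call the model for each stored attack, then await a_measure on the test case.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchTestCase:
    """Hypothetical stand-in for RTTestCase."""
    input: str
    actual_output: Optional[str] = None
    score: Optional[float] = None


class SketchRefusalMetric:
    """Hypothetical stand-in for the metric returned by _get_metric(type)."""

    async def a_measure(self, test_case):
        # Toy scoring rule: 1.0 if the model refused, else 0.0.
        refused = "cannot help" in (test_case.actual_output or "").lower()
        test_case.score = 1.0 if refused else 0.0
        return test_case.score


async def model_callback(prompt):
    """Placeholder for the system under test."""
    return "Sorry, I cannot help with that."


async def rescore(test_cases, callback, metric):
    """Query the model for every stored attack, then score each response."""

    async def run_one(tc):
        tc.actual_output = await callback(tc.input)
        await metric.a_measure(tc)
        return tc

    return await asyncio.gather(*(run_one(tc) for tc in test_cases))


stored = [SketchTestCase(input="attack 1"), SketchTestCase(input="attack 2")]
rescored = asyncio.run(rescore(stored, model_callback, SketchRefusalMetric()))
pass_rate = sum(tc.score for tc in rescored) / len(rescored)
```

In the real workaround each vulnerability type needs its own metric lookup, which is where most of the boilerplate comes from.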
Environment
- deepteam 1.0.6
- Python 3.12