[herd] Decouple analysis results from CLI output #1833

Open

fsestini wants to merge 8 commits into herd:master from fsestini:herd-test-results-simple-iter

Conversation

@fsestini (Collaborator)

This PR is a step toward making herd7 reusable from OCaml code, as outlined in #1782. It primarily addresses point 2 of that issue, namely:

Gradually decouple outputs from stdout/files, so callers can consume results directly as OCaml values instead of scraping CLI output or temporary files.

The main change is to split the functionality of the Top_herd module into:

  • a result-producing core, which returns structured OCaml values; and
  • a CLI layer, which preserves the existing stdout/dot-file behaviour.

The user-facing herd7 CLI is intended to remain unchanged.

What Changed

Before this PR, Top_herd mixed three responsibilities in the same control flow:

  • generating candidate executions and computing analysis results;
  • formatting analysis results and selected executions for output; and
  • managing CLI output resources such as stdout dot blocks, output directories, temporary dot files, and viewers.

This PR separates those responsibilities. Top_herd.Make(...).run is now the result-producing part: it generates executions, checks them against the model, and returns structured OCaml values rather than printing directly. Top_herd.Printer is the formatting layer: it knows how to turn those structured values into strings. The new Cli module implements the remaining CLI-specific resource management: directories, temporary files, channels, etc.

The intended split is that the first two pieces can be reused from library-style callers (which is why both are accessed through Top_herd), while file and directory management stays in a CLI-specific place and won't be exposed.

One detail of this split is that selected executions are returned through an iterator-shaped value, rather than as a list. This is to preserve the previous memory behaviour where executions are processed one by one and not retained in memory.
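As a rough sketch of that shape (the name push_iter and this definition are mine, not code from the PR):

    (* Sketch only: an iterator-shaped value. The producer calls the
       callback once per selected execution, so each execution can be
       discarded as soon as the callback returns; nothing requires the
       producer to accumulate all executions in a list. *)
    type ('a, 'r) push_iter = ('a -> unit) -> 'r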

Note for reviewers: most of this refactor consists of taking pieces of the old Top_herd module and moving them to more specific places. The diff might therefore look noisier than the actual conceptual change, as most of the code that is shown being added/removed in the diff is simply being moved around with little to no change. Overall, I tried my best to keep the diff as tight as possible.

Concretely, this PR proposes the following changes:

New module Top_herd.TestResult

The main types introduced here are stats and execution. stats represents the overall results of a litmus test simulation, i.e. the final states, candidate counts, witnesses, flags, etc. execution carries data for a single execution.
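As a hedged sketch of what these types might look like (the field names below are illustrative guesses, not the PR's actual definitions):

    (* Hypothetical shapes for Top_herd.TestResult; the real records in
       the PR may differ. *)
    type stats = {
      final_states : string list; (* observed final states, pretty-printed *)
      n_candidates : int;         (* candidate executions examined *)
      n_witnesses : int;          (* executions satisfying the condition *)
      flags : string list;        (* flagged properties raised by the model *)
    }

    type execution = {
      index : int;                (* position in the selection order *)
      dot : string option;        (* rendered graph, when one was requested *)
    }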

Top_herd.Make(M).run now takes a test : M.S.test as input and returns a triple:

  • an M.S.test, which is an updated version of the input;
  • an M.S.event_structure list of event structures (pre-solver); and
  • an iterator of type (execution -> unit) -> stats, which visits executions and produces a stats summary at the end.

The iterator follows the existing CLI selection policy: it visits the executions selected by options such as show and nshow, not every candidate execution. This keeps the new API close to the behaviour currently exposed by the CLI. A more fine-grained iterator over candidate executions can be added later if needed.
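Under those assumptions, a library caller might consume run's result roughly as follows (the functor argument's signature and the handle_execution callback are guesses based on this description, not the PR's exact API):

    (* A sketch only: consuming the triple described above. *)
    module Consume (M : XXXMem.S) = struct
      let run_test ~handle_execution test =
        let module T = Top_herd.Make (M) in
        let updated_test, event_structures, iter = T.run test in
        Printf.printf "pre-solver event structures: %d\n"
          (List.length event_structures);
        (* the iterator visits each selected execution in turn, then
           returns the aggregate stats *)
        let stats = iter handle_execution in
        (updated_test, stats)
    end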

Updated ParseTest/RunTest

The "run" functions in ParseTest/RunTest have been updated so parsing and running return structured outcomes, rather than print out those outcomes themselves.

Outcomes are returned as a first-class module. This seems necessary because the type of analysis outcomes is indexed by the semantics S : SemExtra.S, and the input test determines which SemExtra.S/XXXMem.S instance herd7 constructs. Since the exact semantics module is only known after parsing and dispatching on the test's architecture and variants, it has to be part of the returned value somehow.
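A sketch of the general technique (the module and value names here are illustrative; the PR's actual signatures may differ):

    (* Packaging a result whose type depends on the semantics module as a
       first-class module, so the semantics instance chosen at parse time
       travels with the outcome. *)
    module type OUTCOME = sig
      module S : SemExtra.S
      val test : S.test
      val stats : stats  (* or an S-indexed result type *)
    end

    type outcome = (module OUTCOME)

    (* A caller unpacks the module to recover the semantics instance: *)
    let inspect (o : outcome) =
      let module O = (val o) in
      ignore O.test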

New module Cli

The new Cli module contains the parts of the old Top_herd implementation that are specific to the CLI. Most of this code is intentionally copied almost directly from the old Top_herd module.

Changed how showsome is computed within ParseTest

Before this refactor, showsome was determined inside ParseTest partly from the outputdir parameter. Since ParseTest is now moving toward a more generic API, a very CLI-specific parameter like outputdir no longer seemed appropriate. The new collect_graph_data parameter for ParseTest is an attempt to preserve the old behaviour through a less CLI-specific knob. Having said this, I still find it a bit awkward to use a module parameter to control optimization behaviour, so it would be great if we can later find a way to remove it while preserving the optimizations as much as possible.
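For instance, the CLI side could preserve the old behaviour by deriving the new parameter from outputdir when instantiating ParseTest, mirroring the condition that previously lived inside it (a sketch; the actual wiring in the PR may differ):

    (* derive collect_graph_data from the CLI's outputdir setting *)
    let collect_graph_data =
      match Conf.outputdir with
      | PrettyConf.StdoutOutput | PrettyConf.Outputdir _ -> true
      | _ -> false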

Tests

Added cram tests under herd/tests/other/output.t to ensure that CLI behaviours touched by the refactor are preserved.

@fsestini self-assigned this May 12, 2026
@fsestini force-pushed the herd-test-results-simple-iter branch from c2da892 to b969e28 on May 12, 2026 at 12:01
@TiberiuBucur (Contributor) left a comment:

I haven't gotten around to reviewing everything yet, but so far I like the implementation; I've left two minor comments. In addition, I'm having trouble running make install after the project builds (using make): apparently _build/default/herdtools7.install is missing.

Comment thread herd/parseTest.ml
Comment on lines -170 to -175
-  (* START NOTWWW *)
-  (* Interval timer will be stopped just before output, see top_herd *)
-  Itimer.start name TopConf.timeout ;
-  (* END NOTWWW *)
-  let start_time = Sys.time () in
-  Misc.input_protect (do_from_file start_time env name) name
@TiberiuBucur (Contributor):
Is there any concern in dropping the timing data recording?

@fsestini (Collaborator, Author):

Time recording is still performed as before; it has just been moved out of ParseTest and into the new Cli module.

The rationale is that I think timing is a property of a particular client's consumption of the result, not of the result itself. Since ParseTest is now moving towards exposing a reusable library API whose results are consumed through an iterator in various ways (in the CLI, concurrently, in tests, etc.), the library itself should not impose any particular timing policy on callers.
In herd7, the CLI is the component that knows which operations need to be timed, so timing is now handled there. Other consumers of this API can set up their own timing policy, or not measure time at all (as we do in our test suite).
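For example, a caller could time the consumption of the iterator itself (a minimal sketch assuming the iterator-shaped result described above; Sys.time is the only real API used here):

    (* CPU-time measurement around iterator consumption, analogous to what
       the old ParseTest code did with Sys.time *)
    let timed_stats iter =
      let start_time = Sys.time () in
      let stats = iter (fun _execution -> ()) in
      let elapsed = Sys.time () -. start_time in
      (stats, elapsed)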

Having said this, you've reminded me that I should make the time parameter of Top_herd.Printer.pp_stats optional, in case callers want to print out run stats without the 'Time' bit!

Comment thread herd/parseTest.ml
Comment on lines -99 to +108
-  begin match Conf.outputdir with
-  | PrettyConf.StdoutOutput | PrettyConf.Outputdir _ -> true
-  | _ -> false
-  end || Misc.is_some Conf.PC.view || Conf.variant Variant.MemTag
+  Conf.collect_graph_data
+  || Misc.is_some Conf.PC.view || Conf.variant Variant.MemTag
@TiberiuBucur (Contributor):

I am personally fine with having this flag, but I was curious if you measured a change in performance with and without the showsome flag guard for the code that builds the show member of the state in interpreter.ml. How big of an optimisation are we looking at, compared to recording every relation and filtering at the end?

@fsestini (Collaborator, Author):

Thanks, this is a good point. The short answer is that I have not looked deeply into this yet, but I agree it is a question worth asking.

> compared to recording every relation and filtering at the end?

I am interpreting this as: what would happen if the interpreter always constructed the show state, and we only decided later which parts of it to use.

One thing worth noting is that show is currently represented lazily via Lazy.t. So, even with showsome = true, we are not necessarily eagerly computing every shown relation while the interpreter runs. What we may still be doing is allocating thunks needed to compute the shown relations at a later stage. So my current understanding is that the likely cost of setting showsome = true unconditionally is less "eagerly compute every relation" and more "allocate and retain more show-related data, possibly increasing memory footprint and GC pressure".
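To illustrate the distinction (a self-contained toy example, not herd7 code):

    let compute_shown_relations () = [ ("rf", (0, 1)) ]  (* stand-in data *)
    let show = lazy (compute_shown_relations ())
    (* building [show] only allocates a thunk; the list is not built yet *)
    let relations = Lazy.force show  (* the computation runs here, on demand *)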

I did a rough comparison on this test:

AArch64 wait-flag1
{
   0:X0=x; 1:X0=x;
   1:X2=y;
}
   P0          | P1                ;
   MOV W1,#1   |L0:                ;
               | MOV W3,#1         ;
               | LDADD W3,WZR,[X2] ;
   STR W1,[X0] | LDR W1,[X0]       ;
               | CBZ W1,L0         ;
exists(1:X1=1)

with

$ /usr/bin/time -l ./_build/default/herd/herd.exe -set-libdir ./herd/libdir ./wait-flag1.litmus -unroll 3

For this particular test, I did not see a measurable difference in runtime or peak memory between the normal build and a build with showsome = true unconditionally. On average:

Standard:

        5.74 real         3.52 user         1.20 sys
            17907712  maximum resident set size
            11616664  peak memory footprint

With showsome = true:

        5.71 real         3.53 user         1.19 sys
            17874944  maximum resident set size
            11600304  peak memory footprint

That said, this is not a definitive benchmark. I chose this test because it is heavier on herd7 than others, but it might nevertheless not stress the show machinery particularly well. (@HadrienRenaud I wonder if you might know of any litmus tests that would make a good benchmark for this.) I also tried time make test and did not see a visible performance difference there either.

Bottom line: I don't feel I have enough evidence either way right now. My reason for keeping the flag is to keep the refactor conservative w.r.t. runtime behaviour, but I agree this is worth measuring more systematically at some point.
