[herd] Decouple analysis results from CLI output (#1833)
I haven't gotten around to reviewing everything yet, but so far I like the implementation. I left two minor comments. In addition, I'm having trouble running `make install` after the project builds (using `make`). Apparently `_build/default/herdtools7.install` is missing.
```ocaml
(* START NOTWWW *)
(* Interval timer will be stopped just before output, see top_herd *)
Itimer.start name TopConf.timeout ;
(* END NOTWWW *)
let start_time = Sys.time () in
Misc.input_protect (do_from_file start_time env name) name
```
Is there any concern in dropping the timing data recording?
Time recording is still performed as before; it has just been moved out of `ParseTest` and into the new `Cli` module.
The rationale is that I think timing is a property of a particular client's consumption of the result, not of the result itself. Since `ParseTest` is now moving towards exposing a reusable library API whose results are consumed through an iterator in various different ways (in the CLI, concurrently, in tests, etc.), the library itself should not impose any particular timing policy on callers.
In herd7, the CLI is the component that knows which operations need to be timed, so timing is now handled there. Other consumers of this API can set up their own timing policy, or not measure time at all (as we do in our test suite).
Having said this, you've reminded me that I should make the time parameter of `Top_herd.Printer.pp_stats` optional, in case callers want to print out run stats without the 'Time' bit!
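As a rough illustration, an optional time argument could look like the following. This is a hypothetical sketch, not the actual `Top_herd.Printer` interface; the output format and parameter names are stand-ins:

```ocaml
(* Hypothetical sketch: making the elapsed time an optional argument, so
   library callers can render run stats without any 'Time' line. *)
let pp_stats ?time name nouts =
  let base = Printf.sprintf "Test %s Outcomes=%d" name nouts in
  match time with
  | None -> base (* library callers: no timing line *)
  | Some t -> Printf.sprintf "%s Time %s %.2f" base name t

let () =
  assert (pp_stats "wait-flag1" 2 = "Test wait-flag1 Outcomes=2") ;
  assert
    (pp_stats ~time:1.5 "wait-flag1" 2
     = "Test wait-flag1 Outcomes=2 Time wait-flag1 1.50")
```

With this shape, the CLI passes `~time` as before, while the test suite simply omits it.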
```diff
-  begin match Conf.outputdir with
-  | PrettyConf.StdoutOutput | PrettyConf.Outputdir _ -> true
-  | _ -> false
-  end || Misc.is_some Conf.PC.view || Conf.variant Variant.MemTag
+  Conf.collect_graph_data
+  || Misc.is_some Conf.PC.view || Conf.variant Variant.MemTag
```
I am personally fine with having this flag, but I was curious if you measured a change in performance with and without the `showsome` flag guard for the code that builds the `show` member of the state in `interpreter.ml`. How big of an optimisation are we looking at, compared to recording every relation and filtering at the end?
Thanks, this is a good point. The short answer is that I have not looked deeply into this yet, but I agree it is a question worth asking.
> compared to recording every relation and filtering at the end?
I am interpreting this as: what would happen if the interpreter always constructed the `show` state, and we only decided later which parts of it to use.
One thing worth noting is that `show` is currently represented lazily via `Lazy.t`. So, even with `showsome = true`, we are not necessarily eagerly computing every shown relation while the interpreter runs. What we may still be doing is allocating the thunks needed to compute the shown relations at a later stage. So my current understanding is that the likely cost of setting `showsome = true` unconditionally is less "eagerly compute every relation" and more "allocate and retain more show-related data, possibly increasing memory footprint and GC pressure".
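To make the distinction concrete, here is a minimal illustration of the allocate-vs-compute split with `Lazy.t`. This is not herd7 code; the relation names and types are stand-ins:

```ocaml
(* Illustrative sketch: a Lazy.t thunk allocates a closure up front, but
   the computation itself only runs when the thunk is forced. *)
let computed = ref false

let show : (string * int) list Lazy.t =
  lazy (
    computed := true ;
    [ ("rf", 1); ("co", 2) ])

(* Building the thunk does not run the computation... *)
let () = assert (not !computed)

(* ...only forcing it pays the cost of computing the shown relations. *)
let () =
  let shown = Lazy.force show in
  assert !computed ;
  assert (List.length shown = 2)
```

So the per-candidate overhead of `showsome = true` without a consumer is the allocation and retention of such thunks, not the relation computation itself.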
I did a rough comparison on this test:
```
AArch64 wait-flag1
{
0:X0=x; 1:X0=x;
1:X2=y;
}
P0             | P1               ;
MOV W1,#1      |L0:               ;
               | MOV W3,#1        ;
               | LDADD W3,WZR,[X2];
STR W1,[X0]    | LDR W1,[X0]      ;
               | CBZ W1,L0        ;
exists(1:X1=1)
```
with

```console
$ /usr/bin/time -l ./_build/default/herd/herd.exe -set-libdir ./herd/libdir ./wait-flag1.litmus -unroll 3
```

For this particular test, I did not see a measurable difference in runtime or peak memory between the normal build and a build with `showsome = true` unconditionally. On average:
Standard:

```
       5.74 real     3.52 user     1.20 sys
   17907712 maximum resident set size
   11616664 peak memory footprint
```

With `showsome = true`:

```
       5.71 real     3.53 user     1.19 sys
   17874944 maximum resident set size
   11600304 peak memory footprint
```
That said, this is not a definitive benchmark. I chose this test because it is heavier on herd7 than others, but it might nevertheless not be stressing the show machinery particularly well. (@HadrienRenaud I wonder if you might know of any litmus tests that could serve as a good benchmark for this.) I also tried `time make test` and did not see a visible performance difference there either.
Bottom line: I don't feel I have enough evidence one way or the other right now. My reason for keeping the flag is to keep the refactor conservative w.r.t. runtime behaviour, but I agree this is worth measuring more systematically at some point.
This PR is a step toward making `herd7` reusable from OCaml code, as outlined in #1782. It primarily addresses point 2 of that issue.

The main change is to split the functionality of the `Top_herd` module into separate result-producing, formatting, and CLI-specific pieces, described below. The user-facing `herd7` CLI is intended to remain unchanged.

## What Changed
Before this PR, `Top_herd` mixed three responsibilities in the same control flow: producing analysis results, formatting them, and managing CLI resources. This PR separates those responsibilities.

- `Top_herd.Make(...).run` is now the result-producing part: it generates executions, checks them against the model, and returns structured OCaml values rather than printing directly.
- `Top_herd.Printer` is the formatting layer: it knows how to turn those structured values into strings.
- The new `Cli` module implements the remaining CLI-specific resource management: directories, temporary files, channels, etc.

The intended split is that the first two pieces can be reused from library-style callers (hence why they are both accessed through `Top_herd`), while file and directory management stays in a CLI-specific place and won't be exposed.

One detail of this split is that selected executions are returned through an iterator-shaped value, rather than as a list. This preserves the previous memory behaviour, where executions are processed one by one and not retained in memory.
Note for reviewers: most of this refactor consists of taking pieces of the old `Top_herd` module and moving them to more specific places. The diff might therefore look noisier than the actual conceptual change, as most of the code shown being added/removed in the diff is simply moved around with little to no modification. Overall, I tried my best to keep the diff as tight as possible.

Concretely, this PR proposes the following changes:
### New module `Top_herd.TestResult`

The main types introduced here are `stats` and `execution`. `stats` represents the overall results of a litmus test simulation, i.e. the final states, candidate counts, witnesses, flags, etc. `execution` carries data for a single execution.
Top_herd.Make(M).runnow takes atest : M.S.testas input, and returns a triple:M.S.test, which is an updated version of the inputM.S.event_structure listof event structures (pre solver)(execution -> unit) -> statsthat iterates over executions and produces astatssummary at the endThe iterator follows the existing CLI selection policy: it visits the executions selected by options such as
showandnshow, not every candidate execution. This keeps the new API close to the behaviour currently exposed by the CLI. A more fine-grained iterator over candidate executions can be added later if needed.Updated
ParseTest/RunTestThe "run" functions in
ParseTest/RunTesthave been updated so parsing and running return structured outcomes, rather than print out those outcomes themselves.Outcomes are returned as a first-class module. This seems needed, because the type of analysis outcomes is indexed by semantics
S : SemExtra.S, and moreover the input test determines whichSemExtra.S/XXXMem.Sinstance herd7 constructs. Since the exact semantics module is only known after parsing and dispatching on the test architecture and variants, it has to be part of the returned value somehow.New module
CliThe new
Climodule contains the parts of the oldTop_herdimplementation that are specific to the CLI. Most of this code is intentionally copied almost directly from the oldTop_herdmodule.Changed how
showsomeis computed withinParseTestBefore this refactor,
showsomewas determined insideParseTest(in part) from theoutputdirparameter. AsParseTestis now moving toward being a more generic API, I thought a very CLI-specific parameter likeoutputdirwould not be appropriate. The newcollect_graph_dataparameter forParseTestis an attempt to preserve the old behaviour using a less CLI-specific parameter. Having said this, I do still find it a bit awkward to use a module parameter for controlling optimization behaviour, so it would be great if we can later on find a way to remove this while preserving the optimizations as much as possible.Tests
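The shape of the change can be sketched with a small functor. This is a hypothetical reduction, not the actual `ParseTest` code; the config signature and `run` body are stand-ins:

```ocaml
(* Illustrative sketch: a boolean functor parameter (like collect_graph_data)
   guards whether "show" data is set up at all. *)
module type CONFIG = sig
  val collect_graph_data : bool
end

module Make (Conf : CONFIG) = struct
  (* showsome-style guard, no longer derived from a CLI-only outputdir. *)
  let showsome = Conf.collect_graph_data

  let run () =
    (* Only allocate the lazy "show" data when a consumer exists. *)
    if showsome then Some (lazy [ "rf"; "co" ]) else None
end

module NoGraphs = Make (struct let collect_graph_data = false end)
module WithGraphs = Make (struct let collect_graph_data = true end)

let () =
  assert (NoGraphs.run () = None) ;
  match WithGraphs.run () with
  | Some shown -> assert (Lazy.force shown = [ "rf"; "co" ])
  | None -> assert false
```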
### Tests

Added cram tests under `herd/tests/other/output.t` to ensure that CLI behaviours touched by the refactor are preserved.