Add a `zip_axis` / `tuple_axis` that iterates through multiple parameter axes in lockstep

# Overview

RAPIDS needs a way to iterate through correlated axes in lockstep.

# Example Use Case

Consider a benchmark that takes three parameters "X", "Y", and "Z", where X is an int, Y is a float, and Z is a string.

We want to run two instances of this benchmark:

1. "X" is 402, "Y" is 0.6, and "Z" is "foo".
2. "X" is 862, "Y" is 0.2, and "Z" is "bar".

Using regular axes here is troublesome, since those expand to a cartesian product by default. Defining these axes naively will produce 8 parametrizations (2 × 2 × 2) instead of the 2 we want.
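To make the difference concrete, here is a minimal standalone sketch (plain C++, no NVBench dependency) that contrasts the cartesian expansion with the lockstep iteration we are after. The helper names `cartesian` and `zipped` are made up for illustration and are not part of NVBench.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// One parametrization: (X, Y, Z).
using params = std::tuple<std::int64_t, double, std::string>;

// Every combination of the three independent axes (what regular axes expand to).
std::vector<params> cartesian(const std::vector<std::int64_t> &xs,
                              const std::vector<double> &ys,
                              const std::vector<std::string> &zs)
{
  std::vector<params> out;
  for (const auto x : xs)
    for (const auto y : ys)
      for (const auto &z : zs)
        out.emplace_back(x, y, z);
  return out;
}

// Only the i-th value of each axis grouped together (lockstep iteration).
std::vector<params> zipped(const std::vector<std::int64_t> &xs,
                           const std::vector<double> &ys,
                           const std::vector<std::string> &zs)
{
  std::vector<params> out;
  // Stop at the shortest axis so indexing is always in bounds.
  const std::size_t n = std::min({xs.size(), ys.size(), zs.size()});
  for (std::size_t i = 0; i < n; ++i)
    out.emplace_back(xs[i], ys[i], zs[i]);
  return out;
}
```

With the values from the example above, `cartesian` yields 8 tuples while `zipped` yields only the 2 we care about.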

# Existing Solutions

## Derived parameters

If Y and Z could be derived from X, we could just define X, compute Y and Z, then add summaries to the state for markdown output.

However, these parameters may not always be easily related, as in the example above.

## Skipping

[`nvbench::state::skip`](https://github.com/NVIDIA/nvbench/blob/main/examples/skip.cu) provides a mechanism to skip some of the configurations in the cartesian product, and can be used to slice out just the configurations of interest.

However, this is tedious and fragile. You'd need to maintain some sort of validation logic that stays in sync with the axis values.
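As a rough illustration of that maintenance burden, the sketch below shows the kind of validation predicate a benchmark would need. It is standalone C++; `is_wanted` and `count_kept` are hypothetical names, and in a real benchmark body the rejected states would be passed to `state.skip(...)` instead of being counted.

```cpp
#include <cstdint>
#include <string>

// Hypothetical predicate: true only for the two configurations of interest.
// Every change to the axis values must be mirrored here by hand.
bool is_wanted(std::int64_t x, double y, const std::string &z)
{
  // Exact floating-point comparison only works because the same literals
  // are used on both sides, which is part of what makes this fragile.
  return (x == 402 && y == 0.6 && z == "foo") ||
         (x == 862 && y == 0.2 && z == "bar");
}

// Walks the full cartesian product and counts the states that survive.
int count_kept()
{
  const std::int64_t xs[] = {402, 862};
  const double ys[]       = {0.6, 0.2};
  const std::string zs[]  = {"foo", "bar"};

  int kept = 0;
  for (const auto x : xs)
    for (const auto y : ys)
      for (const auto &z : zs)
        if (is_wanted(x, y, z))
          ++kept;
  return kept;
}
```

Six of the eight generated states are thrown away, and the predicate silently goes stale if someone edits an axis without updating it.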

## Lookup Tables

Each set of values could be put in a lookup function, and a single integer axis could be used to enumerate each desired set of parameters.

This is subpar. There is no way to override the actual values at runtime with `-a` options; only the index is modifiable. By default, the output will be cryptic, reporting only the test case index, not the actual values used. (This could be worked around by hiding/adding summaries in the benchmark body.)

# Proposed Solution

Add a new `zip_axis` (`tuple_axis`?) type that appears as multiple distinct axes, but is defined by specifying discrete sets of inputs as tuples. This effectively adds an abstraction that simplifies the Lookup Table solution.

```
void my_bench(nvbench::state &state)
{
  const auto x = state.get_int64("X");   // 402, 862
  const auto y = state.get_float64("Y"); // 0.6, 0.2
  const auto z = state.get_string("Z");  // foo, bar

  // ...
}

// (a)
auto my_values = nvbench::zip_axis_values<nvbench::int64_t, nvbench::float64_t, std::string>
{
  // Names for subaxes:
  {"X", "Y", "Z"},
  //  Tuples of grouped parameters
  {402, 0.6, "foo"},
  {862, 0.2, "bar"},
  // ...
};
NVBENCH_BENCH(my_bench)
  .add_zip_axis("ZippedValues", my_values);

// or

// (b)
NVBENCH_BENCH(my_bench)
  .add_zip_axis("ZippedValues",
                        "X", {402, 862},
                        "Y", {0.6, 0.2},
                        "Z", {"foo", "bar"});
```

* The (a) form is nice because it lays out the grouped values together, making it easy to check that all subaxes are synced and have the same number of values.
* The (b) form is convenient for small axes.

This should play nicely with other axes -- the cartesian product of all other axes + the zipped values should still be generated.
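A small standalone sketch of the intended expansion, assuming one zipped axis and one additional regular axis: the zip contributes a single dimension whose size is the number of tuples, not the product of its subaxes. `enumerate_states` is a hypothetical helper, and the iteration order shown is an arbitrary choice, not necessarily the order NVBench would use.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Each state is (index into the zipped tuples, index into the other axis).
using state_index = std::pair<std::size_t, std::size_t>;

// The zip behaves like one axis of length `zip_length` in the cartesian
// product with the remaining axis.
std::vector<state_index> enumerate_states(std::size_t zip_length,
                                          std::size_t other_axis_size)
{
  std::vector<state_index> states;
  states.reserve(zip_length * other_axis_size);
  for (std::size_t other = 0; other < other_axis_size; ++other)
    for (std::size_t zip = 0; zip < zip_length; ++zip)
      states.emplace_back(zip, other);
  return states;
}
```

For the two tuples above combined with a regular axis of three values, this gives 2 × 3 = 6 states rather than 2 × 2 × 2 × 3 = 24.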

# Open Questions

## Command line interactions with `-a`

Proposed: If any subaxes are redefined, all subaxes must be redefined and have the same length, e.g.

```
my_bench -b 0 -a "X:[285,128,42]" -a "Y:[0.1,3.4,1.2]" -a "Z:[bing,bang,bong]"
```

If any axes are mismatched in length or unspecified, throw an error. Maybe we allow redefinition of a single subaxis only when the new definition is the same length as the hard-coded axis.

This could be punted on for the first version of this, since it'll be complicated no matter what approach we use, and it's not strictly necessary to meet the immediate need for zipped axes.
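The "all or nothing, same length" rule proposed above could be checked along these lines. This is a standalone sketch, not the `nvbench::option_parser` API: `axis_overrides` and `zip_overrides_valid` are hypothetical names, values are kept as raw strings, and the function returns `false` where a real parser would presumably throw.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Axis name -> raw values supplied with `-a` on the command line.
using axis_overrides = std::map<std::string, std::vector<std::string>>;

// True if none of the zip's subaxes are overridden, or if all of them are
// overridden with value lists of equal length.
bool zip_overrides_valid(const std::vector<std::string> &subaxes,
                         const axis_overrides &given)
{
  std::size_t overridden = 0;
  std::size_t length     = 0;
  for (const auto &name : subaxes)
  {
    const auto it = given.find(name);
    if (it == given.end())
    {
      continue; // this subaxis was not redefined
    }
    if (overridden > 0 && it->second.size() != length)
    {
      return false; // mismatched lengths
    }
    length = it->second.size();
    ++overridden;
  }
  return overridden == 0 || overridden == subaxes.size();
}
```

The relaxed variant mentioned above (allowing a single subaxis to be redefined when its length matches the hard-coded one) would need the original zip length as an extra input.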

## Markdown Output

Proposed: The zip axis should be transparent here -- ignore it and just treat the subaxes as regular axes:

```
|  X  |  Y  |   Z   | ... |
|-----|-----|-------|-----|
| 402 | 0.6 | "foo" | ... |
| 862 | 0.2 | "bar" | ... |
```

## CSV Output

Proposed: Add columns with the subaxis values, as well as a column that identifies the index in the zip, e.g.

```
ZippedValues, X, Y, Z, ...
0, 402, 0.6, "foo", ...
1, 862, 0.2, "bar", ...
```

## JSON Output

This will be tricky and require some thought. We'll need to address both the per-benchmark axis definition as well as the per-state parameter encoding.

## Implementation

This will likely be a new subclass of `nvbench::axis_base` that holds a vector of other `nvbench::axis_base`s.
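A very rough standalone sketch of that shape follows. The `axis_base` and `string_axis` classes here are simplified stand-ins written for this example, not the real NVBench classes, and `add_subaxis` / `get_subaxis_count` are hypothetical member names.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for nvbench::axis_base (the real interface is richer).
class axis_base
{
public:
  explicit axis_base(std::string name) : m_name(std::move(name)) {}
  virtual ~axis_base() = default;
  const std::string &get_name() const { return m_name; }
  virtual std::size_t get_size() const = 0;

private:
  std::string m_name;
};

// Stand-in for a concrete axis that stores string values.
class string_axis : public axis_base
{
public:
  string_axis(std::string name, std::vector<std::string> values)
    : axis_base(std::move(name)), m_values(std::move(values)) {}
  std::size_t get_size() const override { return m_values.size(); }

private:
  std::vector<std::string> m_values;
};

// Hypothetical zip axis: owns its subaxes and keeps them the same length.
class zip_axis : public axis_base
{
public:
  using axis_base::axis_base;

  void add_subaxis(std::unique_ptr<axis_base> axis)
  {
    if (!axis)
      throw std::invalid_argument("zip_axis: null subaxis");
    if (!m_subaxes.empty() && axis->get_size() != get_size())
      throw std::invalid_argument("zip_axis: subaxis '" + axis->get_name() +
                                  "' has a mismatched length");
    m_subaxes.push_back(std::move(axis));
  }

  // Number of tuples, i.e. the common length of all subaxes.
  std::size_t get_size() const override
  {
    return m_subaxes.empty() ? 0 : m_subaxes.front()->get_size();
  }

  std::size_t get_subaxis_count() const { return m_subaxes.size(); }

private:
  std::vector<std::unique_ptr<axis_base>> m_subaxes;
};
```

Enforcing the equal-length invariant when a subaxis is added keeps the state generator simple, since it can treat the zip as a single axis of `get_size()` entries.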

Several areas will need to be updated, including but not limited to:

- [ ] `nvbench::axis_type` enum
- [ ] `nvbench::option_parser` for handling CLI
- [ ] `nvbench::markdown_printer`, `nvbench::csv_printer`, `nvbench::json_printer`
- [ ] `nvbench::benchmark_base` to add the `add_zip_axis` method
- [ ] `nvbench::detail::state_generator` and `state_generator_iterator`
- [ ] `nvbench::axis_metadata` will need to learn how to handle these cleanly
- [ ] Unit tests
- [ ] Examples
