
Improve runtime measures for criterion plot and benchmarking plots #547

@janosg

Description

Current Situation / Problem you want to solve

The proposal in this issue concerns the functions criterion_plot, profile_plot and convergence_plot.

  • criterion_plot uses the number of function evaluations (n_evaluations) as its runtime measure.
  • profile_plot and convergence_plot have a runtime_measure argument that lets the user switch between n_evaluations, n_batches, and walltime.
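
For reference, a rough sketch of the current calls (problems and results are assumed to come from a benchmark run; the exact signatures are from memory and may differ slightly):

criterion_plot(results)  # implicitly plots against n_evaluations
profile_plot(problems, results, runtime_measure="n_batches")
convergence_plot(problems, results, runtime_measure="walltime")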

Each runtime measure serves a purpose:

  • walltime: Measures how long it actually takes to achieve a certain progress. This is what a user ultimately cares about in their optimization problem.
  • n_evaluations: Measures how many evaluations of the objective function it takes to achieve a certain progress. This makes it possible to ignore optimizer overhead and to use fast benchmark functions to judge the performance of an optimizer that is designed for expensive objective functions. Moreover, it is deterministic and reproducible across machines.
  • n_batches: Similar to n_evaluations. In addition, it makes it possible to simulate the performance of a parallel optimizer on small machines.

n_evaluations and n_batches measure important aspects, but they also have a big drawback: they focus exclusively on the objective function and ignore all time spent evaluating derivatives. This is not a problem as long as only derivative-free or only derivative-based optimizers are compared, but as soon as one compares a derivative-free optimizer with a derivative-based one, these measures become misleading.

Describe the solution you'd like

Step 1: Introduce new runtime measures

All relevant functions will get a runtime_measure argument which can be:

  • "function_time" (default): The time spent in evaluations of the user provided functions fun, jac, fun_and_jac; Similar to n_evaluations, this will ignore the overhead of calculations done in the optimizer.
  • "batch_function_time": The time that would have been spent in evaluations of user provided functions if all evaluations of the same batch were done in parallel (without parallelization overhead).
  • "walltime": The actual time spent (reflecting actual optimizer overheads, parallelization overheads, ...)

We also keep the legacy measures "n_evaluations" and "n_batches".
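
As a sketch of how this could look from the user's perspective (not a final API; the default shown here is the proposed one, not the current behavior):

criterion_plot(results)  # now equivalent to runtime_measure="function_time"
criterion_plot(results, runtime_measure="walltime")

# judge optimizers as if all evaluations of a batch ran in parallel
convergence_plot(problems, results, runtime_measure="batch_function_time")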

Step 2: Introduce an optional cost model

While "function_time" and "batch_function" time allow to ignore optimizer overhead, they are not deterministic nor comparable across machines. In order to achieve this, we optionally allow a user to pass a CostModel as runtime_measure. Using a CostModel allows to reproduce all existing measures except for walltime. Moreover, it allows to get reproducible and hardware agnostic runtime measures for almost any situation.

A cost model looks as follows:

from dataclasses import dataclass

@dataclass(frozen=True)
class CostModel:
    fun: float | None = None
    jac: float | None = None
    fun_and_jac: float | None = None
    label: str | None = None

    def aggregate_batch_times(self, times: list[float]) -> float:
        return sum(times)

The attributes fun, jac, and fun_and_jac allow a user to provide runtimes of the user-provided functions. These could be actual times in seconds or normalized values (e.g. 1 for fun). None means that the actual measured runtime is used.

The attribute label is used as x-axis label in plots.

The method aggregate_batch_times takes a list of times (which may be measured runtimes or the replacement values given by the other attributes) and returns a scalar. The default implementation assumes that no parallelization is used.
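
For intuition: if one batch contains evaluations that took 0.2, 0.3, and 0.5 seconds, the default implementation returns their sum, i.e. the time of fully sequential execution:

CostModel().aggregate_batch_times([0.2, 0.3, 0.5])  # 1.0 seconds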

To see the cost model in action, let's reproduce a few existing measures:

n_evaluations_cost_model = CostModel(
    fun=1, jac=0, fun_and_jac=0, label="evaluations of the objective function"
)
function_time_cost_model = CostModel(label="seconds")

@dataclass(frozen=True)
class PerfectParallelizationCostModel(CostModel):
    def aggregate_batch_times(self, times: list[float]) -> float:
        return max(times)

n_batches_cost_model = PerfectParallelizationCostModel(
    fun=1, jac=0, fun_and_jac=0, label="batch evaluations of the objective function"
)

The zero values for jac and fun_and_jac make the problem with n_evaluations and n_batches very apparent: derivative evaluations are treated as free.
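
To make the semantics concrete, here is a hedged sketch of how a cost model could be applied to a recorded optimization history. The history format (a batch-ordered list of (task, measured_time, batch_id) tuples) and the helper name compute_runtime are illustrative, not part of the proposal:

from itertools import groupby

def compute_runtime(history, cost_model):
    # history: e.g. [("fun", 0.21, 0), ("jac", 0.83, 0), ("fun", 0.20, 1), ...]
    def cost(task, measured_time):
        override = getattr(cost_model, task)  # None -> fall back to measured time
        return measured_time if override is None else override

    total = 0.0
    # group consecutive entries that belong to the same batch
    for _, batch in groupby(history, key=lambda entry: entry[2]):
        times = [cost(task, t) for task, t, _ in batch]
        total += cost_model.aggregate_batch_times(times)
    return total

With the default CostModel() this reproduces "function_time"; with n_evaluations_cost_model it counts objective evaluations and thus reproduces n_evaluations.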

Potential variations

  • aggregate_batch_times could be a callable attribute, so users don't have to subclass CostModel to change it (see the sketch after this list).
  • Instead of an enum for the runtime measures, we could implement subclasses that capture the special cases and let users pass a CostModel subclass or instance (similar to how algorithms are passed).
  • The legacy cases n_batches and n_evaluations could be deprecated and only remain available via a CostModel.
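
A hedged sketch of the first variation, with the aggregator as a plain callable field, so max can be passed directly instead of subclassing:

from collections.abc import Callable
from dataclasses import dataclass

@dataclass(frozen=True)
class CostModel:
    fun: float | None = None
    jac: float | None = None
    fun_and_jac: float | None = None
    label: str | None = None
    aggregate_batch_times: Callable[[list[float]], float] = sum

# no subclass needed to change the aggregation
n_batches_cost_model = CostModel(
    fun=1, jac=0, fun_and_jac=0,
    label="batch evaluations of the objective function",
    aggregate_batch_times=max,
)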

Questions

  • Do we need multiple cost models for the plots that do multiple optimizations (e.g. profile_plot and convergence_plot)?
