statistical significance method and jargon

Thanks for Criterion!

I spent some time understanding `sample-size` and `measurement-time`, and I believe I have those right. My questions are about statistical significance, and they may reveal that I lack prior knowledge of benchmarking best practices. Here they go.

---

https://bheisler.github.io/criterion.rs/book/analysis.html seems to make it clear that the `sample`s of any individual benchmark use linearly increasing iteration counts: [d, 2d, 3d, ..., Nd].
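To make my reading concrete, here is a minimal sketch of how I understand the scheme. It is not Criterion's actual code. The helper names `iteration_counts` and `slope_through_origin` are hypothetical, and fitting the line through the origin is my assumption about how a per-iteration time could be estimated from such samples.

```rust
/// Iteration counts for `n` samples with base factor `d`: [d, 2d, ..., n*d].
/// (Hypothetical helper, not part of Criterion's API.)
fn iteration_counts(d: u64, n: u64) -> Vec<u64> {
    (1..=n).map(|i| i * d).collect()
}

/// Least-squares slope of a line forced through the origin:
/// sum(x * y) / sum(x * x), where x is the iteration count of a sample
/// and y is the total time measured for that sample.
/// Assumes `iters` is non-empty and has the same length as `times_ns`.
fn slope_through_origin(iters: &[u64], times_ns: &[f64]) -> f64 {
    let xy: f64 = iters
        .iter()
        .zip(times_ns)
        .map(|(&x, &y)| x as f64 * y)
        .sum();
    let xx: f64 = iters.iter().map(|&x| (x as f64) * (x as f64)).sum();
    xy / xx
}
```

If every iteration took exactly 3 ns, the samples would lie on a straight line and the slope would be 3 ns per iteration. My question is whether this spread of iteration counts exists mainly to support such a fit, or for some other reason.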

How does this interact with the statistical significance testing that Criterion does, which can ultimately surface warnings like:

> Found 10 outliers among 50 measurements (20.00%)
>   6 (12.00%) low severe
>   2 (4.00%) high mild
>   2 (4.00%) high severe

Or, if it is not there to satisfy a statistical property per se, what is the motivation for [d, 2d, 3d, ..., Nd]?

Is there already a place where the `low`/`high` and `mild`/`severe` labels are defined?
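My working guess is that the labels follow something like Tukey's fences: a point beyond 1.5 × IQR from the quartiles is "mild", beyond 3 × IQR is "severe", and "low"/"high" says which side it falls on. The sketch below encodes that guess only. The `Label` enum, the `percentile` helper, and the interpolation rule are my own assumptions, not Criterion's implementation.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Label {
    LowSevere,
    LowMild,
    Normal,
    HighMild,
    HighSevere,
}

/// Percentile by linear interpolation between the closest ranks.
/// Assumes `sorted` is non-empty and sorted ascending, with 0.0 <= p <= 1.0.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo as f64)
}

/// Classify each sample against Tukey-style fences
/// (assumed multipliers: 1.5 * IQR for mild, 3.0 * IQR for severe).
fn classify(samples: &[f64]) -> Vec<Label> {
    assert!(!samples.is_empty(), "need at least one sample");
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("samples must not be NaN"));

    let q1 = percentile(&sorted, 0.25);
    let q3 = percentile(&sorted, 0.75);
    let iqr = q3 - q1;

    samples
        .iter()
        .map(|&x| {
            if x < q1 - 3.0 * iqr {
                Label::LowSevere
            } else if x < q1 - 1.5 * iqr {
                Label::LowMild
            } else if x > q3 + 3.0 * iqr {
                Label::HighSevere
            } else if x > q3 + 1.5 * iqr {
                Label::HighMild
            } else {
                Label::Normal
            }
        })
        .collect()
}
```

If that guess is right, it would help to have the exact thresholds and the quartile definition written down somewhere in the docs.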
