# Proposal for metrics/dashboard API

Goals:
* Unified treatment of many different kinds of metrics in dashboard
* A clean separation between dashboard (for visualization) and different kinds of backends (for metric calculation)

Status quo:
* Dashboard currently keeps track of the metric values in the cache dictionary `fairlearn.widget._fairlearn_widget.FairlearnWidget._response`
* The dictionary is updated in `fairlearn.widget._fairlearn_dashboard.FairlearnDashboard._on_request`
* There's a PR out to fill the whole data structure in `fairlearn.metrics.create_dashboard_dictionary`

Issues with the status quo:
* Code duplication / redundancy / brittleness due to copy-paste errors
* Currently only one kind of metric (`<metric>_summary`) is supported, and it is hard-wired into the dashboard dictionary

## Proposal

### Part I: More general dashboard dictionary

```python
{
    "prediction_type": "binary_classification" or "probabilistic_binary_classification" or "regression",
    # no support for multiclass classification yet, but other prediction types can be added in future
    "array_bindings": {  # all 1D arrays, including features and predictor vectors, are here
        "<array_key>": {  # the keys can be arbitrary strings; not sure we need to force any convention, but see examples below
            "name": string,  # the name of a feature would be the feature name, of a prediction vector would be the model name
            "values": number[],
            "value_names": string[],  # an optional field to encode categorical data
        },
        "sensitive_feature gender": {  # an example feature
            "name": "gender",
            "values": [0, 1, 0, 0, 2],
            "value_names": ["female", "male", "non-binary"],
        },
        "y_pred model0": {  # an example prediction vector
            "name": "model0",
            "values": [0, 0, 1, 1, 0],
        },
        "y_true": {
            "name": "y_true",
            "values": [0, 1, 1, 1, 0],
        },
        "sample_weight": {  # an example of an array that we may want to pass to the metrics, since many metrics take this kind of argument
            "name": "sample_weight",
            "values": [0.1, 0.3, 1, 0.9],
        },
        ...
    },
    "cache": [
        {
            "function": string,  # python function name; we could either limit to fairlearn.metrics
                                 # or use fully qualified names
            "arguments": {
                "<array_argument>": "<array_key>" or null,  # array-valued arguments are matched with array bindings
                "<numeric_argument>": number or null,  # we should also support numeric arguments, strings, booleans
                "<string_argument>": string or null,  # null corresponds to None
                "<boolean_argument>": boolean or null,
            },
            "return_value": number or string or boolean or null or dict,
            # dict could be encoded as { "keys": any[], "values": any[] }
        },
        {  # an example
            "function": "fbeta_score_group_summary",
            "arguments": {
                "y_true": "y_true",
                "y_pred": "y_pred model0",
                "sensitive_features": "sensitive_feature gender",
                "sample_weight": "sample_weight",
                "beta": 0.3,
            },
            "return_value": {
                "overall": 0.11,
                "by_group": {
                    "keys": [0, 1, 2],
                    "values": [0.15, 0.04, 0.03],
                },
            },
        },
        ...
    ]
}
```
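To make the array-binding mechanism concrete, here is a minimal sketch of how a backend might evaluate one cache entry against the dictionary above. Both `resolve_arguments` and `accuracy_group_summary` are hypothetical helpers, not part of fairlearn; the latter merely stands in for a real metric such as `fbeta_score_group_summary` and returns the `{"overall", "by_group"}` shape sketched above.

```python
def resolve_arguments(cache_entry, array_bindings):
    """Replace array-key references in a cache entry's "arguments" with the
    bound value lists; scalars, None, and strings that match no key pass
    through unchanged.  (A real implementation would consult the metric's
    signature rather than guessing from the value, so that a string argument
    which happens to coincide with an array key is not mis-resolved.)"""
    resolved = {}
    for name, value in cache_entry["arguments"].items():
        if isinstance(value, str) and value in array_bindings:
            resolved[name] = array_bindings[value]["values"]
        else:
            resolved[name] = value
    return resolved


def accuracy_group_summary(y_true, y_pred, sensitive_features):
    # Hypothetical stand-in for a fairlearn group-summary metric.
    groups = sorted(set(sensitive_features))
    by_group_values = []
    for g in groups:
        pairs = [(t, p) for t, p, s in zip(y_true, y_pred, sensitive_features) if s == g]
        by_group_values.append(sum(t == p for t, p in pairs) / len(pairs))
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"overall": overall,
            "by_group": {"keys": groups, "values": by_group_values}}


array_bindings = {
    "y_true": {"name": "y_true", "values": [0, 1, 1, 1, 0]},
    "y_pred model0": {"name": "model0", "values": [0, 0, 1, 1, 0]},
    "sensitive_feature gender": {"name": "gender", "values": [0, 1, 0, 0, 2]},
}
entry = {
    "function": "accuracy_group_summary",
    "arguments": {
        "y_true": "y_true",
        "y_pred": "y_pred model0",
        "sensitive_features": "sensitive_feature gender",
    },
}
result = accuracy_group_summary(**resolve_arguments(entry, array_bindings))
```

Note that the dashboard only ever sees the `return_value` that gets stored back into the cache; the binding indirection keeps the (potentially large) arrays out of each cache entry.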

### Part II: How to remove duplication

`<TODO>`