
Roadmap Evaluation

Shengjia Zhao edited this page Dec 31, 2021 · 3 revisions

Roadmap for torchuq.evaluate

torchuq.evaluate.point

Metrics:

  • L2 loss [V0.1.0]
  • Pinball loss [V0.1.0]
  • Huber loss [V0.1.0]
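The three point metrics above can be sketched in plain Python as follows. This is a minimal illustration, not the torchuq.evaluate.point API. The function names are hypothetical, "L2 loss" is assumed to mean the mean squared error, and the pinball level `alpha` and Huber threshold `delta` are illustrative parameters.

```python
def l2_loss(predictions, labels):
    """Mean squared error between point predictions and labels (assumed meaning of 'L2 loss')."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


def pinball_loss(predictions, labels, alpha=0.5):
    """Mean pinball (quantile) loss at level alpha in (0, 1)."""
    total = 0.0
    for p, y in zip(predictions, labels):
        diff = y - p
        # Under-prediction (y > p) is weighted by alpha, over-prediction by (1 - alpha).
        total += alpha * diff if diff >= 0 else (alpha - 1.0) * diff
    return total / len(labels)


def huber_loss(predictions, labels, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    total = 0.0
    for p, y in zip(predictions, labels):
        err = abs(y - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(labels)
```

With `alpha=0.5` the pinball loss equals half the mean absolute error, so it evaluates a median prediction.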

Plots:

  • Scatter plot [V0.1.0]
  • Conditional bias [V0.1.0]

torchuq.evaluate.distribution

Metrics:

  • CRPS score [V0.1.0]
  • Negative log-likelihood [V0.1.0]
  • Mean and standard deviation [V0.1.0]
  • Expected calibration error (ECE) [V0.1.0]
  • Debiased ECE [V0.1.0]
  • Threshold calibration error [future]
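For a Gaussian predictive distribution, CRPS and negative log-likelihood both have closed forms. The sketch below assumes each prediction is a normal distribution N(mu, sigma^2); the helper names are hypothetical, and torchuq may compute these scores differently for other distribution families.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) at observation y (lower is better)."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * normal_cdf(z) - 1.0) + 2.0 * normal_pdf(z) - 1.0 / math.sqrt(math.pi))


def gaussian_nll(mu, sigma, y):
    """Negative log-likelihood of observation y under N(mu, sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)
```

Averaging either score over a batch of (mu, sigma, y) triples gives the dataset-level metric.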

Plots:

  • Reliability diagram (calibration diagram) [V0.1.0]
  • Visualize probability density functions (PDF) as a sequence [V0.1.0]
  • Visualize cumulative distribution functions (CDF) as a sequence [V0.1.0]
  • Visualize the cumulative distribution function (CDF) as a single plot [V0.1.0]
  • Visualize the inverse cumulative distribution function (iCDF) as a single plot [V0.1.0]

torchuq.evaluate.quantile

Metrics:

  • Pinball loss [V0.1.0]
  • Quantile calibration error [future V0.2.0]
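Quantile calibration error is listed as future work, so the following is only one plausible definition, assumed here for illustration: for each quantile level tau, the fraction of labels at or below the predicted tau-quantile should be close to tau, and the error averages the absolute gaps over all levels. The function name and the averaging choice are hypothetical.

```python
def quantile_calibration_error(quantile_preds, labels, levels):
    """
    Hypothetical quantile calibration error.

    quantile_preds[i][j] is the predicted levels[j]-quantile for sample i.
    Returns the mean over levels of |empirical coverage - level|.
    """
    n = len(labels)
    errors = []
    for j, tau in enumerate(levels):
        # Fraction of labels that fall at or below the predicted tau-quantile.
        below = sum(1 for i in range(n) if labels[i] <= quantile_preds[i][j])
        errors.append(abs(below / n - tau))
    return sum(errors) / len(levels)
```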

Plots:

  • Visualize quantiles as a sequence [V0.1.0]
  • Visualize quantile calibration [V0.1.0]

torchuq.evaluate.interval

Metrics:

  • Interval length [V0.1.0]
  • Interval coverage [V0.1.0]
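Interval length and coverage reduce to simple averages. A minimal sketch, assuming each interval is a `(lower, upper)` pair and that coverage counts the endpoints as inside; the function names are hypothetical.

```python
def interval_length(intervals):
    """Average width of (lower, upper) intervals."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


def interval_coverage(intervals, labels):
    """Fraction of labels that fall inside their interval (endpoints included)."""
    covered = sum(1 for (lo, hi), y in zip(intervals, labels) if lo <= y <= hi)
    return covered / len(labels)
```

A useful interval predictor reaches the target coverage (for example 0.9) with the smallest average length.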

Plots:

  • Visualize the intervals as a sequence [V0.1.0]
  • Visualize the length distribution: for each length L, plot the proportion of intervals with length less than L [V0.1.0]
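The length-distribution plot is the empirical CDF of interval lengths. The sketch below computes the curve values at a list of thresholds L, counting lengths strictly less than L as described above; plotting them against the thresholds with any plotting library gives the figure. The function name is hypothetical.

```python
def length_distribution(intervals, thresholds):
    """For each threshold L, the proportion of intervals whose length is less than L."""
    lengths = [hi - lo for lo, hi in intervals]
    return [sum(1 for length in lengths if length < t) / len(lengths) for t in thresholds]
```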

torchuq.evaluate.particle

Plots:

  • Plot the density represented by the particles as a sequence [V0.1.0]
  • Plot the trend of the prediction (assuming it's a time series) [V0.1.0]

torchuq.evaluate.decision

[future V0.2.0]

Evaluate the loss incurred by simulated decisions made using the predictions

torchuq.evaluate.categorical

Metrics:

  • Classification accuracy [V0.1.0]
  • Proper scoring rules [future]
  • Expected calibration error (ECE) computed by binning [V0.1.0]
  • Expected calibration error (ECE) computed by kernel smoothing [V0.1.0]
  • Classwise ECE [future V0.2.0]
  • Multi-accuracy [future]
  • Multi-calibration [future]
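Accuracy and binned ECE for categorical predictions can be sketched as follows. This assumes each prediction is a list of class probabilities, that "confidence" means the top-class probability, and that bins are equal-width on [0, 1]; torchuq's binning scheme and defaults may differ, and the function names are hypothetical. The kernel-smoothed variant is not shown.

```python
def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return sum(1 for pr, y in zip(preds, labels) if pr == y) / len(labels)


def ece_binned(probs, labels, n_bins=10):
    """Top-class expected calibration error with equal-width confidence bins."""
    n = len(labels)
    bin_conf = [0.0] * n_bins
    bin_correct = [0.0] * n_bins
    bin_count = [0] * n_bins
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)
        conf = p[pred]
        # Confidence exactly 1.0 is placed in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        bin_conf[b] += conf
        bin_correct[b] += 1.0 if pred == y else 0.0
        bin_count[b] += 1
    ece = 0.0
    for b in range(n_bins):
        if bin_count[b] > 0:
            # Gap between accuracy and mean confidence, weighted by bin size.
            gap = abs(bin_correct[b] / bin_count[b] - bin_conf[b] / bin_count[b])
            ece += bin_count[b] / n * gap
    return ece
```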

Plots:

  • Confusion matrix [V0.1.0]
  • Reliability diagram (or calibration diagram) by binning [V0.1.0]
  • Reliability diagram (or calibration diagram) by kernel smoothing [V0.1.0]
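A confusion matrix for categorical predictions only needs the top-class prediction. In this hypothetical helper, rows index the true class and columns the predicted class; that orientation is an assumption, and the plot would render the resulting matrix as a heat map.

```python
def confusion_matrix(probs, labels, n_classes):
    """matrix[t][p] counts samples with true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for p, y in zip(probs, labels):
        pred = max(range(n_classes), key=p.__getitem__)
        matrix[y][pred] += 1
    return matrix
```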

torchuq.evaluate.topk

Metrics:

  • Accuracy (or coverage) [V0.1.0]
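For top-k predictions, accuracy (coverage) is the fraction of samples whose true label is among the k predicted labels. The sketch assumes the k-label sets are derived from class probabilities by keeping the k most probable classes; both helper names are hypothetical.

```python
def topk_sets(probs, k):
    """Indices of the k most probable classes for each sample."""
    return [sorted(range(len(p)), key=lambda c: p[c], reverse=True)[:k] for p in probs]


def topk_coverage(pred_sets, labels):
    """Fraction of samples whose true label is in the predicted set."""
    return sum(1 for s, y in zip(pred_sets, labels) if y in s) / len(labels)
```

With `k=1` this reduces to ordinary classification accuracy; larger k trades set size for coverage.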

Plots:

  • Confusion matrix [V0.1.0]