Diachronic view #35

@paulpestov

(This is a copy of this internal issue for public purposes.)

Describe the feature you'd like

Currently we display only the latest information about a workflow in QuiVer. When we run a workflow A, the important metrics are saved, and they are overwritten as soon as we run workflow A again.

In order to measure how changes in the OCR-D software impact the OCR quality as well as the hardware statistics, we should introduce diachronic information to QuiVer, e.g. via a timestamp.

User story

As a developer I need an overview of how changes in the software affect the OCR quality and hardware metrics in order to be certain that the newest contributions to OCR-D really improve the software's outcome.

Ideas we have discussed so far

How to display the information

For each available GT corpus there should be a line chart that depicts how a metric has changed over time. Each step in time (x axis) represents an ocrd_all or an ocrd_core release (clarify!).
Users can choose between the different metrics and can see a tendency whether the metric improves or not.
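
As a rough illustration of the chart data, the following sketch turns a list of runs into the x/y values of one line and reports the net change of the selected metric. The field names `release` and `metrics`, the metric name `cer_mean`, and all sample values are assumptions made up for this example, not part of QuiVer's current data format. The runs are assumed to be in chronological order already.

```python
def metric_series(runs, metric):
    """Return (labels, values) for one metric, keeping the given run order.

    Runs that do not contain the metric are skipped.
    """
    points = [
        (run["release"], run["metrics"][metric])
        for run in runs
        if metric in run["metrics"]
    ]
    labels = [label for label, _ in points]
    values = [value for _, value in points]
    return labels, values


def net_change(values):
    """Last value minus first value, or None with fewer than two points."""
    if len(values) < 2:
        return None
    return values[-1] - values[0]


# Made-up sample runs for one GT corpus and one workflow (hypothetical values).
sample_runs = [
    {"release": "release-1", "metrics": {"cer_mean": 0.08}},
    {"release": "release-2", "metrics": {"cer_mean": 0.05}},
]

labels, values = metric_series(sample_runs, "cer_mean")
change = net_change(values)
```

Whether a negative change counts as an improvement depends on the metric: for an error rate lower is better, for a throughput figure higher is better, so the front end would need to know the direction per metric.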

Underlying data structure

When selecting a GT corpus the front end uses an ID map file that points it to the right collection of JSON objects. Each OCR-D workflow that is executed on a GT corpus has a separate file in which all the runs per release are present.

Take the GT workspace 16_ant_simple as an example. We then have a file 16_ant_simple_minimal.json with all runs of its "minimal" benchmarking workflow, a file 16_ant_simple_selected_pages.json with all runs of its "selected_pages" benchmarking workflow, etc. Each executed workflow has a timestamp by which the front end can sort the single executions and retrieve the relevant data.
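
A minimal sketch of how such a file could look and how a consumer could sort its runs by timestamp. The JSON layout (a plain list of run objects), the field names `timestamp`, `release` and `metrics`, and the sample values are assumptions for illustration only. Timestamps are assumed to be ISO 8601 strings with an explicit UTC offset.

```python
import json
from datetime import datetime

# Hypothetical content of 16_ant_simple_minimal.json, deliberately unsorted.
SAMPLE_FILE_CONTENT = """
[
  {"timestamp": "2023-03-01T12:00:00+00:00", "release": "release-2",
   "metrics": {"cer_mean": 0.05}},
  {"timestamp": "2023-01-15T12:00:00+00:00", "release": "release-1",
   "metrics": {"cer_mean": 0.08}}
]
"""


def sort_runs(runs):
    """Return the runs ordered from oldest to newest."""
    return sorted(runs, key=lambda run: datetime.fromisoformat(run["timestamp"]))


def latest_run(runs):
    """Return the most recent run, or None if there are no runs."""
    ordered = sort_runs(runs)
    return ordered[-1] if ordered else None


runs = sort_runs(json.loads(SAMPLE_FILE_CONTENT))
```

Sorting on the consumer side keeps the front end independent of the order in which runs were written to the file.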

TODOs

  • clarify what our steps / increments in time are: a release of ocrd_all or a release of ocrd_core?
  • add timestamps to workflow objects
  • add a single file for each combination of GT workspace + workflow. Ideally, the data should be sorted chronologically right from the start (although the front end should not depend on that)
  • create the ID map file
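
The last three TODOs could be sketched roughly as follows: each run is appended to its workspace + workflow file instead of overwriting the previous one, the file is kept in chronological order, and the ID map is updated in the same step. The function name `record_run`, the file name `id_map.json`, the shape of the ID map (workspace ID mapped to a list of file names) and the metric name used in the usage example are all assumptions, not an existing QuiVer API.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

ID_MAP_NAME = "id_map.json"  # hypothetical name for the ID map file


def _read_json(path, default):
    """Load JSON from path, or return default if the file does not exist."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return default


def record_run(data_dir, workspace, workflow, metrics, timestamp=None):
    """Append one run to <workspace>_<workflow>.json and update the ID map."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    if timestamp is None:
        # ISO 8601 with explicit UTC offset, e.g. 2023-01-01T00:00:00+00:00
        timestamp = datetime.now(timezone.utc).isoformat()

    # Append instead of overwrite, then keep the file in chronological order.
    run_file = data_dir / f"{workspace}_{workflow}.json"
    runs = _read_json(run_file, [])
    runs.append({"timestamp": timestamp, "metrics": metrics})
    runs.sort(key=lambda run: datetime.fromisoformat(run["timestamp"]))
    run_file.write_text(json.dumps(runs, indent=2), encoding="utf-8")

    # Register the file under its workspace ID in the ID map.
    id_map_file = data_dir / ID_MAP_NAME
    id_map = _read_json(id_map_file, {})
    files = set(id_map.get(workspace, []))
    files.add(run_file.name)
    id_map[workspace] = sorted(files)
    id_map_file.write_text(json.dumps(id_map, indent=2), encoding="utf-8")
    return run_file
```

A call such as `record_run("data", "16_ant_simple", "minimal", {"cer_mean": 0.05})` would then create or extend `data/16_ant_simple_minimal.json` and list that file under `16_ant_simple` in the ID map. Updating the map at write time avoids having to parse workspace and workflow names back out of file names, which would be ambiguous because both may contain underscores.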
