
CIViC Variant Interpretation Example

Matthew Brush edited this page Aug 16, 2018 · 7 revisions

The CIViC Database

The Clinical Interpretation of Variants in Cancer (CIViC) database is a crowd-sourced resource that curates evidence from the literature that supports or refutes 'cancer variant interpretations' - assertions about the clinical relevance of genetic variants to cancer.

CIViC curators generate Evidence records, which describe the support provided by a particular study finding for or against a potential clinical interpretation of a variant. When CIViC decides that sufficient evidence exists in support of a particular interpretation, an Assertion record is created to represent this claim and present its evidence. The data is accessible to humans via a web interface, and computationally via an API.

The CIViC data uses established terminologies and identifiers to record the variants, diseases, and treatments its interpretations are about. CIViC also captures structured metadata about the assertions and evidence records produced by its curators, and about the underlying study findings generated by researchers. This metadata includes:

  • the agents and dates of the different types of contributions made by CIViC curators across a record's lifecycle
  • the publications reporting study findings on which evidence records are based
  • an evidence level for each record, based on the type of study that generated the finding used as evidence
  • a trust rating reflecting the strength and quality of the evidence provided by a study finding
  • a free-text description of key methods and data values reported in the study finding

The Modeled Example

Below we present a SEPIO model of the CIViC AID6 assertion record - a "predictive" interpretation stating that the EGFR(L858R) variant confers response to Afatinib treatment in Non-small Cell Lung Carcinoma (NSCLC). The SEPIO model creates an Assertion object and approximates the semantics of its primary statement by linking it to a 'subject' (the variant), a 'descriptor' (the treatment), a 'predicate' describing the relationship between them, and 'qualifiers' refining the context in which this core statement holds (the treated disease and variant origin).
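
As a rough illustration of the Assertion structure described above, the following Python sketch models the subject, predicate, descriptor, and qualifier links as plain fields. The class name, field names, the `statement` helper, and the string values are assumptions made for readability; they are not the SEPIO serialization or the CIViC API, and a real implementation would use identifiers (CURIEs) rather than labels. The variant origin value is left unset because it is not stated in the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Assertion:
    """Hypothetical, simplified stand-in for a SEPIO Assertion object."""
    id: str
    subject: str      # the variant the statement is about
    predicate: str    # relationship between subject and descriptor
    descriptor: str   # the treatment
    # Context in which the core statement holds, keyed by qualifier type.
    qualifiers: Dict[str, Optional[str]] = field(default_factory=dict)

    def statement(self) -> str:
        """Render the primary statement as a human-readable string."""
        core = f"{self.subject} {self.predicate} {self.descriptor}"
        context = ", ".join(
            f"{name}: {value}"
            for name, value in self.qualifiers.items()
            if value is not None
        )
        return f"{core} ({context})" if context else core


# Labels are used in place of identifiers; the predicate wording is illustrative.
aid6 = Assertion(
    id="CIViC:AID6",
    subject="EGFR L858R",
    predicate="confers response to",
    descriptor="Afatinib",
    qualifiers={
        "disease": "Non-small Cell Lung Carcinoma",
        "variant_origin": None,  # value not given in the text above
    },
)

print(aid6.statement())
```

Keeping the qualifiers in a separate mapping mirrors the distinction drawn above between the core statement and the context that refines it.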

The SEPIO example includes three of the six supporting evidence records reported in the CIViC assertion record:

  1. EID2997 represents the evidence provided by the FDA approval of Afatinib for treating NSCLC with EGFR L858R mutations.
  2. EID897 represents the evidence provided by the findings of a 2013 phase II clinical trial.
  3. EID2629 represents the evidence provided by the findings of a 2008 in vitro cell-based study.

These CIViC evidence records are represented as Evidence Lines in the SEPIO model, as they capture the provenance, direction, and strength of the independent arguments made by different supporting study findings. These Evidence Lines are represented separately from the Study Findings on which they are based, which serve as Evidence Items in the SEPIO model. The separation exists because a Study Finding exists independently of the argument it makes for a particular Assertion, and can provide evidence with a different direction, strength, and provenance for other assertions. While the actual data values reported by the Study Findings (e.g. hazard ratios and progression-free survival values) are captured only as free-text descriptions in the CIViC data, the SEPIO model could support a more structured representation if desired.
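
The separation between Evidence Lines and the Study Findings they rest on can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class and field names, the `lines_citing` helper, the `EX:` identifiers, and the second assertion with its "refutes" evidence line are all hypothetical and do not come from CIViC or the SEPIO specification. Strength is left unset because no values are given in the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StudyFinding:
    """An Evidence Item: exists independently of any argument built on it."""
    id: str
    description: str  # free text, as in the CIViC data


@dataclass
class EvidenceLine:
    """An argument for or against one Assertion, based on evidence items."""
    id: str
    assertion_id: str
    direction: str                      # e.g. "supports" or "refutes"
    evidence_items: List[StudyFinding]
    strength: Optional[str] = None      # not specified in this sketch


def lines_citing(finding: StudyFinding, lines: List[EvidenceLine]) -> List[EvidenceLine]:
    """Return every evidence line that uses the given study finding."""
    return [line for line in lines if finding in line.evidence_items]


# Study finding identifiers are hypothetical placeholders.
trial = StudyFinding("EX:finding-trial", "Findings of a 2013 phase II clinical trial")
cell_study = StudyFinding("EX:finding-cells", "Findings of a 2008 in vitro cell-based study")

eid897 = EvidenceLine("CIViC:EID897", "CIViC:AID6", "supports", [trial])
eid2629 = EvidenceLine("CIViC:EID2629", "CIViC:AID6", "supports", [cell_study])

# Hypothetical: the same trial finding reused as evidence for another assertion,
# with a different direction.
other_line = EvidenceLine("EX:line-1", "EX:assertion-1", "refutes", [trial])

all_lines = [eid897, eid2629, other_line]
for line in lines_citing(trial, all_lines):
    print(line.id, line.direction, line.assertion_id)
```

Because direction and strength live on the evidence line rather than on the study finding, one finding can take part in several arguments without being duplicated.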

Figure 1: SEPIO Representation of the AID6 assertion and its evidence and provenance metadata. Values captured as identifiers are shown as prefixed CURIEs. Identifiers for certain domain entities (e.g. diseases, drugs) are pulled from recommended identifier systems, but a given implementation is free to use its system of choice. Labels for many of the values captured as identifiers are shown parenthetically for human readability. Grey boxes represent terms coming from value sets, showing only the id and label of the value.

Overall, the core SEPIO model is capable of representing the full complexity of CIViC data for this exemplar record. The native CIViC model is very similar to the SEPIO representation, in that they use similar terms and present similar perspectives on the relationship between assertions and the evidence that supports them. A key implementation difference is that the SEPIO model creates separate representations of the Study Finding and the Evidence Line that captures the argument it makes for the target Assertion. As noted above, the SEPIO model could support a structured representation of the data items and their experimental provenance that are currently captured as free-text description in the CIViC data.
