|
| 1 | +--- |
| 2 | +title: Data Contract Specification | OpenMetadata Data Contracts Guide |
| 3 | +description: Create open-source data contracts directly in the OpenMetadata UI |
| 4 | +slug: /how-to-guides/data-contracts/spec |
| 5 | +--- |
| 6 | + |
| 7 | +# Introduction |
| 8 | + |
| 9 | +Data contracts formalize an agreement between data producers and consumers about what to expect from a data asset. They capture the structure, semantics, quality, and SLAs of data in a machine-readable way, similar to an API contract but for data. In essence, a Data Contract is an enforceable agreement that brings standardization, control, and reliability to the data ecosystem. |
| 10 | + |
| 11 | +OpenMetadata, as a metadata management platform, integrates this concept by introducing a Data Contract entity defined via JSON Schema. This allows OpenMetadata admins and [data product](https://docs.open-metadata.org/latest/how-to-guides/data-governance/domains-&-data-products#data-products) owners to attach a contract to tables in OpenMetadata, codifying expectations in a structured format. The contract can then be enforced or validated using OpenMetadata’s existing metadata and data quality frameworks. The goal is to have contextually rich, high-quality, well-governed data that is trustworthy. Data contracts achieve this by making data expectations explicit and automating their enforcement. |
| 12 | + |
| 13 | +# Data Contract Entity Schema Design |
| 14 | + |
| 15 | +The JSON Schema definition for Data Contract entities in OpenMetadata defines the contract’s structure and allowed fields. The contract covers seven main categories of expectations: |
| 16 | + |
| 17 | +1. [Schema](#schema) |
| 18 | +2. [Semantics](#semantics) |
| 19 | +3. [Security](#security) |
| 20 | +4. [Business Assertions (data quality)](#quality) |
| 21 | +5. [SLA](#sla) |
| 22 | +6. [Terms of Use](#terms-of-use) |
| 23 | +7. [Status](#status) |
| 24 | + |
| 25 | +The contract also includes an Ownership field for accountability. Each Data Contract is designed to represent a single data asset (dataset, topic, model, etc.) in a well-structured, templated format. At present, data contracts are available for Table assets. |
| 26 | + |
| 27 | +The JSON Schema for the Data Contract entity can be found [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/dataContract.json). |
| 28 | + |
| 29 | +# Data Contract Sections |
| 30 | +## Schema |
| 31 | + |
| 32 | +This section defines the expected structural schema of the data asset. It includes a list of fields (for a table, these are columns), each with a name and data type. This captures the contractual schema that producers and consumers agreed on, which is usually a subset of the fields available on the asset. |
| 33 | + |
| 34 | +## Semantics |
| 35 | + |
| 36 | +The contract's Semantics section defines business meaning and documentation requirements. For example, it can enforce that a data asset must have a description, an owner, or a domain. |
| 37 | + |
| 38 | +These rules complement the formal tests in the quality section, acting as documentation of business expectations. This section ensures the contract isn’t just about technical schema, but also carries business context. |
| 39 | + |
| 40 | +## Security |
| 41 | + |
| 42 | +Data security and access expectations are defined in this section. This can reference an access policy ID or name that should govern this data, or a required classification label. |
| 43 | + |
| 44 | +In practice, this means the contract might require the data asset to be tagged as `PII` or `Confidential` if appropriate, and that only certain roles can access it (through an associated policy). |
| 45 | + |
| 46 | +## Quality (Assertions) |
| 47 | + |
| 48 | +This section lists the data quality tests and assertions that the contract requires. |
| 49 | + |
| 50 | +Built on top of the native Data Quality features in OpenMetadata, this section allows defining specific tests that must pass for the data to be considered compliant with the contract. Tests can be at the column level (e.g., a column must be non-null) or table level (e.g., row count must be above a threshold), and can be managed from the Data Contract UI itself. |
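To make the column-level and table-level distinction concrete, here is a minimal, self-contained sketch in plain Python. It does not use the OpenMetadata Data Quality API; the function names (`column_values_not_null`, `row_count_above`) and the sample rows are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def column_values_not_null(rows: List[Row], column: str) -> bool:
    """Column-level check: every row has a non-null value in `column`."""
    return all(row.get(column) is not None for row in rows)


def row_count_above(rows: List[Row], threshold: int) -> bool:
    """Table-level check: the table holds more than `threshold` rows."""
    return len(rows) > threshold


# Hypothetical sample data, not taken from a real table.
sample_rows = [
    {"customer_id": 1, "first_name": "Ada"},
    {"customer_id": 2, "first_name": None},
]

results = {
    "customer_id not null": column_values_not_null(sample_rows, "customer_id"),
    "first_name not null": column_values_not_null(sample_rows, "first_name"),
    "row count above 1": row_count_above(sample_rows, 1),
}

# In this sketch the data is compliant only if every required test passes.
contract_compliant = all(results.values())
```

In this sketch a single failing test (the null `first_name`) is enough to mark the data as non-compliant, which mirrors the idea that all tests listed in the contract must pass.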
| 51 | + |
| 52 | +## SLA |
| 53 | + |
| 54 | +Service-Level Agreements related to the data’s timeliness and lifecycle are captured in this section. |
| 55 | +This includes: |
| 56 | +- Refresh Frequency: how often the data is expected to be updated or refreshed (e.g., daily, weekly). |
| 57 | +- Max Latency: the maximum allowed delay between data generation and when it’s available to consumers (e.g., data may be up to 4 hours old at most, or one day for typical daily batch ETLs). |
| 58 | +- Availability Time: the time by which daily or periodic data should be available (e.g., “09:00 UTC” daily data drop). |
| 59 | +- Retention: how long the data is kept accessible (if applicable). |
| 60 | + |
| 61 | +Including these SLA terms in the contract means producers commit to certain delivery timelines, and consumers know what availability to expect. |
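As a rough illustration of how Max Latency and Availability Time could be checked, the sketch below compares a last-refresh timestamp against both terms. The `SlaTerms` class and the helper functions are hypothetical and do not reflect the OpenMetadata schema or API. The sketch assumes all timestamps are timezone-aware UTC values.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone


@dataclass
class SlaTerms:
    """Hypothetical container for two SLA terms (not the OpenMetadata schema)."""
    max_latency: timedelta      # e.g. data may be at most 4 hours old
    availability_time: time     # e.g. 09:00, interpreted as UTC


def latency_ok(last_refreshed: datetime, now: datetime, sla: SlaTerms) -> bool:
    """True if the data is no older than the allowed maximum latency."""
    return now - last_refreshed <= sla.max_latency


def availability_ok(last_refreshed: datetime, now: datetime, sla: SlaTerms) -> bool:
    """True if today's deadline has not passed yet, or the data was
    refreshed today at or before the deadline."""
    deadline = datetime.combine(now.date(), sla.availability_time, tzinfo=timezone.utc)
    if now < deadline:
        return True
    return last_refreshed.date() == now.date() and last_refreshed <= deadline


sla = SlaTerms(max_latency=timedelta(hours=4), availability_time=time(9, 0))
now = datetime(2024, 1, 2, 10, 0, tzinfo=timezone.utc)
fresh = datetime(2024, 1, 2, 8, 30, tzinfo=timezone.utc)
stale = datetime(2024, 1, 1, 8, 30, tzinfo=timezone.utc)

fresh_result = (latency_ok(fresh, now, sla), availability_ok(fresh, now, sla))
stale_result = (latency_ok(stale, now, sla), availability_ok(stale, now, sla))
```

Keeping the two checks separate reflects that they answer different questions: latency measures how old the data is right now, while availability measures whether the scheduled drop arrived on time.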
| 62 | + |
| 63 | +## Terms of Use |
| 64 | + |
| 65 | +This section captures the allowed and disallowed uses of the data asset, as well as any compliance or regulatory requirements. This can include: |
| 66 | +- Allowed Uses: Describes what the data can be used for (e.g., internal analytics, reporting). |
| 67 | +- Disallowed Uses: Specifies prohibited uses (e.g., no sharing with third parties, or no training AI models). |
| 68 | +- Compliance Requirements: Any legal or regulatory obligations (e.g., GDPR, HIPAA). |
| 69 | + |
| 70 | +## Status |
| 71 | + |
| 72 | +A status field indicates whether the contract is in draft, active, or currently violated. A newly created contract starts in `DRAFT`, meaning it is not yet enforced or not yet fully implemented by the data producer. Once a data contract is published, it becomes `ACTIVE`. If a violation occurs (e.g., a test fails or the schema deviates), the contract's status changes to `VIOLATED`. |
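The status lifecycle described above can be sketched as a small function. This is an illustration only: the `ContractStatus` enum and `resolve_status` helper are hypothetical names. The sketch also assumes that a published contract whose checks pass again returns to `ACTIVE`, which the description above does not state explicitly.

```python
from enum import Enum


class ContractStatus(Enum):
    DRAFT = "DRAFT"
    ACTIVE = "ACTIVE"
    VIOLATED = "VIOLATED"


def resolve_status(published: bool, checks_passed: bool) -> ContractStatus:
    """Derive a contract status from two facts (hypothetical helper).

    - Unpublished contracts stay in DRAFT regardless of check results.
    - Published contracts are ACTIVE while all checks pass, else VIOLATED.
    """
    if not published:
        return ContractStatus.DRAFT
    return ContractStatus.ACTIVE if checks_passed else ContractStatus.VIOLATED


draft = resolve_status(published=False, checks_passed=False)
active = resolve_status(published=True, checks_passed=True)
violated = resolve_status(published=True, checks_passed=False)
```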
| 73 | + |
| 74 | +Data Contracts have approval workflows when changes are made, similar to Glossaries. This ensures that any modifications to the contract (like adding new quality tests or changing schema expectations) go through a review and approval process. |
| 75 | + |
| 76 | +# Applying Contracts to Tables |
| 77 | + |
| 78 | +Below is an example of a data contract for the `red.dev.dbt_jaffle.customers` table. |
| 79 | + |
| 80 | +## Data Contract YAML Example |
| 81 | + |
| 82 | +This is an example YAML of a Data Contract applied to a table in OpenMetadata. Note that while OpenMetadata brings full UI support for creating and managing Data Contracts, you can still use the API to manage them programmatically. |
| 83 | + |
| 84 | +```yaml |
| 85 | +name: Customers DC |
| 86 | +status: Active |
| 87 | +entity: |
| 88 | + id: 8beb4301-8302-4791-9944-2897e7614a1a |
| 89 | + type: table |
| 90 | + href: https://example.com/v1/tables/8beb4301-8302-4791-9944-2897e7614a1a |
| 91 | +schema: |
| 92 | + - name: customer_id |
| 93 | + dataType: INT |
| 94 | + dataLength: 1 |
| 95 | + dataTypeDisplay: integer |
| 96 | + description: New ID from Collate UI |
| 97 | + fullyQualifiedName: red.dev.dbt_jaffle.customers.customer_id |
| 98 | + tags: [] |
| 99 | + constraint: 'NULL' |
| 100 | + children: [] |
| 101 | + - name: first_name |
| 102 | + dataType: VARCHAR |
| 103 | + dataLength: 20 |
| 104 | + dataTypeDisplay: character varying(20) |
| 105 | + fullyQualifiedName: red.dev.dbt_jaffle.customers.first_name |
| 106 | + tags: |
| 107 | + - tagFQN: General.Person |
| 108 | + name: Person |
| 109 | + description: >- |
| 110 | + A full person name, which can include first names, middle names or |
| 111 | + initials, and last names. |
| 112 | + source: Classification |
| 113 | + labelType: Generated |
| 114 | + state: Suggested |
| 115 | + - tagFQN: PII.Sensitive |
| 116 | + name: Sensitive |
| 117 | + description: >- |
| 118 | + PII which if lost, compromised, or disclosed without authorization, |
| 119 | + could result in substantial harm, embarrassment, inconvenience, or |
| 120 | + unfairness to an individual. |
| 121 | + source: Classification |
| 122 | + labelType: Generated |
| 123 | + state: Suggested |
| 124 | + constraint: 'NULL' |
| 125 | + children: [] |
| 126 | +semantics: |
| 127 | + - name: Owners is set |
| 128 | + description: Ownership is mandatory |
| 129 | + rule: >- |
| 130 | + {"and":[{"some":[{"var":"owners"},{"!=":[{"var":"fullyQualifiedName"},null]}]}]} |
| 131 | +qualityExpectations: |
| 132 | + - id: 1efbda53-063d-4611-8f69-402f4490a503 |
| 133 | + type: testCase |
| 134 | + name: customer rows |
| 135 | + - id: 707a43f9-d1d1-4fb8-96da-7bb428429f87 |
| 136 | + type: testCase |
| 137 | + name: relationships_orders_customer_id__customer_id__ref_customers_ |
| 138 | + description: '' |
| 139 | +owners: [] |
| 140 | +reviewers: [] |
| 141 | +``` |
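The `rule` string under `semantics` in the example above has the shape of a JsonLogic expression. Assuming it follows JsonLogic-style semantics, the sketch below shows how such a rule could be evaluated against an entity. This is a deliberately tiny evaluator covering only the four operators that appear in the rule (`and`, `some`, `var`, `!=`). It is not OpenMetadata's rule engine, its `and` returns a plain boolean rather than JsonLogic's exact return value, and the sample entities are hypothetical.

```python
import json
from typing import Any


def get_var(data: Any, path: str) -> Any:
    """Resolve a dotted path against `data`; an empty path returns `data` itself."""
    if path == "":
        return data
    current = data
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return None
    return current


def evaluate(rule: Any, data: Any) -> Any:
    """Evaluate the small subset of JsonLogic-style operators used here."""
    if not isinstance(rule, dict):
        return rule  # literals evaluate to themselves
    op, args = next(iter(rule.items()))
    if not isinstance(args, list):
        args = [args]
    if op == "var":
        return get_var(data, args[0])
    if op == "and":
        return all(evaluate(arg, data) for arg in args)
    if op == "!=":
        return evaluate(args[0], data) != evaluate(args[1], data)
    if op == "some":
        # The second argument is evaluated once per item, with the item as data.
        items = evaluate(args[0], data) or []
        return any(evaluate(args[1], item) for item in items)
    raise ValueError(f"unsupported operator: {op}")


# The rule string copied from the YAML example.
rule = json.loads(
    '{"and":[{"some":[{"var":"owners"},{"!=":[{"var":"fullyQualifiedName"},null]}]}]}'
)

# Hypothetical entities for illustration.
with_owner = {"name": "customers", "owners": [{"fullyQualifiedName": "data-eng-team"}]}
without_owner = {"name": "customers", "owners": []}

owned_passes = evaluate(rule, with_owner)
unowned_passes = evaluate(rule, without_owner)
```

Read this way, the rule passes when at least one entry in `owners` has a non-null `fullyQualifiedName`, which matches its description, "Ownership is mandatory". Note that the `owners: []` value in the example contract is the contract's own field, not the table's owners.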
| 142 | + |
| 143 | +{%inlineCallout |
| 144 | + color="violet-70" |
| 145 | + bold="Creating Data Contracts" |
| 146 | + icon="MdArrowForward" |
| 147 | + href="/how-to-guides/data-contracts/create"%} |
| 148 | + Create Data Contracts in the OpenMetadata UI. |
| 149 | +{%/inlineCallout%} |