01 — Pact

Data contracts,
set with care.

DataPact is a zero-dependency Python framework for declaring what your data should look like — then enforcing it. Types, null rules, ranges, allowed sets, cross-column math, referential integrity. No pandas. No PyYAML. Just the standard library.

Data Engineering · Data Quality · Pipeline Gatekeeping · CI-native · stdlib-only

02 — Premise

Most data bugs don't crash. They drift.

A total that quietly stops matching subtotal + tax. A status field that grows a typo'd value. A join key that starts producing orphans. DataPact turns the rules you keep in your head into a versioned document — diffable in git, shared between producers and consumers, and enforced like a unit test.

— Contracts as code

Declare it once

A contract is a small YAML-lite or JSON file: column types, null rules, sets, ranges, regexes, cross-column checks. Versioned, reviewable, portable.

— Gate the pipeline

Stop bad data

Wrap a producer with @guard(contract) or call validate_or_raise. A breach raises before bad rows reach anything downstream.

— Fail the build

CI-native

datapact validate --fail-on error exits non-zero. A contract breach fails the job exactly like a failing test.
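How a severity threshold could become an exit code, as a minimal sketch. It assumes `--fail-on` means "this severity or anything worse"; the `exit_code` helper and the `SEVERITY_RANK` table are hypothetical names for illustration, not DataPact's internals.

```python
# Hypothetical ordering of the three severities named in this page.
SEVERITY_RANK = {"info": 0, "warn": 1, "error": 2}

def exit_code(failed_severities, fail_on="error"):
    """Return 1 when any failed check is at or above the fail_on severity."""
    threshold = SEVERITY_RANK[fail_on]
    breached = any(SEVERITY_RANK[s] >= threshold for s in failed_severities)
    return 1 if breached else 0

# One failed warn-level check and one failed error-level check.
code = exit_code(["warn", "error"], fail_on="error")
print(code)
```

A real gate would pass this value to `sys.exit()` so the CI runner sees the status.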

03 — Declare

A contract reads like a checklist.

DataPact ships its own tiny, safe YAML reader — a strict subset, no arbitrary-object deserialization. Or build the same contract fluently in Python.

# orders.yaml (parsed by DataPact's own stdlib reader)
name: orders
version: 1.0
strictness: lenient
columns:
  - name: order_id
    type: int
    nullable: false
    checks:
      - kind: column_values_unique
        severity: error
  - name: status
    type: str
    checks:
      - kind: column_values_in_set
        kwargs: { values: [new, paid, shipped, refunded] }
expectations:
  - kind: multicolumn_sum_to_equal
    kwargs: { columns: [subtotal, tax], total_column: total, tolerance: 0.01 }
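To make the cross-column rule concrete, here is a minimal sketch of what a sum-with-tolerance check could do on a list of dicts. The function name `sum_to_equal` and its signature are assumptions for illustration; this is not DataPact's evaluator, only the arithmetic the contract describes.

```python
import math

def sum_to_equal(rows, columns, total_column, tolerance=0.0):
    """Return the indexes of rows whose parts do not add up to the total."""
    failing = []
    for index, row in enumerate(rows):
        expected = sum(float(row[name]) for name in columns)
        actual = float(row[total_column])
        # Absolute tolerance, so float rounding alone does not trip the check.
        if not math.isclose(expected, actual, rel_tol=0.0, abs_tol=tolerance):
            failing.append(index)
    return failing

rows = [
    {"subtotal": 100.00, "tax": 8.25, "total": 108.25},
    {"subtotal": 40.00, "tax": 3.30, "total": 43.30},
    {"subtotal": 20.00, "tax": 1.65, "total": 22.00},  # drifted total
]
bad = sum_to_equal(rows, ["subtotal", "tax"], "total", tolerance=0.01)
print(bad)
```

Only the third row is reported: its total is off by 0.35, well outside the 0.01 tolerance.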

04 — Enforce

Validate anything. Gate everything.

CSV, JSON, JSONL, SQLite, or a plain list of dicts — same API. The report is a structured object you can inspect, render, or turn into an exit code.

# Python API
import datapact as dp

contract = dp.load_contract("orders.yaml")
report = dp.validate("orders.csv", contract)
print(report.success, report.passed, report.failed)

# Gate a pipeline: raises DataContractError on an error-severity breach
@dp.guard(contract)
def load_orders():
    return fetch_rows()  # fetch_rows() stands in for your own row producer

# CI gate — non-zero exit fails the build
$ datapact validate orders.csv --contract orders.yaml --fail-on error
$ echo $?
1
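For readers curious how a gate like this can work, here is a stdlib-only sketch of the decorator pattern. `guard_sketch`, `ContractBreach`, and `status_is_known` are hypothetical names; the real `@guard` takes a contract and raises `DataContractError`, and its internals may differ.

```python
import functools

class ContractBreach(Exception):
    """Hypothetical stand-in for DataContractError."""

def guard_sketch(check):
    """Wrap a producer so its rows are checked before they are returned.

    `check` is any callable that takes the rows and returns a list of
    human-readable problems (an empty list means the data is clean).
    """
    def decorator(producer):
        @functools.wraps(producer)
        def wrapper(*args, **kwargs):
            rows = producer(*args, **kwargs)
            problems = check(rows)
            if problems:
                # Raise before the caller ever sees the bad rows.
                raise ContractBreach("; ".join(problems))
            return rows
        return wrapper
    return decorator

def status_is_known(rows):
    allowed = {"new", "paid", "shipped", "refunded"}
    return [
        f"row {i}: unexpected status {row['status']!r}"
        for i, row in enumerate(rows)
        if row["status"] not in allowed
    ]

@guard_sketch(status_is_known)
def load_orders():
    # Second row carries a typo'd status on purpose.
    return [{"order_id": 1, "status": "paid"}, {"order_id": 2, "status": "shiped"}]

try:
    load_orders()
    outcome = "passed"
except ContractBreach as exc:
    outcome = f"blocked: {exc}"
print(outcome)
```

The producer itself stays unchanged; the check runs on its return value, so nothing downstream receives rows that failed.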

05 — Report

A report a non-engineer can read.

Every run renders to one self-contained HTML file — no assets, no framework. Failures sort to the top; the data profile shows null rates, distinct counts and distributions. This is the demo, generated from a deliberately dirty orders file:

41 Expectations · 31 Passed · 10 Failed

06 — Library

The expectations library.

Pure evaluator functions, one per check. Every column-level check supports a mostly= tolerance; every expectation carries an error / warn / info severity.
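A minimal sketch of how a mostly= tolerance can behave, assuming it means "the minimum fraction of values that must pass", as in similar expectation libraries. The function below borrows the name of the first check in the list, but its signature and return shape are assumptions, not DataPact's API.

```python
def values_not_null(values, mostly=1.0):
    """Pass when at least `mostly` (a fraction from 0 to 1) of values are non-null."""
    if not values:
        return True, 1.0  # nothing to check; treated as a pass in this sketch
    ok = sum(1 for value in values if value is not None and value != "")
    fraction = ok / len(values)
    return fraction >= mostly, fraction

emails = ["a@example.com", None, "c@example.com", "d@example.com"]
strict = values_not_null(emails)                # every value must be present
lenient = values_not_null(emails, mostly=0.75)  # three out of four is enough
print(strict, lenient)
```

With one missing value in four, the strict call fails and the lenient call passes; both report the same observed fraction of 0.75.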

Column-level

  • values_not_null
  • values_unique
  • values_of_type
  • values_between
  • value_lengths_between
  • values_in_set
  • values_not_in_set
  • values_match_regex
  • match_strftime_format
  • mean_between
  • sum_between
  • stdev_between
  • unique_value_count
  • proportion_unique

Table-level

  • row_count_between
  • columns_match_set
  • columns_match_ordered
  • compound_columns_unique

Cross-column

  • pair_a_greater_than_b
  • multicolumn_sum_to_equal
  • values_in_other_column
    (referential integrity)
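Referential integrity comes down to a set lookup. The sketch below shows the idea on plain lists of dicts; `find_orphans` and its parameters are hypothetical names, not the signature of values_in_other_column.

```python
def find_orphans(child_rows, child_key, parent_rows, parent_key):
    """Return child key values with no matching parent, in first-seen order."""
    known = {row[parent_key] for row in parent_rows}
    orphans = []
    for row in child_rows:
        value = row[child_key]
        if value not in known and value not in orphans:
            orphans.append(value)
    return orphans

customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # no such customer
    {"order_id": 12, "customer_id": 3},  # same orphan key again
]
orphans = find_orphans(orders, "customer_id", customers, "customer_id")
print(orphans)
```

Each orphaned key is reported once, however many rows reference it.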

Plus

  • profile() statistics
  • suggest_contract()
  • @guard decorator
  • browser UI (serve)