01 — Pact
DataPact is a zero-dependency Python framework for declaring what your data should look like — then enforcing it. Types, null rules, ranges, allowed sets, cross-column math, referential integrity. No pandas. No PyYAML. Just the standard library.
02 — Premise
A total that quietly stops matching subtotal + tax.
A status field that grows a typo'd value. A join key that starts producing orphans.
DataPact turns the rules you keep in your head into a versioned document — diffable in git,
shared between producers and consumers, and enforced like a unit test.
A contract is a small YAML-lite or JSON file: column types, null rules, sets, ranges, regexes, cross-column checks. Versioned, reviewable, portable.
Wrap a producer with @guard(contract) or call validate_or_raise.
A breach raises before bad rows reach anything downstream.
Running datapact validate --fail-on error exits non-zero on an error-severity breach,
so a broken contract fails the job exactly like a failing test.
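To make the gating idea concrete, here is a minimal stdlib-only sketch of how a guard-style decorator can stop bad rows before they are returned. It is not DataPact's implementation: ContractBreach, guard_sketch and the required_non_null parameter are hypothetical names chosen for illustration, and the only rule checked is "this column must not be null".

```python
import functools


class ContractBreach(Exception):
    """Hypothetical error type for this sketch (DataPact's own is DataContractError)."""


def guard_sketch(required_non_null):
    """Decorator factory: reject output where any listed column is None or missing."""
    def decorator(producer):
        @functools.wraps(producer)
        def wrapper(*args, **kwargs):
            rows = list(producer(*args, **kwargs))
            for index, row in enumerate(rows):
                for column in required_non_null:
                    if row.get(column) is None:
                        # Raise before the caller ever sees the rows.
                        raise ContractBreach(f"row {index}: {column} is null")
            return rows
        return wrapper
    return decorator


@guard_sketch(required_non_null=["order_id"])
def load_orders():
    # Stand-in producer; the second row breaks the rule.
    return [{"order_id": 1}, {"order_id": None}]


@guard_sketch(required_non_null=["order_id"])
def load_clean_orders():
    return [{"order_id": 1}, {"order_id": 2}]
```

Calling load_orders() raises ContractBreach, while load_clean_orders() returns its rows unchanged, so downstream code only ever receives data that passed the rule.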
03 — Declare
DataPact ships its own tiny, safe YAML reader — a strict subset, no arbitrary-object deserialization. Or build the same contract fluently in Python.
# orders_contract.yaml: parsed by DataPact's own stdlib reader
name: orders
version: 1.0
strictness: lenient
columns:
  - name: order_id
    type: int
    nullable: false
    checks:
      - kind: column_values_unique
        severity: error
  - name: status
    type: str
    checks:
      - kind: column_values_in_set
        kwargs: { values: [new, paid, shipped, refunded] }
expectations:
  - kind: multicolumn_sum_to_equal
    kwargs: { columns: [subtotal, tax], total_column: total, tolerance: 0.01 }
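The two rules in that contract that catch the failures described in the premise can be sketched in plain Python. This illustrates the logic only, assuming that tolerance means an absolute difference; sum_matches_total and value_in_set are hypothetical names, not DataPact's evaluator functions.

```python
import math

ALLOWED_STATUS = {"new", "paid", "shipped", "refunded"}


def sum_matches_total(row, columns, total_column, tolerance):
    """True when the listed columns add up to the total within an absolute tolerance."""
    expected = math.fsum(row[c] for c in columns)
    return abs(expected - row[total_column]) <= tolerance


def value_in_set(row, column, allowed):
    """True when the column's value is one of the allowed values."""
    return row[column] in allowed


rows = [
    # Clean row: 100.00 + 8.25 equals the total, and the status is allowed.
    {"status": "paid", "subtotal": 100.0, "tax": 8.25, "total": 108.25},
    # Dirty row: the total is off by 1.00 and the status is a typo.
    {"status": "payed", "subtotal": 40.0, "tax": 3.30, "total": 44.30},
]

for index, row in enumerate(rows):
    ok_sum = sum_matches_total(row, ["subtotal", "tax"], "total", 0.01)
    ok_set = value_in_set(row, "status", ALLOWED_STATUS)
    print(index, ok_sum, ok_set)
```

The first row passes both checks and the second fails both, which is the kind of row a contract breach would surface.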
04 — Enforce
CSV, JSON, JSONL, SQLite, or a plain list of dicts — same API. The report is a structured object you can inspect, render, or turn into an exit code.
# Python API
import datapact as dp

contract = dp.load_contract("orders.yaml")
report = dp.validate("orders.csv", contract)
print(report.success, report.passed, report.failed)

# Gate a pipeline: raises DataContractError on an error-severity breach
@dp.guard(contract)
def load_orders():
    return fetch_rows()  # fetch_rows stands in for your own producer

# CI gate: non-zero exit fails the build
$ datapact validate orders.csv --contract orders.yaml --fail-on error
$ echo $?
→ 1
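How a structured report can become an exit code is sketched below. The Result class, the exit_code function, and the rule that a run fails only when a failed check is at least as severe as the chosen threshold are assumptions made for illustration; they are not taken from DataPact's source.

```python
from dataclasses import dataclass

# Assumed ordering for this sketch: info < warn < error.
SEVERITY_RANK = {"info": 0, "warn": 1, "error": 2}


@dataclass
class Result:
    check: str
    passed: bool
    severity: str


def exit_code(results, fail_on="error"):
    """Return 1 if any failed check is at least as severe as fail_on, else 0."""
    threshold = SEVERITY_RANK[fail_on]
    breached = any(
        not r.passed and SEVERITY_RANK[r.severity] >= threshold
        for r in results
    )
    return 1 if breached else 0


results = [
    Result("column_values_unique", passed=True, severity="error"),
    Result("column_values_in_set", passed=False, severity="warn"),
]

code_default = exit_code(results)                 # only a warn-level failure
code_strict = exit_code(results, fail_on="warn")  # warn-level failures now count
```

In a script, passing the returned value to sys.exit is what would make a CI step fail.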
05 — Report
Every run renders to one self-contained HTML file, with no assets and no framework. Failures sort to the top; the data profile shows null rates, distinct counts and distributions. The demo report is generated from a deliberately dirty orders file.
06 — Library
Pure evaluator functions, one per check. Every column-level check supports a
mostly= tolerance; every expectation carries an
error / warn / info severity.
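One common reading of a mostly= tolerance, borrowed from other validation tools, is the minimum fraction of values that must pass for the check to succeed. The sketch below assumes that meaning and also assumes nulls are skipped; passes_mostly is a hypothetical helper, not a DataPact function.

```python
def passes_mostly(values, predicate, mostly=1.0):
    """True when at least `mostly` of the non-null values satisfy predicate."""
    checked = [v for v in values if v is not None]
    if not checked:
        return True  # nothing to check; treated as a pass in this sketch
    passing = sum(1 for v in checked if predicate(v))
    return passing / len(checked) >= mostly


statuses = ["new", "paid", "paid", "shipped", "payed"]
allowed = {"new", "paid", "shipped", "refunded"}

# 4 of 5 values are allowed, so the strict check fails and the tolerant one passes.
strict = passes_mostly(statuses, allowed.__contains__)
tolerant = passes_mostly(statuses, allowed.__contains__, mostly=0.8)
```

With this reading, a single typo'd status fails the default check but is tolerated once mostly is lowered to 0.8.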