A real report generated from orders.cna by columna inspect. Every number below was read straight from the file's footer and pages; nothing is mocked.
Footer-last, like Parquet: the writer streams row groups, then writes one compact metadata footer at the end. The reader seeks to the trailer, reads the footer, and now knows every column chunk's byte offset and stats — without scanning the data.
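The footer-last idea can be sketched in a few lines of Python. This is an illustration only, not the actual .cna byte layout: the `MAGIC` value, the JSON footer, and the 8-byte trailer (4-byte footer length plus magic) are all assumptions made up for the example, as are the function names.

```python
import io
import json
import struct

MAGIC = b"CNA1"  # hypothetical magic bytes, not taken from the real format


def write_file(buf, row_groups):
    """Stream column chunks group by group, then write one footer and a trailer."""
    chunks = []
    for group in row_groups:
        group_meta = {}
        for name, payload in group.items():
            # Record where each column chunk starts before writing it.
            group_meta[name] = {"offset": buf.tell(), "length": len(payload)}
            buf.write(payload)
        chunks.append(group_meta)
    footer = json.dumps({"row_groups": chunks}).encode("utf-8")
    buf.write(footer)
    # Trailer: 4-byte little-endian footer length, then the magic bytes.
    buf.write(struct.pack("<I", len(footer)) + MAGIC)


def read_footer(buf):
    """Seek to the fixed-size trailer, then read only the footer bytes."""
    buf.seek(-8, io.SEEK_END)
    trailer = buf.read(8)
    (footer_len,) = struct.unpack("<I", trailer[:4])
    if trailer[4:] != MAGIC:
        raise ValueError("not a footer-last file")
    buf.seek(-(8 + footer_len), io.SEEK_END)
    return json.loads(buf.read(footer_len))


def read_chunk(buf, footer, group_index, column):
    """Jump straight to one column chunk using the offsets in the footer."""
    meta = footer["row_groups"][group_index][column]
    buf.seek(meta["offset"])
    return buf.read(meta["length"])
```

Because the trailer has a fixed size, the reader needs two small reads (trailer, then footer) before it can jump to any column chunk. A real footer would also carry the per-chunk min/max stats used for skipping.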
The writer trial-encodes each column with every applicable encoding and keeps the smallest. Here's what it actually chose for row group 0:
| column | type | encoding | codec | min | max | why |
|---|---|---|---|---|---|---|
| order_id | int64 | DELTA | DEFLATE | 100000 | 100499 | monotonic / clustered integers stored as small gaps |
| ts | datetime | BITPACK | DEFLATE | 2026-01-01 08:02:00 | 2026-01-02 08:57:00 | tight integer range packed into minimum bits |
| region | string | DICTIONARY | — | AMER | LATAM | low cardinality — values stored once, indices packed |
| status | string | DICTIONARY | DEFLATE | paid | refunded | low cardinality — values stored once, indices packed |
| amount | float64 | PLAIN | DEFLATE | 1.0 | 1059.83 | high-entropy values, nothing smaller won |
| quantity | int64 | BITPACK | — | 1 | 12 | tight integer range packed into minimum bits |
| note | string | DICTIONARY | — | follow-up | vip | low cardinality — values stored once, indices packed |
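Trial encoding, as described above, amounts to running every candidate encoder and comparing output sizes. The sketch below is a simplified stand-in, not columna's own code: it covers integer columns only, skips the codec step (DEFLATE) and dictionary encoding, assumes a non-empty column, and the encoder byte layouts and function names are assumptions chosen for illustration.

```python
import struct


def plain(values):
    # Fixed 8 bytes per value.
    return struct.pack(f"<{len(values)}q", *values)


def zigzag(n):
    # Map signed gaps to unsigned ints so small negatives stay small.
    return (n << 1) ^ (n >> 63)


def varint(n):
    # Unsigned LEB128: 7 payload bits per byte, high bit means "more follows".
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def delta(values):
    # First value in full, then each gap as a zigzag varint.
    out = bytearray(struct.pack("<q", values[0]))
    for prev, cur in zip(values, values[1:]):
        out += varint(zigzag(cur - prev))
    return bytes(out)


def bitpack(values):
    # Store (value - min) in the fewest bits that cover the range.
    lo = min(values)
    width = max((max(values) - lo).bit_length(), 1)
    acc = 0
    for i, v in enumerate(values):
        acc |= (v - lo) << (i * width)
    nbytes = (len(values) * width + 7) // 8
    # Header: 8-byte minimum plus 1-byte bit width.
    return struct.pack("<qB", lo, width) + acc.to_bytes(nbytes, "little")


ENCODINGS = {"PLAIN": plain, "DELTA": delta, "BITPACK": bitpack}


def choose_encoding(values):
    """Encode with every candidate and keep the one with the fewest bytes."""
    sizes = {name: len(encode(values)) for name, encode in ENCODINGS.items()}
    return min(sizes, key=sizes.get), sizes


order_id = list(range(100000, 100500))             # same range as row group 0
quantity = [(i * 7) % 12 + 1 for i in range(500)]  # made-up values in 1..12

print(choose_encoding(order_id))  # ('DELTA', {'PLAIN': 4000, 'DELTA': 507, 'BITPACK': 572})
print(choose_encoding(quantity))  # ('BITPACK', {'PLAIN': 4000, 'DELTA': 507, 'BITPACK': 259})
```

With these toy encoders the winners line up with the table: consecutive order_id values cost one byte per gap under DELTA, while quantity fits in 4 bits per value under BITPACK. The `quantity` values here are synthetic, so the byte counts are not the file's real sizes.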
Query: scan(where = col("order_id") >= 104500).
Each row group stores min/max for order_id in the footer, so groups whose max is below the threshold are skipped entirely: no bytes read, no decoding.
The bar shows bytes read versus a full scan. Correctness is verified in the test suite against a brute-force filter: skipping never drops a matching row.
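A minimal sketch of min/max skipping for a `>=` predicate, checked against a brute-force filter. The `RowGroupStats` class, the function names, the ten-group layout, and the 8-bytes-per-value size are assumptions for illustration; they are not read from orders.cna.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    index: int
    min_value: int
    max_value: int
    byte_size: int


def groups_to_read(stats, threshold):
    """Keep only groups whose max could satisfy `value >= threshold`."""
    return [g for g in stats if g.max_value >= threshold]


def scan(groups, stats, threshold):
    """Decode only the surviving groups and count the bytes touched."""
    rows, bytes_read = [], 0
    for g in groups_to_read(stats, threshold):
        bytes_read += g.byte_size
        rows.extend(v for v in groups[g.index] if v >= threshold)
    return rows, bytes_read


# Hypothetical file: 10 row groups of 500 consecutive order_id values each.
groups = [list(range(100000 + 500 * i, 100000 + 500 * (i + 1))) for i in range(10)]
stats = [RowGroupStats(i, g[0], g[-1], 8 * len(g)) for i, g in enumerate(groups)]

rows, bytes_read = scan(groups, stats, 104500)
brute_force = [v for g in groups for v in g if v >= 104500]

assert rows == brute_force  # skipping never drops a matching row
print(bytes_read, sum(s.byte_size for s in stats))  # 4000 40000
```

The skip test uses only footer stats, so a skipped group costs no I/O at all. Surviving groups still need the row-level filter, because a group's max can pass the threshold while some of its rows do not.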
| order_id | ts | region | status | amount | quantity | note |
|---|---|---|---|---|---|---|
| 100000 | 2026-01-01 08:02:00 | EMEA | paid | 1.0 | 9 | None |
| 100001 | 2026-01-01 08:03:00 | EMEA | paid | 148.82 | 9 | None |
| 100002 | 2026-01-01 08:07:00 | APAC | paid | 45.24 | 2 | None |
| 100003 | 2026-01-01 08:11:00 | EMEA | paid | 126.27 | 10 | vip |
| 100004 | 2026-01-01 08:14:00 | EMEA | paid | 150.44 | 10 | urgent |
| 100005 | 2026-01-01 08:17:00 | EMEA | paid | 149.5 | 4 | vip |
| 100006 | 2026-01-01 08:19:00 | APAC | paid | 499.11 | 9 | vip |
| 100007 | 2026-01-01 08:21:00 | AMER | paid | 169.36 | 9 | urgent |
| 100008 | 2026-01-01 08:26:00 | APAC | paid | 91.39 | 6 | urgent |
| 100009 | 2026-01-01 08:27:00 | EMEA | paid | 1.0 | 10 | urgent |
| 100010 | 2026-01-01 08:32:00 | LATAM | paid | 187.62 | 6 | urgent |
| 100011 | 2026-01-01 08:34:00 | LATAM | paid | 92.67 | 6 | urgent |