columna · generated inspector

Inside a .cna file

A real report generated from orders.cna by columna inspect. Every number below was read straight from the file's footer and pages — nothing is mocked.

5,000 rows · 7 columns · 10 row groups · 44 KB file size

259 KB source CSV · 44 KB columna .cna · 83% smaller on disk

Byte layout

Footer-last, like Parquet: the writer streams row groups, then writes one compact metadata footer at the end. The reader seeks to the trailer, reads the footer, and from it learns every column chunk's byte offset and stats without scanning the data.

file header: MAGIC "CNA1" · 4 bytes
row group 0 · 500 rows · 7 column chunks → pages [encoding · codec · crc32]
row group 1 · 500 rows · 7 column chunks → pages [encoding · codec · crc32]
row group 2 · 500 rows · 7 column chunks → pages [encoding · codec · crc32]
row group 3 · 500 rows · 7 column chunks → pages [encoding · codec · crc32]
… 6 more row groups …
footer: zlib(JSON) holding schema · per-chunk offsets · encodings · min/max stats
trailer: FOOTER_LEN (uint32) + MAGIC "CNA1", located by seeking from the end
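
The footer-last read path can be sketched roughly as follows. This is a minimal illustration, not columna's actual reader. The report only states that the trailer holds a uint32 FOOTER_LEN plus the magic and that the footer is zlib-compressed JSON; the little-endian byte order, the footer sitting directly before the trailer, the JSON keys, and the helper names write_toy_file and read_footer are all assumptions made for the example.

```python
import io
import json
import struct
import zlib

MAGIC = b"CNA1"


def write_toy_file(row_group_blobs, footer_meta):
    """Write header magic, row-group bytes, compressed footer, then trailer."""
    buf = io.BytesIO()
    buf.write(MAGIC)
    offsets = []
    for blob in row_group_blobs:
        offsets.append(buf.tell())  # remember where each row group starts
        buf.write(blob)
    meta = dict(footer_meta, offsets=offsets)  # hypothetical footer keys
    footer = zlib.compress(json.dumps(meta).encode("utf-8"))
    buf.write(footer)
    # Trailer: footer length (assumed little-endian uint32) followed by magic.
    buf.write(struct.pack("<I", len(footer)))
    buf.write(MAGIC)
    return buf.getvalue()


def read_footer(f):
    """Seek from the end, validate the magic, and decode the footer."""
    f.seek(-8, io.SEEK_END)  # 4-byte length + 4-byte magic
    trailer_start = f.tell()
    footer_len, magic = struct.unpack("<I4s", f.read(8))
    if magic != MAGIC:
        raise ValueError("trailer magic mismatch")
    f.seek(trailer_start - footer_len)
    return json.loads(zlib.decompress(f.read(footer_len)))


data = write_toy_file([b"group-0-bytes", b"group-1-bytes"], {"columns": ["order_id"]})
meta = read_footer(io.BytesIO(data))
print(meta["offsets"])  # [4, 17]
```

The reader touches only the last 8 bytes and the footer itself; the row-group bytes are never read until a query asks for a specific offset.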

Per-column encoding

The writer trial-encodes each column with every applicable encoding and keeps the smallest. Here's what it actually chose for row group 0:

column | type | encoding | codec | min | max | why
order_id | int64 | DELTA | DEFLATE | 100000 | 100499 | monotonic / clustered integers stored as small gaps
ts | datetime | BITPACK | DEFLATE | 2026-01-01 08:02:00 | 2026-01-02 08:57:00 | tight integer range packed into minimum bits
region | string | DICTIONARY | | AMER | LATAM | low cardinality: values stored once, indices packed
status | string | DICTIONARY | DEFLATE | paid | refunded | low cardinality: values stored once, indices packed
amount | float64 | PLAIN | DEFLATE | 1.0 | 1059.83 | high-entropy values, nothing smaller won
quantity | int64 | BITPACK | | 1 | 12 | tight integer range packed into minimum bits
note | string | DICTIONARY | | follow-up | vip | low cardinality: values stored once, indices packed
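
The trial-encoding idea can be sketched for integer columns as follows. This is an illustration under stated assumptions, not columna's encoder: the report does not describe the wire formats, so the fixed-width PLAIN layout, the zigzag-varint DELTA layout, the 9-byte BITPACK header, and the function names are all hypothetical. The codec stage (DEFLATE) is left out; only the encoding choice is shown.

```python
import struct


def encode_plain(values):
    # Fixed-width 8-byte little-endian signed integers.
    return struct.pack(f"<{len(values)}q", *values)


def _varint(n):
    # Unsigned LEB128: 7 payload bits per byte, high bit marks continuation.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def _zigzag(n):
    # Map signed to unsigned so small negative gaps stay small.
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def encode_delta(values):
    # Store each value as the gap from its predecessor (first gap is from 0).
    out = bytearray()
    prev = 0
    for v in values:
        out += _varint(_zigzag(v - prev))
        prev = v
    return bytes(out)


def encode_bitpack(values):
    # Subtract the minimum, then pack each offset into the fewest bits needed.
    lo = min(values)
    width = max((max(values) - lo).bit_length(), 1)
    acc = 0
    for i, v in enumerate(values):
        acc |= (v - lo) << (i * width)
    nbytes = (len(values) * width + 7) // 8
    # Assumed header: base value (8 bytes) + bit width (1 byte).
    return struct.pack("<qB", lo, width) + acc.to_bytes(nbytes, "little")


ENCODERS = {"PLAIN": encode_plain, "DELTA": encode_delta, "BITPACK": encode_bitpack}


def choose_encoding(values):
    """Encode with every candidate and return the smallest, plus all sizes."""
    sizes = {name: len(enc(values)) for name, enc in ENCODERS.items()}
    return min(sizes, key=sizes.get), sizes


order_ids = list(range(100000, 100500))            # monotonic, like order_id
quantities = [(i * 7) % 12 + 1 for i in range(500)]  # synthetic values in 1..12

print(choose_encoding(order_ids)[0])   # DELTA
print(choose_encoding(quantities)[0])  # BITPACK
```

With these toy formats, consecutive IDs collapse to one byte per gap under DELTA, while values confined to 1..12 need only 4 bits each under BITPACK, which mirrors the choices in the table for order_id and quantity.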

Predicate pushdown

Query: scan(where = col("order_id") >= 104500). The footer stores min/max for order_id per row group, so any group whose max is below the threshold is skipped entirely: no bytes read, no decoding.

9/10 row groups skipped · 500 rows actually scanned · 500 rows matched · 100% fewer bytes read

The bar shows bytes read versus a full scan. Correctness is verified in the test suite against a brute-force filter: skipping never drops a matching row.
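
Min/max skipping for a >= predicate can be sketched as below, together with the brute-force comparison. The RowGroup class, build_row_groups, and scan_ge are hypothetical names, and the in-memory lists stand in for on-disk column chunks. The data assumes order_id runs contiguously from 100000 across 10 groups of 500 rows, which is consistent with the figures in this report but not stated outright.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    values: list  # order_id values (stands in for the on-disk column chunk)
    min_id: int   # per-group stats, as the footer would hold them
    max_id: int


def build_row_groups(ids, rows_per_group):
    groups = []
    for start in range(0, len(ids), rows_per_group):
        chunk = ids[start:start + rows_per_group]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_ge(groups, threshold):
    """Return matching values, skipping groups whose max is below threshold."""
    skipped = scanned = 0
    matched = []
    for g in groups:
        if g.max_id < threshold:
            skipped += 1  # no value in this group can satisfy >= threshold
            continue
        scanned += len(g.values)
        matched.extend(v for v in g.values if v >= threshold)
    return matched, skipped, scanned


ids = list(range(100000, 105000))  # assumed contiguous IDs, 5,000 rows
groups = build_row_groups(ids, 500)
matched, skipped, scanned = scan_ge(groups, 104500)

brute_force = [v for v in ids if v >= 104500]
print(skipped, scanned, len(matched))  # 9 500 500
print(matched == brute_force)          # True
```

Skipping is safe because a group's max bounds every value in it: if the max fails the predicate, every row does.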

Data preview

order_id | ts | region | status | amount | quantity | note
100000 | 2026-01-01 08:02:00 | EMEA | paid | 1.0 | 9 | None
100001 | 2026-01-01 08:03:00 | EMEA | paid | 148.82 | 9 | None
100002 | 2026-01-01 08:07:00 | APAC | paid | 45.24 | 2 | None
100003 | 2026-01-01 08:11:00 | EMEA | paid | 126.27 | 10 | vip
100004 | 2026-01-01 08:14:00 | EMEA | paid | 150.44 | 10 | urgent
100005 | 2026-01-01 08:17:00 | EMEA | paid | 149.5 | 4 | vip
100006 | 2026-01-01 08:19:00 | APAC | paid | 499.11 | 9 | vip
100007 | 2026-01-01 08:21:00 | AMER | paid | 169.36 | 9 | urgent
100008 | 2026-01-01 08:26:00 | APAC | paid | 91.39 | 6 | urgent
100009 | 2026-01-01 08:27:00 | EMEA | paid | 1.0 | 10 | urgent
100010 | 2026-01-01 08:32:00 | LATAM | paid | 187.62 | 6 | urgent
100011 | 2026-01-01 08:34:00 | LATAM | paid | 92.67 | 6 | urgent