Glossary

Terms, defined plainly.

The concepts that come up across the writing, each in a line or two, with a link to an authoritative source if you want to go deeper.

McNemar's test

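A statistical test for paired nominal data. Used here to check whether a before/after change in pass or fail outcomes on the same test cases is real rather than noise. Only the discordant cases, those that flipped from pass to fail or from fail to pass, carry information about the change.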
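
A minimal sketch of the exact (binomial) form of the test, assuming each case's outcome is stored as a boolean in two parallel lists; `mcnemar_exact` is a hypothetical helper name. The chi-square approximation is the usual alternative when the discordant counts are large.

```python
from math import comb


def mcnemar_exact(before: list[bool], after: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired pass/fail outcomes."""
    if len(before) != len(after):
        raise ValueError("outcomes must be paired case by case")
    # Only discordant pairs carry information about a change.
    b = sum(1 for x, y in zip(before, after) if x and not y)  # pass -> fail
    c = sum(1 for x, y in zip(before, after) if not x and y)  # fail -> pass
    n = b + c
    if n == 0:
        return 1.0  # nothing flipped, so no evidence of a change
    # Under the null hypothesis each flip is equally likely to go either way,
    # so the smaller count follows Binomial(n, 0.5); double the tail for two sides.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


# 10 cases: 6 went from fail to pass, none regressed.
print(mcnemar_exact([False] * 6 + [True] * 4, [True] * 10))  # 0.03125
```

Note that the 4 cases passing both times do not affect the result at all.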

Wikipedia
Wilson score interval

A confidence interval for a proportion that stays sensible with small samples and rates near 0 or 1, where the naive normal-approximation (Wald) interval breaks down, for example collapsing to zero width when every case passes.
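
A minimal sketch of the standard Wilson formula, assuming a pass count out of n independent cases; `wilson_interval` is a hypothetical name and z = 1.96 is the usual choice for roughly 95% coverage.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    # The center is a weighted average of p and 1/2, pulled toward 1/2 for small n.
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # Clamp only to absorb floating-point error at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# 10/10 passes: the naive interval collapses to [1, 1]; Wilson gives about [0.72, 1.0].
print(wilson_interval(10, 10))
```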

Wikipedia
Cohen's kappa

A measure of agreement between two raters that corrects for the agreement you would expect by chance. Used to check how well an AI judge tracks a human grader.
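
A minimal sketch of the two-rater formula, kappa = (p_o - p_e) / (1 - p_e), assuming both raters label the same items with string labels; `cohens_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Agreement between two raters on the same items, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of items where the labels match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labeled independently at their own base rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical label throughout: kappa is undefined
    return (p_o - p_e) / (1 - p_e)


judge = ["pass", "pass", "fail", "fail"]
human = ["pass", "fail", "fail", "fail"]
print(cohens_kappa(judge, human))  # 0.5: 75% raw agreement, but 50% was expected by chance
```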

Wikipedia
pass^k / best-of-N

pass^k is the probability that a stochastic agent passes the same case on all k independent runs, a stricter bar than passing once. best-of-N samples N attempts and keeps the one a selector (such as a verifier or reward model) ranks highest.
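
A minimal sketch estimating both sides from n recorded runs of one case with c passes. `pass_at_k` follows the unbiased pass@k estimator from the Codex paper; `pass_hat_k` is the analogous unbiased estimator for pass^k. Both function names are hypothetical. With an oracle selector that always picks a passing attempt when one exists, best-of-N succeeds exactly as often as pass@N; a real selector can do no better.

```python
from math import comb


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate pass^k (all k runs pass) from c passes observed in n runs of one case."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Probability that k runs drawn without replacement from the n observed are all passes.
    return comb(c, k) / comb(n, k)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k (at least one of k runs passes), as in the Codex paper."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # One minus the probability that all k drawn runs are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# A case that passed 8 of 10 runs:
print(pass_hat_k(10, 8, 2))  # 28/45, about 0.62: passing twice in a row is much harder
print(pass_at_k(10, 8, 2))   # 44/45, about 0.98
```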

Codex paper (arXiv)
Idempotency

A property where running an operation many times has the same effect as running it once. Essential for safely retrying work after a partial failure.
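
A minimal sketch of the common idempotency-key approach, assuming the client attaches a unique key (often a UUID) to each logical operation and reuses it on every retry. `PaymentService` and `charge` are hypothetical. In a real system the key record and the side effect must be stored atomically, for example in one database transaction; this toy keeps both in memory.

```python
class PaymentService:
    """Toy in-memory service; names are hypothetical and nothing is persisted."""

    def __init__(self) -> None:
        self.balance = 0
        # Idempotency key -> result of the first successful call with that key.
        self._results: dict[str, int] = {}

    def charge(self, idempotency_key: str, amount: int) -> int:
        # A retry with a key we have already processed replays the stored result
        # instead of charging a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.balance += amount
        self._results[idempotency_key] = self.balance
        return self.balance


svc = PaymentService()
svc.charge("order-42", 100)
svc.charge("order-42", 100)  # client retried after a timeout
print(svc.balance)  # 100, not 200
```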

Wikipedia
Transactional outbox

A pattern that writes an event to an outbox table in the same database transaction as the state change, then relays it, so the state and the event cannot diverge.

microservices.io
Watermark

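In stream processing, a marker of event-time progress. A watermark with time t asserts that no more events with timestamps at or before t are expected, so any window whose end lies at or before t can be safely finalized; events that still arrive afterward are treated as late data.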
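
A toy, single-threaded sketch (not Flink's API) of a bounded-out-of-orderness watermark driving tumbling event-time windows, assuming integer timestamps; the class and parameter names are hypothetical. Flink's `WatermarkStrategy.forBoundedOutOfOrderness` is built on the same idea of trailing the largest timestamp seen by a fixed bound.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per fixed-size event-time window; a toy, not Flink's API."""

    def __init__(self, window_size: int, max_out_of_orderness: int) -> None:
        self.window_size = window_size
        self.max_delay = max_out_of_orderness
        self.max_seen = None  # largest event timestamp observed so far
        self.open_windows = defaultdict(int)  # window start -> event count

    def watermark(self) -> float:
        # Bounded out-of-orderness: assume no event trails the newest one by more
        # than max_delay, so nothing at or before this timestamp is still expected.
        if self.max_seen is None:
            return float("-inf")
        return self.max_seen - self.max_delay - 1

    def on_event(self, ts: int) -> list[tuple[int, int]]:
        start = ts - ts % self.window_size
        if start + self.window_size - 1 <= self.watermark():
            return []  # late: its window's end is already behind the watermark, so drop it
        self.open_windows[start] += 1
        self.max_seen = ts if self.max_seen is None else max(self.max_seen, ts)
        # Finalize every window whose last timestamp is now at or before the watermark.
        fired = []
        for s in sorted(self.open_windows):
            if s + self.window_size - 1 <= self.watermark():
                fired.append((s, self.open_windows.pop(s)))
        return fired


counter = TumblingWindowCounter(window_size=10, max_out_of_orderness=5)
for ts in [1, 4, 12, 8, 16, 3]:
    print(ts, counter.on_event(ts))
# Window [0, 10) fires with 3 events when ts=16 pushes the watermark to 10;
# the event at ts=3 then arrives too late and is dropped.
```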

Apache Flink docs
Equivalence-class hashing

Canonicalizing a value to a representative form before hashing, so all members of an equivalence class (for example, every valid way of writing the same answer) hash to the same key and therefore earn the same reward.
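
A minimal sketch, assuming answers are comma-separated lists of numbers where order does not matter. The canonicalization rules here (exact rationals, sorting, JSON serialization) are an illustrative choice, and `canonicalize` and `answer_key` are hypothetical names; any real grader would define its own equivalence.

```python
import hashlib
import json
from fractions import Fraction


def canonicalize(answer: str) -> str:
    """Map a comma-separated, order-insensitive list of numbers to one representative."""
    # Exact rationals make "0.5", ".50" and "1/2" the same value; sorting removes order.
    values = sorted(Fraction(item.strip()) for item in answer.split(",") if item.strip())
    # Deterministic serialization: equivalent answers produce identical bytes.
    return json.dumps([str(v) for v in values])


def answer_key(answer: str) -> str:
    return hashlib.sha256(canonicalize(answer).encode("utf-8")).hexdigest()


print(answer_key("3, 1/2, 2") == answer_key("2,0.5,3"))  # True
print(answer_key("1, 2") == answer_key("1, 2, 3"))       # False
```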

Wikipedia
VRPTW

The vehicle routing problem with time windows: route a fleet to serve stops within their allowed time windows at minimum cost. A canonical NP-hard combinatorial optimization problem.
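
The hard part of VRPTW is searching over which vehicle serves which stops and in what order; checking one candidate route is easy. A minimal sketch of that check, assuming a single vehicle leaving a depot and a lookup table of travel times; `Stop`, `route_schedule`, and the example data are hypothetical. A solver would typically evaluate many candidate routes this way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stop:
    name: str
    earliest: float  # time window opens
    latest: float    # time window closes: service must begin by then
    service: float   # time spent at the stop


def route_schedule(
    route: list[Stop],
    travel: dict[tuple[str, str], float],
    depart: float = 0.0,
) -> Optional[list[float]]:
    """Service start time at each stop, or None if any window is missed."""
    t, here = depart, "depot"
    starts = []
    for stop in route:
        t += travel[(here, stop.name)]
        t = max(t, stop.earliest)  # arriving early means waiting for the window to open
        if t > stop.latest:
            return None  # window missed, so this stop order is infeasible
        starts.append(t)
        t += stop.service
        here = stop.name
    return starts


a = Stop("A", earliest=0.0, latest=20.0, service=5.0)
b = Stop("B", earliest=40.0, latest=50.0, service=5.0)
travel = {("depot", "A"): 10, ("depot", "B"): 20, ("A", "B"): 15, ("B", "A"): 15}
print(route_schedule([a, b], travel))  # [10.0, 40.0]: waits at B from 30 until 40
print(route_schedule([b, a], travel))  # None: reaches A at 60, after its window closes
```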

Wikipedia
SSE

Server-sent events: a one-way stream from server to browser over a single long-lived HTTP connection. Simpler than WebSocket when you only push updates downstream.
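
A minimal sketch of the wire format: the server responds with `Content-Type: text/event-stream`, keeps the response open, and writes events like the one below, which the browser reads through the `EventSource` API. `format_sse` is a hypothetical helper, and it omits the optional `retry:` field.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one server-sent event in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")  # client listens with addEventListener(event, ...)
    if event_id is not None:
        lines.append(f"id: {event_id}")  # sent back as Last-Event-ID on reconnect
    # Each payload line gets its own "data:" field; the browser rejoins them with "\n".
    lines.extend(f"data: {line}" for line in data.split("\n"))
    # A blank line ends the event.
    return "\n".join(lines) + "\n\n"


print(repr(format_sse("hello\nworld", event="update", event_id="7")))
# 'event: update\nid: 7\ndata: hello\ndata: world\n\n'
```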

MDN
WebSocket

A protocol for full-duplex communication between browser and server over a single persistent connection, for two-way real-time messaging.
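
A WebSocket connection begins as an ordinary HTTP/1.1 request carrying `Upgrade: websocket`; the server agrees by replying `101 Switching Protocols` and proving it read the client's `Sec-WebSocket-Key`. After that, the same connection carries framed messages in both directions. A minimal sketch of the server's proof step as specified in RFC 6455; the helper name is hypothetical.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Value the server returns in Sec-WebSocket-Accept for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```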

MDN
QUIC / HTTP/3

QUIC is a UDP-based transport with built-in encryption (TLS 1.3) and multiplexed streams: a lost packet stalls only the stream it belongs to, avoiding TCP's head-of-line blocking across streams. HTTP/3 is HTTP running over QUIC.
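
A toy model, not a QUIC implementation, of why per-stream ordering matters. HTTP/2 multiplexes streams over one TCP connection, whose single ordered byte stream makes a lost packet delay everything sent after it; QUIC orders data per stream. `delivery_times` and the packet timeline are hypothetical.

```python
def delivery_times(streams: list[str], arrivals: list[float], per_stream: bool) -> list[float]:
    """When each packet can be handed to the application under in-order delivery.

    streams[i] is the stream of the i-th packet in send order; arrivals[i] is when
    that packet (or its retransmission) reaches the receiver.
    """
    delivered = []
    for i, stream in enumerate(streams):
        # A packet waits for every earlier packet it is ordered behind: all of them
        # on one TCP byte stream, or only those on its own stream with QUIC.
        blockers = [arrivals[j] for j in range(i + 1) if not per_stream or streams[j] == stream]
        delivered.append(max(blockers))
    return delivered


streams = ["A", "B", "A", "B"]
arrivals = [1, 10, 3, 4]  # packet 1 (stream B) was lost and retransmitted at t=10
print(delivery_times(streams, arrivals, per_stream=False))  # TCP-like:  [1, 10, 10, 10]
print(delivery_times(streams, arrivals, per_stream=True))   # QUIC-like: [1, 10, 3, 10]
```

Stream A's second packet is delivered at t=3 instead of t=10, because it no longer waits behind stream B's loss.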

MDN
OCR / CER

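OCR (optical character recognition) extracts text from images. CER (character error rate) measures how wrong the extraction is: the number of character substitutions, deletions, and insertions needed to turn the output into the reference text, divided by the reference length.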
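
A minimal sketch computing CER as character-level Levenshtein distance divided by reference length; the function names are hypothetical, and real pipelines often normalize whitespace or case first. Because insertions count, CER can exceed 1 when the output is much longer than the reference.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum substitutions, deletions, and insertions to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion: ca is missing from b
                cur[j - 1] + 1,            # insertion: cb is extra
                prev[j - 1] + (ca != cb),  # substitution, or free if equal
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of OCR output against the ground-truth text."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


print(cer("sitting", "kitten"))  # 3 edits / 7 reference characters
```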

Wikipedia
ACCOUNT_USAGE

Snowflake's schema of views, in the shared SNOWFLAKE database, exposing account-level metadata: query history, storage, and credit consumption. The basis for most cost-optimization analysis.

Snowflake docs