safe_validator
sqllocks_spindle.inference.safe_validator
SafeProfileValidator — structural, fail-closed static leak scanner (STORY-010).
ADR-006: a static scanner over a serialized artifact (never the live data), usable as a pre-commit / CI gate. Non-zero exit on any hit.
PO rewrite (2026-06-04): STRUCTURAL deny rules, not a name/shape allowlist
The first implementation was security-refuted: it only matched literal field
names (min_value/max_value/…) and only walked a top-level "tables"
key, so five bypasses passed clean. This validator is structural and
fail-closed:
- Walk every dict/list node recursively: it does not depend on a "tables" key or on columns being dicts. Legacy RegistryProfile JSON (top-level columns, no "tables") and list-shaped columns are still fully scanned.
- Deny by shape, not by name:
  - any list of more than k raw strings (catches top_values / samples / any key);
  - any numeric extreme-pair, i.e. a min/max-ish pair under ANY parent (e.g. bounds.min/bounds.max, min_value/max_value, min/max);
  - any value matching a PII regex (SSN / email / phone / IP / IBAN) anywhere.
- Allowlist resolution (PO decision 2026-06-04, Option A): "no name-only allowlisting" forbids relying on names to catch a leak (the deny rules are structural). It does NOT forbid a tight, closed ALLOWLIST of schema-known safe-aggregate containers. The ONLY safe containers that legitimately carry bare min/max are string_length and length_dist (len()-derived aggregates, never raw values; bounds uses lo/hi, not min/max). So a min/max extreme-pair is EXEMPT only when its immediate parent key is string_length or length_dist; under ANY other parent it is FLAGGED. This passes a safe SafeProfile artifact (whose PII-gated columns carry length_dist with min/max length aggregates) while flagging the legacy RegistryProfile raw-value leak (bare min/max directly under a column, not inside a length container).
- Fail-CLOSED on ambiguity: if a table's row_count is absent or unknown, if a node's safety can't be determined, or if the artifact lacks SafeProfile schema markers (legacy or foreign JSON), the artifact is FLAGGED, never skipped. The only "safe" exits are artifacts proven clean.
- The robust unsafe=true stamp check is retained (ADR-005).
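The recursive walk and the three shape-based deny rules can be sketched as follows, assuming the artifact has already been parsed with json.load. This is not the module's code: walk, scan, the threshold K_RAW_STRINGS = 5 (the text only calls it k), the extreme-pair spellings and the PII regexes are all hypothetical and deliberately loose. Each finding carries the JSON path that triggered it, in the spirit of ValidationFinding.

```python
import re
from typing import Any, Iterator, List, Optional, Tuple

# Hypothetical value for the "k" in the deny rule; the real threshold is not stated here.
K_RAW_STRINGS = 5

# The closed allowlist: bare min/max is exempt only directly under these parent keys.
LENGTH_CONTAINERS = frozenset({"string_length", "length_dist"})

# Hypothetical "min/max-ish" key pairs; the real scanner may recognise more spellings.
EXTREME_PAIRS = (("min", "max"), ("min_value", "max_value"))

# Illustrative PII patterns only; production SSN/phone/IBAN detection needs stricter rules.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

Finding = Tuple[str, str]  # (JSON path, rule that fired)


def walk(node: Any, path: str = "$", key: Optional[str] = None) -> Iterator[Tuple[str, Optional[str], Any]]:
    """Yield (path, immediate parent key, node) for every node, whatever its shape."""
    yield path, key, node
    if isinstance(node, dict):
        for k, child in node.items():
            yield from walk(child, f"{path}.{k}", k)
    elif isinstance(node, list):
        for i, child in enumerate(node):
            # List items have no key of their own, so they can never be allowlisted.
            yield from walk(child, f"{path}[{i}]", None)


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _pii_hits(text: str) -> List[str]:
    return [name for name, rx in PII_PATTERNS.items() if rx.search(text)]


def scan(artifact: Any) -> List[Finding]:
    """Apply the three shape-based deny rules to every node of a parsed JSON artifact."""
    findings: List[Finding] = []
    for path, key, node in walk(artifact):
        if isinstance(node, list):
            if sum(isinstance(v, str) for v in node) > K_RAW_STRINGS:
                findings.append((path, "raw-string-list"))
        elif isinstance(node, dict):
            for lo, hi in EXTREME_PAIRS:
                if _is_number(node.get(lo)) and _is_number(node.get(hi)):
                    # Exempt ONLY bare min/max directly under a length container.
                    if (lo, hi) == ("min", "max") and key in LENGTH_CONTAINERS:
                        continue
                    findings.append((path, f"extreme-pair:{lo}/{hi}"))
            for k in node:
                # Keys can leak too, e.g. {"alice@example.com": 3} in a top_values map.
                if isinstance(k, str):
                    findings.extend((f"{path}.{k}", f"pii:{n}") for n in _pii_hits(k))
        elif isinstance(node, str):
            findings.extend((path, f"pii:{n}") for n in _pii_hits(node))
    return findings
```

With this sketch, a legacy RegistryProfile-style {"columns": {"age": {"min": 18, "max": 90}}} yields one extreme-pair finding at $.columns.age, while the same bare min/max nested under length_dist passes. Scanning dict keys as well as values is a design choice of the sketch: a frequency map keyed by raw values leaks just as much as a list of them.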
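The fail-closed rules and the unsafe stamp check can be sketched as a preflight that turns every "can't tell" into a finding. The layout (a "tables" mapping whose entries carry row_count), the "schema": "SafeProfile" marker, and the reading of "robust" (any key casing, any value other than a literal false) are assumptions for illustration; the real SafeProfile schema markers are not documented on this page.

```python
from typing import Any, Iterator, List, Tuple

# Hypothetical schema marker: the real SafeProfile markers are not documented here.
SCHEMA_KEY, SCHEMA_VALUE = "schema", "SafeProfile"


def _entries(node: Any, path: str = "$") -> Iterator[Tuple[str, Any, Any]]:
    """Yield (path, key, value) for every dict entry at any depth, including inside lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            yield f"{path}.{k}", k, v
            yield from _entries(v, f"{path}.{k}")
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from _entries(v, f"{path}[{i}]")


def preflight(artifact: Any) -> List[Tuple[str, str]]:
    """Fail-closed checks: anything that cannot be proven safe becomes a finding."""
    findings: List[Tuple[str, str]] = []
    for path, key, value in _entries(artifact):
        # Assumed meaning of "robust": any casing of the key, any value except literal False.
        if isinstance(key, str) and key.strip().lower() == "unsafe" and value is not False:
            findings.append((path, "unsafe-stamp"))
    if not isinstance(artifact, dict) or artifact.get(SCHEMA_KEY) != SCHEMA_VALUE:
        # Legacy or foreign JSON: flagged, never skipped.
        findings.append(("$", "missing-schema-marker"))
        return findings
    tables = artifact.get("tables")
    if not isinstance(tables, dict):
        # Missing or oddly shaped tables cannot be checked, so they fail closed.
        findings.append(("$.tables", "unreadable-tables"))
        return findings
    for name, table in tables.items():
        row_count = table.get("row_count") if isinstance(table, dict) else None
        # Absent, null, boolean, non-integer or negative row_count counts as "unknown".
        if not isinstance(row_count, int) or isinstance(row_count, bool) or row_count < 0:
            findings.append((f"$.tables.{name}", "unknown-row-count"))
    return findings
```

The only path to an empty result is an artifact that positively carries the marker, a readable tables mapping, and a valid row_count for every table; everything else is reported rather than skipped.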
Classes
ValidationFinding
dataclass
A single leak finding, with the JSON path that triggered it.
ValidationResult
dataclass
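This page does not list the fields of either dataclass. Purely to illustrate how a result can turn findings into the exit code used in the usage example below, here is a sketch with hypothetical stand-ins Finding and Result; their field names and the choice of 1 as the failure code are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Finding:
    """Hypothetical stand-in for ValidationFinding: one hit plus its JSON path."""
    path: str  # e.g. "$.columns.age"
    rule: str  # hypothetical rule label, e.g. "extreme-pair:min/max"


@dataclass
class Result:
    """Hypothetical stand-in for ValidationResult."""
    findings: List[Finding] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.findings

    @property
    def exit_code(self) -> int:
        # Non-zero on any hit (ADR-006); the specific value 1 is an assumption.
        return 0 if self.ok else 1
```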
SafeProfileValidator
Structural, fail-closed static leak scanner over a serialized artifact.
Usage::

    import sys
    from sqllocks_spindle.inference.safe_validator import SafeProfileValidator
    result = SafeProfileValidator().validate_file("profile.json")
    sys.exit(result.exit_code)
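Because the validator is meant to act as a pre-commit / CI gate, a small wrapper that checks several artifacts in one run might look like this. Only SafeProfileValidator, validate_file and exit_code come from this page; gate is a hypothetical helper, and reporting failures on stderr is an illustrative choice.

```python
import sys
from typing import Any, Callable, Iterable


def gate(paths: Iterable[str], validate: Callable[[str], Any]) -> int:
    """Validate every path; return 0 only if all are clean, else the first non-zero exit code."""
    worst = 0
    for path in paths:
        code = validate(path).exit_code
        if code != 0:
            print(f"leak scan failed: {path} (exit {code})", file=sys.stderr)
            worst = worst or code  # keep scanning so every failing file is reported
    return worst


if __name__ == "__main__":
    from sqllocks_spindle.inference.safe_validator import SafeProfileValidator

    # e.g. invoked by a pre-commit hook with the staged artifact paths as arguments
    sys.exit(gate(sys.argv[1:], SafeProfileValidator().validate_file))
```

Passing validate as a callable keeps the wrapper testable with a stub and leaves all leak detection to the validator itself.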