safe_validator

sqllocks_spindle.inference.safe_validator

SafeProfileValidator — structural, fail-closed static leak scanner (STORY-010).

ADR-006: a static scanner over a serialized artifact (never the live data), usable as a pre-commit / CI gate. Non-zero exit on any hit.

PO rewrite (2026-06-04) — STRUCTURAL deny rules, not a name/shape allowlist

The first implementation was security-refuted: it only matched literal field names (min_value/max_value/…) and only walked a top-level "tables" key, so five bypasses passed clean. This validator is structural and fail-closed:

  • Walk every dict/list node recursively — it does not depend on a "tables" key or on columns being dicts. Legacy RegistryProfile JSON (top-level columns, no "tables") and list-shaped columns are still fully scanned.
  • Deny by shape, not by name:

      • any list of more than k raw strings (catches top_values / samples / any key);
      • any numeric extreme-pair, i.e. a min/max-ish pair under ANY parent (e.g. bounds.min/bounds.max, min_value/max_value, min/max);
      • any value matching a PII regex (SSN / email / phone / IP / IBAN) anywhere.

  • Allowlist resolution (PO decision 2026-06-04, Option A): "no name-only allowlisting" forbids relying on names to catch a leak (deny rules are structural). It does NOT forbid a tight, closed ALLOWLIST of schema-known safe-aggregate containers. The ONLY safe containers that legitimately carry bare min/max are string_length and length_dist: both hold len()-derived aggregates, never raw values (bounds uses lo/hi, not min/max). A min/max extreme-pair is therefore EXEMPT only when its immediate parent key is string_length or length_dist; under ANY other parent it is FLAGGED. A safe SafeProfile artifact, whose PII-gated columns carry length_dist with min/max length aggregates, passes, while the legacy RegistryProfile raw-value leak (bare min/max directly under a column, not inside a length container) is flagged.

  • Fail-CLOSED on ambiguity: if a table's row_count is absent / unknown, a node's safety can't be determined, or the artifact lacks SafeProfile schema markers (legacy / foreign JSON), the artifact is FLAGGED — never skipped. The only "safe" exits are artifacts proven clean.

  • The robust unsafe=true stamp check is retained (ADR-005).
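The structural rules above can be sketched as a small recursive walker. This is a minimal illustration, not the module's actual implementation: the function name `scan`, the threshold `K_MAX_RAW_STRINGS`, the min/max key sets, and the two regexes are all assumptions chosen for the example. Only the exempt parent keys (string_length, length_dist) come from the description above.

```python
import re
from typing import Any, Iterator, Tuple

# Hypothetical constants; the real validator's threshold and patterns may differ.
K_MAX_RAW_STRINGS = 5
MIN_KEYS = {"min", "min_value"}
MAX_KEYS = {"max", "max_value"}
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}
# From the text: the only containers allowed to carry bare min/max.
SAFE_LENGTH_PARENTS = {"string_length", "length_dist"}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def scan(node: Any, path: str = "$", parent_key: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, rule) for every structural hit at or below node."""
    if isinstance(node, dict):
        has_min = any(k in MIN_KEYS and _is_number(v) for k, v in node.items())
        has_max = any(k in MAX_KEYS and _is_number(v) for k, v in node.items())
        # A numeric extreme-pair is exempt only directly under a length container.
        if has_min and has_max and parent_key not in SAFE_LENGTH_PARENTS:
            yield path, "extreme_pair"
        for key, value in node.items():
            yield from scan(value, f"{path}.{key}", str(key))
    elif isinstance(node, list):
        # Deny by shape: too many raw strings in one list, whatever the key.
        if sum(isinstance(item, str) for item in node) > K_MAX_RAW_STRINGS:
            yield path, "raw_string_list"
        for index, item in enumerate(node):
            yield from scan(item, f"{path}[{index}]", parent_key)
    elif isinstance(node, str):
        for name, pattern in PII_PATTERNS.items():
            if pattern.search(node):
                yield path, f"pii:{name}"
```

Because the walker recurses into every dict and list, it does not care whether the artifact has a top-level "tables" key. The sketch omits the fail-closed checks (missing row_count, missing schema markers) and the unsafe=true stamp check.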

Classes

ValidationFinding dataclass

A single leak finding, with the JSON path that triggered it.

ValidationResult dataclass

Outcome of a scan. is_clean is true only when there are zero findings.

Attributes
exit_code property

0 only on a proven-clean artifact; 1 on any finding.
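The relationship between findings, is_clean, and exit_code can be illustrated with a pair of small dataclasses. The class names `SketchFinding` and `SketchResult` and the fields `path`, `rule`, and `findings` are hypothetical stand-ins; the real ValidationFinding and ValidationResult may carry different fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SketchFinding:
    path: str  # JSON path that triggered the rule
    rule: str  # hypothetical label for the rule that fired


@dataclass
class SketchResult:
    findings: List[SketchFinding] = field(default_factory=list)

    @property
    def is_clean(self) -> bool:
        # Clean means zero findings, nothing else.
        return not self.findings

    @property
    def exit_code(self) -> int:
        # 0 only when clean; 1 on any finding, so a CI step fails.
        return 0 if self.is_clean else 1
```

Deriving both properties from the findings list keeps the exit code from drifting out of sync with the scan outcome.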

SafeProfileValidator

Structural, fail-closed static leak scanner over a serialized artifact.

Usage::

    import sys
    from sqllocks_spindle.inference.safe_validator import SafeProfileValidator

    result = SafeProfileValidator().validate_file("profile.json")
    sys.exit(result.exit_code)

Methods:
validate_file(path)

Load and scan a JSON artifact file. Fail-closed on any read error.
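One way to read "fail-closed on any read error" is that a missing, unreadable, or malformed file produces a finding rather than an exception or a silent pass. The helper below is a hypothetical sketch of that idea; `load_fail_closed` and its return shape are assumptions, not part of the module's API.

```python
import json
from pathlib import Path
from typing import Any, List, Tuple


def load_fail_closed(path: str) -> Tuple[Any, List[str]]:
    """Return (data, findings); any read or parse error becomes a finding."""
    try:
        text = Path(path).read_text(encoding="utf-8")
        return json.loads(text), []
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
        return None, [f"{path}: unreadable artifact ({type(exc).__name__})"]
```

A caller would treat a non-empty findings list exactly like a leak hit, so the gate exits non-zero when the artifact cannot be inspected.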

validate_data(data, path='<data>')

Scan an already-parsed artifact. Fail-closed on missing markers.