gates

sqllocks_spindle.validation.gates

Validation gate framework for Spindle data generation.

Classes

ValidationContext dataclass

Context passed to each validation gate.

GateResult dataclass

Result from a single validation gate check.

ValidationGate

Bases: ABC

Abstract base class for all validation gates.

Methods:
check(context) abstractmethod

Run this gate's validation checks against the given context.
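The abstract `check(context)` contract can be illustrated with a small custom gate. This page does not list the fields of `ValidationContext` or `GateResult`, so the sketch below uses hypothetical stand-in classes (`SketchContext`, `SketchResult`, `SketchGate`) rather than the real ones; only the shape of the pattern carries over.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SketchContext:
    # Hypothetical stand-in for ValidationContext: table name -> list of row dicts.
    tables: dict = field(default_factory=dict)
    config: dict = field(default_factory=dict)


@dataclass
class SketchResult:
    # Hypothetical stand-in for GateResult; the real field names may differ.
    gate_name: str
    passed: bool
    errors: list = field(default_factory=list)


class SketchGate(ABC):
    # Mirrors the documented shape: a single abstract check(context) method.
    @abstractmethod
    def check(self, context):
        ...


class NonEmptyTablesGate(SketchGate):
    """Illustrative custom gate: every table must contain at least one row."""

    def check(self, context):
        errors = [
            f"table '{name}' is empty"
            for name, rows in context.tables.items()
            if not rows
        ]
        return SketchResult("non_empty_tables", passed=not errors, errors=errors)


result = NonEmptyTablesGate().check(
    SketchContext(tables={"customers": [{"id": 1}], "orders": []})
)
```

A subclass that omits `check` cannot be instantiated, which is the main guarantee the ABC base provides.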

ReferentialIntegrityGate

Bases: ValidationGate

Check that all FK relationships hold across tables.

Every FK value in a child column must exist in the referenced parent PK column. Reports orphan counts per relationship.
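As a rough illustration of the orphan check described above, the following stdlib-only sketch counts child rows whose FK value has no matching parent PK. The function name `count_orphans` and the row-dict layout are hypothetical, and skipping null FK values is an assumption; the gate itself may treat nulls differently.

```python
def count_orphans(child_rows, fk_column, parent_rows, pk_column):
    """Count child rows whose FK value is absent from the parent PK column."""
    parent_keys = {row[pk_column] for row in parent_rows}
    return sum(
        1
        for row in child_rows
        # Assumption: a null FK is not an orphan; nullability is a separate check.
        if row[fk_column] is not None and row[fk_column] not in parent_keys
    )


customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},     # no such customer: orphan
    {"order_id": 12, "customer_id": None},  # null FK: skipped in this sketch
]

orphans = count_orphans(orders, "customer_id", customers, "customer_id")
```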

SchemaConformanceGate

Bases: ValidationGate

Check that DataFrames match the expected schema.

Validates that expected columns are present, data types are compatible, and no unexpected columns exist. Uses the SpindleSchema from context or an expected_schema dict from config.

NullConstraintGate

Bases: ValidationGate

Check that non-nullable columns have no null values.

UniqueConstraintGate

Bases: ValidationGate

Check that primary key columns have no duplicate values.
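A duplicate check of this kind can be sketched with `collections.Counter`. The helper name `duplicate_keys` and the list-of-dicts row layout are hypothetical and chosen only to keep the example self-contained.

```python
from collections import Counter


def duplicate_keys(rows, pk_column):
    """Return each PK value that occurs more than once, with its count."""
    counts = Counter(row[pk_column] for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


rows = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 3}, {"id": 3}, {"id": 3}]
duplicates = duplicate_keys(rows, "id")
```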

RangeConstraintGate

Bases: ValidationGate

Check that numeric columns are within expected ranges.

Configure via context.config with a dict of:

    {
        "ranges": {
            "table_name.column_name": {"min": 0, "max": 100},
            ...
        }
    }
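To show how a `"ranges"` config of this shape could be applied, here is a minimal sketch that splits each `"table.column"` key and counts out-of-range values. The helper `range_violations` is hypothetical, and treating both bounds as inclusive and skipping nulls are assumptions not confirmed by this page.

```python
def range_violations(tables, ranges):
    """Count values outside [min, max] for each "table.column" key."""
    violations = {}
    for qualified_name, bounds in ranges.items():
        table_name, column_name = qualified_name.split(".", 1)
        low, high = bounds.get("min"), bounds.get("max")
        count = 0
        for row in tables.get(table_name, []):
            value = row.get(column_name)
            if value is None:
                continue  # assumption: nulls are left to a null-constraint check
            if (low is not None and value < low) or (high is not None and value > high):
                count += 1
        violations[qualified_name] = count
    return violations


config = {"ranges": {"orders.quantity": {"min": 0, "max": 100}}}
tables = {
    "orders": [
        {"quantity": 5},
        {"quantity": -1},    # below min
        {"quantity": 250},   # above max
        {"quantity": None},  # skipped
    ]
}
violations = range_violations(tables, config["ranges"])
```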

TemporalConsistencyGate

Bases: ValidationGate

Check temporal consistency of date/datetime columns.

Validates:
- Dates are within expected range (configurable)
- No unexpected future dates
- Temporal ordering (e.g., end_date >= start_date)

Configure via context.config:

    {
        "date_range": {"start": "2020-01-01", "end": "2025-12-31"},
        "no_future": ["table.column", ...],
        "ordering": [
            {"table": "orders", "start": "order_date", "end": "ship_date"},
            ...
        ]
    }
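The `"ordering"` and `"date_range"` entries can be read as simple per-row comparisons. The sketch below uses only the standard library; the helper names `ordering_violations` and `out_of_range` are hypothetical, and skipping rows with a null start or end date is an assumption.

```python
from datetime import date


def ordering_violations(tables, ordering_rules):
    """Count rows where the end date precedes the start date, per rule."""
    counts = {}
    for rule in ordering_rules:
        bad = 0
        for row in tables.get(rule["table"], []):
            start, end = row.get(rule["start"]), row.get(rule["end"])
            # Assumption: rows missing either date are not ordering violations.
            if start is not None and end is not None and end < start:
                bad += 1
        counts[(rule["table"], rule["start"], rule["end"])] = bad
    return counts


def out_of_range(values, date_range):
    """Return the dates that fall outside the inclusive configured range."""
    start = date.fromisoformat(date_range["start"])
    end = date.fromisoformat(date_range["end"])
    return [v for v in values if v is not None and not (start <= v <= end)]


config = {
    "date_range": {"start": "2020-01-01", "end": "2025-12-31"},
    "ordering": [{"table": "orders", "start": "order_date", "end": "ship_date"}],
}
tables = {
    "orders": [
        {"order_date": date(2024, 1, 5), "ship_date": date(2024, 1, 7)},
        {"order_date": date(2024, 2, 10), "ship_date": date(2024, 2, 1)},  # shipped before ordered
        {"order_date": date(2019, 12, 31), "ship_date": None},             # before date_range start
    ]
}

ordering_counts = ordering_violations(tables, config["ordering"])
stale_dates = out_of_range(
    [row["order_date"] for row in tables["orders"]], config["date_range"]
)
```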

FileFormatGate

Bases: ValidationGate

Validate output files are readable, correct format, and not truncated.

Checks parquet, CSV, and JSONL files. Takes file paths from context.file_paths.
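Two inexpensive readability checks are sketched below using only the standard library: parsing each JSONL line, and confirming that a Parquet file starts and ends with the 4-byte `PAR1` magic marker defined by the Parquet format. The helper names `check_jsonl` and `has_parquet_magic` are hypothetical, and the gate's actual checks may be more thorough (for example, fully reading the file).

```python
import json
import os
import tempfile


def check_jsonl(path):
    """Return one problem string per line that is not valid JSON."""
    problems = []
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are ignored in this sketch
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_number}: {exc.msg}")
    return problems


def has_parquet_magic(path):
    """Check for the PAR1 marker at both ends; a missing tail suggests truncation."""
    if os.path.getsize(path) < 8:  # too small to hold both markers
        return False
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(-4, os.SEEK_END)
        tail = handle.read(4)
    return head == b"PAR1" and tail == b"PAR1"


with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = os.path.join(tmp, "rows.jsonl")
    with open(jsonl_path, "w", encoding="utf-8") as handle:
        handle.write('{"id": 1, "name": "a"}\n{"id": 2, "na')  # second line cut off

    good_path = os.path.join(tmp, "good.parquet")
    with open(good_path, "wb") as handle:
        handle.write(b"PAR1" + b"\x00" * 8 + b"PAR1")  # markers only, not a real file

    cut_path = os.path.join(tmp, "cut.parquet")
    with open(cut_path, "wb") as handle:
        handle.write(b"PAR1" + b"\x00" * 8)  # trailing marker missing

    jsonl_problems = check_jsonl(jsonl_path)
    good_magic = has_parquet_magic(good_path)
    truncated_magic = has_parquet_magic(cut_path)
```

The magic-byte test only catches gross truncation; it does not prove the file's contents are decodable.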

SchemaDriftGate

Bases: ValidationGate

Detect schema drift between current data and a baseline schema.

Detects:
- Additive changes (new columns, new tables)
- Breaking changes (removed columns, renamed columns, retyped columns)

Configure via context.config:

    {
        "baseline": {
            "table_name": {
                "columns": {"col1": "int64", "col2": "object", ...}
            },
            ...
        }
    }
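A baseline of this shape can be compared against the current schema with plain dict operations. The sketch below is not the gate's implementation; `diff_schema` is a hypothetical helper, treating a removed table as breaking is an assumption, and a renamed column simply shows up here as one removal plus one addition.

```python
def diff_schema(baseline, current):
    """Classify differences between two {table: {"columns": {name: dtype}}} dicts."""
    additive, breaking = [], []

    # Additive: anything present now that the baseline did not have.
    for table, spec in current.items():
        if table not in baseline:
            additive.append(f"new table: {table}")
            continue
        for column in spec["columns"]:
            if column not in baseline[table]["columns"]:
                additive.append(f"new column: {table}.{column}")

    # Breaking: anything the baseline had that is now gone or retyped.
    for table, spec in baseline.items():
        if table not in current:
            breaking.append(f"removed table: {table}")  # assumption: counts as breaking
            continue
        current_columns = current[table]["columns"]
        for column, dtype in spec["columns"].items():
            if column not in current_columns:
                breaking.append(f"removed column: {table}.{column}")
            elif current_columns[column] != dtype:
                breaking.append(
                    f"retyped column: {table}.{column} ({dtype} -> {current_columns[column]})"
                )

    return {"additive": additive, "breaking": breaking}


baseline = {"orders": {"columns": {"id": "int64", "status": "object", "total": "float64"}}}
current = {
    "orders": {"columns": {"id": "int64", "total": "object", "note": "object"}},
    "returns": {"columns": {"id": "int64"}},
}
drift = diff_schema(baseline, current)
```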

DistributionGate

Bases: ValidationGate

Check that numeric and enum columns match schema-declared distributions.

For columns with strategy="distribution": runs a Kolmogorov-Smirnov (KS) test comparing the actual data to the fitted scipy distribution from the schema. A KS p-value below alpha produces a warning rather than an error: distribution drift is expected at scale, and hard failures are reserved for broken data.

For columns with strategy="enum": runs a chi-squared test comparing observed category frequencies to expected probabilities. Missing expected values produce a warning.

Requires scipy. Skips gracefully (with a warning) when scipy is not installed. Configure significance threshold via context.config["distribution_alpha"] (default 0.05).
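The enum branch can be illustrated by computing the chi-squared statistic by hand and falling back gracefully when scipy is absent. This is a sketch under stated assumptions: the helpers `chi_squared_statistic` and `p_value` are hypothetical, and the only scipy call used is `scipy.stats.chi2.sf`.

```python
from collections import Counter

try:
    from scipy.stats import chi2  # optional dependency
except ImportError:
    chi2 = None


def chi_squared_statistic(values, expected_probs):
    """Return (statistic, missing) for observed values vs expected probabilities."""
    observed = Counter(values)
    n = len(values)
    statistic = 0.0
    for category, probability in expected_probs.items():
        expected = n * probability
        statistic += (observed.get(category, 0) - expected) ** 2 / expected
    # Expected categories that never appear in the data.
    missing = [c for c in expected_probs if observed.get(c, 0) == 0]
    return statistic, missing


def p_value(statistic, n_categories):
    """Upper-tail p-value with n_categories - 1 degrees of freedom, or None without scipy."""
    if chi2 is None:
        return None  # mirrors the "skip gracefully" behaviour described above
    return float(chi2.sf(statistic, df=n_categories - 1))


expected = {"a": 0.5, "b": 0.3, "c": 0.2}
alpha = 0.05  # same default as the documented distribution_alpha

matching = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
stat_ok, missing_ok = chi_squared_statistic(matching, expected)

skewed = ["a"] * 60 + ["b"] * 40  # category "c" never appears
stat_bad, missing_bad = chi_squared_statistic(skewed, expected)
p_bad = p_value(stat_bad, len(expected))
warn = p_bad is not None and p_bad < alpha
```

In the skewed sample the statistic is dominated by the missing category, which is why a missing expected value is worth reporting separately from the p-value.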

GateRunner

Run validation gates against a context and collect results.

Methods:
available_gates() staticmethod

Return names of all registered gates.

register_gate(name, gate_cls) staticmethod

Register a custom gate in the global registry.

run_all(context)

Run all configured gates and return results.

run_gate(gate_name, context)

Run a single gate by name.

summary(results) staticmethod

Produce an aggregate summary of gate results.
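The method list above suggests a name-to-class registry. The sketch below shows that pattern with hypothetical stand-ins: `SketchRunner`, the two example gates, the dict-shaped results, and the summary keys are all assumptions, and the real `GateRunner` may be constructed and configured differently.

```python
class SketchRunner:
    """Hypothetical stand-in showing a name -> gate class registry."""

    _registry = {}  # shared across instances, playing the role of a global registry

    @staticmethod
    def register_gate(name, gate_cls):
        SketchRunner._registry[name] = gate_cls

    @staticmethod
    def available_gates():
        return sorted(SketchRunner._registry)

    def run_gate(self, gate_name, context):
        if gate_name not in SketchRunner._registry:
            raise KeyError(f"unknown gate: {gate_name}")
        return SketchRunner._registry[gate_name]().check(context)

    def run_all(self, context):
        # Assumption: "configured" means every registered gate in this sketch.
        return [self.run_gate(name, context) for name in self.available_gates()]

    @staticmethod
    def summary(results):
        failed = [r["gate"] for r in results if not r["passed"]]
        return {
            "total": len(results),
            "passed": len(results) - len(failed),
            "failed": failed,
        }


class AlwaysPassGate:
    def check(self, context):
        return {"gate": "always_pass", "passed": True}


class HasRowsGate:
    def check(self, context):
        # Fails when any table in the context has no rows.
        empty = [name for name, rows in context.items() if not rows]
        return {"gate": "has_rows", "passed": not empty}


SketchRunner.register_gate("always_pass", AlwaysPassGate)
SketchRunner.register_gate("has_rows", HasRowsGate)

runner_summary = SketchRunner.summary(SketchRunner().run_all({"orders": []}))
```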