gates
sqllocks_spindle.validation.gates
Validation gate framework for Spindle data generation.
Classes
ValidationContext
dataclass
Context passed to each validation gate.
GateResult
dataclass
Result from a single validation gate check.
ValidationGate
Base class for the gate classes below.
ReferentialIntegrityGate
Bases: ValidationGate
Check that all FK relationships hold across tables.
Every FK value in a child column must exist in the referenced parent PK column. Reports orphan counts per relationship.
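A minimal sketch of the per-relationship orphan count, written against plain Python lists rather than DataFrames. The `{table: {column: [values]}}` layout, the relationship tuple format, and both function names are hypothetical, and treating `None` as a permitted (non-orphan) FK value is an assumption.

```python
from __future__ import annotations

from collections.abc import Iterable


def count_orphans(child_values: Iterable, parent_keys: Iterable) -> int:
    """Count child FK values with no matching parent PK value (None is skipped)."""
    parents = set(parent_keys)
    return sum(1 for v in child_values if v is not None and v not in parents)


def check_relationships(tables: dict, relationships: list[tuple[str, str, str, str]]) -> dict[str, int]:
    """Orphan count per (child_table, child_col, parent_table, parent_col) relationship."""
    report = {}
    for child_t, child_c, parent_t, parent_c in relationships:
        orphans = count_orphans(tables[child_t][child_c], tables[parent_t][parent_c])
        report[f"{child_t}.{child_c} -> {parent_t}.{parent_c}"] = orphans
    return report


# Hypothetical data: customer_id 4 has no parent row, None is treated as allowed.
tables = {
    "customers": {"customer_id": [1, 2, 3]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 2, 4, None]},
}
report = check_relationships(tables, [("orders", "customer_id", "customers", "customer_id")])
print(report)  # {'orders.customer_id -> customers.customer_id': 1}
```

Building a set of parent keys once keeps each relationship check linear in the number of child rows.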
SchemaConformanceGate
Bases: ValidationGate
Check that DataFrames match the expected schema.
Validates that expected column names are present, data types are compatible, and no unexpected columns exist. The expected schema comes from the SpindleSchema in the context or from an expected_schema dict in the config.
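The three checks can be sketched as a comparison of `{column: dtype}` mappings. This is an illustration rather than the gate's implementation: `check_schema` is a hypothetical name, and dtype compatibility is reduced to exact string equality. With pandas, a mapping in this form can be built from a DataFrame with `df.dtypes.astype(str).to_dict()`.

```python
from __future__ import annotations


def check_schema(actual: dict[str, str], expected: dict[str, str]) -> list[str]:
    """Compare actual {column: dtype} to expected {column: dtype}.

    Returns human-readable problems; an empty list means the table conforms.
    "Compatible" is simplified to exact dtype-string equality (assumption).
    """
    problems = []
    for col, dtype in expected.items():
        if col not in actual:
            problems.append(f"missing column: {col}")
        elif actual[col] != dtype:
            problems.append(f"dtype mismatch on {col}: expected {dtype}, got {actual[col]}")
    # sorted() keeps the report order stable across runs.
    for col in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected column: {col}")
    return problems


print(check_schema({"id": "int64", "name": "object", "extra": "float64"},
                   {"id": "int64", "name": "string", "email": "object"}))
```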
NullConstraintGate
UniqueConstraintGate
RangeConstraintGate
Bases: ValidationGate
Check that numeric columns are within expected ranges.
Configure via context.config with a dict of: { "ranges": { "table_name.column_name": {"min": 0, "max": 100}, ... } }
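A sketch of how a config in this shape could be applied, using the documented `"table_name.column_name"` keys and optional `min`/`max` bounds. The table layout and `check_ranges` are hypothetical, and skipping `None` values is an assumption.

```python
from __future__ import annotations


def check_ranges(tables: dict, config: dict) -> list[str]:
    """Count values outside config["ranges"] bounds.

    tables uses a hypothetical {table: {column: [values]}} layout; either
    bound may be omitted, in which case that side is not checked.
    """
    violations = []
    for key, bounds in config.get("ranges", {}).items():
        table, column = key.split(".", 1)
        lo, hi = bounds.get("min"), bounds.get("max")
        values = [v for v in tables[table][column] if v is not None]
        below = sum(1 for v in values if lo is not None and v < lo)
        above = sum(1 for v in values if hi is not None and v > hi)
        if below or above:
            violations.append(f"{key}: {below} below min, {above} above max")
    return violations


tables = {"products": {"discount_pct": [0, 15, 100, 120, -5, None]}}
config = {"ranges": {"products.discount_pct": {"min": 0, "max": 100}}}
print(check_ranges(tables, config))  # ['products.discount_pct: 1 below min, 1 above max']
```

Both bounds are inclusive here, so 0 and 100 pass.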
TemporalConsistencyGate
Bases: ValidationGate
Check temporal consistency of date/datetime columns.
Validates:
- Dates are within the expected range (configurable)
- No unexpected future dates
- Temporal ordering (e.g., end_date >= start_date)
Configure via context.config: { "date_range": {"start": "2020-01-01", "end": "2025-12-31"}, "no_future": ["table.column", ...], "ordering": [ {"table": "orders", "start": "order_date", "end": "ship_date"}, ... ] }
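The three temporal checks, sketched over the config shape above. The `{table: {column: [values]}}` layout and `check_temporal` are hypothetical; values are assumed to be `datetime.date` objects, `today` is injectable to keep the example deterministic, and applying `date_range` to every date value in every table is an assumption about how the range is scoped.

```python
from __future__ import annotations

from datetime import date


def check_temporal(tables: dict, config: dict, today: date | None = None) -> list[str]:
    """Apply date_range, no_future and ordering rules from a config dict."""
    today = today or date.today()
    issues = []

    # 1. Every date value must fall inside config["date_range"], if one is given.
    rng = config.get("date_range")
    if rng:
        start, end = date.fromisoformat(rng["start"]), date.fromisoformat(rng["end"])
        for table, columns in tables.items():
            for column, values in columns.items():
                bad = sum(1 for v in values if isinstance(v, date) and not start <= v <= end)
                if bad:
                    issues.append(f"{table}.{column}: {bad} outside {start}..{end}")

    # 2. Listed "table.column" entries must not contain dates after today.
    for key in config.get("no_future", []):
        table, column = key.split(".", 1)
        future = sum(1 for v in tables[table][column] if v is not None and v > today)
        if future:
            issues.append(f"{key}: {future} future dates")

    # 3. Row by row, the end column must not precede the start column.
    for rule in config.get("ordering", []):
        cols = tables[rule["table"]]
        pairs = zip(cols[rule["start"]], cols[rule["end"]])
        inverted = sum(1 for s, e in pairs if s is not None and e is not None and e < s)
        if inverted:
            issues.append(f"{rule['table']}: {inverted} rows with {rule['end']} < {rule['start']}")
    return issues
```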
FileFormatGate
Bases: ValidationGate
Validate output files are readable, correct format, and not truncated.
Checks Parquet, CSV, and JSONL files, taking file paths from context.file_paths.
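A stdlib-only sketch of per-format checks. It is not the gate's code: instead of opening Parquet with a reader library, it relies on the fact that Parquet files start and end with the 4-byte magic `PAR1`, so a file cut off before its footer fails the trailing check. The CSV rule (every row matches the header width) and the JSONL rule (every non-blank line parses) are assumptions about what "readable and not truncated" means; `check_file` is a hypothetical name.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these four bytes


def check_file(path: str | Path) -> str | None:
    """Return a problem description for an unreadable or truncated file, else None."""
    path = Path(path)
    suffix = path.suffix.lower()
    try:
        if suffix == ".parquet":
            data = path.read_bytes()  # whole file read for simplicity
            # A file cut off before its footer loses the trailing magic bytes.
            if len(data) < 8 or data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
                return f"{path.name}: missing Parquet magic bytes (truncated or not Parquet)"
        elif suffix == ".csv":
            with path.open(newline="", encoding="utf-8") as fh:
                rows = list(csv.reader(fh))
            if not rows:
                return f"{path.name}: empty CSV"
            width = len(rows[0])
            # Rows are numbered from 1, counting the header as row 1.
            ragged = [i for i, row in enumerate(rows[1:], start=2) if len(row) != width]
            if ragged:
                return f"{path.name}: rows {ragged} do not match header width {width}"
        elif suffix == ".jsonl":
            with path.open(encoding="utf-8") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if not line.strip():
                        continue
                    try:
                        json.loads(line)
                    except json.JSONDecodeError:
                        # A half-written final record is a typical truncation symptom.
                        return f"{path.name}: invalid JSON on line {lineno}"
        else:
            return f"{path.name}: unsupported extension {suffix!r}"
    except (OSError, UnicodeDecodeError, csv.Error) as exc:
        return f"{path.name}: {exc}"
    return None
```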
SchemaDriftGate
Bases: ValidationGate
Detect schema drift between current data and a baseline schema.
Detects:
- Additive changes (new columns, new tables)
- Breaking changes (removed columns, renamed columns, retyped columns)
Configure via context.config: { "baseline": { "table_name": { "columns": {"col1": "int64", "col2": "object", ...} }, ... } }
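A sketch of the drift comparison, taking both sides in the `baseline` shape shown above. The function name is hypothetical, and so is the rename heuristic: a removed column paired with an added column of the same dtype in the same table is reported only as a possible rename, since column names alone cannot distinguish a rename from a drop plus an add. Classifying a dropped table as breaking is also an assumption.

```python
from __future__ import annotations


def diff_schema(baseline: dict, current: dict) -> dict[str, list[str]]:
    """Classify changes between two {table: {"columns": {col: dtype}}} mappings."""
    additive: list[str] = []
    breaking: list[str] = []
    additive += [f"new table: {t}" for t in sorted(current.keys() - baseline.keys())]
    # Treating a dropped table as breaking is an assumption of this sketch.
    breaking += [f"removed table: {t}" for t in sorted(baseline.keys() - current.keys())]

    for table in sorted(baseline.keys() & current.keys()):
        old, new = baseline[table]["columns"], current[table]["columns"]
        added = sorted(new.keys() - old.keys())
        for col in sorted(old.keys() & new.keys()):
            if old[col] != new[col]:
                breaking.append(f"retyped: {table}.{col} {old[col]} -> {new[col]}")
        for col in sorted(old.keys() - new.keys()):
            # Heuristic (assumed): an added column with the same dtype may be a rename.
            match = next((a for a in added if new[a] == old[col]), None)
            if match is None:
                breaking.append(f"removed column: {table}.{col}")
            else:
                added.remove(match)
                breaking.append(f"possible rename: {table}.{col} -> {match}")
        additive += [f"new column: {table}.{c}" for c in added]
    return {"additive": additive, "breaking": breaking}
```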
DistributionGate
Bases: ValidationGate
Check that numeric and enum columns match schema-declared distributions.
For columns with strategy="distribution": runs a Kolmogorov-Smirnov (KS) test comparing the actual data to the fitted scipy distribution declared in the schema. A KS p-value below alpha produces a warning rather than an error, because some distribution drift is expected at scale; hard failures are reserved for broken data.
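The gate runs the KS test through scipy (`scipy.stats.kstest` is the usual entry point). The sketch below shows the same test without scipy, for a normal distribution only: `statistics.NormalDist` supplies the CDF, and the p-value uses the asymptotic Kolmogorov formula, an approximation that is least accurate for small samples. Function names are hypothetical.

```python
from __future__ import annotations

import math
from collections.abc import Callable, Sequence
from statistics import NormalDist


def ks_test(sample: Sequence[float], cdf: Callable[[float], float]) -> tuple[float, float]:
    """One-sample KS test: returns (D statistic, asymptotic p-value)."""
    xs = sorted(sample)
    n = len(xs)
    # D is the largest gap between the empirical CDF and the reference CDF,
    # checked just before and just after each step of the empirical CDF.
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The Kolmogorov survival function is indistinguishable from 1 here,
        # and the alternating series below converges poorly.
        return d, 1.0
    # Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def distribution_status(sample: Sequence[float], mu: float, sigma: float, alpha: float = 0.05) -> str:
    """Warn (never fail) when the sample drifts from Normal(mu, sigma)."""
    _, p = ks_test(sample, NormalDist(mu, sigma).cdf)
    return "warning" if p < alpha else "ok"


# Evenly spaced quantiles fit N(0, 1) almost perfectly; shifting them by 1 does not.
n = 200
fitted = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
shifted = [x + 1.0 for x in fitted]
print(distribution_status(fitted, 0.0, 1.0))   # ok
print(distribution_status(shifted, 0.0, 1.0))  # warning
```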
For columns with strategy="enum": runs a chi-squared test comparing observed category frequencies to expected probabilities. Missing expected values produce a warning.
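A stdlib-only sketch of the enum branch. The statistic is Pearson's chi-squared, the sum of (observed - expected)^2 / expected over categories, and the p-value is the chi-squared survival function computed from the series for the regularized incomplete gamma function; `scipy.stats.chisquare` would normally do both. Function names are hypothetical, and the sketch assumes every generated value is one of the declared categories.

```python
from __future__ import annotations

import math
from collections import Counter
from collections.abc import Sequence


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for a chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Lower incomplete gamma series: z^a e^-z * sum_n z^n / (a (a+1) ... (a+n))
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15 and n < 10_000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))  # regularized P(a, z)
    return max(0.0, 1.0 - lower)


def enum_warnings(values: Sequence[str], expected: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Warnings for never-generated categories and for a significant frequency mismatch."""
    counts = Counter(values)
    n = len(values)
    cats = [(k, p) for k, p in expected.items() if p > 0]
    # Declared categories that never appear are reported on their own.
    warnings = [f"expected value never generated: {k!r}" for k, _ in cats if counts[k] == 0]
    df = len(cats) - 1
    if n == 0 or df < 1:
        return warnings
    stat = sum((counts[k] - n * p) ** 2 / (n * p) for k, p in cats)
    p_value = chi2_sf(stat, df)
    if p_value < alpha:
        warnings.append(f"chi-squared p={p_value:.3g} below alpha={alpha}")
    return warnings
```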
Requires scipy. Skips gracefully (with a warning) when scipy is not installed. Configure significance threshold via context.config["distribution_alpha"] (default 0.05).
GateRunner
Run validation gates against a context and collect results.
Methods:
available_gates()
staticmethod
Return names of all registered gates.
register_gate(name, gate_cls)
staticmethod
Register a custom gate in the global registry.
run_all(context)
Run all configured gates and return results.
run_gate(gate_name, context)
Run a single gate by name.
summary(results)
staticmethod
Produce an aggregate summary of gate results.
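A minimal sketch of how a registry-based runner with these five methods could fit together. Everything beyond the method names is hypothetical: the dataclass fields (only `config` and `file_paths` appear in these docs), the `check()` method on gates, the shape of `GateResult`, the runner's constructor, the example `NonEmptyGate`, and the keys returned by `summary`.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ValidationContext:
    tables: dict[str, Any] = field(default_factory=dict)  # hypothetical field
    config: dict[str, Any] = field(default_factory=dict)
    file_paths: list[str] = field(default_factory=list)


@dataclass
class GateResult:  # hypothetical shape
    gate_name: str
    passed: bool
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


class ValidationGate(ABC):
    @abstractmethod
    def check(self, context: ValidationContext) -> GateResult: ...


class GateRunner:
    _registry: dict[str, type[ValidationGate]] = {}  # global, shared by all runners

    def __init__(self, gate_names: list[str] | None = None):
        # None means "run every registered gate" (an assumption of this sketch).
        self.gate_names = gate_names

    @staticmethod
    def available_gates() -> list[str]:
        return sorted(GateRunner._registry)

    @staticmethod
    def register_gate(name: str, gate_cls: type[ValidationGate]) -> None:
        GateRunner._registry[name] = gate_cls

    def run_gate(self, gate_name: str, context: ValidationContext) -> GateResult:
        return GateRunner._registry[gate_name]().check(context)  # KeyError if unknown

    def run_all(self, context: ValidationContext) -> list[GateResult]:
        names = self.gate_names if self.gate_names is not None else self.available_gates()
        return [self.run_gate(name, context) for name in names]

    @staticmethod
    def summary(results: list[GateResult]) -> dict[str, Any]:
        return {
            "total": len(results),
            "passed": sum(r.passed for r in results),
            "failed": [r.gate_name for r in results if not r.passed],
            "warnings": sum(len(r.warnings) for r in results),
        }


class NonEmptyGate(ValidationGate):
    """Hypothetical custom gate: every table must have at least one row."""

    def check(self, context: ValidationContext) -> GateResult:
        empty = [t for t, rows in context.tables.items() if len(rows) == 0]
        return GateResult("non_empty", passed=not empty, errors=[f"empty table: {t}" for t in empty])


GateRunner.register_gate("non_empty", NonEmptyGate)
results = GateRunner().run_all(ValidationContext(tables={"orders": [1, 2], "returns": []}))
print(GateRunner.summary(results))
```

Keeping the registry at class level is what lets `register_gate` and `available_gates` work as static methods without a runner instance.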