validation
sqllocks_spindle.validation
Validation gates and quarantine for Spindle data generation.
Classes
DistributionGate
Bases: ValidationGate
Check that numeric and enum columns match schema-declared distributions.
For columns with strategy="distribution": runs a Kolmogorov-Smirnov (KS) test comparing the actual data to the fitted scipy distribution from the schema. A KS p-value below alpha produces a warning rather than an error: distribution drift is expected at scale, and hard failures are reserved for broken data.
For columns with strategy="enum": runs a chi-squared test comparing observed category frequencies to expected probabilities. Missing expected values produce a warning.
Requires scipy. Skips gracefully (with a warning) when scipy is not installed. Configure the significance threshold via context.config["distribution_alpha"] (default 0.05).
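The two tests described above can be sketched roughly as follows. This is an illustration only, not the gate's implementation: `ks_check` and `enum_check` are hypothetical helpers, the result dictionaries are an assumed shape, and treating a missing expected category as a warning follows the description above. The only scipy calls used are `scipy.stats.kstest` and `scipy.stats.chisquare`.

```python
import random

try:
    from scipy import stats  # optional, mirroring the "Requires scipy" note
except ImportError:
    stats = None


def ks_check(values, dist_name, dist_args, config):
    """Hypothetical helper: KS test against a named scipy distribution."""
    alpha = config.get("distribution_alpha", 0.05)
    if stats is None:
        # Skip instead of failing when scipy is unavailable.
        return {"status": "skipped", "alpha": alpha, "p_value": None}
    result = stats.kstest(values, dist_name, args=dist_args)
    status = "warning" if result.pvalue < alpha else "pass"
    return {"status": status, "alpha": alpha, "p_value": float(result.pvalue)}


def enum_check(observed, expected_probs, config):
    """Hypothetical helper: chi-squared test on category frequencies."""
    alpha = config.get("distribution_alpha", 0.05)
    missing = sorted(k for k in expected_probs if observed.get(k, 0) == 0)
    if stats is None:
        return {"status": "skipped", "alpha": alpha, "missing": missing}
    keys = list(expected_probs)
    f_obs = [observed.get(k, 0) for k in keys]
    total = sum(f_obs)
    prob_sum = sum(expected_probs.values())
    # Scale expected probabilities to counts with the same total as f_obs.
    f_exp = [total * expected_probs[k] / prob_sum for k in keys]
    result = stats.chisquare(f_obs, f_exp)
    drifted = result.pvalue < alpha or bool(missing)
    return {"status": "warning" if drifted else "pass", "alpha": alpha, "missing": missing}


rng = random.Random(0)
normal_sample = [rng.gauss(0, 1) for _ in range(500)]
shifted_sample = [rng.uniform(10, 11) for _ in range(500)]

good = ks_check(normal_sample, "norm", (0, 1), {})
bad = ks_check(shifted_sample, "norm", (0, 1), {"distribution_alpha": 0.01})
enum = enum_check({"a": 900, "b": 100}, {"a": 0.5, "b": 0.3, "c": 0.2}, {})
```

Neither helper raises on a low p-value; both report a warning status, in line with the warning-only behaviour described above.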
FileFormatGate
Bases: ValidationGate
Validate output files are readable, correct format, and not truncated.
Checks parquet, CSV, and JSONL files. Takes file paths from context.file_paths.
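As a rough idea of what "readable, correct format, and not truncated" could mean per format, here is a standard-library sketch. `check_file` is a hypothetical helper, not the gate's code. The Parquet branch only verifies the `PAR1` magic bytes that open and close every Parquet file; a real check would presumably also parse the file with a Parquet reader.

```python
import csv
import json
import tempfile
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


def check_file(path):
    """Hypothetical helper: return (ok, message) for one output file."""
    path = Path(path)
    if not path.is_file():
        return False, "missing file"
    suffix = path.suffix.lower()
    try:
        if suffix == ".parquet":
            size = path.stat().st_size
            with path.open("rb") as handle:
                head = handle.read(4)
                handle.seek(max(size - 4, 0))
                tail = handle.read(4)
            if size < 12 or head != PARQUET_MAGIC or tail != PARQUET_MAGIC:
                return False, "bad or truncated parquet magic bytes"
        elif suffix == ".csv":
            with path.open(newline="", encoding="utf-8") as handle:
                rows = list(csv.reader(handle))
            if not rows:
                return False, "empty csv"
            width = len(rows[0])
            if any(len(row) != width for row in rows):
                return False, "ragged csv rows"
        elif suffix == ".jsonl":
            with path.open(encoding="utf-8") as handle:
                for line in handle:
                    if line.strip():
                        json.loads(line)  # a cut-off last line fails here
        else:
            return False, f"unsupported format: {suffix}"
    except (OSError, ValueError, csv.Error) as exc:
        return False, f"unreadable: {exc}"
    return True, "ok"


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "orders.jsonl"
    good.write_text('{"id": 1}\n{"id": 2}\n', encoding="utf-8")
    truncated = Path(tmp) / "orders_cut.jsonl"
    truncated.write_text('{"id": 1}\n{"id": 2', encoding="utf-8")
    fake_parquet = Path(tmp) / "orders.parquet"
    fake_parquet.write_bytes(b"PAR1" + b"\x00" * 8)  # header only, no footer magic
    results = {p.name: check_file(p)[0] for p in (good, truncated, fake_parquet)}
```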
GateResult
dataclass
Result from a single validation gate check.
GateRunner
Run validation gates against a context and collect results.
Methods:
available_gates()
staticmethod
Return names of all registered gates.
register_gate(name, gate_cls)
staticmethod
Register a custom gate in the global registry.
run_all(context)
Run all configured gates and return results.
run_gate(gate_name, context)
Run a single gate by name.
summary(results)
staticmethod
Produce an aggregate summary of gate results.
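To make the register/run/summarize flow concrete, here is a self-contained stand-in that mirrors the documented method names. It is not the sqllocks_spindle implementation: `ToyRunner`, `ToyResult`, `NonEmptyGate`, the `check` method, the constructor argument, and the summary keys are all assumptions, since this page does not show the real signatures of `ValidationGate`, `GateResult`, or `ValidationContext`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResult:
    """Stand-in for GateResult; the real fields are not shown on this page."""
    gate_name: str
    passed: bool
    warnings: list = field(default_factory=list)


class ToyRunner:
    """Stand-in for GateRunner that mirrors its documented method names."""

    _registry = {}  # global registry shared by all runners

    def __init__(self, gate_names):
        self.gate_names = list(gate_names)

    @staticmethod
    def register_gate(name, gate_cls):
        ToyRunner._registry[name] = gate_cls

    @staticmethod
    def available_gates():
        return sorted(ToyRunner._registry)

    def run_gate(self, gate_name, context):
        gate = ToyRunner._registry[gate_name]()
        return gate.check(context)  # `check` is an assumed method name

    def run_all(self, context):
        return [self.run_gate(name, context) for name in self.gate_names]

    @staticmethod
    def summary(results):
        failed = [r.gate_name for r in results if not r.passed]
        return {"total": len(results), "passed": len(results) - len(failed), "failed": failed}


class NonEmptyGate:
    """Hypothetical custom gate: every table must have at least one row."""

    def check(self, context):
        empty = [name for name, rows in context["tables"].items() if not rows]
        return ToyResult("non_empty", passed=not empty, warnings=empty)


ToyRunner.register_gate("non_empty", NonEmptyGate)
runner = ToyRunner(["non_empty"])
results = runner.run_all({"tables": {"orders": [{"id": 1}], "returns": []}})
report = ToyRunner.summary(results)
```

Keeping the registry on the class rather than the instance is what lets `register_gate` and `available_gates` be static methods, as they are documented above.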
NullConstraintGate
RangeConstraintGate
Bases: ValidationGate
Check that numeric columns are within expected ranges.
Configure via context.config with a dict of: { "ranges": { "table_name.column_name": {"min": 0, "max": 100}, ... } }
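A small sketch of how a config of this shape might be applied. Plain dicts of lists stand in for DataFrames so the example runs without pandas. `range_violations` is a hypothetical helper, and treating both bounds as inclusive and skipping `None` values are assumptions.

```python
config = {
    "ranges": {
        "orders.quantity": {"min": 0, "max": 100},
        "orders.discount": {"min": 0.0, "max": 1.0},
    }
}

# Stand-in for the generated tables: {table_name: {column_name: values}}
tables = {"orders": {"quantity": [1, 5, 250], "discount": [0.0, 0.1, 0.2]}}


def range_violations(tables, config):
    """Hypothetical helper: count out-of-range values per configured column."""
    counts = {}
    for key, bounds in config.get("ranges", {}).items():
        table_name, column_name = key.split(".", 1)
        values = tables.get(table_name, {}).get(column_name, [])
        low = bounds.get("min", float("-inf"))
        high = bounds.get("max", float("inf"))
        # Assumption: bounds are inclusive and nulls are ignored.
        counts[key] = sum(1 for v in values if v is not None and not (low <= v <= high))
    return counts


violations = range_violations(tables, config)
# violations == {"orders.quantity": 1, "orders.discount": 0}
```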
ReferentialIntegrityGate
Bases: ValidationGate
Check that all FK relationships hold across tables.
Every FK value in a child column must exist in the referenced parent PK column. Reports orphan counts per relationship.
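The orphan check amounts to a set-membership test per relationship. The sketch below uses plain dicts of lists in place of DataFrames. `orphan_counts`, the relationship dictionary keys, and the choice to skip null FK values are assumptions made for illustration.

```python
tables = {
    "customers": {"id": [1, 2, 3]},
    "orders": {"customer_id": [1, 1, 4, None, 5]},
}

relationships = [
    {
        "child_table": "orders",
        "child_column": "customer_id",
        "parent_table": "customers",
        "parent_column": "id",
    }
]


def orphan_counts(tables, relationships):
    """Hypothetical helper: count child FK values with no matching parent PK."""
    counts = {}
    for rel in relationships:
        parent_keys = set(tables[rel["parent_table"]][rel["parent_column"]])
        child_values = tables[rel["child_table"]][rel["child_column"]]
        label = (
            f'{rel["child_table"]}.{rel["child_column"]} -> '
            f'{rel["parent_table"]}.{rel["parent_column"]}'
        )
        # Assumption: a null FK is "no reference", not an orphan.
        counts[label] = sum(
            1 for v in child_values if v is not None and v not in parent_keys
        )
    return counts


orphans = orphan_counts(tables, relationships)
# orphans == {"orders.customer_id -> customers.id": 2}
```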
SchemaConformanceGate
Bases: ValidationGate
Check that DataFrames match the expected schema.
Validates column names are present, data types are compatible, and no unexpected columns exist. Uses the SpindleSchema from context or an expected_schema dict from config.
SchemaDriftGate
Bases: ValidationGate
Detect schema drift between current data and a baseline schema.
Detects:
- Additive changes (new columns, new tables)
- Breaking changes (removed columns, renamed columns, retyped columns)
Configure via context.config: { "baseline": { "table_name": { "columns": {"col1": "int64", "col2": "object", ...} }, ... } }
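One way to picture the additive/breaking split is a plain dictionary comparison against the baseline shape shown above. `detect_drift` is a hypothetical helper, and the `current` mapping is assumed to use the same layout as the baseline. In this simplified version a renamed column surfaces as one removed column plus one new column; how the gate itself recognizes renames is not described here.

```python
baseline = {
    "orders": {"columns": {"id": "int64", "status": "object", "total": "float64"}}
}
current = {
    "orders": {"columns": {"id": "int64", "total": "object", "channel": "object"}},
    "refunds": {"columns": {"id": "int64"}},
}


def detect_drift(current, baseline):
    """Hypothetical helper: classify schema changes as additive or breaking."""
    additive, breaking = [], []
    for table, spec in current.items():
        if table not in baseline:
            additive.append(f"new table: {table}")
            continue
        old_cols = baseline[table]["columns"]
        for col, dtype in spec["columns"].items():
            if col not in old_cols:
                additive.append(f"new column: {table}.{col}")
            elif old_cols[col] != dtype:
                breaking.append(
                    f"retyped column: {table}.{col} ({old_cols[col]} -> {dtype})"
                )
    for table, spec in baseline.items():
        current_cols = current.get(table, {}).get("columns", {})
        for col in spec["columns"]:
            if col not in current_cols:
                breaking.append(f"removed column: {table}.{col}")
    return {"additive": additive, "breaking": breaking}


drift = detect_drift(current, baseline)
```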
TemporalConsistencyGate
Bases: ValidationGate
Check temporal consistency of date/datetime columns.
Validates:
- Dates are within expected range (configurable)
- No unexpected future dates
- Temporal ordering (e.g., end_date >= start_date)
Configure via context.config: { "date_range": {"start": "2020-01-01", "end": "2025-12-31"}, "no_future": ["table.column", ...], "ordering": [ {"table": "orders", "start": "order_date", "end": "ship_date"}, ... ] }
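The three checks can be sketched against a config of the shape shown above. `temporal_issues` is a hypothetical helper. The sketch assumes every column in the stand-in tables holds `datetime.date` values, treats the range bounds as inclusive, and takes `today` as an explicit argument so the result is reproducible.

```python
from datetime import date

config = {
    "date_range": {"start": "2020-01-01", "end": "2025-12-31"},
    "no_future": ["orders.order_date"],
    "ordering": [{"table": "orders", "start": "order_date", "end": "ship_date"}],
}

# Stand-in for DataFrames: {table_name: {column_name: [date, ...]}}
tables = {
    "orders": {
        "order_date": [date(2021, 3, 1), date(2019, 12, 31), date(2024, 6, 1)],
        "ship_date": [date(2021, 3, 4), date(2020, 1, 2), date(2024, 5, 30)],
    }
}


def temporal_issues(tables, config, today):
    """Hypothetical helper: count violations of each temporal rule."""
    issues = {"out_of_range": 0, "future": 0, "misordered": 0}
    start = date.fromisoformat(config["date_range"]["start"])
    end = date.fromisoformat(config["date_range"]["end"])
    for columns in tables.values():
        for values in columns.values():
            issues["out_of_range"] += sum(1 for v in values if not (start <= v <= end))
    for key in config.get("no_future", []):
        table, column = key.split(".", 1)
        issues["future"] += sum(1 for v in tables[table][column] if v > today)
    for rule in config.get("ordering", []):
        cols = tables[rule["table"]]
        pairs = zip(cols[rule["start"]], cols[rule["end"]])
        # A row is misordered when its end value precedes its start value.
        issues["misordered"] += sum(1 for s, e in pairs if e < s)
    return issues


issues = temporal_issues(tables, config, today=date(2024, 1, 1))
# issues == {"out_of_range": 1, "future": 1, "misordered": 1}
```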
UniqueConstraintGate
ValidationContext
dataclass
Context passed to each validation gate.
ValidationGate
QuarantineEntry
dataclass
Metadata for a single quarantined artifact.
QuarantineManager
Move or copy failed artifacts to a quarantine directory.
Quarantine directory layout::

    <quarantine_root>/<domain>/<run_id>/
        <filename>
        <filename>._quarantine_meta.json
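The layout above can be reproduced with the standard library, which may help when inspecting a quarantine directory by hand. This is a sketch rather than the manager's code: `quarantine_copy` and `list_meta` are hypothetical helpers, `domain` is passed explicitly here even though the documented methods do not take it, and the metadata field names are assumed.

```python
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def quarantine_copy(source_path, quarantine_root, domain, run_id, reason, gate_name="unknown"):
    """Hypothetical helper mirroring the documented layout; copies, never moves."""
    source = Path(source_path)
    target_dir = Path(quarantine_root) / domain / run_id
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)
    meta = {  # assumed fields; the real QuarantineEntry may differ
        "original_path": str(source),
        "run_id": run_id,
        "reason": reason,
        "gate_name": gate_name,
        "quarantined_at": datetime.now(timezone.utc).isoformat(),
    }
    meta_path = target_dir / f"{source.name}._quarantine_meta.json"
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return target


def list_meta(quarantine_root):
    """Collect every metadata sidecar under <root>/<domain>/<run_id>/."""
    pattern = "*/*/*._quarantine_meta.json"
    paths = sorted(Path(quarantine_root).glob(pattern))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "orders.csv"
    source.write_text("id,total\n1,-5\n", encoding="utf-8")
    root = Path(tmp) / "quarantine"
    copied = quarantine_copy(
        source, root, "retail", "run_001", "negative total", "range_constraint"
    )
    relative = copied.relative_to(root).as_posix()
    entries = list_meta(root)
    source_still_exists = source.exists()
```

Writing the metadata as a sidecar next to each artifact keeps every quarantined file self-describing, so a listing only needs to walk the directory tree.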
Methods:
quarantine_file(source_path, quarantine_root, run_id, reason, gate_name='unknown')
Copy a file into the quarantine directory with metadata.
Returns the path to the quarantined copy.
quarantine_dataframe(df, quarantine_root, run_id, table_name, reason, gate_name='unknown', fmt='parquet')
Write a DataFrame to quarantine with metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The DataFrame to quarantine. | required |
| quarantine_root | str \| Path | Root quarantine directory. | required |
| run_id | str | Unique identifier for the generation run. | required |
| table_name | str | Logical table name. | required |
| reason | str | Why this artifact was quarantined. | required |
| gate_name | str | Which validation gate triggered quarantine. | 'unknown' |
| fmt | str | Output format: "parquet", "csv", or "jsonl". | 'parquet' |
Returns the path to the quarantined file.
list_quarantined(quarantine_root)
List all quarantined items across all domains and runs.
Returns a list of dicts with quarantine metadata.
get_quarantine_report(quarantine_root, run_id)
Get a detailed report for a specific run's quarantined artifacts.
Returns a dict with run-level summary and per-artifact details.