tier2_profiler
sqllocks_spindle.inference.tier2_profiler
Tier 2 fidelity improvements.
Adds:

- FormatPreservationAnalyzer: detect and compare format patterns (email, phone, UUID, …)
- StringSimilarityAnalyzer: n-gram cosine similarity between string value distributions
- CardinalityConstraintChecker: flag columns where synth cardinality diverges significantly
- AnomalyRateChecker: verify _spindle_is_anomaly rates match expected fractions
- Tier2Report: composite result dataclass
Classes
FormatPreservationResult
dataclass
Format preservation metrics for a single string column.
FormatPreservationAnalyzer
Detect format patterns in real data and check synth preserves them.
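The idea of format preservation can be illustrated with a minimal standalone sketch. It is not the analyzer's implementation: the pattern table, the `detect_format` and `format_shares` helpers, and the sample values are all hypothetical, and the regular expressions are deliberately loose.

```python
import re
from collections import Counter

# Hypothetical pattern table; the patterns the real analyzer uses are not documented here.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "uuid": re.compile(
        r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    ),
    "phone": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
}


def detect_format(value):
    """Return the name of the first pattern that matches the whole value."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return "other"


def format_shares(values):
    """Map each detected format to the fraction of values that have it."""
    counts = Counter(detect_format(v) for v in values)
    total = len(values)
    return {name: n / total for name, n in counts.items()} if total else {}


# Made-up sample columns for illustration only.
real = ["a@example.com", "b@example.com", "c@example.org"]
synth = ["x@example.com", "not-an-email", "y@example.net"]

real_shares = format_shares(real)
synth_shares = format_shares(synth)

# Share of synthetic values that keep the dominant format found in the real column.
dominant = max(real_shares, key=real_shares.get)
preserved = synth_shares.get(dominant, 0.0)
print(dominant, round(preserved, 3))
```

Comparing the share of the dominant real format against its share in the synthetic column is one plausible preservation metric; the fields actually stored in FormatPreservationResult may differ.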
StringSimilarityResult
dataclass
Character n-gram cosine similarity between string column value distributions.
StringSimilarityAnalyzer
Compute character n-gram cosine similarity between real and synth string columns.
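Character n-gram cosine similarity can be sketched as follows. This is an illustration of the general technique, not the analyzer's code: the helper names, the choice of n = 3, and the absence of padding or case folding are assumptions.

```python
import math
from collections import Counter


def ngram_counts(values, n=3):
    """Count character n-grams across all strings in a column (n=3 is an assumption)."""
    counts = Counter()
    for value in values:
        for i in range(len(value) - n + 1):
            counts[value[i:i + n]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no n-grams on one side, so similarity is treated as zero
    return dot / (norm_a * norm_b)


# Made-up sample columns for illustration only.
real = ["alice smith", "bob jones", "carol white"]
synth = ["alice jones", "bob white", "dave smith"]

score = cosine_similarity(ngram_counts(real), ngram_counts(synth))
print(round(score, 3))
```

A score near 1 means the two columns draw on a similar mix of character sequences; a score near 0 means they share almost none.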
CardinalityConstraintResult
dataclass
Cardinality comparison for a single column.
CardinalityConstraintChecker
Check that synthetic cardinality stays within tolerance of real cardinality.
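A tolerance check on cardinality might look like the sketch below. The function name, the use of relative difference, and the 0.2 tolerance are assumptions made for illustration; the checker's actual formula and default are not stated in this page.

```python
def cardinality_within_tolerance(real_values, synth_values, tolerance=0.2):
    """Compare distinct-value counts using relative difference (an assumed metric)."""
    real_card = len(set(real_values))
    synth_card = len(set(synth_values))
    if real_card == 0:
        # With no real values, only an empty synthetic column counts as matching.
        return synth_card == 0
    relative_diff = abs(synth_card - real_card) / real_card
    return relative_diff <= tolerance


# Made-up sample columns for illustration only.
real = ["red", "green", "blue", "red", "green"]    # 3 distinct values
synth_ok = ["blue", "red", "green", "blue"]        # 3 distinct values
synth_bad = ["red", "red", "red", "red"]           # 1 distinct value

print(cardinality_within_tolerance(real, synth_ok))
print(cardinality_within_tolerance(real, synth_bad))
```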
AnomalyRateResult
dataclass
Checks whether the injected anomaly rate matches the registered anomaly fraction.
Tier2Report
dataclass
Functions
check_anomaly_rates(df, expected_fractions=None, tolerance=0.05)
Verify _spindle_is_anomaly rate in a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame produced by AnomalyRegistry.inject(). | required |
| expected_fractions | dict[str, float] \| None | Optional mapping of anomaly_type -> expected fraction. If None, the overall anomaly rate is checked against an expected value of 0.0 (no anomalies). | None |
| tolerance | float | Acceptable deviation from expected fraction. | 0.05 |
Returns:
| Type | Description |
|---|---|
| AnomalyRateResult \| None | An AnomalyRateResult, or None if no anomaly columns are present. |
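The tolerance logic described by these parameters can be sketched without the library. The version below works on plain dict rows instead of a DataFrame and returns a bool instead of an AnomalyRateResult. Summing the per-type fractions into one overall expected rate is an assumption about how the real function aggregates them, and the anomaly type names are hypothetical.

```python
FLAG_COLUMN = "_spindle_is_anomaly"


def anomaly_rate_ok(rows, expected_fractions=None, tolerance=0.05):
    """Return True/False for the rate check, or None if the flag column is absent."""
    if not rows or FLAG_COLUMN not in rows[0]:
        return None
    observed = sum(bool(row[FLAG_COLUMN]) for row in rows) / len(rows)
    # Assumption: per-type fractions add up to the overall expected rate.
    expected = sum(expected_fractions.values()) if expected_fractions else 0.0
    return abs(observed - expected) <= tolerance


# 100 made-up rows, 10 of them flagged as anomalies.
rows = [{"id": i, FLAG_COLUMN: i < 10} for i in range(100)]

# Hypothetical anomaly type names, for illustration only.
expected = {"null_spike": 0.06, "outlier": 0.04}

print(anomaly_rate_ok(rows, expected))   # observed 0.10 vs expected about 0.10
print(anomaly_rate_ok(rows))             # observed 0.10 vs expected 0.0
print(anomaly_rate_ok([{"id": 1}]))      # no flag column
```

With the default tolerance of 0.05, an observed rate of 0.10 passes against an expected rate of about 0.10 and fails against the implicit expectation of 0.0 used when no fractions are supplied.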
run_tier2(real, synthetic, expected_anomaly_fractions=None)
Run all Tier 2 checks and return a Tier2Report.