tier2_profiler

sqllocks_spindle.inference.tier2_profiler

Tier 2 fidelity improvements.

Adds:

- FormatPreservationAnalyzer: detect and compare format patterns (email, phone, UUID, …)
- StringSimilarityAnalyzer: n-gram cosine similarity between string value distributions
- CardinalityConstraintChecker: flag columns where synth cardinality diverges significantly
- AnomalyRateChecker: verify _spindle_is_anomaly rates match expected fractions
- Tier2Report: composite result dataclass

Classes

FormatPreservationResult dataclass

Format preservation metrics for a single string column.

FormatPreservationAnalyzer

Detect format patterns in real data and check synth preserves them.
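
The idea of detecting a format and comparing its share between real and synthetic values can be sketched as below. This is an illustration only, not the analyzer's implementation: the `PATTERNS` table, `detect_format`, and `format_shares` are hypothetical names, and the regular expressions are simplified assumptions.

```python
import re
from collections import Counter

# Hypothetical pattern table; the patterns the real analyzer uses are not shown here.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "uuid": re.compile(
        r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    ),
}


def detect_format(value):
    """Return the name of the first pattern that matches the whole value."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return "other"


def format_shares(values):
    """Map each detected format to the fraction of values that have it."""
    if not values:
        return {}
    counts = Counter(detect_format(v) for v in values)
    return {name: count / len(values) for name, count in counts.items()}


real = ["a@example.com", "b@example.org", "not-an-email"]
synth = ["x@example.net", "y@example.com", "z@example.io"]
real_shares = format_shares(real)
synth_shares = format_shares(synth)
```

Comparing `real_shares` with `synth_shares` per format then shows whether the synthetic column keeps roughly the same mix of formats as the real one.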

StringSimilarityResult dataclass

Character n-gram cosine similarity between string column value distributions.

StringSimilarityAnalyzer

Compute character n-gram cosine similarity between real and synth string columns.
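
As a rough illustration of character n-gram cosine similarity, the sketch below counts n-grams over all values in each column and compares the two count vectors. The helper names `char_ngrams` and `cosine_similarity` and the default `n=3` are assumptions made for this example; the analyzer's actual parameters are not documented on this page.

```python
import math
from collections import Counter


def char_ngrams(values, n=3):
    """Count every character n-gram across a list of strings."""
    counts = Counter()
    for value in values:
        # Strings shorter than n contribute no n-grams.
        for i in range(len(value) - n + 1):
            counts[value[i:i + n]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty profile is treated as dissimilar
    return dot / (norm_a * norm_b)


real_profile = char_ngrams(["alice", "alicia"])
synth_profile = char_ngrams(["alina", "alice"])
similarity = cosine_similarity(real_profile, synth_profile)
```

A value near 1.0 means the two columns share most of their character n-grams in similar proportions; a value near 0.0 means they share almost none.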

CardinalityConstraintResult dataclass

Cardinality comparison for a single column.

CardinalityConstraintChecker

Check that synthetic cardinality stays within tolerance of real cardinality.
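
One way to read "within tolerance" is as a relative difference in distinct-value counts. The sketch below assumes that interpretation; the function name `cardinality_within_tolerance` and the default tolerance of 0.2 are hypothetical and may differ from what the checker uses.

```python
def cardinality_within_tolerance(real_values, synth_values, tolerance=0.2):
    """Return True if the synthetic distinct count is close to the real one."""
    real_card = len(set(real_values))
    synth_card = len(set(synth_values))
    if real_card == 0:
        # Assumption: an empty real column only matches an empty synthetic one.
        return synth_card == 0
    relative_diff = abs(synth_card - real_card) / real_card
    return relative_diff <= tolerance


real = ["a", "b", "c", "d", "e", "a"]
close_synth = ["a", "b", "c", "d", "e"]
collapsed_synth = ["a", "a", "b", "b", "a"]

close_ok = cardinality_within_tolerance(real, close_synth)
collapsed_ok = cardinality_within_tolerance(real, collapsed_synth)
```

Here `collapsed_synth` has only two distinct values against five in the real data, which is the kind of divergence such a check is meant to flag.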

AnomalyRateResult dataclass

Checks whether the injected anomaly rate matches the registered anomaly fraction.

Tier2Report dataclass

Composite Tier 2 fidelity report.

Methods:

passing_rate()

Fraction of all checks that passed (0.0 - 1.0).
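
The passing rate is simply the number of passed checks divided by the total. The sketch below shows that arithmetic on a stand-in dataclass; `CheckOutcome` and `ReportSketch` are hypothetical and do not reflect the real fields of Tier2Report, and the value returned for an empty report is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CheckOutcome:
    """Hypothetical stand-in for one check result."""
    name: str
    passed: bool


@dataclass
class ReportSketch:
    """Hypothetical stand-in for a composite report."""
    outcomes: list = field(default_factory=list)

    def passing_rate(self):
        """Fraction of checks that passed, between 0.0 and 1.0."""
        if not self.outcomes:
            return 1.0  # assumption: no checks means nothing failed
        passed = sum(1 for outcome in self.outcomes if outcome.passed)
        return passed / len(self.outcomes)


report = ReportSketch(
    outcomes=[
        CheckOutcome("format:email", True),
        CheckOutcome("similarity:name", True),
        CheckOutcome("cardinality:city", False),
        CheckOutcome("anomaly_rate", True),
    ]
)
rate = report.passing_rate()
```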

Functions:

check_anomaly_rates(df, expected_fractions=None, tolerance=0.05)

Verify _spindle_is_anomaly rate in a DataFrame.

Parameters:

- df (DataFrame, required): DataFrame produced by AnomalyRegistry.inject().
- expected_fractions (dict[str, float] | None, default None): Optional mapping of anomaly_type -> expected fraction. If None, the overall anomaly rate is checked against an expected fraction of 0.0 (no anomalies).
- tolerance (float, default 0.05): Acceptable deviation from the expected fraction.

Returns:

- AnomalyRateResult | None: an AnomalyRateResult, or None if no anomaly columns are present.
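
The tolerance comparison described above can be illustrated without the library. The sketch below is a simplified stand-in, not check_anomaly_rates itself: it works on a list of row dicts instead of a DataFrame, covers only the overall rate (not the per-type mapping), and returns a bool rather than an AnomalyRateResult. The name `anomaly_rate_ok` is hypothetical.

```python
def anomaly_rate_ok(rows, expected=0.0, tolerance=0.05):
    """Compare the observed anomaly rate with an expected fraction.

    Returns None when no row carries the _spindle_is_anomaly flag,
    mirroring the documented None return for missing anomaly columns.
    """
    flags = [
        bool(row["_spindle_is_anomaly"])
        for row in rows
        if "_spindle_is_anomaly" in row
    ]
    if not flags:
        return None
    observed = sum(flags) / len(flags)
    return abs(observed - expected) <= tolerance


# 100 rows, 10 of them flagged as anomalies (observed rate 0.1).
rows = [{"id": i, "_spindle_is_anomaly": i < 10} for i in range(100)]

matches_expected = anomaly_rate_ok(rows, expected=0.1)
matches_default = anomaly_rate_ok(rows)  # expected defaults to 0.0
```

With the default expected fraction of 0.0, any column whose anomaly rate exceeds the tolerance fails the check, so callers that inject anomalies on purpose would normally supply the expected fraction.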

run_tier2(real, synthetic, expected_anomaly_fractions=None)

Run all Tier 2 checks and return a Tier2Report.