comparator

sqllocks_spindle.inference.comparator

Fidelity comparator — compare real vs synthetic data quality.

Produces a FidelityReport with per-column and per-table scores (0-100) based on statistical tests, distribution matching, null rates, and cardinality analysis.
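To make the idea of a 0-100 score concrete, here is a minimal sketch of how two of the listed signals (null rates and cardinality) could each be mapped to a score. The formulas and the helper names `null_rate_score` and `cardinality_score` are illustrative assumptions, not the library's actual scoring logic, which this page does not describe.

```python
from typing import Optional, Sequence


def null_rate(values: Sequence[Optional[object]]) -> float:
    """Fraction of entries that are None; 0.0 for an empty column."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def null_rate_score(real: Sequence[Optional[object]],
                    synthetic: Sequence[Optional[object]]) -> float:
    """Hypothetical score: 100 minus the absolute null-rate gap in percentage points."""
    gap = abs(null_rate(real) - null_rate(synthetic))
    return 100.0 * (1.0 - gap)


def cardinality_score(real: Sequence[Optional[object]],
                      synthetic: Sequence[Optional[object]]) -> float:
    """Hypothetical score: ratio of the smaller to the larger distinct-value count."""
    real_n = len({v for v in real if v is not None})
    syn_n = len({v for v in synthetic if v is not None})
    if max(real_n, syn_n) == 0:
        return 100.0  # both columns are empty or all-null
    return 100.0 * min(real_n, syn_n) / max(real_n, syn_n)


# Assumed sample columns, invented for illustration.
real_col = ["a", "b", None, "c"]
synthetic_col = ["a", "a", None, None]

null_score = null_rate_score(real_col, synthetic_col)
card_score = cardinality_score(real_col, synthetic_col)
```

Both helpers return 100.0 when the two columns agree exactly on the measured property, and lower values as they diverge.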

Classes

ColumnFidelity dataclass

Fidelity metrics for a single column.

TableFidelity dataclass

Fidelity metrics for a table.

FidelityReport dataclass

Complete fidelity report comparing real vs synthetic data.

Methods:
summary()

Generate a plain-text summary.

to_markdown()

Generate a Markdown report.

failing_columns(threshold=85.0)

Return (table, column, score) tuples for columns below threshold.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| threshold | float | Score threshold (0-100). Columns with score < threshold are included. | 85.0 |

Returns:

| Type | Description |
|---|---|
| list[tuple[str, str, float]] | List of (table_name, column_name, score) tuples, sorted by score (lowest first). |
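The filter-and-sort behaviour described for failing_columns() can be sketched in plain Python. The helper `select_failing` and the sample scores are hypothetical and stand in for the report's internal data, whose layout is not documented here.

```python
from typing import Dict, List, Tuple


def select_failing(
    scores: Dict[Tuple[str, str], float],
    threshold: float = 85.0,
) -> List[Tuple[str, str, float]]:
    """Return (table, column, score) tuples with score < threshold, lowest first."""
    failing = [
        (table, column, score)
        for (table, column), score in scores.items()
        if score < threshold  # strict comparison: a score equal to the threshold passes
    ]
    return sorted(failing, key=lambda row: row[2])


# Assumed sample scores keyed by (table, column), invented for illustration.
sample_scores = {
    ("orders", "order_date"): 91.0,
    ("orders", "amount"): 62.5,
    ("customers", "email"): 84.9,
    ("customers", "id"): 85.0,
}

failing = select_failing(sample_scores)
```

With the default threshold, `failing` holds the `orders.amount` and `customers.email` entries in that order; `customers.id` is excluded because its score is not strictly below 85.0.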

to_dict()

Return a JSON-serializable dict representation.

to_dataframe()

Return a flat pandas DataFrame with one row per column.

to_html(title='Spindle Fidelity Report')

Render fidelity report as a self-contained HTML page.

Uses inline CSS — no external dependencies. Score bands: green ≥ 85, amber 70-84, red < 70.
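As an illustration of the score bands, the mapping below classifies a score into one of the three colours. The function name `score_band` is hypothetical, and treating fractional scores between 84 and 85 as amber is an assumption, since the page lists the amber band only as 70-84.

```python
def score_band(score: float) -> str:
    """Map a 0-100 fidelity score to the colour band used in the HTML report."""
    if score >= 85.0:
        return "green"
    if score >= 70.0:
        # Assumption: anything from 70 up to (but not including) 85 is amber.
        return "amber"
    return "red"


bands = [score_band(s) for s in (92.0, 85.0, 84.9, 70.0, 69.9)]
```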

score(real, synthetic, table_name='table', threshold=85.0) classmethod

Compare two DataFrames and return a FidelityReport.

Convenience classmethod for single-table comparison.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| real | 'pd.DataFrame' | Real data DataFrame. | required |
| synthetic | 'pd.DataFrame' | Synthetic data DataFrame to compare. | required |
| table_name | str | Name for the table in the report (default: "table"). | 'table' |
| threshold | float | Score threshold for failing_columns() (default: 85.0). | 85.0 |

Returns:

| Type | Description |
|---|---|
| 'FidelityReport' | FidelityReport comparing the two DataFrames. |
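A single-table comparison might look like the sketch below. It relies only on the signatures listed on this page (score(), summary(), failing_columns()); the sample data and the wrapper function `score_single_table` are invented for illustration, and the exact output depends on the installed version of the library.

```python
def score_single_table():
    """Score a small synthetic DataFrame against a real one and print the result."""
    # Imports live inside the function so this sketch can be loaded
    # without pandas or sqllocks_spindle installed.
    import pandas as pd
    from sqllocks_spindle.inference.comparator import FidelityReport

    # Assumed sample data: both frames share the same column names.
    real = pd.DataFrame(
        {"amount": [10.0, 12.5, 9.9, 11.2], "region": ["N", "S", "N", "E"]}
    )
    synthetic = pd.DataFrame(
        {"amount": [10.4, 12.1, 9.5, 11.8], "region": ["N", "N", "S", "E"]}
    )

    report = FidelityReport.score(
        real, synthetic, table_name="orders", threshold=85.0
    )

    # Plain-text overview of the comparison.
    print(report.summary())

    # Columns scoring below the threshold, lowest first.
    for table, column, score in report.failing_columns(threshold=85.0):
        print(f"{table}.{column}: {score:.1f}")

    return report
```

The threshold is passed to failing_columns() explicitly here, since this page does not say whether the value given to score() is reused as the later default.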

FidelityComparator

Compare real and synthetic datasets to produce a fidelity report.

Methods:
compare(real, synthetic)

Compare real vs synthetic data across all shared tables.
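The phrase "shared tables" suggests that only tables present in both datasets are compared. The sketch below shows that selection step, assuming each dataset is a mapping from table name to table data; the accepted input types for compare() are not stated on this page, and `shared_tables` is a hypothetical helper.

```python
from typing import Dict, List, Sequence

# Assumed stand-in for a table: column name -> list of values.
Table = Dict[str, Sequence[object]]


def shared_tables(real: Dict[str, Table], synthetic: Dict[str, Table]) -> List[str]:
    """Return the names of tables present in both datasets, in sorted order."""
    return sorted(real.keys() & synthetic.keys())


# Assumed sample datasets, invented for illustration.
real_data = {
    "customers": {"id": [1, 2, 3]},
    "orders": {"amount": [10.0, 12.5]},
    "audit_log": {"event": ["login"]},
}
synthetic_data = {
    "orders": {"amount": [9.8, 13.1]},
    "customers": {"id": [1, 2, 3]},
    "scratch": {"x": [0]},
}

common = shared_tables(real_data, synthetic_data)
```

Under this assumption, `audit_log` and `scratch` would be skipped because each exists on only one side.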