tier3_research

`sqllocks_spindle.inference.tier3_research` ¶

Tier 3 research-grade fidelity features.

Experimental features that raise the synthetic data quality ceiling:

- `ChowLiuNetwork`: Bayesian network structure learning via the Chow-Liu algorithm
- `DifferentialPrivacy`: Laplace/Gaussian noise injection for (ε,δ)-DP
- `DriftMonitor`: Statistical drift detection between two DataFrames
- `BootstrapMode`: Bootstrap-based synthetic generation from real data
- `CTGANWrapper`: Optional wrapper around sdv/ctgan when installed

All features fail gracefully when optional dependencies (sdv, sklearn) are absent.

Classes¶

`BayesianEdge` `dataclass` ¶

A directed edge in the Chow-Liu tree.

`ChowLiuResult` `dataclass` ¶

Result of Chow-Liu tree structure learning.

`ChowLiuNetwork` ¶

Learn a Bayesian network tree structure using the Chow-Liu algorithm.

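Computes pairwise mutual information between columns and finds the maximum spanning tree over those weights. Among all tree-structured models, this tree is the best approximation of the joint distribution: it minimizes the KL divergence from the empirical distribution.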

This is the theoretical backbone of synthetic data that preserves inter-column dependencies.
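The two steps described above (pairwise mutual information, then a maximum spanning tree) can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the helper names `mutual_information` and `chow_liu_edges` are hypothetical, columns are passed as plain lists of discrete values rather than a DataFrame, and the returned edges are undirected.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def chow_liu_edges(columns):
    """Return maximum-spanning-tree edges as (col_a, col_b, mi) tuples.

    `columns` maps a column name to a list of discrete values; all lists
    are assumed to have the same length.
    """
    # Score every column pair by mutual information, strongest first.
    scored = sorted(
        (
            (mutual_information(columns[a], columns[b]), a, b)
            for a, b in combinations(columns, 2)
        ),
        reverse=True,
    )

    # Kruskal's algorithm with a small union-find to reject cycles.
    parent = {name: name for name in columns}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    edges = []
    for mi, a, b in scored:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            edges.append((a, b, mi))
    return edges
```

A tree over `k` columns has `k - 1` edges, so perfectly dependent columns are linked first and independent columns are attached last with weights near zero. Continuous columns would need to be binned before this sketch applies.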

Methods:¶

`fit(df)` ¶

Learn the Chow-Liu tree from a DataFrame.

`DPResult` `dataclass` ¶

Result of applying differential privacy noise.

`DifferentialPrivacy` ¶

Apply Laplace or Gaussian noise to achieve (ε,δ)-differential privacy.

For synthetic data, this adds calibrated noise to numeric columns, scaled in proportion to sensitivity / ε, which limits how much any single record can influence the output.
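As a rough illustration of noise scaled to sensitivity / ε, here is a minimal Laplace-mechanism sketch using only the standard library. It is not the library's code: the function names, the explicit `lower`/`upper` clipping bounds, and the choice to treat the clipped range as the sensitivity are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_laplace_noise(values, lower, upper, epsilon, rng=None):
    """Clip each value to [lower, upper], then add Laplace noise.

    The noise scale is (upper - lower) / epsilon, i.e. sensitivity / epsilon
    under the assumption that the clipped range is the sensitivity.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    if rng is None:
        rng = random.Random()

    scale = (upper - lower) / epsilon
    noised = []
    for value in values:
        # Clipping bounds each record's contribution before noise is added.
        clipped = min(max(value, lower), upper)
        noised.append(clipped + laplace_noise(scale, rng))
    return noised
```

Smaller ε means a larger noise scale and stronger privacy; passing a seeded `random.Random` makes the output reproducible, which is convenient for tests but should not be relied on where the seed could leak.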

Methods:¶

`apply(df, rng=None)` ¶

Apply differential privacy noise to all numeric columns.

Returns (noised_df, DPResult).

`ColumnDriftResult` `dataclass` ¶

Drift result for a single column.

`DriftReport` `dataclass` ¶

Drift report comparing a reference and current DataFrame.

`DriftMonitor` ¶

Detect statistical drift between reference and current DataFrames.

Uses the Kolmogorov-Smirnov (KS) test for numeric columns, the chi-squared test for categorical columns, and the Population Stability Index (PSI) as a supplementary signal.
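To make two of these signals concrete, the sketch below computes a two-sample KS statistic and a categorical PSI in plain Python. It is an illustration under stated assumptions, not the library's implementation: the function names and the `floor` parameter (used to avoid taking the log of zero for unseen categories) are hypothetical, and no p-value is computed.

```python
import math
from bisect import bisect_right
from collections import Counter


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    points = sorted(set(ref) | set(cur))
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in points
    )


def psi(reference, current, floor=1e-6):
    """Population Stability Index between two categorical samples."""
    ref_counts = Counter(reference)
    cur_counts = Counter(current)
    total = 0.0
    for category in set(ref_counts) | set(cur_counts):
        # Floor the proportions so a category missing on one side
        # does not produce log(0) or a division by zero.
        p_ref = max(ref_counts[category] / len(reference), floor)
        p_cur = max(cur_counts[category] / len(current), floor)
        total += (p_cur - p_ref) * math.log(p_cur / p_ref)
    return total
```

Both values are zero for identical samples and grow as the distributions diverge; the KS statistic is bounded by 1, while PSI is unbounded.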

Methods:¶

`compare(reference, current)` ¶

Compare reference and current DataFrames for drift.

`BootstrapResult` `dataclass` ¶

Result of bootstrap synthetic generation.

`BootstrapMode` ¶

Generate synthetic data by bootstrapping (sampling with replacement) from real data.

The simplest form of synthetic generation. Every generated row is a copy of a real row, so the source distributions are reproduced up to sampling noise, but the output cannot generalize beyond the source data. Useful as a baseline.
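Sampling with replacement can be sketched in a few lines. This example is illustrative and does not reproduce the library's code: `bootstrap_rows` is a hypothetical helper that works on a list of dict rows instead of a DataFrame, and it mirrors only the `n_rows` and `seed` parameters documented for `generate`.

```python
import random


def bootstrap_rows(source, n_rows=None, seed=42):
    """Sample rows with replacement from `source` (a list of dict rows)."""
    if not source:
        raise ValueError("source must contain at least one row")
    rng = random.Random(seed)
    count = len(source) if n_rows is None else n_rows
    # Copy each sampled row so edits to the output never touch the source.
    return [dict(rng.choice(source)) for _ in range(count)]
```

Because every output row is drawn from the source, some source rows will typically appear more than once and others not at all; a fixed seed makes the draw repeatable.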

Methods:¶

`generate(source, n_rows=None, table_name='table', seed=42)` ¶

Generate synthetic DataFrame by bootstrapping source.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `source` | `DataFrame` | Real data to bootstrap from. | required |
| `n_rows` | `int \| None` | Number of rows to generate (default: same as source). | `None` |
| `table_name` | `str` | Name for result metadata. | `'table'` |
| `seed` | `int` | Random seed. | `42` |

Returns:

| Type | Description |
| --- | --- |
| `tuple[DataFrame, BootstrapResult]` | (synthetic_df, BootstrapResult) |

`CTGANWrapper` ¶

Optional wrapper around CTGAN/TVAE from the sdv library.

Falls back gracefully if sdv is not installed. When available, CTGAN provides deep generative model quality for tabular data.

Install with: `pip install sqllocks-spindle[deep]`
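One common way to make a dependency optional is to attempt the import and return `None` on failure, leaving the caller to decide what to do next. The sketch below shows that general pattern only; `optional_import` is a hypothetical helper, and how `CTGANWrapper` actually detects or reacts to a missing sdv installation is not documented here.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Hypothetical usage: check for the optional backend before relying on it.
sdv = optional_import("sdv")
deep_models_available = sdv is not None
```

Checking once at import time and storing the result keeps later calls cheap and gives a single place to emit a helpful message that points at the install command.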

Methods:¶

`fit(df, discrete_columns=None)` ¶

Fit the CTGAN model on real data.

`sample(n_rows)` ¶

Sample from the fitted CTGAN model.
