tier3_research
sqllocks_spindle.inference.tier3_research
Tier 3 research-grade fidelity features.
Experimental features that raise the synthetic data quality ceiling:
- ChowLiuNetwork — Bayesian network structure learning via Chow-Liu algorithm
- DifferentialPrivacy — Laplace/Gaussian noise injection for (ε,δ)-DP
- DriftMonitor — Statistical drift detection between two DataFrames
- BootstrapMode — Bootstrap-based synthetic generation from real data
- CTGANWrapper — Optional wrapper around sdv/ctgan when installed
All features fail gracefully when optional dependencies (sdv, sklearn) are absent.
Classes
BayesianEdge
dataclass
A directed edge in the Chow-Liu tree.
ChowLiuResult
dataclass
Result of Chow-Liu tree structure learning.
ChowLiuNetwork
Learn a Bayesian network tree structure using the Chow-Liu algorithm.
Computes pairwise mutual information between columns and finds the maximum spanning tree over those weights. Among all tree-structured models, this tree is the maximum-likelihood fit to the empirical joint distribution.
This is the theoretical backbone of synthetic data that preserves inter-column dependencies.
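The two steps described above (pairwise mutual information, then a maximum spanning tree) can be sketched in plain Python. This is an illustrative sketch only, not the `ChowLiuNetwork` implementation: the function names `mutual_information` and `chow_liu_edges` are hypothetical, columns are assumed to be discrete and passed as lists (for a DataFrame, something like `df[col].tolist()`), and the edges are returned undirected. Orienting them away from a chosen root would be a separate step.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of raw counts.
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def chow_liu_edges(columns):
    """Return undirected edges of a maximum spanning tree over MI weights.

    `columns` maps a column name to a list of discrete values (equal lengths).
    """
    # Score every pair of columns, heaviest first (Kruskal's algorithm).
    scored = sorted(
        ((mutual_information(columns[a], columns[b]), a, b)
         for a, b in combinations(columns, 2)),
        reverse=True,
    )
    parent = {name: name for name in columns}  # union-find forest

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    edges = []
    for weight, a, b in scored:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # this edge joins two components, so no cycle
            parent[root_a] = root_b
            edges.append((a, b, weight))
    return edges


# Toy data: region and currency move together, coin_flip is unrelated.
data = {
    "region": ["N", "N", "S", "S", "N", "S", "N", "S"],
    "currency": ["USD", "USD", "EUR", "EUR", "USD", "EUR", "USD", "EUR"],
    "coin_flip": ["H", "T", "H", "T", "T", "H", "H", "T"],
}
tree = chow_liu_edges(data)
```

In this toy data the strongest edge links `region` and `currency`, which is the dependency a tree-based generator would need to preserve.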
DPResult
dataclass
Result of applying differential privacy noise.
DifferentialPrivacy
Apply Laplace or Gaussian noise to achieve (ε,δ)-differential privacy.
For synthetic data, this adds noise to numeric columns with a scale proportional to sensitivity / ε, which bounds how much any individual record can influence the output and therefore how much can be inferred about it.
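A minimal sketch of the Laplace variant follows. It is not the `DifferentialPrivacy` class itself: the function name `laplace_mechanism` and its parameters are hypothetical, and it assumes the sensitivity of a column is its clipped range `upper - lower`. How the library actually derives sensitivity is not stated here.

```python
import random


def laplace_mechanism(values, lower, upper, epsilon, seed=None):
    """Clip each value to [lower, upper], then add Laplace noise.

    The noise scale is sensitivity / epsilon, with sensitivity assumed
    to be the clipped range (upper - lower).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    rng = random.Random(seed)
    scale = (upper - lower) / epsilon
    noisy = []
    for value in values:
        clipped = min(max(value, lower), upper)  # clipping bounds the sensitivity
        # The difference of two i.i.d. exponentials with mean `scale`
        # follows a Laplace(0, scale) distribution.
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        noisy.append(clipped + noise)
    return noisy


# Smaller epsilon means a larger noise scale and stronger privacy.
noisy_ages = laplace_mechanism([23, 35, 61, 47], lower=18, upper=90, epsilon=1.0, seed=0)
```

Clipping before adding noise matters: without a bounded range there is no finite sensitivity to calibrate against.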
ColumnDriftResult
dataclass
Drift result for a single column.
DriftReport
dataclass
Drift report comparing a reference and current DataFrame.
DriftMonitor
Statistical drift detection between two DataFrames.
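This page does not say which statistic `DriftMonitor` uses. As one common choice for numeric columns, the sketch below computes the two-sample Kolmogorov-Smirnov statistic between a reference column and a current column. The function name `ks_statistic` is hypothetical, and plain lists stand in for DataFrame columns.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # occurs at one of the observed values.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


same = ks_statistic([1, 2, 3], [1, 2, 3])           # identical samples
shifted = ks_statistic([1, 2, 3], [10, 11, 12])     # no overlap at all
```

A value near 0 suggests the two samples follow similar distributions, while a value near 1 indicates strong drift. Turning the statistic into a pass/fail decision requires a threshold or p-value, which this sketch leaves out.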
BootstrapResult
dataclass
Result of bootstrap synthetic generation.
BootstrapMode
Generate synthetic data by bootstrapping (sampling with replacement) from real data.
The simplest form of synthetic generation: it reproduces the empirical distributions of the source (up to sampling noise), but every output row is a copy of a source row, so it does not generalize beyond the source data. Useful as a baseline.
Methods:
generate(source, n_rows=None, table_name='table', seed=42)
Generate a synthetic DataFrame by bootstrapping source.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | DataFrame | Real data to bootstrap from. | required |
| n_rows | int \| None | Number of rows to generate (default: same as source). | None |
| table_name | str | Name for result metadata. | 'table' |
| seed | int | Random seed. | 42 |
Returns:
| Type | Description |
|---|---|
| tuple[DataFrame, BootstrapResult] | (synthetic_df, BootstrapResult) |
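The sampling step behind `generate` can be illustrated without the library. The sketch below is an assumption-laden stand-in, not the `BootstrapMode` code: `bootstrap_rows` is a hypothetical helper, rows are plain dicts rather than a DataFrame, and the result metadata is omitted. With pandas, `DataFrame.sample(n=..., replace=True, random_state=...)` performs the equivalent draw.

```python
import random


def bootstrap_rows(rows, n_rows=None, seed=42):
    """Sample rows with replacement; defaults to as many rows as the source."""
    if not rows:
        raise ValueError("source must contain at least one row")
    n = len(rows) if n_rows is None else n_rows
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    # Copy each sampled row so later edits do not alias the source rows.
    return [dict(rng.choice(rows)) for _ in range(n)]


source = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Lima"},
    {"id": 3, "city": "Pune"},
]
synthetic = bootstrap_rows(source, n_rows=5, seed=42)
```

Because rows are drawn with replacement, asking for more rows than the source holds is fine, and duplicates are expected.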
CTGANWrapper
Optional wrapper around CTGAN/TVAE from the sdv library.
Falls back gracefully if sdv is not installed. When sdv is available, CTGAN provides a deep generative model for tabular data.
Install with: pip install sqllocks-spindle[deep]
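One way to check for an optional dependency without importing it is `importlib.util.find_spec`. The sketch below shows that pattern only. The helper names are hypothetical, and the `"bootstrap"` fallback label is an assumption, since this page does not say what the wrapper falls back to.

```python
from importlib.util import find_spec


def optional_dependency_available(module_name):
    """Return True if a top-level module is importable, without importing it."""
    return find_spec(module_name) is not None


def choose_generator(module_name="sdv"):
    """Pick a hypothetical backend label based on what is installed."""
    return "ctgan" if optional_dependency_available(module_name) else "bootstrap"


backend = choose_generator()  # "ctgan" only when sdv is installed
```

Checking availability up front lets calling code pick a code path instead of catching `ImportError` deep inside a generation run.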