inference

`sqllocks_spindle.inference` ¶

Spindle inference engine — profile existing data and infer schemas.

Provides DataProfiler for analysing DataFrames and SchemaBuilder for converting profiles into ready-to-use SpindleSchema objects. Also includes FidelityComparator for comparing real vs synthetic data.

Classes¶

`DataMasker` ¶

Replace PII in real data with synthetic values preserving distributions.

Methods:¶

`mask(tables, config=None)` ¶

Mask PII columns across all tables.

Parameters¶

tables: Mapping of table name to DataFrame.

config: Optional masking configuration. If omitted, the default settings are used.

Returns¶

MaskResult with masked DataFrames and statistics.

`MaskConfig` `dataclass` ¶

Configuration for data masking.

`MaskResult` `dataclass` ¶

Result of masking operation.

Methods:¶

`summary()` ¶

Return a human-readable summary of the masking result.

`ColumnFidelity` `dataclass` ¶

Fidelity metrics for a single column.

`FidelityComparator` ¶

Compare real and synthetic datasets to produce a fidelity report.

Methods:¶

`compare(real, synthetic)` ¶

Compare real vs synthetic data across all shared tables.

`FidelityReport` `dataclass` ¶

Complete fidelity report comparing real vs synthetic data.

Methods:¶

`summary()` ¶

Generate a plain-text summary.

`to_markdown()` ¶

Generate markdown report.

`failing_columns(threshold=85.0)` ¶

Return (table, column, score) tuples for columns below threshold.

Parameters:

Name	Type	Description	Default
`threshold`	`float`	Score threshold (0-100). Columns with score < threshold are included.	`85.0`

Returns:

Type	Description
`list[tuple[str, str, float]]`	List of (table_name, column_name, score) tuples, sorted by score (lowest first).
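The filtering rule described above (strictly below the threshold, lowest score first) can be sketched in plain Python. This is a minimal illustration, not the library's implementation; the standalone `failing_columns` function and the sample scores are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def failing_columns(
    scores: Iterable[tuple[str, str, float]], threshold: float = 85.0
) -> list[tuple[str, str, float]]:
    """Keep (table, column, score) entries with score < threshold, lowest first."""
    failing = [row for row in scores if row[2] < threshold]
    return sorted(failing, key=lambda row: row[2])


# Hypothetical per-column scores, for illustration only.
column_scores = [
    ("orders", "amount", 91.2),
    ("orders", "status", 78.5),
    ("customers", "email", 62.0),
    ("customers", "region", 85.0),  # equal to the threshold, so not failing
]
result = failing_columns(column_scores)
```

A column scoring exactly 85.0 is not reported, because the comparison is strict.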

`to_dict()` ¶

Return a JSON-serializable dict representation.

`to_dataframe()` ¶

Return a flat pandas DataFrame with one row per column.

`to_html(title='Spindle Fidelity Report')` ¶

Render fidelity report as a self-contained HTML page.

Uses inline CSS — no external dependencies. Score bands: green ≥ 85, amber 70-84, red < 70.
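The score bands can be expressed as a small lookup. The sketch below assumes that fractional scores between 84 and 85 fall in the amber band, which the description above does not state explicitly; `score_band` is a hypothetical helper name.

```python
def score_band(score: float) -> str:
    """Map a 0-100 fidelity score to a colour band."""
    if score >= 85.0:
        return "green"
    if score >= 70.0:  # assumption: amber covers [70, 85)
        return "amber"
    return "red"


bands = [score_band(s) for s in (92.0, 85.0, 84.9, 70.0, 69.9)]
```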

`score(real, synthetic, table_name='table', threshold=85.0)` `classmethod` ¶

Compare two DataFrames and return a FidelityReport.

Convenience classmethod for single-table comparison.

Parameters:

Name	Type	Description	Default
`real`	`'pd.DataFrame'`	Real data DataFrame.	required
`synthetic`	`'pd.DataFrame'`	Synthetic data DataFrame to compare.	required
`table_name`	`str`	Name for the table in the report (default: "table").	`'table'`
`threshold`	`float`	Score threshold for failing_columns() (default: 85.0).	`85.0`

Returns:

Type	Description
`'FidelityReport'`	FidelityReport comparing the two DataFrames.

`TableFidelity` `dataclass` ¶

Fidelity metrics for a table.

`ColumnProfile` `dataclass` ¶

Statistical profile of a single column.

`DataProfiler` ¶

Analyse one or more DataFrames and produce profiles.

Methods:¶

`profile_dataframe(df, table_name='table')` ¶

Profile a single DataFrame.

`profile_dataset(tables)` ¶

Profile a dict of DataFrames and detect cross-table relationships.

`profile(df, table_name='table')` ¶

Alias for profile_dataframe(). Profile a single DataFrame.

`from_csv(path, table_name=None, sample_rows=None, **kwargs)` `classmethod` ¶

Profile a CSV file.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Path to the CSV file.	required
`table_name`	`str \| None`	Name for the table profile. Defaults to the filename stem.	`None`
`sample_rows`	`int \| None`	If set, sample this many rows before profiling.	`None`
`**kwargs`		Passed to DataProfiler constructor (fit_threshold, top_n_values, etc.).	`{}`

`DatasetProfile` `dataclass` ¶

Profile of a multi-table dataset.

`TableProfile` `dataclass` ¶

Profile of a single table (DataFrame).

`ExportedProfile` `dataclass` ¶

A portable profile that can be imported into any domain.

Attributes:

Name	Type	Description
`name`	`str`	Profile identifier (e.g. `"default"`, `"high_volume"`).
`description`	`str`	Human-readable description of what this profile represents.
`source_domain`	`str`	Name of the domain this profile was exported from (or `"inferred"` when created via `ProfileIO.from_dataframe`).
`distributions`	`dict[str, dict[str, float]]`	Mapping of `"table.column"` keys to value→weight dicts.
`ratios`	`dict[str, float]`	Mapping of ratio names to float multipliers.
`metadata`	`dict[str, Any]`	Arbitrary extra information (row counts, column types, etc.).

`ProfileIO` ¶

Export, import, and list domain profiles.

All public methods are stateless — no configuration is stored on the instance. Instantiate with ProfileIO() and call methods directly.

Example::

    io = ProfileIO()
    io.export_profile(RetailDomain(), Path("retail_profile.json"))
    io.import_profile(Path("retail_profile.json"), HealthcareDomain(), save_as="from_retail")
    io.list_profiles(RetailDomain())

Methods:¶

`export_profile(domain, output_path, profile_name='default')` ¶

Export a domain's active profile to a standalone JSON file.

Parameters:

Name	Type	Description	Default
`domain`	`Any`	A `sqllocks_spindle.domains.base.Domain` instance whose `_profile` dict will be serialised.	required
`output_path`	`str \| Path`	Destination file path (created if it does not exist).	required
`profile_name`	`str`	Label stored in the exported metadata.	`'default'`

Returns:

Type	Description
`Path`	The resolved `Path` the profile was written to.

`import_profile(profile_path, target_domain, save_as=None)` ¶

Import an exported profile into a target domain's profiles/ directory.

The imported file is converted to the standard domain profile format (i.e. metadata is stripped; only name, description, distributions, and ratios are kept).

Parameters:

Name	Type	Description	Default
`profile_path`	`str \| Path`	Path to an exported profile JSON file.	required
`target_domain`	`Any`	The domain instance to import into.	required
`save_as`	`str \| None`	Override the profile name (and filename). When None the name is taken from the file's `"name"` field.	`None`

Returns:

Type	Description
`str`	The name the profile was saved as.

`list_profiles(domain)` ¶

List all profiles available for a domain.

Parameters:

Name	Type	Description	Default
`domain`	`Any`	A `sqllocks_spindle.domains.base.Domain` instance.	required

Returns:

Type	Description
`list[dict[str, str \| int]]`	A list of dicts with keys `name`, `description`, `distributions` (count), and `ratios` (count).

`from_dataframe(df, table_name='table', name='inferred')` ¶

Create a profile by inferring distributions from a DataFrame.

Categorical columns (object dtype or low cardinality) are converted into normalised distribution weights. High-cardinality columns are skipped.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	The source DataFrame.	required
`table_name`	`str`	Prefix for distribution keys (`"table_name.column"`).	`'table'`
`name`	`str`	Name to assign to the resulting profile.	`'inferred'`

Returns:

Type	Description
`ExportedProfile`	An `ExportedProfile` ready to be serialised or imported.
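To make the "normalised distribution weights" idea concrete, here is a rough sketch using only the standard library. It is not the library's code: the `categorical_weights` helper, the cardinality cutoff of 50, and the sample values are all assumptions chosen for illustration.

```python
from collections import Counter


def categorical_weights(values, max_cardinality=50):
    """Return value -> weight (summing to 1), or None for high-cardinality columns.

    max_cardinality is a hypothetical cutoff; the real threshold is not documented here.
    """
    counts = Counter(v for v in values if v is not None)
    if not counts or len(counts) > max_cardinality:
        return None  # skipped, mirroring "high-cardinality columns are skipped"
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


status = ["shipped", "shipped", "pending", "shipped",
          "cancelled", "pending", "shipped", "shipped"]
# Keys follow the "table_name.column" convention described above.
distributions = {"orders.status": categorical_weights(status)}
skipped = categorical_weights(range(1000))
```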

`ProfileStore` ¶

Persist and retrieve a `SafeProfile` to/from a JSON file.

All methods are stateless — instantiate with ProfileStore() and call directly, or use the classmethods. This is the only supported public on-disk entrypoint for a SafeProfile (ADR-001 / ADR-007).

Methods:¶

`save(profile, path)` `classmethod` ¶

Write profile to path as JSON (via to_safe_dict).

Parameters:

Name	Type	Description	Default
`profile`	`SafeProfile`	The `SafeProfile` to persist. It carries no raw-bearing fields (ADR-007), so the on-disk JSON is safe by construction.	required
`path`	`str \| Path`	Destination file path. Parent directories are created.	required

Returns:

Type	Description
`Path`	The resolved `Path` the profile was written to.

`load(path)` `classmethod` ¶

Read a `SafeProfile` from a JSON file written by `save`.

A file whose schema_version is not the version this code writes (e.g. a legacy artifact with no/old version, or a future version) is loaded read-only with a warning — it never crashes. The returned object is degraded-but-usable: the keys present are reconstructed, and the loaded schema_version is preserved on the returned object so a caller can detect the mismatch.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Path to a JSON file previously written by `save`.	required

Returns:

Type	Description
`SafeProfile`	A reconstructed `SafeProfile`.
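The warn-but-never-crash behaviour on a version mismatch can be sketched as follows. This is an illustration under stated assumptions: `CURRENT_SCHEMA_VERSION`, `load_tolerant`, the `read_only` flag, and the dict-shaped return value are hypothetical stand-ins for the real `SafeProfile` reconstruction.

```python
import json
import warnings

CURRENT_SCHEMA_VERSION = 1  # hypothetical: the version "this code writes"


def load_tolerant(text: str) -> dict:
    """Parse a profile payload; warn instead of raising on a version mismatch."""
    data = json.loads(text)
    found = data.get("schema_version")  # None for legacy files with no version
    mismatch = found != CURRENT_SCHEMA_VERSION
    if mismatch:
        warnings.warn(
            f"schema_version {found!r} differs from {CURRENT_SCHEMA_VERSION}; "
            "loading read-only",
            stacklevel=2,
        )
    # Reconstruct only the keys that are present; keep the loaded version
    # so the caller can detect the mismatch.
    return {
        "schema_version": found,
        "tables": data.get("tables", {}),
        "read_only": mismatch,
    }


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = load_tolerant('{"tables": {"orders": {}}}')
    current = load_tolerant('{"schema_version": 1, "tables": {}}')
```

Only the legacy payload triggers a warning, and its missing version is preserved as `None` rather than being silently upgraded.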

`SafeColumnProfile` `dataclass` ¶

Safe, persisted statistic set for a single column.

Carries ONLY non-raw-bearing statistics. Notably absent (by construction): min_value, max_value, enum_values, value_counts_ext.

Numeric extremes live in bounds (winsorized quantile bounds, ADR-002, populated by STORY-006). Categorical mass lives in categorical_weights (post-k-anon suppression, ADR-003, populated by STORY-007).

Methods:¶

`from_column_profile(col, config=None, row_count=None)` `classmethod` ¶

Map a rich ColumnProfile to a SafeColumnProfile (STORY-002).

Selects ONLY the safe-and-sufficient statistic set. Reads the REAL attribute names on ColumnProfile (min_value/max_value are never read — ADR-002; bounds derive from quantiles). This fixes the B2 attribute-mismatch bug class where the legacy registry read non-existent .min/.max/.top_values.

Disclosure-control transforms are applied via hooks that are STUBS in this story and become real in their owning stories:

bounds: winsorized quantile bounds (STORY-006 / ADR-002). Stub here: {"lo": p1, "hi": p99} taken from quantiles if present.
categorical_weights: k-anon suppression (STORY-007 / ADR-003). Any value with count < k is folded into a single __OTHER__ bucket. count is derived from the seeded proportion × row_count (the rich enum_values / value_counts_ext carry value->proportion, not raw counts). row_count is threaded in by the table mapper.
value-pattern PII gate (STORY-008 / ADR-004): when a column's detected pattern is a PII class (`PII_PATTERNS`) OR its cardinality is approximately the row count (high-card free-text backstop), the column persists pattern + length_dist ONLY; categorical_weights are dropped and no values are carried. Detection is name-independent (catches PII in notes / c_47). This is DEFENSE-IN-DEPTH, NOT a completeness guarantee (ADR-004 / ADR-011).
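The k-anon folding step can be sketched as below: estimate each value's count from its proportion and the row count, then fold anything under k into `__OTHER__`. This is a minimal sketch, not the library's hook; `fold_rare`, the default `k=5`, and the rounding of the estimated count are assumptions.

```python
def fold_rare(proportions, row_count, k=5):
    """Fold values whose estimated count is below k into one __OTHER__ bucket.

    Returns (weights, suppressed_count). k=5 is a hypothetical default.
    """
    kept = {}
    other_mass = 0.0
    suppressed = 0
    for value, proportion in proportions.items():
        estimated_count = round(proportion * row_count)  # proportion x row_count
        if estimated_count < k:
            other_mass += proportion
            suppressed += 1
        else:
            kept[value] = proportion
    if suppressed:
        kept["__OTHER__"] = other_mass
    return kept, suppressed


# With 200 rows the estimated counts are 180, 14, 4 and 2.
weights, dropped = fold_rare(
    {"US": 0.90, "CA": 0.07, "IS": 0.02, "MT": 0.01}, row_count=200, k=5
)
```

The two rare values disappear as individual keys, but their combined probability mass survives in the `__OTHER__` bucket, so the weights still sum to 1.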

`to_safe_dict()` ¶

Serialize to a plain dict. Deterministic key order for byte-stability.
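One common way to get byte-stable output from a plain dict is to fix the key order and separators at serialization time. The sketch below uses `json.dumps(sort_keys=True)`; whether the library sorts keys or relies on a fixed insertion order is not stated here, so treat this as one possible approach. `to_stable_json` is a hypothetical helper.

```python
import json


def to_stable_json(payload: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal dicts give equal bytes."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")


# Same content, different insertion order.
a = to_stable_json({"name": "amount", "bounds": {"lo": 1.0, "hi": 9.5}})
b = to_stable_json({"bounds": {"hi": 9.5, "lo": 1.0}, "name": "amount"})
```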

`SafeProfile` `dataclass` ¶

The canonical, versioned, on-disk safe profile (ADR-001).

Top-level transport object. Carries schema_version and an embedded redaction_manifest (populated by STORY-009 — present but empty here).

Methods:¶

`from_dataset_profile(dataset_profile, config=None, unsafe_full_fidelity=False)` `classmethod` ¶

Map a rich DatasetProfile to a SafeProfile (STORY-002 / ADR-001).

Builds one SafeTableProfile per table and one SafeColumnProfile per column, selecting ONLY the safe-and-sufficient statistic set. The rich profile is the source; the returned SafeProfile is the safe transport.

The mapper reads only REAL attribute names on the rich dataclasses (min_value/max_value are never read — bounds derive from quantiles per ADR-002), fixing the B2 attribute-mismatch bug class.

config is an optional per-profile/per-column settings dict threaded to the disclosure-control hooks (winsorization percentiles, k-anon k, PII gate).

Safe-by-default (ADR-005 / STORY-009)¶

The scrub — winsorized bounds (ADR-002), k-anon __OTHER__ suppression (ADR-003), and the value-pattern PII gate (ADR-004) — runs by DEFAULT. The safe path is the path of least resistance.

unsafe_full_fidelity=True is the explicit, single opt-out. It disables the disclosure-control transforms (k-anon suppression and the PII gate are turned off so full-fidelity categorical weights / values survive) and stamps unsafe=True on the returned profile. Such an artifact is rejected by validate --safe (STORY-010). It is the ONLY way to persist un-scrubbed statistics.

Every returned profile carries an accurate redaction_manifest (built from the rich source vs. the scrubbed safe columns; see `build_redaction_manifest`).

`to_safe_dict()` ¶

Serialize to a plain dict with deterministic key order.

`SafeTableProfile` `dataclass` ¶

Safe, persisted profile for a single table.

Methods:¶

`from_table_profile(table, config=None)` `classmethod` ¶

Map a rich TableProfile to a SafeTableProfile (STORY-002).

One SafeColumnProfile per column. Carries the table-level correlation_matrix, primary_key and advisory detected_fks (names/overlap only — no raw values). Column order is preserved.

`to_safe_dict()` ¶

Serialize to a plain dict. Columns serialized in declared order.

`SafeProfileAdapter` ¶

Adapt a loaded `SafeProfile` to a generatable `SpindleSchema`.

Stateless; instantiate and call `to_schema`, or use the module-level `safe_profile_to_schema` convenience wrapper.

Methods:¶

`to_schema(profile, domain_name='safe_inferred')` ¶

Build a `SpindleSchema` from a loaded `SafeProfile`.

The returned schema is ready to pass to Spindle().generate(schema=..., fidelity_profile=profile).

Raw fields are never consulted (there are none on the safe model); numeric clipping is driven by the winsorized bounds (ADR-002).
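As an illustration of clipping driven by stored bounds, the sketch below clamps generated values into a `{"lo": ..., "hi": ...}` range, the shape used for `bounds` elsewhere in this module. `clip_to_bounds` and the sample numbers are hypothetical; how the generator actually applies the bounds is not shown in this reference.

```python
def clip_to_bounds(values, bounds):
    """Clamp each value into [bounds['lo'], bounds['hi']]."""
    lo, hi = bounds["lo"], bounds["hi"]
    return [min(max(v, lo), hi) for v in values]


# The bounds stand in for winsorized quantiles (e.g. p1 / p99), not raw min/max.
clipped = clip_to_bounds([-50.0, 3.2, 12.0, 9999.0], {"lo": 0.5, "hi": 120.0})
```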

`SafeProfileValidator` ¶

Structural, fail-closed static leak scanner over a serialized artifact.

Usage::

    result = SafeProfileValidator().validate_file("profile.json")
    sys.exit(result.exit_code)

Methods:¶

`validate_file(path)` ¶

Load and scan a JSON artifact file. Fail-closed on any read error.

`validate_data(data, path='<data>')` ¶

Scan an already-parsed artifact. Fail-closed on missing markers.

`ValidationFinding` `dataclass` ¶

A single leak finding, with the JSON path that triggered it.

`ValidationResult` `dataclass` ¶

Outcome of a scan. is_clean only when zero findings.

Attributes¶

`exit_code` `property` ¶

0 only on a proven-clean artifact; 1 on any finding.
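The fail-closed idea (anything unreadable or unmarked counts as a finding, and only zero findings yields exit code 0) can be sketched as follows. Everything here is illustrative: `Finding`, `Result`, `scan`, and `validate_text` are hypothetical names, the forbidden keys are the raw-bearing fields listed as absent from `SafeColumnProfile`, and treating `schema_version` as the required marker is an assumption.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Finding:
    json_path: str
    reason: str


@dataclass
class Result:
    findings: list = field(default_factory=list)

    @property
    def is_clean(self) -> bool:
        return not self.findings

    @property
    def exit_code(self) -> int:
        return 0 if self.is_clean else 1


FORBIDDEN_KEYS = {"min_value", "max_value", "enum_values", "value_counts_ext"}


def scan(node, path="$", result=None):
    """Walk the parsed artifact and record every forbidden key with its JSON path."""
    result = result if result is not None else Result()
    if isinstance(node, dict):
        for key, child in node.items():
            child_path = f"{path}.{key}"
            if key in FORBIDDEN_KEYS:
                result.findings.append(Finding(child_path, "raw-bearing key"))
            scan(child, child_path, result)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            scan(child, f"{path}[{index}]", result)
    return result


def validate_text(text: str) -> Result:
    """Fail closed: unreadable input or a missing marker is itself a finding."""
    try:
        data = json.loads(text)
    except ValueError as exc:
        return Result([Finding("$", f"unreadable artifact: {exc}")])
    if not isinstance(data, dict) or "schema_version" not in data:
        return Result([Finding("$", "missing schema_version marker")])
    return scan(data)


clean = validate_text('{"schema_version": 1, "tables": {}}')
broken = validate_text("not json")
leaky = validate_text('{"schema_version": 1, "tables": {"t": {"c": {"min_value": 3}}}}')
```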

`SchemaBuilder` ¶

Convert a DatasetProfile into a SpindleSchema.

Methods:¶

`build(profile, domain_name='inferred', fit_threshold=0.8, correlation_threshold=0.5, include_anomaly_registry=False)` ¶

Build a complete SpindleSchema from a dataset profile.

`LakehouseProfiler` ¶

Profile Fabric Lakehouse Delta tables and return TableProfile objects.

Parameters:

Name	Type	Description	Default
`workspace_id`	`str`	Fabric workspace GUID.	required
`lakehouse_id`	`str`	Fabric lakehouse GUID.	required
`token_provider`	`Any \| None`	A callable returning an Azure access token string. Defaults to DefaultAzureCredential when azure-identity is installed.	`None`
`default_sample_rows`	`int \| None`	Row limit for profiling. Pass None to scan entire table.	`100000`

Methods:¶

`profile_table(table_name, sample_rows='default')` ¶

Profile a single Delta table.

`profile_all(sample_rows='default')` ¶

Profile all tables in the lakehouse.

`detect_foreign_keys(table_names=None, overlap_threshold=0.9, sample_rows='default', full_scan=False)` ¶

Sampled cross-table FK detection (advisory). ADR-009 / STORY-016.

Reads each table's columns (sampled by default) and runs the proven DataProfiler._detect_foreign_keys_advisory core (naming *_id plus value-overlap >= overlap_threshold) across every table pair. Detected FKs are advisory and reported with the measured overlap; a declared star_map / RelationshipDef remains authoritative and overrides (resolved by the caller, not here).

Parameters:

Name	Type	Description	Default
`table_names`	`list[str] \| None`	Tables to scan. Defaults to all tables in the lakehouse.	`None`
`overlap_threshold`	`float`	Minimum child-to-parent value overlap to report a FK (default 0.9, configurable per ADR-009).	`0.9`
`sample_rows`	`int \| None \| str`	Per-table row cap used when reading key columns. `"default"` uses `self.default_sample_rows`; `None` reads the full table. Ignored when `full_scan=True`.	`'default'`
`full_scan`	`bool`	Read entire tables (no sampling) to confirm a sampled result (ADR-009 full-scan option).	`False`

Returns:

Type	Description
`dict[str, dict[str, dict[str, Any]]]`	`{child_table: {col_name: {"parent_table": str, "overlap": float, "advisory": True, "full_scan": bool}}}` for every detected FK.
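The two signals named above (a `*_id` column name plus a value overlap at or above the threshold) can be sketched for a single column pair. This is not `DataProfiler._detect_foreign_keys_advisory`; the `detect_fk` helper is hypothetical, and measuring overlap on distinct non-null child values is an assumption.

```python
def detect_fk(child_values, parent_values, column_name, overlap_threshold=0.9):
    """Return an advisory FK record, or None if the column does not qualify."""
    if not column_name.endswith("_id"):
        return None  # naming signal
    child = {v for v in child_values if v is not None}
    if not child:
        return None
    # Fraction of distinct child values that also appear in the parent key.
    overlap = len(child & set(parent_values)) / len(child)
    if overlap < overlap_threshold:
        return None
    return {"overlap": overlap, "advisory": True}


orders_customer_id = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 99]  # 99 is an orphan
customers_id = list(range(1, 11))

hit = detect_fk(orders_customer_id, customers_id, "customer_id")
miss_by_name = detect_fk(orders_customer_id, customers_id, "customer_ref")
miss_by_overlap = detect_fk([1, 50, 60, 70], customers_id, "customer_id")
```

Nine of the ten distinct child values exist in the parent, so the overlap is 0.9 and the candidate is reported at the default threshold.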

`reconcile_declared_foreign_keys(detected, declared)` `staticmethod` ¶

Declared FKs override detected advisory FKs (ADR-009 / STORY-017).

A declared star_map / RelationshipDef is AUTHORITATIVE: where a declaration exists for a (child_table, child_col) it wins over any detected FK, even a high-overlap one. Detected FKs that a declaration overrode are REPORTED (not silently dropped) for transparency.

Parameters:

Name	Type	Description	Default
`detected`	`dict[str, dict[str, dict[str, Any]]]`	The output of `detect_foreign_keys`.	required
`declared`	`Any`	iterable of `(child_table, child_col, parent_table)` tuples, or dicts with those keys.	required

Returns:

Type	Description
`dict[str, Any]`	`{"foreign_keys": <resolved map>, "overridden": [<reports>]}`. Resolved declared entries carry `advisory=False, declared=True`.
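A minimal sketch of the override rule follows: declared relationships replace detected ones for the same `(child_table, child_col)`, and each replaced detection is reported. The `reconcile` function is hypothetical, it accepts only tuples (the real method also accepts dicts), and the fields of each override report are assumptions.

```python
def reconcile(detected, declared):
    """Let declared FKs win over detected advisory FKs, reporting what was overridden."""
    # Copy one level deep so the caller's detected map is left untouched.
    resolved = {table: dict(cols) for table, cols in detected.items()}
    overridden = []
    for child_table, child_col, parent_table in declared:
        previous = resolved.get(child_table, {}).get(child_col)
        if previous is not None and previous["parent_table"] != parent_table:
            overridden.append({
                "child_table": child_table,
                "child_col": child_col,
                "detected_parent": previous["parent_table"],
                "detected_overlap": previous["overlap"],
                "declared_parent": parent_table,
            })
        resolved.setdefault(child_table, {})[child_col] = {
            "parent_table": parent_table,
            "advisory": False,
            "declared": True,
        }
    return {"foreign_keys": resolved, "overridden": overridden}


detected = {
    "orders": {
        "customer_id": {"parent_table": "leads", "overlap": 0.97,
                        "advisory": True, "full_scan": False},
        "store_id": {"parent_table": "stores", "overlap": 0.93,
                     "advisory": True, "full_scan": False},
    }
}
outcome = reconcile(detected, [("orders", "customer_id", "customers")])
```

The high-overlap detection against `leads` loses to the declaration but still appears in `overridden`, while the undeclared `store_id` detection passes through as advisory.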

`BootstrapMode` ¶

Generate synthetic data by bootstrapping (sampling with replacement) from real data.

The simplest form of synthetic generation — preserves all real distributions exactly, but does not generalize beyond the source data. Useful as a baseline.

Methods:¶

`generate(source, n_rows=None, table_name='table', seed=42)` ¶

Generate synthetic DataFrame by bootstrapping source.

Parameters:

Name	Type	Description	Default
`source`	`DataFrame`	Real data to bootstrap from.	required
`n_rows`	`int \| None`	Number of rows to generate (default: same as source).	`None`
`table_name`	`str`	Name for result metadata.	`'table'`
`seed`	`int`	Random seed.	`42`

Returns:

Type	Description
`tuple[DataFrame, BootstrapResult]`	(synthetic_df, BootstrapResult)
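Bootstrapping itself is just seeded sampling with replacement. The sketch below works on a list of row dicts rather than a DataFrame so that it needs only the standard library; `bootstrap_rows` is a hypothetical helper and does not build a `BootstrapResult`.

```python
import random


def bootstrap_rows(rows, n_rows=None, seed=42):
    """Sample rows with replacement; defaults to the same size as the source."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    size = len(rows) if n_rows is None else n_rows
    return rng.choices(rows, k=size)


source_rows = [
    {"id": 1, "status": "shipped"},
    {"id": 2, "status": "pending"},
    {"id": 3, "status": "shipped"},
]
same_size = bootstrap_rows(source_rows)
larger = bootstrap_rows(source_rows, n_rows=10)
repeat = bootstrap_rows(source_rows, n_rows=10)
```

Every output row is an exact copy of a source row, which is why this mode preserves the source distributions but cannot produce values the source never contained.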

`BootstrapResult` `dataclass` ¶

Result of bootstrap synthetic generation.

`BayesianEdge` `dataclass` ¶

A directed edge in the Chow-Liu tree.

`ChowLiuNetwork` ¶

Learn a Bayesian network tree structure using the Chow-Liu algorithm.

Computes pairwise mutual information between columns and finds the maximum spanning tree — the tree that best represents the joint distribution.

This is the theoretical backbone of synthetic data that preserves inter-column dependencies.

Methods:¶

`fit(df)` ¶

Learn the Chow-Liu tree from a DataFrame.
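The two steps described above (pairwise mutual information, then a maximum spanning tree) can be sketched for discrete columns with the standard library. This is an illustration rather than the `ChowLiuNetwork` implementation: it takes a dict of column lists instead of a DataFrame, uses Kruskal's algorithm as one way to build the tree, and returns undirected edges without choosing a root.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (count / n) * math.log((count * n) / (px[x] * py[y]))
        for (x, y), count in pxy.items()
    )


def chow_liu_edges(columns):
    """Maximum spanning tree over columns, weighted by mutual information."""
    scored = sorted(
        ((mutual_information(columns[a], columns[b]), a, b)
         for a, b in combinations(list(columns), 2)),
        reverse=True,
    )
    parent = {name: name for name in columns}

    def find(node):
        # Union-find root lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    edges = []
    for mi, a, b in scored:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding this edge does not create a cycle
            parent[root_a] = root_b
            edges.append((a, b, mi))
    return edges


data = {
    "country": ["US", "US", "CA", "CA", "US", "CA", "US", "CA"],
    "currency": ["USD", "USD", "CAD", "CAD", "USD", "CAD", "USD", "CAD"],
    "channel": ["web", "store", "web", "store", "store", "web", "web", "store"],
}
tree = chow_liu_edges(data)
```

In this toy data `currency` is fully determined by `country`, so that pair has the highest mutual information (ln 2) and is selected first; `channel` is independent of both and is attached with weight 0.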

`ChowLiuResult` `dataclass` ¶

Result of Chow-Liu tree structure learning.

`CTGANWrapper` ¶

Optional wrapper around CTGAN/TVAE from the sdv library.

Falls back gracefully if sdv is not installed. When available, CTGAN provides deep generative model quality for tabular data.

Install with: pip install sqllocks-spindle[deep]

Methods:¶

`fit(df, discrete_columns=None)` ¶

Fit the CTGAN model on real data.

`sample(n_rows)` ¶

Sample from the fitted CTGAN model.

`DifferentialPrivacy` ¶

Apply Laplace or Gaussian noise to achieve (ε,δ)-differential privacy.

For synthetic data, this adds calibrated noise to numeric columns proportional to their sensitivity / ε, ensuring individual records cannot be re-identified.

Methods:¶

`apply(df, rng=None)` ¶

Apply differential privacy noise to all numeric columns.

Returns (noised_df, DPResult).
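For the Laplace case, the noise scale is sensitivity / ε. The sketch below draws Laplace noise as the difference of two exponential variates, a standard identity. It is a simplified illustration: `add_laplace_noise` is hypothetical, the sensitivity is passed in rather than derived from the column, and no `DPResult` is produced.

```python
import random


def add_laplace_noise(values, sensitivity, epsilon, seed=0):
    """Add Laplace(0, sensitivity / epsilon) noise to each numeric value."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scale = sensitivity / epsilon
    rng = random.Random(seed)
    rate = 1.0 / scale
    # The difference of two i.i.d. Exponential(rate) draws is Laplace(0, 1 / rate).
    return [v + rng.expovariate(rate) - rng.expovariate(rate) for v in values]


amounts = [10.0, 20.0, 30.0, 40.0, 50.0]
tight = add_laplace_noise(amounts, sensitivity=1.0, epsilon=1.0)
loose = add_laplace_noise(amounts, sensitivity=1.0, epsilon=0.1)

tight_error = sum(abs(n - v) for n, v in zip(tight, amounts))
loose_error = sum(abs(n - v) for n, v in zip(loose, amounts))
```

A smaller ε means a larger scale and therefore more noise, which is the privacy/utility trade-off the class exposes.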

`DPResult` `dataclass` ¶

Result of applying differential privacy noise.

`DriftMonitor` ¶

Detect statistical drift between reference and current DataFrames.

Uses KS test for numeric columns, Chi-squared for categoricals, and PSI as a supplementary signal.

Methods:¶

`compare(reference, current)` ¶

Compare reference and current DataFrames for drift.
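Of the three signals, PSI is the easiest to show without a statistics library. The sketch below computes PSI = Σ (cur − ref) · ln(cur / ref) over equal-width bins taken from the reference range. The bin count, the clamping of out-of-range values into the edge bins, and the small floor for empty bins are assumptions; `psi` is a hypothetical helper, not the monitor's code.

```python
import math


def psi(reference, current, bins=10, floor=1e-6):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant columns

    def shares(values):
        counts = [0] * bins
        for v in values:
            index = int((v - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp into edge bins
        # Floor empty bins so the logarithm below stays defined.
        return [max(c / len(values), floor) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [float(v) for v in range(100)]
shifted = [v + 30.0 for v in reference]

no_drift = psi(reference, reference)
drift = psi(reference, shifted)
```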

`DriftReport` `dataclass` ¶

Drift report comparing a reference and current DataFrame.

`ColumnDriftResult` `dataclass` ¶

Drift result for a single column.

`AnomalyRateResult` `dataclass` ¶

Checks whether the injected anomaly rate matches the registered anomaly fraction.

`CardinalityConstraintChecker` ¶

Check that synthetic cardinality stays within tolerance of real cardinality.

`CardinalityConstraintResult` `dataclass` ¶

Cardinality comparison for a single column.

`FormatPreservationAnalyzer` ¶

Detect format patterns in real data and check synth preserves them.
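One simple way to think about format patterns is to mask every digit and letter and compare the resulting shapes. The sketch below is an assumption about the general approach, not the analyzer's actual algorithm; `format_pattern`, `preserved_fraction`, and the `9`/`A` mask characters are hypothetical.

```python
def format_pattern(value: str) -> str:
    """Mask digits as 9 and letters as A, keeping punctuation as-is."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value
    )


def preserved_fraction(real, synthetic):
    """Share of synthetic values whose pattern also occurs in the real data."""
    real_patterns = {format_pattern(v) for v in real}
    matches = sum(format_pattern(v) in real_patterns for v in synthetic)
    return matches / len(synthetic)


real_codes = ["AB-1234", "CD-5678"]
synth_codes = ["ZZ-0001", "QX-9999", "12-ABCD", "Q-77"]
fraction = preserved_fraction(real_codes, synth_codes)
```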

`FormatPreservationResult` `dataclass` ¶

Format preservation metrics for a single string column.

`StringSimilarityAnalyzer` ¶

Compute character n-gram cosine similarity between real and synth string columns.

`StringSimilarityResult` `dataclass` ¶

Character n-gram cosine similarity between string column value distributions.
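Character n-gram cosine similarity can be sketched by pooling the n-grams of each column into a count vector and taking the cosine of the two vectors. The choice of n = 3 and the pooling across all values are assumptions; `ngram_profile` and `cosine` are hypothetical helpers.

```python
import math
from collections import Counter


def ngram_profile(values, n=3):
    """Count character n-grams across all values in a column."""
    profile = Counter()
    for value in values:
        text = str(value)
        profile.update(text[i:i + n] for i in range(len(text) - n + 1))
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[gram] * b[gram] for gram in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


real_names = ["anderson", "andersen", "sanders"]
same = cosine(ngram_profile(real_names), ngram_profile(real_names))
unrelated = cosine(ngram_profile(real_names), ngram_profile(["qqqq", "zzzz"]))
```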

`Tier2Report` `dataclass` ¶

Composite Tier 2 fidelity report.

Methods:¶

`passing_rate()` ¶

Fraction of all checks that passed (0.0 - 1.0).

`AdvancedProfiler` ¶

Runs Tier 1 fidelity profiling on a pair of DataFrames (real + synthetic).

Usage::

    profiler = AdvancedProfiler()
    adv = profiler.profile_pair(real_df, synth_df, table_name="orders")
    print(f"AUC: {adv.adversarial.auc_roc:.3f}")

Methods:¶

`profile_pair(real, synthetic, table_name='table')` ¶

Profile real + synthetic DataFrames and return AdvancedTableProfile.

`profile_single(df, table_name='table')` ¶

Profile a single DataFrame (no adversarial test — needs both real+synth).

`AdvancedTableProfile` `dataclass` ¶

Extended profile combining base stats with Tier 1 fidelity features.

`AdversarialResult` `dataclass` ¶

Result of the adversarial (distinguishability) test.

Attributes¶

`distinguishability_score` `property` ¶

0 = perfectly indistinguishable, 100 = perfectly distinguishable.

`ConditionalProfile` `dataclass` ¶

Conditional statistics for col_a given values of col_b.

`GMMFit` `dataclass` ¶

Gaussian Mixture Model fit for a numeric column.

`PeriodicityResult` `dataclass` ¶

FFT-based periodicity detection result.

`TemporalProfile` `dataclass` ¶

Temporal / sequence analysis for a datetime or sorted numeric column.

Functions:¶

`build_redaction_manifest(dataset_profile, safe_profile, config=None, unsafe=False)` ¶

Build the self-describing redaction manifest (ADR-005 / STORY-009).

The manifest is computed from the rich source profile and the scrubbed safe profile together, so it reports what was ACTUALLY suppressed — not what was intended. Accuracy is the AC: every figure is read off the real mapping outcome.

Shape::

{
  "unsafe": <bool>,            # mirrors SafeProfile.unsafe
  "k_default": <int>,          # profile-level k that applied by default
  "tables": {
    <table>: {
      <column>: {
        "categories_dropped": <int>,       # k-anon __OTHER__ folds (rare)
        "bounds_winsorized": <bool>,       # winsorized quantile bounds set
        "pattern_only": <bool>,            # PII-gated to pattern+length only
        "k": <int>,                        # effective k for this column
        "sensitive": <bool>,               # sensitive flag raised k
      }, ...
    }, ...
  }
}

rare_categories_dropped reads the per-column suppressed_category_count the k-anon hook actually recorded (STORY-007). pattern_only re-evaluates the exact PII-gate decision the mapper used (STORY-008). In unsafe mode the effective config disabled both controls, so these report 0 / False — accurately.

`safe_profile_to_schema(profile, domain_name='safe_inferred')` ¶

Convenience wrapper around `SafeProfileAdapter.to_schema`.

`check_anomaly_rates(df, expected_fractions=None, tolerance=0.05)` ¶

Verify _spindle_is_anomaly rate in a DataFrame.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	DataFrame produced by AnomalyRegistry.inject().	required
`expected_fractions`	`dict[str, float] \| None`	Optional mapping of anomaly_type -> expected fraction. If None, uses overall anomaly rate with expected = 0.0 (no anomalies).	`None`
`tolerance`	`float`	Acceptable deviation from expected fraction.	`0.05`

Returns:

Type	Description
`AnomalyRateResult \| None`	AnomalyRateResult or None if no anomaly columns present.
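The tolerance check amounts to comparing an observed flag rate with an expected fraction. The sketch below works on a plain list of booleans standing in for the `_spindle_is_anomaly` column; `rate_within_tolerance` is a hypothetical helper and does not return an `AnomalyRateResult`.

```python
def rate_within_tolerance(flags, expected_fraction, tolerance=0.05):
    """Return (observed_rate, passed) for a sequence of anomaly flags."""
    observed = sum(bool(flag) for flag in flags) / len(flags)
    return observed, abs(observed - expected_fraction) <= tolerance


# 8 flagged rows out of 100.
flags = [True] * 8 + [False] * 92
observed, passed = rate_within_tolerance(flags, expected_fraction=0.10)
_, passed_strict = rate_within_tolerance(flags, expected_fraction=0.20)
```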

`run_tier2(real, synthetic, expected_anomaly_fractions=None)` ¶

Run all Tier 2 checks and return a Tier2Report.
