
profiles

sqllocks_spindle.profiles

Profile registry — named, tagged data profiles for fidelity validation.

Classes

RegistryProfile dataclass

A named, tagged data profile stored in the registry.

Identity format: <system>/<table>/<profile_name> e.g. salesforce/customer/prod-2026Q2

Attributes
identity property

Fully-qualified profile identity: system/table/name.
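As an illustration of the identity format, the sketch below splits an identity string into its three parts. `IdentityParts` and `split_identity` are hypothetical names for this example only and are not part of `sqllocks_spindle`; the validation rule (exactly three non-empty segments) is an assumption.

```python
from typing import NamedTuple


class IdentityParts(NamedTuple):
    """Hypothetical container for the three identity segments."""
    system: str
    table: str
    name: str


def split_identity(identity: str) -> IdentityParts:
    """Split '<system>/<table>/<profile_name>' into its parts.

    Assumption: an identity has exactly three non-empty segments.
    """
    parts = identity.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected <system>/<table>/<profile_name>, got {identity!r}"
        )
    return IdentityParts(*parts)


parts = split_identity("salesforce/customer/prod-2026Q2")
print(parts.system, parts.table, parts.name)
```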

ProfileRegistry

Manages named, tagged profiles under a configurable root directory.

Directory layout:

<root>/
  <system>/
    <table>/
      <profile_name>.json
  _index.json          ← auto-maintained index
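
Under the layout above, an identity maps directly to a file path. A minimal sketch, assuming the mapping is a plain join of the three identity segments; `profile_path` is a hypothetical helper, not a documented registry method.

```python
from pathlib import Path


def profile_path(root: Path, identity: str) -> Path:
    """Return <root>/<system>/<table>/<profile_name>.json for an identity.

    Hypothetical helper: assumes the identity has exactly three segments.
    """
    system, table, name = identity.split("/")
    return root / system / table / f"{name}.json"


print(profile_path(Path("profiles"), "salesforce/customer/prod-2026Q2"))
```
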
Methods:
save(profile)

Save a profile to disk and update the index.

load(identity)

Load a profile by identity (system/table/name).

delete(identity)

Delete a profile from disk and index.

list_all()

Return all index entries sorted by identity.

search(query=None, system=None, table=None, tags=None)

Filter index entries by query string, system, table, and/or tags.
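
How the filters combine is not specified here. The sketch below shows one plausible reading, with every behavior an assumption: filters are AND-combined, `query` is a case-insensitive substring match on the identity, and `tags` requires all listed tags to be present. The entry shape (a dict with `identity`, `system`, `table`, `tags`) and the name `filter_entries` are also hypothetical.

```python
def filter_entries(entries, query=None, system=None, table=None, tags=None):
    """Filter index-like dicts; all filter semantics here are assumptions."""
    wanted = set(tags or [])
    matches = []
    for entry in entries:
        if query is not None and query.lower() not in entry["identity"].lower():
            continue
        if system is not None and entry["system"] != system:
            continue
        if table is not None and entry["table"] != table:
            continue
        # Assumed: every requested tag must be present on the entry.
        if not wanted.issubset(entry.get("tags", [])):
            continue
        matches.append(entry)
    return matches


# Made-up entries for illustration only.
entries = [
    {"identity": "salesforce/customer/prod-2026Q2", "system": "salesforce",
     "table": "customer", "tags": ["prod", "baseline"]},
    {"identity": "salesforce/order/dev-sample", "system": "salesforce",
     "table": "order", "tags": ["dev"]},
]

for entry in filter_entries(entries, system="salesforce", tags=["prod"]):
    print(entry["identity"])
```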

add_tags(identity, tags)

Add tags to a profile (in-place, no duplicates).

remove_tags(identity, tags)

Remove tags from a profile.
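
The "no duplicates" behavior of tag updates can be sketched as follows. This is an illustration only, assuming tags are kept as an ordered list and that existing order is preserved; `merge_tags` and `drop_tags` are hypothetical helper names.

```python
def merge_tags(existing, new):
    """Append new tags, keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys([*existing, *new]))


def drop_tags(existing, unwanted):
    """Remove the given tags; tags that are not present are ignored."""
    unwanted = set(unwanted)
    return [tag for tag in existing if tag not in unwanted]


tags = merge_tags(["prod", "baseline"], ["baseline", "2026Q2"])
print(tags)                          # ['prod', 'baseline', '2026Q2']
print(drop_tags(tags, ["baseline"]))  # ['prod', '2026Q2']
```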

import_from_dir(source_dir, overwrite=False)

Import all *.json profile files from a directory tree.

Returns a list of imported identity strings.

diff(identity_a, identity_b)

Compare two profiles column by column.

Returns a dict with keys: added, removed, changed.
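
A column-by-column comparison of this kind can be sketched as below. Only the three result keys come from the description above; the rest is assumed: each profile is reduced to a `{column_name: {stat: value}}` mapping, "added" means present only in the second profile, and "changed" records `(old, new)` pairs per statistic. `diff_columns` is a hypothetical name.

```python
def diff_columns(cols_a: dict, cols_b: dict) -> dict:
    """Compare two {column: {stat: value}} mappings (assumed shape)."""
    added = sorted(set(cols_b) - set(cols_a))
    removed = sorted(set(cols_a) - set(cols_b))
    changed = {}
    for name in sorted(set(cols_a) & set(cols_b)):
        a, b = cols_a[name], cols_b[name]
        # Keep only the statistics whose values differ between the profiles.
        delta = {
            stat: (a.get(stat), b.get(stat))
            for stat in sorted(set(a) | set(b))
            if a.get(stat) != b.get(stat)
        }
        if delta:
            changed[name] = delta
    return {"added": added, "removed": removed, "changed": changed}


# Made-up column statistics for illustration only.
old = {"age": {"dtype": "int64", "null_rate": 0.0}, "city": {"dtype": "object"}}
new = {"age": {"dtype": "int64", "null_rate": 0.02}, "email": {"dtype": "object"}}
print(diff_columns(old, new))
```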

reindex()

Rebuild _index.json from all .json profile files on disk. Returns the number of profiles indexed.
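
A rebuild of this kind amounts to walking the directory layout and rewriting the index. The sketch below is a rough approximation, not the registry's implementation: the index shape (a JSON object keyed by identity) and the rule of skipping any file that is not exactly three levels below the root are assumptions, and `rebuild_index` is a hypothetical name.

```python
import json
from pathlib import Path


def rebuild_index(root: Path) -> int:
    """Scan <root>/<system>/<table>/<name>.json and rewrite _index.json."""
    entries = {}
    # sorted() materializes the walk before _index.json is written.
    for path in sorted(root.rglob("*.json")):
        rel = path.relative_to(root)
        if len(rel.parts) != 3:
            continue  # skips _index.json and anything outside the layout
        system, table, _ = rel.parts
        identity = f"{system}/{table}/{path.stem}"
        entries[identity] = {"system": system, "table": table, "name": path.stem}
    index_file = root / "_index.json"
    index_file.write_text(json.dumps(entries, indent=2, sort_keys=True))
    return len(entries)
```

Because the index file sits directly under the root, the depth check keeps it out of the rebuilt index, so running the function repeatedly gives the same count.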

save_from_dataset_profile(dataset_profile, system, name, tags=None, description='', config=None)

Convert a DatasetProfile into registry profiles via the SafeProfile mapper.

STORY-014 (ADR-001): the per-column stats are now built through SafeProfile.from_dataset_profile (the canonical safe-and-correct mapper), not by the old hand-read of the non-existent .min/.max/.top_values attributes (the B2 attribute-mismatch bug). Registry profiles therefore carry the SAFE statistic set (dtype, null_rate, cardinality, mean, std, quantiles, bounds, categorical_weights, categorical_histogram) with no raw values. The RegistryProfile wrapper (system, table, name, tags, description, source_rows) is unchanged, so the registry read side (load, diff, tag, reindex) is unaffected: there is no on-disk format break and no sidecar.

Legacy registry files (old min/max/top_values columns) still load as-is. config is forwarded to the SafeProfile mapper (e.g. k, sensitive).

One RegistryProfile is created per table. Returns the saved profiles.

validate(identity, result, sample_rows=500)

Compare a GenerationResult against a stored profile.

Reconstructs an approximate reference DataFrame from stored column statistics and runs FidelityComparator against the new generation. Returns a FidelityReport. Requires scipy.
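
To give a feel for what "reconstructing an approximate reference" from stored statistics can mean, the sketch below draws synthetic values for one numeric column from a set of quantiles by linear interpolation of the inverse CDF. This is not the library's method, which is not documented here. The quantile storage shape (`{probability: value}`) and the name `sample_from_quantiles` are assumptions, and the sketch uses only the standard library rather than a DataFrame.

```python
import random
from bisect import bisect_right


def sample_from_quantiles(quantiles, n, seed=0):
    """Draw n values by interpolating between stored quantile points.

    Assumption: `quantiles` maps probabilities to values, e.g. {0.0: min,
    0.5: median, 1.0: max}, and contains at least two points.
    """
    probs = sorted(quantiles)
    values = [quantiles[p] for p in probs]
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.uniform(probs[0], probs[-1])
        # Find the segment [probs[i-1], probs[i]] that contains u.
        i = min(bisect_right(probs, u), len(probs) - 1)
        frac = (u - probs[i - 1]) / (probs[i] - probs[i - 1])
        samples.append(values[i - 1] + frac * (values[i] - values[i - 1]))
    return samples


# Made-up quantiles for an "age" column; 500 mirrors the sample_rows default.
stored = {0.0: 18.0, 0.25: 27.0, 0.5: 35.0, 0.75: 48.0, 1.0: 90.0}
reference = sample_from_quantiles(stored, n=500)
print(len(reference), min(reference) >= 18.0, max(reference) <= 90.0)
```

Samples drawn this way reproduce the stored quantiles only approximately, which matches the "approximate reference" wording above; any comparison against them inherits that approximation.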