lakehouse_profiler

sqllocks_spindle.inference.lakehouse_profiler

LakehouseProfiler — profile Fabric Lakehouse tables without a Spark session.

Uses the deltalake library (part of the [fabric] extra) to read Delta tables locally via ABFSS. Falls back to a REST API for table listing when deltalake is unavailable.

Requires: sqllocks-spindle[fabric] — deltalake>=0.17.0, pyarrow>=14.0

Classes

LakehouseProfiler

Profile Fabric Lakehouse Delta tables and return TableProfile objects.

Parameters:

workspace_id (str, required): Fabric workspace GUID.
lakehouse_id (str, required): Fabric lakehouse GUID.
token_provider (Any | None, default None): A callable returning an Azure access token string. Defaults to DefaultAzureCredential when azure-identity is installed.
default_sample_rows (int | None, default 100000): Row limit for profiling. Pass None to scan the entire table.

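
As a rough illustration of the token_provider parameter, the sketch below builds two callables that each return a token string. It assumes the profiler invokes the callable with no arguments, and the storage scope shown is also an assumption; neither detail is stated above. The helper names make_default_token_provider and make_static_token_provider are hypothetical.

```python
from typing import Callable

# Assumption: the Azure Storage scope is suitable for ABFSS reads.
# The scope the profiler actually expects is not documented here.
ASSUMED_SCOPE = "https://storage.azure.com/.default"


def make_default_token_provider(scope: str = ASSUMED_SCOPE) -> Callable[[], str]:
    """Return a zero-argument callable that fetches a token via azure-identity."""

    def provider() -> str:
        # Imported lazily so this module still loads without azure-identity.
        from azure.identity import DefaultAzureCredential

        return DefaultAzureCredential().get_token(scope).token

    return provider


def make_static_token_provider(token: str) -> Callable[[], str]:
    """Return a callable that always yields a fixed token (handy in tests)."""
    return lambda: token


# Hypothetical wiring, using the constructor parameters listed above:
# from sqllocks_spindle.inference.lakehouse_profiler import LakehouseProfiler
# profiler = LakehouseProfiler(
#     workspace_id="<workspace-guid>",
#     lakehouse_id="<lakehouse-guid>",
#     token_provider=make_default_token_provider(),
# )
```

The static variant is convenient when a token has already been obtained elsewhere and should simply be passed through.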
Methods:
profile_table(table_name, sample_rows='default')

Profile a single Delta table.

profile_all(sample_rows='default')

Profile all tables in the lakehouse.

detect_foreign_keys(table_names=None, overlap_threshold=0.9, sample_rows='default', full_scan=False)

Sampled cross-table FK detection (advisory). ADR-009 / STORY-016.

Reads each table's columns (sampled by default) and runs the proven DataProfiler._detect_foreign_keys_advisory core across every table pair. That core flags a column when it follows the *_id naming pattern and its value overlap is at least overlap_threshold. Detected FKs are advisory and are reported with the measured overlap. A declared star_map / RelationshipDef remains authoritative and overrides any detected FK; that resolution is done by the caller, not here.
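
The heuristic described above can be sketched in plain Python. This is not the DataProfiler._detect_foreign_keys_advisory implementation; it is a minimal illustration that assumes a candidate parent is a table holding a column of the same name, and that overlap is measured on distinct non-null values. The function names and sample data are hypothetical.

```python
from itertools import permutations


def overlap_ratio(child_values, parent_values):
    """Fraction of distinct non-null child values also present in the parent."""
    child = {v for v in child_values if v is not None}
    if not child:
        return 0.0
    parent = {v for v in parent_values if v is not None}
    return len(child & parent) / len(child)


def detect_fk_candidates(tables, overlap_threshold=0.9):
    """tables maps table name -> {column name: list of values}."""
    found = {}
    # Ordered pairs: each table is tried as child against every other table.
    for child_name, parent_name in permutations(tables, 2):
        for col, values in tables[child_name].items():
            if not col.endswith("_id"):
                continue  # naming rule: only *_id columns qualify
            parent_cols = tables[parent_name]
            if col not in parent_cols:
                continue  # assumption: parent exposes the same column name
            ratio = overlap_ratio(values, parent_cols[col])
            if ratio < overlap_threshold:
                continue
            best = found.setdefault(child_name, {}).get(col)
            if best is None or ratio > best["overlap"]:
                found[child_name][col] = {
                    "parent_table": parent_name,
                    "overlap": ratio,
                    "advisory": True,
                    "full_scan": False,
                }
    return found


# Illustrative data only.
tables = {
    "customers": {"customer_id": [1, 2, 3, 4]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 2, 2, 3]},
}
result = detect_fk_candidates(tables)
```

With this data, orders.customer_id is fully contained in customers.customer_id, so it is reported. The reverse direction covers only three of the four customers (0.75) and falls below the 0.9 threshold.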

Parameters:

table_names (list[str] | None, default None): Tables to scan. Defaults to all tables in the lakehouse.
overlap_threshold (float, default 0.9): Minimum child-to-parent value overlap to report a FK (default 0.9, configurable per ADR-009).
sample_rows (int | None | str, default 'default'): Per-table row cap used when reading key columns. "default" uses self.default_sample_rows; None reads the full table. Ignored when full_scan=True.
full_scan (bool, default False): Read entire tables (no sampling) to confirm a sampled result (ADR-009 full-scan option).
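
The interaction between sample_rows and full_scan can be summarised with a small resolver. This is a sketch of the precedence rules stated above, not the library's code; resolve_row_cap is a hypothetical helper, and the rejection of non-positive or non-integer caps is an added assumption.

```python
from typing import Optional, Union


def resolve_row_cap(
    sample_rows: Union[int, None, str],
    default_sample_rows: Optional[int],
    full_scan: bool = False,
) -> Optional[int]:
    """Return the effective per-table row cap; None means read everything."""
    if full_scan:
        return None  # full_scan=True takes precedence over sample_rows
    if sample_rows == "default":
        return default_sample_rows  # defer to the profiler-level setting
    if sample_rows is None:
        return None  # explicit request for the full table
    # Assumption: anything else must be a positive integer cap.
    if isinstance(sample_rows, bool) or not isinstance(sample_rows, int) or sample_rows <= 0:
        raise ValueError(f"invalid sample_rows: {sample_rows!r}")
    return sample_rows
```

Using the string "default" as the sentinel keeps None available as a meaningful value ("no cap"), which is why the parameter type is int | None | str.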

Returns:

dict[str, dict[str, dict[str, Any]]]: {child_table: {col_name: {"parent_table": str, "overlap": float, "advisory": True, "full_scan": bool}}} for every detected FK.
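
To make the nested return shape concrete, the sketch below flattens such a mapping into rows sorted by overlap, for example for a review report. The sample dictionary is hand-written for illustration and is not output from a real lakehouse; flatten_fk_report is a hypothetical helper.

```python
def flatten_fk_report(detected):
    """Turn {child: {col: info}} into (child, col, parent, overlap, full_scan) rows."""
    rows = []
    for child_table, columns in detected.items():
        for col_name, info in columns.items():
            rows.append(
                (
                    child_table,
                    col_name,
                    info["parent_table"],
                    info["overlap"],
                    info["full_scan"],
                )
            )
    # Strongest candidates first.
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Illustrative values only.
sample_detected = {
    "orders": {
        "customer_id": {
            "parent_table": "customers",
            "overlap": 0.98,
            "advisory": True,
            "full_scan": False,
        }
    },
    "order_lines": {
        "order_id": {
            "parent_table": "orders",
            "overlap": 1.0,
            "advisory": True,
            "full_scan": False,
        }
    },
}
report_rows = flatten_fk_report(sample_detected)
```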

reconcile_declared_foreign_keys(detected, declared) staticmethod

Declared FKs override detected advisory FKs (ADR-009 / STORY-017).

A declared star_map / RelationshipDef is AUTHORITATIVE: where a declaration exists for a (child_table, child_col) it wins over any detected FK, even a high-overlap one. Detected FKs that a declaration overrode are REPORTED (not silently dropped) for transparency.

Parameters:

detected (dict[str, dict[str, dict[str, Any]]], required): The output of detect_foreign_keys.
declared (Any, required): Iterable of (child_table, child_col, parent_table) tuples, or dicts with those keys.

Returns:

dict[str, Any]: {"foreign_keys": <resolved map>, "overridden": [<reports>]}. Resolved declared entries carry advisory=False, declared=True.
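
The override rule can be illustrated with a standalone sketch. It is not the library's reconcile_declared_foreign_keys; the function name, the layout of each override report, and the choice to report only when the declared parent differs from the detected one are all assumptions made for this example.

```python
def reconcile_sketch(detected, declared):
    """Apply declared FKs on top of detected ones without mutating the input."""
    resolved = {
        child: {col: dict(info) for col, info in cols.items()}
        for child, cols in detected.items()
    }
    overridden = []
    for entry in declared:
        if isinstance(entry, dict):
            child = entry["child_table"]
            col = entry["child_col"]
            parent = entry["parent_table"]
        else:
            child, col, parent = entry
        previous = resolved.get(child, {}).get(col)
        # Assumption: report only when the declaration changes the parent.
        if previous is not None and previous["parent_table"] != parent:
            overridden.append(
                {
                    "child_table": child,
                    "child_col": col,
                    "detected_parent": previous["parent_table"],
                    "declared_parent": parent,
                    "overlap": previous.get("overlap"),
                }
            )
        # The declaration always wins, whatever the measured overlap.
        resolved.setdefault(child, {})[col] = {
            "parent_table": parent,
            "advisory": False,
            "declared": True,
        }
    return {"foreign_keys": resolved, "overridden": overridden}


# Illustrative inputs only.
detected_example = {
    "orders": {
        "customer_id": {"parent_table": "clients", "overlap": 0.97,
                        "advisory": True, "full_scan": False},
        "store_id": {"parent_table": "stores", "overlap": 0.93,
                     "advisory": True, "full_scan": False},
    }
}
declared_example = [
    ("orders", "customer_id", "customers"),
    {"child_table": "orders", "child_col": "region_id", "parent_table": "regions"},
]
outcome = reconcile_sketch(detected_example, declared_example)
```

Here the high-overlap detection orders.customer_id -> clients is replaced by the declared parent and appears in the overridden list. The untouched store_id detection stays advisory, and the declared region_id relationship is added even though nothing was detected for it.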