lakehouse_profiler
sqllocks_spindle.inference.lakehouse_profiler
LakehouseProfiler: profile Fabric Lakehouse tables without a Spark session.
Uses the deltalake library (part of the [fabric] extra) to read Delta tables over ABFSS directly from the local Python process.
Falls back to a REST API for table listing when deltalake is unavailable.
Requires: sqllocks-spindle[fabric] (deltalake>=0.17.0, pyarrow>=14.0)
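Here is a minimal sketch of that read path. It is not the profiler's own code. It assumes the standard OneLake ABFSS layout (`abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables/<table>`). It also assumes a pre-fetched token is handed to deltalake through `storage_options` using the object_store keys `bearer_token` and `use_fabric_endpoint`. The helper names `onelake_table_uri`, `storage_options` and `read_sample` are hypothetical.

```python
from typing import Callable, Optional

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_uri(workspace_id: str, lakehouse_id: str, table_name: str) -> str:
    """ABFSS URI of a managed Delta table (assumed layout: <lakehouse>/Tables/<table>)."""
    return f"abfss://{workspace_id}@{ONELAKE_HOST}/{lakehouse_id}/Tables/{table_name}"


def storage_options(token: str) -> dict[str, str]:
    # object_store config keys understood by deltalake: a pre-fetched bearer
    # token, plus routing requests to the OneLake (Fabric) endpoint.
    return {"bearer_token": token, "use_fabric_endpoint": "true"}


def read_sample(uri: str, token_provider: Callable[[], str], rows: Optional[int]):
    """Return up to `rows` rows (every row if None) as a pyarrow.Table, no Spark."""
    from deltalake import DeltaTable  # lazy import: only present with the [fabric] extra

    table = DeltaTable(uri, storage_options=storage_options(token_provider()))
    dataset = table.to_pyarrow_dataset()
    return dataset.to_table() if rows is None else dataset.head(rows)
```

Going through `to_pyarrow_dataset()` keeps the scan lazy. A row cap applied with `Dataset.head` should therefore stop reading once enough rows are collected, instead of materializing the whole table.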
Classes
LakehouseProfiler
Profile Fabric Lakehouse Delta tables and return TableProfile objects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `workspace_id` | `str` | Fabric workspace GUID. | required |
| `lakehouse_id` | `str` | Fabric lakehouse GUID. | required |
| `token_provider` | `Any \| None` | A callable returning an Azure access token string. Defaults to `DefaultAzureCredential` when `azure-identity` is installed. | `None` |
| `default_sample_rows` | `int \| None` | Row limit for profiling. Pass `None` to scan the entire table. | `100000` |
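`token_provider` only needs to be a zero-argument callable that returns a token string. Below is a sketch of one. It assumes a credential object with azure-identity's `get_token(scope)` shape, which returns an object exposing `.token` and `.expires_on`. It also assumes the storage scope `https://storage.azure.com/.default`. `make_token_provider` and its caching policy are hypothetical, not the library's default behaviour.

```python
import time
from typing import Any, Callable

STORAGE_SCOPE = "https://storage.azure.com/.default"  # assumed scope for OneLake data access


def make_token_provider(
    credential: Any,
    scope: str = STORAGE_SCOPE,
    refresh_margin: float = 300.0,
    clock: Callable[[], float] = time.time,
) -> Callable[[], str]:
    """Wrap a credential in a zero-argument callable that returns a token string.

    The token is cached and re-requested once it is within `refresh_margin`
    seconds of `expires_on` (epoch seconds).
    """
    cached: dict[str, Any] = {}

    def provider() -> str:
        tok = cached.get("tok")
        if tok is None or tok.expires_on - refresh_margin <= clock():
            tok = credential.get_token(scope)  # azure-identity style call
            cached["tok"] = tok
        return tok.token

    return provider
```

With azure-identity installed, `make_token_provider(DefaultAzureCredential())` could be passed as `token_provider=`. Caching matters because a profiling run may open many tables. Without it, each read could otherwise request a fresh token.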
Methods:
profile_table(table_name, sample_rows='default')
Profile a single Delta table.
profile_all(sample_rows='default')
Profile all tables in the lakehouse.
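Both methods default to `sample_rows='default'`, while the constructor carries `default_sample_rows`. Below is a sketch of how such a string sentinel is typically resolved. It assumes that `'default'` defers to the constructor value and that `None` means a full scan. `resolve_sample_rows` is hypothetical.

```python
from typing import Optional, Union

DEFAULT = "default"  # sentinel mirroring the documented signature default


def resolve_sample_rows(
    sample_rows: Union[int, None, str], default_sample_rows: Optional[int]
) -> Optional[int]:
    """Return the effective row cap; None means scan the entire table."""
    if sample_rows == DEFAULT:
        return default_sample_rows  # defer to the profiler-wide setting
    if sample_rows is None:
        return None  # explicit request for a full scan
    if isinstance(sample_rows, bool) or not isinstance(sample_rows, int) or sample_rows <= 0:
        raise ValueError(
            f"sample_rows must be a positive int, None, or {DEFAULT!r}; got {sample_rows!r}"
        )
    return sample_rows
```

A string sentinel lets callers distinguish "use the profiler-wide cap" from `None`, which already means "no cap".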
detect_foreign_keys(table_names=None, overlap_threshold=0.9, sample_rows='default', full_scan=False)
Sampled cross-table foreign-key (FK) detection (advisory). ADR-009 / STORY-016.
Reads each table's columns (sampled by default) and reuses the `DataProfiler._detect_foreign_keys_advisory` core across every table pair. That core combines a naming heuristic (`*_id` columns) with a value-overlap check (overlap >= `overlap_threshold`).
Detected FKs are advisory and are reported with their measured overlap. A declared
star_map / RelationshipDef remains authoritative and overrides them; that resolution
happens in the caller, not in this method.
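Here is a stdlib-only sketch of the pairwise heuristic described above, run over in-memory column samples. It is not the `DataProfiler._detect_foreign_keys_advisory` implementation. The following choices are assumptions made for illustration:

* matching child and parent columns by identical name;
* requiring the parent column to be unique;
* measuring overlap over distinct non-null child values.

The output follows the documented return shape.

```python
from collections.abc import Hashable, Sequence
from typing import Any

Table = dict[str, Sequence[Hashable]]  # column name -> sampled column values


def overlap(child: Sequence[Hashable], parent: Sequence[Hashable]) -> float:
    """Fraction of distinct non-null child values that also appear in the parent."""
    child_vals = {v for v in child if v is not None}
    if not child_vals:
        return 0.0
    parent_vals = {v for v in parent if v is not None}
    return len(child_vals & parent_vals) / len(child_vals)


def is_key_like(values: Sequence[Hashable]) -> bool:
    """Assumed parent-side rule: non-empty and no duplicate non-null values."""
    non_null = [v for v in values if v is not None]
    return bool(non_null) and len(set(non_null)) == len(non_null)


def detect_fks(
    tables: dict[str, Table], overlap_threshold: float = 0.9, full_scan: bool = False
) -> dict[str, dict[str, dict[str, Any]]]:
    found: dict[str, dict[str, dict[str, Any]]] = {}
    for child_name, child in tables.items():
        for col, values in child.items():
            if not col.endswith("_id"):
                continue  # naming heuristic: only *_id columns are candidates
            best = None
            for parent_name, parent in tables.items():
                if parent_name == child_name or col not in parent:
                    continue
                if not is_key_like(parent[col]):
                    continue  # the parent side must look like a key
                score = overlap(values, parent[col])
                if score >= overlap_threshold and (best is None or score > best[1]):
                    best = (parent_name, score)  # keep the best-overlapping parent
            if best is not None:
                found.setdefault(child_name, {})[col] = {
                    "parent_table": best[0],
                    "overlap": best[1],
                    "advisory": True,
                    "full_scan": full_scan,
                }
    return found
```

For example, `orders.customer_id = [1, 2, 2]` against a unique `customers.customer_id = [1, 2, 3, 4]` gives an overlap of 1.0. That pair is reported. The reverse direction is skipped because `orders.customer_id` is not unique.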
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `table_names` | `list[str] \| None` | Tables to scan. Defaults to all tables in the lakehouse. | `None` |
| `overlap_threshold` | `float` | Minimum child-to-parent value overlap required to report an FK (default 0.9, configurable per ADR-009). | `0.9` |
| `sample_rows` | `int \| None \| str` | Per-table row cap used when reading key columns. | `'default'` |
| `full_scan` | `bool` | Read entire tables (no sampling) to confirm a sampled result (ADR-009 full-scan option). | `False` |
Returns:
| Type | Description |
|---|---|
| `dict[str, dict[str, dict[str, Any]]]` | `{child_table: {col_name: {"parent_table": str, "overlap": float, "advisory": True, "full_scan": bool}}}` for every detected FK. |
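Because `full_scan=True` reads whole tables, one plausible workflow is to run the cheap sampled pass first and then re-check only the tables it implicated. The sketch below assumes `detect` is any callable that accepts the documented keyword arguments of `detect_foreign_keys`, for example a bound `profiler.detect_foreign_keys`. `confirm_sampled_fks` is hypothetical, and it keeps only FKs that both passes agree on.

```python
from typing import Any, Callable

FkResult = dict[str, dict[str, dict[str, Any]]]


def confirm_sampled_fks(
    detect: Callable[..., FkResult], overlap_threshold: float = 0.9
) -> FkResult:
    """Run a sampled pass, then confirm its findings with a targeted full scan."""
    sampled = detect(table_names=None, overlap_threshold=overlap_threshold, full_scan=False)

    # Tables implicated by the sample: every child table and every parent it points to.
    involved = set(sampled)
    for cols in sampled.values():
        involved.update(fk["parent_table"] for fk in cols.values())
    if not involved:
        return {}

    # Full scans are the expensive part, so restrict them to the implicated tables.
    full = detect(
        table_names=sorted(involved), overlap_threshold=overlap_threshold, full_scan=True
    )

    # Keep only FKs the sample proposed and the full scan still supports,
    # pointing at the same parent table.
    confirmed: FkResult = {}
    for child, cols in full.items():
        for col, fk in cols.items():
            proposed = sampled.get(child, {}).get(col)
            if proposed is not None and proposed["parent_table"] == fk["parent_table"]:
                confirmed.setdefault(child, {})[col] = fk
    return confirmed
```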
reconcile_declared_foreign_keys(detected, declared) `staticmethod`
Declared FKs override detected advisory FKs (ADR-009 / STORY-017).
A declared star_map / RelationshipDef is AUTHORITATIVE: where a
declaration exists for a (child_table, child_col) pair, it wins over any
detected FK, even one with high overlap. Detected FKs that a declaration
overrides are REPORTED (not silently dropped) for transparency.
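Here is a sketch of that precedence rule. The element type of `declared` is truncated in this reference, so the sketch assumes `(child_table, child_col, parent_table)` triples. The function name `reconcile_fks_sketch` and the output keys `"foreign_keys"`, `"overridden"` and `"source"` are hypothetical, not the documented return format.

```python
from typing import Any, Iterable

FkResult = dict[str, dict[str, dict[str, Any]]]


def reconcile_fks_sketch(
    detected: FkResult, declared: Iterable[tuple[str, str, str]]
) -> dict[str, Any]:
    """Declared FKs win; detected FKs they displace are reported, not dropped."""
    resolved: FkResult = {}
    overridden: list[dict[str, Any]] = []
    declared_keys: set[tuple[str, str]] = set()

    for child, col, parent in declared:
        declared_keys.add((child, col))
        resolved.setdefault(child, {})[col] = {"parent_table": parent, "source": "declared"}
        hit = detected.get(child, {}).get(col)
        if hit is not None:
            # Even a high-overlap detection yields to the declaration, but stays visible.
            overridden.append(
                {"child_table": child, "child_col": col,
                 "detected": hit, "declared_parent": parent}
            )

    # Detected FKs with no competing declaration pass through unchanged.
    for child, cols in detected.items():
        for col, fk in cols.items():
            if (child, col) not in declared_keys:
                resolved.setdefault(child, {})[col] = {**fk, "source": "detected"}

    return {"foreign_keys": resolved, "overridden": overridden}
```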
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `detected` | `dict[str, dict[str, dict[str, Any]]]` | the output of :meth: | required |
| `declared` | `Any` | iterable of | required |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Resolved declared entries carry |