profiler

sqllocks_spindle.inference.profiler

Data profiler — analyse pandas DataFrames and produce statistical profiles.

The profiler inspects column types, value distributions, cardinality, null rates, and inter-table foreign-key relationships to build a comprehensive DatasetProfile that the SchemaBuilder can convert into a SpindleSchema.
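To make the per-column statistics concrete, here is a minimal sketch of how a null rate, a cardinality, and a coarse type label could be computed for one column. It is an illustration only, not the library's implementation: `ColumnSketch` and `sketch_column` are hypothetical names, plain Python lists stand in for a pandas column, and value distributions are left out.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class ColumnSketch:
    """Hypothetical stand-in for a column profile (not the library's ColumnProfile)."""
    name: str
    inferred_type: str
    null_rate: float
    cardinality: int


def sketch_column(name: str, values: Sequence[Optional[Any]]) -> ColumnSketch:
    # Treat None as the only missing-value marker (an assumption of this sketch).
    non_null = [v for v in values if v is not None]
    # Null rate: share of missing entries; 0.0 for an empty column.
    null_rate = 1 - len(non_null) / len(values) if values else 0.0
    # Type label: the one Python type shared by all non-null values, else "mixed".
    type_names = {type(v).__name__ for v in non_null}
    if len(type_names) == 1:
        inferred = type_names.pop()
    else:
        inferred = "mixed" if type_names else "unknown"
    # Cardinality: number of distinct non-null values.
    return ColumnSketch(name, inferred, null_rate, len(set(non_null)))


col = sketch_column("age", [31, 45, None, 31])
print(col)
```

Here the column has one missing value out of four and two distinct non-null values, so the sketch reports a null rate of 0.25 and a cardinality of 2.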

Classes

ColumnProfile dataclass

Statistical profile of a single column.

TableProfile dataclass

Profile of a single table (DataFrame).

DatasetProfile dataclass

Profile of a multi-table dataset.

DataProfiler

Analyse one or more DataFrames and produce profiles.

Methods:

profile_dataframe(df, table_name='table')

Profile a single DataFrame.

profile_dataset(tables)

Profile a dict of DataFrames and detect cross-table relationships.
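The page does not say how cross-table relationships are detected. One common heuristic, sketched here purely as an assumption, is value containment: a column is a foreign-key candidate if all of its non-null values appear in a unique column of another table. `guess_foreign_keys` is a hypothetical helper, and dicts of lists stand in for DataFrames.

```python
from typing import Dict, List, Tuple

# Column name -> list of values; a stand-in for a DataFrame in this sketch.
Table = Dict[str, list]


def guess_foreign_keys(tables: Dict[str, Table]) -> List[Tuple[str, str, str, str]]:
    """Return (child_table, child_col, parent_table, parent_col) candidates."""
    candidates = []
    for parent_name, parent in tables.items():
        for parent_col, parent_vals in parent.items():
            non_null = [v for v in parent_vals if v is not None]
            # A parent key column must be non-empty, fully populated, and unique.
            if not non_null or len(set(non_null)) != len(parent_vals):
                continue
            keys = set(non_null)
            for child_name, child in tables.items():
                if child_name == parent_name:
                    continue
                for child_col, child_vals in child.items():
                    child_keys = {v for v in child_vals if v is not None}
                    # Containment test: every child value must exist in the parent key.
                    if child_keys and child_keys <= keys:
                        candidates.append((child_name, child_col, parent_name, parent_col))
    return candidates


tables = {
    "customers": {"id": [1, 2, 3], "name": ["a", "b", "c"]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 3, None]},
}
fks = guess_foreign_keys(tables)
print(fks)
```

Containment alone can produce false positives, for example two unrelated integer columns with overlapping ranges, so a real profiler would likely combine it with column names or type checks.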

profile(df, table_name='table')

Alias for profile_dataframe(); profiles a single DataFrame.

from_csv(path, table_name=None, sample_rows=None, **kwargs) classmethod

Profile a CSV file.

Parameters:

path (str | Path, required)
    Path to the CSV file.

table_name (str | None, default: None)
    Name for the table profile. Defaults to the filename stem.

sample_rows (int | None, default: None)
    If set, sample this many rows before profiling.

**kwargs (default: {})
    Passed to the DataProfiler constructor (fit_threshold, top_n_values, etc.).
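The two defaults described above, naming the table after the filename stem and optionally sampling rows, can be sketched with the standard library alone. This is not the library's code: `load_for_profiling` and its `seed` argument are hypothetical, and the page does not state whether sampling is random or takes the first rows, so random sampling is an assumption here.

```python
import csv
import random
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def load_for_profiling(path, table_name: Optional[str] = None,
                       sample_rows: Optional[int] = None,
                       seed: int = 0) -> Tuple[str, List[dict]]:
    path = Path(path)
    # Default the table name to the filename stem, e.g. "orders.csv" -> "orders".
    name = table_name if table_name is not None else path.stem
    with path.open(newline="") as fh:
        rows = list(csv.DictReader(fh))
    # Sample only when asked to and when the file has more rows than requested.
    if sample_rows is not None and sample_rows < len(rows):
        rows = random.Random(seed).sample(rows, sample_rows)
    return name, rows


# Write a small CSV to a temporary directory and load it with sampling.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "orders.csv"
    csv_path.write_text("order_id,amount\n1,9.5\n2,20.0\n3,4.25\n")
    name, rows = load_for_profiling(csv_path, sample_rows=2)

print(name, len(rows))
```

Sampling before profiling trades some accuracy in rare-value and cardinality estimates for speed on large files.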