profiler
sqllocks_spindle.inference.profiler
Data profiler: analyses pandas DataFrames and produces statistical profiles.
The profiler inspects column types, value distributions, cardinality, null rates, and inter-table foreign-key relationships to build a comprehensive DatasetProfile that the SchemaBuilder can convert into a SpindleSchema.
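As a rough illustration of the per-column statistics mentioned above (type, cardinality, null rate), the following sketch computes them for a plain Python list. It uses only the standard library rather than pandas, and `ColumnStats` and `summarize_column` are hypothetical names chosen for this example; they are not the library's `ColumnProfile` or its actual algorithm.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class ColumnStats:
    """Hypothetical stand-in for illustration; not the library's ColumnProfile."""
    name: str
    inferred_type: str
    null_rate: float
    cardinality: int


def summarize_column(name: str, values: Sequence[Optional[Any]]) -> ColumnStats:
    # Treat None as a missing value.
    non_null = [v for v in values if v is not None]
    null_rate = 1 - len(non_null) / len(values) if values else 0.0
    # Cardinality is the number of distinct non-null values.
    cardinality = len(set(non_null))
    # Report a single type name only when all non-null values agree.
    type_names = {type(v).__name__ for v in non_null}
    if len(type_names) == 1:
        inferred = type_names.pop()
    elif not type_names:
        inferred = "unknown"
    else:
        inferred = "mixed"
    return ColumnStats(name, inferred, null_rate, cardinality)


stats = summarize_column("status", ["open", "closed", None, "open"])
print(stats)
```

A real profiler would likely add more (value distributions, numeric ranges, pattern detection), but these three figures are the ones the description names explicitly.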
Classes
ColumnProfile (dataclass)
Statistical profile of a single column.
TableProfile (dataclass)
Profile of a single table (DataFrame).
DatasetProfile (dataclass)
Profile of a multi-table dataset.
DataProfiler
Analyse one or more DataFrames and produce profiles.
Methods:
profile_dataframe(df, table_name='table')
Profile a single DataFrame.
profile_dataset(tables)
Profile a dict of DataFrames and detect cross-table relationships.
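The source does not say how cross-table relationships are detected. One common heuristic, sketched below under that caveat, is value containment: a column is a candidate foreign key if all of its non-null values appear in a unique column of another table. The function name and the dict-of-lists table representation are assumptions made for this example, not part of the library.

```python
from typing import Dict, List, Tuple

# Hypothetical table representation: column name -> list of values.
Table = Dict[str, list]


def find_candidate_foreign_keys(
    tables: Dict[str, Table],
) -> List[Tuple[str, str, str, str]]:
    """Return (child_table, child_column, parent_table, parent_column) candidates."""
    candidates = []
    for parent_name, parent in tables.items():
        for parent_col, parent_vals in parent.items():
            non_null = [v for v in parent_vals if v is not None]
            # A referenced column must look like a key: non-empty and unique.
            if not non_null or len(set(non_null)) != len(non_null):
                continue
            parent_set = set(non_null)
            for child_name, child in tables.items():
                if child_name == parent_name:
                    continue
                for child_col, child_vals in child.items():
                    child_set = {v for v in child_vals if v is not None}
                    # Every child value must appear in the parent column.
                    if child_set and child_set <= parent_set:
                        candidates.append(
                            (child_name, child_col, parent_name, parent_col)
                        )
    return candidates


tables = {
    "customers": {"customer_id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 3, 2]},
}
links = find_candidate_foreign_keys(tables)
print(links)
```

Containment alone can produce false positives on small integer columns, so a production detector would probably also weigh column names and types.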
profile(df, table_name='table')
Alias for profile_dataframe(). Profile a single DataFrame.
from_csv(path, table_name=None, sample_rows=None, **kwargs) (classmethod)
Profile a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to the CSV file. | required |
| `table_name` | `str \| None` | Name for the table profile. Defaults to the filename stem. | `None` |
| `sample_rows` | `int \| None` | If set, sample this many rows before profiling. | `None` |
| `**kwargs` | | Passed to the DataProfiler constructor (fit_threshold, top_n_values, etc.). | `{}` |
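The `table_name` and `sample_rows` defaults described for `from_csv` can be illustrated with a small standard-library sketch. The helper `load_csv_rows` and its `seed` parameter are hypothetical, and the actual method presumably reads the file with pandas; how it samples rows is not stated in the source.

```python
import csv
import random
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple, Union


def load_csv_rows(
    path: Union[str, Path],
    table_name: Optional[str] = None,
    sample_rows: Optional[int] = None,
    seed: int = 0,  # hypothetical: makes the sample reproducible
) -> Tuple[str, List[dict]]:
    path = Path(path)
    # Fall back to the filename stem, e.g. "orders.csv" -> "orders".
    name = table_name if table_name is not None else path.stem
    with path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    # Sample only when a limit is given and the file has more rows than that.
    if sample_rows is not None and sample_rows < len(rows):
        rows = random.Random(seed).sample(rows, sample_rows)
    return name, rows


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "orders.csv"
    csv_path.write_text("order_id,amount\n1,9.5\n2,3.0\n3,7.25\n")
    name, rows = load_csv_rows(csv_path, sample_rows=2)

print(name, len(rows))
```

Sampling before profiling trades some statistical accuracy for speed on large files, which is presumably why `sample_rows` is optional and off by default.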