scd2_file_drops
sqllocks_spindle.simulation.scd2_file_drops
SCD Type 2 file-drop simulator: generates an initial full load and daily deltas with SCD2-style versioning (valid_from / valid_to / is_current tracking).
Produces an initial snapshot followed by num_delta_days daily delta files, each containing INSERT rows for new business entities and UPDATE pairs for changed entities (expired old row + new current row).
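The UPDATE pair described above can be illustrated with a minimal, standard-library sketch. It is not the simulator's own code: the helper name `make_update_pair` is hypothetical, the column names are the documented defaults, and using `None` for an open-ended `valid_to` is an assumption (the real files may use a sentinel date instead).

```python
from datetime import date


def make_update_pair(current_row, changes, change_date):
    """Return (expired_row, new_current_row) for one changed entity.

    Hypothetical helper: closes the existing version and opens a new one.
    """
    # Expired copy of the old version: closed on the change date.
    expired = dict(current_row)
    expired["valid_to"] = change_date
    expired["is_current"] = False

    # New version: carries the changed attributes and becomes current.
    new_row = dict(current_row)
    new_row.update(changes)
    new_row["valid_from"] = change_date
    new_row["valid_to"] = None  # assumption: None marks an open-ended row
    new_row["is_current"] = True
    return expired, new_row


row = {
    "customer_id": 7,
    "tier": "silver",
    "valid_from": date(2024, 1, 1),
    "valid_to": None,
    "is_current": True,
}
expired, current = make_update_pair(row, {"tier": "gold"}, date(2024, 1, 5))
```

Both rows share the same business key, so a consumer can pair them up when it loads a delta file.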
Classes
SCD2FileDropConfig
dataclass
Configuration for SCD2 file-drop simulation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `domain` | `str` | Domain name used in path construction (e.g. | `'default'` |
| `base_path` | `str` | Root directory for landing files. | `'Files/landing'` |
| `business_key_column` | `str` | Column that identifies the business entity. | `'id'` |
| `scd2_columns` | `list[str]` | Columns to track for changes. | `list()` |
| `effective_date_column` | `str` | Name of the valid-from column. | `'valid_from'` |
| `end_date_column` | `str` | Name of the valid-to column. | `'valid_to'` |
| `is_current_column` | `str` | Name of the is-current flag column. | `'is_current'` |
| `initial_load_date` | `str` | Date string for the initial snapshot ( | `'2024-01-01'` |
| `num_delta_days` | `int` | Number of daily delta files to generate. | `30` |
| `daily_change_rate` | `float` | Fraction of records that change per day. | `0.05` |
| `daily_new_rate` | `float` | Fraction of new records per day (relative to initial count). | `0.02` |
| `formats` | `list[str]` | File formats to write ( | `(lambda: ['parquet'])()` |
| `manifest_enabled` | `bool` | Write a | `True` |
| `seed` | `int` | Random seed for reproducibility. | `42` |
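As a rough guide to how the default rates translate into file sizes, the arithmetic below estimates rows per delta file. It is a sketch under stated assumptions: the initial row count of 1,000 is hypothetical, the change rate is assumed to apply to the initial count (the documentation only says this for `daily_new_rate`), and the rounding behaviour is a guess rather than the simulator's documented logic.

```python
# Hypothetical size of the initial snapshot; not taken from the library.
initial_count = 1_000

# Documented defaults from SCD2FileDropConfig.
daily_change_rate = 0.05
daily_new_rate = 0.02
num_delta_days = 30

# New entities per day, relative to the initial count.
new_per_day = round(initial_count * daily_new_rate)

# Changed entities per day (assumption: also relative to the initial count).
changed_per_day = round(initial_count * daily_change_rate)

# Each change lands as an UPDATE pair: expired old row + new current row.
rows_per_delta = new_per_day + 2 * changed_per_day
total_delta_rows = rows_per_delta * num_delta_days
```

Under these assumptions each daily file holds 20 INSERT rows plus 100 UPDATE rows, or 120 rows per day and 3,600 rows across the 30 delta files.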
SCD2FileDropResult
dataclass
Result of an SCD2 file-drop simulation run.
Attributes:
| Name | Type | Description |
|---|---|---|
| `initial_load_path` | `Path` | Path to the initial full-load file. |
| `delta_paths` | `list[Path]` | Paths to daily delta files. |
| `manifest_paths` | `list[Path]` | Paths to |
| `stats` | `dict[str, Any]` | Aggregate statistics for the simulation run. |
SCD2FileDropSimulator
Simulate an upstream source landing SCD2-versioned files over time.
Generates an initial full snapshot and then daily delta files containing INSERT rows (new entities) and UPDATE rows (changed entities with valid_from / valid_to / is_current tracking).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tables` | `dict[str, DataFrame]` | Mapping of | required |
| `config` | `SCD2FileDropConfig` | :class: | required |
Example::

    from sqllocks_spindle.simulation.scd2_file_drops import (
        SCD2FileDropSimulator,
        SCD2FileDropConfig,
    )

    cfg = SCD2FileDropConfig(
        domain="retail",
        business_key_column="customer_id",
        scd2_columns=["status", "address", "tier"],
    )
    # gen_result comes from an earlier generation step; its .tables attribute
    # supplies the dict[str, DataFrame] that the simulator expects.
    result = SCD2FileDropSimulator(tables=gen_result.tables, config=cfg).run()
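On the consuming side, a downstream loader would fold each daily delta into its version history. The sketch below shows one way to do that with plain dictionaries. The function `apply_delta` is hypothetical and not part of `sqllocks_spindle`, and the row layout is assumed: expired rows arrive with `is_current` set to `False` and keep their original `valid_from`, and dates are ISO strings.

```python
def apply_delta(history, delta_rows, key="customer_id"):
    """Fold one day's delta rows into a list of SCD2 version rows.

    Hypothetical helper; assumes expired rows keep their original valid_from.
    """
    result = list(history)
    for row in delta_rows:
        if row["is_current"]:
            # INSERT row, or the new current half of an UPDATE pair.
            result.append(row)
            continue
        # Expired half of an UPDATE pair: overwrite the matching open version.
        for i, existing in enumerate(result):
            same_version = (
                existing[key] == row[key]
                and existing["valid_from"] == row["valid_from"]
            )
            if same_version:
                result[i] = row
                break
    return result


history = [
    {"customer_id": 1, "tier": "silver", "valid_from": "2024-01-01",
     "valid_to": None, "is_current": True},
]
delta = [
    {"customer_id": 1, "tier": "silver", "valid_from": "2024-01-01",
     "valid_to": "2024-01-02", "is_current": False},
    {"customer_id": 1, "tier": "gold", "valid_from": "2024-01-02",
     "valid_to": None, "is_current": True},
    {"customer_id": 2, "tier": "bronze", "valid_from": "2024-01-02",
     "valid_to": None, "is_current": True},
]
updated = apply_delta(history, delta)
current_rows = [r for r in updated if r["is_current"]]
```

Matching on business key plus `valid_from` keeps the fold independent of the order in which the two halves of an UPDATE pair appear in the file.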