file_drop
sqllocks_spindle.simulation.file_drop
File-drop simulator — simulate upstream sources landing files over a date range.
Classes
FileDropConfig
dataclass
Configuration for file-drop simulation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `domain` | `str` | Domain name used in path construction (e.g. …). | `'default'` |
| `base_path` | `str` | Root directory for landing files. Maps to … | `'Files/landing'` |
| `cadence` | `str` | Drop cadence: … | `'daily'` |
| `date_range_start` | `str` | Inclusive start date as … | `''` |
| `date_range_end` | `str` | Inclusive end date as … | `''` |
| `partitioning` | `str` | Partition folder template. | `'dt=YYYY-MM-DD'` |
| `formats` | `list[str]` | File formats to write (…). | `(lambda: ['parquet'])()` |
| `file_naming` | `str` | File naming template. Placeholders: … | `'{domain}_{entity}_{dt}_{seq}.{ext}'` |
| `entities` | `list[str]` | Restrict simulation to these table names. Empty = all tables. | `list()` |
| `manifest_enabled` | `bool` | Write a … | `True` |
| `done_flag_enabled` | `bool` | Write a … | `True` |
| `lateness_enabled` | `bool` | Inject late-arriving rows (data from previous days). | `True` |
| `lateness_probability` | `float` | Per-row probability of being marked late. | `0.1` |
| `max_days_late` | `int` | Maximum staleness for late rows. | `3` |
| `duplicates_enabled` | `bool` | Inject duplicate rows. | `False` |
| `duplicate_probability` | `float` | Per-row probability of duplication. | `0.02` |
| `backfill_enabled` | `bool` | Re-drop historical partitions. | `False` |
| `max_days_back` | `int` | How far back a backfill can reach. | `0` |
| `seed` | `int` | Random seed for reproducibility. | `42` |
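To make the `partitioning` and `file_naming` defaults concrete, here is a minimal sketch of how the two templates might combine into one landing path. It does not use the library: `render_drop_path` is a hypothetical helper, and the directory layout (`base_path/domain/entity/partition/file`), the `YYYY-MM-DD` token substitution, and the use of `str.format` are all assumptions rather than documented behavior.

```python
from datetime import date
from pathlib import PurePosixPath


def render_drop_path(base_path, domain, entity, day, seq, ext,
                     partitioning="dt=YYYY-MM-DD",
                     file_naming="{domain}_{entity}_{dt}_{seq}.{ext}"):
    """Hypothetical helper: expand both templates into a single file path."""
    dt = day.isoformat()
    # Assumption: the literal token YYYY-MM-DD is replaced by the ISO date.
    partition = partitioning.replace("YYYY-MM-DD", dt)
    # Assumption: file_naming behaves like a str.format template.
    name = file_naming.format(domain=domain, entity=entity, dt=dt, seq=seq, ext=ext)
    # Assumption: files land under base_path/domain/entity/partition.
    return PurePosixPath(base_path) / domain / entity / partition / name


path = render_drop_path("Files/landing", "retail", "orders",
                        date(2024, 1, 1), 1, "parquet")
print(path)
# Files/landing/retail/orders/dt=2024-01-01/retail_orders_2024-01-01_1.parquet
```

The real simulator may order the path segments differently or zero-pad `seq`; check the files it writes before depending on this layout.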
FileDropResult
dataclass
Result of a file-drop simulation run.
Attributes:
| Name | Type | Description |
|---|---|---|
| `files_written` | `list[Path]` | All data file paths written. |
| `manifest_paths` | `list[Path]` | Paths to … |
| `done_flag_paths` | `list[Path]` | Paths to … |
| `stats` | `dict[str, Any]` | Per-entity statistics dict. |
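As an illustration of consuming these attributes, the sketch below counts written files per partition folder. `ResultStandIn` is a stand-in dataclass that only mirrors the four attribute names and types listed above; it is not the library's `FileDropResult`. The `files_per_partition` helper and the example paths are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class ResultStandIn:
    """Stand-in mirroring the documented attributes (not the real class)."""
    files_written: list[Path] = field(default_factory=list)
    manifest_paths: list[Path] = field(default_factory=list)
    done_flag_paths: list[Path] = field(default_factory=list)
    stats: dict[str, Any] = field(default_factory=dict)


def files_per_partition(result):
    """Count data files by the name of their parent directory."""
    return Counter(p.parent.name for p in result.files_written)


# Hypothetical paths; the real layout depends on the configuration.
demo = ResultStandIn(files_written=[
    Path("Files/landing/retail/orders/dt=2024-01-01/retail_orders_2024-01-01_1.parquet"),
    Path("Files/landing/retail/orders/dt=2024-01-01/retail_orders_2024-01-01_2.parquet"),
    Path("Files/landing/retail/orders/dt=2024-01-02/retail_orders_2024-01-02_1.parquet"),
])
counts = files_per_partition(demo)
print(dict(counts))
```

Because the helper only reads `files_written`, it should work unchanged on any object that exposes that attribute as a list of `Path` values.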
FileDropSimulator
Simulate an upstream source dropping files on a cadence over a date range.
For each simulated time slot, the simulator:
- Slices the rows belonging to that slot (by temporal column, or round-robin).
- Writes partitioned data files to disk.
- Optionally writes a manifest and a done-flag.
- Optionally injects late arrivals, duplicates, and backfills.
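The per-slot loop above can be sketched with the standard library alone. This is not the simulator's implementation: `daily_slots` and `assign_landing_days` are hypothetical helpers, and the late-arrival rule (delay a row by 1 to `max_days_late` days with probability `lateness_probability`, clamped to the end of the range) is an assumption based only on the parameter descriptions.

```python
import random
from datetime import date, timedelta


def daily_slots(start, end):
    """Yield every date from start to end, both inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def assign_landing_days(event_days, end, lateness_probability=0.1,
                        max_days_late=3, seed=42):
    """Map each row's event date to the slot it lands in (assumed rule)."""
    rng = random.Random(seed)  # a fixed seed makes the run reproducible
    landed = []
    for event_day in event_days:
        delay = 0
        if rng.random() < lateness_probability:
            delay = rng.randint(1, max_days_late)
        # Clamp so a late row never lands after the simulated range ends.
        landed.append(min(event_day + timedelta(days=delay), end))
    return landed


start, end = date(2024, 1, 1), date(2024, 1, 31)
slots = list(daily_slots(start, end))
# Round-robin placement for rows without a temporal column.
events = [slots[i % len(slots)] for i in range(1000)]
landing = assign_landing_days(events, end)
late_rows = sum(1 for e, l in zip(events, landing) if l > e)
print(len(slots), "slots;", late_rows, "rows landed late")
```

Rows whose landing day differs from their event day would be written into a later partition, which is the situation downstream deduplication and watermark logic needs to handle.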
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tables` | `dict[str, DataFrame]` | Mapping of … | required |
| `config` | `FileDropConfig` | :class: … | required |
Example::

    from sqllocks_spindle.simulation import FileDropSimulator, FileDropConfig

    # gen_result is assumed to come from an earlier generation step and to
    # expose a `tables` dict of DataFrames.
    cfg = FileDropConfig(
        domain="retail",
        date_range_start="2024-01-01",
        date_range_end="2024-01-31",
    )
    result = FileDropSimulator(tables=gen_result.tables, config=cfg).run()