chunk_worker

sqllocks_spindle.engine.chunk_worker

Subprocess worker function for multi-process chunk generation.

Functions:

generate_chunk(schema_path, seed, offset, count)

Generate a chunk of synthetic data for all tables in a schema.

Designed to run inside a ProcessPoolExecutor worker. The function must be importable at module top level (no closure captures) and must return plain Python lists so that the result pickles cleanly across process boundaries.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| schema_path | str | Path to a .spindle.json file. | required |
| seed | int | Random seed for this chunk. Each chunk should use a unique seed derived from the base seed and chunk index so that chunks are independent but individually reproducible. | required |
| offset | int | Row offset for PK sequence columns. A chunk with offset=N and count=M produces M sequence IDs starting at start + N * step (e.g. offset=100, count=100, start=1, step=1 → IDs 101-200). | required |
| count | int | Number of rows to generate per table. | required |
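
The seed and offset rules above can be sketched as a small planning step that runs in the parent process before any worker is started. This is only an illustration: `ChunkPlan`, `derive_seed`, `plan_chunks`, and `sequence_id_range` are hypothetical helpers, not part of `sqllocks_spindle`, and the hash-based seed derivation is an assumed scheme (the documentation only requires that each seed be derived from the base seed and the chunk index).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkPlan:
    """Arguments for one worker call (hypothetical helper, not library API)."""
    seed: int
    offset: int
    count: int


def derive_seed(base_seed: int, chunk_index: int) -> int:
    """Derive a reproducible 32-bit seed from base seed and chunk index.

    Assumed scheme: hash both values and keep the first 4 bytes. The same
    inputs always give the same seed, and different chunk indexes give
    different seeds in practice.
    """
    digest = hashlib.sha256(f"{base_seed}:{chunk_index}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def plan_chunks(total_rows: int, chunk_size: int, base_seed: int) -> list[ChunkPlan]:
    """Split total_rows into consecutive chunks with per-chunk seeds."""
    plans = []
    for index, offset in enumerate(range(0, total_rows, chunk_size)):
        # The last chunk may be shorter than chunk_size.
        count = min(chunk_size, total_rows - offset)
        plans.append(ChunkPlan(derive_seed(base_seed, index), offset, count))
    return plans


def sequence_id_range(offset: int, count: int, start: int = 1, step: int = 1) -> tuple[int, int]:
    """First and last sequence ID of a chunk, following start + offset * step."""
    first = start + offset * step
    last = start + (offset + count - 1) * step
    return first, last
```

With `plan_chunks(250, 100, base_seed=42)`, the three plans cover offsets 0, 100, and 200 with counts 100, 100, and 50, and `sequence_id_range(100, 100)` gives `(101, 200)`, matching the example in the `offset` description. Each plan's `seed`, `offset`, and `count` would then be passed to `generate_chunk` together with the schema path.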

Returns:

| Type | Description |
| --- | --- |
| dict[str, dict[str, list]] | {table_name: {column_name: [values...]}}, using plain Python lists, NOT numpy arrays. |
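
Because every chunk comes back in the same nested shape, the parent process can combine results with ordinary list concatenation. The sketch below assumes the chunks are merged in offset order; `merge_chunks` is a hypothetical helper and the sample data is made up for illustration, not output from the library.

```python
import pickle


def merge_chunks(chunks):
    """Concatenate per-chunk results of the form {table: {column: [values]}}.

    Chunks are appended in the order given, so pass them sorted by offset
    to keep sequence IDs in ascending order.
    """
    merged = {}
    for chunk in chunks:
        for table, columns in chunk.items():
            target = merged.setdefault(table, {})
            for column, values in columns.items():
                target.setdefault(column, []).extend(values)
    return merged


# Made-up results standing in for two worker return values.
chunk_a = {"customer": {"id": [1, 2], "name": ["Ana", "Bo"]}}
chunk_b = {"customer": {"id": [3, 4], "name": ["Cy", "Di"]}}

# Plain dicts and lists survive a pickle round trip unchanged, which is
# what lets the result cross the process boundary.
assert pickle.loads(pickle.dumps(chunk_a)) == chunk_a

merged = merge_chunks([chunk_a, chunk_b])
```

Here `merged["customer"]["id"]` is `[1, 2, 3, 4]`. If a columnar or array format is needed, the conversion can happen once on the merged result in the parent process rather than inside each worker.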