chunked_generator

`sqllocks_spindle.engine.chunked_generator`

Chunked generation engine for billion-row scale.

Two-pass approach:

1. Generate ALL parent/dimension/reference tables fully (in-memory).
2. For each child table in dependency order, yield `chunk_size` rows at a time. Each chunk shares the same `IDManager`, so FK references are valid.
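The two passes can be illustrated with a minimal, library-independent sketch. Every name in it (`IdRegistry`, `generate_parents`, `iter_child_chunks`, the `customer`/`order` tables) is hypothetical and only stands in for the real `IDManager` and generator internals, which this page does not document. It is meant to show why sharing one ID registry across chunks keeps foreign keys valid, not how the engine is actually implemented.

```python
import random
from typing import Iterator


class IdRegistry:
    """Hypothetical stand-in for the shared IDManager: remembers parent keys."""

    def __init__(self) -> None:
        self._ids: dict[str, list[int]] = {}

    def register(self, table: str, ids: list[int]) -> None:
        self._ids[table] = ids

    def sample(self, table: str, rng: random.Random) -> int:
        return rng.choice(self._ids[table])


def generate_parents(registry: IdRegistry, n_customers: int) -> list[dict]:
    # Pass 1: the parent table is small, so build it fully in memory.
    rows = [{"customer_id": i} for i in range(1, n_customers + 1)]
    registry.register("customer", [r["customer_id"] for r in rows])
    return rows


def iter_child_chunks(
    registry: IdRegistry, total_rows: int, chunk_size: int, seed: int = 0
) -> Iterator[list[dict]]:
    # Pass 2: yield the child table chunk by chunk, so only one chunk
    # needs to be held in memory at a time.
    rng = random.Random(seed)
    next_id = 1
    while next_id <= total_rows:
        n = min(chunk_size, total_rows - next_id + 1)
        yield [
            {"order_id": next_id + k, "customer_id": registry.sample("customer", rng)}
            for k in range(n)
        ]
        next_id += n


registry = IdRegistry()
customers = generate_parents(registry, n_customers=5)
chunks = list(iter_child_chunks(registry, total_rows=10, chunk_size=4))
```

Because every chunk draws its foreign keys from the same registry, a row in the last chunk references the same parent key space as a row in the first.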

Classes

`ChunkedGenerationResult` `dataclass`

Result of a chunked generation run.

Parent tables are fully materialized (small). Child tables are available only via `iter_chunks()` to keep memory bounded.

Methods:

`iter_chunks(table_name)`

Yield DataFrames of chunk_size rows for a child table.

Must be called in dependency order. Each table can only be iterated once.
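Since each table can be iterated only once, a second pass over the same table needs either a fresh generation run or a persisted copy of the chunks. The sketch below shows how such a one-shot guard could behave. `OneShotChunks` is a hypothetical class, and raising `RuntimeError` on reuse is an assumption; this page does not state how the real `iter_chunks` signals a repeated call.

```python
from typing import Iterable, Iterator


class OneShotChunks:
    """Hypothetical wrapper: hands out each table's chunk stream at most once."""

    def __init__(self, sources: dict[str, Iterable[list[int]]]) -> None:
        self._sources = dict(sources)
        self._consumed: set[str] = set()

    def iter_chunks(self, table_name: str) -> Iterator[list[int]]:
        if table_name in self._consumed:
            # Assumption: the real library may signal this differently.
            raise RuntimeError(f"{table_name!r} has already been iterated")
        self._consumed.add(table_name)
        return iter(self._sources[table_name])


result = OneShotChunks({"orders": [[1, 2], [3]]})
first_pass = list(result.iter_chunks("orders"))

try:
    list(result.iter_chunks("orders"))
    second_pass_failed = False
except RuntimeError:
    second_pass_failed = True
```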

`write_with(writer, **kwargs)`

Convenience: write parent tables, then stream child chunks through a writer.

The writer must implement one of the following:

- `write_table(table_name, df, **kwargs)` for individual DataFrames, or
- `stage_chunk(table_name, chunk_df, idx)` plus `copy_into(table_name)` for bulk writers.
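The two writer shapes can be expressed as structural protocols. The sketch below is illustrative only: `TableWriter`, `BulkWriter`, `RecordingBulkWriter`, and `stream_table` are hypothetical names. Preferring the bulk path when both interfaces are present, and calling `copy_into` once after all chunks are staged, are assumptions rather than documented behavior of `write_with`.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TableWriter(Protocol):
    def write_table(self, table_name: str, df: Any, **kwargs: Any) -> None: ...


@runtime_checkable
class BulkWriter(Protocol):
    def stage_chunk(self, table_name: str, chunk_df: Any, idx: int) -> None: ...
    def copy_into(self, table_name: str) -> None: ...


class RecordingBulkWriter:
    """Toy bulk writer that records calls instead of touching a database."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def stage_chunk(self, table_name: str, chunk_df: Any, idx: int) -> None:
        self.calls.append(("stage", table_name, idx))

    def copy_into(self, table_name: str) -> None:
        self.calls.append(("copy", table_name))


def stream_table(writer: Any, table_name: str, chunks: Any, **kwargs: Any) -> None:
    # Assumed dispatch: use the bulk path when it is available.
    if isinstance(writer, BulkWriter):
        for idx, chunk in enumerate(chunks):
            writer.stage_chunk(table_name, chunk, idx)
        writer.copy_into(table_name)  # assumed: one bulk load per table
    elif isinstance(writer, TableWriter):
        for chunk in chunks:
            writer.write_table(table_name, chunk, **kwargs)
    else:
        raise TypeError("writer implements neither supported interface")


writer = RecordingBulkWriter()
stream_table(writer, "orders", chunks=[["row1"], ["row2"]])
```

Checking the interface up front, before any chunk is generated, means a misconfigured writer fails immediately instead of partway through a long run.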

`ChunkedSpindle`

Generate billion-row datasets in bounded memory.

Uses a two-pass approach:

1. Parent tables are generated fully in-memory (typically small).
2. Child tables are generated in chunks of `chunk_size` rows.

Example::

    from sqllocks_spindle.engine.chunked_generator import ChunkedSpindle

    # FinancialDomain and writer are assumed to be imported/created elsewhere.
    cs = ChunkedSpindle()
    result = cs.generate_chunked(
        domain=FinancialDomain(),
        scale="warehouse",
        chunk_size=1_000_000,
    )

    # Parent tables are immediately available
    for name, df in result.parent_tables.items():
        print(f"{name}: {len(df)} rows")

    # Child tables stream via iterator (each table can be iterated only once)
    for table_name in result.child_table_names:
        for chunk in result.iter_chunks(table_name):
            writer.write(chunk)

Methods:

`generate_chunked(domain=None, schema=None, scale=None, scale_overrides=None, seed=None, chunk_size=1000000, target_table=None, target_count=None)`

Generate data with chunked child tables.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `domain` | | A Domain instance. | `None` |
| `schema` | `Any` | Path to .spindle.json, raw dict, or parsed SpindleSchema. | `None` |
| `scale` | `str \| None` | Scale preset name. | `None` |
| `scale_overrides` | `dict[str, int] \| None` | Override row counts for specific tables. | `None` |
| `seed` | `int \| None` | Random seed for reproducibility. | `None` |
| `chunk_size` | `int` | Rows per chunk for child tables. | `1000000` |
| `target_table` | `str \| None` | Anchor table name; all other table counts are derived proportionally from this table's `target_count`. | `None` |
| `target_count` | `int \| None` | Number of rows for the anchor table. Required when `target_table` is provided. | `None` |
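To make the `target_table` / `target_count` relationship concrete, the sketch below works through the arithmetic of proportional scaling and the resulting number of chunks. The helper names (`derive_counts`, `chunk_count`), the baseline row counts, and the rounding rule are all assumptions for illustration; the page only states that other table counts are derived proportionally from the anchor table.

```python
def derive_counts(
    base_counts: dict[str, int], target_table: str, target_count: int
) -> dict[str, int]:
    # Assumed rule: scale every table by target_count / anchor's baseline count.
    if target_table not in base_counts:
        raise KeyError(target_table)
    factor = target_count / base_counts[target_table]
    return {name: max(1, round(n * factor)) for name, n in base_counts.items()}


def chunk_count(rows: int, chunk_size: int) -> int:
    # Ceiling division: a final partial chunk still counts as one chunk.
    return (rows + chunk_size - 1) // chunk_size


# Hypothetical baseline ratios of 1 : 5 : 20 between the three tables.
base = {"customer": 1_000, "order": 5_000, "order_line": 20_000}
counts = derive_counts(base, target_table="order", target_count=50_000_000)
n_chunks = chunk_count(counts["order_line"], chunk_size=1_000_000)
```

Under these assumed ratios, anchoring `order` at 50 million rows gives 10 million `customer` rows and 200 million `order_line` rows, which stream as 200 chunks of one million rows each.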

Returns:

| Type | Description |
|---|---|
| `ChunkedGenerationResult` | A `ChunkedGenerationResult` with parent tables materialized and child tables available via `iter_chunks()`. |
