strategies

sqllocks_spindle.engine.strategies

Column generation strategies.

Classes

Strategy

Bases: ABC

Base class for all column generation strategies.

Methods:
generate(column, config, ctx) abstractmethod

Generate values for a column.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| column | ColumnDef | The column definition from the schema. | required |
| config | dict[str, Any] | The generator config dict from the column def. | required |
| ctx | GenerationContext | The generation context with RNG, ID manager, etc. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | numpy array of generated values with length ctx.row_count. |

apply_nulls(values, column, ctx)

Apply null values based on the column's null_rate.
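The null-masking idea can be sketched as follows. This is a minimal stand-in, not the library's implementation: the real method takes (values, column, ctx) and works on numpy arrays, while the hypothetical apply_nulls_sketch below takes the rate and a random.Random instance directly and works on a plain list.

```python
import random


def apply_nulls_sketch(values, null_rate, rng):
    """Return a copy of values with roughly null_rate of the entries set to None.

    Hypothetical helper: the signature and the per-row draw are assumptions.
    """
    if not 0.0 <= null_rate <= 1.0:
        raise ValueError("null_rate must be between 0 and 1")
    # One uniform draw per row; rows whose draw falls below null_rate become null.
    return [None if rng.random() < null_rate else v for v in values]


rng = random.Random(42)  # seeded so repeated runs give the same mask
masked = apply_nulls_sketch(list(range(10)), null_rate=0.3, rng=rng)
```

Because each row is decided independently, the observed share of nulls only approximates null_rate on small tables.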

StrategyRegistry

Registry mapping strategy names to Strategy instances.

Methods:
load_entrypoint_plugins(group='spindle.strategies')

Discover and register strategies from installed entrypoint plugins.

Third-party packages can register custom strategies by defining an entrypoint in their pyproject.toml:

[project.entry-points."spindle.strategies"]
my_strategy = "my_package.strategies:MyStrategy"

The entrypoint value must be a class implementing Strategy.

SequenceStrategy

Bases: Strategy

Generate auto-incrementing integer sequences.

UUIDStrategy

Bases: Strategy

Generate UUID v4 strings.

FakerStrategy

Bases: Strategy

Generate realistic fake data using Faker or native vectorized fallback.

WeightedEnumStrategy

Bases: Strategy

Pick from a weighted list of values.
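A minimal sketch of weighted picking, using the standard library instead of numpy. The function name and its parameters are assumptions made for illustration; the strategy's actual config keys are not listed in this reference.

```python
import random


def pick_weighted(values, weights, row_count, rng):
    """Draw row_count values, each chosen with probability proportional to its weight."""
    return rng.choices(values, weights=weights, k=row_count)


rng = random.Random(7)  # seeded for reproducibility
statuses = pick_weighted(["active", "inactive", "suspended"], [0.8, 0.15, 0.05], 1000, rng)
```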

DistributionStrategy

Bases: Strategy

Generate values from statistical distributions.

TemporalStrategy

Bases: Strategy

Generate dates and timestamps with temporal patterns.

FormulaStrategy

Bases: Strategy

Compute a column value from other columns using simple expressions.

Supports expressions like:

  • "quantity * unit_price"
  • "quantity * unit_price * (1 - discount_percent / 100)"

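One way to evaluate expressions of this shape without calling eval is to walk the parsed syntax tree and allow only arithmetic, column names, and numeric literals. The sketch below is an assumption about how such an evaluator could look, applied to a single row held in a dict; it is not the strategy's actual implementation, which presumably operates on whole columns.

```python
import ast
import operator

# Arithmetic operators the sketch accepts; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_formula(expression, row):
    """Evaluate an arithmetic expression against one row (a dict of column values)."""

    def visit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -visit(node.operand)
        if isinstance(node, ast.Name):
            return row[node.id]  # a bare name is a column reference
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return visit(ast.parse(expression, mode="eval").body)


row = {"quantity": 3, "unit_price": 10.0, "discount_percent": 20}
line_total = eval_formula("quantity * unit_price * (1 - discount_percent / 100)", row)
```

Function calls, attribute access, and other constructs raise ValueError, which keeps schema-supplied expressions from running arbitrary code.
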
DerivedStrategy

Bases: Strategy

Derive a column's values from another column with an optional transformation.

CorrelatedStrategy

Bases: Strategy

Generate a column whose values are derived from another column in the same row.

ForeignKeyStrategy

Bases: Strategy

Generate foreign key values by referencing parent table PKs.

LookupStrategy

Bases: Strategy

Look up a value from a parent table using a FK in the current row.

Example: order_line.unit_price = product.unit_price (looked up via product_id)
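The lookup in the example above can be sketched with plain dicts standing in for tables. The helper name and its parameters are hypothetical, and the sample rows are made up for illustration.

```python
def lookup_from_parent(child_rows, parent_rows, fk, parent_pk, source_column):
    """For each child row, fetch source_column from the parent row its FK points to."""
    # Index the parent table once by primary key so each lookup is O(1).
    by_pk = {parent[parent_pk]: parent[source_column] for parent in parent_rows}
    return [by_pk[child[fk]] for child in child_rows]


products = [
    {"product_id": 1, "unit_price": 9.99},
    {"product_id": 2, "unit_price": 24.50},
]
order_lines = [{"product_id": 2}, {"product_id": 1}, {"product_id": 2}]
prices = lookup_from_parent(order_lines, products, "product_id", "product_id", "unit_price")
```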

ReferenceDataStrategy

Bases: Strategy

Pick values from a built-in reference dataset.

PatternStrategy

Bases: Strategy

Generate values from a format pattern with tokens.

Supports tokens like:

  • {seq:6} → zero-padded sequence (e.g., "000042")
  • {random:4} → random alphanumeric (e.g., "A3F1")
  • {column_name} → value from another column in current row

ConditionalStrategy

Bases: Strategy

Generate column values conditionally based on another column's state.

ComputedStrategy

Bases: Strategy

Placeholder for computed columns that depend on child table data.

These columns are initially filled with NaN/placeholder values during the main generation pass, then back-filled during the compute phase after child tables are generated.

Examples:

order.order_total = sum(order_line.line_total) per order_id

Methods:
backfill(parent_df, child_df, parent_pk, child_fk, child_column, target_column, rule='sum_children') staticmethod

Back-fill computed column from child table aggregation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| parent_df | | Parent DataFrame to update (the table owning target_column). | required |
| child_df | | DataFrame to aggregate/lookup from. | required |
| parent_pk | str | PK column name in parent_df. | required |
| child_fk | str | FK column in child_df referencing parent_pk. | required |
| child_column | str | Column in child_df to aggregate or copy. | required |
| target_column | str | Column in parent_df to fill. | required |
| rule | str | Aggregation rule. See below for supported values. | 'sum_children' |

Child-aggregation rules (parent_df is the parent, child_df is the child): sum_children, count_children, avg_children, min_children, max_children.

Parent-lookup rule (the roles are reversed: parent_df holds the child table, child_df holds the parent table): lookup_parent copies a value from the parent into target_column by matching parent_df[child_fk] against child_df[parent_pk]. Use it when the "computed" column lives on the child table and copies a value from the parent.
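The child-aggregation rules can be sketched as a group-then-aggregate pass. In this sketch, lists of dicts stand in for the DataFrames, backfill_sketch is a hypothetical name, and leaving parents without children as None is an assumption; the library's handling of that case is not documented here. The lookup_parent rule is omitted.

```python
from collections import defaultdict
from statistics import mean

# Rule name -> aggregation function applied to each parent's child values.
_AGGREGATORS = {
    "sum_children": sum,
    "count_children": len,
    "avg_children": mean,
    "min_children": min,
    "max_children": max,
}


def backfill_sketch(parent_rows, child_rows, parent_pk, child_fk,
                    child_column, target_column, rule="sum_children"):
    """Fill target_column on each parent row from its children's child_column values."""
    aggregate = _AGGREGATORS[rule]
    grouped = defaultdict(list)
    for child in child_rows:
        grouped[child[child_fk]].append(child[child_column])
    for parent in parent_rows:
        values = grouped.get(parent[parent_pk])
        # Assumption: a parent with no children keeps None.
        parent[target_column] = aggregate(values) if values else None
    return parent_rows


orders = [{"order_id": 1}, {"order_id": 2}, {"order_id": 3}]
order_lines = [
    {"order_id": 1, "line_total": 10.0},
    {"order_id": 1, "line_total": 5.5},
    {"order_id": 2, "line_total": 7.0},
]
backfill_sketch(orders, order_lines, "order_id", "order_id", "line_total", "order_total")
```

After the call, each order carries an order_total equal to the sum of its lines, matching the order.order_total example above.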

LifecycleStrategy

Bases: Strategy

Generate phase labels based on weighted probabilities.

SelfReferencingStrategy

Bases: Strategy

Assign parent IDs within the same table to form a level hierarchy.

Row allocation
  • The first root_count rows become level-1 roots (parent = NULL)
  • Remaining rows are split evenly across levels 2..N
  • Each non-root row is assigned a random parent from the level above

Stashes level assignments into ctx.current_table[sr_level] for use by a downstream 'level' column.
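The row allocation described above can be sketched as follows. Several details are assumptions: assign_hierarchy is a hypothetical name, parents are returned as row indexes rather than ID values, and "split evenly" is implemented as contiguous chunks of equal size with any shortfall in the deepest level.

```python
import math
import random


def assign_hierarchy(row_count, root_count, levels, rng):
    """Return (parents, level_of): a parent row index and a level for every row."""
    if not 1 <= root_count <= row_count:
        raise ValueError("root_count must be between 1 and row_count")
    parents = [None] * row_count  # roots keep None (NULL parent)
    level_of = [1] * row_count
    remaining = row_count - root_count
    if remaining == 0:
        return parents, level_of
    if levels < 2:
        raise ValueError("non-root rows need at least 2 levels")

    # Split the non-root rows into contiguous chunks, one chunk per level 2..N.
    chunk = math.ceil(remaining / (levels - 1))
    rows_at = {1: list(range(root_count))}
    for offset in range(remaining):
        row = root_count + offset
        level = 2 + offset // chunk
        level_of[row] = level
        rows_at.setdefault(level, []).append(row)

    # Each non-root row picks a random parent from the level directly above it.
    for row in range(root_count, row_count):
        parents[row] = rng.choice(rows_at[level_of[row] - 1])
    return parents, level_of


parents, level_of = assign_hierarchy(row_count=10, root_count=2, levels=3, rng=random.Random(3))
```

Because levels are filled in order, a level only receives rows once the level above it is populated, so every non-root row always has a candidate parent.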

SelfRefFieldStrategy

Bases: Strategy

Read a field stashed by SelfReferencingStrategy.

Example: reads level assignments stored by the self_referencing column.

"level": {
    "generator": {"strategy": "self_ref_field", "field": "level"}
}

FirstPerParentStrategy

Bases: Strategy

Mark the first row per parent FK group as True, rest as False.
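A minimal sketch of the flagging logic, assuming "first" means first in row order (the reference does not say whether another ordering applies). The function name is hypothetical.

```python
def first_per_parent(fk_values):
    """Return True for the first row seen with each FK value, False for the rest."""
    seen = set()
    flags = []
    for fk in fk_values:
        flags.append(fk not in seen)
        seen.add(fk)
    return flags


flags = first_per_parent([1, 1, 2, 1, 2, 3])
```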

RecordSampleStrategy

Bases: Strategy

Sample complete records from a reference dataset.

Acts as the anchor for a group of correlated columns. Samples N records (one per generated row) and writes ALL fields into ctx.current_table so that RecordFieldStrategy can read them without re-sampling.

RecordFieldStrategy

Bases: Strategy

Read a field from records already sampled by RecordSampleStrategy.
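The sample-once, read-many pairing of RecordSampleStrategy and RecordFieldStrategy can be sketched with a dict standing in for ctx.current_table. The function names, the "_records" cache key, and the small reference list are all made up for illustration.

```python
import random

# Illustrative reference dataset; not one of the library's built-in datasets.
REFERENCE = [
    {"city": "Seattle", "state": "WA"},
    {"city": "Austin", "state": "TX"},
    {"city": "Denver", "state": "CO"},
]


def sample_records(table_cache, records, row_count, rng, key="_records"):
    """Anchor step: sample one complete record per row and stash them all."""
    table_cache[key] = [rng.choice(records) for _ in range(row_count)]
    return table_cache[key]


def read_field(table_cache, field, key="_records"):
    """Follow-up step: read one field from the stashed records, without re-sampling."""
    return [record[field] for record in table_cache[key]]


cache = {}  # stands in for ctx.current_table
sample_records(cache, REFERENCE, row_count=5, rng=random.Random(11))
cities = read_field(cache, "city")
states = read_field(cache, "state")
```

Since both columns read from the same stashed records, each row's city and state always belong together, which is the point of sampling whole records instead of independent columns.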

NativeStrategy

Bases: Strategy

Vectorized native data generation — replaces Faker for built-in providers.

Methods:
can_handle(provider)

Check if this strategy handles the given Faker provider.

SCD2Strategy

Bases: Strategy

Generate SCD Type 2 versioning metadata columns.