Skip to content

Chaos Engine

Spindle's chaos engine injects realistic data quality issues into generated datasets — schema drift, value corruption, orphaned FKs, temporal anomalies, volume spikes, and file-level corruption. Use it to test whether your pipelines handle real-world data problems.

Quick Start

from sqllocks_spindle.chaos import ChaosEngine, ChaosConfig

config = ChaosConfig(
    enabled=True,
    intensity="moderate",
    seed=42,
)
engine = ChaosEngine(config=config)

# Corrupt a DataFrame
corrupted_df = engine.corrupt_dataframe(df, day=10)

# Apply all applicable chaos categories
corrupted_df = engine.apply_all(df, day=15)

Six Chaos Categories

Category What It Corrupts
SCHEMA Column renames, type changes, missing columns, extra columns
VALUE NULL injection, out-of-range values, type mismatches, encoding errors
FILE Truncated files, corrupt headers, wrong formats, empty files
REFERENTIAL Orphan FKs, duplicate PKs, broken references
TEMPORAL Out-of-order timestamps, future dates, impossible date sequences
VOLUME Row count spikes (10x), empty partitions, duplicate batches

Four Intensity Presets

Preset Multiplier Description
calm 0.25x Occasional issues — realistic production noise
moderate 1.0x Regular data quality problems
stormy 2.5x Frequent issues — stress testing
hurricane 5.0x Everything breaks — chaos testing

ChaosConfig

from sqllocks_spindle.chaos import ChaosConfig, ChaosOverride

config = ChaosConfig(
    enabled=True,
    intensity="stormy",          # calm | moderate | stormy | hurricane
    seed=42,
    warmup_days=7,               # clean data for first N days
    chaos_start_day=8,           # chaos begins on this day
    escalation="gradual",        # gradual | random | front-loaded
    breaking_change_day=20,      # column drops/renames allowed after this day
    overrides=[                  # force specific chaos on specific days
        ChaosOverride(day=15, category="schema", params={"action": "drop_column"}),
    ],
)
Param Default Description
enabled False Master switch
intensity "moderate" Preset name or custom multiplier
seed 42 Random seed for reproducibility
warmup_days 7 Days of clean data before chaos starts
chaos_start_day 8 First day chaos can be injected
escalation "gradual" How injection probability increases over time
breaking_change_day 20 Day after which breaking schema changes are allowed
overrides [] List of ChaosOverride to force specific events

ChaosEngine Methods

engine = ChaosEngine(config=config)

# Decision: should chaos be injected on this day for this category?
if engine.should_inject(day=10, category="value"):
    ...

# Per-category injection
df = engine.corrupt_dataframe(df, day=10)          # VALUE chaos
df = engine.drift_schema(df, day=10)               # SCHEMA chaos
bytes_ = engine.corrupt_file(file_bytes, day=10)   # FILE chaos

# Cross-table chaos
tables = engine.inject_referential_chaos(tables_dict, day=10)  # REFERENTIAL

# Temporal chaos (specify which columns are dates)
df = engine.inject_temporal_chaos(df, date_columns=["order_date"], day=10)

# Volume chaos
df = engine.inject_volume_chaos(df, day=10)        # VOLUME

# Apply all applicable categories at once
df = engine.apply_all(
    df, day=15,
    tables_dict=all_tables,
    date_columns=["order_date", "ship_date"],
)

CLI Usage

# File-drop simulation with chaos
spindle simulate file-drop --domain retail --scale small \
  --start-date 2025-01-01 --end-date 2025-01-31 \
  --chaos-intensity stormy --output ./landing/

Escalation Modes

  • gradual — injection probability increases linearly from chaos_start_day to the end of the date range
  • random — each day has an independent random probability based on intensity
  • front-loaded — high probability early, decreasing over time (useful for testing recovery)

See Also