Troubleshooting

Common issues and how to fix them.

Installation

`ModuleNotFoundError: No module named 'sqllocks_spindle'`

You haven't installed the package, or you're in the wrong virtual environment.

pip install sqllocks-spindle
# Verify:
python -c "import sqllocks_spindle; print(sqllocks_spindle.__version__)"
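If the import still fails after installing, it can help to confirm which interpreter is actually running. The following is a minimal sketch using only the standard library; `describe_environment` is a hypothetical helper name, not part of Spindle.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "sqllocks_spindle") -> dict:
    """Report which interpreter is running and whether it can see the module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,                   # interpreter actually in use
        "prefix": sys.prefix,                       # root of the active environment
        "in_venv": sys.prefix != sys.base_prefix,   # True inside a virtual environment
        "module_found": spec is not None,
        "module_location": spec.origin if spec else None,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If `module_found` is False while `pip show sqllocks-spindle` reports an install, pip and python most likely belong to different environments.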

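`ImportError: cannot import name 'EventHubSink'`

Streaming sinks require the [streaming] extra. Quote the requirement so that shells such as zsh don't interpret the square brackets as a glob pattern:

pip install "sqllocks-spindle[streaming]"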

`ModuleNotFoundError: No module named 'pyarrow'`

Parquet output requires the [parquet] extra:

pip install "sqllocks-spindle[parquet]"

`ModuleNotFoundError: No module named 'openpyxl'`

Excel output requires the [excel] extra:

pip install "sqllocks-spindle[excel]"
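To check several optional dependencies at once, a sketch like the one below may be useful. The module-to-extra mapping is taken from the sections above; `missing_extras` is a hypothetical helper, and the streaming extra is left out because this page doesn't name the module it installs.

```python
import importlib.util

# Module -> extra, as listed in this guide (streaming omitted: module not stated).
OPTIONAL_MODULES = {"pyarrow": "parquet", "openpyxl": "excel"}


def missing_extras(optional_modules: dict = OPTIONAL_MODULES) -> list:
    """Return the extras whose backing module can't be found by the import system."""
    return sorted(
        extra
        for module, extra in optional_modules.items()
        if importlib.util.find_spec(module) is None
    )


missing = missing_extras()
if missing:
    # pip accepts several extras in one requirement, separated by commas.
    print('pip install "sqllocks-spindle[' + ",".join(missing) + ']"')
else:
    print("All optional modules found")
```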

Generation

`KeyError` when accessing a table from `GenerationResult`

Table names are case-sensitive and use snake_case. Use result.table_names to see available tables:

result = Spindle().generate(domain=RetailDomain(), scale="small", seed=42)
print(result.table_names)
# ['customer', 'address', 'product_category', 'product', 'store', 'promotion', 'order', 'order_line', 'return']
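When table names come from user input, normalizing them before the lookup avoids most of these errors. This sketch works on a plain list of names such as the one printed above; `resolve_table_name` is a hypothetical helper, not a Spindle function.

```python
import difflib


def resolve_table_name(requested: str, table_names: list) -> str:
    """Map a loosely typed name onto one of the available table names."""
    # Trim, lowercase, and turn spaces or hyphens into underscores.
    normalized = requested.strip().lower().replace(" ", "_").replace("-", "_")
    if normalized in table_names:
        return normalized
    suggestions = difflib.get_close_matches(normalized, table_names, n=3)
    raise KeyError(f"No table {requested!r}. Close matches: {suggestions}")


tables = ["customer", "address", "product_category", "product", "store",
          "promotion", "order", "order_line", "return"]
print(resolve_table_name("Order-Line", tables))  # order_line
```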

Generation is slow at large scales

Large and xlarge scales generate millions of rows. Tips:

Use --dry-run first to see expected row counts
Use Parquet output (--format parquet) instead of CSV for faster writes
For xlarge scale, use Fabric Spark notebooks, because pandas has to hold all 100M+ rows in memory on a single machine
Close other memory-intensive applications
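Once a dry run has reported row counts, a back-of-the-envelope calculation shows whether the data is likely to fit in memory. The sketch below assumes 8 bytes per value (the size of an int64 or float64); the column count is a made-up example, and string columns usually need considerably more.

```python
def estimate_memory_gb(rows: int, columns: int, bytes_per_value: int = 8) -> float:
    """Rough lower bound for an in-memory table of fixed-width numeric values."""
    return rows * columns * bytes_per_value / 1e9


# 100M rows with 10 numeric columns (hypothetical column count)
print(estimate_memory_gb(100_000_000, 10))  # 8.0
```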

`MemoryError` at xlarge scale

The xlarge preset generates 100M+ rows and requires 16GB+ RAM. For extreme scales:

Use Fabric notebooks with Spark (distributed memory)
Generate one domain at a time
Use the streaming engine to emit data incrementally instead of materializing everything in memory
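The idea behind incremental output can be illustrated without Spindle at all. The sketch below is not the streaming engine's API: `generate_rows` is a hypothetical stand-in for any row source, and the point is only that a generator plus batched writes keeps at most one chunk in memory.

```python
import csv
import io


def generate_rows(n_rows: int):
    """Hypothetical row source: yields one row at a time instead of building a list."""
    for i in range(n_rows):
        yield {"id": i, "value": i * 2}


def write_in_chunks(rows, out, chunk_size: int = 1000) -> int:
    """Write rows to `out` in batches, holding at most chunk_size rows in memory."""
    writer = None
    chunk, written = [], 0
    for row in rows:
        if writer is None:
            # Create the writer lazily so the header comes from the first row.
            writer = csv.DictWriter(out, fieldnames=list(row))
            writer.writeheader()
        chunk.append(row)
        if len(chunk) >= chunk_size:
            writer.writerows(chunk)
            written += len(chunk)
            chunk.clear()
    if chunk:  # flush the final partial chunk
        writer.writerows(chunk)
        written += len(chunk)
    return written


buffer = io.StringIO()
count = write_in_chunks(generate_rows(2500), buffer, chunk_size=1000)
print(count)  # 2500
```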

Integrity check returns errors

result.verify_integrity() checks foreign-key (FK) relationships. If it returns errors, that is a bug: Spindle should always produce referentially intact data. Please open an issue with your domain, scale, and seed.
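To include the offending rows in a bug report, an independent check can narrow things down. This is a generic sketch of an FK check over plain dictionaries, not how verify_integrity() is implemented; the table and column names are hypothetical.

```python
def find_orphans(child_rows, fk_column, parent_rows, pk_column):
    """Return child rows whose foreign key has no matching parent key."""
    parent_keys = {row[pk_column] for row in parent_rows}
    return [row for row in child_rows if row[fk_column] not in parent_keys]


customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # no customer 3: an orphan
]
print(find_orphans(orders, "customer_id", customers, "customer_id"))
# [{'order_id': 11, 'customer_id': 3}]
```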

CLI

`spindle: command not found`

The CLI is installed as a script entry point. Ensure your virtual environment is activated:

source .venv/bin/activate  # macOS/Linux
.venv\Scripts\activate     # Windows
spindle list

Or run as a module:

python -m sqllocks_spindle.cli list
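To see whether the entry point is on PATH for the current environment, the standard library's `shutil.which` can be used. A minimal sketch; `locate_command` is a hypothetical wrapper name.

```python
import shutil
import sys


def locate_command(name: str = "spindle"):
    """Return the full path of a command on PATH, or None if it isn't found."""
    return shutil.which(name)


path = locate_command()
if path is None:
    print(f"spindle is not on PATH; active interpreter is {sys.executable}")
else:
    print(f"spindle resolves to {path}")
```

If the reported path sits outside your virtual environment, a different install is shadowing the one you expect.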

`spindle generate` produces empty output directory

Check that you specified --output:

spindle generate retail --scale small --format csv --output ./output/

Without --output, results are only printed to stdout.

Fabric

`LakehouseFilesWriter` raises authentication errors

Make sure you're either running in a Fabric notebook or signed in with the Azure CLI:

az login --tenant <your-tenant-id>

The Fabric runtime auto-detects authentication. Outside Fabric, use --auth cli.

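Delta writes fail with schema mismatch

When writing to an existing Delta table, the schema Spindle generates must match the table's schema. To replace the table's schema along with its data, use the overwriteSchema option:

(
    spark.createDataFrame(df)
    .write.format("delta")
    .mode("overwrite")                  # replaces the existing data
    .option("overwriteSchema", "true")  # also replaces the table schema
    .save(path)
)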
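Before overwriting a schema, it can be worth seeing exactly which columns differ. The sketch below compares two plain column-to-type mappings; in PySpark these could be built with `dict(df.dtypes)`. `diff_schemas` and the example columns are hypothetical.

```python
def diff_schemas(existing: dict, incoming: dict) -> dict:
    """Compare two {column: type} mappings and report the differences."""
    shared = existing.keys() & incoming.keys()
    return {
        "missing_in_incoming": sorted(existing.keys() - incoming.keys()),
        "new_in_incoming": sorted(incoming.keys() - existing.keys()),
        "type_changes": {
            col: (existing[col], incoming[col])
            for col in sorted(shared)
            if existing[col] != incoming[col]
        },
    }


table_schema = {"order_id": "bigint", "total": "double", "legacy_flag": "boolean"}
frame_schema = {"order_id": "bigint", "total": "string", "channel": "string"}
print(diff_schemas(table_schema, frame_schema))
```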

`OneLakePaths` returns wrong paths

Ensure you're running inside a Fabric notebook. OneLakePaths reads environment variables set by the Fabric runtime (FABRIC_RUNTIME, TRIDENT_RUNTIME_VERSION). Outside Fabric, construct paths manually.
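A quick way to see what the runtime exposes, and to build a path by hand when it exposes nothing, is sketched below. The variable names come from the paragraph above, but how OneLakePaths uses them internally is an assumption. The path follows the OneLake ABFS URI convention for name-based addressing; the helper names, workspace, and lakehouse are placeholders.

```python
import os

# Names taken from this guide; Spindle's actual detection logic may differ.
FABRIC_ENV_VARS = ("FABRIC_RUNTIME", "TRIDENT_RUNTIME_VERSION")


def fabric_env(environ=os.environ) -> dict:
    """Return the Fabric-related variables and their values (None if unset)."""
    return {name: environ.get(name) for name in FABRIC_ENV_VARS}


def looks_like_fabric(environ=os.environ) -> bool:
    """True if at least one of the variables is set to a non-empty value."""
    return any(fabric_env(environ).values())


def onelake_files_path(workspace: str, lakehouse: str, relative_path: str = "") -> str:
    """Build an abfss:// path to a lakehouse Files folder from item names.

    Assumes name-based addressing; names containing spaces may need GUIDs instead.
    """
    base = f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{lakehouse}.Lakehouse/Files"
    return f"{base}/{relative_path.lstrip('/')}" if relative_path else base


print(fabric_env())
print(onelake_files_path("my_workspace", "my_lakehouse", "raw/retail"))
```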

Chaos Engine

Chaos mutations don't appear in output

Check your chaos intensity. The calm preset has low injection probability:

from sqllocks_spindle.chaos import ChaosEngine, ChaosConfig

config = ChaosConfig(intensity="stormy")  # Higher injection rates
engine = ChaosEngine(config)
corrupted = engine.corrupt_dataframe(df, day=5)

Chaos corrupts more data than expected

The hurricane preset (5x multiplier) is intentionally aggressive. Use calm (0.25x) or moderate (1x) for typical testing.
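To get a feel for what the multipliers mean, the arithmetic can be sketched as follows. This assumes the multiplier scales a per-mutation base probability linearly, which this page does not confirm; the base probability of 0.02 is a made-up example, and stormy is omitted because its multiplier isn't stated here.

```python
# Multipliers quoted in this section. How Spindle applies them is an assumption.
PRESET_MULTIPLIERS = {"calm": 0.25, "moderate": 1.0, "hurricane": 5.0}


def effective_probability(base_probability: float, preset: str) -> float:
    """Scale a base injection probability by the preset multiplier, capped at 1.0."""
    return min(1.0, base_probability * PRESET_MULTIPLIERS[preset])


for preset in PRESET_MULTIPLIERS:
    print(preset, effective_probability(0.02, preset))
```

Under that assumption, hurricane injects 20 times as often as calm for the same base probability.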

Streaming

Events arrive out of order

This is by design when out_of_order=True in StreamConfig. Spindle intentionally reorders events to test pipeline robustness. Set out_of_order=False for ordered delivery.
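To measure how much reordering a pipeline actually receives, the events that arrive behind a later one can be counted. A minimal sketch over plain timestamps; `count_out_of_order` is a hypothetical helper and the sample values are invented.

```python
def count_out_of_order(timestamps) -> int:
    """Count events whose timestamp is earlier than the latest one seen so far."""
    late, high_water = 0, None
    for ts in timestamps:
        if high_water is not None and ts < high_water:
            late += 1          # arrived after a newer event
        else:
            high_water = ts    # new latest timestamp
    return late


arrivals = [1, 2, 5, 3, 4, 6]
print(count_out_of_order(arrivals))          # 2
print(count_out_of_order(sorted(arrivals)))  # 0
```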

Stream rate is lower than configured

In realtime=True mode, Spindle uses token-bucket rate limiting with Poisson inter-arrivals. Actual throughput will vary around the target rate. Set realtime=False for maximum throughput (no rate limiting).
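The variation is a property of Poisson arrivals, not a sign of a fault. The simulation below draws exponential gaps between events with the standard library and counts how many land in a fixed window. It models only the Poisson part, not the token bucket, and is not Spindle's implementation; the rate and duration are example values.

```python
import random


def simulate_arrivals(rate_per_sec: float, duration_sec: float, seed: int = 42) -> int:
    """Count events in a window when gaps between events are exponentially distributed."""
    rng = random.Random(seed)
    clock, events = 0.0, 0
    while True:
        clock += rng.expovariate(rate_per_sec)  # mean gap is 1 / rate
        if clock > duration_sec:
            return events
        events += 1


for seed in range(3):
    n = simulate_arrivals(rate_per_sec=100, duration_sec=10, seed=seed)
    print(f"seed={seed}: {n} events, {n / 10:.1f}/s")
```

Each run lands near the target of 100 events per second but rarely hits it exactly; shorter windows show larger relative swings.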

Still stuck?

Open an issue on GitHub with:

Your Spindle version (python -c "import sqllocks_spindle; print(sqllocks_spindle.__version__)")
Python version (python --version)
The command or code that failed
The full error traceback
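The version details in the list above can be collected in one step. This is a sketch using only the standard library; `collect_diagnostics` is a hypothetical helper, and it reads the installed distribution's metadata, so it works even when the import itself fails.

```python
import platform
import sys
from importlib import metadata


def collect_diagnostics(distribution: str = "sqllocks-spindle") -> dict:
    """Gather version details to paste into an issue."""
    try:
        version = metadata.version(distribution)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "spindle_version": version,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }


for key, value in collect_diagnostics().items():
    print(f"{key}: {value}")
```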