Troubleshooting
Common issues and how to fix them.
Installation
ModuleNotFoundError: No module named 'sqllocks_spindle'
You haven't installed the package, or you're in the wrong virtual environment.
pip install sqllocks-spindle
# Verify:
python -c "import sqllocks_spindle; print(sqllocks_spindle.__version__)"
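If the install succeeds but the import still fails, the interpreter running your code is probably not the one pip installed into. The sketch below uses only the standard library to report which interpreter is active and whether it can see the module. The function name describe_environment is illustrative and not part of Spindle.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "sqllocks_spindle") -> dict:
    """Report which interpreter is running and whether it can see the module."""
    spec = importlib.util.find_spec(module_name)  # None when not importable
    return {
        "executable": sys.executable,                      # interpreter in use
        "in_virtualenv": sys.prefix != sys.base_prefix,    # True inside a venv
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),  # path, if found
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If module_found is False, run pip through the interpreter path shown in executable (that is, with its -m pip) so the package lands in the same environment.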
ImportError: cannot import name 'EventHubSink'
Streaming sinks require the [streaming] extra:
pip install "sqllocks-spindle[streaming]"
ModuleNotFoundError: No module named 'pyarrow'
Parquet output requires the [parquet] extra:
pip install "sqllocks-spindle[parquet]"
ModuleNotFoundError: No module named 'openpyxl'
Excel output requires the [excel] extra:
pip install "sqllocks-spindle[excel]"
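To find out which optional dependencies are missing before you hit these errors at runtime, a check along these lines may help. It is a minimal sketch: the module-to-extra mapping is taken from the error messages in this section, the import names behind the [streaming] extra are not given here and are therefore left out, and missing_extras is a hypothetical helper rather than a Spindle function.

```python
import importlib.util

# Module -> extra, taken from the error messages in this section.
# The [streaming] extra is omitted because its import names are not listed.
EXTRA_FOR_MODULE = {"pyarrow": "parquet", "openpyxl": "excel"}


def missing_extras(extra_for_module=EXTRA_FOR_MODULE):
    """Return the sorted extras whose backing module cannot be imported."""
    return sorted(
        extra
        for module, extra in extra_for_module.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_extras()
    if missing:
        # Several extras can be installed in one command, comma-separated.
        print(f'pip install "sqllocks-spindle[{",".join(missing)}]"')
    else:
        print("All checked extras are importable.")
```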
Generation
KeyError when accessing a table from GenerationResult
Table names are case-sensitive and use snake_case. Use result.table_names to see available tables:
result = Spindle().generate(domain=RetailDomain(), scale="small", seed=42)
print(result.table_names)
# ['customer', 'address', 'product_category', 'product', 'store', 'promotion', 'order', 'order_line', 'return']
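Most of these KeyErrors come from casing or separators (Customer, OrderLine, order-line). The sketch below normalizes a requested name to snake_case and, if that still does not match, suggests close matches from the list. It uses only the standard library; the table names are copied from the example output above, and suggest_table is a hypothetical helper, not part of the Spindle API.

```python
import difflib
import re

# Table names copied from the example output above.
TABLE_NAMES = ["customer", "address", "product_category", "product", "store",
               "promotion", "order", "order_line", "return"]


def to_snake_case(name: str) -> str:
    """Convert 'OrderLine', 'order-line' or 'Order Line' to 'order_line'."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    return re.sub(r"[\s\-]+", "_", name).lower()


def suggest_table(requested: str, available=TABLE_NAMES) -> list:
    """Return the exact name if normalization finds it, else close matches."""
    candidate = to_snake_case(requested)
    if candidate in available:
        return [candidate]
    # Fall back to fuzzy matching for typos such as 'prodcut'.
    return difflib.get_close_matches(candidate, available, n=3, cutoff=0.6)
```

In practice you would pass result.table_names as the available argument instead of the hard-coded list.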
Generation is slow at large scales
Large and xlarge scales generate millions of rows. Tips:
- Use --dry-run first to see expected row counts
- Use Parquet output (--format parquet) instead of CSV for faster writes
- For xlarge scale, use Fabric Spark notebooks; pandas can't handle 100M+ rows in memory
- Close other memory-intensive applications
MemoryError at xlarge scale
The xlarge preset generates 100M+ rows and requires 16GB+ RAM. For extreme scales:
- Use Fabric notebooks with Spark (distributed memory)
- Generate one domain at a time
- Use the streaming engine to emit data incrementally instead of materializing everything in memory
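A back-of-envelope calculation shows why this scale strains a single machine. The sketch assumes 8 bytes per value (a 64-bit number) and a hypothetical 10 columns; neither figure comes from Spindle. String columns and the temporary copies made during writes typically push real usage well above this estimate.

```python
def estimate_gib(rows: int, columns: int, bytes_per_value: int = 8) -> float:
    """Rough in-memory size in GiB for a fully numeric table."""
    return rows * columns * bytes_per_value / 1024**3


# 100M rows x 10 numeric columns (the column count is an assumption).
print(f"{estimate_gib(100_000_000, 10):.1f} GiB")  # prints "7.5 GiB"
```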
Integrity check returns errors
result.verify_integrity() checks foreign-key (FK) relationships. If it returns errors:
- This is a bug: Spindle should always produce referentially intact data. Please open an issue with your domain, scale, and seed.
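When filing the issue, it helps to include the offending rows. The sketch below illustrates what a referential-integrity violation looks like: child rows whose foreign key has no matching parent key. It works on plain lists of dicts with hypothetical column names, and it is not how verify_integrity() is implemented.

```python
def find_orphans(child_rows, fk_column, parent_rows, pk_column):
    """Return child rows whose foreign key has no matching parent key."""
    parent_keys = {row[pk_column] for row in parent_rows}
    return [
        row for row in child_rows
        # A null FK is treated as "no reference", not as a violation.
        if row[fk_column] is not None and row[fk_column] not in parent_keys
    ]


# Hypothetical column names, for illustration only.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # no customer 3, so this is an orphan
]
print(find_orphans(orders, "customer_id", customers, "customer_id"))
```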
CLI
spindle: command not found
The CLI is installed as a script entry point. Ensure your virtual environment is activated:
Or run as a module:
spindle generate produces empty output directory
Check that you specified --output:
Without --output, results are only printed to stdout.
Fabric
LakehouseFilesWriter raises authentication errors
Ensure you're running in a Fabric notebook or have az login configured:
The Fabric runtime auto-detects authentication. Outside Fabric, use --auth cli.
Delta writes fail with schema mismatch
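When writing to an existing Delta table, the schema of the DataFrame must match the table's schema. To replace the table's schema instead, set the overwriteSchema option together with overwrite mode: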
spark.createDataFrame(df).write.format("delta").mode("overwrite").option("overwriteSchema", "true").save(path)
OneLakePaths returns wrong paths
Ensure you're running inside a Fabric notebook. OneLakePaths reads environment variables set by the Fabric runtime (FABRIC_RUNTIME, TRIDENT_RUNTIME_VERSION). Outside Fabric, construct paths manually.
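The sketch below shows one way to check for those variables and to build a Files path by hand. It rests on two assumptions: that the presence of either variable is enough to indicate a Fabric runtime (the exact check OneLakePaths performs is not described here), and that the path follows the general OneLake ABFS URI convention. Both helper names are hypothetical.

```python
import os

# Variable names as listed in the paragraph above.
FABRIC_ENV_VARS = ("FABRIC_RUNTIME", "TRIDENT_RUNTIME_VERSION")


def looks_like_fabric(env=None) -> bool:
    """True if any Fabric runtime variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return any(env.get(name) for name in FABRIC_ENV_VARS)


def onelake_files_path(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Build an abfss:// URI under the Files area of a lakehouse."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Files/{relative_path.lstrip('/')}"
    )


if __name__ == "__main__":
    print("Fabric runtime detected:", looks_like_fabric())
```

Compare the output of onelake_files_path with what OneLakePaths returns to see whether the difference is in the workspace, the lakehouse, or the relative path.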
Chaos Engine
Chaos mutations don't appear in output
Check your chaos intensity. The calm preset has low injection probability:
from sqllocks_spindle.chaos import ChaosEngine, ChaosConfig

# df is a pandas DataFrame holding one generated table.
config = ChaosConfig(intensity="stormy")  # higher injection rates than "calm"
engine = ChaosEngine(config)
corrupted = engine.corrupt_dataframe(df, day=5)
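To confirm whether anything was actually mutated, you can count the cells that differ before and after corruption. The sketch below does this on plain lists of row dicts using only the standard library. count_changed_cells is a hypothetical helper, it compares rows by position, and it does not detect rows that were added or dropped.

```python
import math


def _same(a, b) -> bool:
    """Equality that treats NaN on both sides as unchanged."""
    if (isinstance(a, float) and isinstance(b, float)
            and math.isnan(a) and math.isnan(b)):
        return True
    return a == b


def count_changed_cells(original_rows, mutated_rows) -> int:
    """Count differing cells between two lists of row dicts, row by row."""
    changed = 0
    for before, after in zip(original_rows, mutated_rows):
        # Union of keys so that added or removed columns count as changes.
        for column in before.keys() | after.keys():
            if not _same(before.get(column), after.get(column)):
                changed += 1
    return changed
```

With pandas DataFrames, pass df.to_dict("records") and corrupted.to_dict("records"). Columns using pandas nullable types may need extra handling for missing values.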
Chaos corrupts more data than expected
The hurricane preset (5x multiplier) is intentionally aggressive. Use calm (0.25x) or moderate (1x) for typical testing.
Streaming
Events arrive out of order
This is by design when out_of_order=True in StreamConfig. Spindle intentionally reorders events to test pipeline robustness. Set out_of_order=False for ordered delivery.
Stream rate is lower than configured
In realtime=True mode, Spindle uses token-bucket rate limiting with Poisson inter-arrivals. Actual throughput will vary around the target rate. Set realtime=False for maximum throughput (no rate limiting).
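The variation is a property of Poisson arrivals, not a fault. The sketch below simulates exponentially distributed gaps between events (the inter-arrival distribution of a Poisson process) and reports the achieved rate. It models only the inter-arrival part, not the token bucket, and it is not Spindle's implementation; simulate_rate is a hypothetical name.

```python
import random


def simulate_rate(target_rate: float, n_events: int, seed: int = 42) -> float:
    """Achieved events/second when gaps are exponential with mean 1/target_rate."""
    rng = random.Random(seed)  # seeded for repeatable results
    elapsed = sum(rng.expovariate(target_rate) for _ in range(n_events))
    return n_events / elapsed


if __name__ == "__main__":
    # Short runs deviate more from the target than long ones.
    for n in (10, 100, 10_000):
        print(n, round(simulate_rate(100.0, n), 1))
```

Over short windows the measured rate can sit noticeably above or below the target; it converges as the number of events grows. Measure throughput over a long enough window before concluding the rate is wrong.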
Still stuck?
Open an issue on GitHub with:
- Your Spindle version (python -c "import sqllocks_spindle; print(sqllocks_spindle.__version__)")
- Python version (python --version)
- The command or code that failed
- The full error traceback