Spindle by SQLLocks
Multi-domain, schema-aware synthetic data generator for Microsoft Fabric.
"Synthea is to MITRE as Spindle is to SQLLocks"
What is Spindle?
Spindle generates statistically realistic, relationally correct datasets for Microsoft Fabric. Not random noise — structured data with proper FK integrity, Pareto order distributions, seasonal temporal patterns, and real US addresses with lat/lng coordinates ready for Power BI maps.
```python
from sqllocks_spindle import Spindle, RetailDomain

result = Spindle().generate(domain=RetailDomain(), scale="small", seed=42)
print(result.summary())
# GenerationResult(9 tables, 21,300 total rows, 0.3s)
```
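To make "proper FK integrity" and "Pareto order distributions" concrete, here is a minimal standard-library sketch. It is not Spindle's implementation; the function name, the column names, and the shape parameter `1.16` are assumptions chosen for illustration. Every order references an existing customer, and a small share of customers receives most of the orders.

```python
import random
from collections import Counter


def generate_customers_and_orders(n_customers=100, n_orders=1000, seed=42):
    """Return (customers, orders) where every order points at a real customer.

    Hypothetical helper for illustration; not part of the Spindle API.
    """
    rng = random.Random(seed)
    customers = [{"customer_id": i} for i in range(1, n_customers + 1)]

    # Pareto-distributed weights: a few customers get very large weights,
    # so they end up placing a disproportionate share of the orders.
    weights = [rng.paretovariate(1.16) for _ in customers]

    # Sampling the parent rows (instead of inventing IDs) guarantees that
    # each foreign key resolves to an existing customer.
    parents = rng.choices(customers, weights=weights, k=n_orders)
    orders = [
        {"order_id": i, "customer_id": parent["customer_id"]}
        for i, parent in enumerate(parents, start=1)
    ]
    return customers, orders


customers, orders = generate_customers_and_orders()

# Referential integrity check: no order may reference a missing customer.
customer_ids = {c["customer_id"] for c in customers}
orphans = [o for o in orders if o["customer_id"] not in customer_ids]

# Share of orders placed by the busiest 20% of customers.
counts = Counter(o["customer_id"] for o in orders)
top_20 = sorted(counts.values(), reverse=True)[: len(customers) // 5]
top_share = sum(top_20) / len(orders)
print(f"orphans={len(orphans)}, top-20% share={top_share:.0%}")
```

Fixing the seed makes the output repeatable, which is the same reason the example above passes `seed=42`.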
Key Features
- :material-database-outline:{ .lg .middle } 13 Industry Domains

    Retail, Healthcare, Financial, Supply Chain, IoT, HR, Insurance, Marketing, Education, Real Estate, Manufacturing, Telecom, Capital Markets; each comes with calibrated distribution profiles from real-world data.

- :material-cog-outline:{ .lg .middle } 21 Generation Strategies

    Sequence, Faker, weighted enum, statistical distributions (Pareto, Zipf, log-normal), temporal seasonality, formulas, correlated columns, FK references, and more.

- :material-lightning-bolt:{ .lg .middle } Chaos Engine

    6 corruption categories (schema, value, file, referential, temporal, volume) with 4 intensity presets. Test your pipeline against realistic data quality issues.

- :material-microsoft:{ .lg .middle } Fabric-Native

    Write to Lakehouse, Warehouse, SQL Database, Eventhouse, and Semantic Models. Auto-detects Fabric runtime. Star schema and CDM folder export built in.

- :material-check-decagram:{ .lg .middle } Validation Gates

    8 built-in gates (referential integrity, schema conformance, null constraints, and more) with automatic quarantine for failed artifacts.

- :material-play-speed:{ .lg .middle } Streaming + Simulation

    Poisson inter-arrivals, token-bucket rate limiting, anomaly injection, file-drop simulation with late arrivals and schema drift, and hybrid batch+stream modes.
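For readers unfamiliar with the term, "Poisson inter-arrivals" means that the gaps between consecutive events are drawn from an exponential distribution, so events cluster and spread out the way real traffic does instead of arriving on a fixed tick. The sketch below uses only the Python standard library; it is an illustration of the technique, not Spindle's streaming code, and `poisson_arrival_times` is a hypothetical name.

```python
import random


def poisson_arrival_times(rate_per_sec, duration_sec, seed=0):
    """Return sorted event timestamps (seconds) for a Poisson process.

    Hypothetical helper for illustration; not part of the Spindle API.
    """
    rng = random.Random(seed)
    now = 0.0
    times = []
    while True:
        # Exponential gap with mean 1 / rate_per_sec between events.
        now += rng.expovariate(rate_per_sec)
        if now >= duration_sec:
            break
        times.append(now)
    return times


# On average 5 events per second for one minute, i.e. about 300 events.
arrivals = poisson_arrival_times(rate_per_sec=5.0, duration_sec=60.0, seed=42)
gaps = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
print(f"{len(arrivals)} events, mean gap {sum(gaps) / len(gaps):.3f}s")
```

The event count varies from seed to seed around `rate_per_sec * duration_sec`, which is the point: a pipeline tested only against evenly spaced events never sees bursts.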
Where Do I Start?
| I am a... | Start here |
|---|---|
| :material-school: Developer new to synthetic data | Before You Start then Quickstart |
| :material-pipe: Data engineer building Fabric pipelines | Quickstart then Fabric Tutorials |
| :material-database: DBA who wants SQL test data | CLI Quickstart — no Python required |
| :material-chart-bell-curve: Data scientist evaluating distributions | Methodology then Domain Catalog |
| :material-sitemap: Architect evaluating Spindle | Why Spindle? then Domain Catalog |
| :material-cog: DevOps automating data generation | CLI Quickstart then CI Integration |
| :material-presentation: Presenter building a demo | 60-Second Overview |
| :material-microsoft: Already in a Fabric notebook | Fabric Quickstart |
Quick Links
| Link | Description |
|---|---|
| :material-rocket-launch: Quickstart (Python) | Generate your first dataset in 5 minutes |
| :material-console: Quickstart (CLI) | Generate data from the command line |
| :material-microsoft: Quickstart (Fabric) | Generate data in a Fabric notebook |
| :material-school: Tutorials | 17 step-by-step learning paths |
| :material-download: Installation | `pip install sqllocks-spindle` and optional extras |
| :material-console: CLI Cheatsheet | All CLI commands at a glance |
| :fontawesome-brands-github: GitHub | Source code, issues, contributing |
| :fontawesome-brands-python: PyPI | `pip install sqllocks-spindle` |
Why Spindle?
Every Fabric project starts with the same problem: where's the test data?
- Dashboards look flat because every metric has uniform variance
- Pipelines pass testing but fail on real cardinality
- ML models train on data that has no signal to find
- Stakeholders can't relate to `Customer_001` buying `Product_ABC` for `$10.00`
Spindle solves this with rule-based, transparent generation. Unlike ML generators that output black-box models, Spindle gives you a human-readable `.spindle.json` schema you can inspect, tweak, and version control. All 13 domains have distributions sourced from published data: BLS, NAIC, NCES, NAR, FDIC, Federal Reserve, SEC, and 40+ more. See the Methodology for per-parameter citations.
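The idea of rule-based, inspectable generation can be sketched in a few lines of standard-library Python. The rule layout below is hypothetical and is not the real `.spindle.json` format; the strategy names `sequence` and `weighted_enum` are borrowed from the feature list only as labels, and `generate_rows` is an assumed helper. The point is that the rules are plain data you can read and diff, and that a fixed seed reproduces the same rows.

```python
import json
import random

# Hypothetical rule format for illustration only.
# This is NOT the actual .spindle.json schema layout.
RULES_JSON = """
{
  "order_id": {"strategy": "sequence", "start": 1000},
  "status": {
    "strategy": "weighted_enum",
    "values": {"shipped": 0.7, "pending": 0.2, "cancelled": 0.1}
  }
}
"""


def generate_rows(rules, n_rows, seed):
    """Generate rows from declarative per-column rules (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_rows):
        row = {}
        for column, rule in rules.items():
            if rule["strategy"] == "sequence":
                # Monotonic integer key starting at the configured value.
                row[column] = rule["start"] + i
            elif rule["strategy"] == "weighted_enum":
                # Pick one label according to the configured probabilities.
                labels = list(rule["values"])
                weights = list(rule["values"].values())
                row[column] = rng.choices(labels, weights=weights, k=1)[0]
            else:
                raise ValueError(f"unknown strategy: {rule['strategy']}")
        rows.append(row)
    return rows


rules = json.loads(RULES_JSON)
rows = generate_rows(rules, n_rows=1000, seed=42)
print(rows[0]["order_id"], rows[-1]["order_id"])
```

Changing a weight in the rules and regenerating with the same seed gives a reviewable, versionable change, which is the contrast with a trained black-box generator drawn above.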
MIT License | Built by Jonathan Stewart / SQLLocks