Spindle by SQLLocks

Multi-domain, schema-aware synthetic data generator for Microsoft Fabric.

"Synthea is to MITRE as Spindle is to SQLLocks"


What is Spindle?

Spindle generates statistically realistic, relationally correct datasets for Microsoft Fabric. The output is not random noise: it is structured data with valid foreign-key (FK) integrity, Pareto-distributed order volumes, seasonal temporal patterns, and real US addresses with latitude/longitude coordinates ready for Power BI maps.

pip install sqllocks-spindle

from sqllocks_spindle import Spindle, RetailDomain

result = Spindle().generate(domain=RetailDomain(), scale="small", seed=42)
print(result.summary())
# GenerationResult(9 tables, 21,300 total rows, 0.3s)
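
To make "FK integrity" and "Pareto order distributions" concrete, here is a minimal standalone sketch using only the Python standard library. It is not Spindle's implementation and does not use its API. The function names `make_orders` and `orphan_count`, the shape parameter `alpha=1.16`, and the row counts are illustrative assumptions.

```python
import random
from collections import Counter


def make_orders(n_customers, n_orders, seed=42, alpha=1.16):
    """Return (customer_ids, orders) where every order references a real customer."""
    rng = random.Random(seed)
    customer_ids = list(range(1, n_customers + 1))
    # Pareto-distributed weights: a few customers receive most of the order volume.
    # alpha of about 1.16 corresponds to the classic 80/20 shape (assumed here).
    weights = [rng.paretovariate(alpha) for _ in customer_ids]
    # Drawing each foreign key from the parent key list guarantees FK integrity.
    buyers = rng.choices(customer_ids, weights=weights, k=n_orders)
    orders = [{"order_id": i + 1, "customer_id": c} for i, c in enumerate(buyers)]
    return customer_ids, orders


def orphan_count(customer_ids, orders):
    """Count orders whose customer_id has no matching customer row."""
    known = set(customer_ids)
    return sum(1 for o in orders if o["customer_id"] not in known)


customer_ids, orders = make_orders(n_customers=100, n_orders=2000)
per_customer = Counter(o["customer_id"] for o in orders)
top_20 = sum(count for _, count in per_customer.most_common(20))
print(f"orphans: {orphan_count(customer_ids, orders)}")
print(f"top 20 customers placed {top_20 / len(orders):.0%} of orders")
```

With uniform random assignment, the top 20 of 100 customers would place roughly a fifth of all orders. The skewed weights push that share much higher, which is the kind of cardinality that real pipelines and dashboards have to handle.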

Key Features

  • 13 Industry Domains

    Retail, Healthcare, Financial, Supply Chain, IoT, HR, Insurance, Marketing, Education, Real Estate, Manufacturing, Telecom, Capital Markets, each with calibrated distribution profiles from real-world data.

    → Domain Catalog

  • 21 Generation Strategies

    Sequence, Faker, weighted enum, statistical distributions (Pareto, Zipf, log-normal), temporal seasonality, formulas, correlated columns, FK references, and more.

    → Strategy Reference

  • Chaos Engine

    6 corruption categories (schema, value, file, referential, temporal, volume) with 4 intensity presets. Test your pipeline against realistic data quality issues.

    → Chaos Guide

  • Fabric-Native

    Write to Lakehouse, Warehouse, SQL Database, Eventhouse, and Semantic Models. Auto-detects Fabric runtime. Star schema and CDM folder export built in.

    → Fabric Guides

  • Validation Gates

    8 built-in gates (referential integrity, schema conformance, null constraints, and more) with automatic quarantine for failed artifacts.

    → Validation Guide

  • Streaming + Simulation

    Poisson inter-arrivals, token-bucket rate limiting, anomaly injection, file-drop simulation with late arrivals and schema drift, and hybrid batch+stream modes.

    → Streaming Guide
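
For readers unfamiliar with two of the streaming terms above, the following is a rough standalone sketch of Poisson inter-arrivals feeding a token-bucket rate limiter. It uses only the Python standard library and simulated time rather than a wall clock. It is not Spindle's streaming API: `poisson_arrivals`, `TokenBucket`, and all rates and capacities are hypothetical names and values chosen for illustration.

```python
import random


def poisson_arrivals(rate_per_sec, duration_sec, seed=7):
    """Event timestamps whose gaps are exponential, i.e. a Poisson process."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_sec)  # mean gap is 1 / rate_per_sec
        if t >= duration_sec:
            return times
        times.append(t)


class TokenBucket:
    """Allows bursts up to `capacity`, then a sustained `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start with a full bucket
        self.last = 0.0

    def allow(self, now):
        # Refill for the elapsed simulated time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # each emitted event costs one token
            return True
        return False


events = poisson_arrivals(rate_per_sec=50, duration_sec=10)
bucket = TokenBucket(capacity=20, refill_per_sec=30)
sent = [t for t in events if bucket.allow(t)]
print(f"{len(events)} events generated, {len(sent)} passed the rate limiter")
```

Because the arrival rate (50 per second) exceeds the refill rate (30 per second), the limiter lets the initial burst through and then caps throughput. The number of events passed can never exceed the capacity plus the refill rate times the duration.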


Where Do I Start?

| I am a... | Start here |
| --- | --- |
| Developer new to synthetic data | Before You Start then Quickstart |
| Data engineer building Fabric pipelines | Quickstart then Fabric Tutorials |
| DBA who wants SQL test data | CLI Quickstart (no Python required) |
| Data scientist evaluating distributions | Methodology then Domain Catalog |
| Architect evaluating Spindle | Why Spindle? then Domain Catalog |
| DevOps automating data generation | CLI Quickstart then CI Integration |
| Presenter building a demo | 60-Second Overview |
| Already in a Fabric notebook | Fabric Quickstart |

  • Quickstart (Python): Generate your first dataset in 5 minutes
  • Quickstart (CLI): Generate data from the command line
  • Quickstart (Fabric): Generate data in a Fabric notebook
  • Tutorials: 17 step-by-step learning paths
  • Installation: pip install sqllocks-spindle and optional extras
  • CLI Cheatsheet: All CLI commands at a glance
  • GitHub: Source code, issues, contributing
  • PyPI: pip install sqllocks-spindle

Why Spindle?

Every Fabric project starts with the same problem: where's the test data?

  • Dashboards look flat because every metric has uniform variance
  • Pipelines pass testing but fail on real cardinality
  • ML models train on data that has no signal to find
  • Stakeholders can't relate to Customer_001 buying Product_ABC for $10.00

Spindle solves this with rule-based, transparent generation. Unlike ML generators that output black-box models, Spindle gives you a human-readable .spindle.json schema you can inspect, tweak, and version control. All 13 domains have distributions sourced from published data — BLS, NAIC, NCES, NAR, FDIC, Federal Reserve, SEC, and 40+ more. See the Methodology for per-parameter citations.


MIT License | Built by Jonathan Stewart / SQLLocks