cli
sqllocks_spindle.cli
Spindle CLI — command-line interface for data generation.
Functions:
main()
Spindle by SQLLocks — Multi-domain synthetic data generator for Microsoft Fabric.
generate(domain_name, scale, seed, output, fmt, mode, dry_run, schema_name, sql_ddl, sql_drop, sql_go, sql_dialect, connection_string, auth_method, write_mode, batch_size, staging_path)
Generate synthetic data for a domain.
Examples:
spindle generate retail --scale small --seed 42
spindle generate /path/to/schema.spindle.json --format csv --output /tmp/out/
describe(domain_name, mode)
Describe a domain's schema without generating data.
Example: spindle describe retail
list_cmd()
List available domains and their profiles.
Example: spindle list
validate(schema_path)
Validate a .spindle.json schema file.
Example: spindle validate my_schema.spindle.json
stream(domain_name, table, scale, seed, rate, max_events, duration, out_of_order, sink_type, output, mode, realtime, burst_spec, anomaly_fraction)
Stream synthetic events row-by-row from a domain table.
Example: spindle stream retail --table order --max-events 1000 --sink file --output events.jsonl
to_star(domain_name, scale, seed, output, fmt)
Generate data and export as a star schema (dim_ + fact_ tables).
Example: spindle to-star retail --scale small --output ./star/
to_cdm(domain_name, scale, seed, output, fmt, model_name)
Generate data and export as a Microsoft CDM folder (model.json + data files).
Produces a CDM folder compatible with Fabric CDM connectors, Dataverse, Power Platform, and Azure Data Lake Storage.
Example: spindle to-cdm retail --scale small --output ./cdm/
learn(input_path, output, input_fmt, domain)
Infer a .spindle.json schema from existing data files.
Reads CSV/Parquet/JSONL files from INPUT_PATH (file or directory), profiles column types, distributions, and relationships, then generates a ready-to-use Spindle schema.
Example: spindle learn ./data/ --output my_schema.spindle.json
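The column-profiling step described above can be illustrated with a small type-inference pass over CSV text. This is a minimal sketch under stated assumptions, not Spindle's inference logic: the function names, the three type labels, and the shape of the resulting dictionary are hypothetical, and the real .spindle.json schema format is not shown in this reference.

```python
import csv
import io

def infer_column_type(values: list) -> str:
    """Return 'integer', 'float', or 'string' for a column of raw CSV strings."""
    non_null = [v for v in values if v != ""]
    if not non_null:
        return "string"
    # Try the narrowest type first; fall through on the first value that fails.
    for type_name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_null:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "string"

def profile_csv(text: str) -> dict:
    """Profile each column's inferred type and null rate (hypothetical output shape)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    profile = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        profile[column] = {
            "type": infer_column_type(values),
            "null_rate": sum(v == "" for v in values) / len(values),
        }
    return profile

sample = "order_id,total,status\n1,19.99,shipped\n2,5,\n3,7.5,pending\n"
profile = profile_csv(sample)
```

Here empty CSV cells are treated as nulls, which is one common convention; a real profiler would also need to handle dates, booleans, and relationship detection.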
export_model(domain_name, scale, source_type, source_name, output, include_measures, schema_name)
Export a domain schema as a Power BI / Fabric semantic model (.bim).
Generates TOM JSON at compatibilityLevel 1604 with typed columns, relationships, M expressions, and auto-generated DAX measures.
Example: spindle export-model retail --source-type lakehouse --output retail.bim
from_ddl(input_file, output, domain, scale, smart, explain)
Import SQL DDL (CREATE TABLE) into a .spindle.json schema.
Parses SQL Server, PostgreSQL, MySQL, and ANSI SQL dialects. With --smart (default), infers realistic distributions, FK patterns, temporal seasonality, and business rules from schema structure.
Example: spindle from-ddl adventureworks.sql --output aw.spindle.json
continue_cmd(domain_name, input_dir, output, fmt, inserts, update_fraction, delete_fraction, seed)
Generate incremental changes (inserts, updates, deletes) from existing data.
Reads existing data files, then generates new rows, status updates, and soft deletes tagged with _delta_type and _delta_timestamp.
Example: spindle continue retail --input ./data/ --output ./deltas/ --inserts 50
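To make the delta tagging concrete, the sketch below shows one plausible shape for tagged rows. Only the `_delta_type` and `_delta_timestamp` column names come from the description above; the tag values (`insert`, `update`, `delete`), the `is_deleted` flag, and the `tag_delta` helper are assumptions for illustration, not Spindle's actual output format.

```python
from datetime import datetime, timezone

def tag_delta(row: dict, delta_type: str, when: datetime) -> dict:
    """Return a copy of row with the two delta metadata columns attached."""
    tagged = dict(row)
    tagged["_delta_type"] = delta_type
    tagged["_delta_timestamp"] = when.isoformat()
    return tagged

now = datetime(2024, 1, 15, tzinfo=timezone.utc)
existing = {"order_id": 1, "status": "pending"}

deltas = [
    # A brand-new row.
    tag_delta({"order_id": 2, "status": "pending"}, "insert", now),
    # A status update to an existing row.
    tag_delta({**existing, "status": "shipped"}, "update", now),
    # A soft delete: the row is kept and flagged rather than removed.
    tag_delta({**existing, "is_deleted": True}, "delete", now),
]
```

Because each delta is a copy, the original `existing` row is left unchanged, which mirrors the idea of writing changes to a separate output directory.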
time_travel(domain_name, months, scale, output, fmt, growth_rate, churn_rate, seed)
Generate monthly point-in-time snapshots showing data evolution.
Produces N+1 snapshots (month 0 = initial, then N months of evolution) with configurable growth, churn, and update rates.
Example: spindle time-travel retail --months 6 --output ./snapshots/
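The N+1 snapshot count can be seen in a small row-count simulation. This is a sketch under an assumed model (each month adds `growth_rate` and removes `churn_rate` of the current rows); the function name and the formula are hypothetical and are not taken from Spindle's implementation.

```python
def snapshot_row_counts(initial_rows: int, months: int,
                        growth_rate: float, churn_rate: float) -> list:
    """Return row counts for month 0 plus `months` evolved snapshots."""
    counts = [initial_rows]  # month 0 = initial snapshot
    for _ in range(months):
        current = counts[-1]
        added = round(current * growth_rate)
        removed = round(current * churn_rate)
        counts.append(current + added - removed)
    return counts

counts = snapshot_row_counts(1000, months=6, growth_rate=0.05, churn_rate=0.02)
snapshot_total = len(counts)  # 6 months of evolution plus the initial snapshot
```

With `months=6` the list holds seven entries, matching the "N+1 snapshots" wording above.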
compare(real_path, synth_path, input_fmt, output)
Compare real vs synthetic data and generate a fidelity report.
Compares column distributions, null rates, and cardinality, and runs statistical tests, to produce a 0-100 fidelity score.
REAL_PATH and SYNTH_PATH should be directories containing data files (one file per table) in the specified format.
Example: spindle compare ./real_data/ ./synth_data/ --output report.md
verify(data_path, input_fmt, schema_path, statistical, output, strict)
Verify synthetic data quality and statistical integrity.
Loads generated data from DATA_PATH (file or directory), runs a suite of validation gates, and exits 0 if all pass.
Without --schema: only row counts are reported (no schema = no gates).
With --schema: adds schema conformance, null constraint, PK uniqueness, and referential integrity checks.
With --statistical: also runs KS test (numeric) and chi-squared (enum) against schema-fitted distribution parameters. Requires: pip install sqllocks-spindle[inference]
Examples:
spindle verify ./output/
spindle verify data.csv --schema retail.spindle.json
spindle verify ./output/ --schema schema.json --statistical --output report.md
spindle verify ./output/ --schema schema.json --output report.json
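Two of the schema gates named above, PK uniqueness and referential integrity, can be sketched in plain Python. This is an illustration of the checks themselves, not Spindle's code: the helper names, the sample tables, and the choice of exit code 1 for failure are assumptions (the description only states that the command exits 0 when all gates pass).

```python
def pk_is_unique(rows: list, pk: str) -> bool:
    """True when no primary-key value appears more than once."""
    keys = [row[pk] for row in rows]
    return len(keys) == len(set(keys))

def orphaned_fk_values(child_rows: list, fk: str,
                       parent_rows: list, pk: str) -> set:
    """Return non-null foreign-key values that have no matching parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return {row[fk] for row in child_rows
            if row[fk] is not None and row[fk] not in parent_keys}

customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # no such customer
]

orphans = orphaned_fk_values(orders, "customer_id", customers, "customer_id")
gates = {
    "customer_pk_unique": pk_is_unique(customers, "customer_id"),
    "order_pk_unique": pk_is_unique(orders, "order_id"),
    "order_customer_fk": not orphans,
}
# Assumed convention: 0 only when every gate passes, 1 otherwise.
exit_code = 0 if all(gates.values()) else 1
```

In this sample the orphaned `customer_id` fails the referential integrity gate, so the overall result is a failure even though both primary keys are unique.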
composite(preset_or_domains, scale, seed, output, fmt)
Generate data from a composite preset or ad-hoc domain combination.
Use a preset name or combine domains with '+':
Examples:
spindle composite enterprise --scale small
spindle composite retail+hr+financial --scale small --output ./data/
list_presets_cmd()
List available composite presets.
mask(input_path, output, input_fmt, seed, exclude)
Replace PII in data files with synthetic values.
Detects PII columns (email, phone, name, SSN, etc.) via column name heuristics and replaces values with realistic synthetic data while preserving null patterns and distributions.
Example: spindle mask ./real_data/ --output ./masked/
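Column-name heuristics of the kind described above can be sketched with a single regular expression. This is an illustrative approximation only: the pattern, the helper names, and the replacement generator are hypothetical, and Spindle's real detector may use a different keyword list and richer synthetic values.

```python
import re

# Hypothetical keyword list; a real detector would likely cover more PII types.
PII_NAME_PATTERN = re.compile(
    r"(email|phone|ssn|first_?name|last_?name|full_?name)", re.IGNORECASE
)

def detect_pii_columns(columns: list) -> list:
    """Return the column names that look like PII based on their names alone."""
    return [c for c in columns if PII_NAME_PATTERN.search(c)]

def mask_preserving_nulls(values: list, replacement_fn) -> list:
    """Replace each non-null value; keep None where the original was None."""
    return [None if v is None else replacement_fn(i)
            for i, v in enumerate(values)]

columns = ["customer_id", "Email", "phone_number", "order_total", "FirstName"]
pii_columns = detect_pii_columns(columns)

masked = mask_preserving_nulls(
    ["alice@corp.test", None, "bob@corp.test"],
    lambda i: f"user{i}@example.com",
)
```

Keeping `None` in place is what preserves the null pattern; preserving value distributions would need a replacement function that samples from the observed shape.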
profile()
Manage domain profiles — export, import, and list.
profile_export(domain_name, output, profile_name)
Export a domain profile to a portable JSON file.
profile_import(profile_path, domain_name, save_as)
Import a profile into a domain's profiles/ directory.
profile_list(domain_name)
List available profiles for a domain.
profile_validate(artifact_path, safe_mode, as_json)
Static leak scanner over a serialized profile artifact (STORY-010 / ADR-006).
Operates on the artifact file ONLY — never the live data. Exits 0 only on a proven-clean artifact; non-zero on any leak. Use as a pre-commit / CI gate:
spindle profile validate --safe ./profile.json
Add --json for machine-readable output.
profile_capture(data_path, output, input_fmt, name)
Capture the SHAPE of real data into a portable, PII-free profile JSON.
Reads data files (one per table), records categorical distributions, and
writes a small committable profile.json — no raw rows. Commit it next to
your dbt models; diff it with spindle profile diff.
spindle profile capture ./prod_export/ -o prod.profile.json
profile_diff(profile_a, profile_b, threshold, as_json, min_drift)
Diff the SHAPE of two profile artifacts — a git diff for your data.
Reports per-distribution drift (total-variation distance) and an overall
drift score. Works on profile capture / profile export output.
Use --threshold as a CI gate to fail builds on shape drift.
spindle profile diff prod.json staging.json --threshold 0.2
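The total-variation distance used for per-distribution drift can be computed for two categorical distributions as half the sum of absolute frequency differences. The sketch below shows the metric itself, not Spindle's implementation; the function name, the sample frequencies, and the way the threshold is applied are assumptions.

```python
def total_variation_distance(p: dict, q: dict) -> float:
    """Half the L1 distance between two categorical frequency tables."""
    categories = set(p) | set(q)
    # A category missing from one side counts as frequency 0 there.
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)

prod = {"shipped": 0.7, "pending": 0.2, "cancelled": 0.1}
staging = {"shipped": 0.4, "pending": 0.3, "cancelled": 0.2, "returned": 0.1}

drift = total_variation_distance(prod, staging)  # approximately 0.3
threshold = 0.2
exceeds_threshold = drift > threshold  # a CI gate would fail the build here
```

The distance is 0 for identical distributions and 1 for distributions with no categories in common, which makes a fixed threshold easy to reason about.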
profile_registry()
Manage the profile registry — list, save, load, diff, validate.
registry_list(system, table, tags, root)
List profiles in the registry.
registry_save(domain_name, system, name, scale, seed, tags, description, root)
Generate reference data and save as a named profile.
registry_delete(identity, root)
Delete a profile from the registry.
registry_tag(identity, tags, remove, root)
Add or remove tags on a profile.
registry_diff(identity_a, identity_b, root)
Show column diff between two registry profiles.
registry_reindex(root)
Rebuild the registry index from files on disk.
registry_validate(identity, domain_name, scale, seed, output, root)
Run fidelity validation comparing a profile to freshly generated data.
publish(domain_name, scale, seed, mode, target, workspace_id, lakehouse_id, base_path, connection_string, database, auth_method, fmt, credential_ref, dry_run)
Generate and publish data to a Fabric workspace.
Generates synthetic data and pushes it directly to a Fabric Lakehouse, Eventhouse, or SQL Database endpoint.
Examples:
spindle publish retail --target lakehouse --base-path abfss://ws@onelake.dfs.fabric.microsoft.com/lh.Lakehouse
spindle publish retail --target sql-database --connection-string "env://SPINDLE_SQL_CONNECTION"
spindle publish retail --target eventhouse --connection-string "https://eh.kusto.fabric.microsoft.com" --database mydb
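The `env://` prefix in the second example suggests that the connection string is read from an environment variable rather than passed literally. Assuming that interpretation, a resolver might look like the sketch below; the function name and the error behaviour are hypothetical, and Spindle's actual handling of `env://` references is not documented here.

```python
import os

ENV_PREFIX = "env://"

def resolve_connection_string(value: str) -> str:
    """Resolve 'env://NAME' to the value of NAME; pass other strings through."""
    if value.startswith(ENV_PREFIX):
        var_name = value[len(ENV_PREFIX):]
        try:
            return os.environ[var_name]
        except KeyError:
            raise ValueError(
                f"environment variable {var_name} is not set"
            ) from None
    return value

# Example usage with a placeholder value set for this process only.
os.environ["SPINDLE_SQL_CONNECTION"] = "Server=example;Database=demo"
resolved = resolve_connection_string("env://SPINDLE_SQL_CONNECTION")
literal = resolve_connection_string("https://eh.kusto.fabric.microsoft.com")
```

Indirection of this kind keeps secrets out of shell history and CI logs, which is presumably why the example avoids an inline connection string.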
notebook(domain_name, scale, seed, output, target)
Generate a ready-to-run Fabric notebook for a domain.
Creates a .ipynb notebook that installs Spindle, generates data, and writes to a Lakehouse (or other target).
Example: spindle notebook retail --scale medium -o ./notebooks/retail_demo.ipynb
deploy_notebook(domain_name, workspace, scale, seed, auth_method, notebook_name)
Generate and deploy a notebook to a Fabric workspace.
Creates a Spindle notebook and uploads it to the specified Fabric workspace using the Fabric REST API.
Example: spindle deploy-notebook retail --workspace "Demo" --auth cli
setup_fabric(workspace, auth_method, create_lakehouse, env_name, snippet)
Set up a Fabric environment with Spindle pre-installed.
Creates a Fabric Environment item with sqllocks-spindle and dependencies, and optionally creates a Lakehouse for output.
Example: spindle setup-fabric --workspace "Demo" --auth cli --create-lakehouse
demo()
Demo engine — run Spindle demos for conference, client, and workshop use.
demo_init(name, workspace_id, warehouse_conn, eventhouse_uri, sql_db_conn, lakehouse_id, auth)
Configure a named connection profile for Fabric targets.
demo_list()
Show all available demo scenarios.
demo_run(scenario, mode, connection, input_file, rows, domain, domains, env_name, output_formats, dry_run, estimate_only, seed, scale_mode)
Run a demo scenario.
demo_preflight(connection)
Validate connections to configured Fabric targets.
demo_cleanup(session_id, dry_run, connection)
Remove all artifacts from a demo session.
demo_status(session_id)
Show status of a demo session.
demo_notebook(scenario, mode, output)
Generate a Fabric notebook for a demo scenario.
demo_report(session_id, fmt, output)
Generate a report for a completed demo session.