DDL Import¶

Import SQL DDL (CREATE TABLE statements) into a .spindle.json schema and generate synthetic data from any existing database design.

Quick Start¶

spindle from-ddl my_tables.sql --output my_schema.spindle.json
spindle generate --schema my_schema.spindle.json --scale small --output ./data/

from sqllocks_spindle.schema.ddl_parser import DdlParser
from sqllocks_spindle import Spindle

parser = DdlParser()
schema = parser.parse_file("my_tables.sql")

result = Spindle().generate(schema=schema, scale="small", seed=42)
print(result.summary())

Supported SQL Dialects¶

Dialect	Identity Syntax	Identifier Quoting	Notes
SQL Server / Fabric	`IDENTITY(1,1)`	`[brackets]`	T-SQL, schema-qualified names
PostgreSQL	`SERIAL`, `BIGSERIAL`	`"double quotes"`	Schema.table notation
MySQL	`AUTO_INCREMENT`	`backticks`	Standard MySQL syntax
ANSI SQL	—	`"double quotes"`	Generic fallback

The parser auto-detects the dialect from syntax patterns. No configuration needed.

What Gets Parsed¶

CREATE TABLE statements (with optional IF NOT EXISTS)
Column definitions with data types, precision/scale, and constraints
Primary keys — inline and table-level CONSTRAINT ... PRIMARY KEY
Foreign keys — inline REFERENCES, table-level FOREIGN KEY, and ALTER TABLE ADD CONSTRAINT
Identity/serial columns → sequence strategy
NOT NULL / NULL constraints → null_rate: 0.0 or default

Type-to-Strategy Mapping¶

The parser maps SQL types to appropriate generation strategies:

SQL Type	Generator Strategy	Details
`INT`, `BIGINT`, `SMALLINT`	`distribution: uniform`	Min/max based on type range
`DECIMAL(p,s)`, `NUMERIC`	`distribution: normal`	Mean/std derived from precision
`MONEY`, `SMALLMONEY`	`distribution: log_normal`	Realistic monetary distributions
`BIT`, `BOOLEAN`	`weighted_enum`	`{true: 0.85, false: 0.15}`
`DATETIME`, `DATE`, `TIMESTAMP`	`temporal: uniform`	Configurable date range
`UNIQUEIDENTIFIER`, `UUID`	`uuid`	UUID v4
`VARCHAR`, `NVARCHAR`, `TEXT`	Heuristic (see below)	Based on column name
`VARBINARY`, `IMAGE`	Skipped	Binary columns excluded

Column Name Heuristics¶

String columns are matched against 24 exact names and 7 suffix patterns:

Exact Matches¶

Column Name	Faker Provider
`first_name`	`first_name`
`last_name`	`last_name`
`email`	`email`
`phone`	`phone_number`
`address`	`street_address`
`city`	`city`
`state`	`state_abbr`
`zip_code`, `postal_code`	`zipcode`
`country`	`country`
`company`	`company`
`username`	`user_name`
`ssn`	`ssn`
`ip_address`	`ipv4`

Suffix Patterns¶

Suffix	Strategy
`*_name`	`faker: name`
`*_email`	`faker: email`
`*_phone`	`faker: phone_number`
`*_date`	`temporal`
`*_code`	`pattern`
`_type`, `_status`	`weighted_enum`
`*_id`	FK candidate (auto-linked)

Foreign Key Detection¶

FKs are resolved two ways:

Explicit constraints — parsed directly from FOREIGN KEY / REFERENCES clauses
Naming convention — columns ending in _id are matched to tables by name (e.g., customer_id → FK to customer.customer_id)

Both produce foreign_key strategy with Pareto distribution for realistic skew.

Scale Auto-Generation¶

The parser generates three scale presets based on table topology:

Scale	Root Tables (no FK deps)	Child Tables (with FK deps)
`small`	1,000 rows	2,500 rows
`medium`	10,000 rows	25,000 rows
`large`	100,000 rows	250,000 rows

CLI Reference¶

spindle from-ddl INPUT_FILE [OPTIONS]

Option	Default	Description
`INPUT_FILE`	—	Path to SQL DDL file (required)
`--output, -o`	—	Output `.spindle.json` path
`--domain`	`custom`	Domain name for the generated schema
`--scale`	—	Scale overrides: `small:table1=1000,table2=5000`

Example: End-to-End¶

-- my_tables.sql
CREATE TABLE customer (
    customer_id INT IDENTITY(1,1) PRIMARY KEY,
    first_name  NVARCHAR(50) NOT NULL,
    last_name   NVARCHAR(50) NOT NULL,
    email       NVARCHAR(100),
    created_at  DATETIME2 DEFAULT GETDATE()
);

CREATE TABLE [order] (
    order_id    INT IDENTITY(1,1) PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customer(customer_id),
    order_date  DATE NOT NULL,
    total       DECIMAL(10,2),
    status      VARCHAR(20)
);

# Import DDL
spindle from-ddl my_tables.sql --output my_schema.spindle.json

# Generate data
spindle generate --schema my_schema.spindle.json --scale small --seed 42 --output ./data/

The parser will:

Detect SQL Server dialect (from IDENTITY and bracket quoting)
Map customer_id to sequence strategy (identity column)
Map first_name / last_name / email to appropriate Faker providers
Create FK relationship order.customer_id → customer.customer_id
Map status to weighted_enum (suffix heuristic)
Generate scale presets (customer=1K, order=2.5K for small)