DDL Import¶
Import SQL DDL (CREATE TABLE statements) into a .spindle.json schema and generate synthetic data from any existing database design.
Quick Start¶
spindle from-ddl my_tables.sql --output my_schema.spindle.json
spindle generate --schema my_schema.spindle.json --scale small --output ./data/
from sqllocks_spindle.schema.ddl_parser import DdlParser
from sqllocks_spindle import Spindle
parser = DdlParser()
schema = parser.parse_file("my_tables.sql")
result = Spindle().generate(schema=schema, scale="small", seed=42)
print(result.summary())
Supported SQL Dialects¶
| Dialect | Identity Syntax | Identifier Quoting | Notes |
|---|---|---|---|
| SQL Server / Fabric | IDENTITY(1,1) |
[brackets] |
T-SQL, schema-qualified names |
| PostgreSQL | SERIAL, BIGSERIAL |
"double quotes" |
Schema.table notation |
| MySQL | AUTO_INCREMENT |
`backticks` |
Standard MySQL syntax |
| ANSI SQL | — | "double quotes" |
Generic fallback |
The parser auto-detects the dialect from syntax patterns. No configuration needed.
What Gets Parsed¶
- CREATE TABLE statements (with optional
IF NOT EXISTS) - Column definitions with data types, precision/scale, and constraints
- Primary keys — inline and table-level
CONSTRAINT ... PRIMARY KEY - Foreign keys — inline
REFERENCES, table-levelFOREIGN KEY, andALTER TABLE ADD CONSTRAINT - Identity/serial columns →
sequencestrategy - NOT NULL / NULL constraints →
null_rate: 0.0or default
Type-to-Strategy Mapping¶
The parser maps SQL types to appropriate generation strategies:
| SQL Type | Generator Strategy | Details |
|---|---|---|
INT, BIGINT, SMALLINT |
distribution: uniform |
Min/max based on type range |
DECIMAL(p,s), NUMERIC |
distribution: normal |
Mean/std derived from precision |
MONEY, SMALLMONEY |
distribution: log_normal |
Realistic monetary distributions |
BIT, BOOLEAN |
weighted_enum |
{true: 0.85, false: 0.15} |
DATETIME, DATE, TIMESTAMP |
temporal: uniform |
Configurable date range |
UNIQUEIDENTIFIER, UUID |
uuid |
UUID v4 |
VARCHAR, NVARCHAR, TEXT |
Heuristic (see below) | Based on column name |
VARBINARY, IMAGE |
Skipped | Binary columns excluded |
Column Name Heuristics¶
String columns are matched against 24 exact names and 7 suffix patterns:
Exact Matches¶
| Column Name | Faker Provider |
|---|---|
first_name |
first_name |
last_name |
last_name |
email |
email |
phone |
phone_number |
address |
street_address |
city |
city |
state |
state_abbr |
zip_code, postal_code |
zipcode |
country |
country |
company |
company |
username |
user_name |
ssn |
ssn |
ip_address |
ipv4 |
Suffix Patterns¶
| Suffix | Strategy |
|---|---|
*_name |
faker: name |
*_email |
faker: email |
*_phone |
faker: phone_number |
*_date |
temporal |
*_code |
pattern |
*_type, *_status |
weighted_enum |
*_id |
FK candidate (auto-linked) |
Foreign Key Detection¶
FKs are resolved two ways:
- Explicit constraints — parsed directly from
FOREIGN KEY/REFERENCESclauses - Naming convention — columns ending in
_idare matched to tables by name (e.g.,customer_id→ FK tocustomer.customer_id)
Both produce foreign_key strategy with Pareto distribution for realistic skew.
Scale Auto-Generation¶
The parser generates three scale presets based on table topology:
| Scale | Root Tables (no FK deps) | Child Tables (with FK deps) |
|---|---|---|
small |
1,000 rows | 2,500 rows |
medium |
10,000 rows | 25,000 rows |
large |
100,000 rows | 250,000 rows |
CLI Reference¶
| Option | Default | Description |
|---|---|---|
INPUT_FILE |
— | Path to SQL DDL file (required) |
--output, -o |
— | Output .spindle.json path |
--domain |
custom |
Domain name for the generated schema |
--scale |
— | Scale overrides: small:table1=1000,table2=5000 |
Example: End-to-End¶
-- my_tables.sql
CREATE TABLE customer (
customer_id INT IDENTITY(1,1) PRIMARY KEY,
first_name NVARCHAR(50) NOT NULL,
last_name NVARCHAR(50) NOT NULL,
email NVARCHAR(100),
created_at DATETIME2 DEFAULT GETDATE()
);
CREATE TABLE [order] (
order_id INT IDENTITY(1,1) PRIMARY KEY,
customer_id INT NOT NULL REFERENCES customer(customer_id),
order_date DATE NOT NULL,
total DECIMAL(10,2),
status VARCHAR(20)
);
# Import DDL
spindle from-ddl my_tables.sql --output my_schema.spindle.json
# Generate data
spindle generate --schema my_schema.spindle.json --scale small --seed 42 --output ./data/
The parser will:
- Detect SQL Server dialect (from
IDENTITYand bracket quoting) - Map
customer_idtosequencestrategy (identity column) - Map
first_name/last_name/emailto appropriate Faker providers - Create FK relationship
order.customer_id → customer.customer_id - Map
statustoweighted_enum(suffix heuristic) - Generate scale presets (customer=1K, order=2.5K for small)