Changelog¶

All notable changes to Spindle will be documented in this file.

Format follows Keep a Changelog. This project uses Semantic Versioning.

[2.13.0] - 2026-04-29¶

Added — Phase 6: Fidelity Ceiling, Profile Registry & Streaming Fan-out¶

Profile Registry (`sqllocks_spindle/profiles/`)¶

ProfileRegistry — File-system-backed registry with <system>/<table>/<name>.json hierarchy. CRUD (save, load, delete, exists), search (list_all, list_systems, list_tables, search), tagging, bulk import_from_dir, profile diff, reindex, and save_from_dataset_profile convenience method.
RegistryProfile — Dataclass capturing column statistics, tags, description, source row count. identity property returns canonical system/table/name string. save/load round-trip via JSON. validate() on ProfileRegistry compares a GenerationResult against the stored profile via FidelityComparator.
CLI — spindle profile-registry list|save|delete|tag|diff|reindex|validate subcommands.
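
The `<system>/<table>/<name>.json` layout and the JSON round-trip described above can be pictured with a small standalone sketch. `SketchProfile`, `save_profile`, and `load_profile` are hypothetical names used only for illustration; they are not the `ProfileRegistry` or `RegistryProfile` API, whose signatures are not given here.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SketchProfile:
    """Hypothetical stand-in for a stored profile (not RegistryProfile itself)."""
    system: str
    table: str
    name: str
    tags: list = field(default_factory=list)
    source_row_count: int = 0

    @property
    def identity(self) -> str:
        # Canonical system/table/name string, mirroring the described hierarchy.
        return f"{self.system}/{self.table}/{self.name}"


def save_profile(root, profile: SketchProfile) -> Path:
    """Write the profile to <root>/<system>/<table>/<name>.json."""
    path = Path(root) / profile.system / profile.table / f"{profile.name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(profile), indent=2))
    return path


def load_profile(root, system: str, table: str, name: str) -> SketchProfile:
    """Read a profile back from its position in the hierarchy."""
    path = Path(root) / system / table / f"{name}.json"
    return SketchProfile(**json.loads(path.read_text()))


# Round-trip in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_root:
    demo = SketchProfile("crm", "customers", "baseline", tags=["prod"])
    save_profile(demo_root, demo)
    restored = load_profile(demo_root, "crm", "customers", "baseline")
```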

Fidelity Reports (`sqllocks_spindle/inference/comparator.py`)¶

FidelityReport.to_html() — Self-contained HTML report with inline CSS. Score colour bands: green ≥ 85, amber 70–84, red < 70. Per-column inline progress bars; KS stat, KS p-value, Chi², null delta, cardinality ratio columns.
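
The colour bands above reduce to a simple threshold lookup. The sketch below is illustrative only: `score_band` and `progress_bar_html` are hypothetical helpers, the hex colours are arbitrary, and treating fractional scores between 84 and 85 as amber is an assumption.

```python
def score_band(score: float) -> str:
    """Map a 0-100 fidelity score to the band named in the changelog."""
    if score >= 85:
        return "green"
    if score >= 70:
        return "amber"
    return "red"


def progress_bar_html(score: float) -> str:
    """Render an inline-CSS progress bar (colours are illustrative)."""
    colours = {"green": "#2e7d32", "amber": "#f9a825", "red": "#c62828"}
    width = max(0.0, min(100.0, score))  # clamp so the bar never overflows
    fill = colours[score_band(score)]
    return (
        '<div style="background:#eee;width:120px">'
        f'<div style="background:{fill};width:{width:.0f}%;height:8px"></div>'
        "</div>"
    )
```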

Tier 1 — Advanced Profiler (`sqllocks_spindle/inference/advanced_profiler.py`)¶

AdvancedProfiler — Wraps real + synthetic DataFrames with four analysis passes:
GMM fitting — BIC-optimal Gaussian Mixture (1–5 components) per numeric column.
Conditional profiles — per-category mean/std/null-rate for categorical × numeric pairs.
Adversarial test — GradientBoostingClassifier 3-fold CV; AUC ≈ 0.5 = statistically indistinguishable.
Temporal profiles — gap statistics, lag-1 and lag-7 autocorrelation, FFT periodicity detection.
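
As an illustration of the temporal pass, lag-k autocorrelation can be computed with the standard sample estimator. This is a generic sketch, not the `AdvancedProfiler` implementation; the function name and the choice of estimator are assumptions.

```python
from statistics import fmean


def lag_autocorrelation(values, lag: int) -> float:
    """Sample autocorrelation at the given lag (standard biased estimator)."""
    n = len(values)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = fmean(values)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0  # constant series: no variation to correlate
    num = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return num / denom


# A series with a weekly cycle: lag-7 correlation is much stronger than lag-1.
weekly = [10, 12, 11, 13, 12, 30, 28] * 8
lag1 = lag_autocorrelation(weekly, 1)
lag7 = lag_autocorrelation(weekly, 7)
```

A high lag-7 value relative to lag-1 is the kind of signal that suggests weekly periodicity in daily data.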

Tier 2 — Format & Cardinality (`sqllocks_spindle/inference/tier2_profiler.py`)¶

FormatPreservationAnalyzer — Detects dominant format in real data (email, phone, UUID, URL, IPv4, ZIP, SSN, ISO date, credit card); compares synthetic preservation rate.
StringSimilarityAnalyzer — Character n-gram cosine similarity between real and synthetic string columns.
CardinalityConstraintChecker — Flags synthetic cardinality deviations > 20 % of real.
check_anomaly_rates() — Verifies _spindle_is_anomaly fractions match expected rates within tolerance.
run_tier2() — Convenience function returning a Tier2Report.
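
Two of the Tier 2 checks are easy to sketch in plain Python: character n-gram cosine similarity and the 20 % cardinality tolerance. The helpers below are hypothetical; the n-gram size, the padding convention, and the exact deviation formula used by Spindle are assumptions.

```python
import math
from collections import Counter


def char_ngrams(strings, n: int = 3) -> Counter:
    """Count character n-grams over a column of strings (space-padded)."""
    counts = Counter()
    for s in strings:
        padded = f" {s} "
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cardinality_deviates(real_unique: int, synth_unique: int, tolerance: float = 0.20) -> bool:
    """Flag a synthetic column whose distinct count is off by more than `tolerance`."""
    if real_unique == 0:
        return synth_unique != 0
    return abs(synth_unique - real_unique) / real_unique > tolerance


real = ["alice@example.com", "bob@example.com"]
synthetic = ["carol@example.org", "dave@example.com"]
similarity = cosine_similarity(char_ngrams(real), char_ngrams(synthetic))
```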

Tier 3 — Research-Grade Features (`sqllocks_spindle/inference/tier3_research.py`)¶

ChowLiuNetwork — Chow-Liu Bayesian network via max spanning tree of pairwise mutual information (Kruskal union-find). Returns ChowLiuResult with edges and joint-entropy score.
DifferentialPrivacy — Laplace (L1 sensitivity / ε) and Gaussian (σ calibrated to ε, δ) mechanisms with optional range clipping. Returns DPResult with privacy budget metadata.
DriftMonitor — KS + PSI for numeric, Chi² for categorical. Returns DriftReport with per-column drift flags and overall status.
BootstrapMode — Row sampling with replacement plus optional Gaussian jitter for numeric columns.
CTGANWrapper — Thin wrapper for optional ctgan dependency; raises ImportError gracefully when not installed.
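
The Laplace mechanism mentioned above adds noise with scale equal to L1 sensitivity divided by ε. The sketch below applies it to a bounded mean. It is a textbook illustration rather than the `DifferentialPrivacy` class: `laplace_noise` and `private_mean` are hypothetical names, and clipping the inputs (rather than the output) is an assumption.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, epsilon: float, lower: float, upper: float, rng: random.Random):
    """Return (noisy_mean, noise_scale) for values clipped to [lower, upper]."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record moves a bounded mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    scale = sensitivity / epsilon
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(scale, rng), scale


rng = random.Random(42)
noisy, scale = private_mean([10, 20, 30, 1000], epsilon=0.5, lower=0, upper=100, rng=rng)
```

A smaller ε gives a larger scale and therefore noisier output, which is the privacy and utility trade-off that the budget metadata records.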

Streaming Fan-out (`sqllocks_spindle/streaming/multi_writer.py`)¶

StreamingMultiWriter — ThreadPoolExecutor fan-out of any generate_stream() iterator to N named StreamWriter sinks concurrently. stream(generator) and stream_table(name, df) entry points. Dynamic add_sink / remove_sink. Per-sink error isolation with stop_on_sink_error option. Returns StreamingMultiWriteResult with per-sink SinkResult.
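
The fan-out pattern with per-sink error isolation can be sketched with the standard library alone. `fan_out` is a hypothetical function, and modelling sinks as plain callables is an assumption; the real `StreamWriter` interface and result classes are not shown in this changelog.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(chunks, sinks, stop_on_sink_error: bool = False):
    """Send every chunk to every sink concurrently.

    `sinks` maps a sink name to a callable taking one chunk. A failing sink
    is recorded and skipped for later chunks; other sinks keep running
    unless `stop_on_sink_error` is set.
    """
    results = {name: {"chunks_written": 0, "error": None} for name in sinks}
    with ThreadPoolExecutor(max_workers=max(1, len(sinks))) as pool:
        for chunk in chunks:
            active = {n: w for n, w in sinks.items() if results[n]["error"] is None}
            futures = {n: pool.submit(w, chunk) for n, w in active.items()}
            for name, future in futures.items():
                try:
                    future.result()  # re-raises any exception from the sink
                    results[name]["chunks_written"] += 1
                except Exception as exc:
                    results[name]["error"] = repr(exc)
            if stop_on_sink_error and any(r["error"] for r in results.values()):
                break
    return results
```

Waiting for all sinks before sending the next chunk keeps the sinks in step with each other, at the cost of running at the speed of the slowest sink.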

Tests¶

122 new tests across 6 new test files: test_profile_registry.py (24), test_fidelity_report.py (11), test_advanced_profiler.py (20), test_tier2_profiler.py (20), test_tier3_research.py (27), test_streaming_multi_writer.py (20). Full suite: 2687 passed, 4 skipped.

Demo Notebooks (`examples/notebooks/demos/`)¶

07_profile_registry.ipynb — Save, load, search, diff, and validate profiles end-to-end.
08_fidelity_report_html.ipynb — Generate HTML fidelity reports for real vs synthetic comparison.
09_advanced_profiler.ipynb — GMM, adversarial, conditional, and temporal profiling.
10_tier2_format_fidelity.ipynb — Format preservation, string similarity, cardinality, anomaly rates.
11_tier3_research.ipynb — Chow-Liu networks, differential privacy, drift monitoring, bootstrapping.
12_streaming_multi_writer.ipynb — Fan-out streaming to 4 sinks in parallel.
13_differential_privacy.ipynb — DP mechanism comparison: Laplace vs Gaussian with budget analysis.
14_drift_monitoring.ipynb — Baseline drift detection with PSI and Chi² visualisations.
15_bootstrap_sampling.ipynb — Bootstrap mode sampling and jitter exploration.

Documentation (`docs/guides/`)¶

profile-registry.md, fidelity-validation.md, advanced-fidelity.md, tier2-fidelity.md, tier3-research.md, streaming-multi-writer.md, drift-monitoring.md, column-variables.md

Changed¶

pyproject.toml — Added [advanced] optional extra: scikit-learn>=1.3, scipy>=1.11.
sqllocks_spindle/__init__.py — Added ProfileRegistry, RegistryProfile, StreamingMultiWriter, StreamingMultiWriteResult, SinkResult to top-level exports.
sqllocks_spindle/inference/__init__.py — Added all Tier 1/2/3 exports.
sqllocks_spindle/streaming/__init__.py — Added StreamingMultiWriter, StreamingMultiWriteResult, SinkResult.

[2.11.0] - 2026-04-29¶

Added — Phase 5: Validation Matrix & Demo Notebooks¶

Validation Matrix¶

tests/fixtures/validation_matrix.py — Matrix builder with filter rules. build_matrix() returns 512 valid (domain, sink, size, mode) tuples covering 13 domains × 5 sinks × 4 sizes × 3 modes after filters (streaming + sql-server, fabric_demo + sql-server, inference + non-capable domains).
tests/fixtures/mock_sinks.py — MockSink dataclass + make_mock_sink(sink_type) factory for all 5 sink types. Records write calls without performing real IO.
tests/test_validation_matrix.py — Parametrized mock suite, 518 tests (512 combos + 6 matrix-builder unit tests). All passing.
tests/test_validation_live.py — Live suite with 26 tests across 4 groups: A (13 domains × lakehouse × small × seeding), B (retail × all 5 sinks × fabric_demo × seeding), C (retail × lakehouse × all 4 sizes × streaming), D (retail × warehouse × all sizes × seeding). Auth via InteractiveBrowserCredential (browser fires once, token cached).
pyproject.toml — Registered infra pytest marker; documented SPINDLE_TEST_*_CONN env vars for live tests.

Demo Notebooks (`notebooks/demos/`)¶

01_retail_lakehouse_quickstart.ipynb — retail → lakehouse, seeding + streaming, all sizes, Delta read-back validation.
02_financial_warehouse_analytics.ipynb — financial → Fabric Warehouse, all sizes, ODBC row-count validation.
03_healthcare_sql_database.ipynb — healthcare → Fabric SQL Database, optional DataMasker HIPAA masking.
04_capital_markets_eventhouse.ipynb — capital markets → Eventhouse/KQL, streaming tick data.
05_multi_domain_fanout.ipynb — retail + financial → lakehouse + optional warehouse.
06_custom_ddl_to_lakehouse.ipynb — bring-your-own DDL → DDLParser → generate → lakehouse.

Notebook Templates (`notebooks/templates/`)¶

template_domain_to_sink.ipynb — parametrized starter for any domain → any sink.
template_custom_schema.ipynb — custom .spindle.json or .sql schema → any sink.

Notes¶

No new sink code required — FabricSqlDatabaseWriter covers SQL Server (on-prem), Azure SQL Database, Azure SQL Managed Instance, Fabric Warehouse, and Fabric SQL Database via auth_method parameter.
Mock matrix runtime: ~12 minutes locally (heavy at fabric_demo size). All 518 tests pass.

[2.9.0] - 2026-04-28¶

Added — Phase 3B: Inference Depth¶

Spindle-generated data now statistically matches real source data across seven fidelity dimensions: distribution shape, cardinality, null rates, temporal patterns, string formats, outlier rates, and column correlations.

New Classes¶

EmpiricalStrategy (engine/strategies/empirical.py) — Quantile-fingerprint interpolation for numeric columns when parametric distribution fit is poor. Requires a quantiles dict (keys p1–p99). Supports "linear" (default, NumPy) and "cubic" (scipy, optional) interpolation.
GaussianCopula (engine/correlation.py) — Post-generation correlation enforcement. Reorders column values to achieve target Pearson correlations without changing any column's marginal distribution. Algorithm: Cholesky decompose → draw correlated normals → re-rank values. Pure NumPy, no scipy.
LakehouseProfiler (inference/lakehouse_profiler.py) — Fabric-native Delta table profiler. Reads tables over ABFSS via deltalake. Returns the same DatasetProfile/TableProfile as the other entry points. Requires [fabric-inference] extra.
FidelityReport — Extended with .score() classmethod, .failing_columns(), .to_dict(), .to_dataframe(). Enables inline fidelity measurement during generation via new fidelity_profile= kwarg on Spindle.generate().
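
The Cholesky, correlated-normals, re-rank sequence described for `GaussianCopula` can be sketched for two columns in plain Python. The functions below are hypothetical and use only the standard library instead of NumPy. The achieved Pearson correlation approximates the target rather than matching it exactly, because re-ranking preserves rank correlation, not linear correlation.

```python
import math
import random


def rank_reorder(values, reference):
    """Rearrange `values` so their rank order follows that of `reference`."""
    order = sorted(range(len(reference)), key=reference.__getitem__)
    out = [None] * len(values)
    for position, value in zip(order, sorted(values)):
        out[position] = value
    return out


def correlate_pair(x, y, rho: float, seed: int = 0):
    """Reorder x and y toward correlation `rho`, leaving each marginal unchanged."""
    if len(x) != len(y):
        raise ValueError("columns must have the same length")
    rng = random.Random(seed)
    # Cholesky factor of [[1, rho], [rho, 1]] is [[1, 0], [rho, sqrt(1 - rho^2)]].
    z1 = [rng.gauss(0.0, 1.0) for _ in x]
    z2 = [rho * a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for a in z1]
    return rank_reorder(x, z1), rank_reorder(y, z2)


def pearson(a, b) -> float:
    """Plain Pearson correlation coefficient."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)
```

Because each column is only permuted, any per-column statistic (mean, quantiles, null rate) is identical before and after.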

Enhanced Classes¶

DataProfiler — New constructor kwargs: fit_threshold, top_n_values, outlier_iqr_factor, sample_rows. New profile() alias (same as profile_dataset()). New from_csv() classmethod. Extended string pattern detection: ssn, ip_address (IPv4 + IPv6), mac_address, iban, currency_code, language_code, postal_code.
ColumnProfile — New optional fields: quantiles (dict), hour_histogram, dow_histogram, string_length, outlier_rate, value_counts_ext, fit_score.
TableProfile — New correlation_matrix: dict[str, dict[str, float]] | None field.
SchemaBuilder.build() — New kwargs: fit_threshold, correlation_threshold, include_anomaly_registry. Returns (SpindleSchema, AnomalyRegistry) tuple when include_anomaly_registry=True. Extended priority tree (13 levels) with empirical fallback when KS fit < fit_threshold, temporal histogram routing, and correlation detection.
Spindle.generate() — New kwargs: enforce_correlations=True (auto-applies GaussianCopula when schema contains correlated_columns) and fidelity_profile=None (returns (GenerationResult, FidelityReport) tuple when supplied).

New Extras¶

pip install sqllocks-spindle[inference]          # scipy for FidelityReport + empirical strategies
pip install sqllocks-spindle[fabric-inference]   # scipy + deltalake + pyarrow for LakehouseProfiler

New String Patterns in Engine¶

ssn, ip_address (IPv4 + IPv6), mac_address, iban, currency_code, language_code, postal_code

Changed¶

Test count: 1,946 → 1,973 (+27 Phase 3B tests across test_empirical_strategy.py, test_correlation.py, test_fidelity_report_v2.py, test_lakehouse_profiler.py, and additions to test_inference.py and test_e2e_generation.py)

[2.7.1] - 2026-04-27¶

Changed¶

Demo Engine — Phase 2 wiring: SeedingDemoMode.run() now performs real Fabric sink writes, replacing the previous manifest-only stub. Local mode delegates to ScaleRouter (multi-process); Spark mode delegates to FabricSparkRouter (Fabric notebook submission). Sinks are constructed from the connection profile and fan out simultaneously to all configured targets (lakehouse + warehouse + sql_db + eventhouse).
New --scale-mode {auto,local,spark} flag on spindle demo run. auto selects spark when a connection profile is configured, lakehouse_id is set, and rows >= 500_000; otherwise local.
DemoManifest now records scale_mode, fabric_run_id, workspace_id, and notebook_item_id so Spark runs can be polled and cleaned up by session_id.
cmd_demo_run now forwards scale_mode into DemoParams and includes fabric_run_id and status in the response payload for Spark submissions.
ConnectionProfile extended with warehouse_staging_path and eventhouse_database fields (required by WarehouseSink and KQLSink).
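
The `auto` selection rule for `--scale-mode` is a conjunction of three conditions. A minimal sketch, assuming a simplified stand-in for the connection profile (`SketchConnectionProfile` and `resolve_scale_mode` are hypothetical names):

```python
from dataclasses import dataclass
from typing import Optional

SPARK_ROW_THRESHOLD = 500_000  # threshold stated in the changelog entry


@dataclass
class SketchConnectionProfile:
    """Hypothetical stand-in; the real ConnectionProfile has more fields."""
    lakehouse_id: Optional[str] = None


def resolve_scale_mode(requested: str, profile: Optional[SketchConnectionProfile], rows: int) -> str:
    """Resolve 'auto' to 'spark' or 'local'; pass explicit choices through."""
    if requested not in {"auto", "local", "spark"}:
        raise ValueError(f"unknown scale mode: {requested!r}")
    if requested != "auto":
        return requested
    if profile is not None and profile.lakehouse_id and rows >= SPARK_ROW_THRESHOLD:
        return "spark"
    return "local"
```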

Added¶

cmd_demo_status MCP bridge command — reads the manifest by session_id and, when the run was a Spark submission, polls FabricJobTracker.get_status for live Fabric job state
cmd_demo_cleanup MCP bridge command — runs CleanupEngine against a saved manifest by session_id

Test count¶

1,930 → 1,946 (+16 new tests)

[2.7.0] - 2026-04-27¶

Added¶

Billion-row pipeline (Phase 2) — Fabric Spark scale generation via scale_mode="fabric_spark"
- FabricSparkRouter (engine/spark_router.py) — generates static tables in-process, uploads augmented schema JSON to OneLake via DFS API, finds or auto-creates spindle_spark_worker notebook, submits Fabric notebook run, returns JobRecord immediately
- AsyncJobStore + JobRecord (engine/async_job_store.py) — thread-safe in-process registry tracking submitted Fabric jobs by job_id
- FabricJobTracker (engine/job_tracker.py) — polls and cancels Fabric notebook runs via the Fabric Jobs REST API
- spindle_spark_worker.ipynb — Fabric notebook template: reads schema from OneLake, foreachPartition dynamic table generation, writes to LakehouseSink / WarehouseSink / KQLSink / SQLDatabaseSink, saves result stats and cleans up temp file
- cmd_scale_status MCP bridge command — polls Fabric job status by job_id; maps Fabric statuses to submitted|running|succeeded|failed|cancelled
- cmd_scale_cancel MCP bridge command — cancels an in-flight Fabric notebook run
- cmd_scale_generate(scale_mode="fabric_spark") now fully implemented; requires sink_config.workspace_id, sink_config.lakehouse_id, sink_config.token
- sqllocks_spindle/notebooks/__init__.py — loads and exports SPARK_WORKER_IPYNB notebook template

Changed¶

Test count: 1,913 → 1,930 (+17 Phase 2 unit tests in tests/test_spark_router.py)

[2.6.1] - 2026-04-26¶

Fixed¶

GAP 1 — Reference table chunk replication: ScaleRouter now classifies tables as static (schema count < chunk_size) or dynamic (schema count ≥ chunk_size). Static tables are generated once with their natural cardinality and broadcast as pre-loaded PK pools into every chunk worker via the augmented schema JSON. Dynamic tables are generated chunk_size rows per chunk. Added _classify_tables, _generate_static_tables, and _SpindleJSONEncoder (handles pd.Timestamp, numpy scalars) to scale_router.py.
GAP 2 — Composite FK reference impossible: New composite_foreign_key strategy (engine/strategies/composite_foreign_key.py) — takes ref_table + ref_columns: [list], samples rows from the parent table, returns a dict of per-column arrays. New composite_fk_field strategy reads one component from the stashed dict. Both strategies registered in Spindle, ChunkWorker, ScaleRouter._generate_static_tables.
GAP 3 — Composite PK FK lookup returns 2D array: TableGenerator.generate() now detects dict returns from strategies (multi-column path) and unpacks each key into ctx.current_table. _cfo_ prefix cache keys are filtered from the public DataFrame alongside _rs_ and _sr_.
GAP 4 — Computed columns not applied in ChunkWorker: Extracted _compute_phase into module-level apply_compute_phase(tables, schema) in generator.py; chunk_worker.generate_chunk now calls it after generating all tables.
GAP 5 — Business rules not applied in ChunkWorker: generate_chunk calls BusinessRulesEngine.fix_violations() after apply_compute_phase when the schema defines business rules.
GAP 6 — PK-free tables rejected as errors: Downgraded "Table has no primary key defined" from error to warning in SchemaValidator. IDManager.register_table() now gracefully skips pool registration for empty pk_columns lists (registers data-only for constrained FK lookups).
GAP 7 — Self-referencing hierarchies shatter across chunks: Resolved by GAP 1 fix — tables using self_referencing strategy are typically small reference tables (count < chunk_size) and are now generated once, preserving a single unified hierarchy.
GAP 8 — get_filtered_fks reads first column, not PK: Replaced df.loc[mask, df.columns[0]] with pool[np.where(mask.values)[0]] — uses the PK pool (aligned with df rows) regardless of column order.
GAP 9 — generate_stream() missing compute phase and business rules: Spindle.generate_stream() now buffers all generated tables internally before yielding, then applies _compute_phase and fix_violations in the same pass as Spindle.generate().
GAP 10 — Wrong exception type in DependencyResolver: Added MissingTableError(ValueError) to schema/dependency.py; the resolver now raises it (not CircularDependencyError) when a table depends on a non-existent table.
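
The static versus dynamic split from GAP 1 can be illustrated as follows. This is a sketch of the classification rule as worded above, not the `_classify_tables` implementation; the function names and the example table counts other than `product_category = 50` are assumptions.

```python
def classify_tables(table_counts: dict, chunk_size: int):
    """Split tables into static (count < chunk_size) and dynamic (count >= chunk_size)."""
    static = {name for name, count in table_counts.items() if count < chunk_size}
    dynamic = set(table_counts) - static
    return static, dynamic


def planned_rows(table_counts: dict, chunk_size: int, n_chunks: int) -> dict:
    """Total rows per table: static tables once, dynamic tables chunk_size per chunk."""
    static, _ = classify_tables(table_counts, chunk_size)
    return {
        name: count if name in static else chunk_size * n_chunks
        for name, count in table_counts.items()
    }


counts = {"product_category": 50, "order": 1_000_000}
plan = planned_rows(counts, chunk_size=100_000, n_chunks=10)
```

Generating `product_category` once keeps its 50 rows as a single shared key pool instead of replicating it in every chunk.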

Changed¶

Test count: 1,912 → 1,913 (+1 revised E2E test asserting correct static/dynamic cardinalities)
test_e2e_scale_router.py: Assertions updated to validate static table natural cardinality (e.g., product_category = 50 rows) and dynamic table chunk multiplication, replacing the incorrect "all tables = TOTAL_ROWS" assertion.

[2.6.0] - 2026-04-25¶

Added¶

Billion-row pipeline (Phase 1) — multi-process scale generation for datasets up to 1B+ rows
- SinkRegistry — fan-out coordinator; writes to all sinks in parallel via ThreadPoolExecutor; raises SinkError with per-sink failures on partial errors
- ChunkWorker (generate_chunk) — subprocess-safe pure function; deferred imports; returns plain Python lists (pickle-safe); applies sequence_offset for PK continuity across chunks
- ScaleRouter — ProcessPoolExecutor-based orchestrator; psutil RAM guard caps workers at 80% available RAM; as_completed() fan-out with configurable max_workers and chunk_size
- StreamManager — singleton per process; daemon threads; stop_event.wait() for interruptible sleep; thread-safe counter_lock on StreamState; stop() returns bool | None (None=unknown, True=clean, False=timeout)
- LakehouseSink — writes Parquet via LakehouseFilesWriter; supports local path mode for testing
- WarehouseSink — stages Parquet and loads via COPY INTO using WarehouseBulkWriter
- KQLSink — ingests into Fabric Eventhouse via EventhouseWriter; deferred import with clear pip-install error
- SQLDatabaseSink — bulk-inserts into Fabric SQL Database / Azure SQL via FabricSqlDatabaseWriter
- cmd_scale_generate MCP bridge command — local single-process and multi-process (subprocess workers) modes; temp file cleanup in finally; seed propagated in return dict
- cmd_stream / cmd_stream_status / cmd_stream_stop MCP bridge commands — background streaming with configurable interval_seconds, max_chunks, sink fan-out
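
Primary-key continuity across chunks via `sequence_offset` can be pictured like this. The helpers are hypothetical and only show the arithmetic; they return plain lists, in keeping with the pickle-safe return values described for `generate_chunk`.

```python
def chunk_offsets(total_rows: int, chunk_size: int):
    """Yield (sequence_offset, rows_in_chunk) pairs that cover total_rows."""
    offset = 0
    while offset < total_rows:
        rows = min(chunk_size, total_rows - offset)
        yield offset, rows
        offset += rows


def generate_chunk_pks(sequence_offset: int, rows: int, start: int = 1) -> list:
    """Sequential keys for one chunk, shifted so chunks never overlap."""
    first = start + sequence_offset
    return list(range(first, first + rows))


# Each worker only needs its own offset, so chunks can be generated in any order.
all_keys = []
for offset, rows in chunk_offsets(total_rows=25, chunk_size=10):
    all_keys.extend(generate_chunk_pks(offset, rows))
```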

Fixed¶

reference_data.py — _load_dataset now wraps domain path strings with Path() before / operator; was raising TypeError when _domain_path was injected as a plain string from JSON
19_scenario_packs.py — updated to use dict-access (p['domain'], p['pack_id']) after PackLoader.list_builtin() API change

Changed¶

Test count: 1,867 → 1,912 (+45 Phase 1 tests including e2e integration test)

[2.0.0] - 2026-03-14¶

Added¶

All 18 Blueprint items (E1-E18): CredentialResolver, RunManifest enhancements, observability, IoT/financial/clickstream/operational log simulation, state machines, SCD2 file drops, spindle publish CLI, acceptance tests, EventhouseWriter, Fabric provisioning guide
Tier 3 features: spindle learn, spindle continue, spindle compare, spindle time-travel, spindle mask, composite presets, profile sharing
34/35 notebooks pre-executed with saved output

Changed¶

Version: 1.3.0 -> 2.0.0 (major bump reflects complete feature set)
Test count: 989 -> 1,250

[1.3.0] - 2026-03-13¶

Added¶

Chaos engine -- ChaosEngine, ChaosConfig, ChaosCategory, ChaosOverride
- Six chaos categories: schema, value, file, referential, temporal, volume
- Four intensity levels: calm (0.25x), moderate (1.0x), stormy (2.5x), hurricane (5.0x)
- Escalation modes: gradual, random, front-loaded
- Methods: corrupt_dataframe(), drift_schema(), corrupt_file(), inject_referential_chaos(), inject_temporal_chaos(), inject_volume_chaos(), apply_all()
Simulation layer -- three modes for realistic pipeline testing
- FileDropSimulator -- daily/hourly/15-min cadence, Parquet/CSV/JSONL, manifests, done flags, lateness, duplicates, backfill
- StreamEmitter -- CloudEvents envelopes, rate + jitter, out-of-order, replay windows, multi-topic
- HybridSimulator -- concurrent batch + stream, correlation ID linking
Scenario Packs -- PackLoader, PackRunner, PackValidator, ScenarioPack
- 44 built-in packs: 11 verticals x 4 simulation types
- list_builtin(), load_builtin(), PackRunner.run()
GSL spec parser -- GSLParser, GenerationSpec
- Declarative YAML tying schema, scenario pack, chaos, outputs, and validation gates
Validation gates + quarantine -- ReferentialIntegrityGate, SchemaConformanceGate, NullConstraintGate, UniqueConstraintGate, RangeConstraintGate, TemporalConsistencyGate, FileFormatGate, SchemaDriftGate
- QuarantineManager -- quarantine_file(), quarantine_dataframe(), list_quarantined()
CompositeDomain + SharedEntityRegistry
- Multi-domain generation with cross-domain FK enforcement
- SharedConcept enum: PERSON, LOCATION, ORGANIZATION, CALENDAR
EventEnvelope + EnvelopeFactory -- CloudEvents-style wrapper
Fabric integration -- OneLakePaths, LakehouseFilesWriter, EventstreamClient
MCP bridge -- python -m sqllocks_spindle.mcp_bridge (7 commands)
10 new example scripts (13-22) and 3 new notebooks (06-08)
SQL DDL import -- DdlParser for 4 SQL dialects (F-001)
- spindle from-ddl CLI command
- 30+ type-to-strategy mappings, 25+ column name heuristics
- FK detection from explicit constraints and naming conventions
CREATE TABLE DDL in SQL output -- to_sql_inserts() with DDL generation (F-002)
- 3 dialect type maps (T-SQL, PostgreSQL, MySQL)
- Fabric Warehouse compatibility (no PK constraints, no IDENTITY)
- CLI: --sql-ddl, --sql-drop, --sql-go, --sql-dialect, --schema-name
Fabric SQL Database Writer -- FabricSqlDatabaseWriter (F-003)
- 4 auth methods: cli (Entra/az login), msi, spn, sql
- 4 write modes: create_insert, insert_only, truncate_insert, append
- Parameterized executemany, dependency-ordered writes/drops
- CLI: --format sql-database, --connection-string, --auth, --write-mode
- New [fabric-sql] extra: pyodbc>=5.0, azure-identity>=1.15
Semantic Model Writer -- SemanticModelExporter (F-004)
- .bim TOM JSON export at compatibilityLevel 1604
- Auto DAX measures (COUNTROWS + SUM/AVERAGE for numerics)
- M expressions for lakehouse, warehouse, and sql_database source types
- CLI: spindle export-model
Fabric Stream Writer -- FabricStreamWriter convenience wrapper (F-005)
- Single stream() call with sensible defaults for Fabric Notebooks
Capital Markets domain (13th domain) -- 10 tables (F-012)
- Real S&P 500 tickers (110 companies), GICS sectors/industries
- Daily OHLCV pricing, dividends, splits, earnings with EPS surprise
- Insider transactions, tick-level trades for streaming
- Star schema map (4 dims, 4 facts) and CDM mapping
Star schema + CDM maps for all 13 domains
- Every domain now provides star_schema_map() and cdm_map() methods
7 new Fabric guide doc pages -- Lakehouse, Warehouse, SQL Database, Notebooks, Star Schema, CDM Export, 60-Second Overview
12 new notebooks -- T05-T09 tutorials + F01-F07 Fabric scenarios
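
One way to read the chaos intensity levels listed in this release (calm 0.25x, moderate 1.0x, stormy 2.5x, hurricane 5.0x) is as multipliers on a base corruption probability. The sketch below assumes that interpretation; `effective_rate` and `null_out_values` are hypothetical helpers, not `ChaosEngine` methods.

```python
import random

# Multipliers as listed in the changelog entry for the chaos engine.
INTENSITY_MULTIPLIERS = {"calm": 0.25, "moderate": 1.0, "stormy": 2.5, "hurricane": 5.0}


def effective_rate(base_rate: float, intensity: str) -> float:
    """Scale a base probability by the intensity multiplier, capped at 1.0."""
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be a probability")
    return min(1.0, base_rate * INTENSITY_MULTIPLIERS[intensity])


def null_out_values(values, base_rate: float, intensity: str, seed: int = 0) -> list:
    """Replace values with None at the effective rate (a simple value-chaos example)."""
    rng = random.Random(seed)
    rate = effective_rate(base_rate, intensity)
    return [None if rng.random() < rate else v for v in values]
```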

Changed¶

Version: 1.2.0 -> 1.3.0
Test count: 549 -> 989

[1.2.0] - 2026-03-12¶

Added¶

Star schema transform -- StarSchemaTransform, StarSchemaMap, DimSpec, FactSpec, StarSchemaResult
- Auto-generates dim_date (YYYYMMDD surrogate key, 14 columns)
- RetailDomain.star_schema_map() and HealthcareDomain.star_schema_map()
CDM folder export -- CdmMapper, CdmEntityMap
- Microsoft CDM folder structure (model.json + entity data files)
- RetailDomain.cdm_map() and HealthcareDomain.cdm_map()
Scale presets -- fabric_demo and warehouse added to all 13 domains
CLI commands -- spindle to-star and spindle to-cdm
Streaming engine -- SpindleStreamer, StreamConfig, BurstWindow, TimePattern
- Poisson inter-arrivals, token-bucket rate limiting, burst windows
- Sinks: ConsoleSink, FileSink, EventHubSink, KafkaSink
Anomaly injection -- AnomalyRegistry, PointAnomaly, ContextualAnomaly, CollectiveAnomaly
CLI -- spindle stream command
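
Poisson inter-arrivals and token-bucket rate limiting, both named under the streaming engine above, are standard techniques. The sketch below shows them in isolation with simulated timestamps instead of a wall clock; it is not the `SpindleStreamer` implementation, and all names are illustrative.

```python
import random


def poisson_arrival_times(rate_per_sec: float, n_events: int, seed: int = 0) -> list:
    """Event times for a Poisson process: gaps are exponential with mean 1/rate."""
    rng = random.Random(seed)
    now, times = 0.0, []
    for _ in range(n_events):
        now += rng.expovariate(rate_per_sec)
        times.append(now)
    return times


class TokenBucket:
    """Allow bursts up to `capacity` while enforcing a long-run rate."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Events arrive at about 10/s on average; the bucket passes at most about 5/s.
arrivals = poisson_arrival_times(rate_per_sec=10.0, n_events=5000)
bucket = TokenBucket(rate=5.0, capacity=5.0)
emitted = [t for t in arrivals if bucket.allow(t)]
```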

Changed¶

Version: 1.0.0 -> 1.2.0

[1.0.0] - 2026-03-11¶

Added¶

Core generation engine with 21 column-level strategies
Schema definition format (.spindle.json) with parser, validator, and topological sort
Retail domain -- 9 tables, 3NF normalized
Healthcare domain -- 9 tables, 3NF normalized
10 additional domains: Financial, Supply Chain, IoT, HR, Insurance, Marketing, Education, Real Estate, Manufacturing, Telecom
Distribution profiles with _dist() and _ratio() API, runtime overrides
Real-world calibrations from 40+ authoritative sources (NRF, Census, CMS, CDC, KFF, AAMC, BLS)
Real US address data (40,977 ZIP codes from GeoNames CC-BY-4.0) with lat/lng
ID Manager with Pareto, Zipf, and uniform FK distributions
Business rules engine for cross-table constraint enforcement
CLI: generate, describe, validate, list, --dry-run
Output formats: CSV, TSV, JSON Lines, Parquet, Excel, SQL INSERT, Delta
Fabric Lakehouse writer (DeltaWriter via delta-rs)
103 tests
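
The skewed foreign-key distributions listed for the ID Manager can be illustrated with weighted sampling. The sketch covers Zipf and uniform only, and the names, the exponent, and the ranking of parents by list position are assumptions, not the ID Manager API.

```python
import random
from collections import Counter


def zipf_weights(n: int, exponent: float = 1.0) -> list:
    """Weight for rank r is 1 / r**exponent, so early ranks dominate."""
    return [1.0 / (rank ** exponent) for rank in range(1, n + 1)]


def sample_foreign_keys(parent_ids, n_rows: int, distribution: str = "zipf", seed: int = 0) -> list:
    """Pick a parent key for each child row under the chosen distribution."""
    rng = random.Random(seed)
    if distribution == "uniform":
        weights = None
    elif distribution == "zipf":
        weights = zipf_weights(len(parent_ids))
    else:
        raise ValueError(f"unsupported distribution: {distribution!r}")
    return rng.choices(parent_ids, weights=weights, k=n_rows)


customers = list(range(1, 101))
order_fks = sample_foreign_keys(customers, n_rows=10_000, distribution="zipf")
top_customer, top_count = Counter(order_fks).most_common(1)[0]
```

Under Zipf weighting a few parents receive most of the child rows, which mimics patterns such as a small set of customers placing most orders.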
