Changelog
All notable changes to Spindle will be documented in this file.
Format follows Keep a Changelog. This project uses Semantic Versioning.
[2.13.0] - 2026-04-29
Added -- Phase 6: Fidelity Ceiling, Profile Registry & Streaming Fan-out
Profile Registry (sqllocks_spindle/profiles/)
- ProfileRegistry -- File-system-backed registry with <system>/<table>/<name>.json hierarchy. CRUD (save, load, delete, exists), search (list_all, list_systems, list_tables, search), tagging, bulk import_from_dir, profile diff, reindex, and save_from_dataset_profile convenience method.
- RegistryProfile -- Dataclass capturing column statistics, tags, description, source row count. identity property returns canonical system/table/name string. save/load round-trip via JSON.
- validate() on ProfileRegistry compares a GenerationResult against the stored profile via FidelityComparator.
- CLI -- spindle profile-registry list|save|delete|tag|diff|reindex|validate subcommands.
Fidelity Reports (sqllocks_spindle/inference/comparator.py)
- FidelityReport.to_html() -- Self-contained HTML report with inline CSS. Score colour bands: green ≥ 85, amber 70–84, red < 70. Per-column inline progress bars; KS stat, KS p-value, Chi², null delta, cardinality ratio columns.
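
The colour bands above can be read as a simple threshold function. The sketch below is illustrative only: `score_band` is a hypothetical helper, not part of the Spindle API, and it assumes scores on a 0-100 scale with fractional values between 84 and 85 falling into amber.

```python
def score_band(score: float) -> str:
    """Map a 0-100 fidelity score to a colour band (hypothetical helper)."""
    if score >= 85:
        return "green"
    if score >= 70:
        # Assumption: anything in [70, 85) counts as amber, including 84.5.
        return "amber"
    return "red"


if __name__ == "__main__":
    for s in (92, 84, 70, 55):
        print(s, score_band(s))
```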
Tier 1 -- Advanced Profiler (sqllocks_spindle/inference/advanced_profiler.py)
- AdvancedProfiler -- Wraps real + synthetic DataFrames with four analysis passes:
  - GMM fitting -- BIC-optimal Gaussian Mixture (1–5 components) per numeric column.
  - Conditional profiles -- per-category mean/std/null-rate for categorical × numeric pairs.
  - Adversarial test -- GradientBoostingClassifier 3-fold CV; an AUC ≈ 0.5 means the classifier cannot tell real rows from synthetic ones.
  - Temporal profiles -- gap statistics, lag-1 and lag-7 autocorrelation, FFT periodicity detection.
Tier 2 -- Format & Cardinality (sqllocks_spindle/inference/tier2_profiler.py)
- FormatPreservationAnalyzer -- Detects dominant format in real data (email, phone, UUID, URL, IPv4, ZIP, SSN, ISO date, credit card); compares synthetic preservation rate.
- StringSimilarityAnalyzer -- Character n-gram cosine similarity between real and synthetic string columns.
- CardinalityConstraintChecker -- Flags synthetic cardinality deviations > 20% of real.
- check_anomaly_rates() -- Verifies _spindle_is_anomaly fractions match expected rates within tolerance.
- run_tier2() -- Convenience function returning a Tier2Report.
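
As a rough illustration of two of the Tier 2 checks, the sketch below computes a character n-gram cosine similarity and a 20% cardinality-deviation flag using only the standard library. The function names, the trigram default, and the exact comparison rule are assumptions for illustration; they are not the signatures of StringSimilarityAnalyzer or CardinalityConstraintChecker.

```python
import math
from collections import Counter


def ngram_counts(values, n=3):
    """Count character n-grams across all strings in a column."""
    counts = Counter()
    for value in values:
        text = str(value)
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cardinality_deviates(real_unique, synth_unique, tolerance=0.20):
    """True when synthetic cardinality differs from real by more than tolerance."""
    if real_unique == 0:
        return synth_unique != 0
    return abs(synth_unique - real_unique) / real_unique > tolerance


if __name__ == "__main__":
    real = ["alice@example.com", "bob@example.com"]
    synth = ["carol@example.com", "dave@example.org"]
    print(cosine_similarity(ngram_counts(real), ngram_counts(synth)))
    print(cardinality_deviates(100, 130))
```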
Tier 3 -- Research-Grade Features (sqllocks_spindle/inference/tier3_research.py)
- ChowLiuNetwork -- Chow-Liu Bayesian network via max spanning tree of pairwise mutual information (Kruskal union-find). Returns ChowLiuResult with edges and joint-entropy score.
- DifferentialPrivacy -- Laplace (L1 sensitivity / ε) and Gaussian (σ calibrated to ε, δ) mechanisms with optional range clipping. Returns DPResult with privacy budget metadata.
- DriftMonitor -- KS + PSI for numeric, Chi² for categorical. Returns DriftReport with per-column drift flags and overall status.
- BootstrapMode -- Row sampling with replacement plus optional Gaussian jitter for numeric columns.
- CTGANWrapper -- Thin wrapper for optional ctgan dependency; raises ImportError gracefully when not installed.
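
For readers unfamiliar with the Laplace mechanism mentioned above, here is a minimal standard-library sketch. It adds Laplace noise with scale = sensitivity / ε to a single value. The function name and parameters are hypothetical, and applying the optional clipping to the input before adding noise is an assumption; the DifferentialPrivacy class may clip differently.

```python
import random


def laplace_mechanism(value, sensitivity, epsilon, lower=None, upper=None, rng=None):
    """Return value plus Laplace(0, sensitivity / epsilon) noise (illustrative sketch)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()

    # Assumption: clip the input to the declared range before adding noise.
    if lower is not None:
        value = max(lower, value)
    if upper is not None:
        value = min(upper, value)

    scale = sensitivity / epsilon
    # The difference of two Exp(1) draws is Laplace(0, 1); multiply by the scale.
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return value + noise


if __name__ == "__main__":
    rng = random.Random(42)
    # Smaller epsilon means a larger noise scale and stronger privacy.
    for eps in (0.1, 1.0, 10.0):
        print(eps, laplace_mechanism(100.0, sensitivity=1.0, epsilon=eps, rng=rng))
```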
Streaming Fan-out (sqllocks_spindle/streaming/multi_writer.py)
- StreamingMultiWriter -- ThreadPoolExecutor fan-out of any generate_stream() iterator to N named StreamWriter sinks concurrently. stream(generator) and stream_table(name, df) entry points. Dynamic add_sink/remove_sink. Per-sink error isolation with stop_on_sink_error option. Returns StreamingMultiWriteResult with per-sink SinkResult.
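
The fan-out pattern described above can be sketched with the standard library alone. This is not the StreamingMultiWriter implementation: `fan_out`, the plain-callable sinks, and the result dictionary are hypothetical stand-ins that only illustrate concurrent writes with per-sink error isolation and a stop_on_sink_error switch.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(chunks, sinks, stop_on_sink_error=False):
    """Write every chunk to every sink concurrently.

    sinks maps a sink name to a callable taking one chunk (hypothetical shape).
    """
    results = {name: {"chunks_written": 0, "errors": []} for name in sinks}
    with ThreadPoolExecutor(max_workers=max(1, len(sinks))) as pool:
        for chunk in chunks:
            futures = {name: pool.submit(write, chunk) for name, write in sinks.items()}
            failed = False
            for name, future in futures.items():
                try:
                    future.result()
                    results[name]["chunks_written"] += 1
                except Exception as exc:
                    # One failing sink does not prevent the others from writing.
                    results[name]["errors"].append(repr(exc))
                    failed = True
            if failed and stop_on_sink_error:
                break
    return results


if __name__ == "__main__":
    def console_sink(chunk):
        print("chunk:", chunk)

    def broken_sink(chunk):
        raise RuntimeError("sink unavailable")

    print(fan_out([1, 2], {"console": console_sink, "broken": broken_sink}))
```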
Tests
- 122 new tests across 6 new test files: test_profile_registry.py (24), test_fidelity_report.py (11), test_advanced_profiler.py (20), test_tier2_profiler.py (20), test_tier3_research.py (27), test_streaming_multi_writer.py (20). Full suite: 2687 passed, 4 skipped.
Demo Notebooks (examples/notebooks/demos/)
- 07_profile_registry.ipynb -- Save, load, search, diff, and validate profiles end-to-end.
- 08_fidelity_report_html.ipynb -- Generate HTML fidelity reports for real vs synthetic comparison.
- 09_advanced_profiler.ipynb -- GMM, adversarial, conditional, and temporal profiling.
- 10_tier2_format_fidelity.ipynb -- Format preservation, string similarity, cardinality, anomaly rates.
- 11_tier3_research.ipynb -- Chow-Liu networks, differential privacy, drift monitoring, bootstrapping.
- 12_streaming_multi_writer.ipynb -- Fan-out streaming to 4 sinks in parallel.
- 13_differential_privacy.ipynb -- DP mechanism comparison: Laplace vs Gaussian with budget analysis.
- 14_drift_monitoring.ipynb -- Baseline drift detection with PSI and Chi² visualisations.
- 15_bootstrap_sampling.ipynb -- Bootstrap mode sampling and jitter exploration.
Documentation (docs/guides/)
- profile-registry.md, fidelity-validation.md, advanced-fidelity.md, tier2-fidelity.md, tier3-research.md, streaming-multi-writer.md, drift-monitoring.md, column-variables.md
Changed
- pyproject.toml -- Added [advanced] optional extra: scikit-learn>=1.3, scipy>=1.11.
- sqllocks_spindle/__init__.py -- Added ProfileRegistry, RegistryProfile, StreamingMultiWriter, StreamingMultiWriteResult, SinkResult to top-level exports.
- sqllocks_spindle/inference/__init__.py -- Added all Tier 1/2/3 exports.
- sqllocks_spindle/streaming/__init__.py -- Added StreamingMultiWriter, StreamingMultiWriteResult, SinkResult.
[2.11.0] - 2026-04-29
Added -- Phase 5: Validation Matrix & Demo Notebooks
Validation Matrix
- tests/fixtures/validation_matrix.py -- Matrix builder with filter rules. build_matrix() returns 512 valid (domain, sink, size, mode) tuples covering 13 domains × 5 sinks × 4 sizes × 3 modes after filters (streaming + sql-server, fabric_demo + sql-server, inference + non-capable domains).
- tests/fixtures/mock_sinks.py -- MockSink dataclass + make_mock_sink(sink_type) factory for all 5 sink types. Records write calls without performing real IO.
- tests/test_validation_matrix.py -- Parametrized mock suite, 518 tests (512 combos + 6 matrix-builder unit tests). All passing.
- tests/test_validation_live.py -- Live suite with 26 tests across 4 groups: A (13 domains × lakehouse × small × seeding), B (retail × all 5 sinks × fabric_demo × seeding), C (retail × lakehouse × all 4 sizes × streaming), D (retail × warehouse × all sizes × seeding). Auth via InteractiveBrowserCredential (browser fires once, token cached).
- pyproject.toml -- Registered infra pytest marker; documented SPINDLE_TEST_*_CONN env vars for live tests.
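
The matrix arithmetic above can be checked with a small sketch. Everything below is illustrative rather than the contents of validation_matrix.py: the domain names are placeholders, three of the sink names and two of the size names are invented, and the number of inference-capable domains (3) is an assumption chosen because it reproduces the stated total of 512 under the three listed filters.

```python
from itertools import product

# Placeholder names; only the counts (13, 5, 4, 3) come from the changelog.
DOMAINS = [f"domain_{i:02d}" for i in range(13)]
INFERENCE_CAPABLE = set(DOMAINS[:3])  # assumption: 3 capable domains
SINKS = ["lakehouse", "warehouse", "sql-database", "eventhouse", "sql-server"]
SIZES = ["small", "medium", "large", "fabric_demo"]
MODES = ["seeding", "streaming", "inference"]


def is_valid(domain, sink, size, mode):
    """Apply the three filter rules named in the changelog entry."""
    if sink == "sql-server" and mode == "streaming":
        return False
    if sink == "sql-server" and size == "fabric_demo":
        return False
    if mode == "inference" and domain not in INFERENCE_CAPABLE:
        return False
    return True


def build_matrix():
    """Return every (domain, sink, size, mode) tuple that passes the filters."""
    return [combo for combo in product(DOMAINS, SINKS, SIZES, MODES) if is_valid(*combo)]


if __name__ == "__main__":
    print(len(list(product(DOMAINS, SINKS, SIZES, MODES))), "raw combinations")
    print(len(build_matrix()), "valid combinations")
```

Under these assumptions the unfiltered product has 780 tuples, the inference filter removes 200, and the two sql-server filters remove a further 68.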
Demo Notebooks (notebooks/demos/)
- 01_retail_lakehouse_quickstart.ipynb -- retail → lakehouse, seeding + streaming, all sizes, Delta read-back validation.
- 02_financial_warehouse_analytics.ipynb -- financial → Fabric Warehouse, all sizes, ODBC row-count validation.
- 03_healthcare_sql_database.ipynb -- healthcare → Fabric SQL Database, optional DataMasker HIPAA masking.
- 04_capital_markets_eventhouse.ipynb -- capital markets → Eventhouse/KQL, streaming tick data.
- 05_multi_domain_fanout.ipynb -- retail + financial → lakehouse + optional warehouse.
- 06_custom_ddl_to_lakehouse.ipynb -- bring-your-own DDL → DDLParser → generate → lakehouse.
Notebook Templates (notebooks/templates/)
- template_domain_to_sink.ipynb -- parametrized starter for any domain → any sink.
- template_custom_schema.ipynb -- custom .spindle.json or .sql schema → any sink.
Notes
- No new sink code required -- FabricSqlDatabaseWriter covers SQL Server (on-prem), Azure SQL Database, Azure SQL Managed Instance, Fabric Warehouse, and Fabric SQL Database via auth_method parameter.
- Mock matrix runtime: ~12 minutes locally (heavy at fabric_demo size). All 518 tests pass.
[2.9.0] - 2026-04-28
Added -- Phase 3B: Inference Depth
Spindle-generated data now statistically matches real source data across all fidelity dimensions: distribution shape, cardinality, null rates, temporal patterns, string formats, outlier rates, and column correlations.
New Classes
- EmpiricalStrategy (engine/strategies/empirical.py) -- Quantile-fingerprint interpolation for numeric columns when parametric distribution fit is poor. Requires a quantiles dict (keys p1–p99). Supports "linear" (default, NumPy) and "cubic" (scipy, optional) interpolation.
- GaussianCopula (engine/correlation.py) -- Post-generation correlation enforcement. Reorders column values to achieve target Pearson correlations without changing any column's marginal distribution. Algorithm: Cholesky decompose → draw correlated normals → re-rank values. Pure NumPy, no scipy.
- LakehouseProfiler (inference/lakehouse_profiler.py) -- Fabric-native Delta table profiler. Reads tables over ABFSS via deltalake. Returns the same DatasetProfile / TableProfile as the other entry points. Requires [fabric-inference] extra.
- FidelityReport -- Extended with .score() classmethod, .failing_columns(), .to_dict(), .to_dataframe(). Enables inline fidelity measurement during generation via new fidelity_profile= kwarg on Spindle.generate().
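
To make the idea of quantile-fingerprint interpolation concrete, the sketch below samples values by linearly interpolating an inverse CDF through a handful of stored quantiles. It uses only the standard library and is not EmpiricalStrategy itself: the function name, the "p50"-style key parsing, and the choice to draw uniformly between the lowest and highest stored quantile are all assumptions.

```python
import bisect
import random


def sample_from_quantiles(quantiles, n, rng=None):
    """Draw n values by linear interpolation between stored quantiles.

    quantiles maps keys such as "p1", "p50", "p99" to values (assumed format).
    At least two distinct quantile keys are required.
    """
    rng = rng or random.Random()
    points = sorted((int(key[1:]) / 100.0, value) for key, value in quantiles.items())
    probs = [p for p, _ in points]
    values = [v for _, v in points]

    samples = []
    for _ in range(n):
        # Pick a probability inside the covered range, then locate its segment.
        u = min(rng.uniform(probs[0], probs[-1]), probs[-1])
        i = min(max(bisect.bisect_right(probs, u), 1), len(probs) - 1)
        p0, p1 = probs[i - 1], probs[i]
        v0, v1 = values[i - 1], values[i]
        samples.append(v0 + (v1 - v0) * (u - p0) / (p1 - p0))
    return samples


if __name__ == "__main__":
    fingerprint = {"p1": 2.0, "p25": 10.0, "p50": 18.0, "p75": 40.0, "p99": 250.0}
    print(sample_from_quantiles(fingerprint, 5, rng=random.Random(0)))
```

Because values are interpolated rather than drawn from a fitted parametric family, skewed or multi-modal shapes captured by the quantiles carry over to the samples, at the cost of never producing values outside the stored range.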
Enhanced Classes
- DataProfiler -- New constructor kwargs: fit_threshold, top_n_values, outlier_iqr_factor, sample_rows. New profile() alias (same as profile_dataset()). New from_csv() classmethod. Extended string pattern detection: ssn, ip_address (IPv4 + IPv6), mac_address, iban, currency_code, language_code, postal_code.
- ColumnProfile -- New optional fields: quantiles (dict), hour_histogram, dow_histogram, string_length, outlier_rate, value_counts_ext, fit_score.
- TableProfile -- New correlation_matrix: dict[str, dict[str, float]] | None field.
- SchemaBuilder.build() -- New kwargs: fit_threshold, correlation_threshold, include_anomaly_registry. Returns (SpindleSchema, AnomalyRegistry) tuple when include_anomaly_registry=True. Extended priority tree (13 levels) with empirical fallback when KS fit < fit_threshold, temporal histogram routing, and correlation detection.
- Spindle.generate() -- New kwargs: enforce_correlations=True (auto-applies GaussianCopula when schema contains correlated_columns) and fidelity_profile=None (returns (GenerationResult, FidelityReport) tuple when supplied).
New Extras
pip install sqllocks-spindle[inference] # scipy for FidelityReport + empirical strategies
pip install sqllocks-spindle[fabric-inference] # scipy + deltalake + pyarrow for LakehouseProfiler
New String Patterns in Engine
ssn, ip_address (IPv4 + IPv6), mac_address, iban, currency_code, language_code, postal_code
Changed
- Test count: 1,946 → 1,973 (+27 Phase 3B tests across test_empirical_strategy.py, test_correlation.py, test_fidelity_report_v2.py, test_lakehouse_profiler.py, and additions to test_inference.py and test_e2e_generation.py)
[2.7.1] - 2026-04-27
Changed
- Demo Engine -- Phase 2 wiring: SeedingDemoMode.run() now performs real Fabric sink writes, replacing the previous manifest-only stub. Local mode delegates to ScaleRouter (multi-process); Spark mode delegates to FabricSparkRouter (Fabric notebook submission). Sinks are constructed from the connection profile and fan out simultaneously to all configured targets (lakehouse + warehouse + sql_db + eventhouse).
- New --scale-mode {auto,local,spark} flag on spindle demo run. auto selects spark when a connection profile is configured, lakehouse_id is set, and rows >= 500_000; otherwise local.
- DemoManifest now records scale_mode, fabric_run_id, workspace_id, and notebook_item_id so Spark runs can be polled and cleaned up by session_id.
- cmd_demo_run now forwards scale_mode into DemoParams and includes fabric_run_id and status in the response payload for Spark submissions.
- ConnectionProfile extended with warehouse_staging_path and eventhouse_database fields (required by WarehouseSink and KQLSink).
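
The auto rule for --scale-mode described above amounts to a three-part condition. The helper below is a hypothetical sketch of that rule, not the code behind spindle demo run; the function name and argument shapes are assumptions.

```python
def resolve_scale_mode(requested, has_connection_profile, lakehouse_id, rows):
    """Resolve "auto" to "spark" or "local"; pass explicit choices through."""
    if requested != "auto":
        return requested
    if has_connection_profile and lakehouse_id and rows >= 500_000:
        return "spark"
    return "local"


if __name__ == "__main__":
    print(resolve_scale_mode("auto", True, "lh-123", 500_000))   # spark
    print(resolve_scale_mode("auto", True, "lh-123", 499_999))   # local
    print(resolve_scale_mode("auto", False, "lh-123", 2_000_000))  # local
```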
Added
- cmd_demo_status MCP bridge command -- reads the manifest by session_id and, when the run was a Spark submission, polls FabricJobTracker.get_status for live Fabric job state
- cmd_demo_cleanup MCP bridge command -- runs CleanupEngine against a saved manifest by session_id
Test count
1,930 → 1,946 (+16 new tests)
[2.7.0] - 2026-04-27
Added
- Billion-row pipeline (Phase 2) -- Fabric Spark scale generation via scale_mode="fabric_spark"
- FabricSparkRouter (engine/spark_router.py) -- generates static tables in-process, uploads augmented schema JSON to OneLake via DFS API, finds or auto-creates spindle_spark_worker notebook, submits Fabric notebook run, returns JobRecord immediately
- AsyncJobStore + JobRecord (engine/async_job_store.py) -- thread-safe in-process registry tracking submitted Fabric jobs by job_id
- FabricJobTracker (engine/job_tracker.py) -- polls and cancels Fabric notebook runs via the Fabric Jobs REST API
- spindle_spark_worker.ipynb -- Fabric notebook template: reads schema from OneLake, foreachPartition dynamic table generation, writes to LakehouseSink / WarehouseSink / KQLSink / SQLDatabaseSink, saves result stats and cleans up temp file
- cmd_scale_status MCP bridge command -- polls Fabric job status by job_id; maps Fabric statuses to submitted|running|succeeded|failed|cancelled
- cmd_scale_cancel MCP bridge command -- cancels an in-flight Fabric notebook run
- cmd_scale_generate(scale_mode="fabric_spark") now fully implemented; requires sink_config.workspace_id, sink_config.lakehouse_id, sink_config.token
- sqllocks_spindle/notebooks/__init__.py -- loads and exports SPARK_WORKER_IPYNB notebook template
Changed
- Test count: 1,913 → 1,930 (+17 Phase 2 unit tests in tests/test_spark_router.py)
[2.6.1] - 2026-04-26
Fixed
- GAP 1 -- Reference table chunk replication: ScaleRouter now classifies tables as static (schema count < chunk_size) or dynamic (schema count ≥ chunk_size). Static tables are generated once with their natural cardinality and broadcast as pre-loaded PK pools into every chunk worker via the augmented schema JSON. Dynamic tables are generated chunk_size rows per chunk. Added _classify_tables, _generate_static_tables, and _SpindleJSONEncoder (handles pd.Timestamp, numpy scalars) to scale_router.py.
- GAP 2 -- Composite FK reference impossible: New composite_foreign_key strategy (engine/strategies/composite_foreign_key.py) -- takes ref_table + ref_columns: [list], samples rows from the parent table, returns a dict of per-column arrays. New composite_fk_field strategy reads one component from the stashed dict. Both strategies registered in Spindle, ChunkWorker, ScaleRouter._generate_static_tables.
- GAP 3 -- Composite PK FK lookup returns 2D array: TableGenerator.generate() now detects dict returns from strategies (multi-column path) and unpacks each key into ctx.current_table. _cfo_ prefix cache keys are filtered from the public DataFrame alongside _rs_ and _sr_.
- GAP 4 -- Computed columns not applied in ChunkWorker: Extracted _compute_phase into module-level apply_compute_phase(tables, schema) in generator.py; chunk_worker.generate_chunk now calls it after generating all tables.
- GAP 5 -- Business rules not applied in ChunkWorker: generate_chunk calls BusinessRulesEngine.fix_violations() after apply_compute_phase when the schema defines business rules.
- GAP 6 -- PK-free tables rejected as errors: Downgraded "Table has no primary key defined" from error to warning in SchemaValidator. IDManager.register_table() now gracefully skips pool registration for empty pk_columns lists (registers data-only for constrained FK lookups).
- GAP 7 -- Self-referencing hierarchies shatter across chunks: Resolved by GAP 1 fix -- tables using self_referencing strategy are typically small reference tables (count < chunk_size) and are now generated once, preserving a single unified hierarchy.
- GAP 8 -- get_filtered_fks reads first column, not PK: Replaced df.loc[mask, df.columns[0]] with pool[np.where(mask.values)[0]] -- uses the PK pool (aligned with df rows) regardless of column order.
- GAP 9 -- generate_stream() missing compute phase and business rules: Spindle.generate_stream() now buffers all generated tables internally before yielding, then applies _compute_phase and fix_violations in the same pass as Spindle.generate().
- GAP 10 -- Wrong exception type in DependencyResolver: Added MissingTableError(ValueError) to schema/dependency.py; the resolver now raises it (not CircularDependencyError) when a table depends on a non-existent table.
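
The distinction in GAP 10 can be illustrated with a small dependency resolver. This is a sketch, not the DependencyResolver in schema/dependency.py: the `resolve_order` function and its dict-of-lists input are hypothetical, the base class of CircularDependencyError is an assumption (only MissingTableError's ValueError base is stated above), and skipping self-references is an assumption made so that self-referencing tables do not count as cycles.

```python
class CircularDependencyError(Exception):
    """Stand-in for the resolver's cycle error (base class assumed)."""


class MissingTableError(ValueError):
    """Raised when a table depends on a table that is not defined."""


def resolve_order(dependencies):
    """Return table names ordered so that parents come before children.

    dependencies maps each table name to the list of tables it references.
    """
    for table, parents in dependencies.items():
        for parent in parents:
            if parent not in dependencies:
                raise MissingTableError(f"{table} depends on unknown table {parent}")

    order, state = [], {}

    def visit(table):
        if state.get(table) == "done":
            return
        if state.get(table) == "visiting":
            raise CircularDependencyError(f"cycle detected at {table}")
        state[table] = "visiting"
        for parent in dependencies[table]:
            if parent != table:  # assumption: self-references are not cycles
                visit(parent)
        state[table] = "done"
        order.append(table)

    for table in dependencies:
        visit(table)
    return order


if __name__ == "__main__":
    print(resolve_order({"order": ["customer"], "customer": []}))
```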
Changed
- Test count: 1,912 → 1,913 (+1 revised E2E test asserting correct static/dynamic cardinalities)
- test_e2e_scale_router.py: Assertions updated to validate static table natural cardinality (e.g., product_category = 50 rows) and dynamic table chunk multiplication, replacing the incorrect "all tables = TOTAL_ROWS" assertion.
[2.6.0] - 2026-04-25
Added
- Billion-row pipeline (Phase 1) -- multi-process scale generation for datasets up to 1B+ rows
- SinkRegistry -- fan-out coordinator; writes to all sinks in parallel via ThreadPoolExecutor; raises SinkError with per-sink failures on partial errors
- ChunkWorker (generate_chunk) -- subprocess-safe pure function; deferred imports; returns plain Python lists (pickle-safe); applies sequence_offset for PK continuity across chunks
- ScaleRouter -- ProcessPoolExecutor-based orchestrator; psutil RAM guard caps workers at 80% available RAM; as_completed() fan-out with configurable max_workers and chunk_size
- StreamManager -- singleton per process; daemon threads; stop_event.wait() for interruptible sleep; thread-safe counter_lock on StreamState; stop() returns bool | None (None=unknown, True=clean, False=timeout)
- LakehouseSink -- writes Parquet via LakehouseFilesWriter; supports local path mode for testing
- WarehouseSink -- stages Parquet and loads via COPY INTO using WarehouseBulkWriter
- KQLSink -- ingests into Fabric Eventhouse via EventhouseWriter; deferred import with clear pip-install error
- SQLDatabaseSink -- bulk-inserts into Fabric SQL Database / Azure SQL via FabricSqlDatabaseWriter
- cmd_scale_generate MCP bridge command -- local single-process and multi-process (subprocess workers) modes; temp file cleanup in finally; seed propagated in return dict
- cmd_stream / cmd_stream_status / cmd_stream_stop MCP bridge commands -- background streaming with configurable interval_seconds, max_chunks, sink fan-out
Fixed
- reference_data.py -- _load_dataset now wraps domain path strings with Path() before / operator; was raising TypeError when _domain_path was injected as a plain string from JSON
- 19_scenario_packs.py -- updated to use dict-access (p['domain'], p['pack_id']) after PackLoader.list_builtin() API change
Changed
- Test count: 1,867 → 1,912 (+45 Phase 1 tests including e2e integration test)
[2.0.0] - 2026-03-14
Added
- All 18 Blueprint items (E1-E18): CredentialResolver, RunManifest enhancements, observability, IoT/financial/clickstream/operational log simulation, state machines, SCD2 file drops, spindle publish CLI, acceptance tests, EventhouseWriter, Fabric provisioning guide
- Tier 3 features: spindle learn, spindle continue, spindle compare, spindle time-travel, spindle mask, composite presets, profile sharing
- 34/35 notebooks pre-executed with saved output
Changed
- Version: 1.3.0 -> 2.0.0 (major bump reflects complete feature set)
- Test count: 989 -> 1,250
[1.3.0] - 2026-03-13
Added
- Chaos engine -- ChaosEngine, ChaosConfig, ChaosCategory, ChaosOverride
  - Six chaos categories: schema, value, file, referential, temporal, volume
  - Four intensity levels: calm (0.25x), moderate (1.0x), stormy (2.5x), hurricane (5.0x)
  - Escalation modes: gradual, random, front-loaded
  - Methods: corrupt_dataframe(), drift_schema(), corrupt_file(), inject_referential_chaos(), inject_temporal_chaos(), inject_volume_chaos(), apply_all()
- Simulation layer -- three modes for realistic pipeline testing
  - FileDropSimulator -- daily/hourly/15-min cadence, Parquet/CSV/JSONL, manifests, done flags, lateness, duplicates, backfill
  - StreamEmitter -- CloudEvents envelopes, rate + jitter, out-of-order, replay windows, multi-topic
  - HybridSimulator -- concurrent batch + stream, correlation ID linking
- Scenario Packs -- PackLoader, PackRunner, PackValidator, ScenarioPack
  - 44 built-in packs: 11 verticals x 4 simulation types
  - list_builtin(), load_builtin(), PackRunner.run()
- GSL spec parser -- GSLParser, GenerationSpec
  - Declarative YAML tying schema, scenario pack, chaos, outputs, and validation gates
- Validation gates + quarantine -- ReferentialIntegrityGate, SchemaConformanceGate, NullConstraintGate, UniqueConstraintGate, RangeConstraintGate, TemporalConsistencyGate, FileFormatGate, SchemaDriftGate
  - QuarantineManager -- quarantine_file(), quarantine_dataframe(), list_quarantined()
- CompositeDomain + SharedEntityRegistry
  - Multi-domain generation with cross-domain FK enforcement
  - SharedConcept enum: PERSON, LOCATION, ORGANIZATION, CALENDAR
- EventEnvelope + EnvelopeFactory -- CloudEvents-style wrapper
- Fabric integration -- OneLakePaths, LakehouseFilesWriter, EventstreamClient
- MCP bridge -- python -m sqllocks_spindle.mcp_bridge (7 commands)
- 10 new example scripts (13-22) and 3 new notebooks (06-08)
- SQL DDL import -- DdlParser for 4 SQL dialects (F-001)
  - spindle from-ddl CLI command
  - 30+ type-to-strategy mappings, 25+ column name heuristics
  - FK detection from explicit constraints and naming conventions
- CREATE TABLE DDL in SQL output -- to_sql_inserts() with DDL generation (F-002)
  - 3 dialect type maps (T-SQL, PostgreSQL, MySQL)
  - Fabric Warehouse compatibility (no PK constraints, no IDENTITY)
  - CLI: --sql-ddl, --sql-drop, --sql-go, --sql-dialect, --schema-name
- Fabric SQL Database Writer -- FabricSqlDatabaseWriter (F-003)
  - 4 auth methods: cli (Entra/az login), msi, spn, sql
  - 4 write modes: create_insert, insert_only, truncate_insert, append
  - Parameterized executemany, dependency-ordered writes/drops
  - CLI: --format sql-database, --connection-string, --auth, --write-mode
  - New [fabric-sql] extra: pyodbc>=5.0, azure-identity>=1.15
- Semantic Model Writer -- SemanticModelExporter (F-004)
  - .bim TOM JSON export at compatibilityLevel 1604
  - Auto DAX measures (COUNTROWS + SUM/AVERAGE for numerics)
  - M expressions for lakehouse, warehouse, and sql_database source types
  - CLI: spindle export-model
- Fabric Stream Writer -- FabricStreamWriter convenience wrapper (F-005)
  - Single stream() call with sensible defaults for Fabric Notebooks
- Capital Markets domain (13th domain) -- 10 tables (F-012)
  - Real S&P 500 tickers (110 companies), GICS sectors/industries
  - Daily OHLCV pricing, dividends, splits, earnings with EPS surprise
  - Insider transactions, tick-level trades for streaming
  - Star schema map (4 dims, 4 facts) and CDM mapping
- Star schema + CDM maps for all 13 domains
  - Every domain now provides star_schema_map() and cdm_map() methods
- 7 new Fabric guide doc pages -- Lakehouse, Warehouse, SQL Database, Notebooks, Star Schema, CDM Export, 60-Second Overview
- 12 new notebooks -- T05-T09 tutorials + F01-F07 Fabric scenarios
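
The four chaos intensity levels in the 1.3.0 list pair a name with a multiplier. The changelog does not say what the multiplier is applied to, so the sketch below assumes it scales a per-row base corruption rate, capped at 1.0; `effective_rate` and the notion of a base rate are hypothetical and not taken from ChaosEngine or ChaosConfig.

```python
# Multipliers as listed for the chaos engine's intensity levels.
INTENSITY_MULTIPLIERS = {
    "calm": 0.25,
    "moderate": 1.0,
    "stormy": 2.5,
    "hurricane": 5.0,
}


def effective_rate(base_rate, intensity):
    """Scale a base corruption probability by intensity (assumed semantics)."""
    return min(1.0, base_rate * INTENSITY_MULTIPLIERS[intensity])


if __name__ == "__main__":
    for level in INTENSITY_MULTIPLIERS:
        print(level, effective_rate(0.04, level))
```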
Changed
- Version: 1.2.0 -> 1.3.0
- Test count: 549 -> 989
[1.2.0] - 2026-03-12
Added
- Star schema transform -- StarSchemaTransform, StarSchemaMap, DimSpec, FactSpec, StarSchemaResult
  - Auto-generates dim_date (YYYYMMDD surrogate key, 14 columns)
  - RetailDomain.star_schema_map() and HealthcareDomain.star_schema_map()
- CDM folder export -- CdmMapper, CdmEntityMap
  - Microsoft CDM folder structure (model.json + entity data files)
  - RetailDomain.cdm_map() and HealthcareDomain.cdm_map()
- Scale presets -- fabric_demo and warehouse added to all 13 domains
- CLI commands -- spindle to-star and spindle to-cdm
- Streaming engine -- SpindleStreamer, StreamConfig, BurstWindow, TimePattern
  - Poisson inter-arrivals, token-bucket rate limiting, burst windows
  - Sinks: ConsoleSink, FileSink, EventHubSink, KafkaSink
- Anomaly injection -- AnomalyRegistry, PointAnomaly, ContextualAnomaly, CollectiveAnomaly
- CLI -- spindle stream command
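
A YYYYMMDD surrogate key, as used for the auto-generated dim_date table in this release, is an integer that sorts chronologically and stays human-readable. The sketch below shows only the key computation; the function names are hypothetical, and the remaining dim_date columns are not reproduced because the changelog does not list them.

```python
from datetime import date, timedelta


def date_key(d: date) -> int:
    """Encode a date as a YYYYMMDD integer surrogate key."""
    return d.year * 10000 + d.month * 100 + d.day


def build_date_keys(start: date, end: date) -> list[int]:
    """Return one key per calendar day from start to end, inclusive."""
    days = (end - start).days + 1
    return [date_key(start + timedelta(days=i)) for i in range(days)]


if __name__ == "__main__":
    print(date_key(date(2026, 3, 12)))
    print(build_date_keys(date(2026, 2, 27), date(2026, 3, 2)))
```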
Changed
- Version: 1.0.0 -> 1.2.0
[1.0.0] - 2026-03-11
Added
- Core generation engine with 21 column-level strategies
- Schema definition format (.spindle.json) with parser, validator, and topological sort
- Retail domain -- 9 tables, 3NF normalized
- Healthcare domain -- 9 tables, 3NF normalized
- 10 additional domains: Financial, Supply Chain, IoT, HR, Insurance, Marketing, Education, Real Estate, Manufacturing, Telecom
- Distribution profiles with _dist() and _ratio() API, runtime overrides
- Real-world calibrations from 40+ authoritative sources (NRF, Census, CMS, CDC, KFF, AAMC, BLS)
- Real US address data (40,977 ZIP codes from GeoNames CC-BY-4.0) with lat/lng
- ID Manager with Pareto, Zipf, and uniform FK distributions
- Business rules engine for cross-table constraint enforcement
- CLI: generate, describe, validate, list, --dry-run
- Output formats: CSV, TSV, JSON Lines, Parquet, Excel, SQL INSERT, Delta
- Fabric Lakehouse writer (DeltaWriter via delta-rs)
- 103 tests