Skip to content

spark_router

sqllocks_spindle.engine.spark_router

FabricSparkRouter — submits Spindle generation jobs to Fabric Spark notebooks.

Classes

NotebookNotFoundError

Bases: RuntimeError

Raised when the Spindle Spark worker notebook cannot be found or created.

FabricAPIError

Bases: RuntimeError

Raised when the Fabric REST API returns an unexpected error.

FabricSparkRouter

Submits Spindle generation jobs to a Fabric Spark notebook.

Parameters:

Name Type Description Default
workspace_id str

Fabric workspace GUID.

required
lakehouse_id str

Lakehouse GUID for temp file staging and output.

required
token str

Entra bearer token with Workspace.Write + Lakehouse.ReadWrite permissions.

required
notebook_name str

Display name of the Spark worker notebook (default: spindle_spark_worker).

'spindle_spark_worker'
sinks list[str] | None

List of sink names to pass into the notebook (default: ["lakehouse"]).

None
sink_config dict | None

Per-sink configuration dict passed as JSON to the notebook.

None
chunk_size int

Rows per Spark partition. Default 500_000.

500000
Methods:
prepare(schema_dict, total_rows, seed)

Phase A — static generation + OneLake upload + notebook find/create.

This is the slow, I/O-heavy phase. Callers that fan out multiple domains should run prepare() calls at controlled concurrency (e.g. 4-way) to avoid OneLake DFS write throttling, then pass the returned dict to submit_run() in a separate, fully-parallel phase.

Returns a prepared dict with keys

run_id, schema_path, notebook_item_id, total_rows, seed

submit_run(prepared)

Phase B — submit the Fabric notebook run from a prepared dict.

Fast: a single REST call. Callers can fan out all domains in parallel after all prepare() calls have completed.

Parameters:

Name Type Description Default
prepared dict

dict returned by prepare().

required
submit(schema_dict, total_rows, seed)

Generate static tables, upload schema, and submit a Fabric notebook run.

Thin wrapper around prepare() + submit_run() for backward compatibility. Returns a JobRecord immediately.