spark_router
sqllocks_spindle.engine.spark_router
¶
FabricSparkRouter — submits Spindle generation jobs to Fabric Spark notebooks.
Classes¶
NotebookNotFoundError
¶
Bases: RuntimeError
Raised when the Spindle Spark worker notebook cannot be found or created.
FabricAPIError
¶
Bases: RuntimeError
Raised when the Fabric REST API returns an unexpected error.
FabricSparkRouter
¶
Submits Spindle generation jobs to a Fabric Spark notebook.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
workspace_id
|
str
|
Fabric workspace GUID. |
required |
lakehouse_id
|
str
|
Lakehouse GUID for temp file staging and output. |
required |
token
|
str
|
Entra bearer token with Workspace.Write + Lakehouse.ReadWrite permissions. |
required |
notebook_name
|
str
|
Display name of the Spark worker notebook (default: spindle_spark_worker). |
'spindle_spark_worker'
|
sinks
|
list[str] | None
|
List of sink names to pass into the notebook (default: ["lakehouse"]). |
None
|
sink_config
|
dict | None
|
Per-sink configuration dict passed as JSON to the notebook. |
None
|
chunk_size
|
int
|
Rows per Spark partition. Default 500_000. |
500000
|
Methods:¶
prepare(schema_dict, total_rows, seed)
¶
Phase A — static generation + OneLake upload + notebook find/create.
This is the slow, I/O-heavy phase. Callers that fan out multiple domains should run prepare() calls at controlled concurrency (e.g. 4-way) to avoid OneLake DFS write throttling, then pass the returned dict to submit_run() in a separate, fully-parallel phase.
Returns a prepared dict with keys
run_id, schema_path, notebook_item_id, total_rows, seed
submit_run(prepared)
¶
Phase B — submit the Fabric notebook run from a prepared dict.
Fast: a single REST call. Callers can fan out all domains in parallel after all prepare() calls have completed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
prepared
|
dict
|
dict returned by prepare(). |
required |
submit(schema_dict, total_rows, seed)
¶
Generate static tables, upload schema, and submit a Fabric notebook run.
Thin wrapper around prepare() + submit_run() for backward compatibility. Returns a JobRecord immediately.