Skip to content

hybrid

sqllocks_spindle.simulation.hybrid

Hybrid simulator — combines file-drop and stream emission concurrently.

Classes

HybridConfig dataclass

Configuration for :class:HybridSimulator.

Specifies which tables (or all) go to the streaming path, which go to the micro-batch file-drop path, and how they are linked.

Parameters:

Name Type Description Default
stream_to str

Streaming target — "eventhouse", "lakehouse", or "both". Informational; the actual sink is determined by the embedded :attr:stream_config.

'eventhouse'
micro_batch_to str

Batch target — "lakehouse_files" (default).

'lakehouse_files'
stream_tables list[str]

Tables routed to the stream path. Empty = all.

list()
batch_tables list[str]

Tables routed to the file-drop path. Empty = all.

list()
stream_config StreamEmitConfig

Full :class:StreamEmitConfig for the streaming side.

StreamEmitConfig()
file_drop_config FileDropConfig

Full :class:FileDropConfig for the batch side.

FileDropConfig()
link_strategy str

How stream events and batch files are correlated. "correlation_id" (default) stamps every event and manifest with the same run-level id. "natural_keys" relies on matching business keys across both paths.

'correlation_id'
concurrent bool

Run stream and batch phases in parallel threads.

False
seed int

Random seed (propagated to child configs if they use defaults).

42

HybridResult dataclass

Result of a :meth:HybridSimulator.run invocation.

Attributes:

Name Type Description
file_drop_result FileDropResult | None

Result from the batch (file-drop) phase.

stream_result StreamEmitResult | None

Result from the streaming phase.

correlation_id str

Run-level correlation id linking both outputs.

link_strategy str

The linking strategy that was applied.

HybridSimulator

Run file-drop and stream-emit simulations together, linked by correlation.

The simulator:

  1. Splits the input tables into batch tables and stream tables based on :attr:HybridConfig.batch_tables / :attr:HybridConfig.stream_tables. By default all tables go to both paths.
  2. Stamps a shared correlation_id into every stream envelope and every batch manifest so downstream pipelines can join the two.
  3. Optionally runs both phases concurrently in threads.

Parameters:

Name Type Description Default
tables dict[str, DataFrame]

Pre-generated dict[table_name, DataFrame].

required
config HybridConfig | None

:class:HybridConfig.

None
sink StreamWriter | None

Optional explicit :class:StreamWriter for the streaming side.

None

Example::

from sqllocks_spindle.simulation import (
    HybridSimulator, HybridConfig, FileDropConfig, StreamEmitConfig,
)

cfg = HybridConfig(
    file_drop_config=FileDropConfig(
        domain="retail",
        date_range_start="2024-01-01",
        date_range_end="2024-01-07",
    ),
    stream_config=StreamEmitConfig(rate_per_sec=20),
    concurrent=True,
)
result = HybridSimulator(tables=gen_result.tables, config=cfg).run()
Methods:
run()

Execute both simulation phases and return combined results.