# ADR-0010: CQRS Read/Write Distributor Separation
## Status
Accepted
## Context
The faker module uses distributors to control value selection with statistical distributions (weighted, Zipf/skew, uniform). A distributor stores a pool of created values and selects from them according to a distribution strategy.
### Monolithic distributor
In the initial design, each distributor owned both the storage and the selection strategy. This became a problem whenever the same pool of values needed to be read with different distribution strategies.
The motivating scenario is Pipe with DistributedProvider.
### Pipe topology
Pipe orchestrates sequential steps of aggregate generation. Each step
can be wrapped in a DistributedProvider that adds distributor-based
value selection:
```python
Pipe(
    PipeStep('first_model', DistributedProvider(
        first_model_faker,
        distributor=make_distributor(
            weights=[0.9, 0.5, 0.1, 0.01], mean=10),
    )),
    PipeStep('second_model', DistributedProvider(
        second_model_faker,
        distributor=make_distributor(
            weights=[0.3, 0.2], mean=20),
    ), require_fn=...),
)
```
DistributedProvider.populate() works as follows:
1. `distributor.next()` — tries to read an existing value from the pool using the configured distribution strategy.
2. If `ICursor` is raised (pool exhausted or probabilistic creation signal) — delegates to `inner.populate()` to create a new value, then `cursor.append()` writes it back to the pool.
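The read-then-create flow above can be sketched as follows. Only the names `DistributedProvider`, `populate()`, `next()`, `append()`, and `ICursor` come from this ADR; the class bodies are illustrative assumptions.

```python
class ICursor(Exception):
    """Signal that the caller should create a new value.

    Assumed here to carry a cursor exposing append(); the real
    exception's shape may differ.
    """
    def __init__(self, cursor):
        self.cursor = cursor


class DistributedProvider:
    """Hypothetical sketch of the provider's populate() loop."""

    def __init__(self, inner, distributor):
        self.inner = inner
        self.distributor = distributor

    def populate(self):
        try:
            # Try to read an existing value with the configured strategy.
            return self.distributor.next()
        except ICursor as exc:
            # Pool exhausted (or probabilistic creation signal):
            # create a new value and write it back to the pool.
            value = self.inner.populate()
            exc.cursor.append(value)
            return value
```

The key property is that the provider never mutates the pool directly: all writes go through the cursor handed back by the distributor.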
### The problem
Multiple DistributedProvider instances may target the same aggregate type
(e.g. FirstModel) but with different distribution strategies:
- Pipe A selects FirstModels with `weights=[0.9, 0.5]` (heavy skew)
- Pipe B selects FirstModels with `weights=[0.3, 0.2]` (more uniform)
With a monolithic distributor each instance maintains its own pool. A
FirstModel created via Pipe A’s distributor is invisible to Pipe B’s
distributor. This leads to:
- Data duplication — the same aggregate type is stored in multiple pools.
- Divergent pools — each distributor sees only the values it created itself, distorting the intended distribution.
- Wasted round-trips — synchronizing pools requires explicit observer plumbing.
## Decision
Separate distributors into a Write store and Read strategies (CQRS within the distributor):
- `WriteDistributor` / `PgWriteDistributor` — owns the data (indexes, PG table). Single point of mutation (`append`). Always raises `ICursor` on `next()` to signal the caller to create a new value.
- `WeightedDistributor` / `SkewDistributor` / `PgWeightedDistributor` / `PgSkewDistributor` — stateless read decorators over a shared store. Each implements `_distribute(n) -> int` — a pure function that selects an index given the pool size. All reads delegate to the store's data.
- `NullableDistributor` — decorator that probabilistically returns `Nothing` before delegating to the inner distributor.
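A minimal in-memory sketch of the split. Only the names `WriteDistributor`, `WeightedDistributor`, `_distribute`, and `ICursor` are taken from this ADR; the method bodies (including the weighting scheme) are assumptions for illustration.

```python
import bisect
import random


class ICursor(Exception):
    """Raised by next() to ask the caller to create a new value."""
    def __init__(self, cursor):
        self.cursor = cursor


class WriteDistributor:
    """Owns the data; the single point of mutation."""

    def __init__(self):
        self.values = []

    def append(self, value):
        self.values.append(value)

    def next(self):
        # The write store never serves reads itself: it always
        # signals the caller to create a new value.
        raise ICursor(self)


class WeightedDistributor:
    """Stateless read decorator over a shared store."""

    def __init__(self, store, weights):
        self.store = store
        self.weights = weights

    def _distribute(self, n: int) -> int:
        # Pure function of the pool size: pick an index in [0, n)
        # biased by the weights (cycled if the pool outgrows them).
        cum, total = [], 0.0
        for i in range(n):
            total += self.weights[i % len(self.weights)]
            cum.append(total)
        return bisect.bisect_left(cum, random.random() * total)

    def next(self):
        pool = self.store.values
        if not pool:
            raise ICursor(self.store)
        return pool[self._distribute(len(pool))]
```

Because `_distribute` touches no state, any number of read decorators can wrap the same store without coordinating with each other.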
The distributor_factory / pg_distributor_factory accept an optional
store parameter to share a single write store across multiple read
strategies:
```python
store = PgWriteDistributor()
dist_a = pg_distributor_factory(weights=[0.9, 0.5], mean=5, store=store)
dist_b = pg_distributor_factory(weights=[0.3, 0.2], mean=20, store=store)
# Both read from the same PG table with different strategies
```
When store is not provided, the factory creates one internally — the
single-distributor case works without any extra configuration.
### Functional decomposition
The separation mirrors the natural decomposition in functional languages:
| OOP (Python) | FP (Gleam / Elixir) |
|---|---|
| Write store (`WriteDistributor` / `PgWriteDistributor`) | Actor / Process (holds mutable state) |
| Read strategy (`WeightedDistributor`, `SkewDistributor`, …) | Higher-order function wrapper |
| Factory (`distributor_factory`) | Config construction + process spawn |
## Consequences
- Multiple `DistributedProvider` / `ReferenceProvider` instances for the same aggregate type can share a single pool via the `store` parameter, eliminating data duplication and pool divergence.
- The distribution strategy (`_distribute`) is a pure function with no state, making it trivial to test and reason about in isolation.
- The factory API remains backwards-compatible: omitting `store` creates a dedicated store, preserving the simple single-distributor use case.
- Adding a new distribution strategy requires only a new class with `_distribute(n) -> int` — no changes to the store or the factory protocol.
- `PgWriteDistributor` handles the diamond problem for shared stores: `setup()` uses `IF NOT EXISTS` and an `_initialized` flag to ensure idempotent table creation even when multiple read distributors trigger setup concurrently.
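The idempotent-setup consequence can be illustrated with a small sketch. It assumes a DB-API-style connection object; only `PgWriteDistributor`, `setup()`, `_initialized`, and the use of `IF NOT EXISTS` come from this ADR — the table schema and connection handling are invented for the example.

```python
class PgWriteDistributor:
    """Sketch of idempotent table creation for a shared store."""

    def __init__(self, conn, table="distributor_pool"):  # table name assumed
        self.conn = conn
        self.table = table
        self._initialized = False  # guard against repeated DDL

    def setup(self):
        if self._initialized:
            # A read distributor already triggered setup on this store.
            return
        with self.conn.cursor() as cur:
            # IF NOT EXISTS keeps the DDL idempotent even if several
            # read distributors race through setup() concurrently.
            cur.execute(
                f"CREATE TABLE IF NOT EXISTS {self.table} "
                "(id BIGSERIAL PRIMARY KEY, value JSONB NOT NULL)"
            )
        self.conn.commit()
        self._initialized = True
```

The flag avoids redundant round-trips once an instance has run setup, while `IF NOT EXISTS` makes the statement safe if two instances sharing the table race past the flag check.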