ADR-0009: Envelope Encryption for Event Store
Status
Accepted
Context
In a multi-tenant event sourcing system, event payloads may contain personally identifiable information (PII), financial data, and other sensitive business data. Several requirements drive the need for field-level encryption:
Data isolation between tenants – compromise of one tenant’s encryption key must not affect other tenants.
GDPR right to erasure – the system must be able to render a tenant’s data irrecoverable without physically deleting immutable events (crypto-shredding).
Key rotation – encryption keys must be rotatable without re-encrypting the entire event store.
Performance – encryption/decryption should not become a bottleneck on the write or read path.
The standard industry approach to these requirements is Envelope Encryption, used by AWS KMS, HashiCorp Vault Transit, GCP KMS, and Azure Key Vault.
See also:
Caveat
The use of crypto-shredding for GDPR compliance is a debated topic:
In a comment on Mathias Verraes’ page about the Crypto-Shredding pattern, Harrison J. Brown wrote:
The legal question is important and the GDPR does actually state that encrypted personal information is still personal information. I spoke to my lawyer about this and they said that in the event of a breach you’d still have to notify the authority (the ICO, in the UK) though you might not have to tell the data subjects (since the breach wouldn’t affect them). This document from the European Commission states this clearly in section 1, paragraph 5:
“A confidentiality breach on personal data that were encrypted with a state of the art algorithm is still a personal data breach, and has to be notified to the authority. Nevertheless, if the confidentiality of the key is intact, the data are in principle unintelligible to any person who is not authorised, thus the breach is unlikely to adversely affect the data subject and therefore doesn’t need to be notified to the data subject.”
As for the deletion request scenario, the law does not consider deleting the encryption key equal to actually deleting the data itself. Encrypted personal data is still personal data, regardless of whether anyone has the key. So, if [you] were asked by a data subject to delete their personal data and all you did was delete the encryption key you would not be complying with the removal request, at least according to the GDPR. So, whilst this crypto-shredding pattern is, I think, really useful for certain types of data (business sensitive information, for example) I think your Forgettable Payloads pattern (where one stores the sensitive payload of an event in a separate store to control access and removal) is more appropriate for personal data.
Harrison J. Brown’s comment on Mathias Verraes’ page about the Crypto-Shredding pattern is essentially correct, but with nuances.
The quote about breach notification is an almost verbatim excerpt from the WP29 opinion (Opinion 03/2014), which was later incorporated into EDPB Guidelines 9/2022 (paragraph 76): a breach of encrypted personal data is still a breach and requires notifying the regulator, but if the key is not compromised, the data remains unintelligible to unauthorized parties, and notifying data subjects is most likely not required (source: EDPB).
Where things get more complicated is the right to erasure (Art. 17): the claim that crypto-shredding does not comply with GDPR for erasure purposes is a debatable position, not an established legal fact. In practice, opinions diverge:
Skeptics (SecuPi) argue that crypto-shredding is not suitable for the right to erasure because encrypted data formally remains personal data and encryption strength may weaken over time (source: SecuPi).
Proponents (Verdict, Spotify) view crypto-shredding as an effective and scalable solution for GDPR compliance, especially in distributed systems. Spotify, for example, built its “Padlock” system on this principle (source: Verdict).
Thoughtworks placed crypto-shredding in the “Trial” ring of its Technology Radar, considering the technique useful for privacy protection and GDPR compliance (source: Thoughtworks).
In distributed systems (event stores, Kafka, backups), physically deleting all copies of data is often technically impossible or prohibitively expensive.
Harrison Brown is correct in his conservative interpretation. The GDPR literally speaks of “erasure”, not “making inaccessible”. However, in practice many organizations and legal professionals accept crypto-shredding as a sufficient measure, especially in systems where physical deletion is technically impractical (event stores, distributed logs, backups).
Brown’s recommendation to use Forgettable Payloads for personal data is the legally safer approach. Crypto-shredding works well as an additional layer of protection, but relying solely on it for Art. 17 is risky without explicit approval from legal counsel.
A combination of both patterns is the most robust option.
Envelope Encryption
The core idea is a two-level key hierarchy:
Data Encryption Key (DEK) – a symmetric key (AES-256) used to encrypt event payloads. Generated per-stream (per-aggregate instance). Stored alongside the data in encrypted form.
Key Encryption Key (KEK) – a per-tenant key used to encrypt/decrypt DEKs. Managed by the Key Management Service (KMS). Never leaves the KMS in plaintext.
```text
          Application                         KMS
┌─────────────────────────────┐   ┌──────────────────────────┐
│ EventStore                  │   │ kms_keys                 │
│ ├─ event_log                │   │ ├─ tenant_id             │
│ │  ├─ payload (bytea)       │   │ ├─ key_version           │
│ │  └─ metadata (jsonb)      │   │ ├─ encrypted_key         │
│ └─ stream_deks              │   │ ├─ master_algorithm      │
│    ├─ stream_id             │   │ └─ key_algorithm         │
│    ├─ version               │   └──────────────────────────┘
│    ├─ encrypted_dek         │
│    └─ algorithm             │
└─────────────────────────────┘
```
KEK rotation requires re-encrypting only the DEKs (one small key per stream and DEK version), not the events themselves (potentially millions).
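The following sketch shows the two-level hierarchy and why rotation is cheap. It makes several assumptions. The cryptography package’s AESGCM is used at both levels. tenant_id is used as AAD. Blobs are laid out as nonce || ciphertext. The helpers wrap and unwrap and the tenant value are hypothetical. A real KEK would also stay inside the KMS rather than in application memory.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit random nonce


def wrap(key: bytes, plaintext: bytes, aad: bytes) -> bytes:
    """AES-256-GCM encrypt; hypothetical layout: nonce || ciphertext+tag."""
    nonce = os.urandom(NONCE_SIZE)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, aad)


def unwrap(key: bytes, blob: bytes, aad: bytes) -> bytes:
    """Inverse of wrap(); raises cryptography.exceptions.InvalidTag on tampering."""
    return AESGCM(key).decrypt(blob[:NONCE_SIZE], blob[NONCE_SIZE:], aad)


tenant_aad = b"tenant-42"  # hypothetical tenant_id used as AAD

# Level 2: per-tenant KEK (held by the KMS in a real deployment).
kek_v1 = AESGCM.generate_key(bit_length=256)

# Level 1: per-stream DEK encrypts the payload; only the wrapped DEK is stored.
dek = AESGCM.generate_key(bit_length=256)
encrypted_dek = wrap(kek_v1, dek, tenant_aad)
stored_payload = wrap(dek, b'{"amount": 100}', tenant_aad)

# KEK rotation: unwrap the DEK with the old KEK and wrap it with the new one.
kek_v2 = AESGCM.generate_key(bit_length=256)
rewrapped_dek = wrap(kek_v2, unwrap(kek_v1, encrypted_dek, tenant_aad), tenant_aad)

# The payload ciphertext was never touched and still decrypts via the re-wrapped DEK.
recovered_dek = unwrap(kek_v2, rewrapped_dek, tenant_aad)
print(unwrap(recovered_dek, stored_payload, tenant_aad))  # b'{"amount": 100}'
```

Only the 60-byte wrapped DEK (12-byte nonce, 32-byte key, 16-byte tag) changes during rotation. stored_payload is never re-encrypted.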
What to encrypt
Encrypt:
payload – business data containing PII, financial information, etc.
Do not encrypt:
metadata (contains event_id, used in a unique index, and correlation/causation IDs used for routing), event_type, stream_id, stream_position, event_version – needed by projections and subscriptions for filtering and routing.
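A hypothetical row builder illustrates the split between encrypted and plaintext columns. The Event dataclass and the to_row function are assumptions made for illustration. encrypt stands for whatever AES-256-GCM cipher the stream’s DEK provides.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Event:
    stream_id: str
    stream_position: int
    event_type: str
    event_version: int
    payload: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)


def to_row(event: Event, encrypt: Callable[[bytes], bytes]) -> dict[str, Any]:
    """Build a hypothetical event_log row in which only the payload is encrypted."""
    return {
        # Plaintext: projections and subscriptions filter and route on these.
        "stream_id": event.stream_id,
        "stream_position": event.stream_position,
        "event_type": event.event_type,
        "event_version": event.event_version,
        # Stays jsonb: event_id (unique index) and correlation/causation IDs.
        "metadata": event.metadata,
        # The only column carrying business data, stored as bytea ciphertext.
        "payload": encrypt(json.dumps(event.payload).encode("utf-8")),
    }
```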
Algorithm
AES-256-GCM was chosen as the encryption algorithm:
Authenticated encryption – GCM provides both confidentiality and integrity (tamper detection). This is critical for an immutable event store.
Hardware acceleration – AES-NI is available on all modern server CPUs, and the cryptography library uses it automatically.
Industry standard – AWS KMS, Vault Transit, and GCP KMS all use AES-256-GCM by default.
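Nonce safety – with a random 12-byte (96-bit) nonce, a single DEK should be used for at most about 2^32 encryptions. NIST SP 800-38D sets this limit for randomly generated IVs to keep the probability of a nonce collision negligible. With per-stream DEK granularity, this is not a practical concern.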
AAD (Additional Authenticated Data) – tenant_id is used as AAD at all encryption levels (master key → KEK, KEK → DEK). This cryptographically binds each ciphertext to its tenant. It prevents cross-tenant ciphertext substitution, even by an attacker with direct write access to the database. The domain model (BaseKey) applies AAD uniformly via the _aad property derived from tenant_id.
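The AAD binding can be demonstrated with a hypothetical row-swapping attack at the master key → KEK level. The sketch assumes the cryptography package. It encodes tenant_id as UTF-8 for the AAD, although the real BaseKey._aad derivation may differ. Without AAD, the copied blob would decrypt cleanly and be accepted as tenant B’s KEK. With AAD, the GCM tag check rejects it.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12


def aad_for(tenant_id: object) -> bytes:
    # Assumed encoding; the real BaseKey._aad may serialize tenant_id differently.
    return str(tenant_id).encode("utf-8")


def seal(key: bytes, plaintext: bytes, tenant_id: object) -> bytes:
    nonce = os.urandom(NONCE_SIZE)  # fresh random 96-bit nonce per encryption
    return nonce + AESGCM(key).encrypt(nonce, plaintext, aad_for(tenant_id))


def open_sealed(key: bytes, blob: bytes, tenant_id: object) -> bytes:
    # Raises InvalidTag if the blob or its tenant binding was tampered with.
    return AESGCM(key).decrypt(blob[:NONCE_SIZE], blob[NONCE_SIZE:], aad_for(tenant_id))


master_key = AESGCM.generate_key(bit_length=256)

# One master key wraps every tenant's KEK, each bound to its own tenant_id via AAD.
kek_rows = {
    tenant: seal(master_key, AESGCM.generate_key(bit_length=256), tenant)
    for tenant in ("tenant-a", "tenant-b")
}

# An attacker with DB write access copies tenant A's wrapped KEK into tenant B's row.
kek_rows["tenant-b"] = kek_rows["tenant-a"]

try:
    open_sealed(master_key, kek_rows["tenant-b"], "tenant-b")
    substitution_detected = False
except InvalidTag:
    substitution_detected = True  # tag was computed over "tenant-a", not "tenant-b"

print(substitution_detected)  # True
```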
KMS interface
The IKeyManagementService interface mirrors the API surface of the Vault Transit Engine:
encrypt_dek / decrypt_dek – envelope operations
generate_dek – generate a new DEK and return both its plaintext and encrypted forms
rotate_kek – create a new KEK version for a tenant
rewrap_dek – re-encrypt a DEK with the current KEK version (after rotation)
delete_kek – delete all KEK versions for a tenant (crypto-shredding)
Two implementations are provided: PgKeyManagementService stores
KEKs in PostgreSQL (encrypted with a master key from an environment
variable); VaultTransitService delegates all cryptographic
operations to HashiCorp Vault Transit. The interface allows adding
other backends (AWS KMS, GCP KMS) without changing the EventStore code.
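One possible shape for the interface is an abstract base class with async methods. The method names come from the list above. The parameter and return types are assumptions, not the project’s actual signatures. The comments name the Vault Transit endpoint each method roughly corresponds to.

```python
from abc import ABC, abstractmethod
from typing import Any


class IKeyManagementService(ABC):
    """Sketch of the KMS port; tenant_id is Any because the schema decides its type."""

    @abstractmethod
    async def generate_dek(self, tenant_id: Any) -> tuple[bytes, bytes]:
        """Return (plaintext_dek, encrypted_dek). Vault: POST /transit/datakey/plaintext/:name."""

    @abstractmethod
    async def encrypt_dek(self, tenant_id: Any, dek: bytes) -> bytes:
        """Wrap a DEK with the tenant's current KEK version. Vault: POST /transit/encrypt/:name."""

    @abstractmethod
    async def decrypt_dek(self, tenant_id: Any, encrypted_dek: bytes) -> bytes:
        """Unwrap a DEK with the KEK version that produced it. Vault: POST /transit/decrypt/:name."""

    @abstractmethod
    async def rotate_kek(self, tenant_id: Any) -> int:
        """Create a new KEK version and return its number. Vault: POST /transit/keys/:name/rotate."""

    @abstractmethod
    async def rewrap_dek(self, tenant_id: Any, encrypted_dek: bytes) -> bytes:
        """Re-encrypt a DEK under the latest KEK version. Vault: POST /transit/rewrap/:name."""

    @abstractmethod
    async def delete_kek(self, tenant_id: Any) -> None:
        """Delete every KEK version for the tenant (crypto-shredding). Vault: DELETE /transit/keys/:name."""
```

PgKeyManagementService and VaultTransitService would each subclass such a type, and the EventStore would depend only on the abstraction.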
DEK granularity
DEKs are generated per-stream (per-aggregate instance), identified
by StreamId(tenant_id, stream_type, stream_id). Each stream can
have multiple versioned DEKs (for algorithm migration). This provides:
Better isolation than per-tenant DEKs (compromise of one DEK only affects one aggregate instance)
Manageable number of keys (one per aggregate instance, not per event)
Natural boundary for crypto-shredding at stream level
Safe algorithm migration without re-encrypting existing events
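Per-stream, versioned DEKs can be pictured as a mapping from StreamId to a set of numbered keys. The sketch below is an in-memory stand-in for the stream_deks table. InMemoryDekRegistry and its methods are hypothetical. For brevity it keeps plaintext DEKs, whereas the real table stores encrypted_dek.

```python
import os
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class StreamId:
    """Frozen, hence hashable: usable as a dict key and as a codec-cache key."""

    tenant_id: Any
    stream_type: str
    stream_id: str


class InMemoryDekRegistry:
    """Hypothetical stand-in for stream_deks: StreamId -> {version: DEK bytes}."""

    def __init__(self) -> None:
        self._rows: dict[StreamId, dict[int, bytes]] = {}

    def add_version(self, sid: StreamId) -> int:
        """Add a new DEK version (e.g. for an algorithm migration) and return its number."""
        versions = self._rows.setdefault(sid, {})
        version = max(versions, default=0) + 1
        versions[version] = os.urandom(32)  # a real store keeps only the KEK-wrapped DEK
        return version

    def latest(self, sid: StreamId) -> tuple[int, bytes]:
        """Write path: new events always use the newest version."""
        versions = self._rows[sid]
        version = max(versions)
        return version, versions[version]

    def all_versions(self, sid: StreamId) -> dict[int, bytes]:
        """Read path: every version, so older events stay decryptable."""
        return dict(self._rows.get(sid, {}))

    def shred_stream(self, sid: StreamId) -> None:
        """Stream-level crypto-shredding: forget every DEK version of one aggregate."""
        self._rows.pop(sid, None)


registry = InMemoryDekRegistry()
order = StreamId(tenant_id=42, stream_type="Order", stream_id="order-1")
registry.add_version(order)
registry.add_version(order)
print(registry.latest(order)[0], sorted(registry.all_versions(order)))  # 2 [1, 2]
registry.shred_stream(order)
print(registry.all_versions(order))  # {}
```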
Alternatives considered
PostgreSQL pgcrypto (column-level encryption)
PostgreSQL’s pgcrypto extension can encrypt individual columns
using pgp_sym_encrypt / pgp_sym_decrypt. The per-tenant key
is passed via a SET LOCAL session variable:
```sql
BEGIN;
SET LOCAL app.tenant_key = 'per-tenant-secret';
INSERT INTO events (payload)
VALUES (pgp_sym_encrypt('{"amount": 100}', current_setting('app.tenant_key')));
COMMIT;
```
This was rejected for several reasons:
Key management stays in the application anyway. PostgreSQL does not manage keys – the entire KEK/DEK hierarchy, rotation, and caching still has to live in application code. pgcrypto only moves the encryption call (pgp_sym_encrypt) from the application into SQL.
Performance. Decryption runs on the database server’s CPU. During a projection rebuild or a catch-up subscription, decrypting thousands of events loads PostgreSQL instead of horizontally scalable read-side services. In CQRS, the heavy work belongs on the subscribers.
Logging and leaks. pg_stat_statements, the slow query log, and EXPLAIN ANALYZE may capture decrypted values or keys. Avoiding this requires careful tuning of log_min_duration_statement and disabling parameter logging.
No crypto-shredding guarantee. After a tenant’s key is deleted, remnants may persist in PostgreSQL logs, pg_stat views, or the WAL.
Backup exposure. pg_dump exports encrypted blobs (good), but the protection is illusory if keys are stored in the same database or passed via session variables that get logged.
PostgreSQL-level encryption may be appropriate for prototypes or when a full KMS is overkill, but for a multi-tenant event sourcing system with crypto-shredding requirements, application-level envelope encryption is the correct choice.
Decision
Payload column type changed from jsonb to bytea. Encrypted payload is binary, not JSON. Metadata remains jsonb.
Codec decorator chain applied to the payload on write/read:
```python
EncryptionCodec(Aes256GcmCipher(dek, aad), ZlibCodec(JsonCodec()))
```
The chain serializes the payload to JSON bytes, compresses it with zlib, and encrypts it with AES-256-GCM; on read, the steps run in reverse. The ICodec interface (encode/decode) allows composing arbitrary transformations via the Decorator pattern.
DekStore returns ICipher instead of raw key bytes. get_or_create and get return a ready-to-use ICipher that handles the version prefix and AAD internally. get_all returns a composite cipher that dispatches decrypt() by the version prefix in the ciphertext. The EventStore no longer knows about Aes256GcmCipher – cipher construction is encapsulated in DekStore._make_raw_cipher(), which dispatches by the algorithm column stored per DEK version.
DEKs are versioned. Each stream can have multiple DEK versions, stored as separate rows in stream_deks. The encrypted payload starts with a 4-byte version prefix identifying which DEK was used. This enables algorithm migration without re-encrypting existing events. New events use the latest DEK version, while old events remain decryptable with their original version.
Query requests the codec via a factory, not a ready instance. evaluate(codec_factory, session) – the query receives an ICodecFactory (Callable[[ISession, StreamId], Awaitable[ICodec]]) and calls it with its own StreamId. The query already owns the stream identity, so it decides when to obtain the codec, while the EventStore controls how the codec is constructed:
```python
# Write path: get_or_create returns an ICipher for the latest DEK version.
async def _make_codec_factory(self) -> ICodecFactory:
    # Codecs are cached per StreamId for the lifetime of this factory.
    _cache: dict[StreamId, ICodec] = {}

    async def codec_factory(session: ISession, stream_id: StreamId) -> ICodec:
        if stream_id not in _cache:
            # Creates the stream's DEK on first write if it does not exist yet.
            cipher = await self._dek_store.get_or_create(session, stream_id)
            _cache[stream_id] = EncryptionCodec(cipher, ZlibCodec(JsonCodec()))
        return _cache[stream_id]

    return codec_factory


# Read path: get_all returns a composite ICipher covering all DEK versions.
async def _make_read_codec_factory(self) -> ICodecFactory:
    _cache: dict[StreamId, ICodec] = {}

    async def codec_factory(session: ISession, stream_id: StreamId) -> ICodec:
        if stream_id not in _cache:
            # The composite cipher dispatches decrypt() by the 4-byte version prefix.
            cipher = await self._dek_store.get_all(session, stream_id)
            _cache[stream_id] = EncryptionCodec(cipher, ZlibCodec(JsonCodec()))
        return _cache[stream_id]

    return codec_factory
```
_save() does not need stream_id as a parameter, because the responsibility belongs to the object that owns the data. get_or_create is used on the write path and creates the DEK if it is absent. get_all is used on the read path and returns all DEK versions for a stream.
Codec factory is a dependency, session is an argument. evaluate(codec_factory, session) – the factory (strategy) comes before the runtime argument. This follows the principle that dependencies, which are stable and suitable for functools.partial, precede arguments, which vary per call.
KMS and DekStore use dynamic table names (%s substitution for the table name, %%s for query parameters). This allows test subclasses to override _table without duplicating SQL.
tenant_id typed as typing.Any. The KMS and DekStore do not enforce a specific type for tenant_id. The user chooses the DDL type in their schema (varchar, integer, with or without REFERENCES). Production code does not apply type conversions.
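The codec chain and the version-prefix dispatch can be reproduced in a self-contained sketch. The class names mirror those described above, but the bodies are assumptions:

- The 4-byte prefix is assumed to be a big-endian unsigned integer, followed by a 12-byte nonce.
- VersionedAesGcmCipher and CompositeCipher are hypothetical stand-ins for what DekStore returns.
- AES-GCM comes from the cryptography package.

```python
import json
import os
import struct
import zlib
from typing import Any, Protocol

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

VERSION = struct.Struct(">I")  # assumed: 4-byte big-endian DEK version prefix
NONCE_SIZE = 12


class ICodec(Protocol):
    def encode(self, obj: Any) -> bytes: ...
    def decode(self, data: bytes) -> Any: ...


class ICipher(Protocol):
    def encrypt(self, plaintext: bytes) -> bytes: ...
    def decrypt(self, ciphertext: bytes) -> bytes: ...


class JsonCodec:
    def encode(self, obj: Any) -> bytes:
        return json.dumps(obj).encode("utf-8")

    def decode(self, data: bytes) -> Any:
        return json.loads(data)


class ZlibCodec:
    """Decorator: compresses the inner codec's output."""

    def __init__(self, inner: ICodec) -> None:
        self._inner = inner

    def encode(self, obj: Any) -> bytes:
        return zlib.compress(self._inner.encode(obj))

    def decode(self, data: bytes) -> Any:
        return self._inner.decode(zlib.decompress(data))


class EncryptionCodec:
    """Outermost decorator: encrypts on encode, decrypts before delegating on decode."""

    def __init__(self, cipher: ICipher, inner: ICodec) -> None:
        self._cipher, self._inner = cipher, inner

    def encode(self, obj: Any) -> bytes:
        return self._cipher.encrypt(self._inner.encode(obj))

    def decode(self, data: bytes) -> Any:
        return self._inner.decode(self._cipher.decrypt(data))


class VersionedAesGcmCipher:
    """One DEK version; output layout: version (4) || nonce (12) || ciphertext+tag."""

    def __init__(self, version: int, dek: bytes, aad: bytes) -> None:
        self.version, self._aead, self._aad = version, AESGCM(dek), aad

    def encrypt(self, plaintext: bytes) -> bytes:
        nonce = os.urandom(NONCE_SIZE)
        return VERSION.pack(self.version) + nonce + self._aead.encrypt(nonce, plaintext, self._aad)

    def decrypt(self, ciphertext: bytes) -> bytes:
        body = ciphertext[VERSION.size:]  # skip the version prefix
        return self._aead.decrypt(body[:NONCE_SIZE], body[NONCE_SIZE:], self._aad)


class CompositeCipher:
    """Read-path cipher: picks the DEK version named by the ciphertext's prefix."""

    def __init__(self, ciphers: list[VersionedAesGcmCipher]) -> None:
        self._by_version = {c.version: c for c in ciphers}

    def encrypt(self, plaintext: bytes) -> bytes:
        return self._by_version[max(self._by_version)].encrypt(plaintext)  # newest version

    def decrypt(self, ciphertext: bytes) -> bytes:
        (version,) = VERSION.unpack_from(ciphertext)
        return self._by_version[version].decrypt(ciphertext)


# Usage: an event written under DEK v1 stays readable after v2 is introduced.
aad = b"tenant-42"
v1 = VersionedAesGcmCipher(1, AESGCM.generate_key(bit_length=256), aad)
v2 = VersionedAesGcmCipher(2, AESGCM.generate_key(bit_length=256), aad)
old_event = EncryptionCodec(v1, ZlibCodec(JsonCodec())).encode({"amount": 100})
new_event = EncryptionCodec(v2, ZlibCodec(JsonCodec())).encode({"amount": 250})

read_codec = EncryptionCodec(CompositeCipher([v1, v2]), ZlibCodec(JsonCodec()))
print(read_codec.decode(old_event), read_codec.decode(new_event))  # {'amount': 100} {'amount': 250}
```

Each decorator only sees bytes from its inner codec. Removing ZlibCodec or adding another layer therefore changes only the construction expression, not EncryptionCodec or its callers.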
Consequences
Encryption at rest for event payloads: all event payloads are stored as encrypted binary. An attacker with database access cannot read business data without the master key.
Crypto-shredding for GDPR: deleting a tenant’s KEK (delete_kek) renders all their events permanently unreadable without physically deleting rows from the immutable event store. This is subject to the legal caveat above regarding Art. 17.
Transparent key rotation: rotate_kek creates a new KEK version. Old events remain decryptable because their DEKs still decrypt with the original KEK version. rewrap_dek can re-encrypt DEKs with the new version when needed.
Swappable KMS backend: the IKeyManagementService interface allows replacing PgKeyManagementService with Vault Transit, AWS KMS, or any other backend.
Composable codec chain: ICodec decorators can be rearranged (e.g., remove compression, add signing) without modifying the EventStore.
Trade-off – no queryable payload: since the payload is now bytea, SQL queries cannot filter or index on payload fields. This is acceptable in event sourcing, where projections handle query-side concerns.
Safe algorithm migration: DEK versioning allows switching to a new encryption algorithm without re-encrypting existing events. New events use the latest DEK version. Old events are decrypted with their original version, identified by the 4-byte prefix in the payload.
Trade-off – DEK lookup per stream: each distinct stream requires a DEK lookup. The ICodecFactory closure caches codecs by StreamId, so repeated access to the same stream within one _save()/evaluate() call does not trigger additional lookups.
Related
ADR-0008: Accessing State of Encapsulated Aggregate – Mediator/Exporter pattern used to export event state for serialization