Distributed systems notes

Start here

A path from core ideas to engineering consequences.

Time, clocks, and ordering: What “happened before” means and why clock time is not enough.
Failure modes, retries, and idempotency: Why retries create duplicates and how idempotency makes behaviour predictable.
Observability for distributed systems: Tracing, correlation, and turning incidents into evidence.

Consensus without the hype: What consensus is (and isn’t), what it buys you, and the real cost drivers.
Time, clocks, and ordering: Physical time vs logical time, causality, and why ordering decisions affect correctness.
Failure modes, retries, and idempotency: Duplicate requests, partial failures, and designing operations so retry behaviour is safe.
Observability for distributed systems: Signals that let you localise latency and failure across services.

Treating time as a single clock: Wall-clock timestamps are useful, but they do not define causality under delay and drift.
Assuming “retry” means “try again”: Retries change semantics: they create duplicates and can amplify load during incidents.
Using consensus as a cure-all: Consensus solves a specific coordination problem; it does not remove the need for good data modelling.
Debugging without correlation: Logs without request identifiers or traces often produce plausible narratives, not evidence.
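
To make the point about retries concrete, here is a minimal sketch of one common way to make a retried operation safe: the client attaches an idempotency key, and the server remembers the result for each key. The names used here (PaymentService, debit, call_with_retries) are hypothetical and invented for illustration; they do not come from these notes or from any particular library. A real service would also need durable storage and key expiry.

```python
import uuid


class PaymentService:
    """Hypothetical in-memory service that deduplicates by idempotency key."""

    def __init__(self, balance):
        self.balance = balance
        self._results = {}  # idempotency key -> result of the first execution

    def debit(self, key, amount):
        # A repeated key means a retry: return the stored result
        # instead of applying the debit a second time.
        if key in self._results:
            return self._results[key]
        self.balance -= amount
        result = {"status": "ok", "balance": self.balance}
        self._results[key] = result
        return result


def call_with_retries(service, amount, attempts=3):
    # One key per logical operation, reused on every attempt.
    key = str(uuid.uuid4())
    result = None
    for _ in range(attempts):
        # Assumption for the sketch: every response is "lost", so the client
        # cannot tell whether the debit happened and sends it again.
        result = service.debit(key, amount)
    return result


service = PaymentService(balance=100)
final = call_with_retries(service, amount=30)
print(service.balance)
```

Without the key lookup, the three attempts would each subtract the amount. With it, the duplicates collapse into a single effect and every attempt sees the same result.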