Start here

A path from core ideas to engineering consequences.

  1. Time, clocks, and ordering: What “happened before” means and why clock time is not enough.
  2. Failure modes, retries, and idempotency: Why retries create duplicates and how idempotency makes behaviour predictable.
  3. Observability for distributed systems: Tracing, correlation, and turning incidents into evidence.

Common pitfalls

  • Treating time as a single clock: Wall-clock timestamps are useful, but they do not define causality under delay and drift.
  • Assuming “retry” means “try again”: Retries change semantics: they create duplicates and can amplify load during incidents.
  • Using consensus as a cure-all: Consensus solves a specific coordination problem; it does not remove the need for good data modelling.
  • Debugging without correlation: Logs without request identifiers or traces often produce plausible narratives, not evidence.
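
The retry pitfall above can be made concrete with a small sketch. It assumes a hypothetical in-memory service (PaymentService, debit, and debit_with_retries are illustrative names, not taken from these notes) and a client that retries because it never sees a response. Reusing one idempotency key across all attempts is one common way to keep the duplicates harmless.

```python
import uuid


class PaymentService:
    """Hypothetical server that deduplicates requests by idempotency key."""

    def __init__(self, balance=100):
        self.balance = balance
        self._results = {}  # idempotency key -> result of the first execution

    def debit(self, key, amount):
        # A repeated key means a retry: return the stored result, do no new work.
        if key in self._results:
            return self._results[key]
        self.balance -= amount
        result = {"status": "ok", "balance": self.balance}
        self._results[key] = result
        return result


def debit_with_retries(service, amount, attempts=3):
    """Send one logical request several times, as if each reply were lost."""
    key = str(uuid.uuid4())  # one key per logical request, reused on every retry
    result = None
    for _ in range(attempts):
        result = service.debit(key, amount)
    return result


service = PaymentService(balance=100)
outcome = debit_with_retries(service, 30)
print(outcome["balance"], service.balance)
```

Without the key check, the same three attempts would debit the account three times. The sketch keeps results in memory only; a real service would likely need durable storage and an expiry policy for keys.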

Related topics

  • Performance notes: Tail latency and queueing are often triggered by distributed retries and fan-out.
  • Formal methods notes: Protocols and invariants provide a crisp language for distributed correctness.

Notes topic hub • Last updated: Jan 2026