“Just sort by timestamp” is one of the most expensive sentences in distributed systems. The underlying issue is not that clocks are bad; it is that “time” mixes two concepts: a physical measurement (what a wall clock reads) and a causal relationship (which event could have influenced which).

This note is about choosing the right notion of ordering. Some problems require causal order; some require a total order; many require neither and only need a stable tie-break. If you make the wrong choice, you will build systems that work in calm conditions and misbehave under delay, retries, or partition.

Quick takeaways

  • Physical clocks are not shared truth. They drift, and even small drift becomes meaningful at scale.
  • Causality is about influence. If A can affect B, then A should be ordered before B in a causal order.
  • Logical clocks model order, not time. They are useful when you care about consistent ordering, not real seconds.
  • Total order is expensive. If you need a single global order, you often need coordination (sometimes consensus).
  • Most bugs are mismatched assumptions. One component assumes order; another delivers reorderings.

Problem framing (what is “ordering”?)

Ordering enters systems via requirements like: “apply updates in the order they were created”, “show the latest value”, or “avoid missing increments”. These requirements can mean different things:

  • Causal order. If an event depends on another (reads after write), it must come later.
  • Total order. All events are comparable; there is one global sequence.
  • Stable order for presentation. You can pick a deterministic tie-break for unrelated events.

Trouble starts when a system needs causal order but is implemented as “sort by physical timestamp”.

Diagram: physical clock order can differ from causal order

Key concepts (definitions + mini examples)

Clock drift and skew

A physical clock can be offset from another machine's clock by a small amount (skew) and can run slightly fast or slow (drift), so the offset changes over time. If two machines disagree by even tens of milliseconds, timestamp ordering can be inverted for any two events that occur closer together than that disagreement. That inversion is not a bug in the clock; it is a mismatch between the requirement and the mechanism.
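The inversion can be reproduced in a few lines. This is a minimal sketch, not a model of any real clock: the `Event` type, the `stamp` helper, and the 30 ms skew are all hypothetical values chosen for illustration, and `true_ms` stands in for a "real" time that no machine can actually observe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    true_ms: int     # when the event really happened (unknowable in practice)
    stamped_ms: int  # what the local wall clock reported


def stamp(name: str, true_ms: int, skew_ms: int) -> Event:
    """Record an event on a machine whose clock is off by skew_ms."""
    return Event(name, true_ms, true_ms + skew_ms)


# Hypothetical setup: machine A runs 30 ms fast, machine B is accurate.
# The read on B happens 10 ms after the write on A.
write = stamp("write on A", true_ms=1_000, skew_ms=30)
read = stamp("read on B", true_ms=1_010, skew_ms=0)

by_timestamp = sorted([write, read], key=lambda e: e.stamped_ms)
by_true_time = sorted([write, read], key=lambda e: e.true_ms)

print([e.name for e in by_timestamp])  # the read sorts before the write
print([e.name for e in by_true_time])  # the write sorts before the read
```

The two events are 10 ms apart, which is less than the 30 ms skew, so sorting by `stamped_ms` puts the read first.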

Happens-before (intuition)

The causal relation (“happens-before”) captures influence: if A sends a message that causes B, then A happens before B. If two events do not influence each other, they are concurrent: neither happens before the other. Concurrency is not rare; it is the default.
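One common way to make this relation checkable is a vector clock, where each event carries a per-node counter map. The note does not prescribe this technique; the sketch below is only an illustration, and the names `VectorClock`, `happens_before`, and `concurrent` are assumptions introduced here.

```python
from typing import Dict

# Node id -> number of events from that node known at this point.
VectorClock = Dict[str, int]


def _leq(a: VectorClock, b: VectorClock) -> bool:
    """True if every component of a is <= the matching component of b."""
    nodes = set(a) | set(b)
    return all(a.get(n, 0) <= b.get(n, 0) for n in nodes)


def happens_before(a: VectorClock, b: VectorClock) -> bool:
    """a happens before b: a <= b component-wise, and they are not equal."""
    return _leq(a, b) and not _leq(b, a)


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither event could have influenced the other."""
    return not _leq(a, b) and not _leq(b, a)


# A sends a message; B receives it; C does something unrelated.
sent = {"A": 1}
received = {"A": 1, "B": 1}
elsewhere = {"C": 1}

print(happens_before(sent, received))   # True: the send influenced the receive
print(concurrent(sent, elsewhere))      # True: no message connects them
```

Note that `concurrent` is a statement about influence, not about wall-clock time: the event on C may have happened hours earlier or later.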

Logical clocks (why they exist)

Logical clocks produce numbers that respect some ordering constraints. A simple logical clock ensures that if A causally precedes B, then timestamp(A) < timestamp(B). It does not claim that the timestamps represent real time.

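Logical clocks are useful when you need to merge histories while preserving causality. If you need a single total order for all events, logical clocks alone are not enough: the timestamps of concurrent events carry no causal meaning and can even be equal, so a total order needs an additional tie-break that every participant agrees on.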
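A counter-based logical clock in the style of Lamport clocks is one way to get this guarantee. The sketch below is an assumption about what "a simple logical clock" might look like, not a definitive implementation; the class and method names are illustrative.

```python
class LamportClock:
    """A scalar logical clock: a counter that only moves forward."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance and return the new timestamp."""
        self.time += 1
        return self.time

    def receive(self, sender_time: int) -> int:
        """Message receive: jump past the sender's timestamp, then advance."""
        self.time = max(self.time, sender_time) + 1
        return self.time


a, b, c = LamportClock(), LamportClock(), LamportClock()

a.tick()                       # local event on A
send_ts = a.tick()             # A sends a message
recv_ts = b.receive(send_ts)   # B receives it
other_ts = c.tick()            # unrelated event on C

print(send_ts < recv_ts)   # True: the causal pair is ordered correctly
print(other_ts < send_ts)  # True, but C's event did not influence A's send
```

The second comparison shows the limit: a smaller timestamp does not imply causal precedence. The guarantee only runs in one direction.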

Practical checks (patterns for avoiding ordering bugs)

1) Ask what “latest” means

If “latest” means “most recently created by real time”, you need clock assumptions. If it means “most recently causally visible”, you need causality tracking. If it means “most recently applied in one place”, you need a single-writer or a coordinator.

2) Prefer single-writer per key when possible

If each key (user, document, account) has a single writer at a time, you can impose order locally. This is often cheaper than global ordering and can be achieved via sharding or leases.
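A minimal sketch of the sharding variant, assuming a fixed number of shards and a stable hash. `owner_of`, `ShardWriter`, and `write` are hypothetical names, and the sketch deliberately leaves out leases, resharding, and failover.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def owner_of(key: str, num_shards: int) -> int:
    """Map a key to exactly one shard using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardWriter:
    """The only writer for the keys it owns; assigns per-key sequence numbers."""

    def __init__(self) -> None:
        self._last_seq: DefaultDict[str, int] = defaultdict(int)
        self.log: List[Tuple[str, int, str]] = []

    def append(self, key: str, value: str) -> int:
        self._last_seq[key] += 1
        seq = self._last_seq[key]
        self.log.append((key, seq, value))
        return seq


def write(shards: List[ShardWriter], key: str, value: str) -> int:
    """Route the write to the key's single owner and return its sequence number."""
    return shards[owner_of(key, len(shards))].append(key, value)


shards = [ShardWriter() for _ in range(4)]
first = write(shards, "user:1", "created")
second = write(shards, "user:1", "renamed")
```

Because every write for `user:1` goes through one `ShardWriter`, the sequence numbers give a total order for that key without any cross-shard coordination. Nothing here orders writes to different keys relative to each other.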

3) Design merges explicitly

If concurrent updates can occur, decide what merging means: last-write-wins, add-wins, or something domain-specific. Document it as part of the interface contract.
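As an illustration of making the merge rule explicit, here is a sketch of last-write-wins over a logical counter with a node-id tie-break. The `Versioned` type and `merge_lww` function are hypothetical; the point is only that the rule is written down and deterministic, not that last-write-wins is the right choice for a given domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    counter: int  # logical counter, e.g. from a Lamport-style clock (assumed)
    node_id: str  # tie-break when two writers used the same counter


def merge_lww(a: Versioned, b: Versioned) -> Versioned:
    """Pick the winner by (counter, node_id); argument order does not matter."""
    return a if (a.counter, a.node_id) >= (b.counter, b.node_id) else b


x = Versioned("red", counter=5, node_id="A")
y = Versioned("blue", counter=5, node_id="B")  # concurrent with x

print(merge_lww(x, y).value)  # blue
print(merge_lww(y, x).value)  # blue: replicas converge regardless of arrival order
```

The concurrent update "red" is silently discarded. That may be acceptable for a profile color and unacceptable for a counter or a shopping cart, which is why the rule belongs in the interface contract.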

4) Use coordination only when the invariant requires it

If a domain invariant requires total order (e.g., exactly one leader, globally consistent configuration), use a coordination mechanism. This is where consensus typically appears.

Common pitfalls

  • Using timestamps as truth. They are measurements; they are not automatically a correctness mechanism.
  • Equating “received order” with “sent order”. Networks reorder. Queues reorder. Retries duplicate. Assume it.
  • Ignoring timeouts. Timeouts create “apparent order” changes: clients give up and retry, changing observed sequences.
  • Assuming concurrency is rare. At scale, concurrent updates are normal; merges need a design.
  • Not matching order to invariant. If you need uniqueness, ordering mechanisms may not be sufficient; you may need stronger coordination.
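For the "received order" pitfall, one defensive pattern is to have the sender number its messages and let the receiver restore order and drop duplicates. This is a sketch under that assumption (a single sender, sequence numbers starting at 1, in-memory state only); `InOrderReceiver` is a hypothetical name.

```python
from typing import Dict, List


class InOrderReceiver:
    """Delivers one sender's messages in sequence order, each at most once."""

    def __init__(self) -> None:
        self.next_seq = 1                 # next sequence number to deliver
        self.pending: Dict[int, str] = {} # arrived early, waiting for a gap to fill
        self.delivered: List[str] = []

    def on_message(self, seq: int, payload: str) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate, e.g. from a retry after a timeout
        self.pending[seq] = payload
        # Deliver everything that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


receiver = InOrderReceiver()
# The network reorders (2 before 1, 4 before 3) and retries duplicate (2 and 1).
arrivals = [(2, "b"), (1, "a"), (2, "b"), (4, "d"), (3, "c"), (1, "a")]
for seq, payload in arrivals:
    receiver.on_message(seq, payload)

print(receiver.delivered)  # ['a', 'b', 'c', 'd']
```

This restores the sent order for one sender only. It says nothing about the relative order of messages from different senders, which is a causal or total ordering question.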
