TTFB is one of the most-used and most-misused performance numbers. When it rises, teams often jump straight to “the server is slow”, but time to first byte is not a direct synonym for server processing time. It is a composite: it includes network distance and connection setup, any edge/caching behaviour in front of the origin, and the origin’s time to begin producing a response.
This note treats TTFB as a diagnostic handle. The goal is to break the number into phases you can measure with different tools, and to develop a repeatable approach for distinguishing (1) a network or routing change, (2) an edge or cache miss pattern change, and (3) genuine origin work or contention.
Quick takeaways
- TTFB is a bundle: it can include DNS, TCP, TLS, edge processing, origin waiting, and origin compute before the first byte leaves.
- Measure from multiple vantage points: client, edge, and origin timings answer different questions; none is sufficient alone.
- First byte is about “start of response”: streaming, buffering, and backend fan-out can delay the first byte even if the total payload is small.
- Correlate with cache status and connection reuse: a shift from keep-alive to new connections, or from cache hits to misses, can dominate TTFB.
Problem framing (why it matters)
Users and downstream services feel latency as waiting: a browser cannot render above-the-fold content until enough bytes arrive, and a calling service cannot proceed until it receives a response header or the first chunk of data. TTFB is often the first “waiting” milestone you can observe with lightweight instrumentation.
It is also the milestone most sensitive to “hidden” system effects. Connection setup, TLS negotiation, and cache routing are not part of your application code, but they dominate the experience when they change. If you interpret TTFB as “app CPU time”, you will look in the wrong place and learn the wrong lessons.
Key concepts (definitions + mini examples)
What TTFB usually includes
From a browser or HTTP client perspective, the time to first byte is the wall-clock time between initiating the request and receiving the first byte of the response from the remote side.
A useful decomposition is:
TTFB ≈ DNS + TCP handshake + TLS handshake + request send + (edge/origin wait) + first response byte
Depending on the environment, DNS may be cached, TCP/TLS may be reused (HTTP keep-alive), and an edge cache may respond without contacting the origin. Each of those conditions changes which sub-terms matter.
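To make the decomposition concrete, the sketch below times each sub-term directly with Python's standard library by performing the handshakes one at a time. It is a minimal illustration, not a production probe: it assumes HTTP/1.1, a single `GET`, and no proxies, and the host/path arguments are placeholders you would point at your own service.

```python
import socket
import ssl
import time

def ttfb_phases(host, port=443, path="/", use_tls=True):
    """Time each phase up to the first response byte for a single HTTP/1.1 GET."""
    t0 = time.monotonic()
    # DNS: resolve the host (may already be cached by the OS resolver)
    addr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4]
    t_dns = time.monotonic()
    # TCP: three-way handshake
    sock = socket.create_connection(addr[:2], timeout=10)
    t_tcp = time.monotonic()
    # TLS: certificate exchange and key negotiation (skipped for plain HTTP)
    if use_tls:
        sock = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
    t_tls = time.monotonic()
    # Request send
    sock.sendall(f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
    t_sent = time.monotonic()
    # Wait: edge/origin processing until the first response byte arrives
    sock.recv(1)
    t_first = time.monotonic()
    sock.close()
    return {"dns": t_dns - t0, "tcp": t_tcp - t_dns, "tls": t_tls - t_tcp,
            "send": t_sent - t_tls, "wait": t_first - t_sent, "ttfb": t_first - t0}
```

The phases sum to TTFB by construction, so comparing two runs (or two vantage points) shows directly which term moved.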
Origin latency is not one thing either
When TTFB is dominated by “origin work”, that work still splits into distinct causes:
- Queueing in front of the origin: requests wait for an available worker/CPU slot. This grows rapidly as utilisation rises.
- Dependency latency: database, cache, and downstream RPC calls can block response start.
- Response buffering: some stacks buffer headers/body until enough bytes exist to flush; streaming can change this.
- Cold-start effects: a just-started process may be slow on its first requests (JIT, cache warmup, lazy init).
The practical consequence is simple: “origin is slow” should always be followed by “slow because of queueing, dependency, or work?”.
Practical checks (steps/checklist)
1) Confirm the symptom and the population
Start with two clarifications: which population is affected (all users vs a region vs a single client type), and which requests are affected (all endpoints vs one route vs one dependency path). Treat “TTFB is up” as incomplete until you can say:
- Where: region/PoP/ASN/client platform.
- Which: endpoint, request class, or traffic segment.
- When: the start time, and whether the shift is gradual or step-like.
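Segmenting is mechanical once you have per-request observations. As a sketch, assuming each observation is a hypothetical `(region, endpoint, ttfb_seconds)` tuple, the helper below computes p95 per segment so you can see whether the regression is global or localised:

```python
from collections import defaultdict
import statistics

def p95_by_segment(samples, min_samples=20):
    """Group TTFB observations by (region, endpoint) and return p95 per segment.

    samples: iterable of (region, endpoint, ttfb_seconds) tuples.
    Segments with fewer than min_samples observations are dropped as too noisy.
    """
    by_segment = defaultdict(list)
    for region, endpoint, ttfb in samples:
        by_segment[(region, endpoint)].append(ttfb)
    # quantiles(n=20) yields 19 cut points; index 18 is the 95th percentile
    return {seg: statistics.quantiles(vals, n=20)[18]
            for seg, vals in by_segment.items() if len(vals) >= min_samples}
```

Sorting the result by value usually answers the “where” and “which” questions in one glance.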
2) Separate connection setup from request processing
A TTFB regression can be entirely connection-related: DNS cache misses, new TLS behaviour, or reduced keep-alive reuse. If you have client-side timing breakdowns, check whether the increase is in connect/TLS vs “waiting”. If you do not, use a controlled client near the affected region to compare repeated requests (warm connections) vs first request (cold connection).
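A minimal cold-vs-warm comparison can be scripted with the standard library's `http.client`, which connects lazily: the first request pays for TCP setup, while later requests reuse the socket if the server honours keep-alive. This is an illustrative sketch against a plain-HTTP endpoint of your choosing, not a rigorous benchmark:

```python
import http.client
import time

def cold_vs_warm(host, port=80, path="/", n=5):
    """TTFB for the first request on a fresh connection vs repeats on a reused one.

    The first timing includes TCP connection setup; subsequent timings reuse
    the connection, so a large gap points at connect cost rather than origin work.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    timings = []
    for _ in range(n):
        t0 = time.monotonic()
        conn.request("GET", path)
        resp = conn.getresponse()  # returns once the response headers are in
        timings.append(time.monotonic() - t0)
        resp.read()  # drain the body so the connection can be reused
    conn.close()
    return timings[0], min(timings[1:])  # cold, best warm
```

Run it from a client near the affected region; if cold is consistently far above warm, the regression is likely in connection setup or reuse, not in request processing.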
3) Check cache status and routing behaviour
If a CDN or edge sits in front of the origin, changes in cache hit ratio can dominate user-visible latency. Even without vendor-specific headers, you can often observe a bimodal distribution: cache hits are fast and stable; misses are slower and more variable. A shift from “mostly hits” to “mostly misses” looks like a TTFB regression, but it is a cache policy/input change.
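When no cache-status header is available, the bimodal shape itself can be exploited. The sketch below is a crude one-dimensional two-means split of a latency sample into a fast and a slow cluster; under the assumption that the fast cluster is hits and the slow cluster is misses, the fraction in the fast cluster is a rough hit-ratio proxy:

```python
import statistics

def split_bimodal(samples, iters=25):
    """Crude 1-D two-means clustering: split latencies into fast and slow groups.

    Returns (fast centre, slow centre, fraction fast). Under a hit/miss
    interpretation these approximate hit latency, miss latency, and hit ratio.
    """
    c_fast, c_slow = min(samples), max(samples)
    fast = list(samples)
    for _ in range(iters):
        # Assign each sample to the nearer centre, then recompute the centres
        fast = [s for s in samples if abs(s - c_fast) <= abs(s - c_slow)]
        slow = [s for s in samples if abs(s - c_fast) > abs(s - c_slow)]
        if fast:
            c_fast = statistics.fmean(fast)
        if slow:
            c_slow = statistics.fmean(slow)
    return c_fast, c_slow, len(fast) / len(samples)
```

Tracking the fast fraction over time turns “TTFB went up” into the sharper question “did the hit ratio drop?”.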
4) If it’s the origin, ask: queueing or work?
Queueing produces a telltale pattern: p50 may move slightly while p95/p99 move a lot, and the effect becomes worse as traffic increases. If p50 and p95 both move by a similar amount, suspect a systematic new cost (a new dependency call, a slower query plan, or a new serial step).
Use this as a sanity rule: if the origin’s CPU is comfortably low and there are no signs of saturation, a pure queueing explanation is unlikely. If CPU is high or concurrency is at its limit, queueing is the default explanation until proven otherwise.
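The queueing pattern is easy to reproduce in a toy model. The simulation below is a single-server FIFO queue with Poisson arrivals and exponential service times (an M/M/1 sketch, with an assumed 10 ms mean service time), reporting percentiles of the queueing delay alone:

```python
import random
import statistics

def queue_delay_percentiles(utilisation, service_time=0.010, n=50_000, seed=1):
    """Simulate a single-server FIFO queue and return (p50, p95) of queueing delay."""
    rng = random.Random(seed)
    arrival_rate = utilisation / service_time
    t = server_free_at = 0.0
    waits = []
    for _ in range(n):
        t += rng.expovariate(arrival_rate)      # next Poisson arrival
        start = max(t, server_free_at)          # wait if the server is busy
        waits.append(start - t)
        server_free_at = start + rng.expovariate(1.0 / service_time)
    q = statistics.quantiles(waits, n=100)
    return q[49], q[94]  # p50, p95
```

Comparing 50% against 90% utilisation shows the signature from the text: the p95 delay grows far more than the p50, and the growth accelerates as utilisation approaches saturation.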
5) Validate with a small local experiment
When possible, validate a hypothesis by changing one knob. Examples: forcing keep-alive reuse, bypassing cache for a single request class, or temporarily reducing an expensive dependency path to see whether TTFB responds. The aim is not to “fix” immediately but to reduce uncertainty.
Common pitfalls
- Equating TTFB with server compute: TTFB includes network and edge behaviour; treating it as CPU time leads to misdiagnosis.
- Ignoring connection reuse: a loss of keep-alive can add handshakes to every request and look like an application regression.
- Not distinguishing cache hit vs miss: a cache policy change can double “average TTFB” without any origin code change.
- Using a single test client as “truth”: regional routing, resolver choice, and packet loss create real population differences.
- Chasing noise in the mean: use distributional views (p50/p95/p99) and look for step changes rather than daily variance.
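Telling a step change apart from daily variance can also be automated. The hypothetical helper below slides a window over a latency series and finds the point where the median shifts most between the “before” and “after” windows; a clean step yields a large shift, ordinary noise a small one:

```python
import statistics

def largest_step(series, window=20):
    """Locate the largest shift in median between adjacent windows of a series.

    Returns (index, delta): the point where the before/after window medians
    differ most, and the size of that difference. Medians are used so that a
    few outliers do not masquerade as a step.
    """
    best_i, best_delta = None, 0.0
    for i in range(window, len(series) - window):
        before = statistics.median(series[i - window:i])
        after = statistics.median(series[i:i + window])
        if abs(after - before) > abs(best_delta):
            best_i, best_delta = i, after - before
    return best_i, best_delta
```

A large delta at a specific index is worth correlating with deploys, config pushes, and cache policy changes at that time; a small delta everywhere suggests you are chasing variance.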
Related notes
- Cache hierarchy: edge to origin. How caches affect TTFB and how to reason about hit ratios.
- Queueing basics and latency budgets. Why tail latency spikes and how to recognise queueing patterns.
- Performance regressions checklist. A broader workflow for isolating regressions beyond TTFB.
- Observability for distributed systems. Correlate client, edge, and origin traces when timings disagree.