Browse by topic
Three topic hubs. Each one defines its area, offers a starting path, and lists its notes.
Performance
Latency, caching layers, queueing, and diagnosing regressions in web applications and services.
Formal methods
Specifications, invariants, model checking, and protocols as tools for clarity.
Distributed systems
Time and ordering, retries, idempotency, consensus, and observability for reliable systems.
Start here (reading paths)
Three short reading paths, one per topic.
Performance
- TTFB and origin latency: Understand request phases and what “first byte” really contains.
- Queueing basics and latency budgets: Why tail latency explodes and how to set budgets that survive load.
- Performance regressions checklist: A workflow to confirm, isolate, and mitigate regressions.
Formal methods
- Specs, invariants, and contracts: Write constraints you can review, test, and monitor.
- Model checking primer: Explore behaviour spaces and interpret counterexamples.
- Session types and protocols: Make interaction structure explicit and checkable.
Distributed systems
- Time, clocks, and ordering: Define “before” without relying on a single clock.
- Failure modes, retries, idempotency: Design operations so retry behaviour is safe and predictable.
- Observability for distributed systems: Debug with evidence across service boundaries.
Latest notes
Recent notes across all topics.
- Autonomous Operations Maturity Model: Steps Toward Fully Autonomous Systems. A practical maturity model for moving from manual operations to autonomous systems, covering the observability, testing, and governance requirements at each level. (distributed)
- Agentic AI vs. Human Supervision: Designing Systems Where Humans Set Boundaries. How to design AI agent systems with meaningful human oversight: boundary-setting patterns, escalation protocols, and the operational cost of supervision at scale. (distributed)
- Multi-Cloud Strategies and Regulatory Pressures: Architecting for Availability and Compliance. How regulatory requirements and availability goals interact when distributing workloads across cloud providers, and which architectural patterns reduce risk without excessive complexity. (distributed)
- Designing AI Factories: Building Intelligent, Governed Pipelines Inside Your Enterprise. An architecture-level view of enterprise AI factories: how to build governed, observable pipelines that move models from experimentation to production reliably. (distributed)
- Resilience as the New Benchmark: Designing Fault-Tolerant Systems in 2026. Why resilience has overtaken raw throughput as the primary design constraint for production systems, and how to evaluate fault tolerance at the architecture level. (distributed)
- Agentic AI and System Complexity: Ensuring Observability and Governance for Autonomous Agents. How autonomous AI agents introduce new failure modes into distributed systems, and what observability and governance infrastructure is needed to operate them safely. (distributed)
- Observability for distributed systems. How traces and correlation IDs turn incidents into evidence. (distributed)
- Performance regressions checklist. A repeatable workflow for confirming and localising regressions. (performance)
- Specs, invariants, and contracts. What to write down so behaviour stays reviewable. (formal)
- Failure modes, retries, and idempotency. Why retries create duplicates and how to make outcomes safe. (distributed)
- Cache hierarchy: edge to origin. Reason about caches as a system, not a single knob. (performance)
- Behavioural equivalence in plain English. What “same behaviour” means depends on your observations. (formal)
How notes are written
- Quick summary: A short statement of what the note covers and why it matters.
- Key ideas: Definitions and mini examples to build working intuition.
- Common pitfalls: The mistakes that recur in practice (measurement, assumptions, and semantics).
- Related notes: Internal links that keep the knowledge base navigable and crawlable.