A comprehensive collection of resources for learning distributed systems, performance engineering, and scalable system design.

Books

Distributed Systems & Databases

  • Database Internals - Alex Petrov
  • Designing Data-Intensive Applications - Martin Kleppmann
  • High Performance Browser Networking - Ilya Grigorik
  • Just Use Postgres! - Denis Magda
  • Latency - Pekka Enberg

Language-Centric

  • Fluent Python - Luciano Ramalho
  • Rust Atomics and Locks - Mara Bos
  • Rust for Rustaceans - Jon Gjengset

AI

I’m not focussing on learning AI skills specifically. For my own learning, I plan to avoid AI altogether, or at least coding agents.

  • Build a Large Language Model (From Scratch) - Sebastian Raschka (Low Priority)
  • AI Engineering - Chip Huyen (Low Priority)
  • Designing Machine Learning Systems - Chip Huyen (Low Priority)

Python PEPs

I wanted to read some Python Enhancement Proposals that led to improved concurrency in Python. These are what I have so far.

Blog Articles

  • Napkin Math - Back-of-the-envelope calculations for systems design

Brendan Gregg - Performance Engineering

Engineering Blogs

Cloudflare Postmortems

Papers

Uncurated List
This list hasn’t been curated yet. I’ll prune or strike through papers that I haven’t found useful or won’t read for my preparation.

Essential Papers (Start Here)

  1. Time, clocks, and the ordering of events in a distributed system - Leslie Lamport, 1978
  2. The Byzantine Generals Problem - Leslie Lamport, Robert Shostak, and Marshall Pease, 1982
  3. Distributed snapshots: determining global states of distributed systems - K. Mani Chandy and Leslie Lamport, 1985
  4. Impossibility of distributed consensus with one faulty process - Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson, 1985
  5. Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems - Brian M. Oki and Barbara H. Liskov, 1988
  6. The part-time parliament (Paxos) - Leslie Lamport, 1998
  7. Paxos Made Simple - Leslie Lamport, 2001
  8. Bitcoin: A Peer-to-Peer Electronic Cash System - Satoshi Nakamoto, 2008
  9. Conflict-free replicated data types - Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski, 2011
  10. In search of an understandable consensus algorithm (Raft) - Diego Ongaro and John Ousterhout, 2014

Comprehensive Reading List

System Design Principles

Latency

Amazon Systems

Google Systems

Consistency Models

Theory

Expository and Tutorial Resources:

Languages and Tools

Infrastructure

Distributed Storage

Consensus and Replication

Expository and Tutorial Resources:

Gossip Protocols and Epidemic Algorithms

Peer-to-Peer Systems

Distributed Algorithms

Expository and Tutorial Resources:

System Design and Architecture

Expository and Tutorial Resources:

Cloud Computing and Big Data