Internals & Deep Dives¶
This section covers FoundationDB's internal architecture and design for advanced users who want to understand how FDB achieves its remarkable reliability and performance.
Advanced Content
This section assumes familiarity with distributed systems concepts and the Core Concepts section.
Why Understand Internals?¶
FoundationDB is unique among distributed databases:
- Deterministic simulation testing - Found bugs before they hit production
- Unbundled architecture - Separation of concerns at every layer
- Proven at scale - Powers Apple's iCloud and other critical infrastructure
Understanding these internals helps you:
- Debug complex issues
- Make informed architectural decisions
- Contribute to the project
Deep Dive Topics¶
-
Simulation Testing
How FDB tests millions of failure scenarios with deterministic simulation.
-
Architecture Deep Dive
Transaction processing, consensus, and recovery internals.
-
Storage Engines
Redwood B-tree, SSD engine, and storage internals.
-
Data Distribution
How FDB partitions keyspace, builds teams, and rebalances data.
-
Transaction Pipeline
End-to-end transaction flow from client to durable storage.
-
Recovery Internals
Multi-phase cluster recovery and failure handling.
The SIGMOD Paper¶
The definitive technical reference is the 2021 SIGMOD paper:
FoundationDB: A Distributed Unbundled Transactional Key Value Store
Authors: Jingyu Zhou, Meng Xu, Alexander Shraer, Bala Namasivayam, Alex Miller, et al.
Published: SIGMOD 2021 (Best Industry Paper Award)
The paper covers:
- Overall system architecture
- Transaction processing model
- Simulation testing methodology
- Production deployment lessons
Conference Talks¶
Learn from the engineers who built FDB:
| Talk | Speaker | Event |
|---|---|---|
| Testing Distributed Systems w/ Deterministic Simulation | Will Wilson | Strange Loop 2014 |
| FoundationDB: A Distributed Key-Value Store | A.J. Christensen | Apple Tech Talks |
| Simulation Testing at Apple | Markus Pilman | FDB Summit 2019 |
The Flow Language¶
FoundationDB is written in Flow, a custom language that extends C++ with actor-based concurrency primitives. Flow is not just a convenience—it is the foundation that makes deterministic simulation testing possible.
Single-Threaded Cooperative Execution¶
Each fdbserver process runs a single thread with a cooperative run loop (event loop). Multiple actors run within this single thread—they are not OS threads. Actors voluntarily yield control at wait() points, returning execution to the run loop. Between wait() points, an actor has exclusive access to process memory with no preemption.
sequenceDiagram
participant RL as Run Loop
participant A as Actor A
participant B as Actor B
participant C as Actor C
RL->>A: Resume Actor A
Note over A: Runs until wait()
A-->>RL: Yield (waiting on Future)
RL->>B: Resume Actor B
Note over B: Runs until wait()
B-->>RL: Yield (waiting on Future)
RL->>C: Resume Actor C
Note over C: Runs until wait()
C-->>RL: Yield (waiting on Future)
RL->>A: Future ready → Resume Actor A
Note over A: Runs until wait()
A-->>RL: Yield This model is similar to JavaScript's event loop or Python's asyncio—concurrency without parallelism within a single process.
How Flow Code Works¶
Flow's syntax uses ACTOR, Future<T>, and wait()—it looks like async/concurrent code, but the Flow compiler (actorcompiler) transpiles .actor.cpp files into state-machine-based C++. At runtime within a single process, only one actor is executing at any given moment.
Here is a typical Flow actor:
// Flow actor - enables deterministic concurrency testing
ACTOR Future<Void> fetchValue(Database db, Key key) {
state Transaction tr(db);
loop {
try {
Optional<Value> val = wait(tr.get(key));
if (val.present()) {
TraceEvent("GotValue").detail("Key", key).detail("Value", val.get());
}
return Void();
} catch (Error& e) {
wait(tr.onError(e));
}
}
}
Key elements of this code:
ACTORmarks a function as a cooperative task managed by the run loopFuture<T>represents an asynchronous result that will be available laterwait()yields control back to the run loop until theFutureis readystatevariables persist acrosswait()points (since the actor is compiled into a state machine)loop/try/catchprovide retry logic—idiomatic in FDB's transactional patterns
Advantages of Single-Threaded Design¶
The single-threaded cooperative model provides several critical benefits:
| Advantage | Why It Matters |
|---|---|
| No locks or mutexes | Data races are impossible within a process by construction |
| Simpler correctness reasoning | No need to reason about thread interleavings |
| Deterministic execution | The run loop processes events in a deterministic order, enabling simulation |
| Cache-friendly | A single thread provides good CPU cache locality |
How FDB Scales: Processes, Not Threads¶
FDB does not scale by adding threads to a process. Instead, it scales horizontally by running many single-threaded processes—potentially multiple per machine. Each process takes on one or more roles (storage server, TLog, commit proxy, etc.), and the cluster coordinator assigns roles to processes.
graph TB
subgraph "Machine 1"
P1["fdbserver<br/>(Storage)"]
P2["fdbserver<br/>(TLog)"]
end
subgraph "Machine 2"
P3["fdbserver<br/>(Commit Proxy)"]
P4["fdbserver<br/>(Storage)"]
end
subgraph "Machine 3"
P5["fdbserver<br/>(TLog)"]
P6["fdbserver<br/>(Resolver)"]
end
P1 <--> P3
P2 <--> P3
P4 <--> P5
P3 <--> P6 This is a scale-out model: add more processes or machines to increase capacity, rather than adding threads within a process.
Connection to Simulation¶
The single-threaded cooperative model is what makes deterministic simulation possible. Because execution is cooperative and single-threaded, the simulator can control the exact interleaving of actor execution. It replaces the real run loop with a deterministic one that uses seeded randomness—the same seed always reproduces the same bug because there is no OS thread scheduling non-determinism to contend with.
For a full treatment of simulation testing, see the Simulation Testing guide.
Core Design Principles¶
1. Unbundled Architecture¶
FDB separates concerns into independent subsystems:
graph TB
subgraph "Control Plane"
CC[Cluster Controller]
Coord[Coordinators]
end
subgraph "Transaction System"
Proxy[Proxies]
Resolver[Resolvers]
TLog[Transaction Logs]
end
subgraph "Storage System"
SS[Storage Servers]
end
CC --> Proxy
CC --> Resolver
CC --> TLog
CC --> SS
Coord --> CC 2. Simulation-First Development¶
Every feature is tested in simulation before production:
- Write the feature with Flow actors
- Run millions of simulated scenarios
- Inject every possible failure mode
- Only ship when simulation passes
3. Recovery Over Prevention¶
FDB embraces failure as inevitable:
- Design for fast recovery, not failure prevention
- Every component can be replaced quickly
- System continues operating during recovery
Learning Path¶
- Start with Simulation Testing - Understand FDB's secret weapon
- Read Architecture Deep Dive - Learn how transactions work
- Explore Storage Engines - Understand data persistence
Additional Resources¶
- FDB Design Documents - Internal design proposals
- FDB Source Code - The definitive reference
- Community Forums - Ask questions, discuss internals
Source Code Quick Links¶
| Component | Source File |
|---|---|
| Cluster Controller | ClusterController.actor.cpp |
| Commit Proxy | CommitProxyServer.actor.cpp |
| GRV Proxy | GrvProxyServer.actor.cpp |
| Resolver | Resolver.actor.cpp |
| Transaction Logs | TLogServer.actor.cpp |
| Storage Server | storageserver.actor.cpp |
| Data Distributor | DataDistribution.actor.cpp |
| Ratekeeper | Ratekeeper.actor.cpp |
| Coordinators | Coordination.actor.cpp |
| Master | masterserver.actor.cpp |