
Cluster Topology Estimator

Interactive sizing calculator for FoundationDB clusters. Describe your hardware and pick a workload shape; the page reports the storage / log / proxy / resolver / coordinator counts for that layout, the read and write workload it can sustain, and an fdbcli snippet you can adapt.

Starting point, not authoritative

These recommendations are derived from public benchmarks and operator reports. They are a starting point for capacity planning, not a guarantee. Validate against your own workload, and read the How sizing works section below to understand the limits of each formula.

Calculator

Hardware

Drives replication factor and coordinator count.
Number of physical hosts dedicated to class=storage.
One fdbserver process per core. Storage nodes typically run 4–16 processes.
Logical (pre-replication) dataset size.

Workload shape

Positions: max-read-throughput · balanced 90/10 (default) · max-write-throughput. The slider sets the SS : T-log ratio default (8, 12, or 20 depending on position) and the read/write split of the sustainable-workload estimate. Override the ratio input afterwards if you want.
Calibrate to your hardware. The Advanced defaults below come from Apple's published per-process benchmarks (16-byte keys, ≤100-byte values, SATA SSD). Real numbers vary 2–10× with disk class, fsync behaviour, key/value size, and CPU. Before trusting these for capacity planning, run a saturating single-process benchmark on your target hardware (one fdbserver process, one core, one disk) and replace the defaults with what you measure.
Advanced: calibration constants

Recommended topology

Apply this configuration

Live-updated fdbcli snippet derived from the recommendations above. Review it carefully before applying to a live cluster — configure new is destructive.
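For orientation, the generated snippet for a small triple-redundancy layout might look like the lines below. The counts and the ssd-2 storage engine are placeholders, not the calculator's output for your inputs; always prefer the live snippet above.

```
configure new triple ssd-2 logs=4 commit_proxies=3 grv_proxies=2 resolvers=1
coordinators auto
```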


How sizing works

The calculator above is driven by a small set of per-role rules. Each section below explains the rule, what limits it in practice, and when to add more.

Inputs that drive sizing

| Input | Drives |
| --- | --- |
| Storage machines × cores | Storage-process count (SS = machines × cores), and through that the sustainable read/write capacity that every other role is sized from |
| Total data size | Cluster capacity check, per-SS data load |
| Redundancy mode | Replication factor (storage and T-log), coordinator count, write amplification on the storage tier, total RAM |
| Workload shape (slider) | SS : T-log ratio default, and the read- vs write-credit fractions used to compute sustainable reads/writes |
| SS : T-log ratio | T-log process count (logs = ceil(SS / ratio)) |
| Failure domains | Validates coordinator placement for three_data_hall / three_datacenter |
| Advanced calibration constants | Per-process throughput baselines, headroom, max-data-per-SS, and the per-CP / per-GRV / per-resolver thresholds that consume the sustainable numbers |

Workload-shape slider

The slider has three positions. Each one sets a default SS : T-log ratio and a pair of credit fractions that decide how much of every storage process's per-second budget is spent on reads vs writes:

| Position | Default SS:T-log ratio | Read credit | Write credit | Use when |
| --- | --- | --- | --- | --- |
| max-read-throughput | 20 | 1.0 | 0.1 | Read-heavy serving, analytics, mostly point lookups; T-logs are barely used |
| balanced 90/10 (default) | 12 | 0.9 | 0.5 | OLTP-style mix dominated by reads with a steady write tail |
| max-write-throughput | 8 | 0.5 | 1.0 | Ingest, mutation-heavy pipelines; CP / T-logs need more headroom |

Moving the slider rewrites the SS : T-log ratio input with the preset's default. Type a new value into the ratio input afterwards if you want to override it.
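If it helps to see the presets side by side in code, here is a minimal TypeScript sketch (the names are illustrative, not the calculator's source); the values mirror the table above:

```typescript
// Workload-shape presets: default SS:T-log ratio plus read/write credit fractions.
type Preset = { ssToTlogRatio: number; readCredit: number; writeCredit: number };

const PRESETS: Record<string, Preset> = {
  "max-read-throughput":  { ssToTlogRatio: 20, readCredit: 1.0, writeCredit: 0.1 },
  "balanced 90/10":       { ssToTlogRatio: 12, readCredit: 0.9, writeCredit: 0.5 },
  "max-write-throughput": { ssToTlogRatio: 8,  readCredit: 0.5, writeCredit: 1.0 },
};
```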

Role data flow

flowchart LR
    Client([Clients])
    GRV[GRV proxies]
    CP[Commit proxies]
    Res[Resolvers]
    TL[Transaction logs]
    SS[Storage servers]
    Coord[Coordinators]

    Client -->|GetReadVersion| GRV
    GRV -->|read version| Client
    Client -->|reads| SS
    Client -->|commits| CP
    CP -->|conflict check| Res
    CP -->|commit batch| TL
    TL -->|persist mutations| SS
    Coord -.->|cluster file / leadership| GRV
    Coord -.-> CP
    Coord -.-> TL

Sustainable workload

The calculator works backwards from the storage tier. Once you tell it how many storage processes you have (SS = machines × cores), it computes the sustainable read and write rates the layout can support at the chosen workload-shape position. Every other role count is then derived from those two numbers, so the calculator closes the loop instead of asking you to estimate QPS up front.

Text Only
sustainable_reads  = floor(SS × per_proc_reads  × read_credit  / headroom)
sustainable_writes = floor(SS × per_proc_writes × write_credit / RF / headroom)

The read_credit and write_credit fractions come from the workload-shape slider (see table above). Writes are divided by the replication factor because every logical write fans out to RF storage processes — the per-SS write budget is shared by all replicas.

What limits it. Per-process throughput is bounded by disk IOPS, fsync latency, and CPU. The Advanced calibration constants (perProcReadsSsd, perProcWritesSsd, headroom) are the knobs that recalibrate this estimate to your hardware. Re-run a saturating single-process benchmark and update them before treating these numbers as authoritative.
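To make the formulas concrete, here is a worked TypeScript sketch for a hypothetical layout of 4 storage machines × 6 cores (SS = 24) at triple redundancy with the balanced 90/10 preset. The per-process baselines and the 1.5× headroom divisor below are placeholder calibration values, not the calculator's defaults; substitute what you measure:

```typescript
// Hypothetical inputs: 4 machines × 6 cores, triple redundancy, balanced 90/10 preset.
const SS = 4 * 6;               // 24 storage processes
const RF = 3;                   // replication factor (triple)
const readCredit = 0.9;         // balanced 90/10 preset
const writeCredit = 0.5;

// Placeholder calibration values; replace with a single-process benchmark on your hardware.
const perProcReads = 55_000;    // reads/s one SS process can serve
const perProcWrites = 20_000;   // writes/s one SS process can apply
const headroom = 1.5;           // divisor that leaves slack for recovery and data movement

const sustainableReads  = Math.floor(SS * perProcReads  * readCredit  / headroom);      // 792,000 reads/s
const sustainableWrites = Math.floor(SS * perProcWrites * writeCredit / RF / headroom); //  53,333 writes/s
```

The role-count examples further down reuse these two numbers.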

Storage servers

Rule of thumb. Storage processes come straight from the hardware inputs:

Text Only
storage_processes = machines × cores_per_machine

The calculator does not auto-grow this number. Instead it checks two soft ceilings and warns when the layout cannot hold the data:

Text Only
cluster_capacity = SS × max_data_per_SS / RF       # logical bytes the layout can hold
data_per_SS      = data_GB × RF / SS               # actual replicated load per process

You'll see a warning if data_GB × RF > SS × max_data_per_SS (not enough storage processes for the dataset at this replication factor) or if data_per_SS > max_data_per_SS (per-SS load above the recovery / data-distribution soft target). Both conditions can fire independently.
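Continuing the hypothetical 24-process triple-redundancy layout, here is a sketch of the two checks with a 2 TB logical dataset; the 500 GB max-data-per-SS soft target is a placeholder, not the calculator's default:

```typescript
// Hypothetical: SS = 24, RF = 3, 2,000 GB logical data, 500 GB soft target per SS.
const SS = 24, RF = 3, dataGB = 2_000, maxDataPerSS = 500;

const clusterCapacityGB = (SS * maxDataPerSS) / RF; // 4,000 GB of logical data the layout can hold
const dataPerSS = (dataGB * RF) / SS;               // 250 GB of replicated data per storage process

if (dataGB * RF > SS * maxDataPerSS) console.warn("not enough storage processes for this dataset");
if (dataPerSS > maxDataPerSS)        console.warn("per-SS load above the recovery soft target");
```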

What limits it. Disk IOPS, fsync latency, and CPU on each SS process. SSDs hit fsync ceilings before CPU does on write-heavy workloads.

When to add more. Sustained data_lag_seconds > 5, storage_queue consistently > 100 MB, or data_distribution_queue_length > 0 for long stretches. Bump machines or cores_per_machine until the data-per-SS warning clears and the sustainable-workload card matches your target traffic.

1–3 SSes per disk

Modern NVMe can host 2 SSes per disk; high-IOPS network block storage 1–2; low-IOPS block storage 1. Each SS needs its own disk path so writes don't queue against each other.

Transaction logs

Rule of thumb. One T-log per host on dedicated class=transaction nodes. The workload-shape slider sets the SS : T-log ratio default — 20 for max-read-throughput, 12 for balanced 90/10, 8 for max-write-throughput. Override the ratio input afterwards if your write profile sits between two presets.

Community-reported ratios cluster around 8:1 to 10:1 storage-to-T-log (semisol) for general workloads, with read-heavy deployments going as wide as 20:1 to free more cores for storage. The slider picks within that range based on your workload shape.

Text Only
logs = max(3, ceil(storage_processes / SS_to_TLog_ratio))
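For the hypothetical 24-process layout at the balanced preset (ratio 12), the three-log floor is what actually sets the count:

```typescript
const logs = Math.max(3, Math.ceil(24 / 12)); // ceil gives 2, so the minimum of 3 T-logs applies
```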

What limits it. Group-commit fsync latency on the T-log disk. T-logs are disk-IOPS bound, not capacity bound. Co-locating two T-logs on the same disk halves throughput; keep one T-log per disk and one per host.

When to add more. commit_latency rises and tlog_queue_size climbs while disk fsync latency is healthy → fan-in is the bottleneck. Move the slider toward max-write-throughput (or lower the ratio manually) to widen the T-log tier.

Diminishing returns past 15 T-logs

Operators report negligible throughput gains beyond ~15 T-logs in a single log set. Past that point, evaluate sharding the workload across multiple clusters before adding more T-logs.

Commit proxies

Rule of thumb. Default to 3 commit proxies (FDB's default). The calculator scales by the sustainable write rate the storage tier supports — not by a user-supplied write QPS:

Text Only
commit_proxies = max(3, ceil(sustainable_writes / 50000))
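With the hypothetical sustainable write rate from the earlier sketch (53,333 writes/s):

```typescript
const commitProxies = Math.max(3, Math.ceil(53_333 / 50_000)); // ceil gives 2; the default floor of 3 wins
```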

What limits it. Commit-batch CPU and network bandwidth on each proxy. Each commit proxy adds latency overhead even when idle, so don't over-provision.

When to add more. commit_latency rises while tlog_queue_size is healthy and resolver_queue is short — the proxies themselves are the bottleneck. The slider's max-write-throughput position raises sustainable_writes and therefore the recommended CP count.

GRV proxies

Rule of thumb. Default 1 proxy. Cluster-wide GRV throughput saturates around 400K/s, so adding more than ~3–4 GRV proxies rarely helps. The count scales with the sustainable read rate:

Text Only
grv_proxies = max(1, ceil(sustainable_reads / 200000))
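With the hypothetical sustainable read rate from the earlier sketch (792,000 reads/s):

```typescript
const grvProxies = Math.max(1, Math.ceil(792_000 / 200_000)); // ceil(3.96) = 4 GRV proxies
```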

What limits it. Single-master getReadVersion serialization. Adding GRV proxies parallelises the GRV path but the underlying master still has to mint version numbers serially.

When to add more. read_version_batch_size is high or clients see GRV latency growing while master CPU is moderate. The slider's max-read-throughput position raises sustainable_reads and therefore the recommended GRV count.

Resolvers

Rule of thumb. Default 1. The community guidance is "have one, that is it" unless profiling proves otherwise. The calculator scales with the sustainable write rate:

Text Only
resolvers = max(1, ceil(sustainable_writes / 80000))
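At the hypothetical 53,333 sustainable writes/s, the default single resolver is enough:

```typescript
const resolvers = Math.max(1, Math.ceil(53_333 / 80_000)); // ceil(0.67) = 1 resolver
```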

What limits it. Conflict-set memory and CPU. Each resolver owns a key range; more resolvers can increase false conflicts because keys from one transaction can hash across multiple resolvers and trigger spurious aborts.

When to add more. resolver_queue is consistently long and CPU on the single resolver is pegged. Rarely needs more than 4.

Coordinators

Rule of thumb. Use an odd number of coordinators in distinct failure domains (racks, AZs, datacenters):

| Redundancy | Coordinators |
| --- | --- |
| single | 1 |
| double | 3 |
| triple | 5 |
| three_data_hall | 9 |
| three_datacenter | 9 |

Coordinators don't sit on the hot path — their latency does not affect normal transactions. Their job is to maintain a Paxos-like quorum on the cluster's coordination state. Set the failure-domains count in Advanced to the number of distinct racks / AZs / data halls you have; the calculator warns when three_data_hall or three_datacenter is selected and the failure-domain count is below the recommended coordinator count.
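A minimal sketch of the lookup and the failure-domain warning, assuming a numeric failureDomains input (the names are illustrative, not the calculator's source):

```typescript
// Coordinator count per redundancy mode, from the table above.
const COORDINATORS: Record<string, number> = {
  single: 1, double: 3, triple: 5, three_data_hall: 9, three_datacenter: 9,
};

function coordinatorWarning(mode: string, failureDomains: number): string | null {
  const recommended = COORDINATORS[mode];
  const needsSpread = mode === "three_data_hall" || mode === "three_datacenter";
  if (needsSpread && failureDomains < recommended) {
    return `only ${failureDomains} failure domains for ${recommended} recommended coordinators`;
  }
  return null;
}
```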

Stateless processes

Reserve a small pool of class=stateless processes for the cluster controller, master, commit proxies, GRV proxies, resolvers, plus a couple of standby slots and any log routers you run. The kubernetes-operator scaling guide recommends:

Text Only
stateless_slots ≈ commit_proxies + grv_proxies + resolvers + 4

The + 4 covers cluster controller, master, and two warm standbys.
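Plugging in the hypothetical role counts derived above (3 commit proxies, 4 GRV proxies, 1 resolver):

```typescript
const statelessSlots = 3 + 4 + 1 + 4; // 12 class=stateless process slots
```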

Machines

The calculator splits machines into three classes:

| Class | Count |
| --- | --- |
| storage | machines (input — cores_per_machine SSes per host) |
| transaction (one T-log per host) | = logs |
| stateless (≤ 8 procs per host) | ceil((cp + grv + res + 4) / 8) |

Total cluster RAM is summed across every process: (SS + logs + cp + grv + res + 4) × ram_per_process × (1 + ram_headroom). The default per-process RAM budget is 4 GB (per foundationdb.conf defaults), with 25% headroom for OS, page cache, and byte-sample memory inside the storage process.
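For the running hypothetical (24 SS, 3 T-logs, 3 commit proxies, 4 GRV proxies, 1 resolver), using the 4 GB per-process budget and 25% headroom above:

```typescript
const storageMachines = 4;                                 // input
const transactionMachines = 3;                             // = logs, one T-log per host
const statelessMachines = Math.ceil((3 + 4 + 1 + 4) / 8);  // ceil(12 / 8) = 2 hosts

const processes = 24 + 3 + 3 + 4 + 1 + 4;                  // 39 fdbserver processes
const totalRamGB = processes * 4 * (1 + 0.25);             // 195 GB of cluster RAM
```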

Replication factor implications

| Redundancy mode | Replication factor (storage) | T-log replication | Coordinators | Notes |
| --- | --- | --- | --- | --- |
| single | 1 | 1 | 1 | Dev only — no fault tolerance. |
| double | 2 | 2 | 3 | Single-machine failure. |
| triple | 3 | 3 | 5 | Default for production. |
| three_data_hall | 3 (across 3 data halls) | 4 | 9 | Survives loss of one data hall. |
| three_datacenter | 6 (3 × 2 sites) | 6 | 9 | Highest cost, multi-region. |

Replication multiplies storage cost (bytes-on-disk and SS RAM) and divides the write tier's per-process budget by RF — every logical write fans out to RF storage processes. The sustainable-writes formula already accounts for this division.

Process-class layout patterns

Layouts borrowed from operator playbooks (see semisol's blog and the kubernetes-operator scaling guide):

| Hosts | Roles per host |
| --- | --- |
| 3 | 1 × stateless + 1 × log + 2 × storage, all class=unset |

Use single or double redundancy. One process per role, all on the same machine class.

| Hosts | Roles per host |
| --- | --- |
| 3 transaction | 1 × T-log, class=transaction |
| 3 stateless | cluster controller / master / proxies / resolver, class=stateless |
| N storage | up to 8 × SS, class=storage |

Triple redundancy. Coordinators co-locate on stateless or transaction machines in distinct failure domains.

| Hosts | Roles per host |
| --- | --- |
| 5–15 transaction | 1 × T-log per host |
| 3+ stateless | dedicated, scaled with proxy count |
| many storage | one disk = 1–2 SS, ≤ 8 SS per host |

Triple or three_data_hall. Add transaction-class hosts before commit proxies — most "throughput is too low" issues are T-log fan-in, not proxy count.

Calculator vs reality

The calculator scales role counts off a few simple per-process throughput heuristics. Real-world commit-proxy scaling is often bound by per-CP commit-batch CPU well below the 50K-writes-per-CP rule of thumb, so treat the recommended commit-proxy count as a floor: profile the cluster and raise commit_proxies as commit_latency warrants. Likewise, the sustainable-workload numbers are a ceiling at the listed per-process baselines and slider position; calibrate perProcReadsSsd / perProcWritesSsd in Advanced against a saturating single-process benchmark on your hardware before treating them as authoritative.

Sources