
Cluster Topology Estimator

Interactive sizing calculator for FoundationDB clusters. Describe your hardware and pick a workload shape; the page reports the storage / log / proxy / resolver / coordinator counts for that layout, the read and write workload it can sustain, and an fdbcli snippet you can adapt.

Starting point, not authoritative

These recommendations are derived from public benchmarks and operator reports. They are a starting point for capacity planning, not a guarantee. Validate against your own workload, and read the How sizing works section below to understand the limits of each formula.

Calculator

Hardware

Drives replication factor and coordinator count.
Number of physical hosts dedicated to class=storage.
One fdbserver process per core. Storage nodes typically run 4–16 processes.
Logical (pre-replication) dataset size.

Workload shape

Positions: max-read-throughput · balanced 90/10 (default) · max-write-throughput. The slider sets the SS : T-log ratio default (8, 12, or 20 depending on position) and the read/write split of the sustainable-workload estimate. Override the ratio input afterwards if you want.
Calibrate to your hardware. The Advanced defaults below come from Apple's published per-process benchmarks (16-byte keys, ≤100-byte values, SATA SSD). Real numbers vary 2–10× with disk class, fsync behaviour, key/value size, and CPU. Before trusting these for capacity planning, run a saturating single-process benchmark on your target hardware (one fdbserver process, one core, one disk) and replace the defaults with what you measure.
Advanced: calibration constants

Recommended topology

Apply this configuration

Live-updated fdbcli snippet derived from the recommendations above. Review it carefully before applying to a live cluster — configure new is destructive.
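For orientation, the generated snippet for a small triple-redundancy layout might look like the lines below. The counts and the ssd-2 storage engine are placeholders, not the calculator's output for your inputs; always prefer the live snippet above.

```
configure new triple ssd-2 logs=4 commit_proxies=3 grv_proxies=2 resolvers=1
coordinators auto
```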


How sizing works

The calculator above is driven by a small set of per-role rules. Each section below explains the rule, what limits it in practice, and when to add more.

Inputs that drive sizing

| Input | Drives |
| --- | --- |
| Storage machines × cores | Storage-process count (SS = machines × cores), and through that the sustainable read/write capacity that every other role is sized from |
| Total data size | Cluster capacity check, per-SS data load |
| Redundancy mode | Replication factor (storage and T-log), coordinator count, write amplification on the storage tier, total RAM |
| Workload shape (slider) | SS : T-log ratio default, and the read- vs write-credit fractions used to compute sustainable reads/writes |
| SS : T-log ratio | T-log process count (logs = ceil(SS / ratio)) |
| Failure domains | Validates coordinator placement for three_data_hall / three_datacenter |
| Advanced calibration constants | Per-process throughput baselines, headroom, max-data-per-SS, and the per-CP / per-GRV / per-resolver thresholds that consume the sustainable numbers |

Workload-shape slider

The slider has three positions. Each one sets a default SS : T-log ratio and a pair of credit fractions that decide how much of every storage process's per-second budget is spent on reads vs writes:

| Position | Default SS:T-log ratio | Read credit | Write credit | Use when |
| --- | --- | --- | --- | --- |
| max-read-throughput | 20 | 1.0 | 0.1 | Read-heavy serving, analytics, mostly point lookups; T-logs are barely used |
| balanced 90/10 (default) | 12 | 0.9 | 0.5 | OLTP-style mix dominated by reads with a steady write tail |
| max-write-throughput | 8 | 0.5 | 1.0 | Ingest, mutation-heavy pipelines; CP / T-logs need more headroom |

Moving the slider rewrites the SS : T-log ratio input with the preset's default. Type a new value into the ratio input afterwards if you want to override it.
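If it helps to see the presets side by side in code, here is a minimal TypeScript sketch (the names are illustrative, not the calculator's source); the values mirror the table above:

```typescript
// Workload-shape presets: default SS:T-log ratio plus read/write credit fractions.
type Preset = { ssToTlogRatio: number; readCredit: number; writeCredit: number };

const PRESETS: Record<string, Preset> = {
  "max-read-throughput":  { ssToTlogRatio: 20, readCredit: 1.0, writeCredit: 0.1 },
  "balanced 90/10":       { ssToTlogRatio: 12, readCredit: 0.9, writeCredit: 0.5 },
  "max-write-throughput": { ssToTlogRatio: 8,  readCredit: 0.5, writeCredit: 1.0 },
};
```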

Role data flow

flowchart LR
    Client([Clients])
    GRV[GRV proxies]
    CP[Commit proxies]
    Res[Resolvers]
    TL[Transaction logs]
    SS[Storage servers]
    Coord[Coordinators]

    Client -->|GetReadVersion| GRV
    GRV -->|read version| Client
    Client -->|reads| SS
    Client -->|commits| CP
    CP -->|conflict check| Res
    CP -->|commit batch| TL
    TL -->|persist mutations| SS
    Coord -.->|cluster file / leadership| GRV
    Coord -.-> CP
    Coord -.-> TL

Sustainable workload

The calculator works backwards from the storage tier. Once you tell it how many storage processes you have (SS = machines × cores), it computes the sustainable read and write rates the layout can support at the chosen workload-shape position. Every other role count is then derived from those two numbers, so the calculator closes the loop instead of asking you to estimate QPS up front.

Text Only
sustainable_reads  = floor(SS × per_proc_reads  × read_credit  / headroom)
sustainable_writes = floor(SS × per_proc_writes × write_credit / RF / headroom)

The read_credit and write_credit fractions come from the workload-shape slider (see table above). Writes are divided by the replication factor because every logical write fans out to RF storage processes — the per-SS write budget is shared by all replicas.

What limits it. Per-process throughput is bounded by disk IOPS, fsync latency, and CPU. The Advanced calibration constants (perProcReadsSsd, perProcWritesSsd, headroom) are the knobs that recalibrate this estimate to your hardware. Re-run a saturating single-process benchmark and update them before treating these numbers as authoritative.
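To make the formulas concrete, here is a worked TypeScript sketch for a hypothetical layout of 4 storage machines × 6 cores (SS = 24) at triple redundancy with the balanced 90/10 preset. The per-process baselines and the 1.5× headroom divisor below are placeholder calibration values, not the calculator's defaults; substitute what you measure:

```typescript
// Hypothetical inputs: 4 machines × 6 cores, triple redundancy, balanced 90/10 preset.
const SS = 4 * 6;               // 24 storage processes
const RF = 3;                   // replication factor (triple)
const readCredit = 0.9;         // balanced 90/10 preset
const writeCredit = 0.5;

// Placeholder calibration values; replace with a single-process benchmark on your hardware.
const perProcReads = 55_000;    // reads/s one SS process can serve
const perProcWrites = 20_000;   // writes/s one SS process can apply
const headroom = 1.5;           // divisor that leaves slack for recovery and data movement

const sustainableReads  = Math.floor(SS * perProcReads  * readCredit  / headroom);      // 792,000 reads/s
const sustainableWrites = Math.floor(SS * perProcWrites * writeCredit / RF / headroom); //  53,333 writes/s
```

The role-count examples further down reuse these two numbers.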

Storage servers

Rule of thumb. Storage processes come straight from the hardware inputs:

Text Only
storage_processes = machines × cores_per_machine

The calculator does not auto-grow this number. Instead it checks two soft ceilings and warns when the layout cannot hold the data:

Text Only
cluster_capacity = SS × max_data_per_SS / RF       # logical bytes the layout can hold
data_per_SS      = data_GB × RF / SS               # actual replicated load per process

You'll see a warning if data_GB × RF > SS × max_data_per_SS (not enough storage processes for the dataset at this replication factor) or if data_per_SS > max_data_per_SS (per-SS load above the recovery / data-distribution soft target). Both conditions can fire independently.
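Continuing the hypothetical 24-process triple-redundancy layout, here is a sketch of the two checks with a 2 TB logical dataset; the 500 GB max-data-per-SS soft target is a placeholder, not the calculator's default:

```typescript
// Hypothetical: SS = 24, RF = 3, 2,000 GB logical data, 500 GB soft target per SS.
const SS = 24, RF = 3, dataGB = 2_000, maxDataPerSS = 500;

const clusterCapacityGB = (SS * maxDataPerSS) / RF; // 4,000 GB of logical data the layout can hold
const dataPerSS = (dataGB * RF) / SS;               // 250 GB of replicated data per storage process

if (dataGB * RF > SS * maxDataPerSS) console.warn("not enough storage processes for this dataset");
if (dataPerSS > maxDataPerSS)        console.warn("per-SS load above the recovery soft target");
```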

What limits it. Disk IOPS, fsync latency, and CPU on each SS process. SSDs hit fsync ceilings before CPU does on write-heavy workloads.

When to add more. Sustained data_lag_seconds > 5, storage_queue consistently > 100 MB, or data_distribution_queue_length > 0 for long stretches. Bump machines or cores_per_machine until the data-per-SS warning clears and the sustainable-workload card matches your target traffic.

1–3 SSes per disk

Modern NVMe can host 2 SSes per disk; high-IOPS network block storage 1–2; low-IOPS block storage 1. Each SS needs its own disk path so writes don't queue against each other.

Transaction logs

Rule of thumb. One T-log per host on dedicated class=transaction nodes. The workload-shape slider sets the SS : T-log ratio default — 20 for max-read-throughput, 12 for balanced 90/10, 8 for max-write-throughput. Override the ratio input afterwards if your write profile sits between two presets.

Community-reported ratios cluster around 8:1 to 10:1 storage-to-T-log (semisol) for general workloads, with read-heavy deployments going as wide as 20:1 to free more cores for storage. The slider picks within that range based on your workload shape.

Text Only
logs = max(3, ceil(storage_processes / SS_to_TLog_ratio))
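For the hypothetical 24-process layout at the balanced preset (ratio 12), the three-log floor is what actually sets the count:

```typescript
const logs = Math.max(3, Math.ceil(24 / 12)); // ceil gives 2, so the minimum of 3 T-logs applies
```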

What limits it. Group-commit fsync latency on the T-log disk. T-logs are disk-IOPS bound, not capacity bound. Co-locating two T-logs on the same disk halves throughput; keep one T-log per disk and one per host.

When to add more. commit_latency rises and tlog_queue_size climbs while disk fsync latency is healthy → fan-in is the bottleneck. Move the slider toward max-write-throughput (or lower the ratio manually) to widen the T-log tier.

Diminishing returns past 15 T-logs

Operators report negligible throughput gains beyond ~15 T-logs in a single log set. Past that point, evaluate sharding the workload across multiple clusters before adding more T-logs.

Commit proxies

Rule of thumb. Default to 3 commit proxies (FDB's default). The calculator scales by the sustainable write rate the storage tier supports — not by a user-supplied write QPS:

Text Only
commit_proxies = max(3, ceil(sustainable_writes / 50000))
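With the hypothetical sustainable write rate from the earlier sketch (53,333 writes/s):

```typescript
const commitProxies = Math.max(3, Math.ceil(53_333 / 50_000)); // ceil gives 2; the default floor of 3 wins
```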

What limits it. Commit-batch CPU and network bandwidth on each proxy. Each commit proxy adds latency overhead even when idle, so don't over-provision.

When to add more. commit_latency rises while tlog_queue_size is healthy and resolver_queue is short — the proxies themselves are the bottleneck. The slider's max-write-throughput position raises sustainable_writes and therefore the recommended CP count.

GRV proxies

Rule of thumb. Default 1 proxy. Cluster-wide GRV throughput saturates around 400K/s, so adding more than ~3–4 GRV proxies rarely helps. The count scales with the sustainable read rate:

Text Only
grv_proxies = max(1, ceil(sustainable_reads / 200000))
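With the hypothetical sustainable read rate from the earlier sketch (792,000 reads/s):

```typescript
const grvProxies = Math.max(1, Math.ceil(792_000 / 200_000)); // ceil(3.96) = 4 GRV proxies
```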

What limits it. Single-master getReadVersion serialization. Adding GRV proxies parallelises the GRV path but the underlying master still has to mint version numbers serially.

When to add more. read_version_batch_size is high or clients see GRV latency growing while master CPU is moderate. The slider's max-read-throughput position raises sustainable_reads and therefore the recommended GRV count.

Resolvers

Rule of thumb. Default 1. The community guidance is "have one, that is it" unless profiling proves otherwise. The calculator scales with the sustainable write rate:

Text Only
resolvers = max(1, ceil(sustainable_writes / 80000))
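At the hypothetical 53,333 sustainable writes/s, the default single resolver is enough:

```typescript
const resolvers = Math.max(1, Math.ceil(53_333 / 80_000)); // ceil(0.67) = 1 resolver
```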

What limits it. Conflict-set memory and CPU. Each resolver owns a key range; more resolvers can increase false conflicts because keys from one transaction can hash across multiple resolvers and trigger spurious aborts.

When to add more. resolver_queue is consistently long and CPU on the single resolver is pegged. Rarely needs more than 4.

Coordinators

Rule of thumb. Use an odd number of coordinators in distinct failure domains (racks, AZs, datacenters):

| Redundancy | Coordinators |
| --- | --- |
| single | 1 |
| double | 3 |
| triple | 5 |
| three_data_hall | 9 |
| three_datacenter | 9 |

Coordinators don't sit on the hot path — their latency does not affect normal transactions. Their job is to maintain a Paxos-like quorum on the cluster's coordination state. Set the failure-domains count in Advanced to the number of distinct racks / AZs / data halls you have; the calculator warns when three_data_hall or three_datacenter is selected and the failure-domain count is below the recommended coordinator count.
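A minimal sketch of the lookup and the failure-domain warning, assuming a numeric failureDomains input (the names are illustrative, not the calculator's source):

```typescript
// Coordinator count per redundancy mode, from the table above.
const COORDINATORS: Record<string, number> = {
  single: 1, double: 3, triple: 5, three_data_hall: 9, three_datacenter: 9,
};

function coordinatorWarning(mode: string, failureDomains: number): string | null {
  const recommended = COORDINATORS[mode];
  const needsSpread = mode === "three_data_hall" || mode === "three_datacenter";
  if (needsSpread && failureDomains < recommended) {
    return `only ${failureDomains} failure domains for ${recommended} recommended coordinators`;
  }
  return null;
}
```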

Stateless processes

Reserve a small pool of class=stateless processes for the cluster controller, master, commit proxies, GRV proxies, resolvers, plus a couple of standby slots and any log routers you run. The kubernetes-operator scaling guide recommends:

Text Only
stateless_slots ≈ commit_proxies + grv_proxies + resolvers + 4

The + 4 covers cluster controller, master, and two warm standbys.
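Plugging in the hypothetical role counts derived above (3 commit proxies, 4 GRV proxies, 1 resolver):

```typescript
const statelessSlots = 3 + 4 + 1 + 4; // 12 class=stateless process slots
```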

Machines

The calculator splits machines into three classes:

| Class | Count |
| --- | --- |
| storage | machines (input — cores_per_machine SSes per host) |
| transaction (one T-log per host) | = logs |
| stateless (≤ 8 procs per host) | ceil((cp + grv + res + 4) / 8) |

Total cluster RAM is summed across every process: (SS + logs + cp + grv + res + 4) × ram_per_process × (1 + ram_headroom). The default per-process RAM budget is 4 GB (per foundationdb.conf defaults), with 25% headroom for OS, page cache, and byte-sample memory inside the storage process.
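For the running hypothetical (24 SS, 3 T-logs, 3 commit proxies, 4 GRV proxies, 1 resolver), using the 4 GB per-process budget and 25% headroom above:

```typescript
const storageMachines = 4;                                 // input
const transactionMachines = 3;                             // = logs, one T-log per host
const statelessMachines = Math.ceil((3 + 4 + 1 + 4) / 8);  // ceil(12 / 8) = 2 hosts

const processes = 24 + 3 + 3 + 4 + 1 + 4;                  // 39 fdbserver processes
const totalRamGB = processes * 4 * (1 + 0.25);             // 195 GB of cluster RAM
```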

Replication factor implications

| Redundancy mode | Replication factor (storage) | T-log replication | Coordinators | Notes |
| --- | --- | --- | --- | --- |
| single | 1 | 1 | 1 | Dev only — no fault tolerance. |
| double | 2 | 2 | 3 | Single-machine failure. |
| triple | 3 | 3 | 5 | Default for production. |
| three_data_hall | 3 (across 3 data halls) | 4 | 9 | Survives loss of one data hall. |
| three_datacenter | 6 (3 × 2 sites) | 6 | 9 | Highest cost, multi-region. |

Replication multiplies storage cost (bytes-on-disk and SS RAM) and divides the write tier's per-process budget by RF — every logical write fans out to RF storage processes. The sustainable-writes formula already accounts for this division.

Process-class layout patterns

Layouts borrowed from operator playbooks (see semisol's blog and the kubernetes-operator scaling guide):

| Hosts | Roles per host |
| --- | --- |
| 3 | 1 × stateless + 1 × log + 2 × storage, all class=unset |

Use single or double redundancy. One process per role, all on the same machine class.

| Hosts | Roles per host |
| --- | --- |
| 3 transaction | 1 × T-log, class=transaction |
| 3 stateless | cluster controller / master / proxies / resolver, class=stateless |
| N storage | up to 8 × SS, class=storage |

Triple redundancy. Coordinators co-locate on stateless or transaction machines in distinct failure domains.

| Hosts | Roles per host |
| --- | --- |
| 5–15 transaction | 1 × T-log per host |
| 3+ stateless | dedicated, scaled with proxy count |
| many storage | one disk = 1–2 SS, ≤ 8 SS per host |

Triple or three_data_hall. Add transaction-class hosts before commit proxies — most "throughput is too low" issues are T-log fan-in, not proxy count.

Calculator vs reality

The calculator scales role counts off a few simple per-process throughput heuristics. Real-world commit-proxy scaling is often bound by per-CP commit-batch CPU well below the 50K-writes-per-CP rule of thumb, so treat the recommended commit-proxy count as a floor: profile the cluster and raise commit_proxies as commit_latency warrants. Likewise, the sustainable-workload numbers are a ceiling at the listed per-process baselines and slider position; calibrate perProcReadsSsd / perProcWritesSsd in Advanced against a saturating single-process benchmark on your hardware before treating them as authoritative.

Sources