Audit Storage (new in 7.4)¶
Version Notice
This feature is only available in FoundationDB 7.4 and later. You are viewing docs for version 7.3.
Audit Storage validates the consistency of data replicas and location metadata in your FoundationDB cluster. It provides end-to-end verification that all copies of your data match and that metadata is consistent.
Overview¶
Audit Storage checks three types of consistency:
| Audit Type | What It Checks |
|---|---|
| replica | Data consistency between replicas across all DCs |
| locationmetadata | Consistency between KeyServer and ServerKey metadata |
| ssshard | Consistency between ServerKey and storage server shard mappings |
Key Features¶
- End-to-end completeness - Persists progress; continues until all ranges are verified
- Scalable - Near-linear speedup with parallelism (configurable via CONCURRENT_AUDIT_TASK_COUNT_MAX)
- Fault tolerant - Automatically retries failed checks
- Progress monitoring - CLI commands to track job status
- No additional setup - Uses existing DD and SS infrastructure
Commands¶
Start an Audit¶
# Check replica consistency
fdbcli> audit_storage replica "" \xff\xff
# Check location metadata
fdbcli> audit_storage locationmetadata "" \xff\xff
# Check SS shard mappings
fdbcli> audit_storage ssshard "" \xff\xff
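For scripted use, the commands above can be issued non-interactively. The following is a minimal sketch, assuming `fdbcli` is on the `PATH` and is invoked with its `--exec` option (and `-C` for a cluster file). The helper names `build_audit_command` and `run_audit` are hypothetical, and the key arguments are passed through verbatim in the form shown above.

```python
import subprocess

# Audit types listed in the Overview table.
AUDIT_TYPES = ("replica", "locationmetadata", "ssshard")


def build_audit_command(audit_type, begin='""', end=r"\xff\xff"):
    """Return the fdbcli command string that starts an audit (hypothetical helper)."""
    if audit_type not in AUDIT_TYPES:
        raise ValueError(f"unknown audit type: {audit_type!r}")
    return f"audit_storage {audit_type} {begin} {end}"


def run_audit(audit_type, cluster_file=None):
    """Run the audit command through fdbcli and return its raw output.

    Assumes fdbcli is installed; the output format is not parsed here.
    """
    argv = ["fdbcli"]
    if cluster_file:
        argv += ["-C", cluster_file]
    argv += ["--exec", build_audit_command(audit_type)]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping command construction separate from execution makes the string easy to log or review before anything is sent to the cluster.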
Check Status¶
# List recent jobs
fdbcli> get_audit_status replica recent
# Check specific job progress
fdbcli> get_audit_status replica progress <AUDIT_ID>
Cancel an Audit¶
Audit Types¶
Replica Consistency (replica)¶
Verifies that all replicas of each key-value pair are identical:
- Compares data between storage servers across all data centers
- Uses shard-based partitioning for efficient parallel checking
- Generates SSAuditStorageShardReplicaError trace events on mismatch
Location Metadata (locationmetadata)¶
Validates consistency between system metadata:
- Checks KeyServer ↔ ServerKey mappings
- Ensures ranges are assigned to correct servers
- Generates DDDoAuditLocationMetadataError on mismatch
Note
A location metadata audit always checks the entire key space, regardless of the range specified.
SS Shard Mappings (ssshard)¶
Verifies storage server local state matches system metadata:
- Compares ServerKeys with SS in-memory shard information
- Checks each storage server individually
- Generates SSAuditStorageSsShardError on mismatch
Monitoring Progress¶
CLI Status¶
fdbcli> get_audit_status replica progress <AUDIT_ID>
Audit ID: 12345678...
Type: replica
Range: ["", "\xff")
Phase: Running
Submitted: 42 tasks
Completed: 38 tasks
Error: 0 tasks
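To track progress from a script, the status text can be reduced to a completion fraction. This is a minimal sketch that assumes the output has the `Key: value` layout of the sample above; the real output may differ between versions, and the function names are hypothetical.

```python
import re

# Sample text copied from the status output above (raw string keeps \xff literal).
SAMPLE = r"""Audit ID: 12345678...
Type: replica
Range: ["", "\xff")
Phase: Running
Submitted: 42 tasks
Completed: 38 tasks
Error: 0 tasks"""


def parse_progress(text):
    """Split each 'Key: value' line at the first colon into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def task_count(value):
    """Extract the leading integer from a value such as '42 tasks'."""
    match = re.match(r"(\d+)", value)
    return int(match.group(1)) if match else 0


def fraction_complete(fields):
    """Return completed / submitted, or 0.0 if nothing was submitted."""
    submitted = task_count(fields.get("Submitted", ""))
    completed = task_count(fields.get("Completed", ""))
    return completed / submitted if submitted else 0.0


if __name__ == "__main__":
    fields = parse_progress(SAMPLE)
    print(fields["Phase"], f"{fraction_complete(fields):.0%}")
```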
Trace Events¶
Monitor these trace events for audit activity:
| Event | Description |
|---|---|
| AuditStorageStart | Audit job started |
| AuditStorageComplete | Audit job finished |
| SSAuditStorageShardReplicaError | Replica inconsistency detected |
| DDDoAuditLocationMetadataError | Metadata inconsistency detected |
| SSAuditStorageSsShardError | Shard mapping inconsistency detected |
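The error events in this table can be counted with a small log scanner. The sketch below assumes each trace line carries the event name in a `Type` attribute (XML-style traces) or a `"Type"` field (JSON-style traces); adjust the pattern if your trace format differs. The function name `count_audit_errors` is hypothetical.

```python
import re
from collections import Counter

# Error events taken from the table above.
AUDIT_ERROR_EVENTS = {
    "SSAuditStorageShardReplicaError",
    "DDDoAuditLocationMetadataError",
    "SSAuditStorageSsShardError",
}

# Matches Type="Name" (XML-style) or "Type": "Name" (JSON-style).
# The word boundary avoids matching attributes that merely end in "Type".
TYPE_PATTERN = re.compile(r'\bType"?\s*[=:]\s*"([^"]+)"')


def count_audit_errors(lines):
    """Count audit error events by type across an iterable of trace lines."""
    counts = Counter()
    for line in lines:
        match = TYPE_PATTERN.search(line)
        if match and match.group(1) in AUDIT_ERROR_EVENTS:
            counts[match.group(1)] += 1
    return counts
```

A non-empty result is a reasonable trigger for an alert. Passing an open file object as `lines` avoids loading a large trace file into memory.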
Progress Persistence¶
Audit progress is stored in system metadata:
- Replica/location metadata: \xff/auditRanges/
- SS shard checking: \xff/auditServers/

This enables:
- Resume after failures without re-checking completed ranges
- Accurate progress tracking
- Efficient resource utilization
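The idea behind resuming without repeated work can be illustrated with a small range computation. This is a conceptual sketch only, not FoundationDB's implementation: it assumes completed work is recorded as half-open `(begin, end)` byte ranges, and the function name `remaining_ranges` is hypothetical.

```python
def remaining_ranges(begin, end, completed):
    """Return the sub-ranges of [begin, end) not covered by any completed range.

    `completed` is an iterable of half-open (begin, end) byte-string pairs.
    """
    gaps = []
    cursor = begin  # everything before cursor is known to be covered
    for c_begin, c_end in sorted(completed):
        if c_end <= cursor:
            continue  # range lies entirely before the cursor
        if c_begin >= end:
            break  # range lies entirely after the audited range
        if c_begin > cursor:
            gaps.append((cursor, c_begin))  # uncovered gap found
        cursor = max(cursor, c_end)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


if __name__ == "__main__":
    done = [(b"a", b"c"), (b"", b"a"), (b"m", b"p")]
    print(remaining_ranges(b"", b"\xff", done))
```

After a restart, only the returned gaps would need to be checked again, which is what makes persisted progress cheaper than starting over.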
Comparison with Consistency Checker Urgent¶
| Feature | Audit Storage | Consistency Checker Urgent |
|---|---|---|
| Progress persistence | ✅ Yes | ❌ No |
| Location metadata check | ✅ Yes | ❌ No |
| CLI job management | ✅ Yes | ❌ No |
| Efficiency | ✅ High (no repeat work) | ⚠️ Lower |
Best Practices¶
- Schedule regular audits - Run replica audits periodically (e.g., weekly)
- Monitor trace events - Alert on *Error trace events
- Use appropriate ranges - For large clusters, audit in segments
- Check after incidents - Run audits after hardware failures or recoveries
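Auditing in segments amounts to issuing one command per sub-range. The sketch below generates those commands from a list of split keys. It assumes the split keys are already sorted and written in the escaped form fdbcli expects; how to choose them is left to the operator. The function name `segment_commands` is hypothetical.

```python
def segment_commands(split_keys, audit_type="replica", begin='""', end=r"\xff\xff"):
    """Return one audit_storage command per consecutive pair of boundaries.

    `split_keys` must be sorted and already in fdbcli's escaped key form.
    """
    bounds = [begin, *split_keys, end]
    return [
        f"audit_storage {audit_type} {lo} {hi}"
        for lo, hi in zip(bounds, bounds[1:])
    ]


if __name__ == "__main__":
    for command in segment_commands(["g", "p"]):
        print(command)
```

Segmenting is mainly useful for replica audits, since a location metadata audit covers the whole key space regardless of the range given.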
Troubleshooting¶
Audit Not Progressing¶
- Check storage server health with status details
- Verify data distribution is working
- Review trace logs for errors
High Error Count¶
- Examine specific *Error trace events
- Check for storage server issues
- Consider running shard-by-shard audits
See Also¶
- Consistency Scan - Continuous background scanning
- Restore Validation - Validate backup restores
- Troubleshooting - General debugging guide