In distributed databases, running out of disk space on even a single node can cause cascading issues—replica instability, write failures, and, in the worst case, data corruption or crash loops. Fortunately, YugabyteDB includes a built-in mechanism to proactively detect and reject writes when disk space is critically low.
This tip explores how YugabyteDB handles “disk full” scenarios and how you can configure it with fine-grained GFlags to protect your cluster under storage pressure.
The Problem: Disk Exhaustion in Distributed Systems
When a tablet server or master node runs out of disk, several issues can occur:
▪️ WAL (Write-Ahead Log) entries can’t be persisted → writes fail silently or hang
▪️ Compaction can stall → read performance degrades
▪️ Leader re-elections may happen frequently due to IO stalls
▪️ Manual cleanup or restarts are often required, which introduces downtime
To avoid these issues, YugabyteDB includes proactive write-rejection logic that kicks in before the disk is completely exhausted.
The Solution: Reject Writes When Disk Is Low
YugabyteDB introduces a safety mechanism: if disk space falls below a threshold, writes are rejected gracefully on that node. This preserves cluster stability and prevents data inconsistency or corruption.
This behavior is controlled by three key GFlags:
1. --reject_writes_when_disk_full
✅ Enables write rejection logic.
When set to true, YugabyteDB monitors available disk space and rejects further writes once free space drops below the configured threshold.
--reject_writes_when_disk_full=true
2. --reject_writes_min_disk_space_mb
📉 Minimum free space (in MB) to allow writes.
If disk space drops below this threshold, writes are rejected on the WAL and data directories.
--reject_writes_min_disk_space_mb=2048 # Reject writes below 2GB free
If set to 0, it defaults to:
max_disk_throughput_mbps * min(10, reject_writes_min_disk_space_check_interval_sec)
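In other words, the effective floor scales with how much data the disks could absorb between checks. A quick back-of-the-envelope calculation (the throughput value below is an assumed example, not a recommendation):

```shell
# Assumed inputs for illustration only:
throughput_mbps=200   # hypothetical --max_disk_throughput_mbps value
interval_sec=60       # --reject_writes_min_disk_space_check_interval_sec

# effective default = max_disk_throughput_mbps * min(10, interval_sec)
capped_interval=$(( interval_sec < 10 ? interval_sec : 10 ))
echo "$(( throughput_mbps * capped_interval )) MB"   # prints "2000 MB"
```

So with these assumed values, leaving the threshold at 0 would reject writes once free space falls below roughly 2 GB.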
3. --reject_writes_min_disk_space_check_interval_sec
🔁 Interval (in seconds) for checking disk space.
If disk usage crosses the aggressive threshold, checks happen every 10s.
Setting this flag below 10 forces always-on aggressive checks, which may impact performance.
--reject_writes_min_disk_space_check_interval_sec=60
🧪 Example: Safe Config in Production
--reject_writes_when_disk_full=true
--reject_writes_min_disk_space_mb=4096
--reject_writes_min_disk_space_check_interval_sec=60
✅ This ensures that:
▪️ Writes are blocked when <4GB is available
▪️ Disk space is checked every 60 seconds (and more aggressively if needed)
▪️ The system avoids dangerous edge conditions before they become outages
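For context, here is a sketch of how these flags might be passed to a tablet server at startup. The binary path, data directory, and master addresses are placeholders—substitute your own deployment's values:

```shell
# Sketch only: paths and addresses below are placeholders.
./bin/yb-tserver \
  --tserver_master_addrs=master1:7100,master2:7100,master3:7100 \
  --fs_data_dirs=/mnt/d0 \
  --reject_writes_when_disk_full=true \
  --reject_writes_min_disk_space_mb=4096 \
  --reject_writes_min_disk_space_check_interval_sec=60
```

Once the server is up, you can confirm the effective values on the tablet server web UI's /varz page (port 9000 by default), e.g. `curl -s http://<tserver>:9000/varz | grep reject_writes`.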
📌 Best Practices
▪️ Always monitor disk usage with external tools or metrics (e.g., Prometheus, YBA)
▪️ Set realistic thresholds for min_disk_space_mb based on your WAL/data growth rate
▪️ Avoid setting check_interval_sec below 10 unless for testing/debugging
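To complement YugabyteDB's built-in rejection, an external check can alert you well before the hard threshold. A minimal sketch (the mount point and warning threshold are assumptions—wire the output into your alerting tool of choice):

```shell
#!/usr/bin/env bash
# Hypothetical disk-space check: warn well before YugabyteDB's own
# write-rejection threshold is reached.
check_free_space() {
  local dir="$1" warn_mb="$2"
  # `df -Pm` prints POSIX-format output in 1 MB blocks; column 4 is "Available"
  local free_mb
  free_mb=$(df -Pm "$dir" | awk 'NR==2 {print $4}')
  if [ "$free_mb" -lt "$warn_mb" ]; then
    echo "WARNING: only ${free_mb} MB free on ${dir}"
    return 1
  fi
  echo "OK: ${free_mb} MB free on ${dir}"
}

# Example: warn when the mount drops below 8 GB free.
# "/" is used so the example runs anywhere; point it at your
# --fs_data_dirs mount in practice.
check_free_space "/" 8192
```

Pairing an external early warning with YugabyteDB's own rejection threshold gives you time to add capacity before writes are ever blocked.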
“Disk full” may sound like a basic problem, but in distributed systems, it’s a silent killer. YugabyteDB’s reject-write mechanism turns an unpredictable failure into a graceful, recoverable event.
If you’re running large-scale transactional workloads, we strongly recommend reviewing these flags and ensuring your cluster is resilient to disk pressure!
Have Fun!
