vMotion, DRS, and Clock Skew... Why Distributed Databases Aren’t “Just Another VM"

TL;DR (for the impatient)

● VMware DRS and vMotion are optimized for stateless or loosely stateful workloads
● Temporary clock skew during vMotion is expected behavior
● Distributed databases depend on bounded, monotonic time for transaction safety
● Unconstrained, frequent VM mobility increases clock instability risk
● For YugabyteDB on VMware, choose one of two models:
– ESXi-layer PTP (strongest clock discipline)
– NTP (chrony recommended) + controlled vMotion with one-time VMware Tools resynchronization
● Treat YugabyteDB as a distributed system, not a single-node server

🔎 What This Tip Covers… and What It Does Not

This tip focuses specifically on time discipline, vMotion, and DRS behavior, because clock skew is one of the fastest ways to destabilize a distributed database.

However, safe operation of YugabyteDB on VMware also depends on proper CPU, memory, storage, network, and placement guarantees. Those topics are covered in depth in our VMware deployment best-practices guide and are summarized later in this post.

⏸️⏱️ What actually happens during vMotion (and why time jumps)

During a vMotion event, a virtual machine is briefly paused while its execution state is transferred from the source ESXi host to the destination host. This pause is typically short, often tens to hundreds of milliseconds, but from the guest operating system’s perspective, time effectively “skips forward” when execution resumes.

No instructions execute during this pause. However, once the VM resumes, the guest clock may be offset relative to wall-clock time, depending on how time synchronization is handled between the source and destination hosts and inside the guest itself.

In distributed systems like YugabyteDB, even short-lived time offsets can surface as leadership instability, Raft heartbeat delays, or transaction retries.

🔍 Key Insight: Why NTP Can Look Healthy During vMotion

During vMotion, the VM is briefly paused while memory state is transferred between ESXi hosts. This pause can introduce a sudden jump in the guest VM’s clock relative to other nodes in the cluster.

Even though Chrony may report healthy synchronization, the database nodes observe the jump relative to their peers. If the skew exceeds YugabyteDB’s protection threshold, the system may trigger clock-skew safety mechanisms such as transaction retries or process restarts.

YugabyteDB health checks rely on OS-level synchronization signals such as timedatectl. Short-lived inconsistencies between Chrony and timedatectl reporting can occasionally produce alert noise even when the actual clock offset is small.

⏱️ Why vMotion Duration Matters

Even with Precision Clock (PTP) enabled, vMotion still introduces a brief VM stun. The duration of that stun determines how disruptive the migration will be to a YugabyteDB cluster.

≤ 1.5 seconds -> typically minimal impact; brief latency blip
> 4.5 seconds -> leaseholders may move; recovery load increases
> 5 minutes -> node may be declared dead; tablet repair begins

vMotion events that cross these thresholds can trigger expensive recovery paths even if the migration itself is reported as “successful.”

How VMware Environments Typically Handle Time During vMotion

In practice, VMware environments tend to fall into two distinct patterns for handling time during vMotion.

Each has different operational and organizational trade-offs.

🧭 PTP vs. Avoiding vMotion: Designing for Time Correctness in Distributed Databases

When running YugabyteDB or any distributed database on VMware, the critical architectural decision isn’t which advanced setting to toggle, it’s whether database VMs will participate in vMotion at all.

That decision directly determines how you must handle clock synchronization, drift, and time correctness.

Once the operating model is clear, the tuning choices become straightforward. The sections below outline the practical options and trade-offs.

1️⃣ Support vMotion with PTP at the ESXi Layer (Best Technical Outcome)

Deploying PTP (Precision Time Protocol) at the ESXi layer is the most technically robust way to support YugabyteDB on VMware.

With PTP:

● ESXi hosts synchronize to a hardware-backed clock source
● Guests reference the host PHC (Physical Hardware Clock)
● Inter-VM clock skew remains tightly bounded
● Migration-induced time drift is minimized

This provides the strongest protection against clock skew, related transaction uncertainty in distributed databases.

🛠 Recommended Configuration

● Configure PTP on ESXi hosts
● Ensure all hosts in the cluster participate in the same PTP domain
● Expose the host PHC clock to YugabyteDB VMs
● Configure guest OS time services to reference the host PHC
● Disable continuous VMware Tools time synchronization
● Maintain CPU and memory reservations for YugabyteDB VMs

❓ What About Fully Automatic DRS?

With properly deployed PTP:

● Automatic DRS is technically safe from a clock-skew perspective
● Frequent migrations are far less likely to introduce skew spikes

However, even with PTP, many production teams still:

● Keep DRS in manual mode for stateful database VMs
● Avoid frequent load-balancing migrations
● Use host-VM affinity rules for stability

Distributed databases are resilient, but they are not stateless workloads.

2️⃣ VMware Tools–Assisted Controlled Resynchronization (simplest operational model)

If PTP is not available, a simpler operational approach can still provide acceptable clock discipline for YugabyteDB, provided it is configured carefully.

This model relies on:

● Shared upstream NTP sources (chrony recommended)
● Controlled guest resynchronization during migration
● Reduced migration frequency

The goal is not perfect clock precision, but minimizing skew spikes during planned vMotion events.

🛠 Recommended Configuration

● Configure DRS in Manual mode for YugabyteDB VMs
● Make vMotion infrequent and planned (maintenance windows only)
● Configure host-VM affinity and VM-VM anti-affinity rules
● Ensure CPU and memory reservations are set
● Ensure ESXi hosts and YugabyteDB VMs use the same upstream NTP sources
● Use chrony (recommended) inside the guest as the primary time discipline
● Disable periodic VMware Tools time synchronization
● Enable VMware Tools event-based resynchronization only
● Set the following advanced VM parameter at the VMX / ESXi level:

				
					pref.timeLagInMilliseconds = 100

🛠️ VMware Tools Time Sync: What These Settings Actually Do

Scope: These settings apply when vMotion is enabled and VMware Tools is used for controlled, event-based clock resynchronization, with NTP/chrony providing primary clock discipline inside the guest.

Enable VMware Tools Sync for Specific VM Lifecycle Events

Add the following to the YugabyteDB VM .vmx configuration:

time.synchronize.continue = "1"
time.synchronize.restore = "1"
time.synchronize.resume.disk = "1"
time.synchronize.shrink = "1"
time.synchronize.tools.startup = "1"
time.synchronize.tools.enable = "1"
time.synchronize.resume.host = "1"

These settings allow VMware Tools to synchronize time during migration, resume, restore, and startup events… ensuring rapid correction after lifecycle transitions that can introduce measurable time offset.

Disable Periodic Time Sync (Critical)

tools.syncTime = "0"

This disables continuous (periodic) synchronization between the guest and host. Because the YugabyteDB VM is already running chronyd (or another NTP client), multiple synchronizers must not compete for control of the guest clock.

Correct model:
Primary clock discipline → chrony/NTP inside the guest
Event-based correction → VMware Tools

Post-vMotion Correction Threshold

pref.timeLagInMilliseconds = "100"

This defines the threshold at which VMware Tools considers the guest clock sufficiently out of sync to trigger correction. Setting this low (100–500 ms) forces rapid post-migration correction and minimizes the duration of observable skew.

With these settings, post-vMotion clock skew should typically remain within ~100 ms.

Always validate within your own environment.

Official VMware Guidance

VMware documents recommended timekeeping practices for Linux guests in:
Broadcom KB 2108828 – Timekeeping best practices for Linux guests

📌 VMware Guidance: Guest Time Sync During vMotion

VMware explicitly documents guest time synchronization behavior during vMotion, including the role of VMware Tools and the pref.timeLagInMilliseconds setting, in the following knowledge base article:

Broadcom KB 2108828 – Timekeeping best practices for Linux guests

This guidance aligns with the approach described in Option 2️⃣, where VMware Tools is used to rapidly resynchronize guest time after a migration when deploying PTP at the ESXi layer is not feasible. It reflects how many real-world VMware environments manage time correctness while still allowing vMotion.

However, it is important to understand the scope of this guidance. These mechanisms help correct individual vMotion-induced time offsets after a VM resumes execution. They do not eliminate the vMotion pause itself, nor do they address the systemic effects that can emerge when migrations occur frequently or across multiple nodes at once.

That distinction becomes critical in distributed systems.

⏱️ → 🌩️ From Time Offsets to Skew Storms

Understanding how vMotion introduces time offsets is only the starting point. The more subtle, and more dangerous, behavior emerges when these offsets occur repeatedly and across multiple nodes in a distributed system.

1️⃣ Why skew appears as “storms,” not isolated events

In large enterprise datacenters, especially in in regulated sectors (i.e. Finance), vMotion is rarely a single, visible event.

Common patterns include:

● Background load rebalancing
● Storage-initiated mobility
● Maintenance waves
● Automated placement corrections

As a result:

● Multiple YugabyteDB nodes may move within a short window
● Skew violations appear cluster-wide
● Errors show up in bursts, not as one-off failures

This often leads to the incorrect conclusion:

“The database is unstable under load.”

In reality, the time assumptions of a distributed system are being repeatedly violated.

2️⃣ Why this is amplified in large VMware environments

Large enterprises often run:

● On-prem VMware estates
● Thousands of ESXi hosts
● Multiple active datacenters
● Storage-level block replication (e.g., VMDK replication) for DR

At the infrastructure layer, the implicit assumption is:

“A VM is the unit of failure. Replicate the disk. Move it freely.”

That assumption works for:

● Application servers
● Batch processing
● Stateless services

It does not hold for distributed databases.

In many enterprises, automated mobility is treated as a side effect of DRS rather than a core failure domain… and that leads to repeated skew violations.

3️⃣ Distributed databases are not single-node servers

YugabyteDB is a distributed SQL database built on Raft consensus.

Each node:

● Participates in quorum decisions
● Maintains replicated logs
● Relies on Hybrid Logical Clocks (HLCs)

Time directly affects:

● Transaction ordering
● Leader elections
● Follower reads
● Change data capture
● xCluster replication safe time

YugabyteDB does not require nanosecond-level precision.

It does require:

● Bounded skew (500 ms by default)
● Monotonic forward progress
● No backward time jumps

4️⃣ Distributed Databases Assume Homogeneous Infrastructure

YugabyteDB assumes that all nodes are identically provisioned… CPU, memory, storage performance, and network latency.

In shared VMware environments, this assumption is frequently violated by:

● Noisy neighbors
● Uneven CPU scheduling
● Storage contention

A distributed database will always run at the speed of its slowest node, regardless of how fast the others are.

5️⃣ What clock skew looks like inside YugabyteDB

When clock skew exceeds acceptable bounds, YugabyteDB will trigger an error:

				
					Too big clock skew is detected: X, while max allowed is: Y

This occurs because YugabyteDB enforces a maximum clock skew tolerance (default ~500 ms). If this bound is exceeded, the tserver or master process may crash to prevent data inconsistency.

This behavior is controlled by the fail_on_out_of_range_clock_skew flag; leaving it enabled causes a crash on boundary violation, while disabling it prevents the crash but does not eliminate the underlying consistency risk.

Symptoms of clock skew include:

● Writes rejected with skew errors
● Tablet leader step-downs
● Sudden YCQL failures
● Increased retries in YSQL
● Stalled xCluster safe time

These symptoms often correlate with infrastructure movement, not load.

📘 Official YugabyteDB Support Guidance
For a detailed explanation of clock skew detection, error messages, and tserver crash behavior, see the YugabyteDB Support article: Too big clock skew leading to error messages or tserver crashes

6️⃣ Infrastructure Guardrails (High-Level)

In addition to clock and mobility controls, production deployments should enforce:

● 100% memory reservation for all YugabyteDB VMs
● High CPU shares, not aggressive over-commit
● SSD-backed storage, dedicated VMDKs, PVSCSI
● ≥40% free disk space for compactions
● Separate client and inter-node networks

These are not performance optimizations, they are stability requirements in shared environments.

7️⃣ PTP vs NTP: aligning with real-world VMware constraints

Precision Time Protocol (PTP) offers microsecond-level accuracy and is attractive in theory.

In practice, PTP requires:

● Homogeneous NICs and firmware
● Hardware timestamping
● Stable host placement
● Predictable network paths

Large VMware environments typically have:

● Arbitrary VM placement
● Frequent vMotion
● No per-workload hardware guarantees

For YugabyteDB on VMware, NTP with constrained mobility is more reliable than PTP with unconstrained mobility.

⚠️ Clock Safety Alone Is Not Sufficient

Even with perfect time synchronization, YugabyteDB can become unstable if:

● Memory is not fully reserved
● CPU resources are over-committed
● Storage latency fluctuates due to shared I/O

Clock skew causes correctness failures. Resource contention causes performance collapse, often more subtly.

8️⃣ Disaster recovery: aligning with YugabyteDB support boundaries

Many financial organizations rely on:

● VMDK-level replication
● Storage snapshots
● VM-based recovery workflows

For YugabyteDB, this approach is unsupported.

Why:

● Consensus state is replicated logically, not at the block layer
● Disk-level replication captures partial Raft state
● Restoring VMDKs can lead to:
- ○ Split-brain
- ○ Log corruption
- ○ Silent data loss

Per YugabyteDB guidance:

✅ xCluster replication is the supported DR mechanism
❌ Storage-level replication is not

🧠 The mental model that works

Time skew in distributed systems is often misunderstood as a small, local problem… something that self-corrects and can be ignored once clocks are “close enough.” That mental model works for single-node systems. It fails in distributed databases.

The real risk is not a single offset on a single node, but coordinated time disagreement across multiple nodes, even when each offset is brief and eventually corrected.

The right question is not “Did the clock resynchronize?”

It’s “How often does the cluster experience moments of disagreement about time?”

Repeated, short-lived offsets, especially when driven by infrastructure behavior like vMotion, can overlap across nodes and trigger leadership churn, quorum instability, and retry storms.

The mental model that works is to treat time as a shared dependency. Anything that repeatedly perturbs it across nodes should be evaluated as a cluster-level risk, not a host-level nuisance.

📊 Real-World Outcome

In a recent production environment running YugabyteDB on VMware:

Metric	Before Fix	After Fix
vMotion clock skew	300–700 ms	< 250 ms
Node restarts	Frequent during vMotion	Eliminated
Clock skew alerts	Common during maintenance	Resolved

By tuning VMware Tools time-sync behavior and Chrony configuration, the clock drift after vMotion was reduced below YugabyteDB’s alert threshold, eliminating cluster instability during migrations.

🏁 Final takeaway

vMotion does not break distributed databases because it is slow or misconfigured. It causes problems because it introduces repeated, coordinated time offsets across nodes, even when each individual event is brief and eventually corrected.

There is no single “right” answer for all environments. Some teams avoid vMotion entirely. Others deploy PTP at the ESXi layer. Many rely on rapid guest-level resynchronization using VMware Tools. All three approaches can work… but they are not equivalent, and the trade-offs are real.

What matters most is being explicit about the choice. Distributed databases are sensitive not to one-off time jumps, but to how often the system experiences disagreement about time. If that risk isn’t actively managed, small infrastructure events can amplify into cluster-wide instability.

Distributed databases aren’t “just another VM.” They require treating time as a shared, cluster-level dependency… not a per-host detail.

👉 For practical, field-tested configuration guidance on how to run YugabyteDB safely on VMware, including DRS rules, vMotion controls, and placement strategies, see:

Tip #2: Running YugabyteDB Safely on VMware: A Practical DRS & Time-Sync Checklist.

Have Fun!