Understanding Sampled Tracing (and those “Trace has too many …” log lines)

YugabyteDB has supported sampled tracing for years: even when enable_tracing=false, the system can still collect traces for a small fraction of RPCs (by default, 1 out of every 1000). This gives you “just enough” visibility in production without turning on full tracing everywhere.

At the same time, YugabyteDB now enforces guardrails so a single trace can’t grow without bound (which would otherwise risk memory bloat and noisy logs). Those guardrails are what trigger log lines like:

  • Trace has too many child traces. Cannot add more.

  • Trace has too many entries. Will not add more entries;

🔍 What is sampled tracing?

Sampled tracing is controlled by this gFlag:

  • sampled_trace_1_in_n

    ○ Default: 1000 (meaning ~0.1% of requests are traced)

    ○ Set to 0 to disable sampled tracing entirely

This is why you can see trace-related behavior even when enable_tracing is not turned on.
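Conceptually, the 1-in-N decision looks like the sketch below. This is an illustration of the sampling math only; `should_sample` is a hypothetical helper, not YugabyteDB's actual implementation.

```python
import random

def should_sample(one_in_n: int) -> bool:
    """Return True for roughly 1 out of every one_in_n calls.

    Mirrors the semantics of sampled_trace_1_in_n: a value of 0
    (or less) disables sampling entirely.
    """
    if one_in_n <= 0:
        return False
    return random.randrange(one_in_n) == 0

# With the default of 1000, roughly 0.1% of requests collect a trace.
sampled = sum(should_sample(1000) for _ in range(1_000_000))
print(f"sampled ~{sampled} of 1,000,000 requests")
```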

You can confirm the current value via the /varz endpoint on the YB-Master and YB-TServer web UIs:

```sh
# YB-Master web UI (default HTTP port 7000)
curl -s http://<master_host>:7000/varz?raw | grep sampled_trace_1_in_n

# YB-TServer web UI (default HTTP port 9000)
curl -s http://<tserver_host>:9000/varz?raw | grep sampled_trace_1_in_n
```
🧠 “Collecting a trace” vs “Printing a trace” (the key mental model)

Think of tracing in two stages:

  1. Collect / build a trace (in memory, as the request flows through the layers)

  2. Print / dump the trace to logs (based on thresholds and/or additional sampling)

In practice, you often want:

  • a slow-query threshold to decide which requests are “interesting” enough to print

  • plus sampling so you don’t print every slow request at high throughput

YugabyteDB exposes a slow-RPC threshold via:

  • rpc_slow_query_threshold_ms (traces for calls slower than this are logged)

And sampled tracing via:

  • sampled_trace_1_in_n (only a fraction of requests get traces collected)
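As a toy model of these two stages (the function and parameter names here are illustrative, not YugabyteDB APIs):

```python
import random

def handle_rpc(duration_ms: float, one_in_n: int, threshold_ms: float,
               rng: random.Random) -> str:
    """Sketch of the collect-then-print pipeline for one RPC."""
    # Stage 1: collect a trace for ~1/one_in_n requests (0 disables sampling).
    collected = one_in_n > 0 and rng.randrange(one_in_n) == 0
    if not collected:
        return "not traced"
    # Stage 2: print the collected trace only if the call was slow.
    if duration_ms > threshold_ms:
        return "trace printed to log"
    return "trace collected, then discarded"

class AlwaysSampled:
    """Stand-in RNG that always samples, for deterministic illustration."""
    def randrange(self, n: int) -> int:
        return 0

print(handle_rpc(500, 1000, 200, AlwaysSampled()))  # slow + sampled -> printed
print(handle_rpc(50, 1000, 200, AlwaysSampled()))   # fast + sampled -> discarded
```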

✅ Why this matters (example: a 400k reads/sec workload)

If an app is doing 400k reads/sec and you need to inspect tail latencies, enabling full tracing (enable_tracing=true) becomes prohibitively noisy in production.

Instead, you can:

  • keep a slow threshold (for example, 200 ms)

  • apply sampling (default 1/1000)

Result: you reduce log volume by ~1000× while still getting representative traces from the slow tail.
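The arithmetic behind that reduction, assuming (hypothetically) that 0.1% of reads exceed the 200 ms threshold:

```python
reads_per_sec = 400_000
one_in_n = 1_000        # sampled_trace_1_in_n default
slow_fraction = 0.001   # assumed share of reads slower than the threshold

# Only 1 in 1000 requests builds a trace in memory at all...
traces_collected_per_sec = reads_per_sec / one_in_n
# ...and only the slow ones among those are printed to the log.
traces_printed_per_sec = traces_collected_per_sec * slow_fraction

print(traces_collected_per_sec)  # 400.0
print(traces_printed_per_sec)
```

So instead of 400,000 potential trace prints per second, you end up with well under one per second, while the slow tail is still represented.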

This is the same general principle used across distributed tracing systems: sampling is how you keep tracing feasible at scale.

🚦 The guardrails: trace size / fan-out limits (and why you’re seeing warnings)

YugabyteDB limits trace growth with:

  • tracing_max_children_per_trace (default 10)

  • tracing_max_entries_per_trace (default 100)

When a trace exceeds these limits, YugabyteDB logs warnings like the ones above and then stops adding more children/entries to that trace.
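The guardrail behavior can be modeled as a trace object that simply refuses further growth once a cap is hit (a sketch, not YugabyteDB's implementation; the defaults mirror the two flags above):

```python
class Trace:
    """Toy trace with caps analogous to tracing_max_children_per_trace
    and tracing_max_entries_per_trace."""

    def __init__(self, max_children: int = 10, max_entries: int = 100):
        self.max_children = max_children
        self.max_entries = max_entries
        self.children: list = []
        self.entries: list = []

    def add_child(self, child) -> bool:
        if len(self.children) >= self.max_children:
            # A real server would warn: "Trace has too many child traces. Cannot add more."
            return False
        self.children.append(child)
        return True

    def add_entry(self, entry) -> bool:
        if len(self.entries) >= self.max_entries:
            # A real server would warn: "Trace has too many entries. Will not add more entries"
            return False
        self.entries.append(entry)
        return True

# Tiny caps make the cutoff easy to see:
t = Trace(max_children=2, max_entries=3)
child_results = [t.add_child(f"child-{i}") for i in range(4)]
entry_results = [t.add_entry(i) for i in range(5)]
print(child_results)  # [True, True, False, False]
print(entry_results)  # [True, True, True, False, False]
```

Note that once a cap is hit, the trace is still finished and can still be printed; only further growth is refused.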

Why it can be louder on the Master

Masters can see these warnings more often because some of their RPCs/traces are longer-lived than typical TServer RPCs, so they have more opportunity to accumulate entries/children before completing. (The warnings are typically throttled, but at scale they can still look like “spew”.)

🧯 How to reduce the log noise (without turning off tracing entirely)

If your main goal is: “I want the safety limits, but I don’t want to see these warnings constantly”, the simplest option is to raise the caps.

Option A: Raise the limits (recommended when you just want to silence warnings)

For example:

  • tracing_max_children_per_trace=1000

  • tracing_max_entries_per_trace=10000

These values are intentionally “much larger than default” so you’re unlikely to hit them in normal operation, which means fewer warnings.

Using yb-ts-cli (runtime flag change)
```sh
# Example for a tserver (default RPC port 9100):
yb-ts-cli --server_address=<host>:9100 set_flag --force tracing_max_children_per_trace 1000
yb-ts-cli --server_address=<host>:9100 set_flag --force tracing_max_entries_per_trace 10000
```

Do this on the nodes where you’re seeing the warnings (often masters, sometimes tservers). Note that flags changed at runtime with set_flag do not persist across server restarts.

Option B: Change flags in YugabyteDB Anywhere (persistent)

If you’re using YBA, the supported workflow is to edit flags in the UI.

🛑 Should you disable sampled tracing?

You can disable it:

  • set sampled_trace_1_in_n=0

Is there harm?

Not in the sense of correctness: your cluster will still function normally. But you lose a very useful diagnostic tool:

  • Pros of disabling: slightly less tracing overhead, fewer trace-related edge effects

  • Cons of disabling: when you hit rare latency spikes, you’ve removed one of the best low-noise ways to capture “what happened” inside the RPC stack without enabling full tracing

In most production cases, a better approach is:

  • keep sampled tracing enabled

  • tune printing (thresholds) and/or raise the max-children/max-entries caps so logs stay quiet

🧰 Related tracing knobs you’ll likely see together
  • enable_tracing (global tracing enable/disable) exists for both tserver and master.

  • Yugabyte Support’s YCQL tracing workflow typically combines:

    ○ enable_tracing

    ○ collect_end_to_end_traces

    ○ rpc_slow_query_threshold_ms

    ○ (and sometimes tracing_level)
That workflow is extremely effective, but should be used briefly, because full tracing can be chatty.

📌 Quick “What should I do?” decision table
| Situation | Recommended Action | Why |
| --- | --- | --- |
| You see occasional trace warnings | Ignore them | These are normal safety guardrails preventing runaway trace growth. |
| You see frequent warning “spew” but want tracing available | Increase tracing_max_children_per_trace and tracing_max_entries_per_trace | Larger limits reduce log noise while preserving sampled tracing for diagnostics. |
| Highly sensitive environment with no desire for sampled traces | Set sampled_trace_1_in_n=0 | Completely disables sampled tracing, trading observability for minimal overhead. |
🧾 Summary

Sampled tracing in YugabyteDB provides low-overhead production visibility by tracing only a small fraction of requests (1 in 1000 by default), even when enable_tracing is disabled. This makes it possible to investigate rare latency issues without the noise and risk of full tracing.

To protect system stability, YugabyteDB limits how large a trace can grow. When those limits are hit, warning messages are logged and trace growth safely stops; this is expected behavior, especially on Masters.

In most cases, it’s best to keep sampled tracing enabled and reduce log noise by tuning slow-query thresholds or increasing trace size limits. Disabling sampled tracing is safe, but it removes a valuable diagnostic tool for understanding tail latency in production.

Have Fun!