NTP



Two machines may have network connectivity, yet TLS still complains about certificate time. The logs of the same request may appear in the wrong order across different services. A device may come back online after a reboot and behave badly for a few minutes before it slowly settles back to normal time. Time looks like one of the most stable things in a system, but in distributed systems and networked devices the real question is often not “what time is it now?” It is “why should this machine trust its own clock?”

NTP is often reduced to “sync time with a server periodically.” That is not wrong, but it flattens the important part: NTP does not simply overwrite the local clock with a time value. On an untrusted network with jitter and asymmetric delay, it keeps estimating offset and error, then tries to pull the local clock back into place as smoothly as possible.

The core of NTP is not asking a remote host what time it is.
It is keeping a host clock within a trustworthy error range using layered time sources, round-trip measurement, and gradual correction in the presence of network delay and local drift.

Why It Exists

If every machine only relied on its own local clock, three kinds of problems would show up quickly:

Oscillators drift, so time on different machines gradually diverges
Reboots, power loss, and sleep can make local time suddenly wrong
Distributed systems need a sufficiently consistent time base for logs, certificates, expiration checks, and scheduling

Manual adjustment can only solve the one-time question of “how far off is it now.” It cannot solve continuous drift afterward. Worse, the time sources on the network are not naturally trustworthy or error-free either. Once synchronization goes through the network, the measurement is affected by path delay, queue jitter, and asymmetric routes.

So the real questions for a time synchronization protocol are:

Which source should be trusted?
How often should correction happen?
Which measurement errors should be ignored, and which should not?
When correcting time, should the system jump immediately or adjust gradually?

What Background It Grew Out Of

NTP comes from the Internet protocol stack and has evolved mainly along the IETF track. It was not designed for a few directly connected lab devices. It was designed for large-scale, heterogeneous, uneven-quality public and enterprise networks.

That directly shapes its style:

It allows layered synchronization instead of connecting every machine directly to the highest-precision source
It accepts that network measurements are noisy and does not treat every request as absolute truth
It emphasizes long-term stable convergence rather than instant alignment
It is aimed at general-purpose hosts and network devices, not hard real-time control systems

That is also why NTP and PTP went in different directions. NTP favors deployability and steady-state accuracy in general networks, while PTP favors much higher precision under LAN and hardware timestamping conditions. This article only discusses NTP.

The Main Model

To understand NTP, start with three objects:

Reference Clock: the original time source, such as satellite timing, an atomic clock, or another upstream high-precision source
NTP Server: a server that keeps distributing time downstream from its own upstream source
NTP Client: a host that continuously measures offset against upstream and disciplines its local clock

Then remember three facts that never go away:

Local clocks drift
Network round-trip time is not zero and is often unstable
Upstream time sources have trust levels, so they are not all equal

So NTP does not “send a request and get back the current time.” It:

Chooses a time source
Measures offset and round-trip delay
Estimates which samples are more trustworthy
Decides whether the local clock should run a little faster, a little slower, or jump when necessary

The Most Common Main Path

A typical NTP query path can be compressed like this:

The client records its local time when it sends the request
The server records when the request arrived and returns both that receive time and its own send time in the response
The client records local time again when it receives the response
The client uses the four timestamps to estimate round-trip delay and local clock offset
The client does not blindly overwrite the local clock immediately. It folds the sample into the ongoing correction process

In simplified form:

Client
  -> Server: t1

Server
  recv at t2
  send at t3
  -> Client

Client
  recv at t4

offset ≈ ((t2 - t1) + (t3 - t4)) / 2
delay  ≈ (t4 - t1) - (t3 - t2)

What matters most here is not the formula itself, but the assumptions behind it:

The client cannot see the true one-way delay
It can only estimate using round-trip measurement
The offset estimate assumes the forward and reverse path delays are not wildly different

As soon as that assumption gets worse, sample trustworthiness drops.
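The two formulas and the symmetry assumption can be sketched in a few lines. This is a minimal illustration, not a real client: `simulate_exchange` and its parameters (`true_offset`, `forward_delay`, `reverse_delay`, `server_processing`) are hypothetical names used only to produce the four timestamps for a made-up exchange.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Return (offset, delay) in seconds from the four timestamps.

    t1: client send time (client clock)
    t2: server receive time (server clock)
    t3: server send time (server clock)
    t4: client receive time (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, forward_delay, reverse_delay,
                      server_processing=0.001):
    """Build t1..t4 for a hypothetical exchange (illustration only).

    true_offset is (server clock - client clock); delays are one-way seconds.
    """
    t1 = 100.0                               # client clock at send
    t2 = t1 + forward_delay + true_offset    # server clock at receive
    t3 = t2 + server_processing              # server clock at send
    t4 = t3 - true_offset + reverse_delay    # client clock at receive
    return t1, t2, t3, t4


# Symmetric path (20 ms each way): the estimate matches the true 250 ms offset.
symmetric = ntp_offset_and_delay(*simulate_exchange(0.250, 0.020, 0.020))

# Asymmetric path (50 ms out, 10 ms back): the estimate is off by
# (forward_delay - reverse_delay) / 2, which is 20 ms here.
asymmetric = ntp_offset_and_delay(*simulate_exchange(0.250, 0.050, 0.010))
```

In the asymmetric case the client has no way to see the 20 ms error from this one exchange. It only sees a larger round-trip delay, which is why delay is used as a hint about how much to trust the sample.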

Why It Was Designed This Way

Why the server does not simply tell the client to “set this time now”

Because the client always receives a value that was already true on the server some time before it arrived locally. There is always network transit time in between, so the server time value is already old by the time it reaches the client.

Without measuring round-trip delay, there is no way to estimate:

How long that value spent in transit
Whether the local clock is ahead of or behind the server
Whether the sample is trustworthy enough to use

NTP does not move remote time over wholesale. It estimates how far the local clock is off when the remote time arrives locally.

Why it must be layered instead of every machine querying the highest-precision source directly

High-precision time sources are limited in number and coverage. If every client connected directly to the top-level time source, the system would quickly run into:

Too much load on upstream sources
Too much network distance, which increases measurement error
Difficult operations and access control

NTP uses a layered structure so time can be distributed step by step. The closer a node is to the reference source, the lower its stratum number. Further downstream there are more nodes, but deployment is more practical and the access cost is lower.

The layer number is not “who is absolutely more truthful.” It is “how many synchronization hops away is this machine from a reference source?” It helps organize trust paths and prevents clients from treating all upstream sources as identical.
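The hop-counting idea can be sketched as follows. This assumes the usual convention that a reference clock is called stratum 0 and that stratum 16 means "unsynchronized"; the function name `downstream_stratum` is made up for this illustration.

```python
UNSYNCHRONIZED = 16  # stratum value conventionally meaning "not synchronized"


def downstream_stratum(upstream_stratum):
    """Stratum a node would report after synchronizing to the given upstream."""
    if upstream_stratum >= UNSYNCHRONIZED - 1:
        # One more hop would reach 16, which is treated as unsynchronized.
        return UNSYNCHRONIZED
    return upstream_stratum + 1


# Hypothetical chain: reference clock -> server -> server -> client.
chain = [0]
for _ in range(3):
    chain.append(downstream_stratum(chain[-1]))
# chain is now [0, 1, 2, 3]
```

The number only counts hops. A stratum 3 host on a quiet local network can still be closer to true time than a stratum 2 host behind a congested link.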

Why correction has to be continuous instead of occasional

Local clock error is not a one-time offset. It is continuous drift. Even if the clock is perfectly aligned at one instant, it will drift again after a few minutes or hours as oscillator error continues.

That is why NTP clients usually do not “sync once and exit.” They run continuously and keep doing things like:

Sampling periodically
Adjusting synchronization intervals based on jitter and historical samples
Slowly correcting clock frequency and offset

Stable time synchronization depends more on long-term feedback control than on a single lucky hit.
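A quick piece of arithmetic shows why a one-time sync is not enough. The 50 ppm figure below is an assumed example value for an ordinary oscillator, not a measurement of any particular device.

```python
def accumulated_drift_seconds(drift_ppm, elapsed_seconds):
    """Error accumulated by a clock running drift_ppm parts per million off."""
    return drift_ppm * 1e-6 * elapsed_seconds


# Assumed example: a clock that runs 50 ppm fast.
per_hour = accumulated_drift_seconds(50, 3600)    # about 0.18 s
per_day = accumulated_drift_seconds(50, 86400)    # about 4.32 s
```

A clock that was perfectly aligned at midnight would be several seconds off a day later under this assumption, which is why the client keeps measuring and also corrects the clock frequency, not just the current offset.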

Why slewing is often emphasized more than stepping

Stepping means jumping local time directly to the target. Slewing means making the local clock run a little faster or slower for a while, so it gradually converges to the right time.

Immediate jumps look faster, but they are expensive:

Timers and task scheduling can suddenly become inconsistent
Log timestamps can move backward or jump forward
Services that assume monotonically increasing time are more likely to break

So the common engineering choice is:

If the offset is small, prefer gradual correction
If the offset is huge, the system just booted, or the local time is obviously broken, then consider stepping

What NTP handles here is not mathematical elegance. It is system-behavior stability.
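The trade-off can be sketched as a small decision function. The two constants are assumptions for illustration: 128 ms is modeled on the default step threshold used by the reference `ntpd`, and 500 ppm is a commonly cited maximum slew rate. Other implementations and configurations use different values.

```python
STEP_THRESHOLD_S = 0.128  # assumption: modeled on ntpd's default step threshold
MAX_SLEW_PPM = 500        # assumption: a commonly cited maximum slew rate


def choose_correction(offset_s, step_threshold_s=STEP_THRESHOLD_S):
    """Return "slew" for small offsets and "step" for large ones."""
    return "step" if abs(offset_s) > step_threshold_s else "slew"


def slew_duration_seconds(offset_s, slew_ppm=MAX_SLEW_PPM):
    """Time needed to remove offset_s when slewing at slew_ppm."""
    return abs(offset_s) / (slew_ppm * 1e-6)


small = choose_correction(0.100)         # "slew"
large = choose_correction(30.0)          # "step"
time_to_fix = slew_duration_seconds(0.100)  # roughly 200 seconds
```

The duration is the reason huge offsets are stepped: at 500 ppm, slewing away a 30 second error would take 60,000 seconds, which is more than 16 hours of running with a known-wrong clock.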

Awkward but Important Design Choices

A single sample is not always trustworthy

The offset and delay computed from one exchange can easily be polluted by queueing, route changes, and middlebox jitter. NTP does not treat every sample as truth. It cares more about:

Which samples have lower delay
Which upstreams are stable over time
Whether the current sample deviates sharply from history

That is also why many implementations talk to multiple upstream sources and then filter and weight the results. This is not to look sophisticated. It is because a single path is too easy to skew by random noise.
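A heavily simplified sketch of the idea: keep the lowest-delay sample per source, then take the median across sources so one bad upstream cannot drag the result. Real implementations use more elaborate filtering, selection, and weighting; the function names and the sample data here are hypothetical.

```python
from statistics import median


def best_sample(samples):
    """Pick the (offset, delay) pair with the lowest round-trip delay."""
    return min(samples, key=lambda sample: sample[1])


def combine_sources(per_source_samples):
    """Median of each source's lowest-delay offset (simplified illustration)."""
    offsets = [best_sample(samples)[0] for samples in per_source_samples.values()]
    return median(offsets)


# Hypothetical recent samples per upstream, as (offset_s, delay_s) pairs.
sources = {
    "a": [(0.010, 0.030), (0.045, 0.120)],  # second sample hit a queue
    "b": [(0.012, 0.025)],
    "c": [(0.300, 0.040)],                  # this upstream disagrees with the rest
}

estimate = combine_sources(sources)  # 0.012
```

Source "a" contributes its 30 ms sample rather than the 120 ms one, and the outlier from "c" does not shift the median.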

Successful synchronization does not mean system time is absolutely reliable

NTP can keep error within an acceptable range, but it does not provide a strongly consistent global clock. As long as the network still has jitter and asymmetry, the client is always working with an estimate, not with a locally witnessed truth.

That matters for engineering decisions:

Certificate checks, log alignment, and expiration logic usually only need “accurate enough”
Causality, transaction commit, and distributed consistency cannot rely on wall clocks alone

If you mistake NTP’s “approximately consistent time” for “strictly global true time,” you will eventually make design errors.

Time drift in production is often not caused by a broken NTP server

When you see time drift in production, it is tempting to blame the upstream server first. In practice, common issues also include:

Poor local clock hardware
More obvious drift in virtualized environments
Unstable network paths that bias sample quality over time
Firewalls, ACLs, or NAT that intermittently block NTP traffic

That is a reminder that NTP is not just application logic. It is heavily affected by clock hardware and network quality.

Keeping wall-clock time correct and keeping time monotonic are different jobs

Systems often have two kinds of time:

wall clock: current date and time, which is what NTP primarily corrects
monotonic clock: a clock that only moves forward and is not affected by time jumps

Timeouts, retries, performance measurements, and interval scheduling should often rely on monotonic time. Logs, certificate validity, and business timestamps rely more on wall clock time.

If you mix the two, even a normal gradual NTP correction can be misdiagnosed as a bug in your timing logic.
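In Python, the split looks roughly like this. `time.time()` and `time.monotonic()` are standard library calls; `measure_elapsed` is a hypothetical helper for the illustration.

```python
import time


def measure_elapsed(work):
    """Time a callable with the monotonic clock, which never goes backward."""
    start = time.monotonic()
    work()
    return time.monotonic() - start


# Interval measurement: use the monotonic clock.
elapsed = measure_elapsed(lambda: time.sleep(0.01))

# Business or log timestamp: use the wall clock, which NTP corrects.
wall_timestamp = time.time()
```

Computing the same interval as a difference of two `time.time()` readings would work most of the time, then produce a negative or wildly large duration on the day the wall clock is stepped.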

What Engineers Should Actually Look At

When you implement, operate, or debug NTP, the first thing to focus on is not the packet field table. Focus on these decisions:

First, distinguish whether you need “accurate enough wall-clock time” or a “strictly non-decreasing time base.” The former mainly depends on NTP. The latter should rely more on monotonic clocks.

Second, decide whether the drift is a one-time large offset or a slow continuous drift. The first points more toward startup, power loss, recovery, or clock-jump issues. The second points more toward oscillator drift, poor virtualized clock quality, or a chronically unstable synchronization path.

Third, look at sample quality, not just whether a response arrived. A response from an NTP server does not automatically mean the measurement is trustworthy. High delay, high jitter, and path asymmetry all worsen the offset estimate.

Fourth, check whether the system is using gradual correction or a direct jump. Many production failures are not “time never synchronized.” They are “time jumped suddenly.” That distinction matters a lot during troubleshooting.
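One way to tell a jump from gradual correction during troubleshooting is to compare how far the wall clock and the monotonic clock advanced over the same interval. The sketch below is a hypothetical helper, and the 0.5 second tolerance is an arbitrary assumed value.

```python
def detect_wall_clock_jump(prev_wall, prev_mono, now_wall, now_mono,
                           tolerance_s=0.5):
    """Return the apparent wall-clock jump in seconds, or 0.0 if within tolerance.

    A positive result means the wall clock leapt forward relative to the
    monotonic clock; a negative result means it moved backward.
    """
    wall_delta = now_wall - prev_wall
    mono_delta = now_mono - prev_mono
    jump = wall_delta - mono_delta
    return jump if abs(jump) > tolerance_s else 0.0


# Both clocks advanced 10 s: no jump.
steady = detect_wall_clock_jump(1000.0, 50.0, 1010.0, 60.0)    # 0.0

# Wall clock advanced 70 s while only 10 s really passed: stepped forward 60 s.
forward = detect_wall_clock_jump(1000.0, 50.0, 1070.0, 60.0)   # 60.0

# Wall clock went back 5 s while 10 s passed: stepped backward 15 s.
backward = detect_wall_clock_jump(1000.0, 50.0, 995.0, 60.0)   # -15.0
```

In a running service the four arguments would come from paired `time.time()` and `time.monotonic()` readings taken at two points in time. Gradual slewing stays well under the tolerance between checks, so only steps show up.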
