You change an A record in production, but browsers keep reaching the old address. The authoritative server already returns the new value, yet users in different regions still see different answers. A packet capture shows the query asking for www.example.com, but the first few responses are referrals to other servers rather than an IP address.
The hard part of DNS is not record types. It is the model behind them: DNS is never a real-time global lookup table. It is a resolution system held together by hierarchical delegation, recursive resolution, and TTL-based caching.
- Hierarchical namespace delegation
- Recursive resolvers query upstream on behalf of clients
- Results are cached with TTL
- Temporary inconsistency is accepted across the Internet
The key question in a DNS change is not whether the authoritative server was updated correctly. It is when outside observers will converge from the old answer to the new one.
What Actually Happens During a Lookup
When an application wants to reach www.example.com, it does not walk from the root server all the way to the final authoritative server by itself. It first hands the question to a local stub resolver, then a configured recursive resolver completes the rest.
Client
-> Recursive Resolver: www.example.com A ?
Recursive Resolver
-> Root Server: www.example.com A ?
Root Server
-> Recursive Resolver: I do not know the final answer, but ask these servers for .com
Recursive Resolver
-> .com TLD Server: www.example.com A ?
.com TLD Server
-> Recursive Resolver: I do not know the final answer, but ask these authoritative servers for example.com
Recursive Resolver
-> Authoritative Server for example.com: www.example.com A ?
Authoritative Server
-> Recursive Resolver: www.example.com A = 93.184.216.34, TTL = ...
Recursive Resolver
-> Client: www.example.com A = 93.184.216.34, TTL remaining = ...
Only the last server returns the actual business answer. The earlier hops only point the way.
That path reflects three core DNS ideas:
- Names are delegated step by step instead of stored centrally
- Recursive resolvers do the work for clients
- Both delegation hints and final answers can be cached with TTL
The second lookup for www.example.com does not have to traverse the whole chain again. If the final A record is still fresh, the recursive resolver can answer directly. Even if the final answer has expired, the delegation data for .com and example.com may still be cached. Often only the back half of the chain needs to be queried again.
That is why DNS performance is not only about authoritative server latency. It also comes from the fact that most queries do not need to start from the root every time.
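The caching behavior described above can be sketched with a toy model. This is only an illustration under stated assumptions: the `SERVERS` table, the server labels (`root`, `tld-com`, `auth-example`), the TTL values, and the `ToyResolver` class are all hypothetical, and a real resolver handles far more cases (multiple name servers, timeouts, CNAME chains, and so on).

```python
from dataclasses import dataclass, field

# Hypothetical in-memory "servers": each one either refers a zone onward
# or holds the final answer. TTL values are illustrative only.
SERVERS = {
    "root": {"refer": {"com": ("tld-com", 172800)}},
    "tld-com": {"refer": {"example.com": ("auth-example", 86400)}},
    "auth-example": {"answer": {"www.example.com": ("93.184.216.34", 300)}},
}


@dataclass
class ToyResolver:
    answers: dict = field(default_factory=dict)      # name -> (ip, expires_at)
    delegations: dict = field(default_factory=dict)  # zone -> (server, expires_at)

    def _closest_server(self, name, now):
        """Start from the deepest cached, unexpired delegation, else the root."""
        labels = name.split(".")
        for i in range(len(labels)):
            zone = ".".join(labels[i:])
            cached = self.delegations.get(zone)
            if cached and cached[1] > now:
                return cached[0]
        return "root"

    def resolve(self, name, now):
        """Return (ip, servers_queried) at simulated time `now` in seconds."""
        cached = self.answers.get(name)
        if cached and cached[1] > now:
            return cached[0], []
        server, queried = self._closest_server(name, now), []
        while True:
            queried.append(server)
            data = SERVERS[server]
            if name in data.get("answer", {}):
                ip, ttl = data["answer"][name]
                self.answers[name] = (ip, now + ttl)
                return ip, queried
            # Follow the referral whose zone is a suffix of the name.
            for zone, (next_server, ttl) in data.get("refer", {}).items():
                if name == zone or name.endswith("." + zone):
                    self.delegations[zone] = (next_server, now + ttl)
                    server = next_server
                    break
            else:
                raise LookupError(f"{server} has no data for {name}")


resolver = ToyResolver()
print(resolver.resolve("www.example.com", now=0))    # full walk from the root
print(resolver.resolve("www.example.com", now=100))  # served from the answer cache
print(resolver.resolve("www.example.com", now=400))  # answer expired, delegation still cached
```

In the third call the A record has aged out, but the cached delegation for example.com lets the toy resolver go straight to the authoritative server instead of starting from the root.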
The Main Objects
- Stub Resolver: the lightweight resolver on the application or host side, which only hands the question out
- Recursive Resolver: the server that traces the answer on behalf of the client and keeps caches
- Authoritative Server: the final authority for records in a zone
- Root / TLD / Lower authoritative servers: the hierarchy that delegates the namespace step by step
- Zone: the set of records managed by one administrative boundary
- TTL: how long a result may be cached, not how long it stays true forever
- Recursive Query: the client asks someone else to resolve the name all the way down
- Iterative Query: the recursive resolver follows each hint one hop at a time
From the client side, the interaction looks recursive. Inside the resolver, the upstream walk is iterative. Keeping those two layers apart makes most packet captures easier to explain.
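On the wire, the difference between the two layers shows up mainly as one header bit: a stub resolver sets RD ("recursion desired"), while a recursive resolver walking the hierarchy typically sends its upstream queries with RD cleared. The sketch below encodes a single-question query by hand to make that visible in a capture. The function name `build_query` and the fixed query ID are assumptions made for illustration; production code would use a DNS library and a random ID.

```python
import struct

RD_FLAG = 0x0100  # "recursion desired" bit in the 16-bit header flags field


def build_query(name, qtype=1, recursion_desired=True, query_id=0x1234):
    """Encode a single-question DNS query (qtype 1 = A, class 1 = IN)."""
    flags = RD_FLAG if recursion_desired else 0
    # Header fields: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


stub_style = build_query("www.example.com", recursion_desired=True)
upstream_style = build_query("www.example.com", recursion_desired=False)
```

The two messages differ only in the flags field (bytes 2 and 3), which is a quick way to tell in a capture which side of the recursive resolver a packet was taken on.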
DNS First Solves Naming Authority
When a network is small, host names and addresses can live in one local table. The early Internet did use approaches like that.
Once the scale grows, that model breaks down quickly:
- The global directory becomes too large to distribute
- Every organization has to update its own records often
- Old entries linger everywhere after address changes
- There is no natural administrative boundary for delegation
DNS looks like a lookup table, but what it really solves is naming authority.
com does not need to know every host record under example.com, and example.com does not need to report every subdomain change to a global central node. The upper layer only needs to know who is responsible for the next layer.
Delegation gives DNS three important properties:
Global naming rules stay consistent
Local namespaces are managed by the right organization
The querier does not need to know where every record lives ahead of time
That makes DNS more complex than a single global table, but also makes it usable across organizations, regions, and constantly changing infrastructure.
A Recursive Resolver Is More Than a Forwarder
It is easy to think of a recursive resolver as “just a box that forwards DNS requests,” because the client only knows one server.
In practice, it does much more:
- It walks the full delegation chain for the client
- It caches final answers
- It caches intermediate delegation data
- It handles timeouts, retries, truncation, and upstream failures
- It concentrates a lot of resolution complexity into a small amount of infrastructure
The same domain name can produce different answers in different networks.
Different recursive resolvers have different cache states, different upstream paths, and sometimes different policies. The client only receives the answer that its resolver currently believes, not necessarily the freshest value the authoritative server just published.
When debugging, always separate these questions:
- Which recursive resolver did the client ask?
- Was the returned answer cached or freshly resolved?
- What does the authoritative server return directly?
A local ping to a name is not enough to prove that DNS itself is correct.
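One way to keep those questions apart is to record what each resolver returned next to what the authoritative server returned, and compare them explicitly. The helper below is a minimal sketch that works on answers you have already collected (for example with a command-line lookup tool); the function name `compare_views`, the resolver labels, and the documentation-range addresses are hypothetical.

```python
def compare_views(authoritative, resolver_views):
    """Group resolvers by whether they match the authoritative answer.

    authoritative: set of addresses returned by the authoritative server
    resolver_views: dict of resolver label -> set of addresses it returned
    """
    report = {"converged": [], "stale_or_different": [], "no_answer": []}
    for resolver, answer in sorted(resolver_views.items()):
        if not answer:
            report["no_answer"].append(resolver)
        elif answer == authoritative:
            report["converged"].append(resolver)
        else:
            report["stale_or_different"].append(resolver)
    return report


report = compare_views(
    authoritative={"192.0.2.20"},
    resolver_views={
        "resolver-a": {"192.0.2.20"},  # already sees the new address
        "resolver-b": {"192.0.2.10"},  # still serving the old address
        "resolver-c": set(),           # returned nothing
    },
)
```

A mismatch does not by itself say who is wrong. It only tells you which resolver to examine next, and whether the disagreement is between caches or with the authoritative data itself.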
TTL Manages the Inconsistency Window
The most important time parameter in DNS is TTL.
It is not decorative, and it is not only a performance hint. TTL means:
This answer may be cached and reused for a period of time
Long TTL means:
- Higher cache hit rate
- Less load on authoritative servers
- More stable latency
- Slower propagation of changes
Short TTL means:
- Faster convergence after changes
- More queries
- More upstream jitter exposed to clients
DNS chooses soft state: it accepts old answers for a bounded time in exchange for scalability and availability.
That explains many production symptoms:
- The authoritative server has already changed, but recursive caches have not expired yet
- Users in one region see the new address while another region still sees the old one
- After a rollback, some users keep hitting the previously cached bad answer
That does not mean DNS is broken. It means the system is still inside the inconsistency window allowed by TTL.
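The inconsistency window can be made concrete with a small cache that takes the current time as an argument. This is a sketch under assumptions: the class name `TtlCache`, the key format, and the addresses are illustrative, and real resolvers may also clamp TTLs to their own minimum or maximum.

```python
class TtlCache:
    """Minimal resolver-style cache keyed by (name, record type)."""

    def __init__(self):
        self._entries = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl, now):
        self._entries[key] = (value, now + ttl)

    def get(self, key, now):
        """Return (value, remaining_ttl), or None once the entry has expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._entries[key]
            return None
        return value, expires_at - now


cache = TtlCache()
key = ("www.example.com", "A")
cache.put(key, "192.0.2.10", ttl=300, now=0)   # cached before the change
during_window = cache.get(key, now=120)        # old address, 180 seconds left
after_window = cache.get(key, now=300)         # expired: next lookup goes upstream
```

If the authoritative record changes at second 60, this cache still hands out the old address until second 300. Nothing is malfunctioning; the entry is simply being reused for as long as its TTL permits.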
Negative Cache Can Hide Newly Added Records
DNS does not only cache what it found. It also caches what it could not find.
If a name once returned NXDOMAIN, the recursive resolver may cache that negative result for a while. Adding the record later does not force every resolver to ask again immediately.
This is a very common field issue:
- Someone visits a name before it is created
- The resolver caches the negative result
- The record is added later
- Some users still keep seeing “does not exist”
So during DNS changes, do not only inspect positive record TTLs. Also consider whether a failed lookup has already been cached.
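How long a negative answer can linger is bounded by the zone's SOA record: under RFC 2308, the negative-cache TTL is the smaller of the SOA record's own TTL and its MINIMUM field. The sketch below applies that rule; the function names and the example numbers are assumptions, and individual resolvers may cap the value further.

```python
def negative_ttl(soa_ttl, soa_minimum):
    """Negative-cache TTL per RFC 2308: min(SOA TTL, SOA MINIMUM field)."""
    return min(soa_ttl, soa_minimum)


def nxdomain_still_cached(seen_at, now, soa_ttl, soa_minimum):
    """True if a resolver that cached NXDOMAIN at `seen_at` may still serve it."""
    return now < seen_at + negative_ttl(soa_ttl, soa_minimum)


# Hypothetical timeline in seconds: lookup fails at t=0, record is added at t=60.
hidden_at_10_min = nxdomain_still_cached(seen_at=0, now=600, soa_ttl=3600, soa_minimum=900)
hidden_at_15_min = nxdomain_still_cached(seen_at=0, now=900, soa_ttl=3600, soa_minimum=900)
```

With these numbers, a resolver that saw the failure can keep answering "does not exist" for up to 900 seconds, regardless of the TTL on the newly added record.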
A Zone Is Not the Same Thing as a Domain Name
A Domain Name is a name in the tree. A Zone is an administrative boundary.
They are related, but they are not the same object.
For example, api.prod.example.com is just part of the namespace under example.com, but it can be delegated to another set of authoritative servers. Once a subdomain is delegated, the parent zone is no longer responsible for the final business records inside that subdomain. It is only responsible for telling resolvers where to ask next.
Common mistakes are:
- The parent zone has a delegation, but the child zone is not configured correctly
- The subdomain has already been moved, but the operator is still looking for the final record in the parent zone
- Glue, NS, and authoritative data do not match
- A similar-looking name hierarchy is mistaken for identical administrative boundaries
When debugging DNS, first confirm whether you are looking at the parent zone, the child zone, or the final authoritative data.
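A quick way to answer "which zone am I in?" is to match the name against the known zone cuts, most specific first. The sketch below assumes you already have the list of zones; the function name `enclosing_zone` and the delegation of api.prod.example.com are hypothetical.

```python
def enclosing_zone(name, zones):
    """Return the most specific zone in `zones` that contains `name`, or None."""
    labels = name.rstrip(".").lower().split(".")
    # Try the full name first, then strip one leading label at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return None


# Assumed layout: example.com is the parent, api.prod.example.com is delegated.
zones = {"example.com", "api.prod.example.com"}
child = enclosing_zone("db.api.prod.example.com", zones)  # "api.prod.example.com"
parent = enclosing_zone("www.prod.example.com", zones)    # "example.com"
```

Note that www.prod.example.com stays in the parent zone even though it looks like a sibling of the delegated name: the name hierarchy and the administrative boundary are separate things.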
CNAME Is Not a Universal Alias
CNAME is often understood as “an alias for another name.” That is only half right.
What it really says is: this name is not the end of the chain, so the resolution process should continue somewhere else.
That is why a CNAME name cannot also hold other record types. The record would be saying both “I am someone else” and “I still have my own data,” which breaks the resolution semantics.
This is also why platforms often do not allow a root (apex) domain to be a pure CNAME: the apex must already hold SOA and NS records, and a CNAME cannot coexist with them. It is also why troubleshooting cannot stop at “does this name have a record.” You must also check whether the record types conflict.
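The coexistence rule is easy to check mechanically against an exported record list. The following is a minimal sketch: the function name `cname_conflicts` and the input shape are assumptions, and it deliberately ignores the DNSSEC record types (such as RRSIG and NSEC) that are permitted alongside a CNAME.

```python
from collections import defaultdict


def cname_conflicts(records):
    """Find names that hold a CNAME together with other record types.

    records: iterable of (name, record_type) pairs.
    Returns a dict of name -> sorted list of the conflicting types.
    """
    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    return {
        name: sorted(types - {"CNAME"})
        for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    }


conflicts = cname_conflicts([
    ("example.com", "SOA"),
    ("example.com", "NS"),
    ("example.com", "CNAME"),      # a CNAME at the apex collides with SOA and NS
    ("www.example.com", "CNAME"),  # fine on its own
    ("api.example.com", "A"),
])
```

Here the only conflict reported is at example.com, because the zone apex (the root domain of the zone) necessarily carries SOA and NS records.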
UDP Does Not Make DNS Unreliable
Classic DNS uses UDP heavily, not because reliability is unimportant, but because most queries are short, small, and frequent. The request/response model fits low-cost transport well.
UDP gives DNS:
- No connection setup
- Low latency for ordinary lookups
- Better server scalability for lots of short requests
The tradeoff is:
- Large responses may be truncated
- Packet loss must be handled by retry
- Middleboxes may mishandle fragments or oversized packets
If a capture shows truncation, retries, TCP fallback, or failures only on large responses, the problem is usually in the transport path rather than in the record data.
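Truncation is signaled by the TC bit in the response header, and a client that sees it is expected to retry the same question over TCP, where each message is prefixed with a two-byte length. The sketch below shows both pieces on raw bytes; the function names are assumptions, and a real client would also handle timeouts and EDNS buffer sizes.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags field


def is_truncated(response):
    """Return True if a raw DNS response has the TC bit set."""
    if len(response) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message):
    """Prefix a DNS message with the 2-byte length field used over TCP."""
    return struct.pack("!H", len(message)) + message


# Hand-built headers for illustration: a response with TC set, and one without.
truncated = struct.pack("!HHHHHH", 0x1234, 0x8380, 1, 0, 0, 0)
complete = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)
needs_tcp_retry = is_truncated(truncated)
```

When a capture shows TC set but no TCP follow-up, or a TCP attempt that never completes, that usually points at a firewall or middlebox between the resolver and the server.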
How to Debug DNS in Layers
| Symptom | First place to look |
|---|---|
| This machine cannot resolve it, but another network can | Local DNS config, recursive resolver |
| The authoritative answer is correct, but users still see the old value | TTL, recursive cache |
| A newly created record still returns nonexistent | Negative cache, name mismatch |
| A subdomain breaks | Parent delegation, child authoritative server, glue |
| Small records work but large records fail | UDP truncation, TCP fallback, middleboxes |
| Browser behavior differs from CLI tools | Application cache, system cache, query type differences |
When debugging DNS, check these four layers first:
- Which recursive resolver the client asked
- Whether the recursive resolver returned a cached result, a delegation hint, or a final answer
- What the authoritative server returns directly
- Whether TTL or negative cache is still in effect
When You Change a Domain, Do Not Trust “Saved Successfully”
During cutover, migration, or rollback, the dangerous assumption is that once the authoritative record changes, the outside world changes immediately too.
A safer approach is to account for time explicitly:
- Lower the relevant TTL before the change
- Wait for old TTLs to age out before flipping the core record
- Check parent delegation, child authoritative data, and actual service endpoints together
- Consider stale answers and negative cache during rollback as well
- Validate with multiple recursive resolvers and direct authoritative queries
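The timing in the first two steps is simple arithmetic, sketched below. It assumes resolvers honor the published TTLs exactly, which some do not; the function name `cutover_plan` and the example values are hypothetical.

```python
def cutover_plan(old_ttl, low_ttl, lowered_at=0):
    """Return key timestamps (seconds) for a lower-the-TTL-then-flip change.

    old_ttl:    TTL on the record before preparation began
    low_ttl:    reduced TTL published ahead of the change
    lowered_at: time at which the reduced TTL was published
    """
    # A resolver that cached the record just before `lowered_at` may hold it
    # with the old TTL, so wait that long before changing the value.
    earliest_flip = lowered_at + old_ttl
    # After the flip, a resolver that cached the old value just beforehand
    # may keep it for one more low TTL.
    worst_case_converged = earliest_flip + low_ttl
    return {
        "earliest_flip": earliest_flip,
        "worst_case_converged": worst_case_converged,
    }


plan = cutover_plan(old_ttl=86400, low_ttl=60)
```

With a one-day original TTL and a 60-second reduced TTL, the value should not be flipped until a day after the TTL was lowered, and well-behaved caches should converge about a minute after the flip. The same arithmetic applies to a rollback, using whatever TTL was in effect when the bad answer was served.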
A DNS change is complete not when the control panel says “saved,” but when outside observers converge on the expected answer.
What is most worth remembering about DNS is not the record names A / AAAA / CNAME / NS. It is that DNS uses hierarchical delegation to assign naming authority, recursive resolvers to absorb lookup complexity, and TTL to manage a bounded window of inconsistency across the Internet.
Further Reading
- DNS Pollution: why a wrong answer can enter the resolution chain before the real one
- DNSSEC: why DNS answers need to be verifiable
- DoH / DoT: how DNS transport and trust boundaries change
- UDP: why classic DNS still runs heavily on UDP