DNS Pollution


The authoritative server may be correct. dig may return the right record when you ask the authority directly. Yet the user still gets sent to the wrong address, and the same fake answer may appear immediately on different networks. When that happens, the problem is usually no longer in the zone file. It is in the resolution path itself, where a forged answer has raced the real one and replaced it.

DNS pollution looks like “DNS is misconfigured”, but the real problem is often not the authoritative data. It is that someone inserted a packet that looks like a DNS response into the lookup chain before the real answer arrived.

This article looks only at pollution on the classic DNS path: how a wrong answer gets injected during the query, why it often arrives before the correct one, and how to distinguish an authoritative mistake, stale cache, and path interference in practice. DNSSEC, DoT, and DoH matter here, but only as boundary conditions. They are not treated as separate topics.

The essence of DNS pollution is not that the authoritative data was changed. It is that a “fake responder” appeared in the resolution path: one that answers earlier, sits closer, or is less trustworthy than the real source.

First Separate the Targets

“DNS pollution” is not a standard RFC term. In engineering, people usually use it for two related but not identical cases:

A middlebox observes the DNS request on the path and forges a response to win the race
A recursive resolver, forwarder, or local network device injects a wrong result into cache and keeps spreading it downstream

Both cases make the client receive a wrong answer, but the failure point is different.

If the wrong answer was injected in transit, the key question is who replied before the real upstream did
If the wrong answer is already in recursive cache, the key question is who kept the fake answer and kept distributing it

In many contexts, people also translate DNS cache poisoning as DNS pollution. That is not strictly wrong, but during troubleshooting it is better to separate it from the more common “answer injected on the path” case.

Cache poisoning emphasizes that the cache was tricked by forged data
What people usually mean by DNS pollution often emphasizes direct answer racing on the path, even without compromising the authoritative server or the recursive implementation

Why It Works

Classic DNS has a very practical premise: a large amount of traffic goes out over UDP 53, the packets are short, the round trip is fast, and no connection has to be established first.

That gives pollution a real opening:

The query is usually visible in clear text
As long as a returned packet looks like a valid response, the client or recursive resolver may accept it first
Even if the real authoritative response arrives later, it may already be too late because the transaction has ended
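The sketch below makes the clear-text point concrete. It uses only the Python standard library to build a standard query in RFC 1035 wire format. The helper names `encode_qname` and `build_query` are illustrative and do not come from any library. The queried name appears byte for byte in the packet, so anything on the path that can read UDP 53 payloads can read it too.

```python
import secrets
import struct
from typing import Optional


def encode_qname(name: str) -> bytes:
    """Encode a domain name as RFC 1035 length-prefixed labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # zero-length root label ends the name


def build_query(name: str, qtype: int = 1, txid: Optional[int] = None) -> bytes:
    """Build a minimal standard query: RD=1, one question, class IN."""
    if txid is None:
        txid = secrets.randbits(16)  # random 16-bit transaction ID
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    return header + encode_qname(name) + struct.pack("!HH", qtype, 1)


query = build_query("www.example.com", txid=0x1234)
print(len(query))     # 33
print(query[12:29])   # b'\x03www\x07example\x03com\x00'
```

Nothing in this packet is encrypted or signed. The only per-query values an outsider cannot read are the ones it never sees: the 16-bit ID and the source port.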

So DNS pollution does not rely only on “DNS has no validation”. It relies on the fact that classic DNS was optimized for a low-cost lookup path. It assumes the network is mostly honest and at worst drops packets, retransmits, or times out occasionally. It does not assume that someone on the path will actively forge a faster answer.

What Actually Happens in One Polluted Lookup

One common path looks like this:

The client sends a query to the local recursive resolver, or sends a UDP request directly to an upstream DNS server
The query crosses the local gateway, the ISP network, or some longer path
A middle observer sees the domain lookup
The middle device immediately forges a packet that looks like a normal DNS response and sends it back
The querying side receives the fake response first and ends the transaction
The real upstream response arrives later, but it is already too late

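Client/Resolver --query--> [Middlebox] --query--> Upstream DNS
     ^  ^                       |                      |
     |  |                       |                      |
     |  +----- forged (t1) -----+                      |
     |                                                 |
     +------------- real reply (t2 > t1) --------------+

Upstream DNS: the recursive resolver or authoritative server being queried

Query side accepts the first matching reply (t1)
  -> wrong answer returned or cached; the real reply (t2) arrives too late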

The key is not how complicated the fake packet is. The key is that it arrives first. For many stub resolvers and recursive implementations, the first response that matches the current transaction is the one most likely to be accepted.
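The race can be simulated in a few lines. This is a toy model, not resolver code. `Reply` and `first_accepted` are hypothetical names, and the timings are invented. The addresses come from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24. The model encodes a single rule: the earliest reply that matches the transaction wins, and whether the answer is correct never enters the comparison.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reply:
    sender: str        # label only, for the simulation
    arrival_ms: float  # when the reply reached the query side
    txid: int
    qname: str
    address: str


def first_accepted(replies: List[Reply], txid: int, qname: str) -> Optional[Reply]:
    """Accept the earliest reply whose ID and question match; ignore the rest."""
    for reply in sorted(replies, key=lambda r: r.arrival_ms):
        if reply.txid == txid and reply.qname == qname:
            return reply
    return None


replies = [
    # The real answer has to travel the full upstream path.
    Reply("upstream", 48.0, 0x1234, "www.example.com", "192.0.2.10"),
    # An on-path middlebox copied the ID and question from the query it saw.
    Reply("middlebox", 3.0, 0x1234, "www.example.com", "198.51.100.1"),
]
winner = first_accepted(replies, 0x1234, "www.example.com")
print(winner.sender, winner.address)  # middlebox 198.51.100.1
```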

DNS pollution often shows up as:

The wrong answer arrives very quickly
The same wrong answer appears consistently across multiple lookup points
Asking the authority directly returns the correct answer, but the normal resolution path returns a wrong one

Why It Is Often Faster Than the Real Answer

Because the polluter is closer to the query side

The authoritative server may be far away, and the recursive resolver may still need to walk the full chain through the root, TLD, and authority. If the pollution device sits on the local exit, the ISP boundary, or one hop away, it does not need to complete the real resolution path. It only needs to see the query and immediately return a forged answer.

The costs of the two paths are not symmetrical:

Real resolution has to keep walking the full upstream chain
Pollution only has to build a response that looks close enough

That is why it wins on latency.

Because classic DNS first checks whether the transaction matches

Many implementations first verify:

Whether the query ID matches
Whether the question section matches
Whether the source address and port meet the current implementation’s acceptance rules

If those conditions line up, the response may be accepted. A real authoritative answer, even if more correct, does not automatically beat a matching response that arrived earlier.
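These checks can be sketched against raw wire bytes. `read_question` and `accepts` are hypothetical helpers. Real resolvers also verify that the reply arrives on the ephemeral port the query was sent from; this sketch folds that check into the `(address, port)` comparison. The example shows why an on-path injector passes every check: it copies the ID and question from the query it observed and writes the queried server's address as the source.

```python
import struct


def read_question(msg: bytes):
    """Return (txid, is_response, qname, qtype, qclass) from a DNS message.

    Assumes the question name is not compressed, which holds for queries
    and for the question section echoed at the start of a response.
    """
    txid, flags = struct.unpack("!HH", msg[:4])
    pos, labels = 12, []
    while msg[pos] != 0:
        length = msg[pos]
        labels.append(msg[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    qtype, qclass = struct.unpack("!HH", msg[pos + 1 : pos + 5])
    return txid, bool(flags & 0x8000), ".".join(labels), qtype, qclass


def accepts(query: bytes, sent_to, response: bytes, received_from) -> bool:
    """Sketch of typical match checks; nothing here checks correctness."""
    if received_from != sent_to:  # must come back from the (address, port) we queried
        return False
    q_id, _, q_name, q_type, q_class = read_question(query)
    r_id, r_is_resp, r_name, r_type, r_class = read_question(response)
    return r_is_resp and r_id == q_id and (r_name, r_type, r_class) == (q_name, q_type, q_class)


question = b"\x07example\x03com\x00" + struct.pack("!HH", 1, 1)  # example.com A IN
query = struct.pack("!HHHHHH", 0xBEEF, 0x0100, 1, 0, 0, 0) + question
# Forged reply: QR=1, same ID, same question. Answer section omitted in this sketch.
forged = struct.pack("!HHHHHH", 0xBEEF, 0x8180, 1, 0, 0, 0) + question
resolver = ("192.0.2.53", 53)
print(accepts(query, resolver, forged, ("192.0.2.53", 53)))   # True: source address spoofed
print(accepts(query, resolver, forged, ("203.0.113.9", 53)))  # False: wrong source
```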

That is why port randomization, query ID randomization, and 0x20 encoding can raise the difficulty of forging responses. They make guessing harder. They do not solve the problem of someone on the path seeing the actual query. Once the attacker can observe the real request, the value of this kind of randomization drops sharply.
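The arithmetic behind "they make guessing harder" can be sketched as follows. This is a back-of-the-envelope estimate, not a security proof. It assumes the ID, source port, and letter cases are independent and uniformly random. The 28232-port pool is also an assumption: it is the size of the default Linux ephemeral range 32768 to 60999, and resolvers often choose their own port ranges. `apply_0x20` and `guess_bits` are hypothetical helpers. An on-path observer can read every one of these bits in the query itself, so for that observer the guess space collapses to nothing.

```python
import math
import random


def apply_0x20(name: str, rng: random.Random) -> str:
    """Randomize the case of each letter in the query name ("0x20" encoding).

    The resolver remembers the pattern it sent and rejects replies whose
    echoed question does not reproduce it exactly.
    """
    return "".join(
        (c.upper() if rng.getrandbits(1) else c.lower()) if c.isalpha() else c
        for c in name
    )


def guess_bits(port_pool_size: int, name: str) -> float:
    """Rough number of bits an off-path forger must guess per query."""
    id_bits = 16                                # DNS transaction ID
    port_bits = math.log2(port_pool_size)       # randomized source port
    case_bits = sum(c.isalpha() for c in name)  # one bit per letter with 0x20
    return id_bits + port_bits + case_bits


print(apply_0x20("www.example.com", random.Random(1)))
print(guess_bits(1, "www.example.com"))      # 29.0: fixed port, ID + 0x20 only
print(guess_bits(28232, "www.example.com"))  # about 43.8 with a 28232-port pool
```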

How It Differs from Normal Cache Staleness

When records have been changed but users still connect to the old address, people often first suspect that the TTL has not expired yet. That guess is often right, but it is not the same thing as pollution.

The usual signs of stale cache are:

The returned value used to be valid
Different recursive resolvers diverge gradually as TTL runs down
When you ask the same recursive resolver directly, the TTL keeps decreasing

Pollution more often looks like this:

The returned value may never have belonged to the domain’s normal configuration
The wrong answer appears unusually fast, sometimes more uniformly than a normal local cache hit
Repeated queries may keep returning the same fake IP, the same redirect page, or a deliberately crafted NXDOMAIN
Switching to another protected resolution path changes the result immediately
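The heuristic below turns these signs into a first-pass check. It is a sketch, not a detector. `classify_mismatch` is a hypothetical helper, and it assumes you already know the zone's current and previous values and have sampled the TTL a few seconds apart. A value the zone used to publish, with a TTL that keeps dropping, points to stale cache. A value the zone never published points toward pollution.

```python
from typing import List, Set


def classify_mismatch(answer: str, current: Set[str], previous: Set[str],
                      ttl_samples: List[int]) -> str:
    """Heuristic label for an unexpected answer from one recursive resolver.

    current:     values the zone publishes now
    previous:    values the zone published before the last change
    ttl_samples: TTLs from repeated queries to the same resolver, oldest first
    """
    if answer in current:
        return "ok"
    counting_down = all(b < a for a, b in zip(ttl_samples, ttl_samples[1:]))
    if answer in previous and counting_down:
        return "likely stale cache"
    if answer not in previous:
        return "possible pollution: value never belonged to this name"
    return "unclear: compare other resolution paths"


print(classify_mismatch("192.0.2.1", {"192.0.2.2"}, {"192.0.2.1"}, [300, 240, 180]))
# likely stale cache
print(classify_mismatch("198.51.100.7", {"192.0.2.2"}, {"192.0.2.1"}, [600, 600]))
# possible pollution: value never belonged to this name
```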

How It Differs from an Authoritative Misconfiguration

The first question to ask is: what happens if you query the authority directly?

If the authority is also wrong, suspect the zone file, local publishing flow, or delegation configuration first
If the authority is correct but the normal resolution path is wrong, suspect the recursive layer or path pollution first

You also need to distinguish whether the recursive resolver itself is wrong or whether it was raced on the way:

If the same recursive resolver returns the same wrong answer steadily from different network entry points, its cache or upstream policy may be at fault
If the same recursive resolver becomes correct as soon as you change the network path, the path itself is more likely being interfered with
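These two rules amount to a simple comparison. The sketch below uses a hypothetical helper, `recursive_or_path`, and reduces each answer to a single string for brevity. It only means something with results from at least two different network entry points.

```python
from typing import Dict


def recursive_or_path(answers_by_network: Dict[str, str], expected: str) -> str:
    """Compare what the same recursive resolver returned from different networks.

    answers_by_network maps a label for the network entry point (home ISP,
    office, VPN, ...) to the answer received there; expected is the
    authoritative value.
    """
    wrong = sorted(net for net, ans in answers_by_network.items() if ans != expected)
    if not wrong:
        return "no fault seen"
    if len(wrong) == len(answers_by_network) and len(set(answers_by_network.values())) == 1:
        return "resolver cache or upstream policy"
    return "path interference suspected on: " + ", ".join(wrong)


print(recursive_or_path({"home": "198.51.100.1", "office": "198.51.100.1"}, "192.0.2.10"))
# resolver cache or upstream policy
print(recursive_or_path({"home": "198.51.100.1", "vpn": "192.0.2.10"}, "192.0.2.10"))
# path interference suspected on: home
```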

That is why, during troubleshooting, it is best to test @authoritative, @known-recursive, and @local-resolver separately. If you only test one point, it is easy to mistake a path problem for a data problem.

Why Pollution Keeps Spreading

A single forged response on the path does not only break that one lookup. It may also enter cache.

Once a recursive resolver accepts the fake answer as valid, downstream clients no longer see an occasional bad packet. They see a stable wrong result.

A one-time path injection turns into a system-wide error for a period of time
The troubleshooting picture shifts from “some networks fail occasionally” to “all users behind the same recursive service fail”

So pollution usually has two layers:

The race on the path
The cache that keeps spreading the wrong answer
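The second layer can be modeled with a tiny TTL cache. `TtlCache` is a hypothetical class, not any resolver's implementation. It assumes the forged reply carried a one-hour TTL chosen by the injector. Real resolvers typically cap how long they honor a TTL, but within that cap the mechanism is the same: one accepted packet becomes the answer for every downstream client.

```python
class TtlCache:
    """Tiny positive cache: whatever is stored is served until its TTL runs out."""

    def __init__(self):
        self._entries = {}  # name -> (answer, expiry time in seconds)

    def store(self, name: str, answer: str, ttl: int, now: float) -> None:
        self._entries[name] = (answer, now + ttl)

    def lookup(self, name: str, now: float):
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]
        return None  # miss: the resolver would query upstream again


cache = TtlCache()
# One injected reply at t=0; the forger chose the TTL.
cache.store("www.example.com", "198.51.100.1", ttl=3600, now=0)
# Every downstream client asking within the hour gets the same wrong answer.
print([cache.lookup("www.example.com", now=t) for t in (10, 600, 3599, 3600)])
# ['198.51.100.1', '198.51.100.1', '198.51.100.1', None]
```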

The Most Common Real-World Symptoms

A fixed wrong address is returned

Multiple unrelated domains resolve to the same group of IPs in the affected network, usually pointing to a block page, warning page, or blackhole address.

The goal is usually not to return a plausible answer for the real service. It is to push traffic to a single handling point as quickly as possible.
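The shared-sink pattern is easy to look for once you have collected answers for a list of names on the suspect network. This is a minimal sketch. `shared_sinks` is a hypothetical helper, and the threshold of three names is arbitrary. CDNs and shared hosting also place unrelated names on the same addresses, so treat a hit as a lead to confirm against the authority, not as proof.

```python
from collections import defaultdict
from typing import Dict, List


def shared_sinks(results: Dict[str, List[str]], min_names: int = 3) -> Dict[str, List[str]]:
    """Return addresses that answer for at least min_names distinct names."""
    names_by_ip = defaultdict(set)
    for name, addresses in results.items():
        for ip in addresses:
            names_by_ip[ip].add(name)
    return {ip: sorted(names) for ip, names in names_by_ip.items() if len(names) >= min_names}


observed = {
    "news.example": ["198.51.100.1"],
    "video.example.org": ["198.51.100.1"],
    "chat.example.net": ["198.51.100.1"],
    "shop.example": ["192.0.2.5"],
}
print(shared_sinks(observed))
# {'198.51.100.1': ['chat.example.net', 'news.example', 'video.example.org']}
```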

`NXDOMAIN` is returned directly

Some pollution does not send you to a wrong address. It makes the name look nonexistent instead. Negative caching can amplify that result.

Even if the record recovers later, or should have existed all along, the failure may persist for a while.
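How long a negative answer lingers follows the rule in RFC 2308: the negative TTL is the smaller of the SOA record's own TTL and its MINIMUM field. Below is a minimal sketch. `negative_ttl` is a hypothetical helper, and `resolver_cap` is an assumed per-resolver ceiling whose real value depends on the implementation and its configuration. If a resolver accepts a forged NXDOMAIN, the SOA that packet carries feeds this same calculation.

```python
from typing import Optional


def negative_ttl(soa_ttl: Optional[int], soa_minimum: Optional[int], resolver_cap: int) -> int:
    """Seconds an NXDOMAIN may stay in a resolver's negative cache.

    RFC 2308: the negative TTL is min(SOA record TTL, SOA MINIMUM field).
    A negative answer carrying no SOA should not be cached, modeled here
    as 0. resolver_cap stands in for the resolver's own ceiling.
    """
    if soa_ttl is None or soa_minimum is None:
        return 0
    return min(soa_ttl, soa_minimum, resolver_cap)


print(negative_ttl(3600, 86400, resolver_cap=10800))  # 3600
print(negative_ttl(None, None, resolver_cap=10800))   # 0
```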

Only certain types or certain domains are polluted

Pollution does not have to cover all DNS traffic. Many systems only target certain sensitive domains, specific record types, or plain UDP queries.

That makes the symptoms look unstable:

A fails while AAAA succeeds
Plain DNS fails while DoH works
Some domains are always wrong while others are completely normal

How to Verify It in Practice

Start by separating the layers, not by changing the record immediately

You can confirm things in this order:

Query the authority directly to confirm the source data is correct
Query a known trusted public recursive resolver to confirm the normal recursive path is correct
Query the local recursive resolver or system default DNS to see whether the error only appears on the current path
Switch to DoT/DoH or another exit network and see whether the result changes immediately

This set of comparisons usually lands in one of three places:

The authoritative data layer
The recursive cache layer
The transport path layer
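This order of comparisons reduces to a small decision function. The sketch uses a hypothetical name, `locate_layer`. It assumes each probe has already been reduced to a yes/no verdict against the authoritative answer, for example by running `dig` against each server. It also assumes the last probe re-asks the resolver that answered wrongly, this time over DoT/DoH or from another exit network.

```python
def locate_layer(authority_ok: bool, trusted_recursive_ok: bool,
                 local_ok: bool, encrypted_or_other_exit_ok: bool) -> str:
    """Map the four comparisons to the layer that most likely failed."""
    if not authority_ok:
        return "authoritative data layer"
    if trusted_recursive_ok and local_ok:
        return "no fault seen on these paths"
    if encrypted_or_other_exit_ok:
        # Wrong over the plain path, right as soon as the path changes.
        return "transport path layer"
    # Still wrong from the same resolver even over a protected path.
    return "recursive cache layer"


print(locate_layer(True, True, False, True))   # transport path layer
print(locate_layer(True, True, False, False))  # recursive cache layer
```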

Pay attention to answers that arrive “too fast”

If a wrong response comes back suspiciously quickly, faster than a round trip to the server you queried could plausibly take, suspect path racing. This is especially true for the first query, when there should not yet be a cache hit.
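One way to put a number on "too fast" is to compare response time against a separately measured round-trip time to the queried server. Even a cache hit on that server needs one full round trip. This is a sketch under stated assumptions. `faster_than_rtt` is a hypothetical helper, the RTT samples are assumed to come from some other measurement (for example, TCP handshake timing), and the 0.5 margin for jitter is arbitrary.

```python
from typing import List


def faster_than_rtt(response_ms: float, rtt_samples_ms: List[float], margin: float = 0.5) -> bool:
    """True if a DNS reply arrived well before a round trip to the queried server.

    A reply much faster than the fastest measured RTT was probably not
    sent by that server. margin is an arbitrary allowance for jitter.
    """
    return response_ms < margin * min(rtt_samples_ms)


print(faster_than_rtt(4.0, [82.0, 85.5, 90.1]))   # True
print(faster_than_rtt(84.0, [82.0, 85.5, 90.1]))  # False
```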

Watch for negative caching

If the pollution result is NXDOMAIN, you also need to check whether the local resolver or recursive server has already cached that negative answer. Otherwise, even after you return to the normal path, the observed result may still be affected by the old negative cache entry.

Why Classic Mitigations Only Help Partially

Port randomization and query ID randomization

These measures significantly raise the bar for attackers who are off the path, because they make the transaction details harder to guess. But they assume the enemy is someone trying to guess packets, not someone who can see packets.

Once the attacker is on the path and can see the real query content, this randomization is no longer a decisive defense.

DNSSEC

DNSSEC answers a different question: whether the answer came from an authorized signer and whether it was altered in transit. It does not answer the question of whether someone on the path tried to insert a fake packet first. If the client or recursive resolver really performs strict validation, a forged answer is much harder to accept as valid authority data even if it arrives first.

The real-world limitation is that:

Deployment is uneven
The validation chain must actually be complete to matter
Many troubleshooting scenarios cannot assume the terminal side is doing strict validation

So DNSSEC is an important reinforcement, but it is not a magic switch that makes the problem go away.

DoT / DoH

DoT and DoH protect the query and response inside an encrypted transport, making it much harder for the middle of the path to observe or forge classic cleartext DNS traffic.

They do not change DNS delegation or caching structure, but they do change the premise that pollution relies on most: a middlebox can easily see and race a UDP 53 query.

In many pollution cases, switching to DoT/DoH improves things immediately because the easiest part of the path to attack is no longer exposed.
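The sketch below shows what the middlebox loses by building a DoH GET request URL as defined in RFC 8484. `doh_get_url` is a hypothetical helper, and `https://dns.example/dns-query` is a placeholder endpoint. The DNS message itself is unchanged. It rides inside an HTTPS request, so an observer on the path sees a TLS connection to the DoH server rather than a readable UDP 53 query it can answer first.

```python
import base64
import struct


def doh_get_url(endpoint: str, name: str, qtype: int = 1) -> str:
    """Build an RFC 8484 GET URL for a wire-format DNS query.

    The message is base64url-encoded with '=' padding removed and passed in
    the "dns" parameter. The ID is 0, which RFC 8484 recommends so that
    identical queries stay cache-friendly over HTTP.
    """
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    message = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0) + qname + struct.pack("!HH", qtype, 1)
    token = base64.urlsafe_b64encode(message).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={token}"


# Placeholder endpoint; the request would travel inside TLS.
print(doh_get_url("https://dns.example/dns-query", "www.example.com"))
# https://dns.example/dns-query?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB
```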

What Engineering Should Actually Remember Today

DNS pollution is not “the DNS server is broken”. It is “the trust boundary in the resolution path has been lost”. If you still think of DNS as “some server gives me an answer”, you will naturally keep staring at the authority and the zone file during troubleshooting. What you really need to find is which hop replaced the answer first.

When you implement, capture, or operate this path, keep three questions in mind:

Is the source data correct
At which layer does the wrong answer first appear
Has that wrong answer been cached and spread further

If you separate those three things, DNS pollution stops looking like “DNS is misconfigured” and becomes “which layer returned the wrong answer first”.

If you next want to understand why a forged answer should have been detected, continue with DNSSEC. If you want to understand why some services move resolution control back into the application itself, continue with HTTPDNS.
