How DNS Works in Linux. Part 2: All Levels of DNS Caching

A comprehensive deep-dive into every layer of DNS caching in Linux, from kernel conntrack and systemd-resolved to application-level caches in Java, Go, and Python, plus container orchestration with Docker, Podman, and Kubernetes CoreDNS.

Series Navigation

  1. How DNS Works in Linux. Part 2: All Levels of DNS Caching (Current)
  2. How DNS Works in Linux. Part 3: Understanding resolv.conf, systemd-resolved, NetworkManager and Others
  3. How DNS Works in Linux. Part 4: DNS in Containers
DNS Caching in Linux

Introduction

DNS caching is a multi-layered system that prevents inefficient repeated network queries. Caching accelerates performance but creates additional complexity when troubleshooting — old DNS records can persist even after updates.

What Is DNS Cache and Why Is It Needed

DNS cache operates as a hierarchical mechanism across system components rather than a single storage unit. Each DNS query represents expensive network latency, prompting various layers to cache results locally.

TTL (Time To Live) defines the cache record lifespan in seconds, after which data requires a refresh.
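As a minimal sketch (illustrative only, not any particular resolver's implementation), a TTL-bound cache entry is valid until its insertion time plus the TTL:

```python
import time

class TTLCache:
    """Minimal TTL-bound cache: entries expire `ttl` seconds after insertion."""

    def __init__(self):
        self._store = {}  # name -> (value, expiry timestamp)

    def put(self, name, value, ttl):
        self._store[name] = (value, time.monotonic() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None          # cache miss: a real resolver would query upstream
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[name]
            return None          # expired: the record must be re-fetched
        return value

cache = TTLCache()
cache.put("example.com", "93.184.216.34", ttl=30)
print(cache.get("example.com"))  # served from cache while the TTL has not elapsed
```

Every caching layer described below is, at its core, a variation on this structure, differing mainly in where it sits and how it bounds the TTL.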

Benefits:

  • Enhanced speed for repeated DNS queries
  • Reduced infrastructure load
  • Resilience during temporary connectivity loss
  • Minimized application delays

Drawbacks: Stale data can cause service disruptions during infrastructure changes.

DNS Caching Levels in Linux

Kernel Level

The Linux kernel implements indirect "caching" via netfilter/conntrack stateful tracking. Though UDP is connectionless, conntrack pairs each query with its response and keeps the entry for nf_conntrack_udp_timeout (30 seconds by default), which is what NAT and firewall rules like iptables -m conntrack --ctstate ESTABLISHED,RELATED rely on.

Diagnostic commands:

conntrack -L -p udp --dport 53    # list tracked DNS "connections"
conntrack -D -p udp --dport 53    # delete them, forcing fresh conntrack entries

System Level

1. systemd-resolved

Modern systemd distributions employ systemd-resolved as the primary DNS client offering:

  • Native DNS query caching (positive and optional negative responses)
  • Split DNS support via systemd-networkd or NetworkManager
  • DNS-over-TLS with fallback servers and DNSSEC
  • D-Bus API (org.freedesktop.resolve1) for management via resolvectl

2. nscd (Name Service Cache Daemon)

A legacy daemon caching NSS calls including DNS. Configuration in /etc/nscd.conf:

enable-cache hosts yes
positive-time-to-live hosts 600
negative-time-to-live hosts 20

Note: nscd is deprecated in glibc; modern distributions ship systemd-resolved (or sssd) instead, and nscd is typically absent unless installed explicitly.

3. Popular Local Caching Resolvers

dnsmasq configuration:

cache-size=1000
neg-ttl=60

unbound configuration:

msg-cache-size: 50m
rrset-cache-size: 100m
cache-min-ttl: 30
cache-max-ttl: 86400

bind configuration:

options {
    max-cache-ttl 86400;
    max-ncache-ttl 3600;
};

Application and Programming Language Level

glibc

While officially lacking full DNS caching, glibc implements NSS (Name Service Switch) delegating to nscd, systemd-resolved, or other caching components via /etc/nsswitch.conf:

hosts: files resolved myhostname

When nscd runs, glibc functions like getaddrinfo() transparently utilize its cache. Internally, glibc contains minor optimizations like socket reuse within single function calls — not inter-process caching.
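Because CPython's socket.getaddrinfo() is a thin wrapper over the libc call, this chain is easy to observe from Python: whatever cache sits below libc is used transparently. Resolving localhost works even offline, because the files NSS source answers it from /etc/hosts:

```python
import socket

# getaddrinfo() goes through the libc NSS chain configured in /etc/nsswitch.conf,
# so any caching layer below libc (nscd, systemd-resolved) is used transparently.
infos = socket.getaddrinfo("localhost", 80, proto=socket.IPPROTO_TCP)
addrs = sorted({info[4][0] for info in infos})
print(addrs)  # typically 127.0.0.1, ::1, or both, served by the 'files' source
```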

Java

Java maintains built-in DNS caching with configurable TTL:

Without a Security Manager (the common case):

  • Positive responses: cached for 30 seconds by default (networkaddress.cache.ttl)
  • Negative responses (NXDOMAIN): cached for 10 seconds (networkaddress.cache.negative.ttl)

With a Security Manager installed:

  • Positive responses: cached indefinitely
  • Negative responses: cached for 10 seconds

Configuration: note that networkaddress.cache.ttl is a security property read from $JAVA_HOME/conf/security/java.security, so passing it with -D has no effect. On the command line, use the legacy system property instead:

java -Dsun.net.inetaddr.ttl=60 -jar app.jar

Or programmatically, before the first lookup:

Security.setProperty("networkaddress.cache.ttl", "60");

Go

Go offers two resolution models:

Via cgo: Invokes system getaddrinfo() using /etc/nsswitch.conf and /etc/resolv.conf

Pure Go resolver: Custom DNS client bypassing system libraries

Selection:

GODEBUG=netdns=go
GODEBUG=netdns=cgo

Contrary to popular belief, the pure Go resolver does not cache: the standard library's net package keeps no DNS cache at all, so every lookup is sent to the servers listed in /etc/resolv.conf. Any caching must come from the OS layer (nscd, systemd-resolved, a local caching resolver) or from an application-level cache built into the program.

Python

The standard socket module and most libraries (requests, urllib3) use getaddrinfo() without built-in caching, relying on system resolvers.

dnspython supports explicit opt-in caching:

import dns.resolver

resolver = dns.resolver.Resolver()
resolver.cache = dns.resolver.Cache()  # in-memory cache that respects record TTLs
answer = resolver.resolve("example.com", "A")  # repeat queries are served from the cache

Containers and Orchestration

Docker

Containers on the default bridge network receive a copy of the host's /etc/resolv.conf (with localhost addresses filtered out), while containers attached to user-defined networks use Docker's embedded DNS server at 127.0.0.11, implemented by dockerd via libnetwork. With --network=host, the container uses the host's resolver configuration directly.

Podman

CNI backend (default in Podman 3.x and earlier): Uses dnsname plugin for DNS in virtual networks. Caching occurs within containers using systemd-resolved, dnsmasq, or other local resolvers; dnsname itself doesn't cache.

Netavark backend (default in Podman 4.0+): Runs aardvark-dns lightweight server supporting A and PTR records, forwarding external requests to system DNS without caching responses.

Kubernetes

CoreDNS handles Kubernetes DNS with differentiated behavior:

Internal records (e.g., my-svc.my-namespace.svc.cluster.local): Served by the kubernetes plugin, which answers from its Kubernetes API watch with a configurable TTL (the ttl option; 5 seconds by default, commonly raised to 30).

External requests (e.g., google.com): Handled by forward plugin; without cache plugin, every query hits upstream resolvers.

Enable caching by adding the cache plugin to Corefile:

.:53 {
    errors
    health
    cache 30 .
    forward . 8.8.8.8 1.1.1.1
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    reload
}

Proxy Servers: Nginx and HAProxy

With a hostname written literally in proxy_pass, Nginx resolves it once at startup or configuration reload and never re-resolves, which is problematic in container environments with dynamic IPs. Solution: put the hostname in a variable used by proxy_pass and add a resolver directive with valid=30s, which forces re-resolution every 30 seconds regardless of the record's TTL.
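A sketch of that workaround (the upstream hostname and resolver address here are placeholders; adapt them to your environment):

```nginx
# resolver: where Nginx sends its own runtime DNS queries;
# valid=30s caps how long a resolved address is reused, overriding the record TTL.
resolver 8.8.8.8 valid=30s;

server {
    listen 80;
    location / {
        # Using a variable forces Nginx to resolve the name at request time
        # (subject to the resolver cache) instead of once at configuration load.
        set $backend "http://app.internal.example";
        proxy_pass $backend;
    }
}
```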

HAProxy offers flexible DNS resolution via a resolvers block with parameters:

resolvers mynameservers
  nameserver ns1 192.168.2.10:53
  nameserver ns2 192.168.3.10:53
  hold valid    10s
  hold obsolete 0s
  resolve_retries 3
  timeout retry 1s
  timeout resolve 2s

Monitoring note: Watch logs for "DNS resolution failed" (HAProxy) and "upstream timed out" (Nginx).

Alpine Linux caveat: musl's built-in resolver has no cache; it queries all configured nameservers in parallel with fixed timeout/retry behavior, so every getaddrinfo() call triggers a new DNS query.

Special Caching Cases

Negative Caching (NXDOMAIN)

Non-existent domain responses are cached per RFC 2308. TTL varies by resolver:

  • Unbound: the lesser of the SOA TTL and SOA MINIMUM, capped by cache-max-negative-ttl (3600 seconds by default)
  • Dnsmasq: TTL from the upstream SOA record; the neg-ttl option supplies a default when the response lacks one
  • systemd-resolved: caches negative responses by default (Cache=yes); set Cache=no-negative to disable
  • Bind: TTL from the SOA Minimum field, capped by max-ncache-ttl

Stale DNS (Serve Expired)

Some resolvers support RFC 8767 "serve expired" mode — returning stale cache entries when upstream is unavailable. Useful for resilience but risks using outdated records.
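In Unbound, for example, this mode is switched on in unbound.conf (the values below are illustrative, not recommendations):

```conf
server:
    serve-expired: yes          # allow answering from expired cache entries
    serve-expired-ttl: 3600     # serve entries at most an hour past expiry (0 = no limit)
```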

Diagnostics: Determining DNS Response Origin

Compare results across tools to identify the caching level:

1. System resolver (libc):

getent hosts example.com

2. Direct systemd-resolved query:

dig @127.0.0.53 example.com
resolvectl query example.com

3. External DNS bypass:

dig @8.8.8.8 example.com

4. View TTL values:

dig +nocmd +noall +answer example.com

Differences in IP addresses or TTL values indicate caching at specific layers.

Network Traffic Monitoring

tcpdump -i any -n udp port 53                    # watch queries live
tcpdump -i any -n udp port 53 -w dns_dump.pcap   # or capture to a file for later analysis

Absence of traffic suggests a cache hit.

Example output:

13:37:45.123 IP 10.0.0.1.4321 > 8.8.8.8.53: 12345+ A? google.com. (28)
13:37:45.456 IP 8.8.8.8.53 > 10.0.0.1.4321: 12345 1/0/0 A 142.250.185.174 (44)

System Call Tracing

strace -e trace=network curl example.com

Observable sendto() and recvfrom() calls confirm network-based resolution versus cache.

Common Problem Solutions

| Situation | Cause | Solution |
| --- | --- | --- |
| "Site opens everywhere except for me" | Stale browser/system cache | Compare the IP ping returns with the expected one; clear caches |
| "Changed A-record but traffic goes to old server" | Insufficient TTL planning | Set a short TTL (60s) 24-48 hours before the change |
| "curl works, browser doesn't" | Browser DNS cache | Clear the browser cache |
| "Nginx stopped routing after DNS change" | Nginx cache (valid parameter) | Reduce valid=1s or restart |
| "Library uses old address after flush" | Language/library internal cache | Configure TTL in code or restart the application |

Cache Clearing Checklist

| Category | Software | Command | Note |
| --- | --- | --- | --- |
| System | systemd-resolved | resolvectl flush-caches | Clears the entire cache |
| System | nscd | sudo nscd -i hosts | Invalidates the hosts cache |
| System | dnsmasq | sudo systemctl restart dnsmasq | Restart clears the cache |
| System | unbound | unbound-control reload | Full cache purge (flush takes a specific name) |
| System | unbound (zone) | unbound-control flush_zone example.com | Zone-specific purge |
| System | bind | rndc flush | Full cache purge |
| System | bind (name) | rndc flushname example.com | Purges a single name |
| System | bind (subtree) | rndc flushtree example.com | Purges a name and everything below it |
| Application | Chrome | chrome://net-internals/#dns | Browser interface |
| Application | Firefox | about:networking#dns, "Clear DNS Cache" | Browser interface |
| Application | Java/Go | Restart the application | In-process caches die with the process |
| Container | docker/podman | docker restart <container> / podman restart <container> | Container restart |
| Container | CoreDNS | kubectl rollout restart deployment/coredns -n kube-system | New pods start with empty caches; restart client pods too |

Best Practices for Cache Management

Development

  1. Use short TTLs for test domains (60-120 seconds)
  2. Set minimal NXDOMAIN TTL (5-30 seconds) via resolver config
  3. Know cache-clearing methods for each component
  4. Test DNS changes in isolated environments

Production

  1. Plan DNS changes; lower TTL beforehand
  2. Monitor caching via stale answer and NXDOMAIN logging
  3. Deploy centralized resolvers with controlled caching
  4. Configure appropriate TTLs:
    • Frequently changing services: 60-300 seconds
    • Static data: 3600+ seconds

Key metrics: Cache hit rate, stale responses, NXDOMAIN rate, query latency.

System Resolver Monitoring

| Resolver | Command | Purpose |
| --- | --- | --- |
| systemd-resolved | resolvectl statistics | Cache hits/misses |
| nscd | nscd -g | Per-database cache statistics |
| dnsmasq | log-queries and log-facility=/var/log/dnsmasq.log in the config | Query logging |
| unbound | unbound-control stats_noreset | Server statistics |
| unbound | unbound-control dump_cache | Cache content analysis |
| bind | rndc stats | Server statistics |
| bind | rndc dumpdb -cache | Cache content analysis |

Kubernetes Monitoring

CoreDNS metrics enable identification of caching efficiency, name resolution errors (NXDOMAIN spikes), and abnormal latency.

Example metrics:

sum(rate(coredns_cache_hits_total{type="success"}[5m])) / sum(rate(coredns_cache_requests_total[5m]))
rate(coredns_dns_responses_total{rcode="NXDOMAIN"}[5m])
histogram_quantile(0.99, rate(coredns_dns_request_duration_seconds_bucket[5m]))

Example alerts:

rate(coredns_dns_responses_total{rcode="NXDOMAIN"}[5m]) > 10
histogram_quantile(0.99, rate(coredns_dns_request_duration_seconds_bucket[5m])) > 1
rate(coredns_cache_misses_total[15m]) / rate(coredns_cache_requests_total[15m]) > 0.5

Conclusion

Effective DNS cache management requires understanding where caching occurs, planning DNS changes strategically, diagnosing and clearing cache at appropriate levels, and configuring TTL values matching data characteristics.

The forthcoming third part will explore resolver interactions (glibc, systemd-resolved, dnsmasq, NetworkManager), the role of 127.0.0.53 in /etc/resolv.conf, and identifying the actual DNS system manager.