Domain Name System Primer
In this white paper, we give an overview of the Domain Name System, or DNS, one of the pillars of the Internet. We start by understanding the goal: to assign names to named resources on the Internet and to maintain their database. For this, it is important to understand the structure of domain names and DNS zones. The actors in the system are domain maintainers, registries and Network Information Centers; the structure of delegation of authority among them will also be clarified. We give an overview of the structure of data available in the DNS, notably, the resource records (RRs) occurring in zone files. We also review the technology side: the DNS protocol, its operations supporting name resolution queries, the zone file transfers necessary to maintain the system, and reverse mapping. The latter necessitates a little insight into netblocks and Classless Inter-Domain Routing (CIDR). We briefly mention the most popular implementations, notably, BIND, which may be the most prevalent DNS server software. We address the internal security issues of the DNS as well as the crucial role it plays in cybersecurity. Finally, we provide some references for further reading.
Table of contents
- 1. The need for name servers
- 2. Data behind the name resolution
- 3. DNS operations
- 4. Name Servers
- 5. A simple query example
- 6. Security
- 7. Passive DNS
- 8. Summary and further reading
1. The need for name servers
1.1. What is DNS?
Any network of digital devices operates by using addresses - technical numbers which enable the identification of the nodes. On the Internet, these are IP addresses. However, it is always necessary to give human-readable names to the addressable resources, thereby turning them into "named resources". Consequently, there has to be a technique to map the names into addresses; this is done by name servers.
On a large-scale network, such as the Internet, there is a tremendous number of named resources. This imposes several requirements on any solution to the name-address mapping problem:
- There is a need for a method to organize and index names in order to efficiently find them in the system.
- It has to be decentralized for several reasons:
  - The solution needs to be scalable in order to cope with the huge number of queries for name-address assignments to be served.
  - It has to be fault-tolerant; thus, there has to be some reserve in case any element of the required infrastructure is unavailable.
  - As the resources are run by physical entities (persons or organizations), it needs to be manageable so that the administration of certain resources can be delegated to their owners.
These requirements led to the introduction of the Internet Domain Name System in the early days of the Internet. This ecosystem has been playing a crucial role in the operation of this network ever since. Its specifications were laid down by Dr. P. Mockapetris as early as 1987, in the RFC documents 1034 and 1035. Though many subsequent RFCs have introduced modifications, the core functionality of the system still remains intact.
1.2. Domain name system and WHOIS
To meet the above-outlined requirements, the names of the resources are organized into a hierarchical structure. At the top, there is the name of the top-level domain (TLD), then the second-level domain (SLD), and any number of lower levels, each separated by dots, e.g., "www.example.net". In the written name, the highest level stands at the right end: here "net" is the TLD and "example.net" is the SLD. In this way, the management of a sub-tree in the hierarchy can be delegated to the actual owner of the resources below the top of this hierarchy. The authority over the root domain of the Internet is with ICANN (Internet Corporation for Assigned Names and Numbers, www.icann.org).
Below this, for instance, is the TLD ".com" operated by Verisign (though the actual registrations of its sub-domains are processed via registrars accredited by ICANN), whereas "domainwhoisdatabase.com" belongs to WhoisXML API, Inc.: we, as an organization, administer this SLD authoritatively. There are plenty of top-level domains on the Internet. Some of them are so-called country-code TLDs (ccTLDs) maintained by the respective entities of the given countries, and there are generic TLDs (gTLDs) related to other entities. Domains are registered by registrars.
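To make the hierarchy tangible, here is a minimal Python sketch that lists the chain of domains a name belongs to, from the TLD downwards. The function name domain_levels is our own choice for illustration; it is not part of any DNS software.

```python
def domain_levels(name):
    """Return the chain of domains from the TLD down to the full name."""
    # A trailing dot (the root) is optional in everyday notation; drop it.
    labels = name.rstrip(".").split(".")
    # Build the suffixes from right to left, since the TLD is the last label.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(domain_levels("www.example.net"))
# prints ['net', 'example.net', 'www.example.net']
```

Each element of the result is a point in the tree at which authority could, in principle, be delegated to a different manager.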
When someone, say a company, purchases as a registrant a domain name from a registrar, the latter submits, after the necessary agreements, technical data to appear in the zone files we shall describe later. After this, we say the domain name "will resolve", or get the respective IP addresses in the Domain Name System. The technical data are thus located in the DNS, along with some information about the registrant entity. But not all information, unfortunately.
By design, there is a protocol separate from those used for name resolution — WHOIS, the "phone book of the Internet" which assigns real names and contact data to the registrants, the physical entities the resource belongs to. The WHOIS sub-system is thus crucial in all questions related to the ownership of domains and IP addresses, but the accuracy of WHOIS data is not a technical requirement for the domain to operate.
Meanwhile, in the DNS, all the necessary data have to be present for this operation, but the ownership data are limited. This dichotomy of WHOIS and the other parts of DNS is frequently seen as a serious shortcoming affecting the security of both subsystems. And yet, we have to live with this, as it is a consequence of the approach of the founding fathers of the Internet, who initially saw it as a network of a more-or-less trusted and friendly community. Well, it is not quite what it became.
In the present document, we will not deal with the WHOIS subsystem anymore. Even though it is a part of the domain name system, the system itself is fully functional without it. Instead, we shall focus on name servers, since these are the first which come to mind when speaking about DNS anyway.
Before turning our attention to the actual operation of name servers and the DNS, we will mention briefly a few related topics which will not be covered in detail in this document as they are only loosely related to our main topic.
1.3. Multicast DNS
Consider a local network, possibly of many computers. It is natural to wonder whether they need the same technology as the whole Internet to manage named resources. Indeed, there is a simpler solution for them: RFC 6762 specifies the "Multicast DNS protocol" (mDNS), which does not employ dedicated servers to maintain the name-IP assignment. If a certain site needs the IP address of another, it simply asks all nodes which of them identifies itself under the given name.
Obviously, this will only work out in the case of smaller and trusted networks, but it is a great simplification. In addition, the data formats of the mDNS protocol are almost fully compatible with those of the standard DNS protocol (referred to as "Unicast DNS" in this context). However, as we are interested in the operation of the Internet on a large scale, involving authority and delegation questions, we will not go into the details of this protocol.
1.4. IPv6
Even though the number of possible IPv4 addresses, 2^32, is quite impressive, it can be foreseen that these possibilities will be exhausted at some point in the future. Hence, IPv6, a new system of identification numbers for the nodes of the Internet, was developed. There will be times when your Web server IP will not look something like "18.104.22.168" but, rather, more like "2001:0db8:85a3:0000:0000:8a2e:0370:7334".
The technology for this has been developed, including its support in the Domain Name System. But it is not yet prevalent and still, to some extent, in its experimental phase. So, we shall omit the details of IPv6 handling in the Domain Name System in the present document and focus on the currently common IPv4 system.
1.5. Beyond DNS: The dark side
When someone speaks of the Internet (with capital "I"), everybody considers the network we all use and refer to under this name. This is very much in line with ICANN's motto, "One World, One Internet". We have just concluded that DNS is needed for the efficient operation of this network.
But actually, a TCP/IP network has many layers, and it is just a broadly accepted convention that it should be used via DNS. We shall see that this system that enables finding resources consists of files describing the required access information and protocols to distribute and access them. But, fortunately or not, it is not impossible for someone to introduce an alternative system on the same physical network that might use completely different standards and yet still remain operational.
Indeed, it is feasible. What may be the most significant example is the Tor network. It is a totally different logical network running on our physical Internet. It is hard to judge whether it is good or bad. According to its developers, its main goal is to protect privacy, and it is very beneficial for many benevolent actors who just want to avoid being tracked or eavesdropped on while using the Internet. In reality, however, it is also known to be a home of the "Dark Web", the online world of crime and nasty things not to be detailed here.
The reason for us to mention this here is to point out that the Internet Domain Name System we describe here is not the only approach that exists on the physical IPv4 network, but it is what is running the thing we call the Internet. And currently (probably luckily), this is the most prevalent one.
2. Data behind the name resolution
2.1. Zones and zone files
A DNS zone is a contiguous portion of the domain name space for which a single entity has been delegated as manager. In the tree of the namespace, a zone starts at the root of the given domain and ends either at a leaf node, i.e., a host, or at the boundary of another independently managed zone.
Zone files are the very containers of all data describing the information necessary for the name resolution of the zone. They are text files with contents standardized by RFC 1035. (Actually, there are certain conventions used by BIND, the most prevalently used DNS server implementation which does not comply fully with this standard, but they are now generally accepted.) Thus, zone files are both human-readable and machine-parsable: DNS software reads the information from these.
Our goal here is to obtain a basic understanding of the contents of zone files, as it is needed in order to understand DNS operations.
The contents of zone files can be subdivided into three types:
- Comments
As in virtually all kinds of computer code, they are there for human readability. Here, they start with the ";" character.
- Directives
These start with a "$" sign. They manage the processing of the file.
- Resource records
Those are the actual data lines describing the properties of the domain and the entities contained within.
Let us see a little example of a zone file:
$TTL 86400 ; 24 hours could have been written as 24h or 1d
; $TTL used for all RRs without explicit TTL value
$ORIGIN example.com.
@       1D  IN  SOA  ns1.example.com. hostmaster.example.com. (
                     2002022401 ; serial
                     3H         ; refresh
                     15         ; retry
                     1w         ; expire
                     3h         ; nxdomain ttl
                     )
            IN  NS     ns1.example.com.    ; in the domain
            IN  NS     ns2.smokeyjoe.com.  ; external to domain
            IN  MX 10  mail.another.com.   ; external mail provider
; server host definitions
ns1         IN  A      192.168.0.1         ; name server definition
www         IN  A      192.168.0.2         ; web server definition
ftp         IN  CNAME  www.example.com.    ; ftp server definition
; non server domain hosts
bill        IN  A      192.168.0.3
fred        IN  A      192.168.0.4
joe         IN  A      192.168.0.2
Most directives are not very important to us, except for the mandatory $TTL directive which defines the Time to Live (TTL) value. This is the default duration for which the Resource Records can be saved or cached by another DNS server.
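The comment in the example notes that 86400 seconds could also be written as 24h or 1d. As a small illustration, the following Python sketch converts such time values to seconds. The function name ttl_to_seconds and the unit table are our own; accepting combined values such as "1h30m" is an assumption modeled on BIND-style shorthand, not something the standard zone file format prescribes.

```python
import re

# Seconds per unit letter, as used in BIND-style time shorthand.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def ttl_to_seconds(value):
    """Convert a time value such as '86400', '24h', '1d' or '1w' to seconds."""
    text = value.strip().lower()
    if text.isdigit():
        # A plain number is already a number of seconds.
        return int(text)
    parts = re.findall(r"(\d+)([smhdw])", text)
    # Reject input that the pattern did not consume completely.
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError("unrecognized time value: %r" % value)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


print(ttl_to_seconds("24h"), ttl_to_seconds("1d"), ttl_to_seconds("86400"))
# prints 86400 86400 86400
```

Software that analyzes zone files written for BIND needs a normalization step of this kind before comparing time values.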
The $ORIGIN directive takes the name of the domain as its argument, but it is optional. If it is provided, the value of $ORIGIN is appended to any name that appears after it and does not end with a dot character ".".
The reason for this is that the file should use Fully Qualified Domain Names (FQDN). That is, each name should define the exact location of the domain in the DNS tree, and the terminating dot represents the root domain. In addition, the "@" character in the SOA resource record will be replaced by the value of $ORIGIN, in our example, "example.com.".
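The expansion rule just described can be sketched in a few lines of Python. The function name qualify is hypothetical; the sketch only covers the two cases discussed here (the "@" shorthand and names without a trailing dot).

```python
def qualify(name, origin):
    """Expand a zone-file name to an FQDN using the current $ORIGIN."""
    if not origin.endswith("."):
        raise ValueError("$ORIGIN itself must be fully qualified")
    if name == "@":
        # "@" is shorthand for the origin itself.
        return origin
    if name.endswith("."):
        # Already fully qualified: leave it untouched.
        return name
    # Relative name: append the origin.
    return name + "." + origin


print(qualify("joe", "example.com."))                # prints joe.example.com.
print(qualify("@", "example.com."))                  # prints example.com.
print(qualify("ns2.smokeyjoe.com.", "example.com.")) # prints ns2.smokeyjoe.com.
```

The last call shows why forgetting the terminating dot is a classic mistake: "ns2.smokeyjoe.com" without it would be expanded to "ns2.smokeyjoe.com.example.com.".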
2.2. Resource records
From our point of view, the most important elements are the Resource Records (RRs), as they are the ones containing the information on the zone. Let’s see what they tell us.
The first one, the SOA (Start of Authority) RR, has to be the first, and it is mandatory. It is a multi-line RR. Looking at our example, it should be read as follows:
- The "@" character is the name of the domain; as $ORIGIN has been set, it will be replaced by its value, "example.com.".
- The "1D" stands for one day; it is the TTL (Time to Live) of this very RR. If it is omitted, then the default $TTL is used.
- "IN" stands for the network class, "Internet" in our case. In practice, it is always "IN" in zone files; there are some other possibilities, but they almost never appear in practice.
- "SOA" stands for the record type.
- "ns1.example.com." is the Primary Master name server for this domain. It will be also specified in a separate RR, but it is mandatory here. (It can have a special meaning though when it is used with Dynamic DNS configurations).
- "hostmaster.example.com." stands for an e-mail address; the first dot should be read as "@", so it is "hostmaster@example.com". This is the administrative e-mail address for the zone, and according to the recommendation of RFC 2142, it is typically "hostmaster@domain".
- "2002022401" is a serial number associated with the zone; this is essentially the version number of the information. By convention, it uses the format of a date "yyyymmdd" followed by a two-digit serial number specifying the version within the day. This field has to be updated every time a change is made to the zone.
- The following time-type fields affect the operation of slave/caching name servers, which we shall describe in detail later.
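Two of the SOA conventions above lend themselves to a short illustration: reading the e-mail address out of the mailbox field, and producing the next serial number in the "yyyymmdd" plus two-digit format. The Python sketch below uses hypothetical helper names (rname_to_email, next_serial). It assumes that the first version of a day is numbered 01, as in our example, and it ignores mailboxes whose local part itself contains a dot.

```python
from datetime import date


def rname_to_email(rname):
    """Read the SOA mailbox field as an e-mail address (first dot becomes '@')."""
    local, _, domain = rname.rstrip(".").partition(".")
    return local + "@" + domain


def next_serial(current, today):
    """Return the next serial in the yyyymmddnn convention."""
    base = int(today.strftime("%Y%m%d")) * 100
    if current < base:
        # First change of the day: start at version 01 (an assumption).
        return base + 1
    # Another change on the same day: bump the two-digit counter.
    return current + 1


print(rname_to_email("hostmaster.example.com."))
# prints hostmaster@example.com
print(next_serial(2002022401, date(2002, 2, 24)))
# prints 2002022402
print(next_serial(2002022401, date(2002, 2, 25)))
# prints 2002022501
```

Whatever the convention, the only property the DNS itself relies on is that the serial increases with every change.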
Name server (NS) records. The first few fields are just the same as we saw in the SOA record. The "name" field is empty here, meaning that it is taken over from the preceding SOA record. (This is a general rule: if no name is given in a record of any type, the name of the most recent record that had one applies.) No TTL is specified, so the default $TTL applies. Finally, in our example, we have "ns1.example.com.", the FQDN of a name server within the zone, and "ns2.smokeyjoe.com.", which is the secondary name server in some other domain, typically at some other location. This increases the robustness of the system: even if the infrastructure of the whole domain fails for some (possibly technical) reason, a name server somewhere else in the world is likely to be available. Organizations typically find partners to run their secondary name server on a mutual basis (I back you up, you back me up).
Mail exchange (MX) records. These are the default mail servers for the domain. The syntax is just as in the case of the NS records, apart from the additional number before the last field. This is a priority level: it is a number between 0 and 65535. The lower the number, the higher priority a given mail server has.
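As a brief illustration of the priority rule, the following Python sketch orders a set of MX entries the way a sending mail server would try them. Only "mail.another.com." comes from our example zone; the two other host names and the function name delivery_order are hypothetical.

```python
# (priority, mail server) pairs; lower priority numbers are tried first.
mx_records = [
    (20, "backup.example.net."),   # hypothetical fallback server
    (10, "mail.another.com."),     # the server from the example zone
    (30, "lastresort.example.org."),  # hypothetical
]


def delivery_order(records):
    """Return the mail servers in the order they should be attempted."""
    # sorted() is stable, so entries with equal priority keep their order.
    return [host for _, host in sorted(records, key=lambda record: record[0])]


print(delivery_order(mx_records))
# prints ['mail.another.com.', 'backup.example.net.', 'lastresort.example.org.']
```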
Address (A) records. These are the very hosts. Each name which can be resolved has to have a name field (this is the first field) and an assigned IP (this is the last one). Note that the same IP can have multiple A records, like the Web server "www", and Joe's machine, "joe" in our example. Also note that since $ORIGIN is set, "joe" will be expanded to "joe.example.com.", illustrating how useful this directive can be.
Canonical name (CNAME) records. These are essentially aliases: the name in the first field is an alias for the name on the right. It can be used for many purposes. Importantly, the alias can point to a host outside the domain. A typical use of CNAME is to enable the Web server to be seen both as "example.com" and "www.example.com":
        IN  A      192.168.0.2
www     IN  CNAME  example.com.
The first line defines an IP resolving to $ORIGIN, that is, "example.com.", whereas the second one defines "www.example.com." as an alias to "example.com."
We reached the end of our example, and, in fact, what we understand so far is almost completely sufficient for the operation of a domain. The only exceptions are the records of type "PTR", the ones needed for finding out the host name from an IP. This is the topic of "reverse mapping", which we shall address in Section 3.2.
There are many other types of special records. For a more exhaustive list, we refer to the following blog https://www.whoisxmlapi.com/blog/dns-the-dark-knight-of-the-internet/ for a quick overview, or to the cited books for a more detailed account.
Having understood the structure of the information present in the domain name system, let us now proceed to how it is actually distributed and maintained.
3. DNS operations
Here we describe the operations of the Domain Name System. These are realized using dedicated protocols, involving both TCP and UDP communications. The standard port of this service is 53.
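To give an impression of what travels to port 53, here is a minimal Python sketch that assembles the bytes of a simple query message following the layout of RFC 1035: a 12-byte header followed by one question. The function name build_query and the fixed query ID are our own choices for illustration; the sketch only builds the message and does not send it.

```python
import struct


def build_query(name, qtype=1, query_id=0x1234, recursion_desired=True):
    """Assemble a DNS query message (header plus one question) as bytes."""
    # Flags: only the "recursion desired" bit (0x0100) is optionally set.
    flags = 0x0100 if recursion_desired else 0x0000
    # Header: ID, flags, 1 question, 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        if not 1 <= len(encoded) <= 63:
            raise ValueError("each label must be 1 to 63 bytes long")
        # Each label is preceded by its length.
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"  # the empty root label terminates the name
    # QTYPE (1 = A record by default) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


message = build_query("www.example.com")
print(len(message))
# prints 33
```

Such a message would typically be sent in a single UDP datagram; the recursion-desired bit tells the server which of the two query styles described in Section 3.1 the client is asking for.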
3.1. DNS Queries
This is the operation realizing the main goal of DNS: to translate names to IP addresses. Each networked device has a component, the stub resolver (or resolver in brief) for that purpose. If an application, e.g., a Web browser, needs the address of another system, e.g., for visiting "www.whoisxmlapi.com", it will ask the resolver: "What is the IP address of www.whoisxmlapi.com?" There are two possible ways for the resolver to get this information.
3.1.1. Iterative queries
This is the kind of query which must be supported by all name servers. The process, in this case, is as follows:
- The resolver asks the locally configured default name server about "www.whoisxmlapi.com".
- The locally configured nameserver looks up the address in its cache, which is built from previous queries.
- If it finds the address, it returns the answer along with the related CNAME records (aliases), and the query is completed. This answer is non-authoritative in this case.
- If the required information is not there in the cache, the local name server replies to the resolver with a referral to the root servers.
- The resolver asks the root server for the list of authoritative name servers for the given TLD, ".com." in our case.
- Using the answer, the resolver asks the TLD name server for the list of authoritative name servers of the SLD, ".whoisxmlapi.com." in our case.
- Finally, the resolver asks the authoritative name server of the SLD about the IP address of "www.whoisxmlapi.com", and receives the authoritative answer.
Apart from IP addresses (possibly with CNAME records and referrals), there can be answers showing a temporary or permanent failure, or reflecting the absence of the domain (NXDOMAIN), which are treated in the protocol just as one would logically expect.
Note that here all the communication went between the resolver and various name servers in several iterations, hence the name. No communication was going on between the name servers themselves, i.e., there was no recursion. But it is easy to see that if this was the only possibility, the cache of the local name server (or any other name server) would remain empty. Therefore, at least the local name server, and possibly some others, should support communication with other name servers. This leads us to the need for the other type of query.
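The iteration can be mimicked with a toy model in Python. Everything in the sketch below is hypothetical: the three "servers" are plain dictionaries, their names are invented, and the address 192.0.2.10 is taken from a range reserved for documentation. The point is only to show that the resolver itself follows each referral.

```python
# Toy servers: each maps a domain suffix to a referral or to an answer.
SERVERS = {
    "root": {"com.": ("referral", "com-server")},
    "com-server": {"whoisxmlapi.com.": ("referral", "sld-server")},
    "sld-server": {"www.whoisxmlapi.com.": ("answer", "192.0.2.10")},
}


def ask(server, name):
    """One query-response pair: return the most specific matching entry."""
    best = None
    for suffix, reply in SERVERS[server].items():
        if name == suffix or name.endswith("." + suffix):
            if best is None or len(suffix) > len(best[0]):
                best = (suffix, reply)
    return best[1] if best else ("nxdomain", None)


def resolve_iteratively(name, start="root", max_steps=10):
    """Follow referrals from the resolver side until an answer arrives."""
    server, trail = start, [start]
    for _ in range(max_steps):
        kind, value = ask(server, name)
        if kind == "answer":
            return value, trail
        if kind == "nxdomain":
            return None, trail
        # A referral: the resolver itself contacts the next server.
        server = value
        trail.append(server)
    raise RuntimeError("too many referrals")


print(resolve_iteratively("www.whoisxmlapi.com."))
# prints ('192.0.2.10', ['root', 'com-server', 'sld-server'])
```

The returned trail lists the servers the resolver had to contact one after the other; none of the servers ever talked to another server.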
3.1.2. Recursive queries
This type of query is not necessarily supported by name servers. It enables communication between the servers and thus supports building a cache. Let us see our previous example now in a scenario where the local name server supports recursion:
- The resolver asks the local name server about "www.whoisxmlapi.com".
- If the local nameserver finds the information in the cache, a non-authoritative answer is returned and the query is concluded.
- In the absence of the information in the cache, the local DNS will ask a root server about the authoritative server of the TLD, ".com". A referral will be returned.
- The local name server asks a name server of ".com." for the authoritative name servers of the SLD ".whoisxmlapi.com.", and a referral is returned.
- The local name server asks the authoritative name server of ".whoisxmlapi.com" about "www.whoisxmlapi.com".
- The obtained information is returned as an authoritative answer to the resolver.
- Meanwhile, the information is cached; it will live till the prescribed time (Time To Live, TTL), so if the same question is asked from the local name server again, there is no need to ask for referrals.
The errors and non-existent domains are also treated logically here. Note that the resolver does not receive any referrals in this case. Apparently, the main difference between this protocol and the previous one is that the handling of referrals is done now by the local name server and not the resolver itself, thereby also supporting the caching activity of the local name server.
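The caching behavior of the local name server can be sketched as follows. This is a toy Python model under stated assumptions: the class name CachingNameServer is ours, the upstream lookup (standing for the whole chain of referrals) is passed in as a function returning an address together with its TTL, and the clock is injectable so that expiry can be demonstrated without waiting.

```python
import time


class CachingNameServer:
    """Toy local name server: resolves via an upstream function and caches."""

    def __init__(self, upstream_lookup, clock=time.monotonic):
        self._lookup = upstream_lookup  # callable: name -> (address, ttl_seconds)
        self._clock = clock
        self._cache = {}                # name -> (address, expiry time)

    def resolve(self, name):
        """Return (address, authoritative) for the given name."""
        now = self._clock()
        entry = self._cache.get(name)
        if entry is not None and entry[1] > now:
            # Still within its TTL: answer from the cache, non-authoritatively.
            return entry[0], False
        # Not cached or expired: do the work on behalf of the resolver.
        address, ttl = self._lookup(name)
        self._cache[name] = (address, now + ttl)
        return address, True


def demo_upstream(name):
    # Hypothetical stand-in for the root/TLD/SLD chain; fixed documentation IP.
    return "192.0.2.10", 60


server = CachingNameServer(demo_upstream)
print(server.resolve("www.whoisxmlapi.com."))  # prints ('192.0.2.10', True)
print(server.resolve("www.whoisxmlapi.com."))  # prints ('192.0.2.10', False)
```

The second call is served from the cache, so the upstream servers are not contacted again until the TTL has elapsed.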
3.2. Reverse mapping
So far, it is clear how we find out the IP of a host by its name. But in many cases, the opposite is needed: we have an IP address, and we want to know the name (or names, aka aliases) it belongs to. Even though the DNS was designed to have a special kind of query for the purpose, it has never been put into practice. Finally, it was even made obsolete by RFC 3425. It happened so that in the problem of finding a name for an IP, the "reverse mapping" can be handled using the same tools as the direct name to IP mapping with a neat trick. And indeed, this is the de facto way it is done. To understand the idea, however, we need some background information about the delegation structure of IP addresses.
3.2.1. IP addresses and netblocks
Do IP addresses have a hierarchical structure like that of domain names? They should have one, indeed, as the responsibility has to be delegated not only for domains but also for IP addresses somehow.
The key to this is "Classless Interdomain Routing", CIDR, which we summarize here very briefly. (If you are interested in the details, an explanation can be found, for example, here: https://ip-netblocks-whois-database.whoisxmlapi.com/blog/who-owns-the-internet-ip-netblocks-whois-data-will-tell-you)
An IP address, say, 104.27.154.235, consists of 4 numbers between 0 and 255. In a binary representation, this is 4*8 bits. In our example, it will be 01101000000110111001101011101011. We keep the leading zero as we need exactly 32 bits, but we omit the dots; they do not have any role from now on: the octets are concatenated, forming a single 32-digit binary number. This is the ordinal number of the machine.
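The conversion is easy to check with Python's standard ipaddress module. The 32-bit string quoted above corresponds to the dotted address 104.27.154.235; the helper names to_bits and from_bits are our own.

```python
import ipaddress


def to_bits(address):
    """Return the 32-character binary string of a dotted IPv4 address."""
    # "032b" pads with leading zeros so that we always get exactly 32 bits.
    return format(int(ipaddress.IPv4Address(address)), "032b")


def from_bits(bits):
    """Return the dotted IPv4 address of a 32-character binary string."""
    return str(ipaddress.IPv4Address(int(bits, 2)))


print(to_bits("104.27.154.235"))
# prints 01101000000110111001101011101011
print(from_bits("01101000000110111001101011101011"))
# prints 104.27.154.235
```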
The assignment of the authority over multiple IP addresses is done in netblocks: these are contiguous intervals of IP addresses. They are defined by fixing a given number of most significant digits. The address in the above example belongs to a netblock which is 104.16.0.0/12 in the CIDR notation, meaning that the first 12 digits define the block, and the remaining less significant ones define the actual host. So, our IP is between the beginning and the end of this interval:
011010000001.00000000000000000000 = 104.16.0.0
011010000001.10111001101011101011 = 104.27.154.235
011010000001.11111111111111111111 = 104.31.255.255
How about the hierarchy? Clearly, if we fix fewer digits, we get a bigger interval, and all the smaller ones will be within that one. E.g., our netblock belongs to a higher-level one as well in the hierarchy, 104.0.0.0/8:
01101000.000000000000000000000000 = 104.0.0.0
01101000.000110111001101011101011 = 104.27.154.235
01101000.111111111111111111111111 = 104.255.255.255
This is a very elegant way of subdividing the whole IP range into a hierarchy of contiguous intervals which either do not intersect or where one contains the other. And, indeed, the delegation hierarchy of IPs is arranged on this basis.
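The nesting of netblocks can be verified with Python's standard ipaddress module. The sketch below computes the /12 and /8 blocks around 104.27.154.235, the dotted form of the 32-bit example used above; the function name enclosing_block is hypothetical.

```python
import ipaddress


def enclosing_block(address, prefix_length):
    """Return the netblock of the given prefix length that contains address."""
    # strict=False lets us pass a host address instead of the block's start.
    return ipaddress.ip_network("%s/%d" % (address, prefix_length), strict=False)


host = ipaddress.ip_address("104.27.154.235")
block12 = enclosing_block("104.27.154.235", 12)
block8 = enclosing_block("104.27.154.235", 8)

print(block12, block12[0], block12[-1])
# prints 104.16.0.0/12 104.16.0.0 104.31.255.255
print(block8, block8[0], block8[-1])
# prints 104.0.0.0/8 104.0.0.0 104.255.255.255
print(host in block12, block12.subnet_of(block8))
# prints True True
```

The last line confirms both statements of the text: the host lies within the /12 block, and the /12 block lies entirely within the /8 block.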
3.2.2. The reverse mapping domain
When comparing to the hierarchy of domain names and looking at IP addresses as strings, we find a significant difference. In the case of domain names, the highest level in the hierarchy, the TLD, is at the end of the string, whereas in the case of IPs, the most significant bits, that is, the part specifying the higher level in the hierarchy, are at the beginning. And here, the big idea comes in: if we reverse the order of the four numbers of the IP address, the two hierarchies become compatible. Now, as the DNS has tools for handling the hierarchy of domain names, we can use the same tools for the reverse name resolution!
So, how does it work out?
- Define a special root domain for IP addresses. This is named "IN-ADDR.ARPA.". (Historically, it used to be directly related to the organization "ARPA", but now it is meant as "Address and Routing Parameter Area".)
- Within this domain, an IP will be represented by a name having its four numbers in reverse order, e.g., "104.27.154.235" will be "235.154.27.104.IN-ADDR.ARPA."
- In the zone file, we need a special RR for these names, this is "PTR". So, a record in a reverse zone file would look like:
235 IN PTR foo.example.com.
assuming that this IP belongs to "foo.example.com". The formal syntax of this record is "name ttl class rr name". The first name is treated as a string, although it looks like a number; the $ORIGIN directive is in action here as well, unless we write an FQDN, like "235.154.27.104.IN-ADDR.ARPA.". For the same reason, the host name on the right ends with a dot. If the TTL is not defined, like in our example, the default is used; IN stands for the Internet, and PTR is the type of this RR.
With these conventions, the reverse resolution can be solved exactly in the same way as the forward resolution. As for the actual administration and hierarchy, the players are somewhat different than in the case of zone files.
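The construction of the reverse name is simple enough to sketch in Python. The function name reverse_name is our own; the standard ipaddress module offers the same result (without the terminating dot) through its reverse_pointer attribute. The address used is 104.27.154.235, the dotted form of the binary example in Section 3.2.

```python
import ipaddress


def reverse_name(address):
    """Return the FQDN under IN-ADDR.ARPA used to look up an IPv4 address."""
    # Validate the input first; this raises ValueError for malformed addresses.
    ipaddress.IPv4Address(address)
    octets = address.split(".")
    # Reverse the order of the octets and append the reverse-mapping domain.
    return ".".join(reversed(octets)) + ".in-addr.arpa."


print(reverse_name("104.27.154.235"))
# prints 235.154.27.104.in-addr.arpa.
print(ipaddress.ip_address("104.27.154.235").reverse_pointer)
# prints 235.154.27.104.in-addr.arpa
```

A PTR query for the resulting name is then resolved through referrals exactly like a forward query. Note that domain names are case-insensitive, so the lowercase form is equivalent to "IN-ADDR.ARPA.".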
3.2.3. Organizations maintaining the reverse zone files
At the root of the system of IP addresses is the Internet Assigned Numbers Authority (IANA); they maintain the root name servers for .IN-ADDR.ARPA. They delegate the smaller blocks to Regional Internet Registries (RIRs) that run the servers on their level (a kind of counterpart of the TLDs in the case of domains). There are currently five of them:
- ARIN, North America
- APNIC, Asia-Pacific
- AfriNIC, Africa
- RIPE NCC, Europe
- LACNIC, Latin America/Caribbean
These then delegate smaller blocks to smaller organizations or persons; everyone with a specific netblock has to run the respective server.
So, all that we have said about recursive and iterative queries works in the same way in the case of reverse mapping, using the above hierarchy of servers.
3.3. Zone maintenance
This is the set of operations which enable the different authoritative name servers to keep their zone files up to date. As the details are less important from the applications' point of view, we just provide a brief overview of the involved operations. We remark, however, that these are essential for the proper operation of the domain name system, especially from the performance and robustness point of view. The main operations are as follows:
Full Zone Transfer (AXFR) is simply the polling of the whole zone file, typically from a master to a slave server. It is initiated by the slave. Such polling has to take place according to the timings defined in the SOA record, where the relevant time parameters, such as the refresh, retry and expire intervals, are defined. It is important that the zone file does not get updated if the one to be polled does not have a bigger serial number than the currently available one. A drawback of AXFR is that a zone file can be huge; an incremental update is much more efficient in some cases.
Incremental Zone Transfer (IXFR) is an update of the zone file restricted to the changed records only. It was introduced in RFC 1995. It is done under the same conditions as AXFR, also initiated by the slave, but it requires much less data to move, so it is much more efficient both regarding the time required to carry it out, and bandwidth-wise.
NOTIFY, introduced in RFC 1996, is an operation in the opposite direction compared to the previous two: it is used to notify slaves that a change in the zone file might have occurred, so it is likely that they should poll it. This has significant benefits for the propagation time of zone file changes.
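The decision a slave makes at each poll can be summarized in a small Python sketch. The function names and the outcome labels are hypothetical, and the plain "bigger than" comparison follows the description above; real servers compare serials with wrap-around arithmetic (RFC 1982), which we leave out for simplicity.

```python
def poll_outcome(local_serial, master_serial):
    """Decide what a slave should do after asking the master for its serial."""
    if master_serial is None:
        # The master could not be reached: try again after the retry interval.
        return "retry"
    if master_serial > local_serial:
        # The master has a newer version: start a zone transfer.
        return "transfer"
    return "up-to-date"


def next_poll_delay(outcome, refresh, retry):
    """Return the number of seconds to wait before the next poll."""
    return retry if outcome == "retry" else refresh


# Values from the example SOA record: refresh 3H = 10800 s, retry 15 s.
outcome = poll_outcome(local_serial=2002022401, master_serial=2002022402)
print(outcome, next_poll_delay(outcome, refresh=10800, retry=15))
# prints transfer 10800
```

A NOTIFY message from the master simply triggers such a poll immediately instead of waiting for the refresh interval to elapse.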
All these rather logical maintenance operations are based on zone files as literally files existing on certain servers and being interchanged amongst them. With the growth of the Internet, this also became a bottleneck. The files became huge and hard to administer. In addition, if any change appears, the server has to read the whole file again sequentially, causing a possibly unacceptable unavailability time. This leads to the need for dynamic DNS (DDNS), introduced in RFC 2136. This enables the update of zone records from external sources. However, it does not allow for adding or deleting a zone. In addition, it raises additional security issues as there are more servers involved in the update. Hence, the same RFC defines the concept of a primary master name server, which is just one of the master name servers but is authorized to control the DDNS process.
Having understood the key DNS operations, let us see what types of name server occur in the DNS system.
4. Name Servers
In this section, we take a closer look at the servers themselves which run the DNS protocol. First, we will classify them based on their role in the system, then we will briefly describe some particular implementations.
Even though we frequently speak about types of name servers, maybe using the term "role" instead of “type” would be more in order. Actually, the same physical server can be a master of a given zone and a slave in another, and may even serve as a caching server in the meantime, depending on the configuration of its software. And the commonly-used implementations allow for very byzantine settings as well. Nevertheless, it is important to distinguish between certain roles:
- Master Name Servers
These read the information directly from the zone files (edited locally). They give authoritative answers about the hosts in their zone, enable the slaves to poll zone files from them, and send them NOTIFY if appropriate.
- Secondary Name Servers
They are the slaves. They poll their zone files from their master and provide authoritative answers to queries regarding their zone.
- Caching Name Servers
These do not have complete zone files. They have a cache built from the non-expired results of previous queries and can provide non-authoritative answers to queries they hold the answer for. They support recursive operation and communicate with slave or master servers when they receive a query whose result is not yet cached. If they forward an authoritative answer to the resolver, their answer is also considered as authoritative.
In addition, there are some other types not directly relevant from the point of view of the global DNS ecosystem:
- Forwarding or proxy name servers
These forward all queries to another name server, and cache all the obtained results. At first, this sounds pretty much like a caching name server, but it is not the case. These name servers will not process referrals at all, hence the communication between them and the resolver is restricted to one query-response pair in the case of each lookup request. They are mainly useful for saving network traffic.
- Stealth name servers
These are the ones serving a local network whose sites are not visible from the outside. So, the hosts, except for a few servers, are within a demilitarized zone (DMZ), they have internal IPs, and they see the Internet through a firewall gateway, typically with IP masquerading. Their specialty is that they are expected to answer the queries of the internal hosts, both regarding domains on the Internet and host names within the DMZ. Sometimes, they are also called DMZ, or split name servers.
Perhaps the most prevalent piece of DNS software is BIND (Berkeley Internet Name Domain), which was originally developed at the University of California, Berkeley. It is a free, open-source, and reliable implementation that runs on most root servers, among others.
Alternatives do exist, though. Microsoft Windows servers, for instance, have their own DNS server implementation, and there are many others. Some are designed to act as a simple proxy, some as an authoritative-only server, etc. A good comparison of these implementations is available here: https://en.wikipedia.org/wiki/Comparison_of_DNS_server_software.
Importantly, as we have described, a standard zone file can be migrated from one implementation to another. But many of the servers (including BIND) accept non-standard features in the zone file, like using time units other than seconds. This should also be taken into account if zone files are analyzed with any other type of software.
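As an illustration of the point about time units, the following sketch normalizes a time value to seconds before further analysis. It assumes the BIND-style suffixes s, m, h, d, and w (case-insensitive, possibly combined, as in "1h30m"); the function name `ttl_to_seconds` is hypothetical, and other servers may accept a different syntax.

```python
import re

# Seconds per unit for the assumed BIND-style suffixes.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
_TOKEN = re.compile(r"(\d+)([smhdw]?)", re.IGNORECASE)


def ttl_to_seconds(text):
    """Convert values such as '3600', '2w' or '1h30m' to a number of seconds."""
    text = text.strip()
    if not text:
        raise ValueError("empty time value")
    total = 0
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unrecognized time value: {text!r}")
        number, unit = match.groups()
        # A bare number (no suffix) is interpreted as seconds.
        total += int(number) * _UNIT_SECONDS[unit.lower() or "s"]
        pos = match.end()
    return total
```

For example, `ttl_to_seconds("2w")` gives 1209600 and `ttl_to_seconds("1h30m")` gives 5400, while a plain `"3600"` is returned unchanged as 3600.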
5. A simple query example
But what do end-users see of all this? Well, not too much. In most cases, they type in a name, and they are not even aware that an IP address exists.
However, as professionals, we can send a query to a server and obtain the accurate answer. The very reason for putting this short section here is that in order to really understand what is going on, we need to illustrate everything that we have discussed so far.
There is a variety of tools for this. We shall use the nslookup utility available on most platforms (even though the Linux and other UNIX-flavor communities tend to prefer the command dig instead).
So, let us give it a try: on my typical Ubuntu host, the command
nslookup www.example.com
will result in the not-so-detailed non-authoritative answer:
Server: 127.0.1.1
Address: 127.0.1.1#53
Non-authoritative answer:
Name: www.example.com
Address: 22.214.171.124
Note that the answer was given by my local host. Indeed, many Linux distributions run a proxy name server locally. But what if I'm interested in the related SOA record, too? The nslookup utility has many options, including this one:
nslookup -type=soa www.example.com
and the answer will be:
Server: 127.0.1.1
Address: 127.0.1.1#53
Non-authoritative answer:
Can't find www.example.com: No answer
Authoritative answers can be found from:
example.com
    origin = sns.dns.icann.org
    mailaddr = noc.dns.icann.org
    serial = 2018112857
    refresh = 7200
    retry = 3600
    expire = 1209600
    minimum = 3600
Well, in fact, it is not "www.example.com" but "example.com" that has an SOA record. So I could have said:
nslookup -type=soa example.com
Server: 127.0.1.1
Address: 127.0.1.1#53
Non-authoritative answer:
example.com
    origin = sns.dns.icann.org
    mailaddr = noc.dns.icann.org
    serial = 2018112857
    refresh = 7200
    retry = 3600
    expire = 1209600
    minimum = 3600
Or, if I want to have an authoritative answer directly, I can specify the name server host:
nslookup -type=soa example.com sns.dns.icann.org
Server: sns.dns.icann.org
Address: 126.96.36.199#53
example.com
    origin = sns.dns.icann.org
    mailaddr = noc.dns.icann.org
    serial = 2018112857
    refresh = 7200
    retry = 3600
    expire = 1209600
    minimum = 3600
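To make the difference between authoritative and non-authoritative answers more tangible, here is a rough sketch of the kind of message a tool such as nslookup sends, following the RFC 1035 wire format. It only builds the query bytes and inspects the AA (authoritative answer) flag of a message; it does not send anything over the network. The function names and the default query ID are assumptions chosen for illustration.

```python
import struct

QTYPE_SOA = 6       # RR type code for SOA
QCLASS_IN = 1       # Internet class

FLAG_RD = 0x0100    # "recursion desired" bit of the header flags
FLAG_AA = 0x0400    # "authoritative answer" bit of the header flags


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label in {name!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype, query_id=0x1234, recursion_desired=True):
    """Build a DNS query: a 12-byte header followed by one question."""
    flags = FLAG_RD if recursion_desired else 0
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question


def is_authoritative(message):
    """Return True if the AA bit is set in the header of a DNS message."""
    (flags,) = struct.unpack_from("!H", message, 2)
    return bool(flags & FLAG_AA)
```

Calling `build_query("example.com", QTYPE_SOA)` yields a 29-byte message; a response whose header carries the AA bit is what nslookup reports without the "Non-authoritative answer" line.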
Finally, let us demonstrate a reverse lookup:
Server: 127.0.1.1
Address: 127.0.1.1#53
Non-authoritative answer:
188.8.131.52.in-addr.arpa name = whoisxmlapi.com.
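The name queried in a reverse lookup is derived mechanically from the address: the octets are reversed and the suffix in-addr.arpa is appended. A small sketch of this step follows; the function name is hypothetical and 192.0.2.1 is a documentation address used purely as an example.

```python
import ipaddress


def reverse_lookup_name(ipv4_text):
    """Return the in-addr.arpa name used for a reverse (PTR) lookup."""
    address = ipaddress.IPv4Address(ipv4_text)   # raises ValueError if invalid
    octets = str(address).split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"
```

For instance, `reverse_lookup_name("192.0.2.1")` returns `"1.2.0.192.in-addr.arpa"`. Python's standard library offers the same result through the `reverse_pointer` attribute of an address object.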
Of course, what we have seen here is just a small portion of the supported possibilities, and we encourage our readers to play around with them. All the types of RRs are available through these queries, even those which we have not yet discussed, e.g., the ones defined in support of security.
6. Security
In this section, we will address two points. First, we will provide an overview of potential threats against the DNS system itself and the possibilities of its protection. Then, we will discuss the role of the DNS in overall IT security.
6.1. Internal security of the DNS system
The DNS protocol, by its original design, is based on unencrypted network communications. Hence, it is prone to various security threats. These even include the modification of delegation details. We go through these along with the possible means of protection.
- Zone file corruptions
A corrupt zone file, regardless of whether it got corrupted accidentally by some mistake made by authorized personnel or by a malicious intruder to the system, can obviously cause a lot of problems: lack of proper updates, invalid name resolutions, or even the malfunction of a master server. This is a local issue, and it can be overcome by proper system administration and ensuring the overall server security.
- Zone file transfers
They are vulnerable to various types of attacks. For instance, a malicious agent can intercept AXFR or IXFR communications and inject distorted information into the system, e.g., by IP address spoofing, thereby poisoning slave name servers. One way to overcome this is to disable zone transfers, but obviously this is not always possible. Another option is the protection of the network architecture itself. Finally, the communication can be authenticated and encrypted. RFC 2845 describes the Transaction SIGnature (TSIG) protocol to facilitate an authentication step of the zone file update process. It uses shared secret keys and one-way hashing to ensure the security of the authentication. A special RR type, TKEY, is used in various modes to facilitate the establishment of the shared key.
- Dynamic updates
The same can be said here as in the case of conventional zone file updates: address spoofing or unauthorized updates can introduce invalid data into the system. Besides TSIG, there is another related protocol, SIG(0), for request and transaction authentication based on public-key cryptography, cf. RFC 2931.
- Attacks against remote queries
Subverted masters or slaves, as well as poisoning caches, are all possible attacks against Server-Client communications. A good solution is the use of DNSSEC (Domain Name System Security Extensions), designed for authenticating these communications securely, albeit lacking encryption of the actual communication. This obviously also requires a variety of additional RRs. It is not yet prevalent, but there are a lot of pilot projects and zones where it has been introduced. Additional information can be obtained from https://www.dnssec.net/projects.
- Attacks against resolver queries
These are similar to those mentioned in the previous item, affecting communication between remote and local clients. Besides the use of DNSSEC, the usual SSL/TLS encryption of the communication is a good means of protection.
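The shared-secret approach mentioned under zone file transfers can be illustrated with a keyed one-way hash (HMAC). The sketch below is not the TSIG wire format, which signs additional fields besides the message itself; it only shows the principle that both parties hold the same secret key, and that a message altered in transit no longer matches its signature. The key, the message, and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> bytes:
    """Compute a keyed one-way hash (HMAC-SHA256) of the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, key: bytes, mac: bytes) -> bool:
    """Check the MAC in constant time; False means altered data or wrong key."""
    return hmac.compare_digest(sign(message, key), mac)


# Hypothetical shared secret and payload, for illustration only.
shared_key = b"secret-known-to-master-and-slave"
payload = b"example.com. 3600 IN A 192.0.2.10"
mac = sign(payload, shared_key)
```

Here `verify(payload, shared_key, mac)` succeeds, whereas verification fails for a payload in which the address has been changed, or when a different key is used.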
6.2. DNS in IT security
The connection of domain names with IP numbers is of paramount importance in IT security. For instance, many spam mail filtering methods are based on the verification of the validity and appropriateness of the DNS data of the sender. Firewall logs contain primarily IP addresses; hence, when investigating threats, it is important to see whether domain names can be validly assigned to them. If such data exist, they can reveal a lot of information about the opponent. Many other applications could be listed, since naming resources is an inherent feature of any electronic network communication and is naturally related to the identity - real or virtual - of the communicating entities.
7. Passive DNS
DNS has one significant shortcoming, especially when viewed from the IT security point of view. While it always contains timely information about domains and IPs, it is just a snapshot: DNS information from the past cannot be retrieved from within the system. Of course, this is quite natural, as the snapshot itself embodies a tremendous amount of data, and it is virtually impossible to maintain the whole history. And yet, it would be of paramount importance.
7.1. Reasons why we need passive DNS
Imagine, for instance, that while investigating some threat you find an IP address or a domain name that no longer resolves. It is likely that it did resolve correctly at the time of the attack and disappeared afterwards. In that case, a past resolution of the IP or domain would be a fundamental clue. And even if an IP address that has been marked as malicious does not resolve anymore, the data from the past could still provide a key for the identification of its domain, thereby preventing the malicious activity of the opponent. So, the past data has implications for present and future security issues, too.
In another example, to detect whether the aforementioned attacks against the DNS system itself have succeeded, it would be handy to have resolution data of the past. Their analysis could reveal what was changed and when.
These data can be used in more sophisticated ways in threat intelligence, involving a variety of big data and even machine learning tools, e.g., in order to reveal an algorithm generating short-lived domains registered by a suspicious agent.
7.2. The solution: Passive DNS
Passive DNS, which is not part of the DNS protocol itself, provides the very data that the applications in the previous section call for. The original idea was introduced around 2004: to use recursive name servers to log responses received from various name servers, and save the collected data, augmented with timestamps, in a compressed form, to a central database. Note that in this approach, no stub resolver to name server communication is involved; it is based on server-server communication. This saves a lot of network traffic and excludes vulnerabilities related to the avoided kind of protocols. In addition, it does not pose privacy issues: no data are collected on who tried to resolve an IP or a domain, or why.
There are several passive DNS services on the market. The servers collecting the data are termed DNS sensors, and they provide data for a central, usually very big database. Different services may have different strategies to select the communications to be logged from among the whole DNS traffic. Passive DNS has become a fundamental tool in IT security.
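A minimal sketch of the idea follows. It is a toy in-memory store and does not reflect the schema of any actual passive DNS service; the class and method names, the record layout, and the example names and addresses are assumptions. A sensor reports each observed response with a timestamp, the store keeps the first and last time every (name, type, data) triple was seen, and the history can then be queried by address or by name.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    first_seen: int   # timestamp of the earliest sighting
    last_seen: int    # timestamp of the latest sighting
    count: int = 1    # number of times the record was observed


class PassiveDnsStore:
    """Toy store of DNS responses as reported by sensors."""

    def __init__(self):
        self._records = {}   # (name, rtype, rdata) -> Observation

    def record(self, name, rtype, rdata, timestamp):
        """Log one observed response; repeated sightings are aggregated."""
        key = (name.lower().rstrip("."), rtype.upper(), rdata)
        obs = self._records.get(key)
        if obs is None:
            self._records[key] = Observation(timestamp, timestamp)
        else:
            obs.first_seen = min(obs.first_seen, timestamp)
            obs.last_seen = max(obs.last_seen, timestamp)
            obs.count += 1

    def names_for_address(self, address):
        """All names ever seen resolving to the given address."""
        return sorted({name for (name, rtype, rdata) in self._records
                       if rtype in ("A", "AAAA") and rdata == address})

    def history(self, name):
        """(first_seen, last_seen, type, data) rows recorded for a name."""
        wanted = name.lower().rstrip(".")
        return sorted((obs.first_seen, obs.last_seen, rtype, rdata)
                      for (n, rtype, rdata), obs in self._records.items()
                      if n == wanted)
```

Even if a name has stopped resolving, `names_for_address` can still connect it to an address it pointed to in the past, which is exactly the kind of question the live DNS cannot answer.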
7.2.1 Passive DNS Applications
Passive DNS is an enabler, as it allows existing threat solutions to better perform their important roles. At the same time, it is a facilitator, as it helps produce actionable information that cybersecurity teams can use to be one step ahead of malicious actors.
These functions are made possible through a huge passive DNS database, the analysis of which can reveal the suspicious movements of past domain data which can be leveraged for threat intelligence purposes. Passive DNS data can also be correlated with other information or integrated into APIs for swift analysis.
Below are the relevant use cases of Passive DNS, and why they are crucial to cybersecurity maintenance:
|Application|How passive DNS can help|
|Locating domains connected to known malicious addresses||
|Identifying malicious infrastructure and suspicious activities||
|Fraud and domain name infringement detection||
|Getting actionable insights on the attacks and their mitigation||
8. Summary and further reading
The present document aims to give a quick introduction to the Domain Name System, a crucial ingredient for the operation of the Internet. We have briefly reviewed its concepts, system architecture and implementation, goals and means to reach them, and, notably, its security issues and role in IT security.
This information is sufficient for a newcomer to have a basic understanding of the topic. But, of course, there are many additional details not described here. In this regard, we refer to the extensive literature on the subject.
There is a tremendous number of books and other documents available about the topic. To name a few, “Pro DNS and BIND” by Ron Aitchison provides a detailed, self-contained, and practical introduction to the topic. It is also worth mentioning Cricket Liu's classic works, such as the “DNS & BIND Cookbook”. As for DNS security, “DNS Security: Defending the Domain Name System” by Allan Liska and Geoffrey Stowe is a comprehensive source.
As for passive DNS, there are many good reads, too. The original idea of passive DNS is due to Florian Weimer, who has a very informative page on this: http://www.enyo.de/fw/software/dnslogger/ Though relatively old, his original paper is still one of the best introductions to the idea of passive DNS, its functionality and applications.
Finally, we remark that WhoisXML API, Inc., offers various API and database products related to the DNS system. The DNS Lookup API provides a simple and convenient way to perform DNS lookups. The Reverse IP/DNS API provides comprehensive DNS information on an IP address, including its past. The Reverse MX API reveals all domains that use the same mail server, whereas the Reverse NS API finds all domains with the same name server. These APIs provide a handy way of obtaining useful information which is not easily found in the Domain Name System otherwise. The services are based on current and historic databases, which are also available for download.