ICANN-Supported Statistical Analysis & WhoisXML API: Making Sense of DNS Abuse in gTLDs
About
Statistical Analysis of DNS Abuse in gTLDs (SADAG) is a collaborative research project between SIDN Labs and Delft University of Technology. The research was commissioned by the Competition, Consumer Trust, and Consumer Choice Review Team with the support of ICANN to investigate the rate of abuse in both legacy and new generic top-level domains (gTLDs). The study has since become a valuable basis for subsequent research projects, including the ICANN-commissioned Inferential Analysis of Maliciously Registered Domains (INFERMAL) Project.
Highlights
- The researchers primarily relied on blacklists that did not have the contextual information the study required, such as registrar names and registration dates.
- They tapped into WhoisXML API’s massive WHOIS database to obtain the data points they needed for their analysis.
- The researchers were able to seamlessly map 193.5+ million domains to their WHOIS records.
Obtaining the Pertinent Attributes of Malicious Domains
To deeply analyze DNS abuse in both new and legacy gTLDs, the researchers needed to retrieve specific attributes of malicious domain names that appeared on 11 reputable blacklists representing malware distribution, phishing, and spamming.
The research methodology required registrar information so the researchers could analyze how malicious domains are distributed across registrars and which types of registrars are most often associated with blacklisted domains. They also needed the domains’ creation dates to study the initial intent behind each registration: whether a domain was registered solely for malicious purposes or was a legitimate domain that was later compromised.
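The enrichment step described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' actual pipeline: the sample domains, the field names (registrar, created, listed), the label names, and the 90-day threshold are all hypothetical assumptions chosen for the example. It joins blacklist entries to WHOIS attributes, counts blacklisted domains per registrar, and uses the gap between the creation date and the blacklisting date as a rough signal of intent.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data, for illustration only (not taken from the study).
blacklist = [
    {"domain": "bad-shop.example", "listed": date(2016, 3, 10)},
    {"domain": "old-blog.example", "listed": date(2016, 5, 2)},
    {"domain": "free-gift.example", "listed": date(2016, 7, 21)},
]

# Hypothetical WHOIS attributes keyed by domain name.
whois = {
    "bad-shop.example": {"registrar": "Registrar A", "created": date(2016, 3, 1)},
    "old-blog.example": {"registrar": "Registrar B", "created": date(2009, 8, 15)},
    "free-gift.example": {"registrar": "Registrar A", "created": date(2016, 7, 20)},
}

# Assumed cutoff: a domain blacklisted within this many days of its creation
# is treated as likely registered for malicious use. The value is illustrative.
MAX_DAYS_FOR_MALICIOUS = 90


def enrich(blacklist, whois):
    """Attach registrar, domain age at listing, and an intent label to each entry."""
    rows = []
    for entry in blacklist:
        record = whois.get(entry["domain"])
        if record is None:
            continue  # no WHOIS record available for this domain
        age_days = (entry["listed"] - record["created"]).days
        if age_days < 0:
            label = "unknown"  # created after listing, e.g. re-registered
        elif age_days <= MAX_DAYS_FOR_MALICIOUS:
            label = "likely_malicious_registration"
        else:
            label = "possibly_compromised"
        rows.append({
            "domain": entry["domain"],
            "registrar": record["registrar"],
            "age_days": age_days,
            "label": label,
        })
    return rows


def registrar_distribution(rows):
    """Count blacklisted domains per registrar."""
    return Counter(row["registrar"] for row in rows)


rows = enrich(blacklist, whois)
for row in rows:
    print(row["domain"], row["registrar"], row["age_days"], row["label"])
print(registrar_distribution(rows))
```

A single age threshold is a deliberately crude heuristic. A real analysis would likely combine it with other signals before labeling a domain as maliciously registered or compromised.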
Accurate and Extensive WHOIS Data Repository
The researchers tapped into WhoisXML API’s massive WHOIS database to obtain the registrar details and registration dates of hundreds of millions of malicious domains spanning 18 legacy gTLDs and 1,196 new gTLDs. The legacy gTLDs were .aero, .asia, .biz, .cat, .com, .coop, .info, .jobs, .mobi, .museum, .name, .net, .org, .post, .pro, .tel, .travel, and .xxx.
The database’s standardized and consistent format enabled them to easily extract the required attributes and deepen their study.
Identification of Cybercriminal Behavior
With the help of the WHOIS database, the researchers identified registrar characteristics that are associated with cybercriminal behavior. For example, the study found a correlation between malicious domain registrations and registrars’ pricing strategies.