Fight against phishing e-mail with WHOIS: A technical blog based on the 2018 "Airbnb" case | WhoisXML API

Fight against phishing e-mail with WHOIS: A technical blog based on the 2018 "Airbnb" case

On phishing scams

Phishing is a way to obtain sensitive information by sending electronic communication pretending to have come from a reliable, trustworthy partner. According to the 2018 IBM X-Force Threat Intelligence Index, "Despite the increased use of chat and instant messaging applications, email continues to be one of the most widely used communication methods for any organization, and phishing attacks continue to be one of the most successful means of making unknowing insiders open the door to malicious attackers."

Hundreds of millions of phishing e-mails are sent on the Internet every day, leading to billions of dollars stolen annually, not to mention the overtaken accounts and sensitive data obtained this way. The importance of the fight against e-mail phishing cannot thus be overemphasized.

In what follows, we present an example of such fraudulent activity, which recently attracted a lot of media attention and to which virtually anyone could fall victim. Through this particular example, we illustrate the use of WHOIS data in revealing this kind of malicious activity. WHOIS data can be an important piece of intelligence in any anti-phishing security software or solution.

The Airbnb story

Airbnb, the popular online marketplace for arranging and offering lodgings, has been a target of phishing activity for several years. As an online marketplace which assists in organizing payments, it is very attractive to malicious actors who would prefer the money transfers to ultimately end up in their temporary bank accounts.

The recipe in this scheme is simple: deceptive means convince a prospective victim that his credit or debit card data have to be sent in a reply e-mail or typed in on a short-lived, yet seemingly convincing website. Alternatively, these data can be stolen from the client's account along with other sensitive information, after a persuasive email kindly asks them to send the account name along with the password in a reply, claiming it to be necessary for whatever reason.

The active enforcement of the General Data Protection Regulation (GDPR) started across Europe on May 25, 2018. In the weeks around this date, Airbnb saw a significant burst of phishing e-mails. Paradoxically, even though the main intention with the new regulation was that "Stronger rules on data protection mean people have more control over their personal data and businesses benefit from a level playing field." (source: this link, 2018.11.06.), its introduction has led to numerous foreseen and unforeseen consequences, some of which, in fact, seem to introduce significant IT security risks. One of the short-term impacts of the new rules was that all the companies handling data of EU citizens in any form had to contact their clients to confirm certain new agreements.

As a consequence, EU citizens were flooded with e-mails referring to the new GDPR, whose rules many of them did not clearly understand. Because most of those e-mails urged some action or reply, this confusion-filled scenario became a genuine paradise for phishing schemes.

The malicious scam is simple: send e-mails to all addresses in your spam database on behalf of Airbnb and refer to the new GDPR as the reason why they need to share their sensitive data. There will be enough gullible Airbnb clients on the list who will fall for the trick.

And it happened. It is enough to look at the headlines:

  • "Airbnb Customers Targeted with Phishing Scam" (Infosecurity Magazine, 4 May 2018)
  • "Redscan warns of GDPR phishing scams" (Computer Weekly, 3 May 2018)
  • "Phishing campaign aimed at Airbnb guests uses GDPR hook" (scmagazine.com, 4 May 2018)
  • "Gardaí warn of possible rise in email scams related to new data law" (The Irish Times, 28 May 2018)
  • "GDPR isn't to blame for all those dumb emails you're getting" (Wired, 11 May 2018)

etc., just to quote some of the news in English.

Let us now look at this incident from the point of view of WHOIS data.

A WHOIS-based investigation of the Airbnb campaign

There are two general ways for anti-phishing software or a human analyst to determine whether an e-mail is malicious:

  • Without scanning the full e-mail, as that could take a lot of time. For this, external data sources can be used: WHOIS, name server lookups (NSL), proximity of the domain to a known malicious actor, domain, or IP address, etc.
  • By scanning the e-mail: its contents may be helpful, for instance if a link points to a completely different domain or to a known malicious one.

In what follows we demonstrate the kind of information we can get, solely from WHOIS data that can be downloaded from the data feeds of WhoisXML API, supplemented by the possible use of some APIs.

About the approach

In our little investigation, which aims to demonstrate the footprint of phishing attacks against Airbnb in the WHOIS ecosystem, we shall use simple Linux/BASH command-line tools on CSV files downloaded from WhoisXML API, Inc. The same is trivially doable on Mac OS X as well. For Windows 10 users who want to try it out, we recommend installing Bash on Ubuntu on Windows (see our blog on how to install it: https://www.whoisxmlapi.com/blog/using-bash-andother-linux-tools-on-windows-10-for-processing-whois-data). Users of earlier server versions of Windows can also work with Microsoft Services for UNIX.

However, all of this is doable with your favorite tools such as Windows PowerShell, or Python, etc., too.

Single WHOIS records

Our starting point will be an example described in a related article found under this link. "While the phishing messages might look legitimate at first glance, it's worth noting that they don't use the right domain - the fake messages come from '@mail.airbnb.work' as opposed to '@airbnb.com'." The mail in the example dates back to 18 April 2018, about a month before the enforcement of the new GDPR.

Let us now check the "work" top-level domain by looking at the WHOIS data of the domain "airbnb.work". This can be done with a simple WHOIS lookup or by entering this search term in the "Whois lookup" field on https://www.whoisxmlapi.com. By doing so we obtain information on who the domain belongs to. Is this a suspicious domain according to these WHOIS data?

First of all, phishing e-mails frequently come from domains which were registered recently and abandoned shortly afterwards. As for the relevant dates, we have:

  • Updated Date: 2018-03-22T15:47:34Z
  • Creation Date: 2015-04-07T06:47:17Z
  • Registry Expiry Date: 2019-04-07T06:47:17Z
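To put a number on "recently registered", the age of the domain on the day the e-mail arrived can be computed from the creation date. The following is only a minimal sketch: it assumes GNU date (as found on most Linux systems), and the function name days_between is our own illustration, not part of any WhoisXML API tool.

```shell
#!/usr/bin/env bash
# days_between START END: print the number of whole days from START to END.
# Both arguments are dates in YYYY-MM-DD form.
# Assumes GNU date (the -d option); BSD/macOS date uses a different syntax.
days_between() {
    local start end
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( (end - start) / 86400 ))
}

# Age of airbnb.work (created 2015-04-07) when the example mail
# was sent on 18 April 2018
days_between 2015-04-07 2018-04-18
```

A filter could compare this value with a threshold of its own choosing, e.g. treat senders whose domain is younger than 30 days with extra care.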

This does not look like a very short-lived domain. However, looking at the other lines of the WHOIS record, as for the registrant, we can probably repeat all the data without the risk of privacy violation:

Domain's registrant

  • Name: REDACTED FOR PRIVACY
  • Organization: REDACTED FOR PRIVACY
  • Street: REDACTED FOR PRIVACY
  • City: REDACTED FOR PRIVACY
  • State: Tokyo
  • Postal Code: REDACTED FOR PRIVACY
  • Country: JAPAN
  • Country code: JP

We remark here that regarding the "Technical contact", "Billing contact", and "Administrative contact" data, all the fields are "REDACTED FOR PRIVACY". Of course, due to the "stronger rules" of the new GDPR, WHOIS records are nowadays less and less informative: much of the registrants’ data are hidden for certain privacy reasons. However, if we look at the WHOIS record of the real "airbnb.com", although there aren't as many pieces of information there which traditional WHOIS used to provide, we will still learn the following:

  • Registrant Organization: Airbnb, Inc.
  • Registrant State/Province: CA
  • Registrant Country: US

We do indeed learn to whom the domain belongs. And honestly, is there any good reason to hide the "Registrant Organization" for privacy reasons?

Here all we know about the registrant is the country: Japan. The registrar in question is in fact a known web hosting and service provider, also based in Japan, with many clients, so this part seems legitimate. It is odd, though, that "Tokyo" appears in the "State" field, whereas the "City" is "REDACTED FOR PRIVACY". Japan is not divided into 'states', and Tokyo is certainly not one. Strictly speaking, the "State" field is invalid, but let us suppose it is just an error. But then why would a real Airbnb-related enterprise conduct business correspondence from Japan, using the top-level domain ".work", which has nothing specifically Japanese about it? It is hard to see any good reason.

Hence, there are multiple red flags in the WHOIS record of "airbnb.work" suggesting that any correspondence coming from this domain, or containing a URL under this domain in the mail body, should be treated with care and at least be subjected to further investigation. (Note, however, that we do not state with certainty that "airbnb.work" is a malicious domain. We only remark that its registrant cannot be identified at all from its current WHOIS data, and its registrar and registrant are from a country not directly related to Airbnb. And although it is claimed to be in use for malicious purposes in an incident described on a public web page, someone could well have misused an otherwise honest domain. We leave the estimation of the likelihood of all these to the reader.)

So far our investigation was based on a single WHOIS lookup made at the time the e-mail is investigated. Doing this for a lot of e-mails requires many WHOIS lookups, and when using the WHOIS protocol itself, most servers will soon refuse to serve us because they limit the number of queries. This problem can be overcome by using a proper Web-based API, such as https://whoisapi.whoisxmlapi.com, which will provide an accurate and up-to-date answer in JSON or XML and can be simply used from a script, e.g. with "curl".

Even simpler, the sender address "[email protected]" can be checked with our e-mail verification API. For the sake of completeness we show how this can be invoked from a shell, using, e.g. "curl":

curl --get --include \
"https://emailverification.whoisxmlapi.com/api/v2?apiKey=XXX&emailAddress=[email protected]"

Here you will need an API key provided with your API subscription; please replace "XXX" with your key. (A free subscription is available, so you can try what we are doing here.) This will result in the following JSON:

{
    "audit":{
        "auditCreatedDate":"2018-11-06 14:20:38.000 UTC",
        "auditUpdatedDate":"2018-11-06 14:20:38.000 UTC"
    },
    "catchAllCheck":"null",
    "disposableCheck":"false",
    "dnsCheck":"Invalid hostname",
    "emailAddress":"[email protected]",
    "formatCheck":"true",
    "freeCheck":"false",
    "smtpCheck":"null"
}

So if the mail were received right now, the problem would not show up at the WHOIS level only: the DNS lookup would immediately reveal that something is wrong with the sender's hostname.
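A script could act on such a response by looking at the "dnsCheck" field. The sketch below is illustrative only: the helper names json_field and sender_is_suspicious are hypothetical, the extraction is a crude grep/sed match that suits the flat response shown above (a real JSON parser such as jq is more robust), and we assume that a passing check is reported as "true", as "formatCheck" is.

```shell
#!/usr/bin/env bash
# json_field NAME: print the string value of field NAME from a JSON
# document read on stdin (first occurrence only).
# Crude text matching, adequate only for simple "name":"value" pairs.
json_field() {
    grep -o "\"$1\" *: *\"[^\"]*\"" | head -n 1 |
        sed 's/^"[^"]*" *: *"\(.*\)"$/\1/'
}

# sender_is_suspicious: succeed (exit status 0) if the verification
# response on stdin does not report a passing DNS check.
# Assumption: a passing check has the value "true".
sender_is_suspicious() {
    local dns
    dns=$(json_field dnsCheck)
    [ "$dns" != "true" ]
}

# Example with a shortened copy of the response shown above
response='{"dnsCheck":"Invalid hostname","formatCheck":"true"}'
if printf '%s\n' "$response" | sender_is_suspicious; then
    echo "sender domain fails the DNS check"
fi
```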

Let us therefore take a quick look at the DNS data of "airbnb.work". This can be easily done either with the command-line utility "dig", or with another API at whoisxmlapi.com, namely, the DNS API. On this page, there is a simple interactive entry for DNS lookup (or one may subscribe to do it from a program or with "curl"). But entering "airbnb.work" will merely give us an error message:

"Unable to retrieve DNS record for airbnb.work". Although the domain exists, it does not have a valid DNS record. This is another fact that makes the domain suspicious. A possible continuation of our investigation to the DNS direction would be the use of "passive DNS", a very important approach in forensic analysis, but we are not going into detail now, as we aim to demonstrate how far we can get with WHOIS. We’ll remark though that by using passive DNS one can find that this domain, while registered on 2015-04-07, was never seen before 2018-05-03. This is yet another red flag: it appears that it was a Newly Observed Domain (NOD) at the time of the flood of GDPR-related emails.

What if an incident has to be investigated not shortly after it happened but later on? WhoisXML API, Inc. offers downloadable WHOIS datasets, including historic ones, too. Using these data could have various benefits. One can build a local WHOIS database and keep it up-to-date so that the filtering does not rely on an external API call. Also, such a database could provide historic data. As we shall see, even without setting up a database, one can download data and find relevant information by just analyzing the files with simple tools.

An investigation based on bulk WHOIS data

We will now search for short-lived domains by using data from WhoisXML API downloadable feeds. Motivated by the previous example, we will choose a set of top-level domains whose names suggest that they may contain short-lived domains related to Airbnb. We are considering the following ones:

apartments, book, booking, business, global, hotels, international, reise, reisen, rent, rentals, trade, travel, travelers, vacations, work.

All of these are the so-called "new top level domains" in the ICANN terminology. The best approach would be to download these data for all domains, including country-code top-level domains (ccTLDs), but since this is just a quick experiment, we’ve made this subjective filtering.

Finding short-lived domains

Here we shall implement simple tools to present a proof-of-principle demonstration of how to find short-lived domains typically used in phishing attacks. Such an investigation is possible even years after the actual incident.

Downloading data

We shall use some daily data feeds, which are documented here in detail. In particular, first we shall need data from the following feeds:

  • ngtlds_domain_names_new : domains registered on a given day
  • ngtlds_domain_names_dropped : domains deleted on a given day

By examining the emergence and disappearance of domain names containing the string "airbnb", we shall be able to identify short-lived domains. We shall investigate the period from 2017-01-01 to 2018-10-30. We need the data in "CSV" format, which in this case will be just a text file with a domain name in each of its lines.

To download the data efficiently, we shall use a specialized download script available in the GitHub repository, in its "whoisxmlapi_download_whois_data" subdirectory. It requires Python 2 and some modules to be installed; we refer to its documentation for details. Having set up this program, we change into its directory and run

./download_whois_data.py --feed ngtlds_domain_names_new \
--output-dir /path_to/downloaded_ngtlds_data \
--username MYUSERNAME --password MYPASSWORD \
--verbose --startdate 20170101 --enddate 20181030 \
--tlds apartments,book,booking,business,global,hotels,international,reise,reisen,rent,rentals,trade,travel,travelers,vacations,work \
--dataformat csv

for the data of new domains each day, and

./download_whois_data.py --feed ngtlds_domain_names_dropped \
--output-dir /path_to/downloaded_ngtlds_data \
--username MYUSERNAME --password MYPASSWORD \
--verbose --startdate 20170101 --enddate 20181030 \
--tlds apartments,book,booking,business,global,hotels,international,reise,reisen,rent,rentals,trade,travel,travelers,vacations,work \
--dataformat csv

for the dropped ones. (In the above command lines, please replace "MYUSERNAME" and "MYPASSWORD" with the credentials you have obtained with your subscription, and "/path_to/downloaded_ngtlds_data" with the directory in which you want to work with the data.) Those who prefer a GUI can start the program without any command-line arguments; a sequence of dialog windows will then guide the user through the download process.

The result will be the following directory structure within the target directory we have specified as --output-dir: there will be two subdirectories named after the feeds, i.e., "ngtlds_domain_names_new" and "ngtlds_domain_names_dropped". Within each of them there will be a subdirectory named after each TLD; consider "work" as an example. Within the TLD's subdirectory, each date will have a subdirectory, and a CSV file and its md5 sum will be there if any domains were added or dropped that day. Thus, the relevant files will have paths such as

ngtlds_domain_names_new/work/2018-10-30/add.work.csv
ngtlds_domain_names_dropped/work/2018-10-30/dropped.work.csv

for the added and dropped domains respectively.

Analyzing data

Let us consider all domains as short-lived which were added and also dropped in the examined period, i.e., between 2017-01-01 and 2018-10-30. Thus we are looking for all the domains which are there in both the "dropped" and "added" lists for a given TLD on some day. This can be found out using the following BASH code:

for tld in apartments book booking business global hotels international reise reisen rent rentals trade travel travelers vacations work
do
    echo "In TLD ${tld}:"
    # print the names present in both the "new" and the "dropped" lists
    comm -12 \
        <(grep -h airbnb ngtlds_domain_names_new/"$tld"/*/*.csv | sort) \
        <(grep -h airbnb ngtlds_domain_names_dropped/"$tld"/*/*.csv | sort)
done

The following output is produced:

In TLD apartments:
    airbnbmanager
    airbnbmanager
In TLD book:
In TLD booking:
In TLD business:
In TLD global:
In TLD hotels:
In TLD international:
    airbnb-rooms19982
    booking-on-airbnb
In TLD reise:
In TLD reisen:
In TLD rent:
In TLD rentals:
    airbnb-book
    airbnb-booking
    suisse-airbnb
In TLD trade:
    airbnb-bookings
    airbnb-tenant
In TLD travel:
In TLD travelers:
In TLD vacations:
    airbnb-disneyworld
    airbnb-guest
In TLD work:

Note that not all the examined top-level domains contain short-lived domains (in the sense defined above). However, we have found some short-lived ones which could indeed be suspicious.
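For further processing it can be handy to have each hit only once and as a full domain name (note that "airbnbmanager" is listed twice above). The following variant is a sketch of one way to do that; the function name short_lived_domains is hypothetical, it assumes the directory layout described earlier, and it swaps "comm" for "sort | uniq -d" so that it does not depend on BASH process substitution.

```shell
#!/usr/bin/env bash
# short_lived_domains DATA_DIR KEYWORD TLD...
# Print every domain containing KEYWORD that occurs in both the "new" and
# the "dropped" feed of a TLD, once, with the TLD appended.
# Assumes DATA_DIR/<feed>/<tld>/<date>/*.csv with one name per line.
short_lived_domains() {
    local data_dir="$1" keyword="$2" tld
    shift 2
    for tld in "$@"; do
        {
            # each list is deduplicated first, so a name seen in both
            # lists occurs exactly twice in the combined stream
            cat "$data_dir/ngtlds_domain_names_new/$tld"/*/*.csv 2>/dev/null |
                grep -F -- "$keyword" | sort -u
            cat "$data_dir/ngtlds_domain_names_dropped/$tld"/*/*.csv 2>/dev/null |
                grep -F -- "$keyword" | sort -u
        } | sort | uniq -d | sed "s/\$/.$tld/"
    done
}

# Usage (the path is a placeholder):
# short_lived_domains /path_to/downloaded_ngtlds_data airbnb rentals trade
```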

Let us now choose one of them, e.g. "airbnb-rooms19982.international", and take a closer look at it. First we find out when it was registered:

grep -H airbnb-rooms19982 ngtlds_domain_names_new/international/*/*.csv

resulting in

ngtlds_domain_names_new/international/2018-05-17/add.international.csv:airbnb-rooms19982

so the domain was registered on 2018-05-17. However, doing

grep -H airbnb-rooms19982 ngtlds_domain_names_dropped/international/*/*.csv

we have the output

ngtlds_domain_names_dropped/international/2018-06-15/dropped.international.csv:airbnb-rooms19982

meaning that it was dropped on 2018-06-15, about one month later. Well, it is at least suspicious...
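The two lookups and the date arithmetic can be combined into one step. This is a rough sketch under a few assumptions: the helper names feed_date and lifetime_days are ours, GNU date is available, each CSV line holds exactly one name without trailing whitespace, and the name was added before it was dropped within the downloaded period.

```shell
#!/usr/bin/env bash
# feed_date FEED_DIR TLD NAME: print the date (YYYY-MM-DD) of the earliest
# daily file under FEED_DIR/TLD that contains NAME as a whole line.
# The date is read from the name of the per-day directory.
feed_date() {
    grep -l -x -F -- "$3" "$1/$2"/*/*.csv 2>/dev/null |
        sed 's|.*/\([0-9-]*\)/[^/]*$|\1|' | sort | head -n 1
}

# lifetime_days DATA_DIR TLD NAME: print the number of days between the
# first appearance of NAME in the "new" feed and in the "dropped" feed.
# Returns 1 if the name is missing from either feed. Assumes GNU date.
lifetime_days() {
    local added dropped
    added=$(feed_date "$1/ngtlds_domain_names_new" "$2" "$3")
    dropped=$(feed_date "$1/ngtlds_domain_names_dropped" "$2" "$3")
    [ -n "$added" ] && [ -n "$dropped" ] || return 1
    echo $(( ( $(date -u -d "$dropped" +%s) - $(date -u -d "$added" +%s) ) / 86400 ))
}

# Usage (the path is a placeholder):
# lifetime_days /path_to/downloaded_ngtlds_data international airbnb-rooms19982
```

For the dates found above (2018-05-17 and 2018-06-15) this gives 29 days.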

Finally, let us see the detailed WHOIS data of the domain "airbnb-rooms19982.international". A standard WHOIS query will not find it, as the domain has ceased to exist. However, as it was registered on 2018-05-17, all we need to do is get the data from the "ngtlds_domain_names_whois_archive" daily feed, because at the time of this investigation the registration happened more than 3 months ago.

(Were this not the case, we would use the feed "ngtlds_domain_names_whois".) So, returning to the downloader script's directory, we do the following:

./download_whois_data.py --feed ngtlds_domain_names_whois_archive \
--output-dir /path_to/downloaded_ngtlds_data \
--username MYUSERNAME --password MYPASSWORD \
--verbose --startdate 20180517 \
--tlds international \
--dataformat regular_csv

The result will be the file

ngtlds_domain_names_whois_archive/2018_05_17_international.csv.gz

in our data directory. Thus we can look for our domain:

zgrep airbnb-rooms19982 \
ngtlds_domain_names_whois_archive/2018_05_17_international.csv.gz

resulting in the following output:

"airbnb-rooms19982.international","Tucows Domains Inc.","airbnbrooms19982.[email protected]","whois.tucows.com","ns1.renewyourname.net|ns2.renewyourname.net|","2016-05-12T01:59:59Z","2018-05-16T03:22:02Z","2019-05-12T01:59:59Z","2016-05-12 00:00:00 UTC","2018-05-16 00:00:00 UTC","2019-05-12 00:00:00 UTC","clientTransferProhibited","2018-05-17 07:00:00 UTC","[email protected]","Contact Privacy Inc. Customer 0143005938","Contact Privacy Inc. Customer 0143005938","96 Mowat Ave","","","","Toronto","ON","M6K 3M1","CANADA","","","14165385457","","airbnbrooms19982.[email protected]","Contact Privacy Inc. Customer 0143005938","Contact Privacy Inc. Customer 0143005938","96 Mowat Ave","","","","Toronto","ON","M6K 3M1","CANADA","","","14165385457","","","","","","","","","","","","","","","","","[email protected]","Contact Privacy Inc. Customer 0143005938","Contact Privacy Inc. Customer 0143005938","96 Mowat Ave","","","","Toronto","ON","M6K 3M1","CA

Granted, there is a nicer way to present this result (e.g. you may unzip the csv file and open it with some spreadsheet application). However, there is no real need to do so: essentially all registrant data are obscured and this fact could be very easily found out in an automated way, too.
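As a hint of what such an automated check might look like, the sketch below tests a CSV record for typical redaction or privacy-proxy markers. The function name registrant_is_obscured is hypothetical and the marker list is an illustrative assumption, not an exhaustive catalogue; the check also looks at the whole record rather than at the registrant columns only.

```shell
#!/usr/bin/env bash
# registrant_is_obscured: read WHOIS CSV record(s) on stdin and succeed
# (exit status 0) if a typical redaction or privacy-proxy marker occurs.
# The marker list below is illustrative and certainly incomplete.
registrant_is_obscured() {
    grep -q -i -E 'REDACTED FOR PRIVACY|Contact Privacy Inc|WhoisGuard|Privacy Protect'
}

# Usage with the archive file from above:
# zgrep airbnb-rooms19982 \
#     ngtlds_domain_names_whois_archive/2018_05_17_international.csv.gz |
#     registrant_is_obscured && echo "registrant data are hidden"
```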

Hence, if one asks whether the domain used to be a malicious domain related to the phishing campaign against Airbnb, though we cannot state it with absolute certainty, it is extremely likely to have been so.

Lessons to learn

To conclude, WHOIS data are indeed very useful in the fight against e-mail phishing and similar malicious activities. WHOIS data and DNS data can be an important part of any anti-phishing security solution. What we have presented here was a hindsight investigation, but as the data in the daily feeds are always fresh and accurate, it is easy to turn this into an actual mail filtering procedure. A very significant limitation of the presented example is that we did not check the e-mail contents and considered only the sender address. In most phishing e-mails there are web links in the e-mail body, and the header of the e-mail also contains technical information on servers whose registration details are of significant relevance. Nevertheless, what we did here gives a hint on how to perform such an analysis. We have used very simple generic tools to present feasible clues, but since CSV files can be opened or imported with virtually any kind of data processing software, there is a broad range of possible analyses based on the WHOIS data available in WhoisXML API's WHOIS database download subscription. Vendors of anti-phishing security solutions can embed the WHOIS database feeds to enhance their products' capabilities.
