
XVigil’s Fake Domain Finder: A Deep-dive on the Acquisition Component and Crawlers


July 9, 2021
Last updated on February 3, 2024
CloudSEK’s flagship digital risk monitoring product, XVigil, is a SaaS-based platform that fortifies an organization’s security posture against threats arising from its external attack surface. XVigil offers services such as Brand Monitoring, Cyber Threat Monitoring, and Infrastructure Monitoring.

The Brand Monitoring solution offered by XVigil scans the internet to detect mentions of an organization’s brand, assets, or products that could tarnish its reputation. The Brand Monitoring functionality covers multiple use cases, such as:

  • Fake/rogue application detection
  • Fake domain finder
  • Fake social media page monitoring
  • Fake customer care number finder

XVigil identifies and classifies brand impersonation threats such as fake apps, domains, and social media pages, and sends real-time alerts notifying clients about suspicious activity. To date, the platform has reported more than 10 million fake domains from across the internet to CloudSEK’s clients.

Fake Domain Finder Framework

Fake domain monitoring is one of XVigil’s premium brand monitoring services; it detects domain abuse by threat actors through phishing or impersonation. The Fake Domain Finder framework depends on two components:

  • Acquisition
  • Classification

The acquisition component is responsible for acquiring suspects from sources across the internet. These suspects are filtered for relevance with respect to client assets and forwarded to the classification component, which, in turn, identifies the suspicious domains among them and sends them on for further processing.
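
To make the hand-off concrete, here is a minimal sketch of the relevance filter, assuming, purely for illustration, that relevance is a simple substring match against client assets; the production filter is proprietary and more sophisticated:

    def is_relevant(suspect: str, client_assets: list[str]) -> bool:
        # A suspect is forwarded to classification only if it references
        # a client asset, e.g. the brand name appearing in the domain string.
        return any(asset in suspect for asset in client_assets)

    # "examplebank" is a hypothetical client asset
    suspects = ["secure-examplebank-login.com", "randomshop.net"]
    relevant = [s for s in suspects if is_relevant(s, ["examplebank"])]
    # relevant -> ["secure-examplebank-login.com"]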

Tech Stack

Here are a few of the technologies, tools, services, and databases that XVigil uses to run the Fake Domain Finder:

  • Kubernetes

The entire project runs on Kubernetes, which orchestrates and schedules the units, manages their lifecycles, and helps individual units recover from crashes.

  • Elasticsearch

XVigil uses Elasticsearch as the primary database, in which the data is stored as object documents or scan documents. Elasticsearch is also used to create secure domain repositories that store the data gathered by generic crawlers.
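
As an illustration, indexing a scan document with the official Python client (elasticsearch-py 8.x) could look like the sketch below; the index name and document fields are assumptions, not XVigil’s actual schema:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

    scan_doc = {
        "domain": "secure-examplebank-login.com",  # hypothetical suspect
        "source": "ct-logs",
        "first_seen": "2021-07-09T00:00:00Z",
    }
    es.index(index="scan-documents", document=scan_doc)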

  • Redis

Redis Streams is used to manage data consumption and preserve each consumer’s position in the stream, so no state is lost when a consumer goes offline.
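
Consumer groups are what make this possible: Redis tracks each consumer’s last-delivered position, so a consumer that goes offline can pick up where it left off. A minimal sketch with redis-py, using assumed stream and group names:

    import redis

    def process(fields):
        ...  # stub for the proprietary job handler

    r = redis.Redis()
    try:
        # create the consumer group once; mkstream creates the stream if absent
        r.xgroup_create("acquisition-jobs", "orchestrators", id="0", mkstream=True)
    except redis.ResponseError:
        pass  # group already exists

    while True:
        # ">" requests messages never delivered to this group before
        batch = r.xreadgroup("orchestrators", "worker-1",
                             {"acquisition-jobs": ">"}, count=1, block=5000)
        for _stream, messages in batch or []:
            for msg_id, fields in messages:
                process(fields)
                r.xack("acquisition-jobs", "orchestrators", msg_id)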

  • Nginx Ingress

Nginx Ingress is deployed as a DaemonSet on bare-metal nodes. All REST calls are routed through the Ingress to the respective services.

  • GitLab

GitLab is used to host Git repositories, continuous integration pipelines, and container registries. When a commit is pushed, GitLab automatically creates a new pipeline that builds the Docker image and pushes it to that repository’s container registry, from which Kubernetes pulls it.

  • Others

Flask is used to create REST APIs, JSON Schema for validation, RabbitMQ for queues, and Grafana for monitoring.
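
A toy endpoint combining Flask and JSON Schema validation might look as follows; the route, schema, and RabbitMQ hand-off are illustrative assumptions rather than XVigil’s actual API:

    from flask import Flask, jsonify, request
    from jsonschema import ValidationError, validate

    app = Flask(__name__)

    JOB_SCHEMA = {
        "type": "object",
        "properties": {
            "crawler": {"type": "string"},
            "keywords": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["crawler"],
    }

    @app.route("/jobs", methods=["POST"])
    def submit_job():
        payload = request.get_json(force=True)
        try:
            validate(instance=payload, schema=JOB_SCHEMA)
        except ValidationError as exc:
            return jsonify({"error": exc.message}), 400
        # a real handler would now publish the job to a RabbitMQ queue
        return jsonify({"status": "accepted"}), 202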

As the module is large and relies on multiple proprietary technologies to perform at this scale, it would be difficult to elaborate on every process. This blog therefore covers only the acquisition component and the use of crawlers.

Acquisition Component

The acquisition component is responsible for managing the crawlers that acquire new suspicious domains masquerading as clients’ domains. A cron job runs periodically to aggregate such domains and schedule other recurring tasks/jobs. The acquisition controller facilitates communication with the acquisition component’s units and exposes a REST interface.

The acquisition controller is connected to a unit called the crawlers orchestrator, whose role is to spawn crawlers and manage their lifecycles. The orchestrator spawns crawlers based on the acquisition jobs it receives via Redis; when a job completes, the result is submitted to the acquisition controller.
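
Since crawlers run as Kubernetes workloads, spawning one roughly amounts to creating a Job with the crawler’s specification mounted as a volume. A sketch using the official Kubernetes Python client, with the namespace, image, and mount path all assumptions:

    from kubernetes import client, config

    config.load_incluster_config()  # the orchestrator runs inside the cluster
    batch = client.BatchV1Api()

    def spawn_crawler(job_name: str, image: str, configmap: str) -> None:
        container = client.V1Container(
            name="crawler",
            image=image,
            volume_mounts=[client.V1VolumeMount(name="spec",
                                                mount_path="/etc/crawler")],
        )
        pod_spec = client.V1PodSpec(
            containers=[container],
            volumes=[client.V1Volume(
                name="spec",
                config_map=client.V1ConfigMapVolumeSource(name=configmap))],
            restart_policy="Never",
        )
        job = client.V1Job(
            metadata=client.V1ObjectMeta(name=job_name),
            spec=client.V1JobSpec(
                template=client.V1PodTemplateSpec(spec=pod_spec)),
        )
        batch.create_namespaced_job(namespace="acquisition", body=job)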

Acquisition Controller

The acquisition controller is the interface that allows interaction with the acquisition component. It provides a REST interface to perform the following tasks (illustrated in the sketch after the list):

  • Submit acquisition jobs
  • Register new crawlers
  • Delete existing crawlers
  • Update specifications of existing crawlers 
  • Send signals to the orchestrator
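
By way of illustration, calls against such an interface could look like the following; the host and route names are hypothetical, not a published API:

    import requests

    BASE = "http://acquisition-controller.internal"  # hypothetical endpoint

    # submit an acquisition job
    requests.post(f"{BASE}/jobs",
                  json={"crawler": "ct-logs", "keywords": ["examplebank"]})

    # register a new crawler
    requests.post(f"{BASE}/crawlers",
                  json={"name": "ct-logs", "type": "generic",
                        "image": "registry.example/ct-logs:latest"})

    # delete an existing crawler
    requests.delete(f"{BASE}/crawlers/ct-logs")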

Crawlers Orchestrator

The crawlers orchestrator is the unit that spawns crawlers based on the assets it receives. It listens to a stream of events and initiates the action corresponding to each event. The following events are associated with the orchestrator:

  • ScheduleCrawlers

This event is initiated by a background thread. It schedules, in the orchestrator, the crawlers and jobs submitted via the acquisition controller.

  • WipeDanglingScheduled

This event is initiated by a background thread and triggers the action to wipe dangling scheduled crawlers in the orchestrator, i.e., crawlers whose entries are marked as scheduled but which are not actually running in the system.

  • DeleteCompletedCrawler

This event is initiated by the acquisition controller to delete a completed crawling job.
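
These three events lend themselves to a simple dispatch table. The sketch below assumes events arrive as dicts with a "type" field; the handler bodies and the event transport are proprietary, so they are stubbed out:

    def schedule_crawlers(event): ...           # hypothetical handler stubs
    def wipe_dangling_scheduled(event): ...
    def delete_completed_crawler(event): ...

    HANDLERS = {
        "ScheduleCrawlers": schedule_crawlers,
        "WipeDanglingScheduled": wipe_dangling_scheduled,
        "DeleteCompletedCrawler": delete_completed_crawler,
    }

    def event_loop(stream):
        # `stream` yields event dicts, e.g. entries read off a Redis stream
        for event in stream:
            handler = HANDLERS.get(event["type"])
            if handler is not None:
                handler(event)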

Types of Crawlers

Crawlers orchestrator deals with two types of crawlers:

  1. Generic Crawlers: These crawlers don’t perform searches. They visit a data source and scrape all the data available there.
  2. Keyword Crawlers: These crawlers depend on keywords to perform searches. The search results are then scraped and fed into the system.

Crawler

Crawlers visit sources and scrape either all available data or search-matched data, depending on their type: generic crawlers pull everything, while keyword crawlers pull only the data that matches their searches.

When the crawlers orchestrator creates a crawling job, it mounts a ConfigMap; this is how a keyword crawler receives its search terms. Upon completing a crawling task, the crawler sends a completion signal to the acquisition controller. The controller then wipes the crawler’s job and updates the Redis keys that inform further scheduling decisions.
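
From the crawler’s side, the mounted ConfigMap is simply a file. A keyword crawler might read its terms and report completion as sketched here; the mount path, endpoint, and payload are assumptions:

    import json

    import requests

    def search_source(keyword):
        return []  # stub for the proprietary scraping logic

    # search terms arrive via the ConfigMap mounted by the orchestrator
    with open("/etc/crawler/keywords.json") as fh:
        keywords = json.load(fh)

    results = [hit for kw in keywords for hit in search_source(kw)]

    # signal completion so the controller can wipe the job and update Redis
    requests.post(
        "http://acquisition-controller.internal/jobs/job-123/complete",
        json={"domains_found": len(results)},
    )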

Common Crawlers

Subdomains Repository

The subdomains repository scans millions of domain records already present in-house to identify suspicious domains.

Typosquatting

We use an in-house permutation engine to detect typosquatting, brandjacking, URL hijacking, fraud, phishing attacks, and corporate espionage, and to support threat intelligence. It generates and records typosquatted domain/subdomain names that closely resemble the client’s assets.
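
Permutation engines of this kind typically apply transformations such as character omission, transposition, and adjacent-key substitution. A heavily simplified sketch, with a toy keyboard map standing in for the proprietary rules:

    QWERTY_NEIGHBOURS = {"e": "wr", "a": "qs", "o": "ip"}  # tiny sample map

    def typo_permutations(domain: str) -> set[str]:
        name, _, tld = domain.partition(".")
        variants = set()
        for i in range(len(name)):                           # omission
            variants.add(name[:i] + name[i + 1:])
        for i in range(len(name) - 1):                       # transposition
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
        for i, ch in enumerate(name):                        # fat-finger typo
            for sub in QWERTY_NEIGHBOURS.get(ch, ""):
                variants.add(name[:i] + sub + name[i + 1:])
        variants.discard(name)
        return {f"{v}.{tld}" for v in variants}

    # typo_permutations("example.com") yields "exmple.com", "exampel.com", ...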

The AI/ML-powered XVigil provides specific, actionable, and timely intelligence to clients, allowing them to intervene and take remedial steps that prevent costly breaches and losses. Fake Domain Finder is one of the premium services offered by CloudSEK, and its framework is composed of the acquisition component and the classification component. The acquisition component initiates the process of identifying suspicious, fake domains from across the internet and filtering them based on their relevance, but it could not do so without crawlers. This article has explained the framework of XVigil’s Fake Domain Finder, focusing on the functions of the acquisition component and its crawlers.
