XVigil’s Fake Domain Finder: A Deep-dive on the Acquisition Component and Crawlers
XVigil, CloudSEK’s flagship digital risk monitoring product, is a SaaS-based platform that fortifies an organization’s security posture against threats arising from its external attack surface. XVigil offers services such as Brand Monitoring, Cyber Threat Monitoring, and Infrastructure Monitoring.
XVigil’s Brand Monitoring solutions scan the internet to detect relevant mentions of an organization’s brand, assets, or products that could tarnish its reputation. The Brand Monitoring functionality covers multiple use cases.
XVigil identifies and classifies brand impersonation threats such as fake apps, domains, and social media pages, and sends real-time alerts notifying clients of suspicious activity. For instance, the platform has reported more than 10 million fake domains from across the internet to CloudSEK’s clients.
Fake domain monitoring is part of XVigil’s premium brand monitoring services; it detects domain abuse by threat actors through phishing or impersonation. The Fake Domain Finder framework depends on two components: the acquisition component and the classification component.
While the acquisition component is responsible for acquiring suspects from sources across the internet, the classification component is responsible for detecting suspicious data points in the filtered list.
The suspects acquired by the acquisition component are filtered for relevance to client assets and forwarded to the classification component, which in turn filters out suspicious domains and sends them for further classification.
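To illustrate this filtering step, here is a minimal sketch of a relevance filter. The actual XVigil logic is proprietary; the function name, the use of string-similarity scoring, and the threshold below are all assumptions for illustration.

```python
from difflib import SequenceMatcher

def is_relevant(suspect: str, client_assets: list[str], threshold: float = 0.6) -> bool:
    """Return True if the suspect domain resembles any client asset.

    Hypothetical stand-in for XVigil's proprietary relevance filter:
    a simple string-similarity ratio against each client asset name.
    """
    name = suspect.lower().split(".")[0]  # compare only the registrable label
    return any(
        SequenceMatcher(None, name, asset.lower()).ratio() >= threshold
        for asset in client_assets
    )

suspects = ["cloudsek-login.com", "cloudsik.net", "weather-news.org"]
# Keep only suspects that plausibly target the client's brand.
relevant = [s for s in suspects if is_relevant(s, ["cloudsek"])]
```

Only the first two suspects survive the filter; the unrelated domain is dropped before it ever reaches the classification component.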
Here are a few of the technologies, tools, services, and databases that XVigil uses to run the Fake Domain Finder:
The entire project runs on Kubernetes, which orchestrates and schedules the units, manages their lifecycle, and enables crash recovery of individual units.
XVigil uses Elasticsearch as the primary database, in which data is stored as Object Documents or Scan Documents. Elasticsearch is also used to create secure domain repositories that protect data from generic crawlers.
Redis Streams helps manage data consumption and preserves the state of the data when consumers are offline.
A DaemonSet of the NGINX Ingress controller is deployed on bare metal. All REST calls are routed through the Ingress to the respective services.
GitLab is used for Git repositories, continuous integration pipelines, and the container registry. When a commit is pushed, GitLab automatically creates a pipeline that builds the Docker image and pushes it to the repository’s container registry. Kubernetes then pulls the Docker image from the registry.
Flask is used to create REST APIs, JSON Schema for validation, RabbitMQ for queues, and Grafana for monitoring.
As the module is large and relies on multiple proprietary technologies to perform at such a scale, it would be difficult to elaborate on each process. This blog will therefore only cover the acquisition component and the use of crawlers.
The acquisition component is responsible for managing the crawlers that acquire new suspicious domains masquerading as clients’ domains. A cron job runs periodically to aggregate such domains and schedule other recurring tasks. The acquisition controller facilitates communication with the acquisition component and exposes a REST interface.
The acquisition controller is connected to a unit called the crawlers orchestrator, whose role is to spawn crawlers and maintain their lifecycle. The orchestrator spawns crawlers based on the acquisition jobs that Redis assigns to it; on completion, the results are submitted to the acquisition controller.
The acquisition controller is the interface that allows interaction with the acquisition component. It provides a REST interface for tasks such as scheduling and deleting crawling jobs.
The crawlers orchestrator is the unit that spawns crawlers based on the assets it receives. It listens to a stream of events and initiates actions in accordance with them. The following events and actions are associated with the orchestrator:
This event is initiated by a background thread and is used to schedule the crawlers and jobs, submitted by the acquisition controller, that will run in the orchestrator.
This event is initiated by a background thread and triggers the action of wiping dangling scheduled crawlers in the orchestrator, i.e., crawlers whose entries are marked as scheduled but which aren’t actually running in the system.
This event is initiated by the acquisition controller to delete a crawling job.
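The event-to-action flow described above can be sketched as a small dispatcher. The event names, handler names, and internal state below are hypothetical; the orchestrator’s actual event schema is not public.

```python
from typing import Callable

class Orchestrator:
    """Hypothetical sketch of the crawlers orchestrator's event handling."""

    def __init__(self):
        self.scheduled: dict[str, dict] = {}  # job_id -> job spec, marked scheduled
        self.running: set[str] = set()        # job_ids actually running
        # Map event names to actions; these names are assumptions.
        self.handlers: dict[str, Callable[[dict], None]] = {
            "schedule_job": self._schedule_job,
            "wipe_dangling": self._wipe_dangling,
            "delete_job": self._delete_job,
        }

    def handle(self, event: dict) -> None:
        self.handlers[event["type"]](event)

    def _schedule_job(self, event: dict) -> None:
        # Submitted by the acquisition controller: record the job and start it.
        job_id = event["job_id"]
        self.scheduled[job_id] = event
        self.running.add(job_id)

    def _wipe_dangling(self, event: dict) -> None:
        # Remove entries marked scheduled that aren't actually running.
        for job_id in list(self.scheduled):
            if job_id not in self.running:
                del self.scheduled[job_id]

    def _delete_job(self, event: dict) -> None:
        # Initiated by the acquisition controller to delete a crawling job.
        job_id = event["job_id"]
        self.scheduled.pop(job_id, None)
        self.running.discard(job_id)
```

In production these events would arrive over a stream (the article mentions Redis Streams) rather than direct method calls, but the dispatch pattern is the same.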
Types of crawlers
Crawlers orchestrator deals with two types of crawlers:
Crawlers visit sources and scrape either all the data or only search-matched data, depending on their type: generic crawlers pull the entire data set, while keyword crawlers pull only the data that matches a search.
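The distinction can be sketched like this; the class names and the simple substring match are illustrative stand-ins, not XVigil’s actual crawler code.

```python
class GenericCrawler:
    """Pulls every record from a source."""
    def crawl(self, records: list[str]) -> list[str]:
        return list(records)

class KeywordCrawler:
    """Pulls only records matching its search terms."""
    def __init__(self, search_terms: list[str]):
        self.search_terms = [t.lower() for t in search_terms]

    def crawl(self, records: list[str]) -> list[str]:
        return [r for r in records
                if any(t in r.lower() for t in self.search_terms)]

records = ["cloudsek-support.com", "example.org", "login-cloudsek.net"]
all_data = GenericCrawler().crawl(records)             # the entire data set
matched = KeywordCrawler(["cloudsek"]).crawl(records)  # search-matched only
```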
When the crawlers orchestrator creates a crawling job, it mounts a ConfigMap; if the spawned crawler is a keyword crawler, the ConfigMap supplies its search terms. Upon completing a crawling task, the crawler sends a completion signal to the acquisition controller, which then wipes the crawler’s job and updates the Redis keys that inform further scheduling decisions.
Subdomains Repository
The subdomains repository scans millions of domain records already present in-house to identify suspicious domains.
Typosquatting
We use an in-house permutation engine to detect typosquatting, brandjacking, URL hijacking, fraud, phishing attacks, and corporate espionage, and to support threat intelligence. It generates and records typo-squatted domain and subdomain names that closely resemble a client’s assets.
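To give a flavour of what such a permutation engine produces, here is a minimal stand-in covering three common typo classes. This is not CloudSEK’s engine, which covers far more permutation classes; the function name and homoglyph table are assumptions.

```python
def typo_permutations(domain: str, tld: str = "com") -> set[str]:
    """Generate simple typosquat candidates for a brand name.

    Minimal sketch covering three typo classes: character omission,
    adjacent-character transposition, and homoglyph substitution.
    """
    variants = set()
    # 1. Omission: drop one character (cloudsek -> cloudsk).
    for i in range(len(domain)):
        variants.add(domain[:i] + domain[i + 1:])
    # 2. Transposition: swap adjacent characters (cloudsek -> cluodsek).
    for i in range(len(domain) - 1):
        variants.add(domain[:i] + domain[i + 1] + domain[i] + domain[i + 2:])
    # 3. Homoglyphs: visually similar character substitutions.
    homoglyphs = {"o": "0", "l": "1", "e": "3", "s": "5"}
    for char, look_alike in homoglyphs.items():
        if char in domain:
            variants.add(domain.replace(char, look_alike))
    variants.discard(domain)  # drop the original if a swap reproduced it
    return {f"{v}.{tld}" for v in variants}

candidates = typo_permutations("cloudsek")
```

Candidates generated this way can then be checked against new domain registrations to flag likely impersonation attempts.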
Powered by AI/ML, XVigil provides specific, actionable, and timely intelligence that allows clients to intervene and take remedial steps before costly breaches and losses occur. Fake Domain Finder is one of the premium services CloudSEK offers; its framework is composed of the acquisition component and the classification component. The acquisition component initiates the process of identifying suspicious fake domains from across the internet and filtering them by relevance, and its work depends heavily on crawlers. This article has explained the framework of XVigil’s Fake Domain Finder, focusing on the functions of the acquisition component and the crawlers.