Plugins play a crucial role in extending Urlinsane's functionality, flexibility, and customization, allowing it to evolve alongside changing needs and technological advancements. The plugin types are:
| Type | Count | Description |
|---|---|---|
| Languages | 9 | Language plugins that support linguistic capabilities. |
| Keyboards | 19 | Keyboard plugins offering layouts for various international keyboards. |
| Algorithms | 24 | Generate typo variants for each target domain. |
| Information | 13 | Gather information on target domains. |
| Outputs | 6 | Format and save results in various output formats. |
Keyboard Layouts
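| Arabic | Armenian | English | Finnish | French | Russian | Spanish | Hebrew | Persian |
|---|---|---|---|---|---|---|---|---|
| غفقثصض | QWERTY | QWERTY | QWERTY | ACNOR | ЯШЕРТЫ | QWERTY | Standard | Farsi |
| AZERTY | QWERTY | AZERTY | | | ЙЦУКЕН | QWERTY | | |
| غفقثصض | | QWERTZ | | | ЙЦУКЕН | | | |
| QWERTY | | DVORAK | | | | | | |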
Algorithms
Algorithms systematically generate plausible misspelled domain variations by analyzing common typing errors and linguistic patterns.
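To make this concrete, here is a minimal Go sketch of two classic typo algorithms, character omission and adjacent-character transposition, applied to a single domain label. The function names are hypothetical and this is not Urlinsane's own implementation; it only illustrates the kind of variants such plugins produce.

```go
package main

import "fmt"

// omissions returns every variant of a domain label with exactly one
// character removed, without duplicates (e.g. "google" -> "gogle").
func omissions(label string) []string {
	r := []rune(label)
	seen := map[string]bool{}
	var out []string
	for i := range r {
		v := string(r[:i]) + string(r[i+1:])
		if v != "" && !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	return out
}

// transpositions returns every variant with two adjacent characters swapped
// (e.g. "google" -> "googel"). Swapping identical neighbours changes nothing,
// so those positions are skipped.
func transpositions(label string) []string {
	r := []rune(label)
	seen := map[string]bool{}
	var out []string
	for i := 0; i+1 < len(r); i++ {
		if r[i] == r[i+1] {
			continue
		}
		s := append([]rune(nil), r...)
		s[i], s[i+1] = s[i+1], s[i]
		if v := string(s); !seen[v] {
			seen[v] = true
			out = append(out, v)
		}
	}
	return out
}

func main() {
	fmt.Println(omissions("google"))      // [oogle gogle goole googe googl]
	fmt.Println(transpositions("google")) // [ogogle gogole goolge googel]
}
```

The sketch works on the label only ("google", not "google.com"); a caller would presumably reattach the TLD to each variant.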
TODO: Replacing words with their root form (stemming) in the domain name.
ks (Keyboard Substitution): TODO: Changing international keyboard layouts, assuming the user is typing in their native layout.
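One possible reading of the planned keyboard-layout idea: a user's fingers hit the keys for a Latin domain while the OS layout is set to another layout, producing a different string. The Go sketch below assumes a US QWERTY layout and the standard Russian ЙЦУКЕН layout (unshifted layer only), pairs characters that share a physical key, and "retypes" a label. The row strings and function names are illustrative assumptions, not Urlinsane's keyboard plugin data.

```go
package main

import "fmt"

// Physical key rows of a US QWERTY keyboard and the characters the same keys
// produce on the standard Russian ЙЦУКЕН layout (unshifted layer only).
var qwertyRows = []string{"qwertyuiop[]", "asdfghjkl;'", "zxcvbnm,."}
var jcukenRows = []string{"йцукенгшщзхъ", "фывапролджэ", "ячсмитьбю"}

// buildMap pairs characters that sit on the same physical key.
func buildMap(from, to []string) map[rune]rune {
	m := map[rune]rune{}
	for i := range from {
		f, t := []rune(from[i]), []rune(to[i])
		for j := range f {
			m[f[j]] = t[j]
		}
	}
	return m
}

// retype returns what pressing the QWERTY keys for s produces when the active
// layout is ЙЦУКЕН. Unmapped characters (digits, '-') pass through unchanged.
func retype(s string, m map[rune]rune) string {
	out := []rune(s)
	for i, c := range out {
		if r, ok := m[c]; ok {
			out[i] = r
		}
	}
	return string(out)
}

func main() {
	m := buildMap(qwertyRows, jcukenRows)
	fmt.Println(retype("google", m)) // пщщпду
}
```

Because '.' also maps to a letter on ЙЦУКЕН, the sketch should be applied per label rather than to a whole domain.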
Collectors
Collector plugins gather information on domains, enabling a detailed comparison of similar-looking domains to determine whether cybercriminals are using them for typosquatting. By collecting data on domain ownership, registration dates, hosting locations, and site content, algorithms can analyze whether these variations are likely to be malicious. This approach helps identify suspicious patterns and potential connections to phishing, fraud, or brand impersonation attempts. With thorough data collection, organizations can better detect and respond to typosquatting threats in real time.
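Comparing similar-looking domains usually starts with a similarity measure. As an illustration only (nothing here says Urlinsane's collectors use it), this Go sketch computes Levenshtein edit distance, the minimum number of single-character insertions, deletions, and substitutions separating two names.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the minimum number of single-character insertions,
// deletions, and substitutions needed to turn a into b. It works on runes,
// so non-ASCII labels are compared character by character, not byte by byte.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1) // distances for the previous row
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("paypal.com", "paypa1.com")) // 1
	fmt.Println(levenshtein("google.com", "gogle.com"))  // 1
}
```

Edit distance says nothing about visual confusability ('l' versus '1' costs the same as 'l' versus 'x'), so it is best treated as one signal among the ownership, hosting, and content data described above.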
TODO: Retrieve a HAR file from browser interaction for in-depth data analysis.
pop (Popularity): TODO: Retrieve a domain popularity estimate, as URLCrazy does.
Outputs
With structured outputs, users can seamlessly incorporate findings into their existing defenses, strengthening their protection against typosquatting threats.
| Name | Description |
|---|---|
| TABLE | Pretty table format with color styling |
| HTML | HTML-formatted output |
| JSON | TODO: JSON output format |
| TXT | Plain text output, one record per line |
| CSV | Comma-separated values format |
| TSV | Tab-separated values format |
| MD | Markdown-formatted output |
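CSV and TSV differ only in the field separator. A minimal Go sketch using the standard `encoding/csv` package, assuming a hypothetical flattened `Record` type (Urlinsane's real result fields are not shown here):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// Record is a hypothetical flattened result row used only for this sketch.
type Record struct {
	Algorithm string
	Domain    string
	Live      bool
}

// writeDelimited renders a header plus one line per record. sep is ',' for
// CSV or '\t' for TSV; encoding/csv handles quoting of fields that need it.
func writeDelimited(records []Record, sep rune) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	w.Comma = sep
	if err := w.Write([]string{"algorithm", "domain", "live"}); err != nil {
		return "", err
	}
	for _, r := range records {
		row := []string{r.Algorithm, r.Domain, strconv.FormatBool(r.Live)}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	recs := []Record{{"omission", "gogle.com", true}, {"transposition", "googel.com", false}}
	out, err := writeDelimited(recs, '\t')
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Letting `encoding/csv` do the writing means a field that happens to contain the separator or a quote is escaped correctly, which hand-joined strings would get wrong.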
A major limitation of terminal output is that data can only be displayed as columns and rows. Although the --filter flag lets you choose specific columns, and the txt output type (--output txt or -o txt) streams results directly to the terminal without table formatting, only a fraction of the collected information is shown. The new JSON output option overcomes this by dumping the complete, deeply nested JSON document, which can then be filtered with tools like jq for more detailed analysis.
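The advantage of JSON is that nested data survives intact. This Go sketch serializes hypothetical result types (`Result`, `DNSInfo`; the real Urlinsane schema may differ) with the standard `encoding/json` package, so nested fields such as DNS records stay grouped instead of being flattened into columns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DNSInfo and Result are hypothetical shapes for illustration; Urlinsane's
// actual JSON schema may differ.
type DNSInfo struct {
	A  []string `json:"a,omitempty"`
	MX []string `json:"mx,omitempty"`
}

type Result struct {
	Domain    string            `json:"domain"`
	Algorithm string            `json:"algorithm"`
	DNS       DNSInfo           `json:"dns"`
	Meta      map[string]string `json:"meta,omitempty"`
}

// encodeResults serializes every result, including nested fields, as one JSON
// document. pretty selects indented output for human reading.
func encodeResults(results []Result, pretty bool) ([]byte, error) {
	if pretty {
		return json.MarshalIndent(results, "", "  ")
	}
	return json.Marshal(results)
}

func main() {
	results := []Result{{
		Domain:    "gogle.com",
		Algorithm: "omission",
		DNS:       DNSInfo{A: []string{"192.0.2.10"}},
		Meta:      map[string]string{"registrar": "example"},
	}}
	b, err := encodeResults(results, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Piping the sketch's output through `jq '.[].dns.a'` would print only the A records of each result.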
In Progress
I am currently developing a SQLite database backend to store results, datasets, languages, and word embeddings. This approach aims to reduce the overall binary size, enable more advanced analysis, and allow the program to download updates in the future. Words often have interrelationships that are best represented in a database, which allows more efficient storage and lookup.
Exploring the possibility of replacing the chained task pipeline with a DAG-based pipeline.
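A DAG-based pipeline needs an execution order in which every plugin runs after the plugins it depends on. A minimal Go sketch using Kahn's topological sort, with hypothetical collector names and a hypothetical `topoOrder` helper; this is not the planned implementation, only the standard technique:

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// topoOrder returns plugin names so that each plugin comes after the plugins
// it depends on (Kahn's algorithm). deps maps a plugin to its dependencies.
// Ties are broken alphabetically so the order is deterministic.
func topoOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for p, ds := range deps {
		if _, ok := indegree[p]; !ok {
			indegree[p] = 0
		}
		for _, d := range ds {
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
			indegree[p]++
			dependents[d] = append(dependents[d], p)
		}
	}
	var ready []string
	for p, n := range indegree {
		if n == 0 {
			ready = append(ready, p)
		}
	}
	var order []string
	for len(ready) > 0 {
		sort.Strings(ready)
		p := ready[0]
		ready = ready[1:]
		order = append(order, p)
		for _, q := range dependents[p] {
			indegree[q]--
			if indegree[q] == 0 {
				ready = append(ready, q)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, errors.New("dependency cycle detected")
	}
	return order, nil
}

func main() {
	// Hypothetical collectors: "geo" and "html" both need DNS results first.
	deps := map[string][]string{
		"dns":        nil,
		"geo":        {"dns"},
		"html":       {"dns"},
		"screenshot": {"html"},
		"whois":      nil,
	}
	order, err := topoOrder(deps)
	if err != nil {
		panic(err)
	}
	fmt.Println(order) // [dns geo html screenshot whois]
}
```

Unlike a linear chain, every plugin in the `ready` set at a given step has no unmet dependencies, so a real scheduler could run that whole set concurrently instead of one at a time.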
TODO
LLM: I’m interested in utilizing Large Language Models (LLMs) to replace our existing natural language processing (NLP) algorithms and to automatically generate language datasets.
I want to explore options for reducing the program’s size, currently 11 MB. By reusing datasets and resources already available on the system, such as MaxMind GeoIP databases, TLD suffix lists, LLMs, and vector databases, instead of embedding them in the binary, we can minimize storage usage.
I’m considering restructuring the information-gathering functions to follow a Directed Acyclic Graph (DAG) execution pattern with dependencies, instead of chaining plugins in a linear pipeline. This would allow more efficient and flexible handling of interdependent tasks, similar to how Terraform manages plugin execution.
I plan to add an analysis plugin that compares data between two domains and can be executed as a separate CLI command.
Develop a script to download and build keyboard layouts from kbdlayout.info.
Create an advanced keyboard model that supports layer shifting.
Implement functionality for sending DNS queries to multiple DNS servers.
Store records in an embedded database, enabling plugins to access the data efficiently.
A CLI command to report typosquatting domains to, or retrieve them from, urlinsane.com could help build a comprehensive dataset of potential typosquatting cases. With enough reported domains, an AI classifier could be trained to identify typosquatting domains automatically; the larger the dataset grows, the more accurately it could detect and classify them.
URLCrazy is an OSINT tool that generates and tests domain typos and variations to detect or perform typosquatting, URL hijacking, phishing, and corporate espionage.
An internationalized domain name (IDN) is a domain name that includes characters outside of the Latin alphabet, such as letters from Arabic, Chinese, Cyrillic, or Devanagari scripts.
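In DNS, each non-ASCII label of an IDN travels in an ASCII-compatible form: the label is Punycode-encoded (RFC 3492) and prefixed with `xn--`. The Go sketch below is a minimal Punycode encoder for a single lowercase label. It skips the normalization, validation, and overflow checks that full IDNA processing requires, so real code should rely on a complete IDNA library such as `golang.org/x/net/idna`.

```go
package main

import "fmt"

// Bootstring parameters for Punycode, from RFC 3492 section 5.
const (
	base, tMin, tMax, skew, damp = 36, 1, 26, 38, 700
	initialBias, initialN        = 72, 128
)

// adapt is the bias adaptation function from RFC 3492 section 6.1.
func adapt(delta, numPoints int, first bool) int {
	if first {
		delta /= damp
	} else {
		delta /= 2
	}
	delta += delta / numPoints
	k := 0
	for delta > ((base-tMin)*tMax)/2 {
		delta /= base - tMin
		k += base
	}
	return k + (base-tMin+1)*delta/(delta+skew)
}

// digit maps 0..25 to 'a'..'z' and 26..35 to '0'..'9'.
func digit(d int) byte {
	if d < 26 {
		return byte('a' + d)
	}
	return byte('0' + d - 26)
}

// punycode encodes one label. Sketch only: no case folding, normalization,
// or overflow checks, which a real IDNA implementation performs.
func punycode(label string) string {
	runes := []rune(label)
	var out []byte
	for _, r := range runes { // basic (ASCII) code points are copied first
		if r < 0x80 {
			out = append(out, byte(r))
		}
	}
	b := len(out)
	h := b
	if b > 0 {
		out = append(out, '-')
	}
	n, delta, bias := initialN, 0, initialBias
	for h < len(runes) {
		m := 0x110000 // one past the largest code point
		for _, r := range runes {
			if int(r) >= n && int(r) < m {
				m = int(r)
			}
		}
		delta += (m - n) * (h + 1)
		n = m
		for _, r := range runes {
			if int(r) < n {
				delta++
			}
			if int(r) == n { // emit delta as a variable-length integer
				q := delta
				for k := base; ; k += base {
					t := k - bias
					if t < tMin {
						t = tMin
					} else if t > tMax {
						t = tMax
					}
					if q < t {
						break
					}
					out = append(out, digit(t+(q-t)%(base-t)))
					q = (q - t) / (base - t)
				}
				out = append(out, digit(q))
				bias = adapt(delta, h+1, h == b)
				delta = 0
				h++
			}
		}
		delta++
		n++
	}
	return string(out)
}

func main() {
	fmt.Println("xn--" + punycode("münchen") + ".de") // xn--mnchen-3ya.de
}
```

Seeing the ASCII form helps explain why look-alike IDNs are hard to spot: the browser may display "münchen.de" while DNS resolves "xn--mnchen-3ya.de".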