SwaggerSpy is a tool designed for automated Open Source Intelligence (OSINT) on SwaggerHub. This project aims to streamline the process of gathering intelligence from APIs documented on SwaggerHub, providing valuable insights for security researchers, developers, and IT professionals.
Swagger is an open-source framework that allows developers to design, build, document, and consume RESTful web services. It simplifies API development by providing a standard way to describe REST APIs using a JSON or YAML format. Swagger enables developers to create interactive documentation for their APIs, making it easier for both developers and non-developers to understand and use the API.
SwaggerHub is a collaborative platform for designing, building, and managing APIs using the Swagger framework. It offers a centralized repository for API documentation, version control, and collaboration among team members. SwaggerHub simplifies the API development lifecycle by providing a unified platform for API design and testing.
Performing OSINT on SwaggerHub is crucial because developers, in their pursuit of efficient API documentation and sharing, may inadvertently expose sensitive information. Here are key reasons why OSINT on SwaggerHub is valuable:
Developer Oversights: Developers might unintentionally include secrets, credentials, or sensitive information in API documentation on SwaggerHub. These oversights can lead to security vulnerabilities and unauthorized access if not identified and addressed promptly.
Security Best Practices: OSINT on SwaggerHub helps enforce security best practices. Identifying and rectifying potential security issues early in the development lifecycle is essential to ensure the confidentiality and integrity of APIs.
Preventing Data Leaks: By systematically scanning SwaggerHub for sensitive information, organizations can proactively prevent data leaks. This is especially crucial in today's interconnected digital landscape where APIs play a vital role in data exchange between services.
Risk Mitigation: Understanding that developers might forget to remove or obfuscate sensitive details in API documentation underscores the importance of continuous OSINT on SwaggerHub. This proactive approach mitigates the risk of unintentional exposure of critical information.
Compliance and Privacy: Many industries have stringent compliance requirements regarding the protection of sensitive data. OSINT on SwaggerHub ensures that APIs adhere to these regulations, promoting a culture of compliance and safeguarding user privacy.
Educational Opportunities: Identifying oversights in SwaggerHub documentation provides educational opportunities for developers. It encourages a security-conscious mindset, fostering a culture of awareness and responsible information handling.
By recognizing that developers can inadvertently expose secrets, OSINT on SwaggerHub becomes an integral part of the overall security strategy, safeguarding against potential threats and promoting a secure API ecosystem.
SwaggerSpy obtains information from SwaggerHub and utilizes regular expressions to inspect API documentation for sensitive information, such as secrets and credentials.
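As a rough illustration of that approach (a conceptual sketch only, not SwaggerSpy's actual code; the URL and regex patterns below are assumptions), scanning a downloaded API definition with regular expressions could look like this:

# Conceptual sketch (not SwaggerSpy's actual code): fetch an API definition
# and scan it with regular expressions for secret-like strings.
import re
import requests

# Illustrative patterns only; real secret-hunting regexes are far more extensive.
PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "bearer_token": re.compile(r"(?i)bearer\s+[a-z0-9_\-\.=]{20,}"),
    "generic_secret": re.compile(r"(?i)(api[_-]?key|secret|password)\s*[:=]\s*\S{8,}"),
}

def scan_definition(url):
    """Download an API definition and report regex matches that look like secrets."""
    text = requests.get(url, timeout=10).text
    return [(name, m.group(0)) for name, rx in PATTERNS.items() for m in rx.finditer(text)]

# Hypothetical SwaggerHub definition URL, for illustration only.
for kind, value in scan_definition("https://api.swaggerhub.com/apis/example/petstore/1.0.0"):
    print(f"[{kind}] {value}")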
To use SwaggerSpy, follow these steps:
git clone https://github.com/UndeadSec/SwaggerSpy.git
cd SwaggerSpy
pip install -r requirements.txt
python swaggerspy.py searchterm
SwaggerSpy is intended for educational and research purposes only. Users are responsible for ensuring that their use of this tool complies with applicable laws and regulations.
Contributions to SwaggerSpy are welcome! Feel free to submit issues, feature requests, or pull requests to help improve this tool.
SwaggerSpy is developed and maintained by Alisson Moretto (UndeadSec)
I'm a passionate cyber threat intelligence pro who loves sharing insights and crafting cybersecurity tools.
SwaggerSpy is licensed under the MIT License. See the LICENSE file for details.
Special thanks to @Liodeus for providing project inspiration through swaggerHole.
AntiSquat leverages AI techniques such as natural language processing (NLP), large language models (ChatGPT) and more to improve the detection of typosquatting and phishing domains.
git clone https://github.com/redhuntlabs/antisquat
pip install -r requirements.txt
Create a file named .openai-key and paste your ChatGPT API key in there.
Create a file named .godaddy-key and paste your GoDaddy API key in there.
To ignore certain domains, create a file named blacklist.txt and type in a line-separated list of domains you'd like to ignore. Regular expressions are supported.
python3.8 antisquat.py domains.txt
Let's say you'd like to run AntiSquat on "flipkart.com". Create a file named "domains.txt", then type in flipkart.com. Then run python3.8 antisquat.py domains.txt.
AntiSquat generates several permutations of the domain, iterates through them one-by-one and tries extracting all contact information from the page.
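As a rough sketch of that permutation step (this is not AntiSquat's actual algorithm; the substitution table is an illustrative assumption), generating simple typosquat candidates could look like this:

# Rough illustration of domain permutation (not AntiSquat's actual algorithm):
# generate simple typosquat candidates for a given domain name.
LOOKALIKES = {"o": ["0"], "i": ["1", "l"], "l": ["1", "i"], "e": ["3"], "a": ["4"]}

def permutations(domain):
    """Yield simple typosquat variants: omissions, transpositions and look-alike swaps."""
    name, _, tld = domain.partition(".")
    variants = set()
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])                               # character omission
        if i < len(name) - 1:
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])   # transposition
        for sub in LOOKALIKES.get(name[i], []):
            variants.add(name[:i] + sub + name[i + 1:])                     # look-alike substitution
    return sorted(v + "." + tld for v in variants if v and v != name)

for candidate in permutations("flipkart.com"):
    print(candidate)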
A test case for amazon.com is attached. To run it without any API keys, simply run python3.8 test.py
Here, the tool appears to have captured a test phishing site for amazon.com. Similar domains that may be available for sale can be captured in this way and any contact information from the site may be extracted.
If you'd like to know more about the tool, make sure to check out our blog.
To know more about our Attack Surface Management platform, check out NVADR.
An automated tool which can simultaneously crawl, fill forms, trigger error/debug pages and "loot" secrets out of the client-facing code of sites.
To use the tool, you can grab any one of the pre-built binaries from the Releases section of the repository. If you want to build the source code yourself, you will need Go > 1.16 to build it. Simply running go build will output a usable binary for you.
Additionally you will need two JSON files (lootdb.json and regexes.json) along with the binary, which you can get from the repo itself. Once you have all 3 files in the same folder, you can go ahead and fire up the tool.
Here is the help usage of the tool:
$ ./httploot --help
_____
)=(
/ \ H T T P L O O T
( $ ) v0.1
\___/
[+] HTTPLoot by RedHunt Labs - A Modern Attack Surface (ASM) Management Company
[+] Author: Pinaki Mondal (RHL Research Team)
[+] Continuously Track Your Attack Surface using https://redhuntlabs.com/nvadr.
Usage of ./httploot:
-concurrency int
Maximum number of sites to process concurrently (default 100)
-depth int
Maximum depth limit to traverse while crawling (default 3)
-form-length int
Length of the string to be randomly generated for filling form fields (default 5)
-form-string string
Value with which the tool will auto-fill forms, strings will be randomly generated if no value is supplied
-input-file string
Path of the input file containing domains to process
-output-file string
CSV output file path to write the results to (default "httploot-results.csv")
-parallelism int
Number of URLs per site to crawl parallely (default 15)
-submit-forms
Whether to auto-submit forms to trigger debug pages
-timeout int
The default timeout for HTTP requests (default 10)
-user-agent string
User agent to use during HTTP requests (default "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0")
-verify-ssl
Verify SSL certificates while making HTTP requests
-wildcard-crawl
Allow crawling of links outside of the domain being scanned
There are two flags which help with the concurrent scanning:
-concurrency: Specifies the maximum number of sites to process concurrently.
-parallelism: Specifies the number of links per site to crawl in parallel.
Both -concurrency and -parallelism are crucial to the performance and reliability of the tool's results.
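Conceptually (this is only an illustrative sketch in Python, not HTTPLoot's Go implementation), the two limits can be thought of as nested semaphores: one bounding how many sites are in flight, the other bounding how many URLs are fetched at once per site:

# Conceptual sketch (not HTTPLoot's Go code) of how a site-level concurrency
# limit and a per-site URL parallelism limit can work together.
import asyncio

CONCURRENCY = 100   # maximum number of sites processed at once (-concurrency)
PARALLELISM = 15    # maximum URLs crawled in parallel per site (-parallelism)

async def crawl_url(site, url):
    await asyncio.sleep(0)  # placeholder for the real HTTP request
    print(f"crawled {url} on {site}")

async def process_site(site, urls, site_slots):
    async with site_slots:                        # global limit on sites in flight
        url_slots = asyncio.Semaphore(PARALLELISM)
        async def bounded(url):
            async with url_slots:                 # per-site limit on parallel URLs
                await crawl_url(site, url)
        await asyncio.gather(*(bounded(u) for u in urls))

async def main(sites):
    site_slots = asyncio.Semaphore(CONCURRENCY)
    await asyncio.gather(*(process_site(s, [f"https://{s}/page{i}" for i in range(3)], site_slots) for s in sites))

asyncio.run(main(["example.com", "example.org"]))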
The crawl depth can be specified using the -depth flag. The integer value supplied is the maximum chain depth of links to crawl on a site.
An important flag, -wildcard-crawl, can be used to specify whether to crawl URLs outside the domain in scope.
NOTE: Using this flag might lead to infinite crawling in worst case scenarios if the crawler finds links to other domains continuously.
If you want the tool to scan for debug pages, you need to specify the -submit-forms argument. This will direct the tool to auto-submit forms and try to trigger error/debug pages once a tech stack has been identified successfully.
If the -submit-forms flag is enabled, you can control the string to be submitted in the form fields: -form-string specifies the string to submit, while -form-length controls the length of the string to be randomly generated and filled into the forms.
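For example, a run that auto-submits forms with a fixed value might look like the following (the input file name and form string are illustrative):
$ ./httploot -input-file targets.txt -submit-forms -form-string "testvalue"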
Other useful flags include:
-timeout: specifies the HTTP timeout of requests.
-user-agent: specifies the user agent to use in HTTP requests.
-verify-ssl: specifies whether or not to verify SSL certificates.
The input file to read can be specified using the -input-file argument. You can specify a file path containing a list of URLs to scan with the tool. The -output-file flag can be used to specify the result output file path, which by default is httploot-results.csv.
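Putting the input/output and HTTP options together, an illustrative invocation (file names and values are examples only) could be:
$ ./httploot -input-file targets.txt -output-file results.csv -timeout 15 -verify-ssl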
Further details about the research which led to the development of the tool can be found on our RedHunt Labs Blog.
The tool is licensed under the MIT license. See LICENSE.
Currently the tool is at v0.1.
The RedHunt Labs Research Team would like to extend credits to the creators & maintainers of shhgit for the regular expressions provided by them in their repository.
To know more about our Attack Surface Management platform, check out NVADR.
Octopii is an open-source AI-powered Personal Identifiable Information (PII) scanner that can look for image assets such as Government IDs, passports, photos and signatures in a directory.
Octopii uses Tesseract's Optical Character Recognition (OCR) and Keras' Convolutional Neural Networks (CNN) models to detect various forms of personal identifiable information that may be leaked on a publicly facing location. This is done in the following steps:
The image is imported via OpenCV and Python Imaging Library (PIL) and is cleaned, deskewed and rotated for scanning.
A directory is looped over and searched for images. These images are scanned for unique features via the image classifier (done by comparing it to a trained model), along with OCR for finding substrings within the image. This may have one of the following outcomes:
Best case (score >=90): The image is sent into the image classifier algorithm to be scanned for features such as an ISO/IEC 7810 card specification, colors, location of text, photos, holograms etc. If it is successfully classified as a type of PII, OCR is performed on it looking for particular words and strings as a final check. When both of these are confirmed, the result from Octopii is extremely reliable.
Average case (score >=50): The image is partially/incorrectly identified by the image classifier algorithm, but an OCR check finds contradicting substrings and reclassifies it.
Worst case (score >=0): The image is only identified by the image classifier algorithm but an OCR scan returns no results.
Incorrect classification: False positives due to a very small model or OCR list may incorrectly classify PIIs, giving inaccurate results.
As a final verification method, images are scanned for certain strings to verify the accuracy of the model.
The accuracy of the scan can be determined via the confidence scores in the output. If all the mentioned conditions are met, a score of 100.0 is returned.
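As a highly simplified sketch of this classify-then-OCR flow (not Octopii's actual code; the input size, file paths, label and keyword map below are assumptions), the core idea could be expressed as:

# Simplified illustration (not Octopii's actual code) of the classify-then-OCR
# idea: run a Keras image classifier, then confirm the label with OCR keywords.
import numpy as np
import pytesseract
from PIL import Image
from tensorflow.keras.models import load_model

model = load_model("models/keras_models.h5")            # bundled classifier
labels = open("models/labels.txt").read().splitlines()  # class names by index

def classify_and_verify(path, keyword_map):
    image = Image.open(path).convert("RGB").resize((224, 224))
    batch = np.expand_dims(np.asarray(image) / 255.0, axis=0)
    scores = model.predict(batch)[0]
    label = labels[int(np.argmax(scores))]
    confidence = float(np.max(scores)) * 100
    text = pytesseract.image_to_string(image).lower()    # OCR pass as a final check
    confirmed = any(word.lower() in text for word in keyword_map.get(label, []))
    return label, confidence, confirmed

# Hypothetical keyword map, playing the role of ocr_list.json.
print(classify_and_verify("sample_id.png", {"Passport": ["passport", "nationality"]}))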
To train the model, data can also be fed into the model_generator.py script, and the newly improved h5 file can be used.
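As an illustrative sketch of such a training step (this is not the actual model_generator.py; the dataset layout, architecture and hyperparameters are assumptions), a small Keras classifier could be trained and exported like this:

# Illustrative sketch of training a small image classifier (not the actual
# model_generator.py): train on class-labelled folders and export an h5 file.
import tensorflow as tf

# Assumed layout: dataset/<class name>/<images>
train = tf.keras.utils.image_dataset_from_directory("dataset", image_size=(224, 224), batch_size=16)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(len(train.class_names), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train, epochs=5)
model.save("models/keras_models.h5")  # drop-in replacement for the bundled model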
Install all dependencies via pip install -r requirements.txt.
Install Tesseract via sudo apt install tesseract-ocr -y (for Ubuntu/Debian).
To run Octopii, type python3 octopii.py <location name>, for example python3 octopii.py pii_list/
Usage: python3 octopii.py <location to scan> <additional flags>
Octopii currently supports scanning local paths, S3 directories, and open directory listings via their URLs.
Open-source projects like these thrive on community support. Since Octopii relies heavily on machine learning and optical character recognition, contributions are much appreciated. Here's how to contribute:
Fork the official repository at https://github.com/redhuntlabs/octopii
There are 3 files in the models/ directory:
- The keras_models.h5 file is the Keras h5 model that can be obtained from Google's Teachable Machine or via Keras in Python.
- The labels.txt file contains the list of labels corresponding to the index that the model returns.
- The ocr_list.json file consists of keywords to search for during an OCR scan, as well as other miscellaneous information such as country of origin, regular expressions etc.
Since our current dataset is quite small, we could benefit from a large Keras model of international PII for this project. If you do not have expertise in Keras, Google provides an extremely easy to use model generator called the Teachable Machine. To use it:
Tip: segregate your image assets into folders with the folder name being the same as the class name. You can then drag and drop a folder into the upload dialog.
Note: Only upload images that match the class name; for example, the German Passport class must contain German Passport pictures. Uploading the wrong data to the wrong class will confuse the machine learning algorithms.
Once the model is trained, place the keras_model.h5 file and labels.txt file into the models/ directory in Octopii.
The images used for the model above are not visible to us since they're in a proprietary format. You can use both dummy and actual PII. Make sure they are square-ish in image size.
Once you generate models using Teachable Machine, you can improve Octopii's accuracy via OCR. To do this:
- Open the ocr_list.json file and create a JSONObject with the key having the same name as the asset class. NOTE: The key name must be exactly the same as the asset class name from Teachable Machine.
- Under keywords, use as many unique terms from your asset as possible, such as "Income Tax Department". Store them in a JSONArray.
- Save the ocr_list.json file.
You can replace each file you modify in the models/ directory after you create or edit them via the above methods.
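Based on the description above, an ocr_list.json entry might look roughly like this (the class name, keywords, and the extra country/regex fields are illustrative assumptions, not taken from the actual file):

{
  "Indian PAN Card": {
    "keywords": ["Income Tax Department", "Permanent Account Number", "Govt. of India"],
    "country": "India",
    "regexes": ["[A-Z]{5}[0-9]{4}[A-Z]"]
  }
}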
Submit a pull request from your forked repo and we'll pick it up and replace our current model with it if the changes are large enough.
Note: Please take care to ensure quality when contributing, especially when modifying ocr_list.json.
(c) Copyright 2022 RedHunt Labs Private Limited
Author: Owais Shaikh