A Model Context Protocol (MCP) server implementation that integrates with Firecrawl for web scraping capabilities.
Big thanks to @vrknetha, @cawstudios for the initial implementation!
You can also play around with our MCP Server on MCP.so's playground. Thanks to MCP.so for hosting and @gstarwd for integrating our server.
env FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
npm install -g firecrawl-mcp
Configuring Cursor

Note: Requires Cursor version 0.45.6+. For the most up-to-date configuration instructions, please refer to the official Cursor documentation on configuring MCP servers: Cursor MCP Server Configuration Guide
To configure Firecrawl MCP in Cursor v0.45.6
env FIRECRAWL_API_KEY=your-api-key npx -y firecrawl-mcp
To configure Firecrawl MCP in Cursor v0.48.6
{
  "mcpServers": {
    "firecrawl-mcp": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "YOUR-API-KEY"
      }
    }
  }
}
If you are using Windows and are running into issues, try
cmd /c "set FIRECRAWL_API_KEY=your-api-key && npx -y firecrawl-mcp"
Replace your-api-key with your Firecrawl API key. If you don't have one yet, you can create an account and get it from https://www.firecrawl.dev/app/api-keys
After adding, refresh the MCP server list to see the new tools. The Composer Agent will automatically use Firecrawl MCP when appropriate, but you can explicitly request it by describing your web scraping needs. Access the Composer via Command+L (Mac), select "Agent" next to the submit button, and enter your query.
Add this to your ./codeium/windsurf/model_config.json:
{
"mcpServers": {
"mcp-server-firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY"
}
}
}
}
To install Firecrawl for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @mendableai/mcp-server-firecrawl --client claude
FIRECRAWL_API_KEY: Your Firecrawl API key
FIRECRAWL_API_URL (Optional): Custom API endpoint for self-hosted instances, e.g. https://firecrawl.your-domain.com
FIRECRAWL_RETRY_MAX_ATTEMPTS: Maximum number of retry attempts (default: 3)
FIRECRAWL_RETRY_INITIAL_DELAY: Initial delay in milliseconds before the first retry (default: 1000)
FIRECRAWL_RETRY_MAX_DELAY: Maximum delay in milliseconds between retries (default: 10000)
FIRECRAWL_RETRY_BACKOFF_FACTOR: Exponential backoff multiplier (default: 2)
FIRECRAWL_CREDIT_WARNING_THRESHOLD: Credit usage warning threshold (default: 1000)
FIRECRAWL_CREDIT_CRITICAL_THRESHOLD: Credit usage critical threshold (default: 100)

For cloud API usage with custom retry and credit monitoring:
# Required for cloud API
export FIRECRAWL_API_KEY=your-api-key
# Optional retry configuration
export FIRECRAWL_RETRY_MAX_ATTEMPTS=5 # Increase max retry attempts
export FIRECRAWL_RETRY_INITIAL_DELAY=2000 # Start with 2s delay
export FIRECRAWL_RETRY_MAX_DELAY=30000 # Maximum 30s delay
export FIRECRAWL_RETRY_BACKOFF_FACTOR=3 # More aggressive backoff
# Optional credit monitoring
export FIRECRAWL_CREDIT_WARNING_THRESHOLD=2000 # Warning at 2000 credits
export FIRECRAWL_CREDIT_CRITICAL_THRESHOLD=500 # Critical at 500 credits
For self-hosted instance:
# Required for self-hosted
export FIRECRAWL_API_URL=https://firecrawl.your-domain.com
# Optional authentication for self-hosted
export FIRECRAWL_API_KEY=your-api-key # If your instance requires auth
# Custom retry configuration
export FIRECRAWL_RETRY_MAX_ATTEMPTS=10
export FIRECRAWL_RETRY_INITIAL_DELAY=500 # Start with faster retries
Add this to your claude_desktop_config.json
:
{
"mcpServers": {
"mcp-server-firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE",
"FIRECRAWL_RETRY_MAX_ATTEMPTS": "5",
"FIRECRAWL_RETRY_INITIAL_DELAY": "2000",
"FIRECRAWL_RETRY_MAX_DELAY": "30000",
"FIRECRAWL_RETRY_BACKOFF_FACTOR": "3",
"FIRECRAWL_CREDIT_WARNING_THRESHOLD": "2000",
"FIRECRAWL_CREDIT_CRITICAL_THRESHOLD": "500"
}
}
}
}
The server includes several configurable parameters that can be set via environment variables. Here are the default values if not configured:
const CONFIG = {
retry: {
maxAttempts: 3, // Number of retry attempts for rate-limited requests
initialDelay: 1000, // Initial delay before first retry (in milliseconds)
maxDelay: 10000, // Maximum delay between retries (in milliseconds)
backoffFactor: 2, // Multiplier for exponential backoff
},
credit: {
warningThreshold: 1000, // Warn when credit usage reaches this level
criticalThreshold: 100, // Critical alert when credit usage reaches this level
},
};
These configurations control:

Retry Behavior
- Automatically retries requests that fail due to rate limits, using exponential backoff
- Example: with the default settings above, retries are attempted after delays of 1 s, 2 s, and 4 s (capped at the 10 s maximum)

Credit Usage Monitoring
- Warns when credit usage reaches the warning threshold and raises a critical alert at the critical threshold
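To make the retry schedule concrete, here is a minimal Python sketch (illustrative only, not the server's actual implementation) that reproduces the delay sequence implied by the retry settings above; the variable names simply mirror the FIRECRAWL_RETRY_* environment variables.

```python
# Illustrative sketch of the retry schedule implied by the defaults above;
# the real server logic may differ.
max_attempts = 3         # FIRECRAWL_RETRY_MAX_ATTEMPTS
initial_delay_ms = 1000  # FIRECRAWL_RETRY_INITIAL_DELAY
max_delay_ms = 10000     # FIRECRAWL_RETRY_MAX_DELAY
backoff_factor = 2       # FIRECRAWL_RETRY_BACKOFF_FACTOR

for attempt in range(max_attempts):
    delay_ms = min(initial_delay_ms * backoff_factor ** attempt, max_delay_ms)
    print(f"retry {attempt + 1}: wait {delay_ms} ms")

# retry 1: wait 1000 ms
# retry 2: wait 2000 ms
# retry 3: wait 4000 ms
```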
The server utilizes Firecrawl's built-in rate limiting and batch processing capabilities:
firecrawl_scrape: Scrape content from a single URL with advanced options.
{
"name": "firecrawl_scrape",
"arguments": {
"url": "https://example.com",
"formats": ["markdown"],
"onlyMainContent": true,
"waitFor": 1000,
"timeout": 30000,
"mobile": false,
"includeTags": ["article", "main"],
"excludeTags": ["nav", "footer"],
"skipTlsVerification": false
}
}
firecrawl_batch_scrape: Scrape multiple URLs efficiently with built-in rate limiting and parallel processing.
{
"name": "firecrawl_batch_scrape",
"arguments": {
"urls": ["https://example1.com", "https://example2.com"],
"options": {
"formats": ["markdown"],
"onlyMainContent": true
}
}
}
Response includes operation ID for status checking:
{
"content": [
{
"type": "text",
"text": "Batch operation queued with ID: batch_1. Use firecrawl_check_batch_status to check progress."
}
],
"isError": false
}
firecrawl_check_batch_status: Check the status of a batch operation.
{
"name": "firecrawl_check_batch_status",
"arguments": {
"id": "batch_1"
}
}
firecrawl_search: Search the web and optionally extract content from search results.
{
"name": "firecrawl_search",
"arguments": {
"query": "your search query",
"limit": 5,
"lang": "en",
"country": "us",
"scrapeOptions": {
"formats": ["markdown"],
"onlyMainContent": true
}
}
}
firecrawl_crawl: Start an asynchronous crawl with advanced options.
{
"name": "firecrawl_crawl",
"arguments": {
"url": "https://example.com",
"maxDepth": 2,
"limit": 100,
"allowExternalLinks": false,
"deduplicateSimilarURLs": true
}
}
firecrawl_extract: Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.
{
"name": "firecrawl_extract",
"arguments": {
"urls": ["https://example.com/page1", "https://example.com/page2"],
"prompt": "Extract product information including name, price, and description",
"systemPrompt": "You are a helpful assistant that extracts product information",
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"price": { "type": "number" },
"description": { "type": "string" }
},
"required": ["name", "price"]
},
"allowExternalLinks": false,
"enableWebSearch": false,
"includeSubdomains": false
}
}
Example response:
{
"content": [
{
"type": "text",
"text": {
"name": "Example Product",
"price": 99.99,
"description": "This is an example product description"
}
}
],
"isError": false
}
Arguments:
- urls: Array of URLs to extract information from
- prompt: Custom prompt for the LLM extraction
- systemPrompt: System prompt to guide the LLM
- schema: JSON schema for structured data extraction
- allowExternalLinks: Allow extraction from external links
- enableWebSearch: Enable web search for additional context
- includeSubdomains: Include subdomains in extraction

When using a self-hosted instance, the extraction will use your configured LLM. For the cloud API, it uses Firecrawl's managed LLM service.
Conduct deep web research on a query using intelligent crawling, search, and LLM analysis.
{
"name": "firecrawl_deep_research",
"arguments": {
"query": "how does carbon capture technology work?",
"maxDepth": 3,
"timeLimit": 120,
"maxUrls": 50
}
}
Arguments:
Returns:
Generate a standardized llms.txt (and optionally llms-full.txt) file for a given domain. This file defines how large language models should interact with the site.
{
"name": "firecrawl_generate_llmstxt",
"arguments": {
"url": "https://example.com",
"maxUrls": 20,
"showFullText": true
}
}
Arguments:
Returns:
The server includes comprehensive logging:
Example log messages:
[INFO] Firecrawl MCP Server initialized successfully
[INFO] Starting scrape for URL: https://example.com
[INFO] Batch operation queued with ID: batch_1
[WARNING] Credit usage has reached warning threshold
[ERROR] Rate limit exceeded, retrying in 2s...
The server provides robust error handling:
Example error response:
{
"content": [
{
"type": "text",
"text": "Error: Rate limit exceeded. Retrying in 2 seconds..."
}
],
"isError": true
}
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
MIT License - see LICENSE file for details
Real-time face swap and video deepfake with a single click and only a single image.
This deepfake software is designed to be a productive tool for the AI-generated media industry. It can assist artists in animating custom characters, creating engaging content, and even using models for clothing design.
We are aware of the potential for unethical applications and are committed to preventative measures. A built-in check prevents the program from processing inappropriate media (nudity, graphic content, sensitive material like war footage, etc.). We will continue to develop this project responsibly, adhering to the law and ethics. We may shut down the project or add watermarks if legally required.
Ethical Use: Users are expected to use this software responsibly and legally. If using a real person's face, obtain their consent and clearly label any output as a deepfake when sharing online.
Content Restrictions: The software includes built-in checks to prevent processing inappropriate media, such as nudity, graphic content, or sensitive material.
Legal Compliance: We adhere to all relevant laws and ethical guidelines. If legally required, we may shut down the project or add watermarks to the output.
User Responsibility: We are not responsible for end-user actions. Users must ensure their use of the software aligns with ethical standards and legal requirements.
By using this software, you agree to these terms and commit to using it in a manner that respects the rights and dignity of others.
Users are expected to use this software responsibly and legally. If using a real person's face, obtain their consent and clearly label any output as a deepfake when sharing online. We are not responsible for end-user actions.
1. Select a face
2. Select which camera to use
3. Press live!
Retain your original mouth for accurate movement using Mouth Mask
Use different faces on multiple subjects simultaneously
Watch movies with any face in real-time
Run Live shows and performances
Create Your Most Viral Meme Yet
Created using Many Faces feature in Deep-Live-Cam
Surprise people on Omegle
Please be aware that the installation requires technical skills and is not for beginners. Consider downloading the prebuilt version.
git clone https://github.com/hacksider/Deep-Live-Cam.git
cd Deep-Live-Cam
**3. Download the Models**

1. [GFPGANv1.4](https://huggingface.co/hacksider/deep-live-cam/resolve/main/GFPGANv1.4.pth)
2. [inswapper_128_fp16.onnx](https://huggingface.co/hacksider/deep-live-cam/resolve/main/inswapper_128_fp16.onnx)

Place these files in the "**models**" folder.

**4. Install Dependencies**

We highly recommend using a `venv` to avoid issues.

For Windows:

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
**For macOS:**

Apple Silicon (M1/M2/M3) requires specific setup:

# Install Python 3.10 (specific version is important)
brew install python@3.10
# Install tkinter package (required for the GUI)
brew install python-tk@3.10
# Create and activate virtual environment with Python 3.10
python3.10 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
**In case something goes wrong and you need to reinstall the virtual environment**

# Deactivate the virtual environment
deactivate
rm -rf venv
# Reinstall the virtual environment
python -m venv venv
source venv/bin/activate
# install the dependencies again
pip install -r requirements.txt
**Run:**

If you don't have a GPU, you can run Deep-Live-Cam using `python run.py`. Note that initial execution will download models (~300MB).

### GPU Acceleration

**CUDA Execution Provider (Nvidia)**

1. Install [CUDA Toolkit 11.8.0](https://developer.nvidia.com/cuda-11-8-0-download-archive)
2. Install dependencies:

pip uninstall onnxruntime onnxruntime-gpu
pip install onnxruntime-gpu==1.16.3
3. Usage:

python run.py --execution-provider cuda
**CoreML Execution Provider (Apple Silicon)**

Apple Silicon (M1/M2/M3) specific installation:

1. Make sure you've completed the macOS setup above using Python 3.10.
2. Install dependencies:

pip uninstall onnxruntime onnxruntime-silicon
pip install onnxruntime-silicon==1.13.1
3. Usage (important: specify Python 3.10):

python3.10 run.py --execution-provider coreml
**Important Notes for macOS:**

- You **must** use Python 3.10, not newer versions like 3.11 or 3.13
- Always run with the `python3.10` command, not just `python`, if you have multiple Python versions installed
- If you get an error about `_tkinter` missing, reinstall the tkinter package: `brew reinstall python-tk@3.10`
- If you get model loading errors, check that your models are in the correct folder
- If you encounter conflicts with other Python versions, consider uninstalling them:

```bash
# List all installed Python versions
brew list | grep python

# Uninstall conflicting versions if needed
brew uninstall --ignore-dependencies python@3.11 python@3.13

# Keep only Python 3.10
brew cleanup
```

**CoreML Execution Provider (Apple Legacy)**

1. Install dependencies:

pip uninstall onnxruntime onnxruntime-coreml
pip install onnxruntime-coreml==1.13.1
2. Usage:

python run.py --execution-provider coreml
**DirectML Execution Provider (Windows)**

1. Install dependencies:

pip uninstall onnxruntime onnxruntime-directml
pip install onnxruntime-directml==1.15.1
2. Usage:

python run.py --execution-provider directml
**OpenVINO™ Execution Provider (Intel)**

1. Install dependencies:

pip uninstall onnxruntime onnxruntime-openvino
pip install onnxruntime-openvino==1.15.0
2. Usage:

python run.py --execution-provider openvino
1. Image/Video Mode

python run.py

2. Webcam Mode

python run.py

Check out these helpful guides to get the most out of Deep-Live-Cam:
Visit our official blog for more tips and tutorials.
options:
-h, --help show this help message and exit
-s SOURCE_PATH, --source SOURCE_PATH select a source image
-t TARGET_PATH, --target TARGET_PATH select a target image or video
-o OUTPUT_PATH, --output OUTPUT_PATH select output file or directory
--frame-processor FRAME_PROCESSOR [FRAME_PROCESSOR ...] frame processors (choices: face_swapper, face_enhancer, ...)
--keep-fps keep original fps
--keep-audio keep original audio
--keep-frames keep temporary frames
--many-faces process every face
--map-faces map source target faces
--mouth-mask mask the mouth region
--video-encoder {libx264,libx265,libvpx-vp9} adjust output video encoder
--video-quality [0-51] adjust output video quality
--live-mirror the live camera display as you see it in the front-facing camera frame
--live-resizable the live camera frame is resizable
--max-memory MAX_MEMORY maximum amount of RAM in GB
--execution-provider {cpu} [{cpu} ...] available execution provider (choices: cpu, ...)
--execution-threads EXECUTION_THREADS number of execution threads
-v, --version show program's version number and exit
Looking for a CLI mode? Using the -s/--source argument will make the program run in CLI mode.
We are always open to criticism and are ready to improve, that's why we didn't cherry-pick anything.
🐫 CAMEL is an open-source community dedicated to finding the scaling laws of agents. We believe that studying these agents on a large scale offers valuable insights into their behaviors, capabilities, and potential risks. To facilitate research in this field, we implement and support various types of agents, tasks, prompts, models, and simulated environments.
The framework is designed to support systems with millions of agents, ensuring efficient coordination, communication, and resource management at scale.
Agents maintain stateful memory, enabling them to perform multi-step interactions with environments and efficiently tackle sophisticated tasks.
Every line of code and comment serves as a prompt for agents. Code should be written clearly and readably, ensuring both humans and agents can interpret it effectively.
We are a community-driven research collective comprising over 100 researchers dedicated to advancing frontier research in Multi-Agent Systems. Researchers worldwide choose CAMEL for their studies based on the following reasons.
Key Features | Description |
---|---|
Large-Scale Agent System | Simulate up to 1M agents to study emergent behaviors and scaling laws in complex, multi-agent environments. |
Dynamic Communication | Enable real-time interactions among agents, fostering seamless collaboration for tackling intricate tasks. |
Stateful Memory | Equip agents with the ability to retain and leverage historical context, improving decision-making over extended interactions. |
Support for Multiple Benchmarks | Utilize standardized benchmarks to rigorously evaluate agent performance, ensuring reproducibility and reliable comparisons. |
Support for Different Agent Types | Work with a variety of agent roles, tasks, models, and environments, supporting interdisciplinary experiments and diverse research applications. |
Data Generation and Tool Integration | Automate the creation of large-scale, structured datasets while seamlessly integrating with multiple tools, streamlining synthetic data generation and research workflows. |
Installing CAMEL is a breeze thanks to its availability on PyPI. Simply open your terminal and run:
pip install camel-ai
This example demonstrates how to create a ChatAgent using the CAMEL framework and perform a search query using DuckDuckGo.
pip install 'camel-ai[web_tools]'
export OPENAI_API_KEY='your_openai_api_key'
```python
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
from camel.agents import ChatAgent
from camel.toolkits import SearchToolkit

model = ModelFactory.create(
    model_platform=ModelPlatformType.OPENAI,
    model_type=ModelType.GPT_4O,
    model_config_dict={"temperature": 0.0},
)

search_tool = SearchToolkit().search_duckduckgo

agent = ChatAgent(model=model, tools=[search_tool])

response_1 = agent.step("What is CAMEL-AI?")
print(response_1.msgs[0].content)
# CAMEL-AI is the first LLM (Large Language Model) multi-agent framework
# and an open-source community focused on finding the scaling laws of agents.
# ...

response_2 = agent.step("What is the Github link to CAMEL framework?")
print(response_2.msgs[0].content)
# The GitHub link to the CAMEL framework is
# https://github.com/camel-ai/camel.
```
For more detailed instructions and additional configuration options, check out the installation section.
After running, you can explore our CAMEL Tech Stack and Cookbooks at docs.camel-ai.org to build powerful multi-agent systems.
We provide a demo showcasing a conversation between two ChatGPT agents playing the roles of a Python programmer and a stock trader collaborating on developing a trading bot for the stock market.
Explore different types of agents, their roles, and their applications.
Please reach out to us on the CAMEL Discord if you encounter any issues setting up CAMEL.
Core components and utilities to build, operate, and enhance CAMEL-AI agents and societies.
Module | Description |
---|---|
Agents | Core agent architectures and behaviors for autonomous operation. |
Agent Societies | Components for building and managing multi-agent systems and collaboration. |
Data Generation | Tools and methods for synthetic data creation and augmentation. |
Models | Model architectures and customization options for agent intelligence. |
Tools | Tools integration for specialized agent tasks. |
Memory | Memory storage and retrieval mechanisms for agent state management. |
Storage | Persistent storage solutions for agent data and states. |
Benchmarks | Performance evaluation and testing frameworks. |
Interpreters | Code and command interpretation capabilities. |
Data Loaders | Data ingestion and preprocessing tools. |
Retrievers | Knowledge retrieval and RAG components. |
Runtime | Execution environment and process management. |
Human-in-the-Loop | Interactive components for human oversight and intervention. |
We believe that studying these agents on a large scale offers valuable insights into their behaviors, capabilities, and potential risks.
Explore our research projects:
Research with Us
We warmly invite you to use CAMEL for your impactful research.
Rigorous research takes time and resources. We are a community-driven research collective with 100+ researchers exploring the frontier research of Multi-Agent Systems. Join our ongoing projects or test new ideas with us; reach out via email for more information.
For more details, please see our Models Documentation.
Data (Hosted on Hugging Face)
Dataset | Chat format | Instruction format | Chat format (translated) |
---|---|---|---|
AI Society | Chat format | Instruction format | Chat format (translated) |
Code | Chat format | Instruction format | x |
Math | Chat format | x | x |
Physics | Chat format | x | x |
Chemistry | Chat format | x | x |
Biology | Chat format | x | x |
Dataset | Instructions | Tasks |
---|---|---|
AI Society | Instructions | Tasks |
Code | Instructions | Tasks |
Misalignment | Instructions | Tasks |
Practical guides and tutorials for implementing specific functionalities in CAMEL-AI agents and societies.
Cookbook | Description |
---|---|
Creating Your First Agent | A step-by-step guide to building your first agent. |
Creating Your First Agent Society | Learn to build a collaborative society of agents. |
Message Cookbook | Best practices for message handling in agents. |
Cookbook | Description |
---|---|
Tools Cookbook | Integrating tools for enhanced functionality. |
Memory Cookbook | Implementing memory systems in agents. |
RAG Cookbook | Recipes for Retrieval-Augmented Generation. |
Graph RAG Cookbook | Leveraging knowledge graphs with RAG. |
Track CAMEL Agents with AgentOps | Tools for tracking and managing agents in operations. |
Cookbook | Description |
---|---|
Data Generation with CAMEL and Finetuning with Unsloth | Learn how to generate data with CAMEL and fine-tune models effectively with Unsloth. |
Data Gen with Real Function Calls and Hermes Format | Explore how to generate data with real function calls and the Hermes format. |
CoT Data Generation and Upload Data to Huggingface | Uncover how to generate CoT data with CAMEL and seamlessly upload it to Huggingface. |
CoT Data Generation and SFT Qwen with Unsloth | Discover how to generate CoT data using CAMEL and SFT Qwen with Unsloth, and seamlessly upload your data and model to Huggingface. |
Cookbook | Description |
---|---|
Role-Playing Scraper for Report & Knowledge Graph Generation | Create role-playing agents for data scraping and reporting. |
Create A Hackathon Judge Committee with Workforce | Building a team of agents for collaborative judging. |
Dynamic Knowledge Graph Role-Playing: Multi-Agent System with dynamic, temporally-aware knowledge graphs | Builds dynamic, temporally-aware knowledge graphs for financial applications using a multi-agent system. It processes financial reports, news articles, and research papers to help traders analyze data, identify relationships, and uncover market insights. The system also utilizes diverse and optional element node deduplication techniques to ensure data integrity and optimize graph structure for financial decision-making. |
Customer Service Discord Bot with Agentic RAG | Learn how to build a robust customer service bot for Discord using Agentic RAG. |
Customer Service Discord Bot with Local Model | Learn how to build a robust customer service bot for Discord using Agentic RAG which supports local deployment. |
Cookbook | Description |
---|---|
Video Analysis | Techniques for agents in video data analysis. |
3 Ways to Ingest Data from Websites with Firecrawl | Explore three methods for extracting and processing data from websites using Firecrawl. |
Create AI Agents that work with your PDFs | Learn how to create AI agents that work with your PDFs using Chunkr and Mistral AI. |
For those who'd like to contribute code, we appreciate your interest in contributing to our open-source initiative. Please take a moment to review our contributing guidelines to get started on a smooth collaboration journey.
We also welcome you to help CAMEL grow by sharing it on social media, at events, or during conferences. Your support makes a big difference!
For more information please contact camel-ai@eigent.ai
GitHub Issues: Report bugs, request features, and track development. Submit an issue
Discord: Get real-time support, chat with the community, and stay updated. Join us
X (Twitter): Follow for updates, AI insights, and key announcements. Follow us
Ambassador Project: Advocate for CAMEL-AI, host events, and contribute content. Learn more
@inproceedings{li2023camel,
title={CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society},
author={Li, Guohao and Hammoud, Hasan Abed Al Kader and Itani, Hani and Khizbullin, Dmitrii and Ghanem, Bernard},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023}
}
Special thanks to Nomic AI for giving us extended access to their data set exploration tool (Atlas).
We would also like to thank Haya Hammoud for designing the initial logo of our project.
We implemented amazing research ideas from other works for you to build, compare and customize your agents. If you use any of these modules, please kindly cite the original works:

- TaskCreationAgent, TaskPrioritizationAgent and BabyAGI from Nakajima et al.: Task-Driven Autonomous Agent. [Example]
- PersonaHub from Tao Ge et al.: Scaling Synthetic Data Creation with 1,000,000,000 Personas. [Example]
- Self-Instruct from Yizhong Wang et al.: SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions. [Example]
The source code is licensed under Apache 2.0.
SubGPT looks at subdomains you have already discovered for a domain and uses BingGPT to find more. Best part? It's free!
The following subdomains were found by this tool with these 30 subdomains as input.
call-prompts-staging.example.com
dclb02-dca1.prod.example.com
activedirectory-sjc1.example.com
iadm-staging.example.com
elevatenetwork-c.example.com
If you like my work, you can support me with as little as $1, here :)
pip install subgpt
git clone https://github.com/s0md3v/SubGPT && cd SubGPT && python setup.py install
cookies.json
Note: Any issues regarding BingGPT itself should be reported to EdgeGPT, not here.
It is supposed to be used after you have discovered some subdomains using all other methods. The standard way to run SubGPT is as follows:
subgpt -i input.txt -o output.txt -c /path/to/cookies.json
If you don't specify an output file, the output will be shown in your terminal (stdout) instead.
To generate subdomains and not resolve them, use the --dont-resolve option. It's a great way to see all subdomains generated by SubGPT and/or use your own resolver on them.
Dealing with failing web scrapers due to anti-bot protections or website changes? Meet Scrapling.
Scrapling is a high-performance, intelligent web scraping library for Python that automatically adapts to website changes while significantly outperforming popular alternatives. For both beginners and experts, Scrapling provides powerful features while maintaining simplicity.
>> from scrapling.defaults import Fetcher, AsyncFetcher, StealthyFetcher, PlayWrightFetcher
# Fetch websites' source under the radar!
>> page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
>> print(page.status)
200
>> products = page.css('.product', auto_save=True) # Scrape data that survives website design changes!
>> # Later, if the website structure changes, pass `auto_match=True`
>> products = page.css('.product', auto_match=True) # and Scrapling still finds them!
Make plain HTTP requests with the Fetcher class.
Fetch dynamic websites with the PlayWrightFetcher class through your real browser, Scrapling's stealth mode, Playwright's Chrome browser, or NSTbrowser's browserless!
Bypass anti-bot protections with the StealthyFetcher and PlayWrightFetcher classes.

from scrapling.fetchers import Fetcher
fetcher = Fetcher(auto_match=False)
# Do http GET request to a web page and create an Adaptor instance
page = fetcher.get('https://quotes.toscrape.com/', stealthy_headers=True)
# Get all text content from all HTML tags in the page except `script` and `style` tags
page.get_all_text(ignore_tags=('script', 'style'))
# Get all quotes elements, any of these methods will return a list of strings directly (TextHandlers)
quotes = page.css('.quote .text::text') # CSS selector
quotes = page.xpath('//span[@class="text"]/text()') # XPath
quotes = page.css('.quote').css('.text::text') # Chained selectors
quotes = [element.text for element in page.css('.quote .text')] # Slower than bulk query above
# Get the first quote element
quote = page.css_first('.quote') # same as page.css('.quote').first or page.css('.quote')[0]
# Tired of selectors? Use find_all/find
# Get all 'div' HTML tags that one of its 'class' values is 'quote'
quotes = page.find_all('div', {'class': 'quote'})
# Same as
quotes = page.find_all('div', class_='quote')
quotes = page.find_all(['div'], class_='quote')
quotes = page.find_all(class_='quote') # and so on...
# Working with elements
quote.html_content # Get Inner HTML of this element
quote.prettify() # Prettified version of Inner HTML above
quote.attrib # Get that element's attributes
quote.path # DOM path to element (List of all ancestors from <html> tag till the element itself)
To keep it simple, all methods can be chained on top of each other!
Scrapling isn't just powerful - it's also blazing fast. Scrapling implements many best practices, design patterns, and numerous optimizations to save fractions of seconds. All of that while focusing exclusively on parsing HTML documents. Here are benchmarks comparing Scrapling to popular Python libraries in two tests.
# | Library | Time (ms) | vs Scrapling |
---|---|---|---|
1 | Scrapling | 5.44 | 1.0x |
2 | Parsel/Scrapy | 5.53 | 1.017x |
3 | Raw Lxml | 6.76 | 1.243x |
4 | PyQuery | 21.96 | 4.037x |
5 | Selectolax | 67.12 | 12.338x |
6 | BS4 with Lxml | 1307.03 | 240.263x |
7 | MechanicalSoup | 1322.64 | 243.132x |
8 | BS4 with html5lib | 3373.75 | 620.175x |
As you can see, Scrapling is on par with Scrapy and slightly faster than Lxml, which both libraries are built on top of. These are the closest results to Scrapling. PyQuery is also built on top of Lxml, but Scrapling is still 4 times faster.
Library | Time (ms) | vs Scrapling |
---|---|---|
Scrapling | 2.51 | 1.0x |
AutoScraper | 11.41 | 4.546x |
Scrapling can find elements with more methods and it returns full element Adaptor objects, not only the text like AutoScraper. So, to make this test fair, both libraries will extract an element with text, find similar elements, and then extract the text content for all of them. As you see, Scrapling is still 4.5 times faster at the same task.
All benchmarks' results are an average of 100 runs. See our benchmarks.py for methodology and to run your comparisons.
Scrapling is a breeze to get started with; starting from version 0.2.9, it requires at least Python 3.9 to work.
pip3 install scrapling
Then run this command to install the browser dependencies needed to use the Fetcher classes:
scrapling install
If you have any installation issues, please open an issue.
Fetchers are interfaces built on top of other libraries with added features that do requests or fetch pages for you in a single-request fashion and then return an Adaptor object. This feature was introduced because the only option we had before was to fetch the page as you wanted it, then pass it manually to the Adaptor class to create an Adaptor instance and start playing around with the page.
You might be slightly confused by now so let me clear things up. All fetcher-type classes are imported in the same way
from scrapling.fetchers import Fetcher, StealthyFetcher, PlayWrightFetcher
All of them can take these initialization arguments: auto_match, huge_tree, keep_comments, keep_cdata, storage, and storage_args, which are the same ones you give to the Adaptor class.
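As a rough illustration (the values chosen here are hypothetical, not recommended defaults), passing those initialization arguments to a fetcher could look like this:

```python
from scrapling.fetchers import Fetcher

# Illustrative sketch only: these are the initialization arguments listed above,
# shown with hypothetical values, not recommended settings.
fetcher = Fetcher(
    auto_match=True,      # enable the auto-matching storage described later
    huge_tree=True,       # let the parser handle very large documents
    keep_comments=False,  # drop HTML comments while parsing
    keep_cdata=False,     # drop CDATA sections while parsing
)
page = fetcher.get('https://quotes.toscrape.com/')
```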
If you don't want to pass arguments to the generated Adaptor object and want to use the default values, you can use this import instead for cleaner code:
from scrapling.defaults import Fetcher, AsyncFetcher, StealthyFetcher, PlayWrightFetcher
then use it right away without initializing like:
page = StealthyFetcher.fetch('https://example.com')
Also, the Response object returned from all fetchers is the same as the Adaptor object, except it has these added attributes: status, reason, cookies, headers, history, and request_headers. All cookies, headers, and request_headers are always of type dictionary.
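For example, a quick sketch of reading those extra attributes after a fetch (the URL is just an example):

```python
from scrapling.fetchers import Fetcher

# Illustrative sketch: the attributes below are the ones listed above,
# added on top of the usual Adaptor API.
page = Fetcher().get('https://quotes.toscrape.com/')

print(page.status)           # e.g. 200
print(page.reason)           # e.g. 'OK'
print(page.cookies)          # dictionary of cookies set by the server
print(page.headers)          # dictionary of response headers
print(page.request_headers)  # dictionary of headers that were sent
print(page.history)          # redirect history, if any

# The normal Adaptor API still works on the same object:
print(page.css_first('.quote .text::text'))
```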
[!NOTE] The auto_match argument is enabled by default; it is the one you should care about the most, as you will see later.
This class is built on top of httpx with additional configuration options; here you can do GET, POST, PUT, and DELETE requests.
For all methods, you have stealthy_headers, which makes Fetcher create and use real browser headers, then create a referer header as if the request came from a Google search for this URL's domain. It's enabled by default. You can also set the number of retries with the retries argument for all methods, and this will make httpx retry requests if they fail for any reason. The default number of retries for all Fetcher methods is 3.
Note: all headers generated by the stealthy_headers argument can be overwritten by you through the headers argument.
You can route all traffic (HTTP and HTTPS) to a proxy for any of these methods in this format http://username:password@localhost:8030
>> page = Fetcher().get('https://httpbin.org/get', stealthy_headers=True, follow_redirects=True)
>> page = Fetcher().post('https://httpbin.org/post', data={'key': 'value'}, proxy='http://username:password@localhost:8030')
>> page = Fetcher().put('https://httpbin.org/put', data={'key': 'value'})
>> page = Fetcher().delete('https://httpbin.org/delete')
For Async requests, you will just replace the import like below:
>> from scrapling.fetchers import AsyncFetcher
>> page = await AsyncFetcher().get('https://httpbin.org/get', stealthy_headers=True, follow_redirects=True)
>> page = await AsyncFetcher().post('https://httpbin.org/post', data={'key': 'value'}, proxy='http://username:password@localhost:8030')
>> page = await AsyncFetcher().put('https://httpbin.org/put', data={'key': 'value'})
>> page = await AsyncFetcher().delete('https://httpbin.org/delete')
This class is built on top of Camoufox, bypassing most anti-bot protections by default. Scrapling adds extra layers of flavors and configurations to increase performance and undetectability even further.
>> page = StealthyFetcher().fetch('https://www.browserscan.net/bot-detection') # Running headless by default
>> page.status == 200
True
>> page = await StealthyFetcher().async_fetch('https://www.browserscan.net/bot-detection') # the async version of fetch
>> page.status == 200
True
Note: all requests done by this fetcher are waiting by default for all JS to be fully loaded and executed so you don't have to :)
This list isn't final so expect a lot more additions and flexibility to be added in the next versions!
This class is built on top of Playwright which currently provides 4 main run options but they can be mixed as you want.
>> page = PlayWrightFetcher().fetch('https://www.google.com/search?q=%22Scrapling%22', disable_resources=True) # Vanilla Playwright option
>> page.css_first("#search a::attr(href)")
'https://github.com/D4Vinci/Scrapling'
>> page = await PlayWrightFetcher().async_fetch('https://www.google.com/search?q=%22Scrapling%22', disable_resources=True) # the async version of fetch
>> page.css_first("#search a::attr(href)")
'https://github.com/D4Vinci/Scrapling'
Note: all requests done by this fetcher are waiting by default for all JS to be fully loaded and executed so you don't have to :)
Using this Fetcher class, you can make requests with:

1. Vanilla Playwright without any modifications other than the ones you chose.
2. Stealthy Playwright with the stealth mode I wrote for it. It's still a WIP but it bypasses many online tests like Sannysoft's. Some of the things this fetcher's stealth mode does include:
   - Patching the CDP runtime fingerprint.
   - Mimicking some of the real browsers' properties by injecting several JS files and using custom options.
   - Using custom flags on launch to hide Playwright even more and make it faster.
   - Generating real browser headers of the same type and same user OS, then appending them to the request's headers.
3. Real browsers by passing the real_chrome argument or the CDP URL of your browser to be controlled by the Fetcher, and most of the options can be enabled on it.
4. NSTBrowser's docker browserless option by passing the CDP URL and enabling the nstbrowser_mode option.
Note: using the real_chrome argument requires that you have the Chrome browser installed on your device.

Add that to a lot of controlling/hiding options, as you will see in the arguments list below.
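As a brief sketch of how those run options are chosen (only the real_chrome keyword is taken from the list above; the URLs are arbitrary and this is not a tested recipe):

```python
from scrapling.fetchers import PlayWrightFetcher

# Illustrative sketch of two of the run options described above.

# 1) Vanilla Playwright, no modifications beyond the options you choose
page = PlayWrightFetcher().fetch('https://example.com')

# 3) Drive your own installed Chrome browser instead of the bundled one
page = PlayWrightFetcher().fetch('https://example.com', real_chrome=True)
```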
This list isn't final so expect a lot more additions and flexibility to be added in the next versions!
>>> quote.tag
'div'
>>> quote.parent
<data='<div class="col-md-8"> <div class="quote...' parent='<div class="row"> <div class="col-md-8">...'>
>>> quote.parent.tag
'div'
>>> quote.children
[<data='<span class="text" itemprop="text">"The...' parent='<div class="quote" itemscope itemtype="h...'>,
<data='<span>by <small class="author" itemprop=...' parent='<div class="quote" itemscope itemtype="h...'>,
<data='<div class="tags"> Tags: <meta class="ke...' parent='<div class="quote" itemscope itemtype="h...'>]
>>> quote.siblings
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
>>> quote.next # gets the next element, the same logic applies to `quote.previous`
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>
>>> quote.children.css_first(".author::text")
'Albert Einstein'
>>> quote.has_class('quote')
True
# Generate new selectors for any element
>>> quote.generate_css_selector
'body > div > div:nth-of-type(2) > div > div'
# Test these selectors on your favorite browser or reuse them again in the library's methods!
>>> quote.generate_xpath_selector
'//body/div/div[2]/div/div'
If your case needs more than the element's parent, you can iterate over the whole ancestors' tree of any element like below
for ancestor in quote.iterancestors():
# do something with it...
You can search for a specific ancestor of an element that satisfies a function. All you need to do is pass a function that takes an Adaptor object as an argument and returns True if the condition is satisfied or False otherwise, like below:
>>> quote.find_ancestor(lambda ancestor: ancestor.has_class('row'))
<data='<div class="row"> <div class="col-md-8">...' parent='<div class="container"> <div class="row...'>
You can select elements by their text content in multiple ways, here's a full example on another website:
>>> page = Fetcher().get('https://books.toscrape.com/index.html')
>>> page.find_by_text('Tipping the Velvet') # Find the first element whose text fully matches this text
<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>
>>> page.urljoin(page.find_by_text('Tipping the Velvet').attrib['href']) # We use `page.urljoin` to return the full URL from the relative `href`
'https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html'
>>> page.find_by_text('Tipping the Velvet', first_match=False) # Get all matches if there are more
[<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>]
>>> page.find_by_regex(r'£[\d\.]+') # Get the first element that its text content matches my price regex
<data='<p class="price_color">£51.77</p>' parent='<div class="product_price"> <p class="pr...'>
>>> page.find_by_regex(r'£[\d\.]+', first_match=False) # Get all elements that matches my price regex
[<data='<p class="price_color">£51.77</p>' parent='<div class="product_price"> <p class="pr...'>,
 <data='<p class="price_color">£53.74</p>' parent='<div class="product_price"> <p class="pr...'>,
 <data='<p class="price_color">£50.10</p>' parent='<div class="product_price"> <p class="pr...'>,
 <data='<p class="price_color">£47.82</p>' parent='<div class="product_price"> <p class="pr...'>,
...]
Find all elements that are similar to the current element in location and attributes
# For this case, ignore the 'title' attribute while matching
>>> page.find_by_text('Tipping the Velvet').find_similar(ignore_attributes=['title'])
[<data='<a href="catalogue/a-light-in-the-attic_...' parent='<h3><a href="catalogue/a-light-in-the-at...'>,
<data='<a href="catalogue/soumission_998/index....' parent='<h3><a href="catalogue/soumission_998/in...'>,
<data='<a href="catalogue/sharp-objects_997/ind...' parent='<h3><a href="catalogue/sharp-objects_997...'>,
...]
# You will notice that the number of elements is 19 not 20 because the current element is not included.
>>> len(page.find_by_text('Tipping the Velvet').find_similar(ignore_attributes=['title']))
19
# Get the `href` attribute from all similar elements
>>> [element.attrib['href'] for element in page.find_by_text('Tipping the Velvet').find_similar(ignore_attributes=['title'])]
['catalogue/a-light-in-the-attic_1000/index.html',
'catalogue/soumission_998/index.html',
'catalogue/sharp-objects_997/index.html',
...]
To increase the complexity a little bit, let's say we want to get all books' data using that element as a starting point for some reason
>>> for product in page.find_by_text('Tipping the Velvet').parent.parent.find_similar():
print({
"name": product.css_first('h3 a::text'),
"price": product.css_first('.price_color').re_first(r'[\d\.]+'),
"stock": product.css('.availability::text')[-1].clean()
})
{'name': 'A Light in the ...', 'price': '51.77', 'stock': 'In stock'}
{'name': 'Soumission', 'price': '50.10', 'stock': 'In stock'}
{'name': 'Sharp Objects', 'price': '47.82', 'stock': 'In stock'}
...
The documentation will provide more advanced examples.
Let's say you are scraping a page with a structure like this:
<div class="container">
<section class="products">
<article class="product" id="p1">
<h3>Product 1</h3>
<p class="description">Description 1</p>
</article>
<article class="product" id="p2">
<h3>Product 2</h3>
<p class="description">Description 2</p>
</article>
</section>
</div>
And you want to scrape the first product, the one with the p1 ID. You will probably write a selector like this:
page.css('#p1')
When website owners implement structural changes like
<div class="new-container">
<div class="product-wrapper">
<section class="products">
<article class="product new-class" data-id="p1">
<div class="product-info">
<h3>Product 1</h3>
<p class="new-description">Description 1</p>
</div>
</article>
<article class="product new-class" data-id="p2">
<div class="product-info">
<h3>Product 2</h3>
<p class="new-description">Description 2</p>
</div>
</article>
</section>
</div>
</div>
The selector will no longer function and your code needs maintenance. That's where Scrapling's auto-matching feature comes into play.
from scrapling.parser import Adaptor
# Before the change
page = Adaptor(page_source, url='example.com')
element = page.css('#p1', auto_save=True)
if not element: # One day website changes?
element = page.css('#p1', auto_match=True) # Scrapling still finds it!
# the rest of the code...
How does the auto-matching work? Check the FAQs section for that and other possible issues while auto-matching.
Let's use a real website as an example and use one of the fetchers to fetch its source. To do this we need to find a website that will change its design/structure soon, take a copy of its source then wait for the website to make the change. Of course, that's nearly impossible to know unless I know the website's owner but that will make it a staged test haha.
To solve this issue, I will use The Web Archive's Wayback Machine. Here is a copy of StackOverFlow's website in 2010, pretty old huh? Let's test if the automatch feature can extract the same button in the old design from 2010 and the current design using the same selector :)
If I want to extract the Questions button from the old design I can use a selector like this #hmenus > div:nth-child(1) > ul > li:nth-child(1) > a
This selector is too specific because it was generated by Google Chrome. Now let's test the same selector in both versions
>> from scrapling.fetchers import Fetcher
>> selector = '#hmenus > div:nth-child(1) > ul > li:nth-child(1) > a'
>> old_url = "https://web.archive.org/web/20100102003420/http://stackoverflow.com/"
>> new_url = "https://stackoverflow.com/"
>>
>> page = Fetcher(automatch_domain='stackoverflow.com').get(old_url, timeout=30)
>> element1 = page.css_first(selector, auto_save=True)
>>
>> # Same selector but used in the updated website
>> page = Fetcher(automatch_domain="stackoverflow.com").get(new_url)
>> element2 = page.css_first(selector, auto_match=True)
>>
>> if element1.text == element2.text:
... print('Scrapling found the same element in the old design and the new design!')
'Scrapling found the same element in the old design and the new design!'
Note that I used a new argument called automatch_domain; this is because, for Scrapling, these are two different URLs, not the same website, so it isolates their data. To tell Scrapling they are the same website, we pass the domain we want to use for saving auto-match data for them both, so Scrapling doesn't isolate them.
In a real-world scenario, the code will be the same, except it will use the same URL for both requests, so you won't need to use the automatch_domain argument. This is the closest example I can give to real-world cases, so I hope it didn't confuse you :)
Notes:

1. For the two examples above, I used one time the Adaptor class and the second time the Fetcher class, just to show you that you can create the Adaptor object by yourself if you have the source, or fetch the source using any Fetcher class and it will create the Adaptor object for you.
2. Passing the auto_save argument with the auto_match argument set to False while initializing the Adaptor/Fetcher object will only result in ignoring the auto_save argument value and the following warning message: "Argument `auto_save` will be ignored because `auto_match` wasn't enabled on initialization. Check docs for more info." This behavior is purely for performance reasons, so the database gets created/connected only when you are planning to use the auto-matching features. Same case with the auto_match argument.
The auto_match parameter works only for Adaptor instances, not Adaptors, so if you do something like this you will get an error:

page.css('body').css('#p1', auto_match=True)

because you can't auto-match a whole list; you have to be specific and do something like:

page.css_first('body').css('#p1', auto_match=True)
Inspired by BeautifulSoup's find_all function, you can find elements by using the find_all/find methods. Both methods can take multiple types of filters and return all elements in the pages that all these filters apply to.
So the way it works is after collecting all passed arguments and keywords, each filter passes its results to the following filter in a waterfall-like filtering system.
It filters all elements in the current page/element in the following order:
Note: The filtering process always starts from the first filter it finds in the filtering order above so if no tag name(s) are passed but attributes are passed, the process starts from that layer and so on. But the order in which you pass the arguments doesn't matter.
Examples to clear any confusion :)
>> from scrapling.fetchers import Fetcher
>> page = Fetcher().get('https://quotes.toscrape.com/')
# Find all elements with tag name `div`.
>> page.find_all('div')
[<data='<div class="container"> <div class="row...' parent='<body> <div class="container"> <div clas...'>,
<data='<div class="row header-box"> <div class=...' parent='<div class="container"> <div class="row...'>,
...]
# Find all div elements with a class that equals `quote`.
>> page.find_all('div', class_='quote')
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
# Same as above.
>> page.find_all('div', {'class': 'quote'})
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
# Find all elements with a class that equals `quote`.
>> page.find_all({'class': 'quote'})
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
# Find all div elements with a class that equals `quote`, and contains the element `.text` which contains the word 'world' in its content.
>> page.find_all('div', {'class': 'quote'}, lambda e: "world" in e.css_first('.text::text'))
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>]
# Find all elements that have child elements.
>> page.find_all(lambda element: len(element.children) > 0)
[<data='<html lang="en"><head><meta charset="UTF...'>,
<data='<head><meta charset="UTF-8"><title>Quote...' parent='<html lang="en"><head><meta charset="UTF...'>,
<data='<body> <div class="container"> <div clas...' parent='<html lang="en"><head><meta charset="UTF...'>,
...]
# Find all elements that contain the word 'world' in their content.
>> page.find_all(lambda element: "world" in element.text)
[<data='<span class="text" itemprop="text">"The...' parent='<div class="quote" itemscope itemtype="h...'>,
<data='<a class="tag" href="/tag/world/page/1/"...' parent='<div class="tags"> Tags: <meta class="ke...'>]
# Find all span elements that match the given regex
>> page.find_all('span', re.compile(r'world'))
[<data='<span class="text" itemprop="text">"The...' parent='<div class="quote" itemscope itemtype="h...'>]
# Find all div and span elements with class 'quote' (No span elements like that so only div returned)
>> page.find_all(['div', 'span'], {'class': 'quote'})
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
# Mix things up
>> page.find_all({'itemtype':"http://schema.org/CreativeWork"}, 'div').css('.author::text')
['Albert Einstein',
'J.K. Rowling',
...]
Here's what else you can do with Scrapling:

Get the lxml.etree object itself of any element directly:

>>> quote._root
<Element div at 0x107f98870>
Saving and retrieving elements manually to auto-match them outside the css and the xpath methods, but you have to set the identifier by yourself.

To save an element to the database:

>>> element = page.find_by_text('Tipping the Velvet', first_match=True)
>>> page.save(element, 'my_special_element')

To retrieve it later and relocate it on the page:

>>> element_dict = page.retrieve('my_special_element')
>>> page.relocate(element_dict, adaptor_type=True)
[<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>]
>>> page.relocate(element_dict, adaptor_type=True).css('::text')
['Tipping the Velvet']

If you want to keep it as an lxml.etree object, leave out the adaptor_type argument:

>>> page.relocate(element_dict)
[<Element a at 0x105a2a7b0>]
Filtering results based on a function
# Find all products over $50
expensive_products = page.css('.product_pod').filter(
lambda p: float(p.css('.price_color').re_first(r'[\d\.]+')) > 50
)
# Find all the products with price '54.23'
page.css('.product_pod').search(
lambda p: float(p.css('.price_color').re_first(r'[\d\.]+')) == 54.23
)
Doing operations on element content is the same as Scrapy:

quote.re(r'regex_pattern')        # Get all strings (TextHandlers) that match the regex pattern
quote.re_first(r'regex_pattern')  # Get the first string (TextHandler) only
quote.json()                      # If the content text is jsonable, then convert it to json using `orjson` which is 10x faster than the standard json library and provides more options

except that you can do more with them, like:

quote.re(
    r'regex_pattern',
    replace_entities=True,  # Character entity references are replaced by their corresponding character
    clean_match=True,       # This will ignore all whitespaces and consecutive spaces while matching
    case_sensitive=False,   # Set the regex to ignore letters case while compiling it
)

Note: all of these methods are methods of the TextHandler that contains the text content, so the same can be done directly if you call the .text property or the equivalent selector function.
Doing operations on the text content itself includes:

quote.clean()

The selector functions return TextHandler objects too, so in cases where you have, for example, a JS object assigned to a JS variable inside JS code and want to extract it with regex and then convert it to a json object, in other libraries these would be more than 1 line of code, but here you can do it in 1 line like this:

page.xpath('//script/text()').re_first(r'var dataLayer = (.+);').json()

Sort all characters in the string as if it were a list and return the new string:

quote.sort(reverse=False)
To be clear, TextHandler is a sub-class of Python's str, so all normal operations/methods that work with Python strings will work with it.
Any element's attributes are not exactly a dictionary but a sub-class of mapping called AttributesHandler that's read-only, so it's faster, and the string values returned are actually TextHandler objects, so all operations above can be done on them, plus standard dictionary operations that don't modify the data, and more :)

Search an element's attributes by value:

>>> for item in element.attrib.search_values('catalogue', partial=True):
        print(item)
{'href': 'catalogue/tipping-the-velvet_999/index.html'}

Get the attributes as a JSON string:

>>> element.attrib.json_string
b'{"href":"catalogue/tipping-the-velvet_999/index.html","title":"Tipping the Velvet"}'

Convert the attributes to a standard dictionary:

>>> dict(element.attrib)
{'href': 'catalogue/tipping-the-velvet_999/index.html', 'title': 'Tipping the Velvet'}
Scrapling is under active development so expect many more features coming soon :)
There are a lot of deep details skipped here to make this as short as possible so to take a deep dive, head to the docs section. I will try to keep it updated as possible and add complex examples. There I will explain points like how to write your storage system, write spiders that don't depend on selectors at all, and more...
Note that implementing your storage system can be complex as there are some strict rules such as inheriting from the same abstract class, following the singleton design pattern used in other classes, and more. So make sure to read the docs first.
[!IMPORTANT] A website is needed to provide detailed library documentation.
I'm trying to rush creating the website, researching new ideas, and adding more features/tests/benchmarks but time is tight with too many spinning plates between work, personal life, and working on Scrapling. I have been working on Scrapling for months for free after all.
If you like Scrapling and want it to keep improving, then this is a friendly reminder that you can help by supporting me through the sponsor button.
This section addresses common questions about Scrapling; please read it before opening an issue.
You need to run the selector at least once with the css or xpath methods with the auto_save parameter set to True before structural changes happen. Now, because everything about the element can be changed or removed, nothing from the element can be used as a unique identifier for the database. To solve this issue, I made the storage system rely on two things:
identifier
parameter you passed to the method while selecting. If you didn't pass one, then the selector string itself will be used as an identifier but remember you will have to use it as an identifier value later when the structure changes and you want to pass the new selector.Together both are used to retrieve the element's unique properties from the database later. 4. Now later when you enable the auto_match
parameter for both the Adaptor instance and the method call. The element properties are retrieved and Scrapling loops over all elements in the page and compares each one's unique properties to the unique properties we already have for this element and a score is calculated for each one. 5. Comparing elements is not exact but more about finding how similar these values are, so everything is taken into consideration, even the values' order, like the order in which the element class names were written before and the order in which the same element class names are written now. 6. The score for each element is stored in the table, and the element(s) with the highest combined similarity scores are returned.
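To make step 5 concrete, here is a minimal, hypothetical sketch of how such a similarity score could be computed. The property names, the use of `difflib`, and the equal weighting are illustrative assumptions only, not Scrapling's actual implementation.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Fuzzy, order-sensitive ratio between two strings (0.0 - 1.0)."""
    return SequenceMatcher(None, a, b).ratio()

def score_candidate(stored: dict, candidate: dict) -> float:
    """Hypothetical scoring: compare each saved property of the element
    against a candidate element and sum the per-property similarities."""
    score = 0.0
    score += similarity(stored["tag"], candidate["tag"])
    score += similarity(stored["text"], candidate["text"])
    # Order matters, so compare flattened representations rather than sets.
    score += similarity(str(stored["attributes"]), str(candidate["attributes"]))
    score += similarity(" ".join(stored["path"]), " ".join(candidate["path"]))
    return score

# The candidate with the highest combined score would be returned.
```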
If you didn't pass a URL while initializing the Adaptor object, it's not a big problem, as it depends on your usage. The word `default` will be used in place of the URL field while saving the element's unique properties. So this will only be an issue if you later use the same identifier for a different website that you also didn't pass a URL for while initializing it. The save process will overwrite the previous data, and auto-matching uses the latest saved properties only.
For each element, Scrapling will extract:
- The element's tag name, text, attributes (names and values), siblings (tag names only), and path (tag names only).
- The element's parent tag name, attributes (names and values), and text.
If you passed the `auto_save`/`auto_match` parameter while selecting and it got completely ignored with a warning message, that's because passing the `auto_save`/`auto_match` argument without setting `auto_match` to True while initializing the Adaptor object will only result in ignoring the `auto_save`/`auto_match` argument value. This behavior is purely for performance reasons, so the database gets created only when you are planning to use the auto-matching features.
If auto-matching doesn't return the element you expect, it could be one of these reasons:
1. No data were saved/stored for this element before.
2. The selector passed is not the one used while storing the element data. The solution is simple:
   - Pass the old selector again as an identifier to the method called.
   - Or retrieve the element with the retrieve method using the old selector as identifier, then save it again with the save method and the new selector as identifier.
   - Or start using the identifier argument more often if you are planning to use every new selector from now on.
3. The website had some extreme structural changes, like a full redesign. If this happens a lot with this website, the solution would be to make your code as selector-free as possible using Scrapling features.
Pretty much yeah, almost all features you get from BeautifulSoup can be found or achieved in Scrapling one way or another. In fact, if you see there's a feature in bs4 that is missing in Scrapling, please make a feature request from the issues tab to let me know.
Of course, you can find elements by text/regex, find similar elements in a more reliable way than AutoScraper, and finally save/retrieve elements manually to use later as the model feature in AutoScraper. I have pulled all top articles about AutoScraper from Google and tested Scrapling against examples in them. In all examples, Scrapling got the same results as AutoScraper in much less time.
Yes, Scrapling instances are thread-safe. Each Adaptor instance maintains its state.
Everybody is invited and welcome to contribute to Scrapling. There is a lot to do!
Please read the contributing file before doing anything.
[!CAUTION] This library is provided for educational and research purposes only. By using this library, you agree to comply with local and international laws regarding data scraping and privacy. The authors and contributors are not responsible for any misuse of this software. This library should not be used to violate the rights of others, for unethical purposes, or to use data in an unauthorized or illegal manner. Do not use it on any website unless you have permission from the website owner or within their allowed rules like the
robots.txt
file, for example.
This work is licensed under BSD-3
This project includes code adapted from: - Parsel (BSD License) - Used for translator submodule
Frogy 2.0 is an automated external reconnaissance and Attack Surface Management (ASM) toolkit designed to map out an organization's entire internet presence. It identifies assets, IP addresses, web applications, and other metadata across the public internet and then intelligently prioritizes them from highest (most attractive) to lowest (least attractive) from an attacker's perspective.
Comprehensive recon:
Aggregate subdomains and assets using multiple tools (CHAOS, Subfinder, Assetfinder, crt.sh) to map an organization's entire digital footprint.
Live asset verification:
Validate assets with live DNS resolution and port scanning (using DNSX and Naabu) to confirm what is publicly reachable.
In-depth web recon:
Collect detailed HTTP response data (via HTTPX) including metadata, technology stack, status codes, content lengths, and more.
Smart prioritization:
Use a composite scoring system that considers homepage status, login identification, technology stack, DNS data, and much more to generate a risk score for each asset, helping bug bounty hunters and pentesters focus on the most promising targets to attack first.
Professional reporting:
Generate a dynamic, colour-coded HTML report with a modern design and dark/light theme toggle.
In this tool, risk scoring is based on the notion of asset attractiveness: the idea that certain attributes or characteristics make an asset more interesting to attackers. The more of these attributes we see, the higher the overall score, indicating a broader "attack surface" that adversaries could leverage. Below is an overview of how each factor contributes to the final risk score.
When the homepage returns a `200 OK`, it often means the page is legitimately reachable and responding with content. A `200 OK` is more interesting to attackers than a `404` or a redirect, so a 200 status modestly increases the risk.

The tool also checks whether common security headers are present:
- `Strict-Transport-Security` (HSTS)
- `X-Frame-Options`
- `Content-Security-Policy`
- `X-XSS-Protection`
- `Referrer-Policy`
- `Permissions-Policy`

Missing or disabled headers mean an endpoint is more prone to common web exploits. Each absent header increments the score.
Each factor above contributes one or more points to the final risk score. Once all factors are tallied, we get a numeric risk score; a higher score means the asset is more interesting to an attacker and potentially gives pentesters more room to probe.
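As a rough illustration of the idea only (not Frogy's actual formula), a composite score could be accumulated like this; the field names and point values below are assumptions made for the example.

```python
def risk_score(asset: dict) -> int:
    """Hypothetical composite scoring: each 'attractive' attribute adds points."""
    score = 0
    if asset.get("status_code") == 200:
        score += 1                                   # reachable page with content
    if asset.get("has_login"):
        score += 2                                   # login panels are prime targets
    score += len(asset.get("open_ports", []))        # every open port widens the surface
    score += len(asset.get("missing_headers", []))   # every absent security header adds a point
    return score

print(risk_score({
    "status_code": 200,
    "has_login": True,
    "open_ports": [80, 443, 8443],
    "missing_headers": ["Content-Security-Policy", "X-Frame-Options"],
}))  # -> 8
```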
Why This Matters
This approach helps you quickly prioritize which assets warrant deeper testing. Subdomains with high counts of open ports, advanced internal usage, missing headers, or login panels are more complex, more privileged, or more likely to be misconfigured; therefore, your security team can focus on those first.
Clone the repository and run the installer script to set up all dependencies and tools:
chmod +x install.sh
./install.sh
chmod +x frogy.sh
./frogy.sh domains.txt
https://www.youtube.com/watch?v=LHlU4CYNj1M
OWASP Maryam is a modular open-source framework based on OSINT and data gathering. It is designed to provide a robust environment to harvest data from open sources and search engines quickly and thoroughly.
$ pip install maryam
Alternatively, you can install the latest version with the following command (Recommended):
pip install git+https://github.com/saeeddhqan/maryam.git
# Using dns_search. --max means all of the resources. --api shows the results as JSON.
# .. -t means use multi-threading.
maryam -e dns_search -d ibm.com -t 5 --max --api --form
# Using youtube. -q means query
maryam -e youtube -q "<QUERY>"
maryam -e google -q "<QUERY>"
maryam -e dnsbrute -d domain.tld
# Show framework modules
maryam -e show modules
# Set framework options.
maryam -e set proxy ..
maryam -e set agent ..
maryam -e set timeout ..
# Run web API
maryam -e web api 127.0.0.1 1313
Here is a start guide: Development Guide. You can add a new search engine to the util classes or use the current search engines to write a new module. The best way to learn how to write a new module is to check the current modules.
To report bugs, requests, or any other issues please create an issue.
Lobo GuarΓ‘ is a platform aimed at cybersecurity professionals, with various features focused on Cyber Threat Intelligence (CTI). It offers tools that make it easier to identify threats, monitor data leaks, analyze suspicious domains and URLs, and much more.
Allows identifying domains and subdomains that may pose a threat to organizations. SSL certificates issued by trusted authorities are indexed in real-time, and users can search using keywords of 4 or more characters.
Note: The current database contains certificates issued from September 5, 2024.
Allows the insertion of keywords for monitoring. When a certificate is issued and the common name contains the keyword (minimum of 5 characters), it will be displayed to the user.
Generates a link to capture device information from attackers. Useful when the security professional can contact the attacker in some way.
Performs a scan on a domain, displaying whois information and subdomains associated with that domain.
Allows performing a scan on a URL to identify URIs (web paths) related to that URL.
Performs a scan on a URL, generating a screenshot and a mirror of the page. The result can be made public to assist in taking down malicious websites.
Monitors a URL with no active application until it returns an HTTP 200 code. At that moment, it automatically initiates a URL scan, providing evidence for actions against malicious sites.
Centralizes intelligence news from various channels, keeping users updated on the latest threats.
The application installation has been validated on Ubuntu 24.04 Server and Red Hat 9.4 distributions; the installation guides are linked below:
Lobo GuarΓ‘ Implementation on Ubuntu 24.04
Lobo GuarΓ‘ Implementation on Red Hat 9.4
There is a Dockerfile and a docker-compose version of Lobo GuarΓ‘ too. Just clone the repo and do:
docker compose up
Then, go to your web browser at localhost:7405.
Before proceeding with the installation, ensure the following dependencies are installed:
git clone https://github.com/olivsec/loboguara.git
cd loboguara/
nano server/app/config.py
Fill in the required parameters in the `config.py` file:
class Config:
SECRET_KEY = 'YOUR_SECRET_KEY_HERE'
SQLALCHEMY_DATABASE_URI = 'postgresql://guarauser:YOUR_PASSWORD_HERE@localhost/guaradb?sslmode=disable'
SQLALCHEMY_TRACK_MODIFICATIONS = False
MAIL_SERVER = 'smtp.example.com'
MAIL_PORT = 587
MAIL_USE_TLS = True
MAIL_USERNAME = 'no-reply@example.com'
MAIL_PASSWORD = 'YOUR_SMTP_PASSWORD_HERE'
MAIL_DEFAULT_SENDER = 'no-reply@example.com'
ALLOWED_DOMAINS = ['yourdomain1.my.id', 'yourdomain2.com', 'yourdomain3.net']
API_ACCESS_TOKEN = 'YOUR_LOBOGUARA_API_TOKEN_HERE'
API_URL = 'https://loboguara.olivsec.com.br/api'
CHROME_DRIVER_PATH = '/opt/loboguara/bin/chromedriver'
GOOGLE_CHROME_PATH = '/opt/loboguara/bin/google-chrome'
FFUF_PATH = '/opt/loboguara/bin/ffuf'
SUBFINDER_PATH = '/opt/loboguara/bin/subfinder'
LOG_LEVEL = 'ERROR'
LOG_FILE = '/opt/loboguara/logs/loboguara.log'
sudo chmod +x ./install.sh
sudo ./install.sh
sudo -u loboguara /opt/loboguara/start.sh
Access the URL below to register the Lobo GuarΓ‘ Super Admin
http://your_address:7405/admin
Access the Lobo GuarΓ‘ platform online: https://loboguara.olivsec.com.br/
secator
is a task and workflow runner used for security assessments. It supports dozens of well-known security tools and it is designed to improve productivity for pentesters and security researchers.
Curated list of commands
Unified input options
Unified output schema
CLI and library usage
Distributed options with Celery
Complexity from simple tasks to complex workflows
secator
integrates the following tools:
Name | Description | Category |
---|---|---|
httpx | Fast HTTP prober. | http |
cariddi | Fast crawler and endpoint secrets / api keys / tokens matcher. | http/crawler |
gau | Offline URL crawler (Alien Vault, The Wayback Machine, Common Crawl, URLScan). | http/crawler |
gospider | Fast web spider written in Go. | http/crawler |
katana | Next-generation crawling and spidering framework. | http/crawler |
dirsearch | Web path discovery. | http/fuzzer |
feroxbuster | Simple, fast, recursive content discovery tool written in Rust. | http/fuzzer |
ffuf | Fast web fuzzer written in Go. | http/fuzzer |
h8mail | Email OSINT and breach hunting tool. | osint |
dnsx | Fast and multi-purpose DNS toolkit designed for running DNS queries. | recon/dns |
dnsxbrute | Fast and multi-purpose DNS toolkit designed for running DNS queries (bruteforce mode). | recon/dns |
subfinder | Fast subdomain finder. | recon/dns |
fping | Find alive hosts on local networks. | recon/ip |
mapcidr | Expand CIDR ranges into IPs. | recon/ip |
naabu | Fast port discovery tool. | recon/port |
maigret | Hunt for user accounts across many websites. | recon/user |
gf | A wrapper around grep to avoid typing common patterns. | tagger |
grype | A vulnerability scanner for container images and filesystems. | vuln/code |
dalfox | Powerful XSS scanning tool and parameter analyzer. | vuln/http |
msfconsole | CLI to access and work with the Metasploit Framework. | vuln/http |
wpscan | WordPress Security Scanner | vuln/multi |
nmap | Vulnerability scanner using NSE scripts. | vuln/multi |
nuclei | Fast and customisable vulnerability scanner based on simple YAML based DSL. | vuln/multi |
searchsploit | Exploit searcher. | exploit/search |
Feel free to request new tools to be added by opening an issue, but please check that the tool complies with our selection criteria before doing so. If it doesn't but you still want to integrate it into secator
, you can plug it in (see the dev guide).
pipx install secator
pip install secator
wget -O - https://raw.githubusercontent.com/freelabz/secator/main/scripts/install.sh | sh
docker run -it --rm --net=host -v ~/.secator:/root/.secator freelabz/secator --help
The volume mount `-v` is necessary to save all secator reports to your host machine, and `--net=host` is recommended to grant full access to the host network. You can alias this command to run it more easily: alias secator="docker run -it --rm --net=host -v ~/.secator:/root/.secator freelabz/secator"
Now you can run secator as if it were installed on bare metal: secator --help
git clone https://github.com/freelabz/secator
cd secator
docker-compose up -d
docker-compose exec secator secator --help
Note: If you chose the Bash, Docker or Docker Compose installation methods, you can skip the next sections and go straight to Usage.
secator
uses external tools, so you might need to install languages used by those tools assuming they are not already installed on your system.
We provide utilities to install required languages if you don't manage them externally:
secator install langs go
secator install langs ruby
secator
does not install any of the external tools it supports by default.
We provide utilities to install or update each supported tool which should work on all systems supporting apt
:
secator install tools
secator install tools <TOOL_NAME>
For instance, to install `httpx`, use: secator install tools httpx
Please make sure you are using the latest available versions for each tool before you run secator or you might run into parsing / formatting issues.
secator
comes installed with the minimum amount of dependencies.
There are several addons available for secator
:
secator install addons worker
secator install addons google
secator install addons mongodb
secator install addons redis
secator install addons dev
secator install addons trace
secator install addons build
secator
makes remote API calls to https://cve.circl.lu/ to get in-depth information about the CVEs it encounters. We provide a subcommand to download all known CVEs locally so that future lookups are made from disk instead:
secator install cves
To figure out which languages or tools are installed on your system (along with their version):
secator health
secator --help
Run a fuzzing task (`ffuf`):
secator x ffuf http://testphp.vulnweb.com/FUZZ
Run a url crawl workflow:
secator w url_crawl http://testphp.vulnweb.com
Run a host scan:
secator s host mydomain.com
and more... to list all tasks / workflows / scans that you can use:
secator x --help
secator w --help
secator s --help
To go deeper with secator
, check out: * Our complete documentation * Our getting started tutorial video * Our Medium post * Follow us on social media: @freelabz on Twitter and @FreeLabz on YouTube
Reconnaissance is the first phase of penetration testing, which means gathering information before planning any real attacks. Ashok is an incredibly fast recon tool for penetration testers, specially designed for the reconnaissance phase. In Ashok-v1.1 you can find the advanced Google dorker and wayback crawling machine.
- Wayback Crawler Machine
- Google Dorking without limits
- Github Information Grabbing
- Subdomain Identifier
- Cms/Technology Detector With Custom Headers
~> git clone https://github.com/ankitdobhal/Ashok
~> cd Ashok
~> python3.7 -m pip install -r requirements.txt
A detailed usage guide is available in the Usage section of the Wiki.
An index of some options is given below:
Ashok can be launched using a lightweight Python3.8-Alpine Docker image.
$ docker pull powerexploit/ashok-v1.2
$ docker container run -it powerexploit/ashok-v1.2 --help
Howdy! My name is Harrison Richardson, or rs0n
(arson) when I want to feel cooler than I really am. The code in this repository started as a small collection of scripts to help automate many of the common Bug Bounty hunting processes I found myself repeating. Over time, I built a simple web application with a MongoDB connection to manage my findings and identify valuable data points. After 5 years of Bug Bounty hunting, both part-time and full-time, I'm finally ready to package this collection of tools into a proper framework.
The Ars0n Framework is designed to provide aspiring Application Security Engineers with all the tools they need to leverage Bug Bounty hunting as a means to learn valuable, real-world AppSec concepts and make money doing it! My goal is to lower the barrier of entry for Bug Bounty hunting by providing easy-to-use automation tools in combination with educational content and how-to guides for a wide range of Web-based and Cloud-based vulnerabilities. In combination with my YouTube content, this framework will help aspiring Application Security Engineers quickly and easily understand real-world security concepts that directly translate to a high-paying career in Cyber Security.
In addition to using this tool for Bug Bounty Hunting, aspiring engineers can also use this Github Repository as a canvas to practice collaborating with other developers! This tool was inspired by Metasploit and designed to be modular in a similar way. Each Script (Ex: wildfire.py
or slowburn.py
) is basically an algorithm that runs the Modules (Ex: fire-starter.py
or fire-scanner.py
) in a specific pattern for a desired result. Because of this design, the community is free to build new Scripts to solve a specific use-case or Modules to expand the results of these Scripts. By learning the code in this framework and using GitHub to contribute your own code, aspiring engineers will continue to learn real-world skills that can be applied on the first day of a Security Engineer I position.
My hope is that this modular framework will act as a canvas to help share what I've learned over my career to the next generation of Security Engineers! Trust me, we need all the help we can get!!
Paste this code block into a clean installation of Kali Linux 2023.4 to download, install, and run the latest stable Alpha version of the framework:
sudo apt update && sudo apt-get update
sudo apt -y upgrade && sudo apt-get -y upgrade
wget https://github.com/R-s0n/ars0n-framework/releases/download/v0.0.2-alpha/ars0n-framework-v0.0.2-alpha.tar.gz
tar -xzvf ars0n-framework-v0.0.2-alpha.tar.gz
rm ars0n-framework-v0.0.2-alpha.tar.gz
cd ars0n-framework
./install.sh
wget https://github.com/R-s0n/ars0n-framework/releases/download/v0.0.2-alpha/ars0n-framework-v0.0.2-alpha.tar.gz
tar -xzvf ars0n-framework-v0.0.2-alpha.tar.gz
rm ars0n-framework-v0.0.2-alpha.tar.gz
The Ars0n Framework includes a script that installs all the necessary tools, packages, etc. that are needed to run the framework on a clean installation of Kali Linux 2023.4.
Please note that the only supported installation of this framework is on a clean installation of Kali Linux 2023.4. If you choose to try and run the framework outside of a clean Kali install, I will not be able to help troubleshoot any issues you run into.
./install.sh
This video shows exactly what to expect from a successful installation.
If you are using an ARM Processor, you will need to add the --arm flag to all Install/Run scripts
./install.sh --arm
You will be prompted to enter various API keys and tokens when the installation begins. Entering these is not required to run the core functionality of the framework. If you do not enter these API keys and tokens at the time of installation, simply hit enter at each of the prompts. The keys can be added later to the ~/.keys
directory. More information about how to add these keys manually can be found in the Frequently Asked Questions section of this README.
Once the installation is complete, you will be given the option to run the application by entering Y
. If you choose not to run the application immediately, or if you need to run the application after a reboot, simply navigate to the root directory and run the run.sh
bash script.
./run.sh
If you are using an ARM Processor, you will need to add the --arm flag to all Install/Run scripts
./run.sh --arm
The Ars0n Framework's Core Modules are used to determine the basic scanning logic. Each script is designed to support a specific recon methodology based on what the user is trying to accomplish.
At this time, the Wildfire script is the most widely used Core Module in the Ars0n Framework. The purpose of this module is to allow the user to scan multiple targets that allow for testing on any subdomain discovered by the researcher.
How it works:
Most Wildfire scans take between 8 and 48 hours to complete against a single domain if all Sub-Modules are being run. Variations in this timing can be caused by a number of factors, including the target application and the machine running the framework.
Also, please note that most data will not show in the GUI until the scan has completed. It's best to run the scan overnight or over a weekend, depending on the number of domains being scanned, and return once the scan has completed to move from Recon to Enumeration.
Running Wildfire:
Wildfire can be run from the GUI using the Wildfire button on the dashboard. Once clicked, the front-end will use the checkboxes on the screen to determine what flags should be passed to the scanner.
Please note that running scans from the GUI still has a few bugs and edge cases that haven't been sorted out. If you have any issues, you can simply run the scan from the CLI.
All Core Modules for The Ars0n Framework are stored in the /toolkit
directory. Simply navigate to the directory and run wildfire.py
with the necessary flags. At least one Sub-Module flag must be provided.
python3 wildfire.py --start --cloud --scan
Unlike the Wildfire module, which requires the user to identify target domains to scan, the Slowburn module does that work for you. By communicating with APIs for various bug bounty hunting platforms, this script will identify all domains that allow for testing on any discovered subdomain. Once the data has been populated, Slowburn will randomly choose one domain at a time to scan in the same way Wildfire does.
Please note that the Slowburn module is still in development and is not considered part of the stable alpha release. There will likely be bugs and edge cases encountered by the user.
In order for Slowburn to identify targets to scan, it must first be initialized. This initialization step collects the necessary data from various APIs and deposits it into a JSON file stored locally. Once this initialization step is complete, Slowburn will automatically begin selecting and scanning one target at a time.
To initialize Slowburn, simply run the following command:
python3 slowburn.py --initialize
Once the data has been collected, it is up to the user whether they want to re-initialize the tool upon the next scan.
Remember that the scope and targets on public bug bounty programs can change frequently. If you choose to run Slowburn without initializing the data, you may be scanning domains that are no longer in scope for the program. It is strongly recommended that Slowburn be re-initialized each time before running.
If you choose not to re-initialize the target data, you can run Slowburn using the previously collected data with the following command:
python3 slowburn.py
The Ars0n Framework's Sub-Modules are designed to be leveraged by the Core Modules to divide the Recon & Enumeration phases into specific tasks. The data collected in each Sub-Module is used by the others to expand your picture of the target's attack surface.
Fire-Starter is the first step to performing recon against a target domain. The goal of this script is to collect a wealth of information about the attack surface of your target. Once collected, this data will be used by all other Sub-Modules to help the user identify a specific URL that is potentially vulnerable.
Fire-Starter works by running a series of open-source tools to enumerate hidden subdomains, DNS records, and the ASNs that identify where those external entries are hosted. Currently, Fire-Starter works by chaining together the following widely used open-source tools:
These tools cover a wide range of techniques to identify hidden subdomains, including web scraping, brute force, and crawling to identify links and JavaScript URLs.
Once the scan is complete, the Dashboard will be updated and available to the user.
Most Sub-Modules in The Ars0n Framework require the data collected by the Fire-Starter module to work. With this in mind, Fire-Starter must be included in the first scan against a target for any usable data to be collected.
Coming soon...
Fire-Scanner uses the results of Fire-Starter and Fire-Cloud to perform Wide-Band Scanning against all subdomains and cloud services that have been discovered from previous scans.
At this stage of development, this script leverages Nuclei almost exclusively for all scanning. Instead of simply running the tool, Fire-Scanner breaks the scan down into specific collections of Nuclei Templates and scans them one by one. This strategy helps ensure the scans are stable and produce consistent results, removes any unnecessary or unsafe scan checks, and produces actionable results.
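As a rough sketch of that batching strategy (not the framework's actual code), one could invoke nuclei once per template collection; the collection names, target file, and output naming below are assumptions made for the example.

```python
import subprocess

# Hypothetical list of nuclei template collections to run one at a time.
TEMPLATE_COLLECTIONS = ["cves", "exposures", "misconfiguration"]

def scan_in_batches(targets_file: str) -> None:
    """Run nuclei once per template collection for more stable, reviewable output."""
    for collection in TEMPLATE_COLLECTIONS:
        subprocess.run(
            ["nuclei", "-l", targets_file, "-t", collection,
             "-o", f"nuclei_{collection}.txt"],
            check=False,  # keep going even if one batch fails
        )

# scan_in_batches("subdomains.txt")
```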
The vast majority of issues installing and/or running the Ars0n Framework are caused by not installing the tool on a clean installation of Kali Linux.
It is important to remember that, at its core, the Ars0n Framework is a collection of automation scripts designed to run existing open-source tools. Each of these tools has its own way of operating and can exhibit unexpected behavior if conflicts emerge with any existing service/tool running on the user's system. This complexity is the reason why The Ars0n Framework should only be run on a clean installation of Kali Linux.
Another very common issue users experience is caused by MongoDB not successfully installing and/or running on their machine. The most common manifestation of this issue is the user is unable to add an initial FQDN and simply sees a broken GUI. If this occurs, please ensure that your machine has the necessary system requirements to run MongoDB. Unfortunately, there is no current solution if you run into this issue.
Coming soon...
SherlockChain is a powerful smart contract analysis framework that combines the capabilities of the renowned Slither tool with advanced AI-powered features. Developed by a team of security experts and AI researchers, SherlockChain offers unparalleled insights and vulnerability detection for Solidity, Vyper and Plutus smart contracts.
To install SherlockChain, follow these steps:
git clone https://github.com/0xQuantumCoder/SherlockChain.git
cd SherlockChain
pip install .
SherlockChain's AI integration brings several advanced capabilities to the table:
Natural Language Interaction: Users can interact with SherlockChain using natural language, allowing them to query the tool, request specific analyses, and receive detailed responses.

The `--help` command in the SherlockChain framework provides a comprehensive overview of all the available options and features. It includes information on:

- Vulnerability Detection: The `--detect` and `--exclude-detectors` options allow users to specify which vulnerability detectors to run, including both built-in and AI-powered detectors.
- Reporting: The `--report-format`, `--report-output`, and various `--report-*` options control how the analysis results are reported, including the ability to generate reports in different formats (JSON, Markdown, SARIF, etc.).
- Filtering: The `--filter-*` options enable users to filter the reported issues based on severity, impact, confidence, and other criteria.
- AI Configuration: The `--ai-*` options allow users to configure and control the AI-powered features of SherlockChain, such as prioritizing high-impact vulnerabilities, enabling specific AI detectors, and managing AI model configurations.
- Framework Integration: The `--truffle` and `--truffle-build-directory` options facilitate the integration of SherlockChain into popular development frameworks like Truffle.

The `--help` command provides a detailed explanation of each option, its purpose, and how to use it, making it a valuable resource for users to quickly understand and leverage the full capabilities of the SherlockChain framework.
Example usage:
sherlockchain --help
This will display the comprehensive usage guide for the SherlockChain framework, including all available options and their descriptions.
usage: sherlockchain [-h] [--version] [--solc-remaps SOLC_REMAPS] [--solc-settings SOLC_SETTINGS]
[--solc-version SOLC_VERSION] [--truffle] [--truffle-build-directory TRUFFLE_BUILD_DIRECTORY]
[--truffle-config-file TRUFFLE_CONFIG_FILE] [--compile] [--list-detectors]
[--list-detectors-info] [--detect DETECTORS] [--exclude-detectors EXCLUDE_DETECTORS]
[--print-issues] [--json] [--markdown] [--sarif] [--text] [--zip] [--output OUTPUT]
[--filter-paths FILTER_PATHS] [--filter-paths-exclude FILTER_PATHS_EXCLUDE]
[--filter-contracts FILTER_CONTRACTS] [--filter-contracts-exclude FILTER_CONTRACTS_EXCLUDE]
[--filter-severity FILTER_SEVERITY] [--filter-impact FILTER_IMPACT]
[--filter-confidence FILTER_CONFIDENCE] [--filter-check-suicidal]
[--filter-check-upgradeable] [--filter-check-erc20] [--filter-check-erc721]
[--filter-check-reentrancy] [--filter-check-gas-optimization] [--filter-check-code-quality]
[--filter-check-best-practices] [--filter-check-ai-detectors] [--filter-check-all]
[--filter-check-none] [--check-all] [--check-suicidal] [--check-upgradeable]
[--check-erc20] [--check-erc721] [--check-reentrancy] [--check-gas-optimization]
[--check-code-quality] [--check-best-practices] [--check-ai-detectors] [--check-none]
[--check-all-detectors] [--check-all-severity] [--check-all-impact] [--check-all-confidence]
[--check-all-categories] [--check-all-filters] [--check-all-options] [--check-all]
[--check-none] [--report-format {json,markdown,sarif,text,zip}] [--report-output OUTPUT]
[--report-severity REPORT_SEVERITY] [--report-impact REPORT_IMPACT]
[--report-confidence REPORT_CONFIDENCE] [--report-check-suicidal]
[--report-check-upgradeable] [--report-check-erc20] [--report-check-erc721]
[--report-check-reentrancy] [--report-check-gas-optimization] [--report-check-code-quality]
[--report-check-best-practices] [--report-check-ai-detectors] [--report-check-all]
[--report-check-none] [--report-all] [--report-suicidal] [--report-upgradeable]
[--report-erc20] [--report-erc721] [--report-reentrancy] [--report-gas-optimization]
[--report-code-quality] [--report-best-practices] [--report-ai-detectors] [--report-none]
[--report-all-detectors] [--report-all-severity] [--report-all-impact]
[--report-all-confidence] [--report-all-categories] [--report-all-filters]
[--report-all-options] [--report-all] [--report-none] [--ai-enabled] [--ai-disabled]
[--ai-priority-high] [--ai-priority-medium] [--ai-priority-low] [--ai-priority-all]
[--ai-priority-none] [--ai-confidence-high] [--ai-confidence-medium] [--ai-confidence-low]
[--ai-confidence-all] [--ai-confidence-none] [--ai-detectors-all] [--ai-detectors-none]
[--ai-detectors-specific AI_DETECTORS_SPECIFIC] [--ai-detectors-exclude AI_DETECTORS_EXCLUDE]
[--ai-models-path AI_MODELS_PATH] [--ai-models-update] [--ai-models-download]
[--ai-models-list] [--ai-models-info] [--ai-models-version] [--ai-models-check]
[--ai-models-upgrade] [--ai-models-remove] [--ai-models-clean] [--ai-models-reset]
[--ai-models-backup] [--ai-models-restore] [--ai-models-export] [--ai-models-import]
[--ai-models-config AI_MODELS_CONFIG] [--ai-models-config-update] [--ai-models-config-reset]
[--ai-models-config-export] [--ai-models-config-import] [--ai-models-config-list]
[--ai-models-config-info] [--ai-models-config-version] [--ai-models-config-check]
[--ai-models-config-upgrade] [--ai-models-config-remove] [--ai-models-config-clean]
[--ai-models-config-reset] [--ai-models-config-backup] [--ai-models-config-restore]
[--ai-models-config-export] [--ai-models-config-import] [--ai-models-config-path AI_MODELS_CONFIG_PATH]
[--ai-models-config-file AI_MODELS_CONFIG_FILE] [--ai-models-config-url AI_MODELS_CONFIG_URL]
[--ai-models-config-name AI_MODELS_CONFIG_NAME] [--ai-models-config-description AI_MODELS_CONFIG_DESCRIPTION]
[--ai-models-config-version-major AI_MODELS_CONFIG_VERSION_MAJOR]
[--ai-models-config-version-minor AI_MODELS_CONFIG_VERSION_MINOR]
[--ai-models-config-version-patch AI_MODELS_CONFIG_VERSION_PATCH]
[--ai-models-config-author AI_MODELS_CONFIG_AUTHOR]
[--ai-models-config-license AI_MODELS_CONFIG_LICENSE]
[--ai-models-config-url-documentation AI_MODELS_CONFIG_URL_DOCUMENTATION]
[--ai-models-config-url-source AI_MODELS_CONFIG_URL_SOURCE]
[--ai-models-config-url-issues AI_MODELS_CONFIG_URL_ISSUES]
[--ai-models-config-url-changelog AI_MODELS_CONFIG_URL_CHANGELOG]
[--ai-models-config-url-support AI_MODELS_CONFIG_URL_SUPPORT]
[--ai-models-config-url-website AI_MODELS_CONFIG_URL_WEBSITE]
[--ai-models-config-url-logo AI_MODELS_CONFIG_URL_LOGO]
[--ai-models-config-url-icon AI_MODELS_CONFIG_URL_ICON]
[--ai-models-config-url-banner AI_MODELS_CONFIG_URL_BANNER]
[--ai-models-config-url-screenshot AI_MODELS_CONFIG_URL_SCREENSHOT]
[--ai-models-config-url-video AI_MODELS_CONFIG_URL_VIDEO]
[--ai-models-config-url-demo AI_MODELS_CONFIG_URL_DEMO]
[--ai-models-config-url-documentation-api AI_MODELS_CONFIG_URL_DOCUMENTATION_API]
[--ai-models-config-url-documentation-user AI_MODELS_CONFIG_URL_DOCUMENTATION_USER]
[--ai-models-config-url-documentation-developer AI_MODELS_CONFIG_URL_DOCUMENTATION_DEVELOPER]
[--ai-models-config-url-documentation-faq AI_MODELS_CONFIG_URL_DOCUMENTATION_FAQ]
[--ai-models-config-url-documentation-tutorial AI_MODELS_CONFIG_URL_DOCUMENTATION_TUTORIAL]
[--ai-models-config-url-documentation-guide AI_MODELS_CONFIG_URL_DOCUMENTATION_GUIDE]
[--ai-models-config-url-documentation-whitepaper AI_MODELS_CONFIG_URL_DOCUMENTATION_WHITEPAPER]
[--ai-models-config-url-documentation-roadmap AI_MODELS_CONFIG_URL_DOCUMENTATION_ROADMAP]
[--ai-models-config-url-documentation-blog AI_MODELS_CONFIG_URL_DOCUMENTATION_BLOG]
[--ai-models-config-url-documentation-community AI_MODELS_CONFIG_URL_DOCUMENTATION_COMMUNITY]
This comprehensive usage guide provides information on all the available options and features of the SherlockChain framework, including:

- Vulnerability detection: `--detect`, `--exclude-detectors`
- Reporting: `--report-format`, `--report-output`, `--report-*`
- Filtering: `--filter-*`
- AI configuration: `--ai-*`
- Truffle integration: `--truffle`, `--truffle-build-directory`
- Compilation and detector listing: `--compile`, `--list-detectors`, `--list-detectors-info`
By reviewing this comprehensive usage guide, you can quickly understand how to leverage the full capabilities of the SherlockChain framework to analyze your smart contracts and identify potential vulnerabilities. This will help you ensure the security and reliability of your DeFi protocol before deployment.
Num | Detector | What it Detects | Impact | Confidence |
---|---|---|---|---|
1 | ai-anomaly-detection | Detect anomalous code patterns using advanced AI models | High | High |
2 | ai-vulnerability-prediction | Predict potential vulnerabilities using machine learning | High | High |
3 | ai-code-optimization | Suggest code optimizations based on AI-driven analysis | Medium | High |
4 | ai-contract-complexity | Assess contract complexity and maintainability using AI | Medium | High |
5 | ai-gas-optimization | Identify gas-optimizing opportunities with AI | Medium | Medium |
Domainim is a fast domain reconnaissance tool for organizational network scanning. The tool aims to provide a brief overview of an organization's structure using techniques like OSINT, bruteforcing, DNS resolving etc.
Current features (v1.0.1):
- Subdomain enumeration (2 engines + bruteforcing)
- User-friendly output
- Resolving A records (IPv4)
A few features are work in progress. See Planned features for more details.
The project is inspired by Sublist3r. The port scanner module is heavily based on NimScan.
You can build this repo from source:
- Clone the repository: `git clone git@github.com:pptx704/domainim`
- Build the binary: `nimble build`
- Run it: `./domainim <domain> [--ports=<ports>]`
Or, you can just download the binary from the release page. Keep in mind that the binary is tested on Debian based systems only.
./domainim <domain> [--ports=<ports> | -p:<ports>] [--wordlist=<filename> | l:<filename> [--rps=<int> | -r:<int>]] [--dns=<dns> | -d:<dns>] [--out=<filename> | -o:<filename>]
- `<domain>` is the domain to be enumerated. It can be a subdomain as well.
- `--ports | -p` is a string specification of the ports to be scanned. It can be one of the following:
  - `all` - Scan all ports (1-65535)
  - `none` - Skip port scanning (default)
  - `t<n>` - Scan the top n ports (same as `nmap`), i.e. `t100` scans the top 100 ports. The max value is 5000; if n is greater than 5000, it will be set to 5000.
  - A single value, i.e. `80` scans port 80
  - A range, i.e. `80-100` scans ports 80 to 100
  - A comma-separated list, i.e. `80,443,8080` scans ports 80, 443 and 8080
  - Any combination of the above, i.e. `80,443,8080-8090,t500` scans ports 80, 443, 8080 to 8090 and the top 500 ports
- `--dns | -d` is the address of the DNS server. This should be a valid IPv4 address and can optionally contain the port number:
  - `a.b.c.d` - Use the DNS server at `a.b.c.d` on port 53
  - `a.b.c.d#n` - Use the DNS server at `a.b.c.d` on port `n`
- `--wordlist | -l` - Path to the wordlist file. This is used for bruteforcing subdomains. If the file is invalid, bruteforcing will be skipped. You can get a wordlist from SecLists. A wordlist is also provided on the release page.
- `--rps | -r` - Number of requests to be made per second during bruteforce. The default value is `1024 req/s`. Note that DNS queries are made in batches and the next batch starts only after the previous one has completed. Since queries can be rate limited, increasing the value does not always guarantee faster results.
- `--out | -o` - Path to the output file. The output will be saved in JSON format. The filename must end with `.json`.

Examples:
- `./domainim nmap.org --ports=all`
- `./domainim google.com --ports=none --dns=8.8.8.8#53`
- `./domainim pptx704.com --ports=t100 --wordlist=wordlist.txt --rps=1500`
- `./domainim pptx704.com --ports=t100 --wordlist=wordlist.txt --out=results.json`
- `./domainim mysite.com --ports=t50,5432,7000-9000 --dns=1.1.1.1`
The help menu can be accessed using `./domainim --help` or `./domainim -h`.
Usage:
domainim <domain> [--ports=<ports> | -p:<ports>] [--wordlist=<filename> | l:<filename> [--rps=<int> | -r:<int>]] [--dns=<dns> | -d:<dns>] [--out=<filename> | -o:<filename>]
domainim (-h | --help)
Options:
-h, --help Show this screen.
-p, --ports Ports to scan. [default: `none`]
Can be `all`, `none`, `t<n>`, single value, range value, combination
-l, --wordlist Wordlist for subdomain bruteforcing. Bruteforcing is skipped for invalid file.
-d, --dns IP and Port for DNS Resolver. Should be a valid IPv4 with an optional port [default: system default]
-r, --rps DNS queries to be made per second [default: 1024 req/s]
-o, --out JSON file where the output will be saved. Filename must end with `.json`
Examples:
domainim domainim.com -p:t500 -l:wordlist.txt --dns:1.1.1.1#53 --out=results.json
domainim sub.domainim.com --ports=all --dns:8.8.8.8 -t:1500 -o:results.json
The JSON schema for the results is as follows-
[
{
"subdomain": string,
"data": [
"ipv4": string,
"vhosts": [string],
"reverse_dns": string,
"ports": [int]
]
}
]
Example JSON for `nmap.org` can be found here.
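Going by the schema above, a small Python snippet can load a saved report and summarize it; `results.json` is simply the output filename used in the earlier example, and the snippet assumes each entry under `data` is an object with the fields shown.

```python
import json

# Load a report produced with --out=results.json
with open("results.json") as f:
    results = json.load(f)

for entry in results:
    print(entry["subdomain"])
    for record in entry["data"]:
        ports = ", ".join(str(p) for p in record.get("ports", []))
        print(f"  {record['ipv4']}  reverse DNS: {record.get('reverse_dns', '-')}  ports: {ports}")
```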
Contributions are welcome. Feel free to open a pull request or an issue.
This project is still in its early stages. There are several limitations I am aware of.
The two engines I am using (I'm calling them engines because Sublist3r does so) currently have some sort of response limit. dnsdumpster can fetch up to 100 subdomains. crt.sh also randomizes the results when there are too many of them. Another issue with crt.sh is that it sometimes returns an SQL error, so for some domains the results can differ between runs. I am planning to add more engines in the future (at least a brute force engine).
The port scanner uses only a timeout of ping response time + 750ms. This might lead to false negatives. Since domainim is not meant for port scanning but to provide a quick overview, such cases are acceptable. However, I am planning to add a flag to increase the timeout. For the same reason, filtered ports are not shown. For more comprehensive port scanning, I recommend using Nmap. Domainim also doesn't bypass rate limiting (if there is any).
It might seem that the way vhostnames are printed just brings repetition to the table.
Printing them as follows might have been better:
ack.nmap.org, issues.nmap.org, nmap.org, research.nmap.org, scannme.nmap.org, svn.nmap.org, www.nmap.org
↳ 45.33.49.119
↳ Reverse DNS: ack.nmap.org.
But previously, while testing, I found cases where not all IPs are shared by the same set of vhostnames. That is why I decided to keep it this way.
The DNS server might have some sort of rate limiting. That's why I added random delays (between 0-300ms) for IPv4 resolving per query. This avoids hitting the DNS server with all the queries at once and spreads them out more naturally. For the bruteforcing method, the value is between 0-1000ms by default, but that can be changed using the `--rps | -r` flag.
One particular limitation that is bugging me is that the DNS resolver would not return all the IPs for a domain. So it is necessary to make multiple queries to get all (or most) of the IPs. But then again, it is not possible to know how many IPs are there for a domain. I still have to come up with a solution for this. Also, nim-ndns
doesn't support CNAME records. So, if a domain has a CNAME record, it will not be resolved. I am waiting for a response from the author for this.
For now, bruteforcing is skipped if a possible wildcard subdomain is found. This is because, if a domain has a wildcard subdomain, bruteforcing will resolve IPv4 for all possible subdomains. However, this will skip valid subdomains also (i.e. scanme.nmap.org
will be skipped even though it's not a wildcard value). I will add a --force-brute | -fb
flag later to force bruteforcing.
A similar thing is true for vhost enumeration with subdomain inputs. Since only URLs that end with the given subdomain are returned, subdomains of sibling domains are not considered. For example, scannme.nmap.org
will not be printed for ack.nmap.org
but something.ack.nmap.org
might be. I can search for all subdomains of nmap.org
but that defeats the purpose of having a subdomain as the input.
MIT License. See LICENSE for full text.
V'ger is an interactive command-line application for post-exploitation of authenticated Jupyter instances with a focus on AI/ML security operations.
pip install vger
vger --help
Currently, vger interactive
has maximum functionality, maintaining state for discovered artifacts and recurring jobs. However, most functionality is also available by-name in non-interactive format with vger <module>
. List available modules with vger --help
.
Once a connection is established, users drop into a nested set of menus.
The top level menu is:
- Reset: Configure a different host.
- Enumerate: Utilities to learn more about the host.
- Exploit: Utilities to perform direct action and manipulation of the host and artifacts.
- Persist: Utilities to establish persistence mechanisms.
- Export: Save output to a text file.
- Quit: No one likes quitters.
These menus contain the following functionality:
- List modules: Identify imported modules in target notebooks to determine what libraries are available for injected code.
- Inject: Execute code in the context of the selected notebook. Code can be provided in a text editor or by specifying a local `.py` file. Either input is processed as a string and executed in the runtime of the notebook (see the sketch after this list).
- Backdoor: Launch a new JupyterLab instance open to `0.0.0.0`, with `allow-root`, on a user-specified `port` with a user-specified `password`.
- Check History: See ipython commands recently run in the target notebook.
- Run shell command: Spawn a terminal, run the command, return the output, and delete the terminal.
- List dir or get file: List directories relative to the Jupyter directory. If you don't know, start with `/`.
- Upload file: Upload a file from localhost to the target. Specify paths in the same format as List dir (relative to the Jupyter directory). Provide a full path including filename and extension.
- Delete file: Delete a file. Specify paths in the same format as List dir (relative to the Jupyter directory).
- Find models: Find models based on common file formats.
- Download models: Download discovered models.
- Snoop: Monitor notebook execution and results until timeout.
- Recurring jobs: Launch/Kill recurring snippets of code silently run in the target environment.
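For example, a local `.py` file handed to the Inject option might look like the following minimal, hypothetical snippet; it only prints environment details and is shown purely to illustrate that arbitrary Python runs inside the notebook's kernel.

```python
# example_payload.py - hypothetical snippet for the Inject option
import getpass
import platform
import sys

print("user:", getpass.getuser())
print("python:", sys.version.split()[0])
print("platform:", platform.platform())
print("loaded modules:", len(sys.modules))
```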
With pip install vger[ai]
you'll get LLM generated summaries of notebooks in the target environment. These are meant to be rough translation for non-DS/AI folks to do quick triage of if (or which) notebooks are worth investigating further.
There was an inherent tradeoff on model size vs. ability and that's something I'll continue to tinker with, but hopefully this is helpful for some more traditional security users. I'd love to see folks start prompt injecting their notebooks ("these are not the droids you're looking for").
Subdomain takeover is a common vulnerability that allows an attacker to gain control over a subdomain of a target domain and redirect users intended for an organization's domain to a website that performs malicious activities, such as phishing campaigns, stealing user cookies, etc. Typically, this happens when the subdomain has a CNAME in the DNS, but no host is providing content for it. Subhunter takes a given list of subdomains and scans them to check for this vulnerability.
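To illustrate the underlying idea (this is not Subhunter's code), a check can resolve each subdomain's CNAME and compare the target against fingerprints of services that allow unclaimed hostnames; the fingerprint list below is a tiny, illustrative assumption, and a real check would also verify that the target actually serves no content.

```python
import dns.resolver  # third-party: dnspython

# Tiny, illustrative fingerprint list of services that allow unclaimed hostnames.
FINGERPRINTS = {
    "github.io": "GitHub Pages",
    "s3.amazonaws.com": "AWS S3",
}

def check(subdomain: str) -> str:
    try:
        answers = dns.resolver.resolve(subdomain, "CNAME")
    except Exception:
        return f"[+] Nothing found at {subdomain}: Not Vulnerable"
    for rdata in answers:
        target = rdata.target.to_text().rstrip(".")
        for suffix, service in FINGERPRINTS.items():
            if target.endswith(suffix):
                return f"[+] {service}: Possible takeover found at {subdomain}"
    return f"[+] Nothing found at {subdomain}: Not Vulnerable"

print(check("beta.example.com"))
```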
Download from releases
Build from source:
$ git clone https://github.com/Nemesis0U/Subhunter.git
$ go build subhunter.go
Usage of subhunter:
-l string
File including a list of hosts to scan
-o string
File to save results
-t int
Number of threads for scanning (default 50)
-timeout int
Timeout in seconds (default 20)
./Subhunter -l subdomains.txt -o test.txt
____ _ _ _
/ ___| _ _ | |__ | |__ _ _ _ __ | |_ ___ _ __
\___ \ | | | | | '_ \ | '_ \ | | | | | '_ \ | __| / _ \ | '__|
___) | | |_| | | |_) | | | | | | |_| | | | | | | |_ | __/ | |
|____/ \__,_| |_.__/ |_| |_| \__,_| |_| |_| \__| \___| |_|
A fast subdomain takeover tool
Created by Nemesis
Loaded 88 fingerprints for current scan
-----------------------------------------------------------------------------
[+] Nothing found at www.ubereats.com: Not Vulnerable
[+] Nothing found at testauth.ubereats.com: Not Vulnerable
[+] Nothing found at apple-maps-app-clip.ubereats.com: Not Vulnerable
[+] Nothing found at about.ubereats.com: Not Vulnerable
[+] Nothing found at beta.ubereats.com: Not Vulnerable
[+] Nothing found at ewp.ubereats.com: Not Vulnerable
[+] Nothing found at edgetest.ubereats.com: Not Vulnerable
[+] Nothing found at guest.ubereats.com: Not Vulnerable
[+] Google Cloud: Possible takeover found at testauth.ubereats.com: Vulnerable
[+] Nothing found at info.ubereats.com: Not Vulnerable
[+] Nothing found at learn.ubereats.com: Not Vulnerable
[+] Nothing found at merchants.ubereats.com: Not Vulnerable
[+] Nothing found at guest-beta.ubereats.com: Not Vulnerable
[+] Nothing found at merchant-help.ubereats.com: Not Vulnerable
[+] Nothing found at merchants-beta.ubereats.com: Not Vulnerable
[+] Nothing found at merchants-staging.ubereats.com: Not Vulnerable
[+] Nothing found at messages.ubereats.com: Not Vulnerable
[+] Nothing found at order.ubereats.com: Not Vulnerable
[+] Nothing found at restaurants.ubereats.com: Not Vulnerable
[+] Nothing found at payments.ubereats.com: Not Vulnerable
[+] Nothing found at static.ubereats.com: Not Vulnerable
Subhunter exiting...
Results written to test.txt
TL;DR: Galah (pronounced 'guh-laa') is an LLM (Large Language Model) powered web honeypot, currently compatible with the OpenAI API, that is able to mimic various applications and dynamically respond to arbitrary HTTP requests.
Named after the clever Australian parrot known for its mimicry, Galah mirrors this trait in its functionality. Unlike traditional web honeypots that rely on a manual and limiting method of emulating numerous web applications or vulnerabilities, Galah adopts a novel approach. This LLM-powered honeypot mimics various web applications by dynamically crafting relevant (and occasionally foolish) responses, including HTTP headers and body content, to arbitrary HTTP requests. Fun fact: in Aussie English, Galah also means fool!
I've deployed a cache for the LLM-generated responses (the cache duration can be customized in the config file) to avoid generating multiple responses for the same request and to reduce the cost of the OpenAI API. The cache stores responses per port, meaning if you probe a specific port of the honeypot, the generated response won't be returned for the same request on a different port.
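Conceptually (and only as an illustration, since Galah itself is built with Go and this is not its code), a per-port cache can be keyed on the port plus the request, so the same request on a different port misses the cache:

```python
import hashlib
import time
from typing import Dict, Optional, Tuple

_cache: Dict[str, Tuple[float, str]] = {}
CACHE_TTL = 3600  # seconds; in Galah the cache duration comes from the config file

def cache_key(port: int, method: str, path: str) -> str:
    # The port is part of the key, so the same request on another port misses.
    return hashlib.sha256(f"{port}:{method}:{path}".encode()).hexdigest()

def get_cached(port: int, method: str, path: str) -> Optional[str]:
    entry = _cache.get(cache_key(port, method, path))
    if entry and time.time() - entry[0] < CACHE_TTL:
        return entry[1]
    return None

def store(port: int, method: str, path: str, response: str) -> None:
    _cache[cache_key(port, method, path)] = (time.time(), response)
```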
The prompt is the most crucial part of this honeypot! You can update the prompt in the config file, but be sure not to change the part that instructs the LLM to generate the response in the specified JSON format.
Note: Galah was a fun weekend project I created to evaluate the capabilities of LLMs in generating HTTP messages, and it is not intended for production use. The honeypot may be fingerprinted based on its response time, non-standard, or sometimes weird responses, and other network-based techniques. Use this tool at your own risk, and be sure to set usage limits for your OpenAI API.
Rule-Based Response: The new version of Galah will employ a dynamic, rule-based approach, adding more control over response generation. This will further reduce OpenAI API costs and increase the accuracy of the generated responses.
Response Database: It will enable you to generate and import a response database. This ensures the honeypot only turns to the OpenAI API for unknown or new requests. I'm also working on cleaning up and sharing my own database.
Support for Other LLMs.
To get started, update the settings in the `config.yaml` file, then clone the repo, build the binary, and run it:
% git clone git@github.com:0x4D31/galah.git
% cd galah
% go mod download
% go build
% ./galah -i en0 -v
[galah ASCII-art banner]
llm-based web honeypot // version 1.0
author: Adel "0x4D31" Karimi
2024/01/01 04:29:10 Starting HTTP server on port 8080
2024/01/01 04:29:10 Starting HTTP server on port 8888
2024/01/01 04:29:10 Starting HTTPS server on port 8443 with TLS profile: profile1_selfsigned
2024/01/01 04:29:10 Starting HTTPS server on port 443 with TLS profile: profile1_selfsigned
2024/01/01 04:35:57 Received a request for "/.git/config" from [::1]:65434
2024/01/01 04:35:57 Request cache miss for "/.git/config": Not found in cache
2024/01/01 04:35:59 Generated HTTP response: {"Headers": {"Content-Type": "text/plain", "Server": "Apache/2.4.41 (Ubuntu)", "Status": "403 Forbidden"}, "Body": "Forbidden\nYou don't have permission to access this resource."}
2024/01/01 04:35:59 Sending the crafted response to [::1]:65434
^C2024/01/01 04:39:27 Received shutdown signal. Shutting down servers...
2024/01/01 04:39:27 All servers shut down gracefully.
Here are some example responses:
% curl http://localhost:8080/login.php
<!DOCTYPE html><html><head><title>Login Page</title></head><body><form action='/submit.php' method='post'><label for='uname'><b>Username:</b></label><br><input type='text' placeholder='Enter Username' name='uname' required><br><label for='psw'><b>Password:</b></label><br><input type='password' placeholder='Enter Password' name='psw' required><br><button type='submit'>Login</button></form></body></html>
JSON log record:
{"timestamp":"2024-01-01T05:38:08.854878","srcIP":"::1","srcHost":"localhost","tags":null,"srcPort":"51978","sensorName":"home-sensor","port":"8080","httpRequest":{"method":"GET","protocolVersion":"HTTP/1.1","request":"/login.php","userAgent":"curl/7.71.1","headers":"User-Agent: [curl/7.71.1], Accept: [*/*]","headersSorted":"Accept,User-Agent","headersSortedSha256":"cf69e186169279bd51769f29d122b07f1f9b7e51bf119c340b66fbd2a1128bc9","body":"","bodySha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},"httpResponse":{"headers":{"Content-Type":"text/html","Server":"Apache/2.4.38"},"body":"\u003c!DOCTYPE html\u003e\u003chtml\u003e\u003chead\u003e\u003ctitle\u003eLogin Page\u003c/title\u003e\u003c/head\u003e\u003cbody\u003e\u003cform action='/submit.php' method='post'\u003e\u003clabel for='uname'\u003e\u003cb\u003eUsername:\u003c/b\u003e\u003c/label\u003e\u003cbr\u003e\u003cinput type='text' placeholder='Enter Username' name='uname' required\u003e\u003cbr\u003e\u003clabel for='psw'\u003e\u003cb\u003ePassword:\u003c/b\u003e\u003c/label\u003e\u003cbr\u003e\u003cinput type='password' placeholder='Enter Password' name='psw' required\u003e\u003cbr\u003e\u003cbutton type='submit'\u003eLogin\u003c/button\u003e\u003c/form\u003e\u003c/body\u003e\u003c/html\u003e"}}
% curl http://localhost:8080/.aws/credentials
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
region = us-west-2
JSON log record:
{"timestamp":"2024-01-01T05:40:34.167361","srcIP":"::1","srcHost":"localhost","tags":null,"srcPort":"65311","sensorName":"home-sensor","port":"8080","httpRequest":{"method":"GET","protocolVersion":"HTTP/1.1","request":"/.aws/credentials","userAgent":"curl/7.71.1","headers":"User-Agent: [curl/7.71.1], Accept: [*/*]","headersSorted":"Accept,User-Agent","headersSortedSha256":"cf69e186169279bd51769f29d122b07f1f9b7e51bf119c340b66fbd2a1128bc9","body":"","bodySha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},"httpResponse":{"headers":{"Connection":"close","Content-Encoding":"gzip","Content-Length":"126","Content-Type":"text/plain","Server":"Apache/2.4.51 (Unix)"},"body":"[default]\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\naws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\nregion = us-west-2"}}
Okay, that was impressive!
Now, let's do some sort of adversarial testing!
% curl http://localhost:8888/are-you-a-honeypot
No, I am a server.
JSON log record:
{"timestamp":"2024-01-01T05:50:43.792479","srcIP":"::1","srcHost":"localhost","tags":null,"srcPort":"61982","sensorName":"home-sensor","port":"8888","httpRequest":{"method":"GET","protocolVersion":"HTTP/1.1","request":"/are-you-a-honeypot","userAgent":"curl/7.71.1","headers":"User-Agent: [curl/7.71.1], Accept: [*/*]","headersSorted":"Accept,User-Agent","headersSortedSha256":"cf69e186169279bd51769f29d122b07f1f9b7e51bf119c340b66fbd2a1128bc9","body":"","bodySha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},"httpResponse":{"headers":{"Connection":"close","Content-Length":"20","Content-Type":"text/plain","Server":"Apache/2.4.41 (Ubuntu)"},"body":"No, I am a server."}}
% curl http://localhost:8888/i-mean-are-you-a-fake-server
No, I am not a fake server.
JSON log record:
{"timestamp":"2024-01-01T05:51:40.812831","srcIP":"::1","srcHost":"localhost","tags":null,"srcPort":"62205","sensorName":"home-sensor","port":"8888","httpRequest":{"method":"GET","protocolVersion":"HTTP/1.1","request":"/i-mean-are-you-a-fake-server","userAgent":"curl/7.71.1","headers":"User-Agent: [curl/7.71.1], Accept: [*/*]","headersSorted":"Accept,User-Agent","headersSortedSha256":"cf69e186169279bd51769f29d122b07f1f9b7e51bf119c340b66fbd2a1128bc9","body":"","bodySha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},"httpResponse":{"headers":{"Connection":"close","Content-Type":"text/plain","Server":"LocalHost/1.0"},"body":"No, I am not a fake server."}}
You're a galah, mate!
A new approach to Browser In The Browser (BITB) without the use of iframes, allowing the bypass of traditional framebusters implemented by login pages like Microsoft.
This POC code is built for using this new BITB with Evilginx, and a Microsoft Enterprise phishlet.
Before diving deep into this, I recommend that you first check my talk at BSides 2023, where I first introduced this concept along with important details on how to craft the "perfect" phishing attack. βΆ Watch Video
Buy Me A Coffee
This tool is for educational and research purposes only. It demonstrates a non-iframe based Browser In The Browser (BITB) method. The author is not responsible for any misuse. Use this tool only legally and ethically, in controlled environments for cybersecurity defense testing. By using this tool, you agree to do so responsibly and at your own risk.
Over the past year, I've been experimenting with different tricks to craft the "perfect" phishing attack. The typical "red flags" people are trained to look for are things like urgency, threats, authority, poor grammar, etc. The next best thing people nowadays check is the link/URL of the website they are interacting with, and they tend to get very conscious the moment they are asked to enter sensitive credentials like emails and passwords.
That's where Browser In The Browser (BITB) came into play. Originally introduced by @mrd0x, BITB is a concept of creating the appearance of a believable browser window inside of which the attacker controls the content (by serving the malicious website inside an iframe). However, the fake URL bar of the fake browser window is set to the legitimate site the user would expect. This combined with a tool like Evilginx becomes the perfect recipe for a believable phishing attack.
The problem is that over the past months/years, major websites like Microsoft implemented various little tricks called "framebusters/framekillers" which mainly attempt to break iframes that might be used to serve the proxied website like in the case of Evilginx.
In short, Evilginx + BITB for websites like Microsoft no longer works. At least not with a BITB that relies on iframes.
A Browser In The Browser (BITB) without any iframes! As simple as that.
Meaning that we can now use BITB with Evilginx on websites like Microsoft.
Evilginx here is just a strong example, but the same concept can be used for other use-cases as well.
Framebusters target iframes specifically, so the idea is to create the BITB effect without the use of iframes, and without disrupting the original structure/content of the proxied page. This can be achieved by injecting scripts and HTML besides the original content using search and replace (aka substitutions), then relying completely on HTML/CSS/JS tricks to make the visual effect. We also use an additional trick called "Shadow DOM" in HTML to place the content of the landing page (background) in such a way that it does not interfere with the proxied content, allowing us to flexibly use any landing page with minor additional JS scripts.
Create a local Linux VM. (I personally use Ubuntu 22 on VMWare Player or Parallels Desktop)
Update and Upgrade system packages:
sudo apt update && sudo apt upgrade -y
Create a new evilginx user, and add user to sudo group:
sudo su
adduser evilginx
usermod -aG sudo evilginx
Test that evilginx user is in sudo group:
su - evilginx
sudo ls -la /root
Navigate to the user's home dir:
cd /home/evilginx
(You can do everything as sudo user as well since we're running everything locally)
Download and build Evilginx: Official Docs
Copy Evilginx files to /home/evilginx
Install Go: Official Docs
wget https://go.dev/dl/go1.21.4.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.21.4.linux-amd64.tar.gz
nano ~/.profile
ADD: export PATH=$PATH:/usr/local/go/bin
source ~/.profile
Check:
go version
Install make:
sudo apt install make
Build Evilginx:
cd /home/evilginx/evilginx2
make
Create a new directory for our evilginx build along with phishlets and redirectors:
mkdir /home/evilginx/evilginx
Copy build, phishlets, and redirectors:
cp /home/evilginx/evilginx2/build/evilginx /home/evilginx/evilginx/evilginx
cp -r /home/evilginx/evilginx2/redirectors /home/evilginx/evilginx/redirectors
cp -r /home/evilginx/evilginx2/phishlets /home/evilginx/evilginx/phishlets
Ubuntu firewall quick fix (thanks to @kgretzky)
sudo setcap CAP_NET_BIND_SERVICE=+eip /home/evilginx/evilginx/evilginx
On Ubuntu, if you get a "Failed to start nameserver on: :53" error, try modifying this file:
sudo nano /etc/systemd/resolved.conf
edit/add the DNSStubListener setting and set it to no:
> DNSStubListener=no
then
sudo systemctl restart systemd-resolved
Since we will be using Apache2 in front of Evilginx, we need to make Evilginx listen to a different port than 443.
nano ~/.evilginx/config.json
Change https_port from 443 to 8443.
Install Apache2:
sudo apt install apache2 -y
Enable Apache2 mods that will be used: (We are also disabling access_compat module as it sometimes causes issues)
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod proxy_balancer
sudo a2enmod lbmethod_byrequests
sudo a2enmod env
sudo a2enmod include
sudo a2enmod setenvif
sudo a2enmod ssl
sudo a2ensite default-ssl
sudo a2enmod cache
sudo a2enmod substitute
sudo a2enmod headers
sudo a2enmod rewrite
sudo a2dismod access_compat
Start and enable Apache:
sudo systemctl start apache2
sudo systemctl enable apache2
Check that Apache and the VM's networking work by visiting the VM's IP from a browser on the host machine.
Install git if not already available:
sudo apt -y install git
Clone this repo:
git clone https://github.com/waelmas/frameless-bitb
cd frameless-bitb
Make directories for the pages we will be serving:
sudo mkdir /var/www/home
sudo mkdir /var/www/primary
sudo mkdir /var/www/secondary
Copy the directories for each page:
sudo cp -r ./pages/home/ /var/www/
sudo cp -r ./pages/primary/ /var/www/
sudo cp -r ./pages/secondary/ /var/www/
Optional: Remove the default Apache page (not used):
sudo rm -r /var/www/html/
Copy the O365 phishlet to phishlets directory:
sudo cp ./O365.yaml /home/evilginx/evilginx/phishlets/O365.yaml
Optional: To set the Calendly widget to use your account instead of the default I have inside, go to pages/primary/script.js and change the CALENDLY_PAGE_NAME and CALENDLY_EVENT_TYPE values.
Note on Demo Obfuscation: As I explain in the walkthrough video, I included a minimal obfuscation for text content like URLs and titles of the BITB. You can open the demo obfuscator by opening demo-obfuscator.html in your browser. In a real-world scenario, I would highly recommend that you obfuscate larger chunks of the injected HTML code or use JS tricks to avoid being detected and flagged. The advanced version I am working on will use a combination of advanced tricks to make it nearly impossible for scanners to fingerprint/detect the BITB code, so stay tuned.
Since we are running everything locally, we need to generate self-signed SSL certificates that will be used by Apache. Evilginx will not need the certs as we will be running it in developer mode.
We will use the domain fake.com, which will point to our local VM. If you want to use a different domain, make sure to change the domain in all files (Apache conf files, JS files, etc.).
Create dir and parents if they do not exist:
sudo mkdir -p /etc/ssl/localcerts/fake.com/
Generate the SSL certs using the OpenSSL config file:
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout /etc/ssl/localcerts/fake.com/privkey.pem -out /etc/ssl/localcerts/fake.com/fullchain.pem \
-config openssl-local.cnf
Modify private key permissions:
sudo chmod 600 /etc/ssl/localcerts/fake.com/privkey.pem
Copy custom substitution files (the core of our approach):
sudo cp -r ./custom-subs /etc/apache2/custom-subs
Important Note: In this repo I have included 2 substitution configs for Chrome on Mac and Chrome on Windows BITB. Both have auto-detection and styling for light/dark mode and they should act as base templates to achieve the same for other browser/OS combos. Since I did not include automatic detection of the browser/OS combo used to visit our phishing page, you will have to use one of two or implement your own logic for automatic switching.
Both config files under /apache-configs/ are the same, only with a different Include directive used for the substitution file that will be included. (There are 2 references for each file.)
# Uncomment the one you want and remember to restart Apache after any changes:
#Include /etc/apache2/custom-subs/win-chrome.conf
Include /etc/apache2/custom-subs/mac-chrome.conf
Simply to make it easier, I included both versions as separate files for this next step.
Windows/Chrome BITB:
sudo cp ./apache-configs/win-chrome-bitb.conf /etc/apache2/sites-enabled/000-default.conf
Mac/Chrome BITB:
sudo cp ./apache-configs/mac-chrome-bitb.conf /etc/apache2/sites-enabled/000-default.conf
Test Apache configs to ensure there are no errors:
sudo apache2ctl configtest
Restart Apache to apply changes:
sudo systemctl restart apache2
Get the IP of the VM using ifconfig and note it somewhere for the next step.
We now need to add new entries to our hosts file to point the domain used in this demo, fake.com, and all used subdomains to our VM on which Apache and Evilginx are running.
On Windows:
Open Notepad as Administrator (Search > Notepad > Right-Click > Run as Administrator)
Click on the File option (top-left) and in the File Explorer address bar, copy and paste the following:
C:\Windows\System32\drivers\etc\
Change the file types (bottom-right) to "All files".
Double-click the file named hosts
On Mac:
Open a terminal and run the following:
sudo nano /private/etc/hosts
Now modify the following records (replace [IP] with the IP of your VM), then paste the records at the end of the hosts file:
# Local Apache and Evilginx Setup
[IP] login.fake.com
[IP] account.fake.com
[IP] sso.fake.com
[IP] www.fake.com
[IP] portal.fake.com
[IP] fake.com
# End of section
Save and exit.
Now restart your browser before moving to the next step.
Note: On Mac, use the following command to flush the DNS cache:
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
This demo is made with the provided Office 365 Enterprise phishlet. To get the host entries you need to add for a different phishlet, use phishlets get-hosts [PHISHLET_NAME], but remember to replace the 127.0.0.1 with the actual local IP of your VM.
Since we are using self-signed SSL certificates, our browser will warn us every time we try to visit fake.com, so we need to make our host machine trust the certificate authority that signed the SSL certs.
For this step, it's easier to follow the video instructions, but here is the gist anyway.
Open https://fake.com/ in your Chrome browser.
Ignore the Unsafe Site warning and proceed to the page.
Click the SSL icon > Details > Export Certificate. IMPORTANT: When saving, the name MUST end with .crt for Windows to open it correctly.
Double-click it > install for current user. Do NOT select automatic; instead, place the certificate in a specific store: select "Trusted Root Certification Authorities".
On Mac: to install for current user only > select "Keychain: login" AND click on "View Certificates" > details > trust > Always trust
Now RESTART your Browser
You should be able to visit https://fake.com now and see the homepage without any SSL warnings.
At this point, everything should be ready so we can go ahead and start Evilginx, set up the phishlet, create our lure, and test it.
Optional: Install tmux (to keep evilginx running even if the terminal session is closed. Mainly useful when running on remote VM.)
sudo apt install tmux -y
Start Evilginx in developer mode (using tmux to avoid losing the session):
tmux new-session -s evilginx
cd ~/evilginx/
./evilginx -developer
(To re-attach to the tmux session, use tmux attach-session -t evilginx)
Evilginx Config:
config domain fake.com
config ipv4 127.0.0.1
IMPORTANT: Set Evilginx Blacklist mode to NoAdd to avoid blacklisting Apache since all requests will be coming from Apache and not the actual visitor IP.
blacklist noadd
Setup Phishlet and Lure:
phishlets hostname O365 fake.com
phishlets enable O365
lures create O365
lures get-url 0
Copy the lure URL and visit it from your browser (use Guest user on Chrome to avoid having to delete all saved/cached data between tests).
Original iframe-based BITB by @mrd0x: https://github.com/mrd0x/BITB
Evilginx Mastery Course by the creator of Evilginx @kgretzky: https://academy.breakdev.org/evilginx-mastery
My talk at BSides 2023: https://www.youtube.com/watch?v=p1opa2wnRvg
How to protect Evilginx using Cloudflare and HTML Obfuscation: https://www.jackphilipbutton.com/post/how-to-protect-evilginx-using-cloudflare-and-html-obfuscation
Evilginx resources for Microsoft 365 by @BakkerJan: https://janbakker.tech/evilginx-resources-for-microsoft-365/
SiCat is an advanced exploit search tool designed to identify and gather information about exploits from both open sources and local repositories effectively. With a focus on cybersecurity, SiCat allows users to quickly search online, finding potential vulnerabilities and relevant exploits for ongoing projects or systems.
SiCat's main strength lies in its ability to traverse both online and local resources to collect information about relevant exploits. This tool aids cybersecurity professionals and researchers in understanding potential security risks, providing valuable insights to enhance system security.
git clone https://github.com/justakazh/sicat.git && cd sicat
pip install -r requirements.txt
~$ python sicat.py --help
Command | Description |
---|---|
-h | Show help message and exit |
-k KEYWORD | Keyword to search for |
-kv KEYWORD_VERSION | Version of the keyword/software to search for |
-nm | Identify via nmap output |
--nvd | Use NVD as info source |
--packetstorm | Use PacketStorm as info source |
--exploitdb | Use ExploitDB as info source |
--exploitalert | Use ExploitAlert as info source |
--msfmodule | Use Metasploit as info source |
-o OUTPUT | Path to save output to |
-ot OUTPUT_TYPE | Output file type: json or html |
From keyword
python sicat.py -k telerik --exploitdb --msfmodule
From nmap output
nmap --open -sV localhost -oX nmap_out.xml
python sicat.py -nm nmap_out.xml --packetstorm
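To give a feel for how a query against one of these sources works, below is a minimal, hypothetical Python sketch that searches the public NVD 2.0 REST API for a keyword using requests. The endpoint and fields are those of NIST's public API; SiCat's own --nvd module may build its requests differently.
import requests

def search_nvd(keyword, limit=5):
    # Query the public NVD 2.0 API for CVEs matching a keyword.
    # Illustrative only; SiCat's NVD module may differ.
    url = "https://services.nvd.nist.gov/rest/json/cves/2.0"
    resp = requests.get(url, params={"keywordSearch": keyword, "resultsPerPage": limit}, timeout=30)
    resp.raise_for_status()
    for item in resp.json().get("vulnerabilities", []):
        cve = item["cve"]
        descriptions = cve.get("descriptions", [])
        summary = descriptions[0]["value"] if descriptions else ""
        print(f"{cve['id']}: {summary[:80]}")

if __name__ == "__main__":
    search_nvd("telerik")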
I'm aware that perfection is elusive in coding. If you come across any bugs, feel free to contribute by fixing the code or suggesting new features. Your input is always welcomed and valued.
AttackGen is a cybersecurity incident response testing tool that leverages the power of large language models and the comprehensive MITRE ATT&CK framework. The tool generates tailored incident response scenarios based on user-selected threat actor groups and your organisation's details.
If you find AttackGen useful, please consider starring the repository on GitHub. This helps more people discover the tool. Your support is greatly appreciated! β
What's new? | Why is it useful? |
---|---|
Mistral API Integration | - Alternative Model Provider: Users can now leverage the Mistral AI models to generate incident response scenarios. This integration provides an alternative to the OpenAI and Azure OpenAI Service models, allowing users to explore and compare the performance of different language models for their specific use case. |
Local Model Support using Ollama | - Local Model Hosting: AttackGen now supports the use of locally hosted LLMs via an integration with Ollama. This feature is particularly useful for organisations with strict data privacy requirements or those who prefer to keep their data on-premises. Please note that this feature is not available for users of the AttackGen version hosted on Streamlit Community Cloud at https://attackgen.streamlit.app |
Optional LangSmith Integration | - Improved Flexibility: The integration with LangSmith is now optional. If no LangChain API key is provided, users will see an informative message indicating that the run won't be logged by LangSmith, rather than an error being thrown. This change improves the overall user experience and allows users to continue using AttackGen without the need for LangSmith. |
Various Bug Fixes and Improvements | - Enhanced User Experience: This release includes several bug fixes and improvements to the user interface, making AttackGen more user-friendly and robust. |
What's new? | Why is it useful? |
---|---|
Azure OpenAI Service Integration | - Enhanced Integration: Users can now choose to utilise OpenAI models deployed on the Azure OpenAI Service, in addition to the standard OpenAI API. This integration offers a seamless and secure solution for incorporating AttackGen into existing Azure ecosystems, leveraging established commercial and confidentiality agreements. - Improved Data Security: Running AttackGen from Azure ensures that application descriptions and other data remain within the Azure environment, making it ideal for organizations that handle sensitive data in their threat models. |
LangSmith for Azure OpenAI Service | - Enhanced Debugging: LangSmith tracing is now available for scenarios generated using the Azure OpenAI Service. This feature provides a powerful tool for debugging, testing, and monitoring of model performance, allowing users to gain insights into the model's decision-making process and identify potential issues with the generated scenarios. - User Feedback: LangSmith also captures user feedback on the quality of scenarios generated using the Azure OpenAI Service, providing valuable insights into model performance and user satisfaction. |
Model Selection for OpenAI API | - Flexible Model Options: Users can now select from several models available from the OpenAI API endpoint, such as gpt-4-turbo-preview . This allows for greater customization and experimentation with different language models, enabling users to find the most suitable model for their specific use case. |
Docker Container Image | - Easy Deployment: AttackGen is now available as a Docker container image, making it easier to deploy and run the application in a consistent and reproducible environment. This feature is particularly useful for users who want to run AttackGen in a containerised environment, or for those who want to deploy the application on a cloud platform. |
What's new? | Why is it useful? |
---|---|
Custom Scenarios based on ATT&CK Techniques | - For Mature Organisations: This feature is particularly beneficial if your organisation has advanced threat intelligence capabilities. For instance, if you're monitoring a newly identified or lesser-known threat actor group, you can tailor incident response testing scenarios specific to the techniques used by that group. - Focused Testing: Alternatively, use this feature to focus your incident response testing on specific parts of the cyber kill chain or certain MITRE ATT&CK Tactics like 'Lateral Movement' or 'Exfiltration'. This is useful for organisations looking to evaluate and improve specific areas of their defence posture. |
User feedback on generated scenarios | - Collecting feedback is essential to track model performance over time and helps to highlight strengths and weaknesses in scenario generation tasks. |
Improved error handling for missing API keys | - Improved user experience. |
Replaced Streamlit st.spinner widgets with new st.status widget | - Provides better visibility into long running processes (i.e. scenario generation). |
Initial release.
Required Python packages (e.g. langchain and mitreattack).
Data files: enterprise-attack.json (the MITRE ATT&CK dataset in STIX format) and groups.json.
Clone the repository:
git clone https://github.com/mrwadams/attackgen.git
cd attackgen
pip install -r requirements.txt
docker pull mrwadams/attackgen
If you would like to use LangSmith for debugging, testing, and monitoring of model performance, you will need to set up a LangSmith account and create a .streamlit/secrets.toml
file that contains your LangChain API key. Please follow the instructions here to set up your account and obtain your API key. You'll find a secrets.toml-example
file in the .streamlit/
directory that you can use as a template for your own secrets.toml file.
If you do not wish to use LangSmith, you must still have a .streamlit/secrets.toml
file in place, but you can leave the LANGCHAIN_API_KEY
field empty.
Download the latest version of the MITRE ATT&CK dataset in STIX format from here. Ensure to place this file in the ./data/
directory within the repository.
After the data setup, you can run AttackGen with the following command:
streamlit run π_Welcome.py
You can also try the app on Streamlit Community Cloud.
docker run -p 8501:8501 mrwadams/attackgen
This command will start the container and map port 8501 (the default for Streamlit apps) from the container to your host machine. Then open your web browser, navigate to http://localhost:8501, and use the app to generate standard or custom incident response scenarios (see below for details).
To generate a scenario based on a threat actor group, use the Threat Group Scenarios page; to generate a scenario from specific ATT&CK techniques, use the Custom Scenario page. In both cases your API keys can be supplied in the app or via the .streamlit/secrets.toml file.
Please note that generating scenarios may take a minute or so. Once the scenario is generated, you can view it on the app and also download it as a Markdown file.
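For readers curious about what the enterprise-attack.json data file contains, here is a small illustrative sketch that lists threat actor groups straight from the STIX bundle with plain json parsing. This is not AttackGen's own code (the app uses the mitreattack package); the file path follows the ./data/ location mentioned above and the field names follow the STIX 2.x format.
import json

# Illustrative only: inspect the MITRE ATT&CK STIX bundle that AttackGen consumes.
with open("data/enterprise-attack.json") as f:
    bundle = json.load(f)

# Threat actor groups are represented as "intrusion-set" objects in STIX.
groups = [
    obj for obj in bundle["objects"]
    if obj.get("type") == "intrusion-set" and not obj.get("revoked", False)
]

for group in sorted(groups, key=lambda o: o["name"])[:10]:
    aliases = ", ".join(group.get("aliases", []))
    print(f"{group['name']} ({aliases})")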
I'm very happy to accept contributions to this project. Please feel free to submit an issue or pull request.
This project is licensed under GNU GPLv3.
This library was developed to replace insecure and clunky methods of filling modern C++ containers with random data, such as old PRNGs. rrgen uses the STL's distribution engines to efficiently and safely store a random number distribution into a given C++ container.
1) git clone https://github.com/josh0xA/rrgen.git
2) cd rrgen
3) make
4) Add include/rrgen.hpp to your project tree for access to the library classes and functions.
Documentation can be found in rrgen/docs/index.rst.
Supported containers:
1) std::vector<>
2) std::list<>
3) std::array<>
4) std::stack<>
#include "../include/rrgen.hpp"
#include <iostream>
int main(void)
{
// Example usage for rrgen vector
rrgen::rrand<float, std::vector, 10> rrvec;
rrvec.gen_rrvector(false, true, 0, 10);
for (auto &i : rrvec.contents())
{
std::cout << i << " ";
} // ^ the same as rrvec.show_contents()
// Example usage for rrgen list (frontside insertion)
rrgen::rrand<int, std::list, 10> rrlist;
rrlist.gen_rrlist(false, true, "fside", 5, 25);
std::cout << '\n'; rrlist.show_contents();
std::cout << "Size: " << rrlist.contents().size() << '\n';
// Example usage for rrgen array
rrgen::rrand_array<int, 5> rrarr;
rrarr.gen_rrarray(false, true, 5, 35);
for (auto &i : rrarr.contents())
{
std::cout << i << " ";
} // ^ the same as rrarr.show_contents()
// Example usage for rrgen stack
rrgen::rrand_stack<float, 10> rrstack;
rrstack.gen_rrstack(false, true, 200, 1000);
for (auto m = rrstack.xsize(); m > 0; m--)
{
std::cout << rrstack.grab_top() << " ";
rrstack.pop_off();
if (m == 1) { std::cout << '\n'; }
}
}
Note: This is a transferred repository, from a completely unrelated project.
Pentest Muse is an AI assistant tailored for cybersecurity professionals. It can help penetration testers brainstorm ideas, write payloads, analyze code, and perform reconnaissance. It can also take actions, execute command line codes, and iteratively solve complex tasks.
In addition to this command-line tool, we are excited to introduce the Pentest Muse Web Application! The web app has access to the latest online information, and would be a good AI assistant for your pentesting job.
This tool is intended for legal and ethical use only. It should only be used for authorized security testing and educational purposes. The developers assume no liability and are not responsible for any misuse or damage caused by this program.
Clone the repository and install the packages listed in requirements.txt:
git clone https://github.com/pentestmuse-ai/PentestMuse
cd PentestMuse
pip install -r requirements.txt
Install Pentest Muse as a Python Package:
pip install .
In the chat mode, you can chat with pentest muse and ask it to help you brainstorm ideas, write payloads, and analyze code. Run the application with:
python run_app.py
or
pmuse
You can also give Pentest Muse more control by asking it to take actions for you with the agent mode. In this mode, Pentest Muse can help you finish a simple task (e.g., 'help me do sql injection test on url xxx'). To start the program in agent mode, you can use:
python run_app.py agent
or
pmuse agent
You can use Pentest Muse with our managed APIs after signing up at www.pentestmuse.ai/signup. After creating an account, you can simply start the pentest muse cli, and the program will prompt you to login.
Alternatively, you can also choose to use your own OpenAI API keys. To do this, simply add the argument --openai-api-key=[your openai api key] when starting the program.
For any feedback or suggestions regarding Pentest Muse, feel free to reach out to us at contact@pentestmuse.ai or join our discord. Your input is invaluable in helping us improve and evolve.
skytrack is a command-line based plane spotting and aircraft OSINT reconnaissance tool made using Python. It can gather aircraft information using various data sources, generate a PDF report for a specified aircraft, and convert between ICAO and Tail Number designations. Whether you are a hobbyist plane spotter or an experienced aircraft analyst, skytrack can help you identify and enumerate aircraft for general purpose reconnaissance.
Planespotting is the art of tracking down and observing aircraft. While planespotting mostly consists of photography and videography of aircraft, aircraft information gathering and OSINT is a crucial step in the planespotting process. OSINT (Open Source Intelligence) describes a methodology of using publicly accessible data sources to obtain data about a specific subject, in this case planes!
To run skytrack on your machine, follow the steps below:
$ git clone https://github.com/ANG13T/skytrack
$ cd skytrack
$ pip install -r requirements.txt
$ python skytrack.py
skytrack works best with Python 3.
skytrack features three main functions for aircraft information gathering and display. They include the following:
skytrack obtains general information about the aircraft given its tail number or ICAO designator. The tool sources this information using several reliable data sets. Once the data is collected, it is displayed in the terminal within a table layout.
skytrack also enables you to save the collected aircraft information into a PDF. The PDF includes all the aircraft data in a visual layout for later reference. The PDF report will be entitled "skytrack_report.pdf".
There are two standard identification formats for specifying aircraft: Tail Number and ICAO address. The tail number (aka N-Number) is an alphanumerical ID starting with the letter "N" used to identify aircraft. The ICAO address is a six-character fixed-length ID in hexadecimal format. Both standards are highly pertinent for aircraft reconnaissance, as they can both be used to search for a specific aircraft in data sources. However, converting from one format to the other can be rather cumbersome, as it follows a tricky algorithm. To streamline this process, skytrack includes a standard converter.
ICAO addresses and Tail Numbers follow a mapping system like the following:
ICAO address | N-Number (Tail Number) |
---|---|
a00001 | N1 |
a00002 | N1A |
a00003 | N1AA |
You can learn more about aircraft registration numbers [here](https://www.faa.gov/licenses_certificates/aircraft_certification/aircraft_registry/special_nnumbers). Warning: the converter only works for USA-registered aircraft.
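As a rough illustration of why this conversion is non-trivial, the sketch below only validates that an ICAO hex address sits at or above the start of the US allocation (0xA00001, per the table above) and computes its numeric offset; the full offset-to-tail-number enumeration that skytrack implements is considerably more involved, and this snippet is not skytrack's converter.
# Illustrative sketch only, not skytrack's converter.
US_BLOCK_START = 0xA00001  # a00001 maps to N1 in the table above

def icao_offset(icao_hex):
    # Offset 0 corresponds to N1, 1 to N1A, 2 to N1AA, and so on,
    # following the FAA's enumeration order.
    value = int(icao_hex, 16)
    if value < US_BLOCK_START:
        raise ValueError("not a US-allocated ICAO address")
    return value - US_BLOCK_START

for code in ("a00001", "a00002", "a00003"):
    print(code, "-> offset", icao_offset(code))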
ICAO Aircraft Type Designators Listings
skytrack is open to any contributions. Please fork the repository and make a pull request with the features or fixes you want to implement.
If you enjoyed skytrack, please consider becoming a sponsor or donating on buymeacoffee in order to fund my future projects.
To check out my other works, visit my GitHub profile.
During the reconnaissance phase or when doing OSINT, we often use Google dorking and Shodan, and thus the idea of Dorkish was born.
Dorkish is a Chrome extension tool that facilitates custom dork creation for Google and Shodan using the builder and it offers prebuilt dorks for efficient reconnaissance and OSINT engagement.
1- Clone the repository
git clone https://github.com/yousseflahouifi/dorkish.git
2- Go to chrome://extensions/ and enable the Developer mode in the top right corner.
3- Click on the Load unpacked button and select the dorkish folder.
Note: For Firefox users, you can find the extension here: https://addons.mozilla.org/en-US/firefox/addon/dorkish/
Once you have found or built the dork you need, simply click it and click search. This will direct you to the desired search engine, Shodan or Google, with the specific dork you've entered. Then, you can explore and enjoy the results that match your query.
I have built some dorks myself and used some public resources to gather others; here are a few: - https://github.com/lothos612/shodan - https://github.com/TakSec/google-dorks-bug-bounty
SharpCovertTube is a program created to control Windows systems remotely by uploading videos to YouTube.
The program monitors a YouTube channel until a video is uploaded, decodes the QR code from the thumbnail of the uploaded video and executes a command. The QR codes in the videos can use cleartext or AES-encrypted values.
It has two versions, binary and service binary, and it includes a Python script to generate the malicious videos. Its purpose is to serve as a persistence method using only web requests to the Google API.
Run the listener in your Windows system:
It will check the YouTube channel at a specific interval (10 minutes by default) until a new video is uploaded. In this case, we upload "whoami.avi" from the folder example-videos:
After finding there is a new video in the channel, it decodes the QR code from the video thumbnail, executes the command and the response is base64-encoded and exfiltrated using DNS:
This works also for QR codes with AES-encrypted payloads and longer command responses. In this example, the file "dirtemp_aes.avi" from example-videos is uploaded and the content of c:\temp is exfiltrated using several DNS queries:
Logging to a file is optional, but you must check that the folder for that file exists in the system; the default value is "c:\temp\.sharpcoverttube.log". DNS exfiltration is also optional and can be tested using Burp's collaborator:
As an alternative, I created this repository with scripts to monitor and parse the base64-encoded DNS queries containing the command responses.
There are some values you can change, you can find them in Configuration.cs file for the regular binary and the service binary. Only the first two have to be updated:
You can generate the videos from Windows using Python3. For that, first install the dependencies:
pip install Pillow opencv-python pyqrcode pypng pycryptodome rebus
Then run the generate_video.py script:
python generate_video.py -t TYPE -f FILE -c COMMAND [-k AESKEY] [-i AESIV]
TYPE (-t) must be "qr" for payloads in cleartext or "qr_aes" if using AES encryption.
FILE (-f) is the path where the video is generated.
COMMAND (-c) is the command to execute in the system.
AESKEY (-k) is the key for AES encryption, only necessary if using the type "qr_aes". It must be a string of 16 characters and the same as in Program.cs file in SharpCovertTube.
AESIV (-i) is the IV for AES encryption, only necessary if using the type "qr_aes". It must be a string of 16 characters and the same as in Program.cs file in SharpCovertTube.
Generate a video with a QR value of "whoami" in cleartext in the path c:\temp\whoami.avi:
python generate_video.py -t qr -f c:\temp\whoami.avi -c whoami
Generate a video with an AES-encrypted QR value of "dir c:\windows\temp" with the key and IV "0000000000000000" in the path c:\temp\dirtemp_aes.avi:
python generate_video.py -t qr_aes -f c:\temp\dirtemp_aes.avi -c "dir c:\windows\temp" -k 0000000000000000 -i 0000000000000000
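For intuition, here is a stripped-down, hypothetical sketch of the encrypt-then-QR step using the pyqrcode, pypng and pycryptodome dependencies listed above. The cipher mode, padding and base64 encoding are assumptions for illustration; the real generate_video.py may use different parameters and also renders the QR image into a video file.
import base64
import pyqrcode
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad

def make_qr_payload(command, key=None, iv=None):
    # Cleartext payload, or AES-CBC + base64 when a key/IV pair is given.
    # Illustration only; SharpCovertTube's actual parameters may differ.
    if key and iv:
        cipher = AES.new(key.encode(), AES.MODE_CBC, iv.encode())
        ciphertext = cipher.encrypt(pad(command.encode(), AES.block_size))
        return base64.b64encode(ciphertext).decode()
    return command

payload = make_qr_payload("dir c:\\windows\\temp", "0" * 16, "0" * 16)
pyqrcode.create(payload).png("qr.png", scale=6)  # thumbnail-style QR image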
You can find the code to run it as a service in the SharpCovertTube_Service folder. It has the same functionalities except self-deletion, which would not make sense in this case.
It is possible to install it with InstallUtil; it is prepared to run as the SYSTEM user and you need to install it as administrator:
InstallUtil.exe SharpCovertTube_Service.exe
You can then start it with:
net start "SharpCovertTube Service"
In case you have administrative privileges this may be stealthier than the ordinary binary, but the "Description" and "DisplayName" should be updated (as you can see in the image above). If you do not have those privileges you can not install services so you can only use the ordinary binary.
The file must be 64-bit! This is due to the code used for QR decoding, which is borrowed from Stefan Gansevles's QR-Capture project, who borrowed part of it from Uzi Granot's QRCode project, who at the same time borrowed part of it from Zakhar Semenov's Camera_Net project (then I lost track). So thanks to all of them!
This project is a port from covert-tube, a project I developed in 2021 using just Python, which was inspired by Welivesecurity blogs about Casbaneiro and Numando malwares.
pip3 install swaggerhole
or clone this repository and install it locally:
git clone https://github.com/Liodeus/swaggerHole.git
pip3 install .
_____ _ __ ____ _ ____ _ ____ _ ___ _____
/ ___/| | /| / // __ `// __ `// __ `// _ \ / ___/
(__ ) | |/ |/ // /_/ // /_/ // /_/ // __// /
/____/ |__/|__/ \__,_/ \__, / \__, / \___//_/
__ __ __ /____/ /____/
/ / / /____ / /___
/ /_/ // __ \ / // _ \
/ __ // /_/ // // __/
/_/ /_/ \____//_/ \___/
usage: swaggerhole [-h] [-s SEARCH] [-o OUT] [-t THREADS] [-j] [-q] [-du] [-de]
optional arguments:
-h, --help show this help message and exit
-s SEARCH, --search SEARCH
Term to search
-o OUT, --out OUT Output directory
-t THREADS, --threads THREADS
Threads number (Default 25)
-j, --json Json ouput
-q, --quiet Remove banner
-du, --deactivate_url
Deactivate the URL filtering
-de, --deactivate_email
Deactivate the email filtering
swaggerHole -s test.com
echo test.com | swaggerHole
swaggerHole -s test.com --json
echo test.com | swaggerHole --json
swaggerHole -s test.com -t 100
echo test.com | swaggerHole -t 100
RepoReaper is a precision tool designed to automate the identification of exposed .git
repositories across a list of domains and subdomains. By processing a user-provided text file with domain names, RepoReaper systematically checks each for publicly accessible .git
files. This enables rapid assessment and protection against information leaks, making RepoReaper an essential resource for security teams and web developers.
Scans a list of domains and subdomains for publicly accessible .git repositories.
Clone the repository and install the required dependencies:
git clone https://github.com/YourUsername/RepoReaper.git
cd RepoReaper
pip install -r requirements.txt
chmod +x RepoReaper.py
RepoReaper is executed from the command line and will prompt for the path to a file containing a list of domains or subdomains to be scanned.
To start RepoReaper, simply run:
./RepoReaper.py
or
python3 RepoReaper.py
Upon execution, RepoReaper will ask for the path to the file containing the domains or subdomains: Enter the path of the file containing domains
Provide the path to your text file when prompted. The file should contain one domain or subdomain per line, like so:
example.com
subdomain.example.com
anotherdomain.com
RepoReaper will then proceed to scan the provided domains or subdomains for exposed .git repositories and report its findings.
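Conceptually, the check for an exposed repository boils down to probing a well-known .git path over HTTP for each domain in the input file. The sketch below is a simplified illustration under that assumption (using requests and the /.git/HEAD marker file); it is not RepoReaper's actual logic, and the domains.txt filename is just an example.
import requests

def git_exposed(domain, timeout=5):
    # A readable /.git/HEAD starting with "ref:" is a strong indicator
    # of an exposed repository. Simplified illustration only.
    for scheme in ("https", "http"):
        url = f"{scheme}://{domain}/.git/HEAD"
        try:
            response = requests.get(url, timeout=timeout, allow_redirects=False)
        except requests.RequestException:
            continue
        if response.status_code == 200 and response.text.strip().startswith("ref:"):
            return True
    return False

with open("domains.txt") as handle:
    for line in handle:
        domain = line.strip()
        if domain and git_exposed(domain):
            print(f"[+] Exposed .git found: {domain}")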
This tool is intended for educational purposes and security research only. The user assumes all responsibility for any damages or misuse resulting from its use.
AzSubEnum is a specialized subdomain enumeration tool tailored for Azure services. This tool is designed to meticulously search and identify subdomains associated with various Azure services. Through a combination of techniques and queries, AzSubEnum delves into the Azure domain structure, systematically probing and collecting subdomains related to a diverse range of Azure services.
AzSubEnum operates by leveraging DNS resolution techniques and systematic permutation methods to unveil subdomains associated with Azure services such as Azure App Services, Storage Accounts, Azure Databases (including MSSQL, Cosmos DB, and Redis), Key Vaults, CDN, Email, SharePoint, Azure Container Registry, and more. Its functionality extends to comprehensively scanning different Azure service domains to identify associated subdomains.
With this tool, users can conduct thorough subdomain enumeration within Azure environments, aiding security professionals, researchers, and administrators in gaining insights into the expansive landscape of Azure services and their corresponding subdomains.
During my learning journey on Azure AD exploitation, I discovered that the Azure subdomain tool, Invoke-EnumerateAzureSubDomains from NetSPI, was unable to run on my Debian PowerShell. Consequently, I created a crude implementation of that tool in Python.
python3 azsubenum.py --help
usage: azsubenum.py [-h] -b BASE [-v] [-t THREADS] [-p PERMUTATIONS]
Azure Subdomain Enumeration
options:
-h, --help show this help message and exit
-b BASE, --base BASE Base name to use
-v, --verbose Show verbose output
-t THREADS, --threads THREADS
Number of threads for concurrent execution
-p PERMUTATIONS, --permutations PERMUTATIONS
File containing permutations
Basic enumeration:
python3 azsubenum.py -b retailcorp --thread 10
Using permutation wordlists:
python3 azsubenum.py -b retailcorp --thread 10 --permutation permutations.txt
With verbose output:
python3 azsubenum.py -b retailcorp --thread 10 --permutation permutations.txt --verbose
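At its core, the approach combines the base name with permutation words, appends known Azure service suffixes, and keeps the candidates that resolve in DNS. The sketch below illustrates that idea with Python's socket resolver; the suffix list is a small assumed sample rather than AzSubEnum's full service coverage.
import socket

# Small assumed sample of Azure service suffixes for illustration;
# AzSubEnum probes many more services than shown here.
AZURE_SUFFIXES = [
    "azurewebsites.net",      # App Services
    "blob.core.windows.net",  # Storage accounts
    "database.windows.net",   # Azure SQL
    "vault.azure.net",        # Key Vault
]

def resolves(hostname):
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

def enumerate_subdomains(base, permutations):
    candidates = {base} | {f"{base}-{p}" for p in permutations} | {f"{p}{base}" for p in permutations}
    for name in sorted(candidates):
        for suffix in AZURE_SUFFIXES:
            fqdn = f"{name}.{suffix}"
            if resolves(fqdn):
                print(f"[+] {fqdn}")

enumerate_subdomains("retailcorp", ["dev", "prod", "test"])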
This repo contains the code for our USENIX Security '23 paper "ARGUS: A Framework for Staged Static Taint Analysis of GitHub Workflows and Actions". Argus is a comprehensive security analysis tool specifically designed for GitHub Actions. Built with an aim to enhance the security of CI/CD workflows, Argus utilizes taint-tracking techniques and an impact classifier to detect potential vulnerabilities in GitHub Action workflows.
Visit our website - secureci.org for more information.
Taint-Tracking: Argus uses sophisticated algorithms to track the flow of potentially untrusted data from specific sources to security-critical sinks within GitHub Actions workflows. This enables the identification of vulnerabilities that could lead to code injection attacks.
Impact Classifier: Argus classifies identified vulnerabilities into High, Medium, and Low severity classes, providing a clearer understanding of the potential impact of each identified vulnerability. This is crucial in prioritizing mitigation efforts.
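As a drastically simplified illustration of the class of issue Argus hunts for (not its staged taint analysis), the sketch below flags run steps in a workflow file that interpolate attacker-controllable github.event fields directly into the shell, a well-known injection pattern. The source list is an assumed sample and the check is purely textual.
import re
import sys
import yaml  # pip install pyyaml

# Assumed sample of attacker-controllable expression sources; Argus tracks
# far more sources and sinks than this toy textual check.
UNTRUSTED = re.compile(r"\$\{\{\s*github\.event\.(issue|pull_request|comment)\.[^}]*\}\}")

def check_workflow(path):
    with open(path) as handle:
        workflow = yaml.safe_load(handle)
    for job_name, job in (workflow.get("jobs") or {}).items():
        for step in job.get("steps", []):
            run = step.get("run")
            if run and UNTRUSTED.search(run):
                print(f"[!] {path}: job '{job_name}' interpolates untrusted input into a run step")

if __name__ == "__main__":
    check_workflow(sys.argv[1])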
This Python script provides a command line interface for interacting with GitHub repositories and GitHub actions.
python argus.py --mode [mode] --url [url] [--output-folder path_to_output] [--config path_to_config] [--verbose] [--branch branch_name] [--commit commit_hash] [--tag tag_name] [--action-path path_to_action] [--workflow-path path_to_workflow]
--mode: The mode of operation. Choose either 'repo' or 'action'. This parameter is required.
--url: The GitHub URL. Use USERNAME:TOKEN@URL for private repos. This parameter is required.
--output-folder: The output folder. The default value is '/tmp'. This parameter is optional.
--config: The config file. This parameter is optional.
--verbose: Verbose mode. If this option is provided, the logging level is set to DEBUG. Otherwise, it is set to INFO. This parameter is optional.
--branch: The branch name. You must provide exactly one of: --branch, --commit, --tag. This parameter is optional.
--commit: The commit hash. You must provide exactly one of: --branch, --commit, --tag. This parameter is optional.
--tag: The tag. You must provide exactly one of: --branch, --commit, --tag. This parameter is optional.
--action-path: The (relative) path to the action. You cannot provide --action-path in repo mode. This parameter is optional.
--workflow-path: The (relative) path to the workflow. You cannot provide --workflow-path in action mode. This parameter is optional.
To use this script to interact with a GitHub repo, you might run a command like the following:
python argus.py --mode repo --url https://github.com/username/repo.git --branch master
This would run the script in repo mode on the master branch of the specified repository.
Argus can be run inside a docker container. To do so, follow the steps:
After the run completes, the output is placed in the results folder.
You can view SARIF results either through an online viewer or with a Visual Studio Code (VSCode) extension.
Online Viewer: The SARIF Web Viewer is an online tool that allows you to visualize SARIF files. You can upload your SARIF file (argus_report.sarif
) directly to the website to view the results.
VSCode Extension: If you prefer to use VSCode, you can install the SARIF Viewer extension. After installing the extension, you can open your SARIF file (argus_report.sarif
) in VSCode. The results will appear in the SARIF Explorer pane, which provides a detailed and navigable view of the results.
Remember to handle the SARIF file with care, especially if it contains sensitive information from your codebase.
If there is an issue with needing GitHub authorization for running, you can provide username:TOKEN in the GITHUB_CREDS environment variable. This will be used for all the requests made to GitHub. Note: we do not store this information anywhere, nor create anything in the GitHub account; we only use this for cloning the repositories.
Argus is an open-source project, and we welcome contributions from the community. Whether it's reporting a bug, suggesting a feature, or writing code, your contributions are always appreciated!
If you use Argus in your research, please cite our paper:
@inproceedings{muralee2023Argus,
title={ARGUS: A Framework for Staged Static Taint Analysis of GitHub Workflows and Actions},
author={S. Muralee, I. Koishybayev, A. Nahapetyan, G. Tystahl, B. Reaves, A. Bianchi, W. Enck,
A. Kapravelos, A. Machiry},
booktitle={32nd USENIX Security Symposium (USENIX Security 23)},
year={2023},
}
To know more about our Attack Surface Management platform, check out NVADR.
Airgorah is a WiFi auditing software that can discover the clients connected to an access point, perform deauthentication attacks against specific clients or all the clients connected to it, capture WPA handshakes, and crack the password of the access point.
It is written in Rust and uses GTK4 for the graphical part. The software is mainly based on aircrack-ng tools suite.
Don't forget to put a star if you like the project!
This software only works on Linux and requires root privileges to run.
You will also need a wireless network card that supports monitor mode and packet injection.
The installation instructions are available here.
The documentation about the usage of the application is available here.
This project is released under MIT license.
If you have any questions about the usage of the application, do not hesitate to open a discussion.
If you want to report a bug or request a feature, do not hesitate to open an issue or submit a pull request.
Introducing Uscrapper 2.0, a powerful OSINT web scraper that allows users to extract various personal information from a website. It leverages web scraping techniques and regular expressions to extract email addresses, social media links, author names, geolocations, phone numbers, and usernames from both hyperlinked and non-hyperlinked sources on the webpage. It supports multithreading to make this process faster, is equipped with advanced anti-web-scraping bypassing modules, and supports web crawling to scrape from various sublinks within the same domain. The tool also provides an option to generate a report containing the extracted details.
Uscrapper extracts the following details from the provided website:
Uscrapper 2.0:
git clone https://github.com/z0m31en7/Uscrapper.git
cd Uscrapper/install/
chmod +x ./install.sh && ./install.sh #For Unix/Linux systems
To run Uscrapper, use the following command-line syntax:
python Uscrapper-v2.0.py [-h] [-u URL] [-c (INT)] [-t THREADS] [-O] [-ns]
Arguments:
Uscrapper relies on web scraping techniques to extract information from websites. Make sure to use it responsibly and in compliance with the website's terms of service and applicable laws.
The accuracy and completeness of the extracted details depend on the structure and content of the website being analyzed.
To bypass some anti-web-scraping methods we have used Selenium, which can make the overall process slower.
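To illustrate the general extraction approach (fetch the page, then run regular expressions over the HTML), here is a small sketch using requests and re. The patterns are simplified examples, not the exact expressions Uscrapper ships with, and the target URL is a placeholder.
import re
import requests

# Simplified example patterns; Uscrapper's own expressions are more thorough.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SOCIAL_RE = re.compile(r"https?://(?:www\.)?(?:twitter|facebook|linkedin|instagram)\.com/[\w./-]+")

def scrape(url):
    html = requests.get(url, timeout=10).text
    return {
        "emails": sorted(set(EMAIL_RE.findall(html))),
        "social_links": sorted(set(SOCIAL_RE.findall(html))),
    }

if __name__ == "__main__":
    for key, values in scrape("https://example.com").items():
        print(key, values)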
WebCopilot is an automation tool designed to enumerate subdomains of the target and detect bugs using different open-source tools.
The script first enumerates all the subdomains of the given target domain using assetfinder, sublister, subfinder, amass, findomain, hackertarget, riddler and crt, then performs active subdomain enumeration using gobuster with a SecLists wordlist. It then filters out the live subdomains using dnsx, extracts titles of the subdomains using httpx, and scans for subdomain takeover using subjack. Next it uses gauplus & waybackurls to crawl all the endpoints of the given subdomains and applies gf patterns to filter out xss, lfi, ssrf, sqli, open redirect & rce parameters, then scans for vulnerabilities on the subdomains using different open-source tools (like kxss, dalfox, openredirex, nuclei, etc.). Finally, it prints the results of the scan and saves all the output in a specified directory.
g!2m0:~ webcopilot -h
[WebCopilot ASCII art banner]
@h4r5h1t.hrs | G!2m0
Usage:
webcopilot -d <target>
webcopilot -d <target> -s
webcopilot [-d target] [-o output destination] [-t threads] [-b blind server URL] [-x exclude domains]
Flags:
-d Add your target [Requried]
-o To save outputs in folder [Default: domain.com]
-t Number of threads [Default: 100]
-b Add your server for BXSS [Default: False]
-x Exclude out of scope domains [Default: False]
-s Run only Subdomain Enumeration [Default: False]
-h Show this help message
Example: webcopilot -d domain.com -o domain -t 333 -x exclude.txt -b testServer.xss
Use https://xsshunter.com/ or https://interact.projectdiscovery.io/ to get your server
WebCopilot requires git to install successfully. Run the following command as root to install webcopilot:
git clone https://github.com/h4r5h1t/webcopilot && cd webcopilot/ && chmod +x webcopilot install.sh && mv webcopilot /usr/bin/ && ./install.sh
SubFinder β’ Sublist3r β’ Findomain β’ gf β’ OpenRedireX β’ dnsx β’ sqlmap β’ gobuster β’ assetfinder β’ httpx β’ kxss β’ qsreplace β’ Nuclei β’ dalfox β’ anew β’ jq β’ aquatone β’ urldedupe β’ Amass β’ gauplus β’ waybackurls β’ crlfuzz
To run the tool on a target, just use the following command.
g!2m0:~ webcopilot -d bugcrowd.com
The -o command can be used to specify an output dir.
g!2m0:~ webcopilot -d bugcrowd.com -o bugcrowd
The -s command can be used to run only subdomain enumeration (active + passive, and also get titles & screenshots).
g!2m0:~ webcopilot -d bugcrowd.com -o bugcrowd -s
The -t command can be used to add threads to your scan for faster results.
g!2m0:~ webcopilot -d bugcrowd.com -o bugcrowd -t 333
The -b command can be used for blind XSS (OOB); you can get your server from xsshunter or interact.
g!2m0:~ webcopilot -d bugcrowd.com -o bugcrowd -t 333 -b testServer.xss
The -x command can be used to exclude out of scope domains.
g!2m0:~ echo out.bugcrowd.com > excludeDomain.txt
g!2m0:~ webcopilot -d bugcrowd.com -o bugcrowd -t 333 -x excludeDomain.txt -b testServer.xss
Default options look like this:
g!2m0:~ webcopilot -d bugcrowd.com -o bugcrowd
[WebCopilot ASCII art banner]
@h4r5h1t.hrs | G!2m0
[β] Warning: Use with caution. You are responsible for your own actions.
[β] Developers assume no liability and are not responsible for any misuse or damage cause by this tool.
Target: bugcrowd.com
Output: /home/gizmo/targets/bugcrowd
Threads: 100
Server: False
Exclude: False
Mode: Running all Enumeration
Time: 30-08-2021 15:10:00
[!] Please wait while scanning...
[β] Subdoamin Scanning is in progress: Scanning subdomains of bugcrowd.com
[β] Subdoamin Scanned - [assetfinderβ] Subdomain Found: 34
[β] Subdoamin Scanned - [sublist3rβ] Subdomain Found: 29
[β] Subdoamin Scanned - [subfinderβ] Subdomain Found: 54
[β] Subdoamin Scanned - [amassβ] Subdomain Found: 43
[β] Subdoamin Scanned - [findomainβ] Subdomain Found: 27
[β] Active Subdoamin Scanning is in progress:
[!] Please be patient. This may take a while...
[β] Active Subdoamin Scanned - [gobusterβ] Subdomain Found: 11
[β] Active Subdoamin Scanned - [amassβ] Subdomain Found: 0
[β] Subdomain Scanning: Filtering out of scope subdomains
[β] Subdomain Scanning: Filtering Alive subdomains
[β] Subdomain Scanning: Getting titles of valid subdomains
[β] Visual inspection of Subdoamins is completed. Check: /subdomains/aquatone/
[β] Scanning Completed for Subdomains of bugcrowd.com Total: 43 | Alive: 30
[β] Endpoints Scanning Completed for Subdomains of bugcrowd.com Total: 11032
[β] Vulnerabilities Scanning is in progress: Getting all vulnerabilities of bugcrowd.com
[β] Vulnerabilities Scanned - [XSSβ] Found: 0
[β] Vulnerabilities Scanned - [SQLiβ] Found: 0
[β] Vulnerabilities Scanned - [LFIβ] Found: 0
[β] Vulnerabilities Scanned - [CRLFβ] Found: 0
[β] Vulnerabilities Scanned - [SSRFβ] Found: 0
[β] Vulnerabilities Scanned - [Sensitive Dataβ] Found: 0
[β] Vulnerabilities Scanned - [Open redirectβ] Found: 0
[β] Vulnerabilities Scanned - [Subdomain Takeoverβ] Found: 0
[β] Vulnerabilities Scanned - [Nuclieβ] Found: 0
[β] Vulnerabilities Scanning Completed for Subdomains of bugcrowd.com Check: /vulnerabilities/
[+] Subdomains of bugcrowd.com
[+] Subdomains Found: 0
[+] Subdomains Alive: 0
[+] Endpoints: 11032
[+] XSS: 0
[+] SQLi: 0
[+] Open Redirect: 0
[+] SSRF: 0
[+] CRLF: 0
[+] LFI: 0
[+] Sensitive Data: 0
[+] Subdomain Takeover: 0
[+] Nuclei: 0
WebCopilot is inspired by Garud & Pinaak by ROX4R.
@aboul3la @tomnomnom @lc @hahwul @projectdiscovery @maurosoria @shelld3v @devanshbatham @michenriksen @defparam @projectdiscovery @bp0lr @ameenmaali @sqlmapproject @dwisiswant0 @OWASP @OJ @Findomain @danielmiessler @1ndianl33t @ROX4R
Warning: Developers assume no liability and are not responsible for any misuse or damage caused by this tool. So please use it with caution, because you are responsible for your own actions.
Legba is a multiprotocol credentials bruteforcer / password sprayer and enumerator built with Rust and the Tokio asynchronous runtime in order to achieve better performance and stability while consuming fewer resources than similar tools (see the benchmark below).
For the building instructions, usage and the complete list of options check the project Wiki.
AMQP (ActiveMQ, RabbitMQ, Qpid, JORAM and Solace), Cassandra/ScyllaDB, DNS subdomain enumeration, FTP, HTTP (basic authentication, NTLMv1, NTLMv2, multipart form, custom requests with CSRF support, files/folders enumeration, virtual host enumeration), IMAP, Kerberos pre-authentication and user enumeration, LDAP, MongoDB, MQTT, Microsoft SQL, MySQL, Oracle, PostgreSQL, POP3, RDP, Redis, SSH / SFTP, SMTP, STOMP (ActiveMQ, RabbitMQ, HornetQ and OpenMQ), TCP port scanning, Telnet, VNC.
Here's a benchmark of legba versus thc-hydra running some common plugins, both targeting the same test servers on localhost. The benchmark has been executed on a macOS laptop with an M1 Max CPU, using a wordlist of 1000 passwords with the correct one being on the last line. Legba was compiled in release mode, Hydra compiled and installed via brew formula.
Far from being an exhaustive benchmark (some legba features are simply not supported by hydra, such as CSRF token grabbing), this table still gives a clear idea of how using an asynchronous runtime can drastically improve performances.
Test Name | Hydra Tasks | Hydra Time | Legba Tasks | Legba Time |
---|---|---|---|---|
HTTP basic auth | 16 | 7.100s | 10 | 1.560s (4.5x faster) |
HTTP POST login (wordpress) | 16 | 14.854s | 10 | 5.045s (2.9x faster) |
SSH | 16 | 7m29.85s * | 10 | 8.150s (55.1x faster) |
MySQL | 4 ** | 9.819s | 4 ** | 2.542s (3.8x faster) |
Microsoft SQL | 16 | 7.609s | 10 | 4.789s (1.5x faster) |
* This result would suggest a default delay between connection attempts used by Hydra; I've tried to study the source code to find such a delay, but to my knowledge there's none. For some reason it's simply very slow.
** For MySQL hydra automatically reduces the amount of tasks to 4, therefore legba's concurrency level has been adjusted to 4 as well.
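The gap largely comes down to scheduling: an asynchronous runtime keeps a fixed number of attempts in flight without one slow connection stalling the rest. The toy Python asyncio sketch below illustrates only that scheduling idea; it is not Legba's Rust/Tokio code and it performs no real network attempts.
import asyncio

async def try_password(password):
    # Stand-in for a real network attempt (e.g. an SSH or HTTP login).
    await asyncio.sleep(0.01)
    return password == "correct-horse"

async def spray(wordlist, concurrency=10):
    semaphore = asyncio.Semaphore(concurrency)  # cap in-flight attempts, akin to a task count

    async def worker(password):
        async with semaphore:
            if await try_password(password):
                print(f"[+] valid credential found: {password}")

    await asyncio.gather(*(worker(password) for password in wordlist))

asyncio.run(spray([f"pw{i}" for i in range(999)] + ["correct-horse"]))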
Legba is released under the GPL 3 license. To see the licenses of the project dependencies, install cargo-license with cargo install cargo-license and then run cargo license.
APIDetector is a powerful and efficient tool designed for testing exposed Swagger endpoints in various subdomains with unique smart capabilities to detect false-positives. It's particularly useful for security professionals and developers who are engaged in API testing and vulnerability scanning.
Before running APIDetector, ensure you have Python 3.x and pip installed on your system. You can download Python here.
Clone the APIDetector repository to your local machine using:
git clone https://github.com/brinhosa/apidetector.git
cd apidetector
pip install requests
Run APIDetector using the command line. Here are some usage examples:
Common usage, scan with 30 threads a list of subdomains using a Chrome user-agent and save the results in a file:
python apidetector.py -i list_of_company_subdomains.txt -o results_file.txt -t 30 -ua "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
To scan a single domain:
python apidetector.py -d example.com
To scan multiple domains from a file:
python apidetector.py -i input_file.txt
To specify an output file:
python apidetector.py -i input_file.txt -o output_file.txt
To use a specific number of threads:
python apidetector.py -i input_file.txt -t 20
To scan with both HTTP and HTTPS protocols:
python apidetector.py -m -d example.com
To run the script in quiet mode (suppress verbose output):
python apidetector.py -q -d example.com
To run the script with a custom user-agent:
python apidetector.py -d example.com -ua "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
-d, --domain: Single domain to test.
-i, --input: Input file containing subdomains to test.
-o, --output: Output file to write valid URLs to.
-t, --threads: Number of threads to use for scanning (default is 10).
-m, --mixed-mode: Test both HTTP and HTTPS protocols.
-q, --quiet: Disable verbose output (default mode is verbose).
-ua, --user-agent: Custom User-Agent string for requests.
Exposing Swagger or OpenAPI documentation endpoints can present various risks, primarily related to information disclosure. Here's an ordered list, based on potential risk levels, of the endpoints APIDetector scans, with similar endpoints grouped together:
- `/swagger-ui.html`, `/swagger-ui/`, `/swagger-ui/index.html`, `/api/swagger-ui.html`, `/documentation/swagger-ui.html`, `/swagger/index.html`, `/api/docs`, `/docs`, `/api/swagger-ui`, `/documentation/swagger-ui`
- `/openapi.json`, `/swagger.json`, `/api/swagger.json`, `/swagger.yaml`, `/swagger.yml`, `/api/swagger.yaml`, `/api/swagger.yml`, `/api.json`, `/api.yaml`, `/api.yml`, `/documentation/swagger.json`, `/documentation/swagger.yaml`, `/documentation/swagger.yml`
- `/v2/api-docs`, `/v3/api-docs`, `/api/v2/swagger.json`, `/api/v3/swagger.json`, `/api/v1/documentation`, `/api/v2/documentation`, `/api/v3/documentation`, `/api/v1/api-docs`, `/api/v2/api-docs`, `/api/v3/api-docs`, `/swagger/v2/api-docs`, `/swagger/v3/api-docs`, `/swagger-ui.html/v2/api-docs`, `/swagger-ui.html/v3/api-docs`, `/api/swagger/v2/api-docs`, `/api/swagger/v3/api-docs`
- `/swagger-resources`, `/swagger-resources/configuration/ui`, `/swagger-resources/configuration/security`, `/api/swagger-resources`, `/api.html`
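As a rough illustration of the probing technique (not APIDetector's actual implementation), the following Python sketch requests a handful of these paths over HTTPS and HTTP with a browser User-Agent and only reports responses that look like real Swagger/OpenAPI content. The trimmed path list, target subdomain, and marker check are simplified assumptions.

```python
import requests
import urllib3

urllib3.disable_warnings()  # we deliberately skip TLS verification below

# Hypothetical, trimmed-down path list; APIDetector scans the full set above.
COMMON_PATHS = ["/swagger-ui.html", "/openapi.json", "/v2/api-docs", "/v3/api-docs"]
UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def probe(subdomain, schemes=("https", "http")):
    hits = []
    for scheme in schemes:
        for path in COMMON_PATHS:
            url = f"{scheme}://{subdomain}{path}"
            try:
                r = requests.get(url, headers={"User-Agent": UA}, timeout=5, verify=False)
            except requests.RequestException:
                continue
            # A 200 alone is a weak signal; requiring a Swagger/OpenAPI marker
            # in the body helps filter false positives.
            if r.status_code == 200 and ("swagger" in r.text.lower() or "openapi" in r.text.lower()):
                hits.append(url)
    return hits


print(probe("api.example.com"))  # hypothetical target
```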
Contributions to APIDetector are welcome! Feel free to fork the repository, make changes, and submit pull requests.
The use of APIDetector should be limited to testing and educational purposes only. The developers of APIDetector assume no liability and are not responsible for any misuse or damage caused by this tool. It is the end user's responsibility to obey all applicable local, state, and federal laws. Developers assume no responsibility for unauthorized or illegal use of this tool. Before using APIDetector, ensure you have permission to test the network or systems you intend to scan.
This project is licensed under the MIT License.
CloakQuest3r is a powerful Python tool meticulously crafted to uncover the true IP address of websites safeguarded by Cloudflare, a widely adopted web security and performance enhancement service. Its core mission is to accurately discern the actual IP address of web servers that are concealed behind Cloudflare's protective shield. Subdomain scanning is employed as a key technique in this pursuit. This tool is an invaluable resource for penetration testers, security professionals, and web administrators seeking to perform comprehensive security assessments and identify vulnerabilities that may be obscured by Cloudflare's security measures.
Key Features:
Real IP Detection: CloakQuest3r excels in the art of discovering the real IP address of web servers employing Cloudflare's services. This crucial information is paramount for conducting comprehensive penetration tests and ensuring the security of web assets.
Subdomain Scanning: Subdomain scanning is harnessed as a fundamental component in the process of finding the real IP address. It aids in the identification of the actual server responsible for hosting the website and its associated subdomains.
Threaded Scanning: To enhance efficiency and expedite the real IP detection process, CloakQuest3r utilizes threading. This feature enables scanning of a substantial list of subdomains without significantly extending the execution time.
Detailed Reporting: The tool provides comprehensive output, including the total number of subdomains scanned, the total number of subdomains found, and the time taken for the scan. Any real IP addresses unveiled during the process are also presented, facilitating in-depth analysis and penetration testing.
With CloakQuest3r, you can confidently evaluate website security, unveil hidden vulnerabilities, and secure your web assets by disclosing the true IP address concealed behind Cloudflare's protective layers.
- Still in the development phase; sometimes it can't detect the real IP.
- CloakQuest3r combines multiple indicators to uncover real IP addresses behind Cloudflare. While subdomain scanning is a part of the process, we do not assume that all subdomains' A records point to the target host. The tool is designed to provide valuable insights but may not work in every scenario. We welcome any specific suggestions for improvement.
1. False Negatives: CloakQuest3r may not always accurately identify the real IP address behind Cloudflare, particularly for websites with complex network configurations or strict security measures.
2. Dynamic Environments: Websites' infrastructure and configurations can change over time. The tool may not capture these changes, potentially leading to outdated information.
3. Subdomain Variation: While the tool scans subdomains, it doesn't guarantee that all subdomains' A records will point to the primary host. Some subdomains may also be protected by Cloudflare.
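For a sense of the core idea, here is a minimal Python sketch (not CloakQuest3r's actual code): resolve candidate subdomains from a small wordlist and flag any address that falls outside Cloudflare's published IPv4 ranges as a possible origin IP. The target domain and wordlist are placeholders, and the heuristic inherits the limitations listed above.

```python
import ipaddress
import socket

import requests

# Cloudflare publishes its IPv4 ranges at this public endpoint.
CF_NETWORKS = [
    ipaddress.ip_network(line)
    for line in requests.get("https://www.cloudflare.com/ips-v4", timeout=10).text.split()
]


def behind_cloudflare(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CF_NETWORKS)


def scan(domain: str, wordlist: list[str]) -> None:
    # Resolve each candidate subdomain; addresses outside Cloudflare's ranges
    # are only *possible* origin IPs (a heuristic, not a guarantee).
    for word in wordlist:
        host = f"{word}.{domain}"
        try:
            ip = socket.gethostbyname(host)
        except socket.gaierror:
            continue
        if not behind_cloudflare(ip):
            print(f"[+] possible origin: {host} -> {ip}")


scan("example.com", ["www", "mail", "dev", "staging"])  # placeholder wordlist
```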
How to Use:
Run CloakQuest3r with a single command-line argument: the target domain you want to analyze.
git clone https://github.com/spyboy-productions/CloakQuest3r.git
cd CloakQuest3r
pip3 install -r requirements.txt
python cloakquest3r.py example.com
The tool will check if the website is using Cloudflare. If not, it will inform you that subdomain scanning is unnecessary.
If Cloudflare is detected, CloakQuest3r will scan for subdomains and identify their real IP addresses.
You will receive detailed output, including the number of subdomains scanned, the total number of subdomains found, and the time taken for the scan.
Any real IP addresses found will be displayed, allowing you to conduct further analysis and penetration testing.
CloakQuest3r simplifies the process of assessing website security by providing a clear, organized, and informative report. Use it to enhance your security assessments, identify potential vulnerabilities, and secure your web assets.
Run it online on replit.com : https://replit.com/@spyb0y/CloakQuest3r
Porch Pirate started as a tool to quickly uncover Postman secrets, and has slowly begun to evolve into a multi-purpose reconnaissance / OSINT framework for Postman. While existing tools are great proof of concepts, they only attempt to identify very specific keywords as "secrets", and in very limited locations, with no consideration to recon beyond secrets. We realized we required capabilities that were "secret-agnostic", and had enough flexibility to capture false-positives that still provided offensive value.
Porch Pirate enumerates and presents sensitive results (global secrets, unique headers, endpoints, query parameters, authorization, etc), from publicly accessible Postman entities, such as:
python3 -m pip install porch-pirate
The Porch Pirate client can be used to nearly fully conduct reviews on public Postman entities in a quick and simple fashion. There are intended workflows and particular keywords to be used that can typically maximize results. These methodologies can be located on our blog: Plundering Postman with Porch Pirate.
Porch Pirate supports the following arguments to be performed on collections, workspaces, or users.
- `--globals`
- `--collections`
- `--requests`
- `--urls`
- `--dump`
- `--raw`
- `--curl`
porch-pirate -s "coca-cola.com"
By default, Porch Pirate will display globals from all active and inactive environments if they are defined in the workspace. Provide a -w
argument with the workspace ID (found by performing a simple search, or automatic search dump) to extract the workspace's globals, along with other information.
porch-pirate -w abd6bded-ac31-4dd5-87d6-aa4a399071b8
When an interesting result has been found with a simple search, we can provide the workspace ID to the -w
argument with the --dump
command to begin extracting information from the workspace and its collections.
porch-pirate -w abd6bded-ac31-4dd5-87d6-aa4a399071b8 --dump
Porch Pirate can be supplied a simple search term, following the --globals
argument. Porch Pirate will dump all relevant workspaces tied to the results discovered in the simple search, but only if there are globals defined. This is particularly useful for quickly identifying potentially interesting workspaces to dig into further.
porch-pirate -s "shopify" --globals
Porch Pirate can be supplied a simple search term, following the --dump
argument. Porch Pirate will dump all relevant workspaces and collections tied to the results discovered in the simple search. This is particularly useful for quickly sifting through potentially interesting results.
porch-pirate -s "coca-cola.com" --dump
A particularly useful way to use Porch Pirate is to extract all URLs from a workspace and export them to another tool for fuzzing.
porch-pirate -w abd6bded-ac31-4dd5-87d6-aa4a399071b8 --urls
Porch Pirate will recursively extract all URLs from workspaces and their collections related to a simple search term.
porch-pirate -s "coca-cola.com" --urls
porch-pirate -w abd6bded-ac31-4dd5-87d6-aa4a399071b8 --collections
porch-pirate -w abd6bded-ac31-4dd5-87d6-aa4a399071b8 --requests
porch-pirate -w abd6bded-ac31-4dd5-87d6-aa4a399071b8 --raw
porch-pirate -w WORKSPACE_ID
porch-pirate -c COLLECTION_ID
porch-pirate -r REQUEST_ID
porch-pirate -u USERNAME/TEAMNAME
Porch Pirate can build curl requests when provided with a request ID for easier testing.
porch-pirate -r 11055256-b1529390-18d2-4dce-812f-ee4d33bffd38 --curl
porch-pirate -s coca-cola.com --proxy 127.0.0.1:8080
from porchpirate import porchpirate

p = porchpirate()
print(p.search('coca-cola.com'))
p = porchpirate()
print(p.collections('4127fdda-08be-4f34-af0e-a8bdc06efaba'))
import json

p = porchpirate()
collections = json.loads(p.collections('4127fdda-08be-4f34-af0e-a8bdc06efaba'))
for collection in collections['data']:
requests = collection['requests']
for r in requests:
request_data = p.request(r['id'])
print(request_data)
p = porchpirate()
print(p.workspace_globals('4127fdda-08be-4f34-af0e-a8bdc06efaba'))
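Building on the snippets above, the following sketch chains a simple search into per-workspace globals extraction, similar in spirit to the recursive_globals_from_search.py example. It assumes search() returns the same kind of JSON envelope with a 'data' list of entries carrying an 'id' field, which is an assumption; adjust to the actual schema if it differs.

```python
import json

from porchpirate import porchpirate

p = porchpirate()

# Assumption: search() returns a JSON string with a 'data' list whose entries
# carry an 'id' field, like collections() does; adjust to the actual schema.
results = json.loads(p.search('coca-cola.com'))
for workspace in results.get('data', []):
    workspace_id = workspace.get('id')
    if workspace_id:
        print(p.workspace_globals(workspace_id))
```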
Other library usage examples can be located in the examples
directory, which contains the following examples:
dump_workspace.py
format_search_results.py
format_workspace_collections.py
format_workspace_globals.py
get_collection.py
get_collections.py
get_profile.py
get_request.py
get_statistics.py
get_team.py
get_user.py
get_workspace.py
recursive_globals_from_search.py
request_to_curl.py
search.py
search_by_page.py
workspace_collections.py
OSINT framework focused on gathering information from free tools or resources. The intention is to help people find free OSINT resources. Some of the sites included might require registration or offer more data for $$$, but you should be able to get at least a portion of the available information for no cost.
I originally created this framework with an information security point of view. Since then, the response from other fields and disciplines has been incredible. I would love to be able to include any other OSINT resources, especially from fields outside of infosec. Please let me know about anything that might be missing!
Please visit the framework at the link below and good hunting!
(T) - Indicates a link to a tool that must be installed and run locally
(D) - Google Dork, for more information: Google Hacking
(R) - Requires registration
(M) - Indicates a URL that contains the search term and the URL itself must be edited manually
Follow me on Twitter: @jnordine - https://twitter.com/jnordine
Watch or star the project on Github: https://github.com/lockfale/osint-framework
Feedback or new tool suggestions are extremely welcome! Please feel free to submit a pull request or open an issue on github or reach out on Twitter.
For new resources, please ensure that the site is available for public and free use.
Thank you!
Happy Hunting!
Goblob is a lightweight and fast enumeration tool designed to aid in the discovery of sensitive information exposed publicly in Azure blobs, which can be useful for various research purposes such as vulnerability assessments, penetration testing, and reconnaissance.
Warning. Goblob will issue individual goroutines for each container name to check in each storage account, only limited by the maximum number of concurrent goroutines specified in the -goroutines
flag. This implementation can exhaust bandwidth pretty quickly in most cases with the default wordlist, or potentially cost you a lot of money if you're using the tool in a cloud environment. Make sure you understand what you are doing before running the tool.
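For context, the check performed for each account/container pair boils down to an anonymous Azure Blob "List Blobs" request: a public container answers with HTTP 200. Below is a minimal Python sketch of that idea (Goblob itself is written in Go and additionally handles wordlists, pagination, concurrency, and output); the account and container names are placeholders.

```python
import requests


def is_public_container(account: str, container: str) -> bool:
    # Anonymous "List Blobs" request; a public container answers with HTTP 200,
    # while private or non-existent containers typically return 403/404.
    url = f"https://{account}.blob.core.windows.net/{container}?restype=container&comp=list"
    try:
        return requests.get(url, timeout=10).status_code == 200
    except requests.RequestException:
        return False


# Placeholder account and container names for illustration.
for name in ["backup", "public", "data"]:
    if is_public_container("targetaccount", name):
        print(f"[+] public container: {name}")
```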
go install github.com/Macmod/goblob@latest
To use goblob simply run the following command:
$ ./goblob <storageaccountname>
Where <storageaccountname>
is the target storage account to enumerate public Azure blob storage URLs on.
You can also specify a list of storage account names to check:
$ ./goblob -accounts accounts.txt
By default, the tool will use a list of common Azure Blob Storage container names to construct potential URLs. However, you can also specify a custom list of container names using the -containers
option. For example:
$ ./goblob -accounts accounts.txt -containers wordlists/goblob-folder-names.txt
The tool also supports outputting the results to a file using the -output
option:
$ ./goblob -accounts accounts.txt -containers wordlists/goblob-folder-names.txt -output results.txt
If you want to provide accounts to test via stdin
you can also omit -accounts
(or the account name) entirely:
$ cat accounts.txt | ./goblob
Goblob comes bundled with basic wordlists that can be used with the -containers
option:
Goblob provides several flags that can be tuned in order to improve the enumeration process:
- `-goroutines=N` - Maximum number of concurrent goroutines to allow (default: `5000`).
- `-blobs=true` - Report the URL of each blob instead of the URL of the containers (default: `false`).
- `-verbose=N` - Set verbosity level (default: `1`, min: `0`, max: `3`).
- `-maxpages=N` - Maximum of container pages to traverse looking for blobs (default: `20`, set to `-1` to disable limit or to `0` to avoid listing blobs at all and just check if the container is public).
- `-timeout=N` - Timeout for HTTP requests (seconds, default: `90`).
- `-maxidleconns=N` - `MaxIdleConns` transport parameter for HTTP client (default: `100`).
- `-maxidleconnsperhost=N` - `MaxIdleConnsPerHost` transport parameter for HTTP client (default: `10`).
- `-maxconnsperhost=N` - `MaxConnsPerHost` transport parameter for HTTP client (default: `0`).
- `-skipssl=true` - Skip SSL verification (default: `false`).
- `-invertsearch=true` - Enumerate accounts for each container instead of containers for each account (default: `false`).

For instance, if you just want to find publicly exposed containers using large lists of storage accounts and container names, you should use `-maxpages=0` to prevent the goroutines from paginating the results. Then run it again on the set of results you found with `-blobs=true` and `-maxpages=-1` to actually get the URLs of the blobs.

If, on the other hand, you want to test a small list of very popular container names against a large set of storage accounts, you might want to try `-invertsearch=true` with `-maxpages=0`, in order to see the public accounts for each container name instead of the container names for each storage account.

You may also want to try changing `-goroutines`, `-timeout`, `-maxidleconns`, `-maxidleconnsperhost`, `-maxconnsperhost` and `-skipssl` in order to best use your bandwidth and find results faster.
Experiment with the flags to find what works best for you ;-)
Contributions are welcome by opening an issue or by submitting a pull request.
An interesting visualization of popular container names found in my experiments with the tool:
If you want to know more about my experiments and the subject in general, take a look at my article:
During the reconnaissance phase, an attacker searches for any information about their target to create a profile that will later help them identify possible ways to get into an organization.
CloudPulse is a powerful tool that simplifies and enhances the analysis of SSL certificate data. It leverages the extensive repository of SSL certificates obtained from the AWS EC2 machines available at Trickest Cloud. With CloudPulse, security researchers can efficiently explore SSL certificate details, uncover potential vulnerabilities, and gather valuable insights for a variety of security-related tasks.
Simplifies security assessments with a user-friendly interface. It allows you to effortlessly find a company's assets on the AWS cloud:
1- Download CloudPulse :
git clone https://github.com/yousseflahouifi/CloudPulse
cd CloudPulse/
2- Run docker compose :
docker-compose up -d
3- Run script.py script
docker-compose exec web python script.py
4 - Now go to http://localhost:8000/search and enjoy the search engine
1- download CloudPulse :
git clone https://github.com/yousseflahouifi/CloudPulse
cd CloudPulse/
2- Setup virtual environment :
python3 -m venv myenv
source myenv/bin/activate
3- Install requirements.txt file :
pip install -r requirements.txt
4- run an instance of elasticsearch using docker :
docker run -d --name elasticsearch -p 9200:9200 -e "discovery.type=single-node" elasticsearch:6.6.1
5- update script.py and settings file to the host 'localhost':
#script.py
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
#se/settings.py
ELASTICSEARCH_DSL = {
'default': {
'hosts': 'localhost:9200'
},
}
6- Run script.py to index data in elasticsearch:
python script.py
7- Run the app:
python manage.py runserver 0:8000
Included in the CloudPulse repository is a sample data.csv file containing close to 4,000 records, which provides a glimpse of the tool's capabilities. For the full dataset, visit the Trickest Cloud repository, clone the data, and update the data.csv file (it contains close to 9 million records).
As an example, searching for .mil data gives:
Searching for tesla as an example gives:
CloudPulse heavily depends on the data.csv file, which is a sample dataset extracted from the larger collection maintained by Trickest. While the sample dataset provides valuable insights, the tool's full potential is realized when used in conjunction with the complete dataset, which is accessible in the Trickest repository here.
Users are encouraged to refer to the Trickest dataset for a more comprehensive and up-to-date analysis.
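If you prefer to query the indexed data directly instead of going through the web interface, a sketch along these lines works against the same local Elasticsearch instance. The index wildcard and result fields below are assumptions; check script.py for the index name and mapping CloudPulse actually creates from data.csv.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# The index wildcard and field layout are assumptions; check script.py for
# what CloudPulse actually indexes from data.csv.
response = es.search(index='_all', body={
    'query': {'query_string': {'query': 'tesla'}},
    'size': 10,
})
for hit in response['hits']['hits']:
    print(hit['_source'])
```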
Cross-language email validation. Backed by a database of over 55 000 throwaway (disposable) email domains.
(e.g. `FILTER_VALIDATE_EMAIL` for PHP). This will be very helpful when you have to contact your users and you want to avoid errors causing lack of communication or want to block "spamboxes".
Need to provide Webhooks inside your SaaS?
Need to embed a chart into an email?
It's over with Image-Charts, no more server-side rendering pain, 1 url = 1 chart.
https://image-charts.com/chart?
cht=lc // chart type
&chd=s:cEAELFJHHHKUju9uuXUc // chart data
&chxt=x,y // axis
&chxl=0:|0|1|2|3|4|5| // axis labels
&chs=873x200 // size
Mailchecker public API has been normalized; here are the changes:

- `MailChecker(email)` -> `MailChecker.isValid(email)`
- `MailChecker($email)` -> `MailChecker::isValid($email)`
import MailChecker
m = MailChecker.MailChecker()
if not m.is_valid('bla@example.com'):
# ...
became:
import MailChecker
if not MailChecker.is_valid('bla@example.com'):
# ...
MailChecker currently supports:
var MailChecker = require('mailchecker');
if(!MailChecker.isValid('myemail@yopmail.com')){
console.error('O RLY !');
process.exit(1);
}
if(!MailChecker.isValid('myemail.com')){
console.error('O RLY !');
process.exit(1);
}
<script type="text/javascript" src="MailChecker/platform/javascript/MailChecker.js"></script>
<script type="text/javascript">
if(!MailChecker.isValid('myemail@yopmail.com')){
console.error('O RLY !');
}
if(!MailChecker.isValid('myemail.com')){
console.error('O RLY !');
}
</script>
include __DIR__."/MailChecker/platform/php/MailChecker.php";
if(!MailChecker::isValid('myemail@yopmail.com')){
die('O RLY !');
}
if(!MailChecker::isValid('myemail.com')){
die('O RLY !');
}
pip install mailchecker
from MailChecker import MailChecker
if not MailChecker.is_valid('bla@example.com'):
    print("O RLY !")
Django validator: https://github.com/jonashaag/django-indisposable
require 'mail_checker'
unless MailChecker.valid?('myemail@yopmail.com')
fail('O RLY!')
end
extern crate mailchecker;
assert_eq!(true, mailchecker::is_valid("plop@plop.com"));
assert_eq!(false, mailchecker::is_valid("\nok@gmail.com\n"));
assert_eq!(false, mailchecker::is_valid("ok@guerrillamailblock.com"));
Code.require_file("mail_checker.ex", "mailchecker/platform/elixir/")
unless MailChecker.valid?("myemail@yopmail.com") do
raise "O RLY !"
end
unless MailChecker.valid?("myemail.com") do
raise "O RLY !"
end
; no package yet; just drop in mailchecker.clj where you want to use it.
(load-file "platform/clojure/mailchecker.clj")
(if (not (mailchecker/valid? "myemail@yopmail.com"))
(throw (Throwable. "O RLY!")))
(if (not (mailchecker/valid? "myemail.com"))
(throw (Throwable. "O RLY!")))
package main

import (
	"log"

	mail_checker "github.com/FGRibreau/mailchecker/platform/go"
)

func main() {
	if !mail_checker.IsValid("myemail@yopmail.com") {
		log.Fatal("O RLY !")
	}
	if !mail_checker.IsValid("myemail.com") {
		log.Fatal("O RLY !")
	}
}
- Go: `go get github.com/FGRibreau/mailchecker`
- NodeJS/JavaScript: `npm install mailchecker`
- Ruby: `gem install ruby-mailchecker`
- PHP: `composer require fgribreau/mailchecker`
We accept pull-requests for other package managers.
$('td', 'table:last').map(function(){
return this.innerText;
}).toArray();
Array.prototype.slice.call(document.querySelectorAll('.entry > ul > li a')).map(function(el){return el.innerText});
... please add your own dataset to list.txt.
Just run (requires NodeJS):
npm run build
Development environment requires docker.
# install and setup every language dependencies in parallel through docker
npm install
# run every language setup in parallel through docker
npm run setup
# run every language tests in parallel through docker
npm test
These amazing people are maintaining this project:
These amazing people have contributed code to this project:
Discover how you can contribute by heading on over to the CONTRIBUTING.md
file.
Web Path Finder is a Python program that provides information about a website. It retrieves various details such as page title, last updated date, DNS information, subdomains, firewall names, technologies used, certificate information, and more.
Clone the repository:
git clone https://github.com/HalilDeniz/PathFinder.git
Install the required packages:
pip install -r requirements.txt
This will install all the required modules and their respective versions.
Run the program using the following command:
┌──(root@denizhalil)-[~/MyProjects/]
└─# python3 web-info-explorer.py --help
usage: wpathFinder.py [-h] url
Web Information Program
positional arguments:
url Enter the site URL
options:
-h, --help show this help message and exit
Replace <url>
with the URL of the website you want to explore.
Here is an example output of running the program:
┌──(root@denizhalil)-[~/MyProjects/]
└─# python3 pathFinder.py https://www.facebook.com/
Site Information:
Title: Facebook - Login or Register
Last Updated Date: None
First Creation Date: 1997-03-29 05:00:00
Dns Information: []
Sub Branches: ['157']
Firewall Names: []
Technologies Used: javascript, php, css, html, react
Certificate Information:
Certificate Issuer: US
Certificate Start Date: 2023-02-07 00:00:00
Certificate Expiration Date: 2023-05-08 23:59:59
Certificate Validity Period (Days): 90
Bypassed JavaScript content:
Contributions are welcome! To contribute to PathFinder, follow these steps:
This project is licensed under the MIT License - see the LICENSE file for details.
For any inquiries or further information, you can reach me through the following channels:
Spoofy
is a program that checks if a list of domains can be spoofed based on SPF and DMARC records. You may be asking, "Why do we need another tool that can check if a domain can be spoofed?"
Well, Spoofy is different and here is why:
- Authoritative lookups on all lookups with known fallback (Cloudflare DNS)
- Accurate bulk lookups
- Custom, manually tested spoof logic (No guessing or speculating, real world test results)
- SPF lookup counter
Spoofy
requires Python 3+. Python 2 is not supported. Usage is shown below:
Usage:
./spoofy.py -d [DOMAIN] -o [stdout or xls]
OR
./spoofy.py -iL [DOMAIN_LIST] -o [stdout or xls]
Install Dependencies:
pip3 install -r requirements.txt
(The spoofability table lists every combination of SPF and DMARC configurations that impact deliverability to the inbox, except for DKIM modifiers.) Download Here
The creation of the spoofability table involved listing every relevant SPF and DMARC configuration, combining them, and then conducting SPF and DMARC information collection using an early version of Spoofy on a large number of US government domains. Testing if an SPF and DMARC combination was spoofable or not was done using the email security pentesting suite at emailspooftest using Microsoft 365. However, the initial testing was conducted using Protonmail and Gmail, but these services were found to utilize reverse lookup checks that affected the results, particularly for subdomain spoof testing. As a result, Microsoft 365 was used for the testing, as it offered greater control over the handling of mail.
After the initial testing using Microsoft 365, some combinations were retested using Protonmail and Gmail due to the differences in their handling of banners in emails. Protonmail and Gmail can place spoofed mail in the inbox with a banner or in spam without a banner, leading to some SPF and DMARC combinations being reported as "Mailbox Dependent" when using Spoofy. In contrast, Microsoft 365 places both conditions in spam. The testing and data collection process took several days to complete, after which a good master table was compiled and used as the basis for the Spoofy spoofability logic.
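For a rough idea of the DNS lookups involved (this is not Spoofy's spoofability logic, which relies on the tested table described above), a minimal Python sketch with dnspython can pull a domain's SPF and DMARC records and apply a naive policy check:

```python
import dns.resolver  # pip3 install dnspython


def txt_records(name: str) -> list[str]:
    try:
        return [r.to_text().strip('"') for r in dns.resolver.resolve(name, "TXT")]
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer, dns.resolver.NoNameservers):
        return []


def check(domain: str) -> None:
    spf = [r for r in txt_records(domain) if r.startswith("v=spf1")]
    dmarc = [r for r in txt_records(f"_dmarc.{domain}") if r.startswith("v=DMARC1")]
    # Naive heuristic only: Spoofy's real verdicts come from its manually
    # tested table of SPF/DMARC combinations, not from this check.
    if not dmarc or "p=none" in dmarc[0].replace(" ", ""):
        print(f"{domain}: weak or missing DMARC policy (SPF: {spf or 'none'})")
    else:
        print(f"{domain}: DMARC policy present: {dmarc[0]}")


check("example.com")
```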
This tool is only for testing and academic purposes and can only be used where strict consent has been given. Do not use it for illegal purposes! It is the end user's responsibility to obey all applicable local, state and federal laws. Developers assume no liability and are not responsible for any misuse or damage caused by this tool and software.
Lead / Only programmer & spoofability logic comprehension upgrades & lookup resiliency system / fix (main issue with other tools) & multithreading & feature additions: Matt Keeley
DMARC, SPF, DNS insights & Spoofability table creation/confirmation/testing & application accuracy/quality assurance: calamity.email / eman-ekaf
Logo: cobracode
Tool was inspired by Bishop Fox's project called spoofcheck.
Daksh SCRA (Source Code Review Assist) tool is built to enhance the efficiency of the source code review process, providing a well-structured and organized approach for code reviewers.
Rather than indiscriminately flagging everything as a potential issue, Daksh SCRA promotes thoughtful analysis, urging the investigation and confirmation of potential problems. This approach mitigates the scramble to tag every potential concern as a bug, cutting back on the confusion and wasted time spent on false positives.
What sets Daksh SCRA apart is its emphasis on avoiding unnecessary bug tagging. Unlike conventional methods, it advocates for thorough investigation and confirmation of potential issues before tagging them as bugs. This approach helps mitigate the issue of false positives, which often consume valuable time and resources, thereby fostering a more productive and efficient code review process.
Daksh SCRA was initially introduced during a source code review training session I conducted at Black Hat USA 2022 (August 6 - 9), where it was subtly presented to a specific audience. However, this introduction was carried out with a low-profile approach, avoiding any major announcements.
While this tool was quietly published on GitHub after the 2022 training, its official public debut took place at Black Hat USA 2023 in Las Vegas.
Identifies Areas of Interest in Source Code: Encourage focused investigation and confirmation rather than indiscriminately labeling everything as a bug.
Identifies Areas of Interest in File Paths (World's First): Recognises patterns in file paths to pinpoint relevant sections for review.
Software-Level Reconnaissance to Identify Technologies Utilised: Identifies project technologies, enabling code reviewers to conduct precise scans with appropriate rules.
Automated Scientific Effort Estimation for Code Review (World's First): Providing a measurable approach for estimating efforts required for a code review process.
Although this tool has progressed beyond its early stages, it has reached a functional state that is quite usable and delivers on its promised capabilities. Nevertheless, active enhancements are currently underway, and there are multiple new features and improvements expected to be added in the upcoming months.
Additionally, the tool offers the following functionalities:
Refer to the wiki for the tool setup and usage details - https://github.com/coffeeandsecurity/DakshSCRA/wiki
Feel free to contribute towards updating or adding new rules and future development.
If you find any bugs, report them to d3basis.m0hanty@gmail.com.
Python3 and all the libraries listed in requirements.txt
$ pip install virtualenv
$ virtualenv -p python3 {name-of-virtual-env} // Create a virtualenv
Example: virtualenv -p python3 venv
$ source {name-of-virtual-env}/bin/activate // To activate virtual environment you just created
Example: source venv/bin/activate
After running the activate command you should see the name of your virtual env at the beginning of your terminal like this: (venv) $
You must run the below command after activating the virtual environment as mentioned in the previous steps.
pip install -r requirements.txt
Once the above step successfully installs all the required libraries, refer to the following tool usage commands to run the tool.
$ python3 dakshscra.py -h // To view available options and arguments
usage: dakshscra.py [-h] [-r RULE_FILE] [-f FILE_TYPES] [-v] [-t TARGET_DIR] [-l {R,RF}] [-recon] [-estimate]
options:
-h, --help show this help message and exit
-r RULE_FILE Specify platform specific rule name
-f FILE_TYPES Specify file types to scan
-v Specify verbosity level {'-v', '-vv', '-vvv'}
-t TARGET_DIR Specify target directory path
-l {R,RF}, --list {R,RF}
List rules [R] OR rules and filetypes [RF]
-recon Detects platform, framework and programming language used
-estimate Estimate efforts required for code review
$ python3 dakshscra.py // To view tool usage along with examples
Examples:
# '-f' is optional. If not specified, it will default to the corresponding filetypes of the selected rule.
dakshsca.py -r php -t /source_dir_path
# To override default settings, other filetypes can be specified with '-f' option.
dakshsca.py -r php -f dotnet -t /path_to_source_dir
dakshsca.py -r php -f custom -t /path_to_source_dir
# Perform reconnaissance and rule based scanning if '-recon' used with '-r' option.
dakshsca.py -recon -r php -t /path_to_source_dir
# Perform only reconnaissance if '-recon' used without the '-r' option.
dakshsca.py -recon -t /path_to_source_dir
# Verbosity: '-v' is default, '-vvv' will display all rules check within each rule category.
dakshsca.py -r php -vv -t /path_to_source_dir
Supported RULE_FILE: dotnet, java, php, javascript
Supported FILE_TYPES: dotnet, php, java, custom, allfiles
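To illustrate the kind of platform-specific, rule-driven matching described above (not Daksh SCRA's actual rule files or reporting pipeline), a minimal Python sketch with hypothetical PHP rules might look like this:

```python
import re
from pathlib import Path

# Hypothetical rules for illustration only; Daksh SCRA ships its own
# platform-specific rule files (dotnet, java, php, javascript).
RULES = {
    "Possible SQL query sink": re.compile(r"mysqli_query|->query\(", re.I),
    "Command execution": re.compile(r"\b(exec|shell_exec|system|passthru)\s*\(", re.I),
}


def scan(target_dir: str, extensions=(".php",)) -> None:
    for path in Path(target_dir).rglob("*"):
        if not path.is_file() or path.suffix not in extensions:
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            for name, pattern in RULES.items():
                if pattern.search(line):
                    # Flag as an "area of interest" to review, not a confirmed bug.
                    print(f"{path}:{lineno}: {name}: {line.strip()}")


scan("/path_to_source_dir")
```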
The tool generates reports in three formats: HTML, PDF, and TEXT. Although the HTML and PDF reports are still being improved, they are currently in a reasonably good state. With each subsequent iteration, these reports will continue to be refined and improved even further.
Note: Currently, the reconnaissance report is created in a text format. However, in upcoming releases, the plan is to incorporate it into the vulnerability scanning report, which will be available in both HTML and PDF formats.
Note: At present, the effort estimation for the source code review is in its early stages. It is considered experimental and will be developed and refined through several iterations. Improvements will be made over multiple releases, as the formula and the concept are new and require time to be honed to achieve accuracy or reasonable estimation.
Currently, the report is generated in HTML format. However, in future releases, there are plans to also provide it in PDF format.
Nodesub is a command-line tool for finding subdomains in bug bounty programs. It supports various subdomain enumeration techniques and provides flexible options for customization.
To install Nodesub, use the following command:
npm install -g nodesub
NOTE:
~/.config/nodesub/config.ini
nodesub -h
This will display help for the tool. Here are all the switches it supports.
Enumerate subdomains for a single domain:
nodesub -u example.com
Enumerate subdomains for a list of domains from a file:
nodesub -l domains.txt
Perform subdomain enumeration using CIDR:
node nodesub.js -c 192.168.0.0/24 -o subdomains.txt
node nodesub.js -c CIDR.txt -o subdomains.txt
Perform subdomain enumeration using ASN:
node nodesub.js -a AS12345 -o subdomains.txt
node nodesub.js -a ASN.txt -o subdomains.txt
Enable recursive subdomain enumeration and output the results to a JSON file:
nodesub -u example.com -r -o output.json -f json
The tool provides various output formats for the results, including:
The output file contains the resolved subdomains, failed resolved subdomains, or all subdomains based on the options chosen.
AtlasReaper is a command-line tool developed for offensive security purposes, primarily focused on reconnaissance of Confluence and Jira. It also provides various features that can be helpful for tasks such as credential farming and social engineering. The tool is written in C#.
Blog post: Sowing Chaos and Reaping Rewards in Confluence and Jira
.@@@@
@@@@@
@@@@@ @@@@@@@
@@@@@ @@@@@@@@@@@
@@@@@ @@@@@@@@@@@@@@@
@@@@, @@@@ *@@@@
@@@@ @@@ @@ @@@ .@@@
_ _ _ ___ @@@@@@@ @@@@@@
/_\| |_| |__ _ __| _ \___ __ _ _ __ ___ _ _ @@ @@@@@@@@
/ _ \ _| / _` (_-< / -_) _` | '_ \/ -_) '_| @@ @@@@@@@@
/_/ \_\__|_\__,_/__/_|_\___\__,_| .__/\___|_| @@@@@@@@ &@
|_| @@@@@@@@@@ @@&
@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@. @@
@werdhaihai
AtlasReaper uses commands, subcommands, and options. The format for executing commands is as follows:
.\AtlasReaper.exe [command] [subcommand] [options]
Replace [command]
, [subcommand]
, and [options]
with the appropriate values based on the action you want to perform. For more information about each command or subcommand, use the -h
or --help
option.
Below is a list of available commands and subcommands:
Each command has subcommands for interacting with the specific product.

Commands:

- `confluence`
- `jira`

Subcommands:

- `confluence attach` - Attach a file to a page.
- `confluence download` - Download an attachment.
- `confluence embed` - Embed a 1x1 pixel image to perform farming attacks.
- `confluence link` - Add a link to a page.
- `confluence listattachments` - List attachments.
- `confluence listpages` - List pages in Confluence.
- `confluence listspaces` - List spaces in Confluence.
- `confluence search` - Search Confluence.
- `jira addcomment` - Add a comment to an issue.
- `jira attach` - Attach a file to an issue.
- `jira createissue` - Create a new issue.
- `jira download` - Download attachment(s) from an issue.
- `jira listattachments` - List attachments on an issue.
- `jira listissues` - List issues in Jira.
- `jira listprojects` - List projects in Jira.
- `jira listusers` - List Atlassian users.
- `jira searchissues` - Search issues in Jira.
- `help` - Display more information on a specific command.

Here are a few examples of how to use AtlasReaper:
Search for a keyword in Confluence with wildcard search:
.\AtlasReaper.exe confluence search --query "http*example.com*" --url $url --cookie $cookie
Attach a file to a page in Confluence:
.\AtlasReaper.exe confluence attach --page-id "12345" --file "C:\path\to\file.exe" --url $url --cookie $cookie
Create a new issue in Jira:
.\AtlasReaper.exe jira createissue --project "PROJ" --issue-type Task --message "I can't access this link from my host" --url $url --cookie $cookie
Confluence and Jira can be configured to allow anonymous access. You can check this by omitting the -c/--cookie option from the commands.
In the event authentication is required, you can dump cookies from a user's browser with SharpChrome or another similar tool.
.\SharpChrome.exe cookies /showall
Look for any cookies scoped to the *.atlassian.net
named cloud.session.token
or tenant.session.token
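As a quick way to verify that a dumped session cookie (or anonymous access) works before pointing AtlasReaper at a tenant, a small Python sketch against the Confluence Cloud REST API can list the spaces visible to that session. The tenant URL and token below are placeholders, and AtlasReaper itself is a C# tool that does far more than this.

```python
import requests

# Placeholder tenant and token; /wiki/rest/api/space lists the Confluence
# spaces visible to the supplied session (or anonymously, if allowed).
url = "https://example.atlassian.net/wiki/rest/api/space"
cookies = {"cloud.session.token": "<token dumped from the browser>"}

r = requests.get(url, cookies=cookies, timeout=10)
if r.status_code == 200:
    for space in r.json().get("results", []):
        print(space.get("key"), "-", space.get("name"))
else:
    print(f"Request failed or access denied: HTTP {r.status_code}")
```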
Please note the following limitations of AtlasReaper:
cloud.session.token
or tenant.session.token
which can be obtained from a user's browser. Alternatively, it can use anonymous access if permitted. (API tokens or other auth is not currently supported)If you encounter any issues or have suggestions for improvements, please feel free to contribute by submitting a pull request or opening an issue in the AtlasReaper repo.
surf
allows you to filter a list of hosts, returning a list of viable SSRF candidates. It does this by sending an HTTP request from your machine to each host, collecting all the hosts that did not respond, and then filtering them into a list of externally facing and internally facing hosts.
You can then attempt these hosts wherever an SSRF vulnerability may be present. Due to most SSRF filters only focusing on internal or restricted IP ranges, you'll be pleasantly surprised when you get SSRF on an external IP that is not accessible via HTTP(s) from your machine.
Often you will find that large companies with cloud environments will have external IPs for internal web apps. Traditional SSRF filters will not capture this unless these hosts are specifically added to a blacklist (which they usually never are). This is why this technique can be so powerful.
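A rough Python approximation of that filtering idea looks like the sketch below (surf itself relies on httpx and more careful error analysis); the hostnames are placeholders.

```python
import ipaddress
import socket

import requests


def classify(host: str) -> str:
    # Unresolvable hosts are not useful SSRF candidates.
    try:
        ip = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return "unresolvable"
    # If we can reach it over HTTP from our own machine, it is not interesting.
    try:
        requests.get(f"http://{host}", timeout=3)
        return "reachable"
    except requests.RequestException:
        pass
    return "internal" if ip.is_private else "external"


for host in ["intranet.bigcorp.com", "www.bigcorp.com"]:  # placeholder hosts
    print(host, "->", classify(host))
```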
This tool requires go 1.19 or above as we rely on httpx to do the HTTP probing.
It can be installed with the following command:
go install github.com/assetnote/surf/cmd/surf@latest
Consider that you have subdomains for bigcorp.com
inside a file named bigcorp.txt
, and you want to find all the SSRF candidates for these subdomains. Here are some examples:
# find all ssrf candidates (including external IP addresses via HTTP probing)
surf -l bigcorp.txt
# find all ssrf candidates (including external IP addresses via HTTP probing) with timeout and concurrency settings
surf -l bigcorp.txt -t 10 -c 200
# find all ssrf candidates (including external IP addresses via HTTP probing), and just print all hosts
surf -l bigcorp.txt -d
# find all hosts that point to an internal/private IP address (no HTTP probing)
surf -l bigcorp.txt -x
The full list of settings can be found below:
β― surf -h
βββββββββββ ββββββββββ ββββββββ
βββββββββββ βββββββββββββββββββ
βββββββββββ βββββββββββββββββ
βββββββββββ βββββββββββββββββ
ββββββββββββββββ βββ ββββββ
ββββββββ βββββββ βββ ββββββ
by shubs @ assetnote
Usage: surf [--hosts FILE] [--concurrency CONCURRENCY] [--timeout SECONDS] [--retries RETRIES] [--disablehttpx] [--disableanalysis]
Options:
--hosts FILE, -l FILE
List of assets (hosts or subdomains)
--concurrency CONCURRENCY, -c CONCURRENCY
Threads (passed down to httpx) - default 100 [default: 100]
--timeout SECONDS, -t SECONDS
Timeout in seconds (passed down to httpx) - default 3 [default: 3]
--retries RETRIES, -r RETRIES
Retries on failure (passed down to httpx) - default 2 [default: 2]
--disablehttpx, -x Disable httpx and only output list of hosts that resolve to an internal IP address - default false [default: false]
--disableanalysis, -d
Disable analysis and only output list of hosts - default false [default: false]
--help, -h display this help and exit
When running surf
, it will print out the SSRF candidates to stdout
, but it will also save two files inside the folder it is run from:
- `external-{timestamp}.txt` - Externally resolving, but unable to send HTTP requests to from your machine
- `internal-{timestamp}.txt` - Internally resolving, and obviously unable to send HTTP requests from your machine

These two files will contain the list of hosts that are ideal SSRF candidates to try on your target. The external target list has higher chances of being viable than the internal list.
Under the hood, this tool leverages httpx to do the HTTP probing. It captures errors returned from httpx, and then performs some basic analysis to determine the most viable candidates for SSRF.
This tool was created as a result of a live hacking event for HackerOne (H1-4420 2023).
xsubfind3r
is a command-line interface (CLI) utility to find a domain's known subdomains from curated passive online sources.
Fetches domains from curated passive sources to maximize results.
Supports stdin
and stdout
for easy integration into workflows.
Cross-Platform (Windows, Linux & macOS).
Visit the releases page and find the appropriate archive for your operating system and architecture. Download the archive from your browser or copy its URL and retrieve it with wget
or curl
:
...with wget
:
wget https://github.com/hueristiq/xsubfind3r/releases/download/v<version>/xsubfind3r-<version>-linux-amd64.tar.gz
...or, with curl
:
curl -OL https://github.com/hueristiq/xsubfind3r/releases/download/v<version>/xsubfind3r-<version>-linux-amd64.tar.gz
...then, extract the binary:
tar xf xsubfind3r-<version>-linux-amd64.tar.gz
TIP: The above steps, download and extract, can be combined into a single step with this one-liner
curl -sL https://github.com/hueristiq/xsubfind3r/releases/download/v<version>/xsubfind3r-<version>-linux-amd64.tar.gz | tar -xzv
NOTE: On Windows systems, you should be able to double-click the zip archive to extract the xsubfind3r
executable.
...move the xsubfind3r
binary to somewhere in your PATH
. For example, on GNU/Linux and OS X systems:
sudo mv xsubfind3r /usr/local/bin/
NOTE: Windows users can follow How to: Add Tool Locations to the PATH Environment Variable in order to add xsubfind3r
to their PATH
.
Before you install from source, you need to make sure that Go is installed on your system. You can install Go by following the official instructions for your operating system. For this, we will assume that Go is already installed.
go install ...
go install -v github.com/hueristiq/xsubfind3r/cmd/xsubfind3r@latest
go build ...
the development VersionClone the repository
git clone https://github.com/hueristiq/xsubfind3r.git
Build the utility
cd xsubfind3r/cmd/xsubfind3r && \
go build .
Move the xsubfind3r
binary to somewhere in your PATH
. For example, on GNU/Linux and OS X systems:
sudo mv xsubfind3r /usr/local/bin/
NOTE: Windows users can follow How to: Add Tool Locations to the PATH Environment Variable in order to add xsubfind3r
to their PATH
.
NOTE: While the development version is a good way to take a peek at xsubfind3r
's latest features before they get released, be aware that it may have bugs. Officially released versions will generally be more stable.
xsubfind3r
will work right after installation. However, BeVigil, Chaos, Fullhunt, Github, Intelligence X and Shodan require API keys to work; URLScan supports an API key but does not require one. The API keys are stored in the $HOME/.hueristiq/xsubfind3r/config.yaml
file - created upon first run - which uses the YAML format. Multiple API keys can be specified for each of these sources, from which one will be used.
Example config.yaml
:
version: 0.3.0
sources:
- alienvault
- anubis
- bevigil
- chaos
- commoncrawl
- crtsh
- fullhunt
- github
- hackertarget
- intelx
- shodan
- urlscan
- wayback
keys:
bevigil:
- awA5nvpKU3N8ygkZ
chaos:
- d23a554bbc1aabb208c9acfbd2dd41ce7fc9db39asdsd54bbc1aabb208c9acfb
fullhunt:
- 0d9652ce-516c-4315-b589-9b241ee6dc24
github:
- d23a554bbc1aabb208c9acfbd2dd41ce7fc9db39
- asdsd54bbc1aabb208c9acfbd2dd41ce7fc9db39
intelx:
- 2.intelx.io:00000000-0000-0000-0000-000000000000
shodan:
- AAAAClP1bJJSRMEYJazgwhJKrggRwKA
urlscan:
- d4c85d34-e425-446e-d4ab-f5a3412acbe8
To display help message for xsubfind3r
use the -h
flag:
xsubfind3r -h
help message:
_ __ _ _ _____
__ _____ _ _| |__ / _(_)_ __ __| |___ / _ __
\ \/ / __| | | | '_ \| |_| | '_ \ / _` | |_ \| '__|
> <\__ \ |_| | |_) | _| | | | | (_| |___) | |
/_/\_\___/\__,_|_.__/|_| |_|_| |_|\__,_|____/|_| v0.3.0
USAGE:
xsubfind3r [OPTIONS]
INPUT:
-d, --domain string[] target domains
-l, --list string target domains' list file path
SOURCES:
--sources bool list supported sources
-u, --sources-to-use string[] comma(,) separeted sources to use
-e, --sources-to-exclude string[] comma(,) separeted sources to exclude
OPTIMIZATION:
-t, --threads int number of threads (default: 50)
OUTPUT:
--no-color bool disable colored output
-o, --output string output subdomains' file path
-O, --output-directory string output subdomains' directory path
-v, --verbosity string debug, info, warning, error, fatal or silent (default: info)
CONFIGURATION:
-c, --configuration string configuration file path (default: ~/.hueristiq/xsubfind3r/config.yaml)
Issues and Pull Requests are welcome! Check out the contribution guidelines.
This utility is distributed under the MIT license.
Columbus Project is an API-first, blazingly fast subdomain discovery and enumeration service with advanced features.
Columbus returned 638 subdomains of tesla.com in 0.231 sec.
By default Columbus returns only the subdomains in a JSON string array:
curl 'https://columbus.elmasy.com/lookup/github.com'
But we think of the bash lovers, so if you don't want to mess with JSON and a newline separated list is your wish, then include the Accept: text/plain
header.
DOMAIN="github.com"
curl -s -H "Accept: text/plain" "https://columbus.elmasy.com/lookup/$DOMAIN" | \
while read SUB
do
if [[ "$SUB" == "" ]]
then
HOST="$DOMAIN"
else
HOST="${SUB}.${DOMAIN}"
fi
echo "$HOST"
done
For more, check the features or the API documentation.
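The same lookup is easy to consume from Python if you'd rather not shell out to curl; this sketch simply mirrors the bash example above, treating an empty entry as the apex domain.

```python
import requests

DOMAIN = "github.com"
subs = requests.get(f"https://columbus.elmasy.com/lookup/{DOMAIN}", timeout=30).json()

# An empty entry represents the apex domain itself.
for sub in subs:
    print(DOMAIN if sub == "" else f"{sub}.{DOMAIN}")
```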
Currently, entries are obtained from Certificate Transparency.
Usage of columbus-server:
-check
Check for updates.
-config string
Path to the config file.
-version
Print version informations.
`-check`: Check the latest version on GitHub. Prints `up-to-date` and returns `0` if no update is required. Prints the latest tag (e.g.: `v0.9.1`) and returns `1` if a new release is available. In case of error, prints the error message and returns `2`.
git clone https://github.com/elmasy-com/columbus-server
make build
Create a new user:
adduser --system --no-create-home --disabled-login columbus-server
Create a new group:
addgroup --system columbus
Add the new user to the new group:
usermod -aG columbus columbus-server
Copy the binary to /usr/bin/columbus-server
.
Make it executable:
chmod +x /usr/bin/columbus-server
Create a directory:
mkdir /etc/columbus
Copy the config file to /etc/columbus/server.conf
.
Set the permission to 0600.
chmod -R 0600 /etc/columbus
Set the owner of the config file:
chown -R columbus-server:columbus /etc/columbus
Install the service file (eg.: /etc/systemd/system/columbus-server.service
).
cp columbus-server.service /etc/systemd/system/
Reload systemd:
systemctl daemon-reload
Start columbus:
systemctl start columbus-server
If you want Columbus to start automatically:
systemctl enable columbus-server
chaos is an 'origin' IP scanner developed by RST in collaboration with ChatGPT. It is a niche utility with an intended audience of mostly penetration testers and bug hunters.
An origin-IP is a term-of-art expression describing the final public IP destination for websites that are publicly served via 3rd parties. If you'd like to understand more about why anyone might be interested in Origin-IPs, please check out our blog post.
chaos was rapidly prototyped from idea to functional proof-of-concept in less than 24 hours using our principles of DevOps with ChatGPT.
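The underlying trick is simply to request each candidate IP and port directly while forcing the Host header to the target FQDN, and then note which origins answer for that name. Below is a minimal Python sketch of the idea (not chaos.py itself); the IPs are placeholders.

```python
import requests
import urllib3

urllib3.disable_warnings()  # certificates will not match when hitting raw IPs

fqdn = "www.example.com"
candidates = [("203.0.113.10", 443), ("203.0.113.11", 80)]  # hypothetical IPs

for ip, port in candidates:
    scheme = "https" if port == 443 else "http"
    try:
        r = requests.get(f"{scheme}://{ip}:{port}/",
                         headers={"Host": fqdn}, timeout=3, verify=False)
        print(f"{fqdn} @ {ip}:{port} -> {r.status_code}")
    except requests.RequestException:
        pass
```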
usage: chaos.py [-h] -f FQDN -i IP [-a AGENT] [-C] [-D] [-j JITTER] [-o OUTPUT] [-p PORTS] [-P] [-r] [-s SLEEP] [-t TIMEOUT] [-T] [-v] [-x]
_..._
.-'` `'-.
__|___________|__
\ /
`._ CHAOS _.'
`-------`
/ \\
/ \\
/ \\
/ \\
/ \\
/ \\
/ \\
/ \\
/ \\
/_____________________\\
CHAtgpt Origin-ip Scanner
_______ _______ _______ _______ _______
|\\ /|\\ /|\\ /|\\ /|\\/|
| +---+ | +---+ | +---+ | +---+ | +---+ |
| |H | | |U | | |M | | |A | | |N | |
| |U | | |S | | |A | | |N | | |C | |
| |M | | |E | | |N | | |D | | |O | |
| |A | | |R | | |C | | | | | |L | |
| +---+ | +---+ | +---+ | +---+ | +---+ |
|/_____|\\_____|\\_____|\\_____|\\_____\\
Origin IP Scanner developed with ChatGPT
cha*os (n): complete disorder and confusion
(ver: 0.9.4)
cd path/to/chaos
pip3 install -U pip setuptools virtualenv
virtualenv env
source env/bin/activate
(env) pip3 install -U -r ./requirements.txt
(env) ./chaos.py -h
-h, --help show this help message and exit
-f FQDN, --fqdn FQDN Path to FQDN file (one FQDN per line)
-i IP, --ip IP IP address(es) for HTTP requests (Comma-separated IPs, IP networks, and/or files with IP/network per line)
-a AGENT, --agent AGENT
User-Agent header value for requests
-C, --csv Append CSV output to OUTPUT_FILE.csv
-D, --dns Perform fwd/rev DNS lookups on FQDN/IP values prior to request; no impact to testing queue
-j JITTER, --jitter JITTER
Add a 0-N second randomized delay to the sleep value
-o OUTPUT, --output OUTPUT
Append console output to FILE
-p PORTS, --ports PORTS
Comma-separated list of TCP ports to use (default: "80,443")
-P, --no-prep Do not pre-scan each IP/port with `GET /` using `Host: {IP:Port}` header to eliminate unresponsive hosts
-r, --randomize Randomize(ish) the order IPs/ports are tested
-s SLEEP, --sleep SLEEP
Add N seconds before thread completes
-t TIMEOUT, --timeout TIMEOUT
Wait N seconds for an unresponsive host
-T, --test Test-mode; don't send requests
-v, --verbose Enable verbose output
-x, --singlethread Single threaded execution; for 1-2 core systems; default threads=(cores-1) if cores>2
Launch python HTTP server
% python3 -u -m http.server 8001
Serving HTTP on :: port 8001 (http://[::]:8001/) ...
Launch ncat as HTTP on a port detected as SSL; use a loop because --keep-open can hang
% while true; do ncat -lvp 8443 -c 'printf "HTTP/1.0 204 Plaintext OK\n\n<html></html>\n"'; done
Ncat: Version 7.94 ( https://nmap.org/ncat )
Ncat: Listening on [::]:8443
Ncat: Listening on 0.0.0.0:8443
Also launch ncat as SSL on a port that will default to HTTP detection
% while true; do ncat --ssl -lvp 8444 -c 'printf "HTTP/1.0 202 OK\n\n<html></html>\n"'; done
Ncat: Version 7.94 ( https://nmap.org/ncat )
Ncat: Generating a temporary 2048-bit RSA key. Use --ssl-key and --ssl-cert to use a permanent one.
Ncat: SHA-1 fingerprint: 0208 1991 FA0D 65F0 608A 9DAB A793 78CB A6EC 27B8
Ncat: Listening on [::]:8444
Ncat: Listening on 0.0.0.0:8444
Prepare an FQDN file:
% cat ../test_localhost_fqdn.txt
www.example.com
localhost.example.com
localhost.local
localhost
notreally.arealdomain
Prepare an IP file / list:
% cat ../test_localhost_ips.txt
127.0.0.1
127.0.0.0/29
not_an_ip_addr
-6.a
=4.2
::1
Run the scan
% ./chaos.py -f ../test_localhost_fqdn.txt -i ../test_localhost_ips.txt,::1/126 -p 8001,8443,8444 -x -s0.2 -t1
2023-06-21 12:48:33 [WARN] Ignoring invalid FQDN value: localhost.local
2023-06-21 12:48:33 [WARN] Ignoring invalid FQDN value: localhost
2023-06-21 12:48:33 [WARN] Ignoring invalid FQDN value: notreally.arealdomain
2023-06-21 12:48:33 [WARN] Error: invalid IP address or CIDR block =4.2
2023-06-21 12:48:33 [WARN] Error: invalid IP address or CIDR block -6.a
2023-06-21 12:48:33 [WARN] Error: invalid IP address or CIDR block not_an_ip_addr
2023-06-21 12:48:33 [INFO] * ---- <META> ---- *
2023-06-21 12:48:33 [INFO] * Version: 0.9.4
2023-06-21 12:48:33 [INFO] * FQDN file: ../test_localhost_fqdn.txt
2023-06-21 12:48:33 [INFO] * FQDNs loaded: ['www.example.com', 'localhost.example.com']
2023-06-21 12:48:33 [INFO] * IP input value(s): ../test_localhost_ips.txt,::1/126
2023-06-21 12:48:33 [INFO] * Addresses parsed from IP inputs: 12
2023-06-21 12:48:33 [INFO] * Port(s): 8001,8443,8444
2023-06-21 12:48:33 [INFO] * Thread(s): 1
2023-06-21 12:48:33 [INFO] * Sleep value: 0.2
2023-06-21 12:48:33 [INFO] * Timeout: 1.0
2023-06-21 12:48:33 [INFO] * User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36 ch4*0s/0.9.4
2023-06-21 12:48:33 [INFO] * ---- </META> ---- *
2023-06-21 12:48:33 [INFO] 36 unique address/port addresses for testing
Prep Tests: 100%|████████████████████████████████████████████████████████████| 36/36 [00:29<00:00, 1.20it/s]
2023-06-21 12:49:03 [INFO] 9 IP/ports verified, reducing test dataset from 72 entries
2023-06-21 12:49:03 [INFO] 18 pending tests remain after pre-testing
2023-06-21 12:49:03 [INFO] Queuing 18 threads
++RCVD++ (200 OK) www.example.com @ :::8001
++RCVD++ (204 Plaintext OK) www.example.com @ :::8443
++RCVD++ (202 OK) www.example.com @ :::8444
++RCVD++ (200 OK) www.example.com @ ::1:8001
++RCVD++ (204 Plaintext OK) www.example.com @ ::1:8443
++RCVD++ (202 OK) www.example.com @ ::1:8444
++RCVD++ (200 OK) www.example.com @ 127.0.0.1:8001
++RCVD++ (204 Plaintext OK) www.example.com @ 127.0.0.1:8443
++RCVD++ (202 OK) www.example.com @ 127.0.0.1:8444
++RCVD++ (200 OK) localhost.example.com @ :::8001
++RCVD++ (204 Plaintext OK) localhost.example.com @ :::8443
++RCVD++ (202 OK) localhost.example.com @ :::8444
++RCVD++ (200 OK) localhost.example.com @ ::1:8001
++RCVD++ (204 Plaintext OK) localhost.example.com @ ::1:8443
++RCVD++ (202 OK) localhost.example.com @ ::1:8444
++RCVD++ (200 OK) localhost.example.com @ 127.0.0.1:8001
++RCVD++ (204 Plaintext OK) localhost.example.com @ 127.0.0.1:8443
++RCVD++ (202 OK) localhost.example.com @ 127.0.0.1:8444
Origin Scan: 100%|████████████████████████████████████████████████████████████| 18/18 [00:06<00:00, 2.76it/s]
2023-06-21 12:49:09 [RSLT] Results from 5 FQDNs:
::1
::1:8444 => (202 / OK)
::1:8443 => (204 / Plaintext OK)
::1:8001 => (200 / OK)
127.0.0.1
127.0.0.1:8001 => (200 / OK)
127.0.0.1:8443 => (204 / Plaintext OK)
127.0.0.1:8444 => (202 / OK)
::
:::8001 => (200 / OK)
:::8443 => (204 / Plaintext OK)
:::8444 => (202 / OK)
www.example.com
:::8001 => (200 / OK)
:::8443 => (204 / Plaintext OK)
:::8444 => (202 / OK)
::1:8001 => (200 / OK)
::1:8443 => (204 / Plaintext OK)
::1:8444 => (202 / OK)
127.0.0.1:8001 => (200 / OK)
127.0.0.1:8443 => (204 / Plaintext OK)
127.0.0.1:8444 => (202 / OK)
localhost.example.com
:::8001 => (200 / OK)
:::8443 => (204 / Plaintext OK)
:::8444 => (202 / OK)
::1:8001 => (200 / OK)
::1:8443 => (204 / Plaintext OK)
::1:8444 => (202 / OK)
127.0.0.1:8001 => (200 / OK)
127.0.0.1:8443 => (204 / Plaintext OK)
127.0.0.1:8444 => (202 / OK)
rst@r57 chaos %
- `-T` runs in test mode (do everything except send requests)
- `-v` verbose option provides additional output
xurlfind3r
is a command-line interface (CLI) utility to find a domain's known URLs from curated passive online sources.
Supports parsing URLs from wayback `robots.txt` snapshots.
or curl
:
...with wget
:
wget https://github.com/hueristiq/xurlfind3r/releases/download/v<version>/xurlfind3r-<version>-linux-amd64.tar.gz
...or, with curl
:
curl -OL https://github.com/hueristiq/xurlfind3r/releases/download/v<version>/xurlfind3r-<version>-linux-amd64.tar.gz
...then, extract the binary:
tar xf xurlfind3r-<version>-linux-amd64.tar.gz
TIP: The above steps, download and extract, can be combined into a single step with this one-liner
curl -sL https://github.com/hueristiq/xurlfind3r/releases/download/v<version>/xurlfind3r-<version>-linux-amd64.tar.gz | tar -xzv
NOTE: On Windows systems, you should be able to double-click the zip archive to extract the xurlfind3r
executable.
...move the xurlfind3r
binary to somewhere in your PATH
. For example, on GNU/Linux and OS X systems:
sudo mv xurlfind3r /usr/local/bin/
NOTE: Windows users can follow How to: Add Tool Locations to the PATH Environment Variable in order to add xurlfind3r
to their PATH
.
Before you install from source, you need to make sure that Go is installed on your system. You can install Go by following the official instructions for your operating system. For this, we will assume that Go is already installed.
go install ...
go install -v github.com/hueristiq/xurlfind3r/cmd/xurlfind3r@latest
go build ...
the development VersionClone the repository
git clone https://github.com/hueristiq/xurlfind3r.git
Build the utility
cd xurlfind3r/cmd/xurlfind3r && \
go build .
Move the xurlfind3r
binary to somewhere in your PATH
. For example, on GNU/Linux and OS X systems:
sudo mv xurlfind3r /usr/local/bin/
NOTE: Windows users can follow How to: Add Tool Locations to the PATH Environment Variable in order to add xurlfind3r
to their PATH
.
NOTE: While the development version is a good way to take a peek at xurlfind3r
's latest features before they get released, be aware that it may have bugs. Officially released versions will generally be more stable.
xurlfind3r
will work right after installation. However, BeVigil, Github and Intelligence X require API keys to work; URLScan supports an API key but does not require one. The API keys are stored in the $HOME/.hueristiq/xurlfind3r/config.yaml
file - created upon first run - which uses the YAML format. Multiple API keys can be specified for each of these sources, from which one will be used.
Example config.yaml
:
version: 0.2.0
sources:
- bevigil
- commoncrawl
- github
- intelx
- otx
- urlscan
- wayback
keys:
bevigil:
- awA5nvpKU3N8ygkZ
github:
- d23a554bbc1aabb208c9acfbd2dd41ce7fc9db39
- asdsd54bbc1aabb208c9acfbd2dd41ce7fc9db39
intelx:
- 2.intelx.io:00000000-0000-0000-0000-000000000000
urlscan:
- d4c85d34-e425-446e-d4ab-f5a3412acbe8
To display help message for xurlfind3r
use the -h
flag:
xurlfind3r -h
help message:
_ __ _ _ _____
__ ___ _ _ __| |/ _(_)_ __ __| |___ / _ __
\ \/ / | | | '__| | |_| | '_ \ / _` | |_ \| '__|
> <| |_| | | | | _| | | | | (_| |___) | |
/_/\_\\__,_|_| |_|_| |_|_| |_|\__,_|____/|_| v0.2.0
USAGE:
xurlfind3r [OPTIONS]
TARGET:
-d, --domain string (sub)domain to match URLs
SCOPE:
--include-subdomains bool match subdomain's URLs
SOURCES:
-s, --sources bool list sources
-u, --use-sources string sources to use (default: bevigil,commoncrawl,github,intelx,otx,urlscan,wayback)
--skip-wayback-robots bool with wayback, skip parsing robots.txt snapshots
--skip-wayback-source bool with wayback , skip parsing source code snapshots
FILTER & MATCH:
-f, --filter string regex to filter URLs
-m, --match string regex to match URLs
OUTPUT:
--no-color bool no color mode
-o, --output string output URLs file path
-v, --verbosity string debug, info, warning, error, fatal or silent (default: info)
CONFIGURATION:
-c, --configuration string configuration file path (default: ~/.hueristiq/xurlfind3r/config.yaml)
xurlfind3r -d hackerone.com --include-subdomains
# filter images
xurlfind3r -d hackerone.com --include-subdomains -f '^https?://[^/]*?/.*\.(jpg|jpeg|png|gif|bmp)(\?[^\s]*)?$'
# match js URLs
xurlfind3r -d hackerone.com --include-subdomains -m '^https?://[^/]*?/.*\.js(\?[^\s]*)?$'
Issues and Pull Requests are welcome! Check out the contribution guidelines.
This utility is distributed under the MIT license.
AiCEF is a tool implementing the accompanying framework [1] in order to harness the intelligence that is available from online resources, as well as threat groups' activities and arsenal (e.g. MITRE), to create relevant and timely cybersecurity exercise content. This way, we abstract the events from the reports in a machine-readable form. The produced graphs can be infused with additional intelligence, e.g. the threat actor profile from MITRE, also mapped in our ontology. While this may fill gaps that would be missing from a report, one can also manipulate the graph to create custom and unique models. Finally, we exploit transformer-based language models like GPT to convert the graph into text that can serve as the scenario of a cybersecurity exercise. We have tested and validated AiCEF with a group of experts in cybersecurity exercises, and the results clearly show that AiCEF significantly augments the capabilities in creating timely and relevant cybersecurity exercises in terms of both quality and time.
We used Python to create a machine-learning-powered Exercise Generation Framework and developed a set of tools that perform individual tasks to help an exercise planner (EP) create a timely and targeted Cybersecurity Exercise Scenario, regardless of their experience.
Problems an Exercise Planner faces:
Our Main Objective: Build an AI-powered tool that can generate relevant and up-to-date cyber exercise content in a few steps, with little technical expertise required from the user.
The updated project, AiCEF v2.0, is planned to be publicly released by the end of 2023, pending heavy code review and functionality updates. Submodules with reduced functionality will start being released by early June 2023. Thank you for your patience.
The most convenient way to install AiCEF is by using the docker-compose command. For production deployment, we advise you to deploy MySQL manually in a dedicated environment and then start the other components using Docker.
First, make sure you have docker-compose installed in your environment:
$ sudo apt-get install docker-compose
Then, clone the repository:
$ git clone https://github.com/grazvan/AiCEF/docker.git /<choose-a-path>/AiCEF-docker
$ cd /<choose-a-path>/AiCEF-docker
Import the MySQL file into your database:
$ mysql -u <your_username> --password=<your_password> AiCEF_db < AiCEF_db.sql
Before running the docker-compose command, settings must be configured. Copy the sample settings file and adjust it to your needs:
$ cp .env.sample .env
Note: Make sure you have an OpenAI API key available. Load the environment settings (including your MySQL connection details):
set -a ; source .env
Finally, run docker-compose in detached (-d) mode:
$ sudo docker-compose up -d
A common usage flow consists of generating a Trend Report to analyze patterns over time, parsing relevant articles and converting them into Incident Breadcrumbs using the MLTP module, and storing them in a knowledge database called KDB. Incidents are then generated using the IncGen component and can be enhanced using the Graph Enhancer module to simulate known APT activity. The incidents come with injects that can be edited on the fly. The CSE scenario is then created using CEGen, which defines various attributes like the CSE name, the number of Events, and Incidents. MLCESO is a crucial step in the methodology where dedicated ML models are trained to extract information from the collected articles with over 80% accuracy. The Incident Generation & Enhancer (IncGen) workflow can be automated, generating a variety of incidents based on filtering parameters and the existing database. The knowledge database (KDB) consists of almost 3000 articles classified into six categories, which can be augmented using the APT Enhancer with the activity of known APT groups from MITRE, or manually.
Find below some sample usage screenshots:
AiCEF is a product designed and developed by Alex Zacharis, Razvan Gavrila and Constantinos Patsakis.
[1] https://link.springer.com/article/10.1007/s10207-023-00693-z
[2] https://oasis-open.github.io/cti-documentation/stix/intro.html
Contributions are welcome! If you'd like to contribute to AiCEF v2.0, please follow these steps:
1. Create a new branch (git checkout -b feature/your-branch-name)
2. Commit your changes (git commit -m 'Add some feature')
3. Push to the branch (git push origin feature/your-branch-name)

AiCEF is licensed under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. See the license text for more information.
Under the following terms:
Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial: You may not use the material for commercial purposes.
No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Polaris is an open source policy engine for Kubernetes that validates and remediates resource configuration. It includes 30+ built-in configuration policies, as well as the ability to build custom policies with JSON Schema. When run on the command line or as a mutating webhook, Polaris can automatically remediate issues based on policy criteria.
Polaris can be run in three different modes:
- As a dashboard, so you can audit the workloads running in your cluster
- As an admission controller, so you can automatically reject or modify workloads that don't conform to your organization's policies
- As a command-line tool, so you can test local YAML files, e.g. as part of a CI/CD process
Check out the documentation at docs.fairwinds.com
The goal of the Fairwinds Community is to exchange ideas, influence the open source roadmap, and network with fellow Kubernetes users. Chat with us on Slack or join the user group to get involved!
Enjoying Polaris? Check out some of our other projects:
If you're interested in running Polaris in multiple clusters, tracking the results over time, integrating with Slack, Datadog, and Jira, or unlocking other functionality, check out Fairwinds Insights, a platform for auditing and enforcing policy in Kubernetes clusters.
A modular web reconnaissance tool and vulnerability scanner based on Karton (https://github.com/CERT-Polska/karton).
The Artemis project has been initiated by the KN Cyber science club of Warsaw University of Technology and is currently being maintained by CERT Polska.
Artemis is experimental software, under active development - use at your own risk.
For an up-to-date list of features, please refer to the documentation.
To run the tests, use:
./scripts/test
Artemis uses pre-commit to run linters and format the code. pre-commit is executed on CI to verify that the code is formatted properly.
To run it locally, use:
pre-commit run --all-files
To set up pre-commit so that it runs before each commit, use:
pre-commit install
To build the documentation, use:
cd docs
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt
make html
Please refer to the documentation.
Contributions are welcome! We will appreciate both ideas for new Artemis modules (added as GitHub issues) and pull requests with new modules or code improvements.
However obvious it may seem, we kindly remind you that by contributing to Artemis you agree that the BSD 3-Clause License shall apply to your input automatically, without the need for any additional declarations to be made.
ReconAIzer is a powerful Jython extension for Burp Suite that leverages OpenAI to help bug bounty hunters optimize their recon process. This extension automates various tasks, making it easier and faster for security researchers to identify and exploit vulnerabilities.
Once installed, ReconAIzer adds a contextual menu and a dedicated tab where you can see the results:
Follow these steps to install the ReconAIzer extension on Burp Suite:
Select the ReconAIzer.py file from Step 3.1 and click "Open."

Congratulations! You have successfully installed the ReconAIzer extension in Burp Suite. You can now start using it to enhance your bug bounty hunting experience.
Once it's done, you must configure your OpenAI API key in the "Config" tab under the "ReconAIzer" tab.
Feel free to suggest prompts improvements or anything you would like to see on ReconAIzer!
Happy bug hunting!
burpgpt leverages the power of AI to detect security vulnerabilities that traditional scanners might miss. It sends web traffic to an OpenAI model specified by the user, enabling sophisticated analysis within the passive scanner. This extension offers customisable prompts that enable tailored web traffic analysis to meet the specific needs of each user. Check out the Example Use Cases section for inspiration.
The extension generates an automated security report that summarises potential security issues based on the user's prompt and real-time data from Burp-issued requests. By leveraging AI and natural language processing, the extension streamlines the security assessment process and provides security professionals with a higher-level overview of the scanned application or endpoint. This enables them to more easily identify potential security issues and prioritise their analysis, while also covering a larger potential attack surface.
[!WARNING] Data traffic is sent to OpenAI for analysis. If you have concerns about this or are using the extension for security-critical applications, it is important to carefully consider this and review OpenAI's Privacy Policy for further information.
[!WARNING] While the report is automated, it still requires triaging and post-processing by security professionals, as it may contain false positives.
[!WARNING] The effectiveness of this extension is heavily reliant on the quality and precision of the prompts created by the user for the selected GPT model. This targeted approach will help ensure the GPT model generates accurate and valuable results for your security analysis.
- Adds a passive scan check, allowing users to submit HTTP data to an OpenAI-controlled GPT model for analysis through a placeholder system.
- Leverages OpenAI's GPT models to conduct comprehensive traffic analysis, enabling detection of various issues beyond just security vulnerabilities in scanned applications.
- Enables fine-grained control over the number of GPT tokens used in the analysis by allowing precise adjustments of the maximum prompt length.
- Offers multiple OpenAI models to choose from, allowing users to select the one that best suits their needs.
- Lets users customise prompts and unleash limitless possibilities for interacting with OpenAI models. Browse through the Example Use Cases for inspiration.
- Integrates with Burp Suite, providing all native features for pre- and post-processing, including displaying analysis results directly within the Burp UI for efficient analysis.
- Provides troubleshooting functionality via the native Burp Event Log, enabling users to quickly resolve communication issues with the OpenAI API.
Operating System: Compatible with Linux, macOS, and Windows operating systems.
Java Development Kit (JDK): Version 11 or later.
Burp Suite Professional or Community Edition: Version 2023.3.2 or later.
[!IMPORTANT] Please note that using any version lower than 2023.3.2 may result in a java.lang.NoSuchMethodError. It is crucial to use the specified version or a more recent one to avoid this issue.
Gradle: Version 6.9 or later (recommended). The build.gradle file is provided in the project repository.
Environment variable: Set the JAVA_HOME environment variable to point to the JDK installation directory.

Please ensure that all system requirements, including a compatible version of Burp Suite, are met before building and running the project. Note that the project's external dependencies will be automatically managed and installed by Gradle during the build process. Adhering to the requirements will help avoid potential issues and reduce the need to open new issues in the project repository.
Ensure you have Gradle installed and configured.
Download the burpgpt repository:
git clone https://github.com/aress31/burpgpt
cd .\burpgpt\
Build the standalone jar:
./gradlew shadowJar
To install burpgpt in Burp Suite, first go to the Extensions tab and click on the Add button. Then, select the burpgpt-all jar file located in the .\lib\build\libs folder to load the extension.
To start using burpgpt, users need to complete the following steps in the Settings panel, which can be accessed from the Burp Suite menu bar:
- Enter a valid OpenAI API key.
- Select a model.
- Define the max prompt size. This field controls the maximum prompt length sent to OpenAI to avoid exceeding the maxTokens of GPT models (typically around 2048 for GPT-3).

Once configured as outlined above, the Burp passive scanner sends each request to the chosen OpenAI model via the OpenAI API for analysis, producing Informational-level severity findings based on the results.
burpgpt enables users to tailor the prompt for traffic analysis using a placeholder system. To include relevant information, we recommend using these placeholders, which the extension handles directly, allowing dynamic insertion of specific values into the prompt:
| Placeholder | Description |
|---|---|
| {REQUEST} | The scanned request. |
| {URL} | The URL of the scanned request. |
| {METHOD} | The HTTP request method used in the scanned request. |
| {REQUEST_HEADERS} | The headers of the scanned request. |
| {REQUEST_BODY} | The body of the scanned request. |
| {RESPONSE} | The scanned response. |
| {RESPONSE_HEADERS} | The headers of the scanned response. |
| {RESPONSE_BODY} | The body of the scanned response. |
| {IS_TRUNCATED_PROMPT} | A boolean value that is programmatically set to true or false to indicate whether the prompt was truncated to the Maximum Prompt Size defined in the Settings. |
These placeholders can be used in the custom prompt to dynamically generate a request/response analysis prompt that is specific to the scanned request.
[!NOTE] Burp Suite provides the capability to support arbitrary placeholders through the use of Session handling rules or extensions such as Custom Parameter Handler, allowing for even greater customisation of the prompts.
The following list of example use cases showcases the bespoke and highly customisable nature of burpgpt, which enables users to tailor their web traffic analysis to meet their specific needs.
Identifying potential vulnerabilities in web applications that use a crypto library affected by a specific CVE:
Analyse the request and response data for potential security vulnerabilities related to the {CRYPTO_LIBRARY_NAME} crypto library affected by CVE-{CVE_NUMBER}:
Web Application URL: {URL}
Crypto Library Name: {CRYPTO_LIBRARY_NAME}
CVE Number: CVE-{CVE_NUMBER}
Request Headers: {REQUEST_HEADERS}
Response Headers: {RESPONSE_HEADERS}
Request Body: {REQUEST_BODY}
Response Body: {RESPONSE_BODY}
Identify any potential vulnerabilities related to the {CRYPTO_LIBRARY_NAME} crypto library affected by CVE-{CVE_NUMBER} in the request and response data and report them.
Scanning for vulnerabilities in web applications that use biometric authentication by analysing request and response data related to the authentication process:
Analyse the request and response data for potential security vulnerabilities related to the biometric authentication process:
Web Application URL: {URL}
Biometric Authentication Request Headers: {REQUEST_HEADERS}
Biometric Authentication Response Headers: {RESPONSE_HEADERS}
Biometric Authentication Request Body: {REQUEST_BODY}
Biometric Authentication Response Body: {RESPONSE_BODY}
Identify any potential vulnerabilities related to the biometric authentication process in the request and response data and report them.
Analysing the request and response data exchanged between serverless functions for potential security vulnerabilities:
Analyse the request and response data exchanged between serverless functions for potential security vulnerabilities:
Serverless Function A URL: {URL}
Serverless Function B URL: {URL}
Serverless Function A Request Headers: {REQUEST_HEADERS}
Serverless Function B Response Headers: {RESPONSE_HEADERS}
Serverless Function A Request Body: {REQUEST_BODY}
Serverless Function B Response Body: {RESPONSE_BODY}
Identify any potential vulnerabilities in the data exchanged between the two serverless functions and report them.
Analysing the request and response data for potential security vulnerabilities specific to a Single-Page Application (SPA) framework:
Analyse the request and response data for potential security vulnerabilities specific to the {SPA_FRAMEWORK_NAME} SPA framework:
Web Application URL: {URL}
SPA Framework Name: {SPA_FRAMEWORK_NAME}
Request Headers: {REQUEST_HEADERS}
Response Headers: {RESPONSE_HEADERS}
Request Body: {REQUEST_BODY}
Response Body: {RESPONSE_BODY}
Identify any potential vulnerabilities related to the {SPA_FRAMEWORK_NAME} SPA framework in the request and response data and report them.
Planned improvements include:
- A Settings panel option that allows users to set the maxTokens limit for requests, thereby limiting the request size.
- Support for a locally hosted AI model, allowing users to run and interact with the model on their local machines, potentially improving response times and data privacy.
- Retrieving the precise maxTokens value for each model to transmit the maximum allowable data and obtain the most extensive GPT response possible.
- Retaining configuration across Burp Suite restarts.
- Parsing GPT responses into the Vulnerability model for improved reporting.

The extension is currently under development and we welcome feedback, comments, and contributions to make it even better.
If this extension has saved you time and hassle during a security assessment, consider showing some love by sponsoring a cup of coffee for the developer. It's the fuel that powers development, after all. Just hit that shiny Sponsor button at the top of the page or click here to contribute and keep the caffeine flowing.

Did you find a bug? Well, don't just let it crawl around! Let's squash it together like a couple of bug whisperers!
Please report any issues on the GitHub issues tracker. Together, we'll make this extension as reliable as a cockroach surviving a nuclear apocalypse!
Looking to make a splash with your mad coding skills?
Awesome! Contributions are welcome and greatly appreciated. Please submit all PRs on the GitHub pull requests tracker. Together we can make this extension even more amazing!
See LICENSE.
This tool is a simple PoC of how to hide memory artifacts using a ROP chain in combination with hardware breakpoints. The ROP chain changes the main module's memory page protections to N/A while sleeping (i.e. when the function Sleep is called). For more detailed information about this memory scanning evasion technique, check out the original Gargoyle project. x64 only.
The idea is to set up a hardware breakpoint on kernel32!Sleep and a new top-level exception filter to handle the exception. When Sleep is called, the previously set exception filter function is triggered, allowing us to call the ROP chain without the need for classic function hooks. This way, we avoid leaving weird and unusual private memory regions in the process related to well-known DLLs.
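To picture the mechanism, here is a minimal, hypothetical Rust sketch (not the project's code) of registering a top-level exception filter that reacts to the single-step exception raised when a hardware breakpoint fires. It assumes the winapi crate with the errhandlingapi and winnt features, and it deliberately omits the debug-register setup on kernel32!Sleep, so nothing triggers the filter if you run it as-is:

```rust
use winapi::um::errhandlingapi::SetUnhandledExceptionFilter;
use winapi::um::winnt::EXCEPTION_POINTERS;

// Constants from winnt.h / excpt.h, spelled out to keep the sketch self-contained.
const EXCEPTION_SINGLE_STEP: u32 = 0x8000_0004; // raised when a hardware breakpoint fires
const EXCEPTION_CONTINUE_EXECUTION: i32 = -1;
const EXCEPTION_CONTINUE_SEARCH: i32 = 0;

unsafe extern "system" fn filter(info: *mut EXCEPTION_POINTERS) -> i32 {
    let record = &*(*info).ExceptionRecord;
    if record.ExceptionCode == EXCEPTION_SINGLE_STEP {
        // Here the real tool would redirect execution into the ROP chain
        // instead of letting it continue into kernel32!Sleep.
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    EXCEPTION_CONTINUE_SEARCH
}

fn main() {
    unsafe {
        // Register the top-level filter; setting Dr0/Dr7 on kernel32!Sleep
        // (e.g. via the thread's debug registers) is omitted from this sketch.
        SetUnhandledExceptionFilter(Some(filter));
    }
    // ...normal program flow would continue here; calling Sleep would then trip the breakpoint...
}
```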
The ROP chain simply calls VirtualProtect() to set the current memory page to N/A, then calls SleepEx and finally restores the RX memory protection.
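The sequence of calls the chain strings together can be illustrated with another hypothetical Rust sketch (again assuming the winapi crate, here with the memoryapi, synchapi and winnt features). It performs the same VirtualProtect / SleepEx / VirtualProtect dance on a dummy allocation rather than on the main module, since straight-line code cannot mark its own page N/A without faulting, which is exactly why the real tool routes these calls through ROP gadgets living outside the protected region:

```rust
use std::ptr::null_mut;
use winapi::um::memoryapi::{VirtualAlloc, VirtualFree, VirtualProtect};
use winapi::um::synchapi::SleepEx;
use winapi::um::winnt::{MEM_COMMIT, MEM_RELEASE, MEM_RESERVE, PAGE_EXECUTE_READ, PAGE_NOACCESS};

fn main() {
    unsafe {
        // Stand-in for the main module's executable page.
        let page = VirtualAlloc(null_mut(), 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READ);
        assert!(!page.is_null(), "VirtualAlloc failed");

        let mut old = 0u32;
        // 1. Drop the page to N/A so a memory scan finds no executable content here...
        VirtualProtect(page, 0x1000, PAGE_NOACCESS, &mut old);
        // 2. ...sleep while the page is unreadable...
        SleepEx(5_000, 0);
        // 3. ...and restore RX before execution would ever return to it.
        VirtualProtect(page, 0x1000, PAGE_EXECUTE_READ, &mut old);

        VirtualFree(page, 0, MEM_RELEASE);
    }
}
```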
The overview of the process is as follows:
1. A hardware breakpoint is placed on kernel32!Sleep and the top-level exception filter is registered.
2. When the program calls Sleep, the breakpoint fires and the exception filter takes control.
3. The filter hands execution to the ROP chain, which calls VirtualProtect() to set the main module's memory page to N/A.
4. The chain calls SleepEx to perform the actual sleep.
5. When the sleep ends, the chain restores the RX protection and normal execution resumes.

This process repeats indefinitely.
As can be seen in the image, the main module's memory protection is changed to N/A while sleeping, which evades memory scans looking for pages with execution permissions.
Since we are using the LITCRYPT plugin to obfuscate string literals, the environment variable LITCRYPT_ENCRYPT_KEY must be set before compiling the code:
C:\Users\User\Desktop\RustChain> set LITCRYPT_ENCRYPT_KEY="yoursupersecretkey"
After that, simply compile the code and run the tool:
C:\Users\User\Desktop\RustChain> cargo build
C:\Users\User\Desktop\RustChain\target\debug> rustchain.exe
This tool is just a PoC, and some extra features would need to be implemented for it to be fully functional. The main purpose of the project was to learn how to implement a ROP chain and integrate it within Rust. Because of that, this tool will only work if you use it as it is; failures are expected if you try to use it in other ways (for example, compiling it to a DLL and trying to reflectively load and execute it).