
Amazon Is Investigating Perplexity Over Claims of Scraping Abuse

AWS hosted a server linked to the Bezos family- and Nvidia-backed search startup that appears to have been used to scrape the sites of major outlets, prompting an inquiry into potential rules violations.

JAW - A Graph-based Security Analysis Framework For Client-side JavaScript

By: Zion3R

An open-source prototype implementation of property graphs for JavaScript, based on the esprima parser and the EsTree SpiderMonkey spec. JAW can be used for analyzing the client-side of web applications and JavaScript-based programs.

This project is licensed under GNU AFFERO GENERAL PUBLIC LICENSE V3.0. See here for more information.

JAW has a Github pages website available at https://soheilkhodayari.github.io/JAW/.

Release Notes:


Overview of JAW

The architecture of JAW is shown below.

Test Inputs

JAW can be used in two distinct ways:

  1. Arbitrary JavaScript Analysis: Utilize JAW for modeling and analyzing any JavaScript program by specifying the program's file system path.

  2. Web Application Analysis: Analyze a web application by providing a single seed URL.

Data Collection

  • JAW features several JavaScript-enabled web crawlers for collecting web resources at scale.

HPG Construction

  • Use the collected web resources to create a Hybrid Program Graph (HPG), which will be imported into a Neo4j database.

  • Optionally, supply the HPG construction module with a mapping of semantic types to custom JavaScript language tokens, facilitating the categorization of JavaScript functions based on their purpose (e.g., HTTP request functions).

Analysis and Outputs

  • Query the constructed Neo4j graph database for various analyses. JAW offers utility traversals for data flow analysis, control flow analysis, reachability analysis, and pattern matching. These traversals can be used to develop custom security analyses.

  • JAW also includes built-in traversals for detecting client-side CSRF, DOM Clobbering and request hijacking vulnerabilities.

  • The outputs will be stored in the same folder as that of input.

Setup

The installation script relies on the following prerequisites:

  • Latest version of the npm package manager (Node.js)
  • Any stable version of Python 3.x
  • The Python pip package manager

Afterwards, install the necessary dependencies via:

$ ./install.sh

For detailed installation instructions, please see here.

Quick Start

Running the Pipeline

You can run an instance of the pipeline in a background screen via:

$ python3 -m run_pipeline --conf=config.yaml

The CLI provides the following options:

$ python3 -m run_pipeline -h

usage: run_pipeline.py [-h] [--conf FILE] [--site SITE] [--list LIST] [--from FROM] [--to TO]

This script runs the tool pipeline.

optional arguments:
-h, --help show this help message and exit
--conf FILE, -C FILE pipeline configuration file. (default: config.yaml)
--site SITE, -S SITE website to test; overrides config file (default: None)
--list LIST, -L LIST site list to test; overrides config file (default: None)
--from FROM, -F FROM the first entry to consider when a site list is provided; overrides config file (default: -1)
--to TO, -T TO the last entry to consider when a site list is provided; overrides config file (default: -1)

Input Config: JAW expects a .yaml config file as input. See config.yaml for an example.

Hint. The config file specifies different passes (e.g., crawling, static analysis, etc.), which can be enabled or disabled for each vulnerability class. This allows running the tool's building blocks individually, or in a different order (e.g., crawl all webapps first, then conduct the security analysis).
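
As a sketch of that workflow, the snippet below programmatically toggles the per-vulnerability enabled flags (the key names follow the config excerpt shown later in the Vulnerability Detection section) and then launches the pipeline; treat it as an illustration rather than an official API.

# Sketch: enable only one analysis component, write a derived config, and run the pipeline.
# Assumes top-level keys like request_hijacking/domclobbering/cs_csrf with an "enabled" flag,
# as in the example config.yaml shown in the Vulnerability Detection section.
import subprocess
import yaml  # pip install pyyaml

with open("config.yaml") as f:
    conf = yaml.safe_load(f)

conf["request_hijacking"]["enabled"] = True
conf["domclobbering"]["enabled"] = False
conf["cs_csrf"]["enabled"] = False

with open("conf-request-hijacking.yaml", "w") as f:
    yaml.safe_dump(conf, f)

subprocess.run(["python3", "-m", "run_pipeline", "--conf=conf-request-hijacking.yaml"], check=True)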

Quick Example

For running a quick example demonstrating how to build a property graph and run Cypher queries over it, do:

$ python3 -m analyses.example.example_analysis --input=$(pwd)/data/test_program/test.js

Crawling and Data Collection

This module collects the data (i.e., JavaScript code and state values of web pages) needed for testing. If you want to test a specific JavaScript file that you already have on your file system, you can skip this step.

JAW has crawlers based on Selenium (JAW-v1), Puppeteer (JAW-v2, v3) and Playwright (JAW-v3). For most up-to-date features, it is recommended to use the Puppeteer- or Playwright-based versions.

Playwright CLI with Foxhound

This web crawler employs foxhound, an instrumented version of Firefox, to perform dynamic taint tracking as it navigates through webpages. To start the crawler, do:

$ cd crawler
$ node crawler-taint.js --seedurl=https://google.com --maxurls=100 --headless=true --foxhoundpath=<optional-foxhound-executable-path>

The foxhoundpath is by default set to the following directory: crawler/foxhound/firefox which contains a binary named firefox.

Note: you need a build of foxhound to use this version. An ubuntu build is included in the JAW-v3 release.

Puppeteer CLI

To start the crawler, do:

$ cd crawler
$ node crawler.js --seedurl=https://google.com --maxurls=100 --browser=chrome --headless=true

See here for more information.

Selenium CLI

To start the crawler, do:

$ cd crawler/hpg_crawler
$ vim docker-compose.yaml # set the websites you want to crawl here and save
$ docker-compose build
$ docker-compose up -d

Please refer to the documentation of the hpg_crawler here for more information.

Graph Construction

HPG Construction CLI

To generate an HPG for a given (set of) JavaScript file(s), do:

$ node engine/cli.js  --lang=js --graphid=graph1 --input=/in/file1.js --input=/in/file2.js --output=$(pwd)/data/out/ --mode=csv

optional arguments:
--lang: language of the input program
--graphid: an identifier for the generated HPG
--input: path of the input program(s)
--output: path of the output HPG, must be i
--mode: determines the output format (csv or graphML)

HPG Import CLI

To import an HPG inside a neo4j graph database (docker instance), do:

$ python3 -m hpg_neo4j.hpg_import --rpath=<path-to-the-folder-of-the-csv-files> --id=<xyz> --nodes=<nodes.csv> --edges=<rels.csv>
$ python3 -m hpg_neo4j.hpg_import -h

usage: hpg_import.py [-h] [--rpath P] [--id I] [--nodes N] [--edges E]

This script imports a CSV of a property graph into a neo4j docker database.

optional arguments:
-h, --help show this help message and exit
--rpath P relative path to the folder containing the graph CSV files inside the `data` directory
--id I an identifier for the graph or docker container
--nodes N the name of the nodes csv file (default: nodes.csv)
--edges E the name of the relations csv file (default: rels.csv)

HPG Construction and Import CLI (v1)

In order to create a hybrid property graph for the output of the hpg_crawler and import it inside a local neo4j instance, you can also do:

$ python3 -m engine.api <path> --js=<program.js> --import=<bool> --hybrid=<bool> --reqs=<requests.out> --evts=<events.out> --cookies=<cookies.pkl> --html=<html_snapshot.html>

Specification of Parameters:

  • <path>: absolute path to the folder containing the program files for analysis (must be under the engine/outputs folder).
  • --js=<program.js>: name of the JavaScript program for analysis (default: js_program.js).
  • --import=<bool>: whether the constructed property graph should be imported to an active neo4j database (default: true).
  • --hybrid=bool: whether the hybrid mode is enabled (default: false). This implies that the tester wants to enrich the property graph by inputing files for any of the HTML snapshot, fired events, HTTP requests and cookies, as collected by the JAW crawler.
  • --reqs=<requests.out>: for hybrid mode only, name of the file containing the sequence of obsevered network requests, pass the string false to exclude (default: request_logs_short.out).
  • --evts=<events.out>: for hybrid mode only, name of the file containing the sequence of fired events, pass the string false to exclude (default: events.out).
  • --cookies=<cookies.pkl>: for hybrid mode only, name of the file containing the cookies, pass the string false to exclude (default: cookies.pkl).
  • --html=<html_snapshot.html>: for hybrid mode only, name of the file containing the DOM tree snapshot, pass the string false to exclude (default: html_rendered.html).

For more information, you can use the help CLI provided with the graph construction API:

$ python3 -m engine.api -h

Security Analysis

The constructed HPG can then be queried using Cypher or the NeoModel ORM.

Running Custom Graph traversals

You should place and run your queries in analyses/<ANALYSIS_NAME>.

Option 1: Using the NeoModel ORM (Deprecated)

You can use the NeoModel ORM to query the HPG. To write a query:

  • (1) Check out the HPG data model and syntax tree.
  • (2) Check out the ORM model for HPGs
  • (3) See the example query file provided; example_query_orm.py in the analyses/example folder.
$ python3 -m analyses.example.example_query_orm  

For more information, please see here.

Option 2: Using Cypher Queries

You can use Cypher to write custom queries. For this:

  • (1) Check out the HPG data model and syntax tree.
  • (2) See the example query file provided; example_query_cypher.py in the analyses/example folder.
$ python3 -m analyses.example.example_query_cypher

For more information, please see here.
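
As a minimal sketch of what a Cypher-based analysis can look like, the snippet below uses the official neo4j Python driver against the imported HPG. The connection details and the node property names are assumptions, so align them with your Neo4j setup and the HPG data model.

# Minimal sketch: run a Cypher query against the imported HPG with the neo4j Python driver.
# The bolt URI, credentials, and the "Type" property are assumptions -- adapt to your setup.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4jpass"))

with driver.session() as session:
    # Sanity check: count AST call-expression nodes in the imported graph.
    record = session.run(
        "MATCH (n {Type: 'CallExpression'}) RETURN count(n) AS calls"
    ).single()
    print(record["calls"])

driver.close()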

Vulnerability Detection

This section describes how to configure and use JAW for vulnerability detection, and how to interpret the output. JAW contains, among others, self-contained queries for detecting client-side CSRF and DOM Clobbering.

Step 1. Enable the analysis component for the vulnerability class in the input config.yaml file:

request_hijacking:
  enabled: true
  # [...]

domclobbering:
  enabled: false
  # [...]

cs_csrf:
  enabled: false
  # [...]

Step 2. Run an instance of the pipeline with:

$ python3 -m run_pipeline --conf=config.yaml

Hint. You can run multiple instances of the pipeline under different screens:

$ screen -dmS s1 bash -c 'python3 -m run_pipeline --conf=conf1.yaml; exec sh'
$ screen -dmS s2 bash -c 'python3 -m run_pipeline --conf=conf2.yaml; exec sh'
$ # [...]

To generate parallel configuration files automatically, you may use the generate_config.py script.

How to Interpret the Output of the Analysis?

The outputs will be stored in a file called sink.flows.out in the same folder as the input. For client-side CSRF, for example, for each HTTP request detected, JAW outputs an entry marking the set of semantic types (a.k.a. semantic tags or labels) associated with the elements constructing the request (i.e., the program slices). For example, an HTTP request marked with the semantic type ['WIN.LOC'] is forgeable through the window.location injection point, whereas a request marked with ['NON-REACH'] is not forgeable.

An example output entry is shown below:

[*] Tags: ['WIN.LOC']
[*] NodeId: {'TopExpression': '86', 'CallExpression': '87', 'Argument': '94'}
[*] Location: 29
[*] Function: ajax
[*] Template: ajaxloc + "/bearer1234/"
[*] Top Expression: $.ajax({ xhrFields: { withCredentials: "true" }, url: ajaxloc + "/bearer1234/" })

1:['WIN.LOC'] variable=ajaxloc
0 (loc:6)- var ajaxloc = window.location.href

This entry shows that on line 29 there is a $.ajax call expression, and this call expression triggers an ajax request with the URL template value ajaxloc + "/bearer1234/", where the parameter ajaxloc is a program slice reading its value at line 6 from window.location.href, and is thus forgeable through ['WIN.LOC'].

Test Web Application

In order to streamline the testing process for JAW and ensure that your setup is accurate, we provide a simple node.js web application which you can test JAW with.

First, install the dependencies via:

$ cd tests/test-webapp
$ npm install

Then, run the application in a new screen:

$ screen -dmS jawwebapp bash -c 'PORT=6789 npm run devstart; exec sh'

Detailed Documentation

For more information, visit our wiki page here. Below is a table of contents for quick access.

The Web Crawler of JAW

Data Model of Hybrid Property Graphs (HPGs)

Graph Construction

Graph Traversals

Contribution and Code Of Conduct

Pull requests are always welcomed. This project is intended to be a safe, welcoming space, and contributors are expected to adhere to the contributor code of conduct.

Academic Publication

If you use the JAW for academic research, we encourage you to cite the following paper:

@inproceedings{JAW,
title = {JAW: Studying Client-side CSRF with Hybrid Property Graphs and Declarative Traversals},
author= {Soheil Khodayari and Giancarlo Pellegrino},
booktitle = {30th {USENIX} Security Symposium ({USENIX} Security 21)},
year = {2021},
address = {Vancouver, B.C.},
publisher = {{USENIX} Association},
}

Acknowledgements

JAW has come a long way and we want to give our contributors a well-deserved shoutout here!

@tmbrbr, @c01gide, @jndre, and Sepehr Mirzaei.



Huntr-Com-Bug-Bounties-Collector - Keep Watching New Bug Bounty (Vulnerability) Postings

By: Zion3R


New bug bounty (vulnerability) collector


Requirements
  • Chrome with GUI (If you encounter trouble with script execution, check the status of VMs GPU features, if available.)
  • Chrome WebDriver

Preview
# python3 main.py

*2024-02-20 16:14:47.836189*

1. Arbitrary File Reading due to Lack of Input Filepath Validation
- Feb 6th 2024 / High (CVE-2024-0964)
- gradio-app/gradio
- https://huntr.com/bounties/25e25501-5918-429c-8541-88832dfd3741/

2. View Barcode Image leads to Remote Code Execution
- Jan 31st 2024 / Critical (CVE: Not yet)
- dolibarr/dolibarr
- https://huntr.com/bounties/f0ffd01e-8054-4e43-96f7-a0d2e652ac7e/

(delimiter-based file database)

# vim feeds.db

1|2024-02-20 16:17:40.393240|7fe14fd58ca2582d66539b2fe178eeaed3524342|CVE-2024-0964|https://huntr.com/bounties/25e25501-5918-429c-8541-88832dfd3741/
2|2024-02-20 16:17:40.393987|c6b84ac808e7f229a4c8f9fbd073b4c0727e07e1|CVE: Not yet|https://huntr.com/bounties/f0ffd01e-8054-4e43-96f7-a0d2e652ac7e/
3|2024-02-20 16:17:40.394582|7fead9658843919219a3b30b8249700d968d0cc9|CVE: Not yet|https://huntr.com/bounties/d6cb06dc-5d10-4197-8f89-847c3203d953/
4|2024-02-20 16:17:40.395094|81fecdd74318ce7da9bc29e81198e62f3225bd44|CVE: Not yet|https://huntr.com/bounties/d875d1a2-7205-4b2b-93cf-439fa4c4f961/
5|2024-02-20 16:17:40.395613|111045c8f1a7926174243db403614d4a58dc72ed|CVE: Not yet|https://huntr.com/bounties/10e423cd-7051-43fd-b736-4e18650d0172/
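
The delimiter-based file above is easy to consume from other scripts; here is a small Python sketch that reads it back into records, with the column layout inferred from the example rows (index, timestamp, hash, CVE id, URL).

# Sketch: parse the pipe-delimited feeds.db shown above into simple records.
from collections import namedtuple

Entry = namedtuple("Entry", "idx timestamp digest cve url")

with open("feeds.db", encoding="utf-8") as f:
    entries = [Entry(*line.rstrip("\n").split("|")) for line in f if line.strip()]

for e in entries:
    print(e.cve, e.url)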

Notes
  • This code is designed to parse HTML elements from huntr.com, so it may not function correctly if the HTML page structure changes.
  • In case of errors during parsing, exception handling has been included, so if it doesn't work as expected, please inspect the HTML source for any changes.
  • If get in trouble In a typical cloud environment, scripts may not function properly within virtual machines (VMs).


PhantomCrawler - Boost Website Hits By Generating Requests From Multiple Proxy IPs

By: Zion3R


PhantomCrawler allows users to simulate website interactions through different proxy IP addresses. It leverages Python, requests, and BeautifulSoup to offer a simple and effective way to test website behaviour under varied proxy configurations.

Features:

  • Utilizes a list of proxy IP addresses from a specified file.
  • Supports both HTTP and HTTPS proxies.
  • Allows users to input the target website URL, proxy file path, and a static port.
  • Makes HTTP requests to the specified website using each proxy.
  • Parses HTML content to extract and visit links on the webpage.

Usage:

  • POC Testing: Simulate website interactions to assess functionality under different proxy setups.
  • Web Traffic Increase: Boost website hits by generating requests from multiple proxy IPs.
  • Proxy Rotation Testing: Evaluate the effectiveness of rotating proxy IPs.
  • Web Scraping Testing: Assess web scraping tasks under different proxy configurations.
  • DDoS Awareness: Caution: The tool has the potential for misuse as a DDoS tool. Ensure responsible and ethical use.

Get new proxies (with port) and add them to proxies.txt in this format: 50.168.163.176:80
  • You can get them from https://free-proxy-list.net/. These free proxies are not validated and some might not work, so validate them before adding (see the sketch below).

How to Use:

  1. Clone the repository:
git clone https://github.com/spyboy-productions/PhantomCrawler.git
  2. Install dependencies:
pip3 install -r requirements.txt
  3. Run the script:
python3 PhantomCrawler.py

Disclaimer: PhantomCrawler is intended for educational and testing purposes only. Users are cautioned against any misuse, including potential DDoS activities. Always ensure compliance with the terms of service of websites being tested and adhere to ethical standards.


Snapshots:

If you find this GitHub repo useful, please consider giving it a star!



Crawlector - Threat Hunting Framework Designed For Scanning Websites For Malicious Objects

By: Zion3R


Crawlector (the name Crawlector is a combination of Crawler & Detector) is a threat hunting framework designed for scanning websites for malicious objects.

Note-1: The framework was first presented at the No Hat conference in Bergamo, Italy on October 22nd, 2022 (Slides, YouTube Recording). Also, it was presented for the second time at the AVAR conference, in Singapore, on December 2nd, 2022.

Note-2: The accompanying tool EKFiddle2Yara (a tool that takes EKFiddle rules and converts them into Yara rules), mentioned in the talk, was also released at both conferences.


Features

  • Supports spidering websites for findings additional links for scanning (up to 2 levels only)
  • Integrates Yara as a backend engine for rule scanning
  • Supports online and offline scanning
  • Supports crawling for domains/sites digital certificate
  • Supports querying URLhaus for finding malicious URLs on the page
  • Supports hashing the page's content with TLSH (Trend Micro Locality Sensitive Hash), and other standard cryptographic hash functions such as md5, sha1, sha256, and ripemd128, among others
    • TLSH won't return a value if the page size is less than 50 bytes or not "enough amount of randomness" is present in the data
  • Supports querying the rating and category of every URL
  • Supports expanding on a given site, by attempting to find all available TLDs and/or subdomains for the same domain
    • This feature uses the Omnisint Labs API (this site is down as of March 10, 2023) and RapidAPI APIs
    • TLD expansion implementation is native
    • This feature along with the rating and categorization, provides the capability to find scam/phishing/malicious domains for the original domain
  • Supports domain resolution (IPv4 and IPv6)
  • Saves scanned websites pages for later scanning (can be saved as a zip compressed)
  • The entirety of the framework’s settings is controlled via a single customizable configuration file
  • All scanning sessions are saved into a well-structured CSV file with a plethora of information about the website being scanned, in addition to information about the Yara rules that have triggered
  • All HTTP(S) communications are proxy-aware
  • One executable
  • Written in C++

URLHaus Scanning & API Integration

This checks every scanned page for known malicious URLs. The framework can query the list of malicious URLs either from the URLHaus server (configuration: url_list_web) or from a file on disk (configuration: url_list_file); if the latter is specified, it takes precedence over the former.

It works by searching the content of every page against all URL entries in url_list_web or url_list_file, checking for all occurrences. Additionally, upon a match, and if the configuration option check_url_api is set to true, Crawlector will send a POST request to the API URL set in the url_api configuration option, which returns a JSON object with extra information about a matching URL. Such information includes urlh_status (ex., online, offline, unknown), urlh_threat (ex., malware_download), urlh_tags (ex., elf, Mozi), and urlh_reference (ex., https://urlhaus.abuse.ch/url/1116455/). This information will be included in the log file cl_mlog_<current_date><current_time><(pm|am)>.csv (check below), only if check_url_api is set to true. Otherwise, the log file will include the columns urlh_url (list of matching malicious URLs) and urlh_hit (number of occurrences for every matching malicious URL), conditional on whether check_url is set to true.

The URLHaus feature can be disabled in its entirety by setting the configuration option check_url to false.

It is important to note that this feature could slow down scanning, considering the huge number of malicious URLs (~130 million entries at the time of this writing) that need to be checked, and the time it takes to get extra information from the URLHaus server (if the option check_url_api is set to true).
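
To illustrate the kind of lookup performed when check_url_api is enabled, here is a Python sketch against the public URLhaus API; the endpoint and JSON field names are assumptions based on that public API, and Crawlector's url_api option should be set to whatever endpoint you actually use.

# Sketch: query the URLhaus API for a matched URL and map the response onto the
# urlh_* columns described above. Endpoint and field names are assumptions.
import requests

URL_API = "https://urlhaus-api.abuse.ch/v1/url/"  # assumed public URLhaus endpoint

def urlhaus_lookup(url):
    resp = requests.post(URL_API, data={"url": url}, timeout=10)
    info = resp.json()
    return {
        "urlh_status": info.get("url_status"),           # e.g. online, offline, unknown
        "urlh_threat": info.get("threat"),                # e.g. malware_download
        "urlh_tags": info.get("tags"),                    # e.g. ["elf", "Mozi"]
        "urlh_reference": info.get("urlhaus_reference"),  # link to the URLhaus entry
    }

print(urlhaus_lookup("http://example.com/suspicious"))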

Files and Folders Structures

  1. \cl_sites
    • this is where the list of sites to be visited or crawled is stored.
    • supports multiple files and directories.
  2. \crawled
    • where all crawled/spidered URLs are saved to a text file.
  3. \certs
    • where all domains/sites digital certificates are stored (in .der format).
  4. \results
    • where visited websites are saved.
  5. \pg_cache
    • program cache for sites that are not part of the spider functionality.
  6. \cl_cache
    • crawler cache for sites that are part of the spider functionality.
  7. \yara_rules
    • this is where all Yara rules are stored. All rules that exist in this directory will be loaded by the engine, parsed, validated, and evaluated before execution.
  8. cl_config.ini
    • this file contains all the configuration parameters that can be adjusted to influence the behavior of the framework.
  9. cl_mlog_<current_date><current_time><(pm|am)>.csv
    • log file that contains a plethora of information about visited websites
    • date, time, the status of Yara scanning, list of fired Yara rules with the offsets and lengths of each of the matches, id, URL, HTTP status code, connection status, HTTP headers, page size, the path to a saved page on disk, and other columns related to URLHaus results.
    • file name is unique per session.
  10. cl_offl_mlog_<current_date><current_time><(pm|am)>.csv
    • log file that contains information about files scanned offline.
    • list of fired Yara rules with the offsets and lengths of the matches, and path to a saved page on disk.
    • file name is unique per session.
  11. cl_certs_<current_date><current_time><(pm|am)>.csv
    • log file that contains a plethora of information about found digital certificates
  12. \expanded\exp_subdomain_<pm|am>.txt
    • contains discovered subdomains (part of the [site] section)
  13. \expanded\exp_tld_<pm|am>.txt
    • contains discovered domains (part of the [site] section)

Configuration File (cl_config.ini)

It is very important that you familiarize yourself with the configuration file cl_config.ini before running any session. All of the sections and parameters are documented in the configuration file itself.

The Yara offline scanning feature is a standalone option, meaning that, if enabled, Crawlector will execute only this feature, irrespective of other enabled features. The same is true for the crawling for domains'/sites' digital certificates feature. Either way, it is recommended that you disable all unused features in the configuration file.

  • Depending on the configuration settings (log_to_file or log_to_cons), if a Yara rule references only a module's attributes (ex., PE, ELF, Hash, etc...), then Crawlector will display only the rule's name upon a match, excluding offset and length data.

Sites Format Pattern

To visit/scan a website, the list of URLs must be stored in text files, in the directory β€œcl_sites”.

Crawlector accepts three types of URLs:

  1. Type 1: one URL per line
    • Crawlector will assign a unique name to every URL, derived from the URL hostname
  2. Type 2: one URL per line, with a unique name [a-zA-Z0-9_-]{1,128} = <url>
  3. Type 3: for the spider functionality, a unique format is used. One URL per line is as follows:

<id>[depth:<0|1>-><\d+>,total:<\d+>,sleep:<\d+>] = <url>

For example,

mfmokbel[depth:1->3,total:10,sleep:0] = https://www.mfmokbel.com

which is equivalent to: mfmokbel[d:1->3,t:10,s:0] = https://www.mfmokbel.com

where, <id> := [a-zA-Z0-9_-]{1,128}

depth, total and sleep, can also be replaced with their shortened versions d, t and s, respectively.

  • depth: the spider supports going two levels deep for finding additional URLs (this is a design decision).
  • A value of 0 indicates a depth of level 1, with the value that comes after the β€œ->” ignored.
  • A depth of level-1 is controlled by the total parameter. So, first, the spider tries to find that many additional URLs off of the specified URL.
  • The value after the β€œ->” represents the maximum number of URLs to spider for each of the URLs found (as per the total parameter value).
  • A value of 1, indicates a depth of level 2, with the value that comes after the β€œ->” representing the maximum number of URLs to find, for every URL found per the total parameter. For clarification, and as shown in the example above, first, the spider will look for 10 URLs (as specified in the total parameter), and then, each of those found URLs will be spidered up to a max of 3 URLs; therefore, and in the best-case scenario, we would end up with 40 (10 + (10*3)) URLs.
  • The sleep parameter takes an integer value representing the number of milliseconds to sleep between every HTTP request.

Note 1: Type 3 URL could be turned into type 1 URL by setting the configuration parameter live_crawler to false, in the configuration file, in the spider section.

Note 2: Empty lines and lines that start with β€œ;” or β€œ//” are ignored.
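
Purely for illustration, the Type 3 line format can be captured with a regular expression such as the one in this Python sketch (Crawlector itself is written in C++; this is not its code):

# Sketch: parse a Type 3 site line of the form <id>[depth:<0|1>-><N>,total:<N>,sleep:<N>] = <url>.
import re

LINE_RE = re.compile(
    r"^(?P<id>[a-zA-Z0-9_-]{1,128})"
    r"\[(?:depth|d):(?P<depth>[01])->(?P<max>\d+),"
    r"(?:total|t):(?P<total>\d+),"
    r"(?:sleep|s):(?P<sleep>\d+)\]"
    r"\s*=\s*(?P<url>\S+)$"
)

m = LINE_RE.match("mfmokbel[depth:1->3,total:10,sleep:0] = https://www.mfmokbel.com")
print(m.groupdict())
# {'id': 'mfmokbel', 'depth': '1', 'max': '3', 'total': '10', 'sleep': '0',
#  'url': 'https://www.mfmokbel.com'}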

The Spider Functionality

The spider functionality is what gives Crawlector the capability to find additional links on the targeted page. The spider supports the following features:

  • The domain has to be of Type 3, for the Spider functionality to work
  • You may specify a list of wildcarded patterns (pipe delimited) to prevent spidering matching urls via the exclude_url config. option. For example, *.zip|*.exe|*.rar|*.zip|*.7z|*.pdf|.*bat|*.db
  • You may specify a list of wildcarded patterns (pipe delimited) to spider only urls that match the pattern via the include_url config. option. For example, */checkout/*|*/products/*
  • You may exclude HTTPS urls via the config. option exclude_https
  • You may account for outbound/external links as well, for the main page only, via the config. option add_ext_links. This feature honours the exclude_url and include_url config. option.
  • You may account for outbound/external links of the main page only, excluding all other urls, via the config. option ext_links_only. This feature honours the exclude_url and include_url config. option.

Site Ranking Functionality

  • This is for checking the ranking of the website
  • You give it a file with a list of websites, with their ranking, in a csv file format
  • Services that provide lists of websites ranking include, Alexa top-1m (discontinued as of May 2022), Cisco Umbrella, Majestic, Quantcast, Farsight and Tranco, among others
  • CSV file format (2 columns only): first column holds the ranking, and the second column holds the domain name
  • If a cell to contain quoted data, it'll be automatically dequoted
  • Line breaks aren't allowed in quoted text
  • Leading and trailing spaces are trimmed from cells read
  • Empty and comment lines are skipped
  • The section site_ranking in the configuration file provides some options to alter how the CSV file is to be read
  • The performance of this query is dependent on the number of records in the CSV file
  • Crawlector compares every entry in the CSV file against the domain being investigated, and not the other way around
  • Only the registered/pay-level domain is compared

Finding TLDs and Subdomains - [site] Section

  • The site section provides the capability to expand on a given site, by attempting to find all available top-level domains (TLDs) and/or subdomains for the same domain. If found, new tlds/subdomains will be checked like any other domain
  • This feature uses the Omnisint Labs (https://omnisint.io/) and RapidAPI APIs
  • Omnisint Labs API returns subdomains and tlds, whereas RapidAPI returns only subdomains (the Omnisint Labs API is down as of March 10, 2023, however, the implementation is still available in case the site is back up)
  • For RapidAPI, you need a valid "Domains records" API key that you can request from RapidAPI, and plug it into the key rapid_api_key in the configuration file
  • With find_tlds enabled, in addition to Omnisint Labs API tlds results, the framework attempts to find other active/registered domains by going through every tld entry, either, in the tlds_file or tlds_url
  • If tlds_url is set, it should point to a url that hosts tlds, each one on a new line (lines that start with either of the characters ';', '#' or '//' are ignored)
  • tlds_file, holds the filename that contains the list of tlds (same as for tlds_url; only the tld is present, excluding the '.', for ex., "com", "org")
  • If tlds_file is set, it takes precedence over tlds_url
  • tld_dl_time_out, this is for setting the maximum timeout for the dnslookup function when attempting to check if the domain in question resolves or not
  • tld_use_connect, this option enables the functionality to connect to the domain in question over a list of ports, defined in the option tlds_connect_ports
  • The option tlds_connect_ports accepts a list of ports, comma separated, or a list of ranges, such as 25-40,90-100,80,443,8443 (range start and end are inclusive)
    • tld_con_time_out, this is for setting the maximum timeout for the connect function
  • tld_con_use_ssl, enable/disable the use of ssl when attempting to connect to the domain
  • If save_to_file_subd is set to true, discovered subdomains will be saved to "\expanded\exp_subdomain_<pm|am>.txt"
  • If save_to_file_tld is set to true, discovered domains will be saved to "\expanded\exp_tld_<pm|am>.txt"
  • If exit_here is set to true, then Crawlector bails out after executing this [site] function, irrespective of other enabled options. It means found sites won't be crawled/spidered

Design Considerations

  • A URL page is retrieved by sending a GET request to the server, reading the server response body, and passing it to Yara engine for detection.
  • Some of the GET request attributes are defined in the [default] section in the configuration file, including, the User-Agent and Referer headers, and connection timeout, among other options.
  • Although Crawlector logs a session's data to a CSV file, converting it to an SQL file is recommended for better performance, manipulation and retrieval of the data. This becomes evident when you’re crawling thousands of domains.
  • Repeated domains/urls in the cl_sites are allowed.

Limitations

  • Single threaded
  • Static detection (no dynamic evaluation of a given page's content)
  • No headless browser support, yet!

Third-party libraries used

Contributing

Open for pull requests and issues. Comments and suggestions are greatly appreciated.

Author

Mohamad Mokbel (@MFMokbel)



Cve-Collector - Simple Latest CVE Collector

By: Zion3R


Simple Latest CVE Collector Written in Python

  • There are various methods for collecting the latest CVE (Common Vulnerabilities and Exposures) information.
  • This code was created to provide guidance on how to collect, what information to include, and how to code when creating a CVE collector.
  • The code provided here is one of many ways to implement a CVE collector.
  • It is written using a method that involves crawling a specific website, parsing HTML elements, and retrieving the data.

This collector uses a search query on https://www.cvedetails.com to collect information on vulnerabilities with a severity score of 6 or higher.

  • It creates a simple delimiter-based file to function as a database (no DBMS required).
  • When a new CVE is discovered, it retrieves "vulnerability details" as well.

  1. Set the cvss_min_score variable.
  2. Add additional code to receive the results, such as a webhook (see the sketch below).
  • The location for calling this code is marked as "Send the result to webhook."
  3. If you want to run it automatically, register it in crontab or a similar scheduler.
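
For step 2, a minimal Python sketch of forwarding a collected entry to a webhook might look like this; the webhook URL and payload shape are placeholders, not part of the collector:

# Sketch: send a collected CVE entry to a webhook at the point marked
# "Send the result to webhook". URL and payload fields are hypothetical.
import requests

WEBHOOK_URL = "https://hooks.example.com/cve-collector"  # placeholder endpoint

def send_to_webhook(cve_id, cvss, title, url):
    payload = {"cve": cve_id, "cvss": cvss, "title": title, "url": url}
    requests.post(WEBHOOK_URL, json=payload, timeout=10)

send_to_webhook("CVE-2023-44832", 7.5,
                "Buffer overflow in D-Link DIR-823G SetWanSettings",
                "https://www.cve.org/CVERecord?id=CVE-2023-44832")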

# python3 main.py

*2023-10-10 11:05:33.370262*

1. CVE-2023-44832 / CVSS: 7.5 (HIGH)
- Published: 2023-10-05 16:15:12
- Updated: 2023-10-07 03:15:47
- CWE: CWE-120 Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')

D-Link DIR-823G A1V1.0.2B05 was discovered to contain a buffer overflow via the MacAddress parameter in the SetWanSettings function. Th...
>> https://www.cve.org/CVERecord?id=CVE-2023-44832

- Ref.
(1) https://www.dlink.com/en/security-bulletin/
(2) https://github.com/bugfinder0/public_bug/tree/main/dlink/dir823g/SetWanSettings_MacAddress



2. CVE-2023-44831 / CVSS: 7.5 (HIGH)
- Published: 2023-10-05 16:15:12
- Updated: 2023-10-07 03:16:56
- CWE: CWE-120 Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')

D-Link DIR-823G A1V1.0.2B05 was discovered to contain a buffer overflow via the Type parameter in the SetWLanRadioSettings function. Th...
>> https://www.cve.org/CVERecord?id=CVE-2023-44831

- Ref.
(1) https://www.dlink.com/en/security-bulletin/
(2) https://github.com/bugfinder0/public_bug/tree/main/dlink/dir823g/SetWLanRadioSettings_Type

(delimiter-based file database)

# vim feeds.db

1|2023-10-10 09:24:21.496744|0d239fa87be656389c035db1c3f5ec6ca3ec7448|CVE-2023-45613|2023-10-09 11:15:11|6.8|MEDIUM|CWE-295 Improper Certificate Validation
2|2023-10-10 09:24:27.073851|30ebff007cca946a16e5140adef5a9d5db11eee8|CVE-2023-45612|2023-10-09 11:15:11|8.6|HIGH|CWE-611 Improper Restriction of XML External Entity Reference
3|2023-10-10 09:24:32.650234|815b51259333ed88193fb3beb62c9176e07e4bd8|CVE-2023-45303|2023-10-06 19:15:13|8.4|HIGH|Not found CWE ids for CVE-2023-45303
4|2023-10-10 09:24:38.369632|39f98184087b8998547bba41c0ccf2f3ad61f527|CVE-2023-45248|2023-10-09 12:15:10|6.6|MEDIUM|CWE-427 Uncontrolled Search Path Element
5|2023-10-10 09:24:43.936863|60083d8626b0b1a59ef6fa16caec2b4fd1f7a6d7|CVE-2023-45247|2023-10-09 12:15:10|7.1|HIGH|CWE-862 Missing Authorization
6|2023-10-10 09:24:49.472179|82611add9de44e5807b8f8324bdfb065f6d4177a|CVE-2023-45246|2023-10-06 11:15:11|7.1|HIGH|CWE-287 Improper Authentication
7|2023-10-10 09:24:55.049191|b78014cd7ca54988265b19d51d90ef935d2362cf|CVE-2023-45244|2023-10-06 10:15:18|7.1|HIGH|CWE-862 Missing Authorization

The methods for collecting CVE (Common Vulnerabilities and Exposures) information are divided into different stages. They are primarily categorized into two:

(1) Method for retrieving CVE information after vulnerability analysis and risk assessment have been completed.

  • This method involves collecting CVE information after all the processes have been completed.
  • Naturally, there is a time lag of several days (it is slower).

(2) Method for retrieving CVE information at the stage when it is included as a vulnerability.

  • This refers to the stage immediately after a CVE ID has been assigned and the vulnerability has been publicly disclosed.
  • At this stage, there may only be basic information about the vulnerability, or the CVSS score may not have been evaluated, and there may be a lack of necessary content such as reference documents.

  • This code is designed to parse HTML elements from cvedetails.com, so it may not function correctly if the HTML page structure changes.
  • In case of errors during parsing, exception handling has been included, so if it doesn't work as expected, please inspect the HTML source for any changes.

  • Get free latest infomation. If useful to someone, Free for all to the last. (absolutely no paid)
  • ID 2 is the channel created using this repository source code.
  • If you find this helpful, please the "star" to support further improvements.


Pinkerton - A JavaScript File Crawler And Secret Finder Developed In Python

By: Zion3R


Pinkerton is a Python tool created to crawl JavaScript files and search for secrets.


Installing / Getting started

A quick guide of how to install and use Pinkerton.

1. Clone the repository with: git clone https://github.com/oppsec/pinkerton.git
2. Install the libraries with: pip3 install -r requirements.txt
3. Run Pinkerton with: python3 main.py -u https://example.com
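
Under the hood, this kind of secret hunting boils down to downloading JavaScript files and matching their contents against secret-looking patterns. The following Python sketch only illustrates that idea and is not Pinkerton's actual implementation; the regexes are common illustrative patterns.

# Hypothetical sketch of secret hunting in a JavaScript file -- not Pinkerton's code.
import re
import requests

SECRET_PATTERNS = {
    "AWS access key": r"AKIA[0-9A-Z]{16}",
    "Generic API key": r"api[_-]?key['\"]?\s*[:=]\s*['\"][0-9a-zA-Z]{16,45}['\"]",
}

def find_secrets(js_url):
    body = requests.get(js_url, timeout=10).text
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in re.finditer(pattern, body, re.IGNORECASE):
            hits.append((name, match.group(0)))
    return hits

print(find_secrets("https://example.com/static/app.js"))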

Docker

If you want to use pinkerton in a Docker container, follow these commands:

1. Clone the repository - git clone https://github.com/oppsec/pinkerton.git
2. Build the image - sudo docker build -t pinkerton:latest .
3. Run container - sudo docker run pinkerton:latest



Pre-requisites

  • Python 3 installed on your machine.
  • Install the libraries with pip3 install -r requirements.txt

Features

  • Works with ProxyChains
  • Fast scan
  • Low RAM and CPU usage
  • Open-Source
  • Python ❀️

To-Do

  • Add more secrets regex pattern
  • Improve JavaScript file extract function
  • Improve pattern match system
  • Add pass list file method

Contributing

A quick guide on how to contribute to the project.

1. Create a fork from Pinkerton repository
2. Clone the repository with git clone https://github.com/your/pinkerton.git
3. Type cd pinkerton/
4. Create a branch and make your changes
5. Commit and make a git push
6. Open a pull request


Credits


Warning

  • The developer is not responsible for any malicious use of this tool.


Xcrawl3R - A CLI Utility To Recursively Crawl Webpages

By: Zion3R


xcrawl3r is a command-line interface (CLI) utility to recursively crawl webpages, i.e. systematically browse webpages' URLs and follow links to discover linked webpages' URLs.


Features

  • Recursively crawls webpages for URLs.
  • Parses URLs from files (.js, .json, .xml, .csv, .txt & .map).
  • Parses URLs from robots.txt.
  • Parses URLs from sitemaps.
  • Renders pages (including Single Page Applications such as Angular and React).
  • Cross-Platform (Windows, Linux & macOS)

Installation

Install release binaries (Without Go Installed)

Visit the releases page and find the appropriate archive for your operating system and architecture. Download the archive from your browser or copy its URL and retrieve it with wget or curl:

  • ...with wget:

     wget https://github.com/hueristiq/xcrawl3r/releases/download/v<version>/xcrawl3r-<version>-linux-amd64.tar.gz
  • ...or, with curl:

     curl -OL https://github.com/hueristiq/xcrawl3r/releases/download/v<version>/xcrawl3r-<version>-linux-amd64.tar.gz

...then, extract the binary:

tar xf xcrawl3r-<version>-linux-amd64.tar.gz

TIP: The above steps, download and extract, can be combined into a single step with this one-liner:

curl -sL https://github.com/hueristiq/xcrawl3r/releases/download/v<version>/xcrawl3r-<version>-linux-amd64.tar.gz | tar -xzv

NOTE: On Windows systems, you should be able to double-click the zip archive to extract the xcrawl3r executable.

...move the xcrawl3r binary to somewhere in your PATH. For example, on GNU/Linux and OS X systems:

sudo mv xcrawl3r /usr/local/bin/

NOTE: Windows users can follow How to: Add Tool Locations to the PATH Environment Variable in order to add xcrawl3r to their PATH.

Install source (With Go Installed)

Before you install from source, you need to make sure that Go is installed on your system. You can install Go by following the official instructions for your operating system. For this, we will assume that Go is already installed.

go install ...

go install -v github.com/hueristiq/xcrawl3r/cmd/xcrawl3r@latest

go build ... the development Version

  • Clone the repository

     git clone https://github.com/hueristiq/xcrawl3r.git 
  • Build the utility

     cd xcrawl3r/cmd/xcrawl3r && \
    go build .
  • Move the xcrawl3r binary to somewhere in your PATH. For example, on GNU/Linux and OS X systems:

     sudo mv xcrawl3r /usr/local/bin/

    NOTE: Windows users can follow How to: Add Tool Locations to the PATH Environment Variable in order to add xcrawl3r to their PATH.

NOTE: While the development version is a good way to take a peek at xcrawl3r's latest features before they get released, be aware that it may have bugs. Officially released versions will generally be more stable.

Usage

To display help message for xcrawl3r use the -h flag:

xcrawl3r -h

help message:

                             _ _____      
__ _____ _ __ __ ___ _| |___ / _ __
\ \/ / __| '__/ _` \ \ /\ / / | |_ \| '__|
> < (__| | | (_| |\ V V /| |___) | |
/_/\_\___|_| \__,_| \_/\_/ |_|____/|_| v0.1.0

A CLI utility to recursively crawl webpages.

USAGE:
xcrawl3r [OPTIONS]

INPUT:
-d, --domain string domain to match URLs
--include-subdomains bool match subdomains' URLs
-s, --seeds string seed URLs file (use `-` to get from stdin)
-u, --url string URL to crawl

CONFIGURATION:
--depth int maximum depth to crawl (default 3)
TIP: set it to `0` for infinite recursion
--headless bool If true the browser will be displayed while crawling.
-H, --headers string[] custom header to include in requests
e.g. -H 'Referer: http://example.com/'
TIP: use multiple flag to set multiple headers
--proxy string[] Proxy URL (e.g: http://127.0.0.1:8080)
TIP: use multiple flag to set multiple proxies
--render bool utilize a headless chrome instance to render pages
--timeout int time to wait for request in seconds (default: 10)
--user-agent string User Agent to use (default: web)
TIP: use `web` for a random web user-agent,
`mobile` for a random mobile user-agent,
or you can set your specific user-agent.

RATE LIMIT:
-c, --concurrency int number of concurrent fetchers to use (default 10)
--delay int delay between each request in seconds
--max-random-delay int maximux extra randomized delay added to `--dalay` (default: 1s)
-p, --parallelism int number of concurrent URLs to process (default: 10)

OUTPUT:
--debug bool enable debug mode (default: false)
-m, --monochrome bool coloring: no colored output mode
-o, --output string output file to write found URLs
-v, --verbosity string debug, info, warning, error, fatal or silent (default: debug)

Contributing

Issues and Pull Requests are welcome! Check out the contribution guidelines.

Licensing

This utility is distributed under the MIT license.

Credits



SpiderSuite - Advanced Web Spider/Crawler For Cyber Security Professionals

By: Zion3R


An advanced, cross-platform and multi-feature GUI web spider/crawler for cyber security professionals. Spider Suite can be used for attack surface mapping and analysis. For more information visit SpiderSuite's website.


Installation and Usage

Spider Suite is designed for easy installation and usage even for first timers.

  • First, download the package of your choice.

  • Then install the downloaded SpiderSuite package.

  • See First time crawling with SpiderSuite article for tutorial on how to get started.

For complete documentation of Spider Suite see wiki.

Contributing

Can you translate?

Visit SpiderSuite's translation project to make translations to your native language.

Not a developer?

You can help by reporting bugs, requesting new features, improving the documentation, sponsoring the project & writing articles.

For More information see contribution guide.

Contributors

Credits

This product includes software developed by the following open source projects:



Katana - A Next-Generation Crawling And Spidering Framework


A next-generation crawling and spidering framework

Features β€’ Installation β€’ Usage β€’ Scope β€’ Config β€’ Filters β€’ Join Discord

Features

  • Fast And fully configurable web crawling
  • Standard and Headless mode support
  • JavaScript parsing / crawling
  • Customizable automatic form filling
  • Scope control - Preconfigured field / Regex
  • Customizable output - Preconfigured fields
  • INPUT - STDIN, URL and LIST
  • OUTPUT - STDOUT, FILE and JSON

Installation

katana requires Go 1.18 to install successfully. To install, just run the below command or download a pre-compiled binary from the release page.

go install github.com/projectdiscovery/katana/cmd/katana@latest

Usage

katana -h

This will display help for the tool. Here are all the switches it supports.

Usage:
./katana [flags]

Flags:
INPUT:
-u, -list string[] target url / list to crawl

CONFIGURATION:
-d, -depth int maximum depth to crawl (default 2)
-jc, -js-crawl enable endpoint parsing / crawling in javascript file
-ct, -crawl-duration int maximum duration to crawl the target for
-kf, -known-files string enable crawling of known files (all,robotstxt,sitemapxml)
-mrs, -max-response-size int maximum response size to read (default 2097152)
-timeout int time to wait for request in seconds (default 10)
-aff, -automatic-form-fill enable optional automatic form filling (experimental)
-retry int number of times to retry the request (default 1)
-proxy string http/socks5 proxy to use
-H, -headers string[] custom header/cookie to include in request
-config string path to the katana configuration file
-fc, -form-config string path to custom form configuration file

DEBUG:
-health-check, -hc run diagnostic check up
-elog, -error-log string file to write sent requests error log

HEADLESS:
-hl, -headless enable headless hybrid crawling (experimental)
-sc, -system-chrome use local installed chrome browser instead of katana installed
-sb, -show-browser show the browser on the screen with headless mode
-ho, -headless-options string[] start headless chrome with additional options
-nos, -no-sandbox start headless chrome in --no-sandbox mode
-scp, -system-chrome-path string use specified chrome binary path for headless crawling
-noi, -no-incognito start headless chrome without incognito mode

SCOPE:
-cs, -crawl-scope string[] in scope url regex to be followed by crawler
-cos, -crawl-out-scope string[] out of scope url regex to be excluded by crawler
-fs, -field-scope string pre-defined scope field (dn,rdn,fqdn) (default "rdn")
-ns, -no-scope disables host based default scope
-do, -display-out-scope display external endpoint from scoped crawling

FILTER:
-f, -field string field to display in output (url,path,fqdn,rdn,rurl,qurl,qpath,file,key,value,kv,dir,udir)
-sf, -store-field string field to store in per-host output (url,path,fqdn,rdn,rurl,qurl,qpath,file,key,value,kv,dir,udir)
-em, -extension-match string[] match output for given extension (eg, -em php,html,js)
-ef, -extension-filter string[] filter output for given extension (eg, -ef png,css)

RATE-LIMIT:
-c, -concurrency int number of concurrent fetchers to use (default 10)
-p, -parallelism int number of concurrent inputs to process (default 10)
-rd, -delay int request delay between each request in seconds
-rl, -rate-limit int maximum requests to send per second (default 150)
-rlm, -rate-limit-minute int maximum number of requests to send per minute

OUTPUT:
-o, -output string file to write output to
-j, -json write output in JSONL(ines) format
-nc, -no-color disable output content coloring (ANSI escape codes)
-silent display output only
-v, -verbose display verbose output
-version display project version

Running Katana

Input for katana

katana requires a URL or endpoint to crawl and accepts single or multiple inputs.

The input URL can be provided using the -u option, and multiple values can be provided as comma-separated input. Similarly, file input is supported using the -list option, and piped input (stdin) is also supported.

URL Input

katana -u https://tesla.com

Multiple URL Input (comma-separated)

katana -u https://tesla.com,https://google.com

List Input

$ cat url_list.txt

https://tesla.com
https://google.com
katana -list url_list.txt

STDIN (piped) Input

echo https://tesla.com | katana
cat domains | httpx | katana

Example running katana -

katana -u https://youtube.com

__ __
/ /_____ _/ /____ ____ ___ _
/ '_/ _ / __/ _ / _ \/ _ /
/_/\_\\_,_/\__/\_,_/_//_/\_,_/ v0.0.1

projectdiscovery.io

[WRN] Use with caution. You are responsible for your actions.
[WRN] Developers assume no liability and are not responsible for any misuse or damage.
https://www.youtube.com/
https://www.youtube.com/about/
https://www.youtube.com/about/press/
https://www.youtube.com/about/copyright/
https://www.youtube.com/t/contact_us/
https://www.youtube.com/creators/
https://www.youtube.com/ads/
https://www.youtube.com/t/terms
https://www.youtube.com/t/privacy
https://www.youtube.com/about/policies/
https://www.youtube.com/howyoutubeworks?utm_campaign=ytgen&utm_source=ythp&utm_medium=LeftNav&utm_content=txt&u=https%3A%2F%2Fwww.youtube.com%2Fhowyoutubeworks%3Futm_source%3Dythp%26utm_medium%3DLeftNav%26utm_campaign%3Dytgen
https://www.youtube.com/new
https://m.youtube.com/
https://www.youtube.com/s/desktop/4965577f/jsbin/desktop_polymer.vflset/desktop_polymer.js
https://www.youtube.com/s/desktop/4965577f/cssbin/www-main-desktop-home-page-skeleton.css
https://www.youtube.com/s/desktop/4965577f/cssbin/www-onepick.css
https://www.youtube.com/s/_/ytmainappweb/_/ss/k=ytmainappweb.kevlar_base.0Zo5FUcPkCg.L.B1.O/am=gAE/d=0/rs=AGKMywG5nh5Qp-BGPbOaI1evhF5BVGRZGA
https://www.youtube.com/opensearch?locale=en_GB
https://www.youtube.com/manifest.webmanifest
https://www.youtube.com/s/desktop/4965577f/cssbin/www-main-desktop-watch-page-skeleton.css
https://www.youtube.com/s/desktop/4965577f/jsbin/web-animations-next-lite.min.vflset/web-animations-next-lite.min.js
https://www.youtube.com/s/desktop/4965577f/jsbin/custom-elements-es5-adapter.vflset/custom-elements-es5-adapter.js
https://www.youtube.com/s/desktop/4965577f/jsbin/webcomponents-sd.vflset/webcomponents-sd.js
https://www.youtube.com/s/desktop/4965577f/jsbin/intersection-observer.min.vflset/intersection-observer.min.js
https://www.youtube.com/s/desktop/4965577f/jsbin/scheduler.vflset/scheduler.js
https://www.youtube.com/s/desktop/4965577f/jsbin/www-i18n-constants-en_GB.vflset/www-i18n-constants.js
https://www.youtube.com/s/desktop/4965577f/jsbin/www-tampering.vflset/www-tampering.js
https://www.youtube.com/s/desktop/4965577f/jsbin/spf.vflset/spf.js
https://www.youtube.com/s/desktop/4965577f/jsbin/network.vflset/network.js
https://www.youtube.com/howyoutubeworks/
https://www.youtube.com/trends/
https://www.youtube.com/jobs/
https://www.youtube.com/kids/

Crawling Mode

Standard Mode

Standard crawling mode uses the standard Go http library under the hood to handle HTTP requests/responses. This mode is much faster as it doesn't have the browser overhead. Still, it analyzes HTTP response bodies as-is, without any JavaScript or DOM rendering, potentially missing post-DOM-rendered endpoints or asynchronous endpoint calls that might happen in complex web applications depending, for example, on browser-specific events.

Headless Mode

Headless mode hooks internal headless calls to handle HTTP requests/responses directly within the browser context. This offers two advantages:

  • The HTTP fingerprint (TLS and user agent) fully identify the client as a legitimate browser
  • Better coverage since the endpoints are discovered analyzing the standard raw response, as in the previous modality, and also the browser-rendered one with javascript enabled.

Headless crawling is optional and can be enabled using -headless option.

Here are other headless CLI options -

katana -h headless

Flags:
HEADLESS:
-hl, -headless enable experimental headless hybrid crawling
-sc, -system-chrome use local installed chrome browser instead of katana installed
-sb, -show-browser show the browser on the screen with headless mode
-ho, -headless-options string[] start headless chrome with additional options
-nos, -no-sandbox start headless chrome in --no-sandbox mode
-noi, -no-incognito start headless chrome without incognito mode

-no-sandbox

Runs headless chrome browser with no-sandbox option, useful when running as root user.

katana -u https://tesla.com -headless -no-sandbox

-no-incognito

Runs headless chrome browser without incognito mode, useful when using the local browser.

katana -u https://tesla.com -headless -no-incognito

-headless-options

When crawling in headless mode, additional chrome options can be specified using -headless-options, for example -

katana -u https://tesla.com -headless -system-chrome -headless-options --disable-gpu,proxy-server=http://127.0.0.1:8080

Scope Control

Crawling can be endless if not scoped; as such, katana comes with multiple options to define the crawl scope.

-field-scope

The handiest way to define scope is with a predefined field name, rdn being the default option for field scope.

  • rdn - crawling scoped to root domain name and all subdomains (e.g. *example.com) (default)
  • fqdn - crawling scoped to given sub(domain) (e.g. www.example.com or api.example.com)
  • dn - crawling scoped to domain name keyword (e.g. example)
katana -u https://tesla.com -fs dn

-crawl-scope

For advanced scope control, the -cs option can be used, which comes with regex support.

katana -u https://tesla.com -cs login

For multiple in scope rules, file input with multiline string / regex can be passed.

$ cat in_scope.txt

login/
admin/
app/
wordpress/
katana -u https://tesla.com -cs in_scope.txt

-crawl-out-scope

For defining what not to crawl, the -cos option can be used, which also supports regex input.

katana -u https://tesla.com -cos logout

For multiple out of scope rules, file input with multiline string / regex can be passed.

$ cat out_of_scope.txt

/logout
/log_out
katana -u https://tesla.com -cos out_of_scope.txt

-no-scope

Katana defaults to scoping *.domain; the -ns option can be used to disable this and crawl the internet at large.

katana -u https://tesla.com -ns

-display-out-scope

By default, when the scope option is used, it also applies to the links displayed as output; as such, external URLs are excluded by default. To override this behavior, the -do option can be used to display all the external URLs that exist on the target's scoped URLs / endpoints.

katana -u https://tesla.com -do

Here are all the CLI options for scope control -

katana -h scope

Flags:
SCOPE:
-cs, -crawl-scope string[] in scope url regex to be followed by crawler
-cos, -crawl-out-scope string[] out of scope url regex to be excluded by crawler
-fs, -field-scope string pre-defined scope field (dn,rdn,fqdn) (default "rdn")
-ns, -no-scope disables host based default scope
-do, -display-out-scope display external endpoint from scoped crawling

Crawler Configuration

Katana comes with multiple options to configure and control the crawl the way we want.

-depth

Option to define the depth to follow URLs while crawling; the greater the depth, the more endpoints are crawled and the longer the crawl takes.

katana -u https://tesla.com -d 5

-js-crawl

Option to enable JavaScript file parsing + crawling the endpoints discovered in JavaScript files, disabled by default.

katana -u https://tesla.com -jc

-crawl-duration

Option to predefine the crawl duration, disabled by default.

katana -u https://tesla.com -ct 2

-known-files

Option to enable crawling the robots.txt and sitemap.xml files, disabled by default.

katana -u https://tesla.com -kf robotstxt,sitemapxml

-automatic-form-fill

Option to enable automatic form filling for known / unknown fields, known field values can be customized as needed by updating form config file at $HOME/.config/katana/form-config.yaml.

Automatic form filling is an experimental feature.

   -aff, -automatic-form-fill  enable optional automatic form filling (experimental)

There are more options to configure when needed; here are all the config-related CLI options -

katana -h config

Flags:
CONFIGURATION:
-d, -depth int maximum depth to crawl (default 2)
-jc, -js-crawl enable endpoint parsing / crawling in javascript file
-ct, -crawl-duration int maximum duration to crawl the target for
-kf, -known-files string enable crawling of known files (all,robotstxt,sitemapxml)
-mrs, -max-response-size int maximum response size to read (default 2097152)
-timeout int time to wait for request in seconds (default 10)
-retry int number of times to retry the request (default 1)
-proxy string http/socks5 proxy to use
-H, -headers string[] custom header/cookie to include in request
-config string path to the katana configuration file
-fc, -form-config string path to custom form configuration file

Filters

-field

Katana comes with built-in fields that can be used to filter the output for the desired information; the -f option can be used to specify any of the available fields.

   -f, -field string  field to display in output (url,path,fqdn,rdn,rurl,qurl,qpath,file,key,value,kv,dir,udir)

Here is a table with examples of each field and expected output when used -

FIELD   DESCRIPTION                   EXAMPLE
url     URL Endpoint                  https://admin.projectdiscovery.io/admin/login?user=admin&password=admin
qurl    URL including query param     https://admin.projectdiscovery.io/admin/login.php?user=admin&password=admin
qpath   Path including query param    /login?user=admin&password=admin
path    URL Path                      https://admin.projectdiscovery.io/admin/login
fqdn    Fully Qualified Domain name   admin.projectdiscovery.io
rdn     Root Domain name              projectdiscovery.io
rurl    Root URL                      https://admin.projectdiscovery.io
file    Filename in URL               login.php
key     Parameter keys in URL         user,password
value   Parameter values in URL       admin,admin
kv      Keys=Values in URL            user=admin&password=admin
dir     URL Directory name            /admin/
udir    URL with Directory            https://admin.projectdiscovery.io/admin/

Here is an example of using the field option to display only the URLs with query parameters in them -

katana -u https://tesla.com -f qurl -silent

https://shop.tesla.com/en_au?redirect=no
https://shop.tesla.com/en_nz?redirect=no
https://shop.tesla.com/product/men_s-raven-lightweight-zip-up-bomber-jacket?sku=1740250-00-A
https://shop.tesla.com/product/tesla-shop-gift-card?sku=1767247-00-A
https://shop.tesla.com/product/men_s-chill-crew-neck-sweatshirt?sku=1740176-00-A
https://www.tesla.com/about?redirect=no
https://www.tesla.com/about/legal?redirect=no
https://www.tesla.com/findus/list?redirect=no

Custom Fields

You can create custom fields to extract and store specific information from page responses using regex rules. These custom fields are defined using a YAML config file and are loaded from the default location at $HOME/.config/katana/field-config.yaml. Alternatively, you can use the -flc option to load a custom field config file from a different location. Here is an example custom field:

- name: email
  type: regex
  regex:
    - '([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)'
    - '([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)'

- name: phone
  type: regex
  regex:
    - '\d{3}-\d{8}|\d{4}-\d{7}'

When defining custom fields, the following attributes are supported:

  • name (required)

The value of the name attribute is used as the -field CLI option value.

  • type (required)

The type of the custom attribute; currently the only supported option is regex.

  • part (optional)

The part of the response to extract the information from. The default value is response, which includes both the header and body. Other possible values are header and body.

  • group (optional)

You can use this attribute to select a specific matched group in regex, for example: group: 1
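
For instance, a hypothetical custom field combining the part and group attributes above could look like the following (the field name and regex are illustrative, not part of the default config):

- name: csrf_token
  type: regex
  part: body
  group: 1
  regex:
    - 'name="csrf_token" value="([a-zA-Z0-9]+)"'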

Running katana using custom field:

katana -u https://tesla.com -f email,phone

-store-field

To complement the field option, which is useful for filtering output at run time, there is the -sf, -store-fields option, which works exactly like the field option except that instead of filtering, it stores all the information on disk under the katana_field directory, sorted by target URL.

katana -u https://tesla.com -sf key,fqdn,qurl -silent
$ ls katana_field/

https_www.tesla.com_fqdn.txt
https_www.tesla.com_key.txt
https_www.tesla.com_qurl.txt

The -store-field option can be useful for collecting information to build a targeted wordlist for various purposes, including but not limited to:

  • Identifying the most commonly used parameters
  • Discovering frequently used paths
  • Finding commonly used files
  • Identifying related or unknown subdomains
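
As a rough sketch, the per-host files shown above can be merged into a simple parameter wordlist with standard shell tools (the tr step assumes comma-separated values as in the key field example; drop it if values are stored one per line):

$ cat katana_field/*_key.txt | tr ',' '\n' | sort -u > params_wordlist.txt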

-extension-match

Crawl output can easily be matched for specific extensions using the -em option, which ensures that only output containing the given extensions is displayed.

katana -u https://tesla.com -silent -em js,jsp,json

-extension-filter

Crawl output can easily be filtered for specific extensions using the -ef option, which removes all URLs containing the given extensions.

katana -u https://tesla.com -silent -ef css,txt,md

Here are additional filter options -

   -f, -field string                field to display in output (url,path,fqdn,rdn,rurl,qurl,file,key,value,kv,dir,udir)
-sf, -store-field string field to store in per-host output (url,path,fqdn,rdn,rurl,qurl,file,key,value,kv,dir,udir)
-em, -extension-match string[] match output for given extension (eg, -em php,html,js)
-ef, -extension-filter string[] filter output for given extension (eg, -ef png,css)
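
The filter flags can also be combined with a field selection in a single run; a small sketch using only flags documented above (the extensions chosen are illustrative):

katana -u https://tesla.com -silent -ef png,css -f qurl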

Rate Limit

It's easy to get blocked / banned while crawling if you don't respect the target website's limits; katana comes with multiple options to tune the crawl to go as fast or as slow as we want.

-delay

Option to introduce a delay in seconds between each new request katana makes while crawling, disabled by default.

katana -u https://tesla.com -delay 20

-concurrency

Option to control the number of URLs per target to fetch at the same time.

katana -u https://tesla.com -c 20

-parallelism

Option to define the number of targets to process at the same time from list input.

katana -u https://tesla.com -p 20

-rate-limit

Option to define the maximum number of requests that can go out per second.

katana -u https://tesla.com -rl 100

-rate-limit-minute

Option to define the maximum number of requests that can go out per minute.

katana -u https://tesla.com -rlm 500

Here are all the long / short CLI options for rate-limit control -

katana -h rate-limit

Flags:
RATE-LIMIT:
-c, -concurrency int number of concurrent fetchers to use (default 10)
-p, -parallelism int number of concurrent inputs to process (default 10)
-rd, -delay int request delay between each request in seconds
-rl, -rate-limit int maximum requests to send per second (default 150)
-rlm, -rate-limit-minute int maximum number of requests to send per minute
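
For a deliberately slow crawl, the flags above can be combined; the numbers below are illustrative only:

katana -u https://tesla.com -c 5 -rl 20 -rd 2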

Output

Katana supports file output in both plain text and JSON formats; the JSON output includes additional information such as the source, tag, and attribute name to correlate the discovered endpoint.

-output

By default, katana outputs the crawled endpoints in plain text format. The results can be written to a file by using the -output option.

katana -u https://example.com -no-scope -output example_endpoints.txt

-json

katana -u https://example.com -json -do | jq .
{
  "timestamp": "2022-11-05T22:33:27.745815+05:30",
  "endpoint": "https://www.iana.org/domains/example",
  "source": "https://example.com",
  "tag": "a",
  "attribute": "href"
}
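
Because the output is JSONL with the fields shown above, standard jq filters apply; for example, printing only the discovered endpoints:

katana -u https://example.com -json -do | jq -r '.endpoint'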

-store-response

The -store-response option allows for writing all crawled endpoint requests and responses to a text file. When this option is used, text files including the request and response will be written to the katana_response directory. If you would like to specify a custom directory, you can use the -store-response-dir option.

katana -u https://example.com -no-scope -store-response
$ cat katana_response/index.txt

katana_response/example.com/327c3fda87ce286848a574982ddd0b7c7487f816.txt https://example.com (200 OK)
katana_response/www.iana.org/bfc096e6dd93b993ca8918bf4c08fdc707a70723.txt http://www.iana.org/domains/reserved (200 OK)
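
To write responses to a custom directory instead, the -srd flag mentioned above can be added; the directory path here is just a placeholder:

katana -u https://example.com -no-scope -sr -srd /tmp/katana_responses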

Note:

The -store-response option is not supported in -headless mode.

Here are additional CLI options related to output -

katana -h output

OUTPUT:
-o, -output string file to write output to
-sr, -store-response store http requests/responses
-srd, -store-response-dir string store http requests/responses to custom directory
-j, -json write output in JSONL(ines) format
-nc, -no-color disable output content coloring (ANSI escape codes)
-silent display output only
-v, -verbose display verbose output
-version display project version


GraphCrawler - GraphQL Automated Security Testing Toolkit


Graph Crawler is the most powerful automated testing toolkit for any GraphQL endpoint.

NEW: Can search for endpoints for you using Escape Technology's powerful Graphinder tool. Just point it towards a domain and add the '-e' option, and Graphinder will do subdomain enumeration and search popular directories for GraphQL endpoints. After all this, GraphCrawler will take over and work through each finding.

It will run through and check whether mutation is enabled, check for any sensitive queries available, such as users and files, and also test any easy queries it finds to see whether authentication is required.

If introspection is not enabled on the endpoint, it will check whether it is an Apollo Server and can then run Clairvoyance to brute-force and grab the suggestions in order to build the schema itself (see the Clairvoyance project for more details on this). It will then score the findings from 1-10, with 10 being the most critical.

If you want to dig deeper into the schema you can also use graphql-path-enum to look for paths to certain types, like user IDs, emails, etc.

I hope this saves you as much time as it has for me.


Usage

python graphCrawler.py -u https://test.com/graphql/api -o <fileName> -a "<headers>"


β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ•— β–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ•— β–ˆβ–ˆβ•—β–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—
β–ˆβ–ˆβ•”β•β•β•β•β• β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—
β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ•— β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•
β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β•β• β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•”β•β•β• β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—
β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ–ˆβ•”β–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘ β–ˆβ–ˆβ•‘
β•šβ•β•β•β•β•β• β•šβ•β• β•šβ•β•β•šβ•β• β•šβ•β•β•šβ•β• β•šβ•β• β•šβ•β• β•šβ•β•β•β•β•β•β•šβ•β• β•šβ•β•β•šβ•β• β•šβ•β• β•šβ•β•β•β•šβ•β•β• β•šβ•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•šβ•β• β•šβ•β•

The output option is not required; by default the results will be written to schema.json.
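
As a hedged example of the endpoint-search mode described earlier, assuming the domain is still supplied via -u (the domain and output file name are placeholders):

python graphCrawler.py -u test.com -e -o findings.json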

Example output:

Requirements

  • Python3
  • Docker
  • Install all Python dependencies with pip

Wordlist from google-10000-english

TODO

  • Add option for "full report" following the endpoint search where it will run clairvoyance and all other aspects of the toolkit on the endpoints found
  • Default to "simple scan" to just find endpoints when this feature is added
  • Way Future: help craft queries based off the schema provided

