Introducing Uscrapper 2.0, a powerful OSINT web scraper that extracts personal information from a website. It leverages web-scraping techniques and regular expressions to pull email addresses, social media links, author names, geolocations, phone numbers, and usernames from both hyperlinked and non-hyperlinked sources on a page, and it supports multithreading to speed the process up. Uscrapper 2.0 also ships with modules for bypassing anti-web-scraping measures and supports web crawling to scrape sublinks within the same domain. The tool can additionally generate a report containing the extracted details.
Uscrapper extracts the following details from the provided website:
- Email addresses
- Social media links
- Author names
- Geolocations
- Phone numbers
- Usernames
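At its core the technique is straightforward: fetch the page, then run patterns over both anchor attributes (hyperlinked sources) and the visible text (non-hyperlinked sources). A minimal sketch of that idea, assuming requests and BeautifulSoup as stand-ins for Uscrapper's internals rather than its actual code:

import re
import requests
from bs4 import BeautifulSoup

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SOCIAL_RE = re.compile(r"https?://(?:www\.)?(?:twitter|facebook|linkedin|instagram)\.com/[\w./-]+")

html = requests.get("https://example.com", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Hyperlinked sources: scan the href attribute of every anchor tag.
hrefs = [a.get("href", "") for a in soup.find_all("a")]
emails = {m for h in hrefs for m in EMAIL_RE.findall(h)}
socials = {m for h in hrefs for m in SOCIAL_RE.findall(h)}

# Non-hyperlinked sources: scan the rendered page text as well.
emails |= set(EMAIL_RE.findall(soup.get_text()))

print(emails, socials)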
To install Uscrapper 2.0:
git clone https://github.com/z0m31en7/Uscrapper.git
cd Uscrapper/install/
chmod +x ./install.sh && ./install.sh # For Unix/Linux systems
To run Uscrapper, use the following command-line syntax:
python Uscrapper-v2.0.py [-h] [-u URL] [-c (INT)] [-t THREADS] [-O] [-ns]
Arguments:
- -h: Show the help message and exit.
- -u URL: URL of the website to extract details from.
- -c (INT): Number of links to crawl within the same domain.
- -t THREADS: Number of threads to use while crawling.
- -O: Generate a report file containing the extracted details.
- -ns: Enable non-strict username matching (results may be less accurate).
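For example, the following command scrapes https://example.com, crawls up to 10 sublinks with 4 threads, and generates a report (an illustrative invocation; the values are arbitrary):

python Uscrapper-v2.0.py -u https://example.com -c 10 -t 4 -O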
Uscrapper relies on web scraping techniques to extract information from websites. Make sure to use it responsibly and in compliance with the website's terms of service and applicable laws.
The accuracy and completeness of the extracted details depend on the structure and content of the website being analyzed.
To bypass some anti-web-scraping measures, Uscrapper uses Selenium, which can make the overall process slower.
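In outline, that bypass amounts to driving a headless browser so pages are requested with a real browser fingerprint and fully rendered before parsing. A minimal sketch of the approach, assuming Chrome with a matching chromedriver (not Uscrapper's exact code):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")            # no visible browser window
options.add_argument("--user-agent=Mozilla/5.0")  # present a browser-like UA

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com")
    html = driver.page_source  # fully rendered DOM, ready for regex/parsing
finally:
    driver.quit()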
ScrapPY is a Python utility for scraping manuals, documents, and other sensitive PDFs to generate targeted wordlists that can be utilized by offensive security tools to perform brute force, forced browsing, and dictionary attacks. ScrapPY performs word frequency, entropy, and metadata analysis, and can run in full output modes to craft custom wordlists for targeted attacks. The tool dives deep to discover keywords and phrases leading to potential passwords or hidden directories, outputting to a text file that is readable by tools such as Hydra, Dirb, and Nmap. Expedite initial access, vulnerability discovery, and lateral movement with ScrapPY!
Download Repository:
$ mkdir ScrapPY
$ cd ScrapPY/
$ sudo git clone https://github.com/RoseSecurity/ScrapPY.git
Install Dependencies:
$ pip3 install -r requirements.txt
usage: ScrapPY.py [-h] [-f FILE] [-m {word-frequency,full,metadata,entropy}] [-o OUTPUT]
Output metadata of document:
$ python3 ScrapPY.py -f example.pdf -m metadata
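PDF metadata (author, producer, creation date, and so on) often leaks usernames and software versions. A rough equivalent of such a metadata pull using PyPDF2, which is an assumed library choice rather than necessarily what ScrapPY uses:

from PyPDF2 import PdfReader

reader = PdfReader("example.pdf")
info = reader.metadata  # document information dictionary; may be None
if info:
    for key, value in info.items():
        print(f"{key}: {value}")  # e.g. /Author, /Producer, /CreationDate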
Output top 100 frequently used keywords to a file named Top_100_Keywords.txt:
$ python3 ScrapPY.py -f example.pdf -m word-frequency -o Top_100_Keywords.txt
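Word-frequency mode boils down to tokenizing the text pulled from the PDF and ranking tokens by count. A sketch of that ranking, where example.txt is a hypothetical file of already-extracted PDF text:

import re
from collections import Counter

text = open("example.txt", encoding="utf-8").read()
tokens = re.findall(r"[A-Za-z0-9_-]{3,}", text.lower())

# Write the 100 most common tokens, one per line, wordlist-style.
with open("Top_100_Keywords.txt", "w") as out:
    for word, _count in Counter(tokens).most_common(100):
        out.write(word + "\n")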
Output all keywords to default ScrapPY.txt file:
$ python3 ScrapPY.py -f example.pdf
Output top 100 keywords with highest entropy rating:
$ python3 ScrapPY.py -f example.pdf -m entropy
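Entropy mode favors high-randomness strings, which are likelier to be passwords or secrets than plain English words. The usual measure is Shannon entropy over a word's characters; a small illustrative ranking:

import math
from collections import Counter

def shannon_entropy(word: str) -> float:
    # Entropy of the character distribution: higher means more "random".
    counts = Counter(word)
    n = len(word)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

words = ["maintenance", "password", "S7-L0g1c!", "the"]
for w in sorted(words, key=shannon_entropy, reverse=True):
    print(f"{shannon_entropy(w):.2f}  {w}")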
ScrapPY Output:
# ScrapPY writes the ScrapPY.txt file (or the file name supplied with -o) to the directory the tool was run from. To view the first fifty lines of the file, run this command:
$ head -50 ScrapPY.txt
# To see how many words were generated, run this command:
$ wc -l ScrapPY.txt
Easily integrate with tools such as Dirb to expedite the process of discovering hidden subdirectories:
root@RoseSecurity:~# dirb http://192.168.1.123/ /root/ScrapPY/ScrapPY.txt
-----------------
DIRB v2.21
By The Dark Raver
-----------------
START_TIME: Fri May 16 13:41:45 2014
URL_BASE: http://192.168.1.123/
WORDLIST_FILES: /root/ScrapPY/ScrapPY.txt
-----------------
GENERATED WORDS: 4592
---- Scanning URL: http://192.168.1.123/ ----
==> DIRECTORY: http://192.168.1.123/vi/
+ http://192.168.1.123/programming (CODE:200|SIZE:2726)
+ http://192.168.1.123/s7-logic/ (CODE:403|SIZE:1122)
==> DIRECTORY: http://192.168.1.123/config/
==> DIRECTORY: http://192.168.1.123/docs/
==> DIRECTORY: http://192.168.1.123/external/
Utilize ScrapPY with Hydra for advanced brute force attacks:
root@RoseSecurity:~# hydra -l root -P /root/ScrapPY/ScrapPY.txt -t 6 ssh://192.168.1.123
Hydra v7.6 (c)2013 by van Hauser/THC & David Maciejak - for legal purposes only
Hydra (http://www.thc.org/thc-hydra) starting at 2014-05-19 07:53:33
[DATA] 6 tasks, 1 server, 1003 login tries (l:1/p:1003), ~167 tries per task
[DATA] attacking service ssh on port 22
Enhance Nmap scripts with ScrapPY wordlists:
nmap -p445 --script smb-brute.nse --script-args userdb=users.txt,passdb=ScrapPY.txt 192.168.1.123