
Telegram-Scraper - A Powerful Python Script That Allows You To Scrape Messages And Media From Telegram Channels Using The Telethon Library

By: Unknown


A powerful Python script that allows you to scrape messages and media from Telegram channels using the Telethon library. Features include real-time continuous scraping, media downloading, and data export capabilities.


Features 🚀

  • Scrape messages from multiple Telegram channels
  • Download media files (photos, documents)
  • Real-time continuous scraping
  • Export data to JSON and CSV formats
  • SQLite database storage
  • Resume capability (saves progress)
  • Media reprocessing for failed downloads
  • Progress tracking
  • Interactive menu interface

Prerequisites 📋

Before running the script, you'll need:

  • Python 3.7 or higher
  • Telegram account
  • API credentials from Telegram

Required Python packages

pip install -r requirements.txt

Contents of requirements.txt:

telethon
aiohttp
asyncio

Getting Telegram API Credentials 🔑

  1. Visit https://my.telegram.org/auth
  2. Log in with your phone number
  3. Click on "API development tools"
  4. Fill in the form:
     • App title: Your app name
     • Short name: Your app short name
     • Platform: Can be left as "Desktop"
     • Description: Brief description of your app
  5. Click "Create application"
  6. You'll receive:
     • api_id: A number
     • api_hash: A string of letters and numbers

Keep these credentials safe; you'll need them to run the script!
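
Before wiring them into the scraper, you can sanity-check the credentials with a minimal Telethon snippet like this (a sketch, not part of the repository; the values are placeholders):

from telethon import TelegramClient

api_id = 1234567                                   # placeholder: your api_id
api_hash = "0123456789abcdef0123456789abcdef"      # placeholder: your api_hash

# Telethon stores the authorized session in check.session,
# so you complete the phone/code login only once.
client = TelegramClient("check", api_id, api_hash)

async def main():
    me = await client.get_me()
    print(f"Logged in as {me.username or me.first_name}")

with client:
    client.loop.run_until_complete(main())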

Setup and Running 🔧

  1. Clone the repository:
     git clone https://github.com/unnohwn/telegram-scraper.git
     cd telegram-scraper
  2. Install requirements:
     pip install -r requirements.txt
  3. Run the script:
     python telegram-scraper.py
  4. On first run, you'll be prompted to enter:
     • Your API ID
     • Your API Hash
     • Your phone number (with country code)
     • When prompted a second time to log in via phone number or bot, choose the phone number option
     • Verification code (sent to your Telegram)

Initial Scraping Behavior 🕒

When scraping a channel for the first time, please note:

  • The script will attempt to retrieve the entire channel history, starting from the oldest messages
  • Initial scraping can take several minutes or even hours, depending on:
     • The total number of messages in the channel
     • Whether media downloading is enabled
     • The size and number of media files
     • Your internet connection speed
     • Telegram's rate limiting
  • The script uses pagination and maintains state, so if interrupted, it can resume from where it left off (see the sketch below)
  • Progress percentage is displayed in real-time to track the scraping status
  • Messages are stored in the database as they are scraped, so you can start analyzing available data even before the scraping is complete
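
A minimal sketch of that resume pattern, assuming Telethon and a hypothetical state.json file (the script keeps its own state; names here are illustrative):

import json
import os

STATE_FILE = "state.json"  # hypothetical; the script manages its own state

def load_last_id():
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f).get("last_id", 0)
    return 0

def save_last_id(message_id):
    with open(STATE_FILE, "w") as f:
        json.dump({"last_id": message_id}, f)

async def scrape_history(client, channel):
    # reverse=True walks the history oldest-to-newest; min_id skips
    # every message already processed in a previous run.
    async for message in client.iter_messages(channel, reverse=True, min_id=load_last_id()):
        print(message.id, message.text)  # stand-in for the database insert
        save_last_id(message.id)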

Usage 📝

The script provides an interactive menu with the following options:

  • [A] Add new channel
     • Enter the channel ID or username
  • [R] Remove channel
     • Remove a channel from the scraping list
  • [S] Scrape all channels
     • One-time scraping of all configured channels
  • [M] Toggle media scraping
     • Enable/disable downloading of media files
  • [C] Continuous scraping
     • Real-time monitoring of channels for new messages
  • [E] Export data
     • Export to JSON and CSV formats
  • [V] View saved channels
     • List all saved channels
  • [L] List account channels
     • List all channels (with their IDs) for the account
  • [Q] Quit

Channel IDs 📢

You can use either:

  • Channel username (e.g., channelname)
  • Channel ID (e.g., -1001234567890)
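
Both identifier forms can be resolved through Telethon's get_entity; a small sketch, assuming the account has already seen the channel (Telethon resolves bare numeric IDs from its session cache):

async def resolve_channel(client, identifier):
    # Numeric strings such as "-1001234567890" must be passed as int;
    # usernames stay as plain strings.
    if isinstance(identifier, str) and identifier.lstrip("-").isdigit():
        identifier = int(identifier)
    return await client.get_entity(identifier)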

Data Storage 💾

Database Structure

Data is stored in SQLite databases, one per channel:

  • Location: ./channelname/channelname.db
  • Table: messages
     • id: Primary key
     • message_id: Telegram message ID
     • date: Message timestamp
     • sender_id: Sender's Telegram ID
     • first_name: Sender's first name
     • last_name: Sender's last name
     • username: Sender's username
     • message: Message text
     • media_type: Type of media (if any)
     • media_path: Local path to downloaded media
     • reply_to: ID of replied message (if any)
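
A schema sketch matching the columns above; the SQL types are assumptions, and the script's own CREATE TABLE statement is authoritative:

import sqlite3

conn = sqlite3.connect("./channelname/channelname.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        id         INTEGER PRIMARY KEY,
        message_id INTEGER,
        date       TEXT,
        sender_id  INTEGER,
        first_name TEXT,
        last_name  TEXT,
        username   TEXT,
        message    TEXT,
        media_type TEXT,
        media_path TEXT,
        reply_to   INTEGER
    )
""")
conn.commit()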

Media Storage 📁

Media files are stored in:

  • Location: ./channelname/media/
  • Files are named using the message ID or original filename

Exported Data 📊

Data can be exported in two formats:

  1. CSV: ./channelname/channelname.csv
     • Human-readable spreadsheet format
     • Easy to import into Excel/Google Sheets
  2. JSON: ./channelname/channelname.json
     • Structured data format
     • Ideal for programmatic processing
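
The export step amounts to dumping the per-channel database into both formats; a hedged sketch (the function name is illustrative):

import csv
import json
import sqlite3

def export_channel(channel):
    conn = sqlite3.connect(f"./{channel}/{channel}.db")
    conn.row_factory = sqlite3.Row
    rows = [dict(r) for r in conn.execute("SELECT * FROM messages")]

    # JSON: structured, for programmatic processing
    with open(f"./{channel}/{channel}.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2, default=str)

    # CSV: flat, for Excel/Google Sheets
    if rows:
        with open(f"./{channel}/{channel}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)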

Features in Detail 🔍

Continuous Scraping

The continuous scraping feature ([C] option) allows you to:

  • Monitor channels in real-time
  • Automatically download new messages
  • Download media as it's posted
  • Run indefinitely until interrupted (Ctrl+C)
  • Maintain state between runs
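
Real-time monitoring maps naturally onto Telethon's event API; a sketch of the idea, with placeholder credentials and a print standing in for the database insert (the script's actual implementation may differ):

from telethon import TelegramClient, events

api_id = 1234567                                   # placeholder credentials, as above
api_hash = "0123456789abcdef0123456789abcdef"

client = TelegramClient("scraper", api_id, api_hash)

@client.on(events.NewMessage(chats=["channelname"]))
async def on_new_message(event):
    print(event.message.id, event.message.text)    # stand-in for the DB insert
    if event.message.media:
        await event.message.download_media(file="./channelname/media/")

with client:
    client.run_until_disconnected()                # runs until Ctrl+C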

Media Handling

The script can download:

  • Photos
  • Documents
  • Other media types supported by Telegram

It automatically retries failed downloads and skips existing files to avoid duplicates.
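
A sketch of that retry-and-skip behavior (illustrative, not the script's exact code):

import asyncio
import os

async def download_with_retry(message, media_dir, attempts=3):
    # Skip files already downloaded for this message ID.
    prefix = str(message.id)
    if any(name.startswith(prefix) for name in os.listdir(media_dir)):
        return None
    for attempt in range(attempts):
        try:
            return await message.download_media(file=os.path.join(media_dir, prefix))
        except Exception as exc:
            if attempt == attempts - 1:
                print(f"Giving up on message {message.id}: {exc}")
            else:
                await asyncio.sleep(2 ** attempt)  # simple backoff between retries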

Error Handling 🛠️

The script includes:

  • Automatic retry mechanism for failed media downloads
  • State preservation in case of interruption
  • Flood control compliance
  • Error logging for failed operations
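
Flood control with Telethon usually means catching FloodWaitError, which carries the number of seconds Telegram wants you to wait; a minimal sketch:

import asyncio
from telethon.errors import FloodWaitError

async def with_flood_control(make_request):
    while True:
        try:
            return await make_request()
        except FloodWaitError as e:
            # Telegram's response includes the mandatory wait time.
            await asyncio.sleep(e.seconds + 1)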

Limitations ⚠️

  • Respects Telegram's rate limits
  • Can only access public channels or channels you're a member of
  • Media download size limits apply as per Telegram's restrictions

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer ⚖️

This tool is for educational purposes only. Make sure to:

  • Respect Telegram's Terms of Service
  • Obtain necessary permissions before scraping
  • Use responsibly and ethically
  • Comply with data protection regulations



Telegram-Story-Scraper - A Python Script That Allows You To Automatically Scrape And Download Stories From Your Telegram Friends

By: Unknown


A Python script that allows you to automatically scrape and download stories from your Telegram friends using the Telethon library. The script continuously monitors and saves both photos and videos from stories, along with their metadata.


Important Note About Story Access ⚠️

Due to Telegram API restrictions, this script can only access stories from:

  • Users you have added to your friend list
  • Users whose privacy settings allow you to view their stories

This is a limitation of Telegram's API and cannot be bypassed.

Features 🚀

  • Automatically scrapes all available stories from your Telegram friends
  • Downloads both photos and videos from stories
  • Stores metadata in SQLite database
  • Exports data to Excel spreadsheet
  • Real-time monitoring with customizable intervals
  • Timestamps are recorded in UTC+2
  • Maintains record of previously downloaded stories
  • Resume capability
  • Automatic retry mechanism

Prerequisites 📋

Before running the script, you'll need:

  • Python 3.7 or higher
  • Telegram account
  • API credentials from Telegram
  • Friends on Telegram whose stories you want to track

Required Python packages

pip install -r requirements.txt

Contents of requirements.txt:

telethon
openpyxl
schedule

Getting Telegram API Credentials 🔑

  1. Visit https://my.telegram.org/auth
  2. Log in with your phone number
  3. Click on "API development tools"
  4. Fill in the form:
     • App title: Your app name
     • Short name: Your app short name
     • Platform: Can be left as "Desktop"
     • Description: Brief description of your app
  5. Click "Create application"
  6. You'll receive:
     • api_id: A number
     • api_hash: A string of letters and numbers

Keep these credentials safe; you'll need them to run the script!

Setup and Running 🔧

  1. Clone the repository:
     git clone https://github.com/unnohwn/telegram-story-scraper.git
     cd telegram-story-scraper
  2. Install requirements:
     pip install -r requirements.txt
  3. Run the script:
     python TGSS.py
  4. On first run, you'll be prompted to enter:
     • Your API ID
     • Your API Hash
     • Your phone number (with country code)
     • Verification code (sent to your Telegram)
     • Checking interval in seconds (default is 60)

How It Works 🔄

The script:

  1. Connects to your Telegram account
  2. Periodically checks for new stories from your friends
  3. Downloads any new stories (photos/videos)
  4. Stores metadata in a SQLite database
  5. Exports information to an Excel file
  6. Runs continuously until interrupted (Ctrl+C); see the polling sketch below
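
A sketch of the polling loop using the schedule package from requirements.txt; the check function is a stand-in for the script's actual story fetching:

import time

import schedule

CHECK_INTERVAL = 60  # seconds; matches the interval you enter on first run

def check_for_new_stories():
    # Stand-in: fetch active stories, download new media,
    # record metadata in stories.db, refresh the Excel export.
    pass

schedule.every(CHECK_INTERVAL).seconds.do(check_for_new_stories)

while True:  # runs until interrupted (Ctrl+C)
    schedule.run_pending()
    time.sleep(1)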

Data Storage 💾

Database Structure (stories.db)

SQLite database containing:

  • user_id: Telegram user ID of the story creator
  • story_id: Unique story identifier
  • timestamp: When the story was posted (UTC+2)
  • filename: Local filename of the downloaded media
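
One plausible way to keep the "record of previously downloaded stories" is a composite primary key; the table name and types below are assumptions based on the columns above:

import sqlite3

conn = sqlite3.connect("stories.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS stories (
        user_id   INTEGER,
        story_id  INTEGER,
        timestamp TEXT,
        filename  TEXT,
        PRIMARY KEY (user_id, story_id)
    )
""")

def already_downloaded(user_id, story_id):
    # The composite primary key makes the duplicate check a single lookup.
    row = conn.execute(
        "SELECT 1 FROM stories WHERE user_id = ? AND story_id = ?",
        (user_id, story_id),
    ).fetchone()
    return row is not None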

CSV and Excel Export (stories_export.csv/xlsx)

Export files containing the same information as the database, useful for:

  • Easy viewing of story metadata
  • Filtering and sorting
  • Data analysis
  • Sharing data with others

Media Storage 📁

  • Photos are saved as: {user_id}_{story_id}.jpg
  • Videos are saved with their original extension: {user_id}_{story_id}.{extension}
  • All media files are saved in the script's directory

Features in Detail 🔍

Continuous Monitoring

  • Customizable checking interval (default: 60 seconds)
  • Runs continuously until manually stopped
  • Maintains state between runs
  • Avoids duplicate downloads

Media Handling

  • Supports both photos and videos
  • Automatically detects media type
  • Preserves original quality
  • Generates unique filenames

Error Handling 🛠️

The script includes:

  • Automatic retry mechanism for failed downloads
  • Error logging for failed operations
  • Connection error handling
  • State preservation in case of interruption

Limitations ⚠️

  • Subject to Telegram's rate limits
  • Stories must be currently active (not expired)
  • Media download size limits apply as per Telegram's restrictions

Contributing 🤝

Contributions are welcome! Please feel free to submit a Pull Request.

License 📄

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer ⚖️

This tool is for educational purposes only. Make sure to:

  • Respect Telegram's Terms of Service
  • Obtain necessary permissions before scraping
  • Use responsibly and ethically
  • Comply with data protection regulations
  • Respect user privacy



Uscrapper - Powerful OSINT Webscraper For Personal Data Collection

By: Zion3R


Introducing Uscrapper 2.0, a powerful OSINT web scraper that allows users to extract various personal information from a website. It leverages web scraping techniques and regular expressions to extract email addresses, social media links, author names, geolocations, phone numbers, and usernames from both hyperlinked and non-hyperlinked sources on a webpage, and it supports multithreading to make this process faster. Uscrapper 2.0 is equipped with advanced modules for bypassing anti-web-scraping measures and supports web crawling to scrape from various sublinks within the same domain. The tool also provides an option to generate a report containing the extracted details.


Extracted Details:

Uscrapper extracts the following details from the provided website:

  • Email Addresses: Displays email addresses found on the website.
  • Social Media Links: Displays links to various social media platforms found on the website.
  • Author Names: Displays the names of authors associated with the website.
  • Geolocations: Displays geolocation information associated with the website.
  • Non-Hyperlinked Details: Displays non-hyperlinked details found on the website, including email addresses, phone numbers, and usernames (a simplified extraction sketch follows this list).
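
At its core this is regular expressions applied to page text; a deliberately simplified sketch (Uscrapper's own patterns are more elaborate):

import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
SOCIAL_RE = re.compile(
    r"https?://(?:www\.)?(?:twitter|facebook|instagram|linkedin)\.com/[\w./-]+"
)

def extract_details(page_text):
    # Deduplicate and sort so repeated matches appear once in the report.
    return {
        "emails": sorted(set(EMAIL_RE.findall(page_text))),
        "phones": sorted(set(PHONE_RE.findall(page_text))),
        "social_links": sorted(set(SOCIAL_RE.findall(page_text))),
    }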

What's New?

Uscrapper 2.0:

  • Introduced multiple modules to bypass anti-web-scraping techniques.
  • Introduced Crawl and Scrape: an advanced module that crawls the target website and scrapes its internal pages.
  • Implemented multithreading to make these processes faster.

Installation Steps:

git clone https://github.com/z0m31en7/Uscrapper.git
cd Uscrapper/install/ 
chmod +x ./install.sh && ./install.sh #For Unix/Linux systems

Usage:

To run Uscrapper, use the following command-line syntax:

python Uscrapper-v2.0.py [-h] [-u URL] [-c (INT)] [-t THREADS] [-O] [-ns]


Arguments:

  • -h, --help: Show the help message and exit.
  • -u URL, --url URL: Specify the URL of the website to extract details from.
  • -c INT, --crawl INT: Specify the number of links to crawl.
  • -t INT, --threads INT: Specify the number of threads to use while crawling and scraping.
  • -O, --generate-report: Generate a report file containing the extracted details.
  • -ns, --nonstrict: Display non-strict usernames during extraction.

Note:

  • Uscrapper relies on web scraping techniques to extract information from websites. Make sure to use it responsibly and in compliance with the website's terms of service and applicable laws.

  • The accuracy and completeness of the extracted details depend on the structure and content of the website being analyzed.

  • To bypass some anti-web-scraping methods, Uscrapper uses Selenium, which can make the overall process slower.

Contribution:

Want a new feature to be added?

  • Make a pull request with all the necessary details and it will be merged after a review.
  • You can contribute by making the regular expressions more efficient and accurate, or by suggesting some more features that can be added.


ScrapPY - A Python Utility For Scraping Manuals, Documents, And Other Sensitive PDFs To Generate Wordlists That Can Be Utilized By Offensive Security Tools

By: Zion3R


ScrapPY is a Python utility for scraping manuals, documents, and other sensitive PDFs to generate targeted wordlists that can be utilized by offensive security tools to perform brute force, forced browsing, and dictionary attacks. ScrapPY performs word frequency, entropy, and metadata analysis, and can run in full output modes to craft custom wordlists for targeted attacks. The tool dives deep to discover keywords and phrases leading to potential passwords or hidden directories, outputting to a text file that is readable by tools such as Hydra, Dirb, and Nmap. Expedite initial access, vulnerability discovery, and lateral movement with ScrapPY!


Install:

Download Repository:

$ mkdir ScrapPY
$ cd ScrapPY/
$ sudo git clone https://github.com/RoseSecurity/ScrapPY.git

Install Dependencies:

$ pip3 install -r requirements.txt

ScrapPY Usage:

usage: ScrapPY.py [-h] [-f FILE] [-m {word-frequency,full,metadata,entropy}] [-o OUTPUT]

Output metadata of document:

$ python3 ScrapPY.py -f example.pdf -m metadata

Output the top 100 most frequently used keywords to a file named Top_100_Keywords.txt:

$ python3 ScrapPY.py -f example.pdf -m word-frequency -o Top_100_Keywords.txt

Output all keywords to default ScrapPY.txt file:

$ python3 ScrapPY.py -f example.pdf

Output top 100 keywords with highest entropy rating:

$ python3 ScrapPY.py -f example.pdf -m entropy
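
To make the entropy mode concrete: Shannon entropy rewards strings that use many distinct characters, which is what flags a word as a promising password candidate. A self-contained illustration of that ranking on plain strings (not ScrapPY's actual code):

import math
from collections import Counter

def shannon_entropy(word):
    # Sum of -p*log2(p) over the word's character distribution.
    counts = Counter(word)
    total = len(word)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

words = ["admin", "S7-Logic!", "password", "Xk9#qT2m"]
for word in sorted(words, key=shannon_entropy, reverse=True)[:100]:
    print(f"{shannon_entropy(word):.2f}  {word}")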

ScrapPY Output:

# ScrapPY writes the ScrapPY.txt file (or the specified output file) to the directory in which the tool was run. To view the first fifty lines of the file, run this command:

$ head -50 ScrapPY.txt

# To see how many words were generated, run this command:

$ wc -l ScrapPY.txt

Integration with Offensive Security Tools:

Easily integrate with tools such as Dirb to expedite the process of discovering hidden subdirectories:

root@RoseSecurity:~# dirb http://192.168.1.123/ /root/ScrapPY/ScrapPY.txt

-----------------
DIRB v2.21
By The Dark Raver
-----------------

START_TIME: Fri May 16 13:41:45 2014
URL_BASE: http://192.168.1.123/
WORDLIST_FILES: /root/ScrapPY/ScrapPY.txt

-----------------

GENERATED WORDS: 4592

---- Scanning URL: http://192.168.1.123/ ----
==> DIRECTORY: http://192.168.1.123/vi/
+ http://192.168.1.123/programming (CODE:200|SIZE:2726)
+ http://192.168.1.123/s7-logic/ (CODE:403|SIZE:1122)
==> DIRECTORY: http://192.168.1.123/config/
==> DIRECTORY: http://192.168.1.123/docs/
==> DIRECTORY: http://192.168.1.123/external/

Utilize ScrapPY with Hydra for advanced brute force attacks:

root@RoseSecurity:~# hydra -l root -P /root/ScrapPY/ScrapPY.txt -t 6 ssh://192.168.1.123
Hydra v7.6 (c)2013 by van Hauser/THC & David Maciejak - for legal purposes only

Hydra (http://www.thc.org/thc-hydra) starting at 2014-05-19 07:53:33
[DATA] 6 tasks, 1 server, 1003 login tries (l:1/p:1003), ~167 tries per task
[DATA] attacking service ssh on port 22

Enhance Nmap scripts with ScrapPY wordlists:

nmap -p445 --script smb-brute.nse --script-args userdb=users.txt,passdb=ScrapPY.txt 192.168.1.123

Future Development:

  • Allow for custom output file naming and increased verbosity
  • Integrate different modes of operation including word frequency analysis
  • Allow for metadata analysis
  • Search for high-entropy data
  • Search for path-like data
  • Implement image OCR to enumerate data from images in PDFs
  • Allow for processing of multiple PDFs


โŒ