
Weekly Update 487

18 January 2026 at 08:43

I thought Scott would cop it first when he posted about what his solar system really cost him last year. "You're so gonna get that stupid AI-slop response from some people", I joked. But no, he got other stupid responses instead! And I got the AI-slop responses! Draw your own conclusions on those comments, but I find it fascinating that the one thing people would take away from a thoughtful blog post I spent many hours writing to explain how much work I put into privacy is that the illustration was computer-generated. That such feedback aligns with the political leanings of folks on Mastodon is also fascinating, and probably something I should have seen coming. But hey, there's nothing new about folks popping their heads up to make inane comments where none were needed, and I have a special blog post for just such occasions: If You Don't Want Guitar Lessons, Stop Following Me.

Weekly Update 486

16 January 2026 at 06:39

I’m in Oslo! Flighty is telling me I’ve flown in or out of here 43 times since a visit in 2014 set me on a new path professionally and, many years later, personally. It’s special here, like a second home that just feels… right. This week, the business end of things is about the WhiteDate data breach. Seeking a partner along common racial lines isn’t unusual, but… well… WhiteDate is anything but usual. And, just for fun, see if you can pick the thing that garnered the most negative feedback about that blog post this week; I’ll feature the discussion in the next vid.

Who Decides Who Doesn’t Deserve Privacy?

13 January 2026 at 11:41

Remember the Ashley Madison data breach? That was now more than a decade ago, yet it arguably remains the single most noteworthy data breach of all time. There are many reasons for this accolade, but chief among them is that by virtue of the site being expressly designed to facilitate extramarital affairs, there was massive social stigma attached to it. As a result, we saw some pretty crazy stuff:

  1. Various websites were stood up to publicly disclose the presence of people in the data and out them as “cheaters”
  2. Churches trawled through the data and contacted the spouses of exposed parishioners
  3. The media outed noteworthy individuals they searched for in the breach
  4. A radio station back home in Australia encouraged listeners to dial in to check if their spouse was in the data

Arguably, we now live in a more privacy-conscious era, one full of acronyms such as GDPR and CCPA, among others, in different parts of the world. The right to be forgotten, the right to erasure, and, indeed, privacy as a fundamental human right feature very differently in 2026 than they did in 2015. But arguably, even back then, the impact of outing someone as a member of the site should have been obvious. It was certainly obvious to me, which is why I introduced the concept of a sensitive data breach before the data even went public. HIBP wouldn’t show results for this breach publicly because I was concerned about the impact on people being outed. My worst fear was a spouse coming home to find someone having taken their own life, an HIBP search result on the screen in front of their lifeless body.

People died as a result of the breach. Marriages ended and lives were turned upside down. People lost their jobs. The human toll of the breach was profound. The decision I made after witnessing this was that if a breach was likely to have serious personal or social consequences for people in there, it would be flagged as sensitive and not publicly searchable.
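
In practice, that rule comes down to filtering search results. Here's a minimal Python sketch under stated assumptions: the `Breach` and `BreachIndex` names are hypothetical, an in-memory dict stands in for the real data store, and it assumes that someone who has verified ownership of an address (for example, via the notification service) can still see sensitive breaches. It illustrates the idea; it isn't HIBP's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Breach:
    name: str
    is_sensitive: bool  # set when exposure is likely to cause serious harm


@dataclass
class BreachIndex:
    # Hypothetical in-memory stand-in: email address -> breaches it appears in.
    by_email: dict[str, list[Breach]] = field(default_factory=dict)

    def public_search(self, email: str) -> list[str]:
        """Anyone can run this search, so sensitive breaches are excluded."""
        return [b.name for b in self.by_email.get(email, []) if not b.is_sensitive]

    def verified_owner_search(self, email: str) -> list[str]:
        """Only for someone who has proven they control the address."""
        return [b.name for b in self.by_email.get(email, [])]
```

The design point is that sensitivity is a property of the breach, not the person, so the same address can be publicly visible in one breach and hidden in another.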

The public doxing of members of the service was often justified on a moral basis: “adultery is bad, they deserve to be outed”. But there are two massive problems with this attitude, and I’ll begin with the purpose for which accounts were sometimes made:

An email address appearing in that breach implied that the person was there to have an extramarital affair because that was literally the catch-phrase of the service: “Life is short, have an affair”. But the reality was that people were members of the service for many, many different reasons. Have a read of my post titled Here’s What Ashley Madison Members Have Told Me and you’ll begin to understand how much more nuanced the situation was:

  1. Single people had joined the service, and later married before the breach occurred
  2. People who were worried about a cheating spouse joined the service in order to try to catch them
  3. Accounts were made with some people’s names and email addresses without their consent (there are many “Barack Obamas” in the data)

So, should everyone with an email address on Ashley Madison be considered an adulterer? Clearly, no, that completely misses the nuances of what an email address in a data breach really means. But what about the people who were there to have an affair? Well, that brings us to the second problem:

Our own personal belief systems are not a valid basis for outing people publicly because their belief systems differ from ours. I’ve deliberately used more generic terms than “extramarital affair” or “cheating” here because there are many other data breaches that are flagged as sensitive in HIBP for the very same reason. Fur Affinity, for example: there is a social stigma around furries, and outing someone as a member of that community could have negative consequences for them. Rosebutt Board is another example: anal fisting is evidently something a bunch of people are into, and equally, I’m sure there are many who take a moral objection to it. And finally, to get to the catalyst for this post, WhiteDate: the website that is ostensibly designed for white people to date other white people. Flagging that as sensitive resulted in some unsavoury commentary being directed at me:

U are a Nazi end of story

— 𝔗𝔥𝔢ℑ𝔡𝔦𝔬𝔱 (@fuckelonsob) January 6, 2026

Now, I emphasised “ostensibly” because the more you dig into this breach, the more you find tones of white supremacy and other behaviours that definitely don’t align with my personal value system. That societal view doesn’t sit well with me, and I think I’m safe in saying it wouldn’t sit well with most people. Would someone being outed as a member of that service be likely to result in “serious personal or social consequences”? Yes, and you can see that in the messaging from the same account:

Context matters. U are literally shielding Nazi hate mongering scoundrels. We can't doxx white supremacists?

If ISIS had a dating site & it got breached, would you protect it out of fear of doxxing? No.

Every database leaked is sensitive in a way.

— 𝔗𝔥𝔢ℑ𝔡𝔦𝔬𝔱 (@fuckelonsob) January 6, 2026

This behaviour is precisely what I don’t want HIBP being used for: as a weapon to attack people solely on the basis of their email address being affiliated with a website that has had a data breach.

Imagine, for a moment, that ISIS did have a dating site and it was breached: should it be flagged as sensitive? Contrary to the comment that "every database leaked is sensitive", there is a clear legal definition of sensitive personal information. In the EU, for example, it includes:

  1. personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs;
  2. trade-union membership;
  3. genetic data, biometric data processed solely to identify a human being;
  4. health-related data;
  5. data concerning a person’s sex life or sexual orientation.
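
A checklist like this lends itself to a simple set-intersection test. The sketch below is hypothetical: the category labels are paraphrased from the list above, the example breaches are made up, and in practice flagging a breach is a considered judgement rather than a one-line function. Still, it shows why a breach exposing data in any of these categories gets flagged, while one exposing only email addresses and passwords does not.

```python
# Labels paraphrased from the legally defined categories listed above.
SENSITIVE_CATEGORIES = frozenset({
    "racial or ethnic origin",
    "political opinions",
    "religious or philosophical beliefs",
    "trade-union membership",
    "genetic data",
    "biometric data",
    "health data",
    "sex life or sexual orientation",
})


def is_sensitive(data_categories: set[str]) -> bool:
    """Flag a breach if any data it exposed falls into a sensitive category."""
    return not SENSITIVE_CATEGORIES.isdisjoint(data_categories)


# Hypothetical example breaches, described by the kinds of data they exposed.
forum_breach = {"email addresses", "passwords"}
dating_breach = {"email addresses", "religious or philosophical beliefs", "political opinions"}
```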

An ISIS dating website breach would tick many of the boxes above and would therefore constitute a sensitive data breach. That's not an endorsement of what they stand for; it's simply a data-processing decision. But there may be a nuance in there which I didn't see present in the WhiteDate data - what if it contained illegal activity? (Sidenote: for the most part, HIBP is used by people in Western Europe, North America and Australasia, so when I say "illegal", I'm looking at it through that lens. Clearly, there are parts of the world where our "illegal" is their "normal", which further complicates how I run a service accessible from every corner of the world.) I had another example recently that went well beyond moral contention and deep into the realm of illegality:

New sensitive breach: "AI girlfriend" site Muah[.]ai had 1.9M email addresses breached last month. Data included AI prompts describing desired images, many sexual in nature and many describing child exploitation. 24% were already in @haveibeenpwned. More: https://t.co/NTXeQZFr2x

— Have I Been Pwned (@haveibeenpwned) October 8, 2024

Of all the different things people can disagree on when it comes to our moral compasses, paedophilia is where we unanimously draw the line. But I still flagged it as sensitive because of the reasons outlined above. Many people using the service were just lonely guys trying to create an AI girlfriend with no prompts around age. There would be email addresses in there that weren’t entered by the rightful owner. And then, there are cases like this:

That's a firstname.lastname Gmail address. Drop it into Outlook and it automatically matches the owner. It has his name, his job title, the company he works for and his professional photo, all matched to that AI prompt. pic.twitter.com/wpXQMBLf3B

— Troy Hunt (@troyhunt) October 9, 2024

I sat there with my wife, looking at the LinkedIn profile that used the same email address as the person who posted that comment. We looked at his photo and at the veneer of professionalism that surrounded him on that site, knowing what he had written in that prompt above. It was repulsive. Further, beyond being solely an affront to our morals, it was clearly illegal. So, I had many conversations with law enforcement agencies around the world and ensured they had access to the data. Involving law enforcement where data sets contain illegal activity is absolutely the right approach here, but equally, not being the vehicle for implying someone’s affiliation or beliefs and doxing them publicly without due process is also absolutely the right approach.

I understand the gut reaction that flagging a breach like WhiteDate as sensitive protects people whom most of us do not like. But a dozen years of running this service have caused me to consider individual privacy and rights literally hundreds of times, and these conclusions aren’t arrived at hastily. Imagine, for a moment, the possible ramifications for HIBP if the service were used to publicly shame someone as a "Nazi" and that, in turn, had serious real-world consequences for them. Whether that implication was right or not, there are potentially serious ramifications for us that could well leave us unable to operate at all. And, as the Ashley Madison examples show, there are also potentially life-threatening outcomes for individuals.

I don't particularly care about one random, anonymous X account making poorly thought-out statements, but the same sentiment has been expressed after loading previous similar breaches, and it deserves a blog post. Equally, I've written before about why all the other data breaches are publicly searchable and again, that conclusion is not arrived at lightly.

I’ll finish with a note about privacy that relates to my earlier comment about it being a human right. It's literally a human right under Article 12 of the Universal Declaration of Human Rights:

No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.

Breaches with legally defined sensitive data will continue to be flagged as sensitive, and breaches with illegal data will continue to be forwarded to law enforcement agencies.

Weekly Update 485

7 January 2026 at 06:26

15 mins and 40 seconds. That's how long it took to troubleshoot the first tech problem of 2026, and that's how far you'll need to skip through this video to hear the audio at normal volume. The problem Scott and I had is analogous to the troubleshooting so many of us do in our roles day in and day out:

  1. This should work fine
  2. It doesn't work, and I don't know why
  3. I did something that seems unrelated, and now it works
  4. I still don't know why

Anyway, I've cleaned up the audio-only version for the podcast, but I can't change the YouTube version once it's streamed, so apologies, just pump your volume up for the first quarter hour. And Happy New Year!

Weekly Update 484

28 December 2025 at 09:33

I think the start of this week's video really nailed it for the techies amongst us: shit doesn't work, you change something random, and now shit works and you have no idea why 🤷‍♂️ Such was my audio this week, and apologies to those of you watching the video below for the first few mins (although I managed to clean up the audio-only podcast version). Ironically, doing things non-standard at home was intended to iron out the creases before the impending travel so... a week from now, when I do this with Scott Helme from Dubai, it'll all be fine! Let's see 🤞

References

  1. Sponsored by: Malwarebytes Browser Guard blocks phishing, ads, scams, and trackers for safer, faster browsing

Weekly Update 483

20 December 2025 at 06:31

Building out an IoT environment is a little like the old Maslow's Hierarchy of Needs: all the stuff at the top is only any good if all the stuff at the bottom is good, starting with power. This week, I couldn't even get that right, but thankfully, a sparky came to the rescue, the ensuite underfloor heating was disconnected, and we now have reliable power again. On top of that is the layer that has increasingly been my nemesis: the network. Two days after recording, I've just spent the better part of an entire day making a much more concerted effort to adjust channel and power settings on APs, lock clients that don't move to the APs that make the most sense, and generally just screw around with it until stuff worked. And then I turned off a circuit, turned it back on again, and all hell broke loose 😭

References

  1. Sponsored by: 1Password Extended Access Management: Secure every sign-in for every app on every device.

Weekly Update 482

16 December 2025 at 22:52

Perhaps it's just the time of year when we all start to wind down a bit, or maybe I'm just tired after another massive 12 months, but this week's vid is way late. OK, going away to the place that had just been breached (ironic!) didn't help, but I think in general the pace we've maintained this year just needs to come back a bit. That said, I'll try to get this week's and next week's out on time, then it's off on travels for the four weeks after that. Stay tuned for more IoT problems in a few days 🤦‍♂️

References

  1. Sponsored by: Malwarebytes Browser Guard blocks phishing, ads, scams, and trackers for safer, faster browsing
  2. Spicers Retreats suffered a data breach they attributed back to an attack on the Mews reservation platform (timely, given we had a getaway booked there only a couple of days later)
  3. We worked through 630 million more passwords provided by the FBI (that includes 46 million we've never seen before)
  4. Hmmm... spam to a Qantas-only email address, wonder where that might have come from? (this should be impossible because there's an injunction in place 🤦‍♂️)

Processing 630 Million More Pwned Passwords, Courtesy of the FBI

12 December 2025 at 21:29

The sheer scope of cybercrime can be hard to fathom, even when you live and breathe it every day. It's not just the volume of data, but also the extent to which it replicates across criminal actors seeking to abuse it for their own gain, and to our detriment.

We were reminded of this recently when the FBI reached out and asked if they could send us 630 million more passwords. For the last four years, they've been sending over passwords found during the course of their investigations in the hope that we can help organisations block them from future use. Back then, we were supporting 1.26 billion searches of the service each month. Now, it's... more:

Just as it's hard to wrap your head around the scale of cybercrime, I find it hard to grasp that number fully. On average, that service is hit nearly 7 thousand times per second, and at peak, it's many times more than that. Every one of those requests is a chance to stop an account takeover. But the real scale goes well beyond the API itself. Because the data model is open source and freely available, many organisations use the Pwned Passwords Downloader to take the entire corpus offline and query it directly within their own applications. That tool alone calls the API around a million times during download, but the resulting data is then queried… well, who knows how many times after that. Pretty cool, right?
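
That "around a million" figure comes from how the API is queried. The Pwned Passwords range API uses a k-anonymity model. The client SHA-1 hashes the password locally and sends only the first five hex characters to `https://api.pwnedpasswords.com/range/{prefix}`. It gets back every hash suffix sharing that prefix as `SUFFIX:COUNT` lines, then does the final match itself. Five hex characters give 16^5 = 1,048,576 possible prefixes, so downloading the whole corpus means requesting each range once. Below is a minimal Python sketch of the client side. The function names are hypothetical, and the HTTP request is left out so it runs offline against a sample response.

```python
import hashlib

# Five hex characters => 16**5 possible ranges; a full download requests each one.
TOTAL_RANGES = 16 ** 5


def split_hash(password: str) -> tuple[str, str]:
    """SHA-1 the password and split the hex digest into (prefix, suffix)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def prevalence_in_range(suffix: str, range_body: str) -> int:
    """Look up our suffix in a range response made of 'SUFFIX:COUNT' lines."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0  # suffix absent: the password isn't in the corpus


prefix, suffix = split_hash("password")
# Only `prefix` ever leaves the machine; it would be requested as /range/5BAA6.
```

Because the server only ever sees a five-character prefix shared by many hashes, it never learns which password was actually checked.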

This latest corpus of data came to us as a result of the FBI seizing multiple devices belonging to a suspect. The data appeared to have originated from a mix of sources: open web and Tor-based marketplaces, Telegram channels and infostealer malware families. We hadn't seen about 7.4% of them in HIBP before, which might sound small, but that's 46 million vulnerable passwords we weren't giving people using the service the opportunity to block. So, we've added those and bumped the prevalence count on the other 584 million we already had.
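
The split described above is simple arithmetic: about 7.4% of 630 million is roughly 46 million new passwords, which leaves about 584 million already in the corpus. The merge itself is conceptually an upsert. Here's a minimal Python sketch. It assumes the corpus is a mapping of SHA-1 hash to prevalence count and that each occurrence in an incoming batch bumps the count by one. Both are illustrative assumptions, and `merge_batch` is a hypothetical name, not how HIBP actually stores or loads the data.

```python
from collections import Counter
from collections.abc import Iterable


def merge_batch(corpus: Counter, batch_hashes: Iterable[str]) -> int:
    """Upsert a batch of password hashes into the corpus.

    Hashes not yet in the corpus are added with a count of 1; hashes already
    present have their prevalence count incremented. Returns how many hashes
    were new to the corpus.
    """
    new_count = 0
    for h in batch_hashes:
        if h not in corpus:  # a membership test doesn't create a key in a Counter
            new_count += 1
        corpus[h] += 1
    return new_count
```

Run over the FBI batch, the returned figure would correspond to the roughly 46 million passwords we'd never seen before.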

We're thrilled to be able to provide this service to the community for free and want to also quickly thank Cloudflare for their support in providing us with the infrastructure to make this possible. Thanks to their edge caching tech, all those passwords are queryable from a location just a handful of milliseconds away from wherever you are on the globe.

If you're hitting the API, then all the data is already searchable for you. If you're downloading it all offline, go and grab the latest data now. Either way, go forth and put it to good use and help make a cybercriminal's day just that much harder 😊

Weekly Update 481

5 December 2025 at 07:14

Twelve years (and one day) since launching Have I Been Pwned, it's now a service that Charlotte and I live and breathe every day. From the first thing every morning to the last thing each day, from holidays to birthdays, in sickness and in heal... wait a minute - did we marry each other or a data breach service?! We decided to do a 12th-birthday special together today to give everyone a bit more insight into what she does and what life is like running this service. It's a different weekly vid, and we really hope you enjoy watching it 😊

References

  1. Sponsored by: Report URI: Guarding you from rogue JavaScript! Don’t get pwned; get real-time alerts & prevent breaches #SecureYourSite
  2. Just because a "fake" email address is in HIBP, it doesn't mean HIBP isn't accurately indexing data breaches (if it looks like an email address, it's an email address)

Why Does Have I Been Pwned Contain "Fake" Email Addresses?

3 December 2025 at 23:37

Normally, when someone sends feedback like this, I ignore it, but it happens often enough that it deserves an explainer, because the answer is really, really simple. So simple, in fact, that it should be evident to the likes of Bruce, who decided his misunderstanding deserved a 1-star Trustpilot review yesterday:

[Screenshot: Bruce's 1-star Trustpilot review]

Now, frankly, Trustpilot is a pretty questionable source of real-world, quality reviews anyway, but the same feedback has come through other channels enough times that it's worth sorting out once and for all. It all begins with one simple question:

What is an Email Address?

You think you know - and Bruce thinks he knows - but you might both be wrong. To explain the answer to the question, we need to start with how HIBP ingests data, and that really is pretty simple: someone sends us a breach (which is typically just text files of data), and we run the open source Email Address Extractor tool over it, which then dumps all the unique addresses into a file. That file is then uploaded into the system, where the addresses are then searchable.

The logic for how we extract addresses is all in that GitHub repository, but in simple terms, it boils down to this:

  1. There must be an @ symbol
  2. There can be up to 64 characters before it (the alias)
  3. There can be up to 255 characters after it (the domain)
  4. The domain must contain a period
  5. The domain must also have a valid TLD
  6. A few other little criteria that are all documented in the public repo
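
To make those rules concrete, here's a simplified Python sketch. It is not the Email Address Extractor's actual code, and the "other little criteria" from the public repo aren't reproduced. The `VALID_TLDS` set is a tiny hypothetical stand-in for a full list of real TLDs, and lower-casing the output is an assumption made for de-duplication. Note what it never does: contact a mail server.

```python
import re

# Hypothetical stand-in: a real implementation would check the full list of TLDs.
VALID_TLDS = {"com", "net", "org", "au", "uk"}

# Loose candidate pattern: a run of non-delimiter characters, "@", another run.
CANDIDATE = re.compile(r"[^\s@<>\"',;:()]+@[^\s@<>\"',;:()]+")


def looks_like_email(candidate: str) -> bool:
    """Apply the simplified structural rules; no mailbox is ever contacted."""
    if candidate.count("@") != 1:  # rule 1: an @ symbol
        return False
    alias, domain = candidate.split("@")
    if not 1 <= len(alias) <= 64:  # rule 2: alias of up to 64 characters
        return False
    if not 1 <= len(domain) <= 255:  # rule 3: domain of up to 255 characters
        return False
    if "." not in domain:  # rule 4: domain contains a period
        return False
    tld = domain.rsplit(".", 1)[1].lower()
    return tld in VALID_TLDS  # rule 5: a valid TLD


def extract_addresses(text: str) -> set[str]:
    """Return the unique addresses in a blob of breach text that pass the rules."""
    found = set()
    for match in CANDIDATE.findall(text):
        candidate = match.rstrip(".")  # drop a trailing full stop from prose
        if looks_like_email(candidate):
            found.add(candidate.lower())  # assumption: normalise case to de-duplicate
    return found
```

All three example addresses listed next pass these checks, because structurally they're perfectly valid.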

That is all! We can't then tell if there's an actual mailbox behind the address, as that would require massive per-address processing, for example, sending an email to each one and seeing if it bounces. Can you imagine doing that 7 billion times?! That's the number of unique addresses in HIBP, and clearly, it's impossible. So, that means all the following were parsed as being valid and loaded into HIBP (deep links to the search result):

  1. test@example.com
  2. _test@google.com
  3. fuckingwasteoftime@foo.com

I particularly like that last one, as it feels like a sentiment Bruce would express. It's also a great example as it's clearly not "real"; the alias is a bit of a giveaway, as is the domain ("foo" is commonly used as a placeholder, similar to how we might also use "bar", or combine them as "foo bar"). But if you follow the link and see the breach it was exposed in, you'll see a very familiar name:

[Screenshot: the breach this address was exposed in]

Which brings us to the next question:

How Do "Fake" Email Addresses End up in Real Websites?

This is also going to seem profoundly simple when you see it. Here goes:

[Screenshot]

Any questions, Bruce? This is just as easily explainable as why we considered it a valid address and ingested it into HIBP: the email address has a valid structure. That is all. That's how it got into Adobe, and that's how it then flowed through into HIBP.

Ah, but shouldn't Adobe verify the address? I mean, shouldn't they send an email to the address along the lines of "Hey, are you sure you want to sign up for this service?" Yes, they should, but here's the kicker: that doesn't stop the email address from being added to their database in the first place! The way this normally works (and this is what we do with HIBP when you sign up for the free notification service) is you enter the email address, the system generates a random token, and then the two are saved together in the database. A link with the token is then emailed to the address and used to verify the user if they then follow that link. And if they don't follow that link? We delete the email address if it hasn't been verified within a few days, but evidently, Adobe doesn't. Most services don't, so here we are.
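
Seeing that flow in code makes the kicker obvious: the address is written to storage before anyone proves they own it. Below is a minimal Python sketch. `PendingSignups` is a hypothetical name, an in-memory dict stands in for the database, and "a few days" is modelled as three days purely as an assumption.

```python
import secrets
from datetime import datetime, timedelta

# Assumption: "a few days" modelled as three days for illustration.
VERIFICATION_WINDOW = timedelta(days=3)


class PendingSignups:
    """Hypothetical in-memory stand-in for the sign-up tables in a database."""

    def __init__(self) -> None:
        self._pending: dict[str, tuple[str, datetime]] = {}  # token -> (email, created)
        self.verified: set[str] = set()

    def sign_up(self, email: str, now: datetime) -> str:
        # The address is stored immediately, before anyone proves they own it.
        token = secrets.token_urlsafe(32)
        self._pending[token] = (email, now)
        return token  # would be embedded in a link emailed to the address

    def verify(self, token: str, now: datetime) -> bool:
        """Called when someone follows the emailed link."""
        entry = self._pending.pop(token, None)
        if entry is None:
            return False  # unknown or already-used token
        email, created = entry
        if now - created > VERIFICATION_WINDOW:
            return False  # link expired; the pending address is discarded
        self.verified.add(email)
        return True

    def purge_unverified(self, now: datetime) -> int:
        """Delete addresses nobody verified within the window; return how many."""
        expired = [t for t, (_, created) in self._pending.items()
                   if now - created > VERIFICATION_WINDOW]
        for t in expired:
            del self._pending[t]
        return len(expired)
```

The purge is the step most services skip. Without it, every unverified address, real or not, stays in the database waiting to be breached.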

How Can I Be Really Sure Actual Fake Addresses Aren't in HIBP?

This is also going to seem profoundly obvious, but genuinely random email addresses (not "thisisfuckinguseless@") won't show up in HIBP. Want to test the theory? Try 1Password's generator (yes, Bruce, they also sponsor HIBP):

[Screenshot: 1Password's generator]

Now, whack that on the foo.com domain and do a search:

[Screenshot: the HIBP search result for that address]

Huh, would you look at that? And you can keep doing that over and over again. You’ll get the same result because they are fabricated addresses that no one else has created or entered into a website that was subsequently breached, ipso facto proving they cannot appear in the dataset.
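
That result isn't luck. Here's a back-of-the-envelope Python sketch. The 36-character alphabet and 20-character alias length are assumptions, not 1Password's actual generator settings, and 7 billion is the figure from earlier in this post. If an alias is drawn uniformly at random, each of the N addresses already in HIBP equals it with probability at most 1 over the size of the alias space. By the union bound, the chance of any match is at most N divided by that size.

```python
import secrets
import string

# Assumptions for illustration, not 1Password's actual generator settings.
ALPHABET = string.ascii_lowercase + string.digits  # 36 symbols
ALIAS_LENGTH = 20
ADDRESSES_IN_HIBP = 7_000_000_000  # "7 billion", per the figure earlier in this post


def random_alias(length: int = ALIAS_LENGTH) -> str:
    """Draw each character uniformly at random using a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def max_match_probability(length: int = ALIAS_LENGTH) -> float:
    """Union-bound ceiling on a random alias matching any existing address.

    Each existing address matches a uniformly random alias with probability
    at most 1 / len(ALPHABET) ** length, so the chance of matching any of
    them is at most N times that (capped at 1).
    """
    space = len(ALPHABET) ** length
    return min(1.0, ADDRESSES_IN_HIBP / space)


address = f"{random_alias()}@foo.com"  # the kind of address searched for above
```

Under these assumptions, the bound comes out at around 5 × 10^-22. It only holds for genuinely random aliases, which is exactly why human-invented "fake" addresses do turn up: people aren't random.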

Conclusion

Today is HIBP's 12th birthday, and I've taken particular issue with Bruce's review because it calls into question the integrity with which I run this service. This is now the 218th blog post I've written about HIBP, and over the last dozen years, I've detailed everything from the architecture to the ethical considerations to how I verify breaches. It's hard to imagine being any more transparent about how this service runs, and per the above, it's very simple to disprove the Bruces of the world. If you've read this far and have an accurate, fact-based review you'd like to leave, that'd be awesome 😊

Weekly Update 480

1 December 2025 at 06:11

Well, I now have the answer to how Snapchat does age verification for under-16s: they give an underage kid the ability to change their date of birth, then do a facial scan to verify. The facial scan (a third party tells me...) allows someone well under 16 to pass it easily. So, is that control "reasonable"? I guess that will depend on whether this case is an outlier or a much more common scenario, and a sample set of one isn't particularly scientific. Either way, I expect that what we're seeing is representative of a pretty obvious problem: privacy-preserving age verification is very unlikely to be reliable. It will inevitably result in letting too many young kids through, whilst blocking too many people of legitimate age. Or we end up with people needing to start uploading formal age-verification documents, which creates a whole new problem. Absolutely none of this should come as any surprise whatsoever!

References

  1. Sponsored by: Report URI: Guarding you from rogue JavaScript! Don’t get pwned; get real-time alerts & prevent breaches #SecureYourSite
  2. This week, it's all about Australia's social media ban for under 16s (link to the thread that sparked all the debate)
  3. I wrote about "sharenting" back in 2020 (lots in there about protecting kids online whilst also making appropriate use of technology)
  4. Our eSafety Commissioner has an FAQ on what the ban means (lots of use of the word "reasonable" in there)

Weekly Update 479

23 November 2025 at 04:44

I gave up on the IoT water meter reader. Being technical and thinking you can solve everything with technology is both a blessing and a curse; dogged persistence has given me the life I have today, but it has also burned serious amounts of time because I never want to let a problem go unsolved. But sometimes, common sense and the ROI of my time have to prevail, so I packed up all the gear and went back to processing data breaches. If you happen to solve this problem in a way that doesn't require any more time investment on my end, I'd love to hear it 😊

References

  1. Sponsored by: 1Password Extended Access Management: Secure every sign-in for every app on every device
  2. We've had a massive month on HIBP (20M+ visits is a solid number!)

Weekly Update 478

16 November 2025 at 08:13

This week, it was an absolute privilege to be at Europol in The Hague, speaking both about cyber offenders and at the InterCOP conference, and spending time with some of the folks involved in the Operation Endgame actions. The latter in particular gave me a new sense of just how much coordination is involved in this sort of operation, all the way down to some of the messaging in the videos they've since released. I've seen some social commentary on these already; check them out and see what you think, especially as it relates to the role those videos play in psyops.

References

  1. Sponsored by: Malwarebytes Browser Guard blocks phishing, ads, scams, and trackers for safer, faster browsing
  2. Operation Endgame saw a significant amount of criminal infrastructure taken down by Europol and friends (it's now the third "season" of Endgame that has ended up in HIBP)

Weekly Update 477

12 November 2025 at 12:27

What. A. Week. It wasn't just the preceding weeks of technical pain as we tried to work out how to get this data loaded, it was all the subsequent queries we had to deal with too. Some of them are totally understandable, whilst others just resulted in endless facepalms 🤦‍♂️ But we got there in the end with the worst of it just being a 24-hour period where we ended up on a SpamCop block list, for reasons I still don't understand. We are still on the very tail end of sending individual notifications, so there may be more to update in the next vid, but at least that one will be from home with sunshine, good coffee and a slower pace 😊

References

  1. Sponsored by: Report URI: Guarding you from rogue JavaScript! Don’t get pwned; get real-time alerts & prevent breaches #SecureYourSite
  2. Our largest corpus of data ever added to HIBP went live (1.3B passwords and 2B email addresses 🫨)
  3. Belgium was super pretty and a nice interlude between Norway and the Netherlands (including some time with our friends at the Centre for Cybersecurity Belgium)

2 Billion Email Addresses Were Exposed, and We Indexed Them All in Have I Been Pwned

5 November 2025 at 06:41

I hate hyperbolic news headlines about data breaches, but for the "2 Billion Email Addresses" headline to be hyperbolic, it'd need to be exaggerated or overstated - and it isn't. It's rounded up from the more precise number of 1,957,476,021 unique email addresses, but other than that, it's exactly what it sounds like. Oh - and 1.3 billion unique passwords, 625 million of which we'd never seen before either. It's the most extensive corpus of data we've ever processed, by a significant margin.

Edit: Just to be crystal clear about the origin of the data and the role of Synthient (who you’ll read about in the next paragraph): this data came from numerous locations where cybercriminals had published it. Synthient (run by Ben during his final year of college) indexed that data and provided it to Have I Been Pwned solely for the purpose of notifying victims. He’s the good guy shining a light on the bad guys, so keep that in mind as you read on. (Some of the feedback Ben has received is exactly what I foreshadowed in the final paragraph of this post.)

A couple of weeks ago, I wrote about the 183M unique email addresses that Synthient had indexed in their threat intelligence platform and then shared with us. I explained that this was only part of the corpus of data they'd indexed, and that it didn't include the credential stuffing records. That earlier data came from stealer logs, which are obtained by malware running on infected machines. In contrast, credential stuffing lists usually originate from other data breaches where email addresses and passwords are exposed. They're then bundled up, sold, redistributed, and ultimately used to log in to victims' accounts. And not just the accounts they were initially breached from, either: because people reuse the same password over and over again, the data from one breach is frequently usable on completely unrelated sites. A breach of a forum for discussing cats often exposes data that can then be used to log in to the victim's shopping, social media and even email accounts. In that regard, credential stuffing data becomes "the keys to the castle".

Let me run through how we verified the data, what you can do about it and for the tech folks, some of the hoops we had to jump through to make processing this volume of data possible.

Data Verification

The first person whose data I verified was easy - me 😔 An old email address I've had since the 90s has been in credential stuffing lists before, so it wasn't too much of a surprise. Furthermore, I found a password associated with my address, which I'd definitely used many eons ago, and it was about as terrible as you'd expect from that era. However, none of the other passwords associated with my address were familiar. They certainly looked like passwords that other people might have feasibly used, but I'm pretty sure they weren't mine. One was even just an IP address from Perth on the other side of the country, which is both infeasible as a password I would have used, yet eerily close to home. I mean, of all the places in the world an IP address could have appeared from, it had to be somewhere in my own country I've been many times before...

Moving on to HIBP subscribers, I reached out to a handful and asked for support verifying the data. I chose a mix of subscribers, including many who'd never appeared in any data breach we'd seen before; my experience above suggested that there's recycled data in there, and we had previously verified that data when investigating those other incidents. But is the all-new stuff legitimate? The very first response I received was exactly what I was looking for:

#1 is an old password that I don't use anymore. #2 is a more recent password. Thanks for the heads up, I've gone and changed the password for every critical account that used either one. 

Perfectly illustrating most people's behaviour with passwords, #2 referred to above was just #1 with two exclamation marks at the end!! (Incidentally, these were simple six and eight-character passwords, and neither of them was in Pwned Passwords either.) He had three passwords in total, which also means one of them, like with my data, was not familiar. However, the most important thing here is that this example perfectly illustrates why we put the effort into processing data like this: #2 was a real, live password that this guy was actively using, and it was sitting right next to his email address, being passed around among criminals. However, through this effort, that credential pair has now become useless, which is precisely what we're aiming for with this exercise, just a couple of billion times over.

The second respondent only had one password against their address:

Yes that was a password I used for many years for what I would call throw away or unimportant accounts between 20 and 10 years ago

That was also only eight characters, but this time, we'd seen it in Pwned Passwords many times before. And the observation about the password's age was consistent with my own records, so there's definitely some pretty old data in there.

The following response was not at all surprising:

I am familiar with that password... I used it almost 10 years ago... and cannot recall the last time I used it.

That was on a corporate account, too, and the owner of the address duly forwarded my email to the cybersecurity team for further investigation. The single password associated with this lady's email address had a massive nine characters, and also hadn't previously appeared in Pwned Passwords.

Next up was a respondent who replied inline to my questions, so I'll list them below with the corresponding answers:

Is this familiar? Yes  
Have you ever used it in the past? Yes and is still on some accounts I do not use any longer.
And if so, how long ago? Unfortunately, it is still on some active accounts that I have just made a list of to change or close immediately.

This individual's eight-character password with uppercase, lowercase, numbers and a "special" character also wasn't in Pwned Passwords. Similarly, as with the earlier response, that password was still in active use, posing a real risk to the owner. It would pass most password complexity criteria and slip through any service using Pwned Passwords to block bad ones, so again, this highlights why it was so important for us to process the data.

The next person had three different passwords against rows with their email address, and they came back with a now common response:

Yes, these are familiar, last used 10 years ago

We'd actually seen all three of them in Pwned Passwords before, many times each. Another respondent with precisely the kind of gamer-like passwords you'd expect a kid to use (one of which we hadn't seen before) also confirmed (I think?) their use:

maybe when i was a kid lol

Responses that weren't an emphatic "yes, that's my data" were scarce. The two passwords against one person's name were both in Pwned Passwords (albeit only once each), yet it's entirely possible that neither of them had ever been used by this specific individual. It's also possible they'd forgotten a password they'd used more than a decade ago, or it may have even been automatically assigned to them by the service that was subsequently breached. Put it down as a statistical anomaly, but I thought it was worth mentioning to highlight that being in this data set isn't a guarantee that a genuine password of yours has been exposed. If your email address is found in this corpus then that's real, of course, so there must be some truth in the data, but it's a reminder that when data is aggregated from so many different sources over such a long period of time, there are going to be some inconsistencies.

Searching Pwned Passwords

As a brief recap, we load passwords into the service we call Pwned Passwords. When we do so, there is absolutely no association between the password and the email address it appeared next to. This is for both your protection and ours; can you imagine if HIBP was pwned? It's not beyond the realm of possibility, and the impact of exposing billions of credential pairs that can immediately unlock an untold number of accounts would be catastrophic. It's highly risky, and completely unnecessary when you can search for standalone passwords anyway without creating the risk of it being linked back to someone.

Think about it: if you have a password of "Fido123!" and you find it's been previously exposed (which it has), it doesn't matter if it was exposed against your email address or someone else's; it's still a bad password because it's named after your dog followed by a very predictable pattern. If you have a genuinely strong password and it's in Pwned Passwords, then you can walk away with some confidence that it really was yours. Either way, you shouldn't ever use that password again anywhere, and Pwned Passwords has done its job.
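To make the "no association" point concrete, here's a minimal sketch of splitting credential stuffing rows into two unlinked structures: a set of email addresses and a Pwned Passwords-style tally of SHA-1 password hashes with a count of how often each was seen. The function name and the lowercasing of addresses are assumptions for illustration; this isn't HIBP's actual loading pipeline.

```python
import hashlib
from collections import Counter


def split_credentials(rows):
    """Split (email, password) rows into an email set and a hash-count table.

    Hypothetical helper: nothing returned records which password appeared
    next to which email address, so the pairs can't be reconstructed.
    """
    emails = set()
    password_counts = Counter()
    for email, password in rows:
        # Assumption: addresses are treated case-insensitively.
        emails.add(email.strip().lower())
        # Only the uppercase hex SHA-1 of the password is kept, plus a tally.
        digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
        password_counts[digest] += 1
    return emails, password_counts
```

If two different people both used "Fido123!", the tally for that hash goes to 2, but there's no way back to either address. That's all a password search needs.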

Checking the service is easy, anonymous and depending on your level of technical comfort, can be done in several different ways. Here's a copy and paste from the last Synthient blog post:

  1. Use the Pwned Passwords search page. Passwords are protected with an anonymity model, so we never see them (it's processed in the browser itself), but if you're wary, just check old ones you may suspect.
  2. Use the k-anonymity API. This is what drives the page in the previous point, and if you're handy with writing code, this is an easy approach and gives you complete confidence in the anonymity aspect.
  3. Use 1Password's Watchtower. The password manager has a built-in checker that uses the abovementioned API and can check all the passwords in your vault. (Disclosure: 1Password is a regular sponsor of this blog, and has product placement on HIBP.)
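For the k-anonymity API in option 2, here's a minimal sketch using only the Python standard library. It sends just the first five hex characters of the password's SHA-1 hash to the range endpoint (https://api.pwnedpasswords.com/range/ followed by the prefix), then matches the remaining 35 characters locally against the returned "SUFFIX:COUNT" lines. The function names and User-Agent string are my own; `pwned_count` accepts a `fetch` function so it can be exercised offline with a canned response.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str) -> tuple[str, str]:
    """Return the 5-char prefix (sent to the API) and 35-char suffix (kept local)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def parse_range(body: str) -> dict[str, int]:
    """Parse 'SUFFIX:COUNT' lines from a range response into a dict."""
    counts = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        suffix, _, count = line.partition(":")
        counts[suffix.upper()] = int(count)
    return counts


def fetch_range(prefix: str) -> str:
    """Fetch one hash range; only the prefix ever leaves your machine."""
    req = urllib.request.Request(
        RANGE_URL + prefix, headers={"User-Agent": "pwned-passwords-sketch"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def pwned_count(password: str, fetch=fetch_range) -> int:
    """How many times the password appears in Pwned Passwords (0 if never)."""
    prefix, suffix = split_hash(password)
    return parse_range(fetch(prefix)).get(suffix, 0)
```

Calling `pwned_count("Fido123!")` makes one live request; anything above zero means that password shouldn't be used again anywhere.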

My vested interest in 1Password aside, Watchtower is the easiest, fastest way to understand your potential exposure in this incident. And in case you're wondering why I have so many vulnerable and reused passwords, it's a combination of the test accounts I've saved over the years and the 4-digit PINs some services force you to use. Would you believe that every single 4-digit number ever has been pwned?! (If you're interested, the ABC has a fantastic infographic using a heatmap based on HIBP data that shows some very predictable patterns for 4-digit PINs.)

This Is Not a Gmail Breach

It pains me to say it, but I have to, given the way the stealer logs made ridiculous, completely false headlines a couple of weeks ago:

This story has suddenly gained *way* more traction in recent hours, and something I thought was obvious needs clarifying: this *is not* a Gmail leak, it simply has the credentials of victims infected with malware, and Gmail is the dominant email provider: https://t.co/S75hF4T1es

— Troy Hunt (@troyhunt) October 27, 2025

There are 32 million different email domains in this latest corpus, of which gmail.com is one. It is, of course, the largest, with 394 million unique email addresses. In other words, 80% of the data in this corpus has absolutely nothing to do with Gmail, and the 20% that are Gmail addresses have absolutely nothing to do with any sort of security vulnerability on Google's part. There - now let reporting sanity prevail!
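The 80/20 split above is simple arithmetic over unique addresses per domain. Here's a minimal sketch of that calculation; the helper name is hypothetical, and treating addresses case-insensitively with the domain taken as everything after the last "@" are assumptions.

```python
from collections import Counter


def domain_shares(emails):
    """Return each domain's share of the unique addresses, largest first.

    Assumption: addresses compare case-insensitively, and the domain is
    everything after the last "@".
    """
    unique = {e.strip().lower() for e in emails if "@" in e}
    if not unique:
        return {}
    counts = Counter(address.rsplit("@", 1)[1] for address in unique)
    total = len(unique)
    return {domain: n / total for domain, n in counts.most_common()}


# The post's own figures: 394 million Gmail addresses out of 1,957,476,021.
gmail_share = 394_000_000 / 1_957_476_021  # about 0.201, i.e. roughly 20%
```

Even the single largest domain accounts for only about a fifth of the corpus.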

The Technical Bits

I wanted to add this just to highlight how painful it has been to deal with this data. This corpus is nearly 3 times the size of the previous largest breach we'd loaded, and HIBP is many times larger than it was in 2019 when we loaded the Collection #1 data. Taking 2 billion records and adding the ones we hadn't already seen in the existing 15 billion corpus, whilst not adversely impacting the live system serving millions of visitors a day, was very non-trivial. Managing the nuances of SQL Server indexes such that we could optimise both inserts and queries is not my idea of fun, and it's been a pretty hard couple of weeks if I'm honest. It's also been a very expensive period as we turned the cloud up to 11 (we run on Azure SQL Hyperscale, which we maxed out at 80 cores for almost two weeks).
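"Adding the ones we hadn't already seen" is essentially an anti-join: insert only the staged rows with no match in the existing corpus. Here's a minimal sketch of that pattern. It uses SQLite purely as a self-contained stand-in (HIBP runs on Azure SQL Hyperscale), and the table and function names are hypothetical, so treat it as an illustration of the idea rather than how the load was actually done.

```python
import sqlite3


def load_new_addresses(conn: sqlite3.Connection) -> int:
    """Insert addresses from `staging` into `emails` only if not already present.

    Hypothetical schema: `emails` is the existing corpus, `staging` holds the
    newly loaded data. Returns the number of rows actually added.
    """
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO emails (email)
        SELECT DISTINCT s.email
        FROM staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM emails AS e WHERE e.email = s.email)
        """
    )
    conn.commit()
    return conn.total_changes - before


# Tiny in-memory demo: b@ is already known, c@ is new (and duplicated in staging).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (email TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE staging (email TEXT)")
conn.executemany("INSERT INTO emails VALUES (?)", [("a@example.com",), ("b@example.com",)])
conn.executemany(
    "INSERT INTO staging VALUES (?)",
    [("b@example.com",), ("c@example.com",), ("c@example.com",)],
)
added = load_new_addresses(conn)
```

The DISTINCT also guards against duplicates within the new data itself, which a corpus aggregated from many sources will certainly contain.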

A simple example of the challenge is that after loading all the email addresses into a staging table, we needed to create a SHA1 hash of each. Normally, that would involve something to the effect of "update table set column = sha1(email)" and you're done. That crashed completely, so we ended up doing "insert into new table select email, sha1(email)". But on other occasions, the breach load required us to update other columns (with no hash creation), and on multiple occasions we had to kill those updates after a day or more of execution with no end in sight. So, we ended up batching in loops (usually 1M records at a time), reporting on progress along the way so we had some idea of when it would actually finish. It was a painful process of trial, a long wait, error, and then a completely different approach.
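The batching approach can be sketched as a loop over ID ranges, committing and reporting progress after each batch. This is a minimal illustration only: SQLite stands in for SQL Server, the `staging` table, its columns and the registered `sha1_hex` function are hypothetical, and a hash column is used as the example update simply because it's the one described above.

```python
import hashlib
import sqlite3


def sha1_hex(value: str) -> str:
    """Hex SHA-1 of a string; registered as a SQL function below."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def batched_hash_update(conn, batch_size=1_000_000, report=print):
    """Update `staging.email_sha1` in ID-range batches, reporting progress.

    Hypothetical schema: staging(id INTEGER PRIMARY KEY, email TEXT, email_sha1 TEXT).
    Each batch is its own short transaction instead of one huge update.
    """
    conn.create_function("sha1_hex", 1, sha1_hex)
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM staging").fetchone()
    done = 0
    for start in range(0, max_id, batch_size):
        conn.execute(
            "UPDATE staging SET email_sha1 = sha1_hex(email) WHERE id > ? AND id <= ?",
            (start, start + batch_size),
        )
        conn.commit()
        done = min(start + batch_size, max_id)
        report(f"{done:,} / {max_id:,} rows processed ({done / max_id:.0%})")
    return done
```

Keying batches on ID ranges rather than, say, "where email_sha1 is null" keeps each batch's cost predictable, which is what makes the progress reports useful for estimating when the whole thing will finish.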

Notifying our subscribers is another problem. We have 5.9 million of them, and 2.9 million are in this data 🫨 Simply sending that many emails at once is hard. It's not so much hard in terms of firing them off, rather it's hard in terms of not ending up on a reputation naughty list or having mail throttled by the receiving server. That's happened many times in the past when loading large, albeit much smaller corpuses; Gmail, for example, suddenly sees a massive spike and slows down the delivery to inboxes. Not such a biggy for sending breach notices, but a major problem for people trying to sign into their dashboard who can no longer receive the email with the "magic" link.

What we've done to address that for this incident is to slow down the delivery of emails for the individual breach notification. Whilst I'd originally intended to send the emails at a constant rate over the period of a week, someone listening to me on my Friday live stream had a much better suggestion:

the strategy I've found to best work with large email delivery is to look at the average number of emails you've sent over the last 30 days each time you want to ramp up, and then increase that volume by around 50% per day until you've worked your way through the queue

Which makes a lot of sense, and stacked up as I did more research (thanks Joe!). So, here's what our planned delivery schedule now looks like:


That's broken down by hour, increasing in volume by 1.015 times per hour, such that the emails are spread out in a similar, gradually increasing cadence. On a daily basis, that compounds to roughly a 43% increase in each 24-hour period (1.015^24 ≈ 1.43), within Joe's suggested 50% threshold. Plus, we obviously have all the other mechanisms such as a dedicated IP, properly configured DKIM, DMARC and SPF, only emailing double-opted-in subscribers and spam-filter-friendly message body construction. So, it could be days before you receive a notification; if you're impatient, just run a search on haveibeenpwned.com on demand.
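Here's a minimal sketch of generating an hourly schedule with that kind of compounding growth. The function names, the starting volume and the queue size are hypothetical; one way to pick the starting volume, following Joe's suggestion, is to take the 30-day daily average, add 50% and divide by 24. This isn't the code HIBP uses.

```python
def starting_hourly_volume(avg_daily_last_30d: float, daily_increase: float = 0.5) -> float:
    """Hypothetical: first day's volume is the 30-day daily average plus 50%, spread over 24 hours."""
    return avg_daily_last_30d * (1 + daily_increase) / 24


def hourly_schedule(total: int, first_hour: float, factor: float = 1.015) -> list[int]:
    """Spread `total` emails over hours, multiplying each hour's volume by `factor`.

    Every hour sends at least one email; the final hour sends whatever remains.
    """
    schedule = []
    volume = float(first_hour)
    remaining = total
    while remaining > 0:
        batch = min(remaining, max(1, round(volume)))
        schedule.append(batch)
        remaining -= batch
        volume *= factor
    return schedule


# Compounding an hourly factor over a day: 1.015 ** 24 is about 1.43.
daily_growth = 1.015 ** 24
```

Growing hourly rather than in one daily step avoids a sudden jump at midnight, which is precisely the kind of spike receiving mail servers react badly to.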

We've sent all the domain notification emails instantly because, by definition, they're going to a very wide range of different mail servers; it's just the individual ones we're drop-feeding.

Lastly, if you've integrated Pwned Passwords into your service, you'll now see noticeably larger response sizes. The numbers I mentioned in the opening paragraph increase the size of each hash range by an average of about 50%, which will push responses from about 26KB to 40KB. That's when Brotli compressed, so obviously, make sure your requests make the most of the compression (i.e. they send an Accept-Encoding header that includes br).
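Here's a minimal sketch of requesting a range with compression enabled and decoding the body yourself, since Python's urllib doesn't decompress responses automatically. Brotli isn't in the standard library, so this assumes the third-party `brotli` package is installed whenever the server actually answers with br; gzip is offered as a fallback. The function names are hypothetical.

```python
import gzip
import urllib.request


def decode_body(raw: bytes, encoding: str) -> bytes:
    """Decompress a response body according to its Content-Encoding header."""
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "br":
        # Assumption: the third-party `brotli` package (pip install brotli) is available.
        import brotli
        return brotli.decompress(raw)
    return raw  # identity: the body wasn't compressed


def fetch_range_compressed(prefix: str) -> str:
    """Fetch a Pwned Passwords hash range, asking for Brotli (or gzip) compression."""
    req = urllib.request.Request(
        "https://api.pwnedpasswords.com/range/" + prefix,
        headers={"Accept-Encoding": "br, gzip", "User-Agent": "pwned-passwords-sketch"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        raw = resp.read()
        encoding = resp.headers.get("Content-Encoding", "")
    return decode_body(raw, encoding).decode("utf-8")
```

Most HTTP client libraries negotiate and decode compression for you; the point is simply to check that yours actually advertises br, so you get the smaller responses.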

Conclusion

This data is now searchable in HIBP as the Synthient Credential Stuffing Threat Data. It's an entirely separate corpus from that previous Synthient data I mentioned earlier; they're discrete datasets with some crossover, but obviously, this one is significantly larger. And, of course, all the passwords are now searchable per the Pwned Passwords guidance above.

If I could close with one request: this was an extremely laborious, time-consuming and expensive exercise for us to complete. We've done our best to verify the integrity of the data and make it searchable in a practical way while remaining as privacy-centric as possible. Sending as many notifications as we have will inevitably lead to a barrage of responses from people wanting access to complete rows of data, grilling us on precisely where it was obtained from or, believe it or not, outright abusing us. Not doing those things would be awesome, and I suggest instead putting the energy into getting a password manager, making passwords strong and unique (or even better, using passkeys where available), and turning on multi-factor auth. That would be an awesome outcome for all 😊

Edit: I've closed off comments on this blog post. As you'll see below, there was a constant stream of questions that have already been answered in the post itself, plus some comments that were starting to verge on precisely what I predicted in the last para above. Reading, responding and engaging is time-consuming and at this point, all the answers are already here both above and below this edit in the comments.
