Dealing with failing web scrapers due to anti-bot protections or website changes? Meet Scrapling.
Scrapling is a high-performance, intelligent web scraping library for Python that automatically adapts to website changes while significantly outperforming popular alternatives. For both beginners and experts, Scrapling provides powerful features while maintaining simplicity.
>> from scrapling.defaults import Fetcher, AsyncFetcher, StealthyFetcher, PlayWrightFetcher
# Fetch websites' source under the radar!
>> page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
>> print(page.status)
200
>> products = page.css('.product', auto_save=True) # Scrape data that survives website design changes!
>> # Later, if the website structure changes, pass `auto_match=True`
>> products = page.css('.product', auto_match=True) # and Scrapling still finds them!
- HTTP Requests: make fast and stealthy HTTP requests with the Fetcher class.
- Dynamic Loading & Automation: fetch dynamic websites with the PlayWrightFetcher class through your real browser, Scrapling's stealth mode, Playwright's Chrome browser, or NSTbrowser's browserless!
- Anti-bot Protections Bypass: easily bypass protections with the StealthyFetcher and PlayWrightFetcher classes.
from scrapling.fetchers import Fetcher
fetcher = Fetcher(auto_match=False)
# Do http GET request to a web page and create an Adaptor instance
page = fetcher.get('https://quotes.toscrape.com/', stealthy_headers=True)
# Get all text content from all HTML tags in the page except `script` and `style` tags
page.get_all_text(ignore_tags=('script', 'style'))
# Get all quotes elements, any of these methods will return a list of strings directly (TextHandlers)
quotes = page.css('.quote .text::text') # CSS selector
quotes = page.xpath('//span[@class="text"]/text()') # XPath
quotes = page.css('.quote').css('.text::text') # Chained selectors
quotes = [element.text for element in page.css('.quote .text')] # Slower than bulk query above
# Get the first quote element
quote = page.css_first('.quote') # same as page.css('.quote').first or page.css('.quote')[0]
# Tired of selectors? Use find_all/find
# Get all 'div' HTML tags that have 'quote' as one of their 'class' values
quotes = page.find_all('div', {'class': 'quote'})
# Same as
quotes = page.find_all('div', class_='quote')
quotes = page.find_all(['div'], class_='quote')
quotes = page.find_all(class_='quote') # and so on...
# Working with elements
quote.html_content # Get Inner HTML of this element
quote.prettify() # Prettified version of Inner HTML above
quote.attrib # Get that element's attributes
quote.path # DOM path to element (List of all ancestors from <html> tag till the element itself)
To keep it simple, all methods can be chained on top of each other!
Scrapling isn't just powerful - it's also blazing fast. Scrapling implements many best practices, design patterns, and numerous optimizations to save fractions of seconds. All of that while focusing exclusively on parsing HTML documents. Here are benchmarks comparing Scrapling to popular Python libraries in two tests.
| # | Library | Time (ms) | vs Scrapling |
|---|---|---|---|
| 1 | Scrapling | 5.44 | 1.0x |
| 2 | Parsel/Scrapy | 5.53 | 1.017x |
| 3 | Raw Lxml | 6.76 | 1.243x |
| 4 | PyQuery | 21.96 | 4.037x |
| 5 | Selectolax | 67.12 | 12.338x |
| 6 | BS4 with Lxml | 1307.03 | 240.263x |
| 7 | MechanicalSoup | 1322.64 | 243.132x |
| 8 | BS4 with html5lib | 3373.75 | 620.175x |
As you can see, Scrapling is on par with Scrapy and slightly faster than Lxml, the library both are built on top of; these are the closest results to Scrapling. PyQuery is also built on top of Lxml, yet Scrapling is still about 4 times faster.
| Library | Time (ms) | vs Scrapling |
|---|---|---|
| Scrapling | 2.51 | 1.0x |
| AutoScraper | 11.41 | 4.546x |
Scrapling can find elements with more methods, and it returns full element Adaptor objects, not just the text as AutoScraper does. So, to make this test fair, both libraries extract an element by text, find similar elements, and then extract the text content from all of them. As you can see, Scrapling is still 4.5 times faster at the same task.
All benchmark results are an average of 100 runs. See our benchmarks.py for methodology and to run your own comparisons.
Scrapling is a breeze to get started with. Starting from version 0.2.9, Python 3.9 or later is required.
pip3 install scrapling
Then run this command to install browsers' dependencies needed to use Fetcher classes
scrapling install
If you have any installation issues, please open an issue.
Fetchers are interfaces built on top of other libraries with added features that make requests or fetch pages for you in a single-request fashion and then return an Adaptor object. This feature was introduced because previously the only option was to fetch the page however you wanted, then pass it manually to the Adaptor class to create an Adaptor instance and start working with the page.
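For context, here is a minimal sketch of that manual approach; it assumes you already have the page's HTML in a variable (named page_source here purely for illustration):
# Hypothetical variable `page_source` holds HTML you fetched yourself (e.g., with httpx or requests)
from scrapling.parser import Adaptor

page = Adaptor(page_source, url='https://example.com')
print(page.css_first('h1::text'))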
You might be slightly confused by now, so let me clear things up. All fetcher-type classes are imported in the same way:
from scrapling.fetchers import Fetcher, StealthyFetcher, PlayWrightFetcher
All of them can take these initialization arguments: auto_match, huge_tree, keep_comments, keep_cdata, storage, and storage_args, which are the same ones you give to the Adaptor class.
If you don't want to pass arguments to the generated Adaptor object and want to use the default values, you can use this import instead for cleaner code:
from scrapling.defaults import Fetcher, AsyncFetcher, StealthyFetcher, PlayWrightFetcher
then use it right away without initializing like:
page = StealthyFetcher.fetch('https://example.com')
Also, the Response object returned from all fetchers is the same as the Adaptor object except that it has these added attributes: status, reason, cookies, headers, history, and request_headers. All cookies, headers, and request_headers are always of type dictionary.
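As a quick illustration, those attributes can be read directly off the returned page object; a minimal sketch (the printed values are placeholders):
from scrapling.defaults import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/')
print(page.status)           # e.g. 200
print(page.reason)           # e.g. 'OK'
print(page.cookies)          # response cookies as a dictionary
print(page.headers)          # response headers as a dictionary
print(page.request_headers)  # the headers Scrapling sent, as a dictionary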
[!NOTE] The auto_match argument is enabled by default, and it's the one you should care about the most, as you will see later.
This class is built on top of httpx with additional configuration options. Here you can make GET, POST, PUT, and DELETE requests.
For all methods, you have stealthy_headers, which makes Fetcher create and use real browser headers, then create a Referer header as if the request came from a Google search for this URL's domain. It's enabled by default. You can also set the number of retries with the retries argument for all methods, which makes httpx retry requests if they fail for any reason. The default number of retries for all Fetcher methods is 3.
Note: all headers generated by the stealthy_headers argument can be overwritten by you through the headers argument.
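For example, a small sketch of overriding one of the generated headers while also raising the retry count (the header value here is just a placeholder):
from scrapling.fetchers import Fetcher

page = Fetcher().get(
    'https://httpbin.org/get',
    stealthy_headers=True,                          # generate realistic browser headers (default)
    headers={'User-Agent': 'my-custom-agent/1.0'},  # overrides the generated User-Agent
    retries=5,                                      # retry failed requests up to 5 times instead of the default 3
)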
You can route all traffic (HTTP and HTTPS) through a proxy for any of these methods, in this format: `http://username:password@localhost:8030`
>> page = Fetcher().get('https://httpbin.org/get', stealthy_headers=True, follow_redirects=True)
>> page = Fetcher().post('https://httpbin.org/post', data={'key': 'value'}, proxy='http://username:password@localhost:8030')
>> page = Fetcher().put('https://httpbin.org/put', data={'key': 'value'})
>> page = Fetcher().delete('https://httpbin.org/delete')
For Async requests, you will just replace the import like below:
>> from scrapling.fetchers import AsyncFetcher
>> page = await AsyncFetcher().get('https://httpbin.org/get', stealthy_headers=True, follow_redirects=True)
>> page = await AsyncFetcher().post('https://httpbin.org/post', data={'key': 'value'}, proxy='http://username:password@localhost:8030')
>> page = await AsyncFetcher().put('https://httpbin.org/put', data={'key': 'value'})
>> page = await AsyncFetcher().delete('https://httpbin.org/delete')
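Because these calls are coroutines, they compose naturally with asyncio. A minimal sketch of fetching several pages concurrently (the URLs are just examples):
import asyncio
from scrapling.fetchers import AsyncFetcher

async def main():
    fetcher = AsyncFetcher()
    urls = ['https://httpbin.org/get', 'https://quotes.toscrape.com/']
    # Run the requests concurrently and collect the returned pages
    pages = await asyncio.gather(*(fetcher.get(url) for url in urls))
    for page in pages:
        print(page.status)

asyncio.run(main())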
This class is built on top of Camoufox, bypassing most anti-bot protections by default. Scrapling adds extra layers of flavors and configurations to increase performance and undetectability even further.
>> page = StealthyFetcher().fetch('https://www.browserscan.net/bot-detection') # Running headless by default
>> page.status == 200
True
>> page = await StealthyFetcher().async_fetch('https://www.browserscan.net/bot-detection') # the async version of fetch
>> page.status == 200
True
Note: all requests done by this fetcher are waiting by default for all JS to be fully loaded and executed so you don't have to :)
This list isn't final so expect a lot more additions and flexibility to be added in the next versions!
This class is built on top of Playwright and currently provides 4 main run options, which can be mixed as you want.
>> page = PlayWrightFetcher().fetch('https://www.google.com/search?q=%22Scrapling%22', disable_resources=True) # Vanilla Playwright option
>> page.css_first("#search a::attr(href)")
'https://github.com/D4Vinci/Scrapling'
>> page = await PlayWrightFetcher().async_fetch('https://www.google.com/search?q=%22Scrapling%22', disable_resources=True) # the async version of fetch
>> page.css_first("#search a::attr(href)")
'https://github.com/D4Vinci/Scrapling'
Note: all requests done by this fetcher are waiting by default for all JS to be fully loaded and executed so you don't have to :)
Using this Fetcher class, you can make requests with:
1) Vanilla Playwright without any modifications other than the ones you chose.
2) Stealthy Playwright with the stealth mode I wrote for it. It's still a WIP, but it bypasses many online tests like Sannysoft's. Some of the things this fetcher's stealth mode does include:
   * Patching the CDP runtime fingerprint.
   * Mimicking some real browser properties by injecting several JS files and using custom options.
   * Using custom flags on launch to hide Playwright even more and make it faster.
   * Generating real browser headers of the same type and for the same user OS, then appending them to the request's headers.
3) Real browsers, by passing the real_chrome argument or the CDP URL of your browser to be controlled by the Fetcher; most of the options can be enabled on it.
4) NSTBrowser's docker browserless option, by passing the CDP URL and enabling the nstbrowser_mode option.
Note: using the real_chrome argument requires that you have the Chrome browser installed on your device.
Add that to a lot of controlling/hiding options as you will see in the arguments list below.
This list isn't final so expect a lot more additions and flexibility to be added in the next versions!
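A hedged sketch of the last two run options; real_chrome and nstbrowser_mode are named above, while the cdp_url keyword and the endpoint value are my assumptions for how the CDP URL is passed:
from scrapling.fetchers import PlayWrightFetcher

# 3) Drive your real, locally installed Chrome browser
page = PlayWrightFetcher().fetch('https://example.com', real_chrome=True)

# 4) Use NSTBrowser's browserless over CDP (`cdp_url` is an assumed parameter name)
page = PlayWrightFetcher().fetch(
    'https://example.com',
    cdp_url='ws://localhost:3000',  # hypothetical CDP endpoint
    nstbrowser_mode=True,
)
print(page.status)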
>>> quote.tag
'div'
>>> quote.parent
<data='<div class="col-md-8"> <div class="quote...' parent='<div class="row"> <div class="col-md-8">...'>
>>> quote.parent.tag
'div'
>>> quote.children
[<data='<span class="text" itemprop="text">"The...' parent='<div class="quote" itemscope itemtype="h...'>,
<data='<span>by <small class="author" itemprop=...' parent='<div class="quote" itemscope itemtype="h...'>,
<data='<div class="tags"> Tags: <meta class="ke...' parent='<div class="quote" itemscope itemtype="h...'>]
>>> quote.siblings
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
>>> quote.next # gets the next element, the same logic applies to `quote.previous`
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>
>>> quote.children.css_first(".author::text")
'Albert Einstein'
>>> quote.has_class('quote')
True
# Generate new selectors for any element
>>> quote.generate_css_selector
'body > div > div:nth-of-type(2) > div > div'
# Test these selectors on your favorite browser or reuse them again in the library's methods!
>>> quote.generate_xpath_selector
'//body/div/div[2]/div/div'
If your case needs more than the element's parent, you can iterate over the whole ancestors' tree of any element like below
for ancestor in quote.iterancestors():
    # do something with it, e.g.
    print(ancestor.tag)
You can search for a specific ancestor of an element that satisfies a function. All you need to do is pass a function that takes an Adaptor object as an argument and returns True if the condition is satisfied or False otherwise, like below:
>>> quote.find_ancestor(lambda ancestor: ancestor.has_class('row'))
<data='<div class="row"> <div class="col-md-8">...' parent='<div class="container"> <div class="row...'>
You can select elements by their text content in multiple ways, here's a full example on another website:
>>> page = Fetcher().get('https://books.toscrape.com/index.html')
>>> page.find_by_text('Tipping the Velvet') # Find the first element whose text fully matches this text
<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>
>>> page.urljoin(page.find_by_text('Tipping the Velvet').attrib['href']) # We use `page.urljoin` to return the full URL from the relative `href`
'https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html'
>>> page.find_by_text('Tipping the Velvet', first_match=False) # Get all matches if there are more
[<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>]
>>> page.find_by_regex(r'£[\d\.]+') # Get the first element whose text content matches my price regex
<data='<p class="price_color">£51.77</p>' parent='<div class="product_price"> <p class="pr...'>
>>> page.find_by_regex(r'£[\d\.]+', first_match=False) # Get all elements that match my price regex
[<data='<p class="price_color">£51.77</p>' parent='<div class="product_price"> <p class="pr...'>,
<data='<p class="price_color">£53.74</p>' parent='<div class="product_price"> <p class="pr...'>,
<data='<p class="price_color">£50.10</p>' parent='<div class="product_price"> <p class="pr...'>,
<data='<p class="price_color">£47.82</p>' parent='<div class="product_price"> <p class="pr...'>,
...]
Find all elements that are similar to the current element in location and attributes
# For this case, ignore the 'title' attribute while matching
>>> page.find_by_text('Tipping the Velvet').find_similar(ignore_attributes=['title'])
[<data='<a href="catalogue/a-light-in-the-attic_...' parent='<h3><a href="catalogue/a-light-in-the-at...'>,
<data='<a href="catalogue/soumission_998/index....' parent='<h3><a href="catalogue/soumission_998/in...'>,
<data='<a href="catalogue/sharp-objects_997/ind...' parent='<h3><a href="catalogue/sharp-objects_997...'>,
...]
# You will notice that the number of elements is 19 not 20 because the current element is not included.
>>> len(page.find_by_text('Tipping the Velvet').find_similar(ignore_attributes=['title']))
19
# Get the `href` attribute from all similar elements
>>> [element.attrib['href'] for element in page.find_by_text('Tipping the Velvet').find_similar(ignore_attributes=['title'])]
['catalogue/a-light-in-the-attic_1000/index.html',
'catalogue/soumission_998/index.html',
'catalogue/sharp-objects_997/index.html',
...]
To increase the complexity a little bit, let's say we want to get all books' data using that element as a starting point for some reason
>>> for product in page.find_by_text('Tipping the Velvet').parent.parent.find_similar():
...     print({
...         "name": product.css_first('h3 a::text'),
...         "price": product.css_first('.price_color').re_first(r'[\d\.]+'),
...         "stock": product.css('.availability::text')[-1].clean()
...     })
{'name': 'A Light in the ...', 'price': '51.77', 'stock': 'In stock'}
{'name': 'Soumission', 'price': '50.10', 'stock': 'In stock'}
{'name': 'Sharp Objects', 'price': '47.82', 'stock': 'In stock'}
...
The documentation will provide more advanced examples.
Let's say you are scraping a page with a structure like this:
<div class="container">
<section class="products">
<article class="product" id="p1">
<h3>Product 1</h3>
<p class="description">Description 1</p>
</article>
<article class="product" id="p2">
<h3>Product 2</h3>
<p class="description">Description 2</p>
</article>
</section>
</div>
And you want to scrape the first product, the one with the p1 ID. You would probably write a selector like this:
page.css('#p1')
When website owners implement structural changes like
<div class="new-container">
<div class="product-wrapper">
<section class="products">
<article class="product new-class" data-id="p1">
<div class="product-info">
<h3>Product 1</h3>
<p class="new-description">Description 1</p>
</div>
</article>
<article class="product new-class" data-id="p2">
<div class="product-info">
<h3>Product 2</h3>
<p class="new-description">Description 2</p>
</div>
</article>
</section>
</div>
</div>
The selector will no longer function and your code needs maintenance. That's where Scrapling's auto-matching feature comes into play.
from scrapling.parser import Adaptor
# Before the change
page = Adaptor(page_source, url='example.com')
element = page.css('#p1', auto_save=True)
if not element: # One day website changes?
element = page.css('#p1', auto_match=True) # Scrapling still finds it!
# the rest of the code...
How does the auto-matching work? Check the FAQs section for that and other possible issues while auto-matching.
Let's use a real website as an example and use one of the fetchers to fetch its source. To do this, we need to find a website that will change its design/structure soon, take a copy of its source, and then wait for the website to make the change. Of course, that's nearly impossible to know unless I know the website's owner, but that would make it a staged test haha.
To solve this issue, I will use The Web Archive's Wayback Machine. Here is a copy of StackOverflow's website from 2010, pretty old huh? Let's test if the auto-matching feature can extract the same button in the old design from 2010 and the current design using the same selector :)
If I want to extract the Questions button from the old design I can use a selector like this #hmenus > div:nth-child(1) > ul > li:nth-child(1) > a
This selector is too specific because it was generated by Google Chrome. Now let's test the same selector in both versions
>> from scrapling.fetchers import Fetcher
>> selector = '#hmenus > div:nth-child(1) > ul > li:nth-child(1) > a'
>> old_url = "https://web.archive.org/web/20100102003420/http://stackoverflow.com/"
>> new_url = "https://stackoverflow.com/"
>>
>> page = Fetcher(automatch_domain='stackoverflow.com').get(old_url, timeout=30)
>> element1 = page.css_first(selector, auto_save=True)
>>
>> # Same selector but used in the updated website
>> page = Fetcher(automatch_domain="stackoverflow.com").get(new_url)
>> element2 = page.css_first(selector, auto_match=True)
>>
>> if element1.text == element2.text:
... print('Scrapling found the same element in the old design and the new design!')
'Scrapling found the same element in the old design and the new design!'
Note that I used a new argument called automatch_domain. This is because, for Scrapling, these are two different URLs, not the same website, so it isolates their data. To tell Scrapling they are the same website, we pass the domain we want to use for saving the auto-match data for both, so Scrapling doesn't isolate them.
In a real-world scenario, the code will be the same except it will use the same URL for both requests, so you won't need the automatch_domain argument. This is the closest example I can give to real-world cases, so I hope it didn't confuse you :)
Notes:
1. For the two examples above, I used the Adaptor class once and the Fetcher class the other time, just to show that you can create the Adaptor object yourself if you have the source, or fetch the source using any Fetcher class and it will create the Adaptor object for you.
2. Passing the auto_save argument while the auto_match argument is set to False during Adaptor/Fetcher initialization will only result in the auto_save argument being ignored, with the following warning message: Argument `auto_save` will be ignored because `auto_match` wasn't enabled on initialization. Check docs for more info. This behavior is purely for performance reasons, so the database gets created/connected only when you are planning to use the auto-matching features. The same applies to the auto_match argument (a small sketch of correct initialization follows after these notes).
3. The auto_match parameter works only on Adaptor instances, not Adaptors, so something like `page.css('body').css('#p1', auto_match=True)` will raise an error because you can't auto-match a whole list; you have to be specific and do something like `page.css_first('body').css('#p1', auto_match=True)`.
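To make note 2 concrete, a minimal sketch of enabling auto-matching at initialization so that auto_save actually takes effect (the URL and selector are placeholders):
from scrapling.fetchers import Fetcher

# auto_match must be enabled on the Fetcher/Adaptor itself...
fetcher = Fetcher(auto_match=True)
page = fetcher.get('https://example.com')

# ...otherwise this auto_save would be ignored with a warning
products = page.css('.product', auto_save=True)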
Inspired by BeautifulSoup's find_all function, you can find elements by using the find_all/find methods. Both methods can take multiple types of filters and return all elements in the pages that all these filters apply to.
So the way it works is after collecting all passed arguments and keywords, each filter passes its results to the following filter in a waterfall-like filtering system.
It filters all elements in the current page/element in the following order:
Note: The filtering process always starts from the first filter it finds in the filtering order above, so if no tag name(s) are passed but attributes are, the process starts from that layer, and so on. However, the order in which you pass the arguments doesn't matter.
Examples to clear any confusion :)
>> from scrapling.fetchers import Fetcher
>> page = Fetcher().get('https://quotes.toscrape.com/')
# Find all elements with tag name `div`.
>> page.find_all('div')
[<data='<div class="container"> <div class="row...' parent='<body> <div class="container"> <div clas...'>,
<data='<div class="row header-box"> <div class=...' parent='<div class="container"> <div class="row...'>,
...]
# Find all div elements with a class that equals `quote`.
>> page.find_all('div', class_='quote')
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
# Same as above.
>> page.find_all('div', {'class': 'quote'})
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
# Find all elements with a class that equals `quote`.
>> page.find_all({'class': 'quote'})
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
# Find all div elements with a class that equals `quote`, and contains the element `.text` which contains the word 'world' in its content.
>> page.find_all('div', {'class': 'quote'}, lambda e: "world" in e.css_first('.text::text'))
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>]
# Find all elements that have children.
>> page.find_all(lambda element: len(element.children) > 0)
[<data='<html lang="en"><head><meta charset="UTF...'>,
<data='<head><meta charset="UTF-8"><title>Quote...' parent='<html lang="en"><head><meta charset="UTF...'>,
<data='<body> <div class="container"> <div clas...' parent='<html lang="en"><head><meta charset="UTF...'>,
...]
# Find all elements that contain the word 'world' in their content.
>> page.find_all(lambda element: "world" in element.text)
[<data='<span class="text" itemprop="text">"The...' parent='<div class="quote" itemscope itemtype="h...'>,
<data='<a class="tag" href="/tag/world/page/1/"...' parent='<div class="tags"> Tags: <meta class="ke...'>]
# Find all span elements that match the given regex
>> page.find_all('span', re.compile(r'world'))
[<data='<span class="text" itemprop="text">"The...' parent='<div class="quote" itemscope itemtype="h...'>]
# Find all div and span elements with class 'quote' (No span elements like that so only div returned)
>> page.find_all(['div', 'span'], {'class': 'quote'})
[<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
<data='<div class="quote" itemscope itemtype="h...' parent='<div class="col-md-8"> <div class="quote...'>,
...]
# Mix things up
>> page.find_all({'itemtype':"http://schema.org/CreativeWork"}, 'div').css('.author::text')
['Albert Einstein',
'J.K. Rowling',
...]
Here's what else you can do with Scrapling:
Accessing the lxml.etree object itself of any element directly:
>>> quote._root
<Element div at 0x107f98870>
Saving and retrieving elements manually to auto-match them outside the css and xpath methods, but you have to set the identifier yourself.
To save an element to the database:
>>> element = page.find_by_text('Tipping the Velvet', first_match=True)
>>> page.save(element, 'my_special_element')
To retrieve it later and relocate it in the page:
>>> element_dict = page.retrieve('my_special_element')
>>> page.relocate(element_dict, adaptor_type=True)
[<data='<a href="catalogue/tipping-the-velvet_99...' parent='<h3><a href="catalogue/tipping-the-velve...'>]
>>> page.relocate(element_dict, adaptor_type=True).css('::text')
['Tipping the Velvet']
If you want to keep it as an lxml.etree object, leave out the adaptor_type argument:
>>> page.relocate(element_dict)
[<Element a at 0x105a2a7b0>]
Filtering results based on a function
# Find all products over $50
expensive_products = page.css('.product_pod').filter(
lambda p: float(p.css('.price_color').re_first(r'[\d\.]+')) > 50
)
# Find the product with price '54.23'
page.css('.product_pod').search(
lambda p: float(p.css('.price_color').re_first(r'[\d\.]+')) == 54.23
)
Doing operations on element content is the same as Scrapy:
quote.re(r'regex_pattern') # Get all strings (TextHandlers) that match the regex pattern
quote.re_first(r'regex_pattern') # Get the first string (TextHandler) only
quote.json() # If the content text is JSON-able, convert it to JSON using `orjson`, which is 10x faster than the standard json library and provides more options
except that you can do more with them:
quote.re(
    r'regex_pattern',
    replace_entities=True, # Character entity references are replaced by their corresponding character
    clean_match=True, # This will ignore all whitespaces and consecutive spaces while matching
    case_sensitive=False, # Set the regex to ignore letter case while compiling it
)
Note that all of these are methods of the TextHandler that holds the text content, so the same can be done directly on the result of the .text property or the equivalent selector function.
Doing operations on the text content itself includes:
quote.clean()
The strings returned are TextHandler objects too, so in cases where you have, for example, a JS object assigned to a JS variable inside JS code and want to extract it with regex and then convert it to a JSON object, other libraries would need more than one line of code, but here you can do it in one line:
page.xpath('//script/text()').re_first(r'var dataLayer = (.+);').json()
Sorting all characters in the string as if it were a list and returning the new string:
quote.sort(reverse=False)
To be clear, TextHandler is a subclass of Python's str, so all normal operations/methods that work with Python strings will work with it.
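A tiny illustration of that, reusing the quotes example above: ordinary string methods and Scrapling's extra helpers both work on the returned text.
first_quote = page.css_first('.quote .text::text')
print(first_quote.upper())         # plain str methods work...
print(first_quote.split(' ')[:3])  # ...because TextHandler subclasses str
print(first_quote.clean())         # while Scrapling's extra helpers are still available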
Any element's attributes are not exactly a dictionary but a read-only subclass of mapping called AttributesHandler, so it's faster, and the string values returned are actually TextHandler objects, so all the operations above can be done on them, plus standard dictionary operations that don't modify the data, and more :)
Searching attribute values by partial match:
>>> for item in element.attrib.search_values('catalogue', partial=True):
...     print(item)
{'href': 'catalogue/tipping-the-velvet_999/index.html'}
Getting the attributes as a JSON string:
>>> element.attrib.json_string
b'{"href":"catalogue/tipping-the-velvet_999/index.html","title":"Tipping the Velvet"}'
Converting to a standard dictionary:
>>> dict(element.attrib)
{'href': 'catalogue/tipping-the-velvet_999/index.html', 'title': 'Tipping the Velvet'}
Scrapling is under active development so expect many more features coming soon :)
There are a lot of deep details skipped here to keep this as short as possible, so to take a deep dive, head to the docs section. I will try to keep it as updated as possible and add complex examples. There I will explain points like how to write your own storage system, write spiders that don't depend on selectors at all, and more...
Note that implementing your storage system can be complex as there are some strict rules such as inheriting from the same abstract class, following the singleton design pattern used in other classes, and more. So make sure to read the docs first.
[!IMPORTANT] A website is needed to provide detailed library documentation.
I'm trying to rush creating the website, researching new ideas, and adding more features/tests/benchmarks but time is tight with too many spinning plates between work, personal life, and working on Scrapling. I have been working on Scrapling for months for free after all.
If you like Scrapling and want it to keep improving, then this is a friendly reminder that you can help by supporting me through the sponsor button.
This section addresses common questions about Scrapling, please read this section before opening an issue.
To get auto-matching working, you need to run your selector through css or xpath with the auto_save parameter set to True at least once before the structural changes happen, so Scrapling stores that element's unique properties. Now, because everything about the element can be changed or removed, nothing from the element itself can be used as a unique identifier in the database. To solve this issue, I made the storage system rely on two things:
1. The URL you passed while creating the Adaptor/Fetcher object.
2. The identifier parameter you passed to the method while selecting. If you didn't pass one, then the selector string itself is used as the identifier, but remember you will have to use it as the identifier value later, when the structure changes and you want to pass the new selector.
Together, both are used to retrieve the element's unique properties from the database later. When you then enable the auto_match parameter for both the Adaptor instance and the method call, the stored properties are retrieved and Scrapling loops over all elements in the page, comparing each one's unique properties to the ones already saved for this element, and a score is calculated for each one. Comparing elements is not exact but is more about finding how similar these values are, so everything is taken into consideration, even the values' order, like the order in which the element's class names were written before and the order in which they are written now. The score for each element is stored in a table, and the element(s) with the highest combined similarity scores are returned.
Not a big problem, as it depends on your usage. The word default will be used in place of the URL field while saving the element's unique properties. So this will only be an issue if you later use the same identifier for a different website that you also didn't pass a URL for while initializing it. The save process will overwrite the previous data, and auto-matching uses only the latest saved properties.
For each element, Scrapling will extract:
- The element's tag name, text, attributes (names and values), siblings (tag names only), and path (tag names only).
- The element's parent tag name, attributes (names and values), and text.
If you pass the auto_save/auto_match parameter while selecting and it gets completely ignored with a warning message, that's because passing the auto_save/auto_match argument without setting auto_match to True while initializing the Adaptor object results in the auto_save/auto_match argument value being ignored. This behavior is purely for performance reasons, so the database only gets created when you are planning to use the auto-matching features.
It could be one of these reasons:
1. No data was saved/stored for this element before.
2. The selector passed is not the one used while storing the element's data. The solution is simple (see the sketch after this list):
   - Pass the old selector again as an identifier to the method called.
   - Retrieve the element with the retrieve method using the old selector as the identifier, then save it again with the save method and the new selector as the identifier.
   - Start using the identifier argument more often if you are planning to use every new selector from now on.
3. The website had some extreme structural changes, like a completely new design. If this happens a lot with this website, the solution is to make your code as selector-free as possible using Scrapling's features.
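A hedged sketch of the retrieve-then-save migration from point 2, using the save/retrieve/relocate methods shown earlier (the selectors are placeholders):
# The element was originally saved with the old selector as its identifier
element_dict = page.retrieve('#old-selector')

# Relocate it in the current page, then save it again under the new selector
element = page.relocate(element_dict, adaptor_type=True)[0]
page.save(element, '#new-selector')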
Pretty much yeah, almost all features you get from BeautifulSoup can be found or achieved in Scrapling one way or another. In fact, if you see there's a feature in bs4 that is missing in Scrapling, please make a feature request from the issues tab to let me know.
Of course, you can find elements by text/regex, find similar elements in a more reliable way than AutoScraper, and finally save/retrieve elements manually to use later as the model feature in AutoScraper. I have pulled all top articles about AutoScraper from Google and tested Scrapling against examples in them. In all examples, Scrapling got the same results as AutoScraper in much less time.
Yes, Scrapling instances are thread-safe. Each Adaptor instance maintains its state.
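A minimal sketch of what that enables, fetching pages from worker threads with the standard library (each thread works with its own page object; the URLs are just examples):
from concurrent.futures import ThreadPoolExecutor
from scrapling.fetchers import Fetcher

def scrape(url):
    page = Fetcher().get(url)
    return url, page.status

urls = ['https://quotes.toscrape.com/', 'https://books.toscrape.com/']
with ThreadPoolExecutor(max_workers=4) as pool:
    for url, status in pool.map(scrape, urls):
        print(url, status)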
Everybody is invited and welcome to contribute to Scrapling. There is a lot to do!
Please read the contributing file before doing anything.
[!CAUTION] This library is provided for educational and research purposes only. By using this library, you agree to comply with local and international laws regarding data scraping and privacy. The authors and contributors are not responsible for any misuse of this software. This library should not be used to violate the rights of others, for unethical purposes, or to use data in an unauthorized or illegal manner. Do not use it on any website unless you have permission from the website owner or are within their allowed rules, like the robots.txt file, for example.
This work is licensed under BSD-3
This project includes code adapted from:
- Parsel (BSD License): used for the translator submodule
In September 2023, KrebsOnSecurity published findings from security researchers who concluded that a series of six-figure cyberheists across dozens of victims resulted from thieves cracking master passwords stolen from the password manager service LastPass in 2022. In a court filing this week, U.S. federal agents investigating a spectacular $150 million cryptocurrency heist said they had reached the same conclusion.
On March 6, federal prosecutors in northern California said they seized approximately $24 million worth of cryptocurrencies that were clawed back following a $150 million cyberheist on Jan. 30, 2024. The complaint refers to the person robbed only as "Victim-1," but according to blockchain security researcher ZachXBT the theft was perpetrated against Chris Larsen, the co-founder of the cryptocurrency platform Ripple. ZachXBT was the first to report on the heist.
This week's action by the government merely allows investigators to officially seize the frozen funds. But there is an important conclusion in this seizure document: It basically says the U.S. Secret Service and the FBI agree with the findings of the LastPass breach story published here in September 2023.
That piece quoted security researchers who said they were witnessing six-figure crypto heists several times each month that all appeared to be the result of crooks cracking master passwords for the password vaults stolen from LastPass in 2022.
"The Federal Bureau of Investigation has been investigating these data breaches, and law enforcement agents investigating the instant case have spoken with FBI agents about their investigation," reads the seizure complaint, which was written by a U.S. Secret Service agent. "From those conversations, law enforcement agents in this case learned that the stolen data and passwords that were stored in several victims' online password manager accounts were used to illegally, and without authorization, access the victims' electronic accounts and steal information, cryptocurrency, and other data."
The document continues:
"Based on this investigation, law enforcement had probable cause to believe the same attackers behind the above-described commercial online password manager attack used a stolen password held in Victim 1's online password manager account and, without authorization, accessed his cryptocurrency wallet/account."
Working with dozens of victims, security researchers Nick Bax and Taylor Monahan found that none of the six-figure cyberheist victims appeared to have suffered the sorts of attacks that typically preface a high-dollar crypto theft, such as the compromise of one's email and/or mobile phone accounts, or SIM-swapping attacks.
They discovered the victims all had something else in common: Each had at one point stored their cryptocurrency seed phrase - the secret code that lets anyone gain access to your cryptocurrency holdings - in the "Secure Notes" area of their LastPass account prior to the 2022 breaches at the company.
Bax and Monahan found another common theme with these robberies: They all followed a similar pattern of cashing out, rapidly moving stolen funds to a dizzying number of drop accounts scattered across various cryptocurrency exchanges.
According to the government, a similar level of complexity was present in the $150 million heist against the Ripple co-founder last year.
"The scale of a theft and rapid dissipation of funds would have required the efforts of multiple malicious actors, and was consistent with the online password manager breaches and attack on other victims whose cryptocurrency was stolen," the government wrote. "For these reasons, law enforcement agents believe the cryptocurrency stolen from Victim 1 was committed by the same attackers who conducted the attack on the online password manager, and cryptocurrency thefts from other similarly situated victims."
Reached for comment, LastPass said it has seen no definitive proof - from federal investigators or others - that the cyberheists in question were linked to the LastPass breaches.
"Since we initially disclosed this incident back in 2022, LastPass has worked in close cooperation with multiple representatives from law enforcement," LastPass said in a written statement. "To date, our law enforcement partners have not made us aware of any conclusive evidence that connects any crypto thefts to our incident. In the meantime, we have been investing heavily in enhancing our security measures and will continue to do so."
On August 25, 2022, LastPass CEO Karim Toubba told users the company had detected unusual activity in its software development environment, and that the intruders stole some source code and proprietary LastPass technical information. On Sept. 15, 2022, LastPass said an investigation into the August breach determined the attacker did not access any customer data or password vaults.
But on Nov. 30, 2022, LastPass notified customers about another, far more serious security incident that the company said leveraged data stolen in the August breach. LastPass disclosed that criminal hackers had compromised encrypted copies of some password vaults, as well as other personal information.
Experts say the breach would have given thieves "offline" access to encrypted password vaults, theoretically allowing them all the time in the world to try to crack some of the weaker master passwords using powerful systems that can attempt millions of password guesses per second.
Researchers found that many of the cyberheist victims had chosen master passwords with relatively low complexity, and were among LastPass's oldest customers. That's because legacy LastPass users were more likely to have master passwords that were protected with far fewer "iterations," which refers to the number of times your password is run through the company's encryption routines. In general, the more iterations, the longer it takes an offline attacker to crack your master password.
Over the years, LastPass forced new users to pick longer and more complex master passwords, and they increased the number of iterations on multiple occasions by several orders of magnitude. But researchers found strong indications that LastPass never succeeded in upgrading many of its older customers to the newer password requirements and protections.
Asked about LastPass's continuing denials, Bax said that after the initial warning in our 2023 story, he naively hoped people would migrate their funds to new cryptocurrency wallets.
"While some did, the continued thefts underscore how much more needs to be done," Bax told KrebsOnSecurity. "It's validating to see the Secret Service and FBI corroborate our findings, but I'd much rather see fewer of these hacks in the first place. ZachXBT and SEAL 911 reported yet another wave of thefts as recently as December, showing the threat is still very real."
Monahan said LastPass still hasn't alerted their customers that their secrets - especially those stored in "Secure Notes" - may be at risk.
"It's been two and a half years since LastPass was first breached [and] hundreds of millions of dollars has been stolen from individuals and companies around the globe," Monahan said. "They could have encouraged users to rotate their credentials. They could've prevented millions and millions of dollars from being stolen by these threat actors. But instead they chose to deny that their customers were at risk and blame the victims instead."
The FBI joined authorities across Europe last week in seizing domain names for Cracked and Nulled, English-language cybercrime forums with millions of users that trafficked in stolen data, hacking tools and malware. An investigation into the history of these communities shows their apparent co-founders quite openly operate an Internet service provider and a pair of e-commerce platforms catering to buyers and sellers on both forums.
In this 2019 post from Cracked, a forum moderator told the author of the post (Buddie) that the owner of the RDP service was the founder of Nulled, a.k.a. "Finndev." Image: Ke-la.com.
On Jan. 30, the U.S. Department of Justice said it seized eight domain names that were used to operate Cracked, a cybercrime forum that sprang up in 2018 and attracted more than four million users. The DOJ said the law enforcement action, dubbed Operation Talent, also seized domains tied to Sellix, Cracked's payment processor.
In addition, the government seized the domain names for two popular anonymity services that were heavily advertised on Cracked and Nulled and allowed customers to rent virtual servers: StarkRDP[.]io, and rdp[.]sh.
Those archived webpages show both RDP services were owned by an entity called 1337 Services GmbH. According to corporate records compiled by Northdata.com, 1337 Services GmbH is also known as AS210558 and is incorporated in Hamburg, Germany.
The Cracked forum administrator went by the nicknames "FlorainN" and "StarkRDP" on multiple cybercrime forums. Meanwhile, a LinkedIn profile for a Florian M. from Germany refers to this person as the co-founder of Sellix and founder of 1337 Services GmbH.
Northdata's business profile for 1337 Services GmbH shows the company is controlled by two individuals: 32-year-old Florian Marzahl and Finn Alexander Grimpe, 28.
An organization chart showing the owners of 1337 Services GmbH as Florian Marzahl and Finn Grimpe. Image: Northdata.com.
Neither Marzahl nor Grimpe responded to requests for comment. But Grimpe's first name is interesting because it corresponds to the nickname chosen by the founder of Nulled, who goes by the monikers "Finn" and "Finndev." NorthData reveals that Grimpe was the founder of a German entity called DreamDrive GmbH, which rented out high-end sports cars and motorcycles.
According to the cyber intelligence firm Intel 471, a user named Finndev registered on multiple cybercrime forums, including Raidforums [seized by the FBI in 2022], Void[.]to, and vDOS, a DDoS-for-hire service that was shut down in 2016 after its founders were arrested.
The email address used for those accounts was f.grimpe@gmail.com. DomainTools.com reports f.grimpe@gmail.com was used to register at least nine domain names, including nulled[.]lol and nulled[.]it. Neither of these domains were among those seized in Operation Talent.
Intel471 finds the user FlorainN registered across multiple cybercrime forums using the email address olivia.messla@outlook.de. The breach tracking service Constella Intelligence says this email address used the same password (and slight variations of it) across many accounts online - including at hacker forums - and that the same password was used in connection with dozens of other email addresses, such as florianmarzahl@hotmail.de, and fmarzahl137@gmail.com.
The Justice Department said the Nulled marketplace had more than five million members, and has been selling stolen login credentials, stolen identification documents and hacking services, as well as tools for carrying out cybercrime and fraud, since 2016.
Perhaps fittingly, both Cracked and Nulled have been hacked over the years, exposing countless private messages between forum users. A review of those messages archived by Intel 471 showed that dozens of early forum members referred privately to Finndev as the owner of shoppy[.]gg, an e-commerce platform that caters to the same clientele as Sellix.
Shoppy was not targeted as part of Operation Talent, and its website remains online. Northdata reports that Shoppy's business name - Shoppy Ecommerce Ltd. - is registered at an address in Gan-Ner, Israel, but there is no ownership information about this entity. Shoppy did not respond to requests for comment.
Constella found that a user named Shoppy registered on Cracked in 2019 using the email address finn@shoppy[.]gg. Constella says that email address is tied to a Twitter/X account for Shoppy Ecommerce in Israel.
The DOJ said one of the alleged administrators of Nulled, a 29-year-old Argentinian national named Lucas Sohn, was arrested in Spain. The government has not announced any other arrests or charges associated with Operation Talent.
Indeed, both StarkRDP and FloraiN have posted to their accounts on Telegram that there were no charges levied against the proprietors of 1337 Services GmbH. FlorainN told former customers they were in the process of moving to a new name and domain for StarkRDP, where existing accounts and balances would be transferred.
"StarkRDP has always been operating by the law and is not involved in any of these alleged crimes and the legal process will confirm this," the StarkRDP Telegram account wrote on January 30. "All of your servers are safe and they have not been collected in this operation. The only things that were seized is the website server and our domain. Unfortunately, no one can tell who took it and with whom we can talk about it. Therefore, we will restart operation soon, under a different name, to close the chapter [of] 'StarkRDP.'"
We're excited to announce the release of McAfee's Personal Data Cleanup, a new feature that finds and removes your personal info from data brokers and people search sites. Now, you can feel more confident by removing personal info from data broker sites and keeping it from being collected, sold, and used to: advertise products to you, fill your email box with spam, and even give criminals the info they need to steal your identity. Let's look at why we're offering McAfee Personal Data Cleanup, how it protects your privacy, and why it's a great addition to the online protection we already offer.
There's so much to enjoy when you live a connected life - free email, online stores that remember what you like, social media that connects you to friends and influencers. It's a world of convenience, opportunity, and incredible content. It's also a world where your data is constantly collected.
That's right, companies are collecting your personal data. They're called data brokers and they make money by selling information that specifically identifies you, like an email address. They sell this information to marketers looking to target you with ads. Criminals can also use it to build profiles in service of stealing your identity and accessing your accounts. This activity takes place behind the scenes and often without consumers' knowledge. There are also data brokers known as people search sites that compile and sell info like home addresses, emails, phones, court records, employment info, and more. These websites give identity thieves, hackers, stalkers, and other malicious actors easy access to your info. Regardless of how your data is being used, it's clear that these days a more connected life often comes at the cost of your privacy.
In a recent survey of McAfee customers, we found that 59% have become more protective of their personal data over the past six months. And it's no wonder. Over the past two years, trends like telehealth, remote working, and increased usage of online shopping and financial services have meant that more of your time is being spent online. Unsurprisingly, more personal data is being made available in the process. This leads us to the most alarming finding of our survey - 95% of consumers whose personal information ends up on data broker sites had it collected without their consent.
We created Personal Data Cleanup to make it easy for you to take back your privacy online. McAfee's Personal Data Cleanup regularly scans the riskiest data broker sites for info like your home address, date of birth, and names of relatives. After showing where we found your data, you can either remove it yourself or we will work on your behalf to remove it. Here's how it works:
Ready to take back your personal info online? Personal Data Cleanup is available immediately with most of our online protection plans. If you have an eligible subscription, you can start using this new feature through McAfee Protection Center, or you can get McAfee online protection here.
The post Introducing Personal Data Cleanup appeared first on McAfee Blog.
Data Privacy Week is here, and there's no better time to shine a spotlight on one of the biggest players in the personal information economy: data brokers. These entities collect, buy, and sell hundreds - sometimes thousands - of data points on individuals like you. But how do they manage to gather so much information, and for what purpose? From your browsing habits and purchase history to your location data and even more intimate details, these digital middlemen piece together surprisingly comprehensive profiles. The real question is: where are they getting it all, and why is your personal data so valuable to them? Let's unravel the mystery behind the data broker industry.
Data brokers aggregate user info from various sources on the internet. They collect, collate, package, and sometimes even analyze this data to create a holistic and coherent version of you online. This data then gets put up for sale to nearly anyone who'll buy it. That can include marketers, private investigators, tech companies, and sometimes law enforcement as well. They'll also sell to spammers and scammers. (Those bad actors need to get your contact info from somewhere - data brokers are one way to get that and more.)
And that list of potential buyers goes on, which includes but isn't limited to:
These companies and social media platforms use your data to better understand target demographics and the content with which they interact. While the practice isn't unethical in and of itself (personalizing user experiences and creating more convenient UIs are usually cited as the primary reasons for it), it does make your data vulnerable to malicious attacks targeted toward big-tech servers.
Most of your online activities are related. Devices like your phone, laptop, tablets, and even fitness watches are linked to each other. Moreover, you might use one email ID for various accounts and subscriptions. This online interconnectedness makes it easier for data brokers to create a cohesive user profile.
Mobile phone apps are the most common way for data brokerage firms to collect your data. You might have countless apps for various purposes, such as financial transactions, health and fitness, or social media.
A number of these apps usually fall under the umbrella of the same or subsidiary family of apps, all of which work toward collecting and supplying data to big tech platforms. Programs like Google's AdSense make it easier for developers to monetize their apps in exchange for the user information they collect.
Data brokers also collect data points like your home address, full name, phone number, and date of birth. They have automated scraping tools to quickly collect relevant information from public records (think sales of real estate, marriages, divorces, voter registration, and so on).
Lastly, data brokers can gather data from other third parties that track your cookies or even place trackers or cookies on your browsers. Cookies are small data files that track your online activities when visiting different websites. They track your IP address and browsing history, which third parties can exploit. Cookies are also the reason you see personalized ads and products.
Data brokers collate your private information into one package and sell it to "people search" websites. As mentioned above, practically anyone can access these websites and purchase extensive consumer data, for groups of people and individuals alike.
Next, marketing and sales firms are some of data brokers' biggest clients. These companies purchase massive data sets from data brokers to research your data profile. They have advanced algorithms to segregate users into various consumer groups and target you specifically. Their predictive algorithms can suggest personalized ads and products to generate higher lead generation and conversion percentages for their clients.
We tend to accept the terms and conditions that various apps ask us to accept without thinking twice or reading the fine print. You probably cannot proceed without letting the app track certain data or giving your personal information. To a certain extent, we trade some of our privacy for convenience. This becomes public information, and apps and data brokers collect, track, and use our data however they please while still complying with the law.
There is no comprehensive privacy law in the U.S. on a federal level. This allows data brokers to collect personal information and condense it into marketing insights. While not all methods of gathering private data are legal, it is difficult to track the activities of data brokers online (especially on the dark web). As technology advances, there are also easier ways to harvest and exploit data.
As of March 2024, 15 states in the U.S. have data privacy laws in place. That includes California, Virginia, Connecticut, Colorado, Utah, Iowa, Indiana, Tennessee, Oregon, Montana, Texas, Delaware, Florida, New Jersey, and New Hampshire.[i] The laws vary by state, yet generally, they grant rights to individuals around the collection, use, and disclosure of their personal data by businesses.
However, these laws make exceptions for certain types of data and certain types of collectors. In short, these laws aren't absolute.
Some data brokers let you remove your information from their websites. There are also extensive guides available online that list the method by which you can opt out of some of the biggest data brokering firms. For example, a guide by Griffin Boyce, the systems administrator at Harvard University's Berkman Klein Center for Internet and Society, provides detailed information on how to opt out of a long list of data broker companies.
Yet the list of data brokers is long. Cleaning up your personal data online can quickly eat up your time, as it requires you to reach out to multiple data brokers and opt-out.
Rather than removing yourself one by one from the host of data broker sites out there, you have a solid option: our Personal Data Cleanup.
Personal Data Cleanup scans data broker sites and shows you which ones are selling your personal info. It also provides guidance on how you can remove your data from those sites. And if you want to save time on manually removing that info, you have options. Our McAfee+ Advanced and Ultimate plans come with full-service Personal Data Cleanup, which sends requests to remove your data automatically.
If the thought of your personal info getting bought and sold in such a public way bothers you, our Personal Data Cleanup can put you back in charge of it.
[i] https://pro.bloomberglaw.com/insights/privacy/state-privacy-legislation-tracker/
The post How Data Brokers Sell Your Identity appeared first on McAfee Blog.
Private tech companies gather tremendous amounts of user data. These companies can afford to let you use social media platforms free of charge because it's paid for by your data, attention, and time.
Big tech derives most of its profits by selling your attention to advertisers - a well-known business model. Various documentaries (like Netflix's "The Social Dilemma") have tried to get to the bottom of the complex algorithms that big tech companies employ to mine and analyze user data for the benefit of third-party advertisers.
Tech companies benefit from personal info by being able to provide personalized ads. When you click "yes" at the end of a terms and conditions agreement found on some web pages, you might be allowing the companies to collect the following data:
For someone unfamiliar with privacy issues, it is important to understand the extent of big tech's tracking and data collection. After these companies collect data, all this info can be supplied to third-party businesses or used to improve user experience.
The problem with this is that big tech has blurred the line between collecting customer data and violating user privacy in some cases. While tracking what content you interact with can be justified under the garb of personalizing the content you see, big tech platforms have been known to go too far. Prominent social networks like Facebook and LinkedIn have faced legal trouble for accessing personal user data like private messages and saved photos.
The info you provide helps build an accurate character profile and turns it into knowledge that gives actionable insights to businesses. Private data usage can be classified into three cases: selling it toΒ data brokers, using it to improve marketing, or enhancing customer experience.
To sell your info to data brokers
Along with big data, another industry has seen rapid growth: data brokers. Data brokers buy, analyze, and package your data. Companies that collect large amounts of data on their users stand to profit from this service. Selling data to brokers is an important revenue stream for big tech companies.
Advertisers and businesses benefit from increased info on their consumers, creating a high demand for your info. The problem here is that companies like Facebook and Alphabet (Google's parent company) have been known to mine massive amounts of user data for the sake of their advertisers.
To personalize marketing efforts
Marketing can be highly personalized thanks to the availability of large amounts of consumer data. Tracking your response to marketing campaigns can help businesses alter or improve certain aspects of their campaign to drive better results.
The problem is that most AI-based algorithms are incapable of assessing when they should stop collecting or using your info. After a point, users run the risk of being constantly subjected to intrusive ads and other marketing campaigns they never consented to, which pop up frequently.
To cater to the customer experience
Analyzing consumer behavior through reviews, feedback, and recommendations can help improve customer experience. Businesses have access to various facets of data that can be analyzed to show them how to meet consumer demands. This might help improve any part of a consumerβs interaction with the company, from designing special offers and discounts to improving customer relationships.
For most social media platforms, the goal is to curate a personalized feed that appeals to users and allows them to spend more time on the app. When left unmonitored, the powerful algorithms behind these social media platforms can repeatedly subject you to the same kind of content from different creators.
Here are the big tech companies that collect and mine the most user data.
Users need a comprehensive data privacy solution to tackle the rampant, large-scale data mining carried out by big tech platforms. While targeted advertisements and easier product discovery can be convenient, many of these companies collect and mine user data through several channels simultaneously and exploit it in several ways.
It's important to ensure your personal info is protected. Protection solutions like McAfee's Personal Data Cleanup feature can help. It scours the web for traces of your personal info and helps remove it to protect your online privacy.
McAfee+ provides antivirus software for all your digital devices and a secure VPN connection to avoid exposure to malicious third parties while browsing the internet. Our Identity Monitoring and personal data removal solutions further remove gaps in your devices' security systems.
With our data protection and custom guidance (complete with a protection score for each platform and tips to keep you safer), you can be sure that your internet identity is protected.
The post What Personal Data Do Companies Track? appeared first on McAfee Blog.
McAfee threat researchers have identified several consumer brands and product categories most frequently used by cybercriminals to trick consumers into clicking on malicious links in the first weeks of this holiday shopping season. As holiday excitement peaks and shoppers hunt for the perfect gifts and amazing deals, scammers are taking advantage of the buzz. The National Retail Federation projects holiday spending will reach between $979.5 billion and $989 billion this year, and cybercriminals are capitalizing by creating scams that mimic the brands and categories consumers trust. From October 1 to November 12, 2024, McAfee safeguarded its customers from 624,346 malicious or suspicious URLs tied to popular consumer brand names, a clear indication that bad actors are exploiting trusted brand names to deceive holiday shoppers.
McAfee's threat research also reveals a 33.82% spike in malicious URLs targeting consumers with these brands' names in the run-up to Black Friday and Cyber Monday. This rise in fraudulent activity aligns with holiday shopping patterns, a time when consumers may be more susceptible to clicking on offers from well-known brands like Apple, Yeezy, and Louis Vuitton, especially when deals seem too good to be true. It also points to the need for consumers to stay vigilant, particularly with offers that seem unusually generous or come from unverified sources.
McAfee threat researchers have identified a surge in counterfeit sites and phishing scams that use popular luxury brands and tech products to lure consumers into "deals" on fake e-commerce sites designed to appear as official brand pages. While footwear and handbags were identified as the top two product categories exploited by cybercrooks during this festive time, the list of most exploited brands extends beyond those categories:
By mimicking trusted brands like these, offering unbelievable deals, or posing as legitimate customer service channels, cybercrooks create convincing traps designed to steal personal information or money. Here are some of the most common tactics scammers are using this holiday season:
With holiday shopping in full swing, it's essential for consumers to stay one step ahead of scammers. By understanding the tactics cybercriminals use and taking a few precautionary measures, shoppers can protect themselves from falling victim to fraud. Here are some practical tips for safe shopping this season:
McAfee's threat research team analyzed malicious or suspicious URLs that McAfee's web reputation technology identified as targeting customers, using a list of key company and product brand names (based on insights from a Potter Clarkson report on frequently faked brands) to query the URLs. This methodology captures instances where users either clicked on or were directed to dangerous sites mimicking trusted brands. Additionally, the team queried anonymized user activity from October 1st through November 12th.
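As a rough, hypothetical illustration of that kind of brand-name query (the brand list, URLs, and function names below are stand-ins, not McAfee's actual data or tooling), matching flagged URLs against a watchlist might look like this in Python:
# Hypothetical sketch: group suspicious URLs by the brand name they appear to impersonate.
from urllib.parse import urlparse
WATCHED_BRANDS = ["yeezy", "adidas", "apple", "nike", "louis vuitton", "rolex"]
def flag_brand_abuse(urls):
    hits = {}
    for url in urls:
        parsed = urlparse(url.lower())
        # Compare against hostname + path, ignoring hyphens and spaces
        haystack = (parsed.netloc + parsed.path).replace("-", "")
        for brand in WATCHED_BRANDS:
            if brand.replace(" ", "") in haystack:
                hits.setdefault(brand, []).append(url)
    return hits
# Example run with made-up URLs:
print(flag_brand_abuse(["http://apple-support-help.example/verify", "http://cheap-yeezy-deals.example/"]))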
The image below is a screenshot of a fake / malicious / scam site: Yeezy, a popular product brand formerly sold by Adidas, appeared in multiple malicious/suspicious URLs. These sites often present themselves as official Yeezy and/or Adidas shopping sites.
The image below is a screenshot of a fake / malicious / scam site: The Apple brand was a popular target for scammers. Many sites were either knockoffs, scams, or, in this case, a fake customer service page designed to lure users into a scam.
The image below is a screenshot of a fake / malicious / scam site: This particular (fake) Apple sales site used Apple within its URL and name to appear more official. Oddly, this site also sells Samsung Android phones.
The image below is a screenshot of a fake / malicious / scam site: This site, now taken down, is a scam site purporting to sell Nike shoes.
The image below is a screenshot of a fake / malicious / scam site: Louis Vuitton is a popular brand for counterfeits and scams, particularly its handbags. Here is one site that focused entirely on Louis Vuitton handbags.
The image below is a screenshot of a fake / malicious / scam site: This site presents itself as the official Louis Vuitton site selling handbags and clothes.
The image below is a screenshot of a fake / malicious / scam site: This site uses too-good-to-be-true deals on branded items, including this Louis Vuitton bomber jacket.
The image below is a screenshot of a fake / malicious / scam site: Rolex is a popular watch brand for counterfeits and scams. This site acknowledges that it sells counterfeits and makes no effort to indicate this on the product.
The post This Holiday Season, Watch Out for These Cyber-Grinch Tricks Used to Scam Holiday Shoppers appeared first on McAfee Blog.
Two-step verification, two-factor authentication, multi-factor authentication… whatever your social media platform calls it, it's an excellent way to protect your accounts.
There's a good chance you're already using multi-factor verification with your other accounts, for your bank, your finances, your credit card, and any number of other services. The way it requires an extra one-time code in addition to your login and password makes life far tougher for hackers.
It's increasingly common nowadays for all manner of online services to allow access to your accounts only after you've provided a one-time passcode sent to your email or smartphone. That's where two-step verification comes in: you get sent a code as part of your usual login process (usually a six-digit number), and then you enter it along with your username and password.
Some online services also offer the option to use an authenticator app, which provides the code in a dedicated secure app rather than via email or text message. Authenticator apps work in much the same way, yet they offer three unique features:
Google, Microsoft, and others offer authenticator apps if you want to go that route. You can get a good list of options by checking out the "editor's picks" at your app store or in trusted tech publications.
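To make the mechanics concrete, here is a minimal sketch of how many authenticator apps derive their six-digit codes (time-based one-time passwords, RFC 6238), using only Python's standard library. The base32 secret shown is a placeholder; a real app receives its secret from the service (usually via a QR code) when you enroll.
# Minimal TOTP sketch (RFC 6238 / RFC 4226) using only the standard library.
import base64, hashlib, hmac, struct, time
def totp(secret_b32, interval=30, digits=6):
    key = base64.b32decode(secret_b32.upper())    # shared secret from enrollment
    counter = int(time.time()) // interval        # number of 30-second steps since the epoch
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                    # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)
print(totp("JBSWY3DPEHPK3PXP"))  # placeholder secret; prints a code that changes every 30 seconds
Because the service holds the same secret and runs the same calculation, it can verify your code without anything being sent over email or text.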
Whichever form of authentication you use, always keep that secure code to yourself. It's yours and yours alone. Anyone who asks for that code, say someone masquerading as a customer service rep, is trying to scam you. With that code and your username/password combo, they can get into your account.
Passwords and two-step verification work hand in hand to keep you safer. Yet not any old password will do. You'll want a strong, unique password for each account, and a quick way to generate one is sketched below:
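As a minimal sketch (the length and character pool here are example choices, not an official recommendation), Python's standard secrets module can generate a strong random password:
# Sketch: generate a long, random, unique password with the standard library.
import secrets, string
def make_password(length=16):
    # Pool of letters, digits, and a few symbols; adjust to a site's specific rules as needed.
    alphabet = string.ascii_letters + string.digits + "!@#$%^&*-_"
    return "".join(secrets.choice(alphabet) for _ in range(length))
print(make_password())  # a different random password every run
A password manager does the same job and remembers the result for you, which is the more practical option for most people.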
Now, with strong passwords in place, you can get to setting up multi-factor verification on your social media accounts.
When you set up two-factor authentication on Facebook, you'll be asked to choose one of three security methods:
And here's a link to the company's full walkthrough: https://www.facebook.com/help/148233965247823
When you set up two-factor authentication on Instagram, you'll be asked to choose one of three security methods: an authentication app, text message, or WhatsApp.
And here's a link to the company's full walkthrough: https://help.instagram.com/566810106808145
And here's a link to the company's full walkthrough: https://faq.whatsapp.com/1920866721452534
And here's a link to the company's full walkthrough: https://support.google.com/accounts/answer/185839?hl=en&co=GENIE.Platform%3DDesktop
1. Tap Profile at the bottom of the screen.
2. Tap the Menu button at the top.
3. Tap Settings and Privacy, then Security.
4. Tap 2-step verification and choose at least two verification methods: SMS (text), email, and authenticator app.
5. Tap Turn on to confirm.
And here's a link to the company's full walkthrough: https://support.tiktok.com/en/account-and-privacy/personalized-ads-and-data/how-your-phone-number-is-used-on-tiktok
The post How to Protect Your Social Media Passwords with Multi-factor Verification appeared first on McAfee Blog.
The financial technology firm Finastra is investigating the alleged large-scale theft of information from its internal file transfer platform, KrebsOnSecurity has learned. Finastra, which provides software and services to 45 of the world's top 50 banks, notified customers of the security incident after a cybercriminal began selling more than 400 gigabytes of data purportedly stolen from the company.
London-based Finastra has offices in 42 countries and reported $1.9 billion in revenues last year. The company employs more than 7,000 people and serves approximately 8,100 financial institutions around the world. A major part of Finastra's day-to-day business involves processing huge volumes of digital files containing instructions for wire and bank transfers on behalf of its clients.
On November 8, 2024, Finastra notified financial institution customers that on Nov. 7 its security team detected suspicious activity on Finastra's internally hosted file transfer platform. Finastra also told customers that someone had begun selling large volumes of files allegedly stolen from its systems.
"On November 8, a threat actor communicated on the dark web claiming to have data exfiltrated from this platform," reads Finastra's disclosure, a copy of which was shared by a source at one of the customer firms.
"There is no direct impact on customer operations, our customers' systems, or Finastra's ability to serve our customers currently," the notice continued. "We have implemented an alternative secure file sharing platform to ensure continuity, and investigations are ongoing."
But its notice to customers does indicate the intruder managed to extract or "exfiltrate" an unspecified volume of customer data.
"The threat actor did not deploy malware or tamper with any customer files within the environment," the notice reads. "Furthermore, no files other than the exfiltrated files were viewed or accessed. We remain focused on determining the scope and nature of the data contained within the exfiltrated files."
In a written statement in response to questions about the incident, Finastra said it has been "actively and transparently responding to our customers' questions and keeping them informed about what we do and do not yet know about the data that was posted." The company also shared an updated communication to its clients, which said while it was still investigating the root cause, "initial evidence points to credentials that were compromised."
"Additionally, we have been sharing Indicators of Compromise (IOCs) and our CISO has been speaking directly with our customers' security teams to provide updates on the investigation and our eDiscovery process," the statement continues. Here is the rest of what they shared:
"In terms of eDiscovery, we are analyzing the data to determine what specific customers were affected, while simultaneously assessing and communicating which of our products are not dependent on the specific version of the SFTP platform that was compromised. The impacted SFTP platform is not used by all customers and is not the default platform used by Finastra or its customers to exchange data files associated with a broad suite of our products, so we are working as quickly as possible to rule out affected customers. However, as you can imagine, this is a time-intensive process because we have many large customers that leverage different Finastra products in different parts of their business. We are prioritizing accuracy and transparency in our communications.
Importantly, for any customers who are deemed to be affected, we will be reaching out and working with them directly."
On Nov. 8, a cybercriminal using the nickname "abyss0" posted on the English-language cybercrime community BreachForums that they'd stolen files belonging to some of Finastra's largest banking clients. The data auction did not specify a starting or "buy it now" price, but said interested buyers should reach out to them on Telegram.
abyss0's Nov. 7 sales thread on BreachForums included many screenshots showing the file directory listings for various Finastra customers. Image: Ke-la.com.
According to screenshots collected by the cyber intelligence platform Ke-la.com, abyss0 first attempted to sell the data allegedly stolen from Finastra on October 31, but that earlier sales thread did not name the victim company. However, it did reference many of the same banks called out as Finastra customers in the Nov. 8 post on BreachForums.
The original October 31 post from abyss0, where they advertise the sale of data from several large banks that are customers of a large financial software company. Image: Ke-la.com.
The October sales thread also included a starting price: $20,000. By Nov. 3, that price had been reduced to $10,000. A review of abyss0's posts to BreachForums reveals this user has offered to sell databases stolen in several dozen other breaches advertised over the past six months.
The apparent timeline of this breach suggests abyss0 gained access to Finastra's file sharing system at least a week before the company says it first detected suspicious activity, and that the Nov. 7 activity cited by Finastra may have been the intruder returning to exfiltrate more data.
Maybe abyss0 found a buyer who paid for their early retirement. We may never know, because this person has effectively vanished. The Telegram account that abyss0 listed in their sales thread appears to have been suspended or deleted. Likewise, abyss0's account on BreachForums no longer exists, and all of their sales threads have since disappeared.
It seems improbable that both Telegram and BreachForums would have given this user the boot at the same time. The simplest explanation is that something spooked abyss0 enough for them to abandon a number of pending sales opportunities, in addition to a well-manicured cybercrime persona.
In March 2020, Finastra suffered a ransomware attack that sidelined a number of the company's core businesses for days. According to reporting from Bloomberg, Finastra was able to recover from that incident without paying a ransom.
This is a developing story. Updates will be noted with timestamps. If you have any additional information about this incident, please reach out to krebsonsecurity @ gmail.com or at protonmail.com.
In December 2023, KrebsOnSecurity revealed the real-life identity of Rescator, the nickname used by a Russian cybercriminal who sold more than 100 million payment cards stolen from Target and Home Depot between 2013 and 2014. Moscow resident Mikhail Shefel, who confirmed using the Rescator identity in a recent interview, also admitted reaching out because he is broke and seeking publicity for several new money-making schemes.
Mikhail "Mike" Shefel's former Facebook profile. Shefel has since legally changed his last name to Lenin.
Mr. Shefel, who recently changed his legal surname to Lenin, was the star of last year's story, Ten Years Later, New Clues in the Target Breach. That investigation detailed how the 38-year-old Shefel adopted the nickname Rescator while working as vice president of payments at ChronoPay, a Russian financial company that paid spammers to advertise fake antivirus scams, male enhancement drugs and knockoff pharmaceuticals.
Mr. Shefel did not respond to requests for comment in advance of that December 2023 profile. Nor did he respond to reporting here in January 2024 that he ran an IT company with a 34-year-old Russian man named Aleksandr Ermakov, who was sanctioned by authorities in Australia, the U.K. and U.S. for stealing data on nearly 10 million customers of the Australian health insurance giant Medibank.
But not long after KrebsOnSecurity reported in April that Shefel/Rescator also was behind the theft of Social Security and tax information from a majority of South Carolina residents in 2012, Mr. Shefel began contacting this author with the pretense of setting the record straight on his alleged criminal hacking activities.
In a series of live video chats and text messages, Mr. Shefel confirmed he indeed went by the Rescator identity for several years, and that he did operate a slew of websites between 2013 and 2015 that sold payment card data stolen from Target, Home Depot and a number of other nationwide retail chains.
Shefel claims the true mastermind behind the Target and other retail breaches was Dmitri Golubov, an infamous Ukrainian hacker known as the co-founder of Carderplanet, among the earliest Russian-language cybercrime forums focused on payment card fraud. Mr. Golubov could not be reached for comment, and Shefel says he no longer has the laptop containing evidence to support that claim.
Shefel asserts he and his team were responsible for developing the card-stealing malware that Golubov's hackers installed on Target and Home Depot payment terminals, and that at the time he was technical director of a long-running Russian cybercrime community called Lampeduza.
"My nickname was MikeMike, and I worked with Dmitri Golubov and made technologies for him," Shefel said. "I'm also godfather of his second son."
Dmitri Golubov, circa 2005. Image: U.S. Postal Investigative Service.
A week after breaking the story about the 2013 data breach at Target, KrebsOnSecurity published Who's Selling Cards from Target?, which identified a Ukrainian man who went by the nickname Helkern as Rescator's original identity. But Shefel claims Helkern was subordinate to Golubov, and that he was responsible for introducing the two men more than a decade ago.
"Helkern was my friend, I [set up a] meeting with Golubov and him in 2013," Shefel said. "That was in Odessa, Ukraine. I was often in that city, and [it's where] I met my second wife."
Shefel claims he made several hundred thousand dollars selling cards stolen by Golubov's Ukraine-based hacking crew, but that not long after Russia annexed Crimea in 2014 Golubov cut him out of the business and replaced Shefel's malware coding team with programmers in Ukraine.
Golubov was arrested in Ukraine in 2005 as part of a joint investigation with multiple U.S. federal law enforcement agencies, but his political connections in the country ensured his case went nowhere. Golubov later earned immunity from prosecution by becoming an elected politician and founding the Internet Party of Ukraine, which called for free internet for all, the creation of country-wide "hacker schools" and the "computerization of the entire economy."
Mr. Shefel says he stopped selling stolen payment cards after being pushed out of the business, and invested his earnings in a now-defunct Russian search engine called tf[.]org. He also apparently ran a business called click2dad[.]net that paid people to click on ads for Russian government employment opportunities.
When those enterprises fizzled out, Shefel reverted to selling malware coding services for hire under the nickname "Getsend"; this claim checks out, as Getsend for many years advertised the same Telegram handle that Shefel used in our recent chats and video calls.
Shefel acknowledged that his outreach was motivated by a desire to publicize several new business ventures. None of those will be mentioned here because Shefel is already using my December 2023 profile of him to advertise what appears to be a pyramid scheme, and to remind others within the Russian hacker community of his skills and accomplishments.
Shefel says he is now flat broke, and that he currently has little to show for a storied hacking career. The Moscow native said he recently heard from his ex-wife, who had read last year's story about him and was suddenly wondering where he'd hidden all of his earnings.
More urgently, Shefel needs money to stay out of prison. In February, he and Ermakov were arrested on charges of operating a short-lived ransomware affiliate program in 2021 called Sugar (a.k.a. Sugar Locker), which targeted single computers and end-users instead of corporations. Shefel is due to face those charges in a Moscow court on Friday, Nov. 15, 2024. Ermakov was recently found guilty and given two years probation.
Shefel claims his Sugar ransomware affiliate program was a bust, and never generated any profits. Russia is known for not prosecuting criminal hackers within its borders who scrupulously avoid attacking Russian businesses and consumers. When asked why he now faces prosecution over Sugar, Shefel said he's certain the investigation was instigated by Pyotr "Peter" Vrublevsky, the son of his former boss at ChronoPay.
ChronoPay founder and CEO Pavel Vrublevsky was the key subject of my 2014 book Spam Nation, which described his role as head of one of Russia's most notorious criminal spam operations.
Vrublevsky Sr. recently declared bankruptcy, and is currently in prison on fraud charges. Russian authorities allege Vrublevsky operated several fraudulent SMS-based payment schemes. They also accused Vrublevsky of facilitating money laundering for Hydra, the largest Russian darknet market at the time. Hydra trafficked in illegal drugs and financial services, including cryptocurrency tumbling for money laundering, exchange services between cryptocurrency and Russian rubles, and the sale of falsified documents and hacking services.
However, in 2022 KrebsOnSecurity reported on a more likely reason for Vrublevsky's latest criminal charges: He'd been extensively documenting the nicknames, real names and criminal exploits of Russian hackers who worked with the protection of corrupt officials in the Russian Federal Security Service (FSB), and operating a Telegram channel that threatened to expose alleged nefarious dealings by Russian financial executives.
Shefel believes Vrublevsky's son Peter paid corrupt cops to levy criminal charges against him after Shefel reported the youth to Moscow police, allegedly for walking around in public with a loaded firearm. Shefel says the Russian authorities told the younger Vrublevsky that he had lodged the firearms complaint.
In July 2024, the Russian news outlet Izvestia published a lengthy investigation into Peter Vrublevsky, alleging that the younger son took up his father's mantle and was responsible for advertising Sprut, a Russian-language narcotics bazaar that sprang to life after the Hydra darknet market was shut down by international law enforcement agencies in 2022.
Izvestia reports that Peter Vrublevsky was the advertising mastermind behind this 3D ad campaign and others promoting the Russian online narcotics bazaar Sprut.
Izvestia reports that Peter Vrublevsky is currently living in Switzerland, where he reportedly fled in 2022 after being "arrested in absentia" in Russia on charges of running a violent group that could be hired via Telegram to conduct a range of physical attacks in real life, including firebombings and muggings.
Shefel claims his former partner Golubov was involved in the development and dissemination of early ransomware strains, including Cryptolocker, and that Golubov remains active in the cybercrime community.
Meanwhile, Mr. Shefel portrays himself as someone who is barely scraping by with the few odd coding jobs that come his way each month. Incredibly, the day after our initial interview via Telegram, Shefel proposed going into business together.
By way of example, he suggested maybe a company centered around recovering lost passwords for cryptocurrency accounts, or perhaps a series of online retail stores that sold cheap Chinese goods at a steep markup in the United States.
"Hi, how are you?" he inquired. "Maybe we can open business?"
The Federal Bureau of Investigation (FBI) is urging police departments and governments worldwide to beef up security around their email systems, citing a recent increase in cybercriminal services that use hacked police email accounts to send unauthorized subpoenas and customer data requests to U.S.-based technology companies.
In an alert (PDF) published this week, the FBI said it has seen an uptick in postings on criminal forums regarding the process of emergency data requests (EDRs) and the sale of email credentials stolen from police departments and government agencies.
"Cybercriminals are likely gaining access to compromised US and foreign government email addresses and using them to conduct fraudulent emergency data requests to US based companies, exposing the personal information of customers to further use for criminal purposes," the FBI warned.
In the United States, when federal, state or local law enforcement agencies wish to obtain information about an account at a technology provider, such as the account's email address or what Internet addresses a specific cell phone account has used in the past, they must submit an official court-ordered warrant or subpoena.
Virtually all major technology companies serving large numbers of users online have departments that routinely review and process such requests, which are typically granted (eventually, and at least in part) as long as the proper documents are provided and the request appears to come from an email address connected to an actual police department domain name.
In some cases, a cybercriminal will offer to forge a court-approved subpoena and send that through a hacked police or government email account. But increasingly, thieves are relying on fake EDRs, which allow investigators to attest that people will be bodily harmed or killed unless a request for account data is granted expeditiously.
The trouble is, these EDRs largely bypass any official review and do not require the requester to supply any court-approved documents. Also, it is difficult for a company that receives one of these EDRs to immediately determine whether it is legitimate.
In this scenario, the receiving company finds itself caught between two unsavory outcomes: failing to immediately comply with an EDR (and potentially having someone's blood on their hands) or possibly leaking a customer record to the wrong person.
Perhaps unsurprisingly, compliance with such requests tends to be extremely high. For example, in its most recent transparency report (PDF) Verizon said it received more than 127,000 law enforcement demands for customer data in the second half of 2023, including more than 36,000 EDRs, and that the company provided records in response to approximately 90 percent of requests.
One English-speaking cybercriminal who goes by the nicknames "Pwnstar" and "Pwnipotent" has been selling fake EDR services on both Russian-language and English cybercrime forums. Their prices range from $1,000 to $3,000 per successful request, and they claim to control "gov emails from over 25 countries," including Argentina, Bangladesh, Brazil, Bolivia, Dominican Republic, Hungary, India, Kenya, Jordan, Lebanon, Laos, Malaysia, Mexico, Morocco, Nigeria, Oman, Pakistan, Panama, Paraguay, Peru, Philippines, Tunisia, Turkey, United Arab Emirates (UAE), and Vietnam.
"I cannot 100% guarantee every order will go through," Pwnstar explained. "This is social engineering at the highest level and there will be failed attempts at times. Don't be discouraged. You can use escrow and I give full refund back if EDR doesn't go through and you don't receive your information."
An ad from Pwnstar for fake EDR services.
A review of EDR vendors across many cybercrime forums shows that some fake EDR vendors sell the ability to send phony police requests to specific social media platforms, including forged court-approved documents. Others simply sell access to hacked government or police email accounts, and leave it up to the buyer to forge any needed documents.
"When you get account, it's yours, your account, your liability," reads an ad in October on BreachForums. "Unlimited Emergency Data Requests. Once Paid, the Logins are completely Yours. Reset as you please. You would need to Forge Documents to Successfully Emergency Data Request."
Still other fake EDR service vendors claim to sell hacked or fraudulently created accounts on Kodex, a startup that aims to help tech companies do a better job screening out phony law enforcement data requests. Kodex is trying to tackle the problem of fake EDRs by working directly with the data providers to pool information about police or government officials submitting these requests, with an eye toward making it easier for everyone to spot an unauthorized EDR.
If police or government officials wish to request records regarding Coinbase customers, for example, they must first register an account on Kodexglobal.com. Kodexβs systems then assign that requestor a score or credit rating, wherein officials who have a long history of sending valid legal requests will have a higher rating than someone sending an EDR for the first time.
It is not uncommon to see fake EDR vendors claim the ability to send data requests through Kodex, with some even sharing redacted screenshots of police accounts at Kodex.
Matt Donahue is the former FBI agent who founded Kodex in 2021. Donahue said just because someone can use a legitimate police department or government email to create a Kodex account doesn't mean that user will be able to send anything. Donahue said even if one customer gets a fake request, Kodex is able to prevent the same thing from happening to another.
Kodex told KrebsOnSecurity that over the past 12 months it has processed a total of 1,597 EDRs, and that 485 of those requests (~30 percent) failed a second-level verification. Kodex reports it has suspended nearly 4,000 law enforcement users in the past year, including:
-1,521 from the Asia-Pacific region;
-1,290 requests from Europe, the Middle East and Asia;
-460 from police departments and agencies in the United States;
-385 from entities in Latin America, and;
-285 from Brazil.
Donahue said 60 technology companies are now routing all law enforcement data requests through Kodex, including an increasing number of financial institutions and cryptocurrency platforms. He said one concern shared by recent prospective customers is that crooks are seeking to use phony law enforcement requests to freeze and in some cases seize funds in specific accounts.
"What's being conflated [with EDRs] is anything that doesn't involve a formal judge's signature or legal process," Donahue said. "That can include control over data, like an account freeze or preservation request."
In a hypothetical example, a scammer uses a hacked government email account to request that a service provider place a hold on a specific bank or crypto account that is allegedly subject to a garnishment order, or party to a crime that is globally sanctioned, such as terrorist financing or child exploitation.
A few days or weeks later, the same impersonator returns with a request to seize funds in the account, or to divert the funds to a custodial wallet supposedly controlled by government investigators.
"In terms of overall social engineering attacks, the more you have a relationship with someone the more they're going to trust you," Donahue said. "If you send them a freeze order, that's a way to establish trust, because [the first time] they're not asking for information. They're just saying, 'Hey can you do me a favor?' And that makes the [recipient] feel valued."
Echoing the FBI's warning, Donahue said far too many police departments in the United States and other countries have poor account security hygiene, and often do not enforce basic account security precautions, such as requiring phishing-resistant multifactor authentication.
How are cybercriminals typically gaining access to police and government email accounts? Donahue said it's still mostly email-based phishing, and credentials that are stolen by opportunistic malware infections and sold on the dark web. But as bad as things are internationally, he said, many law enforcement entities in the United States still have much room for improvement in account security.
"Unfortunately, a lot of this is phishing or malware campaigns," Donahue said. "A lot of global police agencies don't have stringent cybersecurity hygiene, but even U.S. dot-gov emails get hacked. Over the last nine months, I've reached out to CISA (the Cybersecurity and Infrastructure Security Agency) over a dozen times about .gov email addresses that were compromised and that CISA was unaware of."
Change Healthcare says it has notified approximately 100 million Americans that their personal, financial and healthcare records may have been stolen in a February 2024 ransomware attack that caused the largest ever known data breach of protected health information.
Image: Tamer Tuncay, Shutterstock.com.
A ransomware attack at Change Healthcare in the third week of February quickly spawned disruptions across the U.S. healthcare system that reverberated for months, thanks to the company's central role in processing payments and prescriptions on behalf of thousands of organizations.
In April, Change estimated the breach would affect a "substantial proportion of people in America." On Oct. 22, the healthcare giant notified the U.S. Department of Health and Human Services (HHS) that "approximately 100 million notices have been sent regarding this breach."
A notification letter from Change Healthcare said the breach involved the theft of:
-Health Data: Medical record #s, doctors, diagnoses, medicines, test results, images, care and treatment;
-Billing Records: Records including payment cards, financial and banking records;
-Personal Data: Social Security number; driver's license or state ID number;
-Insurance Data: Health plans/policies, insurance companies, member/group ID numbers, and Medicaid-Medicare-government payor ID numbers.
The HIPAA Journal reports that in the nine months ending on September 30, 2024, Change's parent firm United Health Group had incurred $1.521 billion in direct breach response costs, and $2.457 billion in total cyberattack impacts.
Those costs include $22 million the company admitted to paying their extortionists, a ransomware group known as BlackCat and ALPHV, in exchange for a promise to destroy the stolen healthcare data.
That ransom payment went sideways when the affiliate who gave BlackCat access to Change's network said the crime gang had cheated them out of their share of the ransom. The entire BlackCat ransomware operation shut down after that, absconding with all of the money still owed to affiliates who were hired to install their ransomware.
A few days after BlackCat imploded, the same stolen healthcare data was offered for sale by a competing ransomware affiliate group called RansomHub.
"Affected insurance providers can contact us to prevent leaking of their own data and [remove it] from the sale," RansomHub's victim shaming blog announced on April 16. "Change Health and United Health processing of sensitive data for all of these companies is just something unbelievable. For most US individuals out there doubting us, we probably have your personal data."
It remains unclear if RansomHub ever sold the stolen healthcare data. The chief information security officer for a large academic healthcare system affected by the breach told KrebsOnSecurity they participated in a call with the FBI and were told a third party partner managed to recover at least four terabytes of data that was exfiltrated from Change by the cybercriminal group. The FBI declined to comment.
Change Healthcare's breach notification letter offers recipients two years of credit monitoring and identity theft protection services from a company called IDX. In the section of the missive titled "Why did this happen?," Change shared only that "a cybercriminal accessed our computer system without our permission."
But in June 2024 testimony to the Senate Finance Committee, it emerged that the intruders had stolen or purchased credentials for a Citrix portal used for remote access, and that no multi-factor authentication was required for that account.
Last month, Sens. Mark Warner (D-Va.) and Ron Wyden (D-Ore.) introduced a bill that would require HHS to develop and enforce a set of tough minimum cybersecurity standards for healthcare providers, health plans, clearinghouses and businesses associates. The measure also would remove the existing cap on fines under the Health Insurance Portability and Accountability Act, which severely limits the financial penalties HHS can issue against providers.
According to the HIPAA Journal, the biggest penalty imposed to date for a HIPAA violation was the paltry $16 million fine against the insurer Anthem Inc., which suffered a data breach in 2015 affecting 78.8 million individuals. Anthem reported revenues of around $80 billion in 2015.
A post about the Change breach from RansomHub on April 8, 2024. Image: Darkbeast, ke-la.com.
There is little that victims of this breach can do about the compromise of their healthcare records. However, because the data exposed includes more than enough information for identity thieves to do their thing, it would be prudent to place a security freeze on your credit file and on that of your family members if you haven't already.
The best mechanism for preventing identity thieves from creating new accounts in your name is to freeze your credit file with Equifax, Experian, and TransUnion. This process is now free for all Americans, and simply blocks potential creditors from viewing your credit file. Parents and guardians can now also freeze the credit files for their children or dependents.
Since very few creditors are willing to grant new lines of credit without being able to determine how risky it is to do so, freezing your credit file with the Big Three is a great way to stymie all sorts of ID theft shenanigans. Having a freeze in place does nothing to prevent you from using existing lines of credit you may already have, such as credit cards, mortgage and bank accounts. When and if you ever do need to allow access to your credit file, such as when applying for a loan or new credit card, you will need to lift or temporarily thaw the freeze in advance with one or more of the bureaus.
All three bureaus allow users to place a freeze electronically after creating an account, but all of them try to steer consumers away from enacting a freeze. Instead, the bureaus are hoping consumers will opt for their confusingly named "credit lock" services, which accomplish the same result but allow the bureaus to continue selling access to your file to select partners.
If you haven't done so in a while, now would be an excellent time to review your credit file for any mischief or errors. By law, everyone is entitled to one free credit report every 12 months from each of the three credit reporting agencies. But the Federal Trade Commission notes that the big three bureaus have permanently extended a program enacted in 2020 that lets you check your credit report at each of the agencies once a week for free.