
Fuzzable - Framework For Automating Fuzzable Target Discovery With Static Analysis


Framework for Automating Fuzzable Target Discovery with Static Analysis.

Introduction

Vulnerability researchers conducting security assessments on software will often harness the capabilities of coverage-guided fuzzing through powerful tools like AFL++ and libFuzzer. This is important, as it automates the bug-hunting process and quickly reveals exploitable conditions in targets. However, when encountering large and complex codebases or closed-source binaries, researchers have to painstakingly dedicate time to manually auditing and reverse engineering them to identify functions where fuzzing-based exploration can be useful.

Fuzzable is a framework that integrates with both C/C++ source code and binaries to assist vulnerability researchers in identifying function targets that are viable for fuzzing. This is done by applying several static analysis-based heuristics to pinpoint risky behaviors in the software and the functions that execute them. Researchers can then utilize the framework to generate basic harness templates, which can be used to hunt for vulnerabilities or integrated into a continuous fuzzing pipeline, such as Google's oss-fuzz project.

In addition to running as a standalone tool, Fuzzable is also integrated as a plugin for the Binary Ninja disassembler, with support for other disassembly backends being developed.

Check out the original blog post detailing the tool here, which highlights the technical specifications of the static analysis heuristics and how this tool came about. This tool was also featured at Black Hat Arsenal USA 2022.


Features

  • Supports analyzing binaries (with Angr and Binary Ninja) and source code artifacts (with tree-sitter).
  • Run static analysis either as a standalone CLI tool or as a Binary Ninja plugin.
  • Harness generation to ramp up on creating fuzzing campaigns quickly.

Installation

Some binary targets may require sanitizing (i.e., signature matching, or identifying functions lost to inlining), so fuzzable primarily uses Binary Ninja as its disassembly backend because of its ability to solve these problems effectively. As a result, it can be utilized both as a standalone tool and as a plugin.

Since Binary Ninja isn't accessible to everyone and there may be a need to run the tool during security assessments or to scale it up in the cloud, an angr fallback backend is also supported. I anticipate incorporating other disassemblers down the road as well (priority: Ghidra).

Command Line (Standalone)

If you have Binary Ninja Commercial, be sure to install the API for standalone headless usage:

$ python3 /Applications/Binary\ Ninja.app/Contents/Resources/scripts/install_api.py

Install with pip:

$ pip install fuzzable

Manual/Development Build

We use poetry for dependency management and building. To do a manual build, clone the repository with the third-party modules:

$ git clone --recursive https://github.com/ex0dus-0x/fuzzable

To install manually:

$ cd fuzzable/

# without poetry
$ pip install .

# with poetry
$ poetry install

# with poetry for a development virtualenv
$ poetry shell

You can now analyze binaries and/or source code with the tool!

# analyzing a single shared object library binary
$ fuzzable analyze examples/binaries/libbasic.so

# analyzing a single C source file
$ fuzzable analyze examples/source/libbasic.c

# analyzing a workspace with multiple C/C++ files and headers
$ fuzzable analyze examples/source/source_bundle/

Binary Ninja Plugin

fuzzable can be easily installed through the Binary Ninja plugin marketplace by going to Binary Ninja > Manage Plugins and searching for it. Here is an example of the fuzzable plugin running, accurately identifying targets for fuzzing and further vulnerability assessment.

Usage

fuzzable comes with various options to help you tune your analysis. More will be supported in future releases, along with any feature requests that are made.

Static Analysis Heuristics

To determine fuzzability, fuzzable uses several heuristics to decide which targets are the most viable for dynamic analysis. These heuristics are weighted differently using the scikit-criteria library, which applies multi-criteria decision analysis to pick the best candidates. These metrics and their weights can be seen here:

Heuristic             | Description                                                  | Weight
Fuzz Friendly Name    | Symbol name implies behavior that ingests file/buffer input | 0.3
Risky Sinks           | Arguments that flow into risky calls (e.g., memcpy)         | 0.3
Natural Loops         | Number of loops detected with the dominance frontier        | 0.05
Cyclomatic Complexity | Complexity of the function target based on edges + nodes    | 0.05
Coverage Depth        | Number of callees the target traverses into                 | 0.3
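
To make the weighting concrete, below is a simplified, illustrative Python sketch of how such a weighted score could be combined. This is not fuzzable's actual scikit-criteria implementation (which performs full multi-criteria decision analysis over all candidates); it only shows the intuition behind the weights.

# Illustrative only: fuzzable itself relies on scikit-criteria for
# multi-criteria decision analysis; this is a plain weighted sum.
WEIGHTS = {
    "fuzz_friendly_name": 0.3,
    "risky_sinks": 0.3,
    "natural_loops": 0.05,
    "cyclomatic_complexity": 0.05,
    "coverage_depth": 0.3,
}

def fuzzability(metrics: dict) -> float:
    """Combine normalized metric values (0.0-1.0) into a single score."""
    return sum(WEIGHTS[name] * metrics.get(name, 0.0) for name in WEIGHTS)

# Hypothetical function: fuzz-friendly name, several risky sinks, deep coverage.
print(fuzzability({
    "fuzz_friendly_name": 1.0,
    "risky_sinks": 0.8,
    "natural_loops": 0.2,
    "cyclomatic_complexity": 0.4,
    "coverage_depth": 0.7,
}))  # -> 0.78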

As mentioned, check out the technical blog post for a more in-depth look into why and how these metrics are utilized.

Many metrics were largely inspired by Vincenzo Iozzo's original work in 0-knowledge fuzzing.

Every target you want to analyze is different, and fuzzable will not be able to account for every edge-case behavior in the program target. Thus, it may be important during analysis to tune these weights appropriately and see whether different results make more sense for your use case. To tune these weights in the CLI, simply specify the --score-weights argument:

$ fuzzable analyze <TARGET> --score-weights=0.2,0.2,0.2,0.2,0.2

Analysis Filtering

By default, fuzzable narrows down the set of function targets using the following criteria (a simplified sketch follows the list):

  • Top-level entry calls - only functions that aren't called by any other function in the target are considered, since these are ideal entry points with potentially very high coverage.
  • Static calls - (source only) functions that are static and aren't exposed through headers are filtered out.
  • Imports - (binary only) symbols imported from other library dependencies used by the target's implementation are filtered out.
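
Here is a rough Python sketch of that default filtering logic. The attribute names are hypothetical and chosen for illustration; they are not fuzzable's actual internals.

# Illustrative sketch of the default target filtering; attribute names are
# hypothetical placeholders, not fuzzable's real data model.
def is_candidate(func, mode: str) -> bool:
    if func.has_callers:
        return False  # keep only top-level entry calls
    if mode == "source" and func.is_static and not func.exposed_in_header:
        return False  # static calls not exposed through headers
    if mode == "binary" and func.is_import:
        return False  # imports pulled in from other library dependencies
    return True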

To see calls that got filtered out by fuzzable, set the --list-ignored flag:

$ fuzzable analyze --list-ignored <TARGET>

In Binary Ninja, you can toggle this setting in Settings > Fuzzable > List Ignored Calls.

In the case that fuzzable falsely filters out important calls that should be analyzed, it is recommended to use the --include-* arguments to include them during the run:

# include ALL non top-level calls that were filtered out
$ fuzzable analyze --include-nontop <TARGET>

# include specific symbols that were filtered out
$ fuzzable analyze --include-sym <SYM> <TARGET>

In Binary Ninja, this is supported through Settings > Fuzzable > Include non-top level calls and Symbols to Exclude.

Harness Generation

Now that you have found your ideal candidates to fuzz, fuzzable will also help you generate fuzzing harnesses that are (almost) ready to instrument and compile for use with either a file-based fuzzer (e.g., AFL++, Honggfuzz) or an in-memory fuzzer (libFuzzer). To do so in the CLI:
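
The invocation below is an assumed example of the harness-generation command; the exact subcommand and flags may differ between releases, so confirm with fuzzable --help.

# assumed invocation for generating a harness for a chosen symbol
$ fuzzable create-harness <TARGET> --symbol-name=<SYMBOL_NAME>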

If this target is a source codebase, the generic source template will be used.

If the target is a binary, the generic black-box template will be used, which ideally can be driven by a fuzzing emulation mode like AFL-QEMU. If the symbol isn't exported directly, a copy of the binary will also be created as a shared object using LIEF so that the symbol can be dlopened.
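
As a rough illustration of that last step, the following sketch shows how a symbol could be re-exported with LIEF's Python API so a harness can dlopen() the result. The symbol name, address lookup, and paths are assumptions for illustration, and the exact LIEF calls should be checked against the LIEF documentation; this is not fuzzable's actual implementation.

import lief

# Parse the target ELF binary (path is illustrative).
binary = lief.parse("examples/binaries/libbasic.so")

# Re-export an internal function so a generated harness can dlopen()/dlsym() it.
# NOTE: the symbol name and these LIEF calls are assumptions for illustration.
addr = binary.get_function_address("internal_parser")
binary.add_exported_function(addr, "internal_parser")

# Write out a patched copy for the harness to load at runtime.
binary.write("libbasic_patched.so")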

At the moment, this feature is quite rudimentary, as it will simply create a standalone C++ harness populated with the appropriate parameters, and will not auto-generate code needed for any runtime behaviors (e.g., instantiating and freeing structures). However, the templates fuzzable creates should still get you running quickly. Here are some ambitious features I would like to implement down the road:

  • Full harness synthesis - harnesses will work directly with absolutely no manual changes needed.
  • Synthesis from potential unit tests using the DeepState framework (Source only).
  • Immediate deployment to a managed continuous fuzzing fleet.

Exporting Reports

fuzzable supports generating reports in various formats. The formats currently supported are JSON, CSV, and Markdown. This can be useful if you are using the tool as part of automation and would like to ingest the output in a serializable format.

In the CLI, simply pass the --export argument with a filename with the appropriate extension:

$ fuzzable analyze --export=report.json <TARGET>
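
In an automation pipeline, the exported JSON can then be loaded and post-processed. The field names below ("name", "score") are hypothetical placeholders; inspect a generated report for the actual schema.

import json

# Load a report produced by: fuzzable analyze --export=report.json <TARGET>
with open("report.json") as fd:
    report = json.load(fd)

# Assumes the report is a list of per-function entries with hypothetical
# "name" and "score" fields; adjust to the real schema.
for entry in sorted(report, key=lambda e: e.get("score", 0), reverse=True):
    print(entry.get("name"), entry.get("score"))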

In Binary Ninja, go to Plugins > Fuzzable > Export Fuzzability Report > ... and select the format you want to export to and the path you want to write it to.

Contributing

This tool will be continuously developed, and any help from external maintainers is appreciated!

  • Create an issue for feature requests or bugs that you have come across.
  • Submit a pull request for fixes and enhancements that you would like to see contributed to this tool.

License

Fuzzable is licensed under the MIT License.



Kam1n0 - Assembly Analysis Platform


Kam1n0 v2.x is a scalable assembly management and analysis platform. It allows a user to first index a (large) collection of binaries into different repositories and then provides different analytic services such as clone search and classification. It supports multi-tenant access and management of assembly repositories through the concept of an Application. An application instance contains its own exclusive repository and provides a specialized analytic service. Considering the versatility of reverse engineering tasks, the Kam1n0 v2.x server currently provides three different types of clone-search applications (Asm-Clone, Sym1n0, and Asm2Vec), as well as an executable classification application based on Asm2Vec. New application types can be added to the platform.


A user can create multiple application instances. An application instance can be shared among a specific group of users. The application repository read-write access and on-off status can be controlled by the application owner. Kam1n0 v2.x server can serve the applications concurrently using several shared resource pools.

Kam1n0 was developed by Steven H. H. Ding and Miles Q. Li under the supervision of Benjamin C. M. Fung of the Data Mining and Security Lab at McGill University in Canada. It won the second prize at the Hex-Rays Plug-In Contest 2015. If you find Kam1n0 useful, please cite our paper:

  • S. H. H. Ding, B. C. M. Fung, and P. Charland. Kam1n0: MapReduce-based Assembly Clone Search for Reverse Engineering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 461-470, San Francisco, CA: ACM Press, August 2016.

  • S. H. H. Ding, B. C. M. Fung, and P. Charland. Asm2Vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In Proceedings of the 40th IEEE Symposium on Security and Privacy (S&P), 18 pages, San Francisco, CA: IEEE Computer Society, May 2019.

Asm-Clone

Asm-Clone applications try to solve the efficient subgraph search problem (i.e., the subgraph isomorphism problem) for assembly functions (<1.3s average query time and <30ms average index time with 2.3M functions). Given a target function (the one on the left as shown below), it can identify the cloned subgraphs among other functions in the repository (the one on the right as shown below).

  • Application Type: Asm-Clone
  • The original clone search service used in Kam1n0 v1.x.
  • Currently supports Meta-PC, ARM, PowerPC, and TMS320C6 (experimental).
  • Supports subgraph clone search within a certain assembly code family.
    • + Good interpretability of the result: breaks down into subgraphs.
    • + Accurate for searching within the given code family.
    • + Good for diffing various patches or versions of big binaries.
    • - Relatively more sensitive to instruction set changes, optimizations, and obfuscation.
    • - Needs a pre-defined syntax for the assembly code language.
    • - Needs assembly code of the same chosen family in the repository.


Sym1n0

Semantic clone search by differentiated fuzz testing and constraint solving. An efficient and scalable dynamic-static hybrid approach (<1s average query time and <100ms average index time with 1.5M functions). Given a target function (the one on the left as shown below), it can identify the cloned subgraphs among other functions in the repository (the one on the right as shown below). Supports visualization of the abstract syntax graph.

  • Application Type: Sym1n0 (v2 only)
  • Clone search by both symbolic execution and concrete execution.
  • Differentiates functions based on their I/O behavior.
  • Clone search conducted on the abstract syntax graph constructed from Vex IR (powered by LibVex).
    • + Clone search across different assembly code families.
      • For example, indexed x86 binaries but the query is ARM code.
    • + Subgraph clone search.
    • + Supports a wide range of families through LibVex.
      • x86, AMD64, MIPS32, MIPS64, PowerPC32, PowerPC64, ARM32, and ARM64.
    • + An efficient dynamic-static hybrid approach.
    • + Ideal for analyzing firmware compiled for different processors.
    • - Sensitive to heavy graph manipulation (such as a full flattening).
    • - Sensitive to large scale breakdown of basic block integrity.


Asm2Vec

Asm2Vec leverages representation learning. It understands the lexical semantic relationship of assembly code. For example, xmm* registers are semantically related to vector operations such as addps, and memcpy is similar to strcpy. The graph below shows different assembly functions compiled from the same source code of gmpz_tdiv_r_2exp in libgmp. From left to right, the assembly functions are compiled with the GCC O0 option, the GCC O3 option, the O-LLVM obfuscator's Control Flow Graph Flattening option, and the O-LLVM obfuscator's Bogus Control Flow Graph option. Asm2Vec can statically identify them as clones.

  • Leverages representation learning.
  • Understands the lexical semantic relationship of assembly code.
    • + State-of-the-art for clone search against heavy code obfuscation techniques.
      • (>0.8 accuracy for all options applied in O-LLVM, multiple iterations).
    • + State-of-the-art for clone search against code optimization.
      • (>0.8 accuracy between O0 and O3, >0.94 accuracy between O2 and O3)
    • + Even better results than the most recent dynamic approach.
    • + Much more efficient than recent dynamic approaches.
    • + Does not need the architecture to be defined; it self-learns by reading a large volume of code.
    • + Static approach: efficient and scalable.
    • - No subgraphs.
    • - Assumes the assembly code comes from the same processor family.
    • - Static approach: cannot recognize jump tables, etc.


Executable Classification

In this application, the user defines a set of software classes based on functional relatedness and provides binaries belonging to each class. The system then automatically groups functions into clusters in which functions are connected directly or indirectly by clone relations. The clusters that are discriminative for the classification are kept and serve as signatures of their classes. Given a target binary, the system shows the degree to which it belongs to each software class.


  • Uses Asm2Vec as its function similarity computation model

    • + Provide interpretable classification results.
    • + Learn common characteristics (i.e., function clusters) of each class.
    • + Able to handle smaller and more imbalanced datasets than an ordinary machine learning model.
    • - Relies on the assumption that binaries in the same class share some common functions; this must hold for the system to work.


Platform Overview

The figure below shows the major UI components and functionalities of Kam1n0 v2.x. We adopt a material design. In general, each user has an application list, a running-job list, and a result file list.

  • Application list shows the application instances owned by the user and shared by the others.
  • Running-job list shows the progress of large queries (such as chrome.dll) and indexing procedures.
  • Result file list displays the saved results. More details of the UI design can be found in our detailed tutorial.


Installation Instruction

The current release of Kam1n0 consists of two installers: the core server and IDA Pro plug-in.

Installer             | Included components                  | Description
Kam1n0-Server.msi     | Core engine                          | Main engine providing services for indexing and searching.
                      | Workbench                            | A user interface to manage the repositories and running services.
                      | Web user interface                   | Web user interface for searching/indexing binary files and assembly functions.
                      | Visual C++ redistributable for VS 15 | Dependency for z3.
Kam1n0-IDA-Plugin.msi | Plug-in                              | Connectors and user interface.
                      | PyPI wheels for Cefpython            | Rendering engine for the user interface.
                      | PyPI and dependent wheels            | Package management for Python. Included for IDA 6.8 & 6.9.

Installing the Kam1n0 Server

The Kam1n0 core engine is purely written in Java. You need the following dependencies:

  • [Required] The latest x64 11.x JRE/JDK distribution from Oracle.
  • [Optional] The latest version of IDA Pro with the idapython plug-in installed. The Python plug-in and runtime should have already been installed with IDA Pro. Reinstall IDA Pro if necessary.

Download the Kam1n0-Server.msi file from our release page. Follow the instructions to install the server. You will be prompted to select an installation path. IDA Pro is optional if the server does not have to handle any disassembling, i.e., if the client side uses the Kam1n0 plugin for IDA Pro instead. Still, it is strongly suggested to have IDA Pro installed with the Kam1n0 server. The Kam1n0 server will automatically detect your IDA Pro installation by looking for the default application used to open .i64 files.

Installing the IDA Pro Plug-in

The Kam1n0 IDA Pro plug-in is written in Python for the logic and in HTML/JavaScript for the rendering. The following dependencies are required for its installation:

  • [Required] IDA Pro (>6.7) with the idapython plug-in installed. The Python plug-in and runtime should have already been installed with IDA Pro. Reinstall IDA Pro if necessary.

Next, download the Kam1n0-IDA-Plugin.msi installer from our release page. Follow the instructions to install the plug-in and runtime. Please note that the plug-in has to be installed in the IDA Pro plugins folder which is located at $IDA_PRO_PATH$/plugins. For example, on Windows, the path could be C:/Program Files (x86)/IDA 6.95/plugins. The installer will detect and validate the path.

Setting Up Kam1n0 on Ubuntu/Debian-based systems

  • Ensure you have the Oracle version of Java 11. (Not default-jdk in apt.)

    • Add Oracle's PPA and then update your package repository: sudo add-apt-repository ppa:webupd8team/java
      • If you encounter any errors (such as webupd8team not found) and you are behind a proxy, make sure you set and export your http_proxy and https_proxy environment variables, then try again with the -E option on sudo. Additionally, if you are getting an 'add-apt-repository: command not found' error, try: sudo apt install -y software-properties-common.
    • Afterwards: sudo apt-get update, and sudo apt-get install oracle-java8-installer
      • Verify your Java version with java -version; you may need to manually set the JAVA_HOME environment variable (in /etc/environment), e.g., JAVA_HOME=/usr/lib/jvm/java-11-oracle
  • Download the latest release for Linux (Kam1n0-IDA-Plugin.tar.gz and Kam1n0-Server.tar.gz) from Kam1n0-Community.

  • Extract the two tarballs (i.e., tar -xvzf Kam1n0-IDA-Plugin.tar.gz and tar -xvzf Kam1n0-Server.tar.gz)

  • The Kam1n0-Server.tar.gz file will create the server directory.

  • Inside the server directory, you should see a file called kam1n0.properties, which is where you will set various configurations for kam1n0; this is very important.

  • Set kam1n0.data.path to where you would like your kam1n0-related data to be written. We choose to put it in the same place that we keep our server. kam1n0.ida.home refers to where your IDA installation is located. Comment this line out (along with kam1n0.ida.batch, the line following it) if you do not have IDA and don't plan to use kam1n0 for disassembly. For more (accurate) information about the kam1n0.properties file, see the kam1n0.properties.explained file. A minimal example is sketched after this list.

  • Run kam1n0-server-workbench: java -jar kam1n0-server-workbench.jar. This should cause a window to pop up, which prompts you to actually start kam1n0. Alternatively, run kam1n0-server: java -jar kam1n0-server.jar --start. This starts the server from the console without a window.

  • To connect and use it, go to 127.0.0.1:8571 (the default port kam1n0 listens on should be 8571, but can be changed in kam1n0.properties) in your browser. You should see the pretty kam1n0 web UI. From there, follow the tutorial on the Kam1n0-Community repo if you do not know how to use kam1n0.
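
For reference, here is a minimal kam1n0.properties sketch using only the keys mentioned in the steps above. The values are placeholders for illustration; consult kam1n0.properties.explained for the authoritative list of options and formats.

# Placeholder values for illustration only.
# Where kam1n0 writes repositories and other data:
kam1n0.data.path=/opt/kam1n0/data

# Location of the local IDA Pro installation; comment out the IDA-related
# lines (including kam1n0.ida.batch) if IDA is not installed.
kam1n0.ida.home=/opt/ida-pro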

Backward Compatibility

The assembly code repositories and configuration files used in previous versions (<2.0.0) are no longer supported by the latest version. Please contact us if you need to migrate your old repositories.

Documentation

Development

Clone the latest stable branch (don't forget --recursive!):

git clone --recursive -b master2.x --single-branch https://github.com/McGill-DMaS/Kam1n0-Community

Importing the project:

IntelliJ: Import the root /kam1n0/kam1n0/ as a Maven project. All the submodules will be loaded accordingly. EclipseEE: Add the cloned git repository to the git view and import all Maven projects from the git repository. You may need to modify the classpath to address any errors. All the resource paths are dynamically adjusted when running inside an IDE (through the kam1n0-resources submodule).

To build the project:

cd /kam1n0/kam1n0
mvn -DskipTests clean package
mvn -DskipTests package

The resulting binaries can be found in /kam1n0/build-bins/

To run the test code, you will need to first download chromedriver.exe from http://chromedriver.chromium.org/ and add its absolute path to an environment variable named webdriver.chrome.driver. A Chrome browser must also be installed on the system. The test code will launch a browser instance to test the UI. The complete testing procedure takes approximately 3 hours.

cd /kam1n0/kam1n0
mvn -DskipTests clean package # you can skip this one if you already built the package
mvn -DskipTests package # you can skip this one if you already built the package
mvn -DforkMode=never test

These commands only compile the Java code, using pre-compiled wheels of libvex and z3, so they work out of the box. The build of libvex and z3 itself is platform-dependent; we use a fork of libvex from angr. More complete build scripts, as well as installers for Windows/Linux, can be found under /kam1n0-builds/.

  • kam1n0: The server's source code.
  • kam1n0-builds: Installer source code and scripts to build the distribution.
  • kam1n0-clients: The clients' source code.

Binary Releases

We have a Jenkins server for continuous development and delivery. The latest stable release will be posted here. Periodically, we will synchronize our internal experimental branch with this repository.

Licensing

The software was developed by Steven H. H. Ding, Miles Q. Li, and Benjamin C. M. Fung in the McGill Data Mining and Security Lab and Queen's L1NNA Research Laboratory in Canada. It is distributed under the Apache License Version 2.0. Please refer to LICENSE.txt for details.

Copyright 2014-2021 McGill University and the Researchers. All rights reserved.



โŒ