MSI Dump - a tool that analyzes malicious MSI installation packages, extracts files, streams, binary data and incorporates YARA scanner.
On Macro-enabled Office documents we can quickly use oletools mraptor to determine whether document is malicious. If we want to dissect it further, we could bring in oletools olevba or oledump.
To dissect malicious MSI files, so far we had only one, but reliable and trustworthy lessmsi. However, lessmsi
doesn't implement features I was looking for:
Hence this is where msidump
comes into play.
This tool helps in quick triages as well as detailed examinations of malicious MSIs corpora. It lets us:
file
/MIME type deduction to determine inner data typeIt was created as a companion tool to the blog post I released here:
WindowsInstaller.Installer
interfaces, currently it is not possible to support native Linux platforms. Maybe wine python msidump.py
could help, but haven't tried that yet.cmd> python msidump.py evil.msi -y rules.yara
Here we can see that input MSI is injected with suspicious VBScript and contains numerous executables in it.
We see from the triage table that it was present in Binary
table. Lets get him:
python msidump.py putty-backdoored.msi -l binary -i UBXtHArj
We can specify which to record dump either by its name/ID or its index number (here that would be 7).
Lets have a look at another example. This time there is executable stored in Binary
table that will be executed during installation:
To extract that file we're gonna go with
python msidump.py evil2.msi -x binary -i lmskBju -O extracted
Where
-x binary
tells to extract contents of Binary
table-i lmskBju
specifies which record exactly to extract-O extracted
sets output directoryFor the best output experience, run the tool on a maximized console window or redirect output to file:
python msidump.py [...] -o analysis.log
PS D:\> python .\msidump.py --help
options:
-h, --help show this help message and exit
Required arguments:
infile Input MSI file (or directory) for analysis.
Options:
-q, --quiet Surpress banner and unnecessary information. In triage mode, will display only verdict.
-v, --verbose Verbose mode.
-d, --debug Debug mode.
-N, --nocolor Dont use colors in text output.
-n PRINT_LEN, --print-len PRINT_LEN
When previewing data - how many bytes to include in preview/hexdump. Default: 128
-f {text,json,csv}, --format {text,json,csv}
Output format: text, json, csv. Default: text
-o path, --outfile path
Redirect program output to this file.
-m, --mime When sniffing inner data type, report MIME types
Analysis Modes:
-l what, --list what List specific table contents. See help message to learn what can be listed.
-x what, --extract what
Extract data from MSI. For what can be extracted, refer to help message.
Analysis Specific options:
-i number|name, --record number|name
Can be a number or name. In --list mode, specifies which record to dump/display entirely. In --extract mode dumps only this particular record to --outdir
-O path, --outdir path
When --extract mode is used, specifies output location where to extract data.
-y path, --yara path Path to YARA rule/directory with rules. YARA will be matched against Binary data, streams and inner files
------------------------------------------------------
- What can be listed:
--list CustomAction - Specific table
--lis t Registry,File - List multiple tables
--list stats - Print MSI database statistics
--list all - All tables and their contents
--list olestream - Prints all OLE streams & storages.
To display CABs embedded in MSI try: --list _Streams
--list cabs - Lists embedded CAB files
--list binary - Lists binary data embedded in MSI for its own purposes.
That typically includes EXEs, DLLs, VBS/JS scripts, etc
- What can be extracted:
--extract all - Extracts Binary data, all files from CABs, scripts from CustomActions
--extract binary - Extracts Binary data
--extract files - Extracts files
--extract cabs - Extracts cabinets
--extract scripts - Extrac ts scripts
------------------------------------------------------
CustomAction Type
s based on assessing their numbers, which is prone to being evaded. Apparently when naming my tool, I didn't think on checking whether it was already taken. There is another tool named msidump
being part of msitools GNU package:
This and other projects are outcome of sleepless nights and plenty of hard work. If you like what I do and appreciate that I always give back to the community, Consider buying me a coffee (or better a beer) just to say thank you!
Mariusz Banach / mgeeky, (@mariuszbit)
<mb [at] binary-offensive.com>
Subparse, is a modular framework developed by Josh Strochein, Aaron Baker, and Odin Bernstein. The framework is designed to parse and index malware files and present the information found during the parsing in a searchable web-viewer. The framework is modular, making use of a core parsing engine, parsing modules, and a variety of enrichers that add additional information to the malware indices. The main input values for the framework are directories of malware files, which the core parsing engine or a user-specified parsing engine parses before adding additional information from any user-specified enrichment engine all before indexing the information parsed into an elasticsearch index. The information gathered can then be searched and viewed via a web-viewer, which also allows for filtering on any value gathered from any file. There are currently 3 parsing engine, the default parsing modules (ELFParser, OLEParser and PEParser), and 4 enrichment modules (ABUSEEnricher, C APEEnricher, STRINGEnricher and YARAEnricher).
ย
To get started using Subparse there are a few requrired/recommened programs that need to be installed and setup before trying to work with our software.
Software | Status | Link |
---|---|---|
Docker | Required | Installation Guide |
Python3.8.1 | Required | Installation Guide |
Pyenv | Recommended | Installation Guide |
After getting the required/recommended software installed to your system there are a few other steps that need to be taken to get Subparse installed.
sudo get apt install build-essential
pip3 install -r ./requirements.txt
docker-compose up
Note: This might take a little time due to downloading the images and setting up the containers that will be needed by Subparse.
ย
Command line options that are available for subparse/parser/subparse.py:
Argument | Alternative | Required | Description |
---|---|---|---|
-h | --help | No | Shows help menu |
-d SAMPLES_DIR | --directory SAMPLES_DIR | Yes | Directory of samples to parse |
-e ENRICHER_MODULES | --enrichers ENRICHER_MODULES | No | Enricher modules to use for additional parsing |
-r | --reset | No | Reset/delete all data in the configured Elasticsearch cluster |
-v | --verbose | No | Display verbose commandline output |
-s | --service-mode | No | Enters service mode allowing for mode samples to be added to the SAMPLES_DIR while processing |
To view the results from Subparse's parsers, navigate to localhost:8080. If you are having trouble viewing the site, make sure that you have the container started up in Docker and that there is not another process running on port 8080 that could cause the site to not be available.
ย
Before any parser is executed general information is collected about the sample regardless of the underlying file type. This information includes:
Parsers are ONLY executed on samples that match the file type. For example, PE files will by default have the PEParser executed against them due to the file type corresponding with those the PEParser is able to examine.
ย
These modules are optional modules that will ONLY get executed if specified via the -e | --enrichers flag on the command line.
ย
Subparse's web view was built using Bootstrap for its CSS, this allows for any built in Bootstrap CSS to be used when developing your own custom Parser/Enricher Vue.js files. We have also provided an example for each to help get started and have also implemented a few custom widgets to ease the process of development and to promote standardization in the way information is being displayed. All Vue.js files are used for dynamically displaying information from the custom Parser/Enricher and are used as templates for the data.
Note: Naming conventions with both class and file names must be strictly adheared to, this is the first thing that should be checked if you run into issues now getting your custom Parser/Enricher to be executed. The naming convention of your Parser/Enricher must use the same name across all of the files and class names.
The logger object is a singleton implementation of the default Python logger. For indepth usage please reference the Offical Doc. For Subparse the only logging methods that we recommend using are the logging levels for output. These are: