Tool for Fingerprinting HTTP requests of malware. Based on Tshark and written in Python3. Working prototype stage :-)
Its main objective is to provide unique representations (fingerprints) of malware requests, which help in their identification. Unique means here that each fingerprint should be seen only in one particular malware family, yet one family can have multiple fingerprints. Hfinger represents the request in a shorter form than printing the whole request, but still human interpretable.
Hfinger can be used in manual malware analysis but also in sandbox systems or SIEMs. The generated fingerprints are useful for grouping requests, pinpointing requests to particular malware families, identifying different operations of one family, or discovering unknown malicious requests omitted by other security systems but which share fingerprint.
An academic paper accompanies work on this tool, describing, for example, the motivation of design choices, and the evaluation of the tool compared to p0f, FATT, and Mercury.
The basic assumption of this project is that HTTP requests of different malware families are more or less unique, so they can be fingerprinted to provide some sort of identification. Hfinger retains information about the structure and values of some headers to provide means for further analysis. For example, grouping of similar requests - at this moment, it is still a work in progress.
After analysis of malware's HTTP requests and headers, we have identified some parts of requests as being most distinctive. These include: * Request method * Protocol version * Header order * Popular headers' values * Payload length, entropy, and presence of non-ASCII characters
Additionally, some standard features of the request URL were also considered. All these parts were translated into a set of features, described in details here.
The above features are translated into varying length representation, which is the actual fingerprint. Depending on report mode, different features are used to fingerprint requests. More information on these modes is presented below. The feature selection process will be described in the forthcoming academic paper.
Minimum requirements needed before installation: * Python
>= 3.3, * Tshark
>= 2.2.0.
Installation available from PyPI:
pip install hfinger
Hfinger has been tested on Xubuntu 22.04 LTS with tshark
package in version 3.6.2
, but should work with older versions like 2.6.10
on Xubuntu 18.04 or 3.2.3
on Xubuntu 20.04.
Please note that as with any PoC, you should run Hfinger in a separated environment, at least with Python virtual environment. Its setup is not covered here, but you can try this tutorial.
After installation, you can call the tool directly from a command line with hfinger
or as a Python module with python -m hfinger
.
For example:
foo@bar:~$ hfinger -f /tmp/test.pcap
[{"epoch_time": "1614098832.205385000", "ip_src": "127.0.0.1", "ip_dst": "127.0.0.1", "port_src": "53664", "port_dst": "8080", "fingerprint": "2|3|1|php|0.6|PO|1|us-ag,ac,ac-en,ho,co,co-ty,co-le|us-ag:f452d7a9/ac:as-as/ac-en:id/co:Ke-Al/co-ty:te-pl|A|4|1.4"}]
Help can be displayed with short -h
or long --help
switches:
usage: hfinger [-h] (-f FILE | -d DIR) [-o output_path] [-m {0,1,2,3,4}] [-v]
[-l LOGFILE]
Hfinger - fingerprinting malware HTTP requests stored in pcap files
optional arguments:
-h, --help show this help message and exit
-f FILE, --file FILE Read a single pcap file
-d DIR, --directory DIR
Read pcap files from the directory DIR
-o output_path, --output-path output_path
Path to the output directory
-m {0,1,2,3,4}, --mode {0,1,2,3,4}
Fingerprint report mode.
0 - similar number of collisions and fingerprints as mode 2, but using fewer features,
1 - representation of all designed features, but a little more collisions than modes 0, 2, and 4,
2 - optimal (the default mode),
3 - the lowest number of generated fingerprints, but the highest number of collisions,
4 - the highest fingerprint entropy, but slightly more fingerprints than modes 0-2
-v, --verbose Report information about non-standard values in the request
(e.g., non-ASCII characters, no CRLF tags, values not present in the configuration list).
Without --logfile (-l) will print to the standard error.
-l LOGFILE, --logfile LOGFILE
Output logfile in the verbose mode. Implies -v or --verbose switch.
You must provide a path to a pcap file (-f), or a directory (-d) with pcap files. The output is in JSON format. It will be printed to standard output or to the provided directory (-o) using the name of the source file. For example, output of the command:
hfinger -f example.pcap -o /tmp/pcap
will be saved to:
/tmp/pcap/example.pcap.json
Report mode -m
/--mode
can be used to change the default report mode by providing an integer in the range 0-4
. The modes differ on represented request features or rounding modes. The default mode (2
) was chosen by us to represent all features that are usually used during requests' analysis, but it also offers low number of collisions and generated fingerprints. With other modes, you can achieve different goals. For example, in mode 3
you get a lower number of generated fingerprints but a higher chance of a collision between malware families. If you are unsure, you don't have to change anything. More information on report modes is here.
Beginning with version 0.2.1
Hfinger is less verbose. You should use -v
/--verbose
if you want to receive information about encountered non-standard values of headers, non-ASCII characters in the non-payload part of the request, lack of CRLF tags (\r\n\r\n
), and other problems with analyzed requests that are not application errors. When any such issues are encountered in the verbose mode, they will be printed to the standard error output. You can also save the log to a defined location using -l
/--log
switch (it implies -v
/--verbose
). The log data will be appended to the log file.
Beginning with version 0.2.0
, Hfinger supports importing to other Python applications. To use it in your app simply import hfinger_analyze
function from hfinger.analysis
and call it with a path to the pcap file and reporting mode. The returned result is a list of dicts with fingerprinting results.
For example:
from hfinger.analysis import hfinger_analyze
pcap_path = "SPECIFY_PCAP_PATH_HERE"
reporting_mode = 4
print(hfinger_analyze(pcap_path, reporting_mode))
Beginning with version 0.2.1
Hfinger uses logging
module for logging information about encountered non-standard values of headers, non-ASCII characters in the non-payload part of the request, lack of CRLF tags (\r\n\r\n
), and other problems with analyzed requests that are not application errors. Hfinger creates its own logger using name hfinger
, but without prior configuration log information in practice is discarded. If you want to receive this log information, before calling hfinger_analyze
, you should configure hfinger
logger, set log level to logging.INFO
, configure log handler up to your needs, add it to the logger. More information is available in the hfinger_analyze
function docstring.
A fingerprint is based on features extracted from a request. Usage of particular features from the full list depends on the chosen report mode from a predefined list (more information on report modes is here). The figure below represents the creation of an exemplary fingerprint in the default report mode.
Three parts of the request are analyzed to extract information: URI, headers' structure (including method and protocol version), and payload. Particular features of the fingerprint are separated using |
(pipe). The final fingerprint generated for the POST
request from the example is:
2|3|1|php|0.6|PO|1|us-ag,ac,ac-en,ho,co,co-ty,co-le|us-ag:f452d7a9/ac:as-as/ac-en:id/co:Ke-Al/co-ty:te-pl|A|4|1.4
The creation of features is described below in the order of appearance in the fingerprint.
Firstly, URI features are extracted: * URI length represented as a logarithm base 10 of the length, rounded to an integer, (in the example URI is 43 characters long, so log10(43)โ2
), * number of directories, (in the example there are 3 directories), * average directory length, represented as a logarithm with base 10 of the actual average length of the directory, rounded to an integer, (in the example there are three directories with total length of 20 characters (6+6+8), so log10(20/3)โ1
), * extension of the requested file, but only if it is on a list of known extensions in hfinger/configs/extensions.txt
, * average value length represented as a logarithm with base 10 of the actual average value length, rounded to one decimal point, (in the example two values have the same length of 4 characters, what is obviously equal to 4 characters, and log10(4)โ0.6
).
Secondly, header structure features are analyzed: * request method encoded as first two letters of the method (PO
), * protocol version encoded as an integer (1 for version 1.1, 0 for version 1.0, and 9 for version 0.9), * order of the headers, * and popular headers and their values.
To represent order of the headers in the request, each header's name is encoded according to the schema in hfinger/configs/headerslow.json
, for example, User-Agent
header is encoded as us-ag
. Encoded names are separated by ,
. If the header name does not start with an upper case letter (or any of its parts when analyzing compound headers such as Accept-Encoding
), then encoded representation is prefixed with !
. If the header name is not on the list of the known headers, it is hashed using FNV1a hash, and the hash is used as encoding.
When analyzing popular headers, the request is checked if they appear in it. These headers are: * Connection * Accept-Encoding * Content-Encoding * Cache-Control * TE * Accept-Charset * Content-Type * Accept * Accept-Language * User-Agent
When the header is found in the request, its value is checked against a table of typical values to create pairs of header_name_representation:value_representation
. The name of the header is encoded according to the schema in hfinger/configs/headerslow.json
(as presented before), and the value is encoded according to schema stored in hfinger/configs
directory or configs.py
file, depending on the header. In the above example Accept
is encoded as ac
and its value */*
as as-as
(asterisk-asterisk
), giving ac:as-as
. The pairs are inserted into fingerprint in order of appearance in the request and are delimited using /
. If the header value cannot be found in the encoding table, it is hashed using the FNV1a hash.
If the header value is composed of multiple values, they are tokenized to provide a list of values delimited with ,
, for example, Accept: */*, text/*
would give ac:as-as,te-as
. However, at this point of development, if the header value contains a "quality value" tag (q=
), then the whole value is encoded with its FNV1a hash. Finally, values of User-Agent and Accept-Language headers are directly encoded using their FNV1a hashes.
Finally, in the payload features: * presence of non-ASCII characters, represented with the letter N
, and with A
otherwise, * payload's Shannon entropy, rounded to an integer, * and payload length, represented as a logarithm with base 10 of the actual payload length, rounded to one decimal point.
Hfinger
operates in five report modes, which differ in features represented in the fingerprint, thus information extracted from requests. These are (with the number used in the tool configuration): * mode 0
- producing a similar number of collisions and fingerprints as mode 2
, but using fewer features, * mode 1
- representing all designed features, but producing a little more collisions than modes 0
, 2
, and 4
, * mode 2
- optimal (the default mode), representing all features which are usually used during requests' analysis, but also offering a low number of collisions and generated fingerprints, * mode 3
- producing the lowest number of generated fingerprints from all modes, but achieving the highest number of collisions, * mode 4
- offering the highest fingerprint entropy, but also generating slightly more fingerprints than modes 0
-2
.
The modes were chosen in order to optimize Hfinger's capabilities to uniquely identify malware families versus the number of generated fingerprints. Modes 0
, 2
, and 4
offer a similar number of collisions between malware families, however, mode 4
generates a little more fingerprints than the other two. Mode 2
represents more request features than mode 0
with a comparable number of generated fingerprints and collisions. Mode 1
is the only one representing all designed features, but it increases the number of collisions by almost two times comparing to modes 0
, 1
, and 4
. Mode 3
produces at least two times fewer fingerprints than other modes, but it introduces about nine times more collisions. Description of all designed features is here.
The modes consist of features (in the order of appearance in the fingerprint): * mode 0
: * number of directories, * average directory length represented as an integer, * extension of the requested file, * average value length represented as a float, * order of headers, * popular headers and their values, * payload length represented as a float. * mode 1
: * URI length represented as an integer, * number of directories, * average directory length represented as an integer, * extension of the requested file, * variable length represented as an integer, * number of variables, * average value length represented as an integer, * request method, * version of protocol, * order of headers, * popular headers and their values, * presence of non-ASCII characters, * payload entropy represented as an integer, * payload length represented as an integer. * mode 2
: * URI length represented as an integer, * number of directories, * average directory length represented as an integer, * extension of the requested file, * average value length represented as a float, * request method, * version of protocol, * order of headers, * popular headers and their values, * presence of non-ASCII characters, * payload entropy represented as an integer, * payload length represented as a float. * mode 3
: * URI length represented as an integer, * average directory length represented as an integer, * extension of the requested file, * average value length represented as an integer, * order of headers. * mode 4
: * URI length represented as a float, * number of directories, * average directory length represented as a float, * extension of the requested file, * variable length represented as a float, * average value length represented as a float, * request method, * version of protocol, * order of headers, * popular headers and their values, * presence of non-ASCII characters, * payload entropy represented as a float, * payload length represented as a float.
Invisible protocol sniffer for finding vulnerabilities in the network. Designed for pentesters and security engineers.
Above: Invisible network protocol sniffer
Designed for pentesters and security engineers
Author: Magama Bazarov, <caster@exploit.org>
Pseudonym: Caster
Version: 2.6
Codename: Introvert
All information contained in this repository is provided for educational and research purposes only. The author is not responsible for any illegal use of this tool.
It is a specialized network security tool that helps both pentesters and security professionals.
Above is a invisible network sniffer for finding vulnerabilities in network equipment. It is based entirely on network traffic analysis, so it does not make any noise on the air. He's invisible. Completely based on the Scapy library.
Above allows pentesters to automate the process of finding vulnerabilities in network hardware. Discovery protocols, dynamic routing, 802.1Q, ICS Protocols, FHRP, STP, LLMNR/NBT-NS, etc.
Detects up to 27 protocols:
MACSec (802.1X AE)
EAPOL (Checking 802.1X versions)
ARP (Passive ARP, Host Discovery)
CDP (Cisco Discovery Protocol)
DTP (Dynamic Trunking Protocol)
LLDP (Link Layer Discovery Protocol)
802.1Q Tags (VLAN)
S7COMM (Siemens)
OMRON
TACACS+ (Terminal Access Controller Access Control System Plus)
ModbusTCP
STP (Spanning Tree Protocol)
OSPF (Open Shortest Path First)
EIGRP (Enhanced Interior Gateway Routing Protocol)
BGP (Border Gateway Protocol)
VRRP (Virtual Router Redundancy Protocol)
HSRP (Host Standby Redundancy Protocol)
GLBP (Gateway Load Balancing Protocol)
IGMP (Internet Group Management Protocol)
LLMNR (Link Local Multicast Name Resolution)
NBT-NS (NetBIOS Name Service)
MDNS (Multicast DNS)
DHCP (Dynamic Host Configuration Protocol)
DHCPv6 (Dynamic Host Configuration Protocol v6)
ICMPv6 (Internet Control Message Protocol v6)
SSDP (Simple Service Discovery Protocol)
MNDP (MikroTik Neighbor Discovery Protocol)
Above works in two modes:
The tool is very simple in its operation and is driven by arguments:
.pcap
as input and looks for protocols in it.pcap
file, its name you specify yourselfusage: above.py [-h] [--interface INTERFACE] [--timer TIMER] [--output OUTPUT] [--input INPUT] [--passive-arp]
options:
-h, --help show this help message and exit
--interface INTERFACE
Interface for traffic listening
--timer TIMER Time in seconds to capture packets, if not set capture runs indefinitely
--output OUTPUT File name where the traffic will be recorded
--input INPUT File name of the traffic dump
--passive-arp Passive ARP (Host Discovery)
The information obtained will be useful not only to the pentester, but also to the security engineer, he will know what he needs to pay attention to.
When Above detects a protocol, it outputs the necessary information to indicate the attack vector or security issue:
Impact: What kind of attack can be performed on this protocol;
Tools: What tool can be used to launch an attack;
Technical information: Required information for the pentester, sender MAC/IP addresses, FHRP group IDs, OSPF/EIGRP domains, etc.
Mitigation: Recommendations for fixing the security problems
Source/Destination Addresses: For protocols, Above displays information about the source and destination MAC addresses and IP addresses
You can install Above directly from the Kali Linux repositories
caster@kali:~$ sudo apt update && sudo apt install above
Or...
caster@kali:~$ sudo apt-get install python3-scapy python3-colorama python3-setuptools
caster@kali:~$ git clone https://github.com/casterbyte/Above
caster@kali:~$ cd Above/
caster@kali:~/Above$ sudo python3 setup.py install
# Install python3 first
brew install python3
# Then install required dependencies
sudo pip3 install scapy colorama setuptools
# Clone the repo
git clone https://github.com/casterbyte/Above
cd Above/
sudo python3 setup.py install
Don't forget to deactivate your firewall on macOS!
Above requires root access for sniffing
Above can be run with or without a timer:
caster@kali:~$ sudo above --interface eth0 --timer 120
To stop traffic sniffing, press CTRL + ะก
WARNING! Above is not designed to work with tunnel interfaces (L3) due to the use of filters for L2 protocols. Tool on tunneled L3 interfaces may not work properly.
Example:
caster@kali:~$ sudo above --interface eth0 --timer 120
-----------------------------------------------------------------------------------------
[+] Start sniffing...
[*] After the protocol is detected - all necessary information about it will be displayed
--------------------------------------------------
[+] Detected SSDP Packet
[*] Attack Impact: Potential for UPnP Device Exploitation
[*] Tools: evil-ssdp
[*] SSDP Source IP: 192.168.0.251
[*] SSDP Source MAC: 02:10:de:64:f2:34
[*] Mitigation: Ensure UPnP is disabled on all devices unless absolutely necessary, monitor UPnP traffic
--------------------------------------------------
[+] Detected MDNS Packet
[*] Attack Impact: MDNS Spoofing, Credentials Interception
[*] Tools: Responder
[*] MDNS Spoofing works specifically against Windows machines
[*] You cannot get NetNTLMv2-SSP from Apple devices
[*] MDNS Speaker IP: fe80::183f:301c:27bd:543
[*] MDNS Speaker MAC: 02:10:de:64:f2:34
[*] Mitigation: Filter MDNS traffic. Be careful with MDNS filtering
--------------------------------------------------
If you need to record the sniffed traffic, use the --output
argument
caster@kali:~$ sudo above --interface eth0 --timer 120 --output above.pcap
If you interrupt the tool with CTRL+C, the traffic is still written to the file
If you already have some recorded traffic, you can use the --input
argument to look for potential security issues
caster@kali:~$ above --input ospf-md5.cap
Example:
caster@kali:~$ sudo above --input ospf-md5.cap
[+] Analyzing pcap file...
--------------------------------------------------
[+] Detected OSPF Packet
[+] Attack Impact: Subnets Discovery, Blackhole, Evil Twin
[*] Tools: Loki, Scapy, FRRouting
[*] OSPF Area ID: 0.0.0.0
[*] OSPF Neighbor IP: 10.0.0.1
[*] OSPF Neighbor MAC: 00:0c:29:dd:4c:54
[!] Authentication: MD5
[*] Tools for bruteforce: Ettercap, John the Ripper
[*] OSPF Key ID: 1
[*] Mitigation: Enable passive interfaces, use authentication
--------------------------------------------------
[+] Detected OSPF Packet
[+] Attack Impact: Subnets Discovery, Blackhole, Evil Twin
[*] Tools: Loki, Scapy, FRRouting
[*] OSPF Area ID: 0.0.0.0
[*] OSPF Neighbor IP: 192.168.0.2
[*] OSPF Neighbor MAC: 00:0c:29:43:7b:fb
[!] Authentication: MD5
[*] Tools for bruteforce: Ettercap, John the Ripper
[*] OSPF Key ID: 1
[*] Mitigation: Enable passive interfaces, use authentication
The tool can detect hosts without noise in the air by processing ARP frames in passive mode
caster@kali:~$ sudo above --interface eth0 --passive-arp --timer 10
[+] Host discovery using Passive ARP
--------------------------------------------------
[+] Detected ARP Reply
[*] ARP Reply for IP: 192.168.1.88
[*] MAC Address: 00:00:0c:07:ac:c8
--------------------------------------------------
[+] Detected ARP Reply
[*] ARP Reply for IP: 192.168.1.40
[*] MAC Address: 00:0c:29:c5:82:81
--------------------------------------------------
I wrote this tool because of the track "A View From Above (Remix)" by KOAN Sound. This track was everything to me when I was working on this sniffer.
NETWORK Pcap File Analysis, It was developed to speed up the processes of SOC Analysts during analysis
Tested
OK Debian
OK Ubuntu
$ pip install pyshark
$ pip install dpkt
$ Wireshark
$ Tshark
$ Mergecap
$ Ngrep
$ https://github.com/emrekybs/Bryobio.git
$ cd Bryobio
$ chmod +x bryobio.py
$ python3 bryobio.py