PCAP Analysis with Zeek | Digital Forensics and Incident Response
Introduction
Zeek (formerly Bro) is a network analysis tool that enables high-level PCAP analysis at the application layer. I have mostly done my packet capture analysis in Wireshark, and while Wireshark is still my number one tool for PCAP analysis, Zeek was a great find for me. Zeek is well suited to automated analysis and to quickly zeroing in on information of interest. This post provides a quick introduction to Zeek and its capabilities.
We will be using a sample PCAP in this post. Grab a sample PCAP file here.
Obtaining Zeek log files
Zeek produces several .log files pertaining to various types of information contained in the PCAP. To generate these log files, feed the PCAP to Zeek:
zeek -r <pcap>
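The -r option reads packets from a PCAP file for offline analysis, whereas -i <interface> captures live traffic from a network interface. (The -w option writes the processed packets to a PCAP file; it does not start a live capture.)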
Depending on the size of the PCAP, this could take a while. When done, Zeek creates the following log files (depending on the type of traffic discovered):
- dns.log
- http.log
- ssl.log
- dhcp.log
- (etc.)
The format of these log files is largely self-explanatory: the column names indicate the information each column contains. Columns are tab-separated and are described in the Zeek docs.
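To make the column layout concrete, here is a minimal Python sketch that reads the header of a log in Zeek's default tab-separated format and pairs each column name with its type. The function name read_zeek_header and the sample rows are my own illustration, not part of Zeek, and the sketch assumes the default TSV writer (not JSON output).

```python
import io


def read_zeek_header(lines):
    """Return (field_names, field_types) from the header of a Zeek TSV log."""
    separator = "\t"
    fields, types = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("#"):
            break  # first data row reached, so the header is over
        if line.startswith("#separator"):
            # The separator is stored as an escape sequence such as \x09 (tab).
            raw = line.split(" ", 1)[1]
            separator = raw.encode().decode("unicode_escape")
        elif line.startswith("#fields"):
            fields = line.split(separator)[1:]
        elif line.startswith("#types"):
            types = line.split(separator)[1:]
    return fields, types


# Made-up sample data shaped like the top of a dns.log file.
sample = (
    "#separator \\x09\n"
    "#set_separator\t,\n"
    "#empty_field\t(empty)\n"
    "#unset_field\t-\n"
    "#path\tdns\n"
    "#fields\tts\tuid\tid.orig_h\tquery\n"
    "#types\ttime\tstring\taddr\tstring\n"
    "1600000000.000000\tCabc123\t10.0.0.5\texample.com\n"
)

fields, types = read_zeek_header(io.StringIO(sample))
for name, kind in zip(fields, types):
    print(f"{name}: {kind}")
```

With a real log, pass an open file handle (for example open('dns.log')) instead of the StringIO sample.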
Parsing Zeek logs with zeek-cut
zeek-cut is a utility that ships with Zeek and extracts selected columns from the Zeek *.log files. I usually pipe zeek-cut output to grep and awk and/or export data in CSV format. Some examples:
zeek-cut -u ts method host uri < http.log | grep "<string>" | awk '{print $1}'
zeek-cut -F ',' -u ts method host uri < http.log | grep "<string>" | awk -F ',' '{print $3}'
cat conn.log | zeek-cut id.orig_h id.orig_p id.resp_h id.resp_p > temp.txt
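The column selection that zeek-cut performs can also be approximated in Python when a CSV export is the end goal. The sketch below is not how zeek-cut is implemented; cut_to_csv is a hypothetical helper, the sample rows are made up, and it assumes the default tab separator. It picks the requested columns by name and renders ts as a UTC timestamp, roughly what the -u flag does.

```python
import csv
import io
from datetime import datetime, timezone


def cut_to_csv(lines, columns, out):
    """Write the chosen columns of a Zeek TSV log to `out` as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    index = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Map the requested column names to their positions.
            names = line.split("\t")[1:]
            index = [names.index(col) for col in columns]
            writer.writerow(columns)
        elif line.startswith("#") or not line or index is None:
            continue  # skip other header lines, blanks, and the #close footer
        else:
            values = line.split("\t")
            row = []
            for col, i in zip(columns, index):
                value = values[i]
                if col == "ts":
                    # Convert epoch seconds to a readable UTC timestamp.
                    stamp = datetime.fromtimestamp(float(value), tz=timezone.utc)
                    value = stamp.strftime("%Y-%m-%dT%H:%M:%SZ")
                row.append(value)
            writer.writerow(row)


# Made-up sample shaped like http.log (only a few of its columns).
sample = (
    "#separator \\x09\n"
    "#fields\tts\tuid\tmethod\thost\turi\n"
    "#types\ttime\tstring\tstring\tstring\tstring\n"
    "1600000000.000000\tCabc123\tGET\texample.com\t/index.html\n"
    "#close\t2020-09-13-12-30-00\n"
)

out = io.StringIO()
cut_to_csv(io.StringIO(sample), ["ts", "method", "host"], out)
print(out.getvalue(), end="")
```

Using the csv module rather than joining on commas means a URI or host containing a comma is quoted correctly in the output.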
Analyzing information in Zeek log files using ZAT
An alternative to manually converting Zeek log files to CSV format with zeek-cut, as described above, is the Zeek Analysis Toolkit (ZAT). ZAT automates the process of turning Zeek log files into Pandas dataframes. Some familiarity with Pandas is needed, but once you know the basics of dataframe manipulation, gleaning information from the log files becomes straightforward. To begin, let's load the zat module and read a Zeek log file into a dataframe:
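import pandas as pd
from zat.log_to_dataframe import LogToDataFrame
# Parse dns.log into a Pandas dataframe
log_to_df = LogToDataFrame()
zeek_df = log_to_df.create_dataframe('dns.log')
# Show every column when the dataframe is displayed
pd.set_option('display.max_columns', None)
zeek_df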
Since the information is now contained in a convenient dataframe, we can write queries to better understand the logs. Some examples are provided below.
zeek_df['query'].value_counts()
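A few more queries in the same spirit are sketched below. To keep the snippet runnable without a PCAP, it builds a small hypothetical dataframe by hand instead of loading dns.log; the column names (id.orig_h, query, qtype_name) follow Zeek's dns.log, but the rows and the 30-character threshold are made up for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for a dataframe loaded from dns.log.
dns_df = pd.DataFrame(
    {
        "id.orig_h": ["10.0.0.5", "10.0.0.5", "10.0.0.9", "10.0.0.9"],
        "query": [
            "example.com",
            "example.com",
            "a1b2c3d4e5f6a7b8c9d0.example.net",
            "example.org",
        ],
        "qtype_name": ["A", "A", "TXT", "A"],
    }
)

# Most frequently queried names.
top_queries = dns_df["query"].value_counts()

# Number of distinct names requested by each client.
names_per_client = dns_df.groupby("id.orig_h")["query"].nunique()

# Unusually long names, which can be worth a second look.
# The 30-character cutoff is arbitrary.
long_queries = dns_df[dns_df["query"].str.len() > 30]

print(top_queries)
print(names_per_client)
print(long_queries[["id.orig_h", "query", "qtype_name"]])
```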
Automated anomaly detection in Zeek logs
I tried using this tool, which relies on PyOD to detect outliers in multivariate data within the conn.log file. However, the tool is not well documented yet, and in my opinion it's better to write our own scripts to run anomaly detection models on Zeek logs for better control and comprehension of the process and results.
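As a rough idea of what a home-grown script might look like, here is a minimal sketch that flags outliers in a single conn.log column. It does not use PyOD or any multivariate model; it swaps in a much simpler univariate technique, the modified z-score based on the median absolute deviation, with the commonly used cutoff of 3.5. The orig_bytes values are made up, and the function name is my own.

```python
from statistics import median


def modified_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values sit on the median, so nothing stands out.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


# Made-up orig_bytes values, as they might appear in conn.log.
orig_bytes = [420, 515, 380, 470, 450, 98000, 505]

THRESHOLD = 3.5  # common rule of thumb for the modified z-score
scores = modified_z_scores(orig_bytes)
outliers = [v for v, s in zip(orig_bytes, scores) if abs(s) > THRESHOLD]
print(outliers)
```

The median-based score is used here instead of a plain mean and standard deviation because a single huge transfer would otherwise inflate the standard deviation and hide itself.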
Note: My .ipynb notebook covering some of the examples above is available here.