Parsing firewall logs with FilterX

Your SIEM is only as good as the data it ingests. But when firewall logs from major vendors like FortiGate, Palo Alto, and SonicWall arrive incomplete, inconsistent, or malformed, most syslog pipelines struggle to keep up.

FilterX, the open-source parsing engine behind AxoSyslog, solves this problem at its core—providing a flexible, scalable way to normalize, enrich, and classify logs before they reach your SIEM. Whether you’re managing your own pipeline or scaling with Axoflow, FilterX ensures your security data is clean, consistent, and full of context—every time.

The Common Logging Dilemma

Traditional syslog infrastructures face tricky design trade-offs:

Option 1:

Send all syslog data to the same port (usually port 514) and sort incoming messages by sender.
Sounds simple, but it’s not:

Many endpoints relay their logs, which can easily obscure the original sender if relays aren’t correctly configured.
Worse, many commercial appliances—including firewalls—generate invalid syslog messages that are poorly formatted or missing key details like the hostname or timestamp.
This makes sender attribution difficult and forces complex, error-prone parsing later on.
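As a rough illustration of Option 1, here is a minimal sketch of a single-port setup that sorts messages by sender. The source and destination names, the file paths, and the choice of UDP are assumptions made for this example; only the hostname is taken from the sample messages later in this post.

```
# Sketch only: names and paths are hypothetical.
source s_all_syslog {
  network(transport("udp") port(514));
};

destination d_fortigate { file("/var/log/fortigate.log"); };
destination d_unsorted  { file("/var/log/unsorted.log"); };

log {
  source(s_all_syslog);
  if {
    # Matches only if the sender's hostname survived relaying and header parsing intact
    filterx { $HOST == "us-east-1-dc1-a-dmz-fw"; };
    destination(d_fortigate);
  } else {
    destination(d_unsorted);
  };
};
```

The weakness is visible in the condition: it relies entirely on $HOST, which is exactly the field that relays and malformed headers tend to get wrong.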

Option 2:

Segment syslog traffic by assigning similar devices to dedicated ports. This offers basic classification, but:

Every device must be manually configured to send logs to the correct port—an operational headache.
You still need to parse messages to clean up inconsistent formats and ensure correct attribution.
Misconfigured devices sending to the wrong port may have their logs silently dropped, with no alert or warning.
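Option 2 could look something like the sketch below, with one source per device type. The port numbers, names, and file paths are assumptions chosen for illustration, not recommended values.

```
# Sketch only: ports, names, and paths are hypothetical.
source s_fortigate { network(transport("udp") port(5514)); };
source s_paloalto  { network(transport("udp") port(5515)); };
source s_sonicwall { network(transport("udp") port(5516)); };

destination d_fortigate { file("/var/log/fortigate.log"); };
destination d_paloalto  { file("/var/log/paloalto.log"); };
destination d_sonicwall { file("/var/log/sonicwall.log"); };

# The port a message arrives on is the only classification here
log { source(s_fortigate); destination(d_fortigate); };
log { source(s_paloalto);  destination(d_paloalto); };
log { source(s_sonicwall); destination(d_sonicwall); };
```

In this layout nothing checks the content of the messages, so a device pointed at the wrong port ends up in the wrong destination without any warning.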

FilterX: Built for Modern Logging Challenges

FilterX addresses these problems head-on. It’s a powerful replacement for traditional syslog-ng filters, parsers, and rewrite rules, designed to filter, parse, manipulate, and restructure both simple and complex data.

Key features:

High-speed parsers for common log formats.
Deep integration with routing—route messages based on parsed content or classification results.
Exception handling and backtracking in classification trees, with minimal performance impact.
Native support for modern, nested formats like JSON and OpenTelemetry—far beyond what legacy tools like syslog-ng can handle.

If you’ve used syslog-ng’s old patterndb engine, think of FilterX as its high-performance, ultra-flexible successor—minus the XML headaches.

Examples

Let’s see some FilterX parsing and classification examples. If you don’t know FilterX yet, I recommend quickly checking our FilterX introduction blog or the FilterX documentation.

First, we’ll classify and parse messages for the three commercial firewalls that we’ve discussed in an earlier blog: FortiGate, Palo Alto, and SonicWall. Here’s a sample message for each, along with some message characteristics specific to that firewall that we can use to classify the message.

NOTE: A well-formed syslog message consists of a header and the message body, like this:

<priority>timestamp hostname application: message body with info
<-----------------header----------------><----message body----->

None of the following messages are well-formed syslog messages.

FortiGate parser

<165> us-east-1-dc1-a-dmz-fw date=2025-03-26 time=18:41:07Z devname=us-east-1-dc1-a-dmz-fw devid=FGT60D4614044725 logid=0100040704 type=event subtype=system level=notice vd=root logdesc="System performance statistics" action="perf-stats" cpu=2 mem=35 totalsession=61 disk=2 bandwidth=158/138 setuprate=2 disklograte=0 fazlograte=0 msg="Performance statistics: average CPU: 2, memory: 35, concurrent sessions: 61, setup-rate: 2"

This message begins with an incomplete syslog header (it has only <priority> and the hostname), followed by space-separated key=value pairs. These repeat the hostname in the devname field and include a devid field followed by a logid field. Let’s see what a FilterX block for this looks like:

block log parse_fortigate() {
  filterx {
    # Check that the message contains the devid= and logid= strings
    includes($MSG, "devid=") and includes($MSG, "logid=");

    # Parse the $MSG part of the message as key-value pairs into the key_values object
    declare key_values = parse_kv($MSG);

    # Verify that the devid field was present
    key_values.devid;

    # Set the hostname syslog field to the value of the devname field
    $HOST = key_values.devname;

    # Successfully classified a fortigate message
    $VENDOR = "fortinet";
    $PRODUCT = "fortigate";

    # Set variables for Splunk sourcetype and index based on the value of the type field
    switch (key_values.type) {
      case "event":
        $SPLUNK_SOURCETYPE = "fortigate_event";
        $SPLUNK_INDEX = "netops";
        break;
      case "traffic":
        $SPLUNK_SOURCETYPE = "fortigate_traffic";
        $SPLUNK_INDEX = "netfw";
        break;
      case "utm":
        $SPLUNK_SOURCETYPE = "fortigate_utm";
        $SPLUNK_INDEX = "netfw";
        break;
      case "anomaly":
        $SPLUNK_SOURCETYPE = "fortigate_anomaly";
        $SPLUNK_INDEX = "netfw";
        break;
      default:
        $SPLUNK_SOURCETYPE = "fortigate_event";
        $SPLUNK_INDEX = "netops";
        break;
    };
  };
};

Note: The example above sets the Splunk sourcetype and index based on the content from the message.

Palo Alto firewall parser

Here is a sample TRAFFIC log from a Palo Alto firewall.

<165>Mar 26 18:41:06 us-east-1-dc1-b-edge-fw 1,2025/03/26 18:41:06,007200001056,TRAFFIC,end,1,2025/03/26 18:41:06,192.168.41.30,192.168.41.255,10.193.16.193,192.168.41.255,allow-all,,,netbios-ns,vsys1,Trust,Untrust,ethernet1/1,ethernet1/2,To-Panorama,2025/03/26 18:41:06,8720,1,137,137,11637,137,0x400000,udp,allow,276,276,0,3,2025/03/26 18:41:06,2,any,0,2345136,0x0,192.168.0.0-192.168.255.255,192.168.0.0-192.168.255.255,0,3,0

Palo Alto messages get most of the header right (<priority>timestamp hostname, here <165>Mar 26 18:41:06 us-east-1-dc1-b-edge-fw), but they omit the name of the application and put a long list of comma-separated values into the message body. The body begins with a version number (1), followed by a timestamp and a serial number.

block log parse_palo_alto() {
  filterx {
    # Check that the message includes the "1," and ",TRAFFIC," strings
    includes($MSG, "1,") and includes($MSG, ",TRAFFIC,");

    # Names of the columns in TRAFFIC logs
    declare palo_alto_traffic_columns = ["future_use1", "received_time", "serial_number", "type", "log_subtype", "version", "generated_time", "src_ip", "dest_ip", "src_translated_ip", "dest_translated_ip", "rule", "src_user", "dest_user", "app", "vsys", "src_zone", "dest_zone", "src_interface", "dest_interface", "log_forwarding_profile", "future_use3", "session_id", "repeat_count", "src_port", "dest_port", "src_translated_port", "dest_translated_port", "session_flags", "protocol", "action", "bytes", "bytes_sent", "bytes_received", "packets", "start_time", "elapsed_time", "http_category", "future_use4", "sequence_number", "action_flags", "src_location", "dest_location", "future_use5", "packets_sent", "packets_received", "session_end_reason", "devicegroup_level1", "devicegroup_level2", "devicegroup_level3", "devicegroup_level4", "vsys_name", "dvc_name", "action_source", "src_uuid", "dst_uuid", "tunnelid_imsi", "monitortag_imei", "parent_session_id", "parent_start_time", "tunnel", "assoc_id", "chunks", "chunks_sent", "chunks_received", "rule_uuid", "http2_connection", "link_change_count", "policy_id", "link_switches", "sdwan_cluster", "sdwan_device_type", "sdwan_cluster_type", "sdwan_site", "dynusergroup_name", "xff_ip", "src_category", "src_profile", "src_model", "src_vendor", "src_osfamily", "src_osversion", "src_host", "src_mac", "dst_category", "dst_profile", "dst_model", "dst_vendor", "dst_osfamily", "dst_osversion", "dst_host", "dst_mac", "container_id", "pod_namespace", "pod_name", "src_edl", "dst_edl", "hostid", "client_serialnumber", "src_dag", "dst_dag", "session_owner", "high_res_timestamp", "nssai_sst", "nssai_sd", "subcategory_of_app", "category_of_app", "technology_of_app", "risk_of_app", "characteristic_of_app", "container_of_app", "tunneled_app", "is_saas_of_app", "sanctioned_state_of_app", "offloaded", "flow_type", "cluster_name"];
    
    # parse the entire line, columns are type specific
    key_values = parse_csv($RAWMSG, columns=palo_alto_traffic_columns);

    # Verify that the TYPE field contains TRAFFIC
    key_values.type == "TRAFFIC";

    # Successfully classified a palo alto message
    $VENDOR = "paloalto";
    $PRODUCT = "firewall";

    # Set variables for Splunk sourcetype and index based on the value of the type field
    switch (key_values.type) {
      case "TRAFFIC":
        $SPLUNK_SOURCETYPE = "pan:traffic";
        $SPLUNK_INDEX = "netfw";
        break;
      # Add other sourcetypes/index for other message types
    };
  };
};

Note that Palo Alto firewalls have several different types of log messages in addition to TRAFFIC logs (15+ altogether), and each type has its own list of columns. The list of columns also often changes between upgrades, so you need to check and update your parsers as needed.

SonicWall parser

<165> id=us-west-1-dc1-a-dmz-fw sn=C0EFE3336C80 time="2025-03-26 18:41:01" fw=192.168.1.239 pri=6 c=1024 gcat=6 m=537 msg="Connection Closed" srcMac=00:50:56:f5:50:27 src=10.237.228.74:54406:X20 srcZone=Trusted natSrc=192.168.1.239:38377 dstMac=00:1a:f0:8b:e0:18 dst=44.190.129.212:123:X2 dstZone=Untrusted natDst=44.190.129.212:123 proto=udp/ntp sent=152 rcvd=152 spkt=2 rpkt=2 cdur=30250 rule="22 (LAN->WAN)" n=490872197 fw_action="NA" dpi=0

A SonicWall message begins with a <priority> field, followed by one or two spaces and a long list of space-separated key=value pairs. The first field is id, which contains the hostname, followed by a serial number field (sn). A FilterX block for this format can follow the same pattern as the FortiGate parser.
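A minimal sketch of what a parse_sonicwall() block might look like, modeled on the FortiGate parser above. This is an assumption-laden illustration rather than the original parser: the sonicwall_fields variable name and the vendor, product, sourcetype, and index values are placeholders. It reads $RAWMSG instead of $MSG because these messages have almost no syslog header, which is why the source in the combined configuration sets the store-raw-message flag.

```
block log parse_sonicwall() {
  filterx {
    # Check that the raw message contains the id= and sn= strings
    includes($RAWMSG, "id=") and includes($RAWMSG, "sn=");

    # Parse the raw message as key=value pairs
    # (sonicwall_fields is a hypothetical variable name)
    sonicwall_fields = parse_kv($RAWMSG);

    # Verify that the sn field was present
    sonicwall_fields.sn;

    # Set the hostname syslog field to the value of the id field
    $HOST = sonicwall_fields.id;

    # Classified as a SonicWall message (placeholder values)
    $VENDOR = "sonicwall";
    $PRODUCT = "firewall";

    # Placeholder Splunk sourcetype and index; adjust to your environment
    $SPLUNK_SOURCETYPE = "sonicwall";
    $SPLUNK_INDEX = "netfw";
  };
};
```

As in the FortiGate block, each bare statement acts as a check: if the id= or sn= strings are missing, or the sn field cannot be read after parsing, the block fails and the message is not classified as SonicWall.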

Combining the blocks

Let’s create the other parts of the configuration that are needed to use the parsers we’ve created:

A block with if-elif conditionals to group the parsers.
Unlike traditional programming languages, FilterX was purposefully built for such use cases. As a result, these conditionals can be evaluated effectively, and having to backtrack when a parser doesn’t match (drop any failed or partial results and try the next conditional on the original message) has only a minor performance impact.
A generic network source to receive the messages.
A Splunk destination that will use the sourcetype and index set in the parsers, and
a log path to tie all these together.

@include "scl.conf"
block log parse() {
  # Conditionals to go through the custom parsers
  if   { parse_fortigate(); }
  elif { parse_palo_alto(); }
  elif { parse_sonicwall(); };
};
source s_network {
  default-network-drivers(
    flags(store-raw-message) # Needed to parse messages that are very much non-syslog compliant, like the Sonicwall messages
  );
};
destination d_splunk_hec_event {
  splunk-hec-event(
    url("https://localhost:8088")
    token("70b6ae71-76b3-4c38-9597-0c5b37ad9630")
    sourcetype($SPLUNK_SOURCETYPE)
    index($SPLUNK_INDEX)
    default-index("netops")
  );
};
log {
  source(s_network);
  parse();
  destination(d_splunk_hec_event);
};
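The if-elif chain in the parse() block has no final branch, so messages that match none of the parsers continue without a vendor or sourcetype. One possible extension, sketched here as an assumption rather than as part of the original configuration, is an else branch that labels such messages explicitly. The "unknown" and "syslog" values are placeholders, and the fragment reuses the parser blocks from this post.

```
block log parse() {
  # Conditionals to go through the custom parsers
  if   { parse_fortigate(); }
  elif { parse_palo_alto(); }
  elif { parse_sonicwall(); }
  else {
    # Fallback for messages that no parser classified (placeholder values)
    filterx {
      $VENDOR = "unknown";
      $SPLUNK_SOURCETYPE = "syslog";
      $SPLUNK_INDEX = "netops";
    };
  };
};
```

Labeling unclassified messages this way makes them easy to find in the SIEM, which helps when a new device type or a changed message format starts slipping past the existing parsers.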

Are we done yet?

I think the above parsing examples highlight a few important things:

With the right tool (for example, AxoSyslog), message parsing and classification can be done effectively.
However, even the best tool requires significant know-how about:
- the tool itself,
- the specific log messages you need to process, and
- the parsers you’ve already created, so that you can avoid overlaps and misclassification of similar messages and keep the parsers maintained (message formats might change after a product upgrade).
Creating and maintaining the parsers for many applications or devices is a complex and time-consuming effort, especially for high-volume message sources that need performance optimizations (like firewalls).

If you don’t have the resources to tackle the problem, an alternative is to outsource these efforts to a product dedicated to managing your security data pipeline that includes optimized message parsing for over a hundred ubiquitous products, like Axoflow Platform. That way you solve your data parsing issues, and can also:

Deliver high-quality, optimized data to your SIEM
Gain insight into the status of your data pipeline to avoid losing data
Remove noise and redundant data to cut your SIEM and storage costs

Summary

Parsing messages and other security data is still an ongoing issue for organizations. The open source FilterX data processing engine of AxoSyslog provides an effective way to parse and classify your data, and is a great choice if you want to build a classification database. However, creating and maintaining such a database requires significant time, effort, and know-how. If your organization prefers, you can outsource this to Axoflow Platform, our security data curation pipeline.

About DT Asia

DT Asia began in 2007 with a clear mission to build the market entry for various pioneering IT security solutions from the US, Europe and Israel.

Today, DT Asia is a regional, value-added distributor of cybersecurity solutions providing cutting-edge technologies to key government organisations and top private sector clients including global banks and Fortune 500 companies. We have offices and partners around the Asia Pacific to better understand the markets and deliver localised solutions.

How we help

If you need to know more about parsing firewall logs with FilterX, you’re in the right place, and we’re here to help. DTA is Axoflow’s distributor, especially in Singapore and Asia. Our technicians have deep experience with the product and the relevant technologies, and we provide turnkey solutions for it, including consultation, deployment, and maintenance services.

To learn more, visit: https://dtasiagroup.com/axoflow/