Categories
Admin Firewall

The IT Detective Agency: large packets dropped by firewall, but logs show OK

Intro
All of a sudden one day I could not access the GUI of one my security appliances. It had only worked yesterday. CLI access kind of worked – until it didn’t. It was the standby part of a cluster so I tried the active unit. Same issues. I have some ill-defined involvement with the firewall the traffic was traversing, so I tried to debug the problem without success. So I brought in a real firewall expert.

More details
Of course I knew to check the firewall logs. Well, they showed this traffic (https and ssh) to have been accepted, no problems. Hmm. I suspected some weird IPS thing. IPS is kind of a big black box to me as I don’t deal with it. But I have seen cases where it blocks traffic without logging the fact. But that concern led me to bring in the expert.

By myself I had gotten it to the point where I had done tcpdump (I had totally forgotten how to use fw monitor. Now I will know and refer to my own recent blog post) on the corporate network side as well as the protected subnet side. And I saw that packets were hitting the corporate network interface that weren’t crossing over to the protected subnet. Why? But first some more about the symptoms.

The strange behaviour of my ssh session
The web GUI just would not load the home page. But ssh was a little different. I could indeed log in. But my ssh froze every time I changed to the /var/log directory and did a detailed directory listing ls -l. The beginning of the file listing would come back, and then just hang there mid-stream, frozen. In my tcpdump I noticed that the packets that did not get through were larger than the ones sent in the beginning of the session – by a lot. 1494 data bytes or something like that. So I could kind of see that with ssh, you normally send smallish packets, until you need a bigger one for something like a detailed directory listing! And https sends a large server certificate at the beginning of the session so it makes sense that it would hang if those packets were being stopped. So the observed behaviour makes sense in light of the dropping of the large packets. But that doesn’t explain why.

I asked a colleague to try it and they got similar results.

The solution method
It had nothing to do with IPS. The firewall guy noticed and did several things.

  • He agreed the firewall logs showed my connection being accepted.
  • He saw that another firewall admin had installed policy around the time the problem began. We analyzed what was changed and concluded that was a false lead. No way those changes could have caused this problem.
  • He switched the active firewall to standby so that we used the standby unit. It worked just fine!
  • He observed that the current active unit became active around the time of the problem, due to a problem with an interface on the normally active unit.

I probably would have been fine to just work using the standby but I didn’t want to crimp his style, so he continued in investigating…and found the ultimate root cause.

And finally the solution
He noticed that on the bad firewall the one interface – I swear I am not making this up – had been configured with a non-standard MTU! 1420 instead of 1500.

Analysis
I did a head slap when he shared that finding. Of course I should have looked for that. It explains everything. The OS was dropping the packet, not the firewall blade per se. And I knew the history. Some years back these firewalls were used for testing OLTV, a tunneling technology to extend layer 2 across physically separated subnets. That never did work to my satisfaction. One of the issues we encountered was a problem with large packets. So the firewall guy at the time tried this out to help. Normally firewalls don’t fail so the one unit where this MTU setting was present just wasn’t really used, except for brief moments during OS upgrade. And, funny to say, this mis-configuration was even propagated from older hardware to newer! The firewall guys have a procedure where they suck up all the configuration from the old firewall and restore to the newer one, mapping updated interface names, etc, as needed.

Well, at least we found it before too many others complained. Though, as expected, complain they did, the next day.

Aside: where is curl?
I normally would have tested the web page from the firewall iself using curl. But curl has disappeared from Gaia v 80.20. And there’s no wget either. How can such a useful and universal utility be missing? The firewall guy looked it up and quickly found that instead of curl, they have curl_cli. Who knew?

Conclusion
The strange case of the large packets dropped by a firewall, but not by the firewall blade, was resolved the same day it occurred. It took a partner ship of two people bringing their domain-specific knowledge to bear on the problem to arrive at the solution.

Categories
Admin Firewall Network Technologies Security

Linux shell script to cut a packet trace every 10 minutes on Checkpoint firewall

Intro
Scripts are normally not worth sharing because they are so easy to construct. This one illustrates several different concepts so may be of interest to someone else besides myself:

  • packet trace utility in Checkpoint firewall Gaia
  • send Ctrl-C interrupt to a process which has been run in the background
  • giving unqieu filenames for each cut
  • general approach to tacklnig the challenge of breaking a potentially large output into manageable chunks

The script
I wanted to learn about unexpected VPN client disconnects that a user, Sandy, was experiencing. Her external IP is 99.221.205.103.

while /bin/true; do
# date +%H%M inserts the current Hour (HH) and minute (MM).
 file=/tmp/sandy`date +%H%M`.cap
# fw monitor is better than tcpdump because it looks at all interfaces
 fw monitor -o $file -l 60 -e "accept src=99.221.205.103 or dst=99.221.205.103;" &
# $! picks up the process number of the command we backgrounded just above
 pid=$!
 sleep 600
 #sleep 90
 kill $pid
 sleep 3
 gzip $file
done

This type of tracing of this VPN session produces about 20 MB of data every 10 minutes. I want to be able to easily share the trace file afterwards in an email. And smaller files will be faster when analyzed in Wireshark.

The script itself I run in the background:
# ./sandy.sh &
And to make sure I don’t get logged out, I just run a slow PING afterwards:
# ping ‐i45 1.1.1.1

Alternate approaches
In retrospect I could have simply used the -ci argument and had the process terminate itself after a certain number of packets were recorded, and saved myself the effort of killing that process. But oh well, it is what it is.

Small tip to see all packets
Turn acceleration off:
fwaccel stat
fwaccel off
fwaccel on (when you’re done).

Conclusion
I share a script I wrote today that is simple, but illustrates several useful concepts.

References and related
fw monitor cheat sheet.

The standard packet analyzer everyone uses is Wireshark from https://wireshark.org/