Categories
Admin Linux Network Technologies Web Site Technologies

The IT Detective Agency: This site can’t be reached

Intro

It’s been a while since I’ve had the opportunity to relate an IT mystery. After a while they are repeats of what’s already happened in the past, or they’re too complex to relate, or I was only peripherally involved. But today I came across a good one. It falls into the never-been-seen-before category.

The details

A web server behind my web application firewall (WAF) became unreachable. In the browser, users got the message This site can’t be reached. The app owners came to me looking for input. I checked the WAF and it was fine. The virtual server was looking healthy. So I took a packet trace, something to this effect:

$ tcpdump -nni 0.0 host 192.168.2.124

14:00:45.180349 IP 192.68.1.13.42045 > 192.68.2.124.443: Flags [S], seq 1106553901, win 23360, options [mss 1460,sackOK,TS val 3715803515 ecr 0], length 0 out slot1/tmm3 lis=/Common/was90extqa.drjohn.com.app/was90extqa.drjohn.com_vs port=0.53 trunk=
14:00:45.181081 IP 192.68.2.124 > 192.68.1.13: ICMP host 192.68.2.124 unreachable - admin prohibited filter, length 64 in slot1/tmm2 lis= port=0.47 trunk=
14:00:45.181239 IP 192.68.1.13.42045 > 192.68.2.124.443: Flags [R.], seq 1106553902, ack 0, win 0, length 0 out slot1/tmm3 lis=/Common/was90extqa.drjohn.com.app/was90extqa.drjohn.com port=0.53 trunk=

I’ve never seen that before, ICMP host 192.68.2.124 unreachable – admin prohibited filter. But I know ICMP can be used to relay out-of-band routing information on occasion, though I do not see it often. I suspect it is a BAD THING and forces the connection to be shut down. Question is, where was it coming from?

The communication is via a firewall so I check the firewall. I see a little more traffic so I narrow the filter down:

$ tcpdump -nni 0.0 host 192.168.2.124 and port 443

And then I only see the initial SYN packet followed by the RST – from the same source IP! So since I didn’t see the bad ICMP packet on the firewall, but I do see it on the WAF, I preliminarily conclude the problem exists on the WAF.

Rookie mistake! Did you fall for it? So very, very often, in the heat of debugging, we invent some unit test which we’ve never done before, and we have to be satisfied with the uncertainty in the testing method and hope to find a control test somehow, somewhere to validate our new unit test.

Although I very commonly do compound filters, in this case it makes no sense, as I realized a few minutes later. My port 443 filter would of course exclude the bad ICMP packets because ICMP does not use TCP port 443! So I took that out and re-ran it. Yup. The bad ICMP packet was still present on the firewall, even on the interface of the firewall directly connected to the server.
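One way to keep a compound filter without losing the ICMP errors is to widen it explicitly. What follows is only a sketch of how I might phrase it, not the command I ran that day. build_capture_filter is a made-up helper name, and 0.0 is simply the interface name used in the traces above.

```shell
#!/bin/bash
# Build a pcap filter that keeps the TCP conversation AND any ICMP
# errors involving the same host.
# build_capture_filter is a hypothetical helper, not part of tcpdump.
build_capture_filter() {   # usage: build_capture_filter <host-ip> <tcp-port>
  echo "host $1 and (tcp port $2 or icmp)"
}

# Print the command rather than run it, since tcpdump needs privileges.
# The single quotes protect the parentheses from the shell.
echo tcpdump -nni 0.0 "'$(build_capture_filter 192.168.2.124 443)'"
```

The parentheses matter: without them the "or icmp" part would match ICMP traffic from every host, not just the server in question.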

So at this point I have proven to my satisfaction that this packet, which is ruining the communication, really comes from the server.

What the server guys say

Server support is outsourced. The vendor replies

As far as the patching activities go , there is nothing changed to the server except distro upgrading from 15.2 to 15.3. no other configs were changed. This is a regular procedure executed on almost all 15.2 servers in your environment. No other complains received so far…

So, the usual It’s not us, look somewhere else. So the app owner asks me for further guidance. I find it’s helpful to create a test that will convince the other party of the error with their service. And what is one test I would have liked to have seen but didn’t conduct? A packet trace on the server itself. So I write

I would suggest they (or you) do a packet trace on the server itself to prove to themselves that this server is not behaving in an acceptable way, network-wise, if they see that same ICMP packet which I see.

The resolution

This kind of thing can often come to a stand-off, or many days can be wasted as an issue gets escalated to sufficiently competent technicians. In this case it wasn’t so bad. A few hours later the app owners wrote and mentioned that the home-grown local firewall seemed suspect to them. They disabled it and this traffic began to work.

They are reaching out to the vendor to understand what may have happened.

Case: closed!

Conclusion

An IT mystery was resolved today – something we’ve never seen but were able to diagnose and overcome. We learned it’s sometimes a good thing to throw a wider net when seeing unexpected reset packets because maybe just maybe there is an ICMP host unreachable packet somewhere in the mix.

Most firewalls would just drop packets and you wait for a timeout. But this was a homegrown firewall running on SLES 15. So it abides by its own ways of working, I guess. So because of the RST, your connection closes quickly, not timing out as with a normal network firewall.

As always, one has to maintain an open mind as to the true source of an issue. What was working yesterday does not today. No one admits to changing anything. Finding clever ad hoc unit tests is the way forward, and don’t forget to validate the ad hoc test. We use curl a lot for these kinds of tests. A browser is a complex beast and too much of a black box.
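Since curl is the tool we reach for, here is a rough sketch of how its exit code can hint at which kind of failure you are looking at. The exit codes 6, 7 and 28 are curl's own documented codes; the interpretations attached to them are my reading, and explain_curl_exit is a made-up helper name.

```shell
#!/bin/bash
# Map a curl exit code to a rough network diagnosis.
# explain_curl_exit is a hypothetical helper; the wording is an interpretation.
explain_curl_exit() {   # usage: explain_curl_exit <exit-code>
  case "$1" in
    0)  echo "connected" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "connect failed quickly (refused, reset or unreachable)" ;;
    28) echo "timed out (packets probably dropped silently)" ;;
    *)  echo "other curl error $1" ;;
  esac
}

# Hypothetical usage against the server from this story:
#   curl -sS -o /dev/null --connect-timeout 5 https://was90extqa.drjohn.com/
#   explain_curl_exit $?
explain_curl_exit 7
explain_curl_exit 28
```

A quick failure points toward something actively rejecting the connection, as happened here, while a timeout is more typical of a firewall that silently drops packets.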

Categories
Admin Network Technologies TCP/IP

Verizon Airspeed Hotspot uses ipv6 and interferes with VPN client Global Protect

Intro

The headline says it all. I got my shiny brand new Verizon hotspot from Walmart. I managed to activate it and add it to my Verizon account (not super easy, but after a few stumbles it did work.) I tried it out on my home PC – works fine. I tried it out on my work PC. No good. My Global Protect connection was unstable. It connects for about a minute, then disconnects, then connects, etc. Basically unusable.

The details

I have heard of a possible problem with the GP client (version 5.2.11) and IPv6. So I looked to see if this hotspot could be handing out IPv6 info. Yes. It is. But is that really making a difference? I concocted a simple test. I disabled IPv6 on my Wi-Fi adapter, then re-tested the GP client. The connection was smooth as glass! No disconnects!

Disable ipv6 on your Wi-Fi adapter

Bring up a powershell as administrator. Then:

get-netadapterbinding -componentid ms_tcpip6

will show you the current state of ipv6 on your adapters.

disable-netadapterbinding -Name "Wi-Fi" -ComponentID ms_tcpip6

will disable ipv6 on your Wi-Fi. And

enable-netadapterbinding -Name "Wi-Fi" -ComponentID ms_tcpip6

will re-enable it.

ipconfig /all output

For the record, here are some interesting bits from running ipconfig /all:

Wireless LAN adapter Wi-Fi:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel(R) Dual Band Wireless-AC 8265
Physical Address. . . . . . . . . : 0C-BD-94-98-11-5B
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
Temporary IPv6 Address. . . . . . : 2600:1001:b004:2b78:8ab:145c:d014:2edd(Deprecated)
IPv6 Address. . . . . . . . . . . : 2600:1001:b004:2b78:2cc0:71b0:7f1e:a973(Deprecated)
Link-local IPv6 Address . . . . . : fe80::2cc0:71b0:7f1e:a973%30(Preferred)
IPv4 Address. . . . . . . . . . . : 192.168.1.103(Preferred)

Subnet Mask . . . . . . . . . . . : 255.255.255.0
Lease Obtained. . . . . . . . . . : Thursday, April 21, 2022 4:54:04 PM
Lease Expires . . . . . . . . . . : Friday, April 22, 2022 4:54:04 AM
Default Gateway . . . . . . . . . : 192.168.1.1
DHCP Server . . . . . . . . . . . : 192.168.1.1
DHCPv6 IAID . . . . . . . . . . . : 302832932
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-28-89-F6-8E-B0-5C-DA-E6-09-0A
DNS Servers . . . . . . . . . . . : fe80::50ae:caff:fea8:1dbc%30
192.168.1.1
NetBIOS over Tcpip. . . . . . . . : Enabled

But, having done all that, I can only occasionally connect to GP. It seems to work slightly better at night. IPv6 does not seem to be the sole hiccup. No idea what the recipe for reliable success is. If I ever learn it I will publish it. Meanwhile, my phone’s hotspot, also Verizon, also handing out IPv6 info, usually permits me to connect to GP. It’s hard to see the difference.

Conclusion

The Verizon Airspeed Hotspot sends out a mix of IPv6 and IPv4 info to DHCP clients. Palo Alto Networks’ Global Protect client does not play well with that setup and will not have a stable connection.

I do not think there is a way to disable IPv6 on the hotspot. However, for those with admin access it can be disabled on a Windows PC. And then GP will work just fine. Or not.

Oh, and by the way, otherwise the Airspeed works well and is an adequate solution where you need a good reliable hotspot. Well, in fact, don’t expect reliability like you have from a wired connection. After a couple hours, all users just got dropped for no apparent reason whatsoever.

Categories
Admin Linux Raspberry Pi

Scripts checker

Intro

Imagine an infrastructure team empowered to create its own scripts to do such things as regularly update external dynamic lists (EDLs) or interact with APIs in an automated fashion. At some point they will want to have a meta script in place to check the output of all the automation scripts. This is something I developed to meet that need.

I am getting tired of perl, and I still don’t know python, so I decided to enhance my bash scripting for this script. I learned some valuable things along the way.

checklogs.sh

I call the script checklogs.sh. Here it is.

                    

#!/bin/bash
# DrJ 12/2021
# it is desired to run this using the logrotate mechanism
#
# logrotate invokes with /bin/sh so we have to do this trick...
if [ ! "$BASH_VERSION" ] ; then
  exec /bin/bash "$0" "$@"
  exit
fi
DIR=$(cd $(dirname $0);pwd)
INI=$DIR/log.ini
DAY=2 # Day of week to analyze full week of logs. Monday is 1, Tuesday 2, etc
DEBUG=0
maxdiff=10
maxerrors=10
minstarts=10
TMPDIR=/var/tmp
cd $TMPDIR
recipients="john@drjetc.com"
#
checklog2() {
  [[ "$DEBUG" -eq "1" ]] && echo ID, $ID, LPATH, $LPATH, START, $START, ERROR, $ERROR, END, $END
  LPATH="${LPATH}${wildcard}"
  zgrep -Ec "$START" ${LPATH}|cut -d: -f2|while read sline; do starts=$((starts + sline));echo $starts>starts; done
  zgrep -Ec "$END" ${LPATH}|cut -d: -f2|while read sline; do ends=$((ends + sline));echo $ends>ends; done
  zgrep -Ec "$ERROR" ${LPATH}|cut -d: -f2|while read sline; do errors=$((errors + sline));echo $errors>errors; done
  exampleerrors=$(zgrep -E "$ERROR" ${LPATH}|head -10)
  starts=$(cat starts)
  ends=$(cat ends)
  errors=$(cat errors)
  info="${info}===========================================
$ID SUMMARY
  Total starts: $starts
  Total finishes: $ends
  Total errors: $errors
  Most recent errors: "
  info="${info}${exampleerrors}
==========================
"
  unset NEW
# get cumulative totals
  starttot=$((starttot + starts))
  endtot=$((endtot + ends))
  errortot=$((errortot + errors))
  [[ "$DEBUG" -eq "1" ]] && echo starttot, $starttot, endtot, $endtot, errortot, $errortot
  [[ "$DEBUG" -eq "1" ]] || rm starts ends errors
} # end of checklog2 function

checklog() {
# clear out stats and some variables
starttot=0;endtot=0;errortot=0;info=""
#this IFS and following line is trick to preserve those darn backslash characters in the input file
IFS=$'\n'
for line in $(<$INI); do
  [[ "$line" =~ ^# ]] || {
  pval=$(echo "$line"|sed s'/: */:/')
  lhs=$(echo $pval|cut -d: -f1)
  rhs=$(echo "$pval"|cut -d: -f2-)
  lhs=$(echo $lhs|tr '[:upper:]' '[:lower:]')
  [[ "$DEBUG" -eq "1" ]] && echo line is "$line", pval is $pval, lhs is $lhs, rhs is "$rhs"
  if [ "$lhs" = "identifier" ]; then
    [[ "$DEBUG" -eq "1" ]] && echo matched lhs = identifier section
    [[ -n "$NEW" ]] && checklog2
    ID="$rhs"
  fi
  [[ "$lhs" = "path" ]] && LPATH="$rhs" && NEW=false
  [[ "$lhs" = "error" ]] && ERROR="$rhs"
  [[ "$lhs" = "start" ]] && START="$rhs"
  [[ "$lhs" = "end" ]] && END="$rhs"
  }
done
# call one last time at the end
checklog2
} # end of checklog function

anomalydetection() {
# a few tests - you can always come up with more...
  diff=$((starttot - endtot))
  [[ $diff -gt $maxdiff ]] || [[ $starttot -lt $minstarts ]] || [[ $errortot -gt $maxerrors ]] && {
    ANOMALIES=1
    [[ "$DEBUG" -eq "1" ]] && echo ANOMALIES, $ANOMALIES, starttot, $starttot, endtot, $endtot, errortot, $errortot
  }
} # end function anomalydetection

sendsummary() {
  subject="Weekly summary of automation scripts - please review"
  [[ -n "$ANOMALIES" ]] && subject="${subject} - ANOMALIES DETECTED PLEASE REVIEW CAREFULLY!!"

  intro="This summarizes the results from the past week of running automation scripts on script server.
Please check that values seem reasonable. If things are out of range, look at the script server.

"

  [[ "$DEBUG" -eq "1" ]] && echo subject, $subject, intro, "$intro", info, "$info"
  [[ "$DEBUG" -eq "1" ]] && args="-v"
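  # ${args:+...} expands to nothing when args is unset, so mail never receives an empty argument
  echo "${intro}${info}"|mail ${args:+"$args"} -s "$subject" $recipients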
} # end function sendsummary

# MAIN PROGRAM
# always check the latest log
checklog
anomalydetection

# only check all logs if it is certain day of the week. Monday = 1, etc
day=$(date +%u)
[[ "$DEBUG" -eq "1" ]] && echo day, $day
[[ $day -eq $DAY ]] || [[ -n "$ANOMALIES" ]] && {
  [[ "$DEBUG" -eq "1" ]] && echo calling checklog with wildcard set
  wildcard='*'
  checklog
  sendsummary
}

[[ "$DEBUG" -eq "1" ]] && echo message so far is "$info"

log.ini
                    

# The suggestion: To have a configuration file with log identifiers
#(e.g. “anydesk-edl”) and per identifier: log file path (“/var/log/anydesk-edl.log”),
# error pattern (“.+\[Error\].+”), start pattern (“.+\[Notice\] Starting$”) end pattern (“.+\[Notice\] Done$”).
#Then just count number of executions (based on start/end) and number of errors.

# the start/end/error values are interpreted as extended regular expressions - see regex(7) man page
identifier: anydesk-edl
path: /var/log/anydesk-edl.log
error: .+\[Error\].+
start: .+\[Notice\] Starting$
end: .+\[Notice\] Done$

identifier: firewall-requester-to-edl
path: /var/log/firewall-requester-to-edl.log
error: .+\[Error\].+
start: .+\[Notice\] Starting$
end: .+\[Notice\] Done$

identifier: sase-ips-to-bigip
path: /var/log/sase-ips-to-bigip.log
error: .+\[Error\].+
start: .+\[Notice\] Starting$
end: .+\[Notice\] Done$

What this script does

So when the guy writes an automation script, he is so meticulous that he follows the same convention and hooks it into the syslogger to create uniquely named log files for it. He writes out a [Notice] Starting when his script starts, and a [Notice] Done when it ends. And errors are reported with an [Error] details. Some of the scripts are called hourly. So we agreed to have a script that checks all the other scripts once a week and send a summary email of the results. I look to see that the count of starts and ends is roughly the same, and I report back the ten most recent errors from a given script. I also look for other basic things. That's the purpose of the function anomalydetection in my script. It's just basic tests. I didn't want to go wild.
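To make the convention concrete, here is a minimal sketch of what a compliant automation script might log, followed by the same kind of counting that checklogs.sh does. It is an illustration only: the log helper, the demo-script tag and the temporary LOGFILE are all invented here, and the real scripts write through the syslogger rather than straight to a file.

```shell
#!/bin/bash
# Hypothetical skeleton of an automation script that follows the
# [Notice] Starting / [Error] ... / [Notice] Done convention.
LOGFILE=${LOGFILE:-$(mktemp)}   # stand-in for the syslog-managed log file

log() {   # usage: log <Level> <message>
  printf '%s %s [%s] %s\n' "$(date '+%b %d %H:%M:%S')" "demo-script" "$1" "$2" >> "$LOGFILE"
}

log Notice "Starting"
log Error "could not reach the API"
log Notice "Done"

# The same extended regular expressions as in log.ini, counted with grep -Ec.
starts=$(grep -Ec '.+\[Notice\] Starting$' "$LOGFILE")
ends=$(grep -Ec '.+\[Notice\] Done$' "$LOGFILE")
errors=$(grep -Ec '.+\[Error\].+' "$LOGFILE")
echo "starts=$starts ends=$ends errors=$errors"
```

A healthy run shows equal starts and ends, which is exactly the comparison the anomaly detection leans on.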

But what if there was a problem with one of the scripts, wouldn't we want to know sooner than possibly six days later? So I decided to have my script run every day, but only send email on the off days if an anomaly was detected. This made the logic a tad more complex, but nothing bash and I couldn't handle. It fits the need of an overworked operational staff.

Techniques I learned and re-learned from developing this script

cron scheduling - more to it than you thought

I used to naively think that it suffices to look into the crontab files of all users to discover all the scheduled processes. What I missed is thinking about how log rotate works. How does it work? Turns out there is another section of cron for jobs run daily, weekly and monthly. logrotate is called from cron.daily.
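A quick way to see all the places a scheduled job can hide is simply to list them. This sketch assumes the usual Debian-style locations; other distributions may differ. list_cron_sources and its optional root-prefix argument are made up for illustration.

```shell
#!/bin/bash
# List the usual places scheduled jobs hide on a Debian-style system.
# The optional first argument is a root prefix, a made-up convenience
# that makes the function easy to try against a scratch directory.
list_cron_sources() {
  local root=${1:-} p
  for p in /etc/crontab /etc/cron.d /etc/cron.hourly /etc/cron.daily \
           /etc/cron.weekly /etc/cron.monthly /var/spool/cron/crontabs; do
    [[ -e "$root$p" ]] && echo "$p"
  done
  return 0
}

list_cron_sources
```

On a typical system the logrotate job then shows up as a file under /etc/cron.daily.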

logrotate - potential to do more

The person who wrote the automation scripts is a much better scripter than I am. I didn't want to disappoint so I put in the extra effort to discover the best way to call my script. I reasoned that logrotate would offer the opportunity to run side scripts, and I was absolutely right about that! You can run a script just before the logs rotate, or just after. I chose the just before timing - prerotate. In actual fact logrotate calls the prerotate script with all the log files to be rotated as arguments, which you notice we don't take advantage of, because at the time we were unsure how we were going to interface. But I figure let's just leave it now. man logrotate to learn more.

By the way although I developed on a generic Debian system, it should work on a Raspberry Pi as well since it is Debian based.

BASH - the potential to do more, at a price

You'll note that I use some bash-specific extensions in my script. I figure bash is near universal, so why not? The downside is that when logrotate invokes an external script, it calls it using old-fashioned shell. And my script does not work. Except I learned this useful trick:

if [ ! "$BASH_VERSION" ]; then
  exec /bin/bash "$0" "$@"
  exit
fi

Note this is legit syntax in SHELL and a legit conditional operator expression. So it means if you - and by you I mean the script talking about itself - are invoked via SHELL, then invoke yourself via BASH and exit the parent afterwards. And this actually does work (To do: have to check which occurs first, the syntax checking or the command invocation).

More

Speaking of that conditional, if you want to know all the major comparison tests, do a man test. I have come around to using the double bracket expressions [[ more and more, though they are BASH specific I believe. The double bracket can be followed by a && and then an open curly brace { which can introduce a block of code delimited of course by a close curly brace }. So for me this is an attractive alternative to SHELL's if conditional then code block fi syntax, and probably just slightly more compact. Replace && with || to execute the code block when the condition does not evaluate to be true.
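Here is a small sketch of both forms side by side. The function name check_errors and the numbers are invented for the illustration.

```shell
#!/bin/bash
# check_errors is a hypothetical example function.
check_errors() {   # usage: check_errors <errors> <maxerrors>
  # The block runs only when the test is true.
  [[ $1 -gt $2 ]] && {
    echo "too many errors: $1"
  }
  # The block runs only when the test is false.
  [[ $1 -gt $2 ]] || {
    echo "error count is fine"
  }
}

check_errors 12 10
check_errors 3 10
```

One thing to keep in mind is that the two statements are independent tests, not an if/else pair, so the condition is evaluated twice.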

zgrep is grep for compressed files, but we knew that right? But it's agnostic - it works like grep on both compressed and uncompressed files. That's important because with rotated logs you usually have a combination of both.

Now the expert suggested a certain regular expression for the search string. It wasn't working in my first pass. I reasoned that zgrep may have a special mode to act more like egrep which supports extended regular expressions (EREs). EREs aren't really the same as perl-compatible regular expressions (PCREs) but for this kind of simple stuff we want, they're close enough. And sure enough zgrep has the -E option to force it to interpret the expression as an ERE. Great.

RegEx

So in the log.ini file the regular expression has a \[...\] syntax. The backslash is actually required because otherwise the [...] syntax is interpreted as a character class, where all the characters between the brackets get tried to match a single character in the string to be matched. That's a very different match!

My big thing was - will I have to further escape those lines read in from log.ini, perhaps to replace a \[ with a \\[? Stuff like that happens. I found as long as I used those double quotes around the variables (see below) I did not need to further escape them. Similarly, I found that the EREs in log.ini did not need to be placed between quotes though the guy initially proposed that. It looks cleaner without them.
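A small experiment shows the difference between the escaped and unescaped brackets, and that double quotes around the variable are enough to pass the pattern through intact. The two sample log lines and the helper name count_matches are invented for this sketch.

```shell
#!/bin/bash
# Two sample log lines, invented for illustration.
sample=$'Dec  1 10:00:01 host demo [Error] lookup failed\nDec  1 10:00:02 host demo [Notice] Done'

escaped='.+\[Error\].+'     # literal "[Error]", as in log.ini
unescaped='.+[Error].+'     # character class: any one of E, r or o

count_matches() {   # usage: count_matches <ERE>
  printf '%s\n' "$sample" | grep -Ec "$1"
}

echo "escaped:   $(count_matches "$escaped")"
echo "unescaped: $(count_matches "$unescaped")"
```

The escaped pattern counts only the real error line, while the unescaped one also matches the Notice line because it merely needs an E, r or o somewhere in the middle of the line.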

Variable scope

I wasted a lot of time on a problem which I thought may be due to some weird variable scoping. I've memorized this syntax cat file|while read line; do etc, etc so I use it a lot in my tiny scripts. It's amazing I got away with it as much as I have because it has one huge flaw. If you start using variables within the loop you can't really suck them out, unless you write them to a file. So while at first I thought it was a problem of variable scoping - why do my loop variables have no values when the code comes out of the loop? - it really isn't that issue. It's that the pipe, |, creates a forked process which has its own variables. So to avoid that I switched to this weird syntax for line in $(<$INI); do etc. So it does the line-by-line file reading as before but without the pipe and hence without the "variable scope" problem.

But in another place in the script - where I add up numbers - I felt I could not avoid the pipe. So there I do write the value to a file.
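For the record, the effect is easy to reproduce, and bash does offer a way to keep the loop in the current shell even when the input comes from a command. This is a sketch of that alternative (process substitution, which is bash-specific), not what checklogs.sh actually does. The function names are made up.

```shell
#!/bin/bash
# Summing with a pipe: the loop runs in a subshell, so the total is lost.
sum_with_pipe() {
  local total=0 n
  printf '%s\n' "$@" | while read -r n; do total=$((total + n)); done
  echo "$total"
}

# Summing with process substitution: the loop runs in the current shell.
sum_without_pipe() {
  local total=0 n
  while read -r n; do total=$((total + n)); done < <(printf '%s\n' "$@")
  echo "$total"
}

echo "pipe:    $(sum_with_pipe 3 4 5)"
echo "no pipe: $(sum_without_pipe 3 4 5)"
```

With bash's default settings the first function reports 0 and the second reports the real sum, so no temporary file is needed.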

The conclusion is that, provided you know what you're doing, all variables have global scope, and that's just as it should be. Hey, I'm from the old Fortran 66/77 school where we were writing Monte Carlos with thousands of lines of code and dozens of variables in a COMMON block (global scope), and dozens of contributors. It worked just fine thank you very much. Because we knew what we were doing.

Adding numbers in bash

Speaking of adding, I can never remember how to add numbers (integers). In bash you can do starts=$((starts + sline)) , where starts and sline are integers. At least this worked in Debian linux Stretch. I did not really get the same to work so well in SLES Linux - at least not inside a loop where I most needed it.

When you look up how to add numbers in bash there are about a zillion different ways to do it. I'm trying to stick to the built-in way.

Sending mail in Debian linux

You probably need to configure a smarthost if you haven't used your server to send emails up until now. You have to reconfigure the exim4 package:

dpkg-reconfigure exim4-config

This can also be done on a RPi if you ever find you need it to send out emails.

Variables

If a variable includes linebreaks and you want to see that, put it between double-quotes, e.g., echo "$myVariableWithLineBreaks". If you don't do that it seems to remove the linebreaks. Use of the double quotes also seems to help avoid mangling variables that contain meta characters found in regular expressions such as .+ or \[.
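A two-line experiment makes the point. It assumes the default IFS, under which the shell splits an unquoted variable on whitespace, newlines included, and echo then rejoins the pieces with single spaces.

```shell
#!/bin/bash
myVariableWithLineBreaks=$'first line\nsecond line'

# Unquoted: word splitting turns the newline into a plain space.
unquoted_lines=$(echo $myVariableWithLineBreaks | wc -l)
# Quoted: the newline survives.
quoted_lines=$(echo "$myVariableWithLineBreaks" | wc -l)

echo "unquoted: $unquoted_lines line(s), quoted: $quoted_lines line(s)"
```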

Result of executing the commands

I grew up using the backtick metacharacter, `, to indicate that the enclosed command should be executed. E.g., old way:

DIR=`dirname $0`

But when you think about it, that metacharacter is small, and often you are unlucky and it sits right alongside a double quote or a single quote, making for a visual trainwreck. So this year I've come to love the use of $(command to be executed) syntax instead. It offers much improved readability. But then the question became, could I nest a command within a command, e.g., for my DIR assignment? I tried it. Now this kind of runs counter to my philosophy of being able to examine every single step as it executes because now I'm executing two steps at once, but since it's pretty straightforward, I went for it. And it does work. Hence the DIR variable is assigned with the compound command:

DIR=$(cd $(dirname $0);pwd)

So now I wonder if you can go more than two levels deep? Each level is an incrementally bad idea - just begging for undetectable mistakes, so I didn't experiment with that!

By the way the reason I needed to do that is that the script jumps around to another directory to create temporary files, and I wanted it to be able to reference the full path to its original directory, so a simpler DIR=$(dirname $0) wasn't going to cut it if it's called with a relative path such as ./checklogs.sh
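For what it's worth, the $( ) form nests to any depth, and the quotes nest along with it. Here is a slightly more defensive sketch of the same idea that also survives directory names containing spaces. dir_of is a made-up helper name, not something from checklogs.sh.

```shell
#!/bin/bash
# Resolve the absolute directory of a path, tolerating spaces in names.
# dir_of is a hypothetical helper name.
dir_of() {   # usage: dir_of <path-to-file>
  # The parentheses run cd in a subshell so the caller's directory is untouched.
  ( cd "$(dirname "$1")" && pwd )
}

DIR=$(dir_of "$0")
echo "script directory: $DIR"
```

Using && instead of ; means pwd only runs if the cd succeeded, so a bad path yields an empty DIR rather than a misleading one.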

Debugging

I make mistakes left and right. But I know what results I expect. So I generously insert statements as variables get assigned to double check them, prefacing them with a conditional [[ $DEBUG -eq 1 ]] && print out these values. As I develop DEBUG is set to 1. When it's finally working, I usually set it to 0, though in some scripts I never quite reach that point. It looks like a lot of typing, but it's really just cut and paste and not over-thinking it for the variable dump, so it's very quick to type.

Another thing I do when I'm stuck is to watch as the script executes in great detail by appending -xv to the first line, e.g., #!/bin/bash -xv. But the output is always confusing. Sometimes it helps though.
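The repeated conditional can also be wrapped in a tiny helper. This is just a sketch of one way to do it; dbg is an invented name and checklogs.sh does not use it.

```shell
#!/bin/bash
DEBUG=${DEBUG:-0}

# dbg is a hypothetical helper. It prints to stderr so debug text never
# mixes with real output, and it always returns 0 so it is safe to use
# as the last statement of a function.
dbg() {
  [[ "$DEBUG" -eq 1 ]] && echo "DEBUG: $*" >&2
  return 0
}

starts=7; ends=6
dbg starts, $starts, ends, $ends
echo "difference: $((starts - ends))"
```

And when the full -xv trace is too noisy, set -x and set +x can bracket just the few suspect lines instead of the whole script.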

Techniques I'd like to use in the future

You can assign a function to a variable and then call that variable. I know that will have lots of uses but I'm not used to the construct. So maybe for my next program.
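As far as I understand it, what you really store is the function's name, and the shell looks it up when the variable is expanded in command position. A minimal sketch, with invented function names and an invented VERBOSE switch:

```shell
#!/bin/bash
report_short() { echo "errors: $1"; }
report_long()  { echo "Total errors found this week: $1"; }

# Store the function NAME in a variable, then call through the variable.
reporter=report_short
[[ "${VERBOSE:-0}" -eq 1 ]] && reporter=report_long

"$reporter" 3
```

This lets the choice of behavior be made once, up front, instead of repeating the same conditional at every call site.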

Conclusion

This fairly simple yet still powerful script has forced me to become a better BASH shell scripter. In this post I review some of the basics that make for successful scripting using the BASH shell. I feel the time invested will pay off as there are many opportunities to write such utility scripts. I actually prefer bash to perl or python for these tasks as it is conceptually simpler, less ambitious, less pretentious, yes, far less capable, but adequate for my tasks. A few rules of the road and you're off and running! bash lends itself to very quick testing cycles. Different versions of bash introduced additional features, and that gets trying. I hope I have found and utilized some of the basic stuff that will be available on just about any bash implementation you are likely to run across.

References and related

The nitty gritty details about BASH shell can be gleaned by doing a man bash. It seems daunting at first but it's really not too bad once you learn how to skim through it.

Categories
Admin Firewall Linux

Linux: how to estimate bandwidth usage to a particular subnet

Intro

Let’s say someone asks you to estimate the total bandwidth used by a particular subnet, or a particular service such as https on port 443. I provide a crude way to do that using tcpdump on a not-too-busy server.

The code

I call it bandwidth.sh. By the way, I ran it on a Checkpoint Gaia appliance so it works there as well.

                    

#!/bin/bash
# DrJ 11/21
sleep=120
file=/tmp/ctpackets
sum() {
sum=0
cat $file|while read line; do
 length=$(echo $line|awk '{print $17}'|sed 's/)//')
 sum=$(expr $sum + $length)
 echo $sum
done
}
while /bin/true; do
tcpdump -c1000 -v -nni eth1 net 216.71/16 > $file
#10:29:49.471455 IP (tos 0x0, ttl 126, id 32399, offset 0, flags [none], proto: UDP (17), length: 105) 10.32.25.126.3391 > 216.71.170.32.61445: UDP, length 77
total=$(sum|tail -1)
t0=$(head -1 $file|awk '{print $1}')
t1=$(tail -1 $file|awk '{print $1}')
h0=$(echo $t0|cut -d: -f1|sed 's/^0//')
h1=$(echo $t1|cut -d: -f1|sed 's/^0//')
m0=$(echo $t0|cut -d: -f2|sed 's/^0//')
m1=$(echo $t1|cut -d: -f2|sed 's/^0//')
s0=$(echo $t0|cut -d: -f3|sed 's/^0//')
s1=$(echo $t1|cut -d: -f3|sed 's/^0//')
s0=$(echo $s0|cut -d\. -f1|sed 's/^0//')
s1=$(echo $s1|cut -d\. -f1|sed 's/^0//')
[ -z "$h0" ] && h0=0
[ -z "$h1" ] && h1=0
[ -z "$m0" ] && m0=0
[ -z "$m1" ] && m1=0
[ -z "$s0" ] && s0=0
[ -z "$s1" ] && s1=0
t0secs=$((3600*$h0+60*$m0+$s0))
t1secs=$((3600*$h1+60*$m1+$s1))
#echo total bytes: $total
elapsed=$(($t1secs-$t0secs))
#echo elapsed time: $elapsed
kbps=$(($total*8/$elapsed/1000))
echo $kbps kbps at $(date)
sleep $sleep
done

The idea

Running tcpdump with the -v switch gives us packet length. We find that length and sum it up. Here we used a filter expression of net 216.71/16 to capture only the traffic to or from that subnet.

The number of packets to capture has to be tuned to how busy it gets. Now it’s set to only capture 1000 packets. And you see my crude timings are truncated at the second. So 1000 packets in one second, or about 1.5 MBytes/sec = 12 Mbps, is the maximum sensitivity of this approach. I doubt it will really work for interfaces with more than 100 Mbps, even after you scale up the count (and don’t forget to change the denominator in the kbps line!).
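The arithmetic can be sketched in isolation. The two helpers below are not part of bandwidth.sh; to_secs and kbps are invented names, the 10# prefix is an alternative to stripping leading zeros with sed, and clamping the elapsed time to one second is my own guard against a divide by zero when all the packets arrive within the same second.

```shell
#!/bin/bash
# Convert a tcpdump timestamp (HH:MM:SS.micro) to whole seconds since midnight.
# The 10# prefix forces base 10, so "08" and "09" are not read as bad octal.
to_secs() {
  local h m s
  IFS=: read -r h m s <<< "${1%%.*}"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Kilobits per second from a byte count and an elapsed time in seconds.
kbps() {   # usage: kbps <bytes> <seconds>
  local secs=$2
  (( secs < 1 )) && secs=1      # clamp to avoid dividing by zero
  echo $(( $1 * 8 / secs / 1000 ))
}

# Worked example from the text: 1000 full-size (1500-byte) packets in one second.
elapsed=$(( $(to_secs 10:29:50.471455) - $(to_secs 10:29:49.471455) ))
echo "$(kbps $((1000 * 1500)) "$elapsed") kbps"
```

That prints 12000 kbps, i.e. the 12 Mbps ceiling mentioned above.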

Here’s a sample output:

1000 packets captured
2002 packets received by filter
0 packets dropped by kernel
5 kbps at Wed Nov 3 12:09:45 EDT 2021

I think it’s important to note the number of packets dropped by the kernel. So if it gets too busy, as I understand it, it will at least try to tell you that it couldn’t capture all the data, and at that point you can no longer trust this method. Perhaps with enhanced statistical methods it could be salvaged.

I don’t run it continuously to also give the kernel a breather. It probably doesn’t make much difference, but every two minutes seems plenty frequent to me…

Conclusion

We have demonstrated a crude but better-than-nothing script to calculate bandwidth for a given tcpdump filter expression. It won’t win any awards, but it contains some worthwhile ideas. And it seems to work at low bandwidth levels.

Categories
Admin DNS Firewall Network Technologies TCP/IP

The IT detective agency: named times out tcp queries

Intro

I’ve been reliably running ISC’s BIND server for eons. Recently I had a problem getting my slave servers updated after a change to the primary master. What was going on there?

The details

This was truly a team effort. I saw that the zone file had differing serial numbers on the master versus the slave servers. My attempts to update via an rndc refresh zone were having no effect.

So I tried a zone transfer by hand: dig axfr drjohnstechtalk.com @50.17.188.196

That timed out!

Yet, regular DNS queries went through fine: dig ns drjohnstechtalk.com @50.17.188.196

I thought about it and remembered zone transfers use TCP whereas standard queries use UDP. So I tried a TCP-based simple query: dig +tcp ns drjohnstechtalk.com @50.17.188.196. It timed out!

So of course one suspects the firewall, which is reasonable enough. And when I looked at the firewall I found some funny drops, though I couldn’t line them up exactly with my failed tests. But I’m not a firewall expert; I just muddle through.

The next day someone from the DNS group asked how local queries behaved? Hmm. never tried that. So I tried it: dig +tcp ns drjohnstechtalk.com @localhost. That timed out as well! That was a brilliant suggestion as we now could eliminate the firewall and all that complexity from the equation. Because I had tried to do packet traces on two different machines at the same time and line up the results. It wasn’t easy.
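The UDP-versus-TCP comparison is easy to script so it can be repeated against localhost and then against the remote address. This is a sketch only: check_transports is a made-up name, and the DIG variable is an invention that allows the logic to be exercised without a live name server. The dig options used (+tcp, +notcp, +time, +tries, +short) are standard ones.

```shell
#!/bin/bash
# Compare UDP and TCP answers from one name server.
DIG=${DIG:-dig}   # hypothetical override so the logic can be tried offline

check_transports() {   # usage: check_transports <zone-name> <server>
  local name=$1 server=$2 opt label
  for opt in +notcp +tcp; do
    [[ $opt == +tcp ]] && label=TCP || label=UDP
    # +time=2 +tries=1 keeps a dead transport from stalling for long.
    if "$DIG" "$opt" +time=2 +tries=1 +short ns "$name" "@$server" >/dev/null 2>&1; then
      echo "$label ok"
    else
      echo "$label FAILED"
    fi
  done
}

# Hypothetical usage:
#   check_transports drjohnstechtalk.com localhost
#   check_transports drjohnstechtalk.com 50.17.188.196
```

If TCP fails on localhost as well, the firewall is off the hook, which is precisely what the suggestion from the DNS group established.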

The whole issue was very concerning to us because we feared our secondaries would be unable to update their slave zones and ultimately time them out. The result would be devastating.

We have support, fortunately. A company that hearkens from the good old days, with real subject matter experts. But they’re extremely busy. We did not get a suggestion for a couple weeks. But eventually we did. They had seen this once before.

named time to respond to TCP-based queries

The above graph is from a Zabbix monitor showing how long it takes that dns server to respond to that simple query. 6 s is a time-out. I actually set dig to timeout at 2 s, but in wall-clock time it actually takes 6 s, presumably because dig makes three attempts by default.

The fix

We removed this line from the options block of named.conf:

keep-response-order {any; };

The info from the experts is that most likely that was configured as a workaround to CVE-2019-6477 but that issue was fixed since 9.15.6.

Conclusion

We encountered the named daemon in a situation where it was unable to respond to TCP-based DNS queries and hence unable to do zone transfers. So although most queries use UDP, this was a serious issue for us and prevented zones from being updated on all authoritative nameservers.

As is the case with so many modern IT problems, the effect was not black or white. Failures were intermittent, and then permanent. A restart fixed this issue (forgot to mention so far!). But we involved an expert to find the root cause and it was the presence of a single configuration line in our named.conf. After removing that all was good.

Categories
Admin

Git commands cheat sheet

Intro

This is the list of git commands I compiled.

Create new local GIT repository

git init [project name]

Copy a repository

git clone username@host:/path/to/repository

Add a file to the staging area – must be done or won’t be saved in next commit

git add temp.txt or git add -A (add everything at once)

Create a snapshot of the changes and save to git directory??

Note that any committed changes won’t make their way to the remote repo

git commit -m "Message to go with the commit here"

Nota bene: git does not let you add empty folders! If you try, you’ll simply see: nothing to commit, working tree clean

Put some “junk file” like .gitkeep in your empty folder for git add/git commit to work.
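If there are many empty folders, the placeholder files can be created in one go. This is a minimal sketch, assuming Python 3 is available; the function name add_gitkeep is hypothetical, and .gitkeep is only a convention, not something git itself treats specially.

```python
import os
from pathlib import Path


def add_gitkeep(root):
    """Create an empty .gitkeep in every empty directory under root; return the paths created."""
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Never descend into git's own metadata directory
        if ".git" in dirnames:
            dirnames.remove(".git")
        # A directory with no files and no subdirectories is invisible to git
        if not dirnames and not filenames:
            keep = Path(dirpath) / ".gitkeep"
            keep.touch()
            created.append(keep)
    return created
```

After running it against the top of the working tree, the usual git add -A and git commit will pick up the formerly empty folders.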

Set user-specific values

git config --global user.email youremail

Displays the list of changed files together with the files that are yet to be staged or committed

git status

Send local commits to the master branch of the remote repository

  (Replace <master> with the branch where you want to push your changes when you’re not intending to push to the master branch)

git push origin <master>

For real basic setups like mine, where I work on branch master, it suffices to simply do git push

Merge all the changes present in the remote repository to the local working directory

git pull

Create a new branch and switch to it

git checkout -b <branch-name>

Switch from one branch to another

git checkout <branch-name>

View all remote repositories

git remote -v

Connect the local repository to a remote server

git remote add origin <host-or-remoteURL>

Delete connection to a specified remote repository

git remote rm <name-of-the-repository>

List, create, or delete branches

git branch

Delete a branch

git branch -d <branch-name>

Merge a branch into the active one

git merge <branch-name>

List file conflicts

git diff --base <file-name>

View conflicts between branches before a merge

git diff <source-branch> <target-branch>

List all conflicts

git diff

Mark certain commits, e.g., as v1.0

git tag <tag-name> <commitID>

View repository’s commit history, etc

git log, e.g., git log --oneline

Reset index?? and working directory to last git commit

git reset --hard HEAD

Remove files from the index?? and working directory

git rm filename.txt

Revert (undo) changes from a commit as per hash shown by git log --oneline

git revert <hash>

Temporarily save the changes not ready to be committed??

git stash

View info about any git object

git show

Fetch all objects from the remote repository that don’t currently reside in the local working directory

git fetch origin

View a tree object??

git ls-tree HEAD

Search everywhere

git grep <string>

Clean unneeded files and optimize local repository

git gc

Create zip or tar file of a repository

git archive --format=tar master

Delete objects without incoming pointers??

git prune

Identify corrupted objects

git fsck

Merge conflicts

Today I got this error during my usual git pull:

error: Pulling is not possible because you have unmerged files.
hint: Fix them up in the work tree, and then use 'git add/rm <file>'
hint: as appropriate to mark resolution and make a commit.
fatal: Exiting because of an unresolved conflict.

I did not wish to waste too much time. I tried a few things (git status, etc), none of which worked. As it is a small repository without much at stake and I know which files I changed, I simply deleted the clone and re-cloned the repo, put back the newer versions of the changed files, did a usual git add and git commit and git push. All was good. Not too much time wasted in becoming a gitmaster, a fate I wish to avoid.

Ignore a file

Put the unqualified name of the file in .gitignore at same level as .git. It will not be added to the project. Use this for keeping passwords secret.

References and related

https://www.freecodecamp.org/news/10-important-git-commands-that-every-developer-should-know/

Categories
Admin Apache CentOS Python Raspberry Pi Web Site Technologies

Traffic shaping on linux – an exploration

Intro

I have always been somewhat agog at the idea of limiting bandwidth on my linux servers. Users complain about slow web sites and you want to try it for yourself, slowing your connection down to meet the parameters of their slower connection. More recently I happened on librespeed, an alternative to speedtest.net, where you can run both server and client. But in order to avoid transferring too much data and monopolizing the whole line, I wanted to actually put in some bandwidth throttling. I began an exploration of available methods to achieve this and found some satisfactory approaches that are readily available on Redhat-type linuxes.

Bandwidth throttling, bandwidth rate limiting, bandwidth classes – these are all synonyms for what is most commonly called traffic shaping.

What doesn’t work so well

I think it’s important to start with the walls that I hit.

Cgroup

I stumbled on cgroups first. The man page starts in a promising way

cgroup - control group based traffic control filter

Then after you research it you see that support for cgroups was enabled in linux kernels long ago. And there is version 1 and 2. And only version 1 supports bandwidth limits. But if you’re just a mid-level linux person such as myself, it is confusing and unclear how to take advantage of cgroup. My current conclusion is that it is more a subsystem designed for use by systemd. In fact if you’ve ever looked at a status, for instance of crond, you see a mention of a cgroup:

sudo systemctl status crond
● crond.service - Command Scheduler
   Loaded: loaded (/usr/lib/systemd/system/crond.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-08-09 15:44:24 EDT; 5 days ago
 Main PID: 1193 (crond)
    Tasks: 1 (limit: 11278)
   Memory: 2.1M
   CGroup: /system.slice/crond.service
           └─1193 /usr/sbin/crond -n

I don’t claim to know what it all means, but there it is. Some nice abilities to schedule and allocate finite resources, at a very high level.

So I get the impression that no one really uses cgroups to do traffic shaping.

apache web server to the rescue – not

Since I was mostly interested in my librespeed server and controlling its bandwidth during testing, I wondered if the apache web server has this capability built-in. Essentially, it does! There is the module mod_ratelimit. So, quest over, and let the implementation begin! Except not so fast. In fact I did enable that module. And I set it up on my librespeed server. It kind of works, but mostly, not really, and nothing like its documented design.

                    


<Location "/downloads">
    SetOutputFilter RATE_LIMIT
    SetEnv rate-limit 400
    SetEnv rate-initial-burst 512
</Location>

That’s their example section. I have no interest in such low limits and tried various values from 4000 to 12000. I only got two different actual rates from librespeed out of all those various configurations. I could either get 83 Mbps or around 162 Mbps. And that’s it. Merely having any statement whatsoever starts limiting to one of these strange values. With the statement commented out I was getting around 300 Mbps. So I got rate-limiting, but not what I was seeking and with almost no control.

So the apache config approach was a bust for me.

Trickle

There are some linux programs that are perhaps promoted too heavily? Within a minute of posting my first draft of this someone comes along and suggests trickle. Well, on CentOS yum search trickle gives no results. My other OS was SLES v 15 and I similarly got no results. So I’m not enamored with trickle.

tc – now that looks promising

Then I discovered tc – traffic control. That sounds like just the thing. I had to search around a bit on one of my OSes to find the appropriate package, but I found it. On CentOS/Redhat/Fedora the package is iproute-tc. On SLES v15 it was iproute2. On FreeBSD I haven’t figured it out yet.

But it looks unwieldy to use, frankly. Not, as they say, user-friendly.

tcconfig + tc – perfect together

Then I stumbled onto tcconfig, a python wrapper for tc that provides convenient utilities and examples. It’s available, assuming you’ve already installed python, through pip or pip3, depending on how you’ve installed python. Something like

$ sudo pip3 install tcconfig

I love the available settings for tcset – just the kinds of things I would have dreamed up on my own. I wanted to limit download speeds, and only on the web server running on port 443, and only from a specific subnet. You can do all that! My tcset command went something like this:

$ cd /usr/local/bin; sudo ./tcset eth0 --direction outgoing --src-port 443 --rate 150Mbps --network 134.12.0.0/16

$ sudo ./tcshow eth0

{
  "eth0": {
    "outgoing": {
      "src-port=443, dst-network=134.12.0.0/16, protocol=ip": {
        "filter_id": "800::800",
        "rate": "150Mbps"
      }
    },
    "incoming": {}
  }
}

More importantly – does it work? Yes, it works beautifully. I run a librespeed cli with three concurrent streams against my AWS server thusly configured and I get around 149 Mbps. Every time.
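Since tcshow prints plain JSON, a script can confirm what is configured before kicking off a speed test. The following is a minimal sketch, assuming the output has the shape shown above; the function name shaping_rates is hypothetical and the sample is simply the output from this post.

```python
import json


def shaping_rates(tcshow_json):
    """Return {(device, direction, filter description): rate} from tcshow's JSON output."""
    rates = {}
    for device, directions in json.loads(tcshow_json).items():
        for direction, filters in directions.items():
            for filter_desc, settings in filters.items():
                # Only some entries carry a rate; delay-only rules are skipped
                if "rate" in settings:
                    rates[(device, direction, filter_desc)] = settings["rate"]
    return rates


# Sample taken from the tcshow output shown in this post
SAMPLE = """
{
  "eth0": {
    "outgoing": {
      "src-port=443, dst-network=134.12.0.0/16, protocol=ip": {
        "filter_id": "800::800",
        "rate": "150Mbps"
      }
    },
    "incoming": {}
  }
}
"""

for key, rate in shaping_rates(SAMPLE).items():
    print(key, rate)
```

In practice the JSON string would come from running tcshow eth0, for instance through subprocess.run with capture_output=True.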

Note that things are opposite of what you first think of. When I want to restrict download speeds from a server but am imposing traffic shaping on the server (as opposed to on the client machine), from its perspective that is upload traffic! And port 443 is the source port, not the destination port!
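To keep the direction logic straight, it can help to wrap it in a small helper. This is a sketch under my own assumptions: the function name tcset_download_limit is hypothetical, and it only uses the tcset switches already shown in this post.

```python
import shlex


def tcset_download_limit(device, server_port, rate, client_network=None):
    """Build a tcset command that caps what a server sends from server_port."""
    # From the server's point of view a client download is outgoing traffic,
    # and the well-known port (e.g. 443) is the source port of those packets.
    args = ["tcset", device, "--direction", "outgoing",
            "--src-port", str(server_port), "--rate", rate]
    if client_network:
        # Restrict the limit to clients in this destination network
        args += ["--network", client_network]
    return args


print(shlex.join(tcset_download_limit("eth0", 443, "150Mbps", "134.12.0.0/16")))
```

The printed command matches the one I ran by hand, minus the sudo and the cd to /usr/local/bin.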

Raspberry Pi example

I’m going to try regular librespeed tests on my home RPi which is cabled to my router to do the Internet monitoring. So I’m trying

$ sudo tcset eth0 --direction incoming --rate 100Mbps
$ sudo tcset eth0 --direction outgoing --rate 9Mbps --add

This reflects the reality of the asymmetric rate you typically get from a home Internet connection. tcshow looks a bit peculiar however:

{
  "eth0": {
    "outgoing": {
      "protocol=ip": {
        "filter_id": "800::800",
        "delay": "274.9s",
        "delay-distro": "274.9s",
        "rate": "9Mbps"
      }
    },
    "incoming": {
      "protocol=ip": {
        "filter_id": "800::800",
        "delay": "274.9s",
        "delay-distro": "274.9s",
        "rate": "100Mbps"
      }
    }
  }
}

Results on the RPi

Despite the strange delay-distro appearing in the tcshow output, the results are perfect. Here are my librespeed results, running against my own private AWS server:

Time is Sat 21 Aug 16:17:23 EDT 2021
Ping: 20 ms Jitter: 1 ms
Download rate: 100.01 Mbps
Upload rate: 9.48 Mbps

!
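For automated runs it is handy to check the measured rates against the configured limits. This is a minimal sketch that assumes the report looks like the text above; the function names and the 10% tolerance are my own choices for illustration, not part of librespeed or tcconfig.

```python
import re


def parse_speed_report(text):
    """Extract ping, jitter, download and upload figures from a speed report."""
    patterns = {
        "ping_ms": r"Ping:\s*([\d.]+)\s*ms",
        "jitter_ms": r"Jitter:\s*([\d.]+)\s*ms",
        "download_mbps": r"Download rate:\s*([\d.]+)\s*Mbps",
        "upload_mbps": r"Upload rate:\s*([\d.]+)\s*Mbps",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            result[key] = float(match.group(1))
    return result


def within_tolerance(measured, configured, tolerance=0.10):
    """True if measured is within the given fraction of the configured rate."""
    return abs(measured - configured) <= tolerance * configured


# Report text copied from the results shown in this post
REPORT = """Time is Sat 21 Aug 16:17:23 EDT 2021
Ping: 20 ms Jitter: 1 ms
Download rate: 100.01 Mbps
Upload rate: 9.48 Mbps
"""

report = parse_speed_report(REPORT)
print(within_tolerance(report["download_mbps"], 100))
print(within_tolerance(report["upload_mbps"], 9))
```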

Problems creep in on RPi

I swear I had it all working. This blog post is the proof. Now I’ve rebooted my RPi and that tcset command above gives the result Illegal instruction. Still trying to figure that one out!

March, 2022 update. My RPi had other issues. I’ve re-imaged the micro SD card and all is good once again. I set traffic shaping policies as shown in this post.

Conclusion about tcconfig

It’s clear tcset is just giving you a nice interface to tc, but sometimes that’s all you need to not sweat the details and start getting productive.

Possible issue – missing kernel module

On one of my servers (the CentOS 8 one), I had to do a

$ sudo yum install kernel-modules-extra

$ sudo modprobe sch_netem

before I could get tcconfig to really work.

To do list

Make the tc settings permanent.

Verify tc + tcconfig work on a Raspberry Pi. (tc is definitely available for RPi.)

Conclusion

We have found a pretty nice and effective way to do traffic shaping on linux systems. The best tool is tc and the best wrapper for it is tcconfig.

References and related

Librespeed is a great speedtest.net alternative for hard-core linux types who love command line and being in full control of both ends of a speed test. I describe it here.

tcconfig’s project page on PyPI.

Power cycling one’s cable modem automatically via an attached RPi. I refer to this blog post specifically because I intend to expand that RPi to also do periodic, automated speedtesting of my home broadband connection, with traffic shaping in place if all goes well (as it seems to thus far).

Bandwidth management and “queueing discipline” in all its gory detail is explained in this post, including example raw tc commands. I haven’t digested it yet but it may represent a way for me to get my RPi working again without a re-image: http://www.fifi.org/doc/HOWTO/en-html/Adv-Routing-HOWTO-9.html

Categories
Admin Network Technologies Raspberry Pi

A nice alternative to speedtest.net for the DIY linux crowd

Intro

I was building some infrastructure around automated speedtest.net tests using speedtest-cli. I noticed the assigned servers keep changing, some servers are categorized as malicious sources, some time out if tested on the hour and half-hour, and results are inconsistent depending on which server you get.

So, I saw that the speedtest-cli (linux command-line python script) has a switch for a “mini” server. When I investigated, that seemed the answer to the problem – you can set up your own mini server and use that for your tests. I.e., control both ends of the test. Great.

The speedtest.net mini server was discontinued in 2017! There’s some commercial replacement. So I thought, forget that. I was disillusioned and then happened upon a breath of fresh air – an open source alternative to speedtest.net. Enter, librespeed.

Some details

librespeed has a command-line program which is an obvious rip-off of speedtest-cli. In fact it is called librespeed-cli and has many similar switches.

There is also a server setup. Really, just a few files you can put on any apache + php web server. There is a web GUI as well, but in fact I am not even that interested in that. And you don’t need to set it up at all.

What I like is that with the appropriate switches supplied to librespeed-cli, I can have it run against my own librespeed server. In some testing configurations I was getting 500 Mbps downloads. Under less favorable circumstances, much less.

Testing, testing, testing

I tested between Europe and the US. I tested through a proxy. I tested from the Azure cloud to an Amazon AWS server. I tested with a single cpu linux server (good old drjohnstechtalk.com) either as server, or as the client. This was all possible because I had full control over both ends.

Some tips
  1. Play with the librespeed-cli switches. See what works for you. librespeed-cli -h will show you all the options.
  2. Increasing the stream count can compensate for slower PING times (assuming both ends have a fast connection)
  3. It does support proxy, but
  4. Downloads don’t really work through proxy if the server is only running http
  5. Counterintuitively, the cpu burden is on the client, not the server! My servers didn’t show the slightest bit of resource usage.
  6. Corollary to 5. My 4-cpu client to 1-cpu server test was much faster than the other way around where server and client roles were reversed.
  7. Most things aren’t sensitive to upload speeds anyway so seriously consider suppressing that test with the appropriate switch. Your tests will also run a lot faster (18 seconds versus 40 seconds).
  8. Worried about consuming too much bandwidth and transferring too much data? I also developed a solution for that (will be my next blog post)
  9. So I am running a librespeed server on my little VM on Amazon AWS but I can’t make it public for fear of getting overrun.
  10. ISPs that have excellent interconnects such as the various cloud providers are probably going to give the best results
  11. It is not true your web server needs write access to its directory in my experience. As long as you don’t care about sharing telemetry data and all that.
  12. To emphasize, they supply the librespeed-cli binary, pre-built, for a whole slew of OSes. You do not and should not compile it yourself. For a standard linux VM you will want the binary called librespeed-cli_1.0.9_linux_386.tar.gz

Example files

The point of these files is to test librespeed-cli, from the directory where you copied it to, against your own librespeed server.

json-ns6
                    

[{
"name":"ns6, Germany (active-servers)",
"server":"https://ns6.drjohnstechtalk.com/",
"id":864,
"dlURL":"backend/garbage.php",
"ulURL":"backend/empty.php",
"pingURL":"backend/empty.php",
"getIpURL":"backend/getIP.php",
"sponsorName":"/dev/null/v",
"sponsorURL":"https://dev.nul.lv/"
}]

wrapper.sh
                    

#!/bin/sh
# see ./librespeed-cli -help for all the options
./librespeed-cli --local-json json-ns6 --server 864 --simple --no-upload --no-icmp --ipv4 --concurrent 4 --skip-cert-verify

Purpose: are we getting good speeds?

My purpose in what I am constructing is to verify we are getting good download speeds. I am not trying to hit it out of the park. That consumes (read, wastes) a lot of resources. I am targeting to prove we can achieve about 150 Mbps downloads. I don’t know anyone who can point to 150 Mbps and honestly say that’s insufficient for them. For some setups that may take four simultaneous streams, for others six. But it is definitely achievable. By not going crazy we are saving a lot of data transfers. AWS charges me for my network usage. So a six stream download test at 150 Mbps (Megabits per second) consumes about 325 MBytes download data. If you’re not being careful with your switches, you can easily nudge that up to 1 GB downloads for a single test.
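The relationship between rate, duration and data transferred is simple arithmetic, sketched below. The function names are hypothetical, and the durations are my inference from the numbers quoted in this post rather than anything librespeed reports; real transfers also include ramp-up and protocol overhead.

```python
def download_volume_mbytes(rate_mbps, seconds):
    """Data transferred, in megabytes, by a test averaging rate_mbps for the given time."""
    return rate_mbps * seconds / 8  # 8 bits per byte


def seconds_for_volume(volume_mbytes, rate_mbps):
    """How long a transfer of volume_mbytes takes at rate_mbps."""
    return volume_mbytes * 8 / rate_mbps


# 325 MBytes at 150 Mbps corresponds to a download phase of roughly 17 seconds
print(round(seconds_for_volume(325, 150), 1))

# An 18-second download at 150 Mbps would move about 337 MBytes
print(download_volume_mbytes(150, 18))

# Reaching 1 GB (1000 MBytes) at the same rate takes a bit under a minute
print(round(seconds_for_volume(1000, 150), 1))
```

So the data volume scales directly with both the rate cap and the test duration, which is why limiting the rate and suppressing the upload test both cut the AWS bill.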

My librespeed client to server tests ran overnight alongside my old approach using speedtest. The speedtest results are all over the place, with a bunch of zeroes for whatever reason, as is typical, while librespeed – and mind you this is from a client in the US, going through a proxy, to a server in Europe – produced much more consistent results. In one case where the normal value was 130 Mbps, it dipped down to 110 Mbps.

Testing it out at home
Test from a home PC against my own librespeed server

I made my test URL on my AWS server private, but a public one is available at https://librespeed.de/

At home of course I want to test with a Raspberry Pi since I work with them so much. There is indeed a pre-built binary for Raspberry Pi. It is https://github.com/librespeed/speedtest-cli/releases/download/v1.0.9/librespeed-cli_1.0.9_linux_armv7.tar.gz

The problem with speedtest in more detail

There were two final issues with speedtest that were the straws that broke the camel’s back, and they are closely related.

When you resolve www.speedtest.net it hits a Content Distribution Network (CDN), and the returned results vary. For instance right now we get:

;; QUESTION SECTION:
;www.speedtest.net. IN A
;; ANSWER SECTION:
www.speedtest.net. 4301 IN CNAME zd.map.fastly.net.
zd.map.fastly.net. 9 IN A 151.101.66.219
zd.map.fastly.net. 9 IN A 151.101.194.219
zd.map.fastly.net. 9 IN A 151.101.2.219
zd.map.fastly.net. 9 IN A 151.101.130.219

Note that you can also run speedtest-cli with the --list switch to get a list of speedtest servers. So in my case I found some servers which produced good results. There was one where I even know the guy who runs the ISP and know he does an excellent job. His speedtest server is 15 miles away. But, in its infinite wisdom, speedtest sometimes thinks my server is in Louisiana, and other times thinks it’s in New Jersey! So the returned server list is completely different for the two cases. And, even though each server gets assigned a unique number, and you can specify that number with the --server switch, it won’t run the test if that particular server wasn’t proposed to you in its initial listing. (It always makes a server listing call whether you specified --list or not, for its own purposes as to which servers to use.)

I tried to use some tricks to override this behaviour, but short of re-writing the whole thing, it was not going to work. I imagined I could force speedtest-cli to always use a particular IP address, overriding the results returned from fastly, but getting that to work through proxy was not feasible. On the other hand if you suck it up and accept their randomly assigned server, you have to put up with a lot of garbage results.

So set up your own server, right? The --mini switch seems built to accommodate that. But the mini server was discontinued in 2017. The commercial replacement seemed to have some limits. So it’s dead end upon dead end with speedtest.net.

Conclusion

An open source alternative to speedtest.net’s speedtest-cli has been identified and tested, both server and client. It is librespeed. It gives you a lot more control than speedtest, if that is your thing and you know a smidgeon of linux.

References and related

Just to do your own test with your browser the way you do with speedtest.net: https://librespeed.de/

librespeed-cli: https://github.com/librespeed/speedtest-cli

librespeed-cli binaries download page: https://github.com/librespeed/speedtest-cli/releases

RPi version of librespeed-cli: https://github.com/librespeed/speedtest-cli/releases/download/v1.0.9/librespeed-cli_1.0.9_linux_armv7.tar.gz

The RPi I use for automatically power cycling my cable modem is hard-wired to my router and makes for an excellent platform from which to conduct these speedtests.

librespeed server: https://github.com/librespeed/speedtest

If, in spite of every positive thing I’ve had to say about librespeed, you still want to try the more commercial speedtest-cli, here is that link: https://www.speedtest.net/apps/cli

In this context a lot of people feel iperf is also worth exploring. It is not built in, but it is packaged for most linux distributions.

To kick it up a notch for professional-class bandwidth and availability measurements, ThousandEyes is the way to go. This discussion is very enlightening: https://www.thousandeyes.com/blog/caveats-of-traditional-network-tools-iperf

Categories
Admin TCP/IP

Poor man’s port checker for Windows

Intro

Say you want to check if a tcp port is open from your standard-issue Windows 10 PC. Can you? Yes you can. I will share a way that requires the fewest keystrokes.

A use case

We wanted to know if an issue with a network drive mapping was a network issue. The suggestion is to connect to port 445 on the remote server.

How to do it

From a CMD prompt

> powershell

> test-netconnection 192.168.20.250 -port 445

ComputerName : 192.168.20.250
RemoteAddress : 192.168.20.250
RemotePort : 445
InterfaceAlias : Ethernet 3
SourceAddress : 192.168.1.101
TcpTestSucceeded : True

That’s for the working case. If it can’t establish the connection it will take a while and the last line will be False.

What you can type to minimize keystrokes is

test-n <TAB> in place of test-netconnection. It will be expanded to the full thing.
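If Python happens to be installed on the PC (an assumption, it is not standard-issue on Windows), the same TCP connect test can be sketched in a few lines. The function name tcp_port_open and the 3-second timeout are hypothetical choices of mine; like test-netconnection, it only tells you whether the TCP handshake completes.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the full TCP handshake, then we close right away
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

For the use case above the call would be tcp_port_open("192.168.20.250", 445), which returns True in the working case and False after the timeout otherwise.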

Conclusion

On linux you have tools like nc (netcat), nmap, scapy and even telnet, that we network engineers have used for ages. On Windows the options may be more limited, but this is one good way to know of. In the past I had written about portqry as a similar tool for Windows, but it requires an install. This test-netconnection needs nothing installed.

References and related

scapy, a custom packet generation utility

portqry for Windows

Categories
Admin Firewall Proxy

Checkpoint SYN Defender: what you don’t know can hurt you

Intro

Our EDI group hails me last Friday and says they can’t reach their VANs, or at best intermittently. What to do, what to do… I go on the offensive and say they have to stop using FTP (and that’s literal FTP, not sftp, not FTPs, just plain old FTP), it’s been out of date for at least 15 years.

But that wasn’t really helping the situation, so I had to dig a lot deeper. And frankly, I was coincidentally having intermittent issues with my scripted speedtests. Could the two be related?

The details

We have a bunch of synthetic monitors we run through that same firewall. They were failing every few minutes, and then became good.

And these FTPs were like that as well. Some would work and then minutes later not work.

The firewall person on call looked at the firewall, saw some of the described traffic passing through, and declared the firewall fine.

So I got a more cooperative firewall colleague on this. And he got a really expert Checkpoint support person on the call. That guy led us to look at SYN DEFENDER, which is part of IPS and enabled via fwaccel. If it sees too many out-of-state packets in a given time it will shut down the interface where the problem was observed!

The practical effect is that even if you’re taking traces on the Checkpoint, checking the logs, etc, you won’t see the traffic! That really throws most firewall admins, since this situation is so unusual and they are not trained to look for it.

In this case it was an internal firewall and we were comfortable disabling SYN DEFENDER on it. All problems went away after that.

Four months later…

Then four months later, after the firewall was upgraded to v 81.10, they must have set SYN DEFENDER (AKA synatk) up all over again. And of course no one was thinking about it or expecting what happened next, which is, these exact same problems started all over again. But there were different firewall colleagues involved, none with any first-hand experience of the issue. Then I got involved and just sort of tackled my way through it in a trouble-shooting session. No one was placing any judgments (my-stuff-is-fine,-yours-must-be-broken kind of thinking). Then I eventually recalled the old problem, and looked up this post to help name it – SYN DEFENDER – so that it would be meaningful to the firewall colleague. Yup, he took it from there. And we were good. I admonished the on-call guy who totally missed it, and he humbly admitted to not being familiar with this feature or how it works. So I will explain it to him.

Results of running fwaccel synatk config:

enabled 0
enforce 0
global_high_threshold 10000
periodic_updates 1
cookie_resolution_shift 6
min_frag_sz 80
high_threshold 5000
low_threshold 1000
score_alpha 100
monitor_log_interval (msec) 60000
grace_timeout (msec) 30000
min_time_in_active (msec) 60000

These are probably the defaults as we haven’t messed with them. Right now you see it’s disabled. It spontaneously re-enabled itself after only a few days, and the problems started all over again.
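Given that the feature can come back on its own, it may be worth checking the setting periodically. Here is a minimal sketch of parsing the output above, assuming the format stays as shown; the function names are hypothetical, and reading enabled 0 as "disabled" is based only on what we observed, not on Checkpoint documentation.

```python
def parse_synatk_config(text):
    """Parse `fwaccel synatk config` output into a {name: integer value} dict."""
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        # The value is the last field; some names carry a unit such as "(msec)"
        name, value = parts[0], parts[-1]
        try:
            settings[name] = int(value)
        except ValueError:
            continue
    return settings


def syn_defender_active(settings):
    """Treat a non-zero 'enabled' value as SYN DEFENDER being switched on."""
    return settings.get("enabled", 0) != 0


# Output copied from this post
SAMPLE_OUTPUT = """enabled 0
enforce 0
global_high_threshold 10000
periodic_updates 1
cookie_resolution_shift 6
min_frag_sz 80
high_threshold 5000
low_threshold 1000
score_alpha 100
monitor_log_interval (msec) 60000
grace_timeout (msec) 30000
min_time_in_active (msec) 60000
"""

print(syn_defender_active(parse_synatk_config(SAMPLE_OUTPUT)))
```

A monitor could feed the real command output into parse_synatk_config and raise an alert whenever syn_defender_active flips to True.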

References and related

VAN: Value Added Network