If you fit a certain profile: been in IT for > 20 years, managed to create a few utility scripts in Perl, but never wrapped your head around the newer and flashier Python, this blog post is for you.
Conversely, if you have grown up with Python and find yourself stuck maintaining some obscure legacy Perl code, this post is also for you.
A friend of mine has written a conceptually cool program, which he calls Pythonizer, that converts Perl programs into Python.
I’m sure it won’t do well with special Perl packages and such. In fact, it’s an alpha release, I think. But perhaps for those scripts which use the basic built-in Perl functions and operations, it will do the job.
When I get a chance to try it myself I will give some more feedback here. I have a perfect example in mind, i.e., a self-contained little Perl script which ought to work if anything will.
Conclusion
Old Perl programs have been given new life by Pythonizer, which can convert Perl programs into Python.
Intro
In this spellbinding segment we examine what happened when a user found an inaccessible web site.
Some details
The user, in a corporate environment, reports not being able to access https://login.smartnotice.net/. She has the latest version of Windows 10.
On the trail
I sense something is wrong with SSL because of the type of errors reported by the browser: something to the effect that it can’t make a secure connection.
But I decided to doggedly pursue it because I have a decent background in understanding SSL-related problems, and I was wondering if this was the first sign of a systemic problem. I’m always interested to find little problems and resolve them in a way that addresses bigger issues.
So the first thing I try, to learn more about the SSL versions and ciphers supported, is my go-to site, ssllabs.com (Test your Server: https://www.ssllabs.com/ssltest/). Well, this test failed miserably, and in a way I’ve never seen before: SSLlabs just quickly gave up without any analysis! So we pushed ahead, undaunted.
So I hit the site with curl from my CentOS 8 server (Upgrading WordPress brings a thicket of problems). Curl works fine. But I see it prefers to use TLS 1.3. So I finally buckle down and learn how to properly control the SSL/TLS version in curl. The output from curl --help is misleading, shall we say?
You think using curl --tlsv1.2 is going to use TLS v1.2? Think again. Maybe it will, or maybe it won’t. In fact it tells curl to use TLS version 1.2 or higher. I totally missed understanding that for all these years. What I’m looking for is to determine if the web site is willing to use TLS v1.2 in addition to TLS v1.3.
The ticket is… --tls-max 1.2. This sets the maximum TLS version curl will use to access the URL.
So we have curl -v --tls-max 1.3 https://login.smartnotice.net/
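And to test whether TLS 1.2 is also on offer, cap curl at 1.2 (the exact output from my session isn’t reproduced here, but the handshake simply fails):
curl -v --tls-max 1.2 https://login.smartnotice.net/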
So now we know, this web site requires the latest and greatest TLS v 1.3. Even TLS 1.2 won’t do.
Well, this old corporate environment still offered users a choice of old browsers, including IE 11 and the old Edge browser. These two browsers simply do not support TLS 1.3. But I found even Firefox wasn’t working, although the Chrome browser was.
How to explain all that? How to fix it?
It comes down to a good knowledge of the particular environment. As I think I stated, this corporate environment uses proxies which, in turn, most likely tried to SSL-intercept the traffic. The proxies are old, so they in turn don’t actually support SSL interception of TLS v1.3! They had separate problems with the Chrome browser, so they weren’t intercepting its traffic. This explains why Firefox was broken yet Chrome worked.
So the fix, such as it was, was to disable SSL interception for this request URL so that Firefox would work, and tell the user to use either FF or Chrome.
Just being thorough, when I tested from home with Edge Chromium – the newer Edge browser – it worked, and SSLlabs showed (correctly) that it supports TLS 1.3. Edge in the corporate environment is the older, non-Chromium one. It seems to max out at TLS 1.2. No good.
For good measure I explained the situation to the desktop support people.
Case: closed.
Appendix
How did I decide the proxies didn’t support TLS 1.3? What if this site had some other issue after all? I looked on the web for another web site which only supports TLS 1.3. I hoped badssl.com would have one, but they don’t! Undaunted yet again, I determined to change my own web site, drjohnstechtalk.com, into one that only supports TLS 1.3! This is easy to do with the apache web server. You basically need a line that looks like this:
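For apache’s mod_ssl built against a recent OpenSSL, a directive along these lines does it (a sketch; place it in your SSL virtual host or global SSL config):
SSLProtocol -all +TLSv1.3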
I’ve got my Philips Hue light bulb working with my Amazon Alexa. It’s an older 860-lumen bulb. I also have a power meter. So I went through different intensities, recording the power draw for each. The results are in the table below.
Level (%)    Power (Watts)
100          9.0
90           7.0
80           5.7
70           4.3
60           3.4
50           3.1
40           1.7
30           1.2
20           1.0
10           0.9
5*           0.8
0 (off)      0.3**
Power draw of LED light bulb at various brightness set by Alexa voice command.
So above 60% or so the relationship looks exponential. 50% seems like an outlier.
*By observation, the lowest lighting you can get from your bulbs is 5%.
**Unexpected finding – smartbulbs are vampire devices
I didn’t originally measure the power draw when “off.” You don’t think to do that. Then I gave it some more thought and had an aha moment – the bulb can only be smart if it is always listening for commands. And that, in turn, must create a power draw even when off. A quick measurement and, sure enough, confirmed. Though very small – 0.3 watts – it is not nothing. A typical single-family home has over a hundred bulbs. If they were all smartbulbs, it would add up… I believe small-draw devices – typically those power adapters for cell phones – are called vampire devices.
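To put a rough number on that – my own back-of-the-envelope, not a measurement – 100 such bulbs at 0.3 watts each is 30 watts of constant draw, which over a year (8760 hours) comes to about 263 kWh.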
Conclusion
So we have a very non-linear relationship here. I probably should plot the current draw as well. But you definitely can save energy by lowering the intensity – quite a lot. Then again, LED bulbs draw very little power anyway, so unless you have a bunch of them, why bother?
My second conclusion – a finding I didn’t expect – is that even when off these bulbs are consuming a bit of power. It’s not a lot, 0.3 watts, but it’s something to keep in mind when planning your smartbulb deployment. So, large arrays of smartbulbs? Probably not such a smart idea.
Quick Tip
If you are using OpenSCAD for your 3D model construction, and after creating a satisfactory model do an export to STL, you may observe that nothing at all happens!
I was stuck on this problem for a while. Yes, the solution is obvious for regular users, but I only use the program every few months. If you open the console you will see the problem immediately:
ERROR: Nothing to export! Try rendering first (press F6).
But in my case I had closed the console, forgot there was such a thing, and of course it remembers your settings.
So you have to render your object (F6) before you can export as an STL file.
OpenSCAD is a 3D modelling application that uses CSG – Constructive Solid Geometry. It’s focused on math and basic geometric shapes – perfect for me. https://www.openscad.org/
Intro
vsftpd is a useful daemon which I use to run an ftps service (ftp which uses TLS encryption). Since I am not part of the group that administers the server, it makes sense for me to maintain my own userlist rather than rely on the system password database. vsftpd has a convenient feature which allows this known as virtual users.
In the file /etc/vsftpd-virtual-user.db I have my Berkeley database of users and passwords. See references on how to set this up.
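For orientation, the general recipe is something like the following sketch (not my exact setup – see the references). You build the Berkeley DB from a flat file of alternating usernames and passwords, then point PAM at it; note that pam_userdb wants the path without the .db extension:
# logins.txt holds alternating lines: username, then that user's password
db_load -T -t hash -f logins.txt /etc/vsftpd-virtual-user.db
chmod 600 /etc/vsftpd-virtual-user.db
# /etc/pam.d/vsftpd then contains something like:
#   auth    required pam_userdb.so db=/etc/vsftpd-virtual-user
#   account required pam_userdb.so db=/etc/vsftpd-virtual-user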
The point is that I had this all working last year – 2019 – on my SLES 12SP4 server.
Then it all broke
Then in early May, 2020, all the FTPS transfers stopped working. The status of the vsftpd service hinted that the file /lib64/security/pam_userdb.so could not be loaded. Sure enough, it was missing! I checked some of my other SLES12SP4 servers, some of which are on a different patch schedule. It was missing on some, and present on one. So I “borrowed” pam_userdb.so from the one server which still had it and put it onto my server in /lib64/security. All good. Service restored. But clearly that is a hack.
What’s going on
So I asked a Linux expert what’s going on and got a good explanation.
pam_userdb has been moved to a separate package, named pam-extra
1) http://lists.suse.com/pipermail/sle-security-updates/2020-April/006661.html
2) https://www.suse.com/support/update/announcement/2020/suse-ru-20200822-1/
Advisory ID: SUSE-RU-2020:917-1
Released: Fri Apr 3 15:02:25 2020
Summary: Recommended update for pam
Type: recommended
Severity: moderate
References: 1166510
This update for pam fixes the following issues:
- Moved pam_userdb into a separate package pam-extra. (bsc#1166510)
Installing the package pam-extra should resolve your issue.
I installed the pam-extra package using zypper, and, yes, it creates a /lib64/security/pam_userdb.so file!
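For the record, that is a one-liner:
zypper install pam-extra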
And vsftpd works once more using supported packages.
Conclusion
Virtual users with vsftpd require pam_userdb.so. However, PAM wished to decouple itself from dependencies on external databases, etc., so they bundled this kind of thing into a separate package, pam-extra, more or less in the middle of a patch cycle. So if you had the problem I had, the solution may be as simple as installing the pam-extra package on your system. Although I experienced this on SLES, I believe it has happened or will happen on other Linux flavors as well.
This problem is poorly documented on the Internet.
Intro
You have to dig a little to find out about this somewhat obscure topic. You want to send syslog output, e.g., from the named daemon, to a syslog server with beefed up security, such that it requires the use of TLS so traffic is encrypted. This is how I did that.
The details
This is what worked for me:
...
# DrJ fixes - log local0 to DrJ's dmz syslog server - DrJ 5/6/20
# use local0 for named's query log, but also log locally
# see https://www.linuxquestions.org/questions/linux-server-73/bind-queries-log-to-remote-syslog-server-4175669371/
# @@ means use TCP
$DefaultNetstreamDriver gtls
$DefaultNetstreamDriverCAFile /etc/ssl/certs/GlobalSign_Root_CA_-_R3.pem
$ActionSendStreamDriver gtls
$ActionSendStreamDriverMode 1
$ActionSendStreamDriverAuthMode anon
local0.* @@(o)14.17.85.10:6514
#local0.* /var/lib/named/query.log
local1.* -/var/log/localmessages
#local0.*;local1.* -/var/log/localmessages
local2.*;local3.* -/var/log/localmessages
local4.*;local5.* -/var/log/localmessages
local6.*;local7.* -/var/log/localmessages
...
The above is the important part of my /etc/rsyslog.conf file. The SIEM server is running at IP address 14.17.85.10 on TCP port 6514. It is using a certificate issued by GlobalSign, which an openssl call confirms (see references).
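That openssl check is along these lines (it prints the certificate chain the server presents):
openssl s_client -connect 14.17.85.10:6514 -showcerts < /dev/null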
Other gotchas
I am running on a SLES 15 server. Although it had rsyslog installed, it did not support TLS initially: I was getting a dlopen error. So I figured out I needed to install this module:
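On SLES the TLS (gtls) network stream driver for rsyslog ships as a separate package; installing it resolved the dlopen error. To the best of my knowledge the package is:
zypper install rsyslog-module-gtls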
Intro
I share some Zabbix items I’ve had to create which I find useful.
Low-level discovery to discover IPSEC tunnels on an F5 BigIP
IPSec tunnels are weird insofar as there is one IKE SA but potentially lots of SAs – two for each traffic selector. So if your traffic selector is called proxy-01, some OIDs you’ll see in your SNMP walk will be like …proxy-01.58769, …proxy-01.58770. So to review, do an snmpwalk on the F5 itself. That command is something like
snmpwalk -v3 -l authPriv -u proxyUser -a SHA -A shaAUTHpwd -x AES -X AESpwd -c public 127.0.0.1 SNMPv2-SMI::enterprises >/tmp/snmpwalk-ent
Now… how to translate this into LLD? In my case I have a template, since there are several F5s which need this. The template already has discovery rules for Pool discovery, Virtual server discovery, etc. So the first thing we do is add a Tunnel discovery rule.
Tunnel Discovery Rule
The SNMP OID is clipped at the end. In full it is:
Initially I tried something else, but that did not go so well.
Now we want to know the tunnel status (up or down) and the amount of traffic over the tunnel. We create two item prototypes to get those.
Tunnel Status Item prototype
So, yes, we’re doing some fancy regex to simplify the otherwise ungainly name which would be generated, stripping out the useless stuff with a regsub function which, by the way, is poorly documented. So that’s how we’re going to discover the statuses of the tunnels. In text, the name is:
Tunnel {{#SNMPINDEX}.regsub("\"\/Common\/([^\"]+)\"(.+)","\1\2")} status
I learned how to choose the OID, which is the most critical part, I guess, from a combination of parsing the output of the snmpwalk plus imitation of those other LLD item prototypes, which were written by someone more competent than I.
Now the SNMP value for traffic is a byte count, but you see I set units of bps? I can do that because of the preprocessing steps: Change per second, followed by a Custom multiplier to convert bytes to bits – the same idea as in the bandwidth example further down.
Bytes to traffic rate preprocessing steps
Final tip
For these discovery items, what you want to do is disable Create Enabled and disable Discover. I just run it on the F5s which actually have IPSEC tunnels. Execute now actually works and generates items pretty quickly.
Using the api with a token and security by obscurity
I am taking the approach of pulling the token out of a config file where it has been stored, base85 encoded, because, who uses base85, anyway? I call the following script encode.py:
import sys
from base64 import b85encode

# the string to encode (e.g., your Zabbix API token) is the first argument
s = sys.argv[1]
s_e = s.encode('utf-8')
# base85-encode it; store the encoded value in your config file
s64 = b85encode(s_e)
print('s,s_e,s64', s, s_e, s64)
In my case I pull this encoded token from a config file, but to simplify, let’s say we got it from the command line. This is how that goes, and we use it to create the zapi object which can be used in any subsequent api calls. That is the key.
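A minimal sketch of the decode side, assuming the pyzabbix package and a Zabbix server new enough (5.4+) for API token authentication; the server URL is a placeholder:
import sys
from base64 import b85decode
from pyzabbix import ZabbixAPI

# reverse the encoding: command-line argument -> original API token
token = b85decode(sys.argv[1].encode('utf-8')).decode('utf-8')
zapi = ZabbixAPI('https://zabbix.example.com')  # hypothetical URL
zapi.login(api_token=token)
# zapi is now ready for any subsequent api calls, e.g., zapi.host.get()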
So it’s a few extra lines of code, but the cool thing is that it works. This should be good for version 5.4 and 6.0. Note that if you installed both py-zabbix and pyzabbix, your best bet may be to uninstall both and reinstall just pyzabbix. At least that was my experience going from user/pass to token-based authentication.
Convert DateAndTime SNMP output to human-readable format
Of course this is not very Zabbix-specific, as long as you realize that Zabbix produces the outer skin of the function:
function(value){
// DrJ 2020-05-04
// see https://support.zabbix.com/browse/ZBXNEXT-3899 for SNMP DateAndTime format
'use strict';
//var str = "07 E4 05 04 0C 32 0F 00 2B 00 00";
var str = value;
// alert("str: " + str);
// read values are hex
var y256 = str.slice(0,2); var y = str.slice(3,5); var m = str.slice(6,8);
var d = str.slice(9,11); var h = str.slice(12,14); var min = str.slice(15,17);
// convert to decimal
var y256Base10 = +("0x" + y256);
// convert to decimal
var yBase10 = +("0x" + y);
var Year = 256*y256Base10 + yBase10;
// alert("Year: " + Year);
var mBase10 = +("0x" + m);
var dBase10 = +("0x" + d);
var hBase10 = +("0x" + h);
var minBase10 = +("0x" + min);
var YR = String(Year); var MM = String(mBase10); var DD = String(dBase10);
var HH = String(hBase10);
var MIN = String(minBase10);
// padding
if (mBase10 < 10) MM = "0" + MM; if (dBase10 < 10) DD = "0" + DD;
if (hBase10 < 10) HH = "0" + HH; if (minBase10 < 10) MIN = "0" + MIN;
var Date = YR + "-" + MM + "-" + DD + " " + HH + ":" + MIN;
return Date;
I put that javascript into the preprocessing step of a dependent item, of course.
All my real-life examples do not fill in the last two fields: +/- and UTC offset. So in my case the times must be local times. But consequently I have no idea how a + or – would be represented in hex! So I just ignored those last fields in the SNMP DateAndTime, which otherwise might have been useful.
Here’s an alternative version which calculates how long it’s been, in hours, since the last AV signature update.
// DrJ 2020-05-05
// see https://support.zabbix.com/browse/ZBXNEXT-3899 for SNMP DateAndTime format
'use strict';
//var str = "07 E4 05 04 0C 32 0F 00 2B 00 00";
var Start = new Date();
var str = value;
// alert("str: " + str);
// read values are hex
var y256 = str.slice(0,2); var y = str.slice(3,5); var m = str.slice(6,8); var d = str.slice(9,11); var h = str.slice(12,14); var min = str.slice(15,17);
// convert to decimal
var y256Base10 = +("0x" + y256);
// convert to decimal
var yBase10 = +("0x" + y);
var Year = 256*y256Base10 + yBase10;
// alert("Year: " + Year);
var mBase10 = +("0x" + m);
var dBase10 = +("0x" + d);
var hBase10 = +("0x" + h);
var minBase10 = +("0x" + min);
var YR = String(Year); var MM = String(mBase10); var DD = String(dBase10);
var HH = String(hBase10);
var MIN = String(minBase10);
var Sigdate = new Date(Year, mBase10 - 1, dBase10,hBase10,minBase10);
//difference in hours
var difference = Math.trunc((Start - Sigdate)/1000/3600);
return difference;
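Check DNS-returned IPs against published AWS ranges
This next one – explained more fully just after the code – is a preprocessing script which fetches Amazon’s published IP ranges and checks whether every IP extracted from a DNS lookup result falls inside one of them.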
function customSlice(array, start, end) {
var result = [];
var length = array.length;
// Handle negative start
start = start < 0 ? Math.max(length + start, 0) : start;
// Handle negative end
end = end === undefined ? length : (end < 0 ? length + end : end);
// Iterate over the array and push elements to result
for (var i = start; i < end && i < length; i++) {
result.push(array[i]);
}
return result;
}
function compareRangeWithExtractedIPs(ranges, result) {
// ranges is ["120.52.22.96/27","205.251.249.0/24"]
// I need to know if ips are in some of the ranges
var ipRegex = /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g;
var ips = result.match(ipRegex) || [];
return ips.every(function(ip) {
return ranges.some(function(range) {
var rangeParts = range.split('/');
var rangeIp = rangeParts[0];
var rangeCidr = rangeParts[1];
var rangeIpParts = rangeIp.split('.').map(Number);
var ipParts = ip.split('.').map(Number);
var rangeBinary = rangeIpParts.map(toBinary).join('');
var ipBinary = ipParts.map(toBinary).join('');
return ipBinary.substring(0, rangeCidr) === rangeBinary.substring(0, rangeCidr);
});
});
}
function toBinary(num) {
var binary = num.toString(2);
return '00000000'.substring(binary.length) + binary;
}
function fetchData(url) {
try {
Zabbix.log(4, 'Starting GET request to the provided URL');
var result = {},
req = new HttpRequest(),
resp;
req.addHeader('Content-Type: application/json');
resp = req.get(url);
if (req.getStatus() != 200) {
throw 'Response code: ' + req.getStatus();
}
resp = JSON.parse(resp);
result.data = resp;
var ipPrefixes = resp.prefixes.map(function(prefix) {
return prefix.ip_prefix;
});
result.data = ipPrefixes;
} catch (error) {
Zabbix.log(4, 'GET request failed: ' + error);
result = {};
}
return JSON.stringify(result.data);
}
var result = fetchData('https://ip-ranges.amazonaws.com/ip-ranges.json');
var ranges = JSON.parse(result);
return compareRangeWithExtractedIPs(ranges, value);
This is the preprocessing step of a dependent item which ran a net.dns.record item to do a DNS lookup and get results as you see them from dig. I don’t fully understand how it works! My colleague wrote it, and he used Copilot mostly! He started making progress with Copilot once he constrained it to use “duct tape JavaScript.” Apparently that’s a thing. This is defined within a template. It compares all of the returned IPs to a JSON list of expected possible ranges, which are pulled directly from an AWS endpoint. There are currently 7931 ranges, who knew?
Since this returns a true or false, there are two subsequent preprocessing steps which replace true with 0 and false with 1, and we set the type to Numeric unsigned.
Calculated bandwidth from an interface that only provides byte count
Again in this example the assumption is you have an item, probably from SNMP, that lists the total inbound/outbound byte count of a network interface – hopefully stored as a 64-bit number to avoid frequent rollovers. But the quantity that really excites you is bandwidth, such as megabits per second.
Use a calculated item as in this example for Bluecoat ProxySG:
change(sgProxyInBytesCount)*8/1000000/300
Give it type numeric, Units of mbps. sgProxyInBytesCount is the key for an SNMP monitor that uses OID
IF-MIB::ifHCInOctets.{$INTERFACE_TO_MEASURE}
where {$INTERFACE_TO_MEASURE} is a macro set for each proxy with the SNMP-reported interface number that we want to pull the statistics for.
The 300 in the denominator of the calculated item is required for me because my item is run every five minutes.
Alternative
No one really cares about the actual total value of byte count, right? So just re-purpose the In Bytes Count item a bit as follows:
add preprocessing step: Change per second
add second preprocessing step, Custom multiplier 8e-6
The first step gives you units of bytes/second which is less interesting than mbps, which is given by the second step. So the final units are mbps.
Be sure to put the units as !mbps into the Zabbix item, or else you may wind up with funny things like Kmbps in your graphs!
Creating a baseline
Even as of Zabbix v 5, there is no built-in baseline item type, which kind of sucks. Baseline can mean many different things to many people – it really depends on the data. In the corporate world, where I’m looking at bandwidth, my data has these distinct characteristics:
varies by hour-of-day, e.g., mornings see heavier usage than afternoons
there is the “Friday effect” where somewhat less usage is seen on Fridays, and extremely less usage occurs on weekends, hence variability by day-of-week
probably varies by day of month, e.g., month-end closings
So for this type of data (except the last criterion) I have created an appropriate baseline. Note I would do something different if I were graphing something like the solar generation from my solar panels, where the day-of-week variability does not exist.
Getting to the point, I have created a rolling lookback item. This needs to be created as a Zabbix Item of type Calculated. The formula is as follows:
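Something like the following (a sketch in Zabbix 5.4+ calculated-item syntax; adapt to your version):
(last(//sgProxyInBytesCount,#1:now-1w) +
 last(//sgProxyInBytesCount,#1:now-2w) +
 last(//sgProxyInBytesCount,#1:now-3w) +
 last(//sgProxyInBytesCount,#1:now-4w) +
 last(//sgProxyInBytesCount,#1:now-5w) +
 last(//sgProxyInBytesCount,#1:now-6w)) / 6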
In this example sgProxyInBytesCount is my key from the reference item. Breaking it down, it does a rolling lookback of the measurements taken at this time of day on this day of the week over the last six weeks and averages them. Voila, baseline! The more weeks you include, the more likely you are to include data you’d rather not – holidays, days when things were busted, etc. I’d like to have a baseline that is from a fixed time, like “all of last year.” I have no idea how. I actually don’t think it’s possible.
But, anyway, the baseline approach above should generally work for any numeric item.
Refinement
The above approach only gives you six measurements, hence 1/sqrt(6) ~ 40% standard deviation by the law of large numbers, which is still pretty jittery as it turns out. So I came up with this refined approach which includes 72 measurements, hence 1/sqrt(72) ~ 12% st dev. I find that to be closer to what you intuitively expect in a baseline – a smooth approximation of the past. Here is the refined function:
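In the same spirit, a sketch: average a full hour’s worth of five-minute samples at each weekly offset, giving 12 × 6 = 72 measurements:
(avg(//sgProxyInBytesCount,1h:now-1w) +
 avg(//sgProxyInBytesCount,1h:now-2w) +
 avg(//sgProxyInBytesCount,1h:now-3w) +
 avg(//sgProxyInBytesCount,1h:now-4w) +
 avg(//sgProxyInBytesCount,1h:now-5w) +
 avg(//sgProxyInBytesCount,1h:now-6w)) / 6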
I would have preferred a one-hour interval centered around one week ago, etc., e.g., something like 1w+30m, but such date arithmetic does not seem possible in Zabbix functions. And, yeah, I could put 84600s (i.e., 86400 – 1800), but that is much less meaningful and so harder to maintain. Here is a three-hour graph whose first half still reflects the original (jittery) baseline, and last half the refined function.
Latter part has smoothed baseline in light green
What I do not have mastered is whether we can easily use a proper smoothing function. It does not seem to be a built-in offering of Zabbix. Perhaps it could be faked by a combination of pre-processing and Javascript? I simply don’t know, and it’s more than I wish to tackle for the moment.
Data gap between multiple item measurements looks terrible in Dashboard graph – solution
In a Dashboard if you are graphing items which were not all measured at the same time, the results can be frustrating. For instance, an item and its baseline as calculated above. The central part of the graph will look fine, but at either end giant sections will be missing when the timescale of display is 30 minutes or 60 minutes for items measured every five minutes or so. Here’s an example before I got it totally fixed.
Zabbix item timing mismatch
See the left side – how it’s broken up? I had begun my fix, so the right side is OK.
The data gap solution
Use Scheduling Intervals in defining the items. Say you want a measurement every five minutes. Then make your scheduling interval m/5 in all the items you are putting on the same graph. For good measure, make the regular interval value infrequent. I use a macro {$UPDATE_LONG}. What this does is force Zabbix to measure all the items at the same time, in this case every five minutes on minutes divisible by five. Once I did that, my incoming bandwidth item and its corresponding baseline item aligned nicely.
Low-level Discovery
I cottoned on to the utility of this part of Zabbix a little late. Hey, slow learner, but I eventually got there. What I found in my F5 devices is that using SNMP to monitor the /var filesystem was a snap: it was always device 32 (final OID digit). But /var/log monitoring? Not so much. Every device seemed different, with no obvious pattern. Active and standby units – identical hardware – and some would be 53, the partner 55. Then I rebooted a device and its number changed! So, clearly, dynamically assigned and no way was I going to keep up with it. I had learned the numbers by doing an snmpwalk. The solution to this dynamically changing OID number is to use low-level discovery.
Tip: using zabbix_sender in a more robust fashion
We run the Zabbix proxies as pairs. They are not run as a cluster. Instead one is active and the other is a warm standby. Then we can upgrade at our leisure the standby proxy, switch the hosts to it, then upgrade the other now-unused proxy.
But our scripts which send results using zabbix_sender run on other servers. Their data stops being recorded when the switch is made. What to do?
I learned you can send to both Zabbix proxies. It will fail on the standby one and succeed on the other. Since one proxy is always active, it will always succeed in sending its data!
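In a script that is as simple as sending the same value twice (the proxy names here are hypothetical):
zabbix_sender -z proxy1.example.com -s "$HOST" -k "$KEY" -o "$VALUE"
zabbix_sender -z proxy2.example.com -s "$HOST" -k "$KEY" -o "$VALUE"
One of the two calls fails; that is expected and harmless.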
A nice DNS synthetic monitor
It would have been so easy for Zabbix to have built in the capability of doing synthetic DNS checks against your DNS servers. But, alas, they left it out. Which leaves it to us to fill that gap. Here is a nice and simple but surprisingly effective script for doing synthetic DNS checks. You put it in the external script directory of whatever proxy is monitoring your DNS host. I called it dns.sh.
#!/bin/sh
# arg1 - hostname of nameserver
# arg2 - DNS server to test
# arg3 - FQDN
# arg4 - RR type
# arg5 - match arg
# [arg6] - tcpflag # this argument is optional
# if you set DEBUG=1, and debug through zabbix, set item type to text
DEBUG=0
timeout=2 # secs - seems a good value
name=$1
nameserver=$2
record=$3
type=$4
match=$5
tcpflag=$6
[[ "$DEBUG" -eq "1" ]] && echo "name: $name, nameserver: $nameserver , record: $record , type: $type , match pattern: $match, tcpflag: $tcpflag"
[[ "$tcpflag" = "y" ]] || [[ "$tcpflag" = "Y" ]] && PROTO="+tcp"
# unless you set tries to 1 it will try three times by default!
MATCH=$(dig +noedns +short $PROTO +timeout=$timeout +tries=1 $type $record @${nameserver} )
[[ "$DEBUG" -eq "1" ]] && echo MATCHed line is $MATCH
return=0
[[ "$MATCH" =~ $match ]] && return=1
[[ "$DEBUG" -eq "1" ]] && echo return $return
echo $return
It gives a value of 1 if it matched the match expression, 0 otherwise.
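The corresponding external-check item key would look something like this (the argument values are hypothetical):
dns.sh[ns1,8.8.8.8,drjohnstechtalk.com,A,50.17]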
With Zabbix version 7 we finally replaced this nice external dns.sh script with built-in agent items. But it was a little tricky to throw that sort of item into a template – we had to define a host macro containing the IP of the host!
Convert a string to a number in item output
I’ve been bit by this a couple times. Create a dependent item. In the preprocessing steps first do a RegEx match and keep \0. Important to also check the Custom on Fail checkbox. Set value to 0 on fail. In the second step I used a replace and replaced the expected string with a 1.
I had a harder case where the RegEx had alternate strings. But that’s solvable as well! I created additional Replace steps with all possible alternate strings, setting each one to be replaced with 1. Kludge, yes, but it works. And that is how you turn a text item into a boolean style output of 0 or 1.
Conclusion
A couple of really useful but poorly documented items are shared. Perhaps more will be added in the future.
Intro
All of a sudden one day I could not access the GUI of one of my security appliances. It had only worked yesterday. CLI access kind of worked – until it didn’t. It was the standby member of a cluster, so I tried the active unit. Same issues. I have some ill-defined involvement with the firewall the traffic was traversing, so I tried to debug the problem, without success. So I brought in a real firewall expert.
More details
Of course I knew to check the firewall logs. Well, they showed this traffic (https and ssh) to have been accepted, no problems. Hmm. I suspected some weird IPS thing. IPS is kind of a big black box to me as I don’t deal with it. But I have seen cases where it blocks traffic without logging the fact. But that concern led me to bring in the expert.
By myself I had gotten it to the point where I had run tcpdump (I had totally forgotten how to use fw monitor; now I will know, and can refer to my own recent blog post) on the corporate network side as well as the protected subnet side. And I saw that packets were hitting the corporate network interface that weren’t crossing over to the protected subnet. Why? But first, some more about the symptoms.
The strange behaviour of my ssh session
The web GUI just would not load the home page. But ssh was a little different. I could indeed log in. But my ssh froze every time I changed to the /var/log directory and did a detailed directory listing ls -l. The beginning of the file listing would come back, and then just hang there mid-stream, frozen. In my tcpdump I noticed that the packets that did not get through were larger than the ones sent in the beginning of the session – by a lot. 1494 data bytes or something like that. So I could kind of see that with ssh, you normally send smallish packets, until you need a bigger one for something like a detailed directory listing! And https sends a large server certificate at the beginning of the session so it makes sense that it would hang if those packets were being stopped. So the observed behaviour makes sense in light of the dropping of the large packets. But that doesn’t explain why.
I asked a colleague to try it and they got similar results.
The solution method
It had nothing to do with IPS. The firewall guy noticed and did several things:
He agreed the firewall logs showed my connection being accepted.
He saw that another firewall admin had installed policy around the time the problem began. We analyzed what was changed and concluded that was a false lead. No way those changes could have caused this problem.
He switched the active firewall to standby so that we used the standby unit. It worked just fine!
He observed that the current active unit became active around the time of the problem, due to a problem with an interface on the normally active unit.
I probably would have been fine just working on the standby, but I didn’t want to crimp his style, so he continued investigating… and found the ultimate root cause.
And finally the solution
He noticed that on the bad firewall the one interface – I swear I am not making this up – had been configured with a non-standard MTU! 1420 instead of 1500.
Analysis
I did a head slap when he shared that finding. Of course I should have looked for that. It explains everything. The OS was dropping the packets, not the firewall blade per se. And I knew the history. Some years back these firewalls were used for testing OLTV, a tunneling technology to extend layer 2 across physically separated subnets. That never did work to my satisfaction. One of the issues we encountered was a problem with large packets, so the firewall guy at the time tried this setting out to help. Normally firewalls don’t fail, so the one unit where this MTU setting was present just wasn’t really used, except for brief moments during OS upgrades. And, funny to say, this misconfiguration was even propagated from older hardware to newer! The firewall guys have a procedure where they suck up all the configuration from the old firewall and restore it to the newer one, mapping updated interface names, etc., as needed.
Well, at least we found it before too many others complained. Though, as expected, complain they did, the next day.
Aside: where is curl?
I normally would have tested the web page from the firewall itself using curl. But curl has disappeared from Gaia v 80.20. And there’s no wget either. How can such a useful and universal utility be missing? The firewall guy looked it up and quickly found that instead of curl, they have curl_cli. Who knew?
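So the quick test from the firewall becomes something like this (hypothetical URL; curl_cli appears to accept the familiar curl arguments):
curl_cli -v https://192.168.1.1/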
Conclusion
The strange case of the large packets dropped by a firewall, but not by the firewall blade, was resolved the same day it occurred. It took a partnership of two people bringing their domain-specific knowledge to bear on the problem to arrive at the solution.
Intro
Scripts are normally not worth sharing because they are so easy to construct. This one illustrates several different concepts, so it may be of interest to someone else besides myself:
packet trace utility in Checkpoint firewall Gaia
send Ctrl-C interrupt to a process which has been run in the background
giving unique filenames for each capture
a general approach to tackling the challenge of breaking a potentially large output into manageable chunks
The script
I wanted to learn about unexpected VPN client disconnects that a user, Sandy, was experiencing. Her external IP is 99.221.205.103.
while /bin/true; do
# date +%H%M inserts the current Hour (HH) and minute (MM).
file=/tmp/sandy`date +%H%M`.cap
# fw monitor is better than tcpdump because it looks at all interfaces
fw monitor -o $file -l 60 -e "accept src=99.221.205.103 or dst=99.221.205.103;" &
# $! picks up the process number of the command we backgrounded just above
pid=$!
sleep 600
#sleep 90
kill $pid
sleep 3
gzip $file
done
This type of tracing of this VPN session produces about 20 MB of data every 10 minutes. I want to be able to easily share the trace file afterwards in an email. And smaller files will be faster when analyzed in Wireshark.
The script itself I run in the background:
# ./sandy.sh &
And to make sure I don’t get logged out, I just run a slow PING afterwards:
# ping -i 45 1.1.1.1
Alternate approaches
In retrospect I could have simply used the -ci argument and had the process terminate itself after a certain number of packets were recorded, saving myself the effort of killing that process. But oh well, it is what it is.
Small tip to see all packets
Turn acceleration off:
fwaccel stat
fwaccel off
fwaccel on (when you’re done).
Conclusion
I share a script I wrote today that is simple, but illustrates several useful concepts.
Intro
I just learned of this really clear explanation of BGP hijacking, including interactive links, and what can be done to improve the current situation, namely, implement RPKI. I guess this weighs on me lately after I learned about another massive BGP hijacking out of Russia. See the references for these links.
References and related
This is the post I was referring to above, very interactive and not too technical. Is BGP Safe Yet?