Categories
Admin

OpenSCAD export to STL does nothing

Quick Tip
If you are using OpenSCAD for your 3D model construction, and after creating a satisfactory model do an export to STL, you may observe that nothing at all happens!

I was stuck on this problem for a while. Yes, the solution is obvious to regular users, but I only use OpenSCAD every few months. If you open the console you will see the problem immediately:

ERROR: Nothing to export! Try rendering first (press F6).

But in my case I had closed the console, forgotten there was such a thing – and of course OpenSCAD remembers your settings.

So you have to render your object (F6) before you can export as an STL file.
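As an aside, if you script your model builds, the command line does the render and the export in one shot, so this trap never comes up. A minimal sketch, assuming openscad is in your PATH and a hypothetical source file model.scad:

openscad -o model.stl model.scad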

References and related
I don’t know why this endplate design blog post which I wrote never caught on. I think the pictures are cool.

OpenSCAD is a 3D modelling application that uses CSG – Constructive Solid Geometry. It’s very focused on math and basic geometric shapes – perfect for me. https://www.openscad.org/

Categories
Admin Linux

vsftpd Virtual Users stopped working after patching: the solution

Intro
vsftpd is a useful daemon which I use to run an ftps service (ftp which uses TLS encryption). Since I am not part of the group that administers the server, it makes sense for me to maintain my own userlist rather than rely on the system password database. vsftpd has a convenient feature which allows this known as virtual users.

More details
In /etc/pam.d/vsftpd.virtual I have:

auth required pam_userdb.so db=/etc/vsftpd/vsftpd-virtual-user
account required pam_userdb.so db=/etc/vsftpd/vsftpd-virtual-user
session required pam_loginuid.so

In the file /etc/vsftpd-virtual-user.db I have my Berkeley database of users and passwords. See references on how to set this up.
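For reference, the database is typically built with db_load from the Berkeley DB utilities. A sketch, assuming a hypothetical plain-text file logins.txt containing alternating username and password lines:

# build the Berkeley DB file vsftpd's PAM module reads
# logins.txt alternates lines: username, then that user's password
db_load -T -t hash -f logins.txt /etc/vsftpd-virtual-user.db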

The point is that I had this all working last year – 2019 – on my SLES 12SP4 server.

Then it all broke
Then in early May, 2020, all the FTPS logins stopped working. The status of the vsftpd service hinted that the file /lib64/security/pam_userdb.so could not be loaded. Sure enough, it was missing! I checked some of my other SLES 12 SP4 servers, some of which are on a different patch schedule. It was missing on some, and present on one. So I “borrowed” pam_userdb.so from the one server which still had it and put it onto my server in /lib64/security. All good. Service restored. But clearly that is a hack.

What’s going on
So I asked a Linux expert what’s going on and got a good explanation.

pam_userdb has been moved to a separate package, named pam-extra
 
1) http://lists.suse.com/pipermail/sle-security-updates/2020-April/006661.html
2) https://www.suse.com/support/update/announcement/2020/suse-ru-20200822-1/
 
Advisory ID: SUSE-RU-2020:917-1
Released: Fri Apr 3 15:02:25 2020
Summary: Recommended update for pam
Type: recommended
Severity: moderate
References: 1166510
This update for pam fixes the following issues:
 
- Moved pam_userdb into a separate package pam-extra. (bsc#1166510)
 
Installing the package pam-extra should resolve your issue.

I installed the pam-extra package using zypper, and, yes, it creates a /lib64/security/pam_userdb.so file!
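For the record, that is the one-liner (run as root):

zypper install pam-extra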

And vsftpd works once more using supported packages.

Conclusion
Virtual users with vsftpd require pam_userdb.so. However, PAM wished to decouple itself from dependencies on external databases, so this functionality was bundled into a separate package, pam-extra, more-or-less in the middle of a patch cycle. So if you had the problem I had, the solution may be as simple as installing the pam-extra package on your system. Although I experienced this on SLES, I believe it has happened or will happen on other Linux flavors as well.

This problem is poorly documented on the Internet.


References and related

https://www.cyberciti.biz/tips/centos-redhat-vsftpd-ftp-with-virtual-users.html

Categories
Admin Linux Network Technologies

Configure rsyslog to send syslog to SIEM server running TLS

Intro
You have to dig a little to find out about this somewhat obscure topic. You want to send syslog output, e.g., from the named daemon, to a syslog server with beefed up security, such that it requires the use of TLS so traffic is encrypted. This is how I did that.

The details
This is what worked for me:

...
# DrJ fixes - log local0 to DrJ's dmz syslog server - DrJ 5/6/20
# use local0 for named's query log, but also log locally
# see https://www.linuxquestions.org/questions/linux-server-73/bind-queries-log-to-remote-syslog-server-4175669371/
# @@ means use TCP
$DefaultNetstreamDriver gtls
$DefaultNetstreamDriverCAFile /etc/ssl/certs/GlobalSign_Root_CA_-_R3.pem
$ActionSendStreamDriver gtls
$ActionSendStreamDriverMode 1
$ActionSendStreamDriverAuthMode anon
 
local0.*                                @@(o)14.17.85.10:6514
#local0.*                               /var/lib/named/query.log
local1.*                                -/var/log/localmessages
#local0.*;local1.*                      -/var/log/localmessages
local2.*;local3.*                       -/var/log/localmessages
local4.*;local5.*                       -/var/log/localmessages
local6.*;local7.*                       -/var/log/localmessages

The above is the important part of my /etc/rsyslog.conf file. The SIEM server is running at IP address 14.17.85.10 on TCP port 6514. It is using a certificate issued by Globalsign. An openssl call confirms this (see references).

Other gotchas
I am running on a SLES 15 server. Although it had rsyslog installed, it did not support TLS initially. I was getting a dlopen error. So I figured out I needed to install this module:

rsyslog-module-gtls
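On SLES that should be a zypper one-liner, followed by a restart so rsyslog picks up the new module:

zypper install rsyslog-module-gtls
systemctl restart rsyslog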

References and related
How to find the server’s certificate using openssl. Very briefly, use the openssl s_client submenu.
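Something along these lines, using the SIEM server from the example above, displays the certificate chain it presents:

openssl s_client -connect 14.17.85.10:6514 -showcerts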

The rsyslog site itself probably has the complete documentation. Though I haven’t looked at it thoroughly, it seems very complete.

Categories
Admin JavaScript Network Technologies

Practical Zabbix examples

Intro
I share some Zabbix items I’ve had to create which I find useful.

Convert DateAndTime SNMP output to human-readable format

Of course this is not very Zabbix-specific, as long as you realize that Zabbix produces the outer skin of the function:

function (value) {
// DrJ 2020-05-04
// see https://support.zabbix.com/browse/ZBXNEXT-3899 for SNMP DateAndTime format
'use strict';
//var str = "07 E4 05 04 0C 32 0F 00 2B 00 00";
var str = value;
// alert("str: " + str);
// read values are hex
var y256 = str.slice(0,2); var y = str.slice(3,5); var m = str.slice(6,8); 
var d = str.slice(9,11); var h = str.slice(12,14); var min = str.slice(15,17);
// convert to decimal
var y256Base10 = +("0x" + y256);
// convert to decimal
var yBase10 = +("0x" + y);
var Year = 256*y256Base10 + yBase10;
//  alert("Year: " + Year);
var mBase10 = +("0x" + m);
var dBase10 = +("0x" + d);
var hBase10 = +("0x" + h);
var minBase10 = +("0x" + min);
var YR = String(Year); var MM = String(mBase10); var DD = String(dBase10);
var HH = String(hBase10);
var MIN = String(minBase10);
// padding
if (mBase10 < 10)  MM = "0" + MM; if (dBase10 < 10) DD = "0" + DD;
if (hBase10 < 10) HH = "0" + HH; if (minBase10 < 10) MIN = "0" + MIN;
var Date = YR + "-" + MM + "-" + DD + " " + HH + ":" + MIN;
return Date;
}

I put that javascript into the preprocessing step of a dependent item, of course.

All my real-life examples do not fill in the last two fields: +/-, UTC offset. So in my case the times must be local times. But consequently I have no idea how a + or – would be represented in hex! So I just ignored those last fields in the SNMP DateAndTime which otherwise might have been useful.

Here’s an alternative version which calculates how long it’s been, in hours, since the last AV signature update.

// DrJ 2020-05-05
// see https://support.zabbix.com/browse/ZBXNEXT-3899 for SNMP DateAndTime format
'use strict';
//var str = "07 E4 05 04 0C 32 0F 00 2B 00 00";
var Start = new Date();
var str = value;
// alert("str: " + str);
// read values are hex
var y256 = str.slice(0,2); var y = str.slice(3,5); var m = str.slice(6,8); var d = str.slice(9,11); var h = str.slice(12,14); var min = str.slice(15,17);
// convert to decimal
var y256Base10 = +("0x" + y256);
// convert to decimal
var yBase10 = +("0x" + y);
var Year = 256*y256Base10 + yBase10;
//  alert("Year: " + Year);
var mBase10 = +("0x" + m);
var dBase10 = +("0x" + d);
var hBase10 = +("0x" + h);
var minBase10 = +("0x" + min);
var YR = String(Year); var MM = String(mBase10); var DD = String(dBase10);
var HH = String(hBase10);
var MIN = String(minBase10);
var Sigdate = new Date(Year, mBase10 - 1, dBase10,hBase10,minBase10);
//difference in hours
var difference = Math.trunc((Start - Sigdate)/1000/3600);
return difference;

Calculated bandwidth from an interface that only provides byte count
Again in this example the assumption is you have an item, probably from SNMP, that lists the total inbound/outbound byte count of a network interface – hopefully stored as a 64-bit number to avoid frequent rollovers. But the quantity that really excites you is bandwidth, such as megabits per second.

Use a calculated item as in this example for Bluecoat ProxySG:

change(sgProxyInBytesCount)*8/1000000/300

Give it type numeric, Units of mbps. sgProxyInBytesCount is the key for an SNMP monitor that uses OID

IF-MIB::ifHCInOctets.{$INTERFACE_TO_MEASURE}

where {$INTERFACE_TO_MEASURE} is a macro set for each proxy with the SNMP-reported interface number that we want to pull the statistics for.

The 300 in the denominator of the calculated item is required for me because my item is run every five minutes.
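To make the arithmetic concrete with made-up numbers: if the byte counter increases by 112,500,000 during one 300-second polling interval, the item evaluates to 112500000*8/1000000/300 = 3, i.e., 3 mbps.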

Alternative
No one really cares about the actual total value of byte count, right? So just re-purpose the In Bytes Count item a bit as follows:

  • add preprocessing step: Change per second
  • add second preprocessing step, Custom multiplier 8e-6

The first step gives you units of bytes/second; the second step converts that to the more interesting mbps. So the final units are mbps.

Conclusion
A couple of really useful but poorly documented items are shared. Perhaps more will be added in the future.


References and related

https://support.zabbix.com/browse/ZBXNEXT-3899 for SNMP DateAndTime format

My first Zabbix post was mostly documentation of a series of disasters and unfinished business.

Categories
Admin Firewall

The IT Detective Agency: large packets dropped by firewall, but logs show OK

Intro
All of a sudden one day I could not access the GUI of one my security appliances. It had only worked yesterday. CLI access kind of worked – until it didn’t. It was the standby part of a cluster so I tried the active unit. Same issues. I have some ill-defined involvement with the firewall the traffic was traversing, so I tried to debug the problem without success. So I brought in a real firewall expert.

More details
Of course I knew to check the firewall logs. Well, they showed this traffic (https and ssh) to have been accepted, no problems. Hmm. I suspected some weird IPS thing. IPS is kind of a big black box to me as I don’t deal with it. But I have seen cases where it blocks traffic without logging the fact. But that concern led me to bring in the expert.

By myself I had gotten it to the point where I had done tcpdump (I had totally forgotten how to use fw monitor. Now I will know and refer to my own recent blog post) on the corporate network side as well as the protected subnet side. And I saw that packets were hitting the corporate network interface that weren’t crossing over to the protected subnet. Why? But first some more about the symptoms.

The strange behaviour of my ssh session
The web GUI just would not load the home page. But ssh was a little different. I could indeed log in. But my ssh froze every time I changed to the /var/log directory and did a detailed directory listing ls -l. The beginning of the file listing would come back, and then just hang there mid-stream, frozen. In my tcpdump I noticed that the packets that did not get through were larger than the ones sent in the beginning of the session – by a lot. 1494 data bytes or something like that. So I could kind of see that with ssh, you normally send smallish packets, until you need a bigger one for something like a detailed directory listing! And https sends a large server certificate at the beginning of the session so it makes sense that it would hang if those packets were being stopped. So the observed behaviour makes sense in light of the dropping of the large packets. But that doesn’t explain why.

I asked a colleague to try it and they got similar results.

The solution method
It had nothing to do with IPS. The firewall guy noticed and did several things.

  • He agreed the firewall logs showed my connection being accepted.
  • He saw that another firewall admin had installed policy around the time the problem began. We analyzed what was changed and concluded that was a false lead. No way those changes could have caused this problem.
  • He switched the active firewall to standby so that we used the standby unit. It worked just fine!
  • He observed that the current active unit became active around the time of the problem, due to a problem with an interface on the normally active unit.

I probably would have been fine to just work using the standby but I didn’t want to crimp his style, so he continued in investigating…and found the ultimate root cause.

And finally the solution
He noticed that on the bad firewall the one interface – I swear I am not making this up – had been configured with a non-standard MTU! 1420 instead of 1500.
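Since Gaia is Linux underneath, a quick sanity check for a non-standard MTU from expert mode is something like the following (plain Linux commands, nothing Check Point specific):

ip link show | grep -i mtu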

Analysis
I did a head slap when he shared that finding. Of course I should have looked for that. It explains everything. The OS was dropping the packet, not the firewall blade per se. And I knew the history. Some years back these firewalls were used for testing OLTV, a tunneling technology to extend layer 2 across physically separated subnets. That never did work to my satisfaction. One of the issues we encountered was a problem with large packets. So the firewall guy at the time tried this out to help. Normally firewalls don’t fail so the one unit where this MTU setting was present just wasn’t really used, except for brief moments during OS upgrade. And, funny to say, this mis-configuration was even propagated from older hardware to newer! The firewall guys have a procedure where they suck up all the configuration from the old firewall and restore to the newer one, mapping updated interface names, etc, as needed.

Well, at least we found it before too many others complained. Though, as expected, complain they did, the next day.

Aside: where is curl?
I normally would have tested the web page from the firewall itself using curl. But curl has disappeared from Gaia v 80.20. And there’s no wget either. How can such a useful and universal utility be missing? The firewall guy looked it up and quickly found that instead of curl, they have curl_cli. Who knew?

Conclusion
The strange case of the large packets dropped by a firewall, but not by the firewall blade, was resolved the same day it occurred. It took a partnership of two people bringing their domain-specific knowledge to bear on the problem to arrive at the solution.

Categories
Admin Firewall Network Technologies Security

Linux shell script to cut a packet trace every 10 minutes on Checkpoint firewall

Intro
Scripts are normally not worth sharing because they are so easy to construct. This one illustrates several different concepts so may be of interest to someone else besides myself:

  • packet trace utility in Checkpoint firewall Gaia
  • send Ctrl-C interrupt to a process which has been run in the background
  • giving unique filenames for each cut
  • general approach to tackling the challenge of breaking a potentially large output into manageable chunks

The script
I wanted to learn about unexpected VPN client disconnects that a user, Sandy, was experiencing. Her external IP is 99.221.205.103.

while /bin/true; do
# date +%H%M inserts the current Hour (HH) and minute (MM).
 file=/tmp/sandy`date +%H%M`.cap
# fw monitor is better than tcpdump because it looks at all interfaces
 fw monitor -o $file -l 60 -e "accept src=99.221.205.103 or dst=99.221.205.103;" &
# $! picks up the process number of the command we backgrounded just above
 pid=$!
 sleep 600
 #sleep 90
 kill $pid
 sleep 3
 gzip $file
done

This type of tracing of this VPN session produces about 20 MB of data every 10 minutes. I want to be able to easily share the trace file afterwards in an email. And smaller files will be faster when analyzed in Wireshark.

The script itself I run in the background:
# ./sandy.sh &
And to make sure I don’t get logged out, I just run a slow PING afterwards:
# ping -i 45 1.1.1.1

Alternate approaches
In retrospect I could have simply used the -ci argument and had the process terminate itself after a certain number of packets were recorded, and saved myself the effort of killing that process. But oh well, it is what it is.
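For the record, a sketch of that alternative, assuming fw monitor’s -ci option (stop after capturing this many inbound packets) behaves as the cheat sheets describe:

fw monitor -o $file -ci 100000 -e "accept src=99.221.205.103 or dst=99.221.205.103;"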

Small tip to see all packets
Turn acceleration off:
fwaccel stat
fwaccel off
fwaccel on (when you’re done).

Conclusion
I share a script I wrote today that is simple, but illustrates several useful concepts.

References and related
fw monitor cheat sheet.

The standard packet analyzer everyone uses is Wireshark from https://wireshark.org/

Categories
Admin Apache CentOS Linux Security

Trying to upgrade WordPress brings a thicket of problems

Intro
WordPress tells me to upgrade to version 5.4. But when I try, it says nope, your version of php is too old. Now admittedly, I’m running on an ancient CentOS server, now at version 6.10, which I set up back in 2012 I believe.

I’m pretty comfortable with CentOS so I wanted to continue with it, but just on a newer version at Amazon. I don’t like being taken advantage of, so I also wanted to avoid those outfits which charge by the hour for providing CentOS, which should really be free. Those costs can really add up.

Lots of travails setting up my AWS image, and then…

I managed to find a CentOS amongst the community images. I chose centos-8-minimal-install-201909262151 (ami-01b3337aae1959300).

OK. Brand new CentOS 8 image, 8.1.1911 after patching it, which will be supported for 10 years. Surely it has the latest and greatest??

Well, I’m not so sure…

If only I had known

I really wish I had seen this post earlier. It would have been really, really helpful: https://blog.ssdnodes.com/blog/how-to-install-wordpress-on-centos-7-with-lamp-tutorial/

But I didn’t see it until after I had done all the work below the hard way. Oh well.
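In hindsight, here is the set of development packages I ended up installing piecemeal below, gathered into one command for anyone who wants to skip ahead:

yum install gcc expat-devel pcre-devel libxml2-devel sqlite-devel openssl-devel libcurl-devel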

When I install php I get version 7.2.11. WordPress is telling me I need a minimum of php version 7.3. If i download the latest php, it tells me to download the latest apache. So I do. Version 2.4.43. I also install gcc, anticipating some compiling in my future…

But apache won’t even configure:

httpd-2.4.43]$ ./configure --enable-so
checking for chosen layout... Apache
checking for working mkdir -p... yes
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target system type... x86_64-pc-linux-gnu
configure:
configure: Configuring Apache Portable Runtime library...
configure:
checking for APR... no
configure: error: APR not found.  Please read the documentation.
  --with-apr=PATH         prefix for installed APR or the full path to
                             apr-config
  --with-apr-util=PATH    prefix for installed APU or the full path to
                             apu-config
 
(apr-util configure)
checking for APR... no
configure: error: APR could not be located. Please use the --with-apr option.
 
try:
 
 ./configure --with-apr=/usr/local/apr
 
but
 
-D_GNU_SOURCE   -I/usr/local/src/apr-util-1.6.1/include -I/usr/local/src/apr-util-1.6.1/include/private  -I/usr/local/apr/include/apr-1    -o xml/apr_xml.lo -c xml/apr_xml.c && touch xml/apr_xml.lo
xml/apr_xml.c:35:10: fatal error: expat.h: No such file or directory
 #include <expat.h>
          ^~~~~~~~~
compilation terminated.
make[1]: *** [/usr/local/src/apr-util-1.6.1/build/rules.mk:206: xml/apr_xml.lo] Error 1

So I install expat header files:
$ yum install expat-devel
And then the make of apr-util goes through. Not sure this is the right approach or not yet, however.

So following php’s advice, I have:
$ ./configure --enable-so

checking for chosen layout... Apache
...
checking for pcre-config... false
configure: error: pcre-config for libpcre not found. PCRE is required and available from http://pcre.org/

So I install pcre-devel:
$ yum install pcre-devel
Now the apache configure goes through, but the make does not work:

/usr/local/apr/build-1/libtool --silent --mode=link gcc  -g -O2 -pthread         -o htpasswd  htpasswd.lo passwd_common.lo       /usr/local/apr/lib/libaprutil-1.la /usr/local/apr/lib/libapr-1.la -lrt -lcrypt -lpthread -ldl -lcrypt
/usr/local/apr/lib/libaprutil-1.so: undefined reference to `XML_GetErrorCode'
/usr/local/apr/lib/libaprutil-1.so: undefined reference to `XML_SetEntityDeclHandler'
/usr/local/apr/lib/libaprutil-1.so: undefined reference to `XML_ParserCreate'
/usr/local/apr/lib/libaprutil-1.so: undefined reference to `XML_SetCharacterDataHandler'
/usr/local/apr/lib/libaprutil-1.so: undefined reference to `XML_ParserFree'
/usr/local/apr/lib/libaprutil-1.so: undefined reference to `XML_SetUserData'
/usr/local/apr/lib/libaprutil-1.so: undefined reference to `XML_StopParser'
/usr/local/apr/lib/libaprutil-1.so: undefined reference to `XML_Parse'
/usr/local/apr/lib/libaprutil-1.so: undefined reference to `XML_ErrorString'
/usr/local/apr/lib/libaprutil-1.so: undefined reference to `XML_SetElementHandler'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:48: htpasswd] Error 1

So I try configure or apr-util with expat built-in.

$ ./configure --with-expat=builtin --with-apr=/usr/local/apr

But when I do the make of apr-util I now get this error:

/usr/local/apr/build-1/libtool: line 7475: cd: builtin/lib: No such file or directory
libtool:   error: cannot determine absolute directory name of 'builtin/lib'
make[1]: *** [Makefile:93: libaprutil-1.la] Error 1
make[1]: Leaving directory '/usr/local/src/apr-util-1.6.1'
make: *** [/usr/local/src/apr-util-1.6.1/build/rules.mk:118: all-recursive] Error 1

From what I read, this new error occurs due to having --with-expat=builtin! So now what? I get rid of that in my configure statement for apr-util. For some reason, apr-util now goes through and compiles. And so I try this for compiling apache24:

$ ./configure --enable-so --with-apr=/usr/local/apr

And then I make it. And for some reason, now it goes through. I doubt it will work, however… it kind of does work.

It threw the files into /usr/local/apache2, where there is a bin directory containing apachectl. I can launch apachectl start, and then access a default page on port 80. Not bad so far…

I still need to tie in php however.

I just wing it and try

$ ./configure --with-apxs2=/usr/local/apache2/bin/apxs --with-mysql

Hey, maybe for once their instructions will work. Nope.

configure: error: Package requirements (libxml-2.0 >= 2.7.6) were not met:

Package 'libxml-2.0', required by 'virtual:world', not found

Consider adjusting the PKG_CONFIG_PATH environment variable if you
installed software in a non-standard prefix.

So I guess I need to install libxml2-devel:

$ yum install libxml2-devel

Looks like I get past that error. Now it’s on to this one:

configure: error: Package requirements (sqlite3 > 3.7.4) were not met:

So I install sqlite-devel:
$ yum install sqlite-devel
Now my configure almost goes through, except, as I suspected, that was a nonsense argument:

configure: WARNING: unrecognized options: --with-mysql

It’s not there when you look for it! Why the heck did they – php.net – give an example with exactly that option?? Annoying. So I leave it out. It goes through. Run make. It takes a long time to compile php! And this server is pretty fast. It’s slower to compile than apache or anything else I’ve compiled.

But eventually the compile finished. It added a LoadModule statement to the apache httpd.conf file. And, after I associated files with php extension to the php handler, a test file seemed to work. So php is beginning to work. Not at all sure about the mysql tie-in, however. In fact see further down below where I confirm my fears that there is no MySQL support when PHP is compiled this way.

Is running SSL asking too much?
Apparently, yes. I don’t think my apache24 has SSL support built-in:

Invalid command 'SSLCipherSuite', perhaps misspelled or defined by a module not included in the server configuration

So I try
$ ./configure --enable-so --with-apr=/usr/local/apr --enable-ssl

Not good…

checking for OpenSSL... checking for user-provided OpenSSL base directory... none
checking for OpenSSL version >= 0.9.8a... FAILED
configure: WARNING: OpenSSL version is too old
no
checking whether to enable mod_ssl... configure: error: mod_ssl has been requested but can not be built due to prerequisite failures

Where is it pulling that old version of openssl from? Because when I do this:

$ openssl version

OpenSSL 1.1.1c FIPS  28 May 2019

That’s not that old…

I also noticed this error:

configure: WARNING: Your APR does not include SSL/EVP support. To enable it: configure --with-crypto

So maybe I will re-compile APR with that argument.

Nope. APR doesn’t even have that argument. But apr-util does. I’ll try that.

Not so good:

configure: error: Crypto was requested but no crypto library could be enabled; specify the location of a crypto library using --with-openssl, --with-nss, and/or --with-commoncrypto.

I give up. Maybe it was a false alarm. I’ll try to ignore it.

So I install openssl-devel:

$ yum install openssl-devel

Then I try to configure apache24 thusly:

$ ./configure --enable-so --with-apr=/usr/local/apr --enable-ssl

This time at least the configure goes through – no ssl-related errors.

I needed to add the LoadModule statement by hand to httpd.conf since that file was already there from my previous build and so did not get that statement after my re-build with ssl support:

LoadModule ssl_module   modules/mod_ssl.so

Next error please
Now I have this error:

AH00526: Syntax error on line 92 of /usr/local/apache2/conf/extra/drjohns.conf:
SSLSessionCache: 'shmcb' session cache not supported (known names: ). Maybe you need to load the appropriate socache module (mod_socache_shmcb?).

I want results. So I just comment out the lines that mention SSLSessionCache and anything else to do with SSL caching.

And…it starts…and…it is listening on both ports 80 and 443 and…it is running SSL. So I think I cracked the SSL issue.

Switch focus to Mysql
I didn’t bother to find mysql. I believe people now use mariadb. So I installed the system one with a yum install mariadb. I became root and learned the version with a select version();

+-----------------+
| version()       |
+-----------------+
| 10.3.17-MariaDB |
+-----------------+
1 row in set (0.000 sec)

Is that recent enough? Yes! For once we skate by comfortably. The WordPress instructions say:

MySQL 5.6 or MariaDB 10.1 or greater

I set up apache. I try to access the WordPress setup page but instead get this message:

Forbidden
 
You don't have permission to access this resource.

every page I try gives this error.

The apache error log says:

client denied by server configuration: /usr/local/apache2/htdocs/

Not sure where that’s coming from. I thought I supplied my own documentroot statements, etc.

I threw in a Require all granted within the Directory statement and that seemed to help.

PHP/MySQL communication issue surfaces
Next problem is that PHP wasn’t compiled correctly it seems:

Your PHP installation appears to be missing the MySQL extension which is required by WordPress.

So I’ll try to re-do it. This time I am trying these arguments to configure:
$ ./configure --with-apxs2=/usr/local/apache2/bin/apxs --with-mysqli

Well, I’m not so sure this worked. Trying to set up WordPress, I access wp-config.php and only get:

Error establishing a database connection

This is roll-up-your-sleeves time. It’s clear we are getting no breaks. I looked into installing phpMyAdmin, but then I would need composer, which may depend on other things, so I lost interest in that rabbit hole. So I decide to simplify the problem. The suggested test is to write a php program like this, which I do, calling it tst2.php:

<?php
$servername = "localhost";
$username = "username";
$password = "password";
 
// Create connection
$conn = mysqli_connect($servername, $username, $password);
 
// Check connection
if (!$conn) {
    die("Connection failed: " . mysqli_connect_error());
}
echo "Connected successfully";
?>

Run it:
$ php tst2.php
and get:

PHP Warning:  mysqli_connect(): (HY000/2002): No such file or directory in /web/drjohns/blog/tst2.php on line 7
 
Warning: mysqli_connect(): (HY000/2002): No such file or directory in /web/drjohns/blog/tst2.php on line 7

Some quick research tells me that php does not know where the file mysql.sock is to be found. I search for it:

$ sudo find / -name mysql.sock

and it comes back as

/var/lib/mysql/mysql.sock

So…the prescription is to update a couple things in php.ini, which in my case has been put into /usr/local/lib because I compiled php with mostly default values. I add the location of the mysql.sock file in two places for good measure:

pdo_mysql.default_socket = /var/lib/mysql/mysql.sock
mysqli.default_socket = /var/lib/mysql/mysql.sock

And then my little test program goes through!

Connected successfully

Install WordPress
I begin to install WordPress, creating an initial user and so on. When I go back in I get a directory listing in place of the index.php. So I call index.php by hand and get a worrisome error:

Fatal error: Uncaught Error: Call to undefined function gzinflate() in /web/drjohns/blog/wp-includes/class-requests.php:947 Stack trace: #0 /web/drjohns/blog/wp-includes/class-requests.php(886): Requests::compatible_gzinflate('\xA5\x92\xCDn\x830\f\x80\xDF\xC5g\x08\xD5\xD6\xEE...'

I should have compiled php with zlib is what I determine it means… zlib and zlib-devel packages are on my system so this should be straightforward.

More arguments for php compiling
OK. Let’s be sensible and try to reproduce what I had done in 2017 to compile php, instead of finding and resolving mistakes one at a time.

$ ./configure --with-apxs2=/usr/local/apache2/bin/apxs --with-mysqli --disable-cgi --with-zlib --with-gettext --with-gdbm --with-curl --with-openssl

This gives this new issue:

Package 'libcurl', required by 'virtual:world', not found

I will install libcurl-devel in hopes of making this one go away.

Past that error, and onto this one:

configure: error: DBA: Could not find necessary header file(s).

I’m trying to drop the --with-gdbm and skip that whole DBA thing since the database connection seemed to be working without it. Now I see an openssl problem:

make: *** No rule to make target '/tmp/php-7.4.4/ext/openssl/openssl.c', needed by 'ext/openssl/openssl.lo'.  Stop.

Even if I get rid of openssl I still see a problem when running configure:

gawk: ./build/print_include.awk:1: fatal: cannot open file `ext/zlib/*.h*' for reading (No such file or directory)

Now I can ignore that error because configure exits with 0 status and make, but the make then stops at zlib:

SIGNALS   -c /tmp/php-7.4.4/ext/sqlite3/sqlite3.c -o ext/sqlite3/sqlite3.lo
make: *** No rule to make target '/tmp/php-7.4.4/ext/zlib/zlib.c', needed by 'ext/zlib/zlib.lo'.  Stop.

Reason for above php compilation errors
I figured it out. My bad. I had done a make distclean in addition to a make clean when I was re-starting with a new set of arguments to configure. I saw it advised somewhere on the Internet and didn’t pay much attention, but it seemed like a good idea. But I think what it was doing was wiping out the files in the ext directory, like ext/zlib.

So now I’m starting over, now with php 7.4.5 since they’ve upgraded in the meanwhile! With this configure command line (I figure I probably don’t need gdb):
./configure --with-apxs2=/usr/local/apache2/bin/apxs --with-mysqli --disable-cgi --with-zlib --with-gettext --with-gdbm --with-curl --with-openssl

Well, the php compile went through; however, I can’t seem to access any WordPress pages (they all just hang). Yet my simplistic database connection test does work. Hmmm. OK. If they come up at all, they come up exceedingly slowly and without correct formatting.

I think I see the reason for that as well. The source of the wp-login.php page (as viewed in a browser window) includes references to former hostnames my server used to have. Of course fetching all those objects times out. And they’re the ones that provide the formatting. At this point I’m not sure where those references came from. Not from the filesystem, so must be in the database as a result of an earlier WordPress install attempt. Amazon keeps changing my IP, you see. I see it is embedded into WordPress. In Settings | general Settings. I’m going to have this problem every time…

What I’m going to do is to create a temporary fictitious name, johnstechtalk, which I will enter in my hosts file on my Windows PC, in Windows\system32\drivers\etc\hosts, and also enter that name in WordPress’s settings. I will update the IP in my hosts file every time it changes while I am playing around. And now there’s even an issue doing this which has always worked so reliably in the past. Well, I found I actually needed to override the IP for drjohnstechtalk.com in my hosts file. But it seems Firefox has moved on to using DNS over https, so it ignores the hosts file now, I think. Edge still uses it, however, thankfully.

WordPress
So WordPress is basically functioning. I managed to install a few of my fav plugins: Akismet anti-spam, Limit Login Attempts, WP-PostViews. Some of the plugins are so old they actually require ftp. Who runs ftp these days? That’s been considered insecure for many years. But to accommodate I installed vsftpd on my server and ran it, temporarily.

Then Mcafee on my PC decided that wordpress.org is an unsafe site, thank you very much, without any logs or pop-ups. I couldn’t reach that site until I disabled the Mcafee firewall. Makes it hard to learn how to do the next steps of the upgrade.

More WordPress difficulties

WordPress is never satisfied with whatever version you’ve installed. You take the latest and two weeks later it’s demanding you upgrade already. My first upgrade didn’t go so well. Then I installed vsftpd. The upgrade likes to use your local FTP server – at least in my case. So for the ftp server I just put in 127.0.0.1. Kind of weird. Even still I get this error:

Downloading update from https://downloads.wordpress.org/release/wordpress-5.4.2-no-content.zip…

The authenticity of wordpress-5.4.2-no-content.zip could not be verified as no signature was found.

Unpacking the update…

Could not create directory.

Installation Failed

So I decided it was a permissions problem: my apache was running as user daemon (do a ps -ef to see running processes), while my wordpress blog directory was owned by centos. So I now run apache as user:group centos:centos. In case this helps anyone, the apache configuration commands to do this are:

User centos
Group centos

then I go to my blog directory and run something like:

chown -R centos:centos *
WordPress block editor non-functional after the upgrade

When I did the SQL import from my old site, I killed the block editor on my new site! This was disconcerting. That little plus sign just would not show up on new pages, no posts, whatever. So I basically killed wordpress 5.4. So I took a step backwards and started v 5.4 with a clean (empty) database like a fresh install to make sure the block editor works then. It did. Whew! Then I did an RTFM and deactivated my plugins on my old WordPress install before doing the mysql backup. I imported that SQL database, with a very minimal set of plugins activated, and, whew, this time I did not blow away the block editor.

CentOS bogs down

I like my snappy new Centos 8 AMI 80% of the time. But that remaining 20% is very annoying. It freezes. Really bad. I ran a top until the problem happened. Here I’ve caught the problem in action:

top - 16:26:11 up 1 day, 21 min, 2 users, load average: 3.96, 2.93, 5.30
Tasks: 95 total, 1 running, 94 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 2.6 sy, 0.0 ni, 0.0 id, 95.8 wa, 0.4 hi, 0.3 si, 0.7 st
MiB Mem : 1827.1 total, 63.4 free, 1709.8 used, 53.9 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 9.1 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
44 root 20 0 0 0 0 S 1.6 0.0 12:47.94 kswapd0
438 root 0 -20 0 0 0 I 0.5 0.0 1:38.84 kworker/0:1H-kblockd
890 mysql 20 0 1301064 92972 0 S 0.4 5.0 1:26.83 mysqld
5282 centos 20 0 1504524 341188 64 S 0.4 18.2 0:06.06 httpd
5344 root 20 0 345936 1008 0 S 0.4 0.1 0:00.09 sudo
560 root 20 0 93504 6436 3340 S 0.2 0.3 0:02.53 systemd-journal
712 polkitd 20 0 1626824 4996 0 S 0.2 0.3 0:00.15 polkitd
817 root 20 0 598816 4424 0 S 0.2 0.2 0:12.62 NetworkManager
821 root 20 0 634088 14772 0 S 0.2 0.8 0:18.67 tuned
1148 root 20 0 216948 7180 3456 S 0.2 0.4 0:16.74 rsyslogd
2346 john 20 0 273640 776 0 R 0.2 0.0 1:20.73 top
1 root 20 0 178656 4300 0 S 0.0 0.2 0:11.34 systemd

So what jumps out at me is the 95.8% wait time – that ain’t good – and that a process whose name includes swap is at the top of this list, combined with the fact that exactly 0 swap space is allocated. My linux skills may be 15 years out-of-date, but I think I better allocate some swap space (but why does it need it so badly??). On my old system I think I had done it. I’m a little scared to proceed for fear of blowing up my system.

So if you use drjohnstechtalk.com and it freezes, just come back in 10 minutes and it’ll probably be running again – this situation tends to self-correct. No one’s perfect.

Making a swap space

I went ahead and created a swap space right on my existing filesystem. I realized it wasn’t too hard once I found these really clear instructions: https://www.maketecheasier.com/swap-partitions-on-linux/

Some of the commands are dd to create an empty file, mkswap, swapon and swapon -s to see what it’s doing. And it really, really helped. I think sometimes mariadb needed to swap, and sometimes apache did. My system only has 1.8 GB of memory or so. And the drive is solid state, so it should be kind of fast. Because I used 1.2 GB for swap, I also extended my volume size when I happened upon Amazon’s clear instructions on how you can do that. Who knew? See below for more on that. If I got it right, Amazon also gives you more IO for each GB you add. I’m definitely getting good response after this swap space addition.
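For the record, a sketch of those commands, assuming a 1.2 GB swap file at the hypothetical path /swapfile:

dd if=/dev/zero of=/swapfile bs=1M count=1200
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
swapon -s

To make it permanent, the usual approach is a line like /swapfile swap swap defaults 0 0 in /etc/fstab.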

An aside about i/o

In the old days I perfected  a way to study i/o using the iostat utility. You can get it by installing the sysstat package. A good command to run is iostat -t -c -m -x 5

Examining these three consecutive samples of output from running that command is very instructional:

Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
xvda 2226.40 1408.00 9.35 5.54 1.00 0.20 0.04 0.01 2.37 5.00 10.28 4.30 4.03 0.25 90.14

07/04/2020 04:05:36 PM
avg-cpu: %user %nice %system %iowait %steal %idle
1.00 0.00 1.20 48.59 0.60 48.59

Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
xvda 130.14 1446.51 0.53 5.66 0.60 0.00 0.46 0.00 4.98 8.03 11.47 4.15 4.01 0.32 51.22

07/04/2020 04:05:41 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 0.00 0.00 100.00

Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
xvda 1.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 2.69 0.00 0.62 0.10

I tooled around in the admin panel (which previously had brought my server to its knees), and you can see the %util shot up to 90%, reads per second over 2000, writes per second 1400. So, really demanding. It’s clear my server would die if more than a few people were hitting it hard. And I may need some fine-tuning.

Success!

Given all the above problems, you probably never thought I’d pull this off. I worked in fits and starts – mostly when my significant other was away because this stuff is a time suck. But, believe it or not, I got the new apache/openssl/apr/php/mariadb/wordpress/centos/amazon EC2 VPC/drjohnstechtalk-with-new-2020-theme working to my satisfaction. I have to pat myself on the back for that. So I pulled the plug on the old site, which basically means moving the elastic IP over from old centos 6 site to new centos8 AWS instance. Since my site was so old, I had to first convert the elastic IP from type classic to VPC. It was not too obvious, but I got it eventually.

Damn hackers already at it

Look at the access log of your new apache server running your production WordPress. If, like me, you see people already trying to log in (POST accesses for …/wp-login.php) – which is really annoying because they’re all hackers – at least install the WPS Hide Login plugin and configure a secret login URL. Don’t use the default login.

Meanwhile I’ve decided to freeze out anyone who tries to access wp-login.php because they can only be up to no good. So I created this script which I call wp-login-freeze.sh:

#!/bin/sh
# freeze hackers who probe for wp-login
# DrJ 6/2020
DIR=/var/log/drjohns
cd $DIR
while /bin/true; do
tail -200 access_log|grep wp-login.php|awk '{print $1}'|sort -u|while read line; do
echo $line
route add -host $line gw 127.0.0.1
done
sleep 60
done

Works great! Just do a netstat -rn to watch your ever-growing list of systems you’ve frozen out.
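And if you accidentally freeze out a legitimate address, the inverse command releases it – substitute the real address for the hypothetical 1.2.3.4:

route del -host 1.2.3.4 gw 127.0.0.1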

Increasing EBS filesystem size causes worrisome error

As mentioned above I used some of the filesystem for swap so I wanted to enlarge it.

$ sudo growpart /dev/xvda 1
CHANGED: partition=1 start=2048 old: size=16773120 end=16775168 new: size=25163743,end=25165791
root@ip-10-0-0-181:~/hosting$ sudo lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 12G 0 disk
└─xvda1 202:1 0 12G 0 part /
root@ip-10-0-0-181:~/hosting$ df -k
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 912292 0 912292 0% /dev
tmpfs 935468 0 935468 0% /dev/shm
tmpfs 935468 16800 918668 2% /run
tmpfs 935468 0 935468 0% /sys/fs/cgroup
/dev/xvda1 8376320 3997580 4378740 48% /
tmpfs 187092 0 187092 0% /run/user/0
tmpfs 187092 0 187092 0% /run/user/1001
root@ip-10-0-0-181:~/hosting$ sudo resize2fs /dev/xvda1
resize2fs 1.44.6 (5-Mar-2019)
resize2fs: Bad magic number in super-block while trying to open /dev/xvda1
Couldn't find valid filesystem superblock.

The solution is to use xfs_growfs instead of resize2fs. And that worked!

$ sudo xfs_growfs -d /
meta-data=/dev/xvda1 isize=512 agcount=4, agsize=524160 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=2096640, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
data blocks changed from 2096640 to 3145467
root@ip-10-0-0-181:~/hosting$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 891M 0 891M 0% /dev
tmpfs 914M 0 914M 0% /dev/shm
tmpfs 914M 17M 898M 2% /run
tmpfs 914M 0 914M 0% /sys/fs/cgroup
/dev/xvda1 12G 3.9G 8.2G 33% /
tmpfs 183M 0 183M 0% /run/user/0
tmpfs 183M 0 183M 0% /run/user/1001
PHP found wanting by WordPress health status

Although my site seems to be humming along, now I have to find the more obscure errors. WordPress mentioned my site health has problems.

WordPress site health

I think gd is used for graphics. I haven’t seen any negative results from this, yet. I may leave it be for the time being.

References and related
This blog post is about 1000% better than my own if all you want to do is install WordPress on Centos: https://blog.ssdnodes.com/blog/how-to-install-wordpress-on-centos-7-with-lamp-tutorial/

Here is WordPress’s own extended instructions for upgrading. Of course this should be your starting point: https://wordpress.org/support/article/upgrading-wordpress-extended-instructions/

I’ve been following the php instructions: https://www.php.net/manual/en/install.unix.apache2.php

Before you install WordPress. Requirements and such.

This old article of mine has lots of good tips: Compiling apache 2.4

This is a great article about how Linux systems use swap space and how you can re-configure things: https://www.maketecheasier.com/swap-partitions-on-linux/

Amazon has this clear article on the linux commands you run after you extend an EBS volume. They worked for me: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html

My Centos 8 AMI is centos-8-minimal-install-201909262151 (ami-01b3337aae1959300)

Categories
Admin Network Technologies Web Site Technologies

Examining certificates over explicit proxy with openssl

Intro
This is pretty esoteric, but I’ve personally been waiting for this for a long time. It seems that beginning with openssl 1.1, the s_client sub-menu has had support for a proxy setting. Until then it was basically impossible to examine the certificates your proxy was sending back to users.

The syntax is something like:

openssl s_client -proxy <proxy_ip>:<proxy_port> -servername expired.badssl.com -showcerts -connect expired.badssl.com:443

where the proxy is a standard HTTP proxy.

Why is it a great thing? If your proxy does SSL interception then it is interfering with the site’s normal certificate. And worse, its own certificate can go bad. What if its own signing certificate has expired?? I’ve seen it happen, and it isn’t pretty…

To find the openssl version just run openssl version.

My SLES12 SP4 servers have a version which is too old. My Cygwin install is OK, actually. My Redhat 7.7 has a version which is too old. I do have a SLES 15 server which has a good version. But even the version on my F5 devices is too old, surprisingly.

References and related
the openssl project home page: https://www.openssl.org/

A few of my favorite openssl commands.

Categories
Admin Linux Network Technologies

Quick Tip: Powershell command to unblock a firewall port when running Windows Defender

Setup
I decided to run an X Server on my Windows 10 laptop. I only need it for Cognos gateway configuration, but when you need it, you need it. Of course an X Server listens on port 6000, so hosts outside of your PC have to be able to initiate a TCP connection to your PC with destination port 6000. So that port has to be open. The software I use for the X Server is MobaXterm (from Mobatek).

Here is the Powershell command to disable the block of TCP port 6000.

New-NetFirewallRule -DisplayName "MobaXterm Allow Incoming Requests" -Direction Inbound -LocalPort 6000 -Protocol TCP -Profile Domain -Action Allow

The Powershell window needs to be run as administrator. The change is permanent: it suffices to run it once.

Conclusion
And, because inquiring minds want to know, did it work? Yes, it worked and I could send my cogconfig X window to my MobaXterm X Server. I had to look around for the new window. It was slow.

Categories
Admin Network Technologies

Monitoring by Zabbix: a working document

Intro
I panned Zabbix in this post: DIY monitoring. But I have compelling reasons to revisit it. I have to say it has matured, but there remain some very frustrating things about it, especially when compared with SiteScope (now owned by Microfocus) which is so much more intuitive.

I am also impressed by the breadth of the user base and the documentation. But learning how to do any specific thing is still an exercise in futility.

I am going to try to structure this post as problems encountered, and how they were resolved.

Current production version as of this writing?
Answer: 4.4

Zabbix Manual does not work in Firefox
That’s right. I can’t even read the manual in my version of Firefox. Its sections do not expand. Solution: use Chrome

Which database?
You may see references to MySQL in Zabbix docs. MySQL is basically dead. What should you do?

Zabbix quick install on Redhat

Answer
Install mariadb, which has replaced MySQL and supports the same commands, such as the mysql client from the screenshot. On my Redhat instance I have installed these mariadb-related packages:

mariadb-5.5.64-1.el7.x86_64
mariadb-server-5.5.64-1.el7.x86_64
mariadb-libs-5.5.64-1.el7.x86_64

Terminology confusion
what is a host, a host group, a template, an item, a web scenario, a trigger, a media type?
Answer

Don’t ask me. When I make progress I’ll post it here.

Web scenario specific issues
Can different web scenarios use different proxies?
Answer: Yes, no problem. In really old versions this was not possible. See web scenario screenshot below.

Can the proxy be a variable so that the same web scenario can be used for different proxies?
Answer: Yes. Let’s say you attach a web scenario to a host. In that host’s configuration you can define a “macro” which sets the variable value. e.g., the value of HTTP_PROXY in my example. I think you can do the same from a template, but I’m getting ahead of myself.

Similarly, can you do basic proxy auth and hide the credentials in a MACRO? Answer: I think so. I did it once at any rate. See above screenshot.

Why does my google.com web scenario work whereas my amazon.com scenario not when they’re exactly the same except for the URL?
Answer: some ideas, but the logging information is bad. Amazon does not take to bots hitting it for health check reasons. It may work better to change the agent type to Linux|Chrome, which is what I am trying now. Here’s my original answer: Even with command-line curl I get an error through this proxy. That can’t be good:
$ curl -vikL www.amazon.com

...
NSS error -5961 (PR_CONNECT_RESET_ERROR)
* TCP connection reset by peer
* Closing connection 1
curl: (35) TCP connection reset by peer


My amazon.com web scenario is not working (status of 1), yet the dashboard does not show any obvious warning or error or red color. Why? Answer:
no idea. Maybe you have to define a trigger?

Say you’re on the Monitoring|latest data screen. Does the data get auto-updated? Answer: yes, it seems to refresh every 30 seconds.

Why is the official FAQ so useless? Answer: no idea how a piece of software otherwise so feature-rich could have such a useless FAQ.

Zabbix costs nothing. Is it still actively supported? Answer: it seems very actively supported for some reason. Not sure what the revenue model is, however…

Can I force one or more web scenarios to be run immediately? I do this all the time in SiteScope. Answer: I guess not. There is no obvious way.

Suppose you have defined an item. What is the item key? Answer: no idea. But I’m seeing it’s very important…

What is the equivalent to SiteScope’s script monitors? Answer: Either ssh check or external check.

How would you set up a simple PING monitor, i.e., to see if your host is up? Answer: Create an item as a “simple check”, e.g., with the name ping this host, and the key icmpping[{HOST.IP},3]. That can go into a template, by the way. If it succeeded it will return a 1.

I’ve made an error in my script for an external check. Why does Latest data show nothing at all? Answer: no idea. If the error is bad enough Zabbix will disable the item on you, so it’s not really running any longer. But even when it doesn’t do that, a lot of times I simply see no output whatsoever. Very frustrating.

Why is the output from an ssh check truncated, where does the rest go? Answer: no idea.

How do you increase the information contained in the zabbix server log? Let’s say your zabbix server is running normally. Then run this command: zabbix_server -R log_level_increase
You can run it multiple times to keep increasing the verbosity (log level), I think.

Attempting to use ssh items with key authentication fails with: “Public key authentication failed: Callback returned error”. Initially I thought Zabbix was broken with regards to ssh public key authentication. I can get it to work with password. I can use my public/private key to authenticate by hand from the command line as root. Turns out running a command such as sudo -u zabbix ssh … showed that my zabbix account did not have permission to write to its home directory (which did not even exist). I guess this is a case of RTFM, because they do go over all those steps in the manual. I fixed up permissions and now it works for me, yeah.

Where should the scripts for external checks go? In my install it is /usr/lib/zabbix/externalscripts.
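To make that concrete, here is a minimal sketch of an external check – the script name and behavior are hypothetical. The script lives in that directory, must be executable by the zabbix user, and whatever it writes to stdout becomes the item value:

#!/bin/sh
# /usr/lib/zabbix/externalscripts/pingcheck.sh
# echo 1 if the host answers a single ping within 2 seconds, else 0
ping -c 1 -W 2 "$1" > /dev/null 2>&1 && echo 1 || echo 0

The matching item gets type External check and a key along the lines of pingcheck.sh["{HOST.CONN}"].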

Why is the behaviour of triggers inconsistent. sometimes the same trigger has expected behaviour, sometimes not. Answer: No idea. Very frustrating. See more on that topic below.

How do you force a web scenario check when you are using templates? Answer: No idea.

Why do (resolved) Problems disappear no matter how you search for them if they are older than, say, 30 minutes? Answer: No idea. Just another stupid feature I guess.

Why does it say No media defined for user even though user has been set up with email as his media? Answer: no idea.

Why do too many errors disable an ssh check so that you get Status Disabled and have no graceful way to recover? Answer: no idea. It makes sense that Zabbix should not subject itself to too many consecutive errors. But once you’ve fixed the underlying problem the only recovery I can figure is to delete the item and recreate it. or delete the host and re-create it. Not cool.

I heard dependent items are the way to go to parse complex data coming out of a rich text item. How do you do that? Answer: Yes they are. I have gotten them to work and really give me the fine-grained control I’ve always wanted. I hope to show a real-life example soon. To get started creating a dependent item you can right-click on the dots of an item, or create a new item and choose type Dependent Item.

I am looking at Latest data and one item is grayed out and has no data. Why? Answer: almost no idea. This happens to me in a dependent item formed by a regular expression where the regular expression does not match the content. I am trying to make my RegEx more flexible to match both good and error conditions.

Why do my dependent items, when running a Check Now, say Cannot send request: wrong data type, yet they are producing data just fine when viewed through Latest data? Answer: this happens if you ran a Check Now on your template rather than when viewing an individual host. Make sure you select a host before you run Check Now. Actually, even still it does not work, so final answer: no idea.

Why do some regular expressions check out just fine on regex101.com yet produce a match for value of type “string”: pattern does not match error in Zabbix? Answer: Some idea. Fancy regular expressions do not seem to work for some reason.

Every time I add an item it takes the absolute maximum amount of time before I see data, whether or not I run check Now until I turn blue in the face. Why? Answer: no idea. Very frustrating.

If the Zabbix server is in one time zone and I am in another, can I have my view of timestamps customized to my time zone? Otherwise I see all times in the timezone of the Zabbix server. Answer: You are out of luck. The suggestion is to run two GUIs, one in your time zone.

My DNS queries using net.dns don’t do anything. Why? Answer: no idea. Maybe your host is not running an actual Zabbix agent? That’ll do it. Forget that net.dns check if you can’t install an agent. Zabbix has no agentless DNS monitor for some strange reason.

What does Check Now really do? Answer: essentially nothing as far as I can tell. It certainly doesn’t “check now”, i.e., force the item to be run.

I have a lot of hosts I want to add to a template. Does that Mass Update feature actually work? Answer: yes. Use it. It will save you time.

Help! I accidentally deleted an entire template. I meant to just delete one of its macros. Is there a revert? Answer: it doesn’t look like it. Hope you remember what you did…

It seems if I choose units in an item which have too many characters, e.g., client connections, the graph cuts the label off and doesn’t even display the scale. Answer: seems so. I’d call it a bug.

A word about SSH checks and triggers
Through the school of hard knocks I have learned that my ssh check is clipping the output from the executed command. So you know that partial data you see when you look at latest data, and thought it was truncating it for display purposes? Nuh, ah. That’s all you’re getting to go up against in your trigger, which sucks. It’s something like 260 characters. I got lucky in a sense to discover this early by running an ssh check against dns resolution of amazon.com. The response I got varied almost every 60 seconds depending on whether or not the response came out of the dns cache. So this was an excellent testbed to learn about the flakiness of triggers as well as waste an entire day.

Another thing about triggers with a regex: as far as I can tell, the logic is reversed. You think you’re defining the OK condition when you seek to match the output and give it the value of 1. Instead, match the desired output for the OK condition, but assign it a value of 0. I guess. Only that approach seems to work for me. And getting the regex to treat multiple lines as a unit was also a little tricky. I think by default it tested only against the last line.

So let’s say my output as scraped from Monitoring|Latest Data alternated between either

proxy1>test dns amazon.com
Performing DNS lookup for: amazon.com
 
DNS Response data:
Official Host Name: amazon.com
Resolved Addresses:
  205.251.242.103
  176.32.98.166
  176.32.103.205
Cache TTL: 1, cache HIT
DNS Resolver Response: Success

or

proxy1>test dns amazon.com
Performing DNS lookup for: amazon.com
 
Sending A query for amazon.com to 192.168.135.145.
 
Sending A query for amazon.com to 8.8.8.8.
 
DNS Response data:
Official Host Name: amazon.com
Resolved Addresses:
  20

, then here is my iregexp expression which seems to do the correct thing (treat both of these outcomes as successes):

{proxy1:ssh.run[resolve DNS,1.2.3.4,22,utf-8].iregexp("(?s)((205\.251\.|176\.32\.)|Sending A query.+\s20)")}=0

Note that the (?s) at the beginning helps, I think, to treat the newline character as just another character which matches “.”. I may have an extra set of parentheses around the outermost alternating expression, but I can only experiment so much…

I ran various tests such as to change just one of the numbers to make sure it triggered.

I now think I will get better, i.e., more complete, results if I make the item of type text rather than character; at least that switch definitely helped with another truncated output I was getting from another ssh check. So, yes, now I am capturing all the output. So, note to self: use type text unless you have really brief output from your ssh check.

So with all that gained knowledge, my simplified expression now reads like this:

{proxy:ssh.run[resolve a dns name,1.2.3.4,22,utf-8].iregexp("(205\.251\.|176\.32\.)")}=0

Here’s a CPU trigger. From a show status it focuses on the line:

CPU utilization: 29%

and so if I want to trigger a problem for 95% or higher CPU, this expression works for me:

CPU utilization:\s+([ 1-8]\d|9[0-4])\%

A nice online regular expression checker is https://regexr.com/

And a very simple PING test ssh check item, where the expected resulting line will be:

5 packets transmitted, 5 packets received, 0% packet loss

– for that I used the item wizard, altered what it came up with, and arrived at this:

(({proxy:ssh.run[ping 8.8.8.8,1.2.3.4,22,utf-8].iregexp("[45] packets received")})=0)

So I will accept the results as OK as long as at most one of five packets was dropped.

A lesson learned from SNMP monitoring of F5 devices
My F5 BigIP devices began producing problems as soon as we set up the SNMP monitoring. Something like this:

Node /Common/drj-10_1_2_3 is not available in some capacity: blue (4)

It never seemed to matter until now that my nodes appear blue. But perhaps SNMP is enforcing a best practice and expecting nodes to not be blue, meaning to be monitored. And it turns out you can set up a default monitor for your nodes (I use gateway_icmp). It’s found in Nodes | Default Monitor. I’m not sure why this is not better documented by F5. After this, many legacy nodes turn red so I am cleaning them up… But my conclusion is that I have learned something about my own systems from the act of implementing this monitoring, and that’s a good thing.

To be continued…

References and related
A good commercial solution for infrastructure monitoring: Microfocus SiteScope.

DIY monitoring

The Zabbix manual

A nice online regular expression (RegEx) checker is: https://RegEx101.com/.

Another online regular expression checker is: https://RegExr.com/.

Just to put it out there: If you like Zabbix you may also like Specto. Specto is an open-source tool for monitoring web sites (“synthetic” monitoring). I know one major organization which uses it so it can’t be too bad. https://specto.sourceforge.net/

Since this document is such a mess I’m starting to document some of my interesting items and Zabbix tips in this newer and cleaner post.