Categories
Linux Perl Python SLES

Using syslog within python

Intro

We created a convention wherein our scripts log to syslog in a certain style. Originally these were Perl scripts, but newer scripts are written in python. My question was: how do I do in python what we had done in Perl?

The details

The linux system uses syslog-ng. In /etc/syslog-ng/conf.d I created a test file 03drj.conf with these contents:

destination d_drjtest { file("/var/log/drjtest.log"); };
filter f_drjtest{ program("drjtest"); };
log { source(s_src); filter(f_drjtest); destination(d_drjtest); flags(final); };

The idea is that each of our little production scripts gets its own log file in /var/log/.

The python test program I wrote which outputs to syslog is this:


import syslog

# open the log with the same program name the syslog-ng filter matches on
syslog.openlog('drjtest', syslog.LOG_PID, facility=syslog.LOG_LOCAL0)
syslog.syslog(syslog.LOG_NOTICE, '[Notice] Starting')
syslog.syslog(syslog.LOG_ERR, '[Error] We got an error')
syslog.syslog(syslog.LOG_INFO, '[Info] Just informational stuff')

Easy, right? Then someone newer to python showed me what he had done – not using syslog but logger – accomplishing pretty much the same goal by different means. But he had to hard-code a lot more of the values, so it was not as elegant in my opinion.
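If by logger he meant python’s standard logging module, I can guess at his approach. Here is a minimal sketch of that route, assuming the logging module’s SysLogHandler; since SysLogHandler doesn’t set a program name by itself, the trick is to smuggle it in via the formatter so that syslog-ng’s program() filter still matches:

import logging
import logging.handlers

logger = logging.getLogger('drjtest')
logger.setLevel(logging.INFO)
handler = logging.handlers.SysLogHandler(
    address='/dev/log', facility=logging.handlers.SysLogHandler.LOG_LOCAL0)
# prepend the program name and pid so the program() filter matches
handler.setFormatter(logging.Formatter('drjtest[%(process)d]: %(message)s'))
logger.addHandler(handler)
logger.info('[Info] Just informational stuff')
logger.error('[Error] We got an error')

You can see why I preferred the syslog package: it handles the program name and pid for you.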

In any case, the output winds up in /var/log/drjtest.log, which looks like this after a test run:

Jul 24 17:45:32 drjohnshost drjtest[928]: [Notice] Starting
Jul 24 17:45:32 drjohnshost drjtest[928]: [Error] We got an error
Jul 24 17:45:32 drjohnshost drjtest[928]: [Info] Just informational stuff

OSes using rsyslog

Today I needed to make this style of logging work on a new system which was running rsyslog. The OS is SLES v 15. For this OS I added a file called drjtest.conf to /etc/rsyslog.d with these contents:

if ($programname == 'drjtest' ) then {
        -/var/log/drjtest.log
        stop
}

But the python program did not need to change in any way.

Conclusion

We show how to use the syslog facility within python by using the syslog package. It’s all pretty obvious and barely needs to be mentioned, except that when you’re just starting out you want a little hint that you may not find in the man pages or the documentation at syslog-ng.

References and related

I have a neat script which we use to parse the logs of all these other scripts and send us a once-a-week summary email, unless an error has been detected, in which case the summary email goes out the day after a script has reported the error. It has some pretty nice logic, if I say so myself. Here it is: drjohns script checker.

Categories
Admin Linux SLES

How to add private root CAs in SLES or Redhat or Debian

Intro
From time to time I run my own PKI infrastructure, namely issuing my own certificates from my private root CA. I wanted this root CA to be recognized by Linux utilities running on Suse Linux (SLES), in particular lftp, which I was trying to use to access an ftps site – itself a post for another day. In other words, how do you add a certificate to the certificate store in linux?

The details
Let’s say you have your root certificate in the standard form like this example

-----BEGIN CERTIFICATE-----
MIIIPzCCBiegAwIBAgITfgAAAATHCoXJivwKLQAAAAAABDANBgkqhkiG9w0BAQsF\nADA2MQswCQYD
VQQGEwJERTENMAsGA1UEChMEQkFTRjEYMBYGA1UEAxMPQkFTRiBS\nb290IENBIDIxMB4XDTE3MDgxMDEyNDAwOFoXDTI4MDgxMDEyNTAwOFowXDETMBEG\nCgm
...
PEScyptUSAaGjS4JuxsNoL6URXYHxJsR0bPlet\nSct
-----END CERTIFICATE-----

Then you can put the certificate inline and install it with a single script so that it permanently joins the other root CAs in /etc/ssl/certs, like this example:

DrJ_Root_CA="-----BEGIN CERTIFICATE-----\nMIIIPzCCBiegAwIBAgITfgAAAATHCoXJivwKLQAAAAAABDANBgkqhkiG9w0BAQsF\nADA2MQswCQYD
VQQGEwJERTENMAsGA1UEChMEQkFTRjEYMBYGA1UEAxMPQkFTRiBS\nb290IENBIDIxMB4XDTE3MDgxMDEyNDAwOFoXDTI4MDgxMDEyNTAwOFowXDETMBEG\nCgm
SJomT8ixkARkWA05FVDEUMBIGCgmSJomT8ixkARkWBEJBU0YxFjAUBgoJkiaJ\nk/IsZAEZFgZCQVNGQUQxFzAVBgNVBAMTDkJBU0YgU1VCIENBIDIzMIICIjAN
Bgkq\nhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAqrfoKxrCPCw/u2PBEaAwW/VHLxBw6JNi\n42F3EhXmligGb/Uu4kcWO016IGFatVrPhdAtShAqmTXis0w57hW
jn1Iptvo7rROY\nGPmH7aSW/fYM/x2Lln7NlltayXspWawqBzWzYGADodyjn/Z5TaLYaG8lajiabCM5\nUJDhlZ/SUR3xylqIIFaQK3k2twjeGoxobhbr9hJcQZ
fXF0V5FCSCzJExDYma6bs1\nZtyqP/yHaiOeWXGdnqM9EPfT8kmIC42ZXq7s2JZI5OUflJBbaebYEbuDad6Rh19E\nRchXABLe68+TF/4AZCw16iRwRgq/2Re2W
WPMtVomyZ2txvn51iizqBkdVGzIRklC\n3yIv5MRzDFTfG940/tSAomHsz+RdGbL+NCBeWSY+rnJQdExJ7bLXFLVsTNGL68lP\nMuYrkxYQKWRtVhvQCHsdd5E0
t9QR4iY1JLWQxq3GHy98tBbCGiKMpBbuj/9I/E6c\nGrikouv2QyNnCN34PXpUxTQmDj5LZGV9w2faqpwUBD2ZWsbyVSgvD8TcjdxzcMcj\nLBnYUaZ8wHFqUj2
DBahctfKQxA8Ptrzt1mDIGOQliZGDwrTVMECd+noQhTlF1eS+\nvNraV3dYRMymVxh58MPEaDJgwIRcBWAAOeBbZlyx76oskXdmjOiz5jqyoR5eweCE\ntS4jfM
EW6UECAwEAAaOCAx4wggMaMAsGA1UdDwQEAwIBhjAQBgkrBgEEAYI3FQEE\nAwIBADAdBgNVHQ4EFgQUdn7nwFGpb8uzpFVs5QWQcsA0Q6IwQwYDVR0gBDwwOjA
4\nBgwrBgEEAYGlZAMCAgEwKDAmBggrBgEFBQcCARYaaHR0cDovL3BraXdlYi5iYXNm\nLmNvbS9jcAAwGQYJKwYBBAGCNxQCBAweCgBTAHUAYgBDAEEwEgYDVR
0TAQH/BAgw\nBgEB/wIBADAfBgNVHSMEGDAWgBSS9auUcX38rmNVmQsv6DKAMZcmXDCCAQkGA1Ud\nHwSCAQAwgf0wgfqggfeggfSGgbZsZGFwOi8vL0NOPUJBU
0YlMjBSb290JTIwQ0El\nMjAyMSxDTj1DRFAsQ049UHVibGljJTIwS2V5JTIwU2VydmljZXMsQ049U2Vydmlj\nZXMsQ049Q29uZmlndXJhdGlvbixEQz1yb290
LERDPWJhc2YsREM9Y29tP2NlcnRp\nZmljYXRlUmV2b2NhdGlvbkxpc3Q/YmFzZT9vYmplY3RDbGFzcz1jUkxEaXN0cmli\ndXRpb25Qb2ludIY5aHR0cDovL3B
raXdlYi5iYXNmLmNvbS9yb290Y2EyMS9CQVNG\nJTIwUm9vdCUyMENBJTIwMjEuY3JsMIIBNgYIKwYBBQUHAQEEggEoNIIBJDCBuQYI\nKwYBBQUHMAKGgaxsZG
FwOi8vL0NOPUJBU0YlMjBSb290JTIwQ0ElMjAyMSxDTj1B\nSUEsQ049UHVibGljJTIwS2V5JTIwU2VydmljZXMsQ049U2VydmljZXMsQ049Q29u\nZmlndXJhd
GlvbixEQz1yb290LERDPWJhc2YsREM9Y29tP2NBQ2VydGlmaWNhdGU/\nYmFzZT9vYmplY3RDbGFzcz1jZXJ0aWZpY2F0aW9uQXV0aG9yaXR5MGYGCCsGAQUF\n
BzAChlpodHRwOi8vcGtpd2ViLmJhc2YuY29tL3Jvb3RjYTIxL1JPT1RDQTIxLnJ6\nLWMwMDctajY1MC5iYXNmLWFnLmRlX0JBU0YlMjBSb290JTIwQ0ElMjAyM
S5jcnQw\nDQYJKoZIhvcNAQELBQADggIBAClCvn9sKo/gbrEygtUPsVy9cj9UOQ2/CciCdzpz\nXhuXfoCIICgc0YFzCajoXBLj4V6zcYKjz8RndaLabDaaSQgj
phXFiZSBH8OII+cp\nTCWW1x+JElJXo9HB7Ziva2PeuU5ajXtvql5PegFYWdmgK2Q1QH0J2f1rr7B4nNGu\noyBi1TOSll+0yJApjx213lM9obt6hkXkjeisjcq
auMVh+8KloM0LQOTAD1bDAvpa\nVVN9wlbytvf4tLxHpvrxEQEmVtTAdVchuQV1QCeIbqIxW41l6nhE2TlPwEmTr+Cv\najMID/ebnc9WzeweyTddb6DSmn4mSc
okGpj8j8Z7cw173Yomhg1tEEfEzip+/Jx6\nd2qblZ9BUih9sHE8rtUBEPLvBZwr2frkXzL3f8D6w36LxuhcqJOmDaIPDpJMH/65\nAbYnJyhwJeGUbrRm3zVtA
5QHIiSHi2gTdEw+9EfyIhuNKS4FO/uonjJJcKBtaufl\nGFL6y0WegbS5xlMV9RwkM22R7sQkBbDTr+79MqJXYCGtbyX0JxIgOGbE4mxvdDVh\nmuPo9IpRc5Jl
pSWUa7HvZUEuLnUicRbfrs1PK/FBF7aSrJLoYprHPgP6421pl08H\nhhJXE9XA2aIfEkJ4BcKw0BqOP/PEScyptUSAaGjS4JuxsNoL6URXYHxJsR0bPlet\nSct
3\n-----END CERTIFICATE-----\n"
 
cd /etc/pki/trust/anchors/
echo -e -n "$DrJ_Root_CA" > DrJ_Root_CA.pem
c_rehash
update-ca-certificates

So the key commands are c_rehash and update-ca-certificates.

Usually SLES is similar to Redhat. But it seems to be different in this case.

This was tested on a SLES 12 SP3 system.

It copies the certificate to /etc/pki/trust/anchors, which by itself is insufficient. Then it creates some kind of hash symlink to the CA file and makes sure that this new certificate doesn’t get wiped out by subsequent system patching. That’s the purpose of the c_rehash and update-ca-certificates commands.

You may also see these hashes and certificates in /etc/ssl/certs. I’m not sure because that’s where I started with all this. But merely dropping the private root CA into /etc/ssl/certs is insufficient, I can say from experience!
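One way to convince yourself the new CA really landed in the default store is to ask python, which consults the same store. A quick sketch, assuming your root CA’s subject contains the string DrJ:

import ssl

ctx = ssl.create_default_context()   # loads the system default CA store
# hunt for our CA among the loaded certificates
found = any('DrJ' in str(ca.get('subject')) for ca in ctx.get_ca_certs())
print('DrJ root CA trusted:', found)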

Redhat
Redhat is better documented, but for completeness I include it here. You have your inline certificate as in the SLES script, then following that:

...
cd /etc/pki/ca-trust/source/anchors/
echo -e -n "$DrJ_Root_CA" > DrJ_Root_CA.pem
update-ca-trust

So update-ca-trust is the key command for Redhat Linux. This was tested on Redhat Linux v 7.6.

Fedora v 33

Put your CA file with a .crt file extension into /etc/pki/ca-trust/source/anchors as for Redhat, then run update-ca-trust extract.

Debian Linux circa 2023

Put your private CA file into a new directory, /usr/share/ca-certificates/extra. Then run sudo dpkg-reconfigure ca-certificates. When prompted with a list of bundles to include, make sure to enable your new extra file. Your certificate file needs to end in .crt, not, e.g., .cer. Seems pretty arbitrary to me, but that’s how it is. Of course it has to be in standard PEM format (-----BEGIN CERTIFICATE-----, etc.).

Python and self-signed certificates or certificates from private CAs

First, note that those are two different cases and need to be handled slightly differently! You may be in need of these measures if you are getting an error in python like this:

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.myhost.local', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))

Self-signed certificate

If the certificate is truly self-signed, then throw it into a file, let’s call it my-crt.crt in your home directory. Then set an environment variable before running python:

$ export REQUESTS_CA_BUNDLE=~/my-crt.crt

It should now work.
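For what it’s worth, the requests package also accepts the bundle on a per-call basis, in case you don’t want an environment variable affecting everything. A little sketch, reusing the hypothetical host from the error above (note that verify= does not expand ~, so spell the path out):

import requests

# verify= points at the self-signed certificate instead of the default bundle
r = requests.get('https://www.myhost.local/', verify='/home/drj/my-crt.crt')
print(r.status_code)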

Python reverts to SSL: CERTIFICATE_VERIFY_FAILED after upgrade

So I had it all working. Then the requests package complained about my version of the urllib3 package. So I upgraded requests with pip3 install --upgrade requests. Then the above-mentioned SSL error came back. I noticed that urllib3 got upgraded when requests was upgraded.

I basically gave up and totally kludged python to fix this, but only after playing with the certifi package etc. So I took the file that is, for me, the output of certifi.where(): /usr/local/lib/python3.9/site-packages/certifi/cacert.pem

I edited it as root and simply appended my private root CAs to that file. I hated to do it, and I know I should have at least used a virtual env, blah, blah. But at least now my jobs run. By the way, my by-hand tests of urlopen all worked! So it’s just the way some package was using it, beyond my control, that made me create this kludge.

Certificate issued from a private CA

I added the private CA to the system CA on Debian with the update-ca-certificates mentioned above. Still no joy. Then I noticed the web server forgot to provide the intermediate certificate so I added that as well. Then, at least, curl began to work. But not python. Strange. For python I still need to define this environment variable:

$ export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

A second method to handle the case of a certificate issued from a private CA is to bundle the certificate + the intermediate certificate + the private root CA into a single file, let’s call it my-crt.crt, in your home directory, and define the environment variable the same as for the self-signed certificate case:

$ export REQUESTS_CA_BUNDLE=~/my-crt.crt

My favorite openssl commands shows some commands to run to examine the certificate of a web server.

lftp usage tip with a private CA
If, like me, you were doing this work in conjunction with running ftps using a certificate signed by a private CA, and want your ftp client, lftp, not to complain about the unrecognized CA, then this tip will help.

After initiating your lftp session and sending the username and password, you can issue this command:
set ssl:ca-file <path-to-your-private-CA-file>
lftp is so flexible it offers many other ways to do this as well. But this is the one I use.

Conclusion
We show how to add your own root CA to a SLES 12 system. I did not find good information on this anywhere on the Internet.

References and related
My favorite openssl commands.

The basics of working with cipher settings

For the Redhat/CentOS approach I used this blog post on the proper way to add your own private CA: https://www.happyassassin.net/2015/01/14/trusting-additional-cas-in-fedora-rhel-centos-dont-append-to-etcpkitlscertsca-bundle-crt-or-etcpkitlscert-pem/

Categories
Admin Linux Network Technologies SLES

Linux tip: how to enable remote syslog on SLES

Intro
I write this knowing I still don’t know anything to speak of about syslog but, sometimes, you gotta act without knowing. I needed to send syslog somewhere in a big hurry, so I figured out the absolute minimum I needed to do to get it running on one of my other systems.

The details
This all started because of a deficiency in the F5 ASM. At best it’s painfully slow to look through the error log. But in particular there was one error that always timed out when I tried to bring up the details – a severity 5 error, so it looked pretty important. Worse, local logging, even though it is selected, also does not work – the /var/log/asm file exists but contains basically nothing of interest. I suppose there is some super-fancy and complicated MySQL command you could run to view the logs, but that would take a long time to figure out.

So for me the simplest route was to enable remote syslog on a Linux server and send the ASM logging to it. This seems to be working, by the way.

The minimal steps
Again, this was for Suse Enterprise Linux running syslog-ng.

  1. modify /etc/sysconfig/syslog as per the next step
  2. SYSLOGD_PARAMS="-r"
  3. modify /etc/syslog-ng/syslog-ng.conf as per the next step
  4. uncomment this line: udp(ip("0.0.0.0") port(514));
  5. launch yast (I use curses-based yast [no X-Windows] which is really cantankerous)
  6. go to Security and Users -> Firewall -> Allowed services -> Internal Zone -> Advanced
  7. add udp port 514 as additional allowed Ports in internal zone and save it
  8. service syslog stop
  9. service syslog start
  10. You should start seeing entries in /var/log/localmessages as in this suitably anonymized example (I added a couple of line breaks for clarity):
Jul 27 14:42:22 f5-drj-mgmt ASM:"7653503868885627313","50.17.188.196","/Common/drjohnstechtalk.com_profile","blocked","/drjcrm/bi/tjhmore345","0","Illegal URL,Attack signature detected","200021075","Automated client access ""curl""","US","<!--?xml version='1.0' encoding='UTF-8'?-->44e7f1ffebff2dfb-800000000000000044f7f1ffebff2dfb-800000000000000044e7f1ffe3ff2dfb-80000000000000000000000000000000-000000000000000042VIOL_ATTACK_SIGNATURErequest200021075
7VXNlci1BZ2VudDogY3VybC83LjE5LjcgKHg4Nl82NC1yZWRoYXQtbGludXgtZ251KSBsaWJjdXJsLzcuMTkuNyBOU1MvMy4yNy4xIHpsaWIvMS4yLjMgbGliaWRuLzEuMTggbGlic3NoMi8xLjQuMg0KSG9zdDogYWctaW50ZWw=
01638
VIOL_URL","GET /drjcrm/bi/tjhmore345 HTTP/1.1\r\nUser-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.27.1 zlib/1.2.3 libidn/1.18 libssh2/1.4.2\r\nHost: drjohnstechtalk.com\r\nAccept: */*\r\n\r\n"

Observations
Interestingly, there is no syslogd on this particular system, and yet the “-r” flag is designed for syslogd – it’s what turns it into a remote syslogging daemon. And yet it works.

It’s easy enough to log these messages to their own file; I just don’t know how to do it yet because I haven’t needed to. I learn as I need to, just as I learned enough to publish this tip.

Conclusion
We have demonstrated activating the simplest possible remote syslogger on Suse Linux Enterprise Server.

References and related

Want to know what syslog is? Howtonetwork has this very good writeup: https://www.howtonetwork.com/technical/security-technical/syslog-server/

Categories
Raspberry Pi SLES Web Site Technologies

Pi-hole: it’s as easy as pi to get rid of your advertisements

Intro
I learned about pi-hole from Bloomberg Businessweek of all places. Seems right up my alley – uses Raspberry Pi in your home to get rid of advertisements. Turns out it was too easy and I don’t have much to contribute except my own experiences with it!

The details
When I read about it I got to thinking big picture and wondered what would prevent us from running an enterprise version of this same thing? Well, large enterprises don’t normally run production-critical applications like DNS servers (which this is, by the way) on Raspberry Pis, which are not the world’s most stable hardware! But first I had to try it at home just to learn more about the technology.

pi-hole admin screen

I was surprised just how optimized it was for the Raspberry Pi, to the neglect of other systems. So the idea of using an old SLES server is out the window.

But I think I got the essence of the idea. It replaces your DNS server with a custom one that resolves normal queries for web sites the usual way, but for DNS queries that would resolve to an Ad server, it clobbers the DNS and returns its own IP address. Why? So that it can send you a harmless blank image or whatever in place of an Internet ad.

You know those sites that obnoxiously throw up those auto-playing videos? That ain’t gonna happen any more when you run pi-hole.

You have to be a little adept at modifying your home router, but they even have a rough tutorial for that.

Installation
For the record, on my Raspberry Pi I only did this:
$ sudo su -
$ curl -sSL https://install.pi-hole.net | bash

It prompted me for a few configuration details, but the answers were obvious. I chose Google DNS servers because I have a long and positive history using them.

You can see that it installs a bunch of packages – surprisingly many considering how simple in theory the thing is.

Test it
On your Raspberry Pi do a few test resolutions:

$ dig google.com @localhost # should look like it normally does
$ dig pi.hole # should return the IP of your Raspberry Pi
$ dig adservices.google.com # I gotta check this one. Should return IP address of your Pi

It runs a little web server on your Pi so the Pi acts as adservices.google.com and just serves out some white space instead of the ad you would have gotten.

Linksys router
Another word about the home router DHCP settings. You have the option to enter a DNS server, so I put in the IP address of my raspberry pi, 192.168.1.119. What I expected was that this DNS server would be directly handed out to the DHCP clients on my home network. But that is not the case. Instead the router still hands out itself, 192.168.1.1, as the DNS server, but it in turn uses the raspberry Pi for its resolution. This threw me when I did an ipconfig /all on my Windows 10 machine and didn’t see the DNS server I expected. But it was all working. About 10% of my DNS queries were pi-holed (see picture of my admin screen above).

I guess pi-hole is run by fanatics, because it works surprisingly well. Those complex sites still worked, like cnn.com, cnet.com. But they probably load faster without the ads.

Two months check up

I checked back with pi-hole. I know a DNS server is running. The dashboard is broken – the sections just have spinning circles instead of data. It’s already asking me to upgrade to v 3.3.1. I ran pihole -up to do the upgrade.

Another little advantage
I can now ssh to my pi by specifying the host as pi.hole – which I can actually remember!

Idea for enterprise
Finally, the essence of the idea probably could be ported over to an enterprise. In my opinion the secret sauce is the lists of domain names to clobber. There are five or six of them. Some have 50,000 entries. So you’d probably need a specialized DNS server rather than the default ISC BIND. I remember running a specialized DNS server like that when I ran Puremessage by Sophos. It was optimized to suck in real-time blacklists and the like. I have to dig through my notes to see what we ran. I’m sure it wasn’t dnsmasq, which is what pi-hole runs on the Raspberry Pi! But with these lists, some string manipulation and a simple web server, I’d think it’d be possible to replicate in an enterprise environment. I may never get the opportunity, more for lack of time than for lack of ability…

Conclusion
Looking for a rewarding project for your Raspberry Pi? Spare yourself Internet advertisements at home by putting it to work.

References and related
The pi-hole web site: https://pi-hole.net/
Another Raspberry Pi project idea: monitor your cable modem and restart it when it goes south.

Categories
Linux SLES Web Site Technologies

Compiling curl and openssl on Redhat Linux

Intro
I have an ancient Redhat system which I’m not in a position to upgrade. I like to use curl to test web sites, but it’s getting to the point that my ancient version has no SSL versions in common with some secure web sites. I desperately wanted to upgrade curl while leaving the rest of the system as is. Is it even possible? How would you do it? All these things and more are explained in today’s riveting blog post.

The details
Redhat version
I don’t know the proper command so I do this:
$ cat /etc/system-release

Red Hat Enterprise Linux Server release 6.6 (Santiago)

Current curl version
$ ./curl --version

curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2

Limited set of SSL/TLS protocols
$ curl --help

...
 -2/--sslv2         Use SSLv2 (SSL)
 -3/--sslv3         Use SSLv3 (SSL)
...
 -z/--time-cond <time> Transfer based on a time condition
 -1/--tlsv1         Use TLSv1 (SSL)
...

New version of curl

curl 7.55.1 (x86_64-unknown-linux-gnu) libcurl/7.55.1 OpenSSL/1.1.0f zlib/1.2.3

New SSL options

     --ssl           Try SSL/TLS
     --ssl-allow-beast Allow security flaw to improve interop
     --ssl-no-revoke Disable cert revocation checks (WinSSL)
     --ssl-reqd      Require SSL/TLS
 -2, --sslv2         Use SSLv2
 -3, --sslv3         Use SSLv3
...
     --tls-max <VERSION> Use TLSv1.0 or greater
     --tlsauthtype <type> TLS authentication type
     --tlspassword   TLS password
     --tlsuser <name> TLS user name
 -1, --tlsv1         Use TLSv1.0 or greater
     --tlsv1.0       Use TLSv1.0
     --tlsv1.1       Use TLSv1.1
     --tlsv1.2       Use TLSv1.2
     --tlsv1.3       Use TLSv1.3

Now that’s an upgrade! How did we get to this point?

Well, I tried to get a curl RPM – seems like the appropriate path for a lazy system administrator, right? Not so fast. It’s not hard to find an RPM, but trying to install one showed a lot of missing dependencies, as in this example:
$ sudo rpm -i curl-minimal-7.55.1-2.0.cf.fc27.x86_64.rpm

warning: curl-minimal-7.55.1-2.0.cf.fc27.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID b56a8bac: NOKEY
error: Failed dependencies:
        libc.so.6(GLIBC_2.14)(64bit) is needed by curl-minimal-7.55.1-2.0.cf.fc27.x86_64
        libc.so.6(GLIBC_2.17)(64bit) is needed by curl-minimal-7.55.1-2.0.cf.fc27.x86_64
        libcrypto.so.1.1()(64bit) is needed by curl-minimal-7.55.1-2.0.cf.fc27.x86_64
        libcurl(x86-64) >= 7.55.1-2.0.cf.fc27 is needed by curl-minimal-7.55.1-2.0.cf.fc27.x86_64
        libssl.so.1.1()(64bit) is needed by curl-minimal-7.55.1-2.0.cf.fc27.x86_64
        curl conflicts with curl-minimal-7.55.1-2.0.cf.fc27.x86_64

So I looked at the libcurl RPM, but it had its own set of dependencies. Pretty soon it looks like a full-time job to get this thing compiled!

I found the instructions mentioned in the reference, but they didn’t work for me exactly like that. Besides, I don’t have a working git program. So here’s what I did.

Compiling openssl

I downloaded the latest openssl, 1.1.0f, from https://www.openssl.org/source/, untarred it, went into the openssl-1.1.0f directory, and then:

$ ./config -Wl,--enable-new-dtags --prefix=/usr/local/ssl --openssldir=/usr/local/ssl
$ make depend
$ make
$ sudo make install

So far so good.

Compiling zlib
For zlib I was lazy and mostly followed the other guy’s commands. Went something like this:
$ lib=zlib-1.2.11
$ wget http://zlib.net/$lib.tar.gz
$ tar xzvf $lib.tar.gz
$ mv $lib zlib
$ cd zlib
$ ./configure
$ make
$ cd ..
$ CD=$(pwd)

No problems there…

Compiling curl
curl was tricky, and when I followed the guy’s instructions I got the very problem he sought to avoid.

vtls/openssl.c: In function ‘Curl_ossl_seed’:
vtls/openssl.c:276: error: implicit declaration of function ‘RAND_egd’
make[2]: *** [libcurl_la-openssl.lo] Error 1
make[2]: Leaving directory `/usr/local/src/curl/curl-7.55.1/lib'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/local/src/curl/curl-7.55.1/lib'
make: *** [all-recursive] Error 1

I looked at the source and decided that what might help was to add a hint about where the openssl stuff could be found.

Backing up a bit, I got the source from https://curl.haxx.se/download.html. I chose the file curl-7.55.1.tar.gz. Untar it, go into the curl-7.55.1 directory,
$ ./buildconf
$ export PKG_CONFIG_PATH=/usr/local/ssl/lib/pkgconfig LIBS="-ldl"

and then – here is the single most important point in the whole blog – configure it thusly:

$ ./configure --with-zlib=$CD/zlib --disable-shared --with-ssl=/usr/local/ssl

So my insight was to add the --with-ssl=/usr/local/ssl to the configure command.

Then of course you make it:

$ make

and maybe even install it:

$ make install

This put curl into /usr/local/bin. I actually made a sym link and made this the default version with this kludge (the following commands were run as root):

$ cd /usr/bin; mv curl{,.orig}; ln -s /usr/local/bin/curl

That’s it! That worked and produced a working, modern curl.

By the way it mentions TLS1.3, but when you try to use it:

$ curl -i -k --tlsv1.3 https://drjohnstechtalk.com/

curl: (4) OpenSSL was built without TLS 1.3 support

It’s a no go. But at least TLS1.2 works just fine in this version.

One other thing – put shared libraries in a common area
I copied my compiled curl from the Redhat system to a SLES 11 SP3 system. It didn’t quite run; the only thing is, it was missing the openssl libraries. So I guess it’s also important to copy over

libssl.so.1.1
libcrypto.so.1.1

to /usr/lib64 from /usr/local/lib64.

Once I did that, it worked like a charm!
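If you ever need a quick sanity check that the loader can find those libraries before blaming curl itself, python’s ctypes will tell you – just a diagnostic sketch:

import ctypes

# CDLL uses the same dlopen() search path the curl binary does;
# it raises OSError if a library cannot be found
for lib in ('libssl.so.1.1', 'libcrypto.so.1.1'):
    ctypes.CDLL(lib)
print('both openssl shared libraries load fine')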

Conclusion
We show how to compile the latest versions of openssl and curl on an older Redhat 6.x OS. The motivation for doing so was to remain compatible with web sites which are already dropping, or soon will drop, their support for TLS 1.0. The compiled curl and openssl support TLS 1.2, which should keep them useful for a long while.

References and related
I closely followed the instructions in this stackoverflow post: https://stackoverflow.com/questions/44270707/cant-build-latest-libcurl-on-rhel-7-3#44297265
openssl source: https://www.openssl.org/source/
curl sources: https://curl.haxx.se/download.html
Here’s a web site that only supports TLS 1.2 which shows the problem: https://www.askapache.com/. You can see for yourself on ssllabs.com

Categories
Admin Apache Security SLES Web Site Technologies

RSA Web Agent Installation: what might go wrong

Intro
As usual I ran into a few problems installing the RSA Web agent for a client. With this documentation I hope to jog my memory for my next installation or help someone else out who is experiencing the same problems.

The details
I was installing it on an SLES 11 system, Web Agent version 7.1.

So I ran the CD/install program as root and went through the prompts for the initial setup. It tried to launch firefox at the end, which it couldn’t, but I don’t think that is significant. I started up the web server. The error.log file began to fill up! It looks like this:

acestatus: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file or directory
rpc_server 2389 started by 2379
RSALogoffCookieService: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file o
r directory
AceShutdown try to kill process 2389
signal 15 received
acestatus: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file or directory
RSALogoffCookieService: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file o
r directory
start child 2403
[Mon Aug 18 16:17:55 2014] [notice] Apache/2.2.27 (Unix) mod_rsawebagent/7.1.0[639] DAV/2 PHP/5.2.14 with Suhosin-Patch con
figured -- resuming normal operations
Cannot register service: RPC: Authentication error; why = Client credential too weak
unable to register (300760, 1).child 2403 end
start child 2409
Cannot register service: RPC: Authentication error; why = Client credential too weak
unable to register (300760, 1).child 2409 end
start child 2410
Cannot register service: RPC: Authentication error; why = Client credential too weak
unable to register (300760, 1).child 2410 end
start child 2411
Cannot register service: RPC: Authentication error; why = Client credential too weak
unable to register (300760, 1).child 2411 end
start child 2412
Cannot register service: RPC: Authentication error; why = Client credential too weak
unable to register (300760, 1).child 2412 end
start child 2413
...

Not good.

So I eventually realized that my web server was running as user wwwrun, while the RSA web agent stuff I had installed as root, and its directory, rsawebagent, was owned by userid 40959 – the installer made no attempt to match that up to the user the web server runs as. So I tried a fix by hand like this:

$ chown -R wwwrun rsawebagent

Success! That succeeds in getting rid of the repeating RPC error. Now the error.log file has only a modest level of errors:

acestatus: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file or directory
rpc_server 27766 started by 27756
RSALogoffCookieService: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file or directory
AceShutdown try to kill process 27766
signal 15 received
acestatus: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file or directory
RSALogoffCookieService: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file or directory
start child 27780
[Mon Aug 18 16:25:00 2014] [notice] Apache/2.2.27 (Unix) mod_rsawebagent/7.1.0[639] DAV/2 PHP/5.2.14 with Suhosin-Patch configured -- resuming normal operations

But the thing is, it actually, mostly kind of, seems to work. You see a promising Authentication Succeeded screen in your browser after logging in to the home page. But then it directs you back to the RSA login screen. I was actually stuck on this point for a long time.

The error.log file also looks encouraging at this point:

[Mon Aug 18 16:27:28 2014] [notice] Authentication succeeded User: drj.

My insight today is to tackle the libaceclnt.so problem. I actually ran a trace of the startup to see where it was looking for that file so I could put it there. It was looking in system directories like these:

[pid 31974] open("/usr/lib64/tls/x86_64/libaceclnt.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 31974] stat("/usr/lib64/tls/x86_64", 0x7fff93b721b0) = -1 ENOENT (No such file or directory)
[pid 31974] open("/usr/lib64/tls/libaceclnt.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 31974] stat("/usr/lib64/tls", 0x7fff93b721b0) = -1 ENOENT (No such file or directory)
[pid 31974] open("/usr/lib64/x86_64/libaceclnt.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 31974] stat("/usr/lib64/x86_64", 0x7fff93b721b0) = -1 ENOENT (No such file or directory)
[pid 31974] open("/usr/lib64/libaceclnt.so", O_RDONLY) = -1 ENOENT (No such file or directory)
...

So I decided to make a soft link to it from /usr/lib64 such that:

 libaceclnt.so -> /usr/local/apache202/rsawebagent/libaceclnt.so

Note that my ServerRoot was /usr/local/apache202.

Now when I start up my apache202 instance I have this in error.log:

rpc_server 28874 started by 28860
grep RSALogoffCookieService /proc/*/cmdline | sed 's/\/cmdline.*\/proc\// /g' | sed 's/\/cmdline.*/ /'  | sed 's/.*\/proc\// /' | sort -u
start child 28877
grep RSALogoffCookieService /proc/*/cmdline | sed 's/\/cmdline.*\/proc\// /g' | sed 's/\/cmdline.*/ /'  | sed 's/.*\/proc\// /' | sort -u
AceShutdown try to kill process 28874
signal 15 received
grep RSALogoffCookieService /proc/*/cmdline | sed 's/\/cmdline.*\/proc\// /g' | sed 's/\/cmdline.*/ /'  | sed 's/.*\/proc\// /' | sort -u
start child 28913
[Mon Aug 18 16:36:23 2014] [notice] Apache/2.2.27 (Unix) mod_rsawebagent/7.1.0[639] DAV/2 PHP/5.2.14 with Suhosin-Patch configured -- resuming normal operations

And best of all – it actually works!

I get the RSA authentication page initially. I log on and get redirected to the actual server home page. The access.log file records my username in the access line.

Additional error observed months later
You know that symptom I described above? You see a promising Authentication Succeeded screen in your browser after logging in to the home page. But then it directs you back to the RSA login screen. My web server had been running fine for over a month when all of a sudden it behaved that way again. Confounding. So I put on my big boy pants and did an strace. Nothing popped out at me, but I was struck by frequent access to an htdocs filepath. What’s so unusual about that? I don’t use htdocs in my configurations! So where was that coming from? I re-checked my configuration. OK, this is embarrassing. I have a sweeping include statement in my top-level httpd.conf file:

# pick up all vhosts
Include conf/vhosts/*.conf

It seemed like a good idea at the time. In my conf/vhosts directory I actually had two conf files: my rsaauth.conf but also a dflt.conf!! And the dflt.conf had the references to htdocs, but no references to the RSA authentication. So it was being used to establish the location of the home directory, and the other conf file fixed the authentication type, I guess.

I removed the dflt.conf file, restarted and everything began to work once again. Whew!

RPC errors returned after a few months
After a year or so of running, the RPC errors mentioned above returned. I never could figure out why, and since I no longer needed this service I didn’t pursue it.

Conclusion
A few errors were observed installing RSA Web Agent v 7.1 on SLES Linux. I had had similar problems on Redhat as well. I finally found some solutions, and now it’s ready to use!

References
This write-up is partially related to my blog post about installing multiple apache instances.

Categories
Admin Internet Mail SLES

The IT Detective Agency: emails began piling up this week, no obvious cause

Intro
Today I had my choice of problems I could highlight, but I like this one the best. Our mail server delivers email to a wide variety of recipients. All was going well, and it ran pretty much unattended, until this week when it didn’t go so well. Most emails were getting delivered, but more and more were starting to pile up in the queues. This is the story of how we unraveled the mystery.

The details
It’s best to work from examples, I think. I noticed emails to me.com were being refused delivery, as well as emails to rnbdesign.com. The latter is a smaller company, so we heard from them the usual story that we’re the only ones who can’t send to them.

So I forced delivery with verbose logging. I’m running sendmail, so that looks like this:

> sendmail -qRrnbdesign.com -Cconfig_file -v

That didn’t work out, producing a no-route-to-host type of error. I did a DNS lookup by hand. That showed one set of results, while sendmail was connecting to an entirely different IP address. How could that be??

I was at a loss so I do what I do when I’m desperate: strace. That looks like this:

> strace -f sendmail -qRrnbdesign.com -Cconfig_file -v > /tmp/strace 2>&1

That produced 12,000 lines of output. All the system calls that the process and any of its forked processes invoke. Is that too much to comb through by hand? No, not at all, not when you begin to see the patterns.

I pored over the trace, not knowing what most of it meant, but looking especially for any activity regarding networking and DNS. Around line 6,000 I found it: there was mention of nscd.

For the unaware, the use of nscd (name service cache daemon) might seem innocent enough, or even good-intentioned. What could be wrong with caching frequently used DNS results? The only issue is that it doesn’t work right! nscd derives from UC Berkeley Unix code and has never been supported. I didn’t even like it when I was running SunOS. It caches DNS queries but ignores their TTLs. This is fatal for mail servers, or just about anything you can think of, especially on servers that are infrequently rebooted, as mine are.
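A handy way to catch nscd in the act is to compare what glibc hands back (which goes through nscd) with a direct DNS query. Here’s a sketch, assuming the third-party dnspython package is installed (resolve() is the dnspython 2.x call; older versions name it query()):

import socket
import dns.resolver

name = 'rnbdesign.com'
cached = socket.gethostbyname(name)                    # via glibc, hence via nscd
direct = dns.resolver.resolve(name, 'A')[0].to_text()  # straight to DNS
print(cached, direct)
# a host with several A records can differ legitimately, so treat this as a hint
if cached != direct:
    print('glibc answer differs from live DNS - suspect nscd')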

I stopped nscd right away:

> service nscd stop

and re-ran the sendmail queue runner (same command as above). The rnbdesign.com emails flowed out instantly! Soon hundreds of stuck emails were flushed out.

Of course for good measure nscd had to be removed from the startup sequence:

> chkconfig nscd off

An IT pro always keeps unsolved mysteries in his mind. This time I knew I also had in hand the solution to an earlier-documented mystery about email to paladinny.com.

Conclusion
nscd might show up in your SLES or OpenSuse server. I strongly suggest disabling it before you wind up with stale DNS values and an extremely hard-to-debug issue.

Case closed!

Categories
Admin DNS Internet Mail SLES

Strange problem with email to paladinny.com

Intro
This is probably the most obscure of all postings I will ever do – it’s really just opening up my private journal to the Internet, which helps me when I need to recall how I fixed something.

So the story is that I’m having trouble sending email to anyone in the domain paladinny.com, and I just couldn’t figure out why.

The details
With my sendmail config I finally rolled up my sleeves and did some debugging, even though I was pressed for time. Start up our sendmail debugging session:

> sendmail -Cconfig_file.cf -bt -d35.9

This produces a lot of blah, blah, configuration settings, blah, blah, and finally a sort of sendmail debugging shell. So let’s test a good “normal” domain:

> 3,0 test@gmail.com

canonify           input: test @ gmail . com
Canonify2          input: test < @ gmail . com >
Canonify2        returns: test < @ gmail . com . >
canonify         returns: test < @ gmail . com . >
parse              input: test < @ gmail . com . >
Parse0             input: test < @ gmail . com . >
Parse0           returns: test < @ gmail . com . >
ParseLocal         input: test < @ gmail . com . >
ParseLocal       returns: test < @ gmail . com . >
Parse1             input: test < @ gmail . com . >
Mailertable        input: < gmail . com > test < @ gmail . com . >
Mailertable        input: gmail . < com > test < @ gmail . com . >
Mailertable      returns: test < @ gmail . com . >
Mailertable      returns: test < @ gmail . com . >
SmartTable         input: test < @ gmail . com . >
SmartTable       returns: test < @ gmail . com . >
MailerToTriple     input: < > test < @ gmail . com . >
MailerToTriple   returns: test < @ gmail . com . >
Parse1           returns: $# esmtp $@ gmail . com . $: test < @ gmail . com . >
parse            returns: $# esmtp $@ gmail . com . $: test < @ gmail . com . >

and then this problem domain:

> 3,0 test@paladinny.com

canonify           input: test @ paladinny . com
Canonify2          input: test < @ paladinny . com >
Canonify2        returns: test < @ paladinny . no-ip . biz . >
canonify         returns: test < @ paladinny . no-ip . biz . >
parse              input: test < @ paladinny . no-ip . biz . >
Parse0             input: test < @ paladinny . no-ip . biz . >
Parse0           returns: test < @ paladinny . no-ip . biz . >
ParseLocal         input: test < @ paladinny . no-ip . biz . >
ParseLocal       returns: test < @ paladinny . no-ip . biz . >
Parse1             input: test < @ paladinny . no-ip . biz . >
Mailertable        input: < paladinny . no-ip . biz > test < @ paladinny . no-ip . biz . >
Mailertable        input: paladinny . < no-ip . biz > test < @ paladinny . no-ip . biz . >
Mailertable        input: paladinny . no-ip . < biz > test < @ paladinny . no-ip . biz . >
Mailertable      returns: test < @ paladinny . no-ip . biz . >
Mailertable      returns: test < @ paladinny . no-ip . biz . >
Mailertable      returns: test < @ paladinny . no-ip . biz . >
SmartTable         input: test < @ paladinny . no-ip . biz . >
SmartTable       returns: test < @ paladinny . no-ip . biz . >
MailerToTriple     input: < > test < @ paladinny . no-ip . biz . >
MailerToTriple   returns: test < @ paladinny . no-ip . biz . >
Parse1           returns: $# esmtp $@ paladinny . no-ip . biz . $: test < @ paladinny . no-ip . biz . >
parse            returns: $# esmtp $@ paladinny . no-ip . biz . $: test < @ paladinny . no-ip . biz . >

I have to look more into what Canonify2 does. But this gives me an idea: force the mailertable to handle paladinny . no-ip . biz the way I want it to, namely:

paladinny.no-ip.biz relay:barracuda.cblconsulting.com

because my DNS server returns this funny result:

> dig mx paladinny.com

; <<>> DiG 9.6-ESV-R7-P3 <<>> mx paladinny.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17559
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0
 
;; QUESTION SECTION:
;paladinny.com.                 IN      MX
 
;; ANSWER SECTION:
paladinny.com.          351     IN      CNAME   paladinny.no-ip.biz.
 
;; AUTHORITY SECTION:
no-ip.biz.              60      IN      SOA     nf1.no-ip.com. hostmaster.no-ip.com. 2052775595 600 300 604800 600
 
;; Query time: 30 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Jan 18 08:53:49 2013
;; MSG SIZE  rcvd: 121

whereas Google’s public DNS says this, which looks like the intended result:

> dig mx paladinny.com @8.8.8.8

; <<>> DiG 9.6-ESV-R7-P3 <<>> mx paladinny.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3749
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
 
;; QUESTION SECTION:
;paladinny.com.                 IN      MX
 
;; ANSWER SECTION:
paladinny.com.          1800    IN      MX      10 barracuda.cblconsulting.com.
 
;; Query time: 236 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Fri Jan 18 08:55:42 2013
;; MSG SIZE  rcvd: 71

So at least we know where that odd paladinny.no-ip.biz comes from – sort of. It comes from my nameserver, but where the nameserver got that answer I have no idea. It doesn’t come from the authoritative nameservers:

> dig mx paladinny.com @dns1.name-services.com.

; <<>> DiG 9.6-ESV-R7-P3 <<>> mx paladinny.com @dns1.name-services.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45704
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
 
;; QUESTION SECTION:
;paladinny.com.                 IN      MX
 
;; ANSWER SECTION:
paladinny.com.          1800    IN      MX      10 barracuda.cblconsulting.com.
 
;; Query time: 82 msec
;; SERVER: 98.124.192.1#53(98.124.192.1)
;; WHEN: Fri Jan 18 08:59:50 2013
;; MSG SIZE  rcvd: 71

A CNAME is not an MX record, so the fact that my nameserver returns an answer (ANSWER: 1) when queried for the MX record, when all it thinks it has is a CNAME, seems to be an out-and-out error.
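To make that comparison repeatable I could have scripted it. Here’s a sketch using the third-party dnspython package (resolve() in 2.x; older versions call it query()), pointing one query at my own nameserver and one at Google’s:

import dns.resolver

for server in ('127.0.0.1', '8.8.8.8'):
    res = dns.resolver.Resolver(configure=False)  # don't read /etc/resolv.conf
    res.nameservers = [server]
    try:
        answers = res.resolve('paladinny.com', 'MX')
        print(server, [a.to_text() for a in answers])
    except dns.resolver.NoAnswer:
        print(server, 'returned no MX answer')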

And putting the resolved name in the mailertable is also not normal. Normally you put the domain itself, as in:

paladinny.com relay:barracuda.cblconsulting.com

and of course that’s the first thing I tried, but it had no effect whatsoever.

February Update and Conclusion
The mystery was solved when a whole bunch of email deliveries started failing on my system and I was forced to do some serious debugging. Long story short, my SLES system was regrettably running nscd, the name service cache daemon. I didn’t even bother to re-check paladinny.com; so many other things cleared up when I killed nscd that I’m sure it was the cause of the paladinny.com issue as well. This is all described in this post.

Categories
Linux Network Technologies SLES TCP/IP

Ethernet Bridging on the cheap. Fail. Then success with OTV

Intro
Some experiments just don’t work out. I became curious about a technology that has various names: ethernet bridging, wide-area VLANs, OTV, L2TP, etc. It looked like it could be done on the cheap, but that didn’t pan out for me. But later on we got hold of high-end gear that implements OTV and began to get it to work.

The details
What this is is the ability to extend a subnet to a remote location. How cool is that? This can be very useful for various reasons. A disaster recovery center, for instance, which uses the same IP addressing. A strategic decision to move some, but not all equipment on a particular LAN to another location, or just for the fun of it.

As with anything truly useful there is an open source implementation(s). I found openvpn, but decided against it because it had an overall client/server description and so didn’t seem quite what I had in mind. Openvpn does have a page about creating an ethernet bridging setup which is quite helpful, but when you install the product it is all about the client/server paradigm, which is really not what I had in mind for my application.

Then I learned about Astaro RED at the Amazon Cloud conference I attended. That’s RED as in Remote Ethernet Device. That sounded pretty good, but it didn’t seem quite what we were after. It must have looked good to Sophos as well, because as I was studying it, Sophos bought them! Astaro RED is more for extending an ethernet to remote branch offices.

More promising for cheapo experimentation, or so I thought at the time, is etherip.

Very long story short, I never got that to work out in my environment, which was SLES VM servers.

What seems to be the most promising solution, and the most expensive, is overlay transport virtualization (OTV), offered by Cisco in their Nexus switches. I’ll amend this post when I get a chance to see if it worked or not!

December Update
OTV is beginning to work. It’s really cool seeing it for the first time. For instance, I have a server in South Carolina on an OTV subnet, IP 10.94.45.2. Its default gateway is in New Jersey! Its gateway is in the ARP table, as it has to be, but merely to PING the gateway produces this unusual time lag:

> ping 10.94.45.1

PING 10.94.45.1 (10.194.54.33) 56(84) bytes of data.
64 bytes from 10.94.45.1: icmp_seq=1 ttl=255 time=29.0 ms
64 bytes from 10.94.45.1: icmp_seq=2 ttl=255 time=29.1 ms
64 bytes from 10.94.45.1: icmp_seq=3 ttl=255 time=29.6 ms
64 bytes from 10.94.45.1: icmp_seq=4 ttl=255 time=29.1 ms
64 bytes from 10.94.45.1: icmp_seq=5 ttl=255 time=29.4 ms

See those response times? Huge. I ping the same gateway from a different LAN but same server room in New Jersey and get this more typical result:

# ping 10.94.45.1

Type escape sequence to abort.
Sending 5, 64-byte ICMP Echos to 10.94.45.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/0/1 ms
Number of duplicate packets received = 0

But we quickly stumbled upon a gotcha. Large packets were killing us. The thing is that it’s one thing to run OTV over dark fiber, which we know another customer is doing without issues; but to run it in an MPLS network is something else.

Before making any adjustment on our servers we found behaviour like the following:
– initial ssh to linux server works OK; but session soon freezes after a directory listing or executing other commands
– pings with the -s parameter set to anything greater than 1430 bytes failed – they didn’t get returned

So this issue is very closely related to a problem we observed on a regular segment where getvpn had just been implemented. That problem, which manifested itself as occasional IE errors, is described in some detail here.

Currently we don’t see our carrier being able to accommodate larger packets so we began to see what we could alter on our servers. On Checkpoint IPSO you can lower the MTU as follows:

> dbset interface:eth1c0:ipmtu 1430

The change happens immediately. But that’s not a good idea and we eventually abandoned that approach.

On SLES Linux I did it like this:

> ifconfig eth1 mtu 1430

In this platform, too, the change takes place right away.

By the way, we experimented and found that the largest MTU value we could use was 1430. At this point I’m not sure how to make this change permanent, but a little research should show how to do it.

After changing this setting, our ssh sessions worked great, though now we can’t send pings larger than 1402 bytes.
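To take the guesswork out of finding that magic number, you can binary-search the payload size with don’t-fragment pings. A rough sketch, assuming Linux’s iputils ping and borrowing the gateway IP from the example above:

import subprocess

def ping_ok(size, host='10.94.45.1'):
    # -M do sets the don't-fragment bit, so oversized packets fail outright
    cmd = ['ping', '-c', '1', '-W', '2', '-M', 'do', '-s', str(size), host]
    return subprocess.run(cmd, stdout=subprocess.DEVNULL).returncode == 0

low, high = 0, 1472   # 1472 = 1500-byte ethernet MTU minus 28 bytes of headers
while low < high:
    mid = (low + high + 1) // 2
    if ping_ok(mid):
        low = mid     # this size passed; try larger
    else:
        high = mid - 1  # too big; back off
print('max ICMP payload:', low, 'bytes -> path MTU about', low + 28)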

The latest problem is that on our OTV segment we can ping only one device but not the other.

August 2013 update
Well, we are resourceful people so yes we got it running. Once the dust settled OTV worked pretty well, with certain concessions. We had to be able to control the MTU on at least one side of the connection, which, fortunately we always could. Load balancers, proxy servers, Linux servers, we ended up jiggering all of them to lower their MTU to 1420. For firewall management we ended up lowering the MTU on the centralized management station.

Firewalls needed further voodoo. After pushing policy, clamping needs to be turned back on and acceleration turned off, like this (for Checkpoint firewalls):

$ fw ctl set int fw_clamp_tcp_mss 1
$ fwaccel off

Conclusion
Having preserved IPs during a server move can be a great benefit and OTV permits it. But you’d better have a talented staff to overcome the hurdles that will accompany this advanced technology.

Categories
Admin Linux ntp SLES

The IT Detective Agency: ntp server shows the wrong time after patching

Intro
One of my ntp servers hadn’t been patched in a while, so it was time. Other systems rely on it for time synchronization. The next morning after the patching I noticed that the ntp service wasn’t even running. I started it and went about my business. Checking back some minutes later, it had died again. What happened, and how did we get it fixed? Read on to see how we diagnosed and solved this puzzler.

The details
I like to use ntpq -p to query my ntp server – it’s easy to type! So when I started it up the results looked like this:

> ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.          10 l   59   64   17    0.000    0.000   0.001
 drjegw.drjn.com 192.5.41.209     2 u   56   64   17    0.335  3605497  26.136
 drjegw2.drjn.co 192.5.41.41      2 u   56   64   17   19.241  3605534  39.621
 drjeuro.drjn.eu 128.252.19.1     2 u   60   64   17  105.970  3605532  38.946

That’s some offset, eh? 3.605 x 10^6 msec, or, when you think about it, just over an hour. And yet the local clock had no offset. Strange.

Date
I like to do a crude check of system time by running the date command – quickly – on two different systems. Lacking some sleep, I noticed eventually but not right away, that my ntpd server had a date that was retarded by almost exactly an hour. I didn’t notice it at first because I had trained myself to only look at the seconds, which were “only” off by five seconds.
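For a less crude check there is the third-party ntplib package, which reports the offset directly. A sketch using one of the peers from the listings above:

import ntplib

c = ntplib.NTPClient()
resp = c.request('drjegw.drjn.com', version=3)
# in the broken state described here this would have shown roughly 3605 seconds
print('offset: %.3f s' % resp.offset)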

I checked to see if the timezone or localization settings had been changed by the patching – they hadn’t. So I went ahead and advanced the system clock by an hour. Actually the yast GUI of SLES gave me the option to sync against a time server, so I chose my closest one and did that after I had stopped ntpd.

Next problem, please
That got the time in the ballpark. But ntpd still wasn’t behaving. It exhibited a strange behaviour I’ve never seen before – its offset kept increasing. I observed this behaviour over the course of several minutes:

> ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l    3   64  377    0.000    0.000   0.001
*drjegw.drjn.com 192.5.41.209     2 u  129  128  377    0.350  146.846  81.771
+drjegw2.drjn.co 192.5.41.41      2 u    1  128  377   20.211  183.047  97.286
+drjeuro.drjn.eu 128.252.19.1     2 u   72  128  377  104.931  161.696  79.561

> ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l    5   64  377    0.000    0.000   0.001
*drjegw.drjn.com 192.5.41.209     2 u    2  128  377    1.803  182.380  97.636
+drjegw2.drjn.co 192.5.41.41      2 u    3  128  377   20.211  183.047  97.286
+drjeuro.drjn.eu 128.252.19.1     2 u   74  128  377  104.931  161.696  79.561

> ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l   28   64  377    0.000    0.000   0.001
*drjegw.drjn.com 192.5.41.209     2 u   89  128  377    1.803  182.380  97.636
+drjegw2.drjn.co 192.5.41.41      2 u   90  128  377   20.211  183.047  97.286
+drjeuro.drjn.eu 128.252.19.1     2 u   32  128  377  104.667  197.864  96.296

> ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l    5   64  377    0.000    0.000   0.001
*drjegw.drjn.com 192.5.41.209     2 u    4  128  377    1.813  218.325 113.345
+drjegw2.drjn.co 128.118.25.5     2 u    5  128  377   19.667  219.077 113.157
+drjeuro.drjn.eu 128.252.19.1     2 u   75  128  377  104.667  197.864  96.296

Look at that offset column. See? It keeps going up, at about a rate of 40 msec every two minutes. It ain’t supposed to do that!

So a Unix pal of mine said he had encountered an issue in ntp and had commented out that local clock. I honestly had absolutely no idea what that LOCAL line did, but it had never hurt before.

The local clock comes from these lines in ntp.conf:

server 127.127.1.0              # local clock (LCL)
fudge  127.127.1.0 stratum 10   # LCL is unsynchronized

So I took those out, stopped ntpd with a sudo service ntp stop, synced the time with a sudo sntp -P no -r drjegw.drjn.com, and restarted ntpd. It didn’t work immediately, but it became apparent eventually that it was working.

Meantime I discovered the ntpdc command, which is kind of informative in this situation:

> ntpdc
ntpdc> loopinfo

offset:               0.097373 s
frequency:            -132.558 ppm
poll adjust:          12
watchdog timer:       841 s

This tells me the offset is 97 msec (already too large in my experience), that for some reason the system clock hadn’t been adjusted in 841 s, and that the clock drift rate was -132 ppm – much, much higher than on any other system.

Then in a few minutes it clicked and got the offset in order:

ntpdc> loopinfo

offset:               0.000000 s
frequency:            -132.558 ppm
poll adjust:          4
watchdog timer:       11 s

So removing the local system clock seemed to be working. But what was the real cause of all this? I discussed it with an admin. Bear in mind that this is a physical server. He said the system clock gets its time from a hardware clock which should be visible in the ILO. We checked it. Sure enough, there it was, reporting in the ILO – still, after we had fixed the problem at the OS level – as one hour retarded. There was no way to manually adjust it. The only option was to set up sntp servers, which we did, and which forced the ILO to restart.

We logged back in to the ILO and voila, the time was right!

I now realize that in the OS the LOCAL time was coming from that hardware clock, which must have drifted by an hour since the system was installed.

Before the patch the system was incrementally keeping up with the drift, making the necessary small adjustments periodically. But after the reboot that followed the patching, the discrepancy was too large for it. In /var/log/ntp I even see this line:

14 Sep 07:20:53 ntpd[10259]: time correction of 3605 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.

Conclusion
Now the system is better and we have:

ntpdc> loopinfo

offset:               0.057029 s
frequency:            -5.465 ppm
poll adjust:          4
watchdog timer:       24 s

That’s better, but the offset of 57 msec is still far larger than normal. It’s usable for now, though.