Categories
Admin

Azure Cloud: can you swap public IP addresses on two VMs?

Intro
I am just beginning to use Microsoft’s Azure cloud environment. Although I am inclined to be a fan of AWS, I haven’t looked at AWS networking for a while, and the last time I did something there I felt totally lost trying to understand their terminology.

But in spite of my natural inclination to support everything Amazon, I gotta admit that Azure was a good, usable environment for what I needed to do: swap the public IPs on two VMs.

The details
I was not getting any help whatsoever from within my organization. But I did at least get sufficient access to the Resource Group where my Redhat 7.4 VM was running. That was a godsend.

In Azure, network interfaces are resources. They have private IPs like 10.0.1.4, 10.0.1.7, etc.

Public IP addresses are resources. They have IPs appropriate for your region. They are typically associated with a network interface.

A network interface in turn is associated to a VM, typically.

For some reason which no one could explain to me, I could no longer patch my RHEL 7.4 server. That began about September 2019. Meantime, I was using an application which relied on a built-in package. Now Redhat always ships with old versions of everything, so running this old version plus lack of patches really put pressure on me to upgrade to a new OS. Can you do an in-place upgrade? As far as I can tell, no. I went with SLES 15 SP1 on a new VM within the same resource group and data center since I have some familiarity with Suse Linux. That OS had a newer version of that open source package, plus it could be patched.

But the IP of the old server was embedded in several places and switching it was not an option. What to do? What to do? Can you even swap IPs on two VMs within the same Resource Group? Who knows?

Well, it turns out you can. The documentation on the topic is pretty good and cleared up some things for me. Particularly the first two links in the references at the bottom.

I took this approach.

Changed the first IP from dynamic to static (go to the configure section when looking at this resource). This should have been done from the get-go, but wasn’t. Who knew?

Dissociated the first IP from its network interface.

Changed the second IP from dynamic to static.

Dissociated the second IP from its network interface.

Associated the first IP with the network interface of the SLES 15 server.

Associated the second IP with the network interface of the RHEL server.

And that’s it…. It worked like a charm.
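The steps above were done in the portal, but the same sequence can probably be scripted with the Azure CLI. What follows is a minimal sketch under assumptions, not a tested procedure: the function name, the resource group, NIC, IP-configuration and public IP names are all hypothetical placeholders, and the --remove PublicIpAddress form should be checked against your az version. By default (DRY_RUN=1) it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: swap the public IPs of two NICs with the Azure CLI.
# Every resource name passed in is a hypothetical placeholder.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the az commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# swap_public_ips RG NIC_A CFG_A PIP_A NIC_B CFG_B PIP_B
# Before: PIP_A sits on NIC_A and PIP_B on NIC_B. After: the other way around.
swap_public_ips() {
    local rg="$1" nic_a="$2" cfg_a="$3" pip_a="$4" nic_b="$5" cfg_b="$6" pip_b="$7"

    # Make the first IP static, then dissociate it from its NIC.
    run az network public-ip update -g "$rg" -n "$pip_a" --allocation-method Static
    run az network nic ip-config update -g "$rg" --nic-name "$nic_a" -n "$cfg_a" --remove PublicIpAddress

    # Same for the second IP.
    run az network public-ip update -g "$rg" -n "$pip_b" --allocation-method Static
    run az network nic ip-config update -g "$rg" --nic-name "$nic_b" -n "$cfg_b" --remove PublicIpAddress

    # Re-associate crosswise.
    run az network nic ip-config update -g "$rg" --nic-name "$nic_b" -n "$cfg_b" --public-ip-address "$pip_a"
    run az network nic ip-config update -g "$rg" --nic-name "$nic_a" -n "$cfg_a" --public-ip-address "$pip_b"
}
```

A call would look like swap_public_ips myRG rhel-nic ipconfig1 rhel-pip sles-nic ipconfig1 sles-pip. Review the printed commands first, then rerun with DRY_RUN=0.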

Then I cleaned up some old public IP addresses which weren’t being used. You have to remember there is a shortage of IPs. So a lot of the quirks you encounter are due to their utilizing IPs as sparingly as possible. Makes sense to me. For instance you can have a “public IP” resource which has no value! It may not get a value until it’s absolutely needed by virtue of being associated with a network interface on an active server. Stuff like that…

Conclusion
Yes, you can indeed swap public IPs on two servers if they belong to the same Resource Group and I guess the same data center. I know because I did it. As a bonus I found that the Azure documentation is pretty clear and sufficiently detailed.

References and related
https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-ip-addresses-overview-arm
https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-network-interface-addresses

Categories
Admin Network Technologies

The IT Detective Agency: the case of the Is it the firewall? or routing? or switch? or layer 2?

Intro
This is yet another tale of how things in the IT world often do not turn out the way they seem at first blush. Or possibly a tale of how, just when you think you’ve seen it all after decades in the industry, something new (to you) occurs.

What’s going on
The firewall team was all busy, so when this strange problem occurred Friday they called in the second string: me. I consider some of the team to be less than customer-focused, so I try to compensate for them and for my lack of knowledge about the firewall by applying a more customer-first attitude. In other words, a sympathetic listening ear. These days it can be hard just to find someone to complain to about your IT problem, and I am keenly aware of that.

There was some strange communication which wasn’t working, mediated by a firewall I had never accessed and was not sure I even had access to. So of course I was asked to join a big conference call where an ongoing debugging session was taking place.

I refused.

I hate being blindsided, and I hate not having answers, making me sound even less competent than I already am.

But what I did do was begin my research to see what the system was, whether I perhaps had access, etc.

Yes. I found that, through a management system I have access to, I could view the policies on that particular firewall and view the logs as well.

So once I had that up, I agreed to join the call.

They had one server communicating to three different systems. Only one of the three systems was being reached. Yes, the other two were on the same subnet. Two of our firewalls were between the system and the three servers.

And, yes, I could see some drops. The interesting TCP error stated: TCP packet out of state, first packet isn’t SYN.

No problem. Routing must be screwed up such that we have asymmetric routing. It happens all the time. Right? Well, these systems are really appliances with only some basic networking information configurable, no real debugging facility, and really no ability to add a host route.

I could not establish a shell session onto the firewall – not sure what the password naming scheme was that they used.

Then a real firewall guy comes on the call. But his connectivity is messed up, so I stay with the debug session, if for nothing else than to support him, since four eyes are more effective than two. He shares the routing tables of our two in-line firewalls. It’s hard to understand as these are all new subnets for me, and some don’t look right. But just focusing on possible host routes for any of these three servers, I don’t see anything amiss.

Firewall policy
And, in firewall policy I see the entire subnet has this traffic permitted. There is no rule specific to one or the other of these systems.

So what do we have up until now?
A purist firewall administrator attitude would be as follows:
The firewall treats all these systems the same, therefore this cannot be a firewall problem. Talk to your networking or system people. Have a nice day.

Well, in fact there was some serious question about the network switch as well. So we had a network guy on the call. They dug up the MAC addresses of these systems, from which they found the switch ports. Then they checked the port configuration. Ah, some complex 802.1x authentication was configured. As I understand it, this means the device would not even be allowed onto the subnet until it passed some kind of RADIUS authentication. So they removed this 802.1x stuff and just made sure that port was assigned to the right VLAN.

Still, the problem persisted.

I think the other firewall guy was also new to this equipment. Eventually, though, he tries to do a packet trace of the one that’s working versus the one that isn’t.

You know, I never saw the results of those traces, but I’m pretty sure, reading between the lines, that they surprised him, meaning, they did not fit the hypothesis of the asymmetric routing.

In these situations there is the main communication in the main session, then side communications going on, like between me and the firewall guy. But it is all chaotic. Acoustics are mediocre, accents are hard to understand. So the net transfer of information is pretty low. Statements, even important ones, often have to be repeated multiple times (rebroadcasts) to assure everyone “gets it.”

Typical questions were asked. When did this last work? What had changed? There were a couple changes. Some kind of networking thing (I forget what), and then the firewalls changed management systems after that. The firewall change seemed closer in time to the last known success.

You acquire more and more information as you dig into problems. It’s hard to judge at the time which is relevant and which lines of inquiry are a complete waste of time. A good incident manager or project manager can sense which are the more productive lines of investigation and nurture those discussions while suppressing the noise.

Actually it was the networking guy who found the Checkpoint link below. I looked at it. The firewall guy was muttering something about badly behaved, older applications that might exhibit this behaviour.

So we agreed to take the suggested steps, which would basically allow these out of state packets. Drat. The firewall returned an error.

But I continue to refresh the firewall logs. The communication was occurring about every minute. Lo and behold, I see the older drops, and then accepts for the last few minutes! I think it worked. I tell them to check.

They check their end. Sure enough. Communication beginning to work…

The customer tries to make the assertion that this was a firewall problem all along. Not so fast. Firewall guy says, well, the firewall is doing exactly what it’s supposed to be doing. Who’s right?

We’re all good for now, but we state this is a kludge for today and a follow-up meeting needs to occur.

So what happened?
I think the single most important thing is that the firewall guy switched his problem hypothesis from Must be asymmetric routing, to Maybe it’s a badly behaved application. Meaning what? What if you have an application that establishes a TCP connection, and then to beat idle timeouts, sends a KEEP ALIVE packet every minute? Well, now, suppose your firewall is rebooted in the middle of that because it has changed management stations and needs to reload policy? What might the situation look like to it?

If you were unlucky, it just might see these KEEP ALIVE TCP packets without having the connection in its connection table, in other words, exactly the situation we are observing!
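To make that hypothesis concrete, here is a toy model of a stateful firewall’s connection table. It is only a sketch: the function names, the packet format and the IP addresses are invented for illustration, and a real firewall tracks far more state (sequence numbers, timeouts and so on). The ALLOW_OUT_OF_STATE switch stands in for the kludge of permitting out-of-state packets.

```shell
#!/usr/bin/env bash
# Toy model of a stateful firewall. All names and addresses are made up.

CONN_TABLE=""            # space-separated "src>dst:port" keys of known connections
ALLOW_OUT_OF_STATE=0     # 1 models the kludge of accepting out-of-state packets
VERDICT=""               # result of the most recent fw_packet call

# A reboot or policy reload that loses the connection table.
fw_reboot() { CONN_TABLE=""; }

# fw_packet SRC DST PORT FLAGS   (FLAGS is SYN or ACK); sets VERDICT.
fw_packet() {
    local key="$1>$2:$3" flags="$4"
    case " $CONN_TABLE " in
        *" $key "*) VERDICT="accept"; return 0 ;;   # connection already known
    esac
    if [ "$flags" = "SYN" ] || [ "$ALLOW_OUT_OF_STATE" = "1" ]; then
        CONN_TABLE="$CONN_TABLE $key"               # remember the connection
        VERDICT="accept"
    else
        VERDICT="drop: TCP packet out of state, first packet isn't SYN"
    fi
}

fw_packet 10.1.1.5 10.2.2.8 443 SYN; echo "handshake:       $VERDICT"
fw_packet 10.1.1.5 10.2.2.8 443 ACK; echo "keepalive:       $VERDICT"
fw_reboot
fw_packet 10.1.1.5 10.2.2.8 443 ACK; echo "after reboot:    $VERDICT"
fw_packet 10.1.1.5 10.2.2.8 443 SYN; echo "fresh handshake: $VERDICT"
```

In this model the keepalive after the reboot is dropped because the table no longer knows the connection, while a fresh SYN from the application is accepted again.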

What should have happened?
It would have been great if the communication were forced to be re-established from time to time, even once a day. This problem had been going on for days.

But, given this very stupid behaviour on the part of this application, if the app people had been aware they should have forced their application to re-establish the TCP connection after the firewall reboot. Probably, for the one that did work, it had been forced to re-establish.

A firewall person has to be sufficiently aware to realize this could be happening, and advise the app owner on what to do to prevent it.

Conclusion
So whose problem is it?

To the app people it looks like a firewall issue, cut-and-dried. To a firewall guy it looks like an application issue, cut-and-dried. I see both sides. It is some of both. An app owner has to understand enough about firewalls to see that this type of thing can occur. Assigning blame to one side or the other, as most people are wont to do, is not productive. Only a team effort could have revealed this issue. And recall that the “fix” is actually a kludge that lowers security.

Case: almost closed.

References and related
Checkpoint’s note on TCP packet out of state first packet isn’t SYN: https://community.checkpoint.com/t5/General-Topics/TCP-packet-out-of-state-First-packet-isn-t-SYN-tcp-flags-SYN-ACK/td-p/37166

The IT Detective agency cases are still coming fast and furious. Here’s another recent case. Failed to convert character

Categories
Admin Network Technologies

The IT Detective Agency: WebEx and the case of the mysterious reset

Intro
A company known to me was a contented user of WebEx until they noticed a strange behaviour: their calls were losing quality or even dropped exactly after one hour. No one, most especially the vendor of WebEx, had the slightest idea of the root cause. Read on to see how this fascinating case is playing out.

Scene: company offices, Sao Paulo, Brazil
Triago, a very competent IT professional located in Sao Paulo, was the first to report the problem. More-or-less it went like this:

– when he uses the call-my-computer feature in WebEx the call quality is fine, until he has been on the meeting for one hour. At the one hour mark the voice quality of others (from his perspective) dropped dramatically. Sometimes the call was completely lost. Then, about five minutes later, the quality was OK again.
– the problem only occurred when in the office or using VPN, i.e., when using the company network
– others in South America are having the same issue
– he can use the same company laptop, on the Internet, and will not have the problem

After the usual finger-pointing amongst various vendors a debugging plan was created.

It’s going really well. There have been about 12 test calls, stretched out over the last five months. You have to admire the chutzpah of US software vendors who sell to major customers and then still manage to treat them like crap come time for support.

The pattern, more-or-less, goes like this. Test call with several vendors plus Triago and me in the US. Wait around for an hour, produce the problem. Wait for software vendor to “analyze”. Wait for two weeks. Some small insight may be gleaned by them. Conclusion: another test is needed, we didn’t have all the traces we need. Rinse and repeat.

Scene: A soulless office park somewhere in northern New Jersey
To be continued, literally…

Scene: an enterprise-class server room somewhere in Research Triangle Park, North Carolina
If one uses the company’s guest WiFi, one uses the company’s firewall, but not the company’s proxy server. This test succeeds. But, unfortunately, one also uses UDP rather than TCP for the communication because that is the default. See the references for communication requirements.

So one thought is to knock out the ability to use UDP by blocking UDP port 9000, thereby forcing use of TCP. Testing that today….

References and related
Networking requirements for WebEx: https://help.webex.com/en-us/WBX264/How-Do-I-Allow-Webex-Meetings-Traffic-on-My-Network

Categories
Admin

What I’m Working on Now: building my own 3D printer

Intro
Because someone else on the team had a good experience with it, and so I can exchange notes and get some tips from them, I went ahead and ordered the Anet A8 3D printer + an extra roll of PLA. That is, despite the vocal negative reviews which exist.

Although normally I don’t consider myself very mechanical, I guess I’m pretty good at following instructions. So I’m watching their detailed Youtube video and mimicking their actions, step-by-step. And I’m having a blast! It’s like having an erector set all over again.

And I’m in constant awe that so much could be had for so little. There must be, what, a couple hundred parts, including five stepper motors, several limit switches, a circuit board, LCD, power supply, steel rods, belts, acrylic(?) parts, and 0.5 kg of PLA – all for $140?? I feel like its raw value is several times that. Hey, I just bought two M5 screws and nuts from Home Depot and paid almost $3.

Not exactly like the video

Things were going great, until the kit actually deviated from the parts shown in the video! Especially where the kit provided two 3D-printed parts shaped significantly differently from what they show. Assembly slowed down after that.

The rods did not fit through the 3D parts as they should have – I had to bore them out a bit with one of the yet-unused threaded rods that came with the kit! Which did work by the way + a lot of muscle and hand strength.

Partially assembled – extruder not yet inserted.

Many gray hairs later I finished the assembly. If you’re familiar with furniture assembly of a semi-complex piece like a desk with top shelves, this is about three times more complicated. You keep hoping that OK, now it’ll get easier and I’ll cruise through this. But it never does! Every stage features unique steps presenting their own unique challenges.

In my case my cooling fan on the extruder, which looks suspiciously like a 3D part, hangs below the extruder nozzle, so I don’t think I can use it.

Then once it was assembled my first test print, a simple box, went awry. Around the third layer the whole thing started moving around on the print bed! Once again I was hoping that the hard part was the assembly and then I could cruise to printing to my heart’s content. Wrong. Now comes a whole new set of challenges unique to 3D printing, and you have to master that as well. This is nothing at all like going to Staples to get an ink jet printer where the most challenging thing you’ll have to do is change an ink cartridge. Nothing at all like that. This is more like constructing a model railroad yourself.

Things get smoother

So I took everyone’s advice and installed Cura 3D as my slicer. Then I just decided to go for it and print out my first upgraded part: a center nozzle fan. It worked really, really well!

My very first print – a center nozzle fan

You can almost see from the picture that the quality was really good. I had to sandpaper the chute a little for it to fit – apparently ABS sandpapers easily – and voila, it snapped into place.

Next I printed a filament guide. Also no problems there.

Then my filament broke.

Some terms
slicer – software which takes an STL file and translates it into a series of layer-by-layer movements. Cura 3D is the slicer I use.
gcode – the file format that the Anet A8 printer understands. You take an STL file, put it into your slicer, and have it produce a gcode file for you that you print.
PLA – the type of plastic most often used for 3D printing at home

References and related
Amazon link to what I purchased – now only $135! https://smile.amazon.com/gp/product/B01N5D2ZIB/ref=ppx_yo_dt_b_asin_title_o01_s00?ie=UTF8&psc=1

Assembly video, part 1
Part 2.
Anet test guide: part 3: https://www.youtube.com/watch?v=EB5Q3_sJ-Tk

Someone’s additional first steps guide – has some useful tips: https://www.instructables.com/id/Beginners-Guide-to-3D-Printing-Anet-A8-DIY-3D-Prin/

Note that the included microSD card has PDFs for assembly instructions, usage instructions and troubleshooting guide.

Upgrade part: center nozzle fan download: http://www.thingiverse.com/thing:1620630

I am contemplating using openscad to build my 3D models. www.openscad.org. This is a really good tutorial: http://www.tridimake.com/2014/09/how-to-use-openscad-tricks-and-tips-to.html

3D printing some parts for my house

Categories
Admin IT Operational Excellence Network Technologies

No Internet, secure WiFi status message in Windows 10

Intro
Finding out how Windows decides whether there is an Internet connection can be a challenge, because any Internet search on the topic is composed of words that are common and therefore used in many other contexts. I have to give credit to someone else who found most of these pertinent links that help explain how Windows decides whether or not your PC has an Internet connection.

What they don’t tell you
I think there are a lot more tests Microsoft does than what they’ve documented. In my opinion, based on observation, in addition to the sites they recommend to whitelist, also whitelist

www.msftconnecttest.com

Some PCs get stuck in a loop requesting www.msftconnecttest.com/connecttest.txt indefinitely, which isn’t good for anyone.

Here’s one they don’t mention, of the same ilk:

ipv6.msftconnecttest.com/connecttest.txt

I’m thinking to just leave that one alone, unless you really are fully running on IPv6.

Now if you have a PAC file, what you’re going to see are accesses for
<PAC-file-address>/connecttest.txt

I don’t think that one’s documented either. I’m not yet sure how best to have the PAC file web server respond, where best means the reply which would make the PC most likely to decide Yes I really do have an Internet connection.
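For comparison, the regular probe at www.msftconnecttest.com/connecttest.txt returns the short plain-text body Microsoft Connect Test. The sketch below checks for exactly that body with curl. The function names are made up for illustration, and whether serving the same body from the PAC file web server would satisfy Windows is purely an assumption I have not verified.

```shell
#!/usr/bin/env bash
# Sketch: check whether a connecttest.txt URL answers the way the NCSI probe expects.
# Function names are hypothetical.

NCSI_EXPECTED="Microsoft Connect Test"

# ncsi_body_ok BODY -> exit status 0 if BODY is exactly the expected text
ncsi_body_ok() {
    [ "$1" = "$NCSI_EXPECTED" ]
}

# ncsi_probe [URL] -> 0 if the probe matches, 1 if the body differs, 2 if curl failed
ncsi_probe() {
    local url="${1:-http://www.msftconnecttest.com/connecttest.txt}"
    local body
    body="$(curl -s --max-time 10 "$url")" || return 2
    ncsi_body_ok "$body"
}
```

Running ncsi_probe with no argument from a PC behind the proxy shows whether the standard probe gets through. Passing the PAC server’s /connecttest.txt URL as the argument shows what that server currently answers.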

References and related
This Pulse Secure article is pretty good. You start with an Internet connection, then launch Pulse Secure VPN, then find you are told there is no longer an Internet connection. This explains why it might be, but in my opinion it is incomplete as it does not even consider the case where an authenticating proxy is the sole gateway to the Internet:
https://kb.pulsesecure.net/articles/Pulse_Secure_Article/KB43805

These are two more articles about VPN tunneling
https://community.pulsesecure.net/t5/Pulse-Desktop-Clients/Pulse-Secure-blocks-Windows-10-apps-from-internet-access/td-p/11944
https://docs.pulsesecure.net/WebHelp/PCS/8.3R1/Home.htm#PCS/PCS_AdminGuide_8.3/About_VPN_Tunneling.htm

Network Location Awareness (NLA) and Network Connection Status Indicator (NCSI) are explained in these articles:
https://support.microsoft.com/en-us/help/4494446/an-internet-explorer-or-edge-window-opens-when-your-computer-connects
https://support.microsoft.com/en-us/help/2778122/using-authenticated-proxy-servers-together-with-windows-8

Categories
Admin Linux Raspberry Pi

Raspberry Pi Recovery Mode or interrupting the boot process

Intro
If you installed Raspbian from the NOOBS distribution as I do, then you may occasionally “blow up” your installation as I just have! You have an out, sort of, short of re-imaging the disk, though with about the same impact.

To interrupt the boot process and enter recovery mode, attach a USB keyboard and repeatedly hit the Shift key. You should come to the NOOBS OS install selection screen. Just re-install Raspbian… But if you’re using WiFi, first configure your WiFi setup before re-installing Raspbian.

Symptoms
When I powered up, I got the initial multi-color screen. Then a two-line text message popped up – too quickly to be read, then a grayish screen, then it split into a lower and upper part, then both halves faded away and there it stayed… At that point it was not responsive to any keyboard inputs or mouse clicks.

Conclusion
While doing my advanced slide show and rotating display project, I somehow managed to blow up my OS. Finding the way to interrupt the boot-up was not so easy, so I am amplifying the answer that worked for me on the Internet: repeatedly hit the Shift key during the boot, until you see the NOOBS image selector screen.

Categories
Admin Linux Network Technologies Raspberry Pi Security Web Site Technologies

How to test if a web site requires a client certificate

Intro
I cannot find a link on the Internet for this, yet I think some admins would appreciate a relatively simple test to answer the question: is this a web site which requires a client certificate to work? The errors generated in a browser may be very generic in these situations. I see many ways to offer help, from a recipe to a tool to some pointers. I’m not yet sure how I want to proceed!

Why would a site require a client CERT? Most likely as a form of client authentication.

Pointers for the DIY crowd
Badssl.com plus access to a linux command line – such as using a Raspberry Pi I so often write about – will do it for you guys.

The Client Certificate section of badssl.com has most of what you need. The page is getting big, look for this:

So as a big timesaver badssl.com has created a client certificate for you which you can use to test with. Download it as follows.

Go to your linux prompt and do something like this:
$ wget https://badssl.com/certs/badssl.com-client.pem

If this link does not work, navigate to it starting from this link: https://badssl.com/download/

badssl.com has a web page you can test with which only shows success if you access it using a client certificate, https://client.badssl.com/

To see how this works, try to access it the usual way, without supplying a client CERT:

$ curl -i -k https://client.badssl.com/

HTTP/1.1 400 Bad Request
Server: nginx/1.10.3 (Ubuntu)
Date: Thu, 20 Jun 2019 17:53:38 GMT
Content-Type: text/html
Content-Length: 262
Connection: close

400 Bad Request

No required SSL certificate was sent


nginx/1.10.3 (Ubuntu)

 

Now try the same thing, this time using the client CERT you just downloaded:

$ curl -v -i -k -E ./badssl.com-client.pem:badssl.com https://client.badssl.com/

* About to connect() to client.badssl.com port 443 (#0)
*   Trying 104.154.89.105... connected
* Connected to client.badssl.com (104.154.89.105) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* warning: ignoring value of ssl.verifyhost
* skipping SSL peer certificate verification
* NSS: client certificate from file
*       subject: CN=BadSSL Client Certificate,O=BadSSL,L=San Francisco,ST=California,C=US
*       start date: Nov 16 05:36:33 2017 GMT
*       expire date: Nov 16 05:36:33 2019 GMT
*       common name: BadSSL Client Certificate
*       issuer: CN=BadSSL Client Root Certificate Authority,O=BadSSL,L=San Francisco,ST=California,C=US
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*       subject: CN=*.badssl.com,O=Lucas Garron,L=Walnut Creek,ST=California,C=US
*       start date: Mar 18 00:00:00 2017 GMT
*       expire date: Mar 25 12:00:00 2020 GMT
*       common name: *.badssl.com
*       issuer: CN=DigiCert SHA2 Secure Server CA,O=DigiCert Inc,C=US
> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.27.1 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: client.badssl.com
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Server: nginx/1.10.3 (Ubuntu)
Server: nginx/1.10.3 (Ubuntu)
< Date: Thu, 20 Jun 2019 17:59:08 GMT
Date: Thu, 20 Jun 2019 17:59:08 GMT
< Content-Type: text/html
Content-Type: text/html
< Content-Length: 662
Content-Length: 662
< Last-Modified: Wed, 12 Jun 2019 15:43:39 GMT
Last-Modified: Wed, 12 Jun 2019 15:43:39 GMT
< Connection: keep-alive
Connection: keep-alive
< ETag: "5d011dab-296"
ETag: "5d011dab-296"
< Cache-Control: no-store
Cache-Control: no-store
< Accept-Ranges: bytes
Accept-Ranges: bytes

<
 
 
 
 
  <style>body { background: green; }</style>

client.
badssl.com

 
* Connection #0 to host client.badssl.com left intact
* Closing connection #0

No more 400 error status – that looks like success to me. Note that we had to provide the password for our client CERT, which they kindly provided as badssl.com.

Here’s an example of a real site which requires client CERTs:

$ curl -v -i -k -E ./badssl.com-client.pem:badssl.com https://jp.nissan.biz/

* About to connect() to jp.nissan.biz port 443 (#0)
*   Trying 150.63.252.1... connected
* Connected to jp.nissan.biz (150.63.252.1) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* warning: ignoring value of ssl.verifyhost
* skipping SSL peer certificate verification
* NSS: client certificate from file
*       subject: CN=BadSSL Client Certificate,O=BadSSL,L=San Francisco,ST=California,C=US
*       start date: Nov 16 05:36:33 2017 GMT
*       expire date: Nov 16 05:36:33 2019 GMT
*       common name: BadSSL Client Certificate
*       issuer: CN=BadSSL Client Root Certificate Authority,O=BadSSL,L=San Francisco,ST=California,C=US
* NSS error -12227
* Closing connection #0
* SSL connect error
curl: (35) SSL connect error

OK, so you get an error, but that’s to be expected because our certificate is not one it will accept.

The point is that if you don’t send it a certificate at all, you get a different error:

$ curl -v -i -k https://jp.nissan.biz/

* About to connect() to client.badssl.com port 443 (#0)
*   Trying 104.154.89.105... connected
* Connected to client.badssl.com (104.154.89.105) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* warning: ignoring value of ssl.verifyhost
* Unable to load client key -8025.
* NSS error -8025
* Closing connection #0
curl: (58) Unable to load client key -8025.

Chrome gives a fairly intelligible error

Possibly to be continued…

Conclusion
We have given a recipe for testing from a linux command line if a web site requires a client certificate or not. Thus it could be turned into a program.
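As one possible starting point for such a program, here is a sketch that uses openssl s_client instead of curl and looks at what the server says during the handshake. Treat it as a heuristic under assumptions: the function names and result labels are made up, the check only tells you the server asked for a certificate (it may still treat it as optional), and a server can request a certificate without sending a CA list, in which case this reports probably-not-requested.

```shell
#!/usr/bin/env bash
# Sketch: guess from `openssl s_client` output whether a server asks for a client certificate.
# Function names and result labels are hypothetical.

# classify_handshake reads s_client output on stdin and prints one label.
classify_handshake() {
    local out
    out="$(cat)"
    case "$out" in
        *"Acceptable client certificate CA names"*) echo "cert-requested" ;;
        *"No client certificate CA names sent"*)    echo "probably-not-requested" ;;
        *)                                          echo "unknown" ;;
    esac
}

# check_site HOST [PORT] -> runs the handshake and classifies it
check_site() {
    local host="$1" port="${2:-443}"
    openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null \
        | classify_handshake
}
```

For example, check_site client.badssl.com could be compared against check_site badssl.com to see whether the two labels differ.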

References and related
My article about ciphers has been popular.

I’ve also used badssl.com for other related tests.

Can you use openssl directly? You’d hope so, but I haven’t had time to explore it… Here are my all-time favorite openssl commands.

https://badssl.com/ – lots of cool tests here. The creators have been really thorough.

Categories
Admin

F5 clustering tips

Intro
I was having trouble with one of my new F5 clusters. I thought I’d be clever and add two new F5’s to the existing cluster, and then remove the old members when everything looked good.

But the device groups somehow remembered the old servers, even though they weren’t in the device groups. Eventually I could not even sync the two new servers together.

Here’s a concise summary of the steps which worked, taken from this article:

Hi Manuel, do you try to setup a sync-failover device-group containing three units? To establish device trust I would recommend to force two units into offline state. Remove the machines from the existing sync-failover device-group (repeat on each machine if required) and delete the sync-failover device-group. Now reset the device trust on all machines. Next step will be to use the active machine to add both offline machines as peers. Now all three units should show up in each machine´s device list. On the active unit create a new device-group of type sync-failover with network failover enabled. Add all machines to the new device-group. This will be synced to all machines and you can release them from forced offline. Time for the initial sync now. Thanks, Stephan

I get a little confused about sync-only vs sync-failover. That difference is clearly discussed there as well:

“Sync-only” will not allow to synchronize LTM configurations.
To synchronize an LTM config all units need to belong to the same “sync-failover” device-group.

I had been afraid and ignorant of the need to reset device trust. For good measure I deleted all device groups. Then this procedure seems to leave me with an auto-created datasync-global-dg (sync-only). Since there was no sync-failover group I manually created one I called sync-LTM-configs. Another auto-created, sync-only group is the device_trust_group. So I see three device groups in all when I am on the sync page, but only two when I am on the device group page.

Conclusion
I had been afraid and ignorant of the need to reset device trust in cleaning up an F5 cluster with old and new members. But once I did that and followed the recipe above, I was good to go.

References and related
The discussion on the F5 Devcentral site: https://devcentral.f5.com/questions/f5-sync-options

Categories
Admin Network Technologies

Postfix Operational tips

Intro
I’m trying out the system-supplied postfix on a SLES system. I had been using sendmail but there doesn’t seem to be any development on that software.

Some commands I needed right away
Well, right away I had thousands of queued messages so I needed a way to make sense of what was happening.

For these commands to make sense you need to know that I am running a second postfix configuration out of /etc/postfixEXT.

Display the queue

postqueue -c /etc/postfixEXT -p

Force delivery from the queue

postqueue -c /etc/postfixEXT -f

List one email in detail

postcat -vq -c /etc/postfixEXT QUEUEID

Delete one email

postsuper -c /etc/postfixEXT -d QUEUEID

Put mail on hold

postsuper -c /etc/postfixEXT -h ALL|QUEUEID

Release mail from hold

postsuper -c /etc/postfixEXT -H ALL|QUEUEID

How to force delivery of a single message
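I did not find a documented command for this at the time, so I got creative. (Postfix 2.4 and later do document postqueue -i QUEUEID, which schedules immediate delivery of a single deferred message; the steps below work without it.) If you have the luxury of halting all email for a few seconds simply do this: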

Display the queue to find the queue ID of the email you want to force delivery of

postqueue -c /etc/postfixEXT -p

Put all mail on hold

postsuper -c /etc/postfixEXT -h ALL

Now release the hold on that one email

postsuper -c /etc/postfixEXT -H QUEUEID

QUEUEID is, of course, the queue ID, e.g., F2A1A27891E, of the email in question.

Look for what happened
Check your mail log’s last lines in /var/log/mail

Revert back to normal running

postsuper -c /etc/postfixEXT -H ALL

Since mail is store-and-forward and not real time, you can do these steps even on a production system and no one will be the wiser, as long as you are reasonably quick. It probably takes two minutes even for a slow typist.
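The procedure above can be wrapped in two small shell functions so there is less typing while mail is on hold. This is only a sketch: the function names and the RUN dry-run switch are invented here, and the explicit queue flush after the release is my assumption (it asks for an immediate delivery attempt instead of waiting for the next scheduled queue run).

```shell
# Sketch of the hold-all / release-one procedure.
# CONFIG_DIR matches the second instance used in this post.
# RUN is a hypothetical dry-run switch: RUN=echo prints the commands only.
CONFIG_DIR="${CONFIG_DIR:-/etc/postfixEXT}"
RUN="${RUN:-}"

# Hold everything, then release only the message we care about.
isolate_message() {
    queue_id="$1"
    if [ -z "$queue_id" ]; then
        echo "usage: isolate_message QUEUEID" >&2
        return 1
    fi
    $RUN postsuper -c "$CONFIG_DIR" -h ALL
    $RUN postsuper -c "$CONFIG_DIR" -H "$queue_id"
    # Assumption: flush so the released message is attempted right away.
    $RUN postqueue -c "$CONFIG_DIR" -f
}

# Release everything again once the mail log shows what happened.
resume_all() {
    $RUN postsuper -c "$CONFIG_DIR" -H ALL
}
```

A dry run with RUN=echo is a cheap way to check the commands before trying them on a production queue. The two steps are kept separate on purpose, so you can read the mail log in between.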

How to run multiple listeners
I didn’t want to disturb the system-installed postfix too much. I would let it “have” the loopback address, 127.0.0.1, leaving me the other IPs for my relay config to listen on. I added these lines to /etc/postfix/main.cf

multi_instance_enable = yes
multi_instance_directories = /etc/postfixEXT

service postfix start starts up the local postfix plus my relay. Grep the process table for either master or postfix to see them. However, to be honest, service postfix stop does not kill all processes, so I always ended up killing one of the master processes by hand. Update: postmulti -p stop does the trick to kill all. You can also use status or start in place of stop.

Sendmail to Postfix migration tips
This could be a separate post but I am too lazy to do that.

What happens to the access file? I kept the name of the file access but just list all the IPs, one per line, without any further arguments, to permit just those IPs relay access. In my main.cf I have a line like this to tie it together:

mynetworks = /etc/postfixEXT/access

Note that there is no hashed or .db version of this file any longer, unlike in the sendmail case.
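If the old sendmail access file is long, the conversion to a plain list of IPs can be scripted. The sketch below is one way to do it under some assumptions of mine: the function name access_to_mynetworks is made up, only entries whose action is RELAY are kept, an optional Connect: tag is stripped, and anything that is not a complete IPv4 address (hostnames, partial network prefixes) is dropped and would need to be handled by hand.

```shell
# Hypothetical converter: sendmail access entries on stdin,
# one relay-permitted IPv4 address per line on stdout.
access_to_mynetworks() {
    awk '
        /^[ \t]*#/ { next }                  # skip comments
        NF == 0    { next }                  # skip blank lines
        toupper($2) == "RELAY" {
            key = $1
            sub(/^[Cc]onnect:/, "", key)     # drop the optional Connect: tag
            # keep only complete dotted-quad addresses
            if (key ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) print key
        }
    '
}
```

Something like access_to_mynetworks < /etc/mail/access > /etc/postfixEXT/access would produce the new file, where /etc/mail/access is only a guess at where the old sendmail file lives.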

References and related
Since I mentioned sendmail I have to give a shout out to one of my old sendmail posts.

More info on postfix multiple instances. A pretty complete guide.

Categories
Admin Linux SLES

How to add private root CAs in SLES or Redhat or Debian

Intro
From time-to-time I run my own PKI infrastructure, namely issuing my own certificates from my private root CA. I wanted this root CA to be recognized by Linux utilities running on Suse Linux (SLES), in particular, lftp, which I was trying to use to access an ftps site, which itself is a post for another day. In other words, how do you add a certificate to the certificate store in linux?

The details
Let’s say you have your root certificate in the standard form like this example

-----BEGIN CERTIFICATE-----
MIIIPzCCBiegAwIBAgITfgAAAATHCoXJivwKLQAAAAAABDANBgkqhkiG9w0BAQsF\nADA2MQswCQYD
VQQGEwJERTENMAsGA1UEChMEQkFTRjEYMBYGA1UEAxMPQkFTRiBS\nb290IENBIDIxMB4XDTE3MDgxMDEyNDAwOFoXDTI4MDgxMDEyNTAwOFowXDETMBEG\nCgm
...
PEScyptUSAaGjS4JuxsNoL6URXYHxJsR0bPlet\nSct
-----END CERTIFICATE-----

Then you can put the certificate inline and within one script install it so that it permanently joins the other root CAs in /etc/ssl/certs with a script like this example:

DrJ_Root_CA="-----BEGIN CERTIFICATE-----\nMIIIPzCCBiegAwIBAgITfgAAAATHCoXJivwKLQAAAAAABDANBgkqhkiG9w0BAQsF\nADA2MQswCQYD
VQQGEwJERTENMAsGA1UEChMEQkFTRjEYMBYGA1UEAxMPQkFTRiBS\nb290IENBIDIxMB4XDTE3MDgxMDEyNDAwOFoXDTI4MDgxMDEyNTAwOFowXDETMBEG\nCgm
SJomT8ixkARkWA05FVDEUMBIGCgmSJomT8ixkARkWBEJBU0YxFjAUBgoJkiaJ\nk/IsZAEZFgZCQVNGQUQxFzAVBgNVBAMTDkJBU0YgU1VCIENBIDIzMIICIjAN
Bgkq\nhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAqrfoKxrCPCw/u2PBEaAwW/VHLxBw6JNi\n42F3EhXmligGb/Uu4kcWO016IGFatVrPhdAtShAqmTXis0w57hW
jn1Iptvo7rROY\nGPmH7aSW/fYM/x2Lln7NlltayXspWawqBzWzYGADodyjn/Z5TaLYaG8lajiabCM5\nUJDhlZ/SUR3xylqIIFaQK3k2twjeGoxobhbr9hJcQZ
fXF0V5FCSCzJExDYma6bs1\nZtyqP/yHaiOeWXGdnqM9EPfT8kmIC42ZXq7s2JZI5OUflJBbaebYEbuDad6Rh19E\nRchXABLe68+TF/4AZCw16iRwRgq/2Re2W
WPMtVomyZ2txvn51iizqBkdVGzIRklC\n3yIv5MRzDFTfG940/tSAomHsz+RdGbL+NCBeWSY+rnJQdExJ7bLXFLVsTNGL68lP\nMuYrkxYQKWRtVhvQCHsdd5E0
t9QR4iY1JLWQxq3GHy98tBbCGiKMpBbuj/9I/E6c\nGrikouv2QyNnCN34PXpUxTQmDj5LZGV9w2faqpwUBD2ZWsbyVSgvD8TcjdxzcMcj\nLBnYUaZ8wHFqUj2
DBahctfKQxA8Ptrzt1mDIGOQliZGDwrTVMECd+noQhTlF1eS+\nvNraV3dYRMymVxh58MPEaDJgwIRcBWAAOeBbZlyx76oskXdmjOiz5jqyoR5eweCE\ntS4jfM
EW6UECAwEAAaOCAx4wggMaMAsGA1UdDwQEAwIBhjAQBgkrBgEEAYI3FQEE\nAwIBADAdBgNVHQ4EFgQUdn7nwFGpb8uzpFVs5QWQcsA0Q6IwQwYDVR0gBDwwOjA
4\nBgwrBgEEAYGlZAMCAgEwKDAmBggrBgEFBQcCARYaaHR0cDovL3BraXdlYi5iYXNm\nLmNvbS9jcAAwGQYJKwYBBAGCNxQCBAweCgBTAHUAYgBDAEEwEgYDVR
0TAQH/BAgw\nBgEB/wIBADAfBgNVHSMEGDAWgBSS9auUcX38rmNVmQsv6DKAMZcmXDCCAQkGA1Ud\nHwSCAQAwgf0wgfqggfeggfSGgbZsZGFwOi8vL0NOPUJBU
0YlMjBSb290JTIwQ0El\nMjAyMSxDTj1DRFAsQ049UHVibGljJTIwS2V5JTIwU2VydmljZXMsQ049U2Vydmlj\nZXMsQ049Q29uZmlndXJhdGlvbixEQz1yb290
LERDPWJhc2YsREM9Y29tP2NlcnRp\nZmljYXRlUmV2b2NhdGlvbkxpc3Q/YmFzZT9vYmplY3RDbGFzcz1jUkxEaXN0cmli\ndXRpb25Qb2ludIY5aHR0cDovL3B
raXdlYi5iYXNmLmNvbS9yb290Y2EyMS9CQVNG\nJTIwUm9vdCUyMENBJTIwMjEuY3JsMIIBNgYIKwYBBQUHAQEEggEoNIIBJDCBuQYI\nKwYBBQUHMAKGgaxsZG
FwOi8vL0NOPUJBU0YlMjBSb290JTIwQ0ElMjAyMSxDTj1B\nSUEsQ049UHVibGljJTIwS2V5JTIwU2VydmljZXMsQ049U2VydmljZXMsQ049Q29u\nZmlndXJhd
GlvbixEQz1yb290LERDPWJhc2YsREM9Y29tP2NBQ2VydGlmaWNhdGU/\nYmFzZT9vYmplY3RDbGFzcz1jZXJ0aWZpY2F0aW9uQXV0aG9yaXR5MGYGCCsGAQUF\n
BzAChlpodHRwOi8vcGtpd2ViLmJhc2YuY29tL3Jvb3RjYTIxL1JPT1RDQTIxLnJ6\nLWMwMDctajY1MC5iYXNmLWFnLmRlX0JBU0YlMjBSb290JTIwQ0ElMjAyM
S5jcnQw\nDQYJKoZIhvcNAQELBQADggIBAClCvn9sKo/gbrEygtUPsVy9cj9UOQ2/CciCdzpz\nXhuXfoCIICgc0YFzCajoXBLj4V6zcYKjz8RndaLabDaaSQgj
phXFiZSBH8OII+cp\nTCWW1x+JElJXo9HB7Ziva2PeuU5ajXtvql5PegFYWdmgK2Q1QH0J2f1rr7B4nNGu\noyBi1TOSll+0yJApjx213lM9obt6hkXkjeisjcq
auMVh+8KloM0LQOTAD1bDAvpa\nVVN9wlbytvf4tLxHpvrxEQEmVtTAdVchuQV1QCeIbqIxW41l6nhE2TlPwEmTr+Cv\najMID/ebnc9WzeweyTddb6DSmn4mSc
okGpj8j8Z7cw173Yomhg1tEEfEzip+/Jx6\nd2qblZ9BUih9sHE8rtUBEPLvBZwr2frkXzL3f8D6w36LxuhcqJOmDaIPDpJMH/65\nAbYnJyhwJeGUbrRm3zVtA
5QHIiSHi2gTdEw+9EfyIhuNKS4FO/uonjJJcKBtaufl\nGFL6y0WegbS5xlMV9RwkM22R7sQkBbDTr+79MqJXYCGtbyX0JxIgOGbE4mxvdDVh\nmuPo9IpRc5Jl
pSWUa7HvZUEuLnUicRbfrs1PK/FBF7aSrJLoYprHPgP6421pl08H\nhhJXE9XA2aIfEkJ4BcKw0BqOP/PEScyptUSAaGjS4JuxsNoL6URXYHxJsR0bPlet\nSct
3\n-----END CERTIFICATE-----\n"
 
cd /etc/pki/trust/anchors/
echo -e -n $DrJ_Root_CA > DrJ_Root_CA.pem
c_rehash
update-ca-certificates

So the key commands are c_rehash and update-ca-certificates.

Usually SLES is similar to Redhat. But it seems to be different in this case.

This was tested on a SLES 12 SP3 system.

It copies the certificate to /etc/pki/trust/anchors, which by itself is insufficient. Then it creates some kind of hash symlink to the CA file and makes sure that this new certificate doesn’t get wiped out by subsequent system patching. That’s the purpose of the c_rehash and update-ca-certificates commands.

You may also see these hashes and certificates in /etc/ssl/certs. I’m not sure because that’s where I started with all this. But merely dropping the private root CA into /etc/ssl/certs is insufficient, I can say from experience!

Redhat
Redhat is better documented, but for completeness I include it here. You have your inline certificate as in the SLES script, then following that:

...
cd /etc/pki/ca-trust/source/anchors/
echo -e -n $DrJ_Root_CA > DrJ_Root_CA.pem
update-ca-trust

So update-ca-trust is the key command for Redhat Linux. This was tested on Redhat Linux v 7.6.
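Since the SLES and Redhat recipes differ only in the anchors directory and the refresh command, they can share one helper. What follows is a sketch, not a tested installer: install_ca is a made-up name, the PEM check only looks for the BEGIN and END marker lines, and the refresh command is whatever you pass after the directory. The c_rehash step from the SLES recipe is not included and would still be run separately.

```shell
# Hypothetical helper: install_ca PEM_FILE ANCHORS_DIR [REFRESH_COMMAND...]
install_ca() {
    if [ "$#" -lt 2 ]; then
        echo "usage: install_ca PEM_FILE ANCHORS_DIR [REFRESH_COMMAND...]" >&2
        return 1
    fi
    pem="$1"
    anchors="$2"
    shift 2

    # Basic sanity check: both PEM marker lines must be present.
    if ! grep -q -e '-----BEGIN CERTIFICATE-----' "$pem" ||
       ! grep -q -e '-----END CERTIFICATE-----' "$pem"; then
        echo "not a PEM certificate: $pem" >&2
        return 1
    fi

    cp "$pem" "$anchors/" || return 1

    # Run the distribution-specific refresh command, if one was given.
    if [ "$#" -gt 0 ]; then
        "$@"
    fi
}

# SLES, per this post:
#   install_ca DrJ_Root_CA.pem /etc/pki/trust/anchors update-ca-certificates
# Redhat, per this post:
#   install_ca DrJ_Root_CA.pem /etc/pki/ca-trust/source/anchors update-ca-trust
```

Starting from a PEM file on disk also avoids pasting the certificate inline with escaped newlines.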

Fedora v 33

Put your CA file with a .crt file extension into /etc/pki/ca-trust/source/anchors, as for Redhat. Then run update-ca-trust extract.

Debian Linux circa 2023

Put your private CA file into a new directory /usr/share/ca-certificates/extra. Then run sudo dpkg-reconfigure ca-certificates. When prompted with a list of bundles to include make sure to enable your new extra file. Your certificate file needs to end in ‘crt’, not, e.g., ‘cer’. Seems pretty arbitrary to me, but that’s how it is. Of course it has to be standard PEM format (-----BEGIN CERTIFICATE-----, etc.).

Python and self-signed certificates or certificates from private CAs

First, note that those are two different cases and need to be handled slightly differently! You may be in need of these measures if you are getting an error in python like this:

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.myhost.local', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))

Self-signed certificate

If the certificate is truly self-signed, then throw it into a file, let’s call it my-crt.crt in your home directory. Then set an environment variable before running python:

$ export REQUESTS_CA_BUNDLE=~/my-crt.crt

It should now work.

Python reverts to SSL: CERTIFICATE_VERIFY_FAILED after upgrade

So I had it all working. Then the requests package complained about my version of the urllib3 package. So I upgraded requests with pip3 install --upgrade requests. Then the above-mentioned SSL error came back. I noticed that urllib3 got upgraded when requests was upgraded.

I basically gave up and totally kludged python to fix this. But only after playing with the certifi package etc. So I took the file that is for me the output of certifi.where(): /usr/local/lib/python3.9/site-packages/certifi/cacert.pem

I edited it as root and simply appended my private root CAs to that file. I hated to do it, and I know I should have at least used a virtual env, blah, blah. But at least now my jobs run. By the way, my by-hand tests of urlopen all worked! So it’s just the way some package was using it beyond my control that I had to create this kludge.

Certificate issued from a private CA

I added the private CA to the system CA on Debian with the update-ca-certificates mentioned above. Still no joy. Then I noticed the web server forgot to provide the intermediate certificate so I added that as well. Then, at least, curl began to work. But not python. Strange. For python I still need to define this environment variable:

$ export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

A second method to handle the case of a certificate issued from a private CA is to bundle the certificate + the intermediate certificate + the private root CA all into a single file, let’s call it my-crt.crt, in your home directory, and define the environment variable the same as for the self-signed certificate case:

$ export REQUESTS_CA_BUNDLE=~/my-crt.crt
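Building that single file is just concatenation, but a PEM file that lacks a final newline will run its END line into the next BEGIN line and spoil the bundle. Here is a small sketch that guards against that. The function name make_bundle and the input file names in the usage line are placeholders of mine.

```shell
# Hypothetical helper: make_bundle OUTPUT_FILE PEM_FILE...
# Concatenates the PEM files in the order given, making sure each one
# ends with a newline so the marker lines never run together.
make_bundle() {
    out="$1"
    shift
    : > "$out" || return 1
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read $f" >&2
            return 1
        fi
        cat "$f" >> "$out"
        # Command substitution strips a trailing newline, so a non-empty
        # result means the file did not end with one.
        if [ -n "$(tail -c 1 "$f")" ]; then
            echo >> "$out"
        fi
    done
}
```

For example: make_bundle ~/my-crt.crt server.crt intermediate.crt root-ca.crt, followed by the export shown above.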

My favorite openssl commands shows some commands to run to examine the certificate of a web server.

lftp usage tip with a private CA
If like me you were doing this work in conjunction with running ftps using a certificate signed by a private CA, and want your ftp client, lftp, to not complain about the unrecognized CA, then this tip will help.

After initiating your lftp and sending the username and password, you can send this command
set ssl:ca-file <path-to-your-private-CA-file>
lftp is so flexible it offers many other ways to do this as well. But this is the one I use.

Conclusion
We show how to add your own root CA to a SLES 12 system. I did not find a good reference for this information anywhere on the Internet.

References and related
My favorite openssl commands.

The basics of working with cipher settings

For Redhat/CentOS I am evaluating this blog post on the proper way to add your own private CA: https://www.happyassassin.net/2015/01/14/trusting-additional-cas-in-fedora-rhel-centos-dont-append-to-etcpkitlscertsca-bundle-crt-or-etcpkitlscert-pem/

For the Redhat approach I used this blog post: https://www.happyassassin.net/2015/01/14/trusting-additional-cas-in-fedora-rhel-centos-dont-append-to-etcpkitlscertsca-bundle-crt-or-etcpkitlscert-pem/