Categories
Admin Linux Network Technologies Web Site Technologies

The IT Detective Agency: This site can’t be reached

Intro

It’s been awhile since I’ve had the opportunity to relatean IT mystery. After awhile they are repates of what’s already happened in the past, or it’s too complex to relate, or I was only peripherally involved. But today I came across a good one. It falls into the never been seen before category.

The details

A web server behind my web application firewall became unreachable. In the browser they get a message This site can’t be reached. The app owners came to me looking for input. I checked the WAF and it was fine. The virtual server was looking healthy. So I took a packet trace, something to this effect:

$ tcpdump -nni 0.0 host 192.168.2.124

14:00:45.180349 IP 192.68.1.13.42045 > 192.68.2.124.443: Flags [S], seq 1106553901, win 23360, options [mss 1460,sackOK,TS val 3715803515 ecr 0], length 0 out slot1/tmm3 lis=/Common/was90extqa.drjohn.com.app/was90extqa.drjohn.com_vs port=0.53 trunk=
14:00:45.181081 IP 192.68.2.124 > 192.68.1.13: ICMP host 192.68.2.124 unreachable - admin prohibited filter, length 64 in slot1/tmm2 lis= port=0.47 trunk=
14:00:45.181239 IP 192.68.1.13.42045 > 192.68.2.124.443: Flags [R.], seq 1106553902, ack 0, win 0, length 0 out slot1/tmm3 lis=/Common/was90extqa.drjohn.com.app/was9
0extqa.drjohn.com port=0.53 trunk=

I’ve never seen that before, ICMP host 192.68.2.124 unreachable – admin prohibited filter. But I know ICMP can be used to relay out-of-band routing information on occasion, though I do not see it often. I suspect it is a BAD THING and forces the connection to be shut down. Question is, where was it coming from?

The communication is via a firewall so I check the firewall. I see a little more traffic so I narrow the filter down:

$ tcpdump -nni 0.0 host 192.168.2.124 host 443

And then I only see the initial SYN packet followed by the RST – from the same source IP! So since I didn’t see the bad ICMP packet on the firewall, but I do see it on the WAF, I preliminarily conclude the problem exists on the WAF.

Rookie mistake! Did you fall for it? So very, very often, in the heat of debugging, we invent some unit test which we’ve never done before, and we have to be satisified with the uncertainty in the testing method and hope to find a control test somehow, somewhere to validate our new unit test.

Although I very commonly do compound filters, in this case it makes no sense, as I realized a few minutes later. My port 443 filter would of course exclude logging the bad ICMP packets because ICMP does not use tcp port 443! So I took that out and re-run it. Yup. bad ICMP packet still present on the firewall, even on the interface of the firewall directly connected to the server.

So at this point I have proven to my satisfaction that this packet, which is ruining the communication, really comes frmo the server.

What the server guys say

Server support is outsourced. The vendor replies

As far as the patching activities go , there is nothing changed to the server except distro upgrading from 15.2 to 15.3. no other configs were changed. This is a regular procedure executed on almost all 15.2 servers in your environment. No other complains received so far…

So, the usual It’s not us, look somewhere else. So the app owner asks me for further guidance. I find it’s helpful to create a test that will convince the other party of the error with their service. And what is one test I would have liked to have seen but didn’t cnoduct? A packet trace on the server itself. So I write

I would suggest they (or you) do a packet trace on the server itself to prove to themselves that this server is not behaving ini an acceptable way, network-wise, if they see that same ICMP packet which I see.

The resolution

This kind of thing can often come to a stand-off, or many days can be wasted as an issue gets escalated to sufficiently competent technicians. In this case it wasn’t so bad. A few hours later the app owners write and mention that the home-grown local firewall seemed suspect to them. They dsabled it and this traffic began to work.

They are reaching out to the vendor to understand what may have happened.

Case: closed!

Conclusion

An IT mystery was resolved today – something we’ve never seen but were able to diagnose and overcome. We learned it’s sometimes a good thing to throw a wider net when seeing unexpected reset packets because maybe just maybe there is an ICMP host unreachable packet somewhere in the mix.

Most firewalls would just drop packets and you wait for a timeout. But this was a homegrown firewall running on SLES 15. So it abides by its own ways of working, I guess. So because of the RST, your connection closes quickly, not timing out as with a normal network firewall.

As always, one has to maintain an open mind as to the true source of an issue. What was working yesterday does not today. No one admits to changing anything. Finding clever ad hoc unit tests is the way forward, and don’t forget to validate the ad hoc test. We use curl a lot for these kinds of tests. A browser is a complex beast and too much of a black box.

Categories
Admin

Azure Cloud: can you swap public IP addresses on two VMs?

Intro
I am just beginning to use Microsoft’s Azure cloud environment. Although I am inclined to be a fan of AWS, I haven’t looked at AWS networking for awhile, and the last time I did something I felt totally lost in trying to understand their terminology.

But in spite of my natural inclination to support everything Amazon, I gotta admit that Azure was a good, usable environment for what I needed to do; swap the public IPs on two VMs.

The details
I was not getting any help whatsoever from with my organization. But I did at least get sufficient access to the Resource Group where my Redhat 7.4 VM was running. That was a godsend.

In Azure network interfaces are resources. They have IPs like 10.0.1.4, 10.0.1.7, etc.

Public IP addresses are resources. They have IPs apprporiate for your region. They are typically associated with a network interface.

A network interface in turn is associated to a VM, typically.

For some reason which no one could explain to me, I could no longer patch my RHEL 7.4 server. That began about September 2019. Meantime, I was using an application which relied on a built-in package. Now Redhat always ships with old versions of everything, so running this old version plus lack of patches really put pressure on me to upgrade to a new OS. Can you do an in-place upgrade? As far as I can tell, no. I went with SLES 15 SP1 on a new VM within the same resource group and data center since I have some familiarity with Suse Linux. That OS had a newer version of that open source package, plus it could be patched.

But the IP of the old server was embedded in several places and switching it was not an option. What to do? What to do? Can you even swap IPs on two VMs within the same Resource Group? Who knows?

Well, it turns out you can. The documentation on the topic is pretty good and cleared up some things for me. Particularly the first two links in the references at the bottom.

I took this approach.

Changed the IP from dynamic to Static (go to configure section when looking at this resource). This should have been done from the get-go, but wasn’t. Who knew?

Dissociate the IP from the network interface.

Changed the second IP from dynamic to static.

Dissociate this IP from its network interface.

Associate IP to network interface of the SLES 15 server.

Associate second IP to network interface of the RHEL server.

And that’s it…. It worked like a charm.

Then I cleaned up some old public IP addresses which weren’t being used. You have to remember there is a shortage of IPs. So a lot of the quirk you encounter are due to their utilizing Ips as sparingly as possible. makes sense to me. For instance you can have a “public IP” resource which has no value! it may not get a value until its absolutely needed by virtue of being associated with a network interface on an active server. Stuff like that…

Conclusion
Yes, you can indeed swap public IPs on two servers if they belong to the same Resource Group and I guess the same data center. I know because I did it. As a bonus I found that the Azure documentation is pretty clear and sufficiently detailed.

References and related
https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-ip-addresses-overview-arm
https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-network-interface-addresses