Categories
Admin Network Technologies

Routing based on source MAC address

Intro
As I am not a true network specialist but more a security operations specialist, I am always amazed by discovering network things that they probably teach in networking 101 and seem obvious to those in the know.

I had a “revelation” (to me anyways) like that lately.

The details
For all these years I’ve dabbled setting up interfaces, creating static routes, setting up RIP advertisements, reading BGP configurations, solving martian source problems, or slightly harder things like network address translation, secure network address translation and vpn tunnels, I thought I knew all the mainly relevant things there is to know about how to get packets to where you want them.

For a typical, non-routing device, I had always thought that the only choice on how to send packets out is

– local subnet if destination is on a subnet of one of the interfaces
– static route if configured for that destination IP
– default gateway if configured

And that’s that, right?

This limited number of route selection methods is very important in some architectural designs I’ve been involved with. For instance, an Intranet which has “borrowed” large swaths of valid Internet address space pretty much necessitates a two-stage proxy approach if using explicit proxy settings. Or so I thought.

What I learned
Someone pointed out to me a feature on Bluecoat proxy called return-to-sender. Normally it is disabled. But when enabled, what it does is (to me, because I never thought it possible) amazing: it sends its response packets back to the MAC address of the inbound sender! Thus it will shortcut all the routing decisions mentioned above and use the MAC address of the host which sent it a packet to which it is responding.

If I had know that was possible I might never have implemented a two-stage proxy.

I decided to try the ultimate worst-case scenario:

– use of proxy with this feature enabled and
– proxy client on same internal subnet as proxy server, 10.11.12.0/24
– source IP of proxy client = my AWS server, 50.17.188.196
– requested web site: my AWS web site at http://50.17.188.196/

In other words, steal my own IP, put it on the Intranet, and make the proxy route packets back to me in response while simultaneously connecting to that exact same IP on the Internet.

How I configured the VIP
This is very old school, but I did one of these numbers ona SLES11 system:

$ sudo ifconfig eth0:0 50.17.188.196 up

And that was sufficient to add that IP to the eth0 interface.

Did it work? Yes, it did. Amazing.

I only needed one single request to verify that it worked. Here is that request, using wget from the Linux host which is on the same subnet.

export http_proxy=http://10.11.12.13:8080/
wget --bind-address=50.17.188.196 http://50.17.188.196/

And the proxy log line this produced:

2014-08-04 13:55:08 3 50.17.188.196 - "none" 200 TCP_HIT GET text/html http 50.17.188.196 80 / - - "Wget/1.11.4" OBSERVED 10.11.12.13 769 158 - -

Other appliances can do this, too
On F5 BigIP this feature is also supported. It is called auto last hop and can be configured either globally or for an individual virtual server.

To be continued…

Categories
Network Technologies

The IT Detective Agency: Why our forwarding vserver doesn’t route

Intro
F5 BigiP appliances are very versatile networking appliances. But sometimes you gotta know what you are doing!

The details
We set up a load-balanced Radius service using the same subnet for the radius servers as the load balancer itself. Setting up this service is moderately tricky. You have to set the default route of the Radius servers to be the load balancer, and on the load balancer SNAT (which I prefer to translate as “source NAT,” though technically it is “secure NAT”) and NAT should be disabled. And there are two services, AAA and audit (UDP ports 1812 and 1813).

So everything’s a bit different when all you'[re used to is creating load-balanced pools for web servers.

So with this setup an incoming packet comes in, its source is preserved, but its destination is NAT’d to the radius server by the load balancer. In the response, the source is the radius server. That gets NAT’d to the IP of the load-balanced service. So there are two stages for incoming request packets (pre- and -post NAT) and two for the responses. Here’s a trace which shows all this:

12:10:41.259073 IP drj-wlc-nausresea055-01.drj.com.filenet-rpc > radius.drj.com.radius: RADIUS, Access Request (1), id: 0x30 length: 260
12:10:41.259086 IP drj-wlc-nausresea055-01.drj.com.filenet-rpc > wusandradaa01.drjad.drj.net.radius: RADIUS, Access Request (1), id: 0x30 length: 260
12:10:41.259735 IP wusandradaa01.drjad.drj.net.radius > drj-wlc-nausresea055-01.drj.com.filenet-rpc: RADIUS, Access Reject (3), id: 0x30 length: 44
12:10:41.259745 IP radius.drj.com.radius > drj-wlc-nausresea055-01.drj.com.filenet-rpc: RADIUS, Access Reject (3), id: 0x30 length: 44

So all is good, right? Except that now we have a default route from the Radius server to the load balancer and so all response traffic is going through the load balancer, even things not related to Radius, such as a packets from an RDP session.

So we defined a forwarding_vserver to make the BigIP act as a router:

A forwarding vserver is a virtual server of type Forwarding (IP). In the bigip.conf file it looks like this:

virtual forwarding_vserver {
   ip forward
   destination any:any
   mask 0.0.0.0
   profiles route_friendly_fastL4 {}
}

But it doesn’t work! Packets from the Radius server come to the load balancer, and then they get source NAT’d to the floating self-IP of the load balancer. That’s no good. In TCP your response packets have to come from the IP you connected to! For simple PINGs you kind of get away with it, but with a warning. In TCP your PC will send a RST (reset) packet every time it gets a response packet with the wrong source IP, even if the other information is correct.

The solution
With the help of someone who understands snat auto-maps better than I do (evidently), I got the tip that I have a global snat-automap enabled, which is doing this. That’s how I’ve always run these LTMs (Local Traffic Managers). I had forgotten why or how I did it. Well the snat-automap pretty mcuh applies to all my other load-balanced services so I can’t simply chuck it. And I don’t have another subnet handy for use so I can’t simply exclude one of my vlans. They suggested that it could be turned off on my forwarding_vserver with an irule! Who would have figured? So I created a very simple irule:

# Turn off snat, i.e., for us in our forwarding_vserver
# inspired by https://devcentral.f5.com/wiki/iRules.snat.ashx
# DrJ, 11/2013
when CLIENT_ACCEPTED {
         snat none
}

and applied it to my forwarding_vserver, which now looks like this:

virtual forwarding_vserver {
   ip forward
   destination any:any
   mask 0.0.0.0
   rules snat-none
   profiles route_friendly_fastL4 {}
}

And voila, the LTM now routes those packets correctly without any address translation! And the Radius service still does its translations as desired.

Case closed!

Conclusion
We learned a little about F5 BigIPs today. The frustrating thing about the documentation is that they don’t really cover actual use cases so they introduce configuration settings without fully explaining why you would want to use them.

For the curious, the forwarding_vserver is accomodating an asymmetric routing situation. Incoming RDP (remote desktop protocol) packets get sent directly from a Cisco router to the Radius server. It’s just the response packets that flow from the Radius server, to the LTM, to the Cisco router.

References and related
In this post I show why a basic virtual server might not be working – a kind of rookie mistake we’ve all probably made at some point!
This post shows some non-trivial iRule examples.

Categories
Admin Web Site Technologies

Tipsheet: How to run NTP on F5 BigIP

If you feel you’ve added your ntp servers correctly via the GUI (System|Configuration|Device|NTP), and yet you get an output like this:

# ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 ntp1.drj        .INIT.          16 -    - 1024    0    0.000    0.000   0.000
 ntp2.drj        .INIT.          16 -    - 1024    0    0.000    0.000   0.000

and you observe the time is off by seconds or even minutes, then you may have made the mistake I made. I used fully-qualified domain names (FQDN) for the ntp servers.

Switch from FQDNs to IP addresses and it will work fine:

# ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp1.drj         10.23.34.1     3 u  783 1024  377    0.951   -8.830   0.775
+ntp2.drj         10.23.35.1     3 u  120 1024  377   20.963   -8.051   5.705

The date command will now give the correct time.

Having correct time is useful for the logging, especially if you are using ASM and are trying to correlate known activity against the reported errors.

Categories
Admin IT Operational Excellence Web Site Technologies

Virtual Server not Working in F5 BigIP

OK. This posting is only directly applicable to the small number of people who run BigIP load balancers. And of that set, only a certaIn subset will likely ever have this situation. Nevertheless, it’s useful to document it. There are lessons in it for the rest of us, it shows the creative problem-solving process used in IT, or rather the creative process that should be used.

So I had a virtual server associated with a certain pool and it was operating fine for years. Then something changes. We want to associate a new name with this virtual server, but test it first, all while keeping the old name working. Well, this is a secured site by which I mean it is running https rather than http. There’s nothing intrinsic in the web site itself that ties it to a particular name. If this were a run-of-the-mill non-secure site you would solve this problem with DNS. Set up an alias and you’re good to go. But secured sites are a wee bit trickier. They present a certificate after all. And the certificate has just one name, at least ours does. Guess I can address multi-name certificates known as Subject Alternative Name CERTs in a separate post. And that name is the original DNS name. What to do? Simple. As any BigIP admin would tell you you create a new virtual server and associate it with a new IP and a new SSL profile containing the new certificate you just bought but the old pool. In DNS assign this new IP to your new DNS name. That’s all pretty straightforward

Having done all that, I blithely tested with lynx (iI’s an old curses-based browser which runs on old Unix systems. The main point is to not test with a complex browser where like Internet Explorer where you are never 100% sure if the problem lies with the browser. If I had it, I would test with curl, but it’s not on that system.). And…it hangs.

Now I’ll admit a lot of stupid things I did (which is typical of any good debugging session of an IT professional – some self-created red herrings accompany any decent sleuthing) and I ratchet up the debugging a notch. Check the web server logs. I see no log of my lynx accesses. Dig a little deeper still. Fire up a trace. Here’s a little time-saver. BigIP does have a tcpdump program, but it is a little stunted. Typically you have multiple interfaces on a BigIP. In this case I felt it pertinent to know if packets were getting to the BigIP from lynx, and then again, if those packets were leaving the BigIP and going to the web server. So the tip is that whereas a “normal” tcpdump might allow you to use the switch -i any to listen on all interfaces, that doesn’t work on BigIP. Use -i 0.0 instead. And of course restrict it somehow so that your own shell session’s packets won’t be picked up by the trace, or else you could be in for a nasty surprise of exponentially increasing traffic (a devastating situation perhaps worthy of its own blog entry!). In this case I added an expression port 443. So I have:

tcpdump -i 0.0 port 443

And, somewhat to my surprise (You should always have a hypothesis, even if it’s just a gut feeling: will this little test work, or not. Why?) not only were packets going from lynx to BigIp and then again to the web server, I could even see returned packets back from the web server to BigIp to lynx. But it was not a lot of packets. A SYN, SYN-ACK and maybe a single data packet and that’s about it. It should have been more chatty.

The more tests you can think of, the better, especially ones that emphasize the marginal differences between the thing that works and the one that doesn’t. One test along those lines: take this same virtual server and associate it with a different pool. I did that, and that test worked!

Next, I tried to access the web server using curl on the BigIP itself. I could, but not at first. First I used the local web server URL http://web_server_ip:443/. It hung my curl command, just like using lynx on the other server. Hmm. I then looked on the web server again. I notice that it has a certificate installed. Ah. So it’s actually running https. So try curl from BigIP again, but this time with the -k switch (insecure, meaning don’t verify the certificate issuer) and a url beginning with https rather than http. Bingo. It comes back with the home page. Now we’re getting somewhere.

Finally I look more closely at the virtual server setup for the old name, the one that works. I see that the server profile is SSL. It basically means that the traffic is encrypted when it hits the BigIP, and the server CERT is associated with the external name. The BigIP decrypts the traffic, then re-encrypts it before sending it along to the web server. The CERT for the second leg is a self-signed CERT and is never seen by users.

I had forgotten to set up my new test virtual server with the server SSL profile, so the second leg of traffic was not being re-encyrpted by the BigIP, even though the web server was only willing to engage in SSL communication with the BigIP. Once I updated the server profile, it all worked fine! Of course after getting the expected results from lynx I went to my desktop browser, just like a regular user, and successfully tested it there as well. You want to make sure your final tests are a realistic approximation of what the user will be doing. If that’s not all possible under your own control, bring in a user for testing.

Liked this article? Here’s another of my IT operational excellence articles that has a somewhat wider applicability.