Categories
Admin Web Site Technologies

TCL iRule program with comments for F5 BigIP

Intro

A publicity-adverse colleague of mine wrote this amazing program. I wanted to publish it not so much for what it specifically does, but as well for the programming techniques it uses. I personally find i relatively hard to look up concepts when using TCL for an F5 iRule.

Program Introduction

Test


# RULE_INIT is executed once every time the iRule is saved or on reboot. So it is ideal for persistent data that is shared accross all sessions.
# In our case it is used to define a template with some variables that are later substituted

when RULE_INIT {
# "static" variables in iRules are global and read only. Unlike regular TCL global variables they are CMP-friendly, that means they don't break the F5 clustered multi-processing mechanism. They exist in memory once per CMP instance. Unlike regular variables that exist once per session / iRule execution. Read more about it here: https://devcentral.f5.com/s/articles/getting-started-with-irules-variables-20403
#
# One thing to be careful about is not to define the same static variable twice in multiple iRules. As they are global, the last iRule saved overwrites any previous values.
# Originally the idea was to load an iFile here. That's also the main reason to even use RULE_INIT and static variables. The reasoning was (and I don't even know if this is true), that loading the iFile into memory once would have to be more efficient than to do it every time the iRule is executed. However, it is entirely possible that F5 already optimized iFiles in a way that loads them into memory automatically at opportune times, so this might be completely unnecessary.
# Either way, as you can tell, in the end I didn't even use iFiles. The reason for that is simply visibility. iFiles can't be easily viewed from the web UI, so it would be quite inconvenient to work with.
# The template idea and the RULE_INIT event stayed, even though it doesn't really serve a purpose, except maybe visually separating the templates from the rest of the code.
#
# As for the actual content of the variable: First thing to note is the use of  {} to escape the entire string. Works perfectly, even though the string itself contains braces. TCL magic.
# The rest is just the actual PAC file, with strategically placed TCL variables in the form of $name (this becomes important later)

            set static::pacfiletemplate {function FindProxyForURL(url, host)
{
            var globalbypass = "$globalbypass";
            var localbypass = "$localbypass";
            var ceglobalbypass = "$ceglobalbypass";
            var zpaglobalbypass = "$zpaglobalbypass";
            var zscalerbypassexception = "$zscalerbypassexception";

            var bypass = globalbypass.split(";").concat(localbypass.split(";"));
            var cebypass = ceglobalbypass.split(";");
            var zscalerbypass = zpaglobalbypass.split(";");
            var zpaexception = zscalerbypassexception.split(";");

            if(isPlainHostName(host)) {
                        return "DIRECT";
            }

            for (var i = 0; i < zpaexception.length; ++i){
                        if (shExpMatch(host, zpaexception[i])) {
                                   return "PROXY $clientproxy";
                        }
            }

            for (var i = 0; i < zscalerbypass.length; ++i){
                        if (shExpMatch(host, zscalerbypass[i])) {
                                   return "DIRECT";
                        }
            }

            for (var i = 0; i < bypass.length; ++i){
                        if (shExpMatch(host, bypass[i])) {
                                   return "DIRECT";
                        }
            }

            for (var i = 0; i < cebypass.length; ++i) {
                        if (shExpMatch(host, cebypass[i])) {
                                   return "PROXY $ceproxy";
                        }
            }

            return "PROXY $clientproxy";
}
}

            set static::forwardingpactemplate {function FindProxyForURL(url, host)
{
            var forwardinglist = "$forwardinglist";
            var forwarding = forwardinglist.split(";");

            for (var i = 0; i < forwarding.length; ++i){
                        if (shExpMatch(host, forwarding[i])) {
                                   return "PROXY $clientproxy";
                        }
            }

            return "DIRECT";
}
}
}

# Now for the actual code (executed every time a user accesses the vserver)
when HTTP_REQUEST {
    # The request URI can of course be used to differentiate between multiple PAC files or to restrict access.
    # So can basically any other request attribute. Client IP, host, etc.
            if {[HTTP::uri] eq "/proxy.pac"} {

                        # Here we set variables with the exact same name as used in the template above.
                        # In our case the values come from a data group, but of course they could also be defined
                        # directly in this iRule. Using data groups makes the code a bit more compact and it
                        # limits the amount of times anyone needs to edit the iRule (potentially making a mistake)
                        # for simple changes like adding a host to the bypass list
                        # These variables are all set unconditionally. Of course it is possible to set them based
                        # on for example client IP (to give different bypass lists or proxy entries to different groups of users)
                        set globalbypass [ class lookup globalbypass ProxyBypassLists ]
                        set localbypass [ class lookup localbypassEU ProxyBypassLists ]
                        set ceglobalbypass [ class lookup ceglobalbypass ProxyBypassLists ]
                        set zpaglobalbypass [ class lookup zpaglobalbypass ProxyBypassLists ]
                        set zscalerbypassexception [ class lookup zscalerbypassexception ProxyBypassLists ]
                        set ceproxy [ class lookup ceproxyEU ProxyHosts ]

                        # Here's a bit of conditionals, setting the proxy variable based on which virtual server the
                        # iRule is currently executed from (makes sense only if the same iRule is attached to multiple
                        # vservers of course)
                        if {[virtual name] eq "/Common/proxy_pac_http_90_vserver"} {
                            set clientproxy [ class lookup formauthproxyEU ProxyHosts ]
                        } elseif {[virtual name] eq "/Common/testproxy_pac_http_81_vserver"} {
                            set clientproxy [ class lookup testproxyEU ProxyHosts]
                        } elseif {[virtual name] eq "/Common/proxy_pac_http_O365_vserver"} {
                            set clientproxy [ class lookup ceproxyEU ProxyHosts]
                        } else {
                            set clientproxy [ class lookup clientproxyEU ProxyHosts ]
                }

                        # Now this is the actual magic. As noted above we have now set TCL variables named for example
                        # $globalbypass and our template includes the string "$globalbypass"

                        # What we want to do next is substitute the variable name in the template with the variable values
                        # from the code.
                        # "subst" does exactly that. It performs one level of TCL execution. Think of "eval" in basically
                        # any language. It takes a string and executes it as code.
                        # Except for "subst" there are two in this context very useful parameters: -nocommands and -nobackslashes.
                        # Those prevent it from executing commands (like if there was a ping or rm or ssh or find or anything
                        # in the string being subst'd it wouldn't actually try to execute those commands) and from normalizing
                        # backslashes (we don't have any in our PAC file, but if we did, it would still work).
                        # So what is left that it DOES do? Substituting variables! Exactly what we want and nothing else.
                        # Now since the static variable is read only, we can't do this substitution on the template itself.
                        # And if we could it wouldn't be a good idea, because it is shared accross all sessions. So assuming
                        # there are multiple versions of the PAC file with different proxies or bypass lists, we would
                        # constantly overwrite them with each other.
                        # The solution is simply to save the output of the subst in a new local variable that exists in
                        # session context only.
                        # So from a memory point of view the static/global template doesn't really gain us anything.
                        # In the end we have the template in memory once per CMP and then a substituted copy of the template
                        # once per session. So as noted earlier, could've probably just removed the entire RULE_INIT block,
                        # set the template in session context (HTTP_REQUEST event) and get the same result,
                        # maybe even slightly more efficient.
                        set pacfile [subst -nocommands -nobackslashes $static::pacfiletemplate]

                        # All that's left to do is actually respond to the client. Simple stuff.
                        HTTP::respond 200 content $pacfile "Content-Type" "application/x-ns-proxy-autoconfig" "Cache-Control" "private,no-cache,no-store,max-age=0"
            # In this example we have two different PAC files with different templates on different URLs
            # Other iRules we use have more differentiation based on client IP. In theory we could have one big iRule
            # with all the PAC files in the world and it would still scale very well (just a few more if/else or switch cases)
            } elseif { [HTTP::uri] eq "/forwarding.pac" } {
                set clientproxy [ class lookup clientproxyEU ProxyHosts]
                set forwardinglist [ class lookup forwardinglist ProxyBypassLists ]
            set forwardingpac [subst -nocommands -nobackslashes $static::forwardingpactemplate]
            HTTP::respond 200 content $forwardingpac "Content-Type" "application/x-ns-proxy-autoconfig" "Cache-Control" "private,no-cache,no-store,max-age=0"
            } else {
                # If someone tries to access a different path, give them a 404 and the right URL
                HTTP::respond 404 content "Please try http://webproxy.drjohns.com/proxy.pac" "Content-Type" "text/plain" "Cache-Control" "private,no-cache,no-store,max-age=0"
            }
}

To be continued...

Categories
Web Site Technologies

F5 BigIP irule: serving a dynamic proxy PAC file

Intro
A large organization needed to have considerable flexibility in serving out its proxy PAC file to web browsers. The legacy approach – perl script on web servers – was replaced by a TCL script I developed which runs on one of their F5 load balancers. In this post I lay out the requirements and how I tackled an unfamiliar language, TCL, to come up with an efficient and elegant solution.
Even if you have no interest in PAC files, if you are developing an irule you’ll probably scratch your head figuring out the oddities of TCL programming. This post may help in that case too.


The irule

# - DrJ 8/3/17
when CLIENT_ACCEPTED {
set cip [IP::client_addr]
}
when HTTP_REQUEST {
set debug 0
# supply an X-DRJ-PAC, e.g., w/ curl, to debug: curl -H 'X-DRJ-PAC: 1.2.3.4' 50.17.188.196/proxy.pac
if {[HTTP::header exists "X-DRJ-PAC"]} {
# overwrite client ip from header value for debugging purposes
  log local0. "DEBUG enabled. Original ip: $cip"
  set cip [HTTP::header value "X-DRJ-PAC"]
  set debug 1
  log local0. "DEBUG. overwritten ip: $cip"
}

# security precaution: don't accept any old uri
if { ! ([HTTP::uri] starts_with "/proxy.pac" || [HTTP::uri] starts_with "/proxy/proxy.cgi") } {
  drop
  log local0. "uri: [HTTP::uri], drop section. cip: $cip"
} else {
#
set LDAPSUFFIX ""
if {[HTTP::uri] ends_with "cgi"} {
  set LDAPSUFFIX "-ldap"
}
# determine which central location to use
if { [class match $cip equals PAC-subnet-list] } {
# If client IP is in the datagroup, send user to appropriate location
    set LOCATION [class lookup $cip PAC-subnet-list]
    if {$debug} {log local0. "DEBUG. match list: LOCATION: $LOCATION"}
} elseif {  $cip ends_with "0" || $cip ends_with "1" || $cip ends_with "4" || $cip ends_with "5" } {
# client IP was not amongst the subnets, use matching of last digit of last octet to set the NJ proxy (01)
    set LOCATION "01"
    if {$debug} {log local0. "DEBUG. match last digit prefers NJ : LOCATION: $LOCATION"}
} else {
# set LA proxy (02) as the default choice
    set LOCATION "02"
    if {$debug} {log local0. "DEBUG. neither match list nor match digit matched: LOCATION: $LOCATION"}
}
 
HTTP::respond 200 content "
function FindProxyForURL(url, host)
{
// o365 and other enterprise sites handled by dedicated proxy...
var cesiteslist = \"*.aadrm.com;*.activedirectory.windowsazure.com;*.cloudapp.net;*.live.com;*.microsoft.com;*.microsoftonline-p.com;*.microsoftonline-p.net;*.microsoftonline.com;*.microsoftonlineimages.com;*.microsoftonlinesupport.net;*.msecnd.net;*.msn.co.jp;*.msn.co.uk;*.msn.com;*.msocdn.com;*.office.com;*.office.net;*.office365.com;*.onmicrosoft.com;*.outlook.com;*.phonefactor.net;*.sharepoint.com;*.windows.net;*.live.net;*.msedge.net;*.onenote.com;*.windows.com\";
var cesites = cesiteslist.split(\";\");
for (var i = 0; i &lt; cesites.length; ++i){
  if (shExpMatch(host, cesites\[i\])) {
    return \"PROXY http-ceproxy-$LOCATION.drjohns.net:8081\" ;
  }
}
// client IP: $cip.
// Direct connections to local domain
if (dnsDomainIs(host, \"127.0.0.1\") ||
           dnsDomainIs(host, \".drjohns.com\") ||
           dnsDomainIs(host, \".drjohnstechtalk.com\") ||
           dnsDomainIs(host, \".vmanswer.com\") ||
           dnsDomainIs(host, \".johnstechtalk.com\") ||
           dnsDomainIs(host, \"localdomain\") ||
           dnsDomainIs(host, \".drjohns.net\") ||
           dnsDomainIs(host, \".local\") ||
           shExpMatch(host, \"10.*\") ||
           shExpMatch(host, \"192.168.*\") ||
           shExpMatch(host, \"172.16.*\") ||
           isPlainHostName(host)
        ) {
           return \"DIRECT\";
        }
        else
        {
           return \"PROXY http-proxy-$LOCATION$LDAPSUFFIX.drjohns.net:8081\" ;
        }
 
}
" \
"Content-Type" "application/x-ns-proxy-autoconfig" \
"Expires" "[clock format [expr ([clock seconds]+7200)] -format "%a, %d %h %Y %T GMT" -gmt true]"
}
}

I know general programming concepts but before starting on this project, not how to realize my ideas in F5’s version of TCL. So I broke all the tasks into little pieces and demonstrated that I had mastered each one. I describe in this blog post all that I leanred.

How to use the browser’s IP in an iRule
If you’ve ever written an iRule you’ve probably used the section that starts with when HTTP_REQUEST. But that is not where you pick up the web browser’s IP. For that you go to a different section, when CLIENT_ACCEPTED. Then you throw it into a variable cip like this:
set cip [IP::client_addr]
And you can subsequently refer back to $cip in the when HTTP_REQUEST section.

How to send HTML (or Javascript) body content

It’s also not clear you can use an iRule by itself to send either HTML or Javascript in this case. After all until this point I’ve always had a back-end load balancer for that purpose. But in fact you don’t need a back-end web server at all. The secret is this line:

HTTP::respond 200 content "
function FindProxyForURL(url, host)
...

So with this command you set the HTTP response status (200 is an OK) as well as send the body.

Variable interpolation in the body

If the body begins with ” it will do variable interpolation (that’s what we Perl programmers call it, anyway, where your variables like $cip get turned into their value before being delivered to the user). You can also begin the body with a {, but what follows that is a string literal which means no variable interpolation.

The bad thing about the ” character is that if your body contains the ” character, or a [ or ], you have to escape each and every one. And mine does – a lot of them in fact.

But if you use { you don’t have to escape characters, even $, but you also don’t have a way to say “these bits are variables, interpolate them.” So if your string is dynamic you pretty mcuh have to use “.

Defeat scanners

This irule will be the resource for a VS (virtual server) which effectively acts like a web server. So dumb enterprise scanners will probably come across it and scan for vulnerabilities, whether it makes sense or not. A common thing is for these scanners to scan with random URIs that some web servers are vulnerable to. My first implementation had this VS respond to any URI with the PAC file! I don’t think that’s desirable. Just my gut feeling. We’re expecting to be called by one of two different names. Hence this logic:

if { ! ([HTTP::uri] starts_with "/proxy.pac" || [HTTP::uri] starts_with "/proxy/proxy.cgi") } {
  drop
...
else
  (send PAC file)

The original match operator was equals, but I found that some rogue program actually appends ?Type=WMT to the normal PAC URL! How annoying. That rogue application, by the way, seems to be Windows Media Player. You can kind of see where they were goinog with this, for Windows Media PLayer you might want to present a different set of proxies, I suppose.

Match IP against a list of subnets and pull out value
Some background. Company has two proxies with identical names except one ends in 01, the other in 02. 01 and 02 are in different locales. So we created a data group of type address: PAC-subnet-list. The idea is you put in a subnet, e.g., 10.9.7.0/24 and a proxy value, either “01” or “02”. This TCL line checks if the client IP matches one of the subnets we’ve entered into the datagroup:

if { [class match $cip equals PAC-subnet-list] } {

Then this tcl line is used to match the client IP against one of those subnets and retrieve the value and store it into variable LOCATION:

set LOCATION [class lookup $cip PAC-subnet-list]

The reason for the datagroup is to have subnets with LAN-speed connection to one of the proxies use that proxy.

Something weird
Now something weird happens. For clients within a subnet that doesn’t match our list, we more-or-less distribute their use of both proxies equally. So at a remote site, users with IPs ending in 0, 1, 4, or 5 use proxy 01:

elseif {  $cip ends_with "0" || $cip ends_with "1" || $cip ends_with "4" || $cip ends_with "5" } {

and everyone else uses proxy 02. So users can be sitting right next to each other, each using a proxy at a different location.

Why didn’t we use a regular expression, besides the fact that we don’t know the syntax 😉 ? You read about regular expressions in the F5 Devcentral web site and the first thing it says is don’t use them! Use something else like start_with, ends_with, … I guess the alternatives will be more efficient.

Further complexity: different proxies if called by different name
Some specialized desktops are configured to use a PAC file which ends in /proxy/proxy.cgi. This PAC file hands out different proxies which do LDAP authentication, as opposed to NTLM/IWA authentication.. Hence the use of the variable LDAPSUFFIX. The rest of the logic is the same however.

Debugging help
I like this part – where it helps you debug the thing. Because you want to know what it’s really doing and that can be pretty hard to find out, right? You could run a trace but that’s not fun. So I create this way to do debugging.

if {[HTTP::header exists "X-DRJ-PAC"]} {
# overwrite client ip from header value for debugging purposes
  log local0. "DEBUG enabled. Original ip: $cip"
  set cip [HTTP::header value "X-DRJ-PAC"]
  set debug 1
  log local0. "DEBUG. overwritten ip: $cip"
}

It checks for a custom HTTP request header, X-DRJ-PAC. You can call it with that header, from anywhere, and for the value put the client iP you wish to test, e.g., 1.2.3.4. That will overwrite the client IP varibale, cip, set the debug variable, and add some log lines which get nicely printed out into your /var/log/ltm file. So your ltm file may log info about your script’s goings-on like this:

Aug  8 14:06:48 f5drj1 info tmm[17767]: Rule /Common/PAC-irule : DEBUG enabled. Original ip: 11.195.136.89
Aug  8 14:06:48 f5drj1 info tmm[17767]: Rule /Common/PAC-irule : DEBUG. overwritten ip: 12.196.68.91
Aug  8 14:06:48 f5drj1 info tmm[17767]: Rule /Common/PAC-irule : DEBUG. match last digit prefers NJ : LOCATION: 01

And with curl it is not hard at all to send this custom header as I mention in the comments:

$ curl ‐H ‘X-DRJ-PAC: 12.196.88.91’ 50.17.188.196/proxy.pac

Expires header
We add an expires header so that the PAC file is good for two hours (7200 seconds). I don’t think it does much good but it seems like the right thing to do. Don’t ask me, I just stole the whole line from f5devcentral.

A PAC file should have the MIME type application/x-ns-proxy-autoconfig. So we set that explicit MIME type with

"Content-Type" "application/x-ns-proxy-autoconfig"

The “\” at the end of some lines is a line continuation character.

Performance
They basically need this to run several hundred times per second. Occasionally PAC file requests “go crazy.” Will it? This part I don’t know. It has yet to be battle-tested. But it is production tested. It’s been in production for over a week. The virtual server consumes 0% of the load balancer’s CPU, which is great. And for the record, the traffic is 0.17% of total traffic from the proxy server, so very modest. So at this point I believe it will survive a usage storm much better than my apache web servers did.

Why bother?
After I did all this work someone pointed out that this all could have been done within the Javascript of the PAC file itself! I hadn’t really thought that through. But we agreed that doesn’t feel right and may force the browser to do more evaluation than what we want. But maybe it would have executed once and the results cached somehow?? It’s hard to see how since each encountered web site could have potentially a different proxy or none at all so an evaluation should be done each time. So we always try to pass out a minimal PAC file for that reason.

Why not use Bluecoat native PAC handling ability?
Bluecoat proxySG is great at handing out a single, fixed PAC file. It’s not so good at handing out different PAC files, and I felt it was just too much work to force it to do so.

References and related
F5’s DevCentral site is invaluable and the place where I learned virtually everything that went into the irule shown above. devcentral.f5.com
Excessive calls to PAC file.

A more sophisticated PAC file example created by my colleague is shown here: https://drjohnstechtalk.com/blog/2021/03/tcl-irule-program-with-comments-for-f5-bigip/

Categories
Admin Web Site Technologies

The IT Detective agency: Outlook client is Disconnected, all else fine

Intro
Today we were asked to consult on the following problem. Some proxy users at a large company could not connect to Microsoft Outlook. Only a few users were affected. Fix it.

The details
Affected users would bring up Outlook and within a few short seconds it would simply show Disconnected and stay that way.

It was quickly established that the affected users shared this in common: they use LDAP authentication and proxy-basic-authentication. The users who worked used NTLM authentication. The way they distinguish one from the other is by using a different proxy autoconfiguration (PAC) file.

More observations
Well, actually there was almost no difference whatsoever between the two PAC files. They are syntactically identical. The only difference in fact is that a different proxy is handed out for the NTLM users. That’s it!

We were able to reproduce the problem ourselves by using the same PAC file as the affected user. We tried to trace the traffic on our desktop but it was a complete mess. I did not see any connection to the designated proxy for Outlook traffic, but it’s hard to say definitively because there is so much other junk present. Strangely, all web sites worked OK and even the web-based version of Outlook works OK. So this Outlook client was the only known application having a problem.

When the affected users put in the proxy directly in manual proxy settings in IE and turned off proxy autoconfig, then Outlook worked. Strange.

We observed the header for the PAC file was a little bit inconsistent (it was being served from multiple web servers through a load balancer). The content-tyep MIME header was coming back as either text/plain or there was no such header at all, depending on which web server you were hitting. But note that the NTLM users were also getting PAC files with this same header.

The solution

Although everything had been fine with this header situation up until the introduction of Outlook, we guessed it was technically incorrect and should be fixed. We changed all web servers to have the PAC file be served with this MIME header:

Content-Type: application/x-ns-proxy-autoconfig

The results

A re-test confirmed that this fixed the Outlook problem for the LDAP-affected users. NTLM users were not impacted and continued to work fine.

Conclusion
A strange Outlook connection problem was resolved in large company Intranet by adjusting the PAC file to include the correct content-type header. Case closed!

References and related information
Here’s a PAC file case we never did resolve: excessive calls to the PAC file web server from individual users.

Categories
Network Technologies Proxy

The IT Detective Agency: The case of the purloined packet

Intro
Someone in the org notices that laptops can’t connect to our VPN concentrator when using Verizon MiFi. An investigation ensues and culprits are found. Read on for the details…

The details

Some background
The laptops are configured to use an explicit PAC file (proxy auto-config) for proxy access to the Internet. It works great when they’re on the Intranet. There were some initial bumps in the road, so we found the best results (as determined by using normal ISPs such as CenturyLink) for successful sign-on to VPN and establishing the tunnel was to create a dummy service on the Internet for the PAC file server. The main thing that does is send a RESET packet every time someone requests a PAC file. In that way the laptop knows that it should quickly give up on retrieving the PAC file and just use direct Internet connections for its initial connection to the VPN concentrator. OK?

So with that background, this current situation will make more sense. sometime after we developed that approach someone from the organization came along and tried to connect to VPN using a Verizon MiFi device. Although in the JunOS client it shows Connected in actual fact no tunnel is ever established so it does not work.

Time for to try some of our methodical testing to get to the bottom of this. Fortunately my friend has a Verizon MiFi jetpack, so I hooked myself up. Yup, same problem. It shows me connected but I haven’t been issued a tunnel IP (as ipconfig shows) and I am not on the internal network.

Next test – no PAC file
Then I turned off use of a PAC file and tried again. I connected and brought up a tunnel just fine!

Next test – PAC file restored, use of normal ISP

Then I restored the PAC file setting and tried my regular ISP, CenturyLink. I connected just fine!

Next test – try with a Verizon Hotspot

I tried my Verizon phone’s Hotspot. It also did not work!

I performed these tests several times to make sure there weren’t any flukes.

I also took traces of all these tests. To save myself time I won’t share the traces here like I normally do, but describe the relevant bits that I feel go into the heart of this case.

The heart of the matter
When the laptop is bringing up the VPN tunnel there is always at least one and usually several requests for the PAC file (assuming we’re in a mode where PAC file is in use). Remember I said above that the PAC file on the Internet is “served” by a dummy server that all it does is respond to every request (at TCP level every SYN) with a RST (reset)?

Well, that’s exactly what I saw on the laptop trace when using CenturyLink. But is not what I saw when using either Verizon MiFi or HotSpot. No, in the Verizon case the laptop is sent a (SYN,ACK). So the laptop in that case finishes the TCP handshake with an ACK and then makes a full HTTP request for the PAC file (which of course it never receives).

Well, that is just wrong because now the laptop thinks there’s a chance that it might get the PAC file if it just waits around long enough, or something like that.

The main point is that Verizon – probably with the best intentions – has messed with our TCP packets and the laptop’s behavior is so different as a result that it breaks this application.

I’m still thinking about what to do. I doubt Verizon, being a massive organization, is likely to change their ways, but we’ll see. UPDATE. After a few weeks and poor customer support from Verizon, they want to try some more tests. Verizon refuses to engage in a discussion at a technical level with us and ignores all our technical points in this open case. They don’t agree or disagree, but simply ignore it. Then they cherry pick some facts which support their preconceived notion of the conclusion. “It work with the PAC file turned off so the only change is on your end” is pretty much an exact quote from their “support.”

We also asked Verizon for all their ranges so we could respond differently to Verizon users, but they also ignored that request.

In particular What Verizon is doing amounts to WAN optimization.

UPDATE – a test with Sprint
I got a chance to try yet a different Telecom ISP: Sprint. I expected it to work because we don’t have any complaints, and indeed it did. I tested with a Sprint aircard someone lent me. But the surprising thing – indeed amazing thing – is how it worked. Sprint also clobbers those RST packets from when the laptop if fetching the PAC file. They also turn them into SYN,ACKs. But they carry the deceit one step further. When the laptop then does the GET request, they go so far as to fake the server response! They respond with a short 503 server fail status code! The laptop makes a few attempts to get the PAC file, always getting these 503 responses and then it gives up and it is happy to establish the VPN tunnelled connection at that point! So this has got me thinking…

April, 2014 update
Verizon actually did continue to work with us. They asked us the IP of our external PAC file server, never revealing what they were going to do with that information. Then we provided them phone numbers of Verizon devices we wished to test with. As an aside did you know that a Verizon MiFi Jetpack has a phone number? It does. Just log into it at my.jetpack and it will be displayed.

So I tried the Jetpack after they did their secret things, and, yes, VPN now works. But we are not helpless. I also traced that session, and in particular the packets to and from the PAC file webserver. Yes, they are coming back to my laptop as RSTs. I tested a second time, because once can be a fluke. Same thing: successful connection and RST packets as we wanted them to be.

Now we just need the fix to be generalized to all Verizon devices.

May 2014 update
Finally, finally, on May 6th we got the word that the fix has been rolled out. They are “bypassing” the IPs of our PAC file webservers, which means we get to see the RST packets. I tested an aircard and a Verizon hotspot and all looked good. VPN was established and RST packets were observed.

Case finally closed.

Conclusion
Verizon has done some “optimizations” which clobber our RST packets. It is probably pre-answering SYNs, with ACKs, which under ordinary circumstances probably helps performance. But it is fatal to creating an SSL VPN tunnel with the JunOS client configured with a PAC file. Sprint optimizes as well, but in such a way that things actually work. Verizon actually worked with us, at their own slow pace and secretive methodology, and changed how they handled those packets and things began to work correctly.

References
The next two links are closely related to this current issue.
The IT Detective Agency: How We Neutralized Nasty DNS Clobbering Before it Could Bite Us.
The IT Detective Agency: Browsing Stopped Working on Internet-Connected Enterprise Laptops

Categories
Admin DNS Proxy

The IT Detective Agency: Browsing Stopped Working on Internet-Connected Enterprise Laptops

Intro
Recall our elegant solution to DNS clobbering and use of a PAC file, which we documented here: The IT Detective Agency: How We Neutralized Nasty DNS Clobbering Before it Could Bite Us. Things were going smoothly with that kludge in place for months. Then suddenly it wasn’t. What went wrong and how to fix?? Read on.

The Details
We began hearing reports of increasing numbers of users not being able to access Internet sites on their enterprise laptops when directly connected to Internet. Internet Explorer gave that cryptic error to the effect Internet Explorer cannot display the web page. This was a real problem, and not a mere inconvenience, for those cases where a user was using a hotel Internet system that required a web page sign-on – that web page itself could not be displayed, giving that same error message. For simple use at home this error may not have been fatal.

But we had this working. What the…?

I struggled as Rop demoed the problem for me with a user’s laptop. I couldn’t deny it because I saw it with my own eyes and saw that the configuration was correct. Correct as in how we wanted it to be, not correct as in working. Basically all web pages were timing out after 15 seconds or so and displaying cannot display the web page. The chief setting is the PAC file, which is used by this enterprise on their Intranet.

On the Internet (see above-mentioned link) the PAC file was more of an annoyance and we had aliased the DNS value to 127.0.0.1 to get around DNS clobbering by ISPs.

While I was talking about the problem to a colleague I thought to look for a web server on the affected laptop. Yup, there it was:

C:\Users\drj>netstat -an|more

Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  ...

What the…? Why is there a web server running on port 80? How will it respond to a PAC file request? I quickly got some hints by hitting my own laptop with curl:

$ curl -i 192.168.3.4/proxy.pac

HTTP/1.1 404 Not Found
Content-Type: text/html; charset=us-ascii
Server: Microsoft-HTTPAPI/2.0
Date: Tue, 17 Apr 2012 18:39:30 GMT
Connection: close
Content-Length: 315


Not Found

Not Found


HTTP Error 404. The requested resource is not found.

So the server is Microsoft-HTTPAPI, which upon invetsigation seems to be a Microsoft Web Deployment Agent Service (MsDepSvc).

The main point is that I don’t remember that being there in the past. I felt it’s presence, probably a new “feature” explained the current problem. What to do about it however??

Since this is an enterprise, not a small shop with a couple PCs, turning off MsDepSvc is not a realistic option. It’s probably used for peer-to-peer software distribution.

Hmm. Let’s review why we think our original solution worked in the first place. It didn’t work so well when the DNS was clobbered by my ISP. Why? I think because the ISP put up a web server when it encountered a NXDOMAIN DNS response and that web server gave a 404 not found error when the browser searched for the PAC file. Turning the DNS entry to the loopback interface, 127.0.0.1, gave the browser a valid IP, one that it would connect to and quickly receive a TCP RST (reset). Then it would happily conclude there was no way to reach the PAC file, not use it, and try DIRECT connections to Internet sites. That’s my theory and I’m sticking to it!

In light of all this information the following option emerged as most likely to succeed: set the Internet value of the DNS of the PAC file to a valid IP, and specifically, one that would send a TCP RST (essentially meaning that TCP port 80 is reachable but there is no listener on it).

We tried it and it seemed to work for me and my colleague. We no loner had the problem of not being able to connect to Internet sites with the PAC file configured and directly connected to Internet.

I noticed however that once I got on the enterprise network via VPN I wasn’t able to connect to Internet sites right away. After about five minutes I could.

My theory about that is that I had a too-long TTL on my PAC file DNS entry. The TTL was 30 minutes. I shortened it to five minutes. Because when you think about it, that DNS value should get cached by the PC and retained even after it transitions to Intranet-connected.

I haven’t retested, but I think that adjustment will help.

Conclusion
I also haven’t gotten a lot of feedback from this latest fix, but I’m feeling pretty good about it.

Case: again mostly solved.

Hey, don’t blame me about the ambiguity. I never hear back from the users when things are working 🙂

References
A closely related case involving Verizon “clobbering” TCP RST packets also bit us.

Categories
Admin Apache Uncategorized Web Site Technologies

The IT Detective agency: Excessive Requests for PAC file Crippling Web Server

Intro
Funny thing about infrastructure. You may have something running fine for years, and then suddenly it doesn’t. That is one of the many mysteries in the case of the excessive requests for PAC file.

The Details
We serve our Proxy Auto-config (PAC) file from a couple web servers which are load-balanced. It’s worked great for over 10 years. The PAC file is actually produced by a Perl script which can alter the content based on the user’s IP or other variables.

The web servers got bogged down last week. I literally observed the load average shoot up past 200 (on a 2-CPU server). This is not good.

I quickly noticed lots and lots of accesses for wpad.dat and proxy.pac. Some PCs were individually making hundreds of requests for these files in a day! Sometimes there were 15 or 20 requests in a minute. Since it is a script it takes some compute power to handle all those requests. So it was time to do one of two things: either learn the root cause and address it, or make a quick fix. The symptoms were clear enough, but I had no idea about the root cause. I also was fascinated by the requests for wpad.dat which I felt was serving no purpose whatsoever in our environment. So I went for the quick fix hopinG that understanding would come later.

To be continued…
As promised – three and a half years later! And we still have this problem. It’s probably worse than ever. I pretty much threw in the towel and learned how to scale up our apache web server to handle more PAC file requests simultaneously, see the references.

References
Scaling apache to handle more requests.

Categories
Admin IT Operational Excellence Network Technologies

The IT Detective Agency: the case of the Adobe form network issue

Intro
Sometimes IT is called in to fix things we know little or nothing about. We may fix it, still not know anything, except what we did to fix it, and move on. Such is the case here with a mysterious Adobe Form that wasn’t working when I was called in to a meeting to discuss it.

The Case
One of our developers created a simple Adobe Acrobat form document. It has some logic to ask for a username and password and then verify them against an LDAP server using a network connection. Or at least that was the theory. It worked fine and then we got new PCs running Windows 7 and it stopped working. They asked me for help.

I asked to get a copy of the form because I like to test on my desktop where I am free to try dumb things and not waste others’ time. Initially I thought I also had the error. They showed me how to turn on Javascript debugging in edit|preferences. The debug window popped up with something like this:

debug 5 : function setConstants
debug 5 : data.gURL = https://b2bqual.drjohnstechtalk.com/invoke/Drjohns_Services.utilities:httpPostToBackEnd
debug 5 : function today_ymd
debug 5 : mrp::initialize: version 0.0001 debug level = 5
debug 5 : Login clicked
debug 5 : calling LDAPQ
debug 5 : in LDAPQ
 
NotAllowedError: Security settings prevent access to this property or method.
SOAP.request:155:XFA:data[0]:mrp[0]:sub1[0]:btnLogin[0]:clic

But this wasn’t the real problem. In this form you get a yellow bar at the top along with this message and you give approval to the form to access what it needs. Then you run it again.

For me, then, it worked. I knew this because it auto-populated some user information fields after taking a few seconds.

So i worked with a couple people for whom it wasn’t working. One had Automatically detect proxy settings checked. Unfortunately the new PCs came this way and it’s really not what we want. We prefer to provide a specific PAC file. With the auto-detect unchecked it worked for this guy.

The next guy said he already had that unchecked. I looked at his settings and confirmed that was the case. However, in his case he mentioned that Firefox is his default browser. He decided to change it back to Internet Explorer. Then he tested and lo and behold. It began to work for him as well!

When it wasn’t working he was seeing an error:

NetworkError: A network connection could not be created.

Later he realized that in Firefox he also was using auto-detect for the proxy settings. When he switched that to Use System Settings all was OK and he could have FF as default browser and get this form to work.

Conclusion
This is speculation on my part. I guess that our new version of Acrobat Reader X, v 10.1.1, is not competent in interpreting the auto-detect proxy setting, and that it is also tripped up by the proxy settings in Firefox.

There’s a lot more I’d like to understand about what was going on here, but sometimes speed counts. The next problem is already calling my name…

Categories
Admin DNS IT Operational Excellence

The IT Detective Agency: How We Neutralized Nasty DNS Clobbering Before it Could Bite Us

This gets a little involved. But if you’re the IT expert called on to fix something, you better be able to roll up your sleeves and figure it out!

In this article, I described how some, but not all ISPs change the results of DNS queries in violation of Internet standards.

A Proxy PAC for All
This work was done for an enterprise. They want everyone to use a proxy PAC file which whose location was to be (obfuscating the domain name just a little here) http://webproxy.intranet.drjohnstechtalk.com/proxy.pac. Centralized large enterprises like this sort of thing because the proxy settings are controlled in the one file, proxy.pac, by the central IT department.

So two IT guys try this PAC file setting on their work PC at their home networks. The guy with Comcast as his ISP reports that he can surf the Internet just fine at home. I, with Centurylink, am not so successful. It takes many minutes before an eventual timeout seems to occur and I cannot surf the Internet as long as I have that PAC file configured. But I can always uncheck it and life is good.

Now along comes a new requirement. This organization is going to roll out VPN without split tunneling, and the initial authentication to that VPN is a web page on the VPN switch. Now we have a real problem on our hands.

With my ISP, I can shut off the PAC file, get to the log-on page, establish VPN, but at that point if I wanted to get back out to the Internet (which is required for some job functions) I’d have to re-establish the PAC file setting. Furthermore it is desirable to lock down the proxy settings so that users can’t change them in any case. That makes it sound impossible for Centurylink customers, right?

Wrong. By the way the Comcast guy had this whole scenario working fine.

The Gory Details
This enterprise organization happened to have chosen legitimately owned but unused internal namespace for the PAC file location, analagous to my webproxy.intranet.drjohnstechtalk.com in my example. I reasoned as follows. Internet Explorer (“IE”) must quickly learn in the Comcast case that the domain name of the PAC file (webproxy.intranet.drjohnstechtalk.com) resolves with a NXDOMAIN and so it must fall back to making DIRECT connections to the Internet. For the unfortunate soul with CenturyLink (me), the domain name is clobbered! It does resolve, and to an active web site. That web site must produce a HTTP 404 not found. At least you’d think so. Today it seems to produce a simplified PAC file, which I am totally astonished by. And I wonder if this is more recent behaviour present in an attempt to ameliorate this situation. In any case, I reasoned that if they were clobbering a non-existent DNS record, we could actually define this domain name, but instead of going through the trouble of setting up a web server with the PAC file, just define the domain name as the loopback interface, 127.0.0.1. There’s no web server to connect to, so I hoped the browser would quickly detect this as a bad PAC URL, go on its way to make DIRECT connections to the VPN authentication web site, and then once VPN were established, use the PAC file again actively to permit the user to surf the Internet. And, furthermore, that this should work for both kinds of users: ones with DNS-clobbering ISPs and ones without.

That’s a lot of assumptions in the previous paragraph! But I built the case for it – it’s all based on reasonable extrapolation from observed behaviour. More testing needs to be done. What we have seen so far is that this DNS entry does no harm to the Comcast user. Direct Internet browsing works, VPN log-in works, Internet browsing post-login works. For the CenturyLink user the presence of this DNS entry permitted the browser of the work PC to surf the Internet very readily, which is already progress. VPN was not tested but I see no reason why it wouldn’t work.

More tests need to be done but it appears to be working out as per my educated guess.

April 2012 Update
Our fix seemed to collapse like a house of cards all-of-a-sudden many months later. Read how instead of panicking, we re-fixed it using our best understanding of the problems and mechanisms involved. The IT Detective Agency: Browsing Stopped Working on Internet-Connected Enterprise Laptops

Conclusion
We found a significant issue with DNS clobbering as practiced by some ISPs in an enterprise-class application: VPN. We found a work-around after taking an educated guess as to what would work – defining webproxy… to resolve to 127.0.01. We could have also changed the domain name of the PAC file – to one that wouldn’t be clobbered – but that was set by another group and so that option was not available to us. Also, we don’t yet know how extensive DNS clobbering is at other ISPs. Perhaps some clobber every domain name which returns a NXDOMAIN flag. That’s what Google’s DNS FAQ seems to imply at any rate. A more sensible approach may have been to migrate to use the auto-detect proxy settings, but that’s a big change for an enterprise and they weren’t ready to do that. A final concern is what if the PC is running a local web server because some application requires it?? That might affect our results.

Case: just about solved!

References
A related case of Verizon clobbering TCP reset packets is described here.