Someone in the org notices that laptops can’t connect to our VPN concentrator when using Verizon MiFi. An investigation ensues and culprits are found. Read on for the details…
The laptops are configured to use an explicit PAC file (proxy auto-config) for proxy access to the Internet. It works great when they’re on the Intranet. There were some initial bumps in the road, so we found the best results (as determined by using normal ISPs such as CenturyLink) for successful sign-on to VPN and establishing the tunnel was to create a dummy service on the Internet for the PAC file server. The main thing that does is send a RESET packet every time someone requests a PAC file. In that way the laptop knows that it should quickly give up on retrieving the PAC file and just use direct Internet connections for its initial connection to the VPN concentrator. OK?
So with that background, this current situation will make more sense. sometime after we developed that approach someone from the organization came along and tried to connect to VPN using a Verizon MiFi device. Although in the JunOS client it shows Connected in actual fact no tunnel is ever established so it does not work.
Time for to try some of our methodical testing to get to the bottom of this. Fortunately my friend has a Verizon MiFi jetpack, so I hooked myself up. Yup, same problem. It shows me connected but I haven’t been issued a tunnel IP (as ipconfig shows) and I am not on the internal network.
Next test – no PAC file
Then I turned off use of a PAC file and tried again. I connected and brought up a tunnel just fine!
Next test – PAC file restored, use of normal ISP
Then I restored the PAC file setting and tried my regular ISP, CenturyLink. I connected just fine!
Next test – try with a Verizon Hotspot
I tried my Verizon phone’s Hotspot. It also did not work!
I performed these tests several times to make sure there weren’t any flukes.
I also took traces of all these tests. To save myself time I won’t share the traces here like I normally do, but describe the relevant bits that I feel go into the heart of this case.
The heart of the matter
When the laptop is bringing up the VPN tunnel there is always at least one and usually several requests for the PAC file (assuming we’re in a mode where PAC file is in use). Remember I said above that the PAC file on the Internet is “served” by a dummy server that all it does is respond to every request (at TCP level every SYN) with a RST (reset)?
Well, that’s exactly what I saw on the laptop trace when using CenturyLink. But is not what I saw when using either Verizon MiFi or HotSpot. No, in the Verizon case the laptop is sent a (SYN,ACK). So the laptop in that case finishes the TCP handshake with an ACK and then makes a full HTTP request for the PAC file (which of course it never receives).
Well, that is just wrong because now the laptop thinks there’s a chance that it might get the PAC file if it just waits around long enough, or something like that.
The main point is that Verizon – probably with the best intentions – has messed with our TCP packets and the laptop’s behavior is so different as a result that it breaks this application.
I’m still thinking about what to do. I doubt Verizon, being a massive organization, is likely to change their ways, but we’ll see. UPDATE. After a few weeks and poor customer support from Verizon, they want to try some more tests. Verizon refuses to engage in a discussion at a technical level with us and ignores all our technical points in this open case. They don’t agree or disagree, but simply ignore it. Then they cherry pick some facts which support their preconceived notion of the conclusion. “It work with the PAC file turned off so the only change is on your end” is pretty much an exact quote from their “support.”
We also asked Verizon for all their ranges so we could respond differently to Verizon users, but they also ignored that request.
In particular What Verizon is doing amounts to WAN optimization.
UPDATE – a test with Sprint
I got a chance to try yet a different Telecom ISP: Sprint. I expected it to work because we don’t have any complaints, and indeed it did. I tested with a Sprint aircard someone lent me. But the surprising thing – indeed amazing thing – is how it worked. Sprint also clobbers those RST packets from when the laptop if fetching the PAC file. They also turn them into SYN,ACKs. But they carry the deceit one step further. When the laptop then does the GET request, they go so far as to fake the server response! They respond with a short 503 server fail status code! The laptop makes a few attempts to get the PAC file, always getting these 503 responses and then it gives up and it is happy to establish the VPN tunnelled connection at that point! So this has got me thinking…
April, 2014 update
Verizon actually did continue to work with us. They asked us the IP of our external PAC file server, never revealing what they were going to do with that information. Then we provided them phone numbers of Verizon devices we wished to test with. As an aside did you know that a Verizon MiFi Jetpack has a phone number? It does. Just log into it at my.jetpack and it will be displayed.
So I tried the Jetpack after they did their secret things, and, yes, VPN now works. But we are not helpless. I also traced that session, and in particular the packets to and from the PAC file webserver. Yes, they are coming back to my laptop as RSTs. I tested a second time, because once can be a fluke. Same thing: successful connection and RST packets as we wanted them to be.
Now we just need the fix to be generalized to all Verizon devices.
May 2014 update
Finally, finally, on May 6th we got the word that the fix has been rolled out. They are “bypassing” the IPs of our PAC file webservers, which means we get to see the RST packets. I tested an aircard and a Verizon hotspot and all looked good. VPN was established and RST packets were observed.
Case finally closed.
Verizon has done some “optimizations” which clobber our RST packets. It is probably pre-answering SYNs, with ACKs, which under ordinary circumstances probably helps performance. But it is fatal to creating an SSL VPN tunnel with the JunOS client configured with a PAC file. Sprint optimizes as well, but in such a way that things actually work. Verizon actually worked with us, at their own slow pace and secretive methodology, and changed how they handled those packets and things began to work correctly.
The next two links are closely related to this current issue.
The IT Detective Agency: How We Neutralized Nasty DNS Clobbering Before it Could Bite Us.
The IT Detective Agency: Browsing Stopped Working on Internet-Connected Enterprise Laptops