Intro
Recall our elegant solution to DNS clobbering and use of a PAC file, which we documented here: The IT Detective Agency: How We Neutralized Nasty DNS Clobbering Before it Could Bite Us. Things were going smoothly with that kludge in place for months. Then suddenly they weren’t. What went wrong, and how do we fix it? Read on.
The Details
We began hearing reports of a growing number of users who could not access Internet sites from their enterprise laptops when connected directly to the Internet. Internet Explorer gave its cryptic error, to the effect of “Internet Explorer cannot display the web page.” This was a real problem, not a mere inconvenience, for users on a hotel Internet system that required a web page sign-on: the sign-on page itself could not be displayed and produced the same error message. For simple use at home this error may not have been fatal.
But we had this working. What the…?
I struggled as Rop demoed the problem for me on a user’s laptop. I couldn’t deny it: I saw it with my own eyes, and I saw that the configuration was correct. Correct as in how we wanted it to be, not correct as in working. Basically all web pages were timing out after 15 seconds or so and displaying “cannot display the web page.” The chief setting is the PAC file, which this enterprise uses on its Intranet.
On the Internet (see above-mentioned link) the PAC file was more of an annoyance and we had aliased the DNS value to 127.0.0.1 to get around DNS clobbering by ISPs.
While I was talking about the problem to a colleague I thought to look for a web server on the affected laptop. Yup, there it was:
C:\Users\drj>netstat -an|more

Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  ...
What the…? Why is there a web server running on port 80? How will it respond to a PAC file request? I quickly got some hints by hitting my own laptop with curl:
$ curl -i 192.168.3.4/proxy.pac
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=us-ascii
Server: Microsoft-HTTPAPI/2.0
Date: Tue, 17 Apr 2012 18:39:30 GMT
Connection: close
Content-Length: 315

Not Found
Not Found
HTTP Error 404. The requested resource is not found.
So the server is Microsoft-HTTPAPI, which upon investigation seems to be a Microsoft Web Deployment Agent Service (MsDepSvc).
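If you want to pull the telltale header out of a saved response rather than eyeball it, something like the following works. It is only a sketch: the function name parse_response_head is made up for illustration, and the sample text is simply the header block from the curl output above.

```python
def parse_response_head(raw):
    """Return (status_code, headers) from the head of a raw HTTP response.

    Hypothetical helper for illustration; header names are lower-cased.
    """
    lines = raw.strip().splitlines()
    # Status line looks like: HTTP/1.1 404 Not Found
    _version, code, _reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers


# Header block copied from the curl output in this post.
SAMPLE = """HTTP/1.1 404 Not Found
Content-Type: text/html; charset=us-ascii
Server: Microsoft-HTTPAPI/2.0
Date: Tue, 17 Apr 2012 18:39:30 GMT
Connection: close
Content-Length: 315
"""

status, headers = parse_response_head(SAMPLE)
print(status, headers.get("server"))
```

The Server header is only a hint about what is listening. It still takes some digging on the laptop itself to tie it to a particular service.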
The main point is that I don’t remember that being there in the past. I felt its presence, probably a new “feature,” explained the current problem. What to do about it, however?
Since this is an enterprise, not a small shop with a couple PCs, turning off MsDepSvc is not a realistic option. It’s probably used for peer-to-peer software distribution.
Hmm. Let’s review why we think our original solution worked in the first place. It didn’t work so well when the DNS was clobbered by my ISP. Why? I think because the ISP put up a web server when it encountered an NXDOMAIN DNS response, and that web server gave a 404 not found error when the browser searched for the PAC file. Pointing the DNS entry at the loopback interface, 127.0.0.1, gave the browser a valid IP, one that it would connect to and quickly receive a TCP RST (reset). Then it would happily conclude there was no way to reach the PAC file, not use it, and try DIRECT connections to Internet sites. That’s my theory and I’m sticking to it!
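To see which of those cases a given PAC URL falls into, here is a rough Python sketch. The function name classify_pac_fetch and the result labels are my own inventions for illustration, and this only reports what the fetch looks like from the client side. It says nothing about how Internet Explorer itself reacts to each case, which remains the theory above.

```python
import socket
import urllib.error
import urllib.request


def classify_pac_fetch(url, timeout=3.0):
    """Fetch a PAC URL directly and report how the attempt ended.

    Hypothetical helper: returns "http <code>", "refused", "timeout",
    or "error: ...".
    """
    # An empty ProxyHandler makes sure we connect directly, ignoring any
    # proxy settings in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"http {resp.status}"
    except urllib.error.HTTPError as exc:
        # A web server answered, for example with the 404 seen above.
        return f"http {exc.code}"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, ConnectionRefusedError):
            # The TCP RST case: host reachable, nothing listening.
            return "refused"
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"error: {exc.reason}"
    except (socket.timeout, TimeoutError):
        # Timeouts while reading the response can surface unwrapped.
        return "timeout"
```

Calling classify_pac_fetch("http://127.0.0.1/proxy.pac") on a laptop with something listening on port 80 should come back as an HTTP status, while a machine with nothing on port 80 should come back as refused.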
In light of all this information the following option emerged as most likely to succeed: set the Internet value of the DNS of the PAC file to a valid IP, and specifically, one that would send a TCP RST (essentially meaning that TCP port 80 is reachable but there is no listener on it).
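Before committing a candidate IP to DNS it is worth checking that it really does answer with a reset on port 80. A minimal sketch, with the function name probe_tcp and its return labels made up for illustration:

```python
import socket


def probe_tcp(host, port=80, timeout=3.0):
    """Try a TCP connect and report what came back.

    Hypothetical helper: "listening" means something accepted the
    connection, "reset" means the connect was refused (a TCP RST),
    "no answer" means the attempt timed out.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        return "reset"
    except (socket.timeout, TimeoutError):
        return "no answer"
    except OSError as exc:
        # Anything else, such as no route to host.
        return f"error: {exc}"
```

A good candidate is one where probe_tcp(candidate_ip) returns "reset" when run from outside the enterprise network. "listening" is the situation we are trying to get away from, and "no answer" would presumably leave the browser waiting on a timeout.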
We tried it and it seemed to work for me and my colleague. We no longer had the problem of not being able to connect to Internet sites with the PAC file configured and directly connected to the Internet.
I noticed however that once I got on the enterprise network via VPN I wasn’t able to connect to Internet sites right away. After about five minutes I could.
My theory about that is that I had a too-long TTL on my PAC file DNS entry. The TTL was 30 minutes. I shortened it to five minutes. Because when you think about it, that DNS value should get cached by the PC and retained even after it transitions to Intranet-connected.
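The arithmetic behind that hunch is simple enough to sketch. This is a simplified model, not a measurement: it assumes the PC honors the TTL and does not flush its DNS cache when the VPN comes up, and the function name is made up for illustration.

```python
def stale_seconds_remaining(ttl_seconds, age_at_switch_seconds):
    """Seconds the cached Internet answer can outlive the switch to VPN.

    age_at_switch_seconds is how old the cached record already was at
    the moment the laptop joined the enterprise network.
    """
    return max(0, ttl_seconds - age_at_switch_seconds)


# Worst case: the record was cached just before connecting (age 0).
for ttl_minutes in (30, 5):
    worst = stale_seconds_remaining(ttl_minutes * 60, 0)
    print(f"TTL {ttl_minutes} min: stale for up to {worst // 60} min")
```

So under this model the shorter TTL does not remove the delay, it only caps it at five minutes instead of thirty.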
I haven’t retested, but I think that adjustment will help.
Conclusion
I also haven’t gotten a lot of feedback from this latest fix, but I’m feeling pretty good about it.
Case: again mostly solved.
Hey, don’t blame me about the ambiguity. I never hear back from the users when things are working 🙂
References
A closely related case involving Verizon “clobbering” TCP RST packets also bit us.