Admin IT Operational Excellence Network Technologies

No Internet, secure WiFi status message in Windows 10

Finding out how Windows decides if there is an Internet connection or not can be a challenge often posed by trying to do an Internet search comprised or words that are common and therefore used in many other contexts. I have to give credit to someone else who found most of these pertinent links that help explain how Windows decides whether or not your PC has an Internet connection.

What they don’t tell you
I think there are a lot more tests microsoft does than what they’ve documented. In my opinion, based on observation, in addition to the sites they recommend to whitelist, also whitelist

Some PCs get stuck in a loop requesting indefinitely, which isn’t good for anyone.

Here’s one they don’t mention, of the same ilk:

I’m thinking to just leave that one alone, unless you really are fully running on ipv6.

Now if you have a PAC file, what you’re going to see are accesses for

I don’t think that one’s documented either. I’m not yet sure how best to have the PAC file web server respond, where best means the reply which would make the PC most likely to decide Yes I really do have an Internet connection.

References and related
This Pulse Secure article is pretty good. You start with an Internet connection, then launch Pulse Secure vpn, then find you are told there is no longer an Internet connection. This explains why it might be, but in my opinion it is incomplete as it does not even consider the case where an authenticating proxy is the sole gateway to the Internet:

These are two more articles about VPN tunneling

network Location Awareness (NLA) and Network Connection Status Indicator (NCSI) are explained in these articles

IT Operational Excellence Network Technologies Web Site Technologies

F5 Big-IP: When your virtual server does not present your chain certificate

While I was on vacation someone replaced a certificate which had expired on the F5 Big-IP load balancer. Maybe they were not quite as careful as I would like to hope I would have been. In any case, shortly afterwards our SiteScope monitoring reported there was an untrusted server certificate chain. It took me quite some digging to get to the bottom of it.

The details
Well, the web site came up just fine in my browser. I checked it with SSLlabs and its grade was capped at B because of problems with the server certificate chain. I also independently confirmed usnig openssl that no intermediate certificate was being presented by this virtual server. To see what that looks like with an exampkle of this problem knidly privided by, do:

$ openssl s_client ‐showcerts ‐connect

depth=0 /C=US/ST=California/L=San Francisco/O=BadSSL Fallback. Unknown subdomain or no SNI./CN=badssl-fallback-unknown-subdomain-or-no-sni
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 /C=US/ST=California/L=San Francisco/O=BadSSL Fallback. Unknown subdomain or no SNI./CN=badssl-fallback-unknown-subdomain-or-no-sni
verify error:num=27:certificate not trusted
verify return:1
depth=0 /C=US/ST=California/L=San Francisco/O=BadSSL Fallback. Unknown subdomain or no SNI./CN=badssl-fallback-unknown-subdomain-or-no-sni
verify error:num=21:unable to verify the first certificate
verify return:1
Certificate chain
 0 s:/C=US/ST=California/L=San Francisco/O=BadSSL Fallback. Unknown subdomain or no SNI./CN=badssl-fallback-unknown-subdomain-or-no-sni
   i:/C=US/ST=California/L=San Francisco/O=BadSSL/CN=BadSSL Intermediate Certificate Authority
Server certificate
subject=/C=US/ST=California/L=San Francisco/O=BadSSL Fallback. Unknown subdomain or no SNI./CN=badssl-fallback-unknown-subdomain-or-no-sni
issuer=/C=US/ST=California/L=San Francisco/O=BadSSL/CN=BadSSL Intermediate Certificate Authority
    Verify return code: 21 (unable to verify the first certificate)

So you get that message about benig unable to verify the first certificate.

Here’s the weird thing, the certificate in question was issued by Globalsign, and we have used them for years so we had the intermediate certificate configured already in the SSL client profile. The so-called chain certificate was GlobalsignIntermediate. But it wasn’t being presented. What the heck? Then I checked someone else’s Globalsign certificate and found the same issue.

Then I began to get suspicious about the certificate. I checked the issuer more carefully and found that it wasn’t from the intermediate we had been using all these past years. Globalsign changed their intermediate certificate! The new one dates frmo November 2018 and expires in 2028.

And, to compound matters, F5 “helpfully” does not complain and simply does not send the wrong intermediate certificate we had specified in the SSL client profile. It just sends no intermediate certificate at all to accompany the server certificate.

The case of the missing intermediate certificate was resolved. It is not the end of the world to miss an intermediate certificate, but on the other hand it is not professional either. Sooner or later it will get you into trouble.

References and related is a great resource.
My favorite openssl commands can be very helpful.

Admin Apache Hosting Service IT Operational Excellence Linux Web Site Technologies

Scaling your apache to handle more requests

I was running an apache instance very happily with mostly default options until the day came that I noticed it was taking seconds to serve a simple web page – one that it used to serve in 50 ms or so. I eventually rolled up my sleeves to see what could be done about it. It seems that what had changed is that it was being asked to handle more requests than ever before.

The details
But the load average on a 16-core server was only at 2! sar showed no particular problems with either cpu of I/O systems. Both showed plenty of spare capacity. A process count showed about 258 apache processes running.

An Internet search helped me pinpoint the problem. Now bear in mind I use a version of apache I myself compiled, so the file layout looks different from the system-supplied apache, but the ideas are the same. What you need is to increase the number of allowed processes. On my server with its great capacity I scaled up considerably. These settings are in /conf/extra/httpd-mpm.conf in the compiled version. In the system-supplied version on SLES I found the equivalent to be /etc/apache2/server-tuning.conf. To begin with the key section of that file had these values:

<IfModule mpm_prefork_module>
    StartServers             5
    MinSpareServers          5
    MaxSpareServers         10
    MaxRequestWorkers      250
    MaxConnectionsPerChild   0

(The correct section is <IfModule prefork.c> in the system-supplied apache).

I replaced these as follows:

<IfModule mpm_prefork_module>
    StartServers          256
    MinSpareServers        16
    MaxSpareServers       128
    ServerLimit          2048
    MaxClients           2048
    MaxRequestsPerChild  20000

Note that ServerLimit has to be greater than or equal to MaxClients (thank you Apache developers!) or you get an error like this when you start apache:

WARNING: MaxClients of 2048 exceeds ServerLimit value of 256 servers,
 lowering MaxClients to 256.  To increase, please see the ServerLimit

So you make this change, right, stop/start apache and what difference do you see? Probably none whatsoever! Because you probably forgot to uncomment this line in httpd.conf:

#Include conf/extra/httpd-mpm.conf

So remove the # at the beginning of that line and stop/start. If like me you’ve changed the usual diretory where the PID file and lock file get written in your httpd.conf file you may need this additional measure which I had to do in the httpd-mpm.conf file:

<IfModule !mpm_netware_module>
    #PidFile "logs/"
# The accept serialization lock file MUST BE STORED ON A LOCAL DISK.
<IfModule !mpm_winnt_module>
<IfModule !mpm_netware_module>
#LockFile "logs/accept.lock"

In other words I commented out this file’s attempt to place the PID and lock files in a certain place because I have my own way of storing those and it was overwriting my choices!

But with all those changes put together it works much, much better than before and can handle more requests than ever.

In creating a simple benchmark we could easily scale to 400 requests / second, and we didn’t really even try to push it – and this was before we changed any parameters. So why couldn’t 250 or so simultaneous processes handle more real world requests? I believe that if all clients were as fast as our server it could have handled them all. But the clients themselves were sometimes distant (thousands of miles) with slow or lossy connections. Then they need to acknowledge every packet sent by the web server and the web server has to wait around for that, unable to go on to the next client request! Real life is not like laboratory testing. As the waiting around bit requires next-to-no cpu the load average didn’t rise even though we had run up against a limit, the limit was an artificial application-imposed one, not a system-imposed resource constraint.

More analysis, what about threads?

Is this the only or best way to scale up your web server? Probably not. It’s probably the most practical however because you probably didn’t compile it with support for threads. I know I didn’t. Or if you’re using the system-provided package it probably doesn’t support threads. Find your httpd binary. Run this command:

$ ./httpd -l|grep prefork

If it returns:


you have the prefork module and not the worker module and the above approach is what you need to do. To me a more modern approach is to scale by using threads – modern cpus are designed to run threads, which are kind of like light-weight processes. But, oh well. The gatekeepers of apache packages seem stuck in this simple-minded one process per request mindset.

My scaled-up apache is handling more requests than ever. I’ve documented how I increased the total process count.

References and related articles
How I compiled apache 2.4 and ran into (and resolved) a zillion errors seems to be a popular post!
The mystery of why we receive hundreds or even thousands of PAC file requests from each client every day remains unsolved to this day. That’s why we needed to scale up this apache instance – it is serving the PAC file. I first wrote about it three and a half years ago!04

Admin IT Operational Excellence Linux Network Technologies Raspberry Pi

Screaming Streaming on the Raspberry Pi

The Raspberry Pi plus camera is just irresistible fun. But I had a strong motivation to get it to work the way I wanted it to as well: a First robotics team that was planning on using it for vision for the drive team. So of course those of us working on it wanted to offer something with a real-time view of the field with a fast refresh rate and good (though not necessarily perfect) reliability. Was it all possible? Before starting I didn’t know. In fact I started the season in January not knowing the team would want to use a Raspberry Pi, much less that there was a camera for it! But we were determined to push through the obstacles and share my love of the Pi with students. Eventually we found a way.

The details
Well, we sure made a lot of missteps along the way, that’s why I’m excited to write this article to help others avoid some of the pain points. It needs to be fleshed out some more, but this post will be expanded to become a litany of what didn’t work – and that list is pretty long! All of it borrowed from well-meaning people on various Internet sites.

The essence of the solution is the quick start page – I always search for Raspberry pi camera quick start to find it – which basically has the right idea, but isn’t fleshed out enough. So raspivid + nc + a PC with netcat (nc) and mplayer will do the trick. Below I provide a tutorial on how to get it all to work.

Additional requirement
Remember I wanted to make this almost fool-proof. So I wanted the Pi to be like a passive device that doesn’t need more than a one-time configuration. Power-up and she’s got to be ready. Cut power and re-power, it better be ready once more. No remote shell logins, no touching it. That’s what happens when it’s on the robot – it suddenly gets powered up before the match.

Here is the startup script I created that does just that. I put it in /etc/init.d/raspi-vid:

#! /bin/sh
# /etc/init.d/raspi-vid
# 2/2014
# The following part always gets executed.
echo "This part always gets executed"
# The following part carries out specific functions depending on arguments.
case "$1" in
    echo "Starting raspi-vid"
# -n means don't show preview on console; -rot 180 to make image right-side-up
# run a loop because this command dies unless it can connect to a listener
    while /bin/true; do
# if acting as client do this. Probably it's better to act as server however
# try IPs of the production PC, test PC and home PC
#      for IP in; do
#        raspivid -n -o - -t 9999999 -rot 180 -w 640 -h 480 -b 800000 -fps 15|nc $IP 80
#      done
# act as super-simple server listening on port 443 using nc
# -n means don't show preview on console; -rot 180 to make image right-side-up
# -b (bitrate) of 1000000 (~ 1 mbit) seems adequate for our 640x480 video image
# so is -fps 20 (20 frames per second)
# To view output fire up mplayer on a PC. I personally use this command on my PC:
# c:\apps\netcat\nc 443|c:\apps\smplayer\mplayer\mplayer -ontop -fps 60 -vo gl -cache 1024 -geometry 600:50 -noborder -msglevel all=0 -
      raspivid -n -o - -t 9999999 -rot 180 -w 640 -h 480 -b 1000000 -fps 20|nc  -l 443
# this nc server craps out after each connection, so just start up the next server automatically...
      sleep 1;
    echo "raspi-vid is alive"
    echo "Stopping rasip-vid"
    pkill 'raspi-?vid'
    echo "raspi-vid is dead"
    echo "Usage: /etc/init.d/rasip-vid {start|stop}"
    exit 1
exit 0

I made it run on system startup thusly:

$ cd /etc/init.d; sudo chmod +x raspi-vid; sudo update-rc.d raspi-vid defaults

Of course I needed those extra packages, mplayer and netcat:

$ sudo apt-get install mplayer netcat

Actually you don’t really need mplayer, but I frequently used it simply to study the man pages which I never did figure out how to bring up on the Windows installation.

On the PC I needed mplayer and netcat to be installed. At first I resisted doing this, but in the end I caved. I couldn’t meet all my requirements without some special software on the PC, which is unfortunate but OK in our circumstances.

I also bought a spare camera to play with my Pi at home. It’s about $25 from, though the shipping is another $11! If you’re an Amazon Prime member that’s a better bet – about $31 when I looked the other day. Wish I had seen that earlier!

I guess I used the links provided by the quick start page for netcat and mplayer, but I forget. As I was experimenting, I also installed smplayer. In fact I ended up using the mplayer provided by smplayer. That may not be necessary, however.

A word of caution about smplayer
smplayer, if taken from the wrong source (see references for correct source), will want to modify your browser toolbar and install adware. Be sure to do the Expert install and uncheck everything. Even so it might install some annoying game which can be uninstalled later.

Lack of background
I admit, I am no Windows developer! So this is going to be crude…
I relied on my memory of some basics I picked up over the years, plus analogies to bash shell programming, where possible.

I kept tweaking a batch file on my desktop. So I associated notepad to my Send To menu. Briefly, you type


where it says Search programs and files after clicking the Start button. Then drag a copy of notepad from c:\windows\notepad into the window that popped up.

Now we can edit our .bat file to our heart’s content.

So I created a mplayer.bat file and saved it to my desktop. Here are its contents.

if not "%minimized%"=="" goto :minimized
set minimized=true
start /min cmd /C "%~dpnx0"
goto :EOF
rem Anything after here will run in a minimized window
REM DrJ 2/2014
rem very simple mplayer batch file to play output from a Raspberry Pi video stream
rem Use the following line to set up a server
REM c:\apps\netcat\nc -L -p 80|c:\apps\smplayer\mplayer\mplayer -fps 30 -vo gl -cache 1024 -msglevel all=0 -

rem Set up as client with this line...
rem put in loop because I want it to start up whenever there is something listening on port 80 on the server

rem this way we are acting as a client - this is more how you'd expect and want things to work
c:\apps\netcat\nc 443|c:\apps\smplayer\mplayer\mplayer -ontop -fps 60 -vo gl -cache 1024 -geometry 600:50 -noborder -msglevel all=0 -

rem stupid trick to sleep for about a second. Boy windows shell is lacking...
ping -n 2 -w 1000 > NUL
goto loop

A couple notes about what is specific to my installation. I like to install programs to c:\apps so I know I installed them by hand. So that’s why smplayer and netcat were put there. Of course is my Pi’s IP address on my home network. In this post I describe how to set a static IP address for your Pi. We also found it useful to have the CMD Window minimize itself after starting up and running in the background, so the I discovered that the lines on the top allow that to happen.

The results
With the infinite loops I programmed either Pi or mplayer.bat can be launched first – there is no necessary and single order to do things in. So it is a more robust solution than that outlined in the quick start guide.
Most of my other approaches suffered from lag – delay in displaying a live event. Some other suggested approaches had quite large lag in fact. The lag from the approach I’ve outlined above is only 0.2 s. So it feels real-time. It’s great. Below I outline a novel method (novel to me anyways) of measuring lag precisely.
Many of my other approaches also suffered from a low refresh rate. You’d specify some decent number of frames per second, but in actual fact you’d get 1 -2 fps! That made for choppy and laggy viewing. With the approach above there is a full 20 frames per second so you get the feel of true motion. OK, fast motions are blurred a bit, but it’s much better than what you get with any solution involving raspistill: frame updates every 0.6 s and nothing you do can speed it up!
Many Internet video examples showed off high-resolution images. I had a different requirement. I had to keep the bandwidth usage tamped down and I actually wanted a smaller image, not larger because the robot driver has a dashboard to look at.
I chose an unconventional port, tcp port 443, for the communication because that is an allowed port in the competition. The port has to match up in raspi-vid and mplayer.bat. Change it to your own desired value.

Well, this is a one-client at a time solution, for starters! did I mention that nc makes for a lousy server?
Even with the infinite looping, things do get jammed up. You get into situation where you need to kill the mplayer CMD window to get things going again.
I would like to have gotten the lag down even further, but haven’t had time to look into it.
Begin a video amateur I am going to make up my own terms! This solution exhibits a phenomenon I call convergence. What that means is that once the mplayer window pops up, which takes a few seconds, what it’s displaying shows a big lag – about 10 seconds. But then it speeds along through the buffered frames and converges with real-time. This convergence takes slightly more than 10 seconds. So if you need instant-on and real-time, you’re not getting it with this solution!

What no one told us
I think we were all so excited to get this little camera for the Pi no one bothers to talk about the actual optical properties of the thing! And maybe they should. because even if it is supposedly based on a cellphone camera, I don’t know which cellphone, certainly not the one from my Samsung Galaxy S3. The thing is (and I admit someone else first pointed this out to me) that it has a really small field-of-view. I measured it as spreading out only 8.5″ at a 15″ distance – that works out to only 31.6 degrees! See what I mean? And I don’t believe there are any tricks or switches to make that larger – that’s dictated by the optics of the lens. This narrow field-of-view may make it unsuitable for use as security camera or many other projects, so bear that in mind. If I put my Samsung next to it and look at the same view its field of view is noticeably larger, perhaps closer to 45 degrees.

Special Insights
At some point I realized that the getting started guide put things very awkwardly in making the PC the server and the Pi the client. You normally want things the other way around, like it would be for an ethernet camera! So my special insight was to realize that nc could be used in the reverse way they had documented it to switch client/server roles. nc is still a lousy “server,” if you can call it that, but hey, the price is right.

Fighting lag
To address the convergence problem mentioned above I chose a frame rate much higher on the viewer than on the camera. The higher this ratio the faster convergence occurs. So I have a 3:1 ratio: 60 fps on mplayer and 20 fps on raspivid. The PC does not seem to strain from the small bit of extra cpu cycles this may require. I think if you have an exact fps match you never get convergence, so this small detail alone could convince you that raspivid is always laggy when in fact it is more under your control than you realized.

Even though with the video quality such as it is there probably is no real difference between 10 fps and 20 fps, I chose 20 fps to reduce lag. After all, 10 fps means an image only every 100 msec, so on average by itself it introduces a lag of half that, 50 msec. Might as well minimize that by increasing the fps to make this a negligble contributor to lag.

Measuring lag
Take a smartphone with a stopwatch app which displays large numbers. Put that screen close up to the Pi camera. Arrange it so that it is next to your PC monitor so both the smartphone and the monitor are in your field of view simultaneously. Get mplayer.bat running on your PC and move the video window close to the edge of the monitor by the smartphone.

Now you can see both the smartphone screen as well as the video of the smartphone screen running the stopwatch (I use Swiss Army Knife) so you can glance at both simultaneously and quantify the lag. But it’s hard to look at both rapidly moving images at the same time, right? So what you do is get a second camera and take a picture of the two screens! We did this Saturday and found the difference between the two to be 0.2 s. To be more scientific several measurements ought to be taken and results avergaed and hundredths of seconds perhaps should be displayed (though I’m not sure a still picture could capture that as anything other than a blur).

mplayer strangeness on Dell Inspiron desktop
I first tried mplayer on an HP laptop and it worked great. It was a completely different story on my Dell Inspiron 660 home desktop however. There that same mplayer command produced this result:

VO: [directx] 640x480 => 640x480 Packed YUY2
FATAL: Cannot initialize video driver.
FATAL: Could not initialize video filters (-vf) or video output (-vo).
Exiting... (End of file)

So this was worrisome. I happened on the hint to try -vo gl and yup, it worked. Supposedly it makes for slower video so maybe on PCs where this trick is not required lag could be reduced.

mplayer personal preferences
I liked the idea of a window without a border (-noborder option) – so the only way to close it out is to kill the CMD window, which helps keep them in sync. Running two CMD windows doesn’t produce such good results!

I also wanted the window to first pop-up in the upper right corner of the screen, hence the -geometry 600:50

And I wanted the video screen to always be on top of other windows, hence the -ontop switch.

I decided the messages about cache were annoying and unimportant, hence the message suppression provided by the -msglevel all=0 switch.

Simultaneously recording and live streaming
I haven’t played with this too much, but I think the unix tee command works for this purpose. So you would take your raspivid line and make it something like:

raspivid -n -o – -t 9999999 -rot 180 -w 640 -h 480 -b 1000000 -fps 20|tee /home/pi/video-`date +%Y%h%d-%H%M`|nc -l 443

and you should get a nice date-and-time-stamped output file while still streaming live to your mplayer! Tee is an under-appreciated command…

I have tinkered with the Pi until I got its camera display to be screaming fast on my PC. I’ve shown how to do this and described some limitations.

Next Act?
I’m contemplating superimposing a grid with tick marks over the displayed video. This will help the robot driver establish their position relative to fixed elements on the field. This may be possible by integrating, for instance, openCV, for which there is some guidance out there. But I fear the real-time-ness may greatly suffer. I’ll post if I make any significant progress!
Update: I did get it to work, and the lag was an issue as suspected. Read about it here.

References and related
First Robotics is currently in season as I write this. The competition this year is Aerial Assist. More on that is at their web site,
Raspberry Pi camera quick start is a great place to get started for newbies.
Setting one or more static IP addresses on your Pi is documented here.
How not to set up your Pi for real-time video will be documented here.
How to get started on your Pi without a dedicated monitor is described here.
Finally, how to overlay a grid onto your video output (Yes, I succeeded to do it!) is documented here.
Correct source for smplayer for Windows.

Admin DNS IT Operational Excellence

The IT Detective Agency: since when can a powered off PC do dynamic DNS updates?

The IT Detectives are back after a short lull during which no great mysteries needed expert resolution – you knew that situation couldn’t last too long. The following tale was relayed to me, I unfortunately cannot claim to have been any help whatsoever. The details have been somewhat obscured in this retelling.

The details
One of our DNS servers at drjohns was busy fielding lots and lots of DDNS updates. Good, right? No, not so. Because our employee PCs are all configured to not do this very thing. In Windows 7 drilling down into the advanced DNS settings you have a Checkbox for Register this connection’s addresses in DNS. And that is unchecked. So although we use DHCP, the PCs shouldn’t be sending their DDNS updates. Yet they were. In fact at one point a considerable amount of bandwidth was being eaten up with these unwanted updates, so we had to investigate and act. But where to begin?

Word finally got around to one of our PC experts who I guess probably had his suspicions. He suggested the following test:

turn the PC off and look for DDNS updates on the DNS server

Amazingly, that’s exactly what we found to be the case – DDNS updates coming from a powered off PC. The DDNS updates did not always go to the same DNS server. The chosen DNS server seemed randomly chosen, but they all were drjohns DNS servers.

A Wireshark examination of a trace (taken by a network engineer) showed lots of Dynamic Update SOA I looked at the trace and found that that was just a title given by Wireshark for what was happening, and not a very accurate one. If you expand the packet you saw inside of it that (mostly) it was a workstation trying to register its A record on the DNS server (a DDNS update). It wasn’t literally trying to change the SOA record for the zone though that might have been the logical result of updating its A record.

What the power-off test showed to our subject-area expert is that Intel vPro was responsible for these DDNS updates. Wait, you ask, what the heck is vPro? We didn’t know either. As I understand it, it’s an additional Intel chip that some business-class laptops (e.g., DELL Latitude) might include that permits more and better remote management, allowing perhaps even some hardware diagnostics to occur.

So let’s go back to that test. Note that I said PC powered off, I did not say disconnected from the network! Powered-off-but-network-connected produces the DDNS update, powered-off-and-disconnected – no update, of course (Hey, it’s not magic going on here!).

So the solution, obvisouly, is to turn off DDNS in vPro. We thought it was off, but maybe not. We expect and hope this to the solution, but a few more days will be needed before this all plays out and we know for sure.

I better hold off on any conclusion until our premise is confirmed! But one feeling I have is that sometimes you have to ingratiate yourself to the right people because no one person has all the answers!

DNS IT Operational Excellence Network Technologies

The IT Detective Agency: Roku player can’t connect to Internet

I bought a used Roku player, just to try it out. Things didn’t go right at first.

The details
Setting it up, I got to the point where it tests its Internet connection. There are three status lights. The first two were OK, but the Internet connection returned an error, I think error #9. It didn’t matter whether I used a wired or wireless connection.

I have an unusual Internet router at home, a Juniper SSG5. All my other devices off that router were getting to Internet just dandy however. Long story short, it turns out that there was a slight DNS misconfiguration. The Juniper had itself as secondary DNS server. There was no primary DNS server. Why this only affected Roku is still a mystery.

I updated the Juniper config to use as primary DNS server and the Roku connected just fine and it worked wirelessly as well.

Case closed!

Well, as a network type-of-guy, I would have appreciated access to the Roku Linux OS so I could have seen for myself what was going on. I’m sure I would have figured out the problem much sooner than I did. The error message was not particularly helpful and most of the online discussions attribute that problem to other causes than was the DNS issue discovered here. Short of OS access, more robust debugging tools, such as PING, would have been might handy.

Admin IT Operational Excellence Proxy

The IT Detective Agency: the case of the Sales and Use tax software

I have to give credit to my colleague “Ben” for cracking this case, which left me scratching my head. Users at drjohns were getting new Windows 7 PCs and some of the old software wasn’t going to run on those new PCs, including our indirect tax sales and use software from Thomson Reuters. The new approach is SaaS – software as a service. The new package was approved and everyone thought it was going to work fine, until late in the game it was actually tested. They couldn’t bring up their old tax returns. So at the last hour they bring in the Internet experts.

The Details
At drjohns our users are insulated from the Internet by proxy servers. There are no direct routes. It’s private address space and an explicit proxy connection to browse out to Internet. 99% of the time this works fine. And it sure is a secure way to go. But those exceptions can be quite a headache. This case is a very typical presentation of what we see, though the particular solution varies case by case.

We get detailed network requirements. They usually talk about opening up the firewall to certain servers, etc. We always patiently explain that the firewall is open – to the proxy! The desktops have no Internet routes, nor can they resolve Internet domain names. That’s right we have private root DNS servers. Most vendors have never encountered this setup and so they dig in their heels and insist that the only way is to ‘open the firewall…”

This case was no different, except we didn’t actually talk to the vendor. But their requirements were crystal clear in this networking document. Here’s the snippet that would seem to be fatal given our Intranet architecture:

The ONESOURCE Sales & Use Application Servers use TCP/IP communications from the client
PC to the Server. The requirements for communications with the ONESOURCE Application
Servers are itemized below:
- DNS Name Resolution is not used for the Application Servers.
- Proxy Server access to the Application Servers is supported ONLY in transparent mode. The
Proxy Server must not translate the TCP/IP address of the Application Servers. PCs must be
able to establish a connection using the actual TCP/IP address and port numbers of the
Application Server without application “awareness” of a Proxy Server.
- Network Address Translation (NAT) is supported for the client addresses but is NOT
supported for the Application Server addresses.
- Connections are outbound only from the client to the server.
- Security policies, firewall rules, proxy rules and router packet filters must allow outbound
connections (and inbound replies) on destination port 2429 to the Class “B” network address when using the non-WCF application servers. If the client’s account has been
configured to use Windows Communications Foundation or WCF, there are no additional port
requirements. The source port selection uses standard port numbers 1024 and above.

The application installs about 10 ActiveX controls and it wouldn’t run on my desktop. Ben managed to get it to run using the OpenText socks client. It has an option to “socksify everything else” which he says proves to be very useful when you don’t know what specific application to socksify. So now let me repeat what I have just said: Ben got it to work without any changes to the firewall, ignoring all the vendor’s advice and requirements!

I was very pleased as this was getting to be a high-priority issue what with these sales and use taxes due each month.

But Ben didn’t stop there. He came up with even better solution. He said he was looking around at the folder where all the stuff is installed by the application. He noticed a file called ConfigProxy. He configured it to use the system proxy settings. Then he exempted the target site from proxy authentication. Lo and behold that worked as well, with no socksification required at all. We only socksify an app as a last resort.

This latest finding completely contradicts the vendor’s stated network requirements. But it’s better this way.

We now have a happy tax department. Case closed.

Vendor network requirements are not always what they seem. Clearly they are not testing in the more obscure environments such as a private Intranet with an independent namespace that connects to the wider Internet only via explicit proxy. If you’re in this situation, which offers some serious security advantages, there are things you can do to get demanding applications to work.

Admin IT Operational Excellence Network Technologies

The IT Detective Agency: the case of the Adobe form network issue

Sometimes IT is called in to fix things we know little or nothing about. We may fix it, still not know anything, except what we did to fix it, and move on. Such is the case here with a mysterious Adobe Form that wasn’t working when I was called in to a meeting to discuss it.

The Case
One of our developers created a simple Adobe Acrobat form document. It has some logic to ask for a username and password and then verify them against an LDAP server using a network connection. Or at least that was the theory. It worked fine and then we got new PCs running Windows 7 and it stopped working. They asked me for help.

I asked to get a copy of the form because I like to test on my desktop where I am free to try dumb things and not waste others’ time. Initially I thought I also had the error. They showed me how to turn on Javascript debugging in edit|preferences. The debug window popped up with something like this:

debug 5 : function setConstants
debug 5 : data.gURL =
debug 5 : function today_ymd
debug 5 : mrp::initialize: version 0.0001 debug level = 5
debug 5 : Login clicked
debug 5 : calling LDAPQ
debug 5 : in LDAPQ
NotAllowedError: Security settings prevent access to this property or method.

But this wasn’t the real problem. In this form you get a yellow bar at the top along with this message and you give approval to the form to access what it needs. Then you run it again.

For me, then, it worked. I knew this because it auto-populated some user information fields after taking a few seconds.

So i worked with a couple people for whom it wasn’t working. One had Automatically detect proxy settings checked. Unfortunately the new PCs came this way and it’s really not what we want. We prefer to provide a specific PAC file. With the auto-detect unchecked it worked for this guy.

The next guy said he already had that unchecked. I looked at his settings and confirmed that was the case. However, in his case he mentioned that Firefox is his default browser. He decided to change it back to Internet Explorer. Then he tested and lo and behold. It began to work for him as well!

When it wasn’t working he was seeing an error:

NetworkError: A network connection could not be created.

Later he realized that in Firefox he also was using auto-detect for the proxy settings. When he switched that to Use System Settings all was OK and he could have FF as default browser and get this form to work.

This is speculation on my part. I guess that our new version of Acrobat Reader X, v 10.1.1, is not competent in interpreting the auto-detect proxy setting, and that it is also tripped up by the proxy settings in Firefox.

There’s a lot more I’d like to understand about what was going on here, but sometimes speed counts. The next problem is already calling my name…

Admin IT Operational Excellence Linux Proxy Web Site Technologies

The IT Detective Agency: intermittent web page not found error

One of the high arts of IT is system integration, and an important off-shoot of this is acquisitions. We are involved in integrating a new location, which, unfortunately, we do not yet have full access to. The local networking is still provided by their vendor, not ours and this makes troubleshooting all the more difficult.

The Details
So the word begins to spread that users at this site are having intermittent problems accessing some of our secure web sites. As it was described to me, they can load the page in their browser for, say five straight times, get a simple Internet Explorer cannot display the web page error, and the sixth time (or whenever) it will load properly. All other connectivity was working. No one else at other locations was having this problem with this web site. More than strange, right?

In drjohn’s perfect IT world, problem reproducibility is critical to resolution, but we simply didn’t have it this time. I also could not produce the problem myself, which means relying on other people.

I’m not sure if we tried to contact their vendor or not at first. But if we had I’m sure they would have denied having anything to do with it.

So we got one of our confederates, Tim, over to this location and we hooked him up with Wireshark so he could get take a packet trace when the failure occurs. It wasn’t long before Tim reproduced the error and emailed us the packet capture.

In the following the PC has IP address, the web server is at The Linux command used to look at the capture file is:

# tcpdump -A -r bodega-error.cap port 443 > /tmp/dump

1 15:54:27.495952 IP > S 2803722614:2803722614(0) win 64240 <mss 1460,nop,wscale 0,nop,nop,sackOK>
2 15:54:27.496309 IP > S 3201081612:3201081612(0) ack 2803722615 win 5840 <mss 1432,nop,nop,sackOK>
3 15:54:27.496343 IP > . ack 1 win 64240
4 15:54:27.497270 IP > P 1:82(81) ack 1 win 64240
5 15:54:27.497552 IP > . ack 82 win 5840
6 15:54:30.743827 IP > P 1:286(285) ack 82 win 5840
..S.......^M..i.P.......HTTP/1.0 200 OK^M
Cache-Control: no-store^M
Pragma: no-cache^M
Cache-Control: no-cache^M
X-Bypass-Cache: Application and Content Networking System Software 5.5.17^M
Connection: Close^M
7 15:54:30.744036 IP > F 82:82(0) ack 286 win 63955
8 15:54:30.744052 IP > F 286:286(0) ack 82 win 5840
9 15:54:30.744077 IP > . ack 287 win 63955
10 15:54:30.744289 IP > . ack 83 win 5840

The output was scrubbed a bit of meaningless junk characters and I added serial packets numbers in the beginning by hand because I don’t (yet) know how to do that with tcpdump!

What, It’s Encrypted – what can you even learn from a trace?
Yeah, an SSL stream sure adds to the already steep challenges we faced in this problem. There just isn’t much to work with. But it is something. I’m about to say what I noticed in this packet trace, but for it to be meaningful you need to know like I did that the web server is situated almost four thousand miles from the user’s location.

The first packet is a SYN from the PC to web server on TCP port 443. So far so good. In fact packets one – three constitute the three-way handshake in TCP.

Although SSL is encrypted, the beginning of the protocol communication should show the SSL cipher being chosen. Unfortunately, tcpdump doesn’t seem to have the smarts to show any of this. So I got myself ssldump. On Ubuntu:

# sudo apt-get install ssldump

did the trick. Then run this same capture file through ssldump, which has very similar arguments to tcpdump:

# ssldump -r bodega-error.cap port 443

New TCP connection #1: <->
1 1  0.0013 (0.0013)  C>S SSLv2 compatible client hello
  Version 3.1
  cipher suites
  Unknown value 0xff
Unknown SSL content type 72
1    3.2480 (3.2467)  C>S  TCP FIN
1 2  3.2481 (0.0000)  S>CShort record
1    3.2481 (0.0000)  S>C  TCP FIN

The way to interpret this is that 0.0013 s into the TCP port 443 communication the cipher suites listed above were sent by the PC to the server. This corresponds to our packet number 4 in the trace file.

Using Wireshark to look at the trace is a lot more convenient – it provides packet numbers, timing, decodes packets and displays the SSL ciphers. But I wanted to show that it _could_ be done with text-based tools.

Look at the timings more closely. In the tcpdump output, packet 2, the SYN ACK, comes 1 ms after the SYN. But given the distances involved between PC and server, the SYN ACK should have come more like 100 ms later, at least. Similarly packet 5, which is an ACK, comes less than 1 ms after packet 4. A 1 ms ACK? Physically impossible.

I have seen this behaviour before – on our own load balance – which I know employs some TCP optimization tricks. So I concluded that they must have physically present at this site some kind of appliance which is doing TCP optimization. It can only provide blank ACKs in its rapid-fire responses since it can’t know what data the server is really going to respond with. That might all be OK. But I’m pretty sure the problem lies between packets 5 and 6. 5 is one of those meaningless rapid-fire empty ACKs generated by the local router. But the PC has just sent a wish list of SSL ciphers in packet 4. It needs to be responded to by the server which has to finish setting up the SSL session.

But that critical packet from the server never arrives. Perhaps even some of the SSL handshake is secretly completed between the local router and the server. Who knows? I have heard of man-in-the-middle devices that decrypt SSL sessions. And packet 6 contains fairly inappropriate content. It almost does look like it has been manufactured by a man-in-the-middle device. Its telling the browser to do a redirect to the same site, except specified by IP address rather than FQDN. And that doesn’t make a lot of sense. The browser likely realizes that this amounts to a looping redirect request so at that point it probably decides to cut its losses and FIN the connection in packet 7.

I traced my own PC hitting this same web server. Now I know we don’t have any of these optimizing devices between me and the web server. I don’t have time to show the results here, but to summarize, it looks rather completely different from the trace above. The ACK packets come back in about 100 ms or so. There is no delay of three seconds. The cipher proposals are responded to in a timely fashion. There is no redirect.

Their Side of the Story
We did get to hear back from the vendor who supports the LAN/WAN. They said they were running WCCP and diverting traffic to a proxy server. This was the correct behaviour before we hooked our infrastructure to this site, but is no longer. They realized this was probably a bad thing and took corrective action to turn off WCCP for destinations in the internal network

Shutting off WCCP, which diverted web site requests to an old proxy server, fixed the problem.

Case closed.

Unsolved Mysteries
I wish we could tie all the loose ends neatly up, but there are too many players involved. We’ll never really know why the problem was intermittent, for instance. Or why some secure web sites could be accessed without any issue whatsoever throughout this ordeal.

WCCP, Web cache Communication Protocol, is a Cisco-developed routing protocol to transparently intercept traffic destined for web servers. More information can be found on it in wikipedia.

It bothers me that after the SSL session was initiated the dump showed the source, unencrypted, of the HTML redirect packet. Why wasn’t that encrypted? Perhaps the WCCP-invoked proxy server was desperately trying to help the PC recover from an unrecoverable situation and manufactured that HTTP-EQUIV REFRESH… to try to force the PC to choose a web site that might work. The fact that it was sent unencrypted over a channel that should have been encrypted was probably even the death bell that triggered the browser to think this makes no sense at all and is even a violation of security, I’m getting out of here.

Admin IT Operational Excellence

The IT Detective Agency: SAP Afaria communication to APNS failed

IT is often called in and expected to produce results when only vague or partial information is presented. This is especially so when integrating commercial software.

The Issue
We were looking to implement SAP SUP (Sybase Unwired Platform). Part of it – for mobile device management for iOS devices – requires an Afaria server to communicate with the Apple Push Notification Service,, port 2195; and, port 2196.

Whatever software it is that’s running on Afaria, it’s not well-behaved insofar as it does not support proxy access. That means it expects to be on the Internet and be able to initiate these communication channels unencumbered.

Since I consider that to be a security risk I looked for a way to mediate this communication over a proxy server anyways, even though it wasn’t supposed to work.

Yes, we did get it to work, but not exactly in the way we expected.

We built TCP tunnels on the proxy.

To be continued…