Category: Network Technologies

Network Technologies

Quick Tip: Why Windows traceroute works better than Linux

Post author By john
Post date May 18, 2016
No Comments on Quick Tip: Why Windows traceroute works better than Linux

Intro
We noticed when debugging with the always useful tool traceroute (tracert on Windows systems) that we got more responsive results from Windows than from a Linux server on the same or nearby network. Finally I decided to look into it, my Linux pride at stake!

what it is is that Windows tracert utility uses ICMP by default whereas Linux traceroute uses UDP packets. We had been testing on a corporate Intranet where the default firewall policy was to allow ICMP but deny everything else.

The fix
Just add a ‐I switch to your linux traceroute command and the results will be as good as Windows. That switches the packet type to ICMP.

On the Internet where that type of firewall is more uncommon it probably won’t make that much a difference. But on an Intranet it could be just the thing you need.

Tags traceroute, tracert

Admin Network Technologies

The IT Detective Agency: spotty ISP performance traced to Google Drive

Post author By john
Post date March 29, 2016
No Comments on The IT Detective Agency: spotty ISP performance traced to Google Drive

Intro
Sometimes things are not what they seem to be on the surface. I was getting lousy PING times to 8.8.8.8 at home for weeks with Centurylink, my ISP. The problem was especially bad at night. I chalked it up to too much competition for their bandwidth for streaming Netflix and other on-demand media. Their customer support was useless. They only knew enough to walk you through power-cycling your DSL modem.

ISP # 2
Hitting a brick wall, I decided after all these years to switch ISPs to Service Electric Broadband Cable. My standalone test with their modem showed good throughput. Something like 29 mbps download and 3 mbps upload using speedtest.net. Then after a few days I got around to putting more of my home network on it and the service degraded as well. could it be that both ISPs were bad at the same time? PING times were between 200 – 900 msec, with plenty of timeouts in between.

Additional symptoms
I noticed that if I power-cycled the modem things ran pretty well for a minute or two, then started going downhill again. I had observed the same thing when they had me power cycle the DSL modem. Then I noticed that when I restarted my laptop the situation improved for awhile, then degraded. So it finally dawned on me that this one laptop was correlated with the problem. In Windows 10 Task Manager it has a convenient process view that allows you to view the top bandwidth-consuming applications (click on Network).

Suspicions raised around Google Drive
There I saw that Google Drive was consuming 3 mbps! Is that a lot or not? It all depends on whether it is downloading or uploading files. In my case I happened to put several multi-GByte movie files on my laptop on the Google Drive. So clearly it was trying to upload them to the cloud. Plus, worse, the power management was such that the laptop was only powered on for a few hours – not long enough for any of those files to finish uploading!!

The short-term solution
Google Drive has a feature that allows you to limit bandwidth usage. When I set that I wanted to keep the upload going but I also wanted to work. I settled on upload of about 240 KB/sec and download of 2000 KB/sec. I figured it was high enough to use most available bandwidth, but save some for others. And I changed my power management scheme to never hibernate when plugged in.

The results
While the files were uploading performance of PING was still quite impacted, but I was doing VPN pretty comfortably so I left it alone. It was certainly better. When all files finished uploading after a few days my performance with the new ISP was great.

Why did rebooting the DSL modem help?
During reboot no Internet connection is available and Google Drive goes into an error state No connection. Periodically it checks if the Internet connection is working. Finally after a modem reboot it does begin to work, then eventually Google Drive will realize that. But even once it does, it starts from the beginning and scans all the files to see what needs to be sycned, and that takes awhile. So only after a few minutes does it begin to use your Internet bandwidth. Meanwhile you think everything is good, until it very quickly turns bad again!

Conclusion
A problem with an ISP at night is explained by the presence of an application that was taking all available bandwidth for doing massive file uploads which never completed! A change to a new ISP was not such a bad thing as it is faster and cheaper.

Tags Centurylink, Google Drive

Admin Network Technologies

iRule script examples

Intro
F5’s BigIP load balancers have an API accessible via iRules which are written in their bastardized version of the TCL language.

I wanted to map all incoming source IPs to a unique source IP belonging to the load balancer (source NAT or snat) to avoid session stealing issues encountered in GUIxt.

First iteration
In my first approach, which was more proof-of-concept, I endeavored to preserve the original 4th octet of the scanner’s IP address (scanners are the users of GUIxt which itself is just a gateway to an SAP load balancer). I have three unused class C subnets available to me on the load balancer. So I took the third octet and did a modulo 3 operation to effectively randomly spread out the IPs in hopes of avoiding overlaps.

rule snat-test2 {
# see https://devcentral.f5.com/questions/snat-selected-source-addresses-on-a-vs
# and https://devcentral.f5.com/questions/load-balance-on-source-ip-address
# spread things out by taking modulus of 3rd octet
# - DrJ 2/11/16
when CLIENT_ACCEPTED {
# maybe IP::client_addr
set snat_Subnet_base "141"
  set ip3 [lindex [split [IP::client_addr] "."] 2]
  set ip4 [lindex [split [IP::client_addr] "."] 3]
  set offset [expr $ip3 % 3]
  set snat_Subnet [expr $snat_Subnet_base + $offset]
  set newip "10.112.$snat_Subnet.$ip4"
#  log local0. "Client IP: [IP::client_addr], ip4: $ip4, ip3: $ip3, offset: $offset, newip: $newip"
  snat $newip
}
}

It worked for awhile but eventually there were overlaps anyway and session stealing was reported.

The next act steps it up
So then I decided to cycle through all roughly 765 addresses available to me on the LB and maintain a mapping table. Maintaining variable state is tricky on the LB, as is working with arrays, syntax, version differences, … In fact the whole environment is pretty backwards, awkward, poorly documented and unpleasant. So you feel quite a sense of accomplishment when you actually get working code!

rule snat-GUIxt {
# see https://devcentral.f5.com/questions/snat-selected-source-addresses-on-a-vs
# and https://devcentral.f5.com/questions/load-balance-on-source-ip-address
# spread things out by taking modulus of 3rd octet
# - DrJ 2/22/16
 
when CLIENT_ACCEPTED {
# DrJ 2/16
# use ~ 750 addresses available to us in the SNAT pool
#  initialization. uncomment after first run
##set ::counter 0
 
  set clientip [IP::client_addr]
# can we find it in our array?
  set indx [array get ::iparray $clientip]
  set ip [lindex $indx 0]
  if {$ip == ""} {
# add new IP to array
    incr ::counter
# IPs = # IPs per subnet * # subnets = 255 * 3
    set IPs 765
    set serial [expr $::counter % $IPs]
    set subnetOffset [expr $serial / 255]
    set ip4 [expr $serial % 255 ]
    log local0. "Matched blank ip. clientip: $clientip, counter: $::counter, serial: $serial, ip4: $ip4 , subnetOffset: $subnetOffset"
    set ::iparray($clientip) $ip4
    set ::subnetarray($clientip) $subnetOffset
  } else {
# already seen IP
    set ip4 [lindex $indx 1]
    set sindx [array get ::subnetarray $clientip]
    set subnetOffset [lindex $sindx 1]
#    log local0. "Matched seen ip. counter: $::counter, ip4: $ip4 , subnetOffset: $subnetOffset"
  }
  set thrdOctet [expr 141 + $subnetOffset]
  set snat_Subnet "10.112.$thrdOctet"
 
  set newip "$snat_Subnet.$ip4"
#  log local0. "Client IP: [IP::client_addr], indx: $indx, ip4: $ip4, counter, $::counter, ip3: $thrdOctet, newip: $newip"
  snat $newip
# one-time re-set when updating the code...
# Re-set procedure:  uncomment, run, commnt out, run again... Plus set ::counter at the top
#unset ::iparray
#unset ::subnetarray
}
}

Criticism of this approach
Even though there are far fewer users than my 765 addresses, they get their addresses dynamically from many different subnets. So soon the iRule will have encountered 765 unique addresses and be forced to re-use its IPs from the beginning. At that point session stealing is likely to occur all over again! I’ve just delayed the onset.

What I would really need to do is to look for the opportunity to clear out the global arrays and the global counter when it is near its maximum value and the time is favorable, like 1 AM Sunday. But this environment makes such things so hard to program…

A word about the snat pool
I used tmsh to create a snat pool. It looks like this:

snatpool SNAT-GUIxt {
   members {
      10.112.141.0
      10.112.141.1
      10.112.141.2
      10.112.141.3
      10.112.141.4
      10.112.141.5
      10.112.141.6
      10.112.141.7
      10.112.141.8
      10.112.141.9
      10.112.141.10
      10.112.141.11
      10.112.141.12
      10.112.141.13
      10.112.141.14
      10.112.141.15
      10.112.141.16
...

Conclusion
A couple real-world iRules were presented, one significantly more sophisticated than the other. They show how awkward the language is. But it is also powerful and allows to execute some otherwise out-there ideas.

References and related
This article discusses trouble-shooting a virtual server on the load balancer

Tags F5 BigIP, iRule, SNAT

Network Technologies Web Site Technologies

Superimpose grid on video ouput from an IP camera

Post author By john
Post date January 31, 2016
1 Comment on Superimpose grid on video ouput from an IP camera

Intro
We were asked to superimpose a grid on the video output of an IP camera for this year’s FIRST FRC competition, FIRST STRONGHOLD, a sort of medieval-themed contest with a castle and medieval-inspired obstacles. The present thinking is that a cheap D-Link DCS-931L camera will do just fine. It’s $30 on Amazon. I found this is a real research project because it is poorly documented. So in this blog I show how to do that.

The details
D-Link provides viewing software called D-ViewCam. It has a lot of options, but not the ability to superimpose a grid. It’s more being a security console – allowing views from multiple cameras. Capturing and recording images, that sort of thing. i knew in my heart that there had to be a URL to tap into the camera directly, but it wasn’t easy to find. First I found the URL for a capture image:

http://dcs-931l/image.jpg

and that’s all i could find! I thought, OK, I can work with even that. I’ll build a web page that includes that as source image and refreshes itself as fast as possible! And despite the crudeness o that approach, it actually worked. It was a little laggy (maybe 1.2 s or so) an a little jumpy, but good enough for our purposes. Time permitting, I will share that crude code.

HTML5 video image
But then, somewhat by accident, i found a D-Link blog post where they just happened to mention the URL to a video stream that will work in an HTML5-compatible browser such as Firefox. i can’t believe how hidden they keep this URL. It is:

http://dcs-931l/mjpeg.cgi

and you treat it like an image file.

That’s the first breakthrough.

Then I found a Stackoverflow page that described how to superimpose a grid in an HTML page using CSS – Cascading Style Sheets. That sounded pretty good to me. Actually that’s what i searched for. I know there are other ways to do it but Javascript gets ugly quickly and other methods are more kludgy. At least with CSS I feel I am learning something about CSS. I am not a web developer, just a fumbler.

It’s Broken on Firefox
So i carefully implement the Stackoverflow code. you have to understand that it’s presented so tidily that you feel there’s no way it could not work. I tried it out in Firefox. No matter how much I proof-read my code, it only drew the vertical bars of the grid, but not the horizontal lines! So Firefox’s either has a bug, or the features of CSS aren’t agreed upon by all major browser vendors.

At some point I came to try my code in Chrome – worked great! That was a shock. But I wanted it to work in Firefox since that is my principal browser. I finally found that for whatever reason, in Firefox the horizontal bars have to be drawn using a different function. Instead of a more simple linear-gradient CSS function which works just fine for the vertical bars, you need to resort to a more complex repeating-linear-gradient function.

so putting all this together we arrive at the html page code. It’s nice and brief.

<html>
<head>
<style type="text/css">
<!-- DrJ 1/2016
Note that Firefox's implementation of linear-gradient is broken and requires us to
use repeat linear gradient 
Some fairly lousy documentation on repeat linear gradient is here:
https://developer.mozilla.org/en-US/docs/Web/CSS/repeating-linear-gradient
 
-->
* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}
div {
  display: inline-block;
  position: relative;
  margin: 10px;
}
div:after {
  content: '';
  position: absolute;
  height: 100%;
  width: 100%;
  top: 0;
  left: 0;
  background: repeating-linear-gradient(to bottom, black, black 1px, transparent 1px, transparent 80px), linear-gradient(to right, black 1px, transparent 1px);
  background-size: 15%;
  padding: 1px;
}
</style></head>
<body>
<div>
  <img src="http://dcs-931l/mjpeg.cgi" width="480" height="320" />
</div>
</body></html>

That’s it!

Well, mostly. This puts a horizontal bar every 80 pixels. If i change that 80px to 15% (which is the parameter in effect for the vertical bars due to the background-size statement), it will work OK in Firefox. However, it does not work in Chrome. With 80px it works in both browsers.

Network info
Needless to say, dcs-931l is just the hostname of the camera, assuming that mDNS is all working which it generally does. You can replace that with the IP address. of course you have to be on the same LAN as the camera. This is not a setup for viewing the camera from the Internet which I haven’t looked into yet. mDNS is multicast DNS. I think this technology or its equivalent is pretty common in home networks these days. It’s a convenient way to assign (and later refer to by that name) a hostname to a dynamic IP address. There’s a Wikipedia article about it which gets pretty technical.

Where to put that HTML page – stupid Notepad tricks
Most people automatically feel HTML pages have to be on a web server, but they don’t. You can put that HTML above into a file on your PC and that’s what we will do. No local web server required at all. I just saved the file as “grid.htm” in Notepad – yes it’s as crude as it gets but I said I’m not a web developer. Yes, anyone who knew anything would at least get Notepad++, but oh well. By the way, to save a .htm file in Notepad just specify All Files and put the name in quotes “grid.htm”. I save it to C:\temp, so the URL becomes:

c:\temp\grid.htm

It shows up a little differently, but that’s what I typed in. And here’s a screen capture of my live video with the grid superimposed, just so it’s been documented as really working!

Measuring the lag of the video display
In this blog post I show an accessible technique for measuring lag that only requires two smartphones. I love to show this to students. They get all confused at first, but when you do it you see how obviously simple and accurate it is. So we measured the lag as .51 seconds. So not the best, but not terrible either.

Superimpose crosshairs instead of grid
Now that we’ve set up the basic approach, changing from a whole grid to just crosshairs, with thicker lines is as simple as changing 15% to 50%, plus changing 1px to 2px.

Password prompt
But we still get that password prompt initially when brining up our local web page. Even that can be fixed by embedding the username/passwor dinto the URL. Putting crosshairs and password toegther we arrive at this version:

<html>
<head>
<style type="text/css">
<!-- DrJ 1/2016
Note that Firefox's implementation of linear-gradient is broken and requires us to
use repeat linear gradient 
Some fairly lousy documentation on repeat linear gradient is here:
https://developer.mozilla.org/en-US/docs/Web/CSS/repeating-linear-gradient
 
-->
* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}
div {
  display: inline-block;
  position: relative;
  margin: 10px;
}
div:after {
  content: '';
  position: absolute;
  height: 100%;
  width: 100%;
  top: 0;
  left: 0;
  background: repeating-linear-gradient(to bottom, black, black 2px, transparent 1px, transparent 50px), linear-gradient(to right, black 2px, transparent 2px);
  background-size: 50%;
  padding: 1px;
}
</style></head>
<body>
<div>
  <img src="http://admin:your_camera_password@dcs-931l/mjpeg.cgi" width="480" height="320" />
</div>
</body></html>

That is the Firefox version, of course. Replace your_camera_password with your camera’s password. Don’t use a password which contains the “@” character or things will get really complicated!

References and related
Link to competition information, including brief videos.
Cheap but functional D-Link video camera.
Stackoverflow description of superimposing a grid on an image using CSS.
Multicast DNS is described in excruciating detail here.
Blog post on measuring lag and getting streaming to work on the Raspberry Pi camera.

Tags css, Firefox

Network Technologies

WAN performance monitor tool

Post author By john
Post date January 14, 2016
No Comments on WAN performance monitor tool

Intro
This is an interesting tool I just learned about, iPerf. It has a server and client mode. You can install it on a PC, but you need admin access to the PC.

It has a client and server mode. As client I was told to run the following test:

c:\apps\iperf> iperf3 -c 10.12.13.10 -b 50M -l 1024k -w 512k -t 25

References and related
Download site.

Tags iPerf

Network Technologies Raspberry Pi

Dig for Windows or Raspberry Pi

Post author By john
Post date January 11, 2016
5 Comments on Dig for Windows or Raspberry Pi

Windows

Intro
Dig is a really useful networking tool. I use it several times a day. But always on Linux where it’s usually built-in. On Raspberry Pi’s raspbian you can install it with a simple apt-get install dnsutils. Then I learned it wasn’t hard at all to install on Windows, especially as a fairly minimalist installation that just puts files on your PC and makes no changes to the Registry, which is all you really need for light use.

The details
Go to http://www.isc.org/downloads/. Expand BIND.

Click download button for the current stable release.
Pick the win-64-bit link (because chances are you’re running Windows 64 bit these days) and wait for download to complete.
Open up zip file.
Unzip or extract all files to (this is my suggestion) c:\apps\bind.

To run it
Open a command window. Probably easiest way is hold down Windows key + r and type in cmd. In CMD window simply type \apps\bind\dig to run dig like you do on Linux.

Example commands
Example 1, Resolve address for google.com

C:\> \apps\bind\dns google.com

; &lt;&lt;&gt;&gt; DiG 9.9.8-P2 &lt;&lt;&gt;&gt; google.com
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 24929
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1
 
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.                    IN      A
 
;; ANSWER SECTION:
google.com.             88      IN      A       173.194.207.113
google.com.             88      IN      A       173.194.207.139
google.com.             88      IN      A       173.194.207.138
google.com.             88      IN      A       173.194.207.101
google.com.             88      IN      A       173.194.207.102
google.com.             88      IN      A       173.194.207.100
 
;; Query time: 41 msec
;; SERVER: 192.168.2.1#53(192.168.2.1)
;; WHEN: Mon Jan 11 12:16:17 Eastern Standard Time 2016
;; MSG SIZE  rcvd: 135

This gives all kinds of useful information – what your default DNS server is (at the bottom – mine is 192.168.2.1), how long the query took *this one: 41 msec), whether the answer is authoritative or not (no AA flag here, so this is not an authoritative answer), as well as the answer to the question posed.

Example 2, Resolve nameserver records for the domain amazon.com using Google’s DNS server 8.8.8.8 over TCP from our local IP address of 192.168.2.3

We started out slow, but this example throws the kitchen sink at you to show the power of dig!

C:\> \apps\bind\dig +tcp -b 192.168.2.3 ns amazon.com @8.8.8.8

; &lt;&lt;&gt;&gt; DiG 9.9.8-P2 &lt;&lt;&gt;&gt; +tcp -b 192.168.2.3 ns amazon.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 64444
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1
 
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;amazon.com.                    IN      NS
 
;; ANSWER SECTION:
amazon.com.             3599    IN      NS      ns3.p31.dynect.net.
amazon.com.             3599    IN      NS      ns4.p31.dynect.net.
amazon.com.             3599    IN      NS      ns1.p31.dynect.net.
amazon.com.             3599    IN      NS      pdns1.ultradns.net.
amazon.com.             3599    IN      NS      pdns6.ultradns.co.uk.
amazon.com.             3599    IN      NS      ns2.p31.dynect.net.
 
;; Query time: 50 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon Jan 11 12:27:26 Eastern Standard Time 2016
;; MSG SIZE  rcvd: 188

The only problem is that I don’t think the TCP option actually worked – I gotta run wireshark to verify. On Linux it definitely works! Not sure what’s wrong with windows. But the other options are working as designed.

OK, wireshark install is failing, but I ran tcpdump on a DNS server I run and confirmed that indeed the +tcp option is working forcing dig to use TCP communication for those queries.

Raspberry Pi

I believe you do

$ sudo apt-get install bind9-dnsutils

At least on a generic Debian system that works. I have to confirm on RPi still.

Conclusion
We’ve demonstrated a low-impact way to install dig for Windows and shown some examples of using it.

References and related

Current BIND link from ISC: https://downloads.isc.org/isc/bind9/9.16.8/BIND9.16.8.x64.zip

Or…you get get dig through a Cygwin installation. I’ve written about Cygwin here: Cygwin. Or just go to cygwin.com.

Tags dig, dnsutils, RPi

Network Technologies TCP/IP

Spin up your own VPN with OpenVPN

Post author By john
Post date January 7, 2016
No Comments on Spin up your own VPN with OpenVPN

Intro
I recently visited a foreign country where I was unable to watch an Amazon Prime Original show because of my location. Annoyed, I decided then and there to investigate OpenVPN. I am an ideal candidate – I already run my own Linux server in the Amazon cloud, and I know Linux and networking, so I’ve pretty much got all the ingredients already present. The software is free and I will incur no additional cost if I ever do get it working since I already pay for my server which is primarily used as a low-demand web server. Little did I know what I was getting myself into!

The details
I seem to have made every mistake in the book, but I have a strong grasp of the networking involved so I was confident in my general approach. Seeing how configurable the software is I decided to ramp up my efforts incrementally, introducing more and more features until I arrived to the minimum desired (still not there, by the way, but getting close!).

The package on various OSes
Conveniently, openvpn is an installable package on SLES and Centos. On SLES a
zypper install openvpn suffices whereas in CentOS a yum install openvpn does the trick. In Debian Linux such as Raspbian, an apt-get install openvpn works to install it.

Then look at the examples at the bottom of the man page for openvpn. I picked two servers where I have root and tried to replicate their most basic example. I think this is really important to crawl before you run. To paraphrase it:

Example 1: A simple tunnel without security
       On may:
 
              openvpn --remote june.kg --dev tun1 --ifconfig 10.4.0.1 10.4.0.2 --verb 9
 
       On june:
 
              openvpn --remote may.kg --dev tun1 --ifconfig 10.4.0.2 10.4.0.1 --verb 9
 
       Now verify the tunnel is working by pinging across the tunnel.
 
       On may:
 
              ping 10.4.0.2
 
       On june:
 
              ping 10.4.0.1

june.kg and may.kg are the IP addresses or FQDNs of the june and may server, respectively.

If you installed from a package you probably don’t need these preliminary steps they mention:

              mknod /dev/net/tun c 10 200
 
              modprobe tun

I checked it by a directory listing of /dev/net/tun:

crw-rw-rw- 1 root root 10, 200 Jan  6 08:35 /dev/net/tun

and running modprobe tun for good measure.

And, yes, their basic example worked. How? The command creates a virtual adapter, tun0, specially designed for tunnels, which records the private tunnel IP of the server itself as well as the IP of the tunnel endpoint on the other server.

It’s so cool. What’s not well explained is that you ought to choose completely private IPs for building your tunnels that don’t interfere with your other private IPs. You’ll see I eventually settled on 172.27.28.29 – I’ve never seen that range used anywhere.

You quit running the openvpn command and the tun0 adapter is destroyed. It’s very tidy.

A small word about routing for now
What about routing. How does that work? What you probably didn’t appreciate is that in this simple example although you can ping each other using the tunnel IP, you really can’t do any more than that unless you start to introduce additional routes. So as is it’s a long way from where we need to be. More on routing later. Check your route table with a netstat -rn

Finding the right example is surprisingly hard
For such a popular program you’d think examples of what I’m trying to do – change the effective IP of my laptop – would abound and implementation would be a piece of cake. But alas, nothing could be further from the truth. I’ve yet to see a complete example so I cobbled together things from various places.

You gotta understand that for a lot of people with enough tech-savvy to write up what they did, I guess they were just tickled pink to be able to tunnel through their home router and access their home networks. That’s fine and all, but it’s not all that pertinent to my usage, where the routing considerations are pretty different.

Anywho, this article is really helpful, though not sufficient by itself:

https://openvpn.net/index.php/open-source/documentation/miscellaneous/78-static-key-mini-howto.html

That describes a single server/client setup with simple security.

Server config
After a lot of debugging and incremental steps, I’m currently using this file with filepath /etc/openvpn/server.conf on my Amazon AWS server:

# -DrJ 1/5/16
# simple one server/one client setup...
# https://openvpn.net/index.php/open-source/documentation/miscellaneous/78-static-key-mini-howto.html
# static.key generated with openvpn --genkey --secret static.key
# iptables NAT discussion: http://www.karlrupp.net/en/computer/nat_tutorial
# using (simple udp test worked): iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# list: iptables -t nat -L -v
# dig on RasPi: from dnsutils
dev tun
# 1194 is default, but...
port 2096
ifconfig 172.27.28.29 172.27.28.30
secret static.key
#use compresison
comp-lzo
# resist failures
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
# run as daemon
user nobody
group nobody
daemon
proto tcp-server

An experimental client config file currently looks like this:

# for understanding what openvpn can do...
# simple single server/client setup from https://openvpn.net/index.php/open-source/documentation/miscellaneous/78-static-ke
y-mini-howto.html
# - DrJ 1/6/16
remote drjohnstechtalk.com
# 1994 is default but I'll use this one...
port 2096
dev tun
ifconfig 172.27.28.30 172.27.28.29
secret static.key
# resist failures
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
#use compresison
comp-lzo
# for testing
proto tcp-client
# a test host route
#route 172.16.0.23
# a test network route which includes endpoint
route 172.16.0.0 255.240.0.0
# stands for broad, Internet route which may overlap our route to the openvpn server: doesn't work by itself!
route drjohnstechtalk.com 255.255.255.255 net_gateway
route 50.17.188.0 255.255.255.0

Works through proxy
Amazingly, I can confirm openvpn works through a standard http proxy. This is both cool and a little scary. To the above config file I added something like this:

# use proxy which requires basic authentication
http-proxy proxy-1.johnstechtalk.com 8080 authfile

where authfile is in the same directory as client.conf (/etc/eopnvpn) and has the proxy username and password on two separate lines.

You have to run your server in TCP mode in order to access it from a client that’s behind an http proxy however, hence the proto tcp-server in the config line of the server and the proto tcp-client line in the client config. I’ll experiment with that to see if that’s costing performance.

A little more on routing
Why the circumspect routes? I’m deathly afraid of locking myself out of these servers by implementing the wrong routes! And I learned a lot bootstrapping my way to broader routes. The thing is, I noticed my AWS server uses this private IP address for its DNS server: 172.16.0.23 (cat /etc/resolv.conf, for example). So I realized that I could introduce a host route to that IP on the openvpn client to begin the arduous process of building up and testing real routing. How to test? Simple. With commands like

> dig ns johnstechtalk.com @172.16.0.23

on the client.

Dig is a useful networking tool. I describe installing it on Windows systems in this post.

Point of concern over performance
Even before expanding the routes I am concerned about TCP performance. I noticed that when I add the +tcp option to dig to force the query to use TCP I get a big performance penalty. a regular query that takes 60 msec to the AWS DNS server from my PC takes anywhere from 200 – 400 msec over TCP! Now there are a bunch more packets a back-and-forth, but all that aside and still there is a disconcerting performance hit of 100 msec or so. By contrast the enterprise-class VPN provided by Juniper suffers from no such TCP performance penalty – I know because I tested dig over a Juniper VPN using the JunOS Pulse client.

Up our game, try to run as a full server w/ DHCP and everything
I’ll show the config file below. Let me just mention that my first attempt didn’t work because I continued to use a shared secret (the secret declaration), but that is incompatible with a new statement I introduced on the server:

server 172.27.28.32 255.255.255.224

So I gotta bite the bullet and get my PKI infrastructure up and running. In the old days they bundled easy-rsa with openvpn. Now you have to install it separately. I was able to do a yum install easy-rsa on my CentOS instance.

Hey, easy-rsa is pretty cool – I might be able to use that for other things. I always wanted to know how to create my own CA! The instructions in the Howto are so complete that there’s really no need to go over that here. HowTo link.

I copied my /usr/share/easy-rsa/2.0/* files to /etc/openvpn/rsa and did all the pki-buildingwork there. But I don’t want to go crazy with encyrption so I downgraded diffie-hellman from 2048 to 1024 bits:

$ openssl dhparam -out dh1024.pem 1024

Full Monty – doing a full VPN
Enough warm-up. I tried full VPN and for some reason hit the right configuration on the first try.

Here’s my server.conf file on my Amazon AWS server.

# -DrJ 1/13/16
# multi-client server from https://openvpn.net/index.php/open-source/documentation/howto.html#examples
# static.key generated with openvpn --genkey --secret static.key
# iptables NAT discussion: http://www.karlrupp.net/en/computer/nat_tutorial
# using (simple udp test worked): iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# list: iptables -t nat -L -v
port 1194
proto udp
dev tun
ca rsa/keys/ca.crt
cert rsa/keys/server.crt
key rsa/keys/server.key  # This file should be kept secret
dh rsa/keys/dh1024.pem
# pool of addresses for clients and server. server gets 172.27.28.33
server 172.27.28.32 255.255.255.224
ifconfig-pool-persist ipp.txt
# very experimental
push "redirect-gateway def1 bypass-dhcp"
# just use closest Google DNS server
push "dhcp-option DNS 8.8.8.8"
# resist failures
keepalive 10 60
ping-timer-rem
# use compression
comp-lzo
# only allow two clients max for now
max-clients 2
user nobody
group nobody
persist-key
persist-tun
status openvpn-status.log
# verbosity level. 0 - all but fatal errors, 9 - extremely verbose
verb 3

And here’s my Windows 10 client file.

# from https://openvpn.net/index.php/open-source/documentation/howto.html#examples
# - DrJ 1/17/16
client
dev tun
proto udp
remote drjohnstechtalk.com 1194
nobind
# resist failures
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
#use compresison
comp-lzo
# SSL/TLS parmas
ca ca.crt
cert client1.crt
key client1.key
# stands for broad, Internet route which may overlap our route to the openvpn server: doesn't work by itself!
route drjohnstechtalk.com 255.255.255.255 net_gateway
verb 3

I was a bit concerned the DHCP stuff might not work, but it did, including assigning a Google DNS server at 8.8.8.8.

I generated the client1.crt and client1.key using easy-rsa. These plus ca.crt I copied down to my laptop. For security I then deleted client1.key on my server (so no one else can grab it).

Performance
Performance is amazing. There no longer is the penalty for doing dig over tcp. speedtest.net actually shows better results when I am connected over my VPN! This is a totally unexpected surprise. I guess that’s due to the compression, using UDP and not using overly strong encryption.

Speedtest results
speedtest.net results without VPN, i.e., regular mode: 8.0 mbps download, 1.0 mbps upload. With VPN I got 11.0 mbps download and 1.2 mbps upload.

A platform that didn’t work
You’ll see from my other blogs that I am a Raspberry Pi fan. so naturally I wanted to bring up a VPN client on the Raspberry Pi. I am stuck on this point however because unlike the SLES Linux I tested with, Raspbian brings up a tun interface but then drops the wlan0 IP address so all communication to it is lost. Maybe it would work better wired – I’ll try it and post the results.

Still more about routing
I never understood how the openvpn examples were supposed to work until I had to implement them. Indeed they generally wouldn’t work until you address some routing and NAT issues. There is no magic. I did traces to show my client was trying to communicate with its assigned private IP of 172.27.28.29. Well that’s never going to work. You have to make sure routing is enabled on your server:

On Linux, use the command:
$ echo 1 > /proc/sys/net/ipv4/ip_forward

But that’s not sufficient either. That doesn’t address that the packets from your client have the wrong IP address as far as the outside world is concerned. So you have to NAT (address translate) those packets. I don’t like iptables but this turned out to be easier than expected. As mentioned in my server config file you run:

$ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

This is the boiled down summary from this helpful article.

Don’t forget that to look at your NAT rules you need a command like this:

$ iptables -t nat -L -v

not simply an iptables -L.

Note that this is a hide NAT – I am not making the openvpn client visible as a server on the Internet. That’s a lot harder with IP addresses being in scarce supply.

Does it work to solve my original problem? Well I’m not in a foreign country to run that test, but I can vouch that I can run Amazon Prime through openvpn. So I expect it will work overseas as well.

It works so well how do I know I’m really running through openvpn?

Many ways. For instance my routing table. From a CMD window:

C:> netstat -rn

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0      192.168.2.1      192.168.2.8     25
          0.0.0.0        128.0.0.0     172.27.28.37     172.27.28.38     20
    50.17.188.196  255.255.255.255      192.168.2.1      192.168.2.8     25
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    306
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    306
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    306
        128.0.0.0        128.0.0.0     172.27.28.37     172.27.28.38     20
      169.254.0.0      255.255.0.0         On-link       192.168.2.8    306
  169.254.255.255  255.255.255.255         On-link       192.168.2.8    281
     172.27.28.33  255.255.255.255     172.27.28.37     172.27.28.38     20
     172.27.28.36  255.255.255.252         On-link      172.27.28.38    276
     172.27.28.38  255.255.255.255         On-link      172.27.28.38    276
     172.27.28.39  255.255.255.255         On-link      172.27.28.38    276
      192.168.2.0    255.255.255.0         On-link       192.168.2.8    281
      192.168.2.8  255.255.255.255         On-link       192.168.2.8    281
    192.168.2.255  255.255.255.255         On-link       192.168.2.8    281
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    306
        224.0.0.0        240.0.0.0         On-link       192.168.2.8    281
        224.0.0.0        240.0.0.0         On-link      172.27.28.38    276
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    306
  255.255.255.255  255.255.255.255         On-link       192.168.2.8    281
  255.255.255.255  255.255.255.255         On-link      172.27.28.38    276

Specifically note the two routes to 0.0.0.0 netmask 128.0.0.0 and 128.0.0.0 netmask 128.0.0.0 via gateway 172.27.28.37. That’s just what this experimental command in my server config file was supposed to do:

push "redirect-gateway def1 bypass-dhcp"

namely, create two broad routes to the entire Internet, just slightly more specific than a default route so they take precedence over the default route – I think it’s a clever idea. And of course ipconfig shows my private IP address:

Ethernet adapter Ethernet 2:
 
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::8871:c910:185e:953b%5
   IPv4 Address. . . . . . . . . . . : 172.27.28.38
   Subnet Mask . . . . . . . . . . . : 255.255.255.252
   Default Gateway . . . . . . . . . :

References and related
No patience to roll your own? Five top commercial VPN offerings are described here: http://www.itworld.com/article/3152904/security/top-5-vpn-services-for-personal-privacy-and-security.html
A lightweight dig installation method for Windows is described here.
My article about iptables.

Tags iptables, NAT, openvpn, VPN performance

Network Technologies Python

Tips on using scapy for custom IP packets

Post author By john
Post date December 18, 2015
No Comments on Tips on using scapy for custom IP packets

Intro
scapy is an IP packet customization tool that keeps coming up in my searches so I could no longer avoid it. I was unnecessarily intimidated because it was built around python and the documentation is a little strange. But I’m warming up to it now…

The details
Download and install
CentOS
Just go to scapy.net and it will propose to you to download the .zip file. I got scapy-2.3.1.zip. Then you can unzip it; change directory to the scapy-2.3.1 sub-directory and run

$ sudo python setup.py install

Debian systems such as Raspberry Pi
Simple. It’s just:

$ sudo apt-get install python-scapy

Usage modes
scapy can be called from within python, but if you’re afraid to do that like I am, you can run it from the command line which simply throws you into a python shell. I’m finding that a lot more comfortable as I slowly learn python syntax and some useful shortcuts.

Example 1
The background
Let’s cut to the chase and do something hard first. Remember how we got those Cisco Jabber packets with DSCP set, causing Cisco Jabber to not work for some users? The long-term solution according to that post is to turn off the DSCP flag for all packets on the Internet router. So we want to be able to generate packets under our control with that flag set so we can see if we’ve managed to turn it off correctly.

DSCP value occupies the first 6 bits of the 8-bit tos field. The packets we got from Cisco had DSCP of 0x2e which is Expedited Forwarding (EF), and if you do the math that corresponds to tos of 0xb8 which in decimal is 184.

$ sudo scapy
>>> sr(IP(dst="50.17.188.196",tos=184)/TCP(dport=80,sport=4025))

Begin emission:
....Finished to send 1 packets.
.*
Received 6 packets, got 1 answers, remaining 0 packets
(<Results: TCP:1 UDP:0 ICMP:0 Other:0>, <Unanswered: TCP:0 UDP:0 ICMP:0 Other:0>)
>>>

Instead of the call to sr you can simply use send. Breaking this down, I’m testing against my drjohns server with IP 50.17.188.196. tos is a property of an IP packet so it’s included as a keyword argument to the IP function. The “/” following the IP function is funny syntax but it somehow says that more properties at different layers are coming. So in the TCP section I used keyword arguments and set source port of 4025 and destination port of 80. What I observed is that this will send a SYN packet even though I didn’t explicitly identify that.

Want to have a random source port like “real” packets? Then use this:

$ >>> sr(IP(dst="50.17.188.196",tos=184)/TCP(dport=80,sport=RandShort()))

Look for it
I know tcpdump better so I look for my packet with that tool like this:

$ sudo tcpdump -v -n -i eth0 host 71.2.39.115 and port 80

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
19:33:29.749170 IP (tos 0xb8, ttl 39, id 1, offset 0, flags [none], proto TCP (6), length 44)
    71.2.39.115.partimage > 10.185.21.116.http: Flags [S], cksum 0xd97b (correct), seq 0, win 8192, options [mss 1460], length 0
19:33:29.749217 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 44)
    10.185.21.116.http > 71.2.39.115.partimage: Flags [S.], cksum 0x3e39 (correct), seq 3026513916, ack 1, win 5840, options [mss 1460], length 0
19:33:29.781193 IP (tos 0x0, ttl 41, id 19578, offset 0, flags [DF], proto TCP (6), length 40)

Interpretation
Our tos was wiped clean by the time our generated packet was received by Amazon AWS. This was a packet I sent from my home using my Raspberry Pi. So likely my ISP CenturyLink is removing QOS from packets its residential customers send out. With some ISPs and business class service I have seen the tos field preserved exactly. When sent from Amazon AWS I saw the field value altered, but not set to 0!

Example 2, ping
>>> sr(IP(dst="8.8.8.8")/ICMP())

Begin emission:
Finished to send 1 packets.
.*
Received 2 packets, got 1 answers, remaining 0 packets
(<Results: TCP:0 UDP:0 ICMP:1 Other:0>, <Unanswered: TCP:0 UDP:0 ICMP:0 Other:0>)

Getting info on return packet
$ >>> sr1(IP(dst="drjohnstechtalk.com",tos=184)/TCP(dport=80,sport=RandShort()))

Begin emission:
...............................................................................................Finished to send 1 packets.
...........................................*
Received 139 packets, got 1 answers, remaining 0 packets
<IP  version=4L ihl=5L tos=0x0 len=44 id=0 flags=DF frag=0L ttl=25 proto=tcp chksum=0xe1d7 
src=50.17.188.196 dst=144.29.1.2 options=[] |<TCP  sport=http dport=17176 seq=3590570804 ack=1 dataofs=6L reserved=0L flags=SA window=5840 chksum=0x24b0 urgptr=0 options=[('MSS', 1460)] |<Padding  load='\x00\x00' |>>>

Note that this tells me about the return packet, which is a SYN ACK. So it tells me my SYN packet must have been sent from port 17176 (it changes every time because I’ve included sport=RandShort()). Each “.” in the response indicates a packet hitting the interface. I guess it’s promiscuously listening on the interface.

Hitting a closed port
$ >>> sr1(IP(dst="drjohnstechtalk.com",tos=184)/TCP(dport=81,sport=RandShort()))

Begin emission:
....................................................................................
.........................................................................................................Finished to send 1 packets.
...............................................................................
.................................................................................
.........................................................................................

Basically those dots are going to keep going forever until you type -C, because there will be no return packet if something like a firewall is dropping your packet, or the returned packet.

Useful shortcuts
The scapy commands look pretty daunting at first, right? And too much trouble to type in, right? Just get it right once and you’re set. In typical networking debugging you’ll be running such test packets multiple times. Because it’s basically a python shell, you can use the up arrow key to recall the previous thing, or hit it multiple times to scroll through your previously typed commands. And even if you exit and return, it still remembers your command history so you can hit the up-arrow to get back to your commands from previous sesisons and previous days.

References and related
This scapy for dummies guide is very well written.
I’m finding this python tutorial really helpful.
DSCP and explanation of Cisco Jabber not working is described here.
A simpler tool which is fine for most things is nmap. I provide some real-world examples in this blog post.

Tags DSCP, scapy

Admin Network Technologies Proxy Security TCP/IP Web Site Technologies

The IT Detective Agency: Cisco Jabber stopped working for some using WAN connections

Post author By john
Post date December 4, 2015
No Comments on The IT Detective Agency: Cisco Jabber stopped working for some using WAN connections

Intro
This is probably the hardest case I’ve ever encountered. It’s so complicated many people needed to get involved to contribute to the solution.

Initial symptoms
It’s not easy to describe the problem while providing appropriate obfuscation. Over the course of a few days it came to light that in this particular large company for which I consult many people in office locations connected via an MPLS network were no longer able to log in to Cisco Jabber. That’s Cisco’s offering for Instant Messaging. When it works and used in combination with Cisco IP phones it’s pretty good – has some nice features. This major problem was first reported November 17th.

Knee-jerk reactions
Networking problem? No. Network guys say their networks are running fine. They may be a tad overloaded but they are planning to route Internet over the secondary links so all will be good in a few days.
Proxy problem? Nope. proxy guys say their Bluecoat appliances are running fine and besides everyone else is working.
Application problem? Application owner doesn’t see anything out of the ordinary.
Desktop problem? Maybe but it’s unclear.

Methodology
So of the 50+ users affected I recognized two power users that I knew personally and focussed on them. Over the course of days I learned:
– problem only occurs for WAN (MPLS) users
– problem only occurs when using one particular proxy
– if a user tries to connect often enough, they may eventually get in
– users can get in if they use their VPN client
– users at HQ were not affected

The application owner helpfully pointed out the URL for the web-based version of Cisco Jabber: https://loginp.webexconnect.com/… Anyone with the problem also could not log in to this site.

So working with these power users who patiently put up with many test suggestions we learned:

– setting the PC’s MTU to a small value, say 512 up to 696 made it work. Higher than that it generally failed.
– yet pings of up to 1500 bytes went through OK.
– the trace from one guy’s PC showed all his packets re-transmitted. We still don’t understand that.
– It’s a mess of communications to try to understand these modern, encrypted applications
– even the simplest trace contained over 1000 lines which is tough when you don’t know what you’re looking for!
– the helpful networking guy from the telecom company – let’s call him “Regal” – worked with us but all the while declaring how it’s impossible that it’s a networking issue
– proxy logs didn’t show any particular problem, but then again they cannot look into SSL communication since it is encrypted
– disabling Kaspersky helped some people but not others
– a PC with the problem had no problem when put onto the Internet directly
– if one proxy associated with the problem forwarded the requests to another, then it begins to work
– Is the problem reproducible? Yes, about 99% of the time.
– Do other web sites work from this PC? Yes.

From previous posts you will know that at some point I will treat every problem as a potential networking problem and insist on a trace.

Biases going in
So my philosophy of problem solving which had stood the test of time is either it’s a networking problem, or it’s a problem on the PC. Best is if there’s a competition of ideas in debugging so that the PC/application people seek to prove beyond a doubt it is a networking problem and the networking people likewise try to prove problem occurs on the PC. Only later did I realize the bias in this approach and that a third possibility existed.

So I enthused: what we need is a non-company PC – preferably on the same hardware – at the same IP address to see if there’s a problem. Well we couldn’t quite produce that but one power user suggested using a VM. He just happened to have a VM environment on his PC and could spin up a Windows 7 Professional generic image! So we do that – it shows the problem. But at least the trace form it is a lot cleaner without all the overhead of the company packages’ communication.

The hard work
So we do the heavy lifting and take a trace on both his VM with the problem and the proxy server and sit down to compare the two. My hope was to find a dropped packet, blame the network and let those guys figure it out. And I found it. After the client hello (this is a part of the initial SSL protocol) the server responds with its server hello. That packet – a largeish packet of 1414 bytes – was not coming through to the client! It gets re-transmitted multiple times and none of the re-transmits gets through to the PC. Instead the PC receives a packet the proxy never sent it which indicates a fatal SSL error has occurred.

So I tell Regal that look there’s a problem with these packets. Meanwhile Regal has just gotten a new PC and doesn’t even have Wireshark. Can you imagine such a world? It seems all he really has is his tongue and the ability to read a few emails. And he’s not convinced! He reasons after all that the network has no intelligent, application-level devices and certainly wouldn’t single out Jabber communication to be dropped while keeping everything else. I am no desktop expert so I admit that maybe some application on the PC could have done this to the packets, in effect admitting that packets could be intercepted and altered by the PC even before being recorded by Wireshark. After all I repeated this mantra many times throughout:

This explanation xyz is unlikely, and in fact any explanation we can conceive of is unlikely, yet one of them will prove to be correct in the end.

Meanwhile the problem wasn’t going away so I kludged their proxy PAC file to send everyone using jabber to the one proxy where it worked for all.

So what we really needed was to create a span port on the switch where the PC was plugged in and connect a 2nd PC to a port enabled in promiscuous mode with that mirrored traffic. That’s quite a lot of setup and we were almost there when our power user began to work so we couldn’t reproduce the problem. That was about Dec 1st. Then our 2nd power user also fell through and could no longer reproduce the problem either a day later.

10,000 foot view
What we had so far is a whole bunch of contradictory evidence. Network? Desktop? We still could not say due to the contradictions, the likes of which I’ve never witnessed.

Affiliates affected and find the problem
Meanwhile an affiliate began to see the problem and independently examined it. They made much faster progress than we did. Within a day they found the reason (suggested by their networking person from the telecom, who apparently is much better than ours): the server hello packet has the expedited forwarding (EF) flag set in the differentiated code services point (DSCP) section of the IP header.

Say what?
So I really got schooled on this one. I was saying It has to be an application-aware “something” on the network or PC that is purposefully messing around with the SSL communication. That’s what the evidence screamed to me. So a PC-based firewall seemed a strong contender and that is how Regal was thinking.

So the affiliate explained it this way: the company uses QOS on their routers. Phone (VOIP) gets priority and is the only application where the EF bit is expected to be set. VOIP packets are small, by the way. Regular applications like web sites should just use the default QOS. And according to Wikipedia, many organizations who do use QOS will impose thresholds on the EF pakcets such that if the traffic exceeds say 30% of link capacity drop all packets with EF set that are over a certain size. OK, maybe it doesn’t say that, but that is what I’ve come to understand happens. Which makes the dropping of these particular packets the correct behaviour as per the company’s own WAN contract and design. Imagine that!

Smoking gun no more
So now my smoking gun – blame it on the network for dropped packets – is turned on its head. Cisco has set this EF bit on its server hello response on the loginp.webexconnect.com web site. This is undesirable behaviour. It’s not a phone call after all which requires a minimum jitter in packet timing.

So next time I did a trace I found that instead of EF flag being set, the AF (Assured Forwarding) flag was set. I suppose that will make handling more forgiving inside the company’s network, but I was told that even that was too much. Only default value of 0 should be set for the DSCP value. This is an open issue in Cisco’s hands now.

But at least this explains most observations. Small MTU worked? Yup, those packets are looked upon more favorably by the routers. One proxy worked, the other did not? Yup, they are in different data centers which have different bandwidth utilization. The one where it was not working has higher utilization. Only affected users are at WAN sites? Yup, probably only the WAN routers are enforcing QOS. Worked over VPN, even on a PC showing the problem? Yup – all VPN users use a LAN connection for their proxy settings. Fabricated SSL fatal error packet? I’m still not sure about that one – guess the router sent it as a courtesy after it decided to drop the server hello – just a guess. Problem fixed by shutting down Kaspersky? Nope, guess that was a red herring. Every problem has dead ends and red herrings, just a fact of life. And anyway that behaviour was not very consistent. Problem started November 17th? Yup, the affiliate just happened to have a baseline packet trace from November 2nd which showed that DSCP was not in use at that time. So Cisco definitely changed the behaviour of Cisco Jabber sometime in the intervening weeks. Other web sites worked, except this one? Yup, other web sites do not use the DSCP section of the IP header so it has the default value of 0.

Conclusion
Cisco has decided to remove the DSCP flag from these packets, which will fix everything. Perhaps EF was introduced in support of Cisco Jabber’s extended use as a soft phone??? Then this company may have some re-design of their QOS to take care of because I don’t see an easy solution. Dropping the MTU on the proxy to 512 seems pretty drastic and inefficient, though it would be possible. My reading of TCP is that nothing prevents QOS from being set on any sort of TCP packet even though there may be a gentleman’s agreement to not ordinarily do so in all except VOIP packets or a few other special classes. I don’t know. I’ve really never looked at QOS before this problem came along.

The company is wisely looking for a way to set all packets with DSCP = 0 on the Intranet, except of course those like VOIP where it is explicitly supposed to be used. This will be done on the Internet router. In Cisco IOS it is possible with a policy map and police setting where you can set set-dscp-transmit default. Apparently VPN and other things that may check the integrity of packets won’t mind the DSCP value being altered – it can happen anywhere along the route of the packet.

Boy applications these days are complicated! And those rare times when they go wrong really require a bunch of cooperating experts to figure things out. No one person holds all the expertise any longer.

My simplistic paradigm of its either the PC or the network had to make room for a new reality: it’s the web site in the cloud that did them in.

Could other web sites be similarly affected? Yes it certainly seems a possibility. So I now know to check for use of DSCP if a particular web site is not working, but all others are.

References and related
This Wikipedia article is a good description of DSCP: https://en.wikipedia.org/wiki/Differentiated_services

Admin Network Technologies

The IT Detective agency: bad PING times explained

Post author By john
Post date September 14, 2015
No Comments on The IT Detective agency: bad PING times explained

Intro
In a complicated corporate environment somewhat unusual problems can be made extremely difficult to debug as there may be many technicians involved, each one knowing just their piece of the infrastructure. Such was the case when i consulted for a problem in which a company reported slow Internet response for users in Asia.

The details
In preparation for an evening call I managed to get enough access to be able to log into the proxy server their Asian users use. Just issuing regular commands was slow, often hanging for many seconds.

The call
So we had the call at 9 PM. These things are always pretty amazing in the sense of How do corporations ever get anything done when they’ve outsourced so much? So there was a representative from the firewall team from Germany, representative from the telecom in the US, a couple representatives from the same telecom but stationed in Asia, an employee who oversees the telecom vendor in the US and me, representing the proxy service normally handled by a group in Europe. The common language was English, of course, though that doesn’t mean that everyone was easy to understand for us native speakers. No one on the call had any real familiarity with the infrastructure. We were all essentially reverse engineering it from a diagram the telecom produced.

The firewall guy and I both noticed that PINGs from the proxy to its gateway (which was a firewall) took 50 msec. I’ve never seen that. Same for another piece of equipment using that same firewall interface. The telecom, which was responsible for the switches, could not actually log in to all of them. Only the firewall guy could reach a few of them. And he eventually figured out the diagram we were all using was somewhat wrong and different switches were in use for some of the equipment than indicated. Which was important because we wanted to check the interface status on the switch.

So imagine this. The guy with no access to anything, the vendor overseer in the US, patiently asks for the results of a few commands to be shared (by email) with the group. He gets the results of show interface status and identifies one port as looking off. It’s listed as 100 mbit instead of 1 gig. In addition to the strange PNIG times, when we PINGed from equipment in the US, packet loss rate varied frmo between 4 – 15%. Pretty high in other words. They try to reset the port, hard-code both sides, but nothing works. And this is the port that the firewall is connected to.

We finally switch to the backup firewall. This destroys the routing and so that has to be fixed. But finally it is and suddenly response is much better and PINGing the gateway from the proxy is at the expected < 1 msec. Not content to leave it at that, they persist to fix the broken port. Thy reason that the most likely problem after the test results are in is a bad cable. He explains that in 1 gig communication all 8 wires have to be good. If just one breaks you can't run 1 gig. Now they have to figure out who has access to this 3rd-party data center where the equipment is hosted! They finally identify an employee with access and get him to come with different cable lengths (of course no one knows the layout to actually know how long the cable is or how close the equipment is). The cable is replaced and both sides come up 1 gig auto-negotiated! They reverted back to the primary firewall. So in the end the employee without access to anything figured this out. Amazing. The intense activity on this problem lasted from 9 PM to about 4 AM the following morning. The history
Actually it would be a pretty decent turn-around by this company’s standards if the problem had been resolved in seven hours. But actually it had been ongoing for a couple weeks beforehand. It seemed that the total data usage was capped at 100 mbits by that bad cable and so it wasn’t a total outage or totally obvious where to look.

Case closed!

Conclusion
I think a lot of people on the call had the expertise to solve the problem, and much more quickly than it was solved. But no one had sufficient access to do debugging his own way and needed cooperation of others. The telecom who owns and manages the switches particularly disappointed in their performance. Not in the individuals, who seemed to be competent, but in their processes which permitted faulty and incomplete documentation, as well as lack of familiarity with the particular infrastructure – like you’ve just hired a smart network technician and communicated nothing about what he is supporting.
Keep good people on staff! Give them as much access as possible.