Categories
Network Technologies Raspberry Pi

Dig for Windows or Raspberry Pi

Windows

Intro
Dig is a really useful networking tool. I use it several times a day, but always on Linux, where it’s usually built in. On Raspberry Pi’s Raspbian you can install it with a simple apt-get install dnsutils. Then I learned it isn’t hard at all to install on Windows, especially as a fairly minimalist installation that just puts files on your PC and makes no changes to the Registry, which is all you really need for light use.

The details
Go to http://www.isc.org/downloads/. Expand BIND.
Click download button for the current stable release.
Pick the win-64-bit link (because chances are you’re running Windows 64 bit these days) and wait for download to complete.
Open up zip file.
Unzip or extract all files to (this is my suggestion) c:\apps\bind.

To run it
Open a command window. Probably the easiest way is to hold down the Windows key + r and type in cmd. In the CMD window simply type \apps\bind\dig to run dig like you do on Linux.

Example commands
Example 1, Resolve address for google.com

C:\> \apps\bind\dig google.com

; <<>> DiG 9.9.8-P2 <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24929
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1
 
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.                    IN      A
 
;; ANSWER SECTION:
google.com.             88      IN      A       173.194.207.113
google.com.             88      IN      A       173.194.207.139
google.com.             88      IN      A       173.194.207.138
google.com.             88      IN      A       173.194.207.101
google.com.             88      IN      A       173.194.207.102
google.com.             88      IN      A       173.194.207.100
 
;; Query time: 41 msec
;; SERVER: 192.168.2.1#53(192.168.2.1)
;; WHEN: Mon Jan 11 12:16:17 Eastern Standard Time 2016
;; MSG SIZE  rcvd: 135

This gives all kinds of useful information – what your default DNS server is (at the bottom – mine is 192.168.2.1), how long the query took (this one: 41 msec), whether the answer is authoritative or not (no AA flag here, so this is not an authoritative answer), as well as the answer to the question posed.
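If you want to pull those same fields out of dig's output in a script, a minimal Python sketch might look like the following. The SAMPLE text is a trimmed copy of the output above; the summarize_dig function and its dictionary keys are hypothetical names made up for this illustration, not part of dig or BIND.

```python
import re

# Trimmed copy of the Example 1 output shown above.
SAMPLE = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24929
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1
;; Query time: 41 msec
;; SERVER: 192.168.2.1#53(192.168.2.1)
"""

def summarize_dig(text):
    """Pull status, flags, query time and server out of dig's comment lines."""
    status = re.search(r"status: (\w+)", text).group(1)
    flags = re.search(r";; flags: ([a-z ]+);", text).group(1).split()
    msec = int(re.search(r";; Query time: (\d+) msec", text).group(1))
    server = re.search(r";; SERVER: ([^#\s]+)#(\d+)", text).group(1)
    return {
        "status": status,
        "flags": flags,
        "authoritative": "aa" in flags,  # the aa flag marks an authoritative answer
        "query_msec": msec,
        "server": server,
    }

summary = summarize_dig(SAMPLE)
print(summary)
```

The regular expressions assume the output layout shown in this post (DiG 9.9.8); other versions may format these lines differently.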

Example 2, Resolve nameserver records for the domain amazon.com using Google’s DNS server 8.8.8.8 over TCP from our local IP address of 192.168.2.3

We started out slow, but this example throws the kitchen sink at you to show the power of dig!

C:\> \apps\bind\dig +tcp -b 192.168.2.3 ns amazon.com @8.8.8.8

; <<>> DiG 9.9.8-P2 <<>> +tcp -b 192.168.2.3 ns amazon.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64444
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1
 
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;amazon.com.                    IN      NS
 
;; ANSWER SECTION:
amazon.com.             3599    IN      NS      ns3.p31.dynect.net.
amazon.com.             3599    IN      NS      ns4.p31.dynect.net.
amazon.com.             3599    IN      NS      ns1.p31.dynect.net.
amazon.com.             3599    IN      NS      pdns1.ultradns.net.
amazon.com.             3599    IN      NS      pdns6.ultradns.co.uk.
amazon.com.             3599    IN      NS      ns2.p31.dynect.net.
 
;; Query time: 50 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon Jan 11 12:27:26 Eastern Standard Time 2016
;; MSG SIZE  rcvd: 188

The only problem is that I don’t think the TCP option actually worked – I gotta run Wireshark to verify. On Linux it definitely works! Not sure what’s wrong with Windows. But the other options are working as designed.

OK, the Wireshark install is failing, but I ran tcpdump on a DNS server I run and confirmed that the +tcp option is indeed working, forcing dig to use TCP for those queries.
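For the curious, here is a rough Python sketch of what changes on the wire when a query goes over TCP instead of UDP: the DNS message itself is the same, but over TCP it is preceded by a two-byte length field. The build_query and frame_for_tcp helpers are hypothetical names for this illustration, and the query is deliberately bare (no EDNS record, unlike what dig sends), so treat it as a sketch rather than a replica of dig's packets.

```python
import struct

def build_query(name, qtype=2, query_id=0x1234):
    """Build a minimal DNS query message (qtype 2 = NS, class 1 = IN)."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def frame_for_tcp(message):
    """Over TCP, a DNS message is preceded by a two-byte length field."""
    return struct.pack("!H", len(message)) + message

udp_payload = build_query("amazon.com")
tcp_payload = frame_for_tcp(udp_payload)
print(len(udp_payload), len(tcp_payload))
```

On top of the two extra bytes, TCP also needs its handshake before the query can be sent, which is worth keeping in mind when comparing query times.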

Raspberry Pi

I believe you do

$ sudo apt-get install bind9-dnsutils

At least on a generic Debian system that works. I have to confirm on RPi still.

Conclusion
We’ve demonstrated a low-impact way to install dig for Windows and shown some examples of using it.

References and related

Current BIND link from ISC: https://downloads.isc.org/isc/bind9/9.16.8/BIND9.16.8.x64.zip

Or…you can get dig through a Cygwin installation. I’ve written about Cygwin here: Cygwin. Or just go to cygwin.com.

Categories
Network Technologies TCP/IP

Spin up your own VPN with OpenVPN

Intro
I recently visited a foreign country where I was unable to watch an Amazon Prime Original show because of my location. Annoyed, I decided then and there to investigate OpenVPN. I am an ideal candidate – I already run my own Linux server in the Amazon cloud, and I know Linux and networking, so I’ve pretty much got all the ingredients already present. The software is free and I will incur no additional cost if I ever do get it working since I already pay for my server which is primarily used as a low-demand web server. Little did I know what I was getting myself into!

The details
I seem to have made every mistake in the book, but I have a strong grasp of the networking involved so I was confident in my general approach. Seeing how configurable the software is, I decided to ramp up my efforts incrementally, introducing more and more features until I arrived at the minimum desired (still not there, by the way, but getting close!).

The package on various OSes
Conveniently, openvpn is an installable package on SLES and CentOS. On SLES a zypper install openvpn suffices whereas on CentOS a yum install openvpn does the trick. On Debian-based Linux such as Raspbian, an apt-get install openvpn works to install it.

Then look at the examples at the bottom of the man page for openvpn. I picked two servers where I have root and tried to replicate their most basic example. I think it is really important to crawl before you run. To paraphrase it:

Example 1: A simple tunnel without security
       On may:
 
              openvpn --remote june.kg --dev tun1 --ifconfig 10.4.0.1 10.4.0.2 --verb 9
 
       On june:
 
              openvpn --remote may.kg --dev tun1 --ifconfig 10.4.0.2 10.4.0.1 --verb 9
 
       Now verify the tunnel is working by pinging across the tunnel.
 
       On may:
 
              ping 10.4.0.2
 
       On june:
 
              ping 10.4.0.1

june.kg and may.kg are the IP addresses or FQDNs of the june and may server, respectively.

If you installed from a package you probably don’t need these preliminary steps they mention:

              mknod /dev/net/tun c 10 200
 
              modprobe tun

I checked it by a directory listing of /dev/net/tun:

crw-rw-rw- 1 root root 10, 200 Jan  6 08:35 /dev/net/tun

and running modprobe tun for good measure.

And, yes, their basic example worked. How? The command creates a virtual adapter, tun0, specially designed for tunnels, which records the private tunnel IP of the server itself as well as the IP of the tunnel endpoint on the other server.

It’s so cool. What’s not well explained is that you ought to choose completely private IPs for building your tunnels that don’t interfere with your other private IPs. You’ll see I eventually settled on 172.27.28.29 – I’ve never seen that range used anywhere.
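One way to sanity-check a candidate tunnel address is with Python's ipaddress module. This is only a sketch: the tunnel IPs are the ones from this post, while the two "existing" networks are example values you would replace with your own, and tunnel_ip_ok is a hypothetical helper name.

```python
import ipaddress

# Tunnel endpoints from this post.
tunnel_ips = [ipaddress.ip_address("172.27.28.29"),
              ipaddress.ip_address("172.27.28.30")]

# Networks already in use. Both are example values (assumptions): substitute your own.
existing = [ipaddress.ip_network("192.168.2.0/24"),  # a typical home LAN
            ipaddress.ip_network("172.16.0.0/24")]   # a private network on the server side

def tunnel_ip_ok(ip, networks):
    """True if ip is private (per ipaddress.is_private) and outside every listed network."""
    return ip.is_private and not any(ip in net for net in networks)

for ip in tunnel_ips:
    print(ip, "ok" if tunnel_ip_ok(ip, existing) else "conflict")
```

Note that is_private covers more than the three RFC 1918 blocks (loopback and link-local addresses also count), so it is a first filter rather than a full check.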

You quit running the openvpn command and the tun0 adapter is destroyed. It’s very tidy.

A small word about routing for now
What about routing? How does that work? What you probably didn’t appreciate is that in this simple example, although the two servers can ping each other using the tunnel IPs, you really can’t do any more than that unless you start to introduce additional routes. So as is, it’s a long way from where we need to be. More on routing later. Check your route table with a netstat -rn.

Finding the right example is surprisingly hard
For such a popular program you’d think examples of what I’m trying to do – change the effective IP of my laptop – would abound and implementation would be a piece of cake. But alas, nothing could be further from the truth. I’ve yet to see a complete example so I cobbled together things from various places.

You gotta understand that for a lot of people with enough tech-savvy to write up what they did, I guess they were just tickled pink to be able to tunnel through their home router and access their home networks. That’s fine and all, but it’s not all that pertinent to my usage, where the routing considerations are pretty different.

Anywho, this article is really helpful, though not sufficient by itself:

https://openvpn.net/index.php/open-source/documentation/miscellaneous/78-static-key-mini-howto.html

That describes a single server/client setup with simple security.

Server config
After a lot of debugging and incremental steps, I’m currently using this file with filepath /etc/openvpn/server.conf on my Amazon AWS server:

# -DrJ 1/5/16
# simple one server/one client setup...
# https://openvpn.net/index.php/open-source/documentation/miscellaneous/78-static-key-mini-howto.html
# static.key generated with openvpn --genkey --secret static.key
# iptables NAT discussion: http://www.karlrupp.net/en/computer/nat_tutorial
# using (simple udp test worked): iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# list: iptables -t nat -L -v
# dig on RasPi: from dnsutils
dev tun
# 1194 is default, but...
port 2096
ifconfig 172.27.28.29 172.27.28.30
secret static.key
# use compression
comp-lzo
# resist failures
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
# run as daemon
user nobody
group nobody
daemon
proto tcp-server

An experimental client config file currently looks like this:

# for understanding what openvpn can do...
# simple single server/client setup from https://openvpn.net/index.php/open-source/documentation/miscellaneous/78-static-key-mini-howto.html
# - DrJ 1/6/16
remote drjohnstechtalk.com
# 1194 is default but I'll use this one...
port 2096
dev tun
ifconfig 172.27.28.30 172.27.28.29
secret static.key
# resist failures
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
# use compression
comp-lzo
# for testing
proto tcp-client
# a test host route
#route 172.16.0.23
# a test network route which includes endpoint
route 172.16.0.0 255.240.0.0
# stands for broad, Internet route which may overlap our route to the openvpn server: doesn't work by itself!
route drjohnstechtalk.com 255.255.255.255 net_gateway
route 50.17.188.0 255.255.255.0

Works through proxy
Amazingly, I can confirm openvpn works through a standard http proxy. This is both cool and a little scary. To the above config file I added something like this:

# use proxy which requires basic authentication
http-proxy proxy-1.johnstechtalk.com 8080 authfile

where authfile is in the same directory as client.conf (/etc/openvpn) and has the proxy username and password on two separate lines.

You have to run your server in TCP mode in order to access it from a client that’s behind an http proxy, however, hence the proto tcp-server line in the server config and the proto tcp-client line in the client config. I’ll experiment with that to see if that’s costing performance.

A little more on routing
Why the circumspect routes? I’m deathly afraid of locking myself out of these servers by implementing the wrong routes! And I learned a lot bootstrapping my way to broader routes. The thing is, I noticed my AWS server uses this private IP address for its DNS server: 172.16.0.23 (cat /etc/resolv.conf, for example). So I realized that I could introduce a host route to that IP on the openvpn client to begin the arduous process of building up and testing real routing. How to test? Simple. With commands like

> dig ns johnstechtalk.com @172.16.0.23

on the client.

Dig is a useful networking tool. I describe installing it on Windows systems in this post.

Point of concern over performance
Even before expanding the routes I am concerned about TCP performance. I noticed that when I add the +tcp option to dig to force the query to use TCP I get a big performance penalty. A regular query that takes 60 msec to the AWS DNS server from my PC takes anywhere from 200 – 400 msec over TCP! Now there are a bunch more packets in the back-and-forth, but even with all that set aside there is still a disconcerting performance hit of 100 msec or so. By contrast the enterprise-class VPN provided by Juniper suffers from no such TCP performance penalty – I know because I tested dig over a Juniper VPN using the JunOS Pulse client.

Up our game, try to run as a full server w/ DHCP and everything
I’ll show the config file below. Let me just mention that my first attempt didn’t work because I continued to use a shared secret (the secret declaration), but that is incompatible with a new statement I introduced on the server:

server 172.27.28.32 255.255.255.224

So I gotta bite the bullet and get my PKI infrastructure up and running. In the old days they bundled easy-rsa with openvpn. Now you have to install it separately. I was able to do a yum install easy-rsa on my CentOS instance.

Hey, easy-rsa is pretty cool – I might be able to use that for other things. I always wanted to know how to create my own CA! The instructions in the Howto are so complete that there’s really no need to go over that here. HowTo link.

I copied my /usr/share/easy-rsa/2.0/* files to /etc/openvpn/rsa and did all the PKI-building work there. But I don’t want to go crazy with encryption so I downgraded Diffie-Hellman from 2048 to 1024 bits:

$ openssl dhparam -out dh1024.pem 1024

Full Monty – doing a full VPN
Enough warm-up. I tried full VPN and for some reason hit the right configuration on the first try.

Here’s my server.conf file on my Amazon AWS server.

# -DrJ 1/13/16
# multi-client server from https://openvpn.net/index.php/open-source/documentation/howto.html#examples
# static.key generated with openvpn --genkey --secret static.key
# iptables NAT discussion: http://www.karlrupp.net/en/computer/nat_tutorial
# using (simple udp test worked): iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# list: iptables -t nat -L -v
port 1194
proto udp
dev tun
ca rsa/keys/ca.crt
cert rsa/keys/server.crt
key rsa/keys/server.key  # This file should be kept secret
dh rsa/keys/dh1024.pem
# pool of addresses for clients and server. server gets 172.27.28.33
server 172.27.28.32 255.255.255.224
ifconfig-pool-persist ipp.txt
# very experimental
push "redirect-gateway def1 bypass-dhcp"
# just use closest Google DNS server
push "dhcp-option DNS 8.8.8.8"
# resist failures
keepalive 10 60
ping-timer-rem
# use compression
comp-lzo
# only allow two clients max for now
max-clients 2
user nobody
group nobody
persist-key
persist-tun
status openvpn-status.log
# verbosity level. 0 - all but fatal errors, 9 - extremely verbose
verb 3

And here’s my Windows 10 client file.

# from https://openvpn.net/index.php/open-source/documentation/howto.html#examples
# - DrJ 1/17/16
client
dev tun
proto udp
remote drjohnstechtalk.com 1194
nobind
# resist failures
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
# use compression
comp-lzo
# SSL/TLS params
ca ca.crt
cert client1.crt
key client1.key
# stands for broad, Internet route which may overlap our route to the openvpn server: doesn't work by itself!
route drjohnstechtalk.com 255.255.255.255 net_gateway
verb 3

I was a bit concerned the DHCP stuff might not work, but it did, including assigning a Google DNS server at 8.8.8.8.
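To see what address pool that server line in the config actually defines, here is a small Python sketch using the standard ipaddress module. The only input is the network and netmask from the server config above; the variable names are just for illustration.

```python
import ipaddress

# "server 172.27.28.32 255.255.255.224" from the server config above.
pool = ipaddress.ip_network("172.27.28.32/255.255.255.224")

hosts = list(pool.hosts())
print("prefix length:", pool.prefixlen)
print("usable addresses:", len(hosts))
print("first usable (the server's own):", hosts[0])
print("last usable:", hosts[-1])

# The client address that shows up in the ipconfig output later in this post.
print(ipaddress.ip_address("172.27.28.38") in pool)
```

The first usable address matches the comment in the server config saying the server gets 172.27.28.33.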

I generated the client1.crt and client1.key using easy-rsa. These plus ca.crt I copied down to my laptop. For security I then deleted client1.key on my server (so no one else can grab it).

Performance
Performance is amazing. There no longer is the penalty for doing dig over tcp. speedtest.net actually shows better results when I am connected over my VPN! This is a totally unexpected surprise. I guess that’s due to the compression, using UDP and not using overly strong encryption.

Speedtest results
speedtest.net results without VPN, i.e., regular mode: 8.0 Mbps download, 1.0 Mbps upload. With VPN I got 11.0 Mbps download and 1.2 Mbps upload.

A platform that didn’t work
You’ll see from my other blogs that I am a Raspberry Pi fan, so naturally I wanted to bring up a VPN client on the Raspberry Pi. I am stuck on this point, however, because unlike the SLES Linux I tested with, Raspbian brings up a tun interface but then drops the wlan0 IP address so all communication to it is lost. Maybe it would work better wired – I’ll try it and post the results.

Still more about routing
I never understood how the openvpn examples were supposed to work until I had to implement them. Indeed they generally wouldn’t work until you address some routing and NAT issues. There is no magic. I did traces to show my client was trying to communicate with its assigned private IP of 172.27.28.29. Well that’s never going to work. You have to make sure routing is enabled on your server:

On Linux, use the command:
$ echo 1 > /proc/sys/net/ipv4/ip_forward

But that’s not sufficient either. That doesn’t address that the packets from your client have the wrong IP address as far as the outside world is concerned. So you have to NAT (address translate) those packets. I don’t like iptables but this turned out to be easier than expected. As mentioned in my server config file you run:

$ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

This is the boiled down summary from this helpful article.

Don’t forget that to look at your NAT rules you need a command like this:

$ iptables -t nat -L -v

not simply an iptables -L.

Note that this is a hide NAT – I am not making the openvpn client visible as a server on the Internet. That’s a lot harder with IP addresses being in scarce supply.

Does it work to solve my original problem? Well I’m not in a foreign country to run that test, but I can vouch that I can run Amazon Prime through openvpn. So I expect it will work overseas as well.

It works so well how do I know I’m really running through openvpn?

Many ways. For instance my routing table. From a CMD window:

C:\> netstat -rn

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0      192.168.2.1      192.168.2.8     25
          0.0.0.0        128.0.0.0     172.27.28.37     172.27.28.38     20
    50.17.188.196  255.255.255.255      192.168.2.1      192.168.2.8     25
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    306
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    306
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    306
        128.0.0.0        128.0.0.0     172.27.28.37     172.27.28.38     20
      169.254.0.0      255.255.0.0         On-link       192.168.2.8    306
  169.254.255.255  255.255.255.255         On-link       192.168.2.8    281
     172.27.28.33  255.255.255.255     172.27.28.37     172.27.28.38     20
     172.27.28.36  255.255.255.252         On-link      172.27.28.38    276
     172.27.28.38  255.255.255.255         On-link      172.27.28.38    276
     172.27.28.39  255.255.255.255         On-link      172.27.28.38    276
      192.168.2.0    255.255.255.0         On-link       192.168.2.8    281
      192.168.2.8  255.255.255.255         On-link       192.168.2.8    281
    192.168.2.255  255.255.255.255         On-link       192.168.2.8    281
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    306
        224.0.0.0        240.0.0.0         On-link       192.168.2.8    281
        224.0.0.0        240.0.0.0         On-link      172.27.28.38    276
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    306
  255.255.255.255  255.255.255.255         On-link       192.168.2.8    281
  255.255.255.255  255.255.255.255         On-link      172.27.28.38    276

Specifically note the two routes to 0.0.0.0 netmask 128.0.0.0 and 128.0.0.0 netmask 128.0.0.0 via gateway 172.27.28.37. That’s just what this experimental command in my server config file was supposed to do:

push "redirect-gateway def1 bypass-dhcp"

namely, create two broad routes to the entire Internet, just slightly more specific than a default route so they take precedence over the default route – I think it’s a clever idea. And of course ipconfig shows my private IP address:

Ethernet adapter Ethernet 2:
 
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::8871:c910:185e:953b%5
   IPv4 Address. . . . . . . . . . . : 172.27.28.38
   Subnet Mask . . . . . . . . . . . : 255.255.255.252
   Default Gateway . . . . . . . . . :
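To make the "slightly more specific than a default route" idea concrete, here is a Python sketch of longest-prefix matching over a few of the routes from the table above. The pick_route helper is a hypothetical name, and the sketch ignores metrics, which a real Windows route lookup also takes into account, so it is a simplification.

```python
import ipaddress

# (destination network, gateway) pairs taken from the route table above.
routes = [
    ("0.0.0.0/0", "192.168.2.1"),         # original default route
    ("0.0.0.0/1", "172.27.28.37"),        # pushed by redirect-gateway def1
    ("128.0.0.0/1", "172.27.28.37"),      # pushed by redirect-gateway def1
    ("50.17.188.196/32", "192.168.2.1"),  # host route to the VPN server itself
    ("192.168.2.0/24", "On-link"),        # local LAN
]

def pick_route(dest, table):
    """Return the (network, gateway) whose prefix is the longest match for dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(ipaddress.ip_network(net), gw) for net, gw in table
               if addr in ipaddress.ip_network(net)]
    return max(matches, key=lambda m: m[0].prefixlen)

for dest in ("8.8.8.8", "173.194.207.113", "50.17.188.196", "192.168.2.1"):
    net, gw = pick_route(dest, routes)
    print(f"{dest:16} -> {net} via {gw}")
```

Every Internet address falls into one of the two /1 routes, which beat the /0 default, while the /32 host route keeps traffic to the VPN server itself going out the regular gateway.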

References and related
No patience to roll your own? Five top commercial VPN offerings are described here: http://www.itworld.com/article/3152904/security/top-5-vpn-services-for-personal-privacy-and-security.html
A lightweight dig installation method for Windows is described here.
My article about iptables.

Categories
Network Technologies Python

Tips on using scapy for custom IP packets

Intro
scapy is an IP packet customization tool that keeps coming up in my searches so I could no longer avoid it. I was unnecessarily intimidated because it was built around python and the documentation is a little strange. But I’m warming up to it now…

The details
Download and install
CentOS
Just go to scapy.net and it will propose to you to download the .zip file. I got scapy-2.3.1.zip. Then you can unzip it; change directory to the scapy-2.3.1 sub-directory and run

$ sudo python setup.py install

Debian systems such as Raspberry Pi
Simple. It’s just:

$ sudo apt-get install python-scapy

Usage modes
scapy can be called from within python, but if you’re afraid to do that like I am, you can run it from the command line which simply throws you into a python shell. I’m finding that a lot more comfortable as I slowly learn python syntax and some useful shortcuts.

Example 1
The background
Let’s cut to the chase and do something hard first. Remember how we got those Cisco Jabber packets with DSCP set, causing Cisco Jabber to not work for some users? The long-term solution according to that post is to turn off the DSCP flag for all packets on the Internet router. So we want to be able to generate packets under our control with that flag set so we can see if we’ve managed to turn it off correctly.

DSCP value occupies the first 6 bits of the 8-bit tos field. The packets we got from Cisco had DSCP of 0x2e which is Expedited Forwarding (EF), and if you do the math that corresponds to tos of 0xb8 which in decimal is 184.
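If you don’t feel like doing the math by hand, the conversion is a two-bit shift. Here is a small Python sketch; dscp_to_tos and tos_to_dscp are hypothetical helper names for this illustration, not scapy functions.

```python
DSCP_EF = 0x2e  # Expedited Forwarding, the value seen in the Cisco packets

def dscp_to_tos(dscp, ecn=0):
    """Place the 6-bit DSCP in the upper bits of the tos byte; ECN is the low 2 bits."""
    return (dscp << 2) | ecn

def tos_to_dscp(tos):
    """Recover the DSCP value from a tos byte."""
    return tos >> 2

tos = dscp_to_tos(DSCP_EF)
print(hex(tos), tos)           # 0xb8 184
print(hex(tos_to_dscp(0xb8)))  # 0x2e
```

The second function is handy in the other direction, for reading a tos value off a tcpdump line and working out which DSCP it carries.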

$ sudo scapy
>>> sr(IP(dst="50.17.188.196",tos=184)/TCP(dport=80,sport=4025))

Begin emission:
....Finished to send 1 packets.
.*
Received 6 packets, got 1 answers, remaining 0 packets
(<Results: TCP:1 UDP:0 ICMP:0 Other:0>, <Unanswered: TCP:0 UDP:0 ICMP:0 Other:0>)
>>>

Instead of the call to sr you can simply use send. Breaking this down, I’m testing against my drjohns server with IP 50.17.188.196. tos is a property of an IP packet so it’s included as a keyword argument to the IP function. The “/” following the IP function is funny syntax, but it is how scapy stacks layers: whatever follows the “/” rides on top of the layer before it. So in the TCP section I used keyword arguments and set a source port of 4025 and a destination port of 80. What I observed is that this will send a SYN packet even though I didn’t explicitly ask for that.

Want to have a random source port like “real” packets? Then use this:

>>> sr(IP(dst="50.17.188.196",tos=184)/TCP(dport=80,sport=RandShort()))

Look for it
I know tcpdump better so I look for my packet with that tool like this:

$ sudo tcpdump -v -n -i eth0 host 71.2.39.115 and port 80

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
19:33:29.749170 IP (tos 0xb8, ttl 39, id 1, offset 0, flags [none], proto TCP (6), length 44)
    71.2.39.115.partimage > 10.185.21.116.http: Flags [S], cksum 0xd97b (correct), seq 0, win 8192, options [mss 1460], length 0
19:33:29.749217 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 44)
    10.185.21.116.http > 71.2.39.115.partimage: Flags [S.], cksum 0x3e39 (correct), seq 3026513916, ack 1, win 5840, options [mss 1460], length 0
19:33:29.781193 IP (tos 0x0, ttl 41, id 19578, offset 0, flags [DF], proto TCP (6), length 40)

Interpretation
Our tos was wiped clean by the time our generated packet was received by Amazon AWS. This was a packet I sent from my home using my Raspberry Pi. So likely my ISP CenturyLink is removing QOS from packets its residential customers send out. With some ISPs and business class service I have seen the tos field preserved exactly. When sent from Amazon AWS I saw the field value altered, but not set to 0!

Example 2, ping
>>> sr(IP(dst="8.8.8.8")/ICMP())

Begin emission:
Finished to send 1 packets.
.*
Received 2 packets, got 1 answers, remaining 0 packets
(<Results: TCP:0 UDP:0 ICMP:1 Other:0>, <Unanswered: TCP:0 UDP:0 ICMP:0 Other:0>)

Getting info on return packet
>>> sr1(IP(dst="drjohnstechtalk.com",tos=184)/TCP(dport=80,sport=RandShort()))

Begin emission:
...............................................................................................Finished to send 1 packets.
...........................................*
Received 139 packets, got 1 answers, remaining 0 packets
<IP  version=4L ihl=5L tos=0x0 len=44 id=0 flags=DF frag=0L ttl=25 proto=tcp chksum=0xe1d7 
src=50.17.188.196 dst=144.29.1.2 options=[] |<TCP  sport=http dport=17176 seq=3590570804 ack=1 dataofs=6L reserved=0L flags=SA window=5840 chksum=0x24b0 urgptr=0 options=[('MSS', 1460)] |<Padding  load='\x00\x00' |>>>

Note that this tells me about the return packet, which is a SYN ACK. So it tells me my SYN packet must have been sent from port 17176 (it changes every time because I’ve included sport=RandShort()). Each “.” in the response indicates a packet hitting the interface. I guess it’s promiscuously listening on the interface.

Hitting a closed port

>>> sr1(IP(dst="drjohnstechtalk.com",tos=184)/TCP(dport=81,sport=RandShort()))

Begin emission:
....................................................................................
.........................................................................................................Finished to send 1 packets.
...............................................................................
.................................................................................
.........................................................................................

Basically those dots are going to keep going forever until you type Ctrl-C, because there will be no return packet if something like a firewall is dropping your packet, or the returned packet.
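scapy’s sr1 also accepts a timeout argument, which avoids the endless wait. If you just want to tell an open port from a refused or silently dropped one without crafting packets at all, a plain Python socket with a timeout can do it. The sketch below is a different technique from the scapy calls above (an ordinary TCP connect, so you cannot set tos this way), and probe is a hypothetical helper name.

```python
import socket

def probe(host, port, timeout=2.0):
    """Classify a TCP port using an ordinary connect() with a timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"        # handshake completed
    except ConnectionRefusedError:
        return "closed"          # a RST came back
    except socket.timeout:
        return "no reply"        # the packet, or the reply, was dropped somewhere
    except OSError as err:
        return f"error: {err}"   # e.g. no route to host, name lookup failure
```

For example, probe("drjohnstechtalk.com", 81) would be the counterpart of the sr1 test above; a "no reply" result corresponds to the dots that never end.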

Useful shortcuts
The scapy commands look pretty daunting at first, right? And too much trouble to type in, right? Just get it right once and you’re set. In typical networking debugging you’ll be running such test packets multiple times. Because it’s basically a python shell, you can use the up arrow key to recall the previous thing, or hit it multiple times to scroll through your previously typed commands. And even if you exit and return, it still remembers your command history so you can hit the up-arrow to get back to your commands from previous sessions and previous days.

References and related
This scapy for dummies guide is very well written.
I’m finding this python tutorial really helpful.
DSCP and explanation of Cisco Jabber not working is described here.
A simpler tool which is fine for most things is nmap. I provide some real-world examples in this blog post.

Categories
Admin Network Technologies Proxy Security TCP/IP Web Site Technologies

The IT Detective Agency: Cisco Jabber stopped working for some using WAN connections

Intro
This is probably the hardest case I’ve ever encountered. It’s so complicated many people needed to get involved to contribute to the solution.

Initial symptoms

It’s not easy to describe the problem while providing appropriate obfuscation. Over the course of a few days it came to light that in this particular large company for which I consult many people in office locations connected via an MPLS network were no longer able to log in to Cisco Jabber. That’s Cisco’s offering for Instant Messaging. When it works and is used in combination with Cisco IP phones it’s pretty good – it has some nice features. This major problem was first reported November 17th.

Knee-jerk reactions
Networking problem? No. Network guys say their networks are running fine. They may be a tad overloaded but they are planning to route Internet over the secondary links so all will be good in a few days.
Proxy problem? Nope. Proxy guys say their Bluecoat appliances are running fine and besides everyone else is working.
Application problem? Application owner doesn’t see anything out of the ordinary.
Desktop problem? Maybe but it’s unclear.

Methodology
So of the 50+ users affected I recognized two power users that I knew personally and focussed on them. Over the course of days I learned:
– problem only occurs for WAN (MPLS) users
– problem only occurs when using one particular proxy
– if a user tries to connect often enough, they may eventually get in
– users can get in if they use their VPN client
– users at HQ were not affected

The application owner helpfully pointed out the URL for the web-based version of Cisco Jabber: https://loginp.webexconnect.com/… Anyone with the problem also could not log in to this site.

So working with these power users who patiently put up with many test suggestions we learned:

– setting the PC’s MTU to a small value, say 512 up to 696 made it work. Higher than that it generally failed.
– yet pings of up to 1500 bytes went through OK.
– the trace from one guy’s PC showed all his packets re-transmitted. We still don’t understand that.
– It’s a mess of communications to try to understand these modern, encrypted applications
– even the simplest trace contained over 1000 lines which is tough when you don’t know what you’re looking for!
– the helpful networking guy from the telecom company – let’s call him “Regal” – worked with us but all the while declaring how it’s impossible that it’s a networking issue
– proxy logs didn’t show any particular problem, but then again they cannot look into SSL communication since it is encrypted
– disabling Kaspersky helped some people but not others
– a PC with the problem had no problem when put onto the Internet directly
– if one proxy associated with the problem forwarded the requests to another, then it begins to work
– Is the problem reproducible? Yes, about 99% of the time.
– Do other web sites work from this PC? Yes.

From previous posts you will know that at some point I will treat every problem as a potential networking problem and insist on a trace.

Biases going in
So my philosophy of problem solving, which had stood the test of time, is that either it’s a networking problem or it’s a problem on the PC. Best is if there’s a competition of ideas in debugging so that the PC/application people seek to prove beyond a doubt it is a networking problem and the networking people likewise try to prove the problem occurs on the PC. Only later did I realize the bias in this approach and that a third possibility existed.

So I enthused: what we need is a non-company PC – preferably on the same hardware – at the same IP address to see if there’s a problem. Well we couldn’t quite produce that but one power user suggested using a VM. He just happened to have a VM environment on his PC and could spin up a Windows 7 Professional generic image! So we do that – it shows the problem. But at least the trace from it is a lot cleaner without all the overhead of the company packages’ communication.

The hard work
So we do the heavy lifting and take a trace on both his VM with the problem and the proxy server and sit down to compare the two. My hope was to find a dropped packet, blame the network and let those guys figure it out. And I found it. After the client hello (this is a part of the initial SSL protocol) the server responds with its server hello. That packet – a largeish packet of 1414 bytes – was not coming through to the client! It gets re-transmitted multiple times and none of the re-transmits gets through to the PC. Instead the PC receives a packet the proxy never sent it which indicates a fatal SSL error has occurred.

So I tell Regal that look there’s a problem with these packets. Meanwhile Regal has just gotten a new PC and doesn’t even have Wireshark. Can you imagine such a world? It seems all he really has is his tongue and the ability to read a few emails. And he’s not convinced! He reasons after all that the network has no intelligent, application-level devices and certainly wouldn’t single out Jabber communication to be dropped while keeping everything else. I am no desktop expert so I admit that maybe some application on the PC could have done this to the packets, in effect admitting that packets could be intercepted and altered by the PC even before being recorded by Wireshark. After all I repeated this mantra many times throughout:

This explanation xyz is unlikely, and in fact any explanation we can conceive of is unlikely, yet one of them will prove to be correct in the end.

Meanwhile the problem wasn’t going away so I kludged their proxy PAC file to send everyone using jabber to the one proxy where it worked for all.

So what we really needed was to create a span port on the switch where the PC was plugged in and connect a 2nd PC to a port enabled in promiscuous mode with that mirrored traffic. That’s quite a lot of setup and we were almost there when our power user began to work so we couldn’t reproduce the problem. That was about Dec 1st. Then our 2nd power user also fell through and could no longer reproduce the problem either a day later.

10,000 foot view
What we had so far is a whole bunch of contradictory evidence. Network? Desktop? We still could not say due to the contradictions, the likes of which I’ve never witnessed.

Affiliates affected and find the problem
Meanwhile an affiliate began to see the problem and independently examined it. They made much faster progress than we did. Within a day they found the reason (suggested by their networking person from the telecom, who apparently is much better than ours): the server hello packet has the expedited forwarding (EF) flag set in the differentiated services code point (DSCP) section of the IP header.

Say what?
So I really got schooled on this one. I was saying It has to be an application-aware “something” on the network or PC that is purposefully messing around with the SSL communication. That’s what the evidence screamed to me. So a PC-based firewall seemed a strong contender and that is how Regal was thinking.

So the affiliate explained it this way: the company uses QOS on their routers. Phone (VOIP) gets priority and is the only application where the EF bit is expected to be set. VOIP packets are small, by the way. Regular applications like web sites should just use the default QOS. And according to Wikipedia, many organizations who do use QOS will impose thresholds on the EF packets such that if the traffic exceeds, say, 30% of link capacity, then all packets with EF set that are over a certain size get dropped. OK, maybe it doesn’t say that, but that is what I’ve come to understand happens. Which makes the dropping of these particular packets the correct behaviour as per the company’s own WAN contract and design. Imagine that!

Smoking gun no more
So now my smoking gun – blame it on the network for dropped packets – is turned on its head. Cisco has set this EF bit on its server hello response on the loginp.webexconnect.com web site. This is undesirable behaviour. It’s not a phone call, after all, which requires minimal jitter in packet timing.

So next time I did a trace I found that instead of EF flag being set, the AF (Assured Forwarding) flag was set. I suppose that will make handling more forgiving inside the company’s network, but I was told that even that was too much. Only default value of 0 should be set for the DSCP value. This is an open issue in Cisco’s hands now.
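For anyone who wants to check a trace for this by hand, here is a minimal Python sketch of how the DSCP value can be read out of a raw IPv4 header. It assumes you already have the header bytes (for example exported from Wireshark). The sample header and the names dscp_of and describe are made up for illustration and are not from the original investigation.

```python
# DSCP is the top six bits of the second byte of the IPv4 header
# (the old TOS byte); the bottom two bits are ECN.
DSCP_NAMES = {0: "default (best effort)", 46: "EF (Expedited Forwarding)"}

# Assured Forwarding code points AFxy have the value 8*x + 2*y (RFC 2597).
for cls in range(1, 5):
    for drop in range(1, 4):
        DSCP_NAMES[8 * cls + 2 * drop] = f"AF{cls}{drop} (Assured Forwarding)"


def dscp_of(ipv4_header: bytes) -> int:
    """Return the 6-bit DSCP value from raw IPv4 header bytes."""
    if len(ipv4_header) < 20 or ipv4_header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    return ipv4_header[1] >> 2


def describe(ipv4_header: bytes) -> str:
    """Return a short human-readable label for the header's DSCP value."""
    value = dscp_of(ipv4_header)
    return f"DSCP {value}: {DSCP_NAMES.get(value, 'other')}"


# Hypothetical 20-byte header whose second byte is 0xB8 (EF, ECN bits 0).
sample = bytes([0x45, 0xB8]) + bytes(18)
print(describe(sample))
```

Anything other than 0 on ordinary web traffic would be the thing to look at in a situation like this one.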

But at least this explains most observations.

– Small MTU worked? Yup, those packets are looked upon more favorably by the routers.
– One proxy worked, the other did not? Yup, they are in different data centers which have different bandwidth utilization. The one where it was not working has higher utilization.
– Only affected users are at WAN sites? Yup, probably only the WAN routers are enforcing QOS.
– Worked over VPN, even on a PC showing the problem? Yup – all VPN users use a LAN connection for their proxy settings.
– Fabricated SSL fatal error packet? I’m still not sure about that one – guess the router sent it as a courtesy after it decided to drop the server hello – just a guess.
– Problem fixed by shutting down Kaspersky? Nope, guess that was a red herring. Every problem has dead ends and red herrings, just a fact of life. And anyway that behaviour was not very consistent.
– Problem started November 17th? Yup, the affiliate just happened to have a baseline packet trace from November 2nd which showed that DSCP was not in use at that time. So Cisco definitely changed the behaviour of Cisco Jabber sometime in the intervening weeks.
– Other web sites worked, except this one? Yup, other web sites do not use the DSCP section of the IP header so it has the default value of 0.

Conclusion
Cisco has decided to remove the DSCP flag from these packets, which will fix everything. Perhaps EF was introduced in support of Cisco Jabber’s extended use as a soft phone??? Then this company may have some re-design of their QOS to take care of because I don’t see an easy solution. Dropping the MTU on the proxy to 512 seems pretty drastic and inefficient, though it would be possible. My reading of TCP is that nothing prevents QOS from being set on any sort of TCP packet even though there may be a gentleman’s agreement to not ordinarily do so in all except VOIP packets or a few other special classes. I don’t know. I’ve really never looked at QOS before this problem came along.

The company is wisely looking for a way to set all packets with DSCP = 0 on the Intranet, except of course those like VOIP where it is explicitly supposed to be used. This will be done on the Internet router. In Cisco IOS it is possible with a policy map and police setting where you can set set-dscp-transmit default. Apparently VPN and other things that may check the integrity of packets won’t mind the DSCP value being altered – it can happen anywhere along the route of the packet.
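To make concrete what re-marking involves at the byte level, here is a rough Python sketch. It is not the Cisco IOS configuration, just an illustration of the operation a router performs: overwrite the six DSCP bits, leave the two ECN bits alone, and recompute the IPv4 header checksum. The TCP checksum does not cover the DSCP byte, so nothing else in the packet has to change. The sample header values and the function names are hypothetical.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def remark_dscp(header: bytes, dscp: int = 0) -> bytes:
    """Return a copy of the header with DSCP replaced and checksum redone."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    hdr = bytearray(header)
    hdr[1] = (dscp << 2) | (hdr[1] & 0x03)   # keep the two ECN bits
    hdr[10:12] = b"\x00\x00"                 # zero the checksum before summing
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


# Hypothetical header: DSCP EF (0xB8), total length 1414, TTL 64, protocol TCP.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0xB8, 1414, 0, 0x4000, 64, 6, 0,
                     bytes([192, 0, 2, 1]), bytes([10, 11, 12, 13]))
cleaned = remark_dscp(sample)
print("DSCP before:", sample[1] >> 2, "after:", cleaned[1] >> 2)
```

A header with a valid checksum sums to zero when run through ipv4_checksum again, which is a handy self-check.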

Boy applications these days are complicated! And those rare times when they go wrong really require a bunch of cooperating experts to figure things out. No one person holds all the expertise any longer.

My simplistic paradigm of it’s either the PC or the network had to make room for a new reality: it’s the web site in the cloud that did them in.

Could other web sites be similarly affected? Yes it certainly seems a possibility. So I now know to check for use of DSCP if a particular web site is not working, but all others are.

References and related
This Wikipedia article is a good description of DSCP: https://en.wikipedia.org/wiki/Differentiated_services

Categories
Admin Network Technologies

The IT Detective agency: bad PING times explained

Intro
In a complicated corporate environment somewhat unusual problems can be made extremely difficult to debug as there may be many technicians involved, each one knowing just their piece of the infrastructure. Such was the case when I consulted for a problem in which a company reported slow Internet response for users in Asia.

The details
In preparation for an evening call I managed to get enough access to be able to log into the proxy server their Asian users use. Just issuing regular commands was slow, often hanging for many seconds.

The call
So we had the call at 9 PM. These things are always pretty amazing in the sense of How do corporations ever get anything done when they’ve outsourced so much? So there was a representative from the firewall team from Germany, representative from the telecom in the US, a couple representatives from the same telecom but stationed in Asia, an employee who oversees the telecom vendor in the US and me, representing the proxy service normally handled by a group in Europe. The common language was English, of course, though that doesn’t mean that everyone was easy to understand for us native speakers. No one on the call had any real familiarity with the infrastructure. We were all essentially reverse engineering it from a diagram the telecom produced.

The firewall guy and I both noticed that PINGs from the proxy to its gateway (which was a firewall) took 50 msec. I’ve never seen that. Same for another piece of equipment using that same firewall interface. The telecom, which was responsible for the switches, could not actually log in to all of them. Only the firewall guy could reach a few of them. And he eventually figured out the diagram we were all using was somewhat wrong and different switches were in use for some of the equipment than indicated. Which was important because we wanted to check the interface status on the switch.

So imagine this. The guy with no access to anything, the vendor overseer in the US, patiently asks for the results of a few commands to be shared (by email) with the group. He gets the results of show interface status and identifies one port as looking off. It’s listed as 100 mbit instead of 1 gig. In addition to the strange PING times, when we PINGed from equipment in the US, the packet loss rate varied between 4 and 15%. Pretty high in other words. They try to reset the port, hard-code both sides, but nothing works. And this is the port that the firewall is connected to.

We finally switch to the backup firewall. This destroys the routing and so that has to be fixed. But finally it is and suddenly response is much better and PINGing the gateway from the proxy is at the expected < 1 msec. Not content to leave it at that, they persist to fix the broken port. They reason that the most likely problem, now that the test results are in, is a bad cable. He explains that in 1 gig communication all 8 wires have to be good. If just one breaks you can’t run 1 gig. Now they have to figure out who has access to this 3rd-party data center where the equipment is hosted! They finally identify an employee with access and get him to come with different cable lengths (of course no one knows the layout to actually know how long the cable is or how close the equipment is). The cable is replaced and both sides come up 1 gig auto-negotiated! They reverted back to the primary firewall. So in the end the employee without access to anything figured this out. Amazing. The intense activity on this problem lasted from 9 PM to about 4 AM the following morning.

The history
Actually it would be a pretty decent turn-around by this company’s standards if the problem had been resolved in seven hours. But actually it had been ongoing for a couple weeks beforehand. It seemed that the total data usage was capped at 100 mbits by that bad cable and so it wasn’t a total outage or totally obvious where to look.

Case closed!

Conclusion
I think a lot of people on the call had the expertise to solve the problem, and much more quickly than it was solved. But no one had sufficient access to do debugging his own way and needed cooperation of others. The telecom who owns and manages the switches was particularly disappointing in its performance. Not in the individuals, who seemed to be competent, but in their processes which permitted faulty and incomplete documentation, as well as lack of familiarity with the particular infrastructure – like you’ve just hired a smart network technician and communicated nothing about what he is supporting.
Keep good people on staff! Give them as much access as possible.

Categories
Admin Network Technologies

General failure PING error partially explained

Intro
My Dell PC running Windows 7 was going along fine until one day I noticed I couldn’t get out to the Internet from any browser. As a network specialist I reacted in my standard knee-jerk fashion and tried a few simple network commands from the command prompt to get a better idea of what’s going on.

The details

Here is the first thing I tried – to ping one of Google’s DNS servers which always respond so if you don’t get a response there’s something wrong on your side.

C:\Users\DrJ>ping 8.8.8.8

Pinging 8.8.8.8 with 32 bytes of data:
Request timed out.
General failure.
General failure.
General failure.
 
Ping statistics for 8.8.8.8:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

Weird. Never seen that before.

Then it gets stranger
Then I tried to ping my local router. But first I had to find its IP address:

C:\Users\DrJ>netstat -rn

blah, blah
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0    192.168.0.254    192.168.0.102     25
blah, blah

Then ping it:

C:\Users\Dad>ping 192.168.0.254

Pinging 192.168.0.254 with 32 bytes of data:
Reply from 192.168.0.254: bytes=32 time=2ms TTL=64
Reply from 192.168.0.254: bytes=32 time=1ms TTL=64
Reply from 192.168.0.254: bytes=32 time=1ms TTL=64
Reply from 192.168.0.254: bytes=32 time=1ms TTL=64
 
Ping statistics for 192.168.0.254:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 2ms, Average = 1ms

That’s a normal response, which under the circumstances is also weird. So we can’t ping the Internet but we can ping our local router. Sounds like a problem with either my Internet connection or my home router, right? Yeah, maybe, except for these two important facts. Routers have built-in simple diagnostic tools like PING. So I logged into the local router and ran a ping to 8.8.8.8 and it worked just fine. OK that’s one. Two is that you get a different failure message when your Internet connection is down. I unplugged my DSL router and got this more familiar error:

C:\Users\DrJ>ping 8.8.8.8

Pinging 8.8.8.8 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
 
Ping statistics for 8.8.8.8:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

I also observed that if I just waited it out (doing all these tests kills a few minutes) the connection would come back by itself after about 9 or 10 minutes. I timed rebooting. On my PC it’s about six minutes before I have a working Internet connection again. Not very impressive, but that’s how it is.

A Google search showed a bunch of sites with junk answers apparently trying to push adware on your PC. You know those sites that have somehow documented every single PC problem, supposedly, and have a boilerplate bogus generic description of likely causes and the generic fixes which are all the same but actually have nothing whatsoever to do with the specific problem? I’m getting really annoyed with those sites. But one guy mentioned a firewall. I am running McAfee Livesafe. Hmmm.

Self-inflicted denial of service attack
Yup. From the Windows start menu I typed in McAfee to launch it. I navigated to the part for Web and Email protection. Turned off firewall protection.
The instant I did that my browsers sprung to life! Gmail started working. All was good.

So in the business this is what we call a self-inflicted denial of service, which is somewhat of a tongue-in-cheek name, but apt. A security service that shuts everything down is just as bad as no security whatsoever. I tried to check the McAfee logs to look for a bright red warning that says we’re shutting you down for now but haven’t found anything like that.

And those pings to Google? They now look like this:

C:\Users\DrJ>ping 8.8.8.8

Pinging 8.8.8.8 with 32 bytes of data:
Reply from 8.8.8.8: bytes=32 time=88ms TTL=40
Reply from 8.8.8.8: bytes=32 time=64ms TTL=40
Reply from 8.8.8.8: bytes=32 time=63ms TTL=40
Reply from 8.8.8.8: bytes=32 time=63ms TTL=40
 
Ping statistics for 8.8.8.8:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 63ms, Maximum = 88ms, Average = 69ms

2nd possible cause
Today I also witnessed General failure in testing ping to a single particular destination on a Windows server. So first thing we checked is the Windows firewall. It was disabled. So what else could it be? Since it was a server in a complex environment the application owner had added routes. But he wasn’t very familiar with the route command so he just literally added routes with all the options present like in their example under route /help:

$ route ADD 157.0.0.0 MASK 255.0.0.0 157.55.80.1 METRIC 3 IF 2

Only he chose IF 1. This created the bizarre situation where the route was added with the correct gateway, but the wrong interface! The system assigned IF 1 to 127.0.0.1. So those packets weren’t going anywhere because that’s the loopback interface! I suggested deleting that route, then adding it back without the METRIC and IF options – that’s how I’ve always done it.

Result: General failure disappeared.

Conclusion
A Windows system reports General failure during a PING test when the ICMP packets cannot leave the system. This can be due to running a local firewall or having bad routes present.
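The distinction drawn above between General failure and Request timed out can be turned into a small triage helper. This is only a sketch in Python, assuming English-language Windows ping output like the samples in this post; the function names and the wording of the diagnoses are my own illustration.

```python
from collections import Counter


def classify_ping(output: str) -> Counter:
    """Count the kinds of per-packet result lines in Windows ping output."""
    counts = Counter()
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Reply from") and "bytes=" in line:
            counts["reply"] += 1            # a real echo reply
        elif line == "General failure.":
            counts["general_failure"] += 1
        elif line == "Request timed out.":
            counts["timeout"] += 1
    return counts


def diagnose(output: str) -> str:
    """Map the counts onto the rule of thumb from this post."""
    c = classify_ping(output)
    if c["general_failure"]:
        return "packets never left this PC: check local firewall and routes"
    if c["timeout"] and not c["reply"]:
        return "packets left but nothing came back: look upstream"
    if c["reply"]:
        return "destination reachable"
    return "no recognizable ping results"


# The first failing output shown in this post.
SAMPLE = """Pinging 8.8.8.8 with 32 bytes of data:
Request timed out.
General failure.
General failure.
General failure.
"""
print(diagnose(SAMPLE))
```

Any General failure line wins over timeouts here, on the reasoning that it points at the local machine regardless of what else is going on.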

Categories
DNS Linux Network Technologies Perl

Announcing a simple DNS web interface and code

Intro
For demonstration purposes I’ve written a WEB interface to do DNS queries. This can be used for light querying. Once it gets abused I will pull it from the web site.

Motivation
Some large enterprises are behind not only a corporate firewall, but also confined to a private namespace with no access to Internet name resolution. Users in such situations can use one of the many available tools to do DNS resolution through the web, but they all want to throw advertising at you and it’s not clear which can be trusted not to load you up with spyware. I am offering this ad-free DNS lookup using my position on the Internet as a trusted source.

And if you’re lucky and looking for code to do this yourself, you might find it. But nowhere will you find a site that’s running its own published code for DNS resolution. Except here.

The code
Admittedly very simple-minded, but hopefully not fatally flawed, here it is in Perl.

#!/usr/bin/perl
use CGI;
$query = new CGI;
%allowedArgs = (domainname => 'dum',type => 'dum',short => 'dum');
#
print "Content-type: text/html\n\n";
print "<pre>\n";
foreach $key ($query->param) {
  exit(1) unless defined $allowedArgs{$key};
  exit(1) if $query->param($key) !~ /^([a-zA-Z0-9\.-]){2,256}$/;
  print "$key " . $query->param($key) . "\n";
}
# possible keys: domainname, type
$domainname = $query->param(domainname);
$type     = $query->param(type);
$type = "any" unless $type;
# argument validation checks
exit(1) if $domainname !~ /^([a-zA-Z0-9\.-]){2,256}$/ || $domainname =~ /\.\./ ||  ! $domainname;
exit(1) if $type !~ /^([a-zA-Z]){1,8}$/;

# short answer?
$short = "+short" if defined $query->param(short);

# authoritative request?
if (defined $query->param(authoritative)) {
# this will be a lot more complicated and so is not implemented. Perhaps someday if there is a request...
}

open(DIG,"dig $short $type $domainname|") || die "Cannot run dig!!\n";
while(<DIG>) {
  print ;
}

Yes it’s very old-school. I do not even use a DNS package. Why bother? It’s not rocket science. There’s a lot more to argument validation than it looks like – you would not believe the evil things people send to your web server. So you have to be vigilant about injection attacks or shelling out by use of unexpected characters.
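For comparison, here is how the same whitelist idea might look in Python. This is a sketch and not a port of the script above. It reuses the two regular expressions, and it makes two assumptions of my own: it rejects names starting with a dash (which dig would otherwise read as an option), and it passes arguments as a list so that no shell is involved at all. The names build_dig_command and run_dig are hypothetical.

```python
import re
import subprocess

# Same character whitelists as the Perl script above.
NAME_RE = re.compile(r"[a-zA-Z0-9.-]{2,256}")
TYPE_RE = re.compile(r"[a-zA-Z]{1,8}")


def build_dig_command(domainname: str, qtype: str = "any", short: bool = False):
    """Return an argument list for dig, or None if the input is rejected."""
    if not NAME_RE.fullmatch(domainname) or ".." in domainname:
        return None
    if domainname.startswith("-"):      # extra check, not in the Perl version
        return None
    if not TYPE_RE.fullmatch(qtype):
        return None
    cmd = ["dig"]
    if short:
        cmd.append("+short")
    cmd += [qtype, domainname]
    return cmd


def run_dig(cmd):
    """Run dig without a shell and return its standard output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    return result.stdout


print(build_dig_command("johnstechtalk.com", "a"))
print(build_dig_command("example.com; cat /etc/passwd"))
```

The second call prints None because the semicolon, spaces and slashes are outside the whitelist. Calling run_dig on an accepted command requires dig to be installed.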

Usage

2020 Update

This URL has been deactivated since I moved to my new server. I’ll have to see if there’s time and interest to restore this functionality.

example 1

https://drjohnstechtalk.com/cgi-bin/digiface.cgi?domainname=johnstechtalk.com&type=a

domainname johnstechtalk.com
type a
 
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.4 <<>> a johnstechtalk.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8711
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
 
;; QUESTION SECTION:
;johnstechtalk.com.		IN	A
 
;; ANSWER SECTION:
johnstechtalk.com.	3600	IN	A	50.17.188.196
 
;; Query time: 10 msec
;; SERVER: 172.16.0.23#53(172.16.0.23)
;; WHEN: Mon May  4 14:59:05 2015
;; MSG SIZE  rcvd: 51

example 2
https://drjohnstechtalk.com/cgi-bin/digiface.cgi?domainname=drjohnstechtalk.com&short

domainname drjohnstechtalk.com
short 
50.17.188.196

Familiarity with dig will help you determine the best switches to use as you can see that at the end of the day it is merely calling dig and sending back that output with a minimum of html markup. This will make it easy to parse the output programmatically.
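As an illustration of that last point, here is a minimal Python sketch that pulls the records out of the ANSWER SECTION of dig-style output such as example 1. It assumes the usual five whitespace-separated columns (name, TTL, class, type, data); the function name parse_answers is made up for this example.

```python
def parse_answers(dig_output: str):
    """Return (name, ttl, class, type, data) tuples from the ANSWER SECTION."""
    records = []
    in_answer = False
    for line in dig_output.splitlines():
        if line.startswith(";; ANSWER SECTION:"):
            in_answer = True
            continue
        if in_answer:
            # A blank line or a comment line ends the section.
            if not line.strip() or line.startswith(";"):
                in_answer = False
                continue
            fields = line.split(None, 4)
            if len(fields) == 5:
                name, ttl, rclass, rtype, rdata = fields
                records.append((name, int(ttl), rclass, rtype, rdata))
    return records


# Abbreviated copy of the output from example 1.
SAMPLE = """;; QUESTION SECTION:
;johnstechtalk.com.\t\tIN\tA

;; ANSWER SECTION:
johnstechtalk.com.\t3600\tIN\tA\t50.17.188.196

;; Query time: 10 msec
"""
print(parse_answers(SAMPLE))
```

With the short option there is nothing to parse at all, since each line of output is just the record data.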

Conclusion
A simple DNS web interface is being announced today. Both the service and the code are being made available. The service may be pulled once it becomes abused.

References
A nice, not too commercial web interface to dig and traceroute that is more user-friendly than mine is http://www.kloth.net/services/dig.php
The dig man pages can be helpful.

Got a geoDNS entry? Although this link has ads, it’s quite interesting because it sends your query to open DNS servers around the world: https://dnschecker.org/.

You can explore some details behind Google’s public resolving server 8.8.8.8 by using the web site: https://dns.google.com/. It’s quite helpful.

I won’t paste the link to my service but you can see what it is from the examples above.

There’s a simple but effective DIG available for your Android smartphone from the Playstore. That’s DNS debugger from TurboBytes. No obnoxious ads and yet no cost.

Of course if you are on the Internet and have access to dig, Google’s DNS servers are available for you to use directly.

Want to learn if the Great Firewall of China is clobbering the expected DNS result? The site https://viewdns.info/chinesefirewall/ is designed to do just that.

Categories
Admin Network Technologies

Fixing a hanging JunOS Pulse VPN client login

Intro
I often have trouble getting a clean disconnect when shutting down my JunOS Pulse client. As often as not it hangs while displaying Disconnecting… A reboot seemed a little drastic to me so I found a kinder, gentler way to reset things. Read the details if this applies…

The details
When it’s hanging you will have an additional adapter not normally present called JunOS virtual adapter or something like that. To get to this adapter in Windows 7 type network in the Run text box. Click on Network and Sharing Center; then Change adapter settings.

Find the JunOS virtual adapter.

Right-click and disable it.

That’s it!

Your disconnect should then complete and the virtual adapter will eventually disappear on its own. I imagine you would need administrator access to your PC in order to be able to do this.


The catch

And this is a very big catch. This did save me a reboot as promised. But it has a huge drawback. The next time you try to use the JunOS Pulse client it will never finish connecting! So while it is trying to connect you have to repeat the steps above but this time enable the adapter!

I was really stumped when I first encountered this problem and couldn’t connect.

Why does this work?
Well, the symptom I was experiencing during the hang is that the virtual adapter JunOS creates is present and keeps its IP address, as you can see from the output of ipconfig /all. So I thought there should be a way to remove the adapter with a command-line command. But when I clicked on the adapter I reasoned that if I could simply remove the IP address then I would achieve what I needed and restore my regular connectivity. Disabling it did that and it worked!

How do I get myself in this situation?
I use VPN. Then I leave my laptop for a length of time. Eventually the laptop hibernates, keeping its memory of running JunOS Pulse. Next I bring it to an office with a physical LAN port and that JunOS virtual adapter is still hanging around upon wake-up and the Pulse client is stuck disconnecting.

Conclusion
I have shown a method of saving yourself a reboot if your JunOS Pulse client is hanging upon disconnecting. However I have given you enough rope to hang yourself. You will never connect again unless you undo those very same steps the next time you try to connect!

The JunOS Pulse client is provided by Juniper Networks.

References
I explain how to work on a Juniper SA appliance in this post.

Categories
Admin Network Technologies

Routing based on source MAC address

Intro
As I am not a true network specialist but more a security operations specialist, I am always amazed by discovering network things that they probably teach in networking 101 and seem obvious to those in the know.

I had a “revelation” (to me anyways) like that lately.

The details
For all these years I’ve dabbled in setting up interfaces, creating static routes, setting up RIP advertisements, reading BGP configurations, solving martian source problems, or slightly harder things like network address translation, secure network address translation and VPN tunnels. I thought I knew all the mainly relevant things there are to know about how to get packets to where you want them.

For a typical, non-routing device, I had always thought that the only choice on how to send packets out is

– local subnet if destination is on a subnet of one of the interfaces
– static route if configured for that destination IP
– default gateway if configured

And that’s that, right?
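The three choices listed above can be written down as a short decision procedure. Here it is as a Python sketch using the standard ipaddress module. It is a simplification (a real IP stack does a single longest-prefix match over one table), and the interface name, static route and gateway addresses in the example are hypothetical values chosen to fit the 10.11.12.0/24 subnet used later in this post.

```python
import ipaddress


def next_hop(dest, interfaces, static_routes, default_gateway=None):
    """Pick where to send a packet for dest, in the order listed above."""
    dest = ipaddress.ip_address(dest)
    # 1. Destination is on a subnet of one of our interfaces.
    for name, network in interfaces.items():
        if dest in ipaddress.ip_network(network):
            return ("local", name)
    # 2. A static route covers the destination; most specific prefix wins.
    matches = [(ipaddress.ip_network(net), gw)
               for net, gw in static_routes.items()
               if dest in ipaddress.ip_network(net)]
    if matches:
        _, gateway = max(matches, key=lambda m: m[0].prefixlen)
        return ("static", gateway)
    # 3. Fall back to the default gateway, if there is one.
    if default_gateway:
        return ("default", default_gateway)
    return ("unreachable", None)


interfaces = {"eth0": "10.11.12.0/24"}
static_routes = {"10.0.0.0/8": "10.11.12.1"}
for target in ("10.11.12.50", "10.99.1.1", "50.17.188.196"):
    print(target, next_hop(target, interfaces, static_routes, "10.11.12.254"))
```

Notice that nothing in this procedure looks at where an incoming packet came from, only at the destination address.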

This limited number of route selection methods is very important in some architectural designs I’ve been involved with. For instance, an Intranet which has “borrowed” large swaths of valid Internet address space pretty much necessitates a two-stage proxy approach if using explicit proxy settings. Or so I thought.

What I learned
Someone pointed out to me a feature on Bluecoat proxy called return-to-sender. Normally it is disabled. But when enabled, what it does is (to me, because I never thought it possible) amazing: it sends its response packets back to the MAC address of the inbound sender! Thus it will shortcut all the routing decisions mentioned above and use the MAC address of the host which sent it a packet to which it is responding.

If I had known that was possible I might never have implemented a two-stage proxy.

I decided to try the ultimate worst-case scenario:

– use of proxy with this feature enabled and
– proxy client on same internal subnet as proxy server, 10.11.12.0/24
– source IP of proxy client = my AWS server, 50.17.188.196
– requested web site: my AWS web site at http://50.17.188.196/

In other words, steal my own IP, put it on the Intranet, and make the proxy route packets back to me in response while simultaneously connecting to that exact same IP on the Internet.

How I configured the VIP
This is very old school, but I did one of these numbers on a SLES11 system:

$ sudo ifconfig eth0:0 50.17.188.196 up

And that was sufficient to add that IP to the eth0 interface.

Did it work? Yes, it did. Amazing.

I only needed one single request to verify that it worked. Here is that request, using wget from the Linux host which is on the same subnet.

export http_proxy=http://10.11.12.13:8080/
wget --bind-address=50.17.188.196 http://50.17.188.196/

And the proxy log line this produced:

2014-08-04 13:55:08 3 50.17.188.196 - "none" 200 TCP_HIT GET text/html http 50.17.188.196 80 / - - "Wget/1.11.4" OBSERVED 10.11.12.13 769 158 - -

Other appliances can do this, too
On F5 BigIP this feature is also supported. It is called auto last hop and can be configured either globally or for an individual virtual server.

To be continued…

Categories
DNS Network Technologies

The IT Detective Agency: internal DNS queries getting clobbered after bind upgrade

Intro
We’ve upgraded BIND innumerable times over the years. There’s never really been an issue. The new version just picks up and behaves exactly like the old version and all is good. But this time, in upgrading from ISC’s BIND v 9.8.5-P2 to BIND v 9.9.5-P1 something was dramatically different.

The details

Look at these queries:

> dig ns 10.in-addr.arpa

; <<>> DiG 9.9.2-P2 <<>> ns 10.in-addr.arpa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60248
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
 
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;10.in-addr.arpa.               IN      NS
 
;; ANSWER SECTION:
10.in-addr.arpa.        0       IN      NS      10.IN-ADDR.ARPA.
 
;; Query time: 0 msec
;; SERVER: blah, blah
;; WHEN: Wed Jun 25 09:49:30 2014
;; MSG SIZE  rcvd: 73

> dig -x 10.100.208.10

; <<>> DiG 9.9.2-P2 <<>> -x 10.100.208.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 6088
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
 
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;10.208.100.10.in-addr.arpa.   IN      PTR
 
;; AUTHORITY SECTION:
10.IN-ADDR.ARPA.        86400   IN      SOA     10.IN-ADDR.ARPA. . 0 28800 7200 604800 86400
 
;; Query time: 0 msec
;; SERVER: blah, blah
;; WHEN: Wed Jun 25 09:49:56 2014
;; MSG SIZE  rcvd: 106

That is seriously bad and wrong – for us! This is a cache-only server and there are indeed RFC-1918 addresses defined on internal nameservers, such as that 10.100.208.10.

An email relay which relies on reverse lookups started to fail.

A DuckDuckGo search did not show anything relevant. Maybe Google would have.

I ultimately registered for an account at the knowledge base at isc.org, kb.isc.org, and quickly found my answer.

In fact they were crystal clear in explaining this very problem, so I hesitated to document it here, but I figure others might leap first and then read the documentation later like myself so it might do someone else some good.

They say:

...
Although this will be effective as a workaround, administrators are urged not to just specify empty-zones-enable no;
 
It is much better to use one or more disable-empty-zone option declarations to disable only the RFC 1918 empty zones that are in use internally.  
...

That empty-zones-enable no; by the way is a configuration option you can toss in your main configuration file in the options section.
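To follow the more targeted advice quoted above, you need to know which RFC 1918 reverse zones your internal addresses actually fall under. Here is a small Python sketch that works that out and prints candidate disable-empty-zone lines for the options section. The helper names are hypothetical, the second sample address is made up, and the generated statements should be checked against the BIND documentation for your version before use.

```python
import ipaddress


def rfc1918_reverse_zone(address: str):
    """Return the RFC 1918 reverse zone covering address, or None."""
    ip = ipaddress.ip_address(address)
    octets = str(ip).split(".")
    if ip in ipaddress.ip_network("10.0.0.0/8"):
        return "10.in-addr.arpa"
    if ip in ipaddress.ip_network("172.16.0.0/12"):
        # One zone per second octet: 16.172.in-addr.arpa to 31.172.in-addr.arpa
        return f"{octets[1]}.172.in-addr.arpa"
    if ip in ipaddress.ip_network("192.168.0.0/16"):
        return "168.192.in-addr.arpa"
    return None


def disable_statements(addresses):
    """Build one disable-empty-zone line per distinct zone in use."""
    zones = sorted({z for z in map(rfc1918_reverse_zone, addresses) if z})
    return [f'disable-empty-zone "{z}";' for z in zones]


# The PTR name dig -x asks for, then the zones for two sample addresses.
print(ipaddress.ip_address("10.100.208.10").reverse_pointer)
for line in disable_statements(["10.100.208.10", "172.20.1.5"]):
    print(line)
```

For the failing lookup in this post, the relevant zone is 10.in-addr.arpa, which is exactly the name that showed up in the bogus SOA record.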

Case closed.

Conclusion
Our reverse lookups on the Intranet began to fail after an innocuous upgrade of the ISC bind nameserver to version 9.9. A simple addition of an extra configuration statement resolved the matter. I guess it really is a good idea to RTFM.

References
ISC’s site is www.isc.org.
A different type of DNS clobbering was described in this case.
A word about Google’s awesome public DNS service.