Categories
Admin Linux

Getting GNU screen to work on Windows 10 for a productive terminal multiplex environment

Intro
My jump server is getting old and they’re threatening to cut it off. A jump server is a server from which you launch CLI terminal sessions into your linux servers. Since my laptop has firewall access to all the same servers I wondered if I could build up a productive environment right within Windows 10 on my own laptop. For me this would be running GNU screen as a terminal multiplexer since I hop between terminal screens all day.

More details
Windows 10 is coming around to more fully integrating with Linux! it’s about time. WSL, windows subsystem for Linux, is all about that. And things like bash shell, ubuntu and OpenUSE Linux are available from the windows store. But that was not an option for me. My organizaiton has shut all that down.

So I thought back to my days as a Cygwin user those many years ago… Could I get GNU screen running within Cygwin environment on Windows 10? Well, yes, I can with just a few tweaks.

I think the initial Cygwin install required admin privileges, but once installed to run it does not.

Within Cygwin screen is an optional package and you can run their setup program to search and install it.

Here is my .screenrc file

defscrollback 4000
#change init sequence to not switch width
termcapinfo  xterm Z0=\E[?3h:Z1=\E[?3l:is=\E[r\E[m\E[2J\E[H\E[?7h\E[?1;4;6l
 
# Make the output buffer large for (fast) xterms.
termcapinfo xterm* OL=10000
 
# tell screen that xterm can switch to dark background and has function
# keys.
termcapinfo xterm 'VR=\E[?5h:VN=\E[?5l'
termcapinfo xterm 'k1=\E[11~:k2=\E[12~:k3=\E[13~:k4=\E[14~'
termcapinfo xterm 'kh=\E[1~:kI=\E[2~:kD=\E[3~:kH=\E[4~:kP=\E[H:kN=\E[6~'
 
# special xterm hardstatus: use the window title.
termcapinfo xterm 'hs:ts=\E]2;:fs=\007:ds=\E]2;screen\007'
 
#terminfo xterm 'vb=\E[?5h



lt;200/>\E[?5l' termcapinfo xterm 'vi=\E[?25l:ve=\E[34h\E[?25h:vs=\E[34l' # emulate part of the 'K' charset termcapinfo xterm 'XC=K%,%\E(B,[\304,\\\\\326,]\334,{\344,|\366,}\374,~\337' # xterm-52 tweaks: # - uses background color for delete operations termcapinfo xterm ut #from https://stackoverflow.com/questions/359109/using-the-scrollwheel-in-gnu-screen termcapinfo xterm* ti@:te@ escape ^\\ # changes espace sequence password

Note that in my .screenrc I use <Ctrl-\> as my escape sequence, so, e.g., to pop to the previous screen it is <Ctrl-\> <Ctrl-\>. I’m not sure that’s standard but my fingers will remember that to my dying day. They probably still remember some of those EDT/TPU VAX editor commands to this day!

Compare and contrast
Here are my day 0 observations.

ssh, curl, nslookup and tracert are coming from the underlying Windows system (do a which curl to see that) so that means you get the dumb version your system has.

So there is no dig, and no nc or netcat.

touch, cat, mkdir and vi behave pretty normally. man pages are installed, which can be a help.

If you use proxy, a funny thing can happen and your environment variables can get mixed. You may have inherited an HTTP_PROXY environment variable form the system, but the alias you copied from a linux jump server probably defines an http_proxy environment variable (lower case). And both can co-exist! As to which one curl would then use, who knows? Better just stick to working with the upper-case one and NOT define another in lower case.

For awhile it looked like scrolling was not working at all when screen was running. Then i found that tip I reference at the bottom of my .screenrc file which makes scrolling work via the mouse’s scroll wheel, which isn’t too bad.

Old friends like ls, grep, echo and while (built-in bash command) are available however. dig can be installed from the bind-utils package.

A lot of other packages are optionally available, including a whole X-Windows environment, which I used to run in the past but hope to avoid this time around.

No crontabs however (to have cron daemon requires installing admin privileges) which kind of hurts.

Simple output redirection seems to work, as does job control, e.g.,

ping -t 8.8.8.8 &gt; /dev/null 2&gt;&amp;1 &amp;

Not sure why you’d want to run the above command, but this nice example shows that the /dev/null device exists, and the ping command is inherited from your Windows environment hence the -t option to run it indefinitely, and that it will create a background process which you can view and control with jobs / kill.

Now I typically move my laptop off the work environment each night, so all my ssh logins will be lost, unlike the jump server situation. But our jump server isn’t that stable anyway so no big loss I’d say…

I am sooo used to highlighting text in Teraterm, which is my current environment, and that being sufficient to put that text into the clipboard, that I keep doing that in this environment. But it doesn’t work. I have to use the CMD window convention of highlighting the text and then hitting ENTER to get it into the clipboard. oops. That was because I had been launching Cygwin from a CMD window. Now I am launching from a proper Cygwin shortcut and simple text highlighting works, BUT, right-clicking to paste it in brings up a menu rather than just doing it! So there’s that difference now… Instead of right-click I can quickly paste the text in doing a SHIFT-Insert.

ssh will get you

By default you end up using the Windows-10 supplied ssh, and that works pretty well. But when you’re ready to advance and need to put some thing into a .ssh/config file, forget about it. In principle it’s possible in Windows 10, but it’s too complex. Just install the ssh package. That in turn permits you the facility familiar to you where you can create a ~/.ssh/config file.

How to set your userid by default for your ssh logins

First make sure you install the Cygwin ssh package and are using that one. A which ssh should come back with /usr/bin/ssh.

My config file looks like this:

Host *
User drjohn

That sets my default userid to be drjohn on any random server I ssh to.

New ssh error pops up
Unable to negotiate with 50.17.188.196 port 22: no matching key exchange method found. Their offer: diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1

This only happened when I switched from my Windows ssh to the Cygwin one. This is, of course, when connecting to a system (ironically, a firewall) with an old image. I think the only solution to be able to access these old systems is to switch back to the Windows 10 ssh – after all we never got rid of it and it used to work. Since all my customary ssh’s are aliased, this works well enough. I just made an alias like this

alias oldFW='screen -t oldFW /cygdrive/c/windows/system32/openssh/ssh.exe admin@10.0.0.17'

since on my system the Windows 10 openssh is installed there in the system32 folder.

How do you get multiple login sessions (shells) within your screen to the localhost?

Well, you can’t just do a su – and you probably don’t have an ssh daemon running locally, so this is more of a non-trivial question than it first appears.

I define a bunch of aliases. My alias for getting an additional shell on the Windows 10 machine is this:

alias local=’screen -t localhost bash –login -i’

A word on package management
I don’t know why I was afraid of installing packages when I first tried Cygwin over a decade ago. Now for me that’s the key – to understand and practice installing packages because it’s actually really easy when you’re used to it.

The key is to simply keep your initial install setup hanging around, setup-x86_64.exe. In my case it’s in my downloads directory. Example usage: I wondered if I could install a decent version of ping rather than continually suffer with the dumb DOS version. So, fire up the above-mentioned executable. Go through a few screens (where it remembers the answers from the initial install), then search for the package (Yes, it’s there!), and select to install the most recent version from the drop-down. A few more clicks and it’s done and available in your path. it’s that easy… Not sure about uninstalling because you almost never need to do that. It seems maybe a thousand packages are available? so no, there’s no yum or zypper or rpm or apt-get, but who really needs those anyway?

As a concrete example, I am learning about SNMP. So I got something running on a Bluecoat proxy, and I wanted to see what I could see. The guide recommended using snmpwalk, which of course I did not have. So I learned which package it is in with a DDG search, then ran the Cygwin setup, found that package, installed it, and voila, there was snmpwalk in my path. And it worked, by the way. Easy peasy.

Creating your own scripts

If you have the funny situation, like me, where you had enough privileges to install Cygwin, perhaps by temporarily assigning your account the Admin role, but when you use it day-to-day, you do not have admin privileges, you will find yourself unable to create files in some of the system directories like /usr/local/bin – permission denied! But in your home directory you will be able to edit files.

So what I did is to create a bin directory under my home directory, where I plan to add my home-grown scripts such as mimeencode, and make sure my PATH includes this directory with a statement like

 export PATH=$PATH:${HOME}/bin

which I put in my .alias file, which in turn I source from .bashrc.

Conclusion
GNU screen for Windows is indeed possible, but you gotta run it on top of Cygwin. It’s of interest that after all these years Cygwin is still viable on Windows 10. Cygwin can be run in a pretty lightweight fashion if you avoid the X-Windows stuff. There are some quirks but it is surprisingly linux-like at the end of the day. I believe it is really suitable as a replacement for a linux jump server. screen, for the uninitiated, is a temrinal multiplexer, which means it makes it very fast for you to switch between multiple terminal windows.

Some things are a bit different.

I think I will use this both at work and at home… Nope! My home PC runs too darn slow to ever use the Cygwin environment. My work laptop has SSD which probably helps keep performance good.

It is possible to set up an ssh default user.

It is possible to create multiple local shells within one screen within one Cygwin terminal.

So it is really possible to have your Linux command line. I use it every day…

If you have access, a look at WSL and native bash might be worthwhile.

References and related

Here’s the GNU Cygwin home page: https://www.cygwin.com/

Install Cygwin by running https://www.cygwin.com/setup-x86_64.exe

Interesting discussion: https://stackoverflow.com/questions/359109/using-the-scrollwheel-in-gnu-screen

If you have a linux jump server that runs screen, or just want to ssh to a linux server, teraterm can be a good choice (as opposed to putty or built-in ssh). These days it can be found here: https://osdn.net/projects/ttssh2/releases/

Categories
Admin

Nmap: Swiss Army Knife of network utilities

Intro
I just wanted to put in a plug for nmap. It’s a very useful tool for any network specialist. I show a use case that came up today.

The details
While cleaning up DNS entries I came across a network segment that didn’t seem to have any active network devices, at least not after I cleaned up the old DNS entries for inactive devices.

So I wanted to see if I could tell the networking tech that this subnet is unused and could be allocated for some other purpose.

I remembered using nmap years ago, and that it was a powerful tool for this kind of thing. What I had in mind was to ping every IP on this segment to see if there were any undocumented hosts.

As it turns out I didn’t even have it installed, but it was very easy to get:

On SLES:
$ zypper install nmap

On CentOS:
$ yum install nmap

It doesn’t get easier than that!

A quick review of the man page showed that what I wanted was indeed possible. Here’s the syntax for a systematic PING sweep through a subnet:

$ nmap −sP 10.101.192.0/24

Starting Nmap 4.75 ( http://nmap.org ) at 2012-11-08 10:21 EST
Host 10.101.192.5 appears to be up.
Host 10.101.192.10 appears to be up.
Host 10.101.192.151 appears to be up.
Host 10.101.192.152 appears to be up.
Host 10.101.192.153 appears to be up.
Nmap done: 256 IP addresses (5 hosts up) scanned in 1.28 seconds

Now I know that subnet has rogue or at least undocumented hosts and is not unused!

The original usage for nmap, at least for me, was to fingerprint an unknown host:

$ nmap −A −T4 ossim.drj.com

Interesting ports on ossim.drj.com (10.22.235.19):
Not shown: 996 filtered ports
PORT    STATE  SERVICE  VERSION
22/tcp  open   ssh       (protocol 2.0)
80/tcp  open   http     Apache httpd
|_ HTML title: 302 Found
443/tcp open   ssl/http Apache httpd
|_ HTML title: Site doesn't have a title.
514/tcp closed shell
1 service unrecognized despite returning data. If you know the service/version, please submit the following fingerprint at http://www.insecure.org/cgi-bin/servicefp-submit.cgi :
SF-Port22-TCP:V=4.75%I=7%D=4/25%Time=51793070%P=x86_64-suse-linux-gnu%r(NU
SF:LL,29,"SSH-2\.0-OpenSSH_5\.5p1\x20Debian-6\+squeeze2\r\n");
Device type: WAP|general purpose|PBX
Running (JUST GUESSING) : Linux 2.6.X|2.4.X (93%), Vodavi embedded (85%)
Aggressive OS guesses: OpenWrt 7.09 (Linux 2.6.22) (93%), OpenWrt 0.9 - 7.09 (Linux 2.4.30 - 2.4.34) (92%), Linux 2.6.20.6 (89%), Linux 2.6.21 (Slackware 12.0) (88%), OpenWrt 7.09 (Linux 2.6.17 - 2.6.21) (88%), Linux 2.6.19 - 2.6.21 (88%), Linux 2.6.22 (Fedora 7) (88%), Vodavi XTS-IP PBX (85%), Linux 2.6.22 (85%)
No exact OS matches for host (test conditions non-ideal).
 
TRACEROUTE (using port 21/tcp)
HOP RTT    ADDRESS
1   0.87   10.202...
2   0.38   ...
3   0.57   ...
4   6.10   ...
5   114.64 ...
6   119.79 ...
7   103.43 ossim.drj.com (10.22.235.19)
 
OS and Service detection performed. Please report any incorrect results at http://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 35.02 seconds

Now that was kind of an unusual example in which nmap wasn’t too sure about the OS. Usually you get a positive ID of some sort. That’s a chatty server and I’m still not sure what it is.

Nmap can be used for nasty things and in an impolite way, network-wise. So be careful to tone it down. Target your hosts and protocols with care. It can guess what OS a host is running, what ports are open, all kinds of amazing stuff.

I checked PING and did not see a built-in capability to do a PING sweep, though it would have been easy enough to script it. That was my backup option.

Once I had to check on a single UDP port being open on port 80 for a webcast client called Kontiki (they call this protocol KDP). No other ports were open, necessitating the -PN switch.

Single UDP port check
$ nmap −PN −sU -p 80 29.239.11.4

Starting Nmap 4.75 ( http://nmap.org ) at 2013-07-23 13:59 EDT
Interesting ports on 29.239.11.4:
PORT   STATE         SERVICE
80/udp open|filtered http
 
Nmap done: 1 IP address (1 host up) scanned in 2.15 seconds

Three TCP ports checked
$ nmap ‐PN ‐sS ‐p 445,28080,28443 12.92.96.37

Results of that scan

Starting Nmap 5.51 ( http://nmap.org ) at 2017-04-13 09:18 EDT
Nmap scan report for 12.92.96.37
Host is up.
PORT      STATE    SERVICE
445/tcp   filtered microsoft-ds
28080/tcp filtered unknown
28443/tcp filtered unknown
 
Nmap done: 1 IP address (1 host up) scanned in 3.06 seconds

filtered” means there were no reply packets to my SYN packets, usually a sign of an intervening firewall dropping packets. I’m not sure why it describes the host as “up” when actually it is down or behind a firewall. A state of closed indicates that a RST packet was received in reply, indicating that the port is closed on the host itself and it wasn’t a firewall that prevented the test from succeeding. the third possible state is open, which of couse means that it replied with a SYN-ACK to that probe on that port.

To fix the source port add a -g to the above command. E.g., some firewalls have trouble with permitting inbound UDP packets from port 53 so to test for that you throw in a -g 53 and try some random high destination port.

I needed to spoof another host’s IP address and send a simple PING (ICMP request) to diagnose what was going wrong with the reply. Here’s how I did that:

$ nmap −PE −e eth0 −S 10.42.48.1 10.1.145.10

But then I realized what I really needed to do to emulate the problem is to send a single TCP SYN packet to port 8081, without the accompanying ICMP probes that nmap is wont to throw in there first. Here’s how I built up that probe:

$ nmap −PN −sS −p 8081 −−max-retries 0 −e eth0 −S 10.42.48.1 10.1.145.10

Check if a web server is running
$ nmap −PN −p T:80,443 drjohnstechtalk.com
This will check both ports 80 and 443. It doesn’t execute any HTTP protocol. It’s just a quick and dirty test.

don’t have nmap but have something like netcat instead? A good tcp port check with netcat is
netcat -vzw5 <host> <port>. Here’s an actual example.

$ netcat ‐vzw5 drjohnstechtalk.com 443

DNS mismatch
drjohnstechtalk.com [50.17.188.196] 443 (https) open

Conclusion
Nmap is a great network tool that every IT network tech should be familiar with.

References and related
A more capable and complicated packet generation tool is scapy. I describe it in this blog post.

A simpler network for Windows (simpler than nmap for Windows) is PortQry. It was created by Microsoft.

Categories
DNS IT Operational Excellence Network Technologies Uncategorized

Google’s DNS Servers Rock!

Intro
DNS is the Domain name Service, the Internet service that converts IP addresses, e.g., 200.54.129.57 into mnemonic names like www.mysite.com.

I tried to run a cache-only DNS server for use by a proxy server. What I found is that certain sites were not accessible on a frequent basis. I think uol.com.br is one of the problem sites (need to check this). It may not mean much to a US audience, but it’s really popular in Brazil!

At some point I happened to learn that Google has a public DNS service. This is worth pondering. No one of any repute has offered a DNS service to that point. There are a host of concerns about security, especially DNS cache poisoning. They blazed a trail, and did it in a way only Google and very few other major infrastructure players could. Not only did they offer a DNS service, they put their DNS servers all over the Internet and created convenient anycast addresses for their servers.

I am no expert on anycast addresses. You can look it up on Wikipedia, however. The essence for my purposes is that with a single IP address you’re going to hit the closest server, network-wise. So no matter where you are some Google DNS server is not far away. Try it. The anycast addresses are 8.8.8.8 and 8.8.4.4. They don’t mind, really! You can ping them. Traceroute to them, whatever. From the Amazon cloud Northeast 8.8.8.8 responds to PINGs in 3.4 ms. That’s really low. Not so low as to make me think they are in the same data center (it is different companies after all), but not far away.

The gold standard for running a DNS service is BIND. I have been running it for many years now and I want to give the Internet Software Consortium their due for providing this wonderful application. Once I got wind of my DNS difficulties as mentioned above, I had to wonder why not everyone else was complaining? They had to be using something else. I ran a flat-out performance test. 5000 queries from an actual proxy log, fed straight to my BIND DNS server, and then to Google’s DNS server 8.8.8.8. I have to dig up the numbers, but Google’s won by quite a bit! This result was actually surprising because you’re always going off-site to the Google DNS server, whereas my server can build up its cache and is right on my network. From where I tested the Google server was about 11 ms away. So 5000 x 11 ms = 55 s. So there is a 55 s handicap from just network considerations alone! Yet it is faster. On the quickest of queries the local server is indeed faster, but what happens is that over the course of real life queries, you always get a few problematic ones which either time out or just seem to take a long time to get back a response. That’s what kills the traditional DNS server and where Google has (obviously) made some optimizations.

And, that’s not all! Google also deals in a more forgiving fashion with broken domain names. I used to get on my high horse and proclaim to others about how broken their DNS servers are – it’s no wonder I can’t resolve their names, which means, by the way, I also cannot get to their web site nor send them email!

It’s effectively like taking yourself off the Internet, or so I thought. Turns out in some cases that’s only true if you’ve constrained yourself to resolving names with BIND. You see, BIND enforces the rules. And I’m a believer in rules. The Internet has about 5,000 technical rules called RFCs. DNS is a topic of many of these rules. The Internet could only have expanded to the size it currently has because all the major players agreed to abide by those rules. What Google has done with their server, in effect, is to say, “Well, if you don’t follow the rules, we’re going to try to work with you anyways.”

Here’s a concrete example. appliedcoatings.org. I guess at some point they’ll actually fix their severely broken DNS, but at the time I write this, August 21, 2011, these comments are valid and their domain is severely broken. In fact, I was amazed that people weren’t jumping up and down screaming at them. I couldn’t even send an email to them. That’s akin to knocking yourself off the Internet, right? Ah, but it all depends on whose DNS servers you are using!

There used to be lots of good free DNS analyzers, like dnsreport.com. You can still find a few around. www.zonecheck.fr, for instance. It shows FAILURE. If it were better written it would show the real problem, which is a lame delegation. But we’re experts, and we don’t need such tools! We will do the queries ourselves and show the lame delegation. We start by learning who are the authoritative nameservers for .ca, the top-level domain used in Canada:

 dig ns ca

; <<>> DiG 9.7.1-P2 <<>> ns ca
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52928
;; flags: qr rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 0, ADDITIONAL: 1

;; QUESTION SECTION:
;ca.                            IN      NS

;; ANSWER SECTION:
ca.                     83585   IN      NS      a.ca-servers.ca.
ca.                     83585   IN      NS      c.ca-servers.ca.
ca.                     83585   IN      NS      e.ca-servers.ca.
ca.                     83585   IN      NS      f.ca-servers.ca.
ca.                     83585   IN      NS      j.ca-servers.ca.
ca.                     83585   IN      NS      k.ca-servers.ca.
ca.                     83585   IN      NS      l.ca-servers.ca.
ca.                     83585   IN      NS      m.ca-servers.ca.
ca.                     83585   IN      NS      z.ca-servers.ca.
ca.                     83585   IN      NS      sns-pb.isc.org.

;; ADDITIONAL SECTION:
a.ca-servers.ca.        83594   IN      A       192.228.27.11

Now we ask one of them about the nameservers for appliedcoatings.ca:

 dig ns appliedcoatings.ca @a.ca-servers.ca.

; <<>> DiG 9.7.1-P2 <<>> ns appliedcoatings.ca @a.ca-servers.ca.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 288
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;appliedcoatings.ca.            IN      NS

;; AUTHORITY SECTION:
appliedcoatings.ca.     86400   IN      NS      sp2.domainpeople.com.
appliedcoatings.ca.     86400   IN      NS      sp1.domainpeople.com.

So far everything's cool. Now, since the authoritative flag (AA) was not present in that response we re-ask that query, but now to one of the nameservers that's supposed to be authoritative for that domain:

dig ns appliedcoatings.ca @sp2.domainpeople.com.

; <<>> DiG 9.7.1-P2 <<>> ns appliedcoatings.ca @sp2.domainpeople.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24373
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;appliedcoatings.ca.            IN      NS

;; ANSWER SECTION:
appliedcoatings.ca.     86400   IN      NS      ns1.domainpeople.com.
appliedcoatings.ca.     86400   IN      NS      ns2.domainpeople.com.

Oh, oh. That's not supposed to happen. We're getting back an entirely different set of nameservers. That's a lame delegation. The domain should be considered completely broken. I think even BIND might be forgiving up to this point. a BIND resolver does these types of quesires to get at the answer. At this point it says, "OK, this is strange, but not necessariily fatal. I will ask my subsequent queries to ns1.domainpeople.com and ns2.domainpeople.com since they are listed as being the nameservers of record.

So now let's get to something useful: looking up the mail exchanger record so we see how to deliver mail to this domain. BIND, which has been fastidiously following the rules, does it as follows:

dig mx appliedcoatings.ca @ns1.domainpeople.com.

; <<>> DiG 9.7.1-P2 <<>> mx appliedcoatings.ca @ns1.domainpeople.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 49996
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;appliedcoatings.ca.            IN      MX

;; Query time: 79 msec
;; SERVER: 204.174.223.72#53(204.174.223.72)
;; WHEN: Sun Aug 21 19:05:43 2011
;; MSG SIZE  rcvd: 36

That's not good. Status is REFUSED. But BIND can even forgive this slight. There is one more nameserver to try after all, right? Last chance query:

dig mx appliedcoatings.ca @ns2.domainpeople.com.

; <<>> DiG 9.7.1-P2 <<>> mx appliedcoatings.ca @ns2.domainpeople.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 44404
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;appliedcoatings.ca.            IN      MX

;; Query time: 72 msec
;; SERVER: 64.40.96.140#53(64.40.96.140)
;; WHEN: Sun Aug 21 19:07:34 2011
;; MSG SIZE  rcvd: 36

Status also REFUSED. Now we are really and truly dead. If you are using a BIND nameserver you have no way to send email to someone@appliedcoatings.ca. But not so with Google!

Of course I don't know how Google wrote their DNS server, but I do think that some of their infrastructure experts write it themselves rather than using open source programs. So with a Google nameserver you will get a response:

dig mx appliedcoatings.ca @8.8.8.8

; <<>> DiG 9.7.1-P2 <<>> mx appliedcoatings.ca @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6901
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;appliedcoatings.ca.            IN      MX

;; ANSWER SECTION:
appliedcoatings.ca.     82805   IN      MX      10 mail.appliedcoatings.ca.

;; Query time: 4 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Aug 21 19:11:14 2011
;; MSG SIZE  rcvd: 57

and just to close the loop and make sure this is a valid host you would do this:

dig mail.appliedcoatings.ca @8.8.8.8

; <<>> DiG 9.7.1-P2 <<>> mail.appliedcoatings.ca @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35190
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;mail.appliedcoatings.ca.       IN      A

;; ANSWER SECTION:
mail.appliedcoatings.ca. 86400  IN      A       66.183.21.181

And we can go the next step and begin an SMTP conversation with that server to make sure it is really operating. After all, if they messed up DNS there's no telling what else they might have gotten wrong.

 telnet  66.183.21.181 25
Trying 66.183.21.181...
Connected to 66.183.21.181.
Escape character is '^]'.
220 mail.appliedcoatings.ca Microsoft ESMTP MAIL Service, Version: 6.0.3790.4675 ready at  Sun, 21 Aug 2011 16:22:04 -0700
HELO localhost
250 mail.appliedcoatings.ca Hello [50.17.188.196]
quit
221 2.0.0 mail.appliedcoatings.ca Service closing transmission channel
Connection closed by foreign host.

Yup. They've got an operating mail server at that IP.

So we can reverse engineer a bit what Google's DNS server must have done behind the scenes to arrive at a valid answer where BIND could not. I'm 100% sure that Google would have also done the query

dig mx appliedcoatings.ca @ns1.domainpeople.com

since that is the right thing to do. But not getting a satisfactory answer (status: REFUSED), what it must do additionally after getting refused a second time by ns2.domainpeople, is to go back to the originally named nameservers sp1 and sp2. Watch what happens in that case:

 dig mx appliedcoatings.ca @sp1.domainpeople.com.

; <<>> DiG 9.7.1-P2 <<>> mx appliedcoatings.ca @sp1.domainpeople.com.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10226
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;appliedcoatings.ca.            IN      MX

;; ANSWER SECTION:
appliedcoatings.ca.     86400   IN      MX      10 mail.appliedcoatings.ca.

;; AUTHORITY SECTION:
appliedcoatings.ca.     86400   IN      NS      ns1.domainpeople.com.
appliedcoatings.ca.     86400   IN      NS      ns2.domainpeople.com.

;; ADDITIONAL SECTION:
mail.appliedcoatings.ca. 86400  IN      A       66.183.21.181

The AA (authoritative) flag is set in the response. So it's a good response, but sent to the "wrong" nameserver. Nevertheless, it is a response and it gets anyone using that nameserver more functionality than someone using BIND.

Conclusion
So far we've got three advantages speaking favorably for Google's DNS server: it's faster, it's answers are more complete and it's universally available. Wait, there's more! Another nice thing is what it does not do. Some ISPs have a "feature" I call DNS clobbering. In fact it's so annoying I will devote a whole blog post to describing it in more detail. Essentially they take license with DNS and make up answers to some queries! It's true and it's truly annoying. Not all ISPs do this but mine certainly does. So the other nice thing about Google DNS is that it does not do DNS clobbering and it's available for you to use it at home and avoid this annoying feature. You just set your DNS servers rather than have them assigned automatically via DHCP.

Other Resources
I should mention that while researching public DNS servers I was also led to commercial versions of the same thing. I went so far as to test the timings on one of those services and found that it is more distant, round-trip-wise, than Google's anycast server. Stands to reason. Google's got the best Internet access of anyone. They're on all the major highways. The commercial offerings have some additional cool features, however. They can serve as URL filter. So if someone puts in a URL which leads to a malicious site, for example, they can respond with an answer that spares you from going to that infected site. This is a little more crude than URL filtering at the proxy level, since a DNS server has no knowledge of the URI whereas a proxy URL filter does, but it could be quite serviceable. I'm not sure it allows you to pick and choose URL categories to block as with a URL filter (gambling, porn, hacking sites, etc.).

A lot more information on using Google DNS is at http://code.google.com/speed/public-dns/docs/using.html.

September 1 Update - a Crack in the Infrastructure
I now have my first case of a domain name which Google DNS did not resolve correctly, and for no apparent reason. The domain name is forums.tweaktown.com. Here's proof of Google's failure, followed immediately by Amazon's DNS servers' success:

dig forums.tweaktown.com @8.8.8.8

; <<>> DiG 9.7.1-P2 <<>> forums.tweaktown.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 15826
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;forums.tweaktown.com.          IN      A

;; AUTHORITY SECTION:
tweaktown.com.          116     IN      SOA     ns21.domaincontrol.com. dns.jomax.net. 2011060602 28800 7200 604800 86400

;; Query time: 4 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Thu Sep  1 14:40:50 2011
;; MSG SIZE  rcvd: 106


 dig forums.tweaktown.com

; <<>> DiG 9.7.1-P2 <<>> forums.tweaktown.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52290
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1

;; QUESTION SECTION:
;forums.tweaktown.com.          IN      A

;; ANSWER SECTION:
forums.tweaktown.com.   1885    IN      A       38.101.21.25

;; AUTHORITY SECTION:
tweaktown.com.          1943    IN      NS      ns22.domaincontrol.com.
tweaktown.com.          1943    IN      NS      ns21.domaincontrol.com.

;; ADDITIONAL SECTION:
ns21.domaincontrol.com. 753     IN      A       216.69.185.11

;; Query time: 0 msec
;; SERVER: 172.16.0.23#53(172.16.0.23)
;; WHEN: Thu Sep  1 14:40:55 2011
;; MSG SIZE  rcvd: 122

All BIND servers I tried during this time returned the correct answer.

Is this an isolated incident or a tip of an iceberg of problems? I hope it is a one-off. I'll post updates as I find out more. I am slightly concerned now.

References and related
I finally wrote my own web interface to DNS and published the code I did it with. Check it out here.

A web interface to Google's public DNS service, which will give you more debug information, is https://dns.google.com/