Categories
Admin Network Technologies

The IT Detecive Agency: FTPs and other things going slow to one part of one location

Intro
What happens when you have a slowly festering problem – slow but tolerable performance? Is it a problem when no one complains but you know the numbers don’t feel right? Read on to see what happens in this exciting adventure.

The details
I always felt that one of my locations, let’s call it Scattering, was just hard-to-explain slow. Web site accesses as measured in HP SiteScope was slower than at another site, lets call it Cooper, and the variation seemed larger. You could always chalk it up to the fact that the SiteScope server was in the data center with the good performance, but there seemed more to it than that.

The situation was, apparently, tolerable and people grew accustomed to it. Until one day someone said that it just wasn’t right. Ping times from his site, let’s call it Vanderbilt, looked like this from his desktop:

C:\Users>ping dmz-host
 
Pinging dmz-host [10.93.23.12] with 32 bytes of data:
Reply from 10.93.23.12: bytes=32 time=80ms TTL=248
Reply from 10.93.23.12: bytes=32 time=107ms TTL=248
Reply from 10.93.23.12: bytes=32 time=74ms TTL=248
Reply from 10.93.23.12: bytes=32 time=78ms TTL=248

And yet, ping times to another piece of equipment in that same physical server room looked much healthier – on order of 22 msec with very little jitter. What the…?

So I started looking around at my equipment. My equipment generally had good response times amongst itself, but when it crossed over to another group’s equipment it started to go up. Finally I saw it – a common denominator. My equipment had its gateway supplied by a different group. PINGing the local gateway gave 15 msec response times, with high jitter! I’ve never seen that before. You normally expect to get response times , 1 msec. Time to turn the problem over to them.

But as a former scientist, I like to have a hypothesis about what might be wrong. I thought a bad cable, because I’ve heard of that. Duplex mis-matches on the port are always a strong possibility. Guess what? It wasn’t any of those things. But it was something about the port on their equipment. What do you suppose it was? Hint: the equipment they were using was leftover stuff that needed to be replaced but hadn’t because it just kept working.

Mystery Resolved
The port was saturated! It was older equipment with 100 mbit ports, the amount of traffic slowly built up on it over time and sort of crept up on us all. I guess finally the demand on it was causing noticeable problems in response times that someone finally complained!

He shifted the gateway to a 1 gbit port and things got much better. Here are those PINGs now:

C:\Users>ping 10.93.23.12
 
Pinging 10.93.23.12 with 32 bytes of data:
Reply from 10.93.23.12: bytes=32 time=24ms TTL=248
Reply from 10.93.23.12: bytes=32 time=23ms TTL=248
Reply from 10.93.23.12: bytes=32 time=23ms TTL=248
Reply from 10.93.23.12: bytes=32 time=23ms TTL=248
 
Ping statistics for 10.93.23.12:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 23ms, Maximum = 24ms, Average = 23ms

That’s much better!

The HP SiteScope timing are more subtle, but no less important. Users don’t tolerate slow-loading web pages, you know!

I need to make an aside here. Is there no free and widely distributed Linux package to calculate the standard deviation of a set of numbers? I can’t believe it. I had to borrow this guy’s shell script, and it’s pretty lousy because it’s really slow: http://panoskrt.wordpress.com/2009/03/10/shell-script-for-standard-deviation-arithmetic-mean-and-median/. Anyways, doing a before and after analysis on my URL timing, where I fetched a Google page plus images during the daytime from 10 AM to 4 PM, I find this before and after result:

Before

Number of data points in “sca” = 144
Arithmetic mean (average) = .630548611
Standard Deviation = .225365467
Median number (middle) = .664500000

After

Number of data points in “sca2” = 144
Arithmetic mean (average) = .349270833
Standard Deviation = .162399076
Median number (middle) = .263000000

Users were very happy after the upgrade.

Conclusion
An old gateway port that maxed out its traffic capacity was to blame for performance problems in one part of a data center. This doesn’t happen too often, but it can happen, obviously.

Linux needs a simple math package to allow to calculate the standard deviation of a set of numbers.

Categories
Admin Internet Mail

SPF – not all it’s cracked up to be

Intro
I recently implemented a strict enforcement of Sender Policy Framework records on a mail server. The amount of false positives wasn’t worth the added benefit.

The details
When I want to learn about SPF I turn to the Wikipedia article. Many secure mail gateways offer the possibility to filter out emails based on the sending MTA being in the list allowed by that domains SPF record in DNS. I began to turn on enforcement domain-by-domain. First for bbb.org since their domain was spoofed by an annoying spam campaign earlier this year. Then fedex.com (another perennial favorite), aexp.com, amazon.com, ups.com, etc.

I was stricter than the standard. FAIL -> reject; SOFTFAIL -> reject! And it seemed to work well. Last week I went whole hog and turned on SPF enforcement for all domains with a defined SPF record, but with slightly relaxed standards. FAIL -> REJECT; SOFTFAIL -> quarantine. While it’s true that it may have helped prevent some new spam campaigns, complaints started mounting. Stuff was going into quarantine that never used to!

I knew SOFTFAIL must be the issue. Most(?) domains seem to have a SOFTFAIL condition. Here is a typical example for the domain communica-usa.com:

$ dig txt communica-usa.com

; <<>> DiG 9.7.3-P3-RedHat-9.7.3-8.P3.el6_2.2 <<>> txt communica-usa.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15711
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
 
;; QUESTION SECTION:
;communica-usa.com.             IN      TXT
 
;; ANSWER SECTION:
communica-usa.com.      600     IN      TXT     "v=spf1 ip4:72.241.62.1/26 ~all"

So the ~all at the end means they have defined a SOFTFAIL that matches “everything else.” And if a domain does have a SOFTFAIL contingency in its SPF record, this is almost always how it is defined. Which, when you think about it, is the same, formally, as not having bothered to define an SPF record at all! Because the proper action to take is NONE.

I reasoned as follows: why bother defining an SPF record unless you know what you’re doing and you feel you actually can identify the MTAs your mail will come from? Except when you’re first starting out and learning the ropes. That’s why I took stronger action than the standard suggested for SOFTFAIL.

A learning experience
Well, it wasn’t immediate, but after a week to ten days the complaints started rolling in. Looking at what had actually been released from quarantine was an eye-opener. It confirmed something I had previously seen, namely that only a minority speak up so the problems brought to my attention amounted to the tip of the iceberg. So the problem of my own creation was quite widespread in actual fact. I investigated a few by hand. Yup, they were sent from IPs not permitted by the narrowly defined part of their SPF record, and they were repeat offenders. Domains such as the aforementioned communica-usa.com, right.com, redphin.com, etc, etc. I haven’t gotten an explanation for a single infringement – finding someone with sufficient knowledge in these small companies to have a discussion with is a real challenge.

Waving the white flag
I had to back down on my quarantining SOFTFAILs in the global SPF setting. Now it’s SOFTFAIL -> PASS. I could have made domain-by-domain exceptions, but when the number of problem domains climbed over a hundred I decided against that reactive approach. I kept the original set of defined domains, ups.com, etc with their more-aggressive settings. These are larger companies, anyways, which either didn’t even have a SOFTFAIL defined, or never needed their SOFTFAIL.

August 2013 update
Well, a year has passed since this experiment – long enough to establish a trend. I was hoping by writing this article to nudge the community in the direction of wider adoption of SPF usage and more importantly, enforcement. Alas, I can now attest that there is no perceptible trend in either direction. How do I know? Because even with our more lax enforcement, we still ran into problems with lots of senders. I.e., they were too, how do I say this politely, confused, and apparently unable to make their strict SPF records actually match their sending hosts! Incredible, but true. If there were widespread or even spotty adoption, even 5 – 10 % of MTAs adopting enforcement of strict SPF records, they would also have refused messages from senders like this and these senders would have been motivated to fix their self-inflicted problems. But, in reality, what I have seen is complete lack of understanding amongst any of the dozen or so companies with this issue and complete inability to resolve it, forcing me to make an exception for their error.

SPF records can be bad for a number of reasons. Some companies provide two of them, and that is a source of problems. Here are a couple examples I had this issue with:

scterm.com

scterm.com.             3600    IN      TXT     "v=spf1 ip4:65.122.37.36 include:spf.protection.outlook.com -all"
scterm.com.             3600    IN      TXT     "MS=ms56827535"

atlantichealth.org

atlantichealth.org.     3600    IN      TXT     "v=spf1 mx ip4:198.140.183.244/32 ip4:198.140.183.245/32 ip4:198.140.184.244/32 ip4:198.140.184.245/32 ip4:207.211.31.0/25 ip4:205.139.110.0/24 ip4:205.139.111.0/24 -all"

So my guesstimate is < 1% adoption of even the most conservative (meaning least likely to reject legitimate emails) SPF enforcement. Too bad about that. Spammers really appreciate it, though. Conclusion
By being forced to back off aggressive SPF settings most domains with SOFTFAIL defined as “~all” – which is most of them – are lost to any enforcement. It’s a shame. SPF sounded so promising to me at first. Spammers are really good at controlling botnets, but terrible at controlling corporate MTAs. But still I do advocate for wider adoption of SPF, without the SOFTFAIL condition, of course!

References
A nice SPF validator is found here. It has no obnoxious ads or spyware.

Categories
Admin Web Site Technologies

unece.org site is busted again – how to access it anyways

Intro
I don’t usually provide tips on productive use of Internet tools, but, since I needed this one myself today, and I wasn’t 100% sure it was going to work, I think this is worth mentioning.

unece.org – completely broken
A not-so-pretty error message gets spit out, including the text:

Configuration Error: 404 page "/webhome/unece.org/www/fileadmin/read.txt" could not be found.

which reveals just a little too much about the hosting environment (i.e., shared) in my opinion.

Who cares? I like this site because it can act as a sort of arbiter in assigning three-letter names to virtually any locale around the world that has more than a few thousand people. Put the UN two-letter country code in front of that and you have a short, unique five-letter reference to any town or city in the world. This is my bookmark: http://www.unece.org/cefact/locode/service/location.htm

But today it’s not working. Maybe they’ll fix it tomorrow. Who knows? But I need a location code today. What to do? Well, it’s clearly a simple-minded site with static content – a perfect candidate for the Wayback machine at archive.org.

Long story short, yes, a July, 2011 site version was archived and is available. The archived link becomes:

http://web.archive.org/web/20110605223242/http://www.unece.org/cefact/locode/service/location.htm

And, cool, we’re in business and able to look up location codes!

Categories
DNS IT Operational Excellence Network Technologies

The IT Detective Agency: Roku player can’t connect to Internet

Intro
I bought a used Roku player, just to try it out. Things didn’t go right at first.

The details
Setting it up, I got to the point where it tests its Internet connection. There are three status lights. The first two were OK, but the Internet connection returned an error, I think error #9. It didn’t matter whether I used a wired or wireless connection.

I have an unusual Internet router at home, a Juniper SSG5. All my other devices off that router were getting to Internet just dandy however. Long story short, it turns out that there was a slight DNS misconfiguration. The Juniper had itself as secondary DNS server. There was no primary DNS server. Why this only affected Roku is still a mystery.

I updated the Juniper config to use 4.2.2.3 as primary DNS server and the Roku connected just fine and it worked wirelessly as well.

Case closed!

Conclusion
Well, as a network type-of-guy, I would have appreciated access to the Roku Linux OS so I could have seen for myself what was going on. I’m sure I would have figured out the problem much sooner than I did. The error message was not particularly helpful and most of the online discussions attribute that problem to other causes than was the DNS issue discovered here. Short of OS access, more robust debugging tools, such as PING, would have been might handy.

Categories
Admin Security Web Site Technologies

DOS Attack from South Korea

Intro
Last week I witnessed a denial-of-service (DOS) attack against a web server. This is disconcerting to say the least. I felt by putting the information out in the open some leverage might be applied to the responsible party for that server or subnet.

The source of the attack was a single IP, 58.180.70.160.

Some Details
The attack occurred July 31st.

It consisted of over 100,000 object accesses.

I do not see evidence of an actual hack.

It lasted about 10 hours, from 12:30 PM EST to 10:57 PM, with some few preliminary accesses starting at 4 AM before the onslaught at Noon.

Clearly some kind of tool was used. It is very aggressive around forms, doing lots of variations in its POST to the same form, over and over. Here’s one small snippet that gives the flavor:

58.180.70.160 - - [31/Jul/2012:22:53:05 -0400] "POST /[omitted]/forgot_password.jsp../../../../../../../../../../etc/passwd/./././././././././././[pattern repeated - total URI is over 4000 characters!]

Then there’s the fishing around for files which I suppose if present may become a vector for attack, like these lines:

58.180.70.160 - - [31/Jul/2012:12:31:47 -0400] "GET /qlc9tjIqjR.cfm HTTP/1.1" 404 292 "-" "Mozilla/4.0 (compa
tible; MSIE 8.0; Windows NT 6.0)" "-" "-"
58.180.70.160 - - [31/Jul/2012:12:31:48 -0400] "GET /_vti_pvt/authors.pwd HTTP/1.1" 404 292 "-" "Mozilla/4.0
(compatible; MSIE 8.0; Windows NT 6.0)" "-" "-"
58.180.70.160 - - [31/Jul/2012:12:31:47 -0400] "GET / HTTP/1.1" 200 9491 "-" "Mozilla/4.0 (compatible; MSIE 8
.0; Windows NT 6.0)" "-" "-"
58.180.70.160 - - [31/Jul/2012:12:31:48 -0400] "GET /crossdomain.xml HTTP/1.1" 404 292 "-" "Mozilla/4.0 (comp
atible; MSIE 8.0; Windows NT 6.0)" "-" "-"
58.180.70.160 - - [31/Jul/2012:12:31:48 -0400] "GET /solr/select/?q=test HTTP/1.1" 404 292 "-" "Mozilla/4.0 (
compatible; MSIE 8.0; Windows NT 6.0)" "-" "-"
58.180.70.160 - - [31/Jul/2012:12:31:47 -0400] "GET /inexistent_file_name.inexistent0123450987.cfm HTTP/1.1"
404 292 "-" "<script>alert(12345)</script>" "-" "-"
58.180.70.160 - - [31/Jul/2012:12:31:47 -0400] "GET http://www.acunetix.wvs HTTP/1.1" 404 292 "-" "Mozilla/4.
0 (compatible; MSIE 8.0; Windows NT 6.0)" "-" "-"
58.180.70.160 - - [31/Jul/2012:12:31:47 -0400] "GET /index HTTP/1.1" 404 292 "-" "Mozilla/4.0 (compatible; MS
IE 8.0; Windows NT 6.0)" "-" "-"
58.180.70.160 - - [31/Jul/2012:12:31:47 -0400] "GET /server-info HTTP/1.1" 404 292 "-" "Mozilla/4.0 (compatib
le; MSIE 8.0; Windows NT 6.0)" "-" "-"
58.180.70.160 - - [31/Jul/2012:12:31:48 -0400] "POST /_vti_bin/shtml.exe?_vti_rpc HTTP/1.1" 405 124 "-" "Mozi
lla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)" "-" "-"

The line mentioning Acunetix is I think a key to understanding this attack. I think it is leaving a cookie crumb trail to let the attacked site know what tool was used: Acunetix WVS Free Edition scanner.

So it’s a semi-legitimate tool, or at least one openly referenced by Google. But in the hands of a malicious or simply ignorant user, such a tool actually does become a DOS weapon. Its aggressive POSTs often have consequences for back-end databases. A medium-duty site is typically not tuned to handle 8,000 POSTS per hour, which is the rate they came in – over two per second. So the site can be tied up servicing these POSTs and not have resources left over to handle legitimate requests.

To get some background on where this originated from and by whom, I did a whois search on ARIN, the main Internet address space registry. This told me that that address space has been delegated to APNIC, the ASIA-Pacific NIC. So I did a Whois search on APNIC. That showed me that in fact that range is in turn delegated to Korea NIC! So it’s off to the KRNIC and their whois… That in turn gave me the most specific information of all:

More specific assignment information is as follows.
 
[ Network Information ]
IPv4 Address       : 58.180.68.0 - 58.180.71.255 (/22)
Network Name       : SHINBIRO-INFRA
Organization Name  : ONSE Telecom
Organization ID    : ORG2324
Address            : 2F Bundang Center, Onse IDC, 235-230, 
Zip Code           : 448-500
Registration Date  : 20081107
Publishes          : Y
 
 See the Contact Info
[ Technical Contact Information ] Name : IP Manager Organization Name : ONSE Telecom Zip Code : 448-500 Phone : +82-2-1666-0120 E-Mail : [email protected]
 
 
- KISA/KRNIC Whois Service -

Obviously it doesn’t get so specific that it tells me who owns the attacking host, but the mentioned subnet, 58.180.68.0/22 is not too big, all things considered, so that’s quite specific compared to where we were when we started our journey. It seems that its administered by ONSE Telecom in South Korea.

I tried to contact them by email – the email bounced. I also wrote to [email protected], which also bounced.

KRNIC is run by KISA, which claims to be an Internet security enterprise (fat chance in my book). I filled out their online contact form – it consistently failed to accept my message, no matter how simple I made it.

At this point I decided that this is too many offenses. The owners of that subnet, 58.180.68.0/22 are not playing by the rules of the Internet and they should be cut off from the Internet until they follow the rules.

So I set up rules to discard all packets from subnet 58.180.68.0/22 and I recommend that others do the same on their web sites.

Amazon Cloud not very sophisticated with regards to security
I initially tried to take my own advice and sought to put the above subnet into a deny security rule on my Amazon cloud service. I work with firewalls and this sort of thing is commonplace and been around for well over 15 years. But you can’t do it on the Amazon Cloud security service! I was shocked. They have so many sophisticated services and have clearly thought out this “cloud thing” better than just about anyone, but they didn’t stop to listen to customer requests on this basic security issue – a allow all but these subnets type of capability.

Conclusion
A DOS was observed emanating from a host in South Korea using Acunetix’s free WVS scanner. The owner of the subnet does not post legitimate contact information and should be banned from Internet connectivity for this egregious behaviour which was perpetrated using its address space.

Categories
Admin Linux SLES

The IT Detective Agency: A couple of our SLES servers are running very slowly

Intro
Sometimes truth is stranger than fiction, even in the IT realm. I actually had a mere supporting role in this case – credit must be given to my persistent and accomplished friends for finding the root cause.

This Unlikely case begins with a serendipitous accident
An incident accidentally gets assigned to me about a couple of application servers running slowly – the echo of command-line typing comes in fits and starts. Well, I quickly decide it’s not my issue and I look around to see who should handle it. Probably the network specialist, right? No other server is having this issue, and the servers with the issue are responding fine to PING on their local segment. But still, it sounds like a network problem. For instance an interface with duplex settings mismatched as compared to the switch port.

But I decide to be nice about it and approach the guy with the problem, “Freg,” and ask him what he thinks should be done with the ticket. He takes the opportunity to show me the problem in person. So I listen to his story and politely look. This took place on July 31st, mind you (yesterday). No one’s using theses servers until they tried today. They are too slow to use now. The last time they were used was about 50 days ago – they were fine then. There are two servers with a similar problem.

And it goes on like that. These are running an ERP app which starts a Java process. Both of these servers are VMs. So lots of facts are being thrown at me. Maybe some are relevant and some not. I have never heard of these servers and am not familiar with the app. No root access is available to us, but we can log on as the same user who runs the ERP app.

I do an uptime. It’s up 54 days. More imporantly, the load average is very high – 25 and pinned there because the 5- and 15-minute average is also around 25. I now feel comfortable explaining why the slow character-typing echo. So it’s not a network problem after all… Talk to a sysadmin is my advice. But he sits there expecting me to do more, to somehow offer something helpful…

So I hem and haw and do essentially the one thing I know how to do in these circumstances: run strace. I get the process number of the offending process and try to get some insight into what it’s spending its time on. I don’t do this very often, and when I do I usually don’t learn very much, but it is another piece of the puzzle: I have learned more than when I started out. Let’s say the process number was 26743, then I ran:

> strace -f -p 26743

and simply watched the output. The output was weird. It was filled with calls to a function I’ve never seen before: futex. In fact it was all futex calls, looping, rapidly, and the same calls, producing timeout errors.

You can look in section 2 of the man pages:

> man -s2 futex

on a Linux system and see for yourself that futex is the Fast userspace locking system call. Don’t ask me. I’d say it’s a kernel thing. But it intuitively doesn’t seem right that a program would be using excessive amounts of CPU doing nothing but this one system call. A more healthy program will be seen making a nice variety of calls, especially and usually TCP-related ones like open, read, socket, gethostbyname, etc.

A popular cause ruled out
I also checked out /etc/resolv.conf. It’s often that a sysadmin messes that file up with an invalid nameserver, or even, just the other day, a nameserver line that omitted the nameserver directive and only contained the IP address of the nameserver! The symptom of that is different. The initial login prompt comes slowly (as it times out doing a reverse lookup on your source IP address), but character echo after that is normal.

The leap second
Other ideas for isolating the problem: reboot, but turn off this process at start-up. I think the reboot did help. But our sysadmin found that in fact this is a known issue on Suse Linux (SLES).

From we learn that a leap second was added June 30th, 2012, and there is a problem associated with it. It “can cause applications that are using FUTEXes to consume 100% of CPU. The issue is present in all Linux kernel versions >= 2.6.22, therefor affecting SLES 11 SP1 and later releases.” Wow!

The remedy in that case is to execute the command

$ date -s “$(LC_ALL=C date)”

to trigger a clock_was_set() system call. We did this and it seems to have fixed our issue.

Case closed.

Conclusion
The best sleuthing involves multiple people looking at things. Sometimes their individual breakthroughs need to be combined. Here the incidental observation of futex calls helped associate a cpu problem to a kernel bug related to a leap second implementation. This also explains why the problem did not exist 50 days ago – that was before June 30th.

Categories
Admin Internet Mail Linux

Yahoo! stopped accepting emails – what we can learn about sendmail from this

Flash Update
I noticed yahoo.com, yahoo.ca and yahoo.com.br had stopped accepting emails today.

I wanted to provide insight into this problem that few others would have, namely, when did the problem first occur?

I looked at my stuck messages in queue. I realized that with sendmail, there is no easy way to answer the question: what is the oldest stuck message for domain XYZ in the queue? So I rigged up something, very crude, but typical for a Unix command line-type of guy.

Namely:

$ grep yahoo */qf*|grep for|cut -d\; -f2|sort|head

The answer? My first stuck message comes from 12:43:46 EST today (July 31st). As of this writing, 3:53 PM, the problem still exists, which already makes it a quite long outage even if it is fixed in the next few minutes. Just minutes later, around 4 PM, I noticed a lot of the messages being delivered, so the problem seems to have finally cleared up.

The expression above works if you are root and the current directory is, e.g., /mqueue, under which you have queue directories q0, q1, …, q9, etc corresponding to an MC statement:

define(`QUEUE_DIR',`/mqueue/q*')dnl

The grep above might miss a few messages, but if you have lots of them as I do, it doesn’t really matter as it’s purpose is just to convey the general idea of when the problem started. In my case it can be safely assumed that I am continuously sending emails to yahoo.com, so it is not possible to have a window that isn’t covered of more than ten minutes or so during the day.

Conclusion
We helped document a complete three-hour outage of Yahoo mail. Along the way we learned of a deficiency in sendmail’s mailq command – how limited its reporting options are. We compensated for that by rolling our own series of commands to answer the question of what is the oldest mail of this type in our queues.

Categories
JavaScript Linux Perl

A simple Perl script to build JavaScript folder objects

Part 2
Intro
This is the 2nd part in a two-part blog where I present a simple example of a JavaScript folder browser. In Part 1 I provided all the JavaScript required. By itself it may have seemed an academic exercise, but once you appreciate that it isn’t hard to write a program which creates the JavaScript objects from your server’s directory structure, well, now you have something that’s pretty powerful and useful.

The details
I considered writing this in Python, which seems to be a more current language, but old habits die hard as they say. I just know Perl too well to suffer the pain of learning all those neat tricks all over again in another language. Maybe someday I’ll re-write it in Python.
Notice the recursion through the directories? I first used that 17 years ago! Why throw out good code?

Here is the code, which I named scan.pl:

#!/usr/bin/perl
# DrJ - 7/2012
# scan picture-containing directories using recursion and build javascript objects from them
use Getopt::Std;
getopts('d:j:');
$homedir = $opt_d;
$jsfile = $opt_j;
usage() if ! $opt_d || ! $opt_j;
$DEBUG = 0;
print "Homedir: $homedir, jsfile: $jsfile\n";
open(JS,">$jsfile") || die "Cannot open JavaScript file: $jsfile!!\n";
$date = `date +%D`;
($homedirnoslash) = $homedir =~ /^\/(.+)/; # assumes leading "/"
 
# print opening of function
print JS qq#function init() {
// Generated data from scan.pl - DrJ $date
folder['browse'] = {path:'',depth:0,kids:['$homedirnoslash']};
#;
 
# get things going with our recursive function
traverse($homedir,0);
 
# closing statement
print JS qq(}\n);  # close of init JavaScript function
close(JS);
 
sub traverse {
my ($dir,$depth) = @_;
my @kids = ();
print "Traverse. dir: $dir\n" if $DEBUG;
opendir(DIR, $dir) || die "Cannot open dir $dir!!\n";
foreach (readdir(DIR)) {
  next if $_ eq '.' || $_ eq '..';
  print "Traverse. file: $_\n" if $DEBUG;
  $path = "$dir/$_";
  if (-d $path) {         # a directory
# we want only the last part of the path
    (my $lastpath) = $path =~ /([^\/]+)$/;
    push(@kids,$lastpath);
    traverse($path,$depth + 1); # recurse!
  } elsif ($_=~/$filespec/) {        #
  }
} # end loop over files in this directory
# write out the JS objects
print JS qq(folder["$dir"] = {path:"$dir",depth:$depth,kids:[);
my $i = 0;
# kids are in jumbled order.  Do regular sort on them.
foreach (sort @kids) {
  $comma = $i++ > 0 ? "," : "";
  print JS qq($comma"$_");
}
# end of object. close it out.
print JS qq(]};\n);
} # end sub traverse
#
sub usage {
  print "usage: $0 -d root_directory -j JavaScript_output_file\n";
  exit(1);
}

It’s pretty self-explanatory. Call it like this example:

> ./scan.pl -d /homepic -j init.js

and it produces an init.js file filled with an init() function and all the necessary folder objects, assuming the top-level folder to browse is /homepic.

My init.js looks like this:

function init() {
// Generated data from scan.pl - DrJ 07/27/12
 
folder['browse'] = {path:'',depth:0,kids:['homepic']};
folder["/homepic/pictures_chronological/2011_06"] = {path:"/homepic/pictures_chronological/2011_06",depth:2,kids:[]};
// lots more lines like this omitted
folder["/homepic"] = {path:"/homepic",depth:0,kids:["kodak_pictures","pictures_chronological"]};
}

And I tested it in browse13.html, which looks just like browse12.html, except I got rid of the init() function and added an include line at the top:

<html>
<head>
<script type="text/javascript" src="init.js"></script>
...

I am a little concerned about performance. This clearly isn’t designed to scale to tens of thousands of directories, but will it be sufficiently fast for my purposes? My init.js is 213 lines and about 25 KB in size. browse13.html which calls it loads fast and runs fast. So, yes, success!

Part 1, A Simple Javascript Folder Browser

Conclusion
We have created a fairly powerful and general-purpose folder browser out of fairly simple usage of JavaScript and Perl. It makes an ideal base upon which to build further.

Categories
Apache JavaScript Perl

A Simple Javascript Folder Browser

Part 1

Intro
I haven’t posted much lately. I’ve been tied up creating this folder browser using client-side JavaScript. I probably made every mistake in the book, but I worked through them all and the outcome is pretty cool, if I say so myself! It works in IE, FireFox and even Blackberry!

The details
I broke down the development of this browse app into 12 stages. Given time, I might show the progression of my thinking as each version becomes closer and closer to fulfilling all initial objectives. But who has time? I’ll show the source for browse3.html, warts and all, and then skip many iterations and jump to showing the final source, browse12.html.

Browse3.html
It ain’t pretty. It isn’t even correct. But it “does stuff.” It assumes Apache web server is running and “borrows” the closed and open folder icons from apache’s /icons directory. In my case I have a top-level directory called /homepic with folders under that and sub-folders under those folders that I want the ability to browse and , ultimately, take some action such as displaying all the folder’s images in the image viewer I wrote earlier.

<!DOCTYPE html>
<html>
<head>
<script type="text/javascript">
// global object
var folder = new Object;
function displayDate()
{
document.getElementById("demo").innerHTML=Date();
}
function init() {
// big initialization - generated code from perl perusal of directories
// see http://www.quirksmode.org/js/associative.html
//  var folder = new Object;
// we need this empty assignment to extend object with subproperties later on
  folder['homepic'] = '/homepic';
  //folder['/homepic'].path = '/homepic';
  folder.homepic.state = 'closed';
  folder[0] = '/homepic/kodak_pictures';
  folder[1] = '/homepic/pictures_chronological';
  folder['/homepic/kodak_pictures'] = '/homepic/kodak_pictures';
  folder['/homepic/kodak_pictures'].state = 'closed';
  //folder['/homepic/kodak_pictures'].path = '/homepic/kodak_pictures';
  folder['/homepic/pictures_chronological'] = '/homepic/pictures_chronological';
  folder['/homepic/pictures_chronological'].state = 'closed';
  //folder['/homepic/pictures_chronological'].path = '/homepic/pictures_chronological';
  var child = 'homepic';
  var cstate = folder['homepic'].state;
  var cpath = folder[0];
}
function browse(f) {
document.getElementById("demo").innerHTML="folder: "+f+" ";
var icon;
var imgid;
var fname;
if (f == "browse") {
// see http://www.quirksmode.org/dom/intro.html#create
// one-time initialization
  init();
  document.getElementById("browse").innerHTML='';
  icon = document.createElement('IMG');
  icon.src = "/icons/folder.gif";
  icon.id = "icon-/homepic";
  icon.onclick = Function('browse("/homepic")');
  document.getElementById("browse").appendChild(icon);
  var divfolder = document.createElement('DIV');
  divfolder.id = "/homepic";
  document.getElementById("browse").appendChild(divfolder);
  var x = document.createTextNode('homepic');
  document.getElementById('/homepic').appendChild(x);
  //document.getElementById("browse").innerHTML='<img id="homepic" onclick="browse(\'homepic\')" src="/icons/folder.gif">ho
mepic<br>';
} else {
  imgid = 'icon-' + f;
// change img to one of open folder
  document.getElementById(imgid).src = "/icons/folder.open.gif";
  // nope? imgid.src  = "/icons/folder.open.gif";
  for (i=0;i<3;i++) {
    icon = document.createElement('IMG');
    icon.src = "/icons/folder.gif";
    fname = folder[f].children[i];
    //fname = folder['homepic'].children[i];
    icon.id = "icon-" + fname;
    icon.onclick = Function('browse('+fname+')');
    document.getElementById(f).appendChild(icon);
    var divfolder = document.createElement('DIV');
    divfolder.id = fname;
    document.getElementById(f).appendChild(divfolder);
    var x = document.createTextNode(fname);
    document.getElementById(f).appendChild(x);
  } // end loop over children
}
}
// note variable # or arguments are being passed
function  openfolder()
{
var f = arguments[0];
var dir = "/icons/";
var fopen = "folder.open.gif";
var fclosed = "folder.gif";
var folder = document.getElementById(f);
var ftype = folder.src;
// src includes http... Get rid of stuff in front
var patt = /.*(folder.+)/;
var ftypebare = ftype.replace(patt,"$1");
// for debugging
document.getElementById("demo").innerHTML="folder: "+f+" "+ftypebare;
if (ftypebare == fopen) {
// close folder and remove sub-folders
  folder.src = dir+fclosed;
  document.getElementById("homepic/pictures_chronological").innerHTML='';
} else {
// open up folder and reveal sub-folders
  folder.src = dir+fopen;
  for (var i = 0; i < arguments.length; i++) {
    var placeid = arguments[i];
    var srcid = '/' + placeid;
    document.getElementById(placeid).innerHTML='&nbsp;&nbsp;&nbsp;<img id="' + srcid + '" onclick="openfolder(\'pictures_ch
ronological\')" src="/icons/folder.gif"/> pictures_chronological<br>';
  }
}
 
}
</script>
</head>
<body>
 
<h1>Folder Browser</h1>
<p id="demo">Debug Aid.</p>
<p id="browse"><a href="#" onclick='browse("browse")'>Folder Browser</a></p>
 
<img id="homepic" onclick="openfolder('homepic','homepic/pictures_chronological','homepic/canon_pictures')" src="/icons/fol
der.gif"/> homepic<br>
<div id="homepic/pictures_chronological"></div>
<div id="homepic/canon_pictures"></div>
<img id="cfolder2" onclick="openfolder(2)" src="/icons/folder.gif"/><br>
<img id="cfolder3" onclick="openfolder(3)" src="/icons/folder.gif"/><br>
<img id="cfolder4" onclick="openfolder(4)" src="/icons/folder.gif"/><br>
 
</body>
</html>

So you see in browse3 I’m wrestling with how to work with JavaScript Objects (which I didn’t really know existed at that time). I badly wanted to give an associative array properties, as in the initialization line

folder['/homepic/kodak_pictures'].state = 'closed';

but I couldn’t find any examples on the Internet. I eventually learned that wasn’t a correct assignment.

So one of my biggest and most worthwhile lessons was to gain a decent understanding of JavaScript objects, which hold multiple values, and object properties.

I tried to use Firebug for Firefox, with very limited success, but at least I could step through the Javascript code and see which branch in a conditional was being executed compared to what I thought should be executed, which tipped me off to one vexing problem concerning opening and closing folders multiple times. Also just looking up the error in Internet Explorer by double-=clicking the warning sign in the corner was tremendously helpful.

So…skipping for now versions 4 – 11, we arrive at browse12.html:

Browse12.html

<!DOCTYPE html>
<html>
<head>
<script type="text/javascript">
// global object
var folder = new Object;
function displayDate()
{
document.getElementById("demo").innerHTML=Date();
}
function init() {
// big initialization - generated code from perl perusal of directories
// I think my big breakthrough was to learn about Javascript objects and their properties
// from the book Javascript The Definitive Guide, 5th edition by D.Flanagan
// Some additional inspiration came from http://www.quirksmode.org/dom/intro.html#create
// folder is an associative array with properties path, state,depth and kids (which is itself an array)
// first entry is initial top-level to get things started
  folder['browse'] = {path:'',depth:0,kids:['homepic']};
// regular entries:
  folder['/homepic'] = {path:'/homepic',depth:1,kids:['kodak_pictures','pictures_chronological']};
// values for sub-folders
// /homepic/kodak_pictures
  folder['/homepic/kodak_pictures'] = {path:'/homepic/kodak_pictures',depth:2,kids:['2002_05','2002_06']};
// /homepic/pictures_chronological
  folder['/homepic/pictures_chronological'] = {path:'/homepic/pictures_chronological',depth:2,kids:['2011_11','2011_12']};
}
function browse(f) {
document.getElementById("demo").innerHTML="folder: "+f+" ";
var icon;
var imgid;
var fname;
var table;
if (f == "browse") {
// one-time initialization
  init();
}
 
imgid = 'icon-' + f;
if (!folder[f]) {
// folder not defined: must be a terminal folder. Do something here eventually
// but for now do nothing whatsoever
} else {
  if (folder[f].state === undefined || folder[f].state == 'closed') {
  // change img to one of open folder
    if (f == 'browse') {
      document.getElementById(f).innerHTML='';
    } else {
      document.getElementById(imgid).src = "/icons/folder.open.gif";
    }
    var kids = folder[f].kids;
    for(var i=0;  i < kids.length; i++) {
      table = document.createElement("table");
      table.border = 0;
      var row = document.createElement("tr");
      var td1 = document.createElement("td");
      var td2 = document.createElement("td");
      var td3 = document.createElement("td");
      icon = document.createElement("img");
      icon.src = "/icons/folder.gif";
      if (f == "browse") {
        fname = "/" + kids[i];
      } else {
        fname = f + "/" + kids[i];
      }
      icon.id = "icon-" + fname;
      if (folder[fname])
        folder[fname].state = 'closed';
      icon.onclick = Function('browse("'+fname+'")');
      td1.width = 27 + folder[folder].depth*27;
      td1.align = "right";
      td1.appendChild(icon);
      var node = document.createTextNode(kids[i]);
      td3.appendChild(node);
      row.appendChild(td1); row.appendChild(td2); row.appendChild(td3);
      table.appendChild(row);
      document.getElementById(f).appendChild(table);
      var divfolder = document.createElement("div");
      divfolder.id = fname;
      document.getElementById(f).appendChild(divfolder);
      folder[f].state = 'open';
    } // end loop over children
  } else {
  // set folder to closed state
  // this innerHTML nullification is kind of a kludge, but it works
    document.getElementById(f).innerHTML='';
    document.getElementById(imgid).src = "/icons/folder.gif";
    folder[f].state = 'closed';
  } // end conditional over folder state
} // end condition over whether folder is defined or not
} // end function browse
 
</script>
</head>
<body>
 
<h1>Folder Browser</h1>
<p id="demo">Debug Aid.</p>
<p id="browse"><a href="#" onclick='browse("browse")'>Folder Browser</a></p>
</body>
</html>

That’s it! Not bad, huh? I pan to generate the init() function periodically and automatically from a Perl script which peruses my directories.

I could have used Ajax and generated the subfolder information on the fly as it is needed – not that I know how, I just know enough to know it is possible and therefore I could do it – but I thought this method of pre-loading all the information might be a little more efficient. If this were a folder and file browser it would be different, but for now it is just a folder browser.

So the main revelation is that I had to set my associative array members to be objects during initialization, as in

folder['/homepic'] = {path:'/homepic',depth:1,kids:['kodak_pictures','pictures_chronological']};

and one of the object values is itself an anonymous array that holds the sub-folders.

Buying an actual book was probably a good move. I went with JavaScript, The Definitive Guide, 5th edition. Note that this is not the latest edition – the 6th – but this way I could buy the book used for a lot less and not get myself further confused by HTML5, which I am not ready to tackle and which many browsers do not yet fully support. The book is pretty heavy going and the discussion of DOM was particularly difficult and the examples too few and too removed from the real world. But the Core Javascript discussion made a lot of sense to me so was by itself worth the purchase price.

In my next post I’ve posted the Perl script which can generate the folder object initialization.

Part 2, A simple Perl script to build JavaScript folder objects

Conclusion
In this part one of a two-part post I’ve provided the JavaScript that implements a very compact folder browser. It has been tested on both IE and Firefox. The 2nd part of this series will provide the Perl Jvascript code generator for automation of the object creation.

Categories
Admin Network Technologies

The IT Detective Agency: Two of our sites got cut off!

Intro
I sometimes consult for the networking group of a large company. This incident really happened. I don’t know that it could ever happen again to anyone else, but it’s so bizarre that I just had to document it as an example of “you wouldn’t believe it unless you had actually been through it yourself.”

Let’s get into it
This company has lots of small and mid-sized offices connected via MPLS. WAN services are provided by a single telecom throughout the country. I feel obliged to not divulge specifics here. Let’s call the telecom “OE” as in over-extended.

So just before lunch yesterday they tell me that no PCs at one of their sites can access Internet, and this information is coming from a very reliable source. It also comes out that a second site is similarly affected. It kind of sounds like a WAN problem, but no other sites are affected. In the old days you’d almost certainly know to look at the WAN, but these days it’s a little more complicated. Everyone’s PC is in AD and they have the ability to push a GPO to all PCs at a site, so you just never know if the desktop group wasn’t involved in messing them up.

So they tell me they can PING their corporate Intranet server. Fine. But they cannot telnet to port 80. Newsflash. How did they get telnet enabled in Windows 7? I mentally stored this question for my continuing education. Crises are also great learning/teaching moments if you are of that frame-of-mind!

Ping is good. Of course I test the Intranet server myself, iwww.intranet. I can reach port 80 just fine. It happens that the front-end for iwww.intranet is a load balancer. I decide to do a trace using tcpdump. I’m not sure what I’ll find, but taking a trace is sort of a gut reaction in these cases. There’s lots of other traffic so we have to use a filter to see the tree in the forest. They give me the IP of the PC they’re testing from. My expression is something like this:

> tcpdump -i 0.0 host WKSTATION.AD.INTRANE

The 0.0 on this particular device is its way of saying use any and all interfaces.

Here’s what the output looks like:

11:51:03.852511 IP WKSTATION.AD.INTRANET > iwww.intranet: ESP(spi=0xd7ca145d,seq=0x3de0a5), length 100
11:51:05.855446 IP WKSTATION.AD.INTRANET > iwww.intranet: ESP(spi=0xd7ca145d,seq=0x3de101), length 100
11:51:06.187940 IP WKSTATION.AD.INTRANET > iwww.intranet: ESP(spi=0xd7ca145d,seq=0x3de10b), length 100
11:51:08.857957 IP WKSTATION.AD.INTRANET > iwww.intranet: ESP(spi=0xd7ca145d,seq=0x3de16d), length 100
11:51:09.184072 IP WKSTATION.AD.INTRANET > iwww.intranet: ESP(spi=0xd7ca145d,seq=0x3de179), length 100
11:51:09.858865 IP WKSTATION.AD.INTRANET > iwww.intranet: ESP(spi=0xd7ca145d,seq=0x3de198), length 100
11:51:14.855327 IP WKSTATION.AD.INTRANET > iwww.intranet: ESP(spi=0xd7ca145d,seq=0x3de272), length 100
11:51:15.183349 IP WKSTATION.AD.INTRANET > iwww.intranet: ESP(spi=0xd7ca145d,seq=0x3de27f), length 100
11:52:19.898380 IP WKSTATION.AD.INTRANET > iwww.intranet: ESP(spi=0xd7ca145d,seq=0x3decab), length 100
11:52:22.901684 IP WKSTATION.AD.INTRANET > iwww.intranet: ESP(spi=0xd7ca145d,seq=0x3ded28), length 100

I had never seen that before. I loop up ESP and see it is related to IP protocol 50 which in turn is used in IPSEC for VPN connections.

What the…?

Yeah. Let’s review. He’s sending TCP packets to port 80 of iwww.intranet and all I’m seeing are these ESP packets, which isn’t even TCP. Of course the load balancer has no idea whatsoever what to do with those packets and simply does not respond to them.

What would you do? I felt the nail in the coffin would be to take a trace on the PC itself – see how the packets are when they’re coming straight out of the PC. But to be honest they never did make that trace. They didn’t have Wireshark installed so it would take awhile.

Meanwhile, the infrastructure folks are talking to each other and someone mentions the OE has a certain project “runVPN” that thay’re rolling out. Now that sounds suspicous. In the imperfect world you have to work with what you have, not what you’d like to have. Based on our experiences and educated hunches, we now feel pretty certain it’s gotta be a WAN problem caused by OE.

Within an hour OE confirms the problem is of their creation, and they have it fixed. They are very unhappy with the tech who caused it.

Conclusion
Sometimes things are what they appear to be. If you notice I didn’t do much to really help with the issue, and that’s all just about anyone can do when so much is outsourced. They did feel that my trace helped to convince the telecom that this really was their issue.

I guess they were encrypting WAN traffic on one end, but forgot to decrypt it on the other end. One of the strangest things to have on a production network.

Turning on telnet in Windows 7
Did I mention that they tested with telnet on Windows 7? They later explained how to enable it. Go to control panel / Programs / Programs and Features / Turn Windows features on. There is an option for Telnet client. Reboot (yes, you really need to reboot for this to take effect) and you’re good to go.