Categories
Admin

SiteScope keeps restarting

Intro
I’m just documenting what the support tech had me do to fix this scary issue.

The details
This was a SiteScope v 11.24 instance running on a RHEL 6.6 VM.

2015-12-08 05:12:56,768 [SiteScope Main Thread] (SiteScopeSupport.java:721) ERROR - SiteScope unexpected shutdown
java.lang.reflect.UndeclaredThrowableException
        at com.sun.proxy.$Proxy106.postInit(Unknown Source)
        at com.mercury.sitescope.platform.configmanager.ConfigManager$PersistencyAdaptor.initPersistObjectsAfterLoad(ConfigManager.java:1967)
        at com.mercury.sitescope.platform.configmanager.ConfigManager$PersistencyAdaptor.initialize(ConfigManager.java:1247)
        at com.mercury.sitescope.platform.configmanager.ConfigManager$PersistencyAdaptor.access$300(ConfigManager.java:1112)
        at com.mercury.sitescope.platform.configmanager.ConfigManager.initialize(ConfigManager.java:145)
        at com.mercury.sitescope.platform.configmanager.ConfigManagerSession.initialize(ConfigManagerSession.java:153)
        at com.mercury.sitescope.bootstrap.SiteScopeSupport.initializeSiteScope(SiteScopeSupport.java:592)
        at com.mercury.sitescope.bootstrap.SiteScopeSupport.configureSiteScope(SiteScopeSupport.java:629)
        at com.mercury.sitescope.bootstrap.SiteScopeSupport.siteScopeMain(SiteScopeSupport.java:678)
        at com.mercury.sitescope.web.servlet.InitSiteScope$SiteScopeMainThread.run(InitSiteScope.java:233)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor109.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.mercury.sitescope.platform.configmanager.ManagedObjectConfigRef$ManagedObjectProxyHandler.invoke(ManagedObjectConfigRef.java:290)
        ... 10 more
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
        at com.mercury.sitescope.entities.monitors.MonitorGroup.readDynamic(MonitorGroup.java:445)
        at com.mercury.sitescope.entities.monitors.MonitorGroup.postInit(MonitorGroup.java:2001)
        ... 14 more
2015-12-08 05:12:56,776 [SiteScope Main Thread] (SiteScopeShutdown.java:51) INFO  - Shutting down SiteScope reason Exception: java.lang.reflect.UndeclaredThrowableException null...
2015-12-08 05:12:56,784 [ShutdownThread at Tue Dec 08 05:12:56 EST 2015] (SiteScopeGroup.java:1061) INFO  - Stopping dynamic counters flow...
2015-12-08 05:12:56,832 [ShutdownThread at Tue Dec 08 05:12:56 EST 2015] (SiteScopeGroup.java:1112) INFO  - Waiting 40 secs to allow monitors to complete.
2015-12-08 05:13:36,854 [ShutdownThread at Tue Dec 08 05:12:56 EST 2015] (SiteScopeGroup.java:1280) INFO  - Average Monitors Running: 0
2015-12-08 05:13:36,855 [ShutdownThread at Tue Dec 08 05:12:56 EST 2015] (SiteScopeGroup.java:1281) INFO  - Peak Monitors Per Minute: 0 at 7:00 pm 12/31/69
2015-12-08 05:13:36,855 [ShutdownThread at Tue Dec 08 05:12:56 EST 2015] (SiteScopeGroup.java:1283) INFO  - Peak Monitors Running: 0.0 at 7:00 pm 12/31/69
2015-12-08 05:13:36,855 [ShutdownThread at Tue Dec 08 05:12:56 EST 2015] (SiteScopeGroup.java:1284) INFO  - Peak Monitors Waiting: 0 at 7:00 pm 12/31/69

And this just kept happening and happening.

The solution
The support tech from HPE had me go in the groups directory and delete all files except those ending in .dyn and .config. Those directories are in the /opt/HP/SiteScope directory on my installation.

In the persistency directory we deleted all files ending in .tmp. But we made saved copies of the entire original groups and persistency directories elsewhere just in case.

The results
HP siteScope started just fine after that! A healthy siteScope startup includes lines like these:

2015-12-08 08:59:12,149 [SiteScope Main] (SiteScopeGroup.java:995) INFO  - Open your web browser to:
2015-12-08 08:59:12,149 [SiteScope Main] (SiteScopeGroup.java:996) INFO  -   http://10.192.136.89:8080
2015-12-08 08:59:12,180 [SiteScope Main] (SiteScopeGroup.java:324) INFO  - Starting common scheduler...
2015-12-08 08:59:12,273 [SiteScope Main] (SiteScopeGroup.java:344) INFO  - Starting maintenance scheduler...
2015-12-08 08:59:12,381 [SiteScope Main] (SiteScopeGroup.java:609) INFO  - Starting topaz manager
2015-12-08 08:59:12,384 [SiteScope Main] (SiteScopeGroup.java:611) INFO  - Topaz manager started.
2015-12-08 08:59:12,386 [SiteScope Main] (SiteScopeGroup.java:457) INFO  - Starting monitor scheduler...
2015-12-08 08:59:12,386 [SiteScope Main] (SiteScopeGroup.java:462) INFO  - Starting report scheduler...
2015-12-08 08:59:12,398 [SiteScope Main] (SiteScopeGroup.java:477) INFO  - Starting analytics scheduler...
2015-12-08 08:59:12,398 [SiteScope Main] (SiteScopeGroup.java:488) INFO  -
2015-12-08 08:59:12,398 [SiteScope Main] (SiteScopeGroup.java:489) INFO  - SiteScope is active...
2015-12-08 08:59:12,493 [SiteScope Main] (SiteScopeGroup.java:513) INFO  - SiteScope Starting all monitors
2015-12-08 08:59:12,830 [SiteScope Main] (SiteScopeGroup.java:517) INFO  - SiteScope Start monitors completed
2015-12-08 08:59:13,037 [SiteScope Main] (SiteScopeGroup.java:532) INFO  - SiteScope Startup Completed
2015-12-08 08:59:13,215 [SiteScope Main] (SiteScopeSupport.java:713) INFO  - SiteScope 11.24.241  build 165 process started at Tue Dec 08 08:59:13 EST 2015
2015-12-08 08:59:13,215 [SiteScope Main] (SiteScopeSupport.java:714) INFO  - SiteScope Start took 26 sec

Especially that last line.

The hypothesis
As the server had crashed the hypothesis is that one of the files must have gotten corrupted.

Conclusion
HP SiteScope which could not start was fixed by removing some older files. These files are not needed, either – your configuration will not be lost when you delete them.

Categories
Admin Linux

Upgrading your JDBC driver for all you HP SiteScope fans

Intro
HP SiteScope is a pretty good and not overly pricey infrastructure monitoring solution. We’ve used it for years. An unexpected Oracle error sent us scrambling to remember how the heck we installed an Oracle JDBC driver on HP SiteScope the last time we did it, which was eons ago. As with many very specific yet important things on the Internet, the documentation available on the Internet was pretty spotty. Here is my attempt to remedy that. These instructions are for Redhat Linux, though I would think similar considerations would apply to the Windows version.

The details
Well all our Oracle database monitors were working just fine for years. So when asked to monitor a new database we simply copied one of the old ones and appropriately changed the connect string. But a strange thing happened. We got this error:

ORA-28040

ORA-28040: No matching authentication protocol

So we spoke with a DBA. This new database, being newer, was running a much more current version, Oracle 12C. I became convinced that our several-years-old JDBC driver for SiteScope simply wasn’t compatible. The DBA searched the oracle site and found supporting evidence for that hypothesis. So how to upgrade?

The latest JDBC Drivers can be found here on Oracle’s Website. We selected JDBC Driver 12c Release 1 (12.1.0.2) and downloaded the ojdbc7.jar file.

The thing is that to download it you need some kind of Oracle developer account. Fortunately I had one from years back and it still worked. So we were able to download it.

Where does it go?
The other breakthrough I had was simply to remember after thinking about it what the old jdbc driver was called. Its name wasn’t anything like ojdbc.jar. No, it was classes12.jar!

Of course memories can be tricked. To confirm that that jar file looked basically right we did a

$ jar tvf classes12.jar

Sure enough, there were a bunch of lines for oracle/jdbc/blah, blah. Then out of curiosity I tried to check the actual classpath of the SiteScope process with something like this:

$ ps -ef|grep java|grep classes12

and sure enough, it highlighted a java process – clearly belonging to HP SiteScope – and the classes12.jar therein.

So memory confirmed.

Speculative next steps
This part is speculative and may not be necessary though it doesn’t seem to hurt anything. I wanted to maximize my chance of success the first time, rather than stopping/starting HP SiteScope multiple times, right? So I didn’t see a quick way to tell HP SiteScope that, hey, the new driver to use is ojdbc7.jar, not classes12.jar so I tried to force its hand. We moved the classes12.jar file out of its directory:

$ cd /opt/SiteScope/WEB-INF/lib; mv classes12.jar /tmp

and put the new jdbc file in that directory, and made a sym link from the old driver to the new one for good measure!

$ ln -s ojdbc7.jar classes12.jar

We tested if we could get away without stopping/restarting HP SiteScope. Nope. It didn’t pick up the new driver. So we were a little nervous. So we did the stop/start thing:

$ service hpss stop; service hpss start

It takes awhile, but…

Yes, the new monitor began working! Of course we were worried a bit about backwards compatibility between the 12C driver and the older version 11 databases, but those continued to work as well.

Conclusion
Installing a recent JDBC driver fixes the ORA-28040 error for our HP SiteScope installation. Was that sym link really necessary? I don’t know for sure, but I see that the java process still has classes12.jar in its path. It does not have ojdbc7.jar! There’s probably a way to modify the classpath, but I don’t know it. So in my case I’d be inclined to say Yes it was.

References and related articles
Oracle’s version 12C JDBC driver.
I rail against HP’s bureaucratic ways in this older posting.
My last HP SiteScope upgrade is documented here.

Categories
Admin Linux

Things that went wrong during the HP SiteScope upgrade

Intro
These are my notes of all the stupid things I did during my attempt to upgrade to HP SiteScope version 11.24 in response to a security problem in earlier releases. When you don’t do these things very often you totally forget what you did last time!

The details
I managed to download the “patch,” SIS_00314, from the HP Passport site. That was relatively straightforward.

First mistake: winging it
After baking up my configuration I looked into the zip file and saw an rpm that seemed like the thing I needed: packages/HPSiS1124Core-11.24.241-Linux2.4.rpm. So I decided to just install it directly. That created deep directories like this one:

/opt/HP/SiteScope/installation/HPSiS1124/flvr/SiteScope

and it really looked like it wasn’t going to do anything. And when I tried that with the Java package all kinds of dependent libraries were not found. So I removed that package and went back to the manual.

So I identified the included deployment guide and skimmed through that for inspiration. It seems you should drive the installation through the file HPSiS1124_11.24_setup.bin. So I tried it. First I had to set up my X-Windows stuff:

# vncserver :2
# export DISPLAY=:2.0
(then connect to that display using VNC client)
# xhost +
and to test it (the following command should pop up a new window):
# gnome-terminal

Then finally:

# ./HPSiS1124_11.24_setup.bin

Preparing to install...
Extracting the JRE from the installer archive...
Unpacking the JRE...
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...
 (i) Checking display ...
 (-) Display may not be properly configured
Please make sure the display is set properly...

I don’t know what went wrong, but I knew a GUI install was out of the question without a lot of digging. I also knew about silent or console installs. So I looked to that part of the deployment manual.

# ./HPSiS1124_11.24_setup.bin -i silent

Preparing to install...
Extracting the JRE from the installer archive...
Unpacking the JRE...
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...
Preparing SILENT Mode Installation...
 
===============================================================================
                                       (created with InstallAnywhere by Zero G)
-------------------------------------------------------------------------------
 
=======================================================
 
Installer User Interface Mode Not Supported
 
Unable to load and to prepare the installer in console or silent mode.
 
=======================================================

I learned this happens if you run silent mode but have a DISPLAY environment variable set! Just unset DISPLAY and re-run.

But that didn’t work out for me either. In /tmp/HPOvInstaller/HPSiS1124_11.24 the file HPSiS1124_11.24_2014.08.11_15_39_HPOvInstallerLog.txt showed:


2014-08-11 15:39:40,065 INFO – Checking free disk space… [FAILED]

However since it was a silent installation it never complained! I only knew there was a problem when I restarted SiteScope and saw it was still showing me version 11.23.

I cleared some more space and the install finally completed successfully.

Conclusion
We ran into quite a few problems doing a simple minor HP SiteScope upgrade, most of our own making. But we persevered and are now running version 11.24.

References
Yearning for the old Freshwater SiteScope? You’ll want to read this blog posting and comments.
The Secunia advisory.

Categories
Admin Network Technologies

DIY Home Power Monitoring Solution – Perfect for Sandy

Intro
My recent experience losing power thanks to Sandy has gotten me thinking. How can I know when my power’s on? Or what if it gets shut off again, which by the way actually happened to me? I realized that I have all the pieces in place already and merely need to take advantage of the infrastructure already out there.

The details
I started with:

  • a working smartphone
  • an enterprise monitoring system
  • a SoHo router in my home

And that’s pretty much it!

The smartphone doesn’t really matter, as long as it receives emails. I guess a plain old cellphone would work just as well – the messages can be sent as text messages.

For enterprise monitoring I like HP SiteScope because it’s more economical than hard-core systems. I wrote a little about it in a previous post. Nagios is also commonly used and it’s open source, meaning free. Avoid Zabbix at all costs. Editor’s note: OK. I’ve changed my mind about Zabbix some seven years later. It’s still confusing as heck, but now I’m using it and I have to admit it is powerful. See this write-up.

A good SoHo router is Juniper SSG5. It extends the enterprise LAN into the home. You can carve up a Juniper router like that and provide a Home network, Work network, Home wireless and Work wireless. It’s great!

The last requirement is the key. The SoHo router at my house is always on, and so the enterprise LAN is always available, as long as I have power. Get it? I defined a simple PING monitor in SiteScope to ping my SoHo router’s WAN interface, which has a static private IP address on the enterprise LAN. If I can’t ping it, I’ve lost power, and use my monitoring system to send an email alert to my smartphone. When, or in the case of Sandy’s interminable outage, if, power ever comes back on, I send another alert letting me know that as well. If you’re not using SiteScope make sure you send several PINGs. A PING can be lost here or there for various reasons. siteScope sends out five PINGs at a regular interval, as a guideline.

Alternative
Of course if I had a business-class DSL or cable modem service with a dedicated IP I could have just PINGed that, but I don’t. With regaulr consumer grade service your IP can and will change from time to time, and using a dynamic DNS protocol (like dyndns) to mask that problem is a bit tricky.

Yes, if you call the power company they offer to call you back when your power is restored, but I like my monitor better. It tells me when things go off as well as on.

Conclusion
This is my necessity-is-the-mother-of-invention moment. Thank you, Sandy!

Categories
Admin Network Technologies

The IT Detecive Agency: FTPs and other things going slow to one part of one location

Intro
What happens when you have a slowly festering problem – slow but tolerable performance? Is it a problem when no one complains but you know the numbers don’t feel right? Read on to see what happens in this exciting adventure.

The details
I always felt that one of my locations, let’s call it Scattering, was just hard-to-explain slow. Web site accesses as measured in HP SiteScope was slower than at another site, lets call it Cooper, and the variation seemed larger. You could always chalk it up to the fact that the SiteScope server was in the data center with the good performance, but there seemed more to it than that.

The situation was, apparently, tolerable and people grew accustomed to it. Until one day someone said that it just wasn’t right. Ping times from his site, let’s call it Vanderbilt, looked like this from his desktop:

C:\Users>ping dmz-host
 
Pinging dmz-host [10.93.23.12] with 32 bytes of data:
Reply from 10.93.23.12: bytes=32 time=80ms TTL=248
Reply from 10.93.23.12: bytes=32 time=107ms TTL=248
Reply from 10.93.23.12: bytes=32 time=74ms TTL=248
Reply from 10.93.23.12: bytes=32 time=78ms TTL=248

And yet, ping times to another piece of equipment in that same physical server room looked much healthier – on order of 22 msec with very little jitter. What the…?

So I started looking around at my equipment. My equipment generally had good response times amongst itself, but when it crossed over to another group’s equipment it started to go up. Finally I saw it – a common denominator. My equipment had its gateway supplied by a different group. PINGing the local gateway gave 15 msec response times, with high jitter! I’ve never seen that before. You normally expect to get response times , 1 msec. Time to turn the problem over to them.

But as a former scientist, I like to have a hypothesis about what might be wrong. I thought a bad cable, because I’ve heard of that. Duplex mis-matches on the port are always a strong possibility. Guess what? It wasn’t any of those things. But it was something about the port on their equipment. What do you suppose it was? Hint: the equipment they were using was leftover stuff that needed to be replaced but hadn’t because it just kept working.

Mystery Resolved
The port was saturated! It was older equipment with 100 mbit ports, the amount of traffic slowly built up on it over time and sort of crept up on us all. I guess finally the demand on it was causing noticeable problems in response times that someone finally complained!

He shifted the gateway to a 1 gbit port and things got much better. Here are those PINGs now:

C:\Users>ping 10.93.23.12
 
Pinging 10.93.23.12 with 32 bytes of data:
Reply from 10.93.23.12: bytes=32 time=24ms TTL=248
Reply from 10.93.23.12: bytes=32 time=23ms TTL=248
Reply from 10.93.23.12: bytes=32 time=23ms TTL=248
Reply from 10.93.23.12: bytes=32 time=23ms TTL=248
 
Ping statistics for 10.93.23.12:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 23ms, Maximum = 24ms, Average = 23ms

That’s much better!

The HP SiteScope timing are more subtle, but no less important. Users don’t tolerate slow-loading web pages, you know!

I need to make an aside here. Is there no free and widely distributed Linux package to calculate the standard deviation of a set of numbers? I can’t believe it. I had to borrow this guy’s shell script, and it’s pretty lousy because it’s really slow: http://panoskrt.wordpress.com/2009/03/10/shell-script-for-standard-deviation-arithmetic-mean-and-median/. Anyways, doing a before and after analysis on my URL timing, where I fetched a Google page plus images during the daytime from 10 AM to 4 PM, I find this before and after result:

Before

Number of data points in “sca” = 144
Arithmetic mean (average) = .630548611
Standard Deviation = .225365467
Median number (middle) = .664500000

After

Number of data points in “sca2” = 144
Arithmetic mean (average) = .349270833
Standard Deviation = .162399076
Median number (middle) = .263000000

Users were very happy after the upgrade.

Conclusion
An old gateway port that maxed out its traffic capacity was to blame for performance problems in one part of a data center. This doesn’t happen too often, but it can happen, obviously.

Linux needs a simple math package to allow to calculate the standard deviation of a set of numbers.

Categories
Admin Perl Web Site Technologies

Turning HP SiteScope into SiteScope Classic with Perl

Intro
HP siteScope is a terrific web application tool and not too expensive for those who have any kind of a budget. The built-in monitor types are a bit limited, but since it allows calls to user-provided scripts your imagination is the only real limitation. For those with too many responsibilities and too little time on their hands it is a real productivity enhancer.

I’ve been using the product for 12 years now – since it was Freshwater SiteScope. I still have misgivings about the interface change introduced some years ago when it was part of Mercury. It went from simple and reliable to Java, complicated and flaky. To this day I have to re-start a SiteScope screen in my browser on a daily basis as the browser cannot recover from a server restart or who knows what other failures.

So I longed for the days of SiteScope Classic. We kept it running for as long as possible, years in fact. But at some point there were no more releases created for the classic view. So I investigated the feasibility of creating my own conversion tool. And…partially succeeded. Succeeded to the point where I can pull up the web page on my Blackberry and get the statuses and history. Think you can do that with regular HP SiteScope? I can’t. Maybe there’s an upgrade for it, but still. It’s nice to have the classic interface when you want to pull up the statuses as quickly as possible, regardless of the Blackberry display issue.

Looking back at my code, I obviously decided to try my hand at OO (object oriented) programming in Perl, with mixed results. Perl’s OO syntax isn’t the best, which addles comprehension. Without further ado, let’s jump into it.

The Details
It relies on something I noticed, that this URL on your HP SiteScope server, http://localhost:8080/SiteScope/services/APIConfigurationImpl?method=getConfigurationSnapshot, contains a tree of relationships of all the monitors. Cool, right? But it’s not a tree like you or I would design. Between parent and child is an intermediate layer. I suppose you need that because a group can contain monitors (my only focus in this exercise), but it can also contain alerts and maybe some other properties as well. So I guess the intermediate layer gives them the flexibility to represent all that, though it certainly added to my complication in parsing it. That’s why you’ll see the concern over “grandkids.” I developed a recursive, web-enabled Perl program to parse through this xml. That gives me the tools to build the nice hierarchical groupings. But it does not give me the statuses.

For the status of each monitor I wrote a separate scraper script that simply reads the entire daily SiteScope log every minute! Crude, but it works. I use it for an installation with hundreds of monitors and a log file that grows to 9 MB by the end of the day so I know it scales to that size. Beyond that it’s untested.

In addition to giving only the relationships, the xml also changes with every invocation. It attaches ID numbers to the monitors which initially you think is a nice unique identifier, but they change from invocation to invocation! So an additional challenge was to match up the names of the monitors in the xml output to the names as recorded in the SiteScope log. Also a bit tricky, but in general doable.

So without further ado, here’s the source code for the xml parser and main program which gets called from the web:

#!/usr/bin/perl
# Copyright work under the Artistic License, http://www.opensource.org/licenses/Artistic-2.0
# build v.simple SiteScope web GUI appropriate for smartphones
# 7/2010
#
# Id is our package which defines th Id class
use Id;
use CGI::Pretty;
my $cgi=new CGI;
$DEBUG = 0;
# GIF location on SiteScope classic
$ssgifs = "/artwork/";
$health{good} = qq(<img src="${ssgifs}okay.gif">);
$health{error} = qq(<img src="${ssgifs}error.gif">);
$health{warning} = qq(<img src="${ssgifs}warning.gif">);
# report CGI
$rprt = "/SS/rprt";
# the frustrating thing is that this xml output changes almost every time you call it
$url = 'http://localhost:8080/SiteScope/services/APIConfigurationImpl?method=getConfigurationSnapshot';
# get current health of all monitors - which is scraped from the log every minute by a hilgarj cron job
$monitorstats = "/tmp/monitorstats.txt";
print "Content-type: text/plain\n\n" if $DEBUG;
open(MONITORSTATS,"$monitorstats") || die "Cannot open monitor stats file $monitorstats!!";
while(<MONITORSTATS>) {
  chomp;
  ($monitor,$status,$value) = /([^\t]+)\t([^\t]+)\t([^\t]+)/;
  $monitors{"$monitor"} = $status;
  $monitorv{"$monitor"} = $value;
}
open(CURL,"curl $url 2>/dev/null|") || die "cannot open $url for reading!!\n";
my %myobjs = ();
# the xml is one long line!
@lines = <CURL>;
#print "xml line: $lines[0]\n" if $DEBUG;
@multiRefs = split "<multiRef",$lines[0];
#parse multiRefs
# create top-level object
my $id = Id->new (
      id => "id0");
# hash of this object with id as key
$myobjs{"id0"} = $id;
 
# first build our objects...
foreach $mref (@multiRefs) {
  next unless $mref =~ /\sid=/;
#  id="id0" ...
  ($parentid) =  $mref =~ /id=\"(id\d+)/;
  print "parentid: $parentid\n" if $DEBUG;
# watch out for <item><key xsi:type="soapenc:string">groupSnapshotChildren</key><value href="#id3 ...
# vs <item><key xsi:type="soapenc:string">Network</key><value href="#id40"/>
  print "mref: $mref\n" if $DEBUG;
  @ids = split /<item><key/, $mref;
# then loop over ids mentioned in this mref
  foreach $myid (@ids) {
    next unless $myid =~ /href="#(id\d+)/;
    next unless $myobjs{"$parentid"};
# types include group, monitor, alert
    ($typebyregex) = $myid =~ />snapshot_(\w+)SnapshotChildren</;
    $parenttype = $myobjs{"$parentid"}->type();
    $type = $typebyregex ? $typebyregex : $parenttype;
    print "type: $type\n" if $DEBUG;
# skip alert definitions
    next if $type eq "alert";
    print "myid: $myid\n" if $DEBUG;
    ($actualid) = $myid =~ /href="#(id\d+)/;
    print "actualid: $actualid\n" if $DEBUG;
# construct object
    my $id = Id->new (
      id => $actualid,
      type => $type,
      parentid => $parentid );
# build hash of these objects with actualid as key
    $myobjs{$actualid} = $id;
# addchild to parent. note that parent should already have been encountered
    $myobjs{"$parentid"}->addchild($actualid);
    if ($myid !~ /groupSnapshotChildren/) {
# interesting child - has name (every other generation has no name!)
      ($name) = $myid =~ /string\">(.+?)<\/key/;  # use non-greedy operator
      print "name: $name\n" if $DEBUG;
# some names are not of interest to us: alerts, which end in "error" or "good"
      if ($name !~ /(error|good)$/) {
# name may not be unique - get extended name which include all parents
        if (defined $myobjs{"$parentid"}->parentid()) {
          $gdparid = $myobjs{"$parentid"}->parentid();
          $gdparname = $myobjs{"$gdparid"}->extname();
# extname -> extended, or distinguished name.  Should be unique
          $extname = $gdparname. '/' . $name;
        } else {
# 1st generation
          print "1st generation\n" if $DEBUG;
          $extname = $name;
        }
        print "extname: $extname\n" if $DEBUG;
        $id->name($name);
        $id->extname($extname);
        $id->isanamedid(1);
        $myobjs{"$parentid"}->hasnamedkids(1); # want to mark its parent as "special"
# we also need our hash to reference objects by extended name since id changes with each extract and name
may not be unique
        $myobjs{"$extname"} = $id;
      } # end conditional over desirable name check
    } else {
      $id->isanamedid(0);
    }
  }
}
#
# now it's all parsed and our objects are alive. Let's build a web site!
#
# build a cookie containing path
my $pi = $ENV{PATH_INFO};
$script = $ENV{SCRIPT_NAME};
$ua = $ENV{HTTP_USER_AGENT};
# Blackberry browser test
$BB = $ua =~ /^BlackBerry/i ? 1 : 0;
$MSIE = $ua =~ /MSIE /;
# font-size depends on browser
$FS = "font-size: x-small;" if $MSIE;
$cookie = $cgi->cookie("pathinfo");
$uri = $script . $pi;
$cookie=$cgi->cookie(-name=>"pathinfo", -value=>"$uri");
print $cgi->header(-type=>"text/html",-cookie=>$cookie);
($url) = $pi =~ m#([^/]+)$#;
#  -title=>'SmartPhone View',
# this doesn't work, sigh...
#print $cgi->start_html(-head=>meta({-http_equiv=>'Refresh'}));
print qq( <HEAD>
<meta http-equiv="Expires" content="0">
<meta http-equiv="Pragma" content="no-cache">
<meta HTTP-EQUIV="Refresh" CONTENT="60; URL=$url">
<TITLE>SiteScope Classic $url Detail</TITLE>
<style type="text/css">
a.good {color: green; }
a.warning {color: green; }
a.error {color: red; }
td {font-family: Arial, Helvetica, sans-serif; $FS}
p.ss {font-family: Arial, Helvetica, sans-serif;}
</style>
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />
<script type=text/javascript>
function changeme(elemid,longvalue)
{
document.getElementById(elemid).innerText=longvalue;
}
function restoreme(elemid,truncvalue)
{
document.getElementById(elemid).innerText=truncvalue;
}
</script>
</HEAD><body>
);
 
#print $cgi->h1("This is the heading");
# parse path
# top lvl name:2nd lvl name:3rd lvl name
$altpi = $cgi->path_info();
print $cgi->p("pi is $pi") if $DEBUG;
#print $cgi->p("altpi is $altpi");
# relative url
$rurl = $cgi->url(-relative=>1);
if ($pi eq "") {
# the top
# top id is id3
  print qq(<p class="ss">);
  $myid = "id3";
  foreach $kid ($myobjs{"$myid"}->get_children()) {
    my $kidname = $myobjs{"$kid"}->name();
# kids can be subgroups or standalone monitors
    my $health = recurse("/$kidname");
    print "$health{$health} <a href=\"$rurl/$kidname\">$kidname</a><br>\n";
    $prodtest = $kid if $kidname eq "Production";
  }
  print "</p>\n";
} else {
  $extname = $pi;
  print "pi,name,extname,script: $pi,$name,$extname,$script\n" if $DEBUG;
# print where we are
  $uriname = $pi;
  $uriname =~ s#^/##;
  #print $cgi->p("name is $name");
  #print $cgi->p("uriname is $uriname");
  $uricompositepart = "/";
  @uriparts = split('/',$uriname);
  $lastpart = pop @uriparts;
  print qq(<p class="ss"><a href="$script"><b>Sitescope</b></a><br>);
  print qq(<b>Monitors in: );
  foreach $uripart (@uriparts) {
    my $healthp = recurse("$uricompositepart$uripart");
# build valid link
    ##$link = qq(<a class="good" href="$script$uricompositepart$uripart">$uripart</a>: );
    $link = qq(<a class="$healthp" href="$script$uricompositepart$uripart">$uripart</a>: );
    $uricompositepart .= "$uripart/";
    print $link;
  }
  my $healthp = recurse("$uricompositepart$lastpart");
  $color = $healthp eq "error" ? "red" : "green";
  print qq(<font color="$color">$lastpart</font></b></p>\n);
  print qq(<table border="1" cellspacing="0">);
  #print qq(<table>);
  %hashtrs = ();
  foreach $kid ($myobjs{"$extname"}->get_children()) {
    print "kid id: " . $myobjs{"$kid"}->id() . "\n" if $DEBUG;
    next unless $myobjs{"$kid"}->hasnamedkids();
    foreach $gdkid ($myobjs{"$kid"}->get_children()) {
      print "gdkid id: " . $myobjs{"$gdkid"}->id() . "\n" if $DEBUG;
      $gdkidname = $myobjs{"$gdkid"}->name();
      $gdkidextname = $myobjs{"$gdkid"}->extname();
      my $health = recurse("$gdkidextname");
      my $type = $myobjs{"$gdkid"}->type();
# dig deeper to learn health of the grankid's grandkids
      $objct = $healthct{good} = $healthct{error} = $healthct{warning} = 0;
      foreach $ggkid ($myobjs{"$gdkidextname"}->get_children()) {
        print "ggkid id: " . $myobjs{"$ggkid"}->id() . "\n" if $DEBUG;
        next unless $myobjs{"$ggkid"}->hasnamedkids();
        foreach $gggdkid ($myobjs{"$ggkid"}->get_children()) {
          print "gggdkid id: " . $myobjs{"$gggdkid"}->id() . "\n" if $DEBUG;
          $gggdkidname = $myobjs{"$gggdkid"}->name();
          $gggdkidextname = $myobjs{"$gggdkid"}->extname();
          my $health = recurse("$gggdkidextname");
          $objct++;
          $healthct{$health}++;
        }
      }
      $elemct++;
      $elemid = "elemid" . $elemct;
# groups should have distinctive cell background color to set them apart from monitors
      if ($type eq "group") {
        $bgcolor = "#F0F0F0";
        $celllink = "$lastpart/$gdkidname";
        $truncvalue = qq(<font color="red">$healthct{error}</font>/$objct);
        $tdval = $truncvalue;
      } else {
        $bgcolor = "#FFFFFF";
        $celllink = "$rprt?$gdkidname";
# truncate monitor value to save display space
        $longvalue = $monitorv{"$gdkidname"};
        (my $truncvalue) = $monitorv{"$gdkidname"} =~ /^(.{7,9})/;
        $truncvalue = $truncvalue? $truncvalue : "&nbsp;";
        $tdval = qq(<span id="$elemid" onmouseover="changeme('$elemid','$longvalue')" onmouseout="restorem
e('$elemid','$truncvalue')">$truncvalue</span>);
      }
      $hashtrs{"$gdkidname"} = qq(<tr><td bgcolor="#000000">$health{$health} </td><td>$tdval</td><td bgcol
or="$bgcolor"><a href="$celllink">$gdkidname</a></td></tr>\n);
# for health we're going to have to recurse
    }
  }
# print out in alphabetical order
  foreach $key (sort(keys %hashtrs)) {
    print $hashtrs{"$key"};
  }
  print "</table>";
}
print $cgi->end_html();
#######################################
sub recurse {
# to get the union of health of all ancestors
my $moniext = shift;
my ($moni) = $moniext =~ m#/([^/]+)$#;
# don't bother recursing and all that unless we have to...
return $myobjs{"$moniext"}->health() if defined $myobjs{"$moniext"}->health();
print "moni,moniext: $moni, $moniext\n" if $DEBUG;
my ($kid,$gdkidextname,$health,$cumhealth);
$cumhealth = $health = $monitors{"$moni"} ? $monitors{"$moni"} : "good";
foreach $kid ($myobjs{"$moniext"}->get_children()) {
    if ($myobjs{"$kid"}->hasnamedkids()) {
      foreach $gdkid ($myobjs{"$kid"}->get_children()) {
        $gdkidextname = $myobjs{"$gdkid"}->extname();
# for health we're going to have to recurse
        $health = recurse("$gdkidextname");
        if ($health eq "error" || $cumhealth eq "error") {
          $cumhealth = "error";
        } elsif ($health eq "warning" || $cumhealth eq "warning") {
          $cumhealth = "warning";
        }
      }
    } else {
# this kid is end of line
      $health = $monitors{"$kid"} ? $monitors{"$kid"} : "good";
        if ($health eq "error" || $cumhealth eq "error") {
          $cumhealth = "error";
        } elsif ($health eq "warning" || $cumhealth eq "warning") {
          $cumhealth = "warning";
        }
    }
}
$myobjs{"$moniext"}->health("$cumhealth");
return $cumhealth;
} # end sub recurse

I call it simply “ss” to minimize the typing required. You see it uses a package called Id.pm which I wrote to encapsulate the class and methods. Here is Id.pm:

package Id;
# Copyright work under the Artistic License, http://www.opensource.org/licenses/Artistic-2.0
# class for storing data about an id
# URL (not currently protected): http://localhost:8080/SiteScope/services/APIConfigurationImpl?method=getC
onfigurationSnapshot
# class for storing data about a group
use warnings;
use strict;
use Carp;
#group methods
# constructor
# get_members
# get_name
# get_id
# addmember
#
# member methods
# constructor
# get_id
# get_name
# get_type
# get_gp
# set_gp
 
sub new {
  my $class = shift;
  my $self = {@_};
  bless($self, "Id");
  return $self;
}
# get-set methods, p. 355
sub parentid { $_[0]->{parentid}=$_[1] if defined $_[1]; $_[0]->{parentid} }
sub isanamedid { $_[0]->{isanamedid}=$_[1] if defined $_[1]; $_[0]->{isanamedid} }
sub id { $_[0]->{id}=$_[1] if defined $_[1]; $_[0]->{id} }
sub name { $_[0]->{name}=$_[1] if defined $_[1]; $_[0]->{name} }
sub extname { $_[0]->{extname}=$_[1] if defined $_[1]; $_[0]->{extname} }
sub type { $_[0]->{type}=$_[1] if defined $_[1]; $_[0]->{type} }
sub health { $_[0]->{health}=$_[1] if defined $_[1]; $_[0]->{health} }
sub hasnamedkids { $_[0]->{hasnamedkids}=$_[1] if defined $_[1]; $_[0]->{hasnamedkids} }
 
# get children - use anonymous array, book p. 221-222
sub get_children {
# return empty array if arrary hasn't been defined...
  defined @{$_[0]->{children}} ? @{$_[0]->{children}} : ();
}
# adding children
sub addchild {
  $_[0]->{children} = [] unless defined  $_[0]->{children};
  push @{$_[0]->{children}},$_[1];
}
 
1;

ss also assumes the existence of just a few of the images from SiteScope classic – the green circle for good, red diamond for error and yellow warning, etc.. I borrowed them SiteScope classic.

Here is the code for the log scraper:

#!/usr/bin/perl
# analyze SiteScope log file
# Copyright work under the Artistic License, http://www.opensource.org/licenses/Artistic-2.0
# 8/2010
$DEBUG = 0;
$logdir = "/opt/SiteScope/logs";
$monitorstats = "/tmp/monitorstats.txt";
$monitorstatshis = "/tmp/monitorstats-his.txt";
$date = `date +%Y_%m_%d`;
chomp($date);
$file = "$logdir/SiteScope$date.log";
open(LOG,"$file") || die "Cannot open SiteScope log file: $file!!\n";
# example lines:
# 16:51:07 08/02/2010     good    LDAPServers     LDAP SSL test : ldapsrv.drj.com exit: 0, 0.502 sec    1:
3481  0       502
#16:51:22 08/02/2010     good    Network DNS: (AMEAST) ns2  0.033 sec   2:3459      200     33      ok
#16:51:49 08/02/2010     good    Proxy   proxy.pac script on iwww    0.055 sec   2:12467 200     55   ok
     4288    1280782309      0    0  55      0       0      200  0
#16:52:04 08/02/2010     good    Proxy   Disk Space: earth /logs   66% full, 13862MB free, 41921MB total
 3:3598      66      139862
#16:52:09 08/02/2010     good    DrjExtranet  URL: wwwsecure.drj.com     0.364 sec    1:3604      200
364  ok 26125   1280782328     0    0   358     4       2       200  0
while(<LOG>) {
  ($time,$date,$status,$group,$monitor,$value) = /(\S+)\s(\S+)\t(\S+)\t(\S+)\t([^\t]+)\t([^\t]+)/;
  print '$time,$date,$status,$group,$monitor,$value' . "$time,$date,$status,$group,$monitor,$value\n" if $DEBUG;
  next if $group =~ /__health__/; # don't care about these lines
  $mons{"$monitor"} = 1;
  push @{$mont{"$monitor"}} , $time;
  push @{$mond{"$monitor"}} , $date;
  push @{$monh{"$monitor"}} , $status;
  push @{$monv{"$monitor"}} , $value;
}
# open output at last moment to minimize chances of reading while locked for writing
open(MONITORSTATS,">$monitorstats") || die "Cannot open monitor stats file $monitorstats!!\n";
open(MONITORSTATSHIS,">$monitorstatshis") || die "Cannot open monitor stats file $monitorstatshis!!\n";
# write it all out - will always print the latest values
foreach $monitor (keys %mons) {
# dereference our anonymous arrays
  @times = @{$mont{"$monitor"}};
  @dates = @{$mond{"$monitor"}};
  @status = @{$monh{"$monitor"}};
  @value = @{$monv{"$monitor"}};
# last element is the latest measured status and value
  print MONITORSTATS "$monitor\t$status[-1]\t$value[-1]\n";
  print MONITORSTATSHIS "$monitor\n";
  #for ($i=-11;$i<0;$i++) {
# put latest measure on top
  for ($i=-1;$i>-13;$i--) {
    $time = defined $times[$i] ? $times[$i] : "NA";
    $date = defined $dates[$i] ? $dates[$i] : "NA";
    $stat = defined $status[$i] ? $status[$i] : "NA";
    $val = defined $value[$i] ? $value[$i] : "NA";
    print MONITORSTATSHIS "\t$time\t$date\t$stat\t$val\n";
  }
}

As I said it gets called every minute by cron.

That’s it! I enter the url sitescope.drj.com/SS/ss to access the main program which gets executed because I made /SS a CGI-BIN directory.

This gives you a read-only, Java-free view into your SiteScope status and hierarchy which beckons back to the good old days of Freshwater SiteScope.

Know your limits
What it does not do, unfortunately, is allow you to run a monitor – that seems like the next most simple thing which I should have been able to do but couldn’t figure out – much less define new monitors (never going to happen) or alerts.

I use this successfully against my HP SiteScope instance of roughly 400 monitors which itself is on a VM and there is no apparent strain. At some point this simple-minded script would no longer scale to suit the task at hand, but it might be good for up to a few thousand monitors.

And now a word about open source alternatives
Since I was so enamored with SiteScope Classic there seemed to be no compelling reason to shell out the dough for HP SiteScope with its unwanted interface, so I briefly looked around at free alternatives. Free sounds good, right? Not so much in practice. Out there in Cyberspace there is an enthusiast for a product called Zabbix. I just want to go on the record that Zabbix is the most confused piece of junk I have run across. You are getting less than what you paid for ($0) because you will be wasting a lot of time with it, and in the end it isn’t all that capable. Nagios also had its limits – I can’t remember the exact reason I didn’t go down that route, but there were definite reasons.

HP SiteScope is no panacea. “HP” and “stifling bureaucracy” need to be mentioned in the same sentence. Every time we renew support it is the most confusing mess of line items. Every time there’s a new cast of characters over at HP who nothing about the account’s history. You practically have to beg them to accept your money for a low-budget item like SiteScope because they really don’t pursue it in any way. Then their SAID and contract numbers stuff is confusing if you only see it once every few years.

Conclusion
A conversion program does exist for turning the finicky HP SiteScope Java-encumbered view into pure SiteScope Classic because I wrote it! But it’s a limited read-only view. Still, it’s helpful in a pinch and can even be viewed on the Blackberry’s browser.

Another problem is that HP has threatened to completely change the API so this tool, which is designed for HP SiteScope v 10.12, will probably completely break for newer versions. Oh, well.

References
This post shows some silly mistakes to avoid when doing a minor upgrade in version 11.