Categories
Linux Perl Web Site Technologies

For Experimentalists: How to Test if your Web Server has a long timeout

Intro
I use the old Sun Java System Web Server, now known as the Oracle Web Server, formerly Sun ONE web server and before that iPlanet Web Server and before that Netscape Enterprise Server. The question came up the other day if the web server times out web pages. I never fully trust the documentation. I developed a simple method to experiment and find the answer for myself.

The Method
Sometimes you test what’s easiest, not what you should. In this case, an easy test is to write a long-running CGI program. This program, timertest.pl, is embarrassingly old, but anyhow…

#!/usr/bin/perl
# DrJ, 3/1999
# The new, PERL5 way:
use CGI;
$query = new CGI;
$| = 1;
#
print "Content-type: text/html\n\n";
 
print "<h2>Environment Variables</h2>
 
<table>
<tr><th>Env Variable</th><th>Value</th></tr>\n";
foreach $key (sort(keys(%ENV))) {
  print "<tr><td>$key</td><td>$ENV{$key}</td></tr>\n";
}
print "</table>\n";
print "<hr>
<h2>Name/Value Pairs</h2>
<table>
<tr><th>Name</th><th>Value</th></tr>\n";
foreach $key ($query->param) {
  print "<tr><td>$key</td><td>" . $query->param($key) . "</td></tr>\n";
}
print "</table>\n";
$host = `hostname`;
print "Hostname: $host<br>\n";
sleep($ENV{QUERY_STRING});
 
print "we have slept for $ENV{QUERY_STRING} seconds.\n";

So you see it prints out some stuff, sleeps for a specified time, then prints out a final line. You call it like curl your_sevrer/cgi-bin/timertest.pl?305, where 305 is the time in seconds to sleep. I suggest use of the curl browser so as not to be thrown off by browser complications which may have their own timeouts. curl is simplicity itself and won’t bias the answer. Use a larger number for longer times. That was easy, right? Does it work? No. Does it show what we _really_ wanted to show? Also no. In other words, a CGI program that runs for 610 seconds will be killed by the web server, but that’s really a function of some CGI timer. Five and ten minutes seem to be magic timeout values for some built-in timers, so it is good to test times slightly smaller/larger than those times. So how do we test a plain web page??? It turns out we can…

The Solution – using the Unix bag of tricks
I only have a couple of minutes here. Briefly:

> mknod tmp.htm p

> chown me tmp.htm

(from another window)
> curl my_server/tmp.htm

(back to first window)
> sleep 610; ls -l > tmp.htm

Then wait! mknod as used above is apparently the old, Solaris, syntax. The syntax could be somewhat different under Linux. The point is to create a named pipe. Think of a named pipe, like it sounds, like giving a name to the “|” character used so often in Unix command lines. So it needs a process to give it input and a process to read it, hence the two separate windows.

See if you get the directory listing in your curl window after about 10 minutes. With my Sun Java System Web Server I do, so now I know both curl and the web server support probably unlimited page-load times.

An Unexpected Finding
Another tip and unexpected lesson – don’t use one of your named pipes more than once. If you mess up, create a new one and work with that. What happens when I re-use one of my pipes is that curl is able to read the web page over and over, without a process sending input to the named pipe! That wasn’t supposed to happen. What does it all mean? It can only be – -and I’ve often suspected this – that my web server is caching the content. It’s not a particularly well-documented feature, either. I think most times I wish it’d rather not.

Conclusion
The Sun Java System Web Server times out CGI scripts, but not regular static web pages. We proved this in a few minutes by devising an unambiguous experiment. As an added bonus we also proved that the web server caches at least some pages. The careful observer is always open to learning more than what he or she started out intending to look for!

Categories
Admin Apache CentOS Linux Web Site Technologies

Major Headaches Migrating Apache from Ubuntu to CentOS

Intro
I’m changing servers from Ubuntu server to CentOS. On Ubuntu I just checked off LAMP and got my environment. In CentOS I’m doing it piece-by-piece. I don’t think my Ubuntu install is quite regular, either, as I bastardized it by adding environment variables in the Apache config file, a concept I borrowed from SLES! Turns out it is quite an ordeal to make a smooth transition. I will share all my pitfalls. I still don’t have it working, but I think I’m over the hump. [Update: now it is working, or 99% of it is working. It is a bit sluggish, however.]

The Details
I installed httpd on CentOS using yum. I also installed some php5 packages which I saw were recommended as well. First thing I noticed is that the directory structure for “httpd” as it seems to be known on CentOS, is dramatically different from “apache2” as it is known in Ubuntu. This example illustrates the point. In CentOS the main config file is

/etc/httpd/conf/httpd.conf

while in Ubuntu I had

/etc/apache2/apache2.conf

so I tarred up my /etc/apache2 files and had the thought “Let’s make this work on CentOS.” Ha. Easier said than done.

To remind, the content of /etc/apache2 is:

apache2.conf, conf.d, sites-enabled sites-available mods-enabled mods-available plus some stuff I probably added, including envvars, httpd.conf and ports.conf.

envvars contains environment variables which are subsequently referenced in the config files, like these:

export APACHE_RUN_USER=www-data
export APACHE_RUN_GROUP=www-data
export APACHE_PID_FILE=/var/run/apache2$SUFFIX.pid
export APACHE_RUN_DIR=/var/run/apache2$SUFFIX
export APACHE_LOCK_DIR=/var/lock/apache2$SUFFIX
# Only /var/log/apache2 is handled by /etc/logrotate.d/apache2.
export APACHE_LOG_DIR=/var/log/apache2$SUFFIX

First step? Well we have to hook httpd startup to our new directory somehow. I don’t recall this part so well. I think I tried this from the command line:

$ apachectl -d /etc/apache2 -f apache2.conf -k start

and it may be at that point that I got the MPM workers error. But I forget. I switched to using the service command and that particular error seemed to go away at some point. I don’t believe I needed to do anything special.

So I tried this edit to /etc/sysconfig/httpd (sparing you the failed attempts):

OPTIONS=”-d /etc/apache2 -f apache2.conf”

Now we try to launch and see what happens.

$ service httpd start

Starting httpd: httpd: Syntax error on line 203 of /etc/apache2/apache2.conf: Syntax error on line 1 of /etc/apache2/mods-enabled/alias.load: Cannot load /usr/lib/apache2/modules/mod_alias.so into server: /usr/lib/apache2/modules/mod_alias.so: cannot open shared object file: No such file or directory
[FAILED]

Fasten your seatbelts, put on your big-boy pants or whatever. We’re just getting warmed up.

Let’s look at mods-available/alias.load:

$ more alias.load

LoadModule alias_module /usr/lib/apache2/modules/mod_alias.so

Sure enough, there is not only no such file, there is not even such a directory as /usr/lib/apache2. And all the load files have references like that. Where did the httpd install put its modules anyways? Why in /etc/httpd/modules. So I made a command decision:

$ mkdir /usr/lib/apache2
$ cd !$
$ ln -s /etc/httpd/modules

So where does that put us? Here:

$ service httpd start

Starting httpd: httpd: Syntax error on line 203 of /etc/apache2/apache2.conf: Syntax error on line 1 of /etc/apache2/mods-enabled/ssl.load: Cannot load /usr/lib/apache2/modules/mod_ssl.so into server: /usr/lib/apache2/modules/mod_ssl.so: cannot open shared object file: No such file or directory
     [FAILED]

Not everyone will see this issue. I had used SSL for some testing in Ubuntu so I had that module enabled. my CentOS is a core image and did not come with an SSL module. So let’s get it.

$ yum search mod_ssl

shows the full module name to be mod_ssl.x86_64, so we install it with yum install.

How far did that get us? To here:

$ service httpd start

Starting httpd: httpd: bad user name ${APACHE_RUN_USER}
  [FAILED]

Ah, remember my environment variables from above? As I said I actually use them with lines such as:

User ${APACHE_RUN_USER}

in apache2.conf. But clearly the definitions of those environment variables is not getting passed along. I decide to see if this step might work. I append these two lines to /etc/sysconfig/httpd:

$ Read in our environment variables. Inspired by apache on SLES.
. /etc/apache2/envvars

Could any more go wrong? Sure. Lots! Then there’s this:

$ service httpd start

Starting httpd: httpd: bad user name www-data
      [FAILED]

Amongst all the other stark differences, ubuntu and CentOS use different users to run apache. Great! So I create a www-data user as userid 33, gid 33 because that’s how it was under ubuntu. but GID 33 is already taken in CentOS. It is backup. I decide I will never use it that way, and change the group name to www-data.

That brings us here. you see I have a lot of patience…

$ service httpd start

Starting httpd: Syntax error on line 218 of /etc/apache2/apache2.conf:
Invalid command 'LogFormat', perhaps misspelled or defined by a module not included in the server configuration
   [FAILED]

Now my line 218 looks pretty regular. It’s simply:

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined

I then realized something interesting. The modules built in to httpd on centOS and apache2 are different. apache2 seems to have some modules built in for logging:

$ apache2 -l

Compiled in modules:
  core.c
  mod_log_config.c
  mod_logio.c
  prefork.c
  http_core.c
  mod_so.c

whereas httpd does not:

$ httpd -l

Compiled in modules:
  core.c
  prefork.c
  http_core.c
  mod_so.c

So I made an empty log_config.conf and a log_config.load in mods-available that reads:

LoadModule log_config_module /usr/lib/apache2/modules/mod_log_config.so

I got the correct names by looking at the apache web site documenttion on that module. And i linked those two files up in the mods-available diretory:

$ cd mods-enabled
$ ln -s ../mods-available/log_config.conf
$ ln -s ../mods-available/log_config.load

Next error, please! Certainly. It is:

$ service httpd start

Starting httpd: Syntax error on line 218 of /etc/apache2/apache2.conf:
Unrecognized LogFormat directive %O
  [FAILED]

where line 218 is as recorded above. Well, some searches showed that you need the logio module. Note that it is also compiled into to apache2, but missing from httpd. So I did a similar thing with defining the necessary mods-{available,enabled} files. logio.load reads:

LoadModule logio_module /usr/lib/apache2/modules/mod_logio.so

The next?

$ service httpd start

Starting httpd: (2)No such file or directory: httpd: could not open error log file /var/log/apache2/error.log.
Unable to open logs
   [FAILED]

Oops. Didn’t make that directory. Naturally httpd and apache2 use different directories for logging. What else could you expect?

Now we’re down to this minimalist error:

$ service httpd start

Starting httpd:     [FAILED]

The error log contained this line:

[Mon Mar 19 14:11:14 2012] [error] (2)No such file or directory: Cannot create SSLMutex with file `/var/run/apache2/ssl_mutex'

After puzzling some over this what I eventually noticed is that my environment has references to directories which I haven’t defined yet:

export APACHE_RUN_DIR=/var/run/apache2$SUFFIX
export APACHE_LOCK_DIR=/var/lock/apache2$SUFFIX

So I created them.

And now I get:

$ service httpd start

Starting httpd:           [  OK  ]

But all is still not well. I cannot stop it the proper way. Trying to read its status goes like this:

$ service httpd status

httpd dead but subsys locked

I looked this one up. Killed off processes and semaphores as recommended with ipcs -s (see this link), etc. But since my case is different, I also did something different. I modified my /etc/init.d/httpd file:

#pidfile=${PIDFILE-/var/run/httpd/httpd.pid}
pidfile=${PIDFILE-/var/run/apache2.pid}
#lockfile=${LOCKFILE-/var/lock/subsys/httpd}
lockfile=${LOCKFILE-/var/lock/subsys/apache2}

Believe it or not, this worked. I can now run service httpd status and service httpd stop. To prove it:

$ service httpd status

httpd (pid  30366) is running...

Another Error Crops Up
I eventually noticed another problem with the web site. My trajectory page was not working. Upon investigation I found this comment in my main apache error log (not my virtual server error log, which I still don’t understand):

sh: /home/drj/traj/traj4.pl: Permission denied

This had to be a result of my call-out to a perl program from a php program:

...
$data = exec('/home/drj/traj/traj4.pl'.' '.$escargs);
...

But what’s so special about that call? Worked fine on Ubuntu, and I did a directory listing to show the file was really there. Well, here’s the thing, that file is under my home directory and guess what? When you crate your users in Ubuntu the home directory permissions are set to group and others read. Not in CentOS! A listing of /home looks kind of like this:

/home$ ll

total 12
drwx------ 2 drj   drj     4096 Mar 19 15:26 drj/
...

I set the permissions for all to read:

$ sudo chmod g+rx,o+rx drj

and I was good to go. The program began to work.

May 2013 Update
I was asked how all this survived after a yum update. Answer: pretty well, but not perfectly. The daemon was fine. And what miseld me is that it started fine. But then a couple days later I was looking at my access log and realized…it wasn’t there! Nor the errors log. Well, actually, the default access and error logs were there, but not for my virtual servers.

I soon realized that

$ service httpd status

produced

httpd dead but subsys locked

Well, who reads or remembers their own posts from a year ago? I totally forgot I had already dealt with this once, and my own post didn’t show up in my DDG search. Anywho, I stepped on the same rake twice. Being less patient this time around, probably because I am one year older, I simply altered the /etc/init.d/httpd file (looks like it had been changed by the update) thusly:

#pidfile=${PIDFILE-/var/run/httpd/httpd.pid}
#lockfile=${LOCKFILE-/var/lock/subsys/httpd}
# try as an experiment - DrJ 5/3/13
pidfile=/var/run/apache2.pid
lockfile=/var/lock/apache2/accept.lock

and I made sure I had a /var/lock/apache2 directory. This worked.

I chose a lock file with that particular name because I noticed this in my /etc/apache2/apache2.conf:

LockFile ${APACHE_LOCK_DIR}/accept.lock

To clean things out as I was working and re-working this problem since I couldn’t run

$ service httpd stop

I ran instead:

$ pkill -9 -f sbin/httpd

and I removed /var/run/apache2.pid.

Now, once again, I can get a status on my httpd service and restart works as well and my access and error logs are being written.

Conclusion
This conversion exercise turned out to be quite a teaching lesson and even after all this more remains. After the mysql migration I find the performance to be sub-par – about twice as slow as it was on Ubuntu.

Four months later, CentOS has not crashed once on me. Contrast that with Ubuntu freezing every two weeks or so. I optimized MySQL to cache some data and performance is adequate. I also have since learned about bitnami, which is kind of a stack for all the stuff I was using. Check out bitnami.org.

Categories
Admin Hosting Service Linux

Hosting: You Really Can’t beat Amazon Web Services EC2

Intro
You want to have your own server hosted by a service provider that’s going to take care of the hard stuff – uninterruptible power, fast pipe to the Internet, backups? That’s what I wanted. In addition I didn’t want to worry about actual, messy hardware. Give me a virtual server any day of the week. I am no hosting expert, but I have some experience and I’d like to share that.

The Details
I’d say chances are about even whether you’d think of Amazon Web Services for the above scenario. I’d argue that Amazon is actually the most competitive service out there and should be at the top of any short list, but the situation wasn’t always so even as recently as February of this year.

You see, Amazon markets itself a bit differently. They are an IaaS (infrastructure as a service) provider. I don’t know who their top competition really is, but AWS (Amazon Web Service) is viewed as both visionary and able to execute by Gartner from a recent report. My personal experience over the last 12 months backs that up. My main point, however, is that hosting a server is a subset of IaaS. I assume that if you want your own server where you get root access, you have the skill set (aided by the vast resources on the Internet including blogs like mine) to install a web server, database server, programming environment, application engines or whatever you want to do with it. You don’t need the AWS utility computing model per se, just a reliable 24×7 server, right? That’s my situation.

I was actually looking to move to “regular” hosting provider, but it turns out to have been a really great time to look around. Some background. I’m currently running such an environment running Ubuntu server 10.10 as a free-tier micro instance. I’ve enjoyed it a lot except one thing. From time to time my server freezes. At least once a month since December. I have no idea why. Knowing that my free tier would be up anyways this month I asked my computer scientist friend “Niz” for a good OS to run a web server and he said CentOS is what I want. It’s basically Redhat Enterprise Linux except you don’t pay Redhat for support.

I looked at traditional hosting providers GoDaddy and Rackspace and 1and1 a bit. I ran the numbers and saw that GoDaddy, with whom I already host my DNS domains, was by far the cost leader. They were also offering CentOS v 5.6 I think RackSpace also had a CentOS offering. I spoke with a couple providers in my own state. I reasoned I wuold keep my business local if the price was within 25% of other offers I could find.

Then, and here’s one of the cool things about IaaS, I fired up a CentOS image at Amazon Elastic Compute Cloud. With utility computing I pay only by the hour so I can experiment cheaply, which I did. Niz said run v 5.6 because all the bugs have been worked out. He hosts with another provider so he knows a thing or two about this topic and many other topics besides. I asked him what version he runs. 5.6. So I fired it up. But you know, it just felt like a giant step backwards through an open source timeline. I mean Perl v 5.8.8 vs Ubuntu’s 5.10.1. Now mind you by this time my version of Ubuntu is itself a year old. Apache version 2.2.3 and kernel version 2.6.18 versus 2.2.16 and 2.6.35. Just plain old. Though he said support would be available for fantastical amount of time, I decided to chuck that image.

Just as I was thinking about all these things Amazon made a really important announcement: prices to be lowered. All of a sudden they were competitive when viewed as a pure hosting provider, never mind all the other features they bring to bear.

I decided I wanted more memory than the 700 MB available to a micro image, and more storage than the 8 GB that tier gives. So a “small” image was the next step up, at 1.7 GB of memory and 160 GB disk space. But then I noticed a quirky thing – the small images only come in 32-bit, not 64-bit unlike all the other tiers. I am so used to 64-bit by now that I don’t trust 32-bit. I want to run what a lot of other people are running to know that the issues have been worked out.

Then another wonderful thing happened – Amazon announced support for 64-bit OSes in their small tier! What timing.

The Comparison Results
AWS lowered their prices by about 35%, a really substantial amount. I am willing to commit up front for an extended hosting because I expect to be in this for the long haul. Frankly, I love having my own server! So I committed to three years small tier, heavy usage after doing the math in order to get the best deal for a 24×7 server. It’s $300 $96 up front and about $0.012$0.027/hour for one instance hour. So that’s about $18 $22/month over three years. Reasonable, I would say. For some reason my earlier calculations had it coming out cheaper. These numbers are as of September, 2013. I was prepared to use GoDaddy which I think is $24/month for a two-year commitment. My finding was that RackSpace and 1and1 were more expensive in turn than GoDaddy. I have no idea how AWS did what they did on pricing. It’s kind of amazing. My local providers? One came in at six times the cost of GoDaddy(!), the other about $55/month. Too bad for them. But I am excited about my new server. I guess it’s a sort of master of my own destiny type of thing that appeals to my independent spirit. Don’t tell Amazon, but really I think they could have easily justified charging a small premium for their hosting, given all the other convenient infrastructure services that are there, ready to be dialed up, say, like a load balancer, snapshots, additional IPs, etc. And there are literally 8000 images to choose from when you are deciding what image (OS) to run. That alone speaks volumes about the choices you have available.

What I’m up to
I installed CentOS 6.0 core image. It feels fresher. It’s based on RedHat 6.0 It’s got Perl v. 5.10.1, kernel 2.6.32, and, once you install it, Apache v 2.2.15. It only came with about 300 packages installed, which is kind of nice, not the usual 1000+ bloated deal I am more used to. And it seems fast, too. Now whether or not it will prove to be stable is an entirely different question and only time will tell. I’m optimistic. But if not, I’ll chuck it and find something else. I’ve got my data on a separate volume anyways which will persist regardless of what image I choose – another nice plus of Amazon’s utility computing model.

A Quick Tip About Additional Volumes
With my micro instance it occupied a full 8 GB so I didn’t have a care about additional disk volumes. On the other hand, my CentOS 6.0 core image is a lean 6 GB. If I’m entitled to 160 GB as part of what I’m paying for, how do I get the access to the remaining 154 GB? I guess you create a volume. Using the Admin GUI is easiest. OK, so you have your volunme, how does your instance see it? It’s not too obvious from their documentation but in CentOS my extra volume is

/dev/xvdj

I mounted that a formatted it as an ext4 device as per their instructions. It didn’t take that long. I put in a line in /etc/fstab like this:

/dev/xvdj /mnt/vol ext4 defaults 1 2

Now I’m good to go! It gets mounted after reboot.

Dec, 2016 update
Amazon has announced Lightsail to better compete with GoDaddy and their ilk. Plans start as low as $5 a month. For $10 a month you get a static IP, 1 GB RAM, 20 GB SSD storage I think and ssh access. So I hope that means root access. Oh, plus a pre-configured WordPress software.

Conclusion
Amazon EC2 rocks. They could have charged a premium but instead they are the cheapest offering out there according to my informal survey. The richness of their service offerings is awesome. I didn’t mention that you can mount the entire data set of the human genome, or all the facts of the world which have been assembled in freebase.org. How cool is that?

Categories
Admin Apache Uncategorized Web Site Technologies

The IT Detective agency: Excessive Requests for PAC file Crippling Web Server

Intro
Funny thing about infrastructure. You may have something running fine for years, and then suddenly it doesn’t. That is one of the many mysteries in the case of the excessive requests for PAC file.

The Details
We serve our Proxy Auto-config (PAC) file from a couple web servers which are load-balanced. It’s worked great for over 10 years. The PAC file is actually produced by a Perl script which can alter the content based on the user’s IP or other variables.

The web servers got bogged down last week. I literally observed the load average shoot up past 200 (on a 2-CPU server). This is not good.

I quickly noticed lots and lots of accesses for wpad.dat and proxy.pac. Some PCs were individually making hundreds of requests for these files in a day! Sometimes there were 15 or 20 requests in a minute. Since it is a script it takes some compute power to handle all those requests. So it was time to do one of two things: either learn the root cause and address it, or make a quick fix. The symptoms were clear enough, but I had no idea about the root cause. I also was fascinated by the requests for wpad.dat which I felt was serving no purpose whatsoever in our environment. So I went for the quick fix hopinG that understanding would come later.

To be continued…
As promised – three and a half years later! And we still have this problem. It’s probably worse than ever. I pretty much threw in the towel and learned how to scale up our apache web server to handle more PAC file requests simultaneously, see the references.

References
Scaling apache to handle more requests.

Categories
Admin Linux SLES

How to Get By Without unix2dos in SLES

Intro
As a Unix old-timer looking at the latest releases, I only have observed one tendency – that of ever-increasing numbers of commands, always additive – until now. A command I considered useful (well, basically any command I have ever used I consider useful) has gone AWOL in Suse Linux Enterprise Server (SLES for short): unix2dos.

Why You Need It
These days you need it more than ever. What with sftp being used in place of ftp, your transferred text files will come over from a SLES server to your PC in binary mode, preserving the Linux-style way of declaring a new line with the newline character, “\n”. Bring that file onto your PC and look at it in Notepad and you’ll get one long line because Windows requires more to indicate a new line. Windows OS’s like Windows 7 require a carriage return + newline, i.e., “\r\n”.

Who You Going to Call
I spoke with some experts so I cannot take credit for finding this out personally. Long story short things evolved and there is a more sophisticated command available that does this sort of thing and much else. That’s recode.

But I don’t think I’ll ever use recode for anything else so I decided to re-create a unix2dos command using recode in a tiny shell script:

#!/bin/sh
# inspired by http://yourlinuxguy.com/?p=232 and the fact that they took away this useful command
# 3/6/12
recode latin1..ibmpc $*

You call it like this:

> unix2dos file

and it overwrites the file and converts it to the format Windows expects.

My other expert contact says I could find the old unix2dos in OpenSuse but I decided not to go that route.

Of course to convert in the other direction you have dos2unix which for some reason wasn’t removed from the distro. Strange, huh?

How to See That It Worked
I use

> od -c file|more

to look at the ascii characters in a text file. It also shows the newline and carriage return characters with a \n and \r respectively This is a good command to know because it is also a “safe” way to look at a binary file. By safe I mean it won’t try to print out 8-bit characters that will permanently mess your terminal settings!

2017 update
I finally needed this utility again after five years. My program doesn’t work on CentOS. – No recode, whatever that was. However, the one-liner provided in the comments worked just fine for me.

Conclusion
We can rest easy and send text files back-and-forth between a PC and a SLES server with the help of this unix2dos script we developed.

Interestingly, RedHat’s RHEL has kept unix2dos in its disrtibution. Good for them. In ubuntu Linux unix2dos also seems decidedly missing.

Categories
Admin Linux

Common Problems Installing Cognos Gateway on Linux

Updated for a 2018 Cognos 11 install
with 2013 updates for Cognos 10 installation

Intro
I tried to take a shortcut and get a 2nd Cognos gateway up and running by copying files, etc. rather than a proper install. At one time or another I feel I must have encountered just about every problem conceivable. I didn’t take great, systematic notes, but I’d like to mention some highlights while it is still fresh in my memory!

The Details
Note that I have a working gateway server running on the same version of Linux, SLES 11 SP1. So I thought I’d be clever and just copy all the files below /opt/cognos8 from the working server.

First Rookie Mistake
Let’s call our COGNOS_ROOT /opt/cognos8 for convenience.
Cognos 10 note: /opt/cognos10 would be a more sensible installation directory!

So you’re following along in the documentation and dutifully looking for /opt/cognos8/bin/cogconfig.sh, and not finding it? Me, neither. So I cleverly borrowed it from a working solaris installation. It’s all Java, right, no OS dependencies, what can go wrong? Ha, ha. You try:

./cogconfig.sh
and get:

Using /usr/lib64/jvm/jre/bin/java
The java class is not found:  CRConfig

Long story short. Give up. Without telling anyone they moved it to /opt/cognos8/bin64. That’s assuming you’re on a 64-bit system like most of us are.

OK. Now you run it from the …bin64 directory, expecting better results, only to perhaps get something like:

./cogconfig.sh

Unable to locate a JRE. Please specify a valid JAVA_HOME environment variable.

Long story short, java-1_4_2-ibm (java-1_6_0-ibm if installing a Cognos 10 gateway) is a good Java environment to install for Cognos Gateway. At least it is on SLES Linux. So you install that and set up environment variables like these:

export JAVA_BINDIR=/usr/lib64/jvm/jre/bin
export JAVA_HOME=/usr/lib64/jvm/jre
export JAVA_ROOT=/usr/lib64/jvm/jre

Now you’re cooking. Run it yet again. You’re smart and know to set up your DISPLAY environment to a valid XServer you have access to. But even if the X application actually does launch and run (you may need some Motif or additional X packages, possibly even from the SDK DVD – see appendix A), if you try to export the configuration you’ll get an error like this:

java.lang.ClassNotFoundException: org.bouncycastle134.jce.provider.BouncyCastleProvider

Cognos 10 note: I did not have this class missing in my Cognos 10 installation. Yeah!

Yes, you are missing the infamous bouncycastleprovider! This stuff is too good to make up, right? It’s a jar file that’s somewhere in the Cognos Gateway distribution, bcprov-jdk14-134.jar. In my case I need to put it here:

/etc/alternatives/jre/lib/ext

With that in place run it yet again. Now you may be unable to export the configuration with this error:

CAM-CRP-1057 Unable to generate the machine specific symmetric key.

Does it ever end? Yes!

You may have old values of keys and what-not cryptography stuff from your copy of the other system. So you remove these directories and all their contents:

/opt/cognos8/{encryptkeypair,signkeypair}

And I even saw the following error:

02/03/2012,11:26:56,Err,com.cognos.crconfig.data.DataManagerException: CAM-CRP-1132 An error occurred while attempting to request a certificate from the Certificate Authority service. Unable to connect to the Certificate Authority service. Ensure that the Content Manager computer is configured and that the Cognos 8 services on it are currently running. Reason: java.net.ConnectException: Connection refused, com.cognos.crconfig.data.DataManager.generateCryptoKeys(DataManager.java:2730)

I think it comes about if you save the default config without editing it and putting in a valid dispatcher URI, but I forget.

The main point towards the end was to start with a clean config by a:

cd /opt/cognos8/configuration;cp cogstartup.xml{.new,}

, making sure there is no encryptkeypair and signkeypair directories, launching …bin64/cogconfig.sh, working with the GUI to define the dispatcher URIs to your working, running Cognos dispatcher, exporting it,

(Let me take a breath here. If that export succeeds, you’re home.)

and finally saving it, which also generates the system-specific keys.

That’s it! A bunch of green check marks are your reward. Hopefully.

Conclusion
In the end you will see that this “cheap method” of installing Cognos Gateway worked. We had a few bumps along the road, but we worked through them all. Now that we’ve seen just about every conceivable problem we have a treasure trove of documented errors and fixes should we ever find ourselves in this situation again.

There is one more Cognos Gateway problem we resolved, by the way, that was previously documented here.

Appendix A – Cognos 10 note
Yes, I referred to this document in my own installation of Cognos version 10 gateway component. The problems are very similar, and this was a big help, if I say so myself.

I notice I write a tight narrative. I have lots of tangential thoughts, but to list them all as I think of them would destroy the flow of the narrative. In this case I wanted to expand on the openmotif packages.

I got a missing libXm.so.4 message when launching issetup the first time. I determined this came from an openmotif package from my previous successful installation on another server. My new server had limited repositories.

> zypper search openmotif

produced these results:

 
S | Name                   | Summary                    | Type
--+------------------------+----------------------------+-----------
  | openmotif21-demos      | Open Motif 2.2.4 Libraries | package
  | openmotif21-libs       | Open Motif 2.2.4 Libraries | package
  | openmotif21-libs       | Open Motif 2.2.4 Libraries | srcpackage
  | openmotif21-libs-32bit | Open Motif 2.2.4 Libraries | package
  | openmotif22-libs       | Open Motif 2.2.4 Libraries | package
  | openmotif22-libs       | Open Motif 2.2.4 Libraries | srcpackage
  | openmotif22-libs-32bit | Open Motif 2.2.4 Libraries | package

Well, I tried to install first openmotif21-libs-32bit then openmotif22-libs-32bit, but neither gave me the right version of libXm.so! I had versions 2, 3 and 6! So I simply did one of these numbers:

> cd /usr/lib; ln -s libXm.so.3.0.3 libXm.so.4

and, to my surprise, it worked!

More Errors Documented for completeness’ sake

At the risk of making this blog post a total mess, I’ll include a few more errors I encountered during the upgrade. Who knows who might find this useful.

Generating the cryptographic keys is always a hold-your-breath-and-pray operation. I had my upgrade files in place in a new install directory, /opt/cognos10. I ran bin64/cogconfig.sh like usual. It was suggested I could save the configuration even though the application gateway wasn’t running, so I tried that. No dice.

The cryptographic information cannot be encrypted.

Fine. So probably the app server needs to be running before we save the config, right? So they got it running. I tried to save the config. Same error. The details were as follows:

[ ERROR ]
CAM-CRP-1315 Current configuration points to a different Trust Domain than originally configured.
 
[ ERROR ] 
The cryptography information was not generated.

The remedy? Close the configuration and completely remove these directories beneath the /opt/cognos10/configuration directory:

– encryptkeypair
– signkeypair
– csk (actually I didn’t have this one. But I guess it should be removed if present)

I held my breath, re-ran cogconfig and saved. This time it worked!

I also had an error with my Java version:

./cogconfig.sh
Using /usr/lib64/jvm/jre/bin/java
The java class could not be loaded. java.lang.UnsupportedClassVersionError: (CRConfig) bad major version at offset=6
/usr/lib64/jvm/jre/bin/java -version

showed

java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 2.3)
IBM J9 VM (build 2.3, J2RE 1.4.2 IBM J9 2.3 Linux amd64-64 j9vmxa64142ifx-20110628 (JIT enabled)
J9VM - 20110627_85693_LHdSMr
JIT  - 20090210_1447ifx5_r8
GC   - 200902_24)

I installed a newer Java:

zypper install  java-1_6_0-ibm

and got past this error.

April 20123 update
Just when you thought every possible error was covered, you encounter a new one. Cognos Mobile isn’t working so well on actual mobile devices so they wanted to try a Fixpack from IBM. No problem, right? They gave me

up_cogmob_linuxi38664h_10.2.1102.33_ml.rar

and I set to work. I don’t particularly like rar files for Linux, but I figured out there is an unrar command:

$ unrar e up_*rar

But after setting up my DISPLAY environment variable I get this new error running ./issetup:

X Error of failed request:  BadDrawable (invalid Pixmap or Window parameter)
  Major opcode of failed request:  14 (X_GetGeometry)
  Resource id in failed request:  0x2
  Serial number of failed request:  257
  Current serial number in output stream:  257
IDS_MSG_PREFIXIDS_COPYRIGHT_LOGOIDS_MSG_PREFIXIDS_MSG_READ_ARCHIVE

The solution? They downloaded a tar.gz version of the Fixpack. I unpacked that and had absolutely no problems with issetup! The really strange thing is that in both issetup are identical files. I use cksum to do a quick compare. Even setup.csp are identical files. I did an strace -f of the two cases but the salient difference didn’t pop out at me. The files present in the tar.gz seem to be fewer in number.

Another random error you will encounter sooner or later

You are doing a Save in cogconfig and you get:

13/05/2013,17:39:05,Err,CAM-CRP-1132 An error occurred while attempting to request a certificate from the Certificate Authority service. Unable to connect to the Certificate Authority service. Ensure that the Content Manager computer is configured and that the IBM Cognos services on it are currently running. Reason: java.net.ConnectException: Connection refused, com.cognos.crconfig.data.crypto.ConfiguringSession.configure(ConfiguringSession.java:35)com.cognos.crconfig.data.DataManager.generateCryptoKeys(DataManager.java:3037)com.cognos.crconfig.data.DataManager$4.run(DataManager.java:4169)com.cognos.crconfig.data.CnfgActionEngine$CnfgActionThread.run(CnfgActionEngine.java:394)com.cognos.crconfig.data.crypto.ConfiguringSession.configure(ConfiguringSession.java:35)com.cognos.crconfig.data.DataManager.generateCryptoKeys(DataManager.java:3037)com.cognos.crconfig.data.DataManager$4.run(DataManager.java:4169)com.cognos.crconfig.data.CnfgActionEngine$CnfgActionThread.run(CnfgActionEngine.java:394)com.cognos.crconfig.data.crypto.ConfiguringSession.configure(ConfiguringSession.java:35)com.cognos.crconfig.data.DataManager.generateCryptoKeys(DataManager.java:3037)com.cognos.crconfig.data.DataManager$4.run(DataManager.java:4169)com.cognos.crconfig.data.CnfgActionEngine$CnfgActionThread.run(CnfgActionEngine.java:394)

This looks scary but has an easy fix. You aren’t communicating with the app server. Probably their dispatcher services are down. Bring them up and it should work fine – it did for me. This is assuming of course that you have your dispatcher URLs set up correctly.

I cloned my Cognos web gateway and got this error
I waited for a few weeks to examine the clone. I ran

$ ./cogconfig.sh

and got this error:

16/05/2013,15:57:35,Err,CAM-CRP-1280 An error occurred while trying to decrypt using the system protection key. Reason: javax.crypto.IllegalBlockSizeException: Input length (with padding) not multiple of 16 bytes

Umm. I don’t have the solution yet. One thing is most highly suspect: in the meatime we re-generated the keys on the production web gateway. So I am hoping that is all we need to do here as well.

Resolved. Here is the process I followed – a sort of colonic for Cognos:

$ cd /opt/cognos10/configuration; rm csk/* signkeypair/* encryptkeypair/* cogstartup.xml
$ cd ../bin64; ./cogconfig.sh

Then in the GUI I re-defined the app servers in the dispatcher URI portion of the environment.
Then did a Save.
Worked like a champ – four green check marks.

cogconfig hangs
This happened to me on an older server. The IBM Cognos Configuration screen displays but it’s supposed to exit so you can get to the part where you edit the configuration and it never does.

Currently no known solution.

June 2018 update
Cognos 11 install problem

The Cognos 11 install was going pretty well. Until it came time to launch cogconfig. That generated this error:

cognos10:/web/cognos11/bin64> ./cogconfig.sh

Using /usr/lib64/jvm/jre/bin/java
Exception in thread "main" java.lang.UnsupportedClassVersionError: JVMCFRE003 bad major version; class=com/cognos/accman/jcam/crypto/CAMCryptoException, offset=6
        at java.lang.ClassLoader.defineClass(ClassLoader.java:286)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:74)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:538)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$300(URLClassLoader.java:77)
        at java.net.URLClassLoader$ClassFinder.run(URLClassLoader.java:1041)
        at java.security.AccessController.doPrivileged(AccessController.java:448)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:427)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:676)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:358)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:642)
        at java.lang.J9VMInternals.verifyImpl(Native Method)
        at java.lang.J9VMInternals.verify(J9VMInternals.java:73)
        at java.lang.J9VMInternals.initialize(J9VMInternals.java:133)
        at com.cognos.cclcfgapi.CCLConfigurationFactory.getInstance(CCLConfigurationFactory.java:59)
        at com.cognos.crconfig.CnfgPreferences.<init>(CnfgPreferences.java:51)
        at com.cognos.crconfig.CnfgPreferences.<clinit>(CnfgPreferences.java:36)
        at java.lang.J9VMInternals.initializeImpl(Native Method)
        at java.lang.J9VMInternals.initialize(J9VMInternals.java:199)
        at CRConfig.main(CRConfig.java:144)

Note my system java version is woefully out-of-date:

$ /usr/lib64/jvm/jre/bin/java ‐version

java version "1.6.0"
Java(TM) SE Runtime Environment (build pxa6460sr16fp15-20151106_01(SR16 FP15))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 jvmxa6460sr16fp15-20151020_272943 (JIT enabled, AOT enabled)
J9VM - 20151020_272943
JIT  - r9_20151019_103450
GC   - GA24_Java6_SR16_20151020_1627_B272943)
JCL  - 20151105_01

whereas the Cognos-supplied Java is two versions ahead:
cognos10:/web/cognos11> ./jre/bin/java ‐version

java version "1.8.0"
Java(TM) SE Runtime Environment (build pxa6480sr4fp10-20170727_01(SR4 FP10))
IBM J9 VM (build 2.8, JRE 1.8.0 Linux amd64-64 Compressed References 20170722_357405 (JIT enabled, AOT enabled)
J9VM - R28_20170722_0201_B357405
JIT  - tr.r14.java_20170722_357405
GC   - R28_20170722_0201_B357405_CMPRSS
J9CL - 20170722_357405)
JCL - 20170726_01 based on Oracle jdk8u144-b01

Instead of the previous approach which involved upgrading the system Java, I decided to just try the Java version Cognos itself had installed. In the following commands note that my installation directory was /web/cognos11.

$ cd /web/cognos11; export JAVA_HOME=`pwd`/jre
$ ./cogconfig.sh

Using /web/cognos11/jre/bin/java
06/06/2018,11:13:04,Dbg,Use Customized settings for font and color.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/web/cognos11/bin/slf4j-nop-1.7.23.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/web/cognos11/configuration/utilities/config-util.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.helpers.NOPLoggerFactory]
06/06/2018,11:13:10,Dbg,The original cogstartup.xml file is clear text. Don't back it up.

That is to say, it worked! I’ve often seen software packages install their own versions of Java. This is the first time I thought to take advantage of that. Wish I had thought of this approach during the Cognos 10 install!