The IT Detective Agency: Cognos stopped working

Intro
Here’s another installment in our continuing, exciting IT drama. A user reports that her Cognos app stopped working. She’s in charge of the Cognos application servers; I run the Cognos gateway on a Linux server. I have almost no working knowledge of Cognos. I learned just enough to get the gateway installed and configured on Linux, specifically SLES. Cognos is used for business intelligence reports and is now owned by IBM.

The Details
The home page came up just fine, so I knew the web server – Apache, of course – was working. I knew I hadn’t changed anything on the gateway. She says that she hasn’t changed anything on the dispatcher either. So she asks me to save the config. The configuration tool is an X application. I run cogconfig.sh, which by the way is in COGNOS-INSTALL_DIR/bin64, not COGNOS-INSTALL_DIR/bin, contrary to the documentation for Linux. I cannot save the config. She asks me to export it. I can’t do that either! I get the error

CAM-CRP-1057 unable to generate the machine specific symmetric key.

She asks me to delete the keypairs. These are in the directories COGNOS-INSTALL_DIR/configuration/{signkeypair,encryptkeypair}. So I clear those out. Still I cannot save or export the configuration. To get a working gateway while we mull the problem over, I quickly switch to a Solaris server which we had hoped to retire.
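Before clearing out key material like this, it can be worth keeping a copy in case the deletion turns out not to be the fix. Here is a minimal sketch, not anything from the Cognos documentation: the function name backup_keypairs and the keypair-backup-TIMESTAMP directory name are my own assumptions, and only the signkeypair and encryptkeypair directory names come from the install described above.

```shell
#!/bin/sh
# Sketch: copy the two keypair directories to a timestamped backup
# before clearing them. The function name and the backup directory
# name are hypothetical; adjust to taste.

backup_keypairs() {
    install_dir=$1    # the COGNOS-INSTALL_DIR of your gateway
    stamp=$(date +%Y%m%d%H%M%S)
    backup_dir="$install_dir/configuration/keypair-backup-$stamp"

    mkdir -p "$backup_dir" || return 1
    for name in signkeypair encryptkeypair; do
        src="$install_dir/configuration/$name"
        # Copy only what exists; -R recurses, -p keeps modes and timestamps.
        if [ -d "$src" ]; then
            cp -Rp "$src" "$backup_dir/" || return 1
        fi
    done
    # Print the backup location so the caller can note it down.
    echo "$backup_dir"
}
```

Calling backup_keypairs with the install directory as its only argument prints the path of the copy. The originals are left in place, so the keypairs still have to be cleared out by hand afterwards.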

Over the next days I checked to see if Java had changed. Getting a working JRE had been a little tricky on SLES. Nothing had changed. After the system admin came back from vacation the next week, I asked if by chance he had changed anything. The last log showed he was logged in at the time. He admitted to changing one thing.

He changed the system name. This system has multiple interfaces and a unique hostname for each interface. The hosts file, /etc/hosts, included entries for each of the interface IPs. Seeing there were no other changes, I concluded that this little innocent act was enough to kill the communication. Note that he did not change any of the routing, however. When you’re dealing with encryption, the system name can be significant. So when those keys were initially generated, they were tied to that name and would only work with the original hostname. At least that is my reverse engineering of the matter. Cognos is a pretty closed system, so it’s hard to pin down more precisely what is going on.
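A hostname change like this is easy to miss, so one option is to record the name once and compare against it later, for example from cron. This is only a sketch under my own assumptions: the function name check_hostname_baseline and the idea of a baseline file are hypothetical, and nothing here is specific to Cognos.

```shell
#!/bin/sh
# Sketch: remember the hostname in a baseline file and report when the
# current name differs. The baseline file location is up to you; the
# optional second argument overrides the output of hostname.

check_hostname_baseline() {
    baseline_file=$1
    current=${2:-$(hostname)}

    # First run: nothing to compare against, so record the current name.
    if [ ! -f "$baseline_file" ]; then
        printf '%s\n' "$current" > "$baseline_file" || return 2
        echo "baseline recorded: $current"
        return 0
    fi

    recorded=$(cat "$baseline_file")
    if [ "$recorded" = "$current" ]; then
        echo "hostname unchanged: $current"
        return 0
    fi

    # Non-zero exit status lets a cron job or monitoring check raise an alert.
    echo "hostname changed: was $recorded, now $current"
    return 1
}
```

Run it once with a path of your choosing to create the baseline, then run it again whenever something that depends on the system name starts misbehaving. An exit status of 1 means the name no longer matches what was recorded.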

Conclusion
The hostname was changed back to the original name. Sure enough, now I can export the config and, most importantly, save it without any errors.

Case closed!

Lessons Learned
Well, avoiding finger-pointing and quick judgements was helpful in this case. Of course I suspected she actually had done something to the dispatcher, but I behaved as though the problem might be on my side. We treated each other professionally while the system was down and we had no clue why. That was very helpful.