User Add/Delete Jython Script for RSA Authentication Manager

Intro
I thought I might save someone else the trouble of writing, from scratch, a respectable program that uses the Authentication Manager 7.1 SDK simply to add and delete users, keeping the local database in sync with an external database.

History Lesson
We had this problem licked under Authentication Manager v 6.1.2. In that version the SDK was TCL-based and, for whatever reason, it seemed a whole lot simpler to understand the model and get working code. When we began to look at v 7.1 we saw we were confronted with a whole different animal, one that required new understanding and new skills to master.

The Details
Jython is an implementation of Python that runs on the Java virtual machine, so a script can import and call Java classes directly. I really don’t know either language, so I used a technique you might call programming by extrapolation. Here is the code. Not really understanding Python, I preserved as much of the original as possible for fear of breaking something. I nevertheless had to be a little innovative and create a new function.

'''
 * Jython class demonstrating the Administration API
 * usage from a Jython script.
 *
 * Run this script in the utils directory of the Authentication Manager installation.
 *
 * Execute the command "rsautil jython AdminAPIDemos.py create <admin user name> <password>"
 * Execute the command "rsautil jython AdminAPIDemos.py assign <admin user name> <password>"
 * Execute the command "rsautil jython AdminAPIDemos.py update <admin user name> <password>"
 * Execute the command "rsautil jython AdminAPIDemos.py delete <admin user name> <password>"
 *
 * If you are executing this script in an environment other than the predefined
 * rsautil scripting tool you must make the CommandClientAppContext.xml file
 * available in the end of the classpath for this script. You must also configure
 * the necessary connection parameters in a properties file located in the process
 * working directory. See the provided samples for more information.
'''
 
# imports
from jarray import array
import sys
# DrJ required import
# Not working! from org.python.modules import re
from java.util.regex import *
from java.lang import *
 
 
from java.util import Calendar,Date
from java.lang import String
 
from org.springframework.beans import BeanUtils
 
from com.rsa.admin import AddGroupCommand
from com.rsa.admin import AddPrincipalsCommand
from com.rsa.admin import DeleteGroupCommand
from com.rsa.admin import DeletePrincipalsCommand
from com.rsa.admin import LinkGroupPrincipalsCommand
from com.rsa.admin import LinkAdminRolesPrincipalsCommand
from com.rsa.admin import SearchAdminRolesCommand
from com.rsa.admin import SearchGroupsCommand
from com.rsa.admin import SearchPrincipalsCommand
from com.rsa.admin import SearchRealmsCommand
from com.rsa.admin import SearchSecurityDomainCommand
from com.rsa.admin import UpdateGroupCommand
from com.rsa.admin import UpdatePrincipalCommand
from com.rsa.admin.data import AdminRoleDTOBase
from com.rsa.admin.data import GroupDTO
from com.rsa.admin.data import IdentitySourceDTO
from com.rsa.admin.data import ModificationDTO
from com.rsa.admin.data import PrincipalDTO
from com.rsa.admin.data import RealmDTO
from com.rsa.admin.data import SecurityDomainDTO
from com.rsa.admin.data import UpdateGroupDTO
from com.rsa.admin.data import UpdatePrincipalDTO
from com.rsa.authmgr.admin.agentmgt import AddAgentCommand
from com.rsa.authmgr.admin.agentmgt import DeleteAgentsCommand
from com.rsa.authmgr.admin.agentmgt import LinkAgentsToGroupsCommand
from com.rsa.authmgr.admin.agentmgt import SearchAgentsCommand
from com.rsa.authmgr.admin.agentmgt import UpdateAgentCommand
from com.rsa.authmgr.admin.agentmgt.data import AgentConstants
from com.rsa.authmgr.admin.agentmgt.data import AgentDTO, ListAgentDTO
from com.rsa.authmgr.admin.hostmgt.data import HostDTO
from com.rsa.authmgr.admin.principalmgt import AddAMPrincipalCommand
from com.rsa.authmgr.admin.principalmgt.data import AMPrincipalDTO
from com.rsa.authmgr.admin.tokenmgt import GetNextAvailableTokenCommand
from com.rsa.authmgr.admin.tokenmgt import LinkTokensWithPrincipalCommand
from com.rsa.authn import SearchPasswordPoliciesCommand
from com.rsa.authn import UpdatePasswordPolicyCommand
from com.rsa.authn.data import PasswordPolicyDTO
from com.rsa.command import ClientSession
from com.rsa.command import CommandException
from com.rsa.command import CommandTargetPolicy, ConnectionFactory
from com.rsa.command.exception import DataNotFoundException, DuplicateDataException
from com.rsa.common.search import Filter
 
'''
 * This class demonstrates the usage patterns of the
 * Authentication Manager 7.1 API.
 *
 * <p>
 * The first set of operations performed if the first
 * command line argument is equal to "create".
 * The sample creates a restricted agent, a group, and a user.
 * Links the user to the group and the group to the agent.
 * </p>
 * <p>
 * The second set of operations performed if the first
 * command line argument is equal to "delete".
 * Lookup the user, group and agent created above.
 * Delete the user, group and agent.
 * </p>
 * <p>
 * A third set of operations is performed if the first
 * command line argument is equal to "assign".
 * Lookup the user and assign the next available
 * SecurID token to the user.
 * Lookup the SuperAdminRole and assign it to the user.
 * </p>
 * <p>
 * A fourth set of operations performed if the first
 * command line argument is equal to "update".
 * Update the Agent, Group, and User objects.
 * </p>
 * <p>
 * A fifth set of operations performed if the first
 * command line argument is equal to "disable".
 * Lookup a password policy with a name that starts
 * with "Initial" and then disable the password history
 * for that policy. Use this to allow the sample to
 * perform multiple updates of the user password using
 * the same password for each update.
 * </p>
 * <p>
 * The APIs demonstrated include the use of the Filter
 * class to generate search expressions for use with
 * all search commands.
 * </p>
'''
class AdminAPIDemos:
 
    '''
     * We need to know these fairly static values throughout this sample.
     * Set the references to top level security domain (realm) and system
     * identity source to use later.
     *
     * @throws CommandException if something goes wrong
    '''
    def __init__(self):
        searchRealmCmd = SearchRealmsCommand()
        searchRealmCmd.setFilter( Filter.equal( RealmDTO.NAME_ATTRIBUTE, "SystemDomain"))
        searchRealmCmd.execute()
        realms = searchRealmCmd.getRealms()
        if( len(realms) == 0 ):
            print "ERROR: Could not find realm SystemDomain"
            sys.exit( 2 )
 
        self.domain = realms[0].getTopLevelSecurityDomain()
        self.idSource = realms[0].getIdentitySources()[0]
 
 
    '''
     * Create an agent and set it to be restricted.
     *
     * @param: name the name of the agent to create
     * @param: addr the IP address for the agent
     * @param: alt array of alternate IP addresses
     * @return: the GUID of the agent just created
     * 
     * @throws CommandException if something goes wrong
    '''
    def createAgent(self, name, addr, alt):
        # need a HostDTO to be set
        host = HostDTO()
        host.setName(name)
        host.setPrimaryIpAddress(addr)
        host.setSecurityDomainGuid(self.domain.getGuid())
        host.setNotes("Created by AM Demo code")
 
        # the agent to be created
        agent = AgentDTO()
        agent.setName(name)
        agent.setHost(host)
        agent.setPrimaryAddress(addr)
        agent.setAlternateAddresses(alt)
        agent.setSecurityDomainId(self.domain.getGuid())
        agent.setAgentType(AgentConstants.STANDARD_AGENT)
        agent.setRestriction(1) # only allow activated groups
        agent.setEnabled(1)
        agent.setOfflineAuthDataRefreshRequired(0)
        agent.setNotes("Created by AM Demo code")
 
        cmd = AddAgentCommand(agent)
 
        try:
            cmd.execute()
        except DuplicateDataException:
            print "ERROR: Agent " + name + " already exists."
            sys.exit(2)
 
        # return the created agents GUID for further linking
        return cmd.getAgentGuid()
 
 
    '''
     * Lookup an agent by name.
     *
     * @param: name the agent name to lookup
     * @return: the GUID of the agent
     * 
     * @throws CommandException if something goes wrong
    '''
    def lookupAgent(self, name):
        cmd = SearchAgentsCommand()
        cmd.setFilter(Filter.equal(AgentConstants.FILTER_HOSTNAME, name))
        cmd.setLimit(1)
        cmd.setSearchBase(self.domain.getGuid())
        # the scope flags are part of the SecurityDomainDTO
        cmd.setSearchScope(SecurityDomainDTO.SEARCH_SCOPE_ONE_LEVEL)
 
        cmd.execute()
 
        if (len(cmd.getAgents()) < 1):
            print "ERROR: Unable to find agent " + name + "."
            sys.exit(2)
 
        return cmd.getAgents()[0]
 
 
    '''
     * Update an agent, assumes a previous lookup done by lookupAgent.
     *
     * @param agent the result of a previous lookup
     *
     * @throws CommandException if something goes wrong
    '''
    def updateAgent(self, agent):
        cmd = UpdateAgentCommand()
 
        agentUpdate = AgentDTO()
        # copy the rowVersion to satisfy optimistic locking requirements
        BeanUtils.copyProperties(agent, agentUpdate)
 
        # ListAgentDTO does not include the SecurityDomainId
        # use the GUID of the security domain where agent was created
        agentUpdate.setSecurityDomainId(self.domain.getGuid())
 
        # clear the node secret flag and modify some others
        agentUpdate.setSentNodeSecret(0)
        agentUpdate.setOfflineAuthDataRefreshRequired(1)
        agentUpdate.setIpProtected(1)
        agentUpdate.setEnabled(1)
        agentUpdate.setNotes("Modified by AM Demo code")
 
        # set the requested updates in the command
        cmd.setAgentDTO(agentUpdate)
 
        # perform the update
        cmd.execute()
 
 
    '''
     * Delete an agent.
     *
     * @param: agentGuid the GUID of the agent to delete
     * 
     * @throws CommandException if something goes wrong
    '''
    def deleteAgent(self, agentGuid):
        cmd = DeleteAgentsCommand( [agentGuid] )
        cmd.execute()
 
 
    '''
     * Create an IMS user, needs to exist before an AM user can be
     * created.
     *
     * @param: userId the user's login UID
     * @param: password the user's password
     * @param: first the user's first name
     * @param: last the user's last name
     * 
     * @return: the GUID of the user just created
     * 
     * @throws CommandException if something goes wrong
    '''
    def createUser(self, userId, password, first, last):
        cal = Calendar.getInstance()
 
        # the start date
        now = cal.getTime()
        # DrJ: add 50 years from now!
        cal.add(Calendar.YEAR, 50)
 
        # the account end date
        expire = cal.getTime()
 
        principal = PrincipalDTO()
        principal.setUserID( userId )
        principal.setFirstName( first )
        principal.setLastName( last )
        #     principal.setPassword( password )
 
        principal.setEnabled(1)
        principal.setLockoutStatus(0)
        principal.setAccountStartDate(now)
        #principal.setAccountExpireDate(expire)
        #principal.setAccountExpireDate(0)
        principal.setAdminRole(0)
        principal.setCanBeImpersonated(0)
        principal.setTrustToImpersonate(0)
 
        principal.setSecurityDomainGuid( self.domain.getGuid() )
        principal.setIdentitySourceGuid( self.idSource.getGuid() )
        principal.setDescription("Created by DrJ utilities")
 
        cmd = AddPrincipalsCommand()
        cmd.setPrincipals( [principal] )
 
        try:
            cmd.execute()
        except DuplicateDataException:
            print "ERROR: User " + userId + " already exists."
            sys.exit(2)
 
        # only one user was created, there should be one GUID result
        return cmd.getGuids()[0]
 
 
    '''
     * Lookup a user by login UID.
     * 
     * @param: userId the user login UID
     *
     * @return: the GUID of the user record.
    '''
    def lookupUser(self, userId):
        cmd = SearchPrincipalsCommand()
        cmd.setFilter(Filter.equal(PrincipalDTO.LOGINUID, userId))
        cmd.setSystemFilter(Filter.empty())
        cmd.setLimit(1)
        cmd.setIdentitySourceGuid(self.idSource.getGuid())
        cmd.setSecurityDomainGuid(self.domain.getGuid())
        cmd.setGroupGuid(None)
        cmd.setOnlyRegistered(1)
        cmd.setSearchSubDomains(0)
 
        cmd.execute()
 
        if (len(cmd.getPrincipals()) < 1):
            print "ERROR: Unable to find user " + userId + "."
            sys.exit(2)
 
        return cmd.getPrincipals()[0]
 
 
    '''
     * Update the user definition.
     *
     * @param user the principal object from a previous lookup
    '''
    def updateUser(self, user):
        cmd = UpdatePrincipalCommand()
        cmd.setIdentitySourceGuid(user.getIdentitySourceGuid())
 
        updateDTO = UpdatePrincipalDTO()
        updateDTO.setGuid(user.getGuid())
        # copy the rowVersion to satisfy optimistic locking requirements
        updateDTO.setRowVersion(user.getRowVersion())
 
        # collect all modifications here
        mods = []
 
        # first change the email
        mod = ModificationDTO()
        mod.setOperation(ModificationDTO.REPLACE_ATTRIBUTE)
        mod.setName(PrincipalDTO.EMAIL)
        mod.setValues([ user.getUserID() + "@mycompany.com" ])
        mods.append(mod) # add it to the list
 
        # also change the password
        mod = ModificationDTO()
        mod.setOperation(ModificationDTO.REPLACE_ATTRIBUTE)
        mod.setName(PrincipalDTO.PASSWORD)
        mod.setValues([ "MyNewPAssW0rD1!" ])
        mods.append(mod) # add it to the list
 
        # change the middle name
        mod = ModificationDTO()
        mod.setOperation(ModificationDTO.REPLACE_ATTRIBUTE)
        mod.setName(PrincipalDTO.MIDDLE_NAME)
        mod.setValues([ "The Big Cahuna" ])
        mods.append(mod) # add it to the list
 
        # make a note of this update in the description
        mod = ModificationDTO()
        mod.setOperation(ModificationDTO.REPLACE_ATTRIBUTE)
        mod.setName(PrincipalDTO.DESCRIPTION)
        mod.setValues([ "Modified by AM Demo code" ])
        mods.append(mod) # add it to the list
 
        # set the requested updates into the UpdatePrincipalDTO
        updateDTO.setModifications(mods)
        cmd.setPrincipalModification(updateDTO)
 
        # perform the update
        cmd.execute()
 
 
    '''
     * Delete a user.
     *
     * @param: userGuid the GUID of the user to delete
     * 
     * @throws CommandException if something goes wrong
    '''
    def deleteUser(self, userGuid):
        cmd = DeletePrincipalsCommand()
        cmd.setGuids( array( [userGuid], String ) )
        cmd.setIdentitySourceGuid( self.idSource.getGuid() )
        cmd.execute()
 
 
    '''
     * Create an Authentication Manager user linked to the IMS user.
     * The user will have a limit of 3 bad passcodes, default shell
     * will be "/bin/sh", the static password will be "12345678" and
     * the Windows Password for offline authentication will be "Password123!".
     *
     * @param: guid the GUID of the IMS user
     * 
     * @throws CommandException if something goes wrong
    '''
    def createAMUser(self, guid):
        principal = AMPrincipalDTO()
        principal.setGuid(guid)
        principal.setBadPasscodes(3)
        principal.setDefaultShell("/bin/sh")
        principal.setDefaultUserIdShellAllowed(1)
        # these next three innocent-looking lines cost you a license! do not use them!! - DrJ 
        #principal.setStaticPassword("12345678")
        #principal.setStaticPasswordSet(1)
        #principal.setWindowsPassword("Password123!")
 
        cmd = AddAMPrincipalCommand(principal)
        cmd.execute()
 
 
    '''
     * Create a group to assign a user to.
     *
     * @param: name the name of the group to create
     * @return: the GUID of the group just created
     * 
     * @throws CommandException if something goes wrong
    '''
    def createGroup(self, name):
        group = GroupDTO()
        group.setName(name)
        group.setDescription("Created by AM Demo code")
        group.setSecurityDomainGuid(self.domain.getGuid())
        group.setIdentitySourceGuid(self.idSource.getGuid())
 
        cmd = AddGroupCommand()
        cmd.setGroup(group)
 
        try:
            cmd.execute()
        except DuplicateDataException:
            print "ERROR: Group " + name + " already exists."
            sys.exit(2)
 
        return cmd.getGuid()
 
    '''
     * Lookup a group by name.
     *
     * @param: name the name of the group to lookup
     * @return: the GUID of the group
     * 
     * @throws CommandException if something goes wrong
    '''
    def lookupGroup(self, name):
        cmd = SearchGroupsCommand()
        cmd.setFilter(Filter.equal(GroupDTO.NAME, name))
        cmd.setSystemFilter(Filter.empty())
        cmd.setLimit(1)
        cmd.setIdentitySourceGuid(self.idSource.getGuid())
        cmd.setSecurityDomainGuid(self.domain.getGuid())
        cmd.setSearchSubDomains(0)
        cmd.setGroupGuid(None)
 
        cmd.execute()
 
        if (len(cmd.getGroups()) < 1):
            print "ERROR: Unable to find group " + name + "."
            sys.exit(2)
 
        return cmd.getGroups()[0]
 
 
    '''
     * Update a group definition.
     *
     * @param group the current group object
    '''
    def updateGroup(self, group):
        cmd = UpdateGroupCommand()
        cmd.setIdentitySourceGuid(group.getIdentitySourceGuid())
 
        groupMod = UpdateGroupDTO()
        groupMod.setGuid(group.getGuid())
        # copy the rowVersion to satisfy optimistic locking requirements
        groupMod.setRowVersion(group.getRowVersion())
 
        # collect all modifications here
        mods = []
 
        mod = ModificationDTO()
        mod.setOperation(ModificationDTO.REPLACE_ATTRIBUTE)
        mod.setName(GroupDTO.DESCRIPTION)
        mod.setValues([ "Modified by AM Demo code" ])
        mods.append(mod)
 
        # set the requested updates into the UpdateGroupDTO
        groupMod.setModifications(mods)
        cmd.setGroupModification(groupMod)
 
        # perform the update
        cmd.execute()
 
 
    '''
     * Delete a group.
     *
     * @param: groupGuid the GUID of the group to delete
     * 
     * @throws CommandException if something goes wrong
    '''
    def deleteGroup(self, groupGuid):
        cmd = DeleteGroupCommand()
        cmd.setGuids( [groupGuid] )
        cmd.setIdentitySourceGuid( self.idSource.getGuid() )
        cmd.execute()
 
 
    '''
     * Assign the user to the specified group.
     *
     * @param: userGuid the GUID for the user to assign
     * @param: groupGuid the GUID for the group
     * 
     * @throws CommandException if something goes wrong
    '''
    def linkUserToGroup(self, userGuid, groupGuid):
        cmd = LinkGroupPrincipalsCommand()
        cmd.setGroupGuids( [groupGuid] )
        cmd.setPrincipalGuids( [userGuid] )
        cmd.setIdentitySourceGuid(self.idSource.getGuid())
 
        cmd.execute()
 
    '''
     * Assign the group to the restricted agent so users can authenticate.
     *
     * @param: agentGuid the GUID for the restricted agent
     * @param: groupGuid the GUID for the group to assign
     * 
     * @throws CommandException if something goes wrong
    '''
    def assignGroupToAgent(self, agentGuid, groupGuid):
        cmd = LinkAgentsToGroupsCommand()
        cmd.setGroupGuids( [groupGuid] )
        cmd.setAgentGuids( [agentGuid] )
        cmd.setIdentitySourceGuid(self.idSource.getGuid())
 
        cmd.execute()
 
    '''
     * Assign next available token to this user.
     *
     * @param: userGuid the GUID of the user to assign the token to
     * 
     * @throws CommandException if something goes wrong
    '''
    def assignNextAvailableTokenToUser(self, userGuid):
        cmd = GetNextAvailableTokenCommand()
        try:
            cmd.execute()
        except DataNotFoundException:
            print "ERROR: No tokens available"
        else:
            tokens = [cmd.getToken().getId()]
            cmd2 = LinkTokensWithPrincipalCommand(tokens, userGuid)
            cmd2.execute()
            print ("Assigned next available SecurID token to user jdoe")
 
    '''
     * Lookup an admin role and return the GUID.
     *
     * @param name the name of the role to lookup
     * @return the GUID for the required role
     *
     * @throws CommandException if something goes wrong
     '''
    def lookupAdminRole(self, name):
        cmd = SearchAdminRolesCommand()
 
        # set search filter to match the name
        cmd.setFilter(Filter.equal(AdminRoleDTOBase.NAME_ATTRIBUTE, name))
        # we only expect one anyway
        cmd.setLimit(1)
        # set the domain GUID
        cmd.setSecurityDomainGuid(self.domain.getGuid())
 
        cmd.execute()
        if (len(cmd.getAdminRoles()) < 1):
            print "ERROR: Unable to find admin role " + name + "."
            sys.exit(2)
 
        return cmd.getAdminRoles()[0].getGuid()
 
    '''
     * Assign the given admin role to the principal provided.
     *
     * @param adminGuid the GUID for the administrator
     * @param roleGuid the GUID for the role to assign
     *
     * @throws CommandException if something goes wrong
     '''
    def assignAdminRole(self, adminGuid, roleGuid):
        cmd = LinkAdminRolesPrincipalsCommand()
        cmd.setIgnoreDuplicateLink(1)
        cmd.setPrincipalGuids( [ adminGuid ] )
        cmd.setAdminRoleGuids( [ roleGuid ] )
        cmd.execute()
        print ("Assigned SuperAdminRole to user jdoe")
 
    '''
     * Lookup a password policy by name and return the object.
     *
     * @param name the policy name
     * @return the object
     *
     * @throws CommandException if something goes wrong
     '''
    def lookupPasswordPolicy(self, name):
        cmd = SearchPasswordPoliciesCommand()
        cmd.setRealmGuid(self.domain.getGuid())
 
        # match the policy name
        cmd.setFilter(Filter.startsWith(PasswordPolicyDTO.NAME, name))
 
        cmd.execute()
 
        if (len(cmd.getPolicies()) < 1):
            print ("ERROR: Unable to find password policy with name starting with " + name + ".")
            sys.exit(2)
 
        # we only expect one anyway
        return cmd.getPolicies()[0]
 
    '''
     * Update the given password policy, currently it just disables
     * password history.
     *
     * @param policy the policy to update
     *
     * @throws CommandException if something goes wrong
     '''
    def updatePasswordPolicy(self, policy):
        cmd = UpdatePasswordPolicyCommand()
 
        # disable password history
        policy.setHistorySize(0)
        cmd.setPasswordPolicy(policy)
 
        cmd.execute()
 
    '''
     * Create a collection of related entities, user, agent, group, token.
     *
     * @param admin the administrator user name
     * @param password the administrator password
     * 
     * @throws Exception if something goes wrong
    '''
    def doCreate(self):
 
        # Create a hypothetical agent with four alternate addresses
        addr = "1.2.3.4"
        alt = [ "2.2.2.2",  "3.3.3.3", "4.4.4.4", "5.5.5.5" ]
 
        # create a restricted agent
        agentGuid = self.createAgent("Demo Agent", addr, alt)
        print ("Created Demo Agent")
 
        # create a user group
        groupGuid = self.createGroup("Demo Agent Group")
        print ("Created Demo Agent Group")
 
        # assign the group to the restricted agent
        self.assignGroupToAgent(agentGuid, groupGuid)
        print ("Assigned Demo Agent Group to Demo Agent")
 
        # create a user and the AMPrincipal user record
        userGuid = self.createUser("jdoe", "Password123!", "John", "Doe")
        self.createAMUser(userGuid)
        print ("Created user jdoe")
 
        # link the user to the group
        self.linkUserToGroup(userGuid, groupGuid)
        print ("Added user jdoe to Demo Agent Group")
 
    '''
     * Add every user listed in addusers.txt (added by DrJ).
     *
     * @throws Exception if something goes wrong
    '''
    def doAdd(self):
        # create a user and the AMPrincipal user record
        # loop over all users listed in addusers.txt
        f = open('addusers.txt','r')
        line = f.readline()
        while line:
            # format: userid,fname,lname
            cols = line.rstrip().split(",")
            if len(cols) < 3:
                # skip blank or incomplete lines instead of failing on cols[1]
                line = f.readline()
                continue
            userid = cols[0]
            fname = cols[1]
            lname = cols[2]
            print userid
            # If the user already exists we want to continue with the list.
            # The bare except is deliberate: createUser calls sys.exit(2) on a
            # duplicate, and the resulting SystemExit must be caught here too.
            try:
                userGuid = self.createUser(userid, "*LK*", fname, lname)
                self.createAMUser(userGuid)
                print "Created user userid,fname,lname: ", userid,",",fname,",",lname,"\n"
            except:
                print "exception for user ",userid,"\n"
            line = f.readline()
 
        f.close()
 
 
 
 
    '''
     * Assign the next available token to the user.
     *
     * @param admin the administrator user name
     * @param password the administrator password
     * 
     * @throws Exception if something goes wrong
    '''
    def doAssignNextToken(self):
 
        # lookup and then ...
        userGuid = self.lookupUser("jdoe").getGuid()
 
        # assign the next available token to this user
        self.assignNextAvailableTokenToUser(userGuid)
 
        # now that he has a token make him an admin
        roleGuid = self.lookupAdminRole("SuperAdminRole")
        self.assignAdminRole(userGuid, roleGuid)
 
    '''
     * Delete every user listed in delusers.txt.
     *
     * @throws Exception if something goes wrong
    '''
    def doDelete(self):
 
        # lookup and then ...
        # loop over all users listed in delusers.txt
        f = open('delusers.txt','r')
        line = f.readline()
        while line:
            # format: userid,fname,lname  . We just want the userid
            cols = line.rstrip().split(",")
            userid = cols[0]
            if not userid:
                # skip blank lines
                line = f.readline()
                continue
            print userid
            # If the user doesn't exist we want to continue with the list.
            # The bare except is deliberate: lookupUser calls sys.exit(2) when
            # the user is not found, and that SystemExit must be caught here.
            try:
                userGuid = self.lookupUser(userid).getGuid()
                # ... cleanup
                self.deleteUser(userGuid)
                print "Deleted user ",userid
            except:
                print "exception for user ",userid,"\n"
            line = f.readline()
 
        f.close()
 
    '''
     * Update the various entities created by the doCreate method.
     *
     * @throws Exception if something goes wrong
     '''
    def doUpdate(self):
        # lookup and then ...
        agent = self.lookupAgent("Demo Agent")
        group = self.lookupGroup("Demo Agent Group")
        user = self.lookupUser("jdoe")
 
        # ... update
        self.updateAgent(agent)
        print ("Updated Demo Agent")
        self.updateGroup(group)
        print ("Updated Demo Agent Group")
        self.updateUser(user)
        print ("Updated user jdoe")
 
    '''
     * Disable password history limit on default password policy so
     * we can issue multiple updates for the user password.
     *
     * @throws Exception if something goes wrong
     '''
    def doDisablePasswordHistory(self):
        # lookup and then ...
        policy = self.lookupPasswordPolicy("Initial")
 
        # ... update
        self.updatePasswordPolicy(policy)
        print ("Disabled password history")
 
# Globals here
'''
 * Show usage message and exit.
 * 
 * @param msg the error causing the exit
'''
def usage(msg):
    print ("ERROR: " + msg)
    print ("Usage: APIDemos <create|delete> <admin username> <admin password>")
    sys.exit(1)
 
'''
 * Use from command line with three arguments.
 * 
 * <p>
 * First argument:
 * create - to create the required entities
 * assign - to assign the next available token to the user
 * delete - to delete all created entities
 * </p>
 * <p>
 * Second argument is the administrator user name.
 * Third argument is the administrator password.
 * </p>
 * 
 * @param args the command line arguments
'''
 
if len(sys.argv) != 4:
    usage("Missing arguments")
 
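# skip script name; args[0] (the operation keyword) is kept from the
# original demo but is not used: this version always deletes, then adds
args = sys.argv[1:]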
 
# establish a connected session with given credentials
conn = ConnectionFactory.getConnection()
session = conn.connect(args[1], args[2])
 
# make all commands execute using this target automatically
CommandTargetPolicy.setDefaultCommandTarget(session)
 
 
try:
    # create instance
    api = AdminAPIDemos()
    # call delusers before addusers
    print "Deleting users...\n"
    api.doDelete()
    print "Adding users...\n"
    api.doAdd()
 
finally:
    # logout when done
    session.logout()

I of course worked from RSA’s demo file, AdminAPIDemos.py, and kept the name for simplicity. I added a doAdd routine and modified their doDelete function.

These modified functions expect external files to exist, addusers.txt and delusers.txt. The syntax of addusers.txt is:

loginname1,first_name,last_name
loginname2,first_name,last_name
...

delusers.txt has the same syntax, although only the login name in the first column is used.
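
To make the file format concrete, here is a minimal standalone sketch of a parser for these lines. It is written for an ordinary Python 3 interpreter, not the Jython bundled with Authentication Manager, and the names parse_user_line and read_user_file are hypothetical helpers of mine, not part of the script above or of the RSA SDK.

```python
def parse_user_line(line):
    """Split one 'loginname,first_name,last_name' line into a 3-tuple.

    Returns None for blank lines or lines with fewer than three fields,
    so the caller can skip them instead of failing on a missing column.
    """
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) < 3 or not fields[0]:
        return None
    return fields[0], fields[1], fields[2]


def read_user_file(path):
    """Return the list of (userid, first, last) tuples found in a file."""
    users = []
    with open(path) as handle:
        for line in handle:
            parsed = parse_user_line(line)
            if parsed is not None:
                users.append(parsed)
    return users
```

Returning None for bad lines keeps the decision about what to do with them (skip, log, abort) in the caller.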

The idea is that if you can create these files once per day, by some other means, with the users added to and removed from your corporate directory, then you can use them as the basis for keeping your AM internal database in sync with your external enterprise directory, whatever it might be.
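
How you produce the two files depends entirely on your directory. As one possible approach, and purely as a sketch: assume you can export the directory once per day as a mapping from login name to (first name, last name). Comparing yesterday's export with today's then gives both lists. The function names diff_users and write_user_file are hypothetical, and the code targets a plain Python 3 interpreter.

```python
def diff_users(previous, current):
    """Compare two {userid: (first, last)} snapshots of the directory.

    Returns (to_add, to_delete): users only in `current` and users only
    in `previous`, each as a sorted list of (userid, first, last).
    """
    added_ids = sorted(set(current) - set(previous))
    removed_ids = sorted(set(previous) - set(current))
    to_add = [(uid,) + tuple(current[uid]) for uid in added_ids]
    to_delete = [(uid,) + tuple(previous[uid]) for uid in removed_ids]
    return to_add, to_delete


def write_user_file(path, users):
    """Write users in the loginname,first_name,last_name format."""
    with open(path, "w") as handle:
        for userid, first, last in users:
            handle.write("%s,%s,%s\n" % (userid, first, last))
```

The two returned lists can be written to addusers.txt and delusers.txt with write_user_file before the Jython script runs.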

Other Notes
Initially I saw my users were set to expire after a year or so. The original code I borrowed from had lines like this:

        cal = Calendar.getInstance()
 
        # the start date
        now = cal.getTime()
 
        cal.add(Calendar.YEAR, 1)
 
        # the account end date
        expire = cal.getTime()

which caused this. I eventually found out how to set a flag to create the account with unlimited validity.

I also introduced some very simple regex handling to break up the input lines, which required importing additional classes:

from java.util.regex import *
from java.lang import *

I could not get Python regexes to work.

I also found these three innocent-looking lines were costing me a license unit for each added user:

        principal.setStaticPassword("12345678")
        principal.setStaticPasswordSet(1)
        principal.setWindowsPassword("Password123!")

So I commented them out as I did not need them.

That’s it!

Getting the SDK running cost me a few days but at least I’ve documented that as well in pretty good detail: Problems with Jython API for RSA Authentication Manager.

Conclusion
We’ve shared with the community an actual, working Jython script that uses the API to add and remove users in an RSA Authentication Manager v 7.1 database.

Problems with Jython API for RSA Authentication Manager

Intro
This post is not for the faint of heart. I describe some of the many issues I had in trying to run a slightly modified Jython program which I use to keep the local directory in sync with an external directory source. Since I am a non-specialist in all these technologies, I am describing this from a non-specialist’s point of view.

The Details
RSA provides an Authentication Manager SDK, AM7.1_sdk.zip, which is required to use the API.

I had it all working on our old appliance. I thought I could copy files onto the new appliance and it would all work again. It’s not nearly so simple.

You log on and su to rsaadmin for all this work.

Let’s call RSAHOME = /usr/local/RSASecurity/RSAAuthenticationManager.

Initially the crux of the problem is to get
$RSAHOME/appserver/jdk/samples/admin/src/AdminAPIDemos.py to run.

But how do you even run it if you know nothing about Java, Python, Jython, IMS and WebLogic? Ha. It isn’t so easy.

As you see from the above path I unpacked the SDK below appserver. I did most of my work in the $RSAHOME/appserver/jdk/samples/admin directory. The first thing is to follow very carefully the instructions in the SDK documentation about copying files and about initializing a trust.jks keystore. You basically grab .jar files from several places and copy them to $RSAHOME/appserver/jdk/lib/java. Failure to do so will be disastrous.

Ant is apparently a souped-up version of make. You need it to compile the jython example. They don’t tell you where it is. I found it here:

$RSAHOME/appserver/modules/org.apache.ant_1.6.5/bin/ant

I created my own run script I call jython-build:

#!/bin/bash
Ant=/usr/local/RSASecurity/RSAAuthenticationManager/appserver/modules/org.apache.ant_1.6.5/bin/ant
# compiles the examples
#$Ant compile
#$Ant verify-setup
# this worked!
#$Ant run-create-jython
$Ant run-add-jython

This didn’t work at first. One error I got:

$Ant verify-setup
Error: JAVA_HOME is not defined correctly.
  We cannot execute java

OK. Even non-java experts know how to define the JAVA_HOME environment variable. I did this:

$ export JAVA_HOME=/usr/local/RSASecurity/RSAAuthenticationManager/appserver/jdk

I did a

$ $Ant verify-setup

and got a complaint about a missing com.bea.core.process_5.3.0.0.jar. I eventually found it in …utils/jars/thirdparty. And so I started copying in all the missing files. This was before I discovered the documentation! Copying jar files only gets you so far: you will never succeed in copying in wlfullclient.jar because it doesn’t exist! It turns out there is a step described in the sdk documentation which creates this jar file. Similarly, trust.jks, your private keystore, does not exist. You have to follow the steps in the sdk documentation to create it with keytool, etc. You’ll need to augment your path, of course:

$ export PATH=$PATH:/usr/local/RSASecurity/RSAAuthenticationManager/appserver/jdk/bin

I hit some errors around this time:

org/apache/log4j/Appender not found
org/apache/commons/logging not found

This was when I was copying jar files in one-by-one and not following the sdk instructions.

I also got

weblogic.security.SSL.TrustManager class not found

This was more vexing at the time because that class didn’t exist in any of my jar files! This is when I discovered that wlfullclient.jar has to be created by hand. It contains that class. Here’s how I check the jar contents:

$ jar tvf wlfullclient.jar|grep weblogic/security/SSL/Trust

   481 Wed May 09 18:12:02 EDT 2007 weblogic/security/SSL/TrustManager.class
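The same check can be run across every jar in a directory at once, which would have saved me some one-by-one hunting. This is only a sketch in ordinary Python 3 (not the Jython 2.1 that ships with Authentication Manager); the function name find_class_in_jars is made up for illustration, and it relies on nothing more than the fact that a jar is a zip archive.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(lib_dir, class_name):
    """Return the sorted names of .jar files in lib_dir that contain class_name.

    class_name is a dotted Java name, e.g. "weblogic.security.SSL.TrustManager".
    """
    # Inside a jar, a class lives at its package path plus ".class".
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are named .jar but are not valid archives.
            continue
    return hits
```

Called with the lib/java directory and "weblogic.security.SSL.TrustManager", an empty result would mean no jar on hand provides the class, which is the situation I was in before wlfullclient.jar was built.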

For the record, my …lib/java directory, which has a few more files than it actually needs, looks like this:

activation-1.1.jar                commons-pool-1.2.jar                    opensaml-1.0.jar
am-client.jar                     commons-validator-1.3.0.jar             oscache-2.3.2rsa-1.jar
am-server-o.jar                   console-integration-api.jar             replication-api.jar
ant-1.6.5.jar                     dbunit-2.0.jar                          rsaweb-security-3.0.jar
antlr-2.7.6.jar                   dom4j-1.6.1.jar                         rsawebui-3.0.jar
asm-1.5.3.jar                     EccpressoAsn1.jar                       serializer-2.7.0.jar
axis-1.3.jar                      EccpressoCore.jar                       spring-2.0.7.jar
axis-saaj-1.3.jar                 EccpressoJcae.jar                       spring-mock-2.0.7.jar
c3p0-0.9.1.jar                    framework-common.jar                    store-command.jar
certj-2.1.1.jar                   groovy-all-1.0-jsr-05.jar               struts-core-1.3.5.jar
cglib-2.1_3.jar                   hibernate-3.2.2.jar                     struts-extras-1.3.5.jar
classpath.jar                     hibernate-annotations-3.2.1.jar         struts-taglib-1.3.5.jar
classworlds-1.1.jar               hibernate-ejb-persistence-3.2.2.jar     struts-tiles-1.3.5.jar
clu-common.jar                    hibernate-entitymanager-3.2.1.jar       systemfields-o.jar
com.bea.core.process_5.3.0.0.jar  ims-server-o.jar                        trust.jks
commons-beanutils-1.7.0.jar       install-utils.jar                       ucm-clu-common.jar
commons-chain-1.1.jar             iScreen-1-1-0rsa-2.jar                  ucm-server-o.jar
commons-cli-1.0.jar               iScreen-ognl-1-1-0rsa-2.jar             update-instance-node-ext.jar
commons-codec-1.3.jar             jargs-1.0.jar                           wlcipher.jar
commons-collections-3.0.jar       javassist-3.9.0.GA.jar                  wlfullclient.jar
commons-dbcp-1.2.jar              jboss-archive-browsing-5.0.0.Alpha.jar  wrapper-3.2.1rsa1.jar
commons-digester-1.6.jar          jdom-1.0.jar                            wsdl4j-1.5.1.jar
commons-discovery-0.2.jar         jline-0.9.91rsa-1.jar                   xalan-2.7.0.jar
commons-fileupload-1.2.jar        jsafe-3.6.jar                           xercesImpl-2.7.1.jar
commons-httpclient-3.0.1.jar      jsafeJCE-3.6.jar                        xml-apis-1.3.02.jar
commons-io-1.2.jar                jython-2.1.jar                          xmlsec-1.2.97.jar
commons-lang-2.2.jar              license.bea                             xmlspy-schema-2006-sp2.jar
commons-logging-1.0.4.jar         log4j-1.2.11rsa-3.jar
commons-net-2.0.jar               ognl-2.6.7.jar

I don’t know if these steps are necessary, but they should work at this stage:

$Ant compile
$Ant verify-setup
$Ant run-create-jython

But are we done? No way. Don’t forget to edit your config.properties:

# JNDI factory class.
java.naming.factory.initial = weblogic.jndi.WLInitialContextFactory
 
# Server URL(s).  May be a comma separated list of URLs if running against a cluster
# NOTE: Replace authmgr-test.drj.com with the hostname of the managed server
java.naming.provider.url = t3s://authmgr-test.drj.com:7002
 
# User ID for process-level authentication.
# run rsautil manage-secrets --action list to learn these
#
com.rsa.cmdclient.user = CmdClient_blahblahblah
 
# Password for process-level authentication
com.rsa.cmdclient.user.password = blahblahblah
 
# Password for Two-Way SSL client identity keystore
com.rsa.ssl.client.id.store.password = password
 
# Password for Two-Way SSL client identity private key
com.rsa.ssl.client.id.key.password = password
 
# Provider URL for Two-Way SSL client authentication
ims.ssl.client.provider.url = t3s://authmgr-test.drj.com:7022
 
# Identity keystore for Two-Way SSL client authentication
ims.ssl.client.identity.keystore.filename = client-identity.jks
 
# Identity keystore private key alias for Two-Way SSL client authentication
ims.ssl.client.identity.key.alias = client-identity
 
# Identity keystore trusted root CA certificate alias
ims.ssl.client.root.ca.alias = root-ca
 
# SOAPCommandTargetBasicAuth provider URL
ims.soap.client.provider.url = https://authmgr-test.drj.com:7002/ims-ws/services/CommandServer

As it says you need to run

$ rsautil manage-secrets --action list

or

$ rsautil manage-secrets --action listkeys

to get the correct username/password. The URLs also need appropriate tweaking of course. Failure to do these things will produce some fairly obvious errors, however. Strangely, the keystore-related values which look like placeholders since they say “password” really don’t have to be modified.
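Before kicking off a build it can help to confirm that the entries which must be edited are actually filled in. The following is a rough sketch in plain Python 3, not part of the sdk. The choice of REQUIRED_KEYS is my assumption, based on the values the text says need changing, and the parser only handles simple "key = value" lines rather than the full Java properties syntax.

```python
# Keys the text above says must be set for your own server (an assumption,
# not an official list from the sdk documentation).
REQUIRED_KEYS = (
    "java.naming.provider.url",
    "com.rsa.cmdclient.user",
    "com.rsa.cmdclient.user.password",
)


def parse_properties(text):
    """Parse simple 'key = value' lines; lines starting with # or ! are comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not props.get(key)]
```

A typical use would be missing_keys(parse_properties(open("config.properties").read())), which returns an empty list when all three entries have values.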

Try to run jython-build now and you may get, like I did,

Cannot import name AddGroupCommand

which is a clear reference to this line in the python file:

from com.rsa.admin import AddGroupCommand

Rooting around, I find this class in ims-server-o.jar which I already have in my …lib/java. So I decided to make a ~/.api file which contains not only the JAVA_HOME, but a CLASSPATH:

export JAVA_HOME=/usr/local/RSASecurity/RSAAuthenticationManager/appserver/jdk
export CLASSPATH=/usr/local/RSASecurity/RSAAuthenticationManager/appserver/jdk/lib/java/ims-server-o.jar

Actually my ~/.api file contains more. I just borrowed $RSAHOME/utils/rsaenv and added those definitions, but I’m not sure any of the other stuff is needed.

Are we there yet? Not in my case. Now we get a simple “Access Denied.” I was stymied by this for a while. I went to reporting and found this failure logged under Authentication Monitor:

(Description) User admin attempted authentication using authenticator RSA_password.

(Reason) Authentication method failed.

What this was about is that I had forgotten to update my build.xml file with the correct username and password:

    <property name="admin.name" value="admin"/>
    <property name="admin.pw" value="mypassword"/>

After correcting that, it began to work.


Update after several months

Well, after some months of inattention, now once again it doesn’t work! I checked my adduser log file and saw this nugget in the traceback:

[java] File "src/AdminAPIDemos.py", line 846, in ?
[java] com.rsa.authn.AuthenticationCommandException: Access Denied

The only other clue I have is that another administrator changed the Super Admin password a couple weeks ago as it was expired. I haven’t resolved this one yet, it’s a work in progress! Ah. Simple. I needed to update my build.xml with the latest admin password. Hmm. That could be kind of a pain if it’s expiring every 90 days.

Conclusion
I was battered by this exercise, mostly by my own failure to read the manual. But some things are simply not documented. Why all the fuss just to get demo code working? Because we can customize it. I felt that once I had the demo running, the sky was the limit and creating customization to do automated add/delete would not be that difficult.

To see the jython script I created to do the user add/deletes click on this article: Add/Delete Jython Script for RSA Authentication Manager

Categories
Linux Perl Web Site Technologies

For Experimentalists: How to Test if your Web Server has a long timeout

Intro
I use the old Sun Java System Web Server, now known as the Oracle Web Server, formerly Sun ONE web server and before that iPlanet Web Server and before that Netscape Enterprise Server. The question came up the other day if the web server times out web pages. I never fully trust the documentation. I developed a simple method to experiment and find the answer for myself.

The Method
Sometimes you test what’s easiest, not what you should. In this case, an easy test is to write a long-running CGI program. This program, timertest.pl, is embarrassingly old, but anyhow…

#!/usr/bin/perl
# DrJ, 3/1999
# The new, PERL5 way:
use CGI;
$query = new CGI;
$| = 1;
#
print "Content-type: text/html\n\n";
 
print "<h2>Environment Variables</h2>
 
<table>
<tr><th>Env Variable</th><th>Value</th></tr>\n";
foreach $key (sort(keys(%ENV))) {
  print "<tr><td>$key</td><td>$ENV{$key}</td></tr>\n";
}
print "</table>\n";
print "<hr>
<h2>Name/Value Pairs</h2>
<table>
<tr><th>Name</th><th>Value</th></tr>\n";
foreach $key ($query->param) {
  print "<tr><td>$key</td><td>" . $query->param($key) . "</td></tr>\n";
}
print "</table>\n";
$host = `hostname`;
print "Hostname: $host<br>\n";
sleep($ENV{QUERY_STRING});
 
print "we have slept for $ENV{QUERY_STRING} seconds.\n";

So you see it prints out some stuff, sleeps for a specified time, then prints out a final line. You call it like curl your_server/cgi-bin/timertest.pl?305, where 305 is the time in seconds to sleep. I suggest using curl rather than a browser so as not to be thrown off by browser complications, since browsers may have their own timeouts. curl is simplicity itself and won’t bias the answer. Use a larger number for longer times. That was easy, right? Does it work? No. Does it show what we _really_ wanted to show? Also no. In other words, a CGI program that runs for 610 seconds will be killed by the web server, but that’s really a function of some CGI timer. Five and ten minutes seem to be magic timeout values for some built-in timers, so it is good to test times slightly smaller/larger than those times. So how do we test a plain web page? It turns out we can…

The Solution – using the Unix bag of tricks
I only have a couple of minutes here. Briefly:

> mknod tmp.htm p

> chown me tmp.htm

(from another window)
> curl my_server/tmp.htm

(back to first window)
> sleep 610; ls -l > tmp.htm

Then wait! mknod as used above is apparently the old, Solaris, syntax. The syntax could be somewhat different under Linux. The point is to create a named pipe. Think of a named pipe, like it sounds, like giving a name to the “|” character used so often in Unix command lines. So it needs a process to give it input and a process to read it, hence the two separate windows.
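To see the blocking behavior of a named pipe without involving a web server at all, here is a small self-contained sketch in Python 3. It uses os.mkfifo, which creates the same kind of object as mknod ... p. The function name fifo_demo and the use of a thread in place of the second window are my own choices for illustration.

```python
import os
import tempfile
import threading
import time


def fifo_demo(delay_seconds, payload):
    """Create a named pipe, write payload after a delay, return (data, elapsed)."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "tmp.htm")
    os.mkfifo(fifo_path)  # creates a named pipe, like `mknod tmp.htm p`

    def writer():
        # Plays the role of `sleep 610; ls -l > tmp.htm` in the first window.
        time.sleep(delay_seconds)
        with open(fifo_path, "w") as fifo:
            fifo.write(payload)

    thread = threading.Thread(target=writer)
    thread.start()
    start = time.monotonic()
    # Plays the role of the reader: opening blocks until a writer opens
    # the pipe, and read() returns once the writer has closed it.
    with open(fifo_path) as fifo:
        data = fifo.read()
    elapsed = time.monotonic() - start
    thread.join()
    os.remove(fifo_path)
    os.rmdir(workdir)
    return data, elapsed
```

Calling fifo_demo(2, "hello\n") returns the payload only after roughly two seconds have passed, which is the property the experiment relies on: whoever reads the pipe is kept waiting until the other side delivers.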

See if you get the directory listing in your curl window after about 10 minutes. With my Sun Java System Web Server I do, so now I know both curl and the web server support probably unlimited page-load times.

An Unexpected Finding
Another tip and unexpected lesson – don’t use one of your named pipes more than once. If you mess up, create a new one and work with that. What happens when I re-use one of my pipes is that curl is able to read the web page over and over, without a process sending input to the named pipe! That wasn’t supposed to happen. What does it all mean? It can only be – and I’ve often suspected this – that my web server is caching the content. It’s not a particularly well-documented feature, either. I think most times I wish it’d rather not.

Conclusion
The Sun Java System Web Server times out CGI scripts, but not regular static web pages. We proved this in a few minutes by devising an unambiguous experiment. As an added bonus we also proved that the web server caches at least some pages. The careful observer is always open to learning more than what he or she started out intending to look for!

Categories
Admin Apache CentOS Linux Web Site Technologies

Major Headaches Migrating Apache from Ubuntu to CentOS

Intro
I’m changing servers from Ubuntu server to CentOS. On Ubuntu I just checked off LAMP and got my environment. In CentOS I’m doing it piece-by-piece. I don’t think my Ubuntu install is quite regular, either, as I bastardized it by adding environment variables in the Apache config file, a concept I borrowed from SLES! Turns out it is quite an ordeal to make a smooth transition. I will share all my pitfalls. I still don’t have it working, but I think I’m over the hump. [Update: now it is working, or 99% of it is working. It is a bit sluggish, however.]

The Details
I installed httpd on CentOS using yum. I also installed some php5 packages which I saw were recommended as well. First thing I noticed is that the directory structure for “httpd” as it seems to be known on CentOS, is dramatically different from “apache2” as it is known in Ubuntu. This example illustrates the point. In CentOS the main config file is

/etc/httpd/conf/httpd.conf

while in Ubuntu I had

/etc/apache2/apache2.conf

so I tarred up my /etc/apache2 files and had the thought “Let’s make this work on CentOS.” Ha. Easier said than done.

To remind, the content of /etc/apache2 is:

apache2.conf, conf.d, sites-enabled, sites-available, mods-enabled, mods-available, plus some stuff I probably added, including envvars, httpd.conf and ports.conf.

envvars contains environment variables which are subsequently referenced in the config files, like these:

export APACHE_RUN_USER=www-data
export APACHE_RUN_GROUP=www-data
export APACHE_PID_FILE=/var/run/apache2$SUFFIX.pid
export APACHE_RUN_DIR=/var/run/apache2$SUFFIX
export APACHE_LOCK_DIR=/var/lock/apache2$SUFFIX
# Only /var/log/apache2 is handled by /etc/logrotate.d/apache2.
export APACHE_LOG_DIR=/var/log/apache2$SUFFIX

First step? Well we have to hook httpd startup to our new directory somehow. I don’t recall this part so well. I think I tried this from the command line:

$ apachectl -d /etc/apache2 -f apache2.conf -k start

and it may be at that point that I got the MPM workers error. But I forget. I switched to using the service command and that particular error seemed to go away at some point. I don’t believe I needed to do anything special.

So I tried this edit to /etc/sysconfig/httpd (sparing you the failed attempts):

OPTIONS="-d /etc/apache2 -f apache2.conf"

Now we try to launch and see what happens.

$ service httpd start

Starting httpd: httpd: Syntax error on line 203 of /etc/apache2/apache2.conf: Syntax error on line 1 of /etc/apache2/mods-enabled/alias.load: Cannot load /usr/lib/apache2/modules/mod_alias.so into server: /usr/lib/apache2/modules/mod_alias.so: cannot open shared object file: No such file or directory
[FAILED]

Fasten your seatbelts, put on your big-boy pants or whatever. We’re just getting warmed up.

Let’s look at mods-available/alias.load:

$ more alias.load

LoadModule alias_module /usr/lib/apache2/modules/mod_alias.so

Sure enough, there is not only no such file, there is not even such a directory as /usr/lib/apache2. And all the load files have references like that. Where did the httpd install put its modules anyways? Why in /etc/httpd/modules. So I made a command decision:

$ mkdir /usr/lib/apache2
$ cd !$
$ ln -s /etc/httpd/modules

So where does that put us? Here:

$ service httpd start

Starting httpd: httpd: Syntax error on line 203 of /etc/apache2/apache2.conf: Syntax error on line 1 of /etc/apache2/mods-enabled/ssl.load: Cannot load /usr/lib/apache2/modules/mod_ssl.so into server: /usr/lib/apache2/modules/mod_ssl.so: cannot open shared object file: No such file or directory
     [FAILED]

Not everyone will see this issue. I had used SSL for some testing in Ubuntu so I had that module enabled. My CentOS is a core image and did not come with an SSL module. So let’s get it.

$ yum search mod_ssl

shows the full module name to be mod_ssl.x86_64, so we install it with yum install.

How far did that get us? To here:

$ service httpd start

Starting httpd: httpd: bad user name ${APACHE_RUN_USER}
  [FAILED]

Ah, remember my environment variables from above? As I said I actually use them with lines such as:

User ${APACHE_RUN_USER}

in apache2.conf. But clearly the definitions of those environment variables are not getting passed along. I decide to see if this step might work. I append these two lines to /etc/sysconfig/httpd:

# Read in our environment variables. Inspired by apache on SLES.
. /etc/apache2/envvars
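To make the mechanism concrete: the config file holds a ${NAME} reference, and it only turns into a real value if NAME is defined in the environment httpd is started with. The sketch below imitates that substitution in Python 3 purely for illustration. It is not how httpd is implemented, the function names are made up, and it does not expand shell variables such as $SUFFIX inside the values.

```python
import re


def parse_envvars(text):
    """Collect NAME=value pairs from 'export NAME=value' lines."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("export ") and "=" in line:
            name, _, value = line[len("export "):].partition("=")
            env[name.strip()] = value.strip()
    return env


def expand_directive(directive, env):
    """Replace ${NAME} with env[NAME]; leave undefined names untouched."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda match: env.get(match.group(1), match.group(0)),
        directive,
    )
```

With an empty environment, expand_directive("User ${APACHE_RUN_USER}", {}) hands back the text unchanged, which mirrors the "bad user name ${APACHE_RUN_USER}" error above. Once envvars has been read in, the same call yields "User www-data".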

Could any more go wrong? Sure. Lots! Then there’s this:

$ service httpd start

Starting httpd: httpd: bad user name www-data
      [FAILED]

Amongst all the other stark differences, Ubuntu and CentOS use different users to run apache. Great! So I create a www-data user as userid 33, gid 33 because that’s how it was under Ubuntu. But GID 33 is already taken in CentOS: it belongs to the backup group. I decide I will never use it that way, and change the group name to www-data.

That brings us here. You see I have a lot of patience…

$ service httpd start

Starting httpd: Syntax error on line 218 of /etc/apache2/apache2.conf:
Invalid command 'LogFormat', perhaps misspelled or defined by a module not included in the server configuration
   [FAILED]

Now my line 218 looks pretty regular. It’s simply:

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined

I then realized something interesting. The modules built in to httpd on CentOS and apache2 on Ubuntu are different. apache2 seems to have some modules built in for logging:

$ apache2 -l

Compiled in modules:
  core.c
  mod_log_config.c
  mod_logio.c
  prefork.c
  http_core.c
  mod_so.c

whereas httpd does not:

$ httpd -l

Compiled in modules:
  core.c
  prefork.c
  http_core.c
  mod_so.c
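Comparing the two listings by eye works for six lines, but the same comparison is easy to automate. Here is a short sketch in Python 3 that takes the text output of apache2 -l and httpd -l and reports which modules were compiled into the old binary but not the new one. Those are the candidates that need a LoadModule line on the new system. The function names are hypothetical.

```python
def compiled_in_modules(listing):
    """Extract module source names (e.g. 'mod_logio.c') from `-l` output."""
    return {
        line.strip()
        for line in listing.splitlines()
        if line.strip().endswith(".c")
    }


def modules_to_load(old_listing, new_listing):
    """Modules compiled into the old binary but absent from the new one."""
    return sorted(compiled_in_modules(old_listing) - compiled_in_modules(new_listing))
```

Fed the two listings above, modules_to_load returns mod_log_config.c and mod_logio.c.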

So I made an empty log_config.conf and a log_config.load in mods-available that reads:

LoadModule log_config_module /usr/lib/apache2/modules/mod_log_config.so

I got the correct names by looking at the apache web site documentation on that module. And I linked those two files into the mods-enabled directory:

$ cd mods-enabled
$ ln -s ../mods-available/log_config.conf
$ ln -s ../mods-available/log_config.load

Next error, please! Certainly. It is:

$ service httpd start

Starting httpd: Syntax error on line 218 of /etc/apache2/apache2.conf:
Unrecognized LogFormat directive %O
  [FAILED]

where line 218 is as recorded above. Well, some searches showed that you need the logio module. Note that it is also compiled into apache2, but missing from httpd. So I did a similar thing with defining the necessary mods-{available,enabled} files. logio.load reads:

LoadModule logio_module /usr/lib/apache2/modules/mod_logio.so

The next?

$ service httpd start

Starting httpd: (2)No such file or directory: httpd: could not open error log file /var/log/apache2/error.log.
Unable to open logs
   [FAILED]

Oops. Didn’t make that directory. Naturally httpd and apache2 use different directories for logging. What else could you expect?

Now we’re down to this minimalist error:

$ service httpd start

Starting httpd:     [FAILED]

The error log contained this line:

[Mon Mar 19 14:11:14 2012] [error] (2)No such file or directory: Cannot create SSLMutex with file `/var/run/apache2/ssl_mutex'

After puzzling some over this what I eventually noticed is that my environment has references to directories which I haven’t defined yet:

export APACHE_RUN_DIR=/var/run/apache2$SUFFIX
export APACHE_LOCK_DIR=/var/lock/apache2$SUFFIX

So I created them.

And now I get:

$ service httpd start

Starting httpd:           [  OK  ]

But all is still not well. I cannot stop it the proper way. Trying to read its status goes like this:

$ service httpd status

httpd dead but subsys locked

I looked this one up. Killed off processes and semaphores as recommended with ipcs -s (see this link), etc. But since my case is different, I also did something different. I modified my /etc/init.d/httpd file:

#pidfile=${PIDFILE-/var/run/httpd/httpd.pid}
pidfile=${PIDFILE-/var/run/apache2.pid}
#lockfile=${LOCKFILE-/var/lock/subsys/httpd}
lockfile=${LOCKFILE-/var/lock/subsys/apache2}

Believe it or not, this worked. I can now run service httpd status and service httpd stop. To prove it:

$ service httpd status

httpd (pid  30366) is running...
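My reading of why this worked is that the init script decides the service state from a pid file and a lock file, so it has to look where the running daemon actually writes them. The following is a deliberately simplified model of that decision in Python 3, not the real code from the init scripts; the function names and the exact order of the checks are assumptions on my part.

```python
import os


def pid_alive(pid):
    """True if a process with this pid exists."""
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def service_status(pidfile, lockfile):
    """Simplified status check based on a pid file and a subsys lock file."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        pid = None  # no usable pid file at this path
    if pid is not None and pid_alive(pid):
        return "running"
    if pid is not None:
        return "dead but pid file exists"
    if os.path.exists(lockfile):
        return "dead but subsys locked"
    return "stopped"
```

In this model, pointing the check at a pid file path the daemon never writes, while the lock file exists, produces "dead but subsys locked" even though the daemon is up. That matches what I saw before editing pidfile and lockfile.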

Another Error Crops Up
I eventually noticed another problem with the web site. My trajectory page was not working. Upon investigation I found this comment in my main apache error log (not my virtual server error log, which I still don’t understand):

sh: /home/drj/traj/traj4.pl: Permission denied

This had to be a result of my call-out to a perl program from a php program:

...
$data = exec('/home/drj/traj/traj4.pl'.' '.$escargs);
...

But what’s so special about that call? Worked fine on Ubuntu, and I did a directory listing to show the file was really there. Well, here’s the thing, that file is under my home directory and guess what? When you create your users in Ubuntu the home directory permissions are set to group and others read. Not in CentOS! A listing of /home looks kind of like this:

/home$ ll

total 12
drwx------ 2 drj   drj     4096 Mar 19 15:26 drj/
...

I set the permissions for all to read:

$ sudo chmod g+rx,o+rx drj

and I was good to go. The program began to work.
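A "Permission denied" on a file that plainly exists usually means some directory above it cannot be traversed. Here is a small Python 3 sketch that walks up from a file and lists every ancestor directory lacking the execute (search) bit for "others". It assumes the web server user is neither the owner nor in the group of those directories, and the function name is made up.

```python
import os
import stat


def blocked_components(path):
    """Return ancestor directories of path that lack o+x (search) permission."""
    blocked = []
    current = os.path.dirname(os.path.abspath(path))
    while True:
        mode = os.stat(current).st_mode
        if not mode & stat.S_IXOTH:
            blocked.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    # Report from the top of the tree down to the file's own directory.
    return list(reversed(blocked))
```

For /home/drj/traj/traj4.pl with the drwx------ listing shown above, the result would include /home/drj, and it would drop out of the list after the chmod.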

May 2013 Update
I was asked how all this survived after a yum update. Answer: pretty well, but not perfectly. The daemon was fine. And what misled me is that it started fine. But then a couple days later I was looking at my access log and realized…it wasn’t there! Nor the errors log. Well, actually, the default access and error logs were there, but not for my virtual servers.

I soon realized that

$ service httpd status

produced

httpd dead but subsys locked

Well, who reads or remembers their own posts from a year ago? I totally forgot I had already dealt with this once, and my own post didn’t show up in my DDG search. Anywho, I stepped on the same rake twice. Being less patient this time around, probably because I am one year older, I simply altered the /etc/init.d/httpd file (looks like it had been changed by the update) thusly:

#pidfile=${PIDFILE-/var/run/httpd/httpd.pid}
#lockfile=${LOCKFILE-/var/lock/subsys/httpd}
# try as an experiment - DrJ 5/3/13
pidfile=/var/run/apache2.pid
lockfile=/var/lock/apache2/accept.lock

and I made sure I had a /var/lock/apache2 directory. This worked.

I chose a lock file with that particular name because I noticed this in my /etc/apache2/apache2.conf:

LockFile ${APACHE_LOCK_DIR}/accept.lock

To clean things out as I was working and re-working this problem since I couldn’t run

$ service httpd stop

I ran instead:

$ pkill -9 -f sbin/httpd

and I removed /var/run/apache2.pid.

Now, once again, I can get a status on my httpd service and restart works as well and my access and error logs are being written.

Conclusion
This conversion exercise turned out to be quite a teaching lesson and even after all this more remains. After the mysql migration I find the performance to be sub-par – about twice as slow as it was on Ubuntu.

Four months later, CentOS has not crashed once on me. Contrast that with Ubuntu freezing every two weeks or so. I optimized MySQL to cache some data and performance is adequate. I also have since learned about bitnami, which is kind of a stack for all the stuff I was using. Check out bitnami.org.

Categories
Admin Hosting Service Linux

Hosting: You Really Can’t beat Amazon Web Services EC2

Intro
You want to have your own server hosted by a service provider that’s going to take care of the hard stuff – uninterruptible power, fast pipe to the Internet, backups? That’s what I wanted. In addition I didn’t want to worry about actual, messy hardware. Give me a virtual server any day of the week. I am no hosting expert, but I have some experience and I’d like to share that.

The Details
I’d say chances are about even whether you’d think of Amazon Web Services for the above scenario. I’d argue that Amazon is actually the most competitive service out there and should be at the top of any short list, but the situation wasn’t always so even as recently as February of this year.

You see, Amazon markets itself a bit differently. They are an IaaS (infrastructure as a service) provider. I don’t know who their top competition really is, but AWS (Amazon Web Services) is viewed as both visionary and able to execute by Gartner in a recent report. My personal experience over the last 12 months backs that up. My main point, however, is that hosting a server is a subset of IaaS. I assume that if you want your own server where you get root access, you have the skill set (aided by the vast resources on the Internet including blogs like mine) to install a web server, database server, programming environment, application engines or whatever you want to do with it. You don’t need the AWS utility computing model per se, just a reliable 24×7 server, right? That’s my situation.

I was actually looking to move to a “regular” hosting provider, but it turns out to have been a really great time to look around. Some background. I’m currently running such an environment running Ubuntu server 10.10 as a free-tier micro instance. I’ve enjoyed it a lot except one thing. From time to time my server freezes. At least once a month since December. I have no idea why. Knowing that my free tier would be up anyways this month I asked my computer scientist friend “Niz” for a good OS to run a web server and he said CentOS is what I want. It’s basically Redhat Enterprise Linux except you don’t pay Redhat for support.

I looked at traditional hosting providers GoDaddy and Rackspace and 1and1 a bit. I ran the numbers and saw that GoDaddy, with whom I already host my DNS domains, was by far the cost leader. They were also offering CentOS v 5.6. I think RackSpace also had a CentOS offering. I spoke with a couple providers in my own state. I reasoned I would keep my business local if the price was within 25% of other offers I could find.

Then, and here’s one of the cool things about IaaS, I fired up a CentOS image at Amazon Elastic Compute Cloud. With utility computing I pay only by the hour so I can experiment cheaply, which I did. Niz said run v 5.6 because all the bugs have been worked out. He hosts with another provider so he knows a thing or two about this topic and many other topics besides. I asked him what version he runs. 5.6. So I fired it up. But you know, it just felt like a giant step backwards through an open source timeline. I mean Perl v 5.8.8 vs Ubuntu’s 5.10.1. Now mind you by this time my version of Ubuntu is itself a year old. Apache version 2.2.3 and kernel version 2.6.18 versus 2.2.16 and 2.6.35. Just plain old. Though he said support would be available for a fantastical amount of time, I decided to chuck that image.

Just as I was thinking about all these things Amazon made a really important announcement: prices to be lowered. All of a sudden they were competitive when viewed as a pure hosting provider, never mind all the other features they bring to bear.

I decided I wanted more memory than the 700 MB available to a micro image, and more storage than the 8 GB that tier gives. So a “small” image was the next step up, at 1.7 GB of memory and 160 GB disk space. But then I noticed a quirky thing – the small images only come in 32-bit, not 64-bit unlike all the other tiers. I am so used to 64-bit by now that I don’t trust 32-bit. I want to run what a lot of other people are running to know that the issues have been worked out.

Then another wonderful thing happened – Amazon announced support for 64-bit OSes in their small tier! What timing.

The Comparison Results
AWS lowered their prices by about 35%, a really substantial amount. I am willing to commit up front for an extended hosting period because I expect to be in this for the long haul. Frankly, I love having my own server! So I committed to three years small tier, heavy usage after doing the math in order to get the best deal for a 24×7 server. It’s $96 up front and about $0.027/hour for one instance hour (I originally had $300 and $0.012/hour here). So that’s about $22/month over three years (originally $18). Reasonable, I would say. For some reason my earlier calculations had it coming out cheaper. These numbers are as of September, 2013. I was prepared to use GoDaddy which I think is $24/month for a two-year commitment. My finding was that RackSpace and 1and1 were more expensive in turn than GoDaddy. I have no idea how AWS did what they did on pricing. It’s kind of amazing. My local providers? One came in at six times the cost of GoDaddy(!), the other about $55/month. Too bad for them. But I am excited about my new server. I guess it’s a sort of master of my own destiny type of thing that appeals to my independent spirit. Don’t tell Amazon, but really I think they could have easily justified charging a small premium for their hosting, given all the other convenient infrastructure services that are there, ready to be dialed up, say, like a load balancer, snapshots, additional IPs, etc. And there are literally 8000 images to choose from when you are deciding what image (OS) to run. That alone speaks volumes about the choices you have available.
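The monthly figure comes from spreading the up-front fee over the 36-month term and adding the hourly charge for a server that never shuts off. Here is a quick sketch of that arithmetic in Python 3. The 730 hours per month (8760 hours in a year divided by 12) is my assumption, as the calculation above does not say which figure was used, and the function name is made up.

```python
def effective_monthly_cost(upfront, hourly, months=36, hours_per_month=730):
    """Amortized up-front fee plus hourly charges for an always-on instance."""
    return upfront / months + hourly * hours_per_month


# The $96 up-front, $0.027/hour figures quoted above.
print(round(effective_monthly_cost(96, 0.027), 2))
```

With $96 up front and $0.027/hour this works out to a bit over $22 a month, in line with the figure quoted above. Changing months or hours_per_month makes it easy to compare other commitment terms.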

What I’m up to
I installed CentOS 6.0 core image. It feels fresher. It’s based on RedHat 6.0. It’s got Perl v. 5.10.1, kernel 2.6.32, and, once you install it, Apache v 2.2.15. It only came with about 300 packages installed, which is kind of nice, not the usual 1000+ bloated deal I am more used to. And it seems fast, too. Now whether or not it will prove to be stable is an entirely different question and only time will tell. I’m optimistic. But if not, I’ll chuck it and find something else. I’ve got my data on a separate volume anyways which will persist regardless of what image I choose – another nice plus of Amazon’s utility computing model.

A Quick Tip About Additional Volumes
With my micro instance it occupied a full 8 GB so I didn’t have a care about additional disk volumes. On the other hand, my CentOS 6.0 core image is a lean 6 GB. If I’m entitled to 160 GB as part of what I’m paying for, how do I get the access to the remaining 154 GB? I guess you create a volume. Using the Admin GUI is easiest. OK, so you have your volume, how does your instance see it? It’s not too obvious from their documentation but in CentOS my extra volume is

/dev/xvdj

I formatted it as an ext4 device and mounted it, as per their instructions. It didn’t take that long. I put in a line in /etc/fstab like this:

/dev/xvdj /mnt/vol ext4 defaults 1 2

Now I’m good to go! It gets mounted after reboot.

Dec, 2016 update
Amazon has announced Lightsail to better compete with GoDaddy and their ilk. Plans start as low as $5 a month. For $10 a month you get a static IP, 1 GB RAM, 20 GB SSD storage I think and ssh access. So I hope that means root access. Oh, plus a pre-configured WordPress software.

Conclusion
Amazon EC2 rocks. They could have charged a premium but instead they are the cheapest offering out there according to my informal survey. The richness of their service offerings is awesome. I didn’t mention that you can mount the entire data set of the human genome, or all the facts of the world which have been assembled in freebase.org. How cool is that?

Categories
Admin Apache Uncategorized Web Site Technologies

The IT Detective agency: Excessive Requests for PAC file Crippling Web Server

Intro
Funny thing about infrastructure. You may have something running fine for years, and then suddenly it doesn’t. That is one of the many mysteries in the case of the excessive requests for PAC file.

The Details
We serve our Proxy Auto-config (PAC) file from a couple web servers which are load-balanced. It’s worked great for over 10 years. The PAC file is actually produced by a Perl script which can alter the content based on the user’s IP or other variables.

The web servers got bogged down last week. I literally observed the load average shoot up past 200 (on a 2-CPU server). This is not good.

I quickly noticed lots and lots of accesses for wpad.dat and proxy.pac. Some PCs were individually making hundreds of requests for these files in a day! Sometimes there were 15 or 20 requests in a minute. Since it is a script it takes some compute power to handle all those requests. So it was time to do one of two things: either learn the root cause and address it, or make a quick fix. The symptoms were clear enough, but I had no idea about the root cause. I also was fascinated by the requests for wpad.dat which I felt was serving no purpose whatsoever in our environment. So I went for the quick fix hoping that understanding would come later.
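To put numbers on which clients are hammering the PAC file, something like the following sketch could tally requests per client per minute. It assumes the standard Apache common/combined log format, so the regular expression would need adjusting for a custom LogFormat; the function name and the sample addresses are made up.

```python
import re
from collections import Counter

# Assumption: Apache common/combined log format, e.g.
# 10.1.2.3 - - [10/Oct/2013:13:55:36 -0400] "GET /proxy.pac HTTP/1.1" 200 2326
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<minute>[^:]+:\d{2}:\d{2}):\d{2} [^\]]+\] '
    r'"(?:GET|HEAD) (?P<path>\S+)'
)
PAC_FILES = ("/wpad.dat", "/proxy.pac")


def pac_requests_per_minute(lines):
    """Count PAC-file requests keyed by (client IP, minute)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path").split("?")[0].endswith(PAC_FILES):
            counts[(m.group("ip"), m.group("minute"))] += 1
    return counts


sample = [
    '10.1.2.3 - - [10/Oct/2013:13:55:36 -0400] "GET /proxy.pac HTTP/1.1" 200 2326',
    '10.1.2.3 - - [10/Oct/2013:13:55:40 -0400] "GET /wpad.dat HTTP/1.1" 200 2326',
    '10.1.2.4 - - [10/Oct/2013:13:55:41 -0400] "GET /index.html HTTP/1.1" 200 99',
]
for (ip, minute), n in pac_requests_per_minute(sample).most_common(5):
    print(ip, minute, n)
```

Fed a real access log (an open file object works as the lines argument), the top entries of most_common() show the worst offenders.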

To be continued…
As promised – three and a half years later! And we still have this problem. It’s probably worse than ever. I pretty much threw in the towel and learned how to scale up our apache web server to handle more PAC file requests simultaneously, see the references.

References
Scaling apache to handle more requests.

Categories
Admin Linux SLES

How to Get By Without unix2dos in SLES

Intro
As a Unix old-timer looking at the latest releases, I have only ever observed one tendency – that of ever-increasing numbers of commands, always additive – until now. A command I considered useful (well, basically any command I have ever used I consider useful) has gone AWOL in Suse Linux Enterprise Server (SLES for short): unix2dos.

Why You Need It
These days you need it more than ever. What with sftp being used in place of ftp, your transferred text files will come over from a SLES server to your PC in binary mode, preserving the Linux-style way of declaring a new line with the newline character, “\n”. Bring that file onto your PC and look at it in Notepad and you’ll get one long line because Windows requires more to indicate a new line. Windows OS’s like Windows 7 require a carriage return + newline, i.e., “\r\n”.
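The conversion itself is tiny. Here is a minimal sketch of it in Python, working on bytes so that no character-set translation sneaks in. The function names simply mirror the classic commands; this is an illustration, not the original utilities.

```python
def unix2dos(data: bytes) -> bytes:
    """Convert LF line endings to CRLF without doubling existing CRLFs."""
    # Normalize first so that running this twice does not produce \r\r\n.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def dos2unix(data: bytes) -> bytes:
    """Convert CRLF line endings back to LF."""
    return data.replace(b"\r\n", b"\n")


text = b"line one\nline two\n"
print(repr(unix2dos(text)))
```

Normalizing to "\n" first makes the conversion safe to run on a file that has already been converted.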

Who You Going to Call
I spoke with some experts so I cannot take credit for finding this out personally. Long story short, things evolved and there is a more sophisticated command available that does this sort of thing and much else. That’s recode.

But I don’t think I’ll ever use recode for anything else so I decided to re-create a unix2dos command using recode in a tiny shell script:

#!/bin/sh
# inspired by http://yourlinuxguy.com/?p=232 and the fact that they took away this useful command
# 3/6/12
# "$@" instead of $* so file names containing spaces are passed through intact
recode latin1..ibmpc "$@"

You call it like this:

> unix2dos file

and it overwrites the file and converts it to the format Windows expects.

My other expert contact says I could find the old unix2dos in OpenSuse but I decided not to go that route.

Of course to convert in the other direction you have dos2unix which for some reason wasn’t removed from the distro. Strange, huh?

How to See That It Worked
I use

> od -c file|more

to look at the ascii characters in a text file. It also shows the newline and carriage return characters with a \n and \r respectively. This is a good command to know because it is also a “safe” way to look at a binary file. By safe I mean it won’t try to print out 8-bit characters that will permanently mess up your terminal settings!

2017 update
I finally needed this utility again after five years. My script doesn’t work on CentOS: there is no recode there, whatever that was. However, the one-liner provided in the comments worked just fine for me.

Conclusion
We can rest easy and send text files back-and-forth between a PC and a SLES server with the help of this unix2dos script we developed.

Interestingly, RedHat’s RHEL has kept unix2dos in its distribution. Good for them. In Ubuntu Linux unix2dos also seems decidedly missing.

Categories
Admin Linux

Common Problems Installing Cognos Gateway on Linux

Updated for a 2018 Cognos 11 install
with 2013 updates for Cognos 10 installation

Intro
I tried to take a shortcut and get a 2nd Cognos gateway up and running by copying files, etc. rather than a proper install. At one time or another I feel I must have encountered just about every problem conceivable. I didn’t take great, systematic notes, but I’d like to mention some highlights while it is still fresh in my memory!

The Details
Note that I have a working gateway server running on the same version of Linux, SLES 11 SP1. So I thought I’d be clever and just copy all the files below /opt/cognos8 from the working server.

First Rookie Mistake
Let’s call our COGNOS_ROOT /opt/cognos8 for convenience.
Cognos 10 note: /opt/cognos10 would be a more sensible installation directory!

So you’re following along in the documentation and dutifully looking for /opt/cognos8/bin/cogconfig.sh, and not finding it? Me, neither. So I cleverly borrowed it from a working Solaris installation. It’s all Java, right, no OS dependencies, what can go wrong? Ha, ha. You try:

./cogconfig.sh
and get:

Using /usr/lib64/jvm/jre/bin/java
The java class is not found:  CRConfig

Long story short. Give up. Without telling anyone they moved it to /opt/cognos8/bin64. That’s assuming you’re on a 64-bit system like most of us are.

OK. Now you run it from the …bin64 directory, expecting better results, only to perhaps get something like:

./cogconfig.sh

Unable to locate a JRE. Please specify a valid JAVA_HOME environment variable.

Long story short, java-1_4_2-ibm (java-1_6_0-ibm if installing a Cognos 10 gateway) is a good Java environment to install for Cognos Gateway. At least it is on SLES Linux. So you install that and set up environment variables like these:

export JAVA_BINDIR=/usr/lib64/jvm/jre/bin
export JAVA_HOME=/usr/lib64/jvm/jre
export JAVA_ROOT=/usr/lib64/jvm/jre

Now you’re cooking. Run it yet again. You’re smart and know to set up your DISPLAY environment to a valid XServer you have access to. But even if the X application actually does launch and run (you may need some Motif or additional X packages, possibly even from the SDK DVD – see appendix A), if you try to export the configuration you’ll get an error like this:

java.lang.ClassNotFoundException: org.bouncycastle134.jce.provider.BouncyCastleProvider

Cognos 10 note: I did not have this class missing in my Cognos 10 installation. Yeah!

Yes, you are missing the infamous bouncycastleprovider! This stuff is too good to make up, right? It’s a jar file that’s somewhere in the Cognos Gateway distribution, bcprov-jdk14-134.jar. In my case I need to put it here:

/etc/alternatives/jre/lib/ext

With that in place run it yet again. Now you may be unable to export the configuration with this error:

CAM-CRP-1057 Unable to generate the machine specific symmetric key.

Does it ever end? Yes!

You may have old values of keys and what-not cryptography stuff from your copy of the other system. So you remove these directories and all their contents:

/opt/cognos8/{encryptkeypair,signkeypair}

And I even saw the following error:

02/03/2012,11:26:56,Err,com.cognos.crconfig.data.DataManagerException: CAM-CRP-1132 An error occurred while attempting to request a certificate from the Certificate Authority service. Unable to connect to the Certificate Authority service. Ensure that the Content Manager computer is configured and that the Cognos 8 services on it are currently running. Reason: java.net.ConnectException: Connection refused, com.cognos.crconfig.data.DataManager.generateCryptoKeys(DataManager.java:2730)

I think it comes about if you save the default config without editing it and putting in a valid dispatcher URI, but I forget.

The main point towards the end was to start with a clean config by a:

cd /opt/cognos8/configuration;cp cogstartup.xml{.new,}

, making sure there are no encryptkeypair or signkeypair directories, launching …bin64/cogconfig.sh, working with the GUI to define the dispatcher URIs to your working, running Cognos dispatcher, exporting it,

(Let me take a breath here. If that export succeeds, you’re home.)

and finally saving it, which also generates the system-specific keys.

That’s it! A bunch of green check marks are your reward. Hopefully.

Conclusion
In the end you will see that this “cheap method” of installing Cognos Gateway worked. We had a few bumps along the road, but we worked through them all. Now that we’ve seen just about every conceivable problem we have a treasure trove of documented errors and fixes should we ever find ourselves in this situation again.

There is one more Cognos Gateway problem we resolved, by the way, that was previously documented here.

Appendix A – Cognos 10 note
Yes, I referred to this document in my own installation of Cognos version 10 gateway component. The problems are very similar, and this was a big help, if I say so myself.

I notice I write a tight narrative. I have lots of tangential thoughts, but to list them all as I think of them would destroy the flow of the narrative. In this case I wanted to expand on the openmotif packages.

I got a missing libXm.so.4 message when launching issetup the first time. I determined this came from an openmotif package from my previous successful installation on another server. My new server had limited repositories.

> zypper search openmotif

produced these results:

 
S | Name                   | Summary                    | Type
--+------------------------+----------------------------+-----------
  | openmotif21-demos      | Open Motif 2.2.4 Libraries | package
  | openmotif21-libs       | Open Motif 2.2.4 Libraries | package
  | openmotif21-libs       | Open Motif 2.2.4 Libraries | srcpackage
  | openmotif21-libs-32bit | Open Motif 2.2.4 Libraries | package
  | openmotif22-libs       | Open Motif 2.2.4 Libraries | package
  | openmotif22-libs       | Open Motif 2.2.4 Libraries | srcpackage
  | openmotif22-libs-32bit | Open Motif 2.2.4 Libraries | package

Well, I tried to install first openmotif21-libs-32bit then openmotif22-libs-32bit, but neither gave me the right version of libXm.so! I had versions 2, 3 and 6! So I simply did one of these numbers:

> cd /usr/lib; ln -s libXm.so.3.0.3 libXm.so.4

and, to my surprise, it worked!

More Errors Documented for completeness’ sake

At the risk of making this blog post a total mess, I’ll include a few more errors I encountered during the upgrade. Who knows who might find this useful.

Generating the cryptographic keys is always a hold-your-breath-and-pray operation. I had my upgrade files in place in a new install directory, /opt/cognos10. I ran bin64/cogconfig.sh like usual. It was suggested I could save the configuration even though the application gateway wasn’t running, so I tried that. No dice.

The cryptographic information cannot be encrypted.

Fine. So probably the app server needs to be running before we save the config, right? So they got it running. I tried to save the config. Same error. The details were as follows:

[ ERROR ]
CAM-CRP-1315 Current configuration points to a different Trust Domain than originally configured.
 
[ ERROR ] 
The cryptography information was not generated.

The remedy? Close the configuration and completely remove these directories beneath the /opt/cognos10/configuration directory:

– encryptkeypair
– signkeypair
– csk (actually I didn’t have this one. But I guess it should be removed if present)

I held my breath, re-ran cogconfig and saved. This time it worked!
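Before deleting anything it is worth checking which of those key directories actually exist. A cautious sketch that only lists them: the directory names come from the list above, while the function name is my own and the configuration path is whatever your install uses.

```python
import os

# Key directories named above that hold machine-specific crypto material.
KEY_DIRS = ("encryptkeypair", "signkeypair", "csk")


def stale_key_dirs(config_dir):
    """Return the key directories that exist under config_dir (nothing is deleted)."""
    return [
        os.path.join(config_dir, name)
        for name in KEY_DIRS
        if os.path.isdir(os.path.join(config_dir, name))
    ]


# Example: stale_key_dirs("/opt/cognos10/configuration")
```

Once the list looks right, the directories can be removed by hand with cogconfig closed.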

I also had an error with my Java version:

./cogconfig.sh
Using /usr/lib64/jvm/jre/bin/java
The java class could not be loaded. java.lang.UnsupportedClassVersionError: (CRConfig) bad major version at offset=6
/usr/lib64/jvm/jre/bin/java -version

showed

java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 2.3)
IBM J9 VM (build 2.3, J2RE 1.4.2 IBM J9 2.3 Linux amd64-64 j9vmxa64142ifx-20110628 (JIT enabled)
J9VM - 20110627_85693_LHdSMr
JIT  - 20090210_1447ifx5_r8
GC   - 200902_24)

I installed a newer Java:

zypper install  java-1_6_0-ibm

and got past this error.
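The "bad major version at offset=6" message makes more sense once you know that bytes 6 and 7 of every Java class file hold the class-file major version, and a JVM refuses classes newer than it understands. As far as I understand the format, a sketch like this reports which Java release a class was compiled for. The mapping covers only a handful of releases and the function name is made up.

```python
import struct

# Class-file major version numbers for a few Java releases.
MAJOR_TO_JAVA = {48: "1.4", 49: "5", 50: "6", 51: "7", 52: "8"}


def class_file_java_version(header: bytes) -> str:
    """Map the first 8 bytes of a .class file to the Java release it targets."""
    # Layout: 4-byte magic, 2-byte minor version, 2-byte major version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return MAJOR_TO_JAVA.get(major, f"unknown (major {major})")


# A header as a Java 6 compiler would write it: major version 0x32 = 50.
print(class_file_java_version(bytes.fromhex("cafebabe00000032")))
```

To use it for real, read the first eight bytes of a class extracted from one of the product's jar files and compare the result with what java -version reports.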

April 2013 update
Just when you thought every possible error was covered, you encounter a new one. Cognos Mobile isn’t working so well on actual mobile devices so they wanted to try a Fixpack from IBM. No problem, right? They gave me

up_cogmob_linuxi38664h_10.2.1102.33_ml.rar

and I set to work. I don’t particularly like rar files for Linux, but I figured out there is an unrar command:

$ unrar e up_*rar

But after setting up my DISPLAY environment variable I get this new error running ./issetup:

X Error of failed request:  BadDrawable (invalid Pixmap or Window parameter)
  Major opcode of failed request:  14 (X_GetGeometry)
  Resource id in failed request:  0x2
  Serial number of failed request:  257
  Current serial number in output stream:  257
IDS_MSG_PREFIXIDS_COPYRIGHT_LOGOIDS_MSG_PREFIXIDS_MSG_READ_ARCHIVE

The solution? They downloaded a tar.gz version of the Fixpack. I unpacked that and had absolutely no problems with issetup! The really strange thing is that issetup is identical in both archives. I used cksum to do a quick compare. Even the setup.csp files are identical. I did an strace -f of the two cases but the salient difference didn’t pop out at me. The files present in the tar.gz seem to be fewer in number.
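The same kind of quick compare can be scripted. This sketch uses SHA-256 digests rather than the CRC that cksum computes, so it is a substitute technique and not what I actually ran; the function names and the example paths are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(path_a, path_b):
    """True when both files have identical contents."""
    return file_digest(path_a) == file_digest(path_b)


# Example with hypothetical unpack directories:
# same_content("from_rar/issetup", "from_targz/issetup")
```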

Another random error you will encounter sooner or later

You are doing a Save in cogconfig and you get:

13/05/2013,17:39:05,Err,CAM-CRP-1132 An error occurred while attempting to request a certificate from the Certificate Authority service. Unable to connect to the Certificate Authority service. Ensure that the Content Manager computer is configured and that the IBM Cognos services on it are currently running. Reason: java.net.ConnectException: Connection refused, com.cognos.crconfig.data.crypto.ConfiguringSession.configure(ConfiguringSession.java:35)com.cognos.crconfig.data.DataManager.generateCryptoKeys(DataManager.java:3037)com.cognos.crconfig.data.DataManager$4.run(DataManager.java:4169)com.cognos.crconfig.data.CnfgActionEngine$CnfgActionThread.run(CnfgActionEngine.java:394)com.cognos.crconfig.data.crypto.ConfiguringSession.configure(ConfiguringSession.java:35)com.cognos.crconfig.data.DataManager.generateCryptoKeys(DataManager.java:3037)com.cognos.crconfig.data.DataManager$4.run(DataManager.java:4169)com.cognos.crconfig.data.CnfgActionEngine$CnfgActionThread.run(CnfgActionEngine.java:394)com.cognos.crconfig.data.crypto.ConfiguringSession.configure(ConfiguringSession.java:35)com.cognos.crconfig.data.DataManager.generateCryptoKeys(DataManager.java:3037)com.cognos.crconfig.data.DataManager$4.run(DataManager.java:4169)com.cognos.crconfig.data.CnfgActionEngine$CnfgActionThread.run(CnfgActionEngine.java:394)

This looks scary but has an easy fix. You aren’t communicating with the app server. Probably their dispatcher services are down. Bring them up and it should work fine – it did for me. This is assuming of course that you have your dispatcher URLs set up correctly.
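A quick way to confirm that "Connection refused" really is a network-level problem is to try opening a TCP connection to the dispatcher yourself. A minimal sketch follows; the host and port in the example are placeholders for whatever your dispatcher URI actually contains.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


# Example with a placeholder host and port:
# can_connect("appserver.example.com", 9300)
```

If this returns False from the gateway server, the fix lies with the app server or the network in between, not with cogconfig.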

I cloned my Cognos web gateway and got this error
I waited for a few weeks to examine the clone. I ran

$ ./cogconfig.sh

and got this error:

16/05/2013,15:57:35,Err,CAM-CRP-1280 An error occurred while trying to decrypt using the system protection key. Reason: javax.crypto.IllegalBlockSizeException: Input length (with padding) not multiple of 16 bytes

Umm. I don’t have the solution yet. One thing is most highly suspect: in the meantime we re-generated the keys on the production web gateway. So I am hoping that is all we need to do here as well.

Resolved. Here is the process I followed – a sort of colonic for Cognos:

$ cd /opt/cognos10/configuration; rm csk/* signkeypair/* encryptkeypair/* cogstartup.xml
$ cd ../bin64; ./cogconfig.sh

Then in the GUI I re-defined the app servers in the dispatcher URI portion of the environment.
Then did a Save.
Worked like a champ – four green check marks.

cogconfig hangs
This happened to me on an older server. The IBM Cognos Configuration screen displays but it’s supposed to exit so you can get to the part where you edit the configuration and it never does.

Currently no known solution.

June 2018 update
Cognos 11 install problem

The Cognos 11 install was going pretty well. Until it came time to launch cogconfig. That generated this error:

cognos10:/web/cognos11/bin64> ./cogconfig.sh

Using /usr/lib64/jvm/jre/bin/java
Exception in thread "main" java.lang.UnsupportedClassVersionError: JVMCFRE003 bad major version; class=com/cognos/accman/jcam/crypto/CAMCryptoException, offset=6
        at java.lang.ClassLoader.defineClass(ClassLoader.java:286)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:74)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:538)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$300(URLClassLoader.java:77)
        at java.net.URLClassLoader$ClassFinder.run(URLClassLoader.java:1041)
        at java.security.AccessController.doPrivileged(AccessController.java:448)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:427)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:676)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:358)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:642)
        at java.lang.J9VMInternals.verifyImpl(Native Method)
        at java.lang.J9VMInternals.verify(J9VMInternals.java:73)
        at java.lang.J9VMInternals.initialize(J9VMInternals.java:133)
        at com.cognos.cclcfgapi.CCLConfigurationFactory.getInstance(CCLConfigurationFactory.java:59)
        at com.cognos.crconfig.CnfgPreferences.<init>(CnfgPreferences.java:51)
        at com.cognos.crconfig.CnfgPreferences.<clinit>(CnfgPreferences.java:36)
        at java.lang.J9VMInternals.initializeImpl(Native Method)
        at java.lang.J9VMInternals.initialize(J9VMInternals.java:199)
        at CRConfig.main(CRConfig.java:144)

Note my system java version is woefully out-of-date:

$ /usr/lib64/jvm/jre/bin/java -version

java version "1.6.0"
Java(TM) SE Runtime Environment (build pxa6460sr16fp15-20151106_01(SR16 FP15))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 jvmxa6460sr16fp15-20151020_272943 (JIT enabled, AOT enabled)
J9VM - 20151020_272943
JIT  - r9_20151019_103450
GC   - GA24_Java6_SR16_20151020_1627_B272943)
JCL  - 20151105_01

whereas the Cognos-supplied Java is two versions ahead:
cognos10:/web/cognos11> ./jre/bin/java -version

java version "1.8.0"
Java(TM) SE Runtime Environment (build pxa6480sr4fp10-20170727_01(SR4 FP10))
IBM J9 VM (build 2.8, JRE 1.8.0 Linux amd64-64 Compressed References 20170722_357405 (JIT enabled, AOT enabled)
J9VM - R28_20170722_0201_B357405
JIT  - tr.r14.java_20170722_357405
GC   - R28_20170722_0201_B357405_CMPRSS
J9CL - 20170722_357405)
JCL - 20170726_01 based on Oracle jdk8u144-b01

Instead of the previous approach which involved upgrading the system Java, I decided to just try the Java version Cognos itself had installed. In the following commands note that my installation directory was /web/cognos11.

$ cd /web/cognos11; export JAVA_HOME=`pwd`/jre
$ cd bin64; ./cogconfig.sh

Using /web/cognos11/jre/bin/java
06/06/2018,11:13:04,Dbg,Use Customized settings for font and color.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/web/cognos11/bin/slf4j-nop-1.7.23.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/web/cognos11/configuration/utilities/config-util.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.helpers.NOPLoggerFactory]
06/06/2018,11:13:10,Dbg,The original cogstartup.xml file is clear text. Don't back it up.

That is to say, it worked! I’ve often seen software packages install their own versions of Java. This is the first time I thought to take advantage of that. Wish I had thought of this approach during the Cognos 10 install!

Categories
Admin Perl Web Site Technologies

Turning HP SiteScope into SiteScope Classic with Perl

Intro
HP SiteScope is a terrific web application tool and not too expensive for those who have any kind of a budget. The built-in monitor types are a bit limited, but since it allows calls to user-provided scripts your imagination is the only real limitation. For those with too many responsibilities and too little time on their hands it is a real productivity enhancer.

I’ve been using the product for 12 years now – since it was Freshwater SiteScope. I still have misgivings about the interface change introduced some years ago when it was part of Mercury. It went from simple and reliable to Java, complicated and flaky. To this day I have to re-start a SiteScope screen in my browser on a daily basis as the browser cannot recover from a server restart or who knows what other failures.

So I longed for the days of SiteScope Classic. We kept it running for as long as possible, years in fact. But at some point there were no more releases created for the classic view. So I investigated the feasibility of creating my own conversion tool. And…partially succeeded. Succeeded to the point where I can pull up the web page on my Blackberry and get the statuses and history. Think you can do that with regular HP SiteScope? I can’t. Maybe there’s an upgrade for it, but still. It’s nice to have the classic interface when you want to pull up the statuses as quickly as possible, regardless of the Blackberry display issue.

Looking back at my code, I obviously decided to try my hand at OO (object oriented) programming in Perl, with mixed results. Perl’s OO syntax isn’t the best, which addles comprehension. Without further ado, let’s jump into it.

The Details
It relies on something I noticed, that this URL on your HP SiteScope server, http://localhost:8080/SiteScope/services/APIConfigurationImpl?method=getConfigurationSnapshot, contains a tree of relationships of all the monitors. Cool, right? But it’s not a tree like you or I would design. Between parent and child is an intermediate layer. I suppose you need that because a group can contain monitors (my only focus in this exercise), but it can also contain alerts and maybe some other properties as well. So I guess the intermediate layer gives them the flexibility to represent all that, though it certainly added to my complication in parsing it. That’s why you’ll see the concern over “grandkids.” I developed a recursive, web-enabled Perl program to parse through this xml. That gives me the tools to build the nice hierarchical groupings. But it does not give me the statuses.

For the status of each monitor I wrote a separate scraper script that simply reads the entire daily SiteScope log every minute! Crude, but it works. I use it for an installation with hundreds of monitors and a log file that grows to 9 MB by the end of the day so I know it scales to that size. Beyond that it’s untested.
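The scraper's output is what the main program consumes: one line per monitor with name, status and value separated by tabs. A sketch of reading that format is below. The function name and the sample monitors are made up, and malformed lines are simply skipped.

```python
def parse_monitor_stats(lines):
    """Build {monitor: status} and {monitor: value} from tab-separated lines."""
    status, value = {}, {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip lines that do not have name, status and value
        name, state, reading = fields[:3]
        status[name] = state
        value[name] = reading
    return status, value


sample = ["Ping gateway\tgood\t0.02 sec\n", "URL: intranet\terror\ttimed out\n"]
status, value = parse_monitor_stats(sample)
print(status)
```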

In addition to giving only the relationships, the xml also changes with every invocation. It attaches ID numbers to the monitors which initially you think is a nice unique identifier, but they change from invocation to invocation! So an additional challenge was to match up the names of the monitors in the xml output to the names as recorded in the SiteScope log. Also a bit tricky, but in general doable.
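Since the IDs cannot be trusted from one call to the next, the stable key has to be built from names. The idea can be sketched as walking from a node up through its ancestors and joining every name found along the way, skipping the unnamed intermediate layers. This is a simplified illustration: the IDs, names and dictionaries are invented and do not reflect the actual XML.

```python
def extended_name(node_id, parents, names):
    """Join the names of a node and its named ancestors with '/'."""
    parts = []
    while node_id is not None:
        name = names.get(node_id)
        if name:  # unnamed intermediate layers contribute nothing
            parts.append(name)
        node_id = parents.get(node_id)
    return "/".join(reversed(parts))


# Hypothetical tree: id3 (top) > id7 (unnamed) > id8 "Production"
#                    > id12 (unnamed) > id13 "Ping gateway"
parents = {"id3": None, "id7": "id3", "id8": "id7", "id12": "id8", "id13": "id12"}
names = {"id8": "Production", "id13": "Ping gateway"}
print(extended_name("id13", parents, names))
```

Two monitors with the same short name in different groups end up with different extended names, which is what makes the lookup unambiguous.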

So without further ado, here’s the source code for the xml parser and main program which gets called from the web:

#!/usr/bin/perl
# Copyright work under the Artistic License, http://www.opensource.org/licenses/Artistic-2.0
# build v.simple SiteScope web GUI appropriate for smartphones
# 7/2010
#
# Id is our package which defines the Id class
use Id;
use CGI::Pretty;
my $cgi=new CGI;
$DEBUG = 0;
# GIF location on SiteScope classic
$ssgifs = "/artwork/";
$health{good} = qq(<img src="${ssgifs}okay.gif">);
$health{error} = qq(<img src="${ssgifs}error.gif">);
$health{warning} = qq(<img src="${ssgifs}warning.gif">);
# report CGI
$rprt = "/SS/rprt";
# the frustrating thing is that this xml output changes almost every time you call it
$url = 'http://localhost:8080/SiteScope/services/APIConfigurationImpl?method=getConfigurationSnapshot';
# get current health of all monitors - which is scraped from the log every minute by a hilgarj cron job
$monitorstats = "/tmp/monitorstats.txt";
print "Content-type: text/plain\n\n" if $DEBUG;
open(MONITORSTATS,"$monitorstats") || die "Cannot open monitor stats file $monitorstats!!";
while(<MONITORSTATS>) {
  chomp;
  ($monitor,$status,$value) = /([^\t]+)\t([^\t]+)\t([^\t]+)/;
  $monitors{"$monitor"} = $status;
  $monitorv{"$monitor"} = $value;
}
open(CURL,"curl $url 2>/dev/null|") || die "cannot open $url for reading!!\n";
my %myobjs = ();
# the xml is one long line!
@lines = <CURL>;
#print "xml line: $lines[0]\n" if $DEBUG;
@multiRefs = split "<multiRef",$lines[0];
#parse multiRefs
# create top-level object
my $id = Id->new (
      id => "id0");
# hash of this object with id as key
$myobjs{"id0"} = $id;
 
# first build our objects...
foreach $mref (@multiRefs) {
  next unless $mref =~ /\sid=/;
#  id="id0" ...
  ($parentid) =  $mref =~ /id=\"(id\d+)/;
  print "parentid: $parentid\n" if $DEBUG;
# watch out for <item><key xsi:type="soapenc:string">groupSnapshotChildren</key><value href="#id3 ...
# vs <item><key xsi:type="soapenc:string">Network</key><value href="#id40"/>
  print "mref: $mref\n" if $DEBUG;
  @ids = split /<item><key/, $mref;
# then loop over ids mentioned in this mref
  foreach $myid (@ids) {
    next unless $myid =~ /href="#(id\d+)/;
    next unless $myobjs{"$parentid"};
# types include group, monitor, alert
    ($typebyregex) = $myid =~ />snapshot_(\w+)SnapshotChildren</;
    $parenttype = $myobjs{"$parentid"}->type();
    $type = $typebyregex ? $typebyregex : $parenttype;
    print "type: $type\n" if $DEBUG;
# skip alert definitions
    next if $type eq "alert";
    print "myid: $myid\n" if $DEBUG;
    ($actualid) = $myid =~ /href="#(id\d+)/;
    print "actualid: $actualid\n" if $DEBUG;
# construct object
    my $id = Id->new (
      id => $actualid,
      type => $type,
      parentid => $parentid );
# build hash of these objects with actualid as key
    $myobjs{$actualid} = $id;
# addchild to parent. note that parent should already have been encountered
    $myobjs{"$parentid"}->addchild($actualid);
    if ($myid !~ /groupSnapshotChildren/) {
# interesting child - has name (every other generation has no name!)
      ($name) = $myid =~ /string\">(.+?)<\/key/;  # use non-greedy operator
      print "name: $name\n" if $DEBUG;
# some names are not of interest to us: alerts, which end in "error" or "good"
      if ($name !~ /(error|good)$/) {
# name may not be unique - get extended name which include all parents
        if (defined $myobjs{"$parentid"}->parentid()) {
          $gdparid = $myobjs{"$parentid"}->parentid();
          $gdparname = $myobjs{"$gdparid"}->extname();
# extname -> extended, or distinguished name.  Should be unique
          $extname = $gdparname. '/' . $name;
        } else {
# 1st generation
          print "1st generation\n" if $DEBUG;
          $extname = $name;
        }
        print "extname: $extname\n" if $DEBUG;
        $id->name($name);
        $id->extname($extname);
        $id->isanamedid(1);
        $myobjs{"$parentid"}->hasnamedkids(1); # want to mark its parent as "special"
# we also need our hash to reference objects by extended name since id changes with each extract and name may not be unique
        $myobjs{"$extname"} = $id;
      } # end conditional over desirable name check
    } else {
      $id->isanamedid(0);
    }
  }
}
#
# now it's all parsed and our objects are alive. Let's build a web site!
#
# build a cookie containing path
my $pi = $ENV{PATH_INFO};
$script = $ENV{SCRIPT_NAME};
$ua = $ENV{HTTP_USER_AGENT};
# Blackberry browser test
$BB = $ua =~ /^BlackBerry/i ? 1 : 0;
$MSIE = $ua =~ /MSIE /;
# font-size depends on browser
$FS = "font-size: x-small;" if $MSIE;
$cookie = $cgi->cookie("pathinfo");
$uri = $script . $pi;
$cookie=$cgi->cookie(-name=>"pathinfo", -value=>"$uri");
print $cgi->header(-type=>"text/html",-cookie=>$cookie);
($url) = $pi =~ m#([^/]+)$#;
#  -title=>'SmartPhone View',
# this doesn't work, sigh...
#print $cgi->start_html(-head=>meta({-http_equiv=>'Refresh'}));
print qq( <HEAD>
<meta http-equiv="Expires" content="0">
<meta http-equiv="Pragma" content="no-cache">
<meta HTTP-EQUIV="Refresh" CONTENT="60; URL=$url">
<TITLE>SiteScope Classic $url Detail</TITLE>
<style type="text/css">
a.good {color: green; }
a.warning {color: green; }
a.error {color: red; }
td {font-family: Arial, Helvetica, sans-serif; $FS}
p.ss {font-family: Arial, Helvetica, sans-serif;}
</style>
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />
<script type=text/javascript>
function changeme(elemid,longvalue)
{
document.getElementById(elemid).innerText=longvalue;
}
function restoreme(elemid,truncvalue)
{
document.getElementById(elemid).innerText=truncvalue;
}
</script>
</HEAD><body>
);
 
#print $cgi->h1("This is the heading");
# parse path
# top lvl name:2nd lvl name:3rd lvl name
$altpi = $cgi->path_info();
print $cgi->p("pi is $pi") if $DEBUG;
#print $cgi->p("altpi is $altpi");
# relative url
$rurl = $cgi->url(-relative=>1);
if ($pi eq "") {
# the top
# top id is id3
  print qq(<p class="ss">);
  $myid = "id3";
  foreach $kid ($myobjs{"$myid"}->get_children()) {
    my $kidname = $myobjs{"$kid"}->name();
# kids can be subgroups or standalone monitors
    my $health = recurse("/$kidname");
    print "$health{$health} <a href=\"$rurl/$kidname\">$kidname</a><br>\n";
    $prodtest = $kid if $kidname eq "Production";
  }
  print "</p>\n";
} else {
  $extname = $pi;
  print "pi,name,extname,script: $pi,$name,$extname,$script\n" if $DEBUG;
# print where we are
  $uriname = $pi;
  $uriname =~ s#^/##;
  #print $cgi->p("name is $name");
  #print $cgi->p("uriname is $uriname");
  $uricompositepart = "/";
  @uriparts = split('/',$uriname);
  $lastpart = pop @uriparts;
  print qq(<p class="ss"><a href="$script"><b>Sitescope</b></a><br>);
  print qq(<b>Monitors in: );
  foreach $uripart (@uriparts) {
    my $healthp = recurse("$uricompositepart$uripart");
# build valid link
    ##$link = qq(<a class="good" href="$script$uricompositepart$uripart">$uripart</a>: );
    $link = qq(<a class="$healthp" href="$script$uricompositepart$uripart">$uripart</a>: );
    $uricompositepart .= "$uripart/";
    print $link;
  }
  my $healthp = recurse("$uricompositepart$lastpart");
  $color = $healthp eq "error" ? "red" : "green";
  print qq(<font color="$color">$lastpart</font></b></p>\n);
  print qq(<table border="1" cellspacing="0">);
  #print qq(<table>);
  %hashtrs = ();
  foreach $kid ($myobjs{"$extname"}->get_children()) {
    print "kid id: " . $myobjs{"$kid"}->id() . "\n" if $DEBUG;
    next unless $myobjs{"$kid"}->hasnamedkids();
    foreach $gdkid ($myobjs{"$kid"}->get_children()) {
      print "gdkid id: " . $myobjs{"$gdkid"}->id() . "\n" if $DEBUG;
      $gdkidname = $myobjs{"$gdkid"}->name();
      $gdkidextname = $myobjs{"$gdkid"}->extname();
      my $health = recurse("$gdkidextname");
      my $type = $myobjs{"$gdkid"}->type();
# dig deeper to learn health of the grandkid's grandkids
      $objct = $healthct{good} = $healthct{error} = $healthct{warning} = 0;
      foreach $ggkid ($myobjs{"$gdkidextname"}->get_children()) {
        print "ggkid id: " . $myobjs{"$ggkid"}->id() . "\n" if $DEBUG;
        next unless $myobjs{"$ggkid"}->hasnamedkids();
        foreach $gggdkid ($myobjs{"$ggkid"}->get_children()) {
          print "gggdkid id: " . $myobjs{"$gggdkid"}->id() . "\n" if $DEBUG;
          $gggdkidname = $myobjs{"$gggdkid"}->name();
          $gggdkidextname = $myobjs{"$gggdkid"}->extname();
          my $health = recurse("$gggdkidextname");
          $objct++;
          $healthct{$health}++;
        }
      }
      $elemct++;
      $elemid = "elemid" . $elemct;
# groups should have distinctive cell background color to set them apart from monitors
      if ($type eq "group") {
        $bgcolor = "#F0F0F0";
        $celllink = "$lastpart/$gdkidname";
        $truncvalue = qq(<font color="red">$healthct{error}</font>/$objct);
        $tdval = $truncvalue;
      } else {
        $bgcolor = "#FFFFFF";
        $celllink = "$rprt?$gdkidname";
# truncate monitor value to save display space
        $longvalue = $monitorv{"$gdkidname"};
        (my $truncvalue) = $monitorv{"$gdkidname"} =~ /^(.{7,9})/;
        $truncvalue = $truncvalue? $truncvalue : "&nbsp;";
        $tdval = qq(<span id="$elemid" onmouseover="changeme('$elemid','$longvalue')" onmouseout="restoreme('$elemid','$truncvalue')">$truncvalue</span>);
      }
      $hashtrs{"$gdkidname"} = qq(<tr><td bgcolor="#000000">$health{$health} </td><td>$tdval</td><td bgcolor="$bgcolor"><a href="$celllink">$gdkidname</a></td></tr>\n);
# for health we're going to have to recurse
    }
  }
# print out in alphabetical order
  foreach $key (sort(keys %hashtrs)) {
    print $hashtrs{"$key"};
  }
  print "</table>";
}
print $cgi->end_html();
#######################################
sub recurse {
# to get the union of health of all ancestors
my $moniext = shift;
my ($moni) = $moniext =~ m#/([^/]+)$#;
# don't bother recursing and all that unless we have to...
return $myobjs{"$moniext"}->health() if defined $myobjs{"$moniext"}->health();
print "moni,moniext: $moni, $moniext\n" if $DEBUG;
my ($kid,$gdkidextname,$health,$cumhealth);
$cumhealth = $health = $monitors{"$moni"} ? $monitors{"$moni"} : "good";
foreach $kid ($myobjs{"$moniext"}->get_children()) {
    if ($myobjs{"$kid"}->hasnamedkids()) {
      foreach $gdkid ($myobjs{"$kid"}->get_children()) {
        $gdkidextname = $myobjs{"$gdkid"}->extname();
# for health we're going to have to recurse
        $health = recurse("$gdkidextname");
        if ($health eq "error" || $cumhealth eq "error") {
          $cumhealth = "error";
        } elsif ($health eq "warning" || $cumhealth eq "warning") {
          $cumhealth = "warning";
        }
      }
    } else {
# this kid is end of line
      $health = $monitors{"$kid"} ? $monitors{"$kid"} : "good";
        if ($health eq "error" || $cumhealth eq "error") {
          $cumhealth = "error";
        } elsif ($health eq "warning" || $cumhealth eq "warning") {
          $cumhealth = "warning";
        }
    }
}
$myobjs{"$moniext"}->health("$cumhealth");
return $cumhealth;
} # end sub recurse
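
The recurse routine above boils down to a "worst status wins" rollup with caching: a node is in error if anything beneath it is in error, else in warning if anything beneath it is in warning, else good. Here is a minimal sketch of that idea in Python rather than Perl. The Node class, the SEVERITY table and the rollup function are hypothetical names for illustration only; they are not part of ss or of any SiteScope API, and the tree is simplified compared with the id/named-kid structure ss actually walks.

```python
# Hypothetical sketch of the "worst health wins" rollup; not the ss data model.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Higher number = worse. Unknown statuses are treated like "good",
# matching the Perl code, which only escalates on "warning" and "error".
SEVERITY = {"good": 0, "warning": 1, "error": 2}


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    health: Optional[str] = None  # cached rollup, like the health() accessor


def rollup(node: Node, monitors: Dict[str, str]) -> str:
    """Return the worst health at or below node, caching it on the node."""
    if node.health is not None:
        return node.health  # already computed, don't recurse again
    worst = monitors.get(node.name, "good")
    for child in node.children:
        child_health = rollup(child, monitors)
        if SEVERITY.get(child_health, 0) > SEVERITY.get(worst, 0):
            worst = child_health
    node.health = worst
    return worst


if __name__ == "__main__":
    tree = Node("Production", [
        Node("Proxy", [Node("proxy.pac script on iwww")]),
        Node("Network", [Node("DNS: (AMEAST) ns2")]),
    ])
    status = {"proxy.pac script on iwww": "warning", "DNS: (AMEAST) ns2": "good"}
    print(rollup(tree, status))
```

The caching matters for the same reason it does in ss: the breadcrumb links and the table rows ask for the health of overlapping subtrees, so each subtree should only be walked once per page view.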

I call it simply “ss” to minimize the typing required. You see it uses a package called Id.pm which I wrote to encapsulate the class and methods. Here is Id.pm:

package Id;
# Copyright work under the Artistic License, http://www.opensource.org/licenses/Artistic-2.0
# class for storing data about an id
# URL (not currently protected): http://localhost:8080/SiteScope/services/APIConfigurationImpl?method=getConfigurationSnapshot
# class for storing data about a group
use warnings;
use strict;
use Carp;
#group methods
# constructor
# get_members
# get_name
# get_id
# addmember
#
# member methods
# constructor
# get_id
# get_name
# get_type
# get_gp
# set_gp
 
sub new {
  my $class = shift;
  my $self = {@_};
  bless($self, "Id");
  return $self;
}
# get-set methods, p. 355
sub parentid { $_[0]->{parentid}=$_[1] if defined $_[1]; $_[0]->{parentid} }
sub isanamedid { $_[0]->{isanamedid}=$_[1] if defined $_[1]; $_[0]->{isanamedid} }
sub id { $_[0]->{id}=$_[1] if defined $_[1]; $_[0]->{id} }
sub name { $_[0]->{name}=$_[1] if defined $_[1]; $_[0]->{name} }
sub extname { $_[0]->{extname}=$_[1] if defined $_[1]; $_[0]->{extname} }
sub type { $_[0]->{type}=$_[1] if defined $_[1]; $_[0]->{type} }
sub health { $_[0]->{health}=$_[1] if defined $_[1]; $_[0]->{health} }
sub hasnamedkids { $_[0]->{hasnamedkids}=$_[1] if defined $_[1]; $_[0]->{hasnamedkids} }
 
# get children - use anonymous array, book p. 221-222
sub get_children {
# return empty array if array hasn't been defined...
# (test the reference itself: defined(@array) is deprecated and a fatal error in newer Perls)
  defined $_[0]->{children} ? @{$_[0]->{children}} : ();
}
# adding children
sub addchild {
  $_[0]->{children} = [] unless defined  $_[0]->{children};
  push @{$_[0]->{children}},$_[1];
}
 
1;

ss also assumes the existence of just a few of the images from SiteScope Classic: the green circle for good, the red diamond for error, the yellow warning symbol, and so on. I borrowed them from SiteScope Classic.

Here is the code for the log scraper:

#!/usr/bin/perl
# analyze SiteScope log file
# Copyright work under the Artistic License, http://www.opensource.org/licenses/Artistic-2.0
# 8/2010
$DEBUG = 0;
$logdir = "/opt/SiteScope/logs";
$monitorstats = "/tmp/monitorstats.txt";
$monitorstatshis = "/tmp/monitorstats-his.txt";
$date = `date +%Y_%m_%d`;
chomp($date);
$file = "$logdir/SiteScope$date.log";
open(LOG,"$file") || die "Cannot open SiteScope log file: $file!!\n";
# example lines:
# 16:51:07 08/02/2010     good    LDAPServers     LDAP SSL test : ldapsrv.drj.com exit: 0, 0.502 sec    1:3481  0       502
#16:51:22 08/02/2010     good    Network DNS: (AMEAST) ns2  0.033 sec   2:3459      200     33      ok
#16:51:49 08/02/2010     good    Proxy   proxy.pac script on iwww    0.055 sec   2:12467 200     55   ok     4288    1280782309      0    0  55      0       0      200  0
#16:52:04 08/02/2010     good    Proxy   Disk Space: earth /logs   66% full, 13862MB free, 41921MB total 3:3598      66      139862
#16:52:09 08/02/2010     good    DrjExtranet  URL: wwwsecure.drj.com     0.364 sec    1:3604      200     364  ok 26125   1280782328     0    0   358     4       2       200  0
while(<LOG>) {
  ($time,$date,$status,$group,$monitor,$value) = /(\S+)\s(\S+)\t(\S+)\t(\S+)\t([^\t]+)\t([^\t]+)/;
  print '$time,$date,$status,$group,$monitor,$value' . "$time,$date,$status,$group,$monitor,$value\n" if $DEBUG;
  next if $group =~ /__health__/; # don't care about these lines
  $mons{"$monitor"} = 1;
  push @{$mont{"$monitor"}} , $time;
  push @{$mond{"$monitor"}} , $date;
  push @{$monh{"$monitor"}} , $status;
  push @{$monv{"$monitor"}} , $value;
}
# open output at last moment to minimize chances of reading while locked for writing
open(MONITORSTATS,">$monitorstats") || die "Cannot open monitor stats file $monitorstats!!\n";
open(MONITORSTATSHIS,">$monitorstatshis") || die "Cannot open monitor stats file $monitorstatshis!!\n";
# write it all out - will always print the latest values
foreach $monitor (keys %mons) {
# dereference our anonymous arrays
  @times = @{$mont{"$monitor"}};
  @dates = @{$mond{"$monitor"}};
  @status = @{$monh{"$monitor"}};
  @value = @{$monv{"$monitor"}};
# last element is the latest measured status and value
  print MONITORSTATS "$monitor\t$status[-1]\t$value[-1]\n";
  print MONITORSTATSHIS "$monitor\n";
  #for ($i=-11;$i<0;$i++) {
# put latest measure on top
  for ($i=-1;$i>-13;$i--) {
    $time = defined $times[$i] ? $times[$i] : "NA";
    $date = defined $dates[$i] ? $dates[$i] : "NA";
    $stat = defined $status[$i] ? $status[$i] : "NA";
    $val = defined $value[$i] ? $value[$i] : "NA";
    print MONITORSTATSHIS "\t$time\t$date\t$stat\t$val\n";
  }
}
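
For anyone who wants to see the scraping logic without the Perl punctuation, here is a rough Python sketch of the same idea: pull time, date, status, group, monitor and value out of each tab-delimited line, skip the __health__ group, and keep the last 12 measurements per monitor. The names LINE_RE, HISTORY and scrape are made up for this sketch. Unlike the Perl above it silently skips lines that don't match the pattern and strips whitespace from the value, and it assumes the same field layout that the regular expression in the script assumes.

```python
# Hypothetical sketch of the log scraper's parsing step; names are illustrative.
import re
from collections import defaultdict, deque

# Same pattern as the Perl script: "time date<TAB>status<TAB>group<TAB>monitor<TAB>value"
LINE_RE = re.compile(r"(\S+)\s(\S+)\t(\S+)\t(\S+)\t([^\t]+)\t([^\t]+)")
HISTORY = 12  # number of most recent measurements kept per monitor


def scrape(lines):
    """Return {monitor: deque of (time, date, status, value)}, oldest first."""
    history = defaultdict(lambda: deque(maxlen=HISTORY))
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a measurement line
        time, date, status, group, monitor, value = match.groups()
        if "__health__" in group:
            continue  # don't care about these lines
        history[monitor].append((time, date, status, value.strip()))
    return history


if __name__ == "__main__":
    sample = ["16:51:22 08/02/2010\tgood\tNetwork\tDNS: (AMEAST) ns2\t0.033 sec\t2:3459\t200\t33\tok\n"]
    for monitor, measures in scrape(sample).items():
        time, date, status, value = measures[-1]  # last element is the latest
        print(f"{monitor}\t{status}\t{value}")
```

A bounded deque does the job of the Perl loop that walks indexes -1 down to -12: older entries simply fall off the front as new ones arrive.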

As I said it gets called every minute by cron.

That’s it! I enter the url sitescope.drj.com/SS/ss to access the main program which gets executed because I made /SS a CGI-BIN directory.

This gives you a read-only, Java-free view into your SiteScope status and hierarchy which harkens back to the good old days of Freshwater SiteScope.

Know your limits
What it does not do, unfortunately, is allow you to run a monitor – that seems like the next most simple thing which I should have been able to do but couldn’t figure out – much less define new monitors (never going to happen) or alerts.

I use this successfully against my HP SiteScope instance of roughly 400 monitors which itself is on a VM and there is no apparent strain. At some point this simple-minded script would no longer scale to suit the task at hand, but it might be good for up to a few thousand monitors.

And now a word about open source alternatives
Since I was so enamored with SiteScope Classic there seemed to be no compelling reason to shell out the dough for HP SiteScope with its unwanted interface, so I briefly looked around at free alternatives. Free sounds good, right? Not so much in practice. Out there in Cyberspace there is an enthusiast for a product called Zabbix. I just want to go on the record that Zabbix is the most confused piece of junk I have run across. You are getting less than what you paid for ($0) because you will be wasting a lot of time with it, and in the end it isn’t all that capable. Nagios also had its limits – I can’t remember the exact reason I didn’t go down that route, but there were definite reasons.

HP SiteScope is no panacea. “HP” and “stifling bureaucracy” need to be mentioned in the same sentence. Every time we renew support it is the most confusing mess of line items. Every time there’s a new cast of characters over at HP who know nothing about the account’s history. You practically have to beg them to accept your money for a low-budget item like SiteScope because they really don’t pursue it in any way. Then their SAID and contract numbers stuff is confusing if you only see it once every few years.

Conclusion
A conversion program does exist for turning the finicky HP SiteScope Java-encumbered view into pure SiteScope Classic because I wrote it! But it’s a limited read-only view. Still, it’s helpful in a pinch and can even be viewed on the Blackberry’s browser.

Another problem is that HP has threatened to completely change the API so this tool, which is designed for HP SiteScope v 10.12, will probably completely break for newer versions. Oh, well.

References
This post shows some silly mistakes to avoid when doing a minor upgrade in version 11.

Categories
Internet Mail

How to run sendmail in queue-only mode

Intro
I guess I’ve ragged on sendmail before. Incredibly powerful program. Finding out how to do that simple thing you want to do may not be so easy, even with the bible at your side. So to that end I’m making an effort to document those simple things which I’ve found I’ve struggled with.

The Details
Today I wanted to capture all email coming into my sendmail daemon. Well, actually it’s a little more complicated. I didn’t want to disturb production email, but I wanted to capture a spam sample. Today there was a hugely effective spam campaign purporting to be email from the Better Business Bureau (BBB). All the emails however actually came from various senders @aicpa.org. Postini put a filter in place but I knew more were getting through. But they weren’t coming to me. How to capture them without disturbing users?

In this post I gave some obscure but useful tips for sendmail admins, including the ever-useful smarttable add-on. To reprise, smarttable allows you to make delivery decisions based on sender! That’s totally antithetical to your run-of-the-mill sendmail admin, but it’s really useful… Like now. So I quickly put up a sendmail instance, copying a working config I use in production. But I changed the listener to IP address 127.0.0.2 (which I fortunately had already set up for some other reason I can no longer recall). That one’s pretty standard. That’s just:

DAEMON_OPTIONS(`Name=sm-cap, Addr=127.0.0.2')dnl

Of course you want to create a new queue directory just for the captured emails. I created /mqueue/c0 and put this line into my .mc file:

define(`QUEUE_DIR', `/mqueue/c*')dnl

And here’s the main point, how to defer delivery of all emails. Sendmail actually distinguishes between defer and queueonly. I chose queueonly thusly:

define(`confDELIVERY_MODE',`queueonly')dnl

If by chance you happen to misspell DELIVERY_MODE, like, let’s say, DELIERY_MODE, you don’t seem to get a whole lot of errors. Not that that would ever happen to us, mind you, I’m just saying. That’s why it’s good to also know about the command-line option. Keep reading for that.

It’s simple enough to test once you have it running (which I do with this line: sudo sendmail -bd -q -C/etc/mail/capture.cf).

> telnet 127.0.0.2 25
Trying 127.0.0.2...
Connected to 127.0.0.2.
Escape character is '^]'.
220 drj.com ESMTP server ready at Fri, 24 Feb 2012 15:16:40 -0500
helo localhost
250 drjemgw2.drj.com Hello [127.0.0.2], pleased to meet you
mail from: [email protected]
250 2.1.0 [email protected]... Sender ok
rcpt to: [email protected]
250 2.1.5 [email protected]... Recipient ok
data
354 Enter mail, end with "." on a line by itself
subject: test of the capture-only sendmail instance

Just a test!
-Dr J
.

250 2.0.0 q1OKGet2008636 Message accepted for delivery
quit
221 2.0.0 drj.com closing connection
Connection closed by foreign host.
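
If you'd rather not type SMTP by hand every time, the same test can be scripted. Below is a minimal sketch using Python's standard smtplib and email modules. The function names build_test_message and send_test_message are made up for this example, the addresses in the usage comment are placeholders (use your own), and the host and port defaults simply assume the capture listener on 127.0.0.2 port 25 described above.

```python
# Hypothetical helper for re-running the capture test; addresses are placeholders.
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build the same small test message used in the telnet session."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "test of the capture-only sendmail instance"
    msg.set_content("Just a test!\n-Dr J\n")
    return msg


def send_test_message(msg, host="127.0.0.2", port=25):
    """Hand the message to the capture listener (assumed to be on host:port)."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)


# Example (not run automatically; it needs the capture instance to be listening):
#   msg = build_test_message("sender@example.com", "recipient@example.com")
#   send_test_message(msg)
```

If the listener accepts the message, send_message returns without raising; a refused recipient or a dead listener shows up as an exception from smtplib.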

Is the message there, queued up the way we’d like? You bet:

> ls -l /mqueue/c0

total 16
-rw------- 1 root root  19 2012-02-24 15:17 dfq1OKGet2008636
-rw------- 1 root root 542 2012-02-24 15:17 qfq1OKGet2008636
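
Each queued message shows up as a pair of files: qf plus the queue ID holds the envelope and headers, df plus the same ID holds the body. If you want to count what the capture instance has collected, a small sketch like the following would do it. The function name queued_ids is made up, the path in the usage comment is just the queue directory from this post, and since the files are readable only by root it would need to be run as root.

```python
# Hypothetical helper: list queue IDs that have both a qf and a df file.
from pathlib import Path


def queued_ids(queue_dir):
    """Return sorted queue IDs with both envelope (qf) and body (df) files."""
    names = {p.name for p in Path(queue_dir).iterdir() if p.is_file()}
    ids = {name[2:] for name in names if name.startswith("qf")}
    return sorted(qid for qid in ids if "df" + qid in names)


# Example (run as root, since the queue files are mode 600):
#   for qid in queued_ids("/mqueue/c0"):
#       print(qid)
```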

There also seems to be a second way to run sendmail in queue-only fashion. I got it to work from the command-line like this:

> sudo sendmail -odqueueonly -bd -C/etc/mail/capture.cf

The book says this is deprecated usage, however. But let’s see, that’s O’Reilly’s Sendmail 3rd edition, published in 2003, we’re in 2012, so, hmm, they still haven’t cut us off…

One last thing, that smarttable entry for my main sendmail daemon. I added the line:

@aicpa.org relay:[127.0.0.2]

Conclusion
It can be useful to queue all incoming emails for various reasons. It’s a little hard to find out how to do this precisely. We found a way to do this without stopping/starting our main sendmail process. This post shows a couple ways to do it, and why you might need to.

May 2012 Update
Just wanted to mention how I handle BBB email now. They told me they maintain an accurate SPF record. Sure enough, they do. Now we only accept bbb.org email when the SPF record is a match. But I don’t use sendmail for that, I use Postini’s (OK, Google’s, technically) mail hygiene service. Postini rocks!

My most recent post on how to tame the confounding sendmail log is here.