Categories
Security Web Site Technologies

Microsoft Exchange Online Protection is not PCI compliant

Intro
Microsoft’s cloud offering, Office 365, is pretty good for enterprises. It’s clear a lot of thought has been put into it, especially the security model. But it isn’t perfect and one area where it surprisingly falls short is compliance with payment card industry specifications.

The details
You need to have PCI (payment card industry) compliance to work with credit cards.

An outfit I am familiar with did a test Qualys run to check out our PCI failings. Of course there were some findings that constitute failure – simple things like using non-secure cookies. Those were all pretty simply corrected.

But the probe also looked at the MX records of the DNS domain and consequently at the MX servers. These happened to be EOP servers in the Azure cloud. They also had failures. A case was opened with Microsoft and as they feared, Microsoft displayed big-company-itis and absolutely refused to do anything about it, insisting that they have sufficient security measures in place.

So they tried a scan using a different security vendor, Comodo, using a trial license. Microsoft’s EOP servers failed that PCI compliance verification as well.

More details
The EOP servers gave a response when probed by UDP packets originating from port 53, but not when originating from some random port. That doesn’t sound like the end of the world, but there are so many exploits these days that I just don’t know. As I say Microsoft feels it has other security measures in place so they don’t see this as a real security problem. But explanations do not have a place in PCI compliance testing so they simply fail.

Exception finally granted
Well all this back-and-forth between Qualys and Microsoft did produce the desired result in the end. Qualys agreed with Micrsoft’s explanation of their counter-measures and gave them a pass in the end! So after many weeks the web site finally achieved PCI-DSS certification.

Conclusion
Microsoft’s cloud mail offering, Exchange Online Protection, is not PCI compliant according to standard testing by security vendors. As far as I know you literally cannot use EOP if you want your web site to accept credit card payments unless you get a by-hand exception from the security vendor whose test is used, which is what happened in this case after much painful debate. Also as far as I know I am the first person to publicly identify this problem.

Categories
Apache Web Site Technologies

GetSimple CMS – a non-SQL DB content management system

Intro
I’ve been looking at GetSimple CMS lately. It allows you to do some basic things so in that sense it’s pretty good.

The details
I know nothing about CMSes, except maybe a tiny bit about WordPress since I use it, but only in a really basic way. I like the idea of a simple CMS however – one that is simple to install and doesn’t even require a database, so this fit the bill.

First problem
My first attempts to use it met with failure. I could create new pages but when I saved them the characters I had typed into the pages were lost, just emptied out. I even went so far as to identify the XML file where these pages are stored and look at the raw xml files. The CDATA that should have contained my typing was blank! At first I was convinced that my version of php was to blame. But I tried different version, 5.3.3, 5.3.17. I still got the problem. So finally I decided that it had to be the fact that I compile my own apache and use somewhat different configuration compared to system defaults. So I switched to a system-supplied apache web server and sure enough, it began to work. Although I would like to believe it’s really something in my configuration, I couldn’t find such a thing and gave up trying. For the record the pages are stored here: <htdocs>/data/pages/<page-name>.xml

Security aspect
I would say this CMS is vulnerable to brute-force attacks. I suggest to randomize the location of the admin login by using the GST-adminlock plugin. You want to have multiple users? The version I used, 3.3.8 accommodates that! It needs just a little fudging to get going. Tie authentication to a back-end database? I doubt that is possible. i did check and it does store the login passwords encrypted. Not sure if there is salt added or not…

Plugins
I’m especially bad at evaluating plugins. I guess there should be better themes out there because the system-supplied one is so basic. It’s strange that you have to manually get the downloaded plugin onto the server with sftp or some similar tool. Unless I’m missing something, which I probably am.

Conclusion
GetSimple CMS may be good for a 15 – 20 page simple site. It doesn’t need a back-end database, doing all that sort of thing through use of PHP’s XML capabilities. Quite a few plugins exist which extend its functionality. The available documentation is pretty good. Security is adequate and should be made better by the person implementing it. The last thing we need is another way for the bad guys to take over legitimate web sites and inject malicious content.

References and related
Getsimple CMS info is here: http://get-simple.info/
I mentioned compiling my own apache. Usually it all works out for me, but not for use with GetSimple CMS. anyway, here is how I compiled a recent version of apache 2.4.

Categories
Python Web Site Technologies

Superimpose crosshairs plus grid marks on an image

Intro
My previous effort showed how to superimpose just a crosshairs on an image. That I was able to do within CSS. When it came time to add tick marks to those crosshairs I felt that CSS was getting too complicated. I had only a short time and I honestly couldn’t figure it out.

So I decided on an alternate appraoch of superimposing two images, one of which has transparency. Then it would come down to creating a suitable image that has the crosshairs and tick marks.

The details
I felt this was doable in python and it was, but I needed to add the Python Imaging Library (PIL). On CentOS I simply did a

$ sudo yum install python-imaging

The python program
Here is my python program.

# from http://stackoverflow.com/questions/8376359/how-to-create-a-transparent-gif-or-png-with-pil-python-imaging
# drJ 3/2016
# install PIL: yum install python-imaging
#
from PIL import Image, ImageDraw
width = 640
height = 480
halfwidth=width/2
halfheight=height/2
 
ticklength=10
starttickx=halfwidth - ticklength/2
endtickx=halfwidth + ticklength/2
startticky=halfheight - ticklength/2
endticky=halfheight + ticklength/2
 
img = Image.new('RGBA',(width, height))
 
draw = ImageDraw.Draw(img)
# crosshairs
draw.line((halfwidth, 0, halfwidth, height), fill=252, width=2)
draw.line((0, halfheight, width, halfheight), fill=252, width=2)
# tick marks
 
def my_range(start, end, step):
    while start <= end:
        yield start
        start += step
 
# top to bottom ticks
for y in my_range(0, 480, 30):
    draw.line((starttickx, y, endtickx, y), fill=252, width=2)
# left to right ticks
for x in my_range(20, 640, 30):
    draw.line((x, startticky, x, endticky), fill=252, width=2)
 
img.save('crosshairs.gif', 'GIF', transparency=0)

The web page
We actually have two iamges side-by-side becuase we have two cameras.

<html>
<head>
<style type="text/css">
<!-- DrJ 1/2016
Note that Firefox's implementation of linear-gradient is broken and requires us to
use repeat linear gradient 
Some fairly lousy documentation on repeat linear gradient is here:
https://developer.mozilla.org/en-US/docs/Web/CSS/repeating-linear-gradient
 
-->
 
#jpg1 {
 
    position:absolute;
 
    top:0;
 
    left:0;
 
    z-index:1;
 
}
 
#gif2 {
 
    position:absolute;
 
    top:10;
 
    left:11;
 
 
    /*
 
    set top and left here
 
    */
 
    z-index:1;
}
 
#gif3 {
 
    position:absolute;
 
    top:10;
 
    left:655;
 
 
    /*
 
    set top and left here
 
    */
 
    z-index:1;
}
 
</style></head>
<body>
<table><tr><td>
<div>
  <img id="jpg1" src="http://dcs-931l-ball/mjpeg.cgi" width="640" height="480" />
  <img id="gif2" src="crosshairs.gif" />
</div>
<td>
  <img src="http://dcs-931l-target/mjpeg.cgi" width="640" height="480" />
  <img id="gif3" src="crosshairs.gif" />
</tr></table>
</body></html>

To be continued…

Categories
Network Technologies Web Site Technologies

Superimpose grid on video ouput from an IP camera

Intro
We were asked to superimpose a grid on the video output of an IP camera for this year’s FIRST FRC competition, FIRST STRONGHOLD, a sort of medieval-themed contest with a castle and medieval-inspired obstacles. The present thinking is that a cheap D-Link DCS-931L camera will do just fine. It’s $30 on Amazon. I found this is a real research project because it is poorly documented. So in this blog I show how to do that.

The details
D-Link provides viewing software called D-ViewCam. It has a lot of options, but not the ability to superimpose a grid. It’s more being a security console – allowing views from multiple cameras. Capturing and recording images, that sort of thing. i knew in my heart that there had to be a URL to tap into the camera directly, but it wasn’t easy to find. First I found the URL for a capture image:

http://dcs-931l/image.jpg

and that’s all i could find! I thought, OK, I can work with even that. I’ll build a web page that includes that as source image and refreshes itself as fast as possible! And despite the crudeness o that approach, it actually worked. It was a little laggy (maybe 1.2 s or so) an a little jumpy, but good enough for our purposes. Time permitting, I will share that crude code.

HTML5 video image
But then, somewhat by accident, i found a D-Link blog post where they just happened to mention the URL to a video stream that will work in an HTML5-compatible browser such as Firefox. i can’t believe how hidden they keep this URL. It is:

http://dcs-931l/mjpeg.cgi

and you treat it like an image file.

That’s the first breakthrough.

Then I found a Stackoverflow page that described how to superimpose a grid in an HTML page using CSS – Cascading Style Sheets. That sounded pretty good to me. Actually that’s what i searched for. I know there are other ways to do it but Javascript gets ugly quickly and other methods are more kludgy. At least with CSS I feel I am learning something about CSS. I am not a web developer, just a fumbler.

It’s Broken on Firefox
So i carefully implement the Stackoverflow code. you have to understand that it’s presented so tidily that you feel there’s no way it could not work. I tried it out in Firefox. No matter how much I proof-read my code, it only drew the vertical bars of the grid, but not the horizontal lines! So Firefox’s either has a bug, or the features of CSS aren’t agreed upon by all major browser vendors.

At some point I came to try my code in Chrome – worked great! That was a shock. But I wanted it to work in Firefox since that is my principal browser. I finally found that for whatever reason, in Firefox the horizontal bars have to be drawn using a different function. Instead of a more simple linear-gradient CSS function which works just fine for the vertical bars, you need to resort to a more complex repeating-linear-gradient function.

so putting all this together we arrive at the html page code. It’s nice and brief.

<html>
<head>
<style type="text/css">
<!-- DrJ 1/2016
Note that Firefox's implementation of linear-gradient is broken and requires us to
use repeat linear gradient 
Some fairly lousy documentation on repeat linear gradient is here:
https://developer.mozilla.org/en-US/docs/Web/CSS/repeating-linear-gradient
 
-->
* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}
div {
  display: inline-block;
  position: relative;
  margin: 10px;
}
div:after {
  content: '';
  position: absolute;
  height: 100%;
  width: 100%;
  top: 0;
  left: 0;
  background: repeating-linear-gradient(to bottom, black, black 1px, transparent 1px, transparent 80px), linear-gradient(to right, black 1px, transparent 1px);
  background-size: 15%;
  padding: 1px;
}
</style></head>
<body>
<div>
  <img src="http://dcs-931l/mjpeg.cgi" width="480" height="320" />
</div>
</body></html>

That’s it!

Well, mostly. This puts a horizontal bar every 80 pixels. If i change that 80px to 15% (which is the parameter in effect for the vertical bars due to the background-size statement), it will work OK in Firefox. However, it does not work in Chrome. With 80px it works in both browsers.

Network info
Needless to say, dcs-931l is just the hostname of the camera, assuming that mDNS is all working which it generally does. You can replace that with the IP address. of course you have to be on the same LAN as the camera. This is not a setup for viewing the camera from the Internet which I haven’t looked into yet. mDNS is multicast DNS. I think this technology or its equivalent is pretty common in home networks these days. It’s a convenient way to assign (and later refer to by that name) a hostname to a dynamic IP address. There’s a Wikipedia article about it which gets pretty technical.

Where to put that HTML page – stupid Notepad tricks
Most people automatically feel HTML pages have to be on a web server, but they don’t. You can put that HTML above into a file on your PC and that’s what we will do. No local web server required at all. I just saved the file as “grid.htm” in Notepad – yes it’s as crude as it gets but I said I’m not a web developer. Yes, anyone who knew anything would at least get Notepad++, but oh well. By the way, to save a .htm file in Notepad just specify All Files and put the name in quotes “grid.htm”. I save it to C:\temp, so the URL becomes:

c:\temp\grid.htm

It shows up a little differently, but that’s what I typed in. And here’s a screen capture of my live video with the grid superimposed, just so it’s been documented as really working!

grid-capture

Measuring the lag of the video display
In this blog post I show an accessible technique for measuring lag that only requires two smartphones. I love to show this to students. They get all confused at first, but when you do it you see how obviously simple and accurate it is. So we measured the lag as .51 seconds. So not the best, but not terrible either.

Superimpose crosshairs instead of grid
Now that we’ve set up the basic approach, changing from a whole grid to just crosshairs, with thicker lines is as simple as changing 15% to 50%, plus changing 1px to 2px.

Password prompt

But we still get that password prompt initially when brining up our local web page. Even that can be fixed by embedding the username/passwor dinto the URL. Putting crosshairs and password toegther we arrive at this version:

<html>
<head>
<style type="text/css">
<!-- DrJ 1/2016
Note that Firefox's implementation of linear-gradient is broken and requires us to
use repeat linear gradient 
Some fairly lousy documentation on repeat linear gradient is here:
https://developer.mozilla.org/en-US/docs/Web/CSS/repeating-linear-gradient
 
-->
* {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}
div {
  display: inline-block;
  position: relative;
  margin: 10px;
}
div:after {
  content: '';
  position: absolute;
  height: 100%;
  width: 100%;
  top: 0;
  left: 0;
  background: repeating-linear-gradient(to bottom, black, black 2px, transparent 1px, transparent 50px), linear-gradient(to right, black 2px, transparent 2px);
  background-size: 50%;
  padding: 1px;
}
</style></head>
<body>
<div>
  <img src="http://admin:your_camera_password@dcs-931l/mjpeg.cgi" width="480" height="320" />
</div>
</body></html>

That is the Firefox version, of course. Replace your_camera_password with your camera’s password. Don’t use a password which contains the “@” character or things will get really complicated!

References and related
Link to competition information, including brief videos.
Cheap but functional D-Link video camera.
Stackoverflow description of superimposing a grid on an image using CSS.
Multicast DNS is described in excruciating detail here.
Blog post on measuring lag and getting streaming to work on the Raspberry Pi camera.

Categories
Admin Web Site Technologies

The IT Detective agency: Outlook client is Disconnected, all else fine

Intro
Today we were asked to consult on the following problem. Some proxy users at a large company could not connect to Microsoft Outlook. Only a few users were affected. Fix it.

The details
Affected users would bring up Outlook and within a few short seconds it would simply show Disconnected and stay that way.

It was quickly established that the affected users shared this in common: they use LDAP authentication and proxy-basic-authentication. The users who worked used NTLM authentication. The way they distinguish one from the other is by using a different proxy autoconfiguration (PAC) file.

More observations
Well, actually there was almost no difference whatsoever between the two PAC files. They are syntactically identical. The only difference in fact is that a different proxy is handed out for the NTLM users. That’s it!

We were able to reproduce the problem ourselves by using the same PAC file as the affected user. We tried to trace the traffic on our desktop but it was a complete mess. I did not see any connection to the designated proxy for Outlook traffic, but it’s hard to say definitively because there is so much other junk present. Strangely, all web sites worked OK and even the web-based version of Outlook works OK. So this Outlook client was the only known application having a problem.

When the affected users put in the proxy directly in manual proxy settings in IE and turned off proxy autoconfig, then Outlook worked. Strange.

We observed the header for the PAC file was a little bit inconsistent (it was being served from multiple web servers through a load balancer). The content-tyep MIME header was coming back as either text/plain or there was no such header at all, depending on which web server you were hitting. But note that the NTLM users were also getting PAC files with this same header.

The solution

Although everything had been fine with this header situation up until the introduction of Outlook, we guessed it was technically incorrect and should be fixed. We changed all web servers to have the PAC file be served with this MIME header:

Content-Type: application/x-ns-proxy-autoconfig

The results

A re-test confirmed that this fixed the Outlook problem for the LDAP-affected users. NTLM users were not impacted and continued to work fine.

Conclusion
A strange Outlook connection problem was resolved in large company Intranet by adjusting the PAC file to include the correct content-type header. Case closed!

References and related information
Here’s a PAC file case we never did resolve: excessive calls to the PAC file web server from individual users.

Categories
Admin Network Technologies Proxy Security TCP/IP Web Site Technologies

The IT Detective Agency: Cisco Jabber stopped working for some using WAN connections

Intro
This is probably the hardest case I’ve ever encountered. It’s so complicated many people needed to get involved to contribute to the solution.

Initial symptoms

It’s not easy to describe the problem while providing appropriate obfuscation. Over the course of a few days it came to light that in this particular large company for which I consult many people in office locations connected via an MPLS network were no longer able to log in to Cisco Jabber. That’s Cisco’s offering for Instant Messaging. When it works and used in combination with Cisco IP phones it’s pretty good – has some nice features. This major problem was first reported November 17th.

Knee-jerk reactions
Networking problem? No. Network guys say their networks are running fine. They may be a tad overloaded but they are planning to route Internet over the secondary links so all will be good in a few days.
Proxy problem? Nope. proxy guys say their Bluecoat appliances are running fine and besides everyone else is working.
Application problem? Application owner doesn’t see anything out of the ordinary.
Desktop problem? Maybe but it’s unclear.

Methodology
So of the 50+ users affected I recognized two power users that I knew personally and focussed on them. Over the course of days I learned:
– problem only occurs for WAN (MPLS) users
– problem only occurs when using one particular proxy
– if a user tries to connect often enough, they may eventually get in
– users can get in if they use their VPN client
– users at HQ were not affected

The application owner helpfully pointed out the URL for the web-based version of Cisco Jabber: https://loginp.webexconnect.com/… Anyone with the problem also could not log in to this site.

So working with these power users who patiently put up with many test suggestions we learned:

– setting the PC’s MTU to a small value, say 512 up to 696 made it work. Higher than that it generally failed.
– yet pings of up to 1500 bytes went through OK.
– the trace from one guy’s PC showed all his packets re-transmitted. We still don’t understand that.
– It’s a mess of communications to try to understand these modern, encrypted applications
– even the simplest trace contained over 1000 lines which is tough when you don’t know what you’re looking for!
– the helpful networking guy from the telecom company – let’s call him “Regal” – worked with us but all the while declaring how it’s impossible that it’s a networking issue
– proxy logs didn’t show any particular problem, but then again they cannot look into SSL communication since it is encrypted
– disabling Kaspersky helped some people but not others
– a PC with the problem had no problem when put onto the Internet directly
– if one proxy associated with the problem forwarded the requests to another, then it begins to work
– Is the problem reproducible? Yes, about 99% of the time.
– Do other web sites work from this PC? Yes.

From previous posts you will know that at some point I will treat every problem as a potential networking problem and insist on a trace.

Biases going in
So my philosophy of problem solving which had stood the test of time is either it’s a networking problem, or it’s a problem on the PC. Best is if there’s a competition of ideas in debugging so that the PC/application people seek to prove beyond a doubt it is a networking problem and the networking people likewise try to prove problem occurs on the PC. Only later did I realize the bias in this approach and that a third possibility existed.

So I enthused: what we need is a non-company PC – preferably on the same hardware – at the same IP address to see if there’s a problem. Well we couldn’t quite produce that but one power user suggested using a VM. He just happened to have a VM environment on his PC and could spin up a Windows 7 Professional generic image! So we do that – it shows the problem. But at least the trace form it is a lot cleaner without all the overhead of the company packages’ communication.

The hard work
So we do the heavy lifting and take a trace on both his VM with the problem and the proxy server and sit down to compare the two. My hope was to find a dropped packet, blame the network and let those guys figure it out. And I found it. After the client hello (this is a part of the initial SSL protocol) the server responds with its server hello. That packet – a largeish packet of 1414 bytes – was not coming through to the client! It gets re-transmitted multiple times and none of the re-transmits gets through to the PC. Instead the PC receives a packet the proxy never sent it which indicates a fatal SSL error has occurred.

So I tell Regal that look there’s a problem with these packets. Meanwhile Regal has just gotten a new PC and doesn’t even have Wireshark. Can you imagine such a world? It seems all he really has is his tongue and the ability to read a few emails. And he’s not convinced! He reasons after all that the network has no intelligent, application-level devices and certainly wouldn’t single out Jabber communication to be dropped while keeping everything else. I am no desktop expert so I admit that maybe some application on the PC could have done this to the packets, in effect admitting that packets could be intercepted and altered by the PC even before being recorded by Wireshark. After all I repeated this mantra many times throughout:

This explanation xyz is unlikely, and in fact any explanation we can conceive of is unlikely, yet one of them will prove to be correct in the end.

Meanwhile the problem wasn’t going away so I kludged their proxy PAC file to send everyone using jabber to the one proxy where it worked for all.

So what we really needed was to create a span port on the switch where the PC was plugged in and connect a 2nd PC to a port enabled in promiscuous mode with that mirrored traffic. That’s quite a lot of setup and we were almost there when our power user began to work so we couldn’t reproduce the problem. That was about Dec 1st. Then our 2nd power user also fell through and could no longer reproduce the problem either a day later.

10,000 foot view
What we had so far is a whole bunch of contradictory evidence. Network? Desktop? We still could not say due to the contradictions, the likes of which I’ve never witnessed.

Affiliates affected and find the problem
Meanwhile an affiliate began to see the problem and independently examined it. They made much faster progress than we did. Within a day they found the reason (suggested by their networking person from the telecom, who apparently is much better than ours): the server hello packet has the expedited forwarding (EF) flag set in the differentiated code services point (DSCP) section of the IP header.

Say what?
So I really got schooled on this one. I was saying It has to be an application-aware “something” on the network or PC that is purposefully messing around with the SSL communication. That’s what the evidence screamed to me. So a PC-based firewall seemed a strong contender and that is how Regal was thinking.

So the affiliate explained it this way: the company uses QOS on their routers. Phone (VOIP) gets priority and is the only application where the EF bit is expected to be set. VOIP packets are small, by the way. Regular applications like web sites should just use the default QOS. And according to Wikipedia, many organizations who do use QOS will impose thresholds on the EF pakcets such that if the traffic exceeds say 30% of link capacity drop all packets with EF set that are over a certain size. OK, maybe it doesn’t say that, but that is what I’ve come to understand happens. Which makes the dropping of these particular packets the correct behaviour as per the company’s own WAN contract and design. Imagine that!

Smoking gun no more
So now my smoking gun – blame it on the network for dropped packets – is turned on its head. Cisco has set this EF bit on its server hello response on the loginp.webexconnect.com web site. This is undesirable behaviour. It’s not a phone call after all which requires a minimum jitter in packet timing.

So next time I did a trace I found that instead of EF flag being set, the AF (Assured Forwarding) flag was set. I suppose that will make handling more forgiving inside the company’s network, but I was told that even that was too much. Only default value of 0 should be set for the DSCP value. This is an open issue in Cisco’s hands now.

But at least this explains most observations. Small MTU worked? Yup, those packets are looked upon more favorably by the routers. One proxy worked, the other did not? Yup, they are in different data centers which have different bandwidth utilization. The one where it was not working has higher utilization. Only affected users are at WAN sites? Yup, probably only the WAN routers are enforcing QOS. Worked over VPN, even on a PC showing the problem? Yup – all VPN users use a LAN connection for their proxy settings. Fabricated SSL fatal error packet? I’m still not sure about that one – guess the router sent it as a courtesy after it decided to drop the server hello – just a guess. Problem fixed by shutting down Kaspersky? Nope, guess that was a red herring. Every problem has dead ends and red herrings, just a fact of life. And anyway that behaviour was not very consistent. Problem started November 17th? Yup, the affiliate just happened to have a baseline packet trace from November 2nd which showed that DSCP was not in use at that time. So Cisco definitely changed the behaviour of Cisco Jabber sometime in the intervening weeks. Other web sites worked, except this one? Yup, other web sites do not use the DSCP section of the IP header so it has the default value of 0.

Conclusion
Cisco has decided to remove the DSCP flag from these packets, which will fix everything. Perhaps EF was introduced in support of Cisco Jabber’s extended use as a soft phone??? Then this company may have some re-design of their QOS to take care of because I don’t see an easy solution. Dropping the MTU on the proxy to 512 seems pretty drastic and inefficient, though it would be possible. My reading of TCP is that nothing prevents QOS from being set on any sort of TCP packet even though there may be a gentleman’s agreement to not ordinarily do so in all except VOIP packets or a few other special classes. I don’t know. I’ve really never looked at QOS before this problem came along.

The company is wisely looking for a way to set all packets with DSCP = 0 on the Intranet, except of course those like VOIP where it is explicitly supposed to be used. This will be done on the Internet router. In Cisco IOS it is possible with a policy map and police setting where you can set set-dscp-transmit default. Apparently VPN and other things that may check the integrity of packets won’t mind the DSCP value being altered – it can happen anywhere along the route of the packet.

Boy applications these days are complicated! And those rare times when they go wrong really require a bunch of cooperating experts to figure things out. No one person holds all the expertise any longer.

My simplistic paradigm of its either the PC or the network had to make room for a new reality: it’s the web site in the cloud that did them in.

Could other web sites be similarly affected? Yes it certainly seems a possibility. So I now know to check for use of DSCP if a particular web site is not working, but all others are.

References and related
This Wikipedia article is a good description of DSCP: https://en.wikipedia.org/wiki/Differentiated_services

Categories
Admin Apache Hosting Service IT Operational Excellence Linux Web Site Technologies

Scaling your apache to handle more requests

Intro
I was running an apache instance very happily with mostly default options until the day came that I noticed it was taking seconds to serve a simple web page – one that it used to serve in 50 ms or so. I eventually rolled up my sleeves to see what could be done about it. It seems that what had changed is that it was being asked to handle more requests than ever before.

The details
But the load average on a 16-core server was only at 2! sar showed no particular problems with either cpu of I/O systems. Both showed plenty of spare capacity. A process count showed about 258 apache processes running.

An Internet search helped me pinpoint the problem. Now bear in mind I use a version of apache I myself compiled, so the file layout looks different from the system-supplied apache, but the ideas are the same. What you need is to increase the number of allowed processes. On my server with its great capacity I scaled up considerably. These settings are in /conf/extra/httpd-mpm.conf in the compiled version. In the system-supplied version on SLES I found the equivalent to be /etc/apache2/server-tuning.conf. To begin with the key section of that file had these values:

<IfModule mpm_prefork_module>
    StartServers             5
    MinSpareServers          5
    MaxSpareServers         10
    MaxRequestWorkers      250
    MaxConnectionsPerChild   0
</IfModule>

(The correct section is <IfModule prefork.c> in the system-supplied apache).

I replaced these as follows:

<IfModule mpm_prefork_module>
    StartServers          256
    MinSpareServers        16
    MaxSpareServers       128
    ServerLimit          2048
    MaxClients           2048
    MaxRequestsPerChild  20000
</IfModule>

Note that ServerLimit has to be greater than or equal to MaxClients (thank you Apache developers!) or you get an error like this when you start apache:

WARNING: MaxClients of 2048 exceeds ServerLimit value of 256 servers,
 lowering MaxClients to 256.  To increase, please see the ServerLimit
 directive.

So you make this change, right, stop/start apache and what difference do you see? Probably none whatsoever! Because you probably forgot to uncomment this line in httpd.conf:

#Include conf/extra/httpd-mpm.conf

So remove the # at the beginning of that line and stop/start. If like me you’ve changed the usual diretory where the PID file and lock file get written in your httpd.conf file you may need this additional measure which I had to do in the httpd-mpm.conf file:

<IfModule !mpm_netware_module>
    #PidFile "logs/httpd.pid"
</IfModule>
 
#
# The accept serialization lock file MUST BE STORED ON A LOCAL DISK.
#
<IfModule !mpm_winnt_module>
<IfModule !mpm_netware_module>
#LockFile "logs/accept.lock"
</IfModule>
</IfModule>

In other words I commented out this file’s attempt to place the PID and lock files in a certain place because I have my own way of storing those and it was overwriting my choices!

But with all those changes put together it works much, much better than before and can handle more requests than ever.

Analysis
In creating a simple benchmark we could easily scale to 400 requests / second, and we didn’t really even try to push it – and this was before we changed any parameters. So why couldn’t 250 or so simultaneous processes handle more real world requests? I believe that if all clients were as fast as our server it could have handled them all. But the clients themselves were sometimes distant (thousands of miles) with slow or lossy connections. Then they need to acknowledge every packet sent by the web server and the web server has to wait around for that, unable to go on to the next client request! Real life is not like laboratory testing. As the waiting around bit requires next-to-no cpu the load average didn’t rise even though we had run up against a limit, the limit was an artificial application-imposed one, not a system-imposed resource constraint.

More analysis, what about threads?

Is this the only or best way to scale up your web server? Probably not. It’s probably the most practical however because you probably didn’t compile it with support for threads. I know I didn’t. Or if you’re using the system-provided package it probably doesn’t support threads. Find your httpd binary. Run this command:

$ ./httpd -l|grep prefork

If it returns:

  prefork.c

you have the prefork module and not the worker module and the above approach is what you need to do. To me a more modern approach is to scale by using threads – modern cpus are designed to run threads, which are kind of like light-weight processes. But, oh well. The gatekeepers of apache packages seem stuck in this simple-minded one process per request mindset.

Conclusion
My scaled-up apache is handling more requests than ever. I’ve documented how I increased the total process count.

References and related articles
How I compiled apache 2.4 and ran into (and resolved) a zillion errors seems to be a popular post!
The mystery of why we receive hundreds or even thousands of PAC file requests from each client every day remains unsolved to this day. That’s why we needed to scale up this apache instance – it is serving the PAC file. I first wrote about it three and a half years ago!04

Categories
Apache CentOS Hosting Service Web Site Technologies

Compiling Apache 2.4 on CentOS

Intro
This is a tale of one thing leading to another. I’ll probably either continue this post or delete it altogether if I find I’m headed down a wrong path.

The details
I suspect that to get better marks for my server’s SSL implementation I probably need apache 2.4. There is an RPM for apache 2.4 but it is almost two years old! So I decided to bite the bullet and compile the darn thing myself. Easier said than done. My current production version is 2.2.15.

Now if you just want to compile a recent version of apache 2.4 then this guide is much, much better than mine: https://jasonpowell42.wordpress.com/2013/04/05/install-apache-2-4-4-on-centos-6-4/. My guide, where I’ve hit just about every conceivable error and powered through, is more for timid folks like me who want to keep their current apache 2.2 running while trying 2.4. In spite of what you read elsewhere this is possible to do, but you need patience and perseverance.

Getting the source is easy enough. Then you configure it:

httpd-2.4.16$ ./configure −−prefix=/usr/local/apache24

checking for chosen layout... Apache
checking for working mkdir -p... yes
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
configure:
configure: Configuring Apache Portable Runtime library...
configure:
checking for APR... configure: WARNING: APR version 1.4.0 or later is required, found 1.3.9
configure: WARNING: skipped APR at apr-1-config, version not acceptable
no
configure: error: APR not found.  Please read the documentation.

What version of apr do we have?

$ sudo rpm −qa|grep apr

apr-util-devel-1.3.9-3.el6_0.1.x86_64
apr-util-1.3.9-3.el6_0.1.x86_64
apr-util-ldap-1.3.9-3.el6_0.1.x86_64
apr-1.3.9-5.el6_2.x86_64
apr-devel-1.3.9-5.el6_2.x86_64

Drat. No wonder we’re having trouble. Guess we could compile apr ourselves, but perhaps there’s a suitable version out there somewhere we can simply download?


Warning: this approach to apr shown below was a dead end for me. Further down I show a successful approach.

$ sudo yum search apr

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: linux.cc.lehigh.edu
 * epel: mirror.us.leaseweb.net
 * extras: mirror.rackspace.com
 * updates: mirror.es.its.nyu.edu
========================================================== N/S Matched: apr ==========================================================
...
httpd24-apr-debuginfo.x86_64 : Debug information for package httpd24-apr
httpd24-apr-devel.x86_64 : APR library development kit
httpd24-apr-util-debuginfo.x86_64 : Debug information for package
                                  : httpd24-apr-util
httpd24-apr-util-devel.x86_64 : APR utility library development kit
httpd24-apr-util-ldap.x86_64 : APR utility library LDAP support
httpd24-apr-util-mysql.x86_64 : APR utility library MySQL DBD driver
httpd24-apr-util-nss.x86_64 : APR utility library NSS crytpo support
httpd24-apr-util-odbc.x86_64 : APR utility library ODBC DBD driver
httpd24-apr-util-openssl.x86_64 : APR utility library OpenSSL crytpo support
httpd24-apr-util-pgsql.x86_64 : APR utility library PostgreSQL DBD driver
httpd24-apr-util-sqlite.x86_64 : APR utility library SQLite DBD driver
httpd24-apr.x86_64 : Apache Portable Runtime library
httpd24-apr-util.x86_64 : Apache Portable Runtime Utility library
...

I singled out the promising looking ones. After all it’s apache 2.4 that’s driving the need for this version so the httpd24 versions of apr should suffice.

So I installed these:

$ sudo yum install httpd24-apr-util.x86_64
$ sudo yum install httpd24-apr-util-devel.x86_64

Now how do we tell the configurator where our new apr package is?

httpd-2.4.16$ ./configure −−help|grep −i apr

  --enable-hook-probes    Enable APR hook probes
  --with-included-apr     Use bundled copies of APR/APR-Util
  --with-apr=PATH         prefix for installed APR or the full path to
                             apr-config
  --with-apr-util=PATH    prefix for installed APU or the full path to

The with-apr switch looks promising. Now we guess as to exactly what we should put for the path. Here’s what happens when we guess wrong:

httpd-2.4.16$ ./configure −−with-apr=/opt/rh/httpd24/root/usr/lib64 −−prefix=/usr/local/apache24

checking for chosen layout... Apache
checking for working mkdir -p... yes
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
configure:
configure: Configuring Apache Portable Runtime library...
configure:
checking for APR... configure: error: the --with-apr parameter is incorrect. It must specify an install prefix, a build directory, or an apr-config file.

I’ll spare you the guesswork. Here is the path correctly specified:

httpd-2.4.16$ ./configure −−with-apr=/opt/rh/httpd24/root/usr −−prefix=/usr/local/apache24

...
configure: Configuring Apache Portable Runtime Utility library...
configure:
checking for APR-util... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
checking for gcc option to accept ISO C99... -std=gnu99
checking for pcre-config... false
configure: error: pcre-config for libpcre not found. PCRE is required and available from http://pcre.org/

So we finally got past the apr error and are onto the next one : (. I’ll try to install pcre-devel to see if that helps:

$ sudo yum install pcre-devel.x86_64

Wow! Got lucky that time. That cleared up that error and the configure went all the way through!

Oh, no. It doesn’t compile! It begins to, but it can’t compile export.c:

httpd-2.4.16$ make

...
gawk -f /usr/local/src/apache24/httpd-2.4.16/build/make_exports.awk `cat export_files` > exports.c
/usr/lib64/apr-1/build/libtool --silent --mode=compile gcc -std=gnu99  -pthread      -DLINUX=2 -D_REENTRANT -D_GNU_SOURCE     -I. -I/usr/local/src/apache24/httpd-2.4.16/os/unix -I/usr/local/src/apache24/httpd-2.4.16/include -I/opt/rh/httpd24/root/usr/include/apr-1 -I/usr/include/apr-1 -I/usr/local/src/apache24/httpd-2.4.16/modules/aaa -I/usr/local/src/apache24/httpd-2.4.16/modules/cache -I/usr/local/src/apache24/httpd-2.4.16/modules/core -I/usr/local/src/apache24/httpd-2.4.16/modules/database -I/usr/local/src/apache24/httpd-2.4.16/modules/filters -I/usr/local/src/apache24/httpd-2.4.16/modules/ldap -I/usr/local/src/apache24/httpd-2.4.16/modules/loggers -I/usr/local/src/apache24/httpd-2.4.16/modules/lua -I/usr/local/src/apache24/httpd-2.4.16/modules/proxy -I/usr/local/src/apache24/httpd-2.4.16/modules/session -I/usr/local/src/apache24/httpd-2.4.16/modules/ssl -I/usr/local/src/apache24/httpd-2.4.16/modules/test -I/usr/local/src/apache24/httpd-2.4.16/server -I/usr/local/src/apache24/httpd-2.4.16/modules/arch/unix -I/usr/local/src/apache24/httpd-2.4.16/modules/dav/main -I/usr/local/src/apache24/httpd-2.4.16/modules/generators -I/usr/local/src/apache24/httpd-2.4.16/modules/mappers  -prefer-non-pic -static -c exports.c && touch exports.lo
exports.c:1244: error: redefinition of ‘ap_hack_apr_allocator_create’
exports.c:198: note: previous definition of ‘ap_hack_apr_allocator_create’ was here
exports.c:1245: error: redefinition of ‘ap_hack_apr_allocator_destroy’
exports.c:199: note: previous definition of ‘ap_hack_apr_allocator_destroy’ was here
exports.c:1246: error: redefinition of ‘ap_hack_apr_allocator_alloc’
exports.c:200: note: previous definition of ‘ap_hack_apr_allocator_alloc’ was here
exports.c:1247: error: redefinition of ‘ap_hack_apr_allocator_free’
exports.c:201: note: previous definition of ‘ap_hack_apr_allocator_free’ was here
exports.c:1248: error: redefinition of ‘ap_hack_apr_allocator_owner_set’

This could be tough! Maybe impossible for me to get past. I’ve never encountered this kind of error. OK. Got it. Not so tough. I had two versions of apr installed – the old one needed by my apache 2.2 and the new one installed as shown above. I didn’t want to completely blow away the old one as I feared that it is dynamically linked by Apache 2.2, so I did the following:

$ cd /usr/lib64; sudo mv apr-1 drjapr-1
– then change to my apache24 root directory and run configure again; then run make

And it went through this time!

Only modules installed
make install however only installed modules, not the httpd binary.

The problem seems related to my original apr libraries. They look like this:

$ sudo rpm −qa|grep ^apr

apr-util-devel-1.3.9-3.el6_0.1.x86_64
apr-util-1.3.9-3.el6_0.1.x86_64
apr-util-ldap-1.3.9-3.el6_0.1.x86_64
apr-1.3.9-5.el6_2.x86_64
apr-devel-1.3.9-5.el6_2.x86_64

I tried to move them all to a temporary directory but then the compiler cannot find libtool which is normally supplied by apr-devel.

I considered removing apr-devel, but boy there are so many dependencies that my other packages have on it that I did not feel comfortable doing that. PHP, apache2.2 and a whole lot more depend on it.

End of dead end approach to apr


New approach needed
My new approach is to try to use the APR from apache itself by downloaded the Unix sources for APR and apr-util from http://apr.apache.org/download.cgi. Yes, this worked best of all. I even put back all the apr files I had moved in the previous failed effort.

It’s not very clear what they mean by unpacking apr and apr-util in srclib. I created symlinks in my srclib directory such that apr -> apr-1.5.2 and apr-util -> apr-util-1.5.4. For the inexperienced the command format is like in this example:

$ ln −s apr-1.5.2 apr

Of course you first have to download the source tarball to your srclib directory and unpack it:

$ tar zxf apr-1.5.2.tar.gz

It Compiles and Installs
So after all those misfires I finally got a version that compiled and installed in its entirety. That process starts with this configure command:

$ ./configure −−with-included-apr −−prefix=/usr/local/apache24

Then the usual make and sudo make install.

Modules problem
I inherited a configuration that had a mods-avalable and a mods-enabled directory which is how my old apache 2.2 was set up. After tweaking the modules path using the replace command, something like this

$ cd /etc; cp −pr apache2 apache24; cd mods-avalable
$ sudo replace /usr/lib/apache2 /usr/local/apache24 −− *.load

I still could not start my new server:

Starting apache24: httpd: Syntax error on line 203 of /etc/apache24/apache24.conf: Syntax error on line 1 of /etc/apache24/mods-enabled/authz_default.load: Cannot load /usr/local/apache24/modules/mod_authz_default.so into server: /usr/local/apache24/modules/mod_authz_default.so: cannot open shared object file: No such file or directory
                                                           [FAILED]

I looked at all my configuration files and don’t see anything that relies on this module so I deleted the reference to it in mods-enabled.

Starting apache24: httpd: Syntax error on line 203 of /etc/apache24/apache24.conf: Syntax error on line 1 of /etc/apache24/mods-enabled/cgi.load: Cannot load /usr/local/apache24/modules/mod_cgi.so into server: /usr/local/apache24/modules/mod_cgi.so: cannot open shared object file: No such file or directory
                                                           [FAILED]

Now I do like to run CGI programs on occasion so this one can’t be so easily brushed aside. It could be that we should be using mod_cgid.so instead.

Then it’s onto this error:

Starting apache24: httpd: Syntax error on line 203 of /etc/apache24/apache24.conf: Syntax error on line 1 of /etc/apache24/mods-enabled/php5.load: Cannot load /usr/local/apache24/modules/libphp5.so into server: /usr/local/apache24/modules/libphp5.so: cannot open shared object file: No such file or directory
                                                           [FAILED]

I use php so I may have to investigate this one in some detail. simply trying to update the link to where the old libphp5.so resides under apache2.2 brings up this different kind of error:

Starting apache24: httpd: Syntax error on line 203 of /etc/apache24/apache24.conf: Syntax error on line 1 of /etc/apache24/mods-enabled/php5.load: Cannot load /usr/lib/apache2/modules/libphp5.so into server: /usr/lib/apache2/modules/libphp5.so: undefined symbol: unixd_config
                                                           [FAILED]

Wow. I’m reading various things and it looks like I’ll now have to compile php5 as well. This is getting hairy. This site, although old, seems to explain it most clearly. And of course I’ve got php 5.3 which you can’t even find source for on the php web site, www.php.net

So I downloaded php5.4.43, which is the oldest one I could find on the php web site!

To configure it I used this long list of options, some of which are determined by my choices of location for my apache24 files:

$ ./configure ‐‐with‐apxs2=/usr/local/apache24/bin/apxs ‐‐with‐mysql ‐‐prefix=/usr/local/apache24/php5 ‐‐with‐config‐file ‐path=/usr/local/apache24/php5 ‐‐disable‐cgi ‐‐with‐zlib ‐‐with‐gettext ‐‐with‐gdbm ‐‐with‐curl ‐‐with‐openssl

2017 update for php
I finally needed to update some WordPress packages and found my only transport is ftp. I think my command-line compile options for php5 above leave something to be desired. I think I need to add curl and openssl like so:

$ ./configure ‐‐with‐apxs2=/usr/local/apache24/bin/apxs ‐‐with‐mysql ‐‐prefix=/usr/local/apache24/php5 ‐‐with‐config‐file ‐path=/usr/local/apache24/php5 ‐‐disable‐cgi ‐‐with‐zlib ‐‐with‐gettext ‐‐with‐gdbm ‐‐with‐curl ‐‐with‐openssl

but I get these errors:

ext/curl/.libs/interface.o: In function `php_curl_option_url':
/usr/local/src/php5/php-5.4.43/ext/curl/interface.c:180: undefined reference to `core_globals'
ext/curl/.libs/interface.o: In function `_php_curl_setopt':
/usr/local/src/php5/php-5.4.43/ext/curl/interface.c:1821: undefined reference to `core_globals'
/usr/local/src/php5/php-5.4.43/ext/curl/interface.c:1804: undefined reference to `core_globals'
ext/curl/.libs/interface.o: In function `curl_progress':
/usr/local/src/php5/php-5.4.43/ext/curl/interface.c:1113: undefined reference to `executor_globals'
ext/curl/.libs/interface.o: In function `curl_write_header':
/usr/local/src/php5/php-5.4.43/ext/curl/interface.c:1264: undefined reference to `executor_globals'
ext/curl/.libs/interface.o: In function `curl_write':
/usr/local/src/php5/php-5.4.43/ext/curl/interface.c:1038: undefined reference to `executor_globals'
ext/curl/.libs/interface.o: In function `curl_read':
/usr/local/src/php5/php-5.4.43/ext/curl/interface.c:1187: undefined reference to `executor_globals'
ext/curl/.libs/streams.o: In function `php_curl_stream_opener':
/usr/local/src/php5/php-5.4.43/ext/curl/streams.c:320: undefined reference to `file_globals'
/usr/local/src/php5/php-5.4.43/ext/curl/streams.c:406: undefined reference to `core_globals'
/usr/local/src/php5/php-5.4.43/ext/curl/streams.c:414: undefined reference to `core_globals'
ext/curl/.libs/streams.o: In function `on_data_available':
/usr/local/src/php5/php-5.4.43/ext/curl/streams.c:68: undefined reference to `executor_globals'
ext/standard/.libs/info.o: In function `php_info_print_request_uri':
/usr/local/src/php5/php-5.4.43/ext/standard/info.c:97: undefined reference to `sapi_globals'
ext/standard/.libs/info.o: In function `php_print_gpcse_array':
/usr/local/src/php5/php-5.4.43/ext/standard/info.c:213: undefined reference to `executor_globals'
ext/standard/.libs/info.o: In function `php_print_info':
/usr/local/src/php5/php-5.4.43/ext/standard/info.c:918: undefined reference to `executor_globals'
collect2: ld returned 1 exit status
make: *** [sapi/cli/php] Error 1

Here the problem seems to be that since I had already compiled php5 and left it around, it was using the old parts.

You need to do a make clean first! Then it compiles.

Now I’m down to this apache error:

$ sudo service apache24 start

Starting apache24: AH00526: Syntax error on line 55 of /etc/apache24/apache24.conf:
Invalid command 'LockFile', perhaps misspelled or defined by a module not included in the server configuration
                                                           [FAILED]

I’m going to just try to comment out that pesky Option LockFile…. I’ve found this apache page which is helpful for this upgrade: http://httpd.apache.org/docs/trunk/upgrading.html OK, next error:

Starting apache24: AH00526: Syntax error on line 145 of /etc/apache24/apache24.conf:
Invalid command 'User', perhaps misspelled or defined by a module not included in the server configuration
                                                           [FAILED]

Here the advice is to load module mod_unixd. I don’t even have anything like that so I’m looking into it now. OK. It’s in the apache24/modules so I just need to load it in. Next error:

Starting apache24: AH00526: Syntax error on line 161 of /etc/apache24/apache24.conf:
Invalid command 'Order', perhaps misspelled or defined by a module not included in the server configuration

Wow. That comes from this pretty standard line:

<Files ~ "^\.ht">
    Order allow,deny
    Deny from all
    Satisfy all
</Files>

This is a helpful document: http://httpd.apache.org/docs/trunk/upgrading.html. So at their recommendation I replaced all that with a

Require all denied

That leads to the next error:

Starting apache24: AH00526: Syntax error on line 166 of /etc/apache24/apache24.conf:
Invalid command 'Require', perhaps misspelled or defined by a module not included in the server configuration

It means Require is not even found. I needed to load some new modules, names authz_code and unixd:

LoadModule authz_core_module /usr/local/apache24/modules/mod_authz_core.so
LoadModule unixd_module /usr/local/apache24/modules/mod_unixd.so

Next error:

AH00526: Syntax error on line 20 of /etc/apache24/mods-enabled/alias.conf:
Invalid command 'Order', perhaps misspelled or defined by a module not included in the server configuration

so some of my old conf files that I copied over use the old syntax. The alias.conf file looked like this:

Alias /icons/ "/var/www/icons/"
 
<Directory "/var/www/icons">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

Again looking at http://httpd.apache.org/docs/trunk/upgrading.html they suggest to replace the Order… and following line with:

Require all granted

Next error:

AH00526: Syntax error on line 3 of /etc/apache24/mods-enabled/deflate.conf:
Invalid command 'AddOutputFilterByType', perhaps misspelled or defined by a module not included in the server configuration
                                                           [FAILED]

But I was already loading the deflate module which defines AddOutputfilterByType. What I learned is that in apache 2.4 you also need to load mod_filter.

And the next error please:

AH00526: Syntax error on line 43 of /etc/apache24/mods-enabled/ssl.conf:
SSLSessionCache: 'shmcb' session cache not supported (known names: ). Maybe you need to load the appropriate socache module (mod_socache_shmcb?).

That’s in complaint about this line:

SSLSessionCache        shmcb:${APACHE_RUN_DIR}/ssl_scache(512000)

The standard advice for this error is to uncomment this line:

LoadModule socache_shmcb_module modules/mod_socache_shmcb.so

But I don’t have that module!

I guess I chose the wrong options when doing the initial ./configure. See the references for a proper guide that lists some good options.

I’m now trying to configure like this:

$ ./configure −−with-included-apr −−prefix=/usr/local/apache24 −−enable-php5 −−enable-so −−enable-ssl −−with-mpm=prefork

Actually I don’t know if I needed all those options such as enable-ssl. The main thing was that my apache 2.2 mods-available directory didn’t have a mention of mod_socache_shmcb.so. My apache 2.4 built with these config options definitely does. so I just need one of these LoadModule statements like this:

LoadModule socache_shmcb_module /usr/local/apache24/modules/mod_socache_shmcb.so

Well we’ve moved six lines down into that config file. I guess that’s progress! because now we’ve made it all the wy to line 49:

AH00526: Syntax error on line 49 of /etc/apache24/mods-enabled/ssl.conf:
Invalid command 'SSLMutex', perhaps misspelled or defined by a module not included in the server configuration
                                                           [FAILED]

Even apache’s upgrade guide documents this error. It’s caused by a conf file line that looks something like this:

SSLMutex  file:${APACHE_RUN_DIR}/ssl_mutex

and they say – I’m paraphrasing here – just try to comment it out and hope for the best.

Next error:

AH00526: Syntax error on line 9 of /etc/apache24/mods-enabled/status.conf:
Invalid command 'Order', perhaps misspelled or defined by a module not included in the server configuration
                                                           [FAILED]

Yeah status.conf has

    Order deny,allow
    Deny from all
    Allow from 127.0.0.1 ::1

We’ll try to replace that with this:

Require host 127.0.0.1 ::1

Now it runs through all the configuration OK but doesn’t actually start. I had set up an init.d script and I wasn’t going to go into this but I may have to:

$ sudo service apache24 start

httpd (pid 30896) already running

Remember I am trying to run this while still running the old apache 2.2 server. Process 30896 is the old apache 2.2:

root     30896     1  0 10:05 ?        00:00:00 /usr/sbin/httpd -d /etc/apache2 -f apache2.conf

This results from the byzantine way I set up to launch apache. There is a /etc/sysconfig/apache24 which doesn’t do much other than import environment variable definitions from /etc/apache24/envvars, except I had forgotten to update that path so it pointed to the old /etc/apache2/envvars.

Now it starts! But not without complaint:

Starting apache24: [Thu Aug 06 11:18:04.711658 2015] [core:warn] [pid 22911] AH00117: Ignoring deprecated use of DefaultType in line 178 of /etc/apache24/apache24.conf.
                                                              [  OK  ]

That stems from this line which tries to establish a default MIME type:

DefaultType text/plain

I also notice I cannot really get the status of my new web server:

$ sudo service apache24 status

httpd dead but subsys locked

So stopping/starting doesn’t really work either once it’s started.

What I found is that it seems happier if I have a line in /etc/sysconfig/apache24 which has an explicit PIDFILE defined – I use PIDFILE=/var/run/apache24.pid – with the same filepath as is mentioned in the apache24.conf file, where I have PidFile ${APACHE_PID_FILE} where APACHE_PID_FILE is taken from my envvars and has the value /var/run/apache24.pid. OK, my setup is very convoluted and probably unique. But the problem is common on CentOS so the main takeway is to have consistent reference to the pidfile filepath in /etc/sysconfig/httpd or whatever you are calling it as in your main config file httpd.conf or whatever you are calling it.

Home page test (I’m running on port 1443 to avoid conflict with my production server):

$ curl −i −k https://127.0.0.1:1443/

HTTP/1.1 301 Moved Permanently
Date: Wed, 05 Aug 2015 18:33:19 GMT
Server: Apache/2
X-Powered-By: PHP/5.4.43
Location: https://drjohnstechtalk.com/blog/
Content-Length: 2
Content-Type: text/html

So that looks pretty good.

A simple php test:

$ curl −i −k https://127.0.0.1:1443/phpinfo.php

Long output. Basically looks right.

OK. What about the opening WordPress page?

$ curl −i −H ‘Host: drjohnstechtalk.com’ −k https://127.0.0.1:1443/blog/

Yes. Big long output. Looks good. I don’t think this proves that the mySQL/php interface is really working however as that page could be cached since I use a pagecache plugin.

Next test I’d like to run is the Qualys SSLLabs test, but it won’t run on port 1443. Maybe the DigiCERT test will. Yes, it does allow it. And I no longer have the BREACH vulnerability.

A few words about a BREACH test
This prompted me to look at why Digicert felt I was vulnerable to BREACH in the first place. I thnk it’s related to serving compressed objects. So I thought of this simple test. Against my apache 2.2 I can run a query like this:

$ curl −i −k −−compress https://127.0.0.1:1443/blog/|head −10

Date: Fri, 07 Aug 2015 14:02:48 GMT
Server: Apache/2
X-Powered-By: PHP/5.3.3
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 30414
Content-Type: text/html
 
<!DOCTYPE html>
<html lang="en-US">

See that Content-Encoding: gzip? Yet the actual content that begins <!DOCTYPE html… is in plain text and plainly not compressed. So I really wasn’t vulnerable to BREACH at all. The server claimed to be compressing the pages it was sendnig to the browser but in reality it wasn’t. For apache 2.4 the behaviour is basically the same except there is no response header Content-Encoding: gzip returned. This is why it passes Digicert’s BREACH test with flying colors.

Moving on
Next test. Swap apache 2.2 for apache 2.4 by changing listening ports 443 for 1443. Then do the SSLlabs test. I now get an A. well, actually I get an A both before and after the swap.

WordPress test
I’m writing this using my new shiny apache 2.4. With regards to WordPress it all seems to feel the same as before. One small thing I’ve noticed is that I don’t get WordPress news any longer:

RSS Error: WP HTTP Error: There are no HTTP transports available which can complete the requested request.

Hopefully there’s nothing more serious.

php.ini missing
If you blindly copied my config options for compiling php then sooner or later (much later in my case) you’ll realize that you have no valid php.ini file! You will see an error like this when the date() function is called:

Warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected the timezone 'UTC' for now, but please set date.timezone to select your timezone. in

So because I used the config option –with-config-fil
e-path=/usr/local/apache24/php5 I needed to put a php.ini file in that directory and only that directory. For now its contents are:

; DrJ, inspired by http://stackoverflow.com/questions/2184513/php-change-the-maximum-upload-file-size - 12/31/14
; Maximum allowed size for uploaded files.
upload_max_filesize = 10M
 
; Must be greater than or equal to upload_max_filesize
post_max_size = 10M
 
; You'll need this to avoid errors with the Date function
; http://stackoverflow.com/questions/16765158/date-it-is-not-safe-to-rely-on-the-systems-timezone-settings
[Date]
; Defines the default timezone used by the date functions
; http://php.net/date.timezone
date.timezone = America/New_York

Appendix A
mod_ssl error after patching

I have an apache 2.2.21 server on a SLES server. After a system patch (I guess) I realized the apache web server wouldn’t start. It shows this error:

> sudo service apache201 start

Starting httpd (/usr/local/apache2/bin/httpd) httpd: Syntax error on line 54 of /usr/local/apache201/conf/httpd.conf: Cannot load /usr/local/apache201/modules/mod_ssl.so into server: /usr/local/apache201/modules/mod_ssl.so: undefined symbol: ap_map_http_request_error

I had been playing fast and loose and I borrowed the mod_ssl.so from some other system, I guess. I forget which. In other words, I dropped in by hand a mod_ssl.so into the directory /usr/lib64/apache2-prefork. I was using those system-supplied modules paired with my compiled apache. All fine until that patch. So I found another mod_ssl.so frm a different system and tried that one. It worked. Whew. These were both SLES 11 SP 4 systems. The older one (with the mod_ssl.so that still works) is dated April 18th, 2017. The one with the broken mod_ssl.so Dec 29thth 2017. That’s from a uname -a.

References and related articles
A proper guide to installing apache 2.4 on CentOS is https://jasonpowell42.wordpress.com/2013/04/05/install-apache-2-4-4-on-centos-6-4/

Some upgrade issues are covered by apache’s own guide: http://httpd.apache.org/docs/2.4/upgrading.html

Scaling up apache to handle more than a couple hundred simultaneous requests is described in this blog post.

The DigiCERT certificate inspector tool, which is what I was referring to in this post when it comes to scanning for BREACH vulnerabilities, is here.

Categories
Admin Web Site Technologies

A day in the life of an IT Specialist

Intro
I’m not saying every day is like this, and I’m compressing several days into one narrative, but you’ll quickly get the idea and see the difficulties we face. As I like to joke this is why we make the medium bucks.

The single remaining guy responsible for the in-house application environment has finally convinced the powers that be to upgrade IBM WebSphere from a five-year-old version to version 8.5. We traditionally use a web server front-end which I have traditionally supported. So I get tapped to figure out what to do for new web servers.

I get three enormous zip files from him and nothing else.

I happen upon a documentation file containing a link to an IBM web site and not much else. I go there. The installation mentions using IBM Installation Manager. Never heard of it. I ask the guy for that.

Get it and unpack. Try to find documentation on how to install the Installation Manager and none seems to exist. Isn’t that ironic?

I wing it and try to run a file with the promising name of install:

$ sudo ./install

 sudo ./install
00:02.01 ERROR [main] org.eclipse.equinox.log.internal.ExtendedLogReaderServiceFactory safeLogged
  Application error
  org.eclipse.swt.SWTError: No more handles [gtk_init_check() failed]
  org.eclipse.swt.SWTError: No more handles [gtk_init_check() failed]
    at org.eclipse.swt.SWT.error(SWT.java:4387)
    at org.eclipse.swt.widgets.Display.createDisplay(Display.java:913)
    at org.eclipse.swt.widgets.Display.create(Display.java:899)
    at org.eclipse.swt.graphics.Device.<init>(Device.java:156)
    ...
Install:
An error has occurred. See the log file
/tmp/IBMinstall/configuration/1420812667336.log.

The logfile referred to contains this “helpful” information:

!SESSION 2015-01-09 09:11:05.439 -----------------------------------------------
eclipse.buildId=unknown
java.version=1.6.0_24
java.vendor=Sun Microsystems Inc.
BootLoader constants: OS=solaris, ARCH=sparc, WS=gtk, NL=en
Framework arguments:  -toolId install -accessRights admin input @osgi.install.area/install.xml
Command-line arguments:  -os solaris -ws gtk -arch sparc -toolId install -accessRights admin input @osgi.install.area/insta
ll.xml
 
!ENTRY org.eclipse.osgi 4 0 2015-01-09 09:11:12.346
!MESSAGE Application error
!STACK 1
org.eclipse.swt.SWTError: No more handles [gtk_init_check() failed]
        at org.eclipse.swt.SWT.error(SWT.java:4387)
        at org.eclipse.swt.widgets.Display.createDisplay(Display.java:913)
        at org.eclipse.swt.widgets.Display.create(Display.java:899)
        at org.eclipse.swt.graphics.Device.<init>(Device.java:156)
        at org.eclipse.swt.widgets.Display.<init>(Display.java:497)
        at org.eclipse.swt.widgets.Display.<init>(Display.java:488)
        at org.eclipse.ui.internal.Workbench.createDisplay(Workbench.java:669)
        at org.eclipse.ui.PlatformUI.createDisplay(PlatformUI.java:161)
        at com.ibm.cic.agent.internal.ui.AgentUIApplication.initDisplay(AgentUIApplication.java:140)
        at com.ibm.cic.agent.internal.ui.AgentUIApplication.launch(AgentUIApplication.java:162)
        at com.ibm.cic.agent.internal.ui.AgentUIApplication.start(AgentUIApplication.java:64)
        at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
        at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
        at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
        at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:353)
        at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:180)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:629)
        at org.eclipse.equinox.launcher.Main.basicRun(Main.java:584)
        at org.eclipse.equinox.launcher.Main.run(Main.java:1438)
        at org.eclipse.equinox.launcher.Main.main(Main.java:1414)

The references to Display hint that my display is goofed up. Which it is. I have no X display.

So I have to export the DISPLAY to another utility server where I can run vncserver.

Oops. That server was rebooted and so there is no vncserver currently running. I launch that:

$ vncserver :2

Now I can connect to it from my desktop using the VNC client, fire up an xterm and allow others to export their displays to it:

$ xhost +

Now I go back to Solaris and set my DISPLAY environment variable:

$ export DISPLAY=vncserver_name:2

And re-install. This time it comes up. The screen dialogs are very sluggish but very simple. I get it going just before 9:30 AM. The status bar creeps over to the right veerrrry slowly. at 10 AM it is finally done – for a package of size 297 MB! But I can do other work in the meantime. Hey, they can’t do backups any longer on a firewalled subnet. may be a problem with resolving the backup server’s name in this domain. Can I look into it? Yes, the domain name is missing when I query the authoritative nameservers. The guy next to me, I happen to know, is the administrator of this special domain. I ask him to look into it.

Meanwhile I unzip disk 1 of the WAS 8.5 download and hunt for the documentation. I find it in readme_plugins/en/readme_en.html. It doesn’t have much, just a few links to IBM web sites. After a few wrong leads I decide there is no direct link. I want to install the plugin file. So I have to interact with the online documentation a bit to get what I want. The documentation is thorough to the point of being bloated and effectively masks whatever it is you actually need out of it. I think I am getting close now after about 15 clicks and skimming loads of crap. The bread crumb trail looks like this so far:

WebSphere Application Server Network Deployment 8.5.5
Network Deployment (Distributed Operating Systmes), Version 8.5
Setting up intermediary services (who knew?)
Implementing a web server plugin
Installing and configuring web server plugins
Installing and uninstalling the Web Server Plug-ins on distributed operating systems

I’m still not sure I’ve struck meat yet. I just feel I am getting close now! No, actually there is another level:

Installing the Web Server plugins using the GUI

From this document, which actually contains some useful information, I get the imp[ression that I may need a repository set up, whatever that is.

I find and launch the IBM Installation Manager regardless to see what it does. I found its path as /opt/IBM/InstallationManager/eclipse/IBMIM. Click on the Install option and sure enough it complains I have no repository setup. It offers a link to do that.

After some futzing it seems to lead me to click on a repository config file in /opt/IBM/InstallationManager/eclipse/repository.config. But that may be a fools errand because when I re-launch it says the repository is not connected. Huh?

So then I try to specify a URL as repository, but to connect that I need an IBM username/password which i don’t have. I ask my colleague for one.

Meanwhile I re-examine the unzipped 1 of 3 zip file for WAS 8.5 and I see a repsitory.config file there! So after some fumbling with the slow and awkward Installation Manager GUI I manage to indicate that as my repository config file and delete the original one I had configured. This looks promising. Now I see an option to select IBM WebServer plugins. Looking good.

Interruption. You know that SHA2 certificate you got last year? We don’t think it’s really gong to work and can you get an SHA1 one instead? I am doubtful at this late stage but I promise to ask my contacts and fire off some emails.

The installation needs disk2 so I have unzip that one; then disk3. Now I’m out of space and move things around before unzipping that one. I am soon able to hit the Install button and seven minutes later the 389 MB package is installed.

I see it hasn’t asked me which web server I use and where it is and all that. So clearly I need some more steps. Rummaging around I come across /opt/IBM/WebSphere8.5/Plugins/bin/ConfigureApachePlugin.sh, which sounds pretty promising.

I run that and see there are a bunch of switches I have to provide values for. No problem. I get those and it runs. I examine what it has done to my config file and it looks partially promising and partially puzzling. It relies on an environment variable which I don’t think it has defined.

I stop the server and it already complains about that very thing:

httpd: Syntax error on line 344 of /usr/local/apache203/conf/httpd.conf: Syntax error on line 183 of /usr/local/apache203/conf/vhosts/secure-siteinfo.conf: Cannot load /usr/local/apache203/${WAS_PLUGIN_DRIVER} into server: ld.so.1: httpd: fatal: /usr/local/apache203/${WAS_PLUGIN_DRIVER}: open failed: No such file or directory

I define that variable. And try to stop it again. The next error kind of scares me:

httpd: Syntax error on line 344 of /usr/local/apache203/conf/httpd.conf: Syntax error on line 183 of /usr/local/apache203/conf/vhosts/secure-siteinfo.conf: Cannot load /opt/IBM/WebSphere8.5/Plugins/bin/64bits/mod_was_ap22_http.so into server: ld.so.1: httpd: fatal: /opt/IBM/WebSphere8.5/Plugins/bin/64bits/mod_was_ap22_http.so: wrong ELF class: ELFCLASS64

To me that hints I may have the wrong architecture installed. I run some control tests:

$ file /opt/IBM/WebSphere8.5/Plugins/bin/64bits/mod_was_ap22_http.so

/opt/IBM/WebSphere8.5/Plugins/bin/64bits/mod_was_ap22_http.so:  ELF 64-bit MSB dynamic lib SPARCV9 Version 1, dynamically linked, not stripped

and now compared to my apache binary:

$ file /usr/local/apache2/bin/httpd

/usr/local/apache2/bin/httpd:   ELF 32-bit MSB executable SPARC Version 1, dynamically linked, not stripped

I check with the system administrator if he had ever provided me a 64-bit apahce package for Solaris. After some checking we realize that Solaris 10 does provide an apache package but it is 32-bit.

I have an idea. I can simply change the path to the shared object file in my environment definition:

export WAS_PLUGIN_DRIVER=/opt/IBM/WebSphere8.5/Plugins/bin/32bits/mod_was_ap22_http.so

I had originally specified 64bits. Maybe this will be compatible. My first thought is that I installed the wrong package and would have to ask for a different download.

Yess! It now stops. And it starts. And I can access its homepage.

Now go into its config and change its home page to the same as used by the Sun Java System web server.

Find a page that actually calls out to WebSphere by examining the log files and grepping for js (just a hunch). I find something. Try to reproduce it with curl on the real web server and I get a not found. Hmm. Work harder to match up the host header to the vhosts mentioned in the plugin config file. Specifying the right host it gives me a redirect and sets some cookies. I know the web server isn’t programmed to do that so I must have reached the back-end WebSphere app server and now I have something to test with. Test against the port running apache with this WAS config file and it produces the same result! A redirect and some cookies. Great. The hardest part is over. Now a control. We’ll remove the plugin config line in the apache config and re-try it. Yup. 404 not found. We really are communicating to the app server.

No way I am going to go through that pain for each and every server where this is needed. I’ll just tar up the needed files and untar them on any server where this is needed.

But I wonder if I should use the provided apache instead.

Interruption. We’ve received a corrupt pdf file in email two months ago. The vendor is mad at us because we are the only ones with this problem. Could our systems have corrupted an attachment? This is kind of an interesting question and deserves some rumination. The quick reaction is no we don’t do that. But years of experience tell me that exceptions abound. I open the attachment. Yup, corrupted. I save the file in an effort to examine the bytes. Then I see it has 0 length, That’s peculiar. I’ve never seen that around here. Then I think to check our mail server log files two months back for their record. I quickly find it and see that its size was reported as 34000 bytes. That strikes me as kind of large for a message with no attachment, but kind of small for a pdf attachment. I share my results with the requester.

Answer: they can still issue an SHA1 CERT. But probably only one which has a year’s duration. I tell the customer for this certificate that all is not rosy as they will probably use an obscure CA which is not accepted by all his customers, so there is no way out without experiencing some pain here.

Unix admin tells me they’re now getting alerts about running out of disk space on the filesystem and system where I put my WebSphere installation downloads. I move another one of those puppies (1 GB in size) to /tmp.

Categories
Admin Apache Security SLES Web Site Technologies

RSA Web Agent Installation: what might go wrong

Intro
As usual I ran into a few problems installing the RSA Web agent for a client. With this documentation I hope to jog my memory for my next installation or help someone else out who is experiencing the same problems.

The details
I was installing it on on SLES 11 system, Web Agent version 7.1.

So I ran the CD/install program as root and went through the prompts for the initial setup. I tried to laucnh firefox at the end, which it couldn’t, but I don’t think that is significant. I start up the web server. The error.log file begins to fill up! It looks like this:

acestatus: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file or directory
rpc_server 2389 started by 2379
RSALogoffCookieService: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file o
r directory
AceShutdown try to kill process 2389
signal 15 received
acestatus: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file or directory
RSALogoffCookieService: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file o
r directory
start child 2403
[Mon Aug 18 16:17:55 2014] [notice] Apache/2.2.27 (Unix) mod_rsawebagent/7.1.0[639] DAV/2 PHP/5.2.14 with Suhosin-Patch con
figured -- resuming normal operations
Cannot register service: RPC: Authentication error; why = Client credential too weak
unable to register (300760, 1).child 2403 end
start child 2409
Cannot register service: RPC: Authentication error; why = Client credential too weak
unable to register (300760, 1).child 2409 end
start child 2410
Cannot register service: RPC: Authentication error; why = Client credential too weak
unable to register (300760, 1).child 2410 end
start child 2411
Cannot register service: RPC: Authentication error; why = Client credential too weak
unable to register (300760, 1).child 2411 end
start child 2412
Cannot register service: RPC: Authentication error; why = Client credential too weak
unable to register (300760, 1).child 2412 end
start child 2413
...

Not good.

So I eventually realize that my web server is running as user wwwrun and the RSA web agent stuff I installed as root and its directory, rsawebagent, is owned by userid 40959 – there was no attempt by the installer to match that up to the user the web server runs as. So I try a fix by hand like this:

$ chown -R wwwrun rsawebagent

Success! That succeeds in getting rid of the repeating RPC error. Now the error.log file has only a modest level of errors:

acestatus: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file or directory
rpc_server 27766 started by 27756
RSALogoffCookieService: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file or directory
AceShutdown try to kill process 27766
signal 15 received
acestatus: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file or directory
RSALogoffCookieService: error while loading shared libraries: libaceclnt.so: cannot open shared object file: No such file or directory
start child 27780
[Mon Aug 18 16:25:00 2014] [notice] Apache/2.2.27 (Unix) mod_rsawebagent/7.1.0[639] DAV/2 PHP/5.2.14 with Suhosin-Patch configured -- resuming normal operations

But the thing is, it actually, mostly kind of, seems to work. You see a promising Authentication Succeeded screen in your browser after logging in to the home page. But then it directs you back to the RSA login screen. I was actually stuck on this point for a long time.

The error.log file also looks encouraging at this point:

[Mon Aug 18 16:27:28 2014] [notice] Authentication succeeded User: drj.

My insight today is to tackle the libaceclnt.so problem. I actually ran a trace of the startup to see where it was looking for that file so I could put it there. It was looking in system directories like these:

[pid 31974] open("/usr/lib64/tls/x86_64/libaceclnt.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 31974] stat("/usr/lib64/tls/x86_64", 0x7fff93b721b0) = -1 ENOENT (No such file or directory)
[pid 31974] open("/usr/lib64/tls/libaceclnt.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 31974] stat("/usr/lib64/tls", 0x7fff93b721b0) = -1 ENOENT (No such file or directory)
[pid 31974] open("/usr/lib64/x86_64/libaceclnt.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 31974] stat("/usr/lib64/x86_64", 0x7fff93b721b0) = -1 ENOENT (No such file or directory)
[pid 31974] open("/usr/lib64/libaceclnt.so", O_RDONLY) = -1 ENOENT (No such file or directory)
...

So I decided to make a soft link to it from /usr/lib64 such that:

 libaceclnt.so -> /usr/local/apache202/rsawebagent/libaceclnt.so

Note that my ServerRoot was /usr/local/apache202.

Now when I start up my apache202 instance I have this in error.log:

rpc_server 28874 started by 28860
grep RSALogoffCookieService /proc/*/cmdline | sed 's/\/cmdline.*\/proc\// /g' | sed 's/\/cmdline.*/ /'  | sed 's/.*\/proc\// /' | sort -u
start child 28877
grep RSALogoffCookieService /proc/*/cmdline | sed 's/\/cmdline.*\/proc\// /g' | sed 's/\/cmdline.*/ /'  | sed 's/.*\/proc\// /' | sort -u
AceShutdown try to kill process 28874
signal 15 received
grep RSALogoffCookieService /proc/*/cmdline | sed 's/\/cmdline.*\/proc\// /g' | sed 's/\/cmdline.*/ /'  | sed 's/.*\/proc\// /' | sort -u
start child 28913
[Mon Aug 18 16:36:23 2014] [notice] Apache/2.2.27 (Unix) mod_rsawebagent/7.1.0[639] DAV/2 PHP/5.2.14 with Suhosin-Patch configured -- resuming normal operations

And best of all – it actually works!

I get the RSA authentication page initially. I log on and get redirected to the actual server home page. The access.log file records my username in the access line.

Additional error observed months later
You know that symptom I described above? You see a promising Authentication Succeeded screen in your browser after logging in to the home page. But then it directs you back to the RSA login screen. My web server had been running fine for over a month when all of a sudden it behaved that way again. Confounding. So I put on my big boy pants and did an strace. Nothing popped out at me, but I was struck by frequent access to an htdocs filepath. What’s so unusual about that? I don’t use htdocs in my configurations! So where was that coming from? I re-checked my configuration. OK, this is embarrassing. I have a sweeping include statement in my top-level httpd.conf file:

# pick up all vhosts
Include conf/vhosts/*.conf

It seemed like a good idea at the time. In my conf/vhosts directory I actually had two conf files, my rsaauth.conf but also a dflt.conf!! And the dflt.conf had the references to htdocs, but no references to the RSA authentication. So it was being used to establish the location of the home directory and the other conf file to fix the authentication type, I guess.

I removed the dflt.conf file, restarted and everything began to work once again. Whew!

RPC errors returned after a few months
After a year or so of running the RPC errors mentioned above returned and I never could figure out why and I no longer needed this service so I didn’t pursue it.

Conclusion
A few errors were observed installing RSA Web Agent v 7.1 on SLES Linux. I had had similar problems on Redhat as well. I finally found some solutions and now they’re ready to use it!

References
This write-up is partially related to my blog post of installing multiple apache instances.