Don’t ask me why anyone would willingly run IBM AIX, but it happens. And when they do, watch out for network punishment. We dealt with such a case, unfortunately, and we ran into a serious, somewhat obscure network issue and figured out the solution (we think). Maybe someone else can learn from this painful experience. Or maybe we’ll completely forget what we ourselves have done two years from now and find ourselves stepping on the same rake.
So this new AIX server was configured to run an very old application, WebMethods, that makes a lot of database connections as well as connections to external partners for document exchange.
This had been working fine on the old AIX server, but we switched to newer hardware. As much as possible the old configurations were used. Yet this new server just couldn’t keep up with the load. Its queue started building up, connections to the database climbed into the hundreds, and then it just seemed like it was doing nothing at all.
Someone lent me root access so I can join the debugging party. What, no bash shell! Not even a properly configured korn shell. And everything’s just a little different on AIX – nothing is quite how you are accustomed to it. But at least it has tcpdump. I guess they also have their own AIXish utility as well, but I never bothered with that. tcpdump seemed to work. So I quickly began to get a feel from what the application folks were saying about their transfers which weren’t going outbound, and only slowly going inbound. They used port 5443 on one of the interfaces, en5:
# tcpdump -i en5 -n port 5443
And, true, not much was going on.
This went on for a day and things were looking desperate – to the point where we decided to go back to the old hardware! But we never stopped thinking.
Check the traffic to the Oracle database:
# tcpdump -i en0 -n port 1521
No, not that much either.
Try to check system logs, but who knows where those are? The ones I found had absolutely nothing of interest.
Being a DNS guy, I decided to check for DNS traffic:
# tcpdump -i en0 -n udp
(everything else uses tcp so I could get away with this).
Now DNS turned out to be quite chatty – around a dozen entries per second. And a lot of repetition. And a lot of IPv6 queries, labelled as AAAA?. I didn’t like it.
And this jogged my memory. I remembered encountering these IPv6 queries and wanting to turn them off on the old AIX servers. But how to do that??
As in all things, Google (actually DuckDuckGo) is your friend. You modify the /etc/netsvc.conf file. You need an entry like this:
hosts = local4, bind4
To be continued…