An F5 BigIP load balancer equipped with a web application firewall worked for everyone except one app used by one customer. What was going wrong?
I always do a packet trace when there is nothing else to go on, as is so often the case these days. Packet traces themselves are getting increasingly complex, what with encrypted communications, multiple connections, etc. In this case there seemed to be a single TCP connection of concern, but it was encrypted (SSL traffic on TCP port 443).
Well, I have access to the private key so I figured out how to insert that into Wireshark so it could decrypt the packets. Pretty cool – I’ve never done that before.
So the communication got a lot further than I had expected. What I had expected to learn is that there was an incompatibility between supported ciphers of the client and the server such that there were no overlapping ciphers. But no! That was not the case at all – the packet trace got well beyond that early stage of packet exchanges between client and server.
In fact the client got so far that it sent these (encrypted) HTTP headers, which I was able to decrypt with Wireshark and the server’s private key:
POST /cgi-bin/java/JHAutomation.do?perform=login HTTP/1.0
Content-type: application/x-www-form-urlencoded
Content-length: 1816
host: drjohnstechtalk.com:443

loginRequest=%3C%3Fxml+version+%3D+'1.0'+encoding...
Then right after that I saw the F5 BigIP device send the client a TCP reset (RST) as though it was unhappy about something and wanted to end it right there!
So, still stuck, I searched and found that you can enable logging of the reason for TCP RSTs on F5 BigIPs:
Enable RST logging
To enable reset logging to the ltm log:
(tmos)# modify /sys db tm.rstcause.log value enable
And this is the error that was logged:
Jun 16 08:09:19 local/tmm err tmm: 01230140:3: RST sent from 126.96.36.199:443 to 188.8.131.52:56985, [0x11d17ec:1804] No available pool member
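If there are many of these lines to sift through, the reason text can be pulled out with a short script. This is only a sketch in Python: the pattern is inferred from the single sample line above, so other BigIP versions may well format the message differently, and the function name parse_rst_cause is made up for illustration.

```python
import re

# Pattern inferred from the one sample line in this post (assumption):
# other BigIP versions may log the RST cause in a different layout.
RST_LINE = re.compile(
    r"RST sent from (?P<src>\S+):(?P<src_port>\d+) "
    r"to (?P<dst>\S+):(?P<dst_port>\d+), "
    r"\[(?P<code>[^\]]+)\] (?P<reason>.+)$"
)


def parse_rst_cause(line):
    """Return the fields of an RST-cause log line as a dict, or None if no match."""
    match = RST_LINE.search(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["src_port"] = int(fields["src_port"])
    fields["dst_port"] = int(fields["dst_port"])
    return fields


sample = (
    "Jun 16 08:09:19 local/tmm err tmm: 01230140:3: RST sent from "
    "126.96.36.199:443 to 188.8.131.52:56985, [0x11d17ec:1804] "
    "No available pool member"
)
print(parse_rst_cause(sample)["reason"])
```

Grouping the parsed lines by reason would quickly show whether "No available pool member" is limited to one client or is more widespread.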
It’s just a hint of what was wrong, but it was enough to jog my memory. A WAF (web application firewall) policy exists that matches based on hostname and otherwise exits. The hostname entered is drjohnstechtalk.com. Well, clearly that match is pretty darn literal. When we put the same URL into our browser or into curl we could not reproduce the error. But those clients produce a host header with value drjohnstechtalk.com, not drjohnstechtalk.com:443.
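Since browsers and curl leave the default port out of the host header, reproducing the problem takes a client that puts it there on purpose. Here is a rough sketch in Python that builds a request shaped like the one in the trace. The function names build_request and send_request and the body loginRequest=test are placeholders of my own, not what the customer's client actually sends.

```python
import socket
import ssl


def build_request(host, port, path, body):
    """Build a raw HTTP/1.0 POST whose host header carries an explicit port."""
    payload = body.encode("ascii")
    lines = [
        f"POST {path} HTTP/1.0",
        "Content-type: application/x-www-form-urlencoded",
        f"Content-length: {len(payload)}",
        f"host: {host}:{port}",  # the explicit port is the detail under test
        "",
        "",
    ]
    # Joining with CRLF leaves a blank line between the headers and the body.
    return "\r\n".join(lines).encode("ascii") + payload


def send_request(host, port, request):
    """Send the raw request over TLS and return the first bytes of the reply."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(request)
            return tls.recv(4096)


request = build_request(
    "drjohnstechtalk.com",
    443,
    "/cgi-bin/java/JHAutomation.do?perform=login",
    "loginRequest=test",  # placeholder body (assumption)
)
print(request.decode("ascii"))
```

Calling send_request("drjohnstechtalk.com", 443, request) against an affected virtual server would be expected to fail with a connection reset rather than return a response, which is the same symptom the customer saw.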
So their client, a strange Java-based client, put :443 in the host header, and that did not match the hostname in the WAF policy! So no pool was selected and the fall-through rule was executed, resulting in a TCP RST to the client!
I added an additional host header value to the match: drjohnstechtalk.com:443.
They tested and it worked!
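Adding a second literal value fixed it here. The more general idea is to normalize the host header before comparing it, so that a default port does not matter. The sketch below is plain Python and only illustrates that idea. It is not how the F5 policy is configured, and the names normalize_host and DEFAULT_PORTS are made up for the example.

```python
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_host(host_header, scheme="https"):
    """Lowercase a host header value and drop the port if it is the scheme default."""
    host = host_header.strip().lower()
    name, sep, port = host.rpartition(":")
    # Treat the suffix as a port only when it is purely numeric; a bracketed
    # IPv6 literal such as "[::1]" ends in "]" and is left untouched.
    if sep and name and port.isdecimal() and int(port) == DEFAULT_PORTS.get(scheme):
        return name
    return host


policy_host = "drjohnstechtalk.com"
for header in ("drjohnstechtalk.com", "drjohnstechtalk.com:443", "drjohnstechtalk.com:8443"):
    literal = header == policy_host
    normalized = normalize_host(header) == policy_host
    print(f"{header}: literal match={literal}, normalized match={normalized}")
```

With a literal comparison only the first header matches. After normalization the first two match, and the non-default port 8443 is still treated as a different host.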
A mysterious TCP RST sent from an F5 load balancer to just one client is explained in great detail. I picked up some valuable networking techniques in the process, namely how to decrypt an SSL-encrypted packet trace and how to log the reason for a TCP RST on an F5 BigIP.
Could I stand on my high horse, complain that they were sending a non-standard header and tell them to go fix their stupid client? Well, that might have felt satisfying, but when I looked at the HTTP standard I found that it does permit the port number to be present in that form! So I was the one in the wrong, even from a protocol standpoint.
References and related
F5’s SOL 13223 describing enabling logging for TCP RST packets https://support.f5.com/kb/en-us/solutions/public/13000/200/sol13223.html
HTTP Request header fields are described in this Wikipedia article.