Categories
Admin Network Technologies

The IT Detective Agency: the case of the mysterious reset

Intro
An F5 BigIP load balancer equipped with web application firewall worked for everyone, except one app used by one customer. What was going wrong?


Packet trace

I always do a packet trace when there is nothing else to go on, as is so often the case these days. Packet traces themselves are getting increasingly complex, what with encrypted communications and multiple connections, etc. In this case there seemed to be a single TCP connection which was of concern, but it was encrypted (SSL traffic on tcp port 443).

Well, I have access to the private key so I figured out how to insert that into Wireshark so it could decrypt the packets. Pretty cool – I’ve never done that before.

So the communication got a lot further than I had expected. What I had expected to learn is that there was an incompatibility between supported ciphers of the client and the server such that there were no overlapping ciphers. But no! That was not the case at all – the packet trace got well beyond that early stage of packet exchanges between client and server.

In fact the client got so far that it sent these (encrypted) HTTP headers, which I was able to decrypt with Wireshark and the server’s private key:

POST /cgi-bin/java/JHAutomation.do?perform=login HTTP/1.0
Content-type: application/x-www-form-urlencoded
Content-length: 1816
host: drjohnstechtalk.com:443
 
loginRequest=%3C%3Fxml+version+%3D+'1.0'+encoding...

Then right after that I saw the F5 BigIP device send the client a TCP reset (RST) as though it was unhappy about something and wanted to end it right there!

So, still stuck, I searched and found that you can enable logging of the reason for TCP RSTs on F5 BigIPs:

Enable RST logging
To enable reset logging to the ltm log:
# tmsh
(tmos)# modify /sys db tm.rstcause.log value enable

And this is the error that was logged:

Logged error

Jun 16 08:09:19 local/tmm err tmm[5072]: 01230140:3: RST sent from 8.29.2.75:443 to 50.17.188.196:56985, [0x11d17ec:1804] No available pool member

It’s just a hint of what was wrong, but it was enough to jog my memory. A WAF (web application firewall) policy exists that matches on hostname and otherwise falls through. The hostname entered in the policy is drjohnstechtalk.com. Well, clearly that match is pretty darn literal. When we put the same URL into our browser or into curl we could not reproduce the error. But those clients send a host header with the value drjohnstechtalk.com, not drjohnstechtalk.com:443.

So their client, a strange Java-based client, added the :443 to the host header, and that was not matching the host header match in the WAF policy! So no pool was selected and the fall-through rule was executed, resulting in a TCP RST to the client!
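The mismatch is easy to model. Here is a hypothetical Python sketch (the WAF’s actual matching code is not visible to us) of why a literal hostname comparison fails on the Java client’s header, and what normalizing the Host value would look like:

```python
# Hypothetical sketch, not the actual F5 WAF logic: a literal hostname match
# misses the "host:port" form unless the port is stripped first.
def host_matches(host_header, expected="drjohnstechtalk.com"):
    """Naive literal comparison, as the WAF policy effectively did."""
    return host_header.lower() == expected

def normalized_host(host_header):
    """Strip an optional :port suffix, which HTTP permits in the Host header."""
    return host_header.lower().split(":", 1)[0]

print(host_matches("drjohnstechtalk.com"))        # True: browser and curl pass
print(host_matches("drjohnstechtalk.com:443"))    # False: the Java client falls through
print(normalized_host("drjohnstechtalk.com:443"))  # drjohnstechtalk.com
```

Adding the second host header value, as described below, amounts to matching both forms rather than normalizing; either approach would have worked here.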

I added an additional host header value to match, drjohnstechtalk.com:443.

They tested and it worked!

Case closed.

Conclusion
A mysterious TCP RST sent from an F5 load balancer to just one client is explained in great detail. Some valuable networking techniques were learned in the process, namely, how to decrypt an encrypted SSL packet trace.

Analysis
Could I stand on my high horse, complain that they were sending a non-standard header and go fix their stupid client? Well, that might have felt satisfying, but when I looked at the HTTP standard it does permit the port number to be present in that form (Host = uri-host [ ":" port ])! So I was the one in the wrong, even from a protocol standpoint.

References and related
F5’s SOL 13223 describing enabling logging for TCP RST packets https://support.f5.com/kb/en-us/solutions/public/13000/200/sol13223.html
HTTP Request header fields are described in this Wikipedia article.

Categories
Admin Apache CentOS Network Technologies Security Web Site Technologies

Idea for free web server certificates: Let’s Encrypt

Intro
I’ve written various articles about SSL. I just came across a way to get your certificates for free, letsencrypt.org. But their thing is to automate certificate management. I think you have to set up the whole automated certificate management environment just to get one of their free certificates. So that’s a little unfortunate, but I may try it and write up my experience with it in this blog (Update: I did it!). Stay tuned.

Short duration certificates
I recently happened upon a site that uses one of these certificates and was surprised to see that it expires in 90 days. All the certificates I’ve ever bought are valid for at least a year, sometimes two or three. But Let’s Encrypt has a whole page justifying their short certificates which kind of makes sense. It forces you to adopt their automation processes for renewal because it would be too burdensome for site admins to constantly renew these certificates by hand the way they used to.

November 2016 update
Since posting this article I have worked with a hosting firm a little bit. I was surprised by how easily he could get a certificate for one of “my” domain names. Apparently all it took was that Let’s Encrypt could verify that he controlled the IP address which my domain name resolved to. That’s different from the usual way of verification, where the whois registration of the domain gets queried. That never happened here! I think by now the Let’s Encrypt CA, IdenTrust Commercial Root CA 1, is accepted by the major browsers.

Here’s a picture that shows one of these certificates which was just issued November, 2016 with its short expiration.

lets-encrypt-2016-11-22_15-03-39

My own experience in getting a certificate
I studied the ACME protocol a little bit. It’s complicated. Nothing’s easy these days! So you need a program to help you implement it. I went with acme.sh over Certbot because it is much more lightweight – works through bash shell. Certbot wanted to update about 40 packages on my system, which really seems like overkill.

I’m very excited about how easy it was to get my first certificate from letsencrypt! Worked first time. I made sure the account I ran this command from had write access to the HTMLroot (the “webroot”) because an authentication challenge occurs to prove that I administer that web server:

$ acme.sh --issue -d drjohnstechtalk.com -w /web/drj

[Wed Nov 30 08:55:54 EST 2016] Registering account
[Wed Nov 30 08:55:56 EST 2016] Registered
[Wed Nov 30 08:55:57 EST 2016] Update success.
[Wed Nov 30 08:55:57 EST 2016] Creating domain key
[Wed Nov 30 08:55:57 EST 2016] Single domain='drjohnstechtalk.com'
[Wed Nov 30 08:55:57 EST 2016] Getting domain auth token for each domain
[Wed Nov 30 08:55:57 EST 2016] Getting webroot for domain='drjohnstechtalk.com'
[Wed Nov 30 08:55:57 EST 2016] _w='/web/drj'
[Wed Nov 30 08:55:57 EST 2016] Getting new-authz for domain='drjohnstechtalk.com'
[Wed Nov 30 08:55:58 EST 2016] The new-authz request is ok.
[Wed Nov 30 08:55:58 EST 2016] Verifying:drjohnstechtalk.com
[Wed Nov 30 08:56:02 EST 2016] Success
[Wed Nov 30 08:56:02 EST 2016] Verify finished, start to sign.
[Wed Nov 30 08:56:03 EST 2016] Cert success.
-----BEGIN CERTIFICATE-----
MIIFCjCCA/KgAwIBAgISA8T7pQeg535pA45tryZv6M4cMA0GCSqGSIb3DQEBCwUA
MEoxCzAJBgNVBAYTAlVTMRYwFAYDVQQKEw1MZXQncyBFbmNyeXB0MSMwIQYDVQQD
ExpMZXQncyBFbmNyeXB0IEF1dGhvcml0eSBYMzAeFw0xNjExMzAxMjU2MDBaFw0x
NzAyMjgxMjU2MDBaMB4xHDAaBgNVBAMTE2Ryam9obnN0ZWNodGFsay5jb20wggEi
MA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQC1PScaoxACI0jhsgkNcbd51YzK
eVI/P/GuFO8VCTYvZAzxjGiDPfkEmYSYw5Ii/c9OHbeJs2Gj5b0tSph8YtQhnpgZ
c+3FGEOxw8mP52452oJEqrUldHI47olVPv+gnlqjQAMPbtMCCcAKf70KFc1MiMzr
2kpGmJzKFzOXmkgq8bv6ej0YSrLijNFLC7DoCpjV5IjjhE+DJm3q0fNM3BBvP94K
jyt4JSS1d5l9hBBIHk+Jjg8+ka1G7wSnqJVLgbRhEki1oh8HqH7JO87QhJA+4MZL
wqYvJdoundl8HahcknJ3ymAlFXQOriF23WaqjAQ0OHOCjodV+CTJGxpl/ninAgMB
AAGjggIUMIICEDAOBgNVHQ8BAf8EBAMCBaAwHQYDVR0lBBYwFAYIKwYBBQUHAwEG
CCsGAQUFBwMCMAwGA1UdEwEB/wQCMAAwHQYDVR0OBBYEFGaLNxVgpSFqgf5eFZCH
1B7qezB6MB8GA1UdIwQYMBaAFKhKamMEfd265tE5t6ZFZe/zqOyhMHAGCCsGAQUF
BwEBBGQwYjAvBggrBgEFBQcwAYYjaHR0cDovL29jc3AuaW50LXgzLmxldHNlbmNy
eXB0Lm9yZy8wLwYIKwYBBQUHMAKGI2h0dHA6Ly9jZXJ0LmludC14My5sZXRzZW5j
cnlwdC5vcmcvMB4GA1UdEQQXMBWCE2Ryam9obnN0ZWNodGFsay5jb20wgf4GA1Ud
IASB9jCB8zAIBgZngQwBAgEwgeYGCysGAQQBgt8TAQEBMIHWMCYGCCsGAQUFBwIB
FhpodHRwOi8vY3BzLmxldHNlbmNyeXB0Lm9yZzCBqwYIKwYBBQUHAgIwgZ4MgZtU
aGlzIENlcnRpZmljYXRlIG1heSBvbmx5IGJlIHJlbGllZCB1cG9uIGJ5IFJlbHlp
bmcgUGFydGllcyBhbmQgb25seSBpbiBhY2NvcmRhbmNlIHdpdGggdGhlIENlcnRp
ZmljYXRlIFBvbGljeSBmb3VuZCBhdCBodHRwczovL2xldHNlbmNyeXB0Lm9yZy9y
ZXBvc2l0b3J5LzANBgkqhkiG9w0BAQsFAAOCAQEAc4w4a+PFpZqpf+6IyrW31lj3
iiFIpWYrmg9sa79hu4rsTxsdUs4K9mOKuwjZ4XRfaxrRKYkb2Fb4O7QY0JN482+w
PslkPbTorotcfAhLxxJE5vTNQ5XZA4LydH1+kkNHDzbrAGFJYmXEu0EeAMlTRMUA
N1+whUECsWBdAfBoSROgSJIxZKr+agcImX9cm4ScYuWB8qGLK98RTpFmGJc5S52U
tQrSJrAFCoylqrOB67PXmxNxhPwGmvPQnsjuVQMvBqUeJMsZZbn7ZMKr7NFMwGD4
BTvUw6gjvN4lWvs82M0tRHbC5z3mALUk7UXrQqULG3uZTlnD7kA8C39ulwOSCQ==
-----END CERTIFICATE-----
[Wed Nov 30 08:56:03 EST 2016] Your cert is in  /home/drj/.acme.sh/drjohnstechtalk.com/drjohnstechtalk.com.cer
[Wed Nov 30 08:56:03 EST 2016] Your cert key is in  /home/drj/.acme.sh/drjohnstechtalk.com/drjohnstechtalk.com.key
[Wed Nov 30 08:56:04 EST 2016] The intermediate CA cert is in  /home/drj/.acme.sh/drjohnstechtalk.com/ca.cer
[Wed Nov 30 08:56:04 EST 2016] And the full chain certs is there:  /home/drj/.acme.sh/drjohnstechtalk.com/fullchain.cer

Behind the scenes the authentication resulted in these two accesses to my web server:

66.133.109.36 - - [30/Nov/2016:08:55:59 -0500] "GET /.well-known/acme-challenge/EJlPv9ar7lxvlegqsdlJvsmXMTyagbBsWrh1p-JoHS8 HTTP/1.1" 301 618 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
66.133.109.36 - - [30/Nov/2016:08:56:00 -0500] "GET /.well-known/acme-challenge/EJlPv9ar7lxvlegqsdlJvsmXMTyagbBsWrh1p-JoHS8 HTTP/1.1" 200 5725 "http://drjohnstechtalk.com/.well-known/acme-challenge/EJlPv9ar7lxvlegqsdlJvsmXMTyagbBsWrh1p-JoHS8" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "drjohnstechtalk.com"

The first was HTTP which I redirect to https while preserving the URL, hence the second request. You see now why I needed write access to the webroot of my web server.
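To make those mechanics concrete, here is a minimal Python sketch of what the webroot (http-01) method amounts to: the ACME client writes a token file under the webroot and the validation server fetches it back over HTTP. The token below is taken from the log above; the key-authorization content is a placeholder, as the real value includes the account key’s thumbprint.

```python
import os
import tempfile

# Sketch of the webroot challenge: the ACME client drops a token file under
# <webroot>/.well-known/acme-challenge/ for the validation server to fetch.
webroot = tempfile.mkdtemp()  # stand-in for a real webroot like /web/drj
token = "EJlPv9ar7lxvlegqsdlJvsmXMTyagbBsWrh1p-JoHS8"  # token from the log above
challenge_dir = os.path.join(webroot, ".well-known", "acme-challenge")
os.makedirs(challenge_dir)
key_auth = token + ".account-key-thumbprint-placeholder"  # placeholder content
with open(os.path.join(challenge_dir, token), "w") as f:
    f.write(key_auth)

# The validation server's GET of /.well-known/acme-challenge/<token>
# then amounts to reading this file back:
with open(os.path.join(challenge_dir, token)) as f:
    print(f.read() == key_auth)  # True
```

This is why the account running acme.sh needs write access to the webroot and nothing more.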

Refine our approach
In the end I decided to run as root in order to protect the private key from prying eyes. The command looked like this:

$ acme.sh --issue --force -d drjohnstechtalk.com -w /web/drj --reloadcmd "service apache24 reload" --certpath /etc/apache24/certs/drjohnstechtalk.crt --keypath /etc/apache24/certs/drjohnstechtalk.key --fullchainpath /etc/apache24/certs/fullchain.cer

A nice feature of acme.sh is that it remembers the parameters you’ve typed by hand and saves them into a single convenient configuration file. The contents of mine look like this:

Le_Domain='drjohnstechtalk.com'
Le_Alt='no'
Le_Webroot='/web/drj'
Le_PreHook=''
Le_PostHook=''
Le_RenewHook=''
Le_API='https://acme-v01.api.letsencrypt.org'
Le_Keylength=''
Le_LinkCert='https://acme-v01.api.letsencrypt.org/acme/cert/037fe5215bb5f4df6a0098fefd50b83b046b'
Le_LinkIssuer='https://acme-v01.api.letsencrypt.org/acme/issuer-cert'
Le_CertCreateTime='1480710570'
Le_CertCreateTimeStr='Fri Dec  2 20:29:30 UTC 2016'
Le_NextRenewTimeStr='Tue Jan 31 20:29:30 UTC 2017'
Le_NextRenewTime='1485808170'
Le_RealCertPath='/etc/apache24/certs/drjohnstechtalk.crt'
Le_RealCACertPath=''
Le_RealKeyPath='/etc/apache24/certs/drjohnstechtalk.key'
Le_ReloadCmd='service apache24 reload'
Le_RealFullChainPath='/etc/apache24/certs/fullchain.cer'
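The two epoch timestamps in that file also pin down the renewal cadence: the next renewal is scheduled about 60 days after issuance, a month before the 90-day certificate expires. Checking the arithmetic on the values above:

```python
# Renewal interval implied by the config values above (pure arithmetic).
create_time = 1480710570  # Le_CertCreateTime
next_renew = 1485808170   # Le_NextRenewTime
interval_days = (next_renew - create_time) / 86400.0
print(interval_days)  # 59.0 -- i.e. roughly 60 days into the 90-day lifetime
```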

References and related
Examples of using Lets Encrypt with domain (DNS) validation: How I saved $69 a year on certificate cost.
The Let’s Encrypt web site, letsencrypt.org
When I first switched from http to https: drjohnstechtalk is now an encrypted web site
Ciphers
Let’s Encrypt’s take on those short-lived certificates they issue: Why 90-day certificates
acme.sh script which I used I obtained from this site: https://github.com/Neilpang/acme.sh
CERTbot client which implements ACME protocol: https://certbot.eff.org/
IETF ACME draft proposal: https://datatracker.ietf.org/doc/draft-ietf-acme-acme/?include_text=1

Categories
Admin Network Technologies

Strange problem with Internet fiber connection

Intro
Yesterday the company I’ve been consulting for had a partial outage with their multi-gigabit fiber connection with TWC business class in North Carolina. We’ve never seen an outage with these characteristics.

The details
The outage was mostly unnoticed but various SiteScope monitors that fetch web pages were periodically going off and then working again. So it wasn’t a hard outage. How do you pin something like that down?

It’s easiest to relate to a web site whose IP address isn’t constantly changing, which pretty much rules out the majors like google.com or microsoft.com. I actually used my own site – it’s running on good infrastructure in Amazon’s data center and its IP is fixed. And yes I was occasionally seeing timeouts. Could it be simply waiting for a DNS lookup? That would explain everything. So I ran verbosely:

$ curl -vv www.drjohnstechtalk.com

* About to connect() to www.drjohnstechtalk.com port 80 (#0)
*   Trying 50.17.188.196...

That response came quickly, then it froze. So I knew it wasn’t a DNS resolution problem. This could have also been shown by doing a trace, but the curl method was a faster way to get results.

I decided to put a max_time limit on curl of one second just to get a feel for the problem, running this command frequently:

$ curl -m1 www.drjohnstechtalk.com

After all, when the web site is working and the gigabit connection is working, the answer comes back in 60 msec. So 1 second should be more than enough time.
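That curl test can also be scripted. Here is a rough Python equivalent of curl -m1, limited to the connect phase (which, as the trace showed, is where things hung):

```python
import socket

# Rough equivalent of `curl -m1 host` for the TCP connect phase: succeed only
# if the three-way handshake completes within the time budget.
def tcp_connect_ok(host, port=80, timeout=1.0):
    """Return True if a TCP connection to host:port completes within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts and refusals
        return False
```

Run in a loop against a fixed-IP target like the one above, this gives the same intermittent pass/fail pattern the curl tests showed.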

So while there was some of this:

$ curl -m1 www.drjohnstechtalk.com

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://drjohnstechtalk.com/blog/">here</a>.</p>
<hr>
<address>Apache/2 Server at www.drjohnstechtalk.com Port 80</address>
</body></html>

there was also some of this:

$ curl -m1 www.drjohnstechtalk.com
curl: (28) connect() timed out!

We determined on our perimeter firewall that it was passing all packets.

Again, the advantage of a fixed, rarely used IP address is that you can throw it into a trace statement and not get overwhelmed with noise. So I could see from a trace taken during one of those timeouts that we weren’t getting a response to our SYN packet.

So I tried to use a SYN packet generator, scapy, to reproduce the problem and learn more. I’ve just improved my write-up of scapy to reflect some of the new ways I used it yesterday!

To begin with, I didn’t know much about scapy except what I had previously posted. But it worked every time! I could not reproduce the problem no matter how hard I tried:

>>> sr(IP(dst="drjohnstechtalk.com")/TCP(dport=80))

Begin emission:
........................................................................................Finished to send 1 packets.
.............................................*
Received 134 packets, got 1 answers, remaining 0 packets
(<Results: TCP:1 UDP:0 ICMP:0 Other:0>, <Unanswered: TCP:0 UDP:0 ICMP:0 Other:0>)

I racked my brains: what could be different about these scapy packets? Not being a TCP expert, the answer could be many, many things. But did I give up? No! I quickly scanned the scapy for dummies tutorial and realized a few things. I had assumed scapy was randomizing its source port the way all other TCP applications do, but it wasn’t! You need a sport=RandShort() argument in the TCP section to do that. Who knew? So I had been sending packets from the same source port, specifically 20. When I switched to a randomized port I quickly reproduced the timeout issue! And most amazingly, when I encountered a port that didn’t work, it consistently didn’t work – every single time. Its neighboring ports were fine. Some of its neighbors’ neighbors didn’t work, also consistently.

So for instance:
>>> sr(IP(dst="drjohnstechtalk.com")/TCP(dport=80,sport=21964))

Begin emission:
........................................................................................Finished to send 1 packets.
........................................

was consistently not working. Same for source port 21962. Source port 21963 was consistently fine.
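The same per-source-port test can be reproduced without scapy or root privileges by binding a plain socket to a chosen source port before connecting. A sketch (host and ports illustrative):

```python
import socket

# Bind to a specific local source port before connecting, so a suspect 5-tuple
# (e.g. source port 21964) can be retried deliberately instead of letting the
# OS pick an ephemeral port each time.
def probe_from_source_port(host, sport, dport=80, timeout=2.0):
    """TCP connect from a fixed local source port; True on success."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.settimeout(timeout)
    s.bind(("", sport))  # fix the source port
    try:
        s.connect((host, dport))
        return True
    except OSError:  # timeout or refusal
        return False
    finally:
        s.close()
```

Calling this repeatedly with sport=21964 against the target would exercise exactly the flow that consistently failed, while 21963 would consistently succeed.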

Well, this explains the intermittent SiteScope errors.

Gotta be the firewall

I know what you’re thinking. Routers don’t care about TCP port information. That’s much more like a firewall connection table thing. And I agree, but our firewall trace showed these SYN packets getting through, and no SYN_ACK coming back.

It’s way too difficult to do a trace on a Cisco router, but I looked at the router config and didn’t see anything amiss.

So I called the ISP, TWC business class. I got a pre-recorded message talking about outages in North Carolina, where this link just happens to be located! The coincidence seems too great. I still don’t have clarity from them – I guess customer service is not their strong suit. They haven’t even bothered to get back to me 24 hours later (and this is for a major fiber circuit).


References and related

The amazingly customizable packet generator known as scapy.
Probably the best write-up of scapy is this scapy for dummies PDF.

Categories
Network Technologies TCP/IP

Quick Tip: Why Windows traceroute works better than Linux

Intro
We noticed when debugging with the always useful tool traceroute (tracert on Windows systems) that we got more responsive results from Windows than from a Linux server on the same or nearby network. Finally I decided to look into it, my Linux pride at stake!

What it comes down to is that the Windows tracert utility uses ICMP echo requests by default, whereas Linux traceroute uses UDP packets. We had been testing on a corporate Intranet where the default firewall policy was to allow ICMP but deny everything else.

The fix
Just add a -I switch to your Linux traceroute command and the results will be as good as Windows’. That switches the probe packet type to ICMP.

On the Internet, where that type of firewall policy is less common, it probably won’t make much of a difference. But on an Intranet it could be just the thing you need.

Categories
Admin

The IT Detective agency: Correct password fails to unlock laptop

Intro
This has happened to me on two different laptops so maybe it’s a thing. I type in the known, correct password to unlock it and it comes back with invalid password or something similar.

Then I finally figured out what was happening when I observed more carefully.

The solution
I typed in the characters of the password one-by-one and watched the password screen:

···

Then I typed in the fourth character and … nothing! No · was echoed back to the Password field! No wonder it thought I was typing in the wrong password. Technically, from its viewpoint I was. So I pressed the fourth character harder until it finally responded by echoing a ·, and proceeded with the rest of the characters, which went more smoothly. Then the laptop unlocked.

I think the two laptops actually suffer from two different problems. The one may have a sticky key that doesn’t always press correctly – a key that I use as part of my password. The other laptop is just bogged down by a lot of bloatware, to the point where it is so slow it doesn’t echo characters back to the screen as fast as I type them in. I learned to repeat the missed character or even re-type it to make it take. Hard to believe but it’s true.

Conclusion
The mystery of the correctly typed password not unlocking the laptop screen has been resolved. Character echo was not occurring correctly, necessitating repeating keys while looking at the results on screen.

Categories
Security Web Site Technologies

Microsoft Exchange Online Protection is not PCI compliant

Intro
Microsoft’s cloud offering, Office 365, is pretty good for enterprises. It’s clear a lot of thought has been put into it, especially the security model. But it isn’t perfect and one area where it surprisingly falls short is compliance with payment card industry specifications.

The details
You need to have PCI (payment card industry) compliance to work with credit cards.

An outfit I am familiar with did a test Qualys run to check out our PCI failings. Of course there were some findings that constitute failure – simple things like using non-secure cookies. Those were all pretty simply corrected.

But the probe also looked at the MX records of the DNS domain and consequently at the MX servers. These happened to be EOP servers in the Azure cloud. They also had failures. A case was opened with Microsoft and, as feared, Microsoft displayed big-company-itis and absolutely refused to do anything about it, insisting that it had sufficient security measures in place.

So they tried a scan using a different security vendor, Comodo, using a trial license. Microsoft’s EOP servers failed that PCI compliance verification as well.

More details
The EOP servers gave a response when probed by UDP packets originating from port 53, but not when originating from some random port. That doesn’t sound like the end of the world, but there are so many exploits these days that I just don’t know. As I say Microsoft feels it has other security measures in place so they don’t see this as a real security problem. But explanations do not have a place in PCI compliance testing so they simply fail.

Exception finally granted
Well, all this back-and-forth between Qualys and Microsoft did produce the desired result in the end. Qualys agreed with Microsoft’s explanation of their counter-measures and gave them a pass! So after many weeks the web site finally achieved PCI-DSS certification.

Conclusion
Microsoft’s cloud mail offering, Exchange Online Protection, is not PCI compliant according to standard testing by security vendors. As far as I know you literally cannot use EOP if you want your web site to accept credit card payments unless you get a by-hand exception from the security vendor whose test is used, which is what happened in this case after much painful debate. Also as far as I know I am the first person to publicly identify this problem.

Categories
Apache Web Site Technologies

GetSimple CMS – a non-SQL DB content management system

Intro
I’ve been looking at GetSimple CMS lately. It allows you to do some basic things so in that sense it’s pretty good.

The details
I know nothing about CMSes, except maybe a tiny bit about WordPress since I use it, but only in a really basic way. I like the idea of a simple CMS however – one that is simple to install and doesn’t even require a database, so this fit the bill.

First problem
My first attempts to use it met with failure. I could create new pages, but when I saved them the characters I had typed into the pages were lost, just emptied out. I even went so far as to identify the XML file where these pages are stored and look at the raw XML. The CDATA that should have contained my typing was blank! At first I was convinced that my version of PHP was to blame, but I tried different versions, 5.3.3 and 5.3.17, and still got the problem. So finally I decided that it had to be the fact that I compile my own Apache and use a somewhat different configuration compared to system defaults. I switched to a system-supplied Apache web server and sure enough, it began to work. Although I would like to believe it’s really something in my configuration, I couldn’t find such a thing and gave up trying. For the record the pages are stored here: <htdocs>/data/pages/<page-name>.xml
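A quick way to check whether typed content actually made it to disk is to parse one of those page files and look at the CDATA. This is an illustrative sketch; the element names are assumptions for the example, not taken from GetSimple documentation:

```python
import xml.etree.ElementTree as ET

# Illustrative GetSimple-style page file: the page body lives in a CDATA
# section. Element names here are assumptions for the sketch.
page_xml = """<item>
  <title><![CDATA[My page]]></title>
  <content><![CDATA[Hello from GetSimple]]></content>
</item>"""

root = ET.fromstring(page_xml)
content = (root.findtext("content") or "").strip()
if content:
    print("content saved OK:", content)
else:
    print("empty CDATA -- the symptom described above")
```

In the failing setup described above, the content element’s CDATA would come back empty.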

Security aspect
I would say this CMS is vulnerable to brute-force attacks. I suggest randomizing the location of the admin login by using the GST-adminlock plugin. Want to have multiple users? The version I used, 3.3.8, accommodates that! It needs just a little fudging to get going. Tie authentication to a back-end database? I doubt that is possible. I did check, and it does store the login passwords encrypted. Not sure if there is salt added or not…

Plugins
I’m especially bad at evaluating plugins. I guess there should be better themes out there because the system-supplied one is so basic. It’s strange that you have to manually get the downloaded plugin onto the server with sftp or some similar tool. Unless I’m missing something, which I probably am.

Conclusion
GetSimple CMS may be good for a 15 – 20 page simple site. It doesn’t need a back-end database, doing all that sort of thing through use of PHP’s XML capabilities. Quite a few plugins exist which extend its functionality. The available documentation is pretty good. Security is adequate and should be made better by the person implementing it. The last thing we need is another way for the bad guys to take over legitimate web sites and inject malicious content.

References and related
Getsimple CMS info is here: http://get-simple.info/
I mentioned compiling my own Apache. Usually it all works out for me, but not for use with GetSimple CMS. Anyway, here is how I compiled a recent version of apache 2.4.

Categories
Admin Network Technologies

The IT Detective Agency: spotty ISP performance traced to Google Drive

Intro
Sometimes things are not what they seem to be on the surface. I was getting lousy PING times to 8.8.8.8 at home for weeks with Centurylink, my ISP. The problem was especially bad at night. I chalked it up to too much competition for their bandwidth for streaming Netflix and other on-demand media. Their customer support was useless. They only knew enough to walk you through power-cycling your DSL modem.

ISP # 2
Hitting a brick wall, I decided after all these years to switch ISPs to Service Electric Broadband Cable. My standalone test with their modem showed good throughput: something like 29 mbps download and 3 mbps upload using speedtest.net. Then after a few days I got around to putting more of my home network on it, and the service degraded as well. Could it be that both ISPs were bad at the same time? PING times were between 200 – 900 msec, with plenty of timeouts in between.

Additional symptoms

I noticed that if I power-cycled the modem things ran pretty well for a minute or two, then started going downhill again. I had observed the same thing when they had me power-cycle the DSL modem. Then I noticed that when I restarted my laptop the situation improved for a while, then degraded. So it finally dawned on me that this one laptop was correlated with the problem. Windows 10 Task Manager has a convenient process view that allows you to see the top bandwidth-consuming applications (click on Network).

Suspicions raised around Google Drive
There I saw that Google Drive was consuming 3 mbps! Is that a lot or not? It all depends on whether it is downloading or uploading files. In my case I happened to have put several multi-GByte movie files into the Google Drive folder on my laptop. So clearly it was trying to upload them to the cloud. Plus, worse, the power management was such that the laptop was only powered on for a few hours at a time – not long enough for any of those files to finish uploading!!

The short-term solution
Google Drive has a feature that allows you to limit bandwidth usage. When I set that, I wanted to keep the upload going but also still be able to work. I settled on an upload limit of about 240 KB/sec and a download limit of 2000 KB/sec. I figured that was high enough to use most of the available bandwidth, but save some for others. And I changed my power management scheme to never hibernate when plugged in.
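A quick back-of-the-envelope shows why the uploads never finished under the old power scheme. Assuming a hypothetical 4 GB movie file (the exact sizes aren’t given above, only “multi-GByte”) at the 240 KB/sec cap:

```python
# Time to upload one hypothetical 4 GB file at the 240 KB/sec throttle.
file_bytes = 4 * 10**9
upload_rate = 240 * 10**3  # bytes per second
hours = file_bytes / upload_rate / 3600.0
print(round(hours, 1))  # about 4.6 hours per file
```

Several such files, on a laptop powered on only a few hours a day, easily stretches into the multi-day upload described next.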

The results
While the files were uploading, PING performance was still quite impacted, but I was doing VPN pretty comfortably so I left it alone. It was certainly better. When all the files finished uploading after a few days, my performance with the new ISP was great.

Why did rebooting the DSL modem help?
During the reboot no Internet connection is available and Google Drive goes into an error state, “No connection”. Periodically it checks whether the Internet connection is working. Finally after a modem reboot the connection does begin to work, and eventually Google Drive will realize that. But even once it does, it starts from the beginning and scans all the files to see what needs to be synced, and that takes a while. So only after a few minutes does it begin to use your Internet bandwidth. Meanwhile you think everything is good, until it very quickly turns bad again!

Conclusion
A problem with an ISP at night is explained by the presence of an application that was taking all available bandwidth for doing massive file uploads which never completed! A change to a new ISP was not such a bad thing as it is faster and cheaper.

Categories
Python Web Site Technologies

Superimpose crosshairs plus grid marks on an image

Intro
My previous effort showed how to superimpose just a crosshairs on an image. That I was able to do within CSS. When it came time to add tick marks to those crosshairs I felt that CSS was getting too complicated. I had only a short time and I honestly couldn’t figure it out.

So I decided on an alternate approach of superimposing two images, one of which has transparency. Then it would come down to creating a suitable image that has the crosshairs and tick marks.

The details
I felt this was doable in python and it was, but I needed to add the Python Imaging Library (PIL). On CentOS I simply did a

$ sudo yum install python-imaging

The python program
Here is my python program.

# from http://stackoverflow.com/questions/8376359/how-to-create-a-transparent-gif-or-png-with-pil-python-imaging
# drJ 3/2016
# install PIL: yum install python-imaging
#
from PIL import Image, ImageDraw
width = 640
height = 480
halfwidth=width/2
halfheight=height/2
 
ticklength=10
starttickx=halfwidth - ticklength/2
endtickx=halfwidth + ticklength/2
startticky=halfheight - ticklength/2
endticky=halfheight + ticklength/2
 
img = Image.new('RGBA',(width, height))
 
draw = ImageDraw.Draw(img)
# crosshairs
draw.line((halfwidth, 0, halfwidth, height), fill=252, width=2)
draw.line((0, halfheight, width, halfheight), fill=252, width=2)
# tick marks
 
def my_range(start, end, step):
    while start <= end:
        yield start
        start += step
 
# top to bottom ticks
for y in my_range(0, 480, 30):
    draw.line((starttickx, y, endtickx, y), fill=252, width=2)
# left to right ticks
for x in my_range(20, 640, 30):
    draw.line((x, startticky, x, endticky), fill=252, width=2)
 
img.save('crosshairs.gif', 'GIF', transparency=0)

The web page
We actually have two images side by side because we have two cameras.

<html>
<head>
<style type="text/css">
<!-- DrJ 1/2016
Note that Firefox's implementation of linear-gradient is broken and requires us to
use repeat linear gradient 
Some fairly lousy documentation on repeat linear gradient is here:
https://developer.mozilla.org/en-US/docs/Web/CSS/repeating-linear-gradient
 
-->
 
#jpg1 {
 
    position:absolute;
 
    top:0;
 
    left:0;
 
    z-index:1;
 
}
 
#gif2 {
 
    position:absolute;
 
    top:10;
 
    left:11;
 
 
    /*
 
    set top and left here
 
    */
 
    z-index:1;
}
 
#gif3 {
 
    position:absolute;
 
    top:10;
 
    left:655;
 
 
    /*
 
    set top and left here
 
    */
 
    z-index:1;
}
 
</style></head>
<body>
<table><tr><td>
<div>
  <img id="jpg1" src="http://dcs-931l-ball/mjpeg.cgi" width="640" height="480" />
  <img id="gif2" src="crosshairs.gif" />
</div>
<td>
  <img src="http://dcs-931l-target/mjpeg.cgi" width="640" height="480" />
  <img id="gif3" src="crosshairs.gif" />
</tr></table>
</body></html>

To be continued…

Categories
Admin Network Technologies

iRule script examples

Intro
F5’s BigIP load balancers have an API accessible via iRules which are written in their bastardized version of the TCL language.

I wanted to map all incoming source IPs to a unique source IP belonging to the load balancer (source NAT or snat) to avoid session stealing issues encountered in GUIxt.

First iteration
In my first approach, which was more proof-of-concept, I endeavored to preserve the original 4th octet of the scanner’s IP address (scanners are the users of GUIxt which itself is just a gateway to an SAP load balancer). I have three unused class C subnets available to me on the load balancer. So I took the third octet and did a modulo 3 operation to effectively randomly spread out the IPs in hopes of avoiding overlaps.

rule snat-test2 {
# see https://devcentral.f5.com/questions/snat-selected-source-addresses-on-a-vs
# and https://devcentral.f5.com/questions/load-balance-on-source-ip-address
# spread things out by taking modulus of 3rd octet
# - DrJ 2/11/16
when CLIENT_ACCEPTED {
# maybe IP::client_addr
  set snat_Subnet_base "141"
  set ip3 [lindex [split [IP::client_addr] "."] 2]
  set ip4 [lindex [split [IP::client_addr] "."] 3]
  set offset [expr $ip3 % 3]
  set snat_Subnet [expr $snat_Subnet_base + $offset]
  set newip "10.112.$snat_Subnet.$ip4"
#  log local0. "Client IP: [IP::client_addr], ip4: $ip4, ip3: $ip3, offset: $offset, newip: $newip"
  snat $newip
}
}
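For anyone who doesn’t read TCL fluently, the mapping this iRule performs can be sketched in plain Python (a sketch of the logic only, nothing that runs on the LB). It also makes the weakness obvious: any two clients whose third octets agree modulo 3 and which share the same fourth octet land on the same SNAT address.

```python
def snat_ip(client_ip):
    """Mimic the iRule's mapping: 10.112.(141 + octet3 % 3).octet4 (sketch only)."""
    octets = client_ip.split(".")
    ip3, ip4 = int(octets[2]), int(octets[3])
    offset = ip3 % 3  # spread clients across three class C subnets
    return f"10.112.{141 + offset}.{ip4}"

# Two distinct clients that collide on the same SNAT address:
print(snat_ip("192.168.5.77"))  # 10.112.143.77
print(snat_ip("10.20.8.77"))    # 10.112.143.77 as well - session stealing
```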

It worked for a while, but eventually there were overlaps anyway and session stealing was reported.

The next act steps it up
So then I decided to cycle through all of the roughly 765 addresses available to me on the LB and maintain a mapping table. Maintaining variable state on the LB is tricky, as is dealing with arrays, the syntax, version differences and so on. In fact the whole environment is pretty backwards, awkward, poorly documented and unpleasant. So you feel quite a sense of accomplishment when you actually get working code!

rule snat-GUIxt {
# see https://devcentral.f5.com/questions/snat-selected-source-addresses-on-a-vs
# and https://devcentral.f5.com/questions/load-balance-on-source-ip-address
# spread things out by taking modulus of 3rd octet
# - DrJ 2/22/16
 
when CLIENT_ACCEPTED {
# DrJ 2/16
# use ~ 750 addresses available to us in the SNAT pool
#  initialization: uncomment the next line for the first run, then comment it out again
##set ::counter 0
 
  set clientip [IP::client_addr]
# can we find it in our array?
  set indx [array get ::iparray $clientip]
  set ip [lindex $indx 0]
  if {$ip == ""} {
# add new IP to array
    incr ::counter
# IPs = # IPs per subnet * # subnets = 255 * 3
    set IPs 765
    set serial [expr $::counter % $IPs]
    set subnetOffset [expr $serial / 255]
    set ip4 [expr $serial % 255 ]
    log local0. "Matched blank ip. clientip: $clientip, counter: $::counter, serial: $serial, ip4: $ip4 , subnetOffset: $subnetOffset"
    set ::iparray($clientip) $ip4
    set ::subnetarray($clientip) $subnetOffset
  } else {
# already seen IP
    set ip4 [lindex $indx 1]
    set sindx [array get ::subnetarray $clientip]
    set subnetOffset [lindex $sindx 1]
#    log local0. "Matched seen ip. counter: $::counter, ip4: $ip4 , subnetOffset: $subnetOffset"
  }
  set thrdOctet [expr 141 + $subnetOffset]
  set snat_Subnet "10.112.$thrdOctet"
 
  set newip "$snat_Subnet.$ip4"
#  log local0. "Client IP: [IP::client_addr], indx: $indx, ip4: $ip4, counter, $::counter, ip3: $thrdOctet, newip: $newip"
  snat $newip
# one-time reset when updating the code...
# Reset procedure: uncomment, run, comment out, run again... Plus set ::counter at the top
#unset ::iparray
#unset ::subnetarray
}
}

Criticism of this approach
Even though there are far fewer users than my 765 addresses, they get their addresses dynamically from many different subnets. So soon the iRule will have encountered 765 unique addresses and be forced to re-use its IPs from the beginning. At that point session stealing is likely to occur all over again! I’ve just delayed the onset.

What I would really need to do is to look for the opportunity to clear out the global arrays and the global counter when it is near its maximum value and the time is favorable, like 1 AM Sunday. But this environment makes such things so hard to program…
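What that reset might look like can at least be sketched, again in plain Python rather than iRule TCL (the function name and the 90% threshold are mine, purely illustrative): only clear the state when the counter is nearly exhausted and we are inside a quiet window.

```python
import datetime

MAX_IPS = 765  # 255 addresses x 3 subnets, as in the iRule

def should_reset(counter, now=None):
    """Illustrative sketch: reset the mapping when the counter is nearly
    exhausted AND it is the 1 AM hour on a Sunday, when few scanners are
    active. In the iRule this would gate the unset of the global arrays."""
    now = now or datetime.datetime.now()
    near_max = counter > MAX_IPS * 0.9
    quiet_window = now.weekday() == 6 and now.hour == 1  # Sunday, 1 AM hour
    return near_max and quiet_window

# Sunday Feb 21 2016, 1:30 AM, counter at 700 of 765: time to reset
print(should_reset(700, datetime.datetime(2016, 2, 21, 1, 30)))  # True
```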

A word about the snat pool
I used tmsh to create a snat pool. It looks like this:

snatpool SNAT-GUIxt {
   members {
      10.112.141.0
      10.112.141.1
      10.112.141.2
      10.112.141.3
      10.112.141.4
      10.112.141.5
      10.112.141.6
      10.112.141.7
      10.112.141.8
      10.112.141.9
      10.112.141.10
      10.112.141.11
      10.112.141.12
      10.112.141.13
      10.112.141.14
      10.112.141.15
      10.112.141.16
...
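Rather than typing all 765 members by hand, the member list can be generated with a few lines of Python and pasted into the tmsh snatpool definition (a convenience sketch; the three subnets are the same ones the iRule uses):

```python
def pool_members():
    """Generate all 765 snatpool member addresses: 10.112.{141,142,143}.{0-254}."""
    return [f"10.112.{third}.{fourth}"
            for third in (141, 142, 143)
            for fourth in range(255)]

members = pool_members()
print(len(members))   # 765
print(members[0])     # 10.112.141.0
print(members[-1])    # 10.112.143.254
```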

Conclusion
A couple of real-world iRules were presented, one significantly more sophisticated than the other. They show how awkward the language is. But it is also powerful and allows you to execute some otherwise out-there ideas.

References and related
This article discusses troubleshooting a virtual server on the load balancer.