Intro
Yes, my WordPress instance never gave me any problems for years. Then one day my usual username/password wouldn’t log me in! One thing led to another until I realized I was under an ssh brute force attack from Hong Kong. Then I implemented some software that really helped the situation…
The details
Login failure
So like anyone would do, I double-checked that I was typing the password correctly. Once I convinced myself of that, I went to an ssh session I had open to the server. When all else fails, restart, right? Except this is not Windows (I run CentOS), so there’s no real need to restart the server. There very rarely is.
Mysql fails to start
So I restarted mysql and the web server. I noticed the mysql database wasn’t actually starting up – it couldn’t create its PID file or something: no space left on device.
No space on /
What? I never had that problem before. In an enterprise environment I’d have disk monitors and all that good stuff, but as a singleton user of Amazon AWS I suppose they could monitor and alert me to disk problems – though they’d probably want to charge me for the privilege. So yeah, a df -k showed 0 bytes available on /. That’s never a good thing.
/var/log very large
So I ran a du -k from / and sent the output to /tmp/du-k so I could review it at my leisure. Fail. Nope, can’t do that, because I can’t write to /tmp: it’s on the / partition in my simple-minded server configuration! OK, just run du -k and scan the results by eye… I saw /var/log consuming about 3 GB out of the 6 GB available, which was more than I expected.
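In hindsight, a pipeline avoids the temp-file problem entirely. A sketch of what I mean (the -x flag and the head count are my own choices, not what I typed that day):

$ df -k /
$ du -xk / 2>/dev/null | sort -rn | head -20

The -x keeps du on the root filesystem, and sort -rn | head puts the biggest directories first, so nothing needs to be written to the full / partition.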
btmp is way too large
So I did an ls -l in /var/log and saw that btmp alone was 1.9 GB in size. What the heck is btmp? Some searching showed it to be a log used to record failed ssh login attempts. What is it recording?
Disturbing contents of btmp
I learned that to read btmp you do a
> last -f btmp
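(A side note: on most Linux systems lastb is shorthand for the same thing – it reads /var/log/btmp by default.)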
The output is zillions of lines like these:
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
root     ssh:notty    43.229.53.13     Mon Oct 26 14:56 - 14:56  (00:00)
...
I estimate roughly 3.7 login attempts per second. And it’s endless. So I consider it a brute force attack to gain root on my server. This estimate is based on extrapolating from a 10-minute interval by doing one of these:
> last -f btmp | grep 'Oct 26 14:5' | wc
and dividing the resulting line count by 600 seconds (10 min * 60 s/min).
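To make the arithmetic concrete (the line count here is back-figured from my 3.7/s estimate, not a number I saved at the time):

$ last -f btmp | grep 'Oct 26 14:5' | wc -l

Roughly 2,220 matching lines over the 10-minute window works out to 2220 / 600 s ≈ 3.7 attempts per second.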
First approach to stop it
I’m a networking guy at heart, and you know the saying: when all you have is a hammer, every problem looks like a nail 😉. What is the network nail in this case? The attacker’s IP address, of course. We can make sure reply packets from my server never make it back to that IP, by doing one of these:
> route add -host 43.229.53.13 gw 127.0.0.1
Check it with one of these:
> netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
43.229.53.13    127.0.0.1       255.255.255.255 UGH       0 0          0 lo
10.185.21.64    0.0.0.0         255.255.255.192 U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         10.185.21.65    0.0.0.0         UG        0 0          0 eth0
Then watch btmp grow silent, since your server now sends its reply packets to the loopback interface, where they die.
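If you ever want to undo one of these blackhole routes, the inverse command removes it:

> route del -host 43.229.53.13 gw 127.0.0.1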
Short-lived satisfaction
But the pleasure and pats on your back will be short-lived as a new attack from a new IP will commence within the hour. And you can squelch that one, too, but it gets tiresome as you stay up all night keeping up with things.
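Scripting the recipe above wouldn’t be too hard, either. Here’s a hypothetical sketch of what it might look like – the threshold, the log path, and the awk field arithmetic are my assumptions, not code I actually ran:

#!/bin/sh
# hypothetical sketch: blackhole any IP with more than THRESHOLD failed sshd logins
# assumes CentOS-style /var/log/secure lines like:
#   ... sshd[1234]: Failed password for root from 43.229.53.13 port 51234 ssh2
THRESHOLD=20
grep 'Failed password' /var/log/secure |
awk '{print $(NF-3)}' |                  # the IP is the 4th field from the end
sort | uniq -c |
awk -v t=$THRESHOLD '$1 > t {print $2}' |
while read ip; do
  # add a blackhole route unless one is already in place
  netstat -rn | grep -q "^$ip " || route add -host "$ip" gw 127.0.0.1
done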
But rather than write and babysit my own script like the sketch above, I decided it might be easier still to find a package out there that does the job for me. And I did. It’s called
fail2ban
You can get it from the EPEL repository of CentOS, making it particularly easy to install. Something like:
$ yum install fail2ban
will do the trick.
I like fail2ban because it has the feel of a modern package. It’s written in Python, for instance, and it is still maintained by its author. There are zillions of options, which makes it daunting at first.
To stop these ssh attacks in their tracks all you need is to create a jail.local file in /etc/fail2ban. Mine looks like this:
# DrJ - enable sshd monitoring
[DEFAULT]
bantime = 3600
# exempt CenturyLink
ignoreip = 76.6.0.0/16 71.48.0.0/16
#
[sshd]
enabled = true
Then reload it:
$ service fail2ban reload
and check it:
$ service fail2ban status
fail2ban-server (pid 28459) is running...
Status
|- Number of jail:	1
`- Jail list:		sshd
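You can also interrogate an individual jail – on my version, fail2ban-client will list the currently banned IPs:

$ fail2ban-client status sshd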
And most sweetly of all, wait a day or two and appreciate the marked change in the contents of btmp or secure:
support  ssh:notty    117.4.240.22     Mon Nov  2 07:05    gone - no logout
support  ssh:notty    117.4.240.22     Mon Nov  2 07:05 - 07:05  (00:00)
dff      ssh:notty    62.232.207.210   Mon Nov  2 03:38 - 07:05  (03:26)
dff      ssh:notty    62.232.207.210   Mon Nov  2 03:38 - 03:38  (00:00)
zhangyan ssh:notty    62.232.207.210   Mon Nov  2 03:38 - 03:38  (00:00)
zhangyan ssh:notty    62.232.207.210   Mon Nov  2 03:38 - 03:38  (00:00)
support  ssh:notty    117.4.240.22     Sun Nov  1 22:47 - 03:38  (04:50)
support  ssh:notty    117.4.240.22     Sun Nov  1 22:47 - 22:47  (00:00)
oracle   ssh:notty    180.210.201.106  Sun Nov  1 20:44 - 22:47  (02:03)
oracle   ssh:notty    180.210.201.106  Sun Nov  1 20:44 - 20:44  (00:00)
a        ssh:notty    180.210.201.106  Sun Nov  1 20:44 - 20:44  (00:00)
a        ssh:notty    180.210.201.106  Sun Nov  1 20:44 - 20:44  (00:00)
openerp  ssh:notty    123.212.42.241   Sun Nov  1 20:40 - 20:44  (00:04)
openerp  ssh:notty    123.212.42.241   Sun Nov  1 20:40 - 20:40  (00:00)
dff      ssh:notty    187.210.58.215   Sun Nov  1 20:36 - 20:40  (00:04)
dff      ssh:notty    187.210.58.215   Sun Nov  1 20:36 - 20:36  (00:00)
zhangyan ssh:notty    187.210.58.215   Sun Nov  1 20:36 - 20:36  (00:00)
zhangyan ssh:notty    187.210.58.215   Sun Nov  1 20:35 - 20:36  (00:00)
root     ssh:notty    82.138.1.118     Sun Nov  1 19:57 - 20:35  (00:38)
root     ssh:notty    82.138.1.118     Sun Nov  1 19:49 - 19:57  (00:08)
root     ssh:notty    82.138.1.118     Sun Nov  1 19:49 - 19:49  (00:00)
root     ssh:notty    82.138.1.118     Sun Nov  1 19:49 - 19:49  (00:00)
PlcmSpIp ssh:notty    82.138.1.118     Sun Nov  1 18:42 - 19:49  (01:06)
PlcmSpIp ssh:notty    82.138.1.118     Sun Nov  1 18:42 - 18:42  (00:00)
oracle   ssh:notty    82.138.1.118     Sun Nov  1 18:34 - 18:42  (00:08)
oracle   ssh:notty    82.138.1.118     Sun Nov  1 18:34 - 18:34  (00:00)
karaf    ssh:notty    82.138.1.118     Sun Nov  1 18:18 - 18:34  (00:16)
karaf    ssh:notty    82.138.1.118     Sun Nov  1 18:18 - 18:18  (00:00)
vagrant  ssh:notty    82.138.1.118     Sun Nov  1 17:13 - 18:18  (01:04)
vagrant  ssh:notty    82.138.1.118     Sun Nov  1 17:13 - 17:13  (00:00)
ubnt     ssh:notty    82.138.1.118     Sun Nov  1 17:05 - 17:13  (00:08)
ubnt     ssh:notty    82.138.1.118     Sun Nov  1 17:05 - 17:05  (00:00)
...
The attacks still come, yes, but they are so quickly snuffed out that there is almost no chance of correctly guessing a password – unless the attacker has a couple centuries on their hands!
Augment fail2ban with a network nail
Now in my case I had noticed attacks coming from various IPs around 43.229.53.13, and I’m still kind of disturbed by that, even after fail2ban was implemented. Who is that? ARIN said that range is handled by APNIC, the Asia Pacific NIC. APNIC’s whois (apnic.net) says it is a building in the Mong Kok district of Hong Kong. Now I’ve been to Hong Kong and the Mong Kok district. It’s very expensive real estate, and I think the people who own that subnet have better ways to earn money than trying to pwn AWS servers. So I think mainland hackers probably have a backdoor into this Hong Kong network and are using it as their playground. Just a wild guess. So anyhow, I augmented fail2ban with a network route to prevent all such attacks from that network:
$ route add -net 43.229.0.0/16 gw 127.0.0.1
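Check that it took the same way as before:

$ netstat -rn | grep ^43.229

One caveat worth flagging: routes added with route add don’t survive a reboot, so they’d have to be re-added after a restart – not a big deal on a server that stays up for years at a time.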
A few words on fail2ban
How does fail2ban actually work? It manipulates the local firewall, iptables, as needed. So it will activate iptables if you aren’t already running it. Right now my iptables looks clean so I guess fail2ban hasn’t found anything recently to object to:
$ iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
f2b-sshd   tcp  --  anywhere             anywhere            multiport dports ssh

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain f2b-sshd (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere
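When fail2ban does find something to object to, what it runs under the hood is roughly this – a sketch of the default iptables ban action (the exact rules depend on which banaction you’ve configured):

# ban: insert a REJECT rule for the offender into the f2b-sshd chain
$ iptables -I f2b-sshd 1 -s 62.232.207.210 -j REJECT --reject-with icmp-port-unreachable
# unban (after bantime expires): delete that same rule
$ iptables -D f2b-sshd -s 62.232.207.210 -j REJECT --reject-with icmp-port-unreachable

That f2b-sshd chain is the one you see hanging off INPUT in the listing above.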
Indeed, checking my messages file the most recent ban was over an hour ago – in the early morning:
Nov  2 03:38:49 ip-10-185-21-116 fail2ban.actions[28459]: NOTICE [sshd] Ban 62.232.207.210
And here is fail2ban doing its job since the log files were rotated at the beginning of the month:
$ cd /var/log; grep Ban messages
Nov  1 04:56:19 ip-10-185-21-116 fail2ban.actions[28459]: NOTICE [sshd] Ban 185.61.136.43
Nov  1 05:49:21 ip-10-185-21-116 fail2ban.actions[28459]: NOTICE [sshd] Ban 5.8.66.78
Nov  1 11:27:53 ip-10-185-21-116 fail2ban.actions[28459]: NOTICE [sshd] Ban 61.147.103.184
Nov  1 11:32:51 ip-10-185-21-116 fail2ban.actions[28459]: NOTICE [sshd] Ban 118.69.135.24
Nov  1 16:57:05 ip-10-185-21-116 fail2ban.actions[28459]: NOTICE [sshd] Ban 162.246.16.55
Nov  1 17:13:17 ip-10-185-21-116 fail2ban.actions[28459]: NOTICE [sshd] Ban 82.138.1.118
Nov  1 18:42:36 ip-10-185-21-116 fail2ban.actions[28459]: NOTICE [sshd] Ban 82.138.1.118
Nov  1 19:57:55 ip-10-185-21-116 fail2ban.actions[28459]: NOTICE [sshd] Ban 82.138.1.118
Nov  1 20:36:05 ip-10-185-21-116 fail2ban.actions[28459]: NOTICE [sshd] Ban 187.210.58.215
Nov  1 20:44:17 ip-10-185-21-116 fail2ban.actions[28459]: NOTICE [sshd] Ban 180.210.201.106
Nov  2 03:38:49 ip-10-185-21-116 fail2ban.actions[28459]: NOTICE [sshd] Ban 62.232.207.210
Almost forgot to mention
How did I free up space so I could still examine btmp? I deleted an older large log file, secure-20151011, which was about 400 MB. No reboot necessary, of course. Mysql restarted successfully, as did the web servers, and I was back in business logging in to my WP site.
August 2017 update
I finally had to reboot my AWS instance after more than three years. I thought about my ssh usage pattern and decided it was really predictable: I either ssh from home or work, both of which have known IPs. And I’m simply tired of seeing all the hack attacks against my server. And I got better with the AWS console out of necessity.
Put it all together and you get a better way to deal with the ssh logins: simply block ssh (tcp port 22) with an AWS security group rule, except from my home and work.
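If you’d rather script it than click through the console, the AWS CLI equivalent looks something like this – a sketch in which the security group ID and the CIDRs are placeholders, not my real addresses:

# allow ssh only from home and work (example documentation IPs)
$ aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 22 --cidr 203.0.113.4/32
$ aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 22 --cidr 198.51.100.7/32

With no 0.0.0.0/0 rule for port 22, everyone else is denied by default – security groups only permit what is explicitly listed.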
Conclusion
The mystery of the failed WordPress login is examined in great detail here. The case was cracked wide open and the trails that were followed led to discovery of a brute force attempt to gain root access to the hosting server. Methods were implemented to ward off these attacks. An older log file was deleted from /var/log and mysql restarted. WordPress logins are working once again.
References and related info
fail2ban is documented in a wiki of uneven quality at www.fail2ban.org.
Another tool is DenyHosts. One of the ideas behind DenyHosts – its capability to share data – sounds great, but look at this page: http://stats.denyhosts.net/stats.html. “Today’s data” is date-stamped July 11, 2011 – four years ago! Something seems amiss there. It looks like development was suddenly abandoned four years ago – never a good sign for a security tool.