Categories
Internet Mail Spam

enom is the source of recent spam campaigns

Intro
I’m still watching over spam. The latest trend is a set of spam campaigns that share a few characteristics, perhaps the most interesting of which is that the domains have all been registered at enom.com.

The details
These recent campaigns have some other things in common. They fall into two main categories. One set uses domains which are semi-pronounceable. The other uses domains which incorporate sensible English words. Both categories share these other features:

– brevity (no HTML, for instance)
– valid SPF records (!)
– domains were used for spam almost immediately after having been registered (new domains)

Today’s example

From:        Patriot Survival Plan <[email protected]> 
To:        <[email protected]> 
Date:        05/22/2014 04:22 AM 
Subject:        REVEALED: The Coming Collapse 
 
 
 
--------------------------------------------------------------------------------
 
 
 
 
[email protected]
 
Since I exposed this I'm getting a lot of comments. 
 
People are terrified and they are asking me to spread the word even more...
 
So don't miss this because it might be too late for you and your family!
 
Obama's done a lot of stupid things so far, but this one will freeze the blood in your veins!
 
He's been trying hard to keep this from American Patriots... but now his betrayal has finally come to light.
 
And he'll have to pay through the nose for this.
 
But here's a Warning: the effects of Obama's actions will hit you and your family by the end of this year.
 
And they'll hit you like nothing you've ever seen before...
 
So watch this revealing video to know what to expect...
and how to protect against it.
 
-> Watch Blacklisted video now, before it's too late -->                 http://check.best-survival-plan-types.com
 
 
 
 
 
 
 
No_longer_receive_this _Warning :   http://exit.best-survival-plan-types.com
Patriot Survival Plan _405 W. Fairmont Dr. _Tempe, AZ 85282
 
 
 
 
 
First off, there's nothing special 22409526 in the Ironbound. Food in quantity, 22409526not quality. It's amazing how many people 22409526 rate these establishments as excellent. This said, I've always had fun going to these places, 22409526 as long as your dining expectations are gauged accordingly. Therefore, 22409526 my rating reflects those reduced expectations. :)
 
Being a steakhouse, 22409526 one would expect a thorough steak menu such as those at Gallagher's, Luger's, or even Del Frisco's. However, you're not getting true steakhouse fare here; 22409526 it's the Ironbound after all. So, you're getting a less than Prime cut of beef, 22409526 sometimes cooked to your liking.

Whois lookup of best-survival-plan-types.com shows this:

Domain Name: BEST-SURVIVAL-PLAN-TYPES.COM
Registry Domain ID: 1859701370_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.enom.com
Registrar URL: www.enom.com
Updated Date: 2014-05-21 17:26:19Z
Creation Date: 2014-05-22 00:26:00Z
Registrar Registration Expiration Date: 2015-05-22 00:26:00Z
Registrar: ENOM, INC.
Registrar IANA ID: 48
Registrar Abuse Contact Email: [email protected]
Registrar Abuse Contact Phone: +1.4252744500
Reseller: NAMECHEAP.COM
Domain Status: clientTransferProhibited
Registry Registrant ID:
Registrant Name: DONI FOSTER
Registrant Organization: NONE
Registrant Street: 841-4 SPARKLEBERRY LN
Registrant City: COLUMBIA
Registrant State/Province: SC
Registrant Postal Code: 29229
Registrant Country: US
Registrant Phone: +1.8037886966
Registrant Phone Ext:
Registrant Fax: +1.5555555555
Registrant Fax Ext:
Registrant Email: [email protected]
Registry Admin ID:
Admin Name: DONI FOSTER
Admin Organization: NONE
Admin Street: 841-4 SPARKLEBERRY LN
Admin City: COLUMBIA
Admin State/Province: SC
Admin Postal Code: 29229
Admin Country: US
Admin Phone: +1.8037886966
Admin Phone Ext:
Admin Fax: +1.5555555555
Admin Fax Ext:
Admin Email: [email protected]
Registry Tech ID:
Tech Name: DONI FOSTER
Tech Organization: NONE
Tech Street: 841-4 SPARKLEBERRY LN
Tech City: COLUMBIA
Tech State/Province: SC
Tech Postal Code: 29229
Tech Country: US
Tech Phone: +1.8037886966
Tech Phone Ext:
Tech Fax: +1.5555555555
Tech Fax Ext:
Tech Email: [email protected]
Name Server: DNS1.REGISTRAR-SERVERS.COM
Name Server: DNS2.REGISTRAR-SERVERS.COM
Name Server: DNS3.REGISTRAR-SERVERS.COM
Name Server: DNS4.REGISTRAR-SERVERS.COM
Name Server: DNS5.REGISTRAR-SERVERS.COM

See 1) that it was registered yesterday at 17:26:19 Universal Time, and 2) that the registrar is enom?

And the SPF record:

> dig +short txt best-survival-plan-types.com

"v=spf1 a mx ptr ~all"

Actually this domain is a small aberration insofar as it does not have an SPF record with a -all at the end – the others I checked do.
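When checking a batch of these domains, it can help to pull out just the "all" term of each SPF record. Here is a minimal sketch in Python. The helper name spf_all_qualifier is made up for illustration and is not part of any library, and it only looks at the text of a record that has already been fetched, for example with dig.

```python
# Hypothetical helper (not from any library): report how an SPF record
# treats senders that match nothing else, e.g. "~all" versus "-all".
QUALIFIER_NAMES = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_all_qualifier(record):
    """Return the result name for the record's "all" term, or None if absent."""
    terms = record.strip().strip('"').split()
    if not terms or terms[0].lower() != "v=spf1":
        return None  # not an SPF version 1 record
    for term in terms[1:]:
        lowered = term.lower()
        if lowered == "all":
            return "pass"  # a bare "all" defaults to the "+" qualifier
        if len(lowered) == 4 and lowered[1:] == "all" and lowered[0] in QUALIFIER_NAMES:
            return QUALIFIER_NAMES[lowered[0]]
    return None


print(spf_all_qualifier('"v=spf1 a mx ptr ~all"'))  # prints: softfail
```

A record ending in -all would come back as "fail", which is the stricter setting the other domains I checked were using.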

What to do, what to do
Well, I reported the spam to Postini, but I don’t think that has any effect as they are winding down their business.

I am pinning greater hopes on filling out enom’s abuse form. Of course I have no idea what actions, if any, they take. But they claim to take abuse seriously so I am willing to give them their chance to prove that.

enom’s culpability
I don’t feel enom is complicit in this spam. I’m not even sure they can easily stop these rogue operators. But they have to try. Their reputation is at stake. On the Internet there are complaints like this from years ago, that enom domains are spamming.

Every one that comes across my desk I am reporting to them. The time it takes for me to report any individual one isn’t worth the effort compared to the ease of hitting DELETE, but I am hoping to help lead enom to find a pattern in all these goings-on so they can stop these registrations before new ones cause harm – that is why I feel my actions are for the greater good.

Other recently deployed enom domains

Domain                        First spam seen   First registered
onlinetncresults.us           8/22              8/21
checkdnconlinesystems.us      8/20              8/20
extremeconcretecoating.com    8/8               8/8
woodsurface.com               8/7               8/7
shorttermloanspecial.com      7/24              7/23
heartattackfighter1.com       6/19              3/2
handle-unsafe-parasites.me    6/10              6/9
best-survivalplan-learn.com   5/28              5/28
survival-plan-days.com        5/27              5/26
only-survival-plan.com        5/20              5/19
local-vehicle-clearance.us    5/19              5/19
ghiused.com                   5/14              5/14
pastutmy.com                  5/14              5/14
lekabamow.com                 5/14              5/14

etc – there are plenty more!
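To put a number on "almost immediately," here is a small Python sketch that computes the gap between registration and first spam for each listed domain. The dates are copied from the list above. The year is not given there, so 2014 is an assumption based on when this post was written.

```python
from datetime import date

# Rows copied from the list above: (domain, first spam seen, first registered).
# The year is an assumption; the list itself only gives month/day.
YEAR = 2014
ROWS = [
    ("onlinetncresults.us", "8/22", "8/21"),
    ("checkdnconlinesystems.us", "8/20", "8/20"),
    ("extremeconcretecoating.com", "8/8", "8/8"),
    ("woodsurface.com", "8/7", "8/7"),
    ("shorttermloanspecial.com", "7/24", "7/23"),
    ("heartattackfighter1.com", "6/19", "3/2"),
    ("handle-unsafe-parasites.me", "6/10", "6/9"),
    ("best-survivalplan-learn.com", "5/28", "5/28"),
    ("survival-plan-days.com", "5/27", "5/26"),
    ("only-survival-plan.com", "5/20", "5/19"),
    ("local-vehicle-clearance.us", "5/19", "5/19"),
    ("ghiused.com", "5/14", "5/14"),
    ("pastutmy.com", "5/14", "5/14"),
    ("lekabamow.com", "5/14", "5/14"),
]


def to_date(month_day, year=YEAR):
    """Turn a 'month/day' string into a date in the assumed year."""
    month, day = (int(part) for part in month_day.split("/"))
    return date(year, month, day)


def lag_days(seen, registered):
    """Days between registration and the first spam seen."""
    return (to_date(seen) - to_date(registered)).days


lags = {domain: lag_days(seen, registered) for domain, seen, registered in ROWS}
quick = [domain for domain, lag in lags.items() if lag <= 1]
print(f"{len(quick)} of {len(lags)} domains spammed within a day of registration")
```

Only heartattackfighter1.com breaks the pattern, with a gap of more than three months.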

Finally we hear back
Weeks later, on June 14th, I finally received a formal response concerning only-survival-plan.com and local-vehicle-clearance.us.

From: [email protected]
Subject: [~OOQ-128-23745]: FW: eNom - Report Abuse - Reference #ABUSE-11116
 
Hello, 
 
Thank you for your email. While the domain name(s) reported is registered with Namecheap, it is hosted with another company. So we cannot check the logs for the domain(s) and confirm if it is involved in sending unsolicited bulk emails. We can only take an action if a report is confirmed by blacklists of trusted anti-spam organizations like SpamHaus or SURBL.
 
Thus, we have initiated a case regarding the following domain(s) blacklisted by trusted anti-spam organizations:
only-survival-plan.com
In case the listing is not removed, the domain(s) will be suspended.
 
The following domain(s) has already been suspended:
local-vehicle-clearance.us
 
Let us also suggest you addressing the issue to the hosting company which servers were involved in email transmission for help with investigating the incident of spam. You may find their IP address in the headers. To find their contact details, please whois this IP address. You may use any public Whois tool like https://www.domaintools.com/ 
 
Kindly let us know if you have any question.
 
-------------------------------
Regards,
Alexander XXX.
Legal & Abuse Department
Namecheap Group
http://www.namecheapgroup.com

Analysis of their response
Reading between the lines, here’s my analysis. There’s some not-well-documented relationship between enom and namecheap.com. I reported the abuse to enom and got a response from namecheap.com. I kind of agree that suspending a domain is a BIG DEAL and a registrar has to be on firm footing to do so. As I write this on June 16th, the domains do not yet appear to be suspended. Are you really going to trust Spamhaus to render your judgement? That’s basically one of those extortionist enterprises purportedly offering a take-it-or-leave-it service. If the author of that email was a lawyer, well, their English isn’t the best. That doesn’t provide a lot of confidence in their handling of the matter. And wasn’t my complaint by itself good enough for them to initiate action? I do have to concede the point that the sending of the spam was probably out of their control and probably did come from another hosting company. But it is glib advice to suppose it is that easy to track them down the way they describe. Since they are part of the problem and have the evidence, why don’t they follow up with the hosting provider themselves? There was no mention of my other eight or so formal complaints. So this still seems to be getting an ad hoc one-by-one case treatment and not the “Whoa, we got a problem on our hands and there’s something systemically wrong with what we’re doing here” reaction I had hoped to provoke.

Actually I got two responses but with slightly different wording. So they were crafted by hand from some boilerplate text, and yet the person stitching together the boilerplate was sufficiently mindless of the task as to forget they had already just sent me the first email??

So their response is better than a blackhole, but perhaps could be characterized as close to the bare minimum.

I have gotten several other responses from some of my other complaints as well, all saying pretty much the same thing. In August the responses started to look different however.

August responses
Here’s one I received this morning about woodsurface.com, 19 days after my initial complaint:

Hello,
 
This is to inform you that woodsurface.com domain was suspended. It is now pointed to non-resolving nameservers and will be nullrouted once the propagation is over. The domain is locked for modifications in our system.
 
Thank you for letting us know about the issue. 
 
------------------
Regards,
Alexander T.
Legal & Abuse Department
Namecheap.com

Conclusion
I hope my actions spur enom into some action of their own in figuring out where their domain registration requirements are so lax that spammers are taking wholesale advantage of the situation and sullying their reputation.

June, 2014 Update
The storm of spam from enom has subsided. I’m basically not seeing any. Oops. Spoke too soon! New enom-registered domains popped up and created more spam storms (documented in the table above), but not as severe as in the past. I don’t know if our anti-spam filter got better or enom stepped up to the plate and improved their scrutiny of domain registrants. If another spam storm hits us I’ll report back…

August, 2014
enom-generated spam is back!

References
My most popular spam-fighting article describes how to defeat Chinese-language spam.
A new type of spam that uses Google search results for link laundering is described here.

Categories
Admin Internet Mail

Analysis of a spam campaign and how we managed to fight back for a few days

Intro
A long-running spam campaign has been bothering me lately. In this post I analyze it from a sendmail perspective and provide a simple script I wrote which helped me fight back.

The details
Let’s have a look see at the July 3rd variant of this spam. Although somewhat different from the previous campaigns in that this did not provide users with a carefully phished email to their inbox, from a sendmail perspective it had a lot of the same features.

So the July 3rd spam was a spoof of Marriott. Look at these from lines. They pretty much shout the pattern out:

Jul  3 14:12:20 drjemgw sm-mta[4707]: r63IA8dJ004707: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx113.postini.com [217.226.243.182]
Jul  3 14:12:22 drjemgw sm-mta[7088]: r63ICDA7007088: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx138.postini.com [217.226.243.227]
Jul  3 14:12:23 drjemgw sm-mta[7220]: r63ICIhL007220: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx103.postini.com [217.226.243.52]
Jul  3 14:12:24 drjemgw sm-mta[7119]: r63ICEp6007119: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx112.postini.com [217.226.243.181]
Jul  3 14:12:33 drjemgw sm-mta[7346]: r63ICO8H007346: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx110.postini.com [217.226.243.59]
Jul  3 14:12:34 drjemgw sm-mta[7425]: r63ICTsI007425: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx107.postini.com [217.226.243.56]
Jul  3 14:12:35 drjemgw sm-mta[7387]: r63ICRMP007387: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx108.postini.com [217.226.243.57]
Jul  3 14:12:39 drjemgw sm-mta[1757]: r63I7dfa001757: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx138.postini.com [217.226.243.227]
Jul  3 14:12:40 drjemgw sm-mta[6643]: r63IBpYm006643: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx120.postini.com [217.226.243.189]
Jul  3 14:12:42 drjemgw sm-mta[4894]: r63IAFug004894: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx110.postini.com [217.226.243.59]
Jul  3 14:12:43 drjemgw sm-mta[7573]: r63ICZJq007573: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx140.postini.com [217.226.243.229]
Jul  3 14:12:45 drjemgw sm-mta[7698]: r63ICfP9007698: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx102.postini.com [217.226.243.51]
Jul  3 14:12:46 drjemgw sm-mta[7610]: r63ICblx007610: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx109.postini.com [217.226.243.58]
Jul  3 14:12:50 drjemgw sm-mta[7792]: r63ICl6Y007792: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx112.postini.com [217.226.243.181]
Jul  3 14:12:51 drjemgw sm-mta[6072]: r63IBGCU006072: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx126.postini.com [217.226.243.195]
Jul  3 14:12:51 drjemgw sm-mta[7549]: r63ICYnm007549: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx115.postini.com [217.226.243.184]
Jul  3 14:12:55 drjemgw sm-mta[7882]: r63ICrUW007882: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx139.postini.com [217.226.243.228]
Jul  3 14:12:57 drjemgw sm-mta[7925]: r63ICtav007925: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx110.postini.com [217.226.243.59]
Jul  3 14:12:57 drjemgw sm-mta[7930]: r63ICu5c007930: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx125.postini.com [217.226.243.194]
Jul  3 14:12:58 drjemgw sm-mta[7900]: r63ICsOE007900: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx131.postini.com [217.226.243.220]
Jul  3 14:13:00 drjemgw sm-mta[7976]: r63ICwmu007976: [email protected], size=0, class=0, nrcpts=1, proto=SMTP, daemon=sm-mta, relay=eu1sysamx127.postini.com [217.226.243.196]

Of the 2035 of these that were sent, 192 were delivered, meaning they got into users’ inboxes past Postini’s anti-spam defenses. That is roughly 9%, a pretty high success rate as spam goes. And users get concerned.

Now look at the entries I added to sendmail’s access file on July 11th, shortly after becoming aware of a similar, more recent phishing campaign spoofing linkedin.com:

# 7/11/13
linkedinmail.com DISCARD
linkedinmail.net DISCARD
linkedinmail.org DISCARD
linkedinmail.biz DISCARD
linkedin.net DISCARD
linkedin.org DISCARD
linkedin.biz DISCARD
inbound.linkedin.com DISCARD
complains.linkedin.com DISCARD
emalsrv.linkedin.com DISCARD
clients.linkedin.com DISCARD
emlreq.linkedin.com DISCARD
customercare.linkedin.com DISCARD
m.linkedin.com DISCARD
enc.linkedin.com DISCARD
services.linkedin.com DISCARD
amc.linkedin.com DISCARD
news.linkedin.com DISCARD

You get the idea.

What I noticed in these campaigns is a wide variety of subdomains of the domain being phished, with and without “mail” attached to the domain. In particular some rather peculiar-looking subdomains such as complains and emalsrv. So I realized that instead of waiting for me to get the spam, I can constantly comb the log file for these peculiar subdomains. If I come across a new one, voila, it means a new spam campaign has just started! And I can send myself an alert so I can decide – by hand – how best to treat it, knowing it will generally follow the pattern of the recent campaigns.

Now here’s the script I wrote to catch this type of pattern early on:

#!/usr/bin/perl
# DrJ, 7/2013
# I keep my sendmail log file here in /maillog/stat.log and cut it daily
$sl = "/maillog/stat.log";
# 10000 lines occurs in about eight minutes
$DEBUG = 0;
$i = 0;
$lastlines = "-10000";
$access = "/etc/mail/access";
open(ACCESS,$access) || die "Cannot open $access!!\n";
@access = <ACCESS>;
open(SL,"/usr/bin/tail $lastlines $sl|") || die "cannot run tail $lastlines on $sl!!";
print "anti-spam domain: ";
while(<SL>){
  ($domain) = /from=\w{1,25}@(?:emalsrv|complains)\.([^\.]+)\./;
  if ($domain) {
# test if we already have it on our access table
    $seenit = 0;
    foreach $line (@access) {
# lazy, inaccurate match, but good enough...
      $seenit = 1 if $line =~ /$domain/;
      print "seenit, domain, line: $seenit, $domain, $line\n" if $DEBUG;
    }
    if (! $seenit) {
      $i++;
      print "$domain\n" if $i == 1;
    }
  }
}

I call the script spam-check.pl. I invoke spam-check.pl every couple minutes from HP SiteScope. There I have alerts set up which email me a brief message that includes the new domain that is being phished.

No sooner had I implemented this script than it went off and told me about that linkedin phishing spam campaign! That was sweet.

Recent campaigns
Here is a chronology of spam campaigns which follow the pattern documented above. They seem to cook them up one per day.

5/16
wallmart.com - their misspelling, not mine!
5/29
amazon.com
6/20
adp.com
date uncertain
ebay.com
7/9
eftps.com
7/10
visabusinessnews.com
7/11
linkedin.com
7/15
ups.com
7/16
twitter.com
7/17
marriott.com
mmm.com - this one changed up the pattern a bit
7/18
marriott.com - again
ups.com - with somewhat new pattern
7/22
AA.com
7/23-28
a bit more AA.com, a smattering of marriott.com and ebay.com
7/29
tapering off...
7/30 and later
spammer seems to have gone on hiatus, or finally been arrested
10/2, they're back
staples.com

One example spam
Here was my phishing spam from 3M which I got yesterday:

From: "3M" <[email protected]>
To: DrJ
__________________________________________________
This is an automated e-mail.
PLEASE DO NOT RESPOND TO THIS EMAIL ACCOUNT.
This account is not reviewed for responses.
 
--------------------------------------------------------------------------------
 
 
This email is to confirm that on 07/17/2013, 3M's bank (JP Morgan) has debited $15,956.64 from your bank account.
 
If you have any questions, please visit the 3M EIPP Helpline at this link.

The HTML source for that last line looks like this:

If you have any questions, please visit the 3M EIPP=
 
		 		 Helpline <a href=3D"http://vlayaway.com/download/mmm.com.e-marketing.ht=
ml?help">at this link.</div>

When I checked Bluecoat’s K9 webfilter, which I even use at home, the URL in the link, vlayaway…, was not rated. I submitted a suggested category, Malicious Sources, and they efficiently assigned it that category within minutes of my submission.

Also, note that the envelope sender of my email differs from the Sender header. The envelope sender was [email protected].

A word about DISCARD vs ERROR
While I’m waiting for more spam of this sort to come in as I write this on July 22nd, I had a brainstorm. Rather than DISCARDing these emails, which doesn’t tip the sender off, it’s probably better to send a 550 error code, which rattles the system a bit more. I think a sending IP with too many of these errors will be temporarily banned by Postini for all their users. So I changed all my DISCARDs. Here is the syntax for one example line:

linkedin-mail.com ERROR:"550 Sender banned. Please use legitimate domain to send email."

I originally wanted to put the message “No such user,” to try to get the spammer to take that specific recipient off their spam list, but it doesn’t really work in the right way: the error is reported in the context of the sender address, not the recipient address.

Here is the protocol which shows what I am talking about:

$ telnet drj.postini.com 25
Trying 217.136.247.13...
Connected to drj.postini.com..
Escape character is '^]'.
220 Postini ESMTP 133 y678_pstn_c6 ready.  CA Business and Professions Code Section 17538.45 forbids use of this system for unsolicited electronic mail advertisements.
helo localhost
250 Postini says hello back
mail from: [email protected]
250 Ok
rcpt to: [email protected]
550 5.0.0 [email protected]... Sender banned. Please use legitimate domain to send email. - on relay of: mail from: [email protected]
quit
221 Catch you later

So that – on relay of: mail from: … is added by Postini so it really doesn’t make sense to say No such user in that context.
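Changing every DISCARD entry by hand gets tedious once the access file has grown. The following is a minimal Python sketch of how the switch could be scripted. The function name discard_to_error is hypothetical, and it assumes the simple "domain DISCARD" layout shown earlier; comment lines and anything else are passed through untouched. The access map would still need to be rebuilt afterwards in whatever way you normally do that.

```python
import re

# The replacement action, using the same wording as the example line above.
ERROR_ACTION = 'ERROR:"550 Sender banned. Please use legitimate domain to send email."'


def discard_to_error(lines):
    """Rewrite 'domain DISCARD' entries to the 550 ERROR action.

    Comment lines and entries with any other action are returned unchanged.
    """
    out = []
    for line in lines:
        match = re.match(r"^(\S+)(\s+)DISCARD\s*$", line)
        if match and not line.startswith("#"):
            out.append(f"{match.group(1)}{match.group(2)}{ERROR_ACTION}")
        else:
            out.append(line.rstrip("\n"))
    return out


sample = ["# 7/11/13", "linkedinmail.com DISCARD", "emalsrv.linkedin.com DISCARD"]
for entry in discard_to_error(sample):
    print(entry)
```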

Conclusion
My satisfaction may be short-lived. But it is always sweet to be on top, even for a short while.

References
For a lighthearted discussion of HP SiteScope, read the comments from this post.

Sendmail is discussed in various posts of mine. For instance, Analyzing the sendmail log, and Obscure tips for sendmail admins.

Categories
Admin DNS Internet Mail

The IT Detective Agency: can’t get email from one sender

Intro
For this article to make any sense whatsoever you have to understand that I enforce SPF in my mail system, which I described in SPF – not all it’s cracked up to be.

The details
Well, some domain admins boldly eliminated their SOFTFAIL conditions – but didn’t quite manage to pull it off correctly! Today I ran into this example. A sender from the domain pclnet.net sent me email from IP 64.8.71.112 which I didn’t get – my SPF protection rejected it. The sender got an error:

550 IP Authorization check failed - psmtp

Let’s look at his SPF record with this DNS query:

$ dig txt pclnet.net

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.10.rc1.el6_3.2 <<>> txt pclnet.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42145
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
 
;; QUESTION SECTION:
;pclnet.net.                    IN      TXT
 
;; ANSWER SECTION:
pclnet.net.             300     IN      TXT     "v=spf1 mx ip4:24.214.64.230 ip4:24.214.64.231 ip4:64.8.71.110 ipv4:64.8.71.111 ipv4:64.8.71.112 -all"

That IP, 64.8.71.112 is right there at the end. So what’s the deal?

Well, Google/Postini was called in for help. They apparently still have people who are on the ball because they noticed something funny about this SPF record, namely, that it isn’t correct. Notice that the first few IPs are prefixed with an ip4? Well the last IPs are prefixed with an ipv4! They are not both valid. In fact the ipv4 is not valid syntax and so those IPs are not considered by programs which evaluate SPF records, hence the rejection!
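A typo like this is easy to miss by eye, so here is a rough Python sketch that flags SPF terms whose names are not among the mechanisms and modifiers defined for SPF version 1. The function name unknown_spf_terms is made up for illustration. It checks names only and does not validate the IP addresses or domains that follow them.

```python
import re

# Mechanism and modifier names defined for SPF version 1 (RFC 7208).
KNOWN_MECHANISMS = {"all", "include", "a", "mx", "ptr", "ip4", "ip6", "exists"}
KNOWN_MODIFIERS = {"redirect", "exp"}


def unknown_spf_terms(record):
    """Return the terms of an SPF record whose names are not recognized."""
    bad = []
    # Skip the leading "v=spf1" version tag, then inspect each term.
    for term in record.strip().strip('"').split()[1:]:
        # Drop an optional qualifier, then cut at the first ":", "/" or "=".
        name = re.split(r"[:/=]", term.lstrip("+-~?"), maxsplit=1)[0].lower()
        if name not in KNOWN_MECHANISMS and name not in KNOWN_MODIFIERS:
            bad.append(term)
    return bad


record = ("v=spf1 mx ip4:24.214.64.230 ip4:24.214.64.231 ip4:64.8.71.110 "
          "ipv4:64.8.71.111 ipv4:64.8.71.112 -all")
print(unknown_spf_terms(record))
# prints: ['ipv4:64.8.71.111', 'ipv4:64.8.71.112']
```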

My recourse in this case was to remove SPF enforcement on an exception basis for this one domain.

Case closed!

Conclusion
It’s now a few months after my original post about SPF. I’m sticking with it and hope to increase its adoption more broadly. It has worked well, and the exceptions, such as today’s, have been few and far between. It’s a good tool in the fight against spam.

Categories
Internet Mail IT Operational Excellence Spam

How to Stop Chinese Spam – for Mail Admins, w/ June 2014 update

(Updated 12/19/2011 and 6/2014 with additional character sets)
(updated 9/2012 with additional signature)
Intro
I have been a target for random Chinese language spam in my various email accounts, but the problem has really gotten worse in the past few months.

The thing about these messages is that at first Postini (a Google spam filtering service used mostly by businesses), wasn’t very good at catching them. Postini is about the best in the business, and they’re competently catching just about every other type of spam. But these Chinese character messages kept slipping through…

Their support tech gave me some advice which turned out to be incorrect, but led me in the right direction. Their tech told me to create a content manager rule, but the actual rule he provided was only going to catch Russian and Ukrainian spam!

This is the rule he provided:

Rule Name: Non_English_spam
"Match Any"
Header - matches regex

koi8-r|koi8-u|koi7|koi8
Disposition: delete (blackhole)
Set quarantine to Recipient

I had no idea what that was doing, so I looked up koi8-r, koi8, etc and found that it had to do with the Cyrillic alphabet. So I wondered if the Chinese language spams have something similar, but for Chinese. Indeed they do: gb2312. Looking at a few of my Chinese spams, almost all contain this string in the headers. It’s not always in the exact same place, but it’s there. To be concrete, here’s an example (some headers have been obfuscated to prevent the bad guys from trying to reverse engineer Postini’s scoring algorithms):

Received: from websmtp.sohu.com ([61.135.132.136]) by eu1sys200amx108.postini.com ([207.126.147.10]) with SMTP;
		 Sun, 28 Aug 2011 18:41:21 GMT
Received: from omlbw (unknown [110.53.27.141])
		 by websmtp.sohu.com (Postfix) with ESMTPA id 9B3C6720CEA;
		 Sun, 28 Aug 2011 23:55:04 +0800 (CST)
Message-ID: <[email protected]>
From: =?gb2312?B?y7O1wsf4xu/A1rbguabE3NfU0NCztdPQz965q8u+?= <[email protected]>
To: 
Subject: =?gb2312?B?d3Azz/ogytsg1vcgudwg1/Yg0KkgIMqyIMO0IA==?=
		 =?gb2312?B?uaQg1/cgssUgxNwgzOEgIMn9INK1ILyoIKO/LS0=?=
		 =?gb2312?B?qIk=?=
Date: Sun, 28 Aug 2011 23:55:37 +0800
MIME-Version: 1.0
X-mailer: Lzke 2
X-SOHU-Antispam-Bayes: 0
X-pstn-levels:     omitted
X-pstn-settings: omitted
X-pstn-addresses: from <[email protected]> [49/2] 

Content-Type: multipart/mixed;
		 boundary="----=_NextPart_000_015A_013AC9FA.1A2D5A60"

------=_NextPart_000_015A_013AC9FA.1A2D5A60
Content-Transfer-Encoding: base64
Content-Type: text/html;
		 charset="gb2312"

See it? charset="gb2312" appears in the Content-Type header and =?gb2312? appears in both the Subject and From fields.

That message looks like this as displayed in my mail client:

How do I know this is Chinese? I pasted the characters into translate.google.com and it auto-detected it. That’s a convenient tool!

How do I know it is spam? I am open-minded. Perhaps it is a legitimate business proposition that just happens to be written in Chinese? It does sort of read that way from the translation of any one such message. On the other side are some stronger pieces of evidence. The empty To: header is a strong hint, but some legitimate messages could contain that undesirable feature, so that is merely an indicator but not definitive. Most important is that I get many of these messages, all showing similar patterns in appearance and, most telling, each coming from a different sender. That tells me unambiguously that this is really, truly spam.

So the actual Postini Content Manager rule to capture Chinese spam is this:

Rule Name: Chinese_spam
"Match Any"
Header matches regex (charset="gb2312"|=\?GB2312\?)

Disposition: delete (blackhole)
Set quarantine to Recipient

Obviously this type of rule is a bit dangerous. What if you are expecting something written in Chinese? It will be subject to the same treatment as the spam. That is why the suggestion is to Set quarantine to recipient so that these messages could be delivered from the user quarantine.

And over the course of a couple months Postini has gotten much better about capturing this type of spam. That is the best thing – to let the experts handle it. They just needed to train their algorithms. I was quite concerned at first that this spam is so different from the usual, recognizable spam campaigns that they might have a hard time spotting it while simultaneously allowing the good Chinese email through. But they’re almost there…

12/19 Update
The filter described above has been working extremely well for me. Essentially perfectly, in fact, as I can see when I look in my quarantine. But not today. Today some suspected Chinese spam got in, and examining the headers showed something slightly different. The subject looks like this:

Subject: =?GBK?B?bnZ2dyAyMDExLjEyLTIwMTItMDEgvqsgxrcgzcYgz/ogIGZkZXI=?=

And the Mime header also had that string:

Content-Type: text/plain;
		 charset=GBK

Looking up GBK character set you’ll immediately see it is simplified Chinese, extended. So I think we better add that character set to our expression. It makes our content manager rule only a little more complicated. Now we would have:

Rule Name: Chinese_spam
"Match Any"
Header matches regex (charset="gb(k|2312)"|=\?GB(K|2312)\?)

Disposition: delete (blackhole)
Set quarantine to Recipient

For the complete prescription see the summary in the Conclusion.

If you happened upon this article and don’t have the Postini service is there any relevance? Yes, I think so. You should be able to filter on the message headers to look for the string =?gb2312? or =?gbk? in the beginning of the subject line. To speak about mailers with which I have some experience, in sendmail you could do this with a milter. In PureMessage it would be possible to concoct an appropriate rule as well.
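For anyone without Postini who wants to prototype the idea, here is a minimal sketch using Python's standard email module. It is not a milter or a PureMessage rule, just a stand-alone check on a raw message. The function name looks_like_chinese_charset is hypothetical. Two things in it are my assumptions rather than part of the rule above: matching is case-insensitive, and the quotes around the charset value are optional, since the GBK sample has charset=GBK without quotes.

```python
import re
from email import message_from_string

# Assumptions: case-insensitive match, optional quotes around the charset.
CHINESE_CHARSET = re.compile(
    r'charset="?gb(k|2312)"?|=\?gb(k|2312)\?', re.IGNORECASE
)


def looks_like_chinese_charset(raw_message):
    """Return True if any header of the message names GB2312 or GBK."""
    msg = message_from_string(raw_message)
    for _name, value in msg.items():
        if CHINESE_CHARSET.search(str(value)):
            return True
    return False


SAMPLE = (
    "Subject: =?GBK?B?bnZ2dyAyMDExLjEyLTIwMTItMDEgvqsgxrcgzcYgz/ogIGZkZXI=?=\n"
    "Content-Type: text/plain;\n"
    "\tcharset=GBK\n"
    "\n"
    "body\n"
)
print(looks_like_chinese_charset(SAMPLE))  # prints: True
```

Note that this only looks at the top-level headers. The Content-Type of an inner MIME part, as in the gb2312 sample earlier, would need a walk over the message parts.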

9/2012 Update
My filter was working so well these past few months I essentially forgot about the problem, but the occasional Chinese spam slipped through. How? It used a different encoding. Here is an example subject line:

Subject: =?utf-8?B?6K+35p+l5pS277yB?=

This is displayed by my mail client as three Chinese characters followed by “!”. This one drove me to do a little research. This is an Encoded-Word, according to Wikipedia’s excellent MIME writeup. The “?B?” in the front means base64 encoding. I had previously written a mimedecode program in Perl, which I put to use:

> mimedecode 6K+35p+l5pS277yB

which produces:

???!

which is pretty much garbage. So I decided to analyze the output with unix utility od:

> mimedecode 6K+35p+l5pS277yB|od -x

which gives

0000000 e8af b7e6 9fa5 e694 b6ef bc81

Next, I needed a UTF-8 converter, which I found at this Swiss site.

I used it with input type hexadecimal.

The results reproduced exactly the Chinese characters my mail client displayed to me! It also gives a lot of other descriptions for these characters (such as Cangjie). The first few lines begin:

As character names:

U+8BF7 CJK UNIFIED IDEOGRAPH character (请)
U+67E5 CJK UNIFIED IDEOGRAPH character (查)
U+6536 CJK UNIFIED IDEOGRAPH character (收)
U+FF01 FULLWIDTH EXCLAMATION MARK character (!)

As raw characters:

请查收!
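The same result can be had in one step with Python's standard library, which knows how to unpack an Encoded-Word and apply the declared character set. This is just a sketch for comparison with the manual route above; the wrapper name decode_encoded_words is my own.

```python
from email.header import decode_header


def decode_encoded_words(header_value):
    """Decode a header containing RFC 2047 encoded-words into a string."""
    parts = []
    for data, charset in decode_header(header_value):
        if isinstance(data, bytes):
            # Encoded parts come back as bytes plus the declared charset.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(data)
    return "".join(parts)


subject = decode_encoded_words("=?utf-8?B?6K+35p+l5pS277yB?=")
print([f"U+{ord(ch):04X}" for ch in subject])
# prints: ['U+8BF7', 'U+67E5', 'U+6536', 'U+FF01']
```

Those are the same four code points the converter reported.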

Well, that was an interesting exercise, but I’m not sure we’ve learned anything that can be put to use in a RegEx on the original expression. Unless there’s a way to uniquely identify Chinese characters by the beginning of the encoded-word sequence following the ?B?. I have my doubts, but since I don’t seem to get these UTF-8 emails from other sources, and I have a sample size of about five emails that fooled the other filter to work with, I have developed a content filter which would capture all of them!

Check for a header containing the RegEx:

=\?utf-8\?B\?[56]

More specifically sometimes the utf-8 string is used in the From header, sometimes it is in the subject. Most of my samples would have been caught by the simpler RegEx =\?utf-8\?B\?5, and I mention that in case you want to be more specific, but there was one recent one that had a “6” instead of a “5.”
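There is a reason the “5” and “6” keep showing up. The first base64 character encodes only the top six bits of the first byte. In UTF-8, three-byte sequences whose first byte is 0xE4 through 0xE7 start with “5” and those with 0xE8 through 0xEB start with “6”. Together that is the range U+4000 through U+BFFF, which contains the main CJK ideograph block (U+4E00 through U+9FFF). The sketch below illustrates this with a few sample characters of my own choosing. It only says something about text that begins with such a character; a subject that starts with plain ASCII would not match.

```python
import base64


def first_base64_char(ch):
    """Return the first base64 character of a character's UTF-8 encoding."""
    return base64.b64encode(ch.encode("utf-8")).decode("ascii")[0]


# Sample characters chosen for illustration only.
SAMPLES = [
    ("CJK ideograph U+8BF7 (from the subject above)", "\u8bf7"),
    ("first CJK ideograph U+4E00", "\u4e00"),
    ("CJK ideograph U+9FFF", "\u9fff"),
    ("Hangul syllable U+AC00", "\uac00"),
    ("Latin c with cedilla U+00E7", "\u00e7"),
    ("Arabic letter alef U+0627", "\u0627"),
]

for label, ch in SAMPLES:
    print(f"{label}: {first_base64_char(ch)}")
```

The CJK samples give “6”, “5” and “6”. The Hangul sample also gives “6”, so text starting with some Korean characters would be caught too, while the Latin and Arabic samples give “w” and “2” and would not match.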

For the record here’s that mimedecode “program”

#!/usr/bin/perl
# base64 MIME decoding
# example:
# mimedecode Nz84QGxhdGU=
# => 7?8@late
use MIME::Base64;
 
foreach (@ARGV) {
#      $encoded = encode_base64($_);
      $decoded = decode_base64($_);
#print "enc,dec: $encoded, $decoded\n";
        print $decoded;
}

And its sister program, which I call mimeencode:

#!/usr/bin/perl
# base64 MIME encoding
# DrJ, 6/2004
# example:
# mimeencode '7?8@late'
# => Nz84QGxhdGU=
use MIME::Base64;
 
foreach (@ARGV) {
      $encoded = encode_base64($_);
#      $decoded = decode_base64($_);
#print "enc,dec: $encoded, $decoded\n";
        print $encoded;
}

There’s probably a built-in linux utility which does the same thing, I just don’t know what that is.

2022(!) update

Well, I finally ran across it. The built-in program to do mimeencode/mimedecode is base64. Oh well, better late than never…

Conclusion
Your users needn’t suffer from Chinese Spam. The vast majority are characterized by, um, Chinese characters, of course, whose presence is almost always indicated by the string gb2312 in the message headers. You can take advantage of that fact and build an appropriate rule for Postini or your mailer. But beware of throwing out the baby with the bathwater! In other words, make sure you only subject your users to this rule if you either have a good quarantine, or they are sure they should never receive this type of email.

There are some spam types which evade the gb2312 rule mentioned above, however. And this part is not as well tested, frankly. The exceptions, which are still a minority of my Chinese spam, are characterized by a subject line or sender that contains =?utf-8?B?5… or =?utf-8?B?6… (see summary below). My honest expectation is that a rule this broad and coarse will also catch a few other languages (Portuguese?, Urdu?, etc.) so be careful! If you are expecting to get non-English email, more testing is in order before implementing the utf-8 filter. But it will certainly help to eliminate even more Chinese spam.

4/2013 update
Summary, including 6/2014 update
My filter has worked very well for me and has withstood the test of time. I catch at least a dozen Chinese spams each day. One got through in 6/2014 however, with character set gb18030. I realize reading the above write-up is confusing because I’ve mixed my love of telling a good IT mystery with my desire to convey useful information. So, to summarize, the new combined rule is:

Match Any:

Header matches RegEx:
(charset="gb(k|2312|18030)"|=\?GB(K|2312|18030)\?)

Header matches RegEx:
=\?utf-8\?B\?[56]
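Before loading a rule like this into a production filter, it is worth running it against a few known header values. Here is a small Python sketch that does so with header strings taken from this article plus one harmless control. Case-insensitive matching is my assumption: the rule is written with uppercase GB, while the first sample's headers use lowercase gb2312, so I presume the filter ignores case.

```python
import re

# The two combined "Match Any" expressions; IGNORECASE is an assumption.
RULES = [
    re.compile(r'(charset="gb(k|2312|18030)"|=\?GB(K|2312|18030)\?)', re.IGNORECASE),
    re.compile(r"=\?utf-8\?B\?[56]", re.IGNORECASE),
]


def matches_any(header_value):
    """Return True if either expression matches the header value."""
    return any(rule.search(header_value) for rule in RULES)


CHECKS = [
    'text/html; charset="gb2312"',
    "=?GBK?B?bnZ2dyAyMDExLjEyLTIwMTItMDEgvqsgxrcgzcYgz/ogIGZkZXI=?=",
    "=?utf-8?B?6K+35p+l5pS277yB?=",
    "=?utf-8?B?SGVsbG8=?=",  # control: base64 of the English word "Hello"
]

for value in CHECKS:
    print(matches_any(value), value)
```

The first three come back True and the English control comes back False.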

References
A spate of spam from enom-registered domains is described here.
A disappointing case where Google is not operating their Gmail service as a white-glove service is described here.