Dealing with Arsewipe Trackback Spammers

31 comments
Story Text:

Like everyone else on the web who has, or in my case used to have, trackback enabled on their site, I'm being hit with a barrage of trackback spam on a minute-by-minute basis. Some months ago I had to disable trackback, and for me it was a really sad day. Now I want payback...

What I'd like to pick TW'ers' brains about is this:

  • How can I stop requests for /trackback/3434 altogether?
  • Not only that, what's the worst thing I can do to these requests?

The really annoying thing is that whenever I hit the TW admin panel, all I see is a deluge of "trackback/1046 not found" messages rather than the important ones. These requests are all served a valid 404, so they're wasting their own bandwidth, and my time - idiots...

I think the key would be to stop them at the .htaccess level, but I don't have the know-how to do that, so I wanted to ask you lot, known for a wide range of server skills: how, and what, to do?

The nastier the better...

Comments

localhost

Perhaps something like this:

--------------------------------------
RewriteRule ^trackback http://127.0.0.1 [L]
--------------------------------------

That'll send the trackbacks back in their own face.

Would that work?

By the time the trackback message hits your server, "localhost" is the server. No?

no it just forwards

The requests (trackbacks) never hit TW, they're just passed on to localhost (for less technical readers, "localhost" is your own machine - it always has the IP number 127.0.0.1). The URL isn't even rewritten or anything.

If you try to fetch "threadwatch/trackback/whatever" the request will look like you're getting that address but you will be hitting your own machine.

I.e., "threadwatch/trackback" will just act as a pipe through which you will hit your own machine. You'll even see TW in the address bar if you try entering it in a browser (unless I've made a typo, that is).

---
Edit: Delinked trackback/whatever

...

Thank you. Very useful, claus.


That does sound good claus, thanks. I'll test that in the morning...


Quote:
The nastier the better...

Send them 70 wives. ;)

Kill the server / script ?

I don't know the TB spec (am away from office and "borrowing" a wifi atm) but why not redirect it back to the URL instead of localhost?

Ahh bugger it. How about this?

I've just regged a domain name www.widgetspam.com - check the IP that it resolves to. Let's see them deal with a redirection to that! There is a chance it may even kill the script, with the possible knock-on effect of taking the server out too.

I'd personally suggest not disabling TB, but rather selectively serving up that redirect depending on the URL being TB'd. Pity to lose the power of the format.
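On the spec question above: as far as I know from the TrackBack spec, a ping is just an HTTP POST of form-encoded fields (url, plus optional title, excerpt and blog_name) to the trackback URL, and url is the page the sender wants linked. A minimal Python sketch of such a ping, with made-up example.com/example.org addresses (assumptions, not real endpoints), shows where a "redirect it back to the URL" target would have to come from:

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

# Hypothetical addresses, for illustration only.
TRACKBACK_URL = "http://www.example.com/trackback/1046"
SPAM_PAGE = "http://spam.example.org/cheap-widgets"

def build_ping(url, title="", excerpt="", blog_name=""):
    """Build (but don't send) a TrackBack ping: a form-encoded HTTP POST."""
    body = urlencode({"url": url, "title": title,
                      "excerpt": excerpt, "blog_name": blog_name}).encode("utf-8")
    # Supplying a request body makes urllib treat this as a POST.
    return Request(TRACKBACK_URL, data=body, headers={
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8"})

def pinged_url(body):
    """Extract the 'url' field, i.e. the page the sender wants you to link to."""
    return parse_qs(body.decode("utf-8")).get("url", [None])[0]

ping = build_ping(SPAM_PAGE, title="Great post!")
print(ping.get_method())      # POST
print(pinged_url(ping.data))  # http://spam.example.org/cheap-widgets
```

Note that a RewriteRule can't do that last step: it only sees the request path, not the POST body, so sending them to their own URL would take a script rather than an .htaccess line.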

'Fail' sends them away

(eventually)
I've used the rewriterule [L] successfully - and they seem to understand what they are getting and go away (yay!)
OK, it takes a while :(

A few other options

I just tested the localhost one, works like a charm. Although it's not totally invisible: It does send a 302 redirect to 127.0.0.1.

If you want it not to return a status code, you should use the proxy flag, like this:

--------------------------------------
RewriteRule ^trackback http://127.0.0.1 [P]
--------------------------------------

However, the first option (sending the 302) is not as hard on your server (if it can't connect -- which is very likely -- the proxy will be hanging). Plus, the proxy option might not be enabled, I don't know the Platinax policy about this. A 301 might be better than a 302 though, as it's "permanent" not "temporary":

--------------------------------------
RewriteRule ^trackback http://127.0.0.1 [R=301,L]
--------------------------------------

Technically you could also just send them a code "403 Forbidden" like this:

--------------------------------------
RewriteRule ^trackback - [F]
--------------------------------------

OTOH, that would not be a lot different from what you do now sending them 404's.

Just send them to tubgirl...

...or maybe some spamming collector.

I wonder if trackback spam can be fought with variants of anti-phishing scripts?

Manual Trackback on this Post

I wrote something about this thread. Since I couldn't trackback, here's the link:
http://mutually-inclusive.typepad.com/weblog/2005/07/innovation_in_f.html

/trackback/whatever

Whoever is testing this, i've not done it yet :)

Not working

I just put this line in .htaccess (rewrite is on btw)

RewriteRule ^trackback http://127.0.0.1 [R=301,L]

and i typed in a url with "trackback" in it and just got the 404 page?

...

Since I haven't yet tried it, I'll ask: what 404 page ... TW's or your browser's?


TW's


Hmmmm.....shouldn't it be:

RewriteRule ^.*/trackback http://127.0.0.1 [R=301,L]

(just guessing...I haven't tried it)

Hmm.

My guess is that localhost gets read as TW because, in order to access the .htaccess file at all, the request must have been made to TW's server already ... which, of course, is not 127.0.0.1. That's just a guess.

Could it not be something along these lines:

RewriteCond %{REQUEST_METHOD} ^POST$
RewriteCond %{REQUEST_URI} TWCOMMENTSFILE\.php
RewriteCond %{HTTP_REFERER} !threadwatch\.org [OR]
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* http://%{REMOTE_ADDR}/ [R=301,L]
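To make the [OR] precedence in those conditions explicit: mod_rewrite ANDs conditions together unless one carries [OR], which ties it to the next condition only. So the block reads as: POST, and the comments file, and (foreign referer or blank user agent). A rough Python model of that logic, with simplified patterns, Python's re standing in for Apache's PCRE, and TWCOMMENTSFILE still a placeholder:

```python
import re

def cond(value, pattern, negate=False):
    """One RewriteCond: a regex search on a server variable, inverted by a leading '!'."""
    matched = re.search(pattern, value) is not None
    return not matched if negate else matched

def send_back(env):
    """Rough model of the conditions above: c1 AND c2 AND (c3 [OR] c4)."""
    c1 = cond(env["REQUEST_METHOD"], r"^POST$")
    c2 = cond(env["REQUEST_URI"], r"TWCOMMENTSFILE\.php")      # placeholder file name
    c3 = cond(env["HTTP_REFERER"], r"threadwatch\.org", negate=True)
    c4 = cond(env["HTTP_USER_AGENT"], r"^-?$")                 # blank (or "-") user agent
    # [OR] ties a condition to the next one only; all other conditions are ANDed.
    return c1 and c2 and (c3 or c4)

bot = {"REQUEST_METHOD": "POST", "REQUEST_URI": "/TWCOMMENTSFILE.php",
       "HTTP_REFERER": "", "HTTP_USER_AGENT": ""}
reader = dict(bot, HTTP_REFERER="http://www.threadwatch.org/node/1046",
              HTTP_USER_AGENT="Mozilla/5.0")
print(send_back(bot), send_back(reader))  # True False
```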

ok, let's try fixing it then :-)

Yeah, it seems it doesn't work, that's odd. So, paraphrasing Oliver Hardy: "Here's another nice mess I've gotten myself into...". Emergency support underway = long post coming up (gotta make a section on some website with all these tips one day, or an eBook perhaps):

------------------------------------------
DianeV, the problem is not what you wrote above, as the request to threadwatch's server will always be sent - you can't get around that unless you block it before it reaches the server (at a router/firewall/load balancer/something).

So the request hits the TW server, which is okay. But, instead of redirecting, the server starts looking for the file. And of course, it's not there (!)

This means that the server does not act as it's told by the rewrite rule in the ".htaccess" file. Now, that's always an interesting situation, as the reason can be all kinds of things. So: Rule #1 is Don't Panic. This is about as annoying as those very large puzzles, except with .htaccess the pieces don't always have the shape they're supposed to have. We'll take this one step at a time:

------------------------------------------
------------------------------------------
1) Does the server read the .htaccess file at all?
- Do you have other rules or content in that file that are executed as they should be? Stupid question: there's a custom error page, so the file must be getting read, be in plain-text format and so on.

------------------------------------------
------------------------------------------
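2) Is mod_rewrite enabled, and allowed in .htaccess?
The module itself has to be loaded in the httpd.conf file (the server configuration file), and whether .htaccess files may use rewrite rules is controlled by the directive "AllowOverride" in that same configuration. Specifically, the setting must be: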

"AllowOverride All", or "AllowOverride FileInfo"

If you have any working rewrite rules already, this setting is as it's supposed to be.

------------------------------------------
------------------------------------------
3) Is the rewrite engine turned on?
If you have any working rewrite rules already, it will be turned on. If not, try putting this line in the .htaccess file just above the rewrite rules:

RewriteEngine on

------------------------------------------
------------------------------------------
4) Are there conflicts with other redirects?
The .htaccess file is not the only place you can make a redirect. So, even if you don't have other working rewrite rules in the .htaccess (but especially if you have), the file structure on the server could be entirely different from the URL structure.

If "/trackback/123" is getting caught by another rewrite rule first and translated to, e.g., "/some-script?section=trackback&page=123", then that is the URL that gives you the 404, not the "/trackback/" one.

So, either place the /trackback/ redirect above any other rules in your .htaccess file that match the same URL (i.e. make it the very first of your rewrite rules, immediately after "RewriteEngine on").

- or try this instead (RewriteRule only sees the path, never the query string, so the "section=trackback" part has to be checked with a RewriteCond):

----------------------
RewriteCond %{QUERY_STRING} (^|&)section=trackback(&|$)
RewriteRule ^some-script http://127.0.0.1 [R=301,L]
----------------------

- or indeed this one, which is very similar to the one you tried, but with one significant character ("^") missing:

----------------------
RewriteRule trackback http://127.0.0.1 [R=301,L]
----------------------

The one above will match "trackback" no matter where in the URL path it is. The character I removed is the "start of string" anchor, which means that "trackback" must be the very first thing after "tw.org/".
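To see the difference the "^" makes, and why a query string never takes part, here is a rough Python model of what a RewriteRule in the site-root .htaccess is matched against. Assumptions: per-directory context, where Apache strips the directory prefix (leading slash included) and leaves the query string out of the match; Python's re standing in for PCRE; the example URLs are made up:

```python
import re
from urllib.parse import urlsplit

def per_dir_path(url):
    """Roughly what a RewriteRule in the site-root .htaccess is matched against:
    the URL path with its leading slash removed and without the query string."""
    return urlsplit(url).path.lstrip("/")

def rule_matches(pattern, url):
    """True if a RewriteRule pattern (here a Python regex) would match this URL."""
    return re.search(pattern, per_dir_path(url)) is not None

# Hypothetical URLs, for illustration only.
urls = [
    "http://www.threadwatch.org/trackback/1046",
    "http://www.threadwatch.org/node/1046/trackback",
    "http://www.threadwatch.org/some-script?section=trackback&page=123",
]

for url in urls:
    print(per_dir_path(url), rule_matches(r"^trackback", url), rule_matches(r"trackback", url))
# trackback/1046 True True
# node/1046/trackback False True
# some-script False False
```

The last line is why a pattern like "^some-script?section=trackback" can never match: the "?section=..." part simply isn't in the string being tested.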

------------------------------------------
------------------------------------------
5) Are you using the right .htaccess file?
You can have .htaccess files in every folder of the site, and if "AllowOverride" is set they will all work, and sometimes conflict. Assuming that /trackback/ is a real folder (a filesystem folder) you can have an htaccess in that folder as well as in the root.

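To prevent conflicts, there are rules, and even a "hierarchy of .htaccess files". The deeper-placed files overrule the higher-placed ones; that is, the most specific one wins. So, if you have an .htaccess file in the /trackback/ folder with its own rewrite rules, those rules replace whatever you wrote in the root .htaccess file (but only for the /trackback/ folder itself and its subfolders).

For rewrite rules specifically, the override only kicks in if the deeper file contains mod_rewrite directives of its own: an .htaccess without any rewrite directives leaves the root rules in force, and the root rules are not merged in alongside the deeper ones unless you ask for that with "RewriteOptions Inherit".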

------------------------------------------
------------------------------------------
These are the things I think could probably be wrong. The rule should work on its own, just being on a single line like that, so it's only if there's some kind of missing link somewhere that things don't pan out as they should.

Hope some of this helps, otherwise we can probably think of something more. It's not the rule that's the problem, it must be something else :-)

RewriteEngine on #

RewriteEngine on

# Subdomain issues
RewriteCond %{HTTP_HOST} ^threadwatch.org
RewriteCond %{HTTP_HOST} !^intel.threadwatch.org
RewriteRule (.*) http://www.threadwatch.org/$1 [R=301,L]

# Modify the RewriteBase if you are using Drupal in a subdirectory and the
# rewrite rules are not working properly:
#RewriteBase /drupal

# Rewrite old-style URLS of the form 'node.php?id=x':
#RewriteCond %{REQUEST_FILENAME} !-f
#RewriteCond %{REQUEST_FILENAME} !-d
#RewriteCond %{QUERY_STRING} ^id=([^&]+)$
#RewriteRule node.php index.php?q=node/view/%1 [L]

# Rewrite old-style URLs of the form 'module.php?mod=x':
#RewriteCond %{REQUEST_FILENAME} !-f
#RewriteCond %{REQUEST_FILENAME} !-d
#RewriteCond %{QUERY_STRING} ^mod=([^&]+)$
#RewriteRule module.php index.php?q=%1 [L]

# Rewrite URLs of the form 'index.php?q=x':
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

# Send Trackback spammers home
# RewriteRule ^trackback http://127.0.0.1 [R=301,L]
RewriteRule trackback http://127.0.0.1 [R=301,L]


Conflicts are the only thing I can think of, claus. The rules in place are above; all your other points are covered, I *think*.

thanks a lot for the help!

try moving it

- most of your lines are commented out. I'll only post the non-commented stuff, so it's easier to read.

Try moving the TB-line to the top so that it's the first one that gets executed. I would do this anyway so that the other rules don't get checked for these requests. It's always good to throw away what you can as early as you can.

--------------------------------------------------
RewriteEngine on

# Send Trackback spammers home
RewriteRule trackback http://127.0.0.1 [R=301,L]

# Subdomain issues
RewriteCond %{HTTP_HOST} ^threadwatch.org
RewriteCond %{HTTP_HOST} !^intel.threadwatch.org
RewriteRule (.*) http://www.threadwatch.org/$1 [R=301,L]

# Rewrite URLs of the form 'index.php?q=x':
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
--------------------------------------------------

If it's the IP that's causing problems (it shouldn't) then try one of these:

--------------------------------------------------
RewriteRule trackback http://localhost/ [R=301,L]
--------------------------------------------------

- or this one:

--------------------------------------------------
RewriteRule trackback/(.*) http://www.google.com/search?q=$1 [R=301,L]
--------------------------------------------------

The last one will send a query to Google instead of localhost, so it's only for testing. Google does have good servers, but it's not nice directing others to somebody else's site like that, even if it's Google. So, if that one works, "localhost" or "127.0.0.1" should also work.

Ooops!

Sorry claus, somehow i'd missed that you replied!

I just moved it to the top, out of boredom at stupid o'clock in the morning (kitten woke me up) and it works great!

Anything with "trackback" in the url now gets redirected home, thanks :)

LOL

- and I was just going to ask you if it worked or not when I saw that you had replied :-)

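Glad to see it working. I think it was the [L] flag on your other rules that did it. That one means that "this is the Last rule for this request".

So, even though the rule for sending them home was okay, it never got its turn: the Drupal rule above it matched "trackback/1046" first, rewrote it to "index.php?q=trackback/1046", and its [L] flag told Apache to stop there. In .htaccess the rules do get run again on the rewritten request, but by then the path is just "index.php" (RewriteRule never looks at the query string), so "trackback" still doesn't match.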

It is okay to have the [L] flag on the trackback rule even when it's the first one, as after you've sent them home you don't need to spend more computing power on them (which was the exact point of it all anyway ;-).
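A toy Python model of why the order mattered. Assumptions: every rule carries [L]; in .htaccess context an internal rewrite makes Apache run the rules again on the new path; RewriteRule never sees the query string; EXISTING_FILES is a made-up stand-in for the !-f/!-d checks; the subdomain rule is left out since it doesn't match www requests anyway. It is a sketch of the behaviour described above, not Apache's actual code:

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for the !-f / !-d checks: the only real file in the docroot.
EXISTING_FILES = {"index.php"}

@dataclass
class Rule:
    pattern: str
    target: str
    redirect: bool = False         # like [R=301]: answer with a redirect, stop everything
    only_if_missing: bool = False  # like the !-f / !-d RewriteConds

TRACKBACK = Rule(r"trackback", "http://127.0.0.1", redirect=True)
DRUPAL = Rule(r"^(.*)$", r"index.php?q=\1", only_if_missing=True)

def run(rules, path, max_passes=10):
    """Toy model of .htaccess rewriting where every rule carries [L]:
    the first matching rule ends the pass, and an internal rewrite starts a
    fresh pass on the new path (the query string is not part of the match)."""
    for _ in range(max_passes):
        for rule in rules:
            if rule.only_if_missing and path in EXISTING_FILES:
                continue
            m = re.search(rule.pattern, path)
            if m is None:
                continue
            if rule.redirect:
                return "301 -> " + rule.target
            path = m.expand(rule.target).split("?", 1)[0]
            break  # [L]: stop this pass, then run the rules again on the new path
        else:
            return "serve " + path  # no rule matched on this pass
    raise RuntimeError("rewrite loop")

# Old order: the trackback URL ends up at index.php, i.e. Drupal's 404 page.
print(run([DRUPAL, TRACKBACK], "trackback/1046"))  # serve index.php
# New order: the trackback rule fires first.
print(run([TRACKBACK, DRUPAL], "trackback/1046"))  # 301 -> http://127.0.0.1
print(run([TRACKBACK, DRUPAL], "node/1046"))       # serve index.php
```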

only thing

>> The nastier the better...

It's not really nasty as such, as it depends on how nasty the TB spammers are themselves. If they send you 1,000 requests per second they will get that back, but if they send you one per day that's all they'll get back.

We could probably think of something that would be a bit nastier, such as forwarding their URL/IP to some script that did something creative with it, but that would not be good for the people who just happen to use trackback as a regular feature of their blog software (i.e. without spamming). So, I think it's a fair method as it is, and "nasty" will only require a little bit of extra effort should you ever decide it's necessary.

Hmm, this means that we can't...

...do a site search for trackback any longer :(

http://www.threadwatch.org/search/node/trackback

Maybe your rule is a bit harsh Nick


Do you think a lot of people would search for trackback, Wit?

>nasty

Nah, we can leave it as is, claus. I don't really want to waste any more of anybody's time on people too stupid to have their bots pull 404s out of the db...

Hehe...

...I did when you started nagging about it ;-)

LOL actually I suspected something like this, so I checked. Anyway: small price to pay.

PS: firefox told me "the connection was refused when trying to contact 127.0.0.1". Was that what you intended, or did you actually go for something nastier?


I think it is a great implementation but personally would prefer the redirect was to a 255.255.255.255 IP delegated domain name such as I mentioned above :)

rule too harsh

No problem with the site search. Either we can put the start-of-line character ("^") back in:

--------------------------------------
RewriteRule ^trackback http://127.0.0.1 [R=301,L]
--------------------------------------

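Or we can add a condition so that the rule doesn't apply to the search folder. The condition only matters for the version without "^" (with "^trackback" a search URL never matches anyway), so it goes with that one:

--------------------------------------
RewriteCond %{REQUEST_URI} !^/search/
RewriteRule trackback http://127.0.0.1 [R=301,L]
--------------------------------------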

The first one is easiest on the server, as there's no condition to check.
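One detail worth spelling out in the condition version: %{REQUEST_URI} keeps its leading slash, so the RewriteCond pattern is "^/search/", while the RewriteRule pattern in the root .htaccess is matched against the path without it. A rough Python model of the condition plus the unanchored rule (Python's re standing in for PCRE; the test URLs are made up and assumed to carry no query string):

```python
import re

def send_home(request_uri):
    """Rough model of:
        RewriteCond %{REQUEST_URI} !^/search/
        RewriteRule trackback http://127.0.0.1 [R=301,L]
    %{REQUEST_URI} keeps its leading slash, while the RewriteRule pattern in the
    root .htaccess is matched against the path without it."""
    if re.search(r"^/search/", request_uri):
        return False  # the negated condition fails, so the rule is skipped
    return re.search(r"trackback", request_uri.lstrip("/")) is not None

for uri in ["/trackback/1046", "/search/node/trackback", "/node/1046"]:
    print(uri, send_home(uri))
# /trackback/1046 True
# /search/node/trackback False
# /node/1046 False
```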


Well, I put the start-of-line character ("^") back in; that's probably best, but we'll see how it goes. I think that will sort it, but if they're testing different combinations, the more aggressive rule will have to go back.

Now, what's with this 255 IP instead of the 127 one?

What exactly does it do, and will there be additional overhead of any kind at this end?

Tech stuff

A special type of IP address is the limited broadcast address 255.255.255.255. A broadcast involves delivering a message from one sender to many recipients. Senders direct an IP broadcast to 255.255.255.255 to indicate all other nodes on the local network (LAN) should pick up that message. This broadcast is 'limited' in that it does not reach every node on the Internet, only nodes on the LAN.

Technically, IP reserves the entire range of addresses from 255.0.0.0 through 255.255.255.255 for broadcast, and this range should not be considered part of the normal Class E range.
http://compnetworking.about.com/od/workingwithipaddresses/l/aa042400b.htm

Most likely the server performing the trackback will not even know what to do with 255, but it will most definitely recognize localhost. So, as I see it, it's about sending the requests back home in their face (127) or /dev/null-ing them (255).

Overhead: Same, same...
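For anyone who wants to check the two addresses for themselves, Python's standard ipaddress module classifies them. This only shows what the addresses are, not how any particular spam bot will react to being redirected to them:

```python
import ipaddress

loopback = ipaddress.ip_address("127.0.0.1")
broadcast = ipaddress.ip_address("255.255.255.255")

print(loopback.is_loopback)   # True: 127.0.0.0/8 always means "this machine"
print(broadcast.is_reserved)  # True: it sits in the reserved 240.0.0.0/4 block
# The limited broadcast address is the top of the whole IPv4 space.
print(broadcast == ipaddress.ip_network("0.0.0.0/0").broadcast_address)  # True
```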
