Is It a Bird? No, It's 'Strider Search Defender'

10 comments

I envisage it, for some perverted reason, as being male, Australian, slightly overweight and in a badly-fitting spandex Superhero costume.

But it's not, it's the end of Spam Pages On The Web. It'll zap 'em before they get indexed. It says here.

'Microsoft Seeks to Stop Search Spam'

The effort is being headed up by researcher Yi-Min Wang and focuses on a major problem now plaguing the Web: blog spam. The basic premise of Strider Search Defender is that spammers utilize what Yi-Min calls "doorway pages" -- sites at reputable hosts and blog services. The doorway pages pull ads from a "target page" operated by the spammer.

Instead of reading the actual content of a page to see if it could be classified as spam, Microsoft is taking a context-based approach that analyzes URL redirection. Because many Web sites will use redirection to serve up different pages to search engines and humans, this methodology could prove more effective.
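To make the redirection idea concrete, here is a minimal sketch. It is not Microsoft's implementation. It assumes a crawler has already recorded where each page finally lands, then groups pages by landing host and flags hosts fed by several unrelated doorway hosts. The URLs, the `REDIRECTS` table, and the `min_doorways` threshold are all made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical crawl results: doorway URL -> URL the browser finally lands on.
REDIRECTS = {
    "http://blog-a.example.com/cheap-pills": "http://ads.spamtarget.example/serve?id=1",
    "http://blog-b.example.com/free-ringtones": "http://ads.spamtarget.example/serve?id=2",
    "http://blog-c.example.com/casino": "http://ads.spamtarget.example/serve?id=3",
    "http://honest.example.org/old-page": "http://honest.example.org/new-page",
}


def group_by_target(redirects):
    """Map each redirect target host to the set of other hosts that redirect to it."""
    doorways = defaultdict(set)
    for source, target in redirects.items():
        source_host = urlparse(source).hostname
        target_host = urlparse(target).hostname
        if source_host != target_host:  # same-host redirects are ordinary housekeeping
            doorways[target_host].add(source_host)
    return doorways


def suspicious_targets(redirects, min_doorways=3):
    """Return target hosts fed by at least min_doorways distinct doorway hosts."""
    return sorted(
        host
        for host, sources in group_by_target(redirects).items()
        if len(sources) >= min_doorways
    )


print(suspicious_targets(REDIRECTS))
```

The point of the sketch is that nothing in it reads page content: only the redirect relationships are examined.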

"Yi-Min has been working closely with the MSN Search team to share the results of his spam Web page research," a Microsoft representative told BetaNews. "The Search team has been actively pursuing his leads, and if they are indeed spam pages, they will be either removed from the search index or assigned a low relevance ranking."

None of the dark-hearted commenters seem overly worried.

>Edited to add Microsoft link

Comments

male, Australian, slightly

Quote:
male, Australian, slightly overweight...

Hmmm could be me, could be Woz...

Quote:
...and in a badly-fitting spandex Superhero costume

NO NO not that!!

Hrmph!

It could be any one of us ;)

not me

not me, I consider myself more than slightly overweight.
In fact you don't get many internet types who are only slightly overweight.

maybe that is his super power, being able to sit in front of a computer for 12 hours a day and not put on weight.

they didn't get it at WMW

Here's the bit I like:

Similarly, advertisement syndicators can detect potential spammers by monitoring those customers who serve ads on a huge number of different URLs through a single account because it is highly unlikely that anyone can generate quality content at that scale.

All the syndicators, including AdSense, already have this data. It's in the accounting system. A single SQL query run once a day should do the trick.


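-- Accounts ranked by how many sites serve ads through them (T-SQL dialect)
SELECT TOP 1000
    COUNT(*) AS SITECOUNT,
    ACCOUNTS.NAME
FROM ACCOUNTS
JOIN SITES ON SITES.ACCOUNTID = ACCOUNTS.ID
-- GROUP BY must come before ORDER BY
GROUP BY ACCOUNTS.NAME
ORDER BY SITECOUNT DESC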

On the other hand, the research article is sort of a "Guide to Spamming Google" for the uninitiated. Comme moi.

Hey Plumsauce - are you

Hey Plumsauce - are you trying to derail this thread with intelligent observations?

The rest of us are trying to determine the true identity of Mat's overweight, Australian, spandex-suit-clad, spam-fighting superhero.

:)

advertisement syndicators

advertisement syndicators can detect potential spammers by monitoring those customers who serve ads on a huge number of different URLs through a single account because it is highly unlikely that anyone can generate quality content at that scale.

what stops me from manufacturing similarities between a competitor's good site and spam sites generated exclusively to destroy its rankings?

seems like search engines are coming up with many different ways to add value to various types of web pollution. seems silly for them to share the specifics though, doesn't it?

nothing, but ...

However, the point is that there is at least a suspected relationship between getting paid for publishing ads on spam sites and the amount of spam growth.

In other words, the old rule of "follow the money" applies here.

The pay-per-click syndicators could stop a lot of the type of sites known generically as "made for AdSense" by simply monitoring the relationship between the sites serving the ads and the publisher payment account. The key phrase being "through a single account because it is highly unlikely that anyone can generate quality content at that scale."

The SQL code I posted hints that this is needed only for the top N accounts by site count, N being an arbitrary threshold chosen to attain the desired degree of enforcement.
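As a rough, runnable illustration of the top-N idea, here is the same kind of query run against SQLite from Python. This is a sketch under assumptions: the posted query uses TOP (T-SQL), while SQLite uses LIMIT; the table layout is guessed from the names ACCOUNTS and SITES; the rows, the `top_accounts` helper, and the `min_sites` cutoff are invented for the example.

```python
import sqlite3

# Toy schema modeled on the ACCOUNTS and SITES tables named in the query;
# the columns and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ACCOUNTS (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE SITES (ID INTEGER PRIMARY KEY, ACCOUNTID INTEGER, URL TEXT);
""")
conn.executemany(
    "INSERT INTO ACCOUNTS VALUES (?, ?)",
    [(1, "hobbyist"), (2, "small-network"), (3, "spam-farm")],
)
sites = [(1, "hobby.example")]
sites += [(2, f"net{i}.example") for i in range(5)]
sites += [(3, f"farm{i}.example") for i in range(500)]
conn.executemany("INSERT INTO SITES (ACCOUNTID, URL) VALUES (?, ?)", sites)


def top_accounts(conn, n, min_sites=1):
    """Return up to n (name, site_count) rows, largest first, with at least min_sites sites."""
    return conn.execute(
        """
        SELECT ACCOUNTS.NAME, COUNT(*) AS SITECOUNT
        FROM ACCOUNTS
        JOIN SITES ON SITES.ACCOUNTID = ACCOUNTS.ID
        GROUP BY ACCOUNTS.ID, ACCOUNTS.NAME
        HAVING COUNT(*) >= ?
        ORDER BY SITECOUNT DESC
        LIMIT ?
        """,
        (min_sites, n),
    ).fetchall()


print(top_accounts(conn, n=2))
print(top_accounts(conn, n=1000, min_sites=100))
```

Either knob sets the degree of enforcement: `n` caps how many accounts get reviewed, and `min_sites` sets how many sites an account needs before it shows up at all.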

As for destroying reputations using spam, that is beyond the scope of my original post.

Clearly, in this specific case, publishing the methodology has little effect on its effectiveness. It tells the spammer, if you don't want to get squeezed out, then it is necessary to have a limit on the number of publishing domains associated with any one publisher account.

This then forces the spammer to set up as many publisher accounts as are needed to stay under the radar. In the case of the top 3 syndicators, I believe that it is one account per entity.

Therefore, one account per individual. Then, additional beneficiary accounts need to be set up using legal entities such as corporations, partnerships, limited liability companies, etc.

This in turn forces the use of multiple bank accounts for cashing the proceeds because banks are reluctant to permit third party endorsement of checks on a business account.

So, the costs per group of publishing domains become, say, at least $50 to set up a shell, plus the time to set up a bank account, plus the suspicion raised at the bank about someone constantly setting up new accounts.

By increasing the logistics cost beyond acquiring or borrowing a domain name, the day of the one-shot, one-day, domain-parked spamvertising site would be effectively over.

The short version: nuke 'em by making it unprofitable.

Any of the top 3 can accomplish this by using the logic in the SQL query I suggested. If they wanted to.

They could also contract me to rewrite the SQL in whatever dialect they might need :)

no kidding

Quote:
All the syndicators, including Adsense already have this data.

Seems odd to me that you write SQL queries for a media outlet to use on their own data, to make the world a better place. If they wanted to eliminate MFA sites, they would eliminate MFA sites. Instead, they encourage them.

No SQL is going to change that.

quite true

But, publishing the SQL and pointing out that the data is available and under the syndicators' control cripples the possibility of a spin response that it is technically infeasible.

Anyways, as you point out, they could stop it if they wanted to. So, knowing this, the moaners and groaners over at WMW can just suck it up and get on with the job :)

Let Marissa Mayer work for her money.

And, it was first heard here at TW!

Of course, it was the simplest form of the SQL query needed to illustrate the purpose. There were other more useful forms of the query yielding additional information, but they involved a subquery that I didn't want to think about. At least, not for free :)
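The subquery version was never posted, so the following is only a guess at the sort of thing meant: list every site belonging to an account that crosses a site-count threshold, so a reviewer sees the URLs and not just the totals. It runs on SQLite from Python; the schema, the rows, the `sites_of_flagged_accounts` helper, and the `min_sites` parameter are all assumptions made for the example.

```python
import sqlite3

# Toy data; table and column names follow the posted query, the rest is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ACCOUNTS (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE SITES (ID INTEGER PRIMARY KEY, ACCOUNTID INTEGER, URL TEXT);
    INSERT INTO ACCOUNTS VALUES (1, 'hobbyist'), (2, 'spam-farm');
    INSERT INTO SITES (ACCOUNTID, URL) VALUES
        (1, 'hobby.example'),
        (2, 'farm-a.example'), (2, 'farm-b.example'), (2, 'farm-c.example');
""")


def sites_of_flagged_accounts(conn, min_sites):
    """Return (account_name, url) rows for accounts with at least min_sites sites."""
    return conn.execute(
        """
        SELECT ACCOUNTS.NAME, SITES.URL
        FROM SITES
        JOIN ACCOUNTS ON ACCOUNTS.ID = SITES.ACCOUNTID
        WHERE SITES.ACCOUNTID IN (
            -- the subquery: accounts whose site count crosses the threshold
            SELECT ACCOUNTID
            FROM SITES
            GROUP BY ACCOUNTID
            HAVING COUNT(*) >= ?
        )
        ORDER BY ACCOUNTS.NAME, SITES.URL
        """,
        (min_sites,),
    ).fetchall()


for name, url in sites_of_flagged_accounts(conn, min_sites=3):
    print(name, url)
```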
