Automated Content to Kill Search?


So the future is:

Collect huge amounts of free data automatically
Modify on the fly
Automatically build sites with revenue model and content fodder included
Automatically buy domain names
Automatically host
Automatically push some simple links into those sites

Wait and allow the spiders to eat away.

Go back to bed.

Now G/Y/MSN all know this is going on, so what will they do?

Hand check the big money terms and not care about the rest? That's not a solution: the average person now searches for nearly 4 word terms, so for most searches your index looks like rubbish, and hand checking is not possible.

How can an algo stop sites appearing in the index, when the content on the page is almost perfect, and it behaves and looks like a real site?
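
One partial answer on the detection side is near-duplicate matching. The sketch below uses word shingles and Jaccard similarity, a common textbook technique; it is an illustration only, not a description of what G/Y/MSN actually run, and the function names and the 3-word window are assumptions made for the example.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(doc_a, doc_b, k=3):
    """Shingle-based similarity between two documents, from 0.0 to 1.0."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k))


if __name__ == "__main__":
    original = "the quick brown fox jumps"
    rewritten = "the quick brown fox leaps"
    print(similarity(original, rewritten))
```

A page that scores high against an already indexed source is a candidate for filtering. The catch raised in this thread still stands: content that is rewritten heavily, or assembled from many sources, will score low against any single one of them.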



It's not *quite* that simple though....

I've got a lot of experience of working with scraper scripts, both using and not using RSS as a format for pulling the content.

Yeah, it's pretty easy to build a site *really* quickly that contains thousands of pages, or even a site that grows over time, but where it gets tricky is when you try rewriting that content.

There are all kinds of things you can do to mix the content up a bit, running phrases through a thesaurus - using summarisers to cut the original content down - and it *does* tend to work well.

The real issue I face with this kind of site is getting the content to actually make any sense.

As soon as a real user gets to the page, they'll usually know (I would have thought) that it's autogen content.

All they need to do is report it.

That (for me at least) is the biggest problem with this kind of tactic.

I want my users to bookmark my pages, not report them for junk content.

I think search has still got a while before spammy sites ruin it.

In the December 19th Strikepoint, DaveN complained about getting emails from so many people thinking his spam sites were legit :)

I have no answer really. :)

If you can get it that good, then maybe it will be a real issue in the future (Or now).

It will be a real issue very soon.

I am not just talking about RSS here, that can be crap. But there are so many sources of real data that can be used again and again.....

Public domain
Press releases
Public Articles

Plus autogen stuff

The list is endless...

OK, it takes some work, but you can create sites that look and feel real even under a hand check.

Obviously, once you know how to do it, you can just plug in lots more data sources. If the engines kill one type of site, just change the footprints and datafeeds and off you go building 1000s per day.

The argument I am now having is that I believe these sites actually have a purpose....but I would hate to have a mom and pop site trying to compete in a swamped market........

So PPC is the only way to go if you have real are buying from Google either on search or on the content those shares in PPC companies......assuming they can stop the fraud.


Not all Spam is useless

Taking an Amazon datafeed and organizing it into any useful manner is a blessing, not spam.

Amazon has taken customer profiling and sales ranking to such an extreme that often what I actually search for is at the bottom of the list because they know best what I'm more likely to buy.

You can't find shit there half the time so anyone that can just sort out their crap into a regular ecommerce format is aces in my books.
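
The kind of sorting described above can be sketched roughly as follows. This is a minimal illustration that assumes each feed record is a dict with hypothetical "category" and "title" keys; it does not reflect the actual layout of any Amazon feed.

```python
from collections import defaultdict


def group_by_category(products):
    """Group product records by category, with each group sorted by title.

    The 'category' and 'title' keys are assumptions for this example,
    not the schema of a real datafeed.
    """
    groups = defaultdict(list)
    for product in products:
        groups[product.get("category", "Uncategorized")].append(product)
    # Categories in alphabetical order, products alphabetical within each.
    return {
        category: sorted(items, key=lambda p: p["title"].lower())
        for category, items in sorted(groups.items())
    }


if __name__ == "__main__":
    feed = [
        {"title": "Zen Gardening", "category": "Books"},
        {"title": "USB Cable", "category": "Electronics"},
        {"title": "A Field Guide", "category": "Books"},
        {"title": "Mystery Item"},
    ]
    for category, items in group_by_category(feed).items():
        print(category, [p["title"] for p in items])
```

The point is a plain, predictable ordering, so a visitor finds what they searched for rather than what a ranking model thinks they will buy.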

the future :->

it is not the future, it is SOOOO yesterday's methods. look at the present, it is much more than that, and see the new stuff that is coming .....

Simply trying to improve the user experience, one technique a day ... :->


SOO yesterday

Broadprospect, I do agree.

What I am getting at is that one day soon, or maybe now, it will be impossible, and not only for a spider, to see the difference between a hand built site and an auto generated one.

Then what is the definition of a quality site?


