Do Search Engines have the Right to Copy your Content?

Thread Title:
has Google got the right to display my website?
Thread Description:

This is always a topic of some contention: does an SE have the legal right to copy, cache, or just list links to a website? This WPW thread is fraught with silly replies that totally miss the OP's point, but amongst the garbage there are a few sensible comments on the legalities of SEs and copyright.

thecat starts out with what, to me, is a pretty simple question:

Why has Google or any other SE got the right to display my website on its search engine? Does it own the web?

If I took someone else's content, title, and keywords and put the information on my site, I'd be in trouble, so why can Google or other SEs?

This is followed by a truckload of nonsense posts along the lines of "robots.txt", "because they can", and "it's just how it works!". It's not until quite late in the thread that PikoTech finally breaks the stranglehold of silliness and adds this to the fray:

Google Labs actually has on its terms and conditions page that "You also agree that you will not use any robot, spider, other automated device, or manual process to monitor or copy any content from the Site." (http://labs.google.com/labsterms.html)

Yet although they do have a robots.txt file for labs.google.com, Yahoo! does have the Google Labs website in its index. And I doubt very much that they wrote to Google to get written permission to add it to their index.

Does that mean Yahoo! should be prosecuted by Google?
Also, does that mean Yahoo! ignores robots.txt?
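
For reference, the robots.txt mechanism the thread keeps circling back to is purely advisory. A minimal sketch of a file that asks every compliant crawler to stay out of an entire site (illustrative only, not the actual labs.google.com file):

    User-agent: *
    Disallow: /

A crawler that honours the Robots Exclusion Protocol fetches this file before anything else and obeys it; one that doesn't can simply ignore it, which is exactly the gray area PikoTech is pointing at.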

After minstrel says point blank that SEs have the right to copy content, the OP (thecat) asks, "So if they have the right, do we?" and oh my, I just can't bring myself to repeat what follows, lol! Go check it out; it's a very, very funny thread.

Getting Down to Business

As far as I can tell, publishing links and snippets as descriptions would all be covered under "fair use"; this has been stated many times by many people. It's when you start talking about anyone displaying your entire web page (as in an SE cache) that the boundaries of fair use start to blur, and we get to the real debate.

Do SEs have the Legal Right to Cache/Copy your Content?

Comments

Ready to find out?

Why do you think many partners outside the US do NOT use the cache links?

Both Opasia and Yahoo! in Denmark (when they were still here) decided not to show the cached links. I know that one of the reasons is that they did not feel it was legal in Denmark (without drawing any conclusions about other countries). I agree; I don't think the caching of pages is legal (in the countries I know of). I do think it's legal to show links, snippets, and titles in a search result, but not entire copies of my website. But even linking has been questioned.

In the Newsbooster case in Denmark, the chief editor of a major newspaper said: "We lose money on our website, so any link to us and any visitor you send us is like stealing from us!" - and the judges accepted that explanation! Go figure!

The one thing that could call all this into question (especially the caching question) is "what's in the public's interest". Sometimes law is strange. A case could show that the public interest in having the cached version is more important than the intellectual property rights of the content owner, but I very much doubt that. Only a court case (or in fact, a case for every single country!) would show that, and we have yet to see them come. ... I think they will.

If there are any large brands out there that want to question this in court, I have the right forensic team ready for you :)

Archive.org

For that matter, does Archive.org have the same right? Both seem to go far beyond the bounds of fair use. Not that I'm complaining; the caches often help me (I'm in China).

I still come across websites that demand you get written permission before hyperlinking to them!

(This is bullshit, right? It's kinda like demanding, "You need written permission to even quote the title and publisher of my printed article.")

Linking

Yeah, that's clearly rubbish, notredamekid. It's a symptomatic knee-jerk reaction from people who just don't understand the point of the web. One wonders why they even have a website if they don't want people to link to them, lol...

Quote:
In the Newsbooster case in Denmark, the chief editor of a major newspaper said: "We lose money on our website, so any link to us and any visitor you send us is like stealing from us!" - and the judges accepted that explanation! Go figure!

I remember that case well, Mikkel. In some areas Denmark was particularly slow to appreciate what the web is all about, and the fact that they upheld that absurdity was quite horrifying.

Quote:
If there are any large brands out there that want to question this in court, I have the right forensic team ready for you :)

I'd love to see this tested in court, preferably in the US, and I agree, at some point it clearly will be.

From a webmaster's point of view, I don't have much of an issue with caching. If the SEs were to ask my permission I would grant it, but that's not the point, is it?

For me, it's a little worrying that it would be possible, if unlikely, for an SE to decide to use caches to produce mass content in a different form.

Imagine if Google or Yahoo! decided to use those caches to make an enormous portal of information on specific subjects. Much like a directory, but with the links all going to cached pages, with advertising embedded into the page alongside the cache?

A frightening thought, eh?

Like this?

>make an enormous portal of information

http://www.google.com/search?q=define%3A+portal

Clown...

You know what I mean, heh!

Don't know...

I really don't know whether it's legal or not and whether it would stand up in court, but I'm really, really glad that they are doing it now, as the cache is one of the best things going.

I would hate for it to be taken away.

Re: Clown...

What I mean is that the deal is very clear: the SEs can use my content within reason as long as they send visitors.

The define: search breaks that deal. I for one have never clicked through to a site from a define: search; why would you need to?

My ego would love...

...to have Google pull define: results from one of its (?) sites. Isn't that one of those "priceless" things?

unofficial contract

The issue isn't really with the big SEs - they at least give payback - but we talked about this at SG a while ago in relation to a spider that apparently belongs to a search engine in beta. Since no one can access the website, you can't see what they're actually doing with the content, and you don't get visitors.

The legal issue is the same, though; it's only a matter of what we're willing to accept and how much traffic we want to make the deal 'fair'.

Ah...

Quote:
The define: search breaks that deal. I for one have never clicked through to a site from a define: search; why would you need to?

I see your point and beg your pardon!

I have. I often use define: to find things; I tried it for YASN (yet another social network) the other day and came up blank, but I often click them when they come up with something, just in case they don't have the full snippet I need (or I think they might be omitting a bit).

But point taken: they are generally complete answers to your query, right? With no need to click through and "pay back" the website providing that info...

With regard to Gurtie's point, I think what NFFC is saying about define: puts a new light on it, doesn't it?

I'm still a Clown though..

>they are generally complete answers to your query, right?

Yes.

Drifting ever so slightly OT...

How do we think this would relate to a syndication technology such as Atom/RSS?

The webmaster offers a feed for all to use, right? Does he have any grounds to say, "Hey! You can't use my stuff!"? I think not, but it's not 100% clear, is it?
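
To make the question concrete: "offering a feed" means publishing a machine-readable file of your content that anyone can fetch. A minimal RSS 2.0 sketch, with example.com as a hypothetical site:

    <rss version="2.0">
      <channel>
        <title>Example Site</title>
        <link>http://example.com/</link>
        <description>Posts from a hypothetical site</description>
        <item>
          <title>A post title</title>
          <link>http://example.com/a-post</link>
          <description>The snippet (or full text) handed out for syndication.</description>
        </item>
      </channel>
    </rss>

Whether publishing such a file amounts to an implied license to republish its contents is exactly the open question here.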

not if you submit it

- if you create an RSS feed or submit your site to an SE, I'd say you can't complain. If your content gets picked up without invitation, that's different - especially if a spider ignores robots.txt.

Yup...

...especially webmasters with pages carrying {meta robots..."all"} or {..."index,follow"} shouldn't complain. If they use the robots meta tag but forget (?) to include "noarchive", then they only have themselves to blame for letting the SEs cache their stuff. The gray area only exists for pages without a robots meta tag or robots.txt.

P.S.: wow, I just used < > instead of { } and the post was suspended because of "suspicious" content. I feel like a criminal now.
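
For anyone following along, the robots meta directives that comment refers to go in a page's <head>. A minimal sketch (the noarchive value is the standard way to allow indexing while opting out of the cache; index,follow is already the default, so stating it grants nothing extra):

    <!-- default behaviour: index the page, follow its links, allow caching -->
    <meta name="robots" content="index,follow">

    <!-- index the page, but ask engines not to show a cached copy -->
    <meta name="robots" content="noarchive">

Google also documents a crawler-specific form, <meta name="googlebot" content="noarchive">, for opting out of its cache alone.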

Copyright or Copywrong?

This is one of my special interests. We had the beginnings of an interesting thread on the subject a while ago at WmW; unfortunately, it fizzled. Basically, the SEs are working in reverse to copyright law and often cross the line, IMO. My prediction is that one day there will be a major court action that will shake up the SE business big time.

Onya
Woz

Agree

I think you're right, there, Woz.
