A Standards Based Solution to Google Hijacking

12 comments
Source Title:
Standards Compliant Solution for 302 Web Page Hijacking
Story Text:

If you followed the recent hilarity over Google itself being Google Hijacked then you'll be aware that despite what Google have said about their persistent 302 bug, it's not just spammers that are affected. With that in mind, Mike Majorowicz has proposed a standards based solution to the problem. The solution uses the Content-Location header...

Unfortunately, it puts the onus on publishers and I'm not sure that's the right way to go. Yahoo have fixed the problem, so why can't Google just get the MF sorted?

Comments

Unfortunately, it puts the

Quote:
Unfortunately, it puts the onus on publishers and I'm not sure that's the right way to go

And therein lies both the problem and the reason I'd welcome it: I can easily change my content to include these headers / tags, but the downside is that I will be in the minority and will still be able to abuse others that don't.

Is it a fix? It could be, but in reality it will probably work about as well as rel=nofollow.

The problem is the answer is being given to the world with no responsibility being undertaken by the 1 party that could do it on their own.

Not going to happen

Right now G doesn't show any signs of wanting to fix this, it is not negatively affecting ad sales or share price, it's a PR non-event outside of webmasters .. what's in it for them?

Even this 'fix' will not work, and the reason is Google: there is no incentive for Google to recognise the Content-Location information.

Perhaps if WSJ or BBC was hijacked they would do something, right now it is probably way low on their priority list otherwise they would have fixed it already.

How can we do something to get the issue noticed? How did the bloggers get no-follow to happen (however flawed)? Is there a consensus that G has a problem or did GG's anti-tw spin on slashdot muddy the waters enough to doubt it exists? (I know at least once I was told that Google has fixed it already)..

Controlled Chaos

The "bug" offers a rotation feature for money serps, can't be all bad.

according to one of the WMW

According to one of the WMW threads, the Adsense thing last week was a totally unrelated bug caused by recent mods, which were rolled back when they found they were causing problems. That was about P7 of the questions for GG thread, I think.

RFC 2616 is already a standard

A new standard is not needed. Just interpret the last page in the redirect chain that is not itself a 301/302 or meta-refresh as being the real content page and credit that content page in the serp. Everything leading up to it is just breadcrumbs.

In other words just *unconditionally* attach the content presented in the serp and the url where it was retrieved from.

The notion that a 302 is temporary is a strawman argument. It has a meaning which is widely abused in practise. Therefore, deal with it by checking back, but in the meantime treat it as nothing more than a signpost on the road to the destination.
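To make the "signpost" idea concrete, here is a minimal sketch of crediting the end of the redirect chain. It is only an illustration, not anything Google is known to do: the function name, the `fetched` table, and the example URLs are all made up, and the crawl results are simulated in a dictionary rather than fetched over the network.

```python
# Kinds of response treated as mere signposts (assumption: the crawler has
# already classified each fetched URL this way).
REDIRECT_KINDS = {"301", "302", "meta-refresh"}


def resolve_content_url(start_url, fetched, max_hops=10):
    """Follow redirect hops and return the URL that actually serves content.

    `fetched` is a hypothetical table: URL -> (kind, target URL or None).
    """
    url = start_url
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
        kind, target = fetched[url]
        if kind not in REDIRECT_KINDS:
            return url  # first non-redirect response: credit this URL
        url = target
    raise ValueError("too many redirects")


# Simulated crawl: a hijacking page 302s to the real page.
fetched = {
    "http://hijacker.example/go?id=1": ("302", "http://victim.example/page"),
    "http://victim.example/page": ("200", None),
}
print(resolve_content_url("http://hijacker.example/go?id=1", fetched))
```

The 302 could still be re-checked on a later crawl, as suggested above. The point is only that the listing is attached to the URL where the content was retrieved.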

nice but not needed

- it's nice to see that people take this seriously and try to think about new ways to solve the problem, but as plumsauce has already said above, using the destination URL in SERPs (temporary or not) would be in full compliance with current standards. So, that new method is not really needed.

It is in fact in full compliance with the RFC to do whatever you want, as long as you have a valid reason to do so. Valid for your particular purpose, that is - you don't have to convince anyone that it's not possible to solve your specific problem another way.

Also, try telling publishers that they "only need to use the Content-Location header" and I bet that 80% will not know what on earth you're talking about. It's more than difficult just teaching people how to make a 301 redirect (which on the Apache server is among the easiest things you can do).

However, the worst part about this suggestion is that publishers shouldn't be responsible for fixing Google's problems. No, we shouldn't use a special header, meta tag, language, or brand of tooth paste - this is a Google problem and Google should fix it.

Devil's Advocate

Do I think this is the best solution? No.

If Google implemented it, would people start using it? Yes.

claus said:

...using the destination URL in SERPs (temporary or not) would be in full compliance with current standards. So, that new method is not really needed.

It is in fact in full compliance with the RFC to do whatever you want, as long as you have a valid reason to do so.

claus - Google's not the only one blowin' smoke. I couldn't find the "do whatever you want" section in RFC2616. Perhaps you could point it out. :)

I think Google is using this quote from RFC2616 Sec 10.3.3 regarding the 302 status code definition to justify their inaction.

The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests.

I agree that it would be much simpler for everyone involved if Google bent the standards and used the URI at the end of the redirect chain in the Serps. But that doesn't seem to be happening.

As far as people figuring out how to use it, adding
<meta http-equiv="X-Content-Domain" content="http://yourdomain.com" />
to the <head> of all your web pages seems pretty simple to me.
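For what it's worth, here is a rough sketch of how a crawler might read that tag and compare it with the URL it is about to list. How a search engine would actually use the declaration is not spelled out in this thread, so the comparison rule, the function names, and the example URLs below are assumptions for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ContentDomainParser(HTMLParser):
    """Collects the first X-Content-Domain meta declaration, if any."""

    def __init__(self):
        super().__init__()
        self.content_domain = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.content_domain is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() == "x-content-domain":
            self.content_domain = attr.get("content")


def declared_host(html):
    """Return the host named in the meta tag, or None if it is absent."""
    parser = ContentDomainParser()
    parser.feed(html)
    if parser.content_domain is None:
        return None
    return urlsplit(parser.content_domain).hostname


def listing_matches_declaration(listing_url, html):
    """Assumed rule: a listing is acceptable if its host is the declared one."""
    host = declared_host(html)
    if host is None:
        return True  # no declaration, nothing to check against
    return urlsplit(listing_url).hostname == host


page = '<html><head><meta http-equiv="X-Content-Domain" content="http://yourdomain.com" /></head></html>'
print(listing_matches_declaration("http://yourdomain.com/page.html", page))
print(listing_matches_declaration("http://hijacker.example/go?id=1", page))
```

A page with no declaration passes the check unchanged, which is exactly the weakness raised earlier in the thread: publishers who don't add the tag get no protection.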

I don't honestly expect Google will implement this, but at least now they can't claim standards compliance as a reason for avoiding the problem.

but it's not a standard

I think Claus knows that I have beaten this drum to death elsewhere :)

Now, as far as:

If Google implemented it, would people start using it? Yes.

Whether google implemented it or not, it is NOT a standard. It would be a google specific workaround. Just like all the google specific suggestions for robots.txt and meta robots. Oh, and link rel= as well.

They should fix it themselves.

As for:

from RFC2616 Sec 10.3.3 regarding ...

First, it is "SHOULD". Second, they are not a "client". Third, once the spider has left the page and the serp is being prepared, they have left the realm of the protocol and hence any restriction implied by the protocol standard. They are reporting the current location of an interesting page. Not how they found it.

If I navigate to a page, and I save it to my disk, the protocol does not apply to my future use of the file.

My usual close to this topic:

IT'S NOT ROCKET SCIENCE!

But, today's bon mot:


rank
adj. rank·er, rank·est
...
2. Yielding a profuse, often excessive crop; highly fertile: rank earth.

3. Strong and offensive in odor or flavor.
Conspicuously offensive: rank treachery. See Synonyms at flagrant.
Absolute; complete: a rank amateur; a rank stranger.

Suggested Status Code

The suggested status code for an official Temporary Redirect is a 307. I've been digging deep into this stuff lately and also have a topic going at WebmasterWorld.

302 vs 307

I've never used a 302 intentionally. The official Reason Phrase for a 302 is Found. Not Temporary Redirect.
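The reason phrases are easy to check for yourself. As a small illustration, Python's standard library carries the registered phrases in `http.HTTPStatus`; the helper name `reason_phrase` is just made up for this example.

```python
from http import HTTPStatus


def reason_phrase(code):
    """Look up the standard reason phrase for a numeric status code."""
    return HTTPStatus(code).phrase


for code in (301, 302, 303, 307):
    print(code, reason_phrase(code))
```

This prints "Found" for 302 and "Temporary Redirect" for 307.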

302/307

Reading beyond the nonsensical "FOUND", you will find that a 302 and 307 are similar in the definition parts.

Since it is the numerical code that is definitive in the standard, it can be argued that they are the same. The text is treated with as much importance as a comment.

Then we have the following:

The status codes 303 and 307 have been added for servers that wish to make unambiguously clear which kind of reaction is expected of the client.

In that case, 307 is a replacement for 302, where 303 is also implemented.

Finally, they haven't fixed 301/302 yet. Why would you want to throw 303/307 at them? At the current rate of progress that would suck up all the PhDs output by Stanford for the next 10 years.

whatever you want - as long as...

Welcome to Threadwatch MagicBeanDip :-)

---

I couldn't find the "do whatever you want" section in RFC2616. Perhaps you could point it out. :)

You should not read my sentences in part, as you will most likely miss information which will be apparent if you read them in full. In this case the rest is very important:

(...) as long as you have a valid reason to do so

That said: Yes of course.

Note the capital-lettered word "SHOULD"? The exact meaning of this word is covered by RFC 2119 - Key words for use in RFCs to Indicate Requirement Levels, section 3. I'll quote it here in full text for convenience:

SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

I do believe that in this case Google has a "valid reason", but my opinion doesn't really matter anyway, as implied by the above quote. All that matters is that Google understands and carefully weighs the full implications.

Devils' Advocate Redux

I'll admit, that was rather abrasive for my first comment on TW :)

plumsauce - As far as "rank" goes, I'll assume you were talking about SERPs.

You can claim this is not a standard, but I don't think you can claim it's not standards compliant.

claus - I was not aware of RFC2119, thanks for pointing that out.

The operative quote is "carefully weighed before choosing a different course." It's Google's choice. Whether or not you or I agree with it is irrelevant.

In my opinion, this is not the best choice, but it is a better choice than the current state of affairs.
