Danny Sullivan Rips into FirstMonday Monkeys

Source Title: Looking At Google Bombs, But Not Very Well

Story Text:

Why is it that the last thing anyone does when writing something about Search is ask someone who knows about Search? That's what Danny Sullivan is essentially asking when he tears into FirstMonday.org over a poorly written paper on "Google Bombing".

There are a bunch of fundamental errors in the paper according to Danny (no, I've not read it - that's what Danny's for heh..). He sums up with this:

This is from "peer-reviewed" First Monday? I spotted one major factual error and one major flaw to the analysis within a minute that apparently got past whatever the peer review committee is. Memo to First Monday. Get some search experts and search marketers to do some of your reviewing when running papers about search.

Another case of the "SEO Leper" syndrome perhaps? It does seem like the very last thought to enter anyone's head when writing about Search is to actually ask anyone whose job it is to know about Search....

Comments

Danny's criticisms are invalid

He writes:

Quote:
For one, "The coining of the phrase 'Google bomb' " didn't lead to the "mimetic diffusion of the rank influencing techniques...now referred to as search engine optimization."

Well, that isn't what the paper says. The specific passage actually reads:

Quote:
Indeed, the coining of the phrase "Google bomb" facilitated mimetic diffusion of the rank influencing techniques, something that is now referred to as search engine optimization (SEO) in the more mundane circles of commoditized Page Rank manipulation.

As far as it goes, this statement is fundamentally and factually correct. However, Danny didn't bother to point out that the paper consistently uses "Page Rank" incorrectly (from the very start).

Next, Danny writes:

Quote:
The problem is, Google only reports a slice of whatever links point at a particular page.

Sorry, Danny, but any random sampling should provide a good basis for a statistical analysis. In fact, Google's random sampling should be more trustworthy than any "independently" achieved random sampling. It's better for him to use what Google gives him than to construct his own base of links.

Hence, the analysis is not statistically skewed (also, comparing Google's reported linkage to MSN's reported linkage is misleading, as there is no indication that MSN duplicates Google's coverage -- and, in fact, recent studies claim that the search engines DON'T have as much overlap in coverage as previously believed).

First Monday doesn't need help from the SEO community if the SEO community isn't going to bother to get its ducks in a row, either.

The paper's analysis may be flawed for any number of reasons, but certainly not for the reasons Danny provides.

No, Danny's bang on here. He

No, Danny's bang on here. He doesn't say that G return a random sample, because they don't. They return a selective sample, which is not the same thing, and is not a valid basis for statistical analysis.

G have been knocking authority sites (the very things we were looking for - anyone remember -link: ? *sigh*) out of the returned results for a while, specifically to render the results unusable for statistical analysis.

Tall Troll, GOOGLE says they return a random sample

You don't have to believe Google -- many people don't -- but even Matt Cutts has stipulated that they return a RANDOM sample.

Source

So, Danny's criticism is just not justified.

Martinez

you're wrong

many people misuse the word random

it ain't random, it's selective

Smoke and mirrors...

There may well be a random element.... after they've stripped the authorities. You think Google employs at least 1 guy with a working knowledge of statistics perhaps?

Wrong smoke and mirrors, gentlemen....

Whether Google employs anyone knowledgeable in statistics is not the issue. Whether they claim (through one or more means) to be displaying random results in their backlink reports is. That is the claim, and it's silly to insist that I am "wrong" to say so when I clearly provide a source of information on the subject.

Andy, feel free to provide a reputable source of information that shows the Google backlink report is somehow "selective" (SEO forum rumor mills are not reputable sources of information).

Now, Googleguy made an interesting post last year in which he spoke of Google exporting only "M%" of links (apparently based on PageRank):

Quote:
Google doesn't return all backlinks in response to a link: command. In the ancient days, it was because there was a finite amount of storage space on the machines that served link: requests. So we only kept the backlinks for the top N pages. Later as we moved to a different indexing system, we kept backlinks for the top M% of pages. This was helpful for important pages, but it meant that Mom and Pop sites with lower PageRank wouldn't have as good a chance to see their backlinks.

At SES London, DaveN had a suggestion. He said: why don't you give all pages an equal chance of seeing backlinks? That's good for users, who will have a greater chance of seeing backlinks for a given page, and it's especially good for smaller websites--they'd have a chance to see backlinks. It seemed like a good idea, so we implemented it. In fact, in order to give each page a better chance of seeing backlinks (instead of just the top M% of pages), we doubled the amount of backlinks that Google exports to the outside world. So users now have access to twice as much link: data as before; it's just not all the top PageRank pages.

That would describe either "selective" or "random".

On the other hand, at one point, GoogleGuy did report that Google was considering DaveN's suggestion that random backlinks be reported:

Quote:
I believe that at the recent Search Engine Strategies conference in London, DaveN suggested to a Google rep that it might be better and more fair to show a random sample of backlinks instead of only higher PR links; that would allow site owners of smaller domains to see more of their backlinks, even if they don't have high PR links. I know the Google rep thought it was a really good idea that would help smaller sites that might not have high PR links, so they passed on DaveN's suggestion. If you see the link: command start to return a wider spectrum of backlinks instead of high PR links only, you can thank a WebmasterWorld member for the suggestion!

So, it would appear that Matt Cutts has confirmed that Google is now reporting random backlinks.

So, I'll stick with the "random" position until I see something credible (other than "collective SEO wisdom") that supports any other point of view on the matter.

As always, Google remains the authority on Google.

Do some research yourself

Bulk-load the Google and Yahoo backlink results into a MySQL table, then loop 1,000 times, fetching Yahoo backlinks with ORDER BY RAND() LIMIT NoOfGoogBL and comparing each draw against Google's data set. If you still don't believe that Google backlinks are selected randomly from a subset, move on to the next page with many inbound links and repeat.
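
For anyone curious but not SQL-minded, here is a minimal Python sketch of the same permutation-style idea, assuming you have already exported both backlink reports and can attach some measurable number (toolbar PR, domain age, whatever) to each linking URL. The function and the feature dict are illustrative only, not any real Google or Yahoo API.

Code:
import random

def randomness_pvalue(google_links, yahoo_links, feature, trials=1000):
    """Rough permutation-style test: does Google's link: report look like a
    random draw from the larger Yahoo report (used here as a stand-in for
    the full population of inbound links)?

    feature maps a linking URL to a number you can measure for it (toolbar
    PR, domain age in days, and so on). Returns an empirical two-sided
    p-value; a very small value suggests the Google subset is selective
    with respect to that feature rather than random.
    """
    population = [u for u in set(yahoo_links) if u in feature]
    google = [u for u in set(google_links) if u in feature]
    if len(google) < 10 or len(google) >= len(population):
        return None  # not enough overlap to say anything useful

    def mean(urls):
        return sum(feature[u] for u in urls) / len(urls)

    observed = mean(google)
    # Draw `trials` random subsets of the same size and see where the
    # observed mean falls within their distribution.
    draws = [mean(random.sample(population, len(google))) for _ in range(trials)]
    above = sum(1 for d in draws if d >= observed) / trials
    below = sum(1 for d in draws if d <= observed) / trials
    return min(1.0, 2 * min(above, below))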

it's silly to insist that I

Quote:
it's silly to insist that I am "wrong" to say so when I clearly provide a source of information on the subject.

It's not quite as silly as believing everything anyone tells you though eh?

"I wear a nuns habbit whilst riding my bike to the shops." - It could be true...

Silliness

Quote:
It's not quite as silly as believing everything anyone tells you though eh?

Ain't no one showed Matt to be wrong or misleading. Have you got any evidence showing that Matt is wrong?

I would be glad to take a look at it.

Sandbox? Pagejacking?

As always, Google remains the authority on Google.

won't even touch that one.

I'd hate to sound pedantic,

I'd hate to sound pedantic, but we don't actually seem to have a clear and definite statement from Google or Matt on the exact manner of backlink provision.

So far we have Stephen Spencer, and Matt Cutts quoting DaveN.

I'm not convinced most people will accept non-Google sources as speaking for Google policy.

"I wear a nuns habbit whilst

Quote:
"I wear a nuns habbit whilst riding my bike to the shops."

It's true! Matt Cutts said so in his blog. (Or maybe I dreamt that.)

Ain't no one showed Matt to be wrong or misleading

and the opposite?

It's Selective

I'm not going to give you all the reasons why it's "Selective" .. people always seem to quote the SES London quote .. Although they only ever read the bits that Google tell you and not all the bits.. they never mention the 3 types of Link that we also discussed and the real reason why I personally wanted it changed.. :)

But hey you can believe it's just a random sample if you want :)

DaveN

More smoke and mirrors, DaveN

Quote:
I'm not going to give you all the reasons why it's "Selective"....

I can't force the bloggers and forum reviewers to post all the details about what is said at the various SE conferences. But if you have some real specifics, then you SHOULD share them -- somewhere accessible, if not here.

It's not enough that fifty SEO forum regulars repeat the Holy Mantra of "It's Selective". That proves nothing except that people in the SEO world, like in so many other online communities, will line up and sing any false tune just because it sounds good to them.

the quote says not random....

Quote:
I believe that at the recent Search Engine Strategies conference in London, DaveN suggested to a Google rep that it might be better and more fair to show a random sample of backlinks instead of only higher PR links; that would allow site owners of smaller domains to see more of their backlinks, even if they don't have high PR links. I know the Google rep thought it was a really good idea that would help smaller sites that might not have high PR links, so they passed on DaveN's suggestion. If you see the link: command start to return a wider spectrum of backlinks instead of high PR links only, you can thank a WebmasterWorld member for the suggestion!

Yeah ok - Dave suggests a "random sample" and Google thought about that and decided to return "a wider spectrum ..... instead of high PR links only"

A wider spectrum is not random by definition, is it? Random could as easily return only high PR ones as a wide spectrum. So I would say that if that proves anything, it's that they considered Dave's idea and made some changes that they thought would improve things - but that it's not random.

Or maybe he just typed something without thinking it would be analysed word by word at some later date?

I second that

well

my money's on DaveN regardless heh

Michael makes a good case

I *don't* agree with Michael, but I find his argument to be sound. Maybe he will pester people to the point that they will offer evidence.

It would be illogical for G

It would be illogical for G to return a genuinely random sample for a link: search. The whole point about breaking it in the first place was to stop people hammering their servers with "non-productive" queries. How many AdWords clicks do you think come from link: searches?

If they allowed people to build up a statistically valid picture of backlink data over a large number of queries, they would in effect be inviting the data junkies to run a constant stream of spiders at them... that hardly reduces the load on their servers, which was the intent of the move.

Reply to Talltroll

Quote:
It would be illogical for G to return a genuinely random sample for a link: search. The whole point about breaking it in the first place was to stop people hammering their servers with "non-productive" queries. How many AdWords clicks do you think come from link: searches?

Given how many threads the SEO forums see every week where people complain about not seeing all their links in Google, I would say it's a reasonable guess that Google is still being hammered. Maybe their primary concern was that making it easy for the spammers wasn't helping anyone but the spammers?

Hm. Can't speak for Google, but that is more logical.

Quote:
If they allowed people to build up a statistically valid picture of backlink data over a large number of queries, they would in effect be inviting the data junkies to run a constant stream of spiders at them... that hardly reduces the load on their servers, which was the intent of the move.

You are incorrectly assuming that the random sampling has to be updated every time the query is run. Random results can be fixed in time and are still random.

Matt Cutts has said they update the backlinks database on an occasional, not continual, basis. So there is no value in running the same link query over and over again.

Besides which, you can form a pretty decent assessment of which links Google has found by searching for references to the URL. Since they claim they can parse those URLs out of standard text, you have to assume there may be non-link data in the results.

But you get a more accurate picture of what Google knows by querying Google than by querying any other search service.

Michael Martinez

But if you have some real specifics, then you SHOULD share them ..

LOL .. Why? DON'T you ever have off-the-record convos?

Show me one post from Chris_r, Jenstar, or ME about what we discussed with Sergey.. I bet you can't find one bit of earth-shattering news... BUT that doesn't mean there wasn't any :)

Researching a subject to share is one thing; researching search engines because you are an SEO is totally different.

DaveN

Good reply, DaveN

Quote:
Researching a subject to share is one thing, researching search engines because you are an SEO is totally different.

I agree with you, since I don't share everything I learn (right away) either.

Nonetheless, without offering any backup, your position is not credible.

But if the results are truly NOT random, it should be provable through any of several possible tests that anyone with the time and resources can run. For example, let's say they are all based on age. You grab the backlinks from a few hundred domains, look up their start dates in a WHOIS tool (or their cache dates), etc.

Or, if they are all based on PR, you grab your sets of backlinks and do a PR lookup through the API.

Or, if they are all based on being non-affiliated, you crunch out the IP addresses and ownership data.

If they are from a class of manually vetted domains (I doubt this given that pizza.com shows meesook.com and malektips.com among its whopping 7 backlinks today), then there is no external test I can think of that would be able to show that they are manually vetted.

All of these tests, and others which could be proposed, would be time-consuming. No, I'm not willing to do them. I don't have to.

But like I said, I would be happy to review anyone else's statistical review. It might be an interesting read.
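
To make the PR-based version of that test concrete, here is a rough sketch. The lookup_pr helper is hypothetical (toolbar PR was never an official API), so treat this as an illustration of how cheap the check would be to run, not a ready-made tool.

Code:
from collections import Counter

def pr_histogram(backlinks, lookup_pr):
    """Bucket a set of reported backlinks by PageRank.

    lookup_pr is a caller-supplied function (hypothetical here) returning an
    integer 0-10, or None when no value is available. If the link: report
    only kept the top-PR pages, the low buckets should be nearly empty; a
    genuinely random excerpt should show the usual long tail of PR 0-3 pages.
    """
    counts = Counter()
    for url in backlinks:
        pr = lookup_pr(url)  # may be slow or rate-limited in practice
        counts[pr if pr is not None else "unknown"] += 1
    return dict(sorted(counts.items(), key=lambda item: str(item[0])))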

Do it yourself

I would be happy to review anyone else's statistical review. It might be an interesting read.

If you're really interested, hand out that minispec to a programmer:

Bulk-load the Google and Yahoo backlink results into a MySQL table, then loop 1,000 times, fetching Yahoo backlinks with ORDER BY RAND() LIMIT NoOfGoogBL and comparing each draw against Google's data set. If you still don't believe that Google backlinks are selected randomly from a subset, move on to the next page with many inbound links and repeat.

This method is suitable to prove that Google backlinks are not a random excerpt, but it will not reveal the selection criteria.

actually

Nonetheless, without offering any backup, your position is not credible.

I find it to be pretty credible... he's DaveN

No one is credible on the basis of nothing

As much respect as I have for DaveN (and I don't by any means consider him to be among the TweedleDum/TweedleDee class of commentators who occasionally drop in here), if he feels constrained to withhold information he feels was provided in private confidence, he should not be claiming to have the correct answer.

Quite frankly, Andy, you said very succinctly "many people misuse the word random". Well, that cuts both ways. Substitute "selective" for random.

We could just as well say "Random means one thing to some, something else to others" and "selective means one thing to some, something else to others".

Maybe enough has been said on this topic for now. I cannot think of any reason to reply further, and if anyone wants to just goad me, I'll do my best to bite my tongue. My position is clear enough.

I do hope that whatever DaveN has heard comes to light through an acceptable channel in the future.

I cannot think of any reason

Quote:
I cannot think of any reason to reply further

yay!

Michael, I'm missing your

Michael, I'm missing your message here - you credit DaveN as an authoritative source on Google policy, then you indicate that because he does not divulge everything he communicates with Google he is therefore not a credible source.

It seems that the only thing you are consistent about is trying at every opportunity to denigrate the opinions of others, while trolling yourself as an authority on subjects you are not even clear about. It would be great if you could try and be a more constructive part of the discussion.

Okay, here we go again

Quote:
Brian Turner: ...you credit DaveN as an authoritative source on Google policy,...

I said I respect DaveN. I didn't say or imply he was an authority on anything. As for whether Google is only offering random samplings of backlinks or selective samplings, he is not a credible source of information because he offers nothing to back up what he says. He is not obligated to do so, and I don't care who accepts what he says at face value. I'm not going to ridicule people who believe everything DaveN says.

Quote:
Nick W:

SEO's (any SEO's / all SEO's) know nothing

Well, the most vocal SEOs here and elsewhere devote most of their posting time to ridiculing and belittling anyone they don't agree with, rather than posting information, citing sources, and backing up their points of view with logic, reason, and facts.

Quote:
I'm an authority

And putting words into other people's mouths.

Quote:
Everything that Google have ever said is true, to this present day.

Google has said a lot of stupid things in their time. But who has caught their public representatives in a lie regarding what they do and how they do it?

As long as all the SEO community can offer in response to that question is nothing, there seems no reason to believe that Google is lying about anything they do or how they do it.

As always, I invite everyone to share evidence to the contrary. The complete lack of such evidence, every time people imply Google are a bunch of liars, just reinforces the image that most people who spend their days posting ridicule and abuse in SEO discussion groups don't have any facts to back up what they say.

Your mileage may vary.

Quote:
It really doesn't make for a productive discussion.

The posts I just responded to don't make for a productive discussion. If you guys REALLY want to keep putting words into my mouth, I cannot stop you. Nor can I stop you, Nick, from editing or deleting my replies.

But if all you guys can do is snicker and tease when I ask for credible evidence to back up your positions, you have no reason to expect me to suddenly start agreeing with "conventional SEO wisdom".

Quote:
I'd like us to be able to talk about Search without the Michael vs the Word now please.

So would I, but you guys (and that includes you, Nick) keep taking this to a personal level. As long as you insist on making me the topic of discussion, it's never going to be otherwise. You as moderator DO allow the non-substantive ridiculing posts to stand.

Agreed. I'm afraid the only

Agreed.

I'm afraid the only messages I get, consistently, are:

  • SEO's (any SEO's / all SEO's) know nothing
  • I'm an authority
  • Everything that Google have ever said is true, to this present day.
  • Unless you can give me proof to the contrary

It really doesn't make for a productive discussion. I'd like us to be able to talk about Search without the Michael vs the Word now please.

>> Maybe their primary

>> Maybe their primary concern was that making it easy for the spammers wasn't helping anyone but the spammers?

Do you think that if they DID return useful results to a link: search, they'd be seeing 100k searches / day from some of the serious data junkies here? They can't stop people using it at all (without just killing it), but they can sure as hell cut off some obvious load sources...

>> You are incorrectly assuming that the random sampling has to be updated every time the query is run.

No, just poor wording on my part. Since conserving their computational resources / bandwidth, etc lies behind the breaking of link:, it makes perfect sense for them to calculate the link: results for pages once per index or whatever, and then note new requests for link: data, and process when they've got time, if at all.

Milliondollarhomepage is a prime example: given the recent blog buzz, you'd expect to see some link data for it now. G has nothing, Y have over 350 links recorded. Kudos to Yahoo for providing fresh results here.

>> form a pretty decent assessment of which links Google has found by searching for references to the URL

Surely that will pick out sites that merely talk about the target site, rather than explicitly link to it? There will be considerable crossover, but there are weaknesses too. You are still limited to 1000 returns, and you won't be able to see most image links.

For a site with a decent presence, the 1,000-result cap is the killer. How do you get useful info on a site's links when the "www.example.com" search returns 1.1 million results? You can only see a maximum of 0.1% of them, and not all will be links anyway, just references.

>> then there is no external test I can think of that would be able to show that they are manually vetted.

That's kind of the point. Since we can't see the set of total possible responses (all backlinks for a domain), and we don't know exactly what is used to determine what shows, the data returned is useless.

Empirical data shows that in no case I can recall have certain high-value backlinks that I know exist ever been returned. Several others have noticed a similar pattern: high-value backlinks that can be seen by manual inspection don't show in a link: result set. Ever.

We can't prove it conclusively, since the necessary data is unavailable to the outside world, and no G engineer would be silly enough to confirm / deny it, I suspect, but it is logical that they remove at least some authority sites from a backlink set before processing them for display. It would be simple to program, and highly effective. That fulfills G's normal criteria for implementation.

In reply to Talltroll

Having failed to bite my tongue, but trying to get back on topic...

Quote:
Do you think that if they DID return useful results to a link: search, they'd be seeing 100k searches / day from some of the serious data junkies here?

There is no doubt in my mind that Google would see an increase in autoquerying if people felt they could gain an advantage from it.

Quote:
Since conserving their computational resources / bandwidth, etc lies behind the breaking of link:, it makes perfect sense for them to calculate the link: results for pages once per index or whatever, and then note new requests for link: data, and process when they've got time, if at all.

What would be the purpose of noting new requests? They export link data on an occasional basis, apparently without regard for who wants it (and who, in fact, should want link data, except someone interested in search engine optimization?).

Quote:
>> form a pretty decent assessment of which links Google has found by searching for references to the URL

Surely that will pick out sites that merely talk about the target site, rather than explicitly link to it?

I did say that in the portion of text you omitted from your citation. However, you do get a more accurate report of what Google knows from Google by doing that than by looking at Yahoo!, MSN, or Ask.

Quote:
There will be considerable crossover, but there are weaknesses too. You are still limited to 1000 returns, and you won't be able to see most image links.

You can find more than 1,000 results by refining your queries. A lot of people do just that. Grab 1,000 results, change the query, grab another 1,000 results, change the query, etc. Tedious, but doable.

In any event, you can't see all your backlinks on Yahoo!, either.
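
For what it's worth, the refine-and-requery routine described above can be scripted in a few lines. The search helper below is hypothetical (no particular API or terms-of-service question is addressed here); the point is just that merging several capped result sets is straightforward.

Code:
def collect_references(base_query, refinements, search):
    """Merge several capped result sets into one larger picture.

    search is a caller-supplied function (hypothetical here) that returns up
    to roughly 1,000 result URLs for a query string. Each individual call is
    capped, but the union across refinements is not.
    """
    seen = set()
    for extra in [""] + list(refinements):
        query = (base_query + " " + extra).strip()
        seen.update(search(query))
    return seen

# Example (my_scraper is whatever hypothetical fetcher you trust):
# collect_references('"www.example.com"',
#                    ['site:.org', 'site:.edu', '-site:example.com'],
#                    search=my_scraper)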

Quote:
>> then there is no external test I can think of that would be able to show that they are manually vetted.

That's kind of the point. Since we can't see the set of total possible responses (all backlinks for a domain), and we don't know exactly what is used to determine what shows, the data returned is useless.

It's random. That makes it statistically valid for random sampling-based analysis.

Random sampling is widely used in many analytical situations. Just because SEOs think that Google's backlink feature is useless for them doesn't mean it's useless for everyone.

ON EDIT (because I hit Post comment too soon):

Quote:
We can't prove it conclusively, since the necessary data is unavailable to the outside world, and no G engineer would be silly enough to confirm / deny it, I suspect, but it is logical that they remove at least some authority sites from a backlink set before processing them for display. It would be simple to program, and highly effective. That fulfills G's normal criteria for implementation.

There is no logical reason to remove authoritative sites from a random sampling of data. There is no logical reason to filter anything out of a random sampling of data.

I can assure you that I see plenty of authoritative sites (including Yahoo!) listed in backlinks for some of my own sites. I don't use the backlink search on Google except when discussing it with other people in the SEO community, but I just checked now and I see both authoritative and expert backlinks as well as fluff backlinks.

If I run a check for Threadwatch.org, I get a similar mix.

So, seeing authoritative sites in the random samplings, I have to conclude that the SEO community assumption that authoritative sites are being excluded from the results is erroneous.

And if people want to say that only SOME of the authoritative sites are getting through, well, yeah, that is also true of fluff sites. That's because it's RANDOM.

But if all you guys can do

Quote:
But if all you guys can do is snicker and tease when I ask for credible evidence to back up your positions, you have no reason to expect me to suddenly start agreeing with "conventional SEO wisdom".

Michael, you're the rudest, most objectionable troll we've ever had in here. Period. Every conversation you get into becomes Michael vs World and I'm bored of it.

If you don't possess the social skills necessary to interact with your peers, don't.

A random sample of what?

Quote:
It's random. That makes it statistically valid for random sampling-based analysis.

Michael, you've been drawing inferences and stating them as fact for a long time. But this one is hard to understand.

Quote:
In fact, in order to give each page a better chance of seeing backlinks (instead of just the top M% of pages), we doubled the amount of backlinks that Google exports to the outside world. So users now have access to twice as much link: data as before; it's just not all the top PageRank pages.

So tell me Michael... did Google 'double the amount of backlinks' they show to the outside world by:

1) Completely dumping the backlink database, then rebuilding it with a completely random sampling of all links? (which would make the # of results from a link: search some fraction of the actual total)
or
2) Keeping the 'top M% of pages' and then adding a roughly equal number of randomly selected links?
3) Creating some other blend?

That quote you cited doesn't really say, does it? (note to the ultra-paranoid: it doesn't matter if GG is lying)

Charting Google's reported link counts against counts from Alexa, MSN, and Yahoo doesn't suggest that they've done #1 at all. It's slightly less like a scattergraph than it used to be (actual cases of Y! reporting fewer links now exist), but it sure doesn't look like Google is simply presenting a random sample of all links.
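
One rough way to make that comparison reproducible, keeping in mind the coverage caveat raised earlier in the thread: collect link counts for a sample of domains and look at the spread of the Google-to-other-engine ratio. The numbers in the example call below are made up purely for illustration.

Code:
def ratio_spread(counts):
    """counts: list of (google_count, other_engine_count) pairs, one pair per
    domain, gathered however you like (by hand is fine for a few dozen).

    If Google exported a fixed random fraction of all links, the ratio should
    be roughly stable across domains with large counts; a wildly varying
    ratio points toward a selective export instead.
    """
    ratios = [g / o for g, o in counts if o]
    if not ratios:
        return None
    return min(ratios), max(ratios)

# Made-up numbers, purely for illustration:
print(ratio_spread([(7, 350), (120, 5400), (2300, 41000)]))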

I won't address the question of whether Google is filtering out some sites (duplicates, authorities, link whore directories) because I just don't have enough information to take a guess.

I will leave you with a quote from Donald Rumsfeld:

Quote:
As we know, there are known knowns. There are things we know we know. We also know there are known unknowns. That is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know.

There are also things we think we know, but actually don't know, and won't let go because we would rather be "right" than correct.

Excepting Mrs. Pentherby

You know, Nick, maybe Michael has a way of getting under folks' skin at times, but well-spoken trolls occasionally serve a very constructive purpose. From H.H. Munro's "Excepting Mrs. Pentherby", about relationships at an extended country house party:

Whether the object of her attentions was thick-skinned or sensitive, quick-tempered or good-natured, Mrs. Pentherby managed to achieve the same effect. She exposed little weaknesses, she prodded sore places, she snubbed enthusiasms, she was generally right in a matter of argument, or, if wrong, she somehow contrived to make her adversary appear foolish and opinionated.

Sounds familiar, huh? But, as the host of the house party sums it up:

She's invaluable, she's my official quarreller...I introduced her into the house-party for the express purpose of concentrating the feuds and quarrelling that would otherwise have broken out in all directions among the womenkind.
