Google: No, really ours is the biggest, you'll just have to trust us on that ok?

A few of the headlines surrounding Google's announcement that their index is 3 times larger than the competition's are proclaiming the index size wars over. It hasn't stopped the childishness though, and apparently, we just have to trust Google when they tell us theirs is the largest.

So, the index size wars are over, eh? Hardly. Google may have taken the "8 billion pages served" from their homepage, but they are still claiming the index is 3 times larger than anyone else's, and took the opportunity to slap Yahoo! about over duplicate entries and index size inflation:

Search engines' published metrics for index size measurement vary greatly and are no longer easily comparable. Often, for instance, web crawlers retrieve duplicate entries for one page or links to documents that they haven't crawled, and whose content thus isn't in the index. At Google we believe the essential quality of an index isn't the total number of documents, but its comprehensiveness – which unique documents are in the index. So we don't count duplicate or uncrawled pages. According to our internal testing, our newly expanded search index is more than three times larger than that of any other search engine.

That's a direct attack on Yahoo! Maybe quite justified as well, but it's funny how on one hand they're getting all grown up and saying "enough is enough" with index size, whilst on the other still poking their tongues out and pulling faces at the competition.

Can we verify that Google's index is 3 times larger than Yahoo's? Of course we can't, you just have to really, really believe it's true ok?

John Battelle spoke to Marissa Mayer, and he had this to say:

I then asked Marissa if Google would be open to having a third party, agreed to by both sides, settle this in some reliable fashion. She said sure, but as she answered, I realized this will never happen. Both sides think they are right, and both sides will never divulge how they go about counting in the first place. So where are we left? Pretty much where we've been, only now, it's all about who you believe. So who's more comprehensive? Depends who you ask.....

You know, it seems to me that rather than the index size wars being over, we've just stepped up to a whole new level of petty mudslinging that should be well beneath your average 7-year-old, let alone grown-up search engineers and PR people.


mine is bigger

All that "mine is bigger" talk is totally misleading. Yahoo might say that they have 20 billion pages and Google might now say they have 60 billion, but really - none of that matters much, as

1) they can say whatever they want, and
2) none of them even know how large a fraction of the total amount they have.

My guess; they've got a visible fraction, but not a very large one. And as their tech evolves they will both find it hard to get past some specific amount of pages. Perhaps they're already there and just throw around these large numbers because neither of them has got a clue.

I'd rather have an engine that had a limited number of pages, but where you could easily find something you were interested in, and something that helped you with your query.

Imagine: an engine that doesn't just read anything and everything, but one where it's actually hard to get in. I'd use it.

I hate to say this ...

but Google's is bigger at the moment; we detected it on 8 September ...



Hey Dave, you can't get past the 1,000 limit anyway, and their estimated result counts are just that - estimated. Sometimes they're totally off the mark, afaik. So, are you sure you did not just detect more of the pages you already knew about? That's a somewhat different issue. Could be your pages just got easier to spider, or that they got more exposure to Googlebot, or something.

Anyway, here's some commentary from Google

Friends again?

Apparently they have kissed and made up.
"We're celebrating our seventh birthday.... We had a pretty strong year," Google Chief Executive Eric Schmidt said in a phone interview with CNET

Original Google hissy fit link:

yeah, thanks hooter, and welcome to TW

I saw Danny talk about that also. Odd thing is, it was actually their birthday a few weeks back heh..

thanks for TW Nick

Just so you know, TW is one of my shortlist "first cup-of-coffee/first cigarette of the day" stops on the network.

Smaller might be better

I think at this point in time, smaller might actually be better.

I mean, who cares who has more scraped pages, auto generated spam, duplicate content with session ID URLs, etc.?

Give me the one that filters out all that shit any day!

I'm with Jill on this one!

Give me the one that filters out all the shit!!!! It's like wading through a swamp with no f*****g legs trying to find info when you have to resort to using a search engine. They are all crap, just different levels of crap.

As for size of the index, bollocks, that's for the media and prats who actually think index size matters!

Quality, quality, quality...

I swear they set these lines up deliberately.

oh really you can't make this stuff up can you?

Battelle points to NYT who say

Google's chief executive, Eric E. Schmidt, said the company would remove the current number from its home page ("Searching 8,168,684,336 Web pages," it said yesterday) and instead ask users to guess the size of the new index.

Moreover, in typical offbeat Google style, there will be no announced prize for the best guess, although Mr. Schmidt did not rule out the possibility that one would be awarded.

"We're suggesting that users do a little taste test," he said in a telephone interview.

Actually I reckon it will be less than 20 because

"We have a very specific way of counting that we believe is very accurate," he said.

and unless either Larry or Sergey is one of those strange people with six toes on each foot they're gonna start losing counting accuracy if it gets too high.....

Whole Thing is Funny

Neither is close to what they tout. Yahoo! has a lot of stuff indexed that has no value to a searcher, like CSS files.

Google doesn't display results from sites made in the past year, so their numbers are skewed greatly as well.

>> Yahoo! has a lot of stuff indexed that has no value to a searcher like CSS files.

Ever played with filetype: in Google? That's an education...

I do like the fact that, just a month ago, Google were accusing Yahoo!'s new figure of being inflated & false, and they've now turned around and said "yeah, ours is not only three times bigger, it's more accurate!" ... does not compute.

It's like watching a couple of 4yr olds debate whose dad is the strongest...

