The Greatest Test of Open Source: Beating Google
Steve Mallet has an interesting, if slightly out-there, post over at the O'Reilly dev blogs today.

He speculates that Google's greatest threat, or Open Source's greatest challenge, could come from a company or consortium that sets up shop with Nutch, the open source spider software that most of us will have seen in our logs at least once or twice in recent years.

Enter Nutch. Nutch is an open source search engine crawler, indexer, etc. The project appears to have been a bit dormant since its first media splash a few years ago, but has just recently become incubated with the Apache Software Foundation.

As I write this I have Nutch crawling a few sites just to test it out on my own. It's the fifth of my tests. I'm increasing the search depth, and playing with a few of its knobs & buttons. The first few tests worked, but weren't terribly compelling. Not that the Nutch site doesn't give you the straight goods upfront. Their site says, "Nutch has not yet been tuned for quality. There are ten or twenty knobs that we can twiddle to adjust the ranking formula. We are developing software to do this tuning automatically, but the current code just contains guesses. With a little tuning we should be able to get results that are competitive with those of major search engines."

Attract some more developers and I bet this happens sooner than later.

I've not played with Nutch myself, but I understand it's easy enough to set up and get running, so I may have to see if there's a Gentoo ebuild for it heh...

Google came under heavy fire recently for not giving back to the open source community when it owes so much of its success to it.


Open Sourcing the Data

Douwe Osinga posted thoughts along these lines in "Why we should leave Google behind" a good while ago (Sept '03). Essentially he envisages a separation of data and algorithms: the data would be open source, with anyone free to develop their own algorithms to manipulate that data.
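
To make the idea a bit more concrete, here's a rough sketch in Python of what "open data, your own algorithm" could look like. This is purely my own illustration, not anything Osinga or Nutch actually ships: the OPEN_CRAWL data, the example URLs and both ranking functions are made up. The point is only that two people can take the same shared crawl data and rank it in completely different ways.

```python
# Hypothetical shared crawl data: url -> (page text, outgoing links).
# Everything here is invented for illustration.
OPEN_CRAWL = {
    "a.example": ("open source search engine", ["b.example", "c.example"]),
    "b.example": ("search engine ranking knobs", ["c.example"]),
    "c.example": ("gentoo ebuild notes", []),
}


def rank_by_term_count(data, query):
    """One person's algorithm: order pages by how often the query terms appear."""
    terms = query.lower().split()
    scores = {}
    for url, (text, _links) in data.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score:
            scores[url] = score
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


def rank_by_inlinks(data, query):
    """Someone else's algorithm: same matching pages, ordered by inbound links."""
    terms = query.lower().split()
    inlinks = {url: 0 for url in data}
    for _url, (_text, links) in data.items():
        for target in links:
            if target in inlinks:
                inlinks[target] += 1
    matches = [
        url
        for url, (text, _links) in data.items()
        if any(term in text.lower().split() for term in terms)
    ]
    # Most linked-to first; ties broken alphabetically by URL.
    return sorted(matches, key=lambda url: (-inlinks[url], url))


if __name__ == "__main__":
    print(rank_by_term_count(OPEN_CRAWL, "search"))  # ['a.example', 'b.example']
    print(rank_by_inlinks(OPEN_CRAWL, "search"))     # ['b.example', 'a.example']
```

Same data, two different orderings. Neither function needs to know how the crawl was produced, which is roughly the separation being proposed.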

