MozDex Gets a New Lease on Life


mozDex, the open source search engine, has just received a new lease on life.

The project, originally intended to port the Nutch search engine software from Sun's proprietary Java platform to an open source platform, has been taken over and is getting a face lift and some renewed vitality.

Some of the changes that have been implemented already are:

  • mozDex's open source search feed, which provides a free search API, has been improved. Previously, URLs to sites that ranked in the search feed redirected back through mozDex; URLs in the feed now point directly to the ranking site. (The free search feed allows anyone to pull search results down from the site in an XML/OpenSearch format.)
  • AdSense and Searchfeed have been removed from the site and a new PPC system has been introduced, allowing advertisers to create their own accounts directly on mozDex.
  • A paid directory has been implemented, providing both standard directory listings and enhanced 'content page' listings. Content page listings allow webmasters to better promote their sites from within the directory and include up to three outbound links on the content page.
  • The site has moved to new, fast servers with lots of bandwidth, which will allow for a larger, fresher index.
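
As a rough illustration of what consuming an XML/OpenSearch search feed like the one described above might look like, here is a minimal Python sketch. The sample XML, the example.org links, the `parse_feed` helper, and the choice of the OpenSearch 1.1 namespace are all assumptions made for illustration; mozDex's actual feed URL and exact response format are not given in this post.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed is RSS 2.0 with OpenSearch 1.1 response elements.
OPENSEARCH_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical sample response; real results would be fetched over HTTP.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Sample search results for: nutch</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <item>
      <title>First result</title>
      <link>http://example.org/first</link>
    </item>
    <item>
      <title>Second result</title>
      <link>http://example.org/second</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return (total_results, [{'title': ..., 'link': ...}, ...])."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    # totalResults lives in the OpenSearch namespace, not plain RSS.
    total = channel.findtext(
        "os:totalResults", default="0", namespaces=OPENSEARCH_NS
    )
    results = [
        {
            "title": item.findtext("title", default=""),
            # With direct URLs, this link points straight at the ranked site.
            "link": item.findtext("link", default=""),
        }
        for item in channel.findall("item")
    ]
    return int(total), results


total, results = parse_feed(SAMPLE_FEED)
for hit in results:
    print(hit["title"], hit["link"])
```

Because the links in the feed are now direct, a consumer like this can use each `link` value as-is instead of following a redirect through the search site.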

The project has received a lot of interest in the technical press and amongst academics and the open source community. Going forward, the project has some new goals along with its previous ones:

  • Keep the open source search feed API running, stable, and useful to the community.
  • Restart the port from Sun's Java platform to a non-proprietary platform, ensuring that the entire search project relies exclusively on other open source projects.
  • Over the long term, some sort of shared revenue advertising feed will be developed where publishers can provide ad space, then select whether their share of the revenue is paid to them or to recognized open source projects.
  • Make information collected from the search engine available. Search terms, trends over time, and other data collected as a result of searches will eventually be made publicly available for commercial or research purposes.


Lucene Ports

Since the guts of Nutch are powered by Lucene, it's worth noting that Lucene has been ported to other platform and web languages, including C++, Python, PHP and Perl. You can easily write your own spider and make your own search engine in any preferred language if you're willing to commit the time :)

Not easy

You can easily write your own spider and make your own search engine

The last thing it is, is 'easy'. Even with Nutch and the various ports available, there's just about nothing plug and play. There's stuff that's 'easy' if you want to index 100 sites. If you plan on crawling and indexing 50 million pages in a few days, getting a program like that running is neither easy nor quick :).


you said easy, I said easily...

Easily as in simply: once Lucene has been translated into another language, all the legwork is done and you do the rest, whether it's indexing a large blog or web page, crawling your mini web, or integrating it into an email application you built.

Wheel, fix the Nutch link in

Wheel, fix the Nutch link in your post please :)

Actually having done this

Actually, having done this recently, I can tell you that working with Nutch in anything other than Java is tricky at best.

this is good news

because I had to turn off my MozDex feeds because of several of the above outlined problems - looking forward to turning them back on and supporting open source search!

