Advanced Open Source Search Engine Launched

Source title: New advanced search engine

Search geeks who like to tinker might want to check out Sentensa, an "Industrial strength Open Source Search Engine" that has just been launched.

Sentensa draws on 20 years of experience in search technology to create the next generation of search engine. Users get entirely new ways of searching very large text databases. The name Sentensa was chosen to describe how it takes the underlying meaning of the search text into account to find the most relevant results. Searches become faster and more effective, which makes Sentensa applicable in many different markets, from the pharmaceutical industry to media and general document handling. With Sentensa, users can create advanced applications for searching all kinds of text, and many users can run those applications at the same time.

They have a nice demo on their website. It does look kinda cool...

Comments

Looked good until I

Looked good until I read:

The SENTENSA implementation is 100% Pure Java

Oh well :-(

[oops]

I thought it was web search... it's document search. Anyway, as I don't trust "desktop search toolbars", I'm downloading it now.

Problems with using it as a web search

The GPL could be a problem if you used this for a web search engine, particularly under v3 of the license. It could force you to publish any modifications to the code (they're trying to close the ASP loophole with the new license).

With access to the code, the chances of SEOers letting nature take its course on rankings are slim to none, and slim just left town.

Give us your feedback later

Give us your feedback later then, Claus :-)

Well, yes... Here's my review... of sorts

I gave up. I was going to take it for a spin on my Windows partition, but here's what the "install.pdf" said:

1. Start up the system with the 64-bit kernel if the system run on a 32-bit kernel.
2. Enhanced Journaled File System (jfs2), which is recommended for 64-bit kernel is mandatory.

Huh? First, this is Windows; neither of those options is possible. Second, if I'd tried it on Linux instead, do they really think I would run a 64-bit kernel on a 32-bit system and also just switch my whole file system?

Think again.

Oh, and then for it to work with your browser:

To run the web application there must be a servlet container installed, e.g. Resin.

A "servlet container" - WTF is that? I have no idea what that sentence even means, and I wouldn't recognize one if I saw it. I don't know a lot about specific Java technology or terminology, and I really shouldn't have to learn it either.

Anyway, as the documentation was clearly *nix-specific, I looked around in the folders inside the zipped download instead. There were a whole lot of ".bat" files with the word "start" in their names, which proved that I had downloaded the Windows version, not the Linux one.

And of course I tried clicking them. Well, great - all that happened was that some nice DOS windows appeared with the usual Java mumbo-jumbo in them.

The only exception was the "start.bat" inside the "/spider/" folder. That one launched a window in which I could enter a URL. I entered the URL of one of my very small sites (six pages in total), and the text at the bottom of the window read "searching http://example.com/"... and a bit later it had moved on to the next page.

Well, that was 20 minutes ago. No, that site does not have more than six pages. There are a lot of links on those six pages though, so I wonder if it spiders recursively? There are no options or settings anywhere in that window except the cryptic text "save links", which I checked. Apart from that there's only a URL input field. If it is following links, I'm about to download some pretty huge sites.

Oh, damn... "searching http://www.imdb.com/" - I'm off to kill it right away!

Useless

It's useless. At least to me it is.

It managed to make six XML files, one for each file on the site, plus one for the first external page that it got to index. Although XML sounds fancy, the content of these files is nothing but a linearization of the on-page content (with some bugs, e.g. JavaScript that also gets partially indexed).

Couple that with not being able to actually search anything, or to index desktop documents either, and it's useless to me.

I guess if you really know your Java stuff it might be interesting. Then again, two of the classes have names containing the strings "Xerces" and "Lucene", so if you know your Java stuff you might not find it that interesting after all, as it's obviously built on other open source components.

The PDF paper about the algorithms was the only thing I could make any sense of. It was interesting in itself, but you can download it separately anyway, and even that is not all that interesting unless you're a search geek with some math experience.

Sorry to be so hard. It's obviously a product that some people have spent a good deal of time and thought creating, and I have a lot of respect for that. The product is utterly useless to me, but that might just be my problem and not theirs.

Update: Ignore my comments above

I've just talked with Bo Lindström, the president of Virtual Genetics, and he told me that they have a new version coming out soon.

He also told me that one of their customers is in fact using Sentensa to query MEDLINE, which is a pretty large medical database. And there had been some kind of error, so the documentation I read was for IBM mainframe computers, not for Windows PCs. I'm not sure whether I downloaded the wrong product as well.

So, most likely I was totally wrong in my conclusions above. And when I'm wrong I like to admit it, so please ignore what I wrote above.

I will of course test it again, including the new version when that one comes out.
