Google and Latent Semantic Indexing

Aaron Wall takes a look at Google's Latent Semantic Indexing technology in light of recent shake-ups in the main index. He says that Google have increased the weight of the LSI part of the overall algorithm, and that does seem likely, as many around the web concur.

A brief definition:

Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent.
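To make the "words in common" idea concrete, here is a minimal Python sketch. It only covers the simplified intuition in the definition above (documents sharing many words score as close), using plain word counts and cosine similarity. Full LSI goes further and applies a singular value decomposition to the term-document matrix, which this sketch does not attempt. The sample documents and function names are made up for illustration, and nothing here claims to be what Google actually runs.

```python
import math
from collections import Counter


def term_vector(text):
    """Count lowercase word occurrences in a document."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical mini collection: "a" and "b" share most words, "c" shares almost none.
docs = {
    "a": "search engine ranking and link analysis",
    "b": "link analysis for search engine ranking",
    "c": "recipes for cheese and bread",
}
vectors = {name: term_vector(text) for name, text in docs.items()}

sim_ab = cosine_similarity(vectors["a"], vectors["b"])  # many shared words
sim_ac = cosine_similarity(vectors["a"], vectors["c"])  # one shared word
```

Documents "a" and "b" come out far closer to each other than "a" and "c", which is the "semantically close" versus "semantically distant" split the definition describes.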

I've also been told that Using Semantic Analysis to Classify Search Engine Spam, an older Stanford document, is well worth considering when looking at recent changes...

Getting MSN Results in RSS

As Danny has just pointed out, you can now get your MSN results in RSS in a much simpler manner - just key in whatever it is you want to monitor and check for the orange RSS button at the bottom of the page.

Unfortunately you can't jig the RSS to show results from out of the country you're in like you can with the normal SERPS - they say they're working on that.

Wired on SEO - Gee, this Search Stuff Really Works!

Quite a cute article on Wired today about SEO and how it affects a company's bottom line. Nice plug for Oneupweb, the firm they interviewed, so there, have another link guys, job well done heh...

Although my hunch was that the results would show benefit to cracking Google's top 30, I didn't realize just by how much. In fact, it is extraordinary. Oneupweb found that the first month a site appeared on the second or third page of Google results, traffic increased five times from the previous month, and in the second month, traffic was nine times greater. The number of unique visitors tripled when a company moved up from page two to page one, and in the second month doubled again to more than six times the traffic it received before it broke the top 10. More importantly, Oneupweb discovered a correlating impact on sales: 42 percent more the first month, and nearly double the second month.
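For anyone who wants to see those multipliers against real-looking numbers, here is a quick Python sketch. The baseline figures (1,000 visits, 1,000 unique visitors, 100 sales) are invented purely for illustration, and only the multipliers come from the Wired quote. The "nearly double" second-month sales figure is left out because the quote gives no exact number.

```python
# Hypothetical baselines, chosen only to make the multipliers easy to read.
baseline_visits = 1000
baseline_unique_visitors = 1000
baseline_sales = 100

# Landing on page two or three of Google: 5x traffic in month one, 9x in month two.
visits_month_1 = baseline_visits * 5
visits_month_2 = baseline_visits * 9

# Moving from page two to page one: unique visitors triple, then double again.
visitors_month_1 = baseline_unique_visitors * 3
visitors_month_2 = visitors_month_1 * 2

# Sales: 42 percent more in the first month.
sales_month_1 = baseline_sales * 142 // 100

overall_visitor_multiple = visitors_month_2 // baseline_unique_visitors
```

Tripling and then doubling is where the "six times" in the quote comes from.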

More Conspiracy Speculation on Google as Registrar

I love a good conspiracy theory, they're just so much fun... This piece threadlinked above gives, as the title suggests, 10 things Google could do with its new status of Domain Registrar - some of them are rather good. I'll list the main points, but check the article for more details:

1. Organize a Mega Whois database - with G technology, think what they might be able to do with all that data.

2. Blog Hosting - think Blogger

3. Mining link networks (love this one..)

4. no.4 is lame

5. Just so that people don't discover what the next Google product will be?

6. Simply to have more control over their domains

7. Use with Google Adsense for new domains

8. My personal favorite: To start an ID service - buy a domain from Google and get an ID, they then track your sites via your ID and other registrars would have to interface with it so that you could get into Google - heheh...

9. Sell you the domain, host your site, give you advertising and cut out the SEOs

10. er.. no.10 is pretty much the same - Geocities on steroids...

Yet Another Google BETA - Google Local Goes Live(ish)

I know it's not just me because I've just read the nice post Danny S made, linked above, about Google's "BETA Problem" - Google Local has just gone live on the homepage, as yet another BETA.

What is it that makes Google either unable or afraid to finish a project?

Ask Jeeves Launch Blog

The venerable butler-powered search engine has joined the blogging masses and launched a blog.

Now, AJ haven't joined the rel=nofollow camp, they have allowed comments, and they are using Typepad's JavaScript redirect on any URLs you drop in the comments. So will those URLs be suffering from Ask Jeeves hijacking?

Y!Q - Yahoo's New Contextual Search Tool

Yahoo! have launched a new contextual search technology called Y!Q, a play on 'IQ' (ho ho ho), that allows users to search for results in the context of what they are actually reading.

The idea, spawned by Y! Search chief Jeff Weiner's search for information on Gary Jules' wonderful remake of Tears for Fears' Mad World, is that you can make queries based on the whole page you are viewing or just passages of selected text. Y!Q will take that text or page, pull out what it perceives as the key terms, and search for pages in a similar context and theme.

Trying it out

There are several ways you can use Y!Q - firstly, try the Yahoo! News test page. Then, when your appetite is whetted, have a go with one of the Demo Bars available for Explorer or Firefox. Firefox users also have several other options, including using ConQuery in the way described by Chris Sherman.

Once you've installed the toolbar you can highlight any piece of text on a page and click "related search" or just right click and choose the same from the context menu.

Refining Your Results

One of the great things about Y!Q is the ability, with the help of a little DHTML magic, to refine your search once you have the basic results set in front of you. Once you have selected either a whole page or part of one to search, and you have your initial results, you'll see a yellow highlighted box at the top of the result set that marks out the key terms that Y!Q thinks relate to your query - for the test I tried, there were about 8. Next to each term is a checkbox. By unchecking a term you get refined results, and you see them very quickly because the page updates as soon as you uncheck (or re-check) the box. Furthermore, on each of the results you'll see the "more like this" link - clicking that will add that page's terms to your query in an attempt to further refine your search. Nice...
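Yahoo! haven't said how Y!Q actually picks its key terms, so the following Python sketch is a guess at the general shape of the workflow rather than their method: pull the most frequent non-trivial words out of a passage, then rebuild the query with any "unchecked" terms left out. The stopword list, function names and sample passage are all hypothetical.

```python
from collections import Counter

# Tiny hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was",
             "for", "on", "by", "with"}


def extract_key_terms(text, limit=8):
    """Return up to `limit` of the most frequent non-stopword terms."""
    words = [w.strip(".,!?\"'()").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [term for term, _count in counts.most_common(limit)]


def build_query(terms, unchecked=()):
    """Join the key terms into a query, skipping any the user unchecked."""
    skipped = set(unchecked)
    return " ".join(t for t in terms if t not in skipped)


passage = ("Gary Jules recorded a cover of Mad World. "
           "Mad World was written by Tears for Fears.")
terms = extract_key_terms(passage)

# Unchecking two boxes drops those terms from the refined query.
query = build_query(terms, unchecked={"recorded", "cover"})
```

Re-checking a box is just a call to `build_query` with that term removed from `unchecked`, which is why the real page can refresh so quickly: the key terms are already known and only the query changes.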

MSN & RSS - Feed Submissions Back Door to MSN?

Just a little more info on the new MSN Search's deal with Moreover for RSS features.

From the press release, some key points:

*Feed Discovery - use my.msn to subscribe to feeds

*Feed Reading - naturally.. they also cache the feeds

*RSS Search - this is an interesting one I think, you can search the full text of the feed from my.msn

*Feed Inclusion - and this is the real killer, you can submit your site's feeds to my.msn and have them made available and searchable across the whole "msn community of users" - exact details aren't given but I'm sure we'll work out the benefits with a little creative experimentation...

If that last point does what I hope it does, it could be a great thing indeed...

SEM Conference in Australia

It is not often we have SEM conferences in Australia, the last I attended was in 2003, and the one touted for last year never eventuated unfortunately. Nice to see a new conference announced with a good lineup of major players. Should be interesting, and a good opportunity to catch up with everyone.

Spamming New MSN for Fun and Profit

I've spent a little time this evening talking to some of the better, er... enthusiastic link hunters out there about how to game the new MSN.

"Just get links, from wherever is convenient"

Know what? It's easy, way too easy in fact. One professional search spammer told me "Just get links, from wherever is convenient". It turns out that MSN's links are not weighted in any way like links are in Google and Yahoo: you can just go right out there and get the links you want, with the exact anchor text you want, and be ranking for your terms in a very short space of time.

Also, they either have no duplicate content filter, or it has a major flaw - you can copy any page you like (provided you have the site owner's.... ack, who am I kidding?) and, provided you get more links coming into that page than the other guy, you win.
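Here's a toy Python sketch of why that matters. It is not MSN's, Google's or Yahoo's actual algorithm, just the contrast being described: counting every inbound link equally versus weighting each link by some measure of the linking site's quality. The site names and weights are invented for illustration.

```python
from collections import defaultdict

# (source, target) pairs. Every name and weight below is made up.
links = [
    ("trusted-news", "original"),
    ("trusted-blog", "original"),
    ("junk-1", "copy"),
    ("junk-2", "copy"),
    ("junk-3", "copy"),
]
source_weight = {
    "trusted-news": 1.0,
    "trusted-blog": 0.8,
    "junk-1": 0.1,
    "junk-2": 0.1,
    "junk-3": 0.1,
}


def raw_link_counts(links):
    """Score each target by the plain number of inbound links."""
    counts = defaultdict(int)
    for _source, target in links:
        counts[target] += 1
    return dict(counts)


def weighted_link_scores(links, weights):
    """Score each target by the summed weight of the sites linking to it."""
    scores = defaultdict(float)
    for source, target in links:
        scores[target] += weights.get(source, 0.0)
    return dict(scores)


raw = raw_link_counts(links)
weighted = weighted_link_scores(links, source_weight)
```

With unweighted counting the copied page wins on three junk links to two. Once links are weighted, the original comes out well ahead, which is the kind of protection the spammers say is missing.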

Is this the best start for MSN?

Ok, so I know MSN is new, and this is a v1 search tech, but surely the fact that a child could game it is not good for MSN? The default homepage of a Windows install is still MSN, if I've been informed correctly, so there's plenty of incentive for people to want to rank now rather than "when it kicks in", right?

The interesting question for me though is this: If it's very easy to game, which it is, will users actually notice the fun and games search marketers are having with the new engine? After all, there's little point in spamming away like mad to rank for shoes if your site is about Venezuelan beaver cheese, is there?

Anyone care to offer thoughts on the new Mickey Mouse algo from MSN?

MSN Search Cock-Ups - Cache Issues, Returning Google Results & more...

When you search on the new MSN the cached pages are served from the domain

I get a 404.

Auto pinger for hire. Just pay up to 10c per page and we will ping away until the cows come home!

The above is a quote I could imagine being on the above site. Essentially it's a service to let Yahoo know you have updated your blog.

Personally, if you want such a service it will take about 5 lines of Perl. If you REALLY want such a service I'll do it for you at half their cost, and I'll still be charging you 10 times too much :)
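To back up the "about 5 lines" claim, here is a rough sketch, in Python rather than Perl. It assumes the service accepts the common `weblogUpdates.ping` XML-RPC call that weblog ping servers use. Whether Yahoo's endpoint speaks exactly that is an assumption, and the blog name and URL are placeholders.

```python
import xmlrpc.client


def build_ping_request(blog_name, blog_url):
    """Build the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def send_ping(endpoint, blog_name, blog_url):
    """Send the ping to `endpoint` (a ping server URL you supply yourself)."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


# Placeholder blog details; nothing is sent over the network here.
payload = build_ping_request("Example Blog", "http://blog.example.com/")
```

Run `send_ping` from a cron job whenever you post and that is the whole "service", which is rather the point.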

Yahoo vs Google - Yahoo Lagging but Growing FASTER

There was an article today in the NYT which was only mildly interesting - what captured my attention more was the threadlinked post on a related subject by Fred Wilson:

According to Comscore, one of my portfolio companies, Google's US market share is 38% and Yahoo! is 35%. So between the two, they own 73% of the US market. Worldwide, Google is stronger, with 47% to Yahoo!'s 27%, but even worldwide, the two of them capture almost 75% of the market.


It's interesting to me because I use Yahoo! ( to be specific) as my start page, but I use Google as the search field in Firefox, where I do most of my searching. And I use Google's desktop search which further drives me toward Google when I do a search. But for local search, maps, and yellow pages, I always go to Yahoo! because they have by far the best service for that kind of search.

Of course that's all well and good, Yahoo is a more rounded service by far, but what happens when (that's when, not if) Google put all these betas and acquisitions they've been hoarding into play? Maybe under - the whole scene could shift dramatically at that point...

MSN Officially Enter the Search Race

I'm not sure if the Globe and Mail are breaking an NDA here, but it looks like they might be, especially in light of comments from Peter, one of the "SearchChamps" team.

From the Globe and Mail piece:

Microsoft has declared war on Google, Yahoo and other search engines with MSN Search, which is being officially launched tonight at midnight, Feb. 1.

After a development period of just 21 months, MSN Search — in Canada at — Microsoft took the final wraps off its newest product, which it promises will offer a "more personalized search experience."

Microsoft's entry into search-engine technology represents the Redmond, Wash., software giant's attempt to take a bigger bite out of the lucrative search-engine pie. Ad revenue from searches on all engines in Canada added up to $21.3-million in 2004, aimed at the country's 16.4 million Internet users.

Let the games begin...

News Officially Broken

...and some quotes and links from the more interesting sources:

NYT - Microsoft Introduces Its Own Search Service

Bill Gates, Microsoft's chairman, said that Microsoft was now in a position to differentiate its offering.

"There is a tremendous opportunity for rapid innovation here," Mr. Gates said in an e-mail interview, "and the great thing about the launch of MSN Search is that we now have a strong platform in place that will enable us to begin to deliver those innovations to consumers."

Microsoft has added a few features meant to differentiate its new service.

It has included, free, the content from its Encarta encyclopedia, which until now has been fee-based.

How to Break CAPTCHAs on Blogs

Interesting article on how to comment-spam CAPTCHA-enabled blogs.

Quote: (if you were one of the 94 people i comment spammed) sorry about that, and hope that you are not pissed. if you are new to my site, then you must realize that i like to stir things up every once in a while. if you've been here before, then i'm hoping you've got a smile on your face, and sort of expect stuff like this from me :) anyways, you were targeted for 2 reasons. 1) because your blog uses CAPTCHA to provide a false sense of security. 2) because we are members of the same group. so i know a handful of you (and know of most of you). could easily have done this against a bunch of strangers ... but did not think that that was a good idea. this is just my way of saying that we've got more work to do. i will not be comment spamming you anymore. unless you comment spam me back in retaliation ... and then i'll have to blast you out of the water ... just kidding.

ThomasNet - Industrial Search

Thomas Register and Thomas Regional have combined to form ThomasNet - as searchviews points out, if you're looking for Thermoset polyurethane elastomer products, you're in luck!

From the about page:

ThomasNet, powered by Thomas Register® and Thomas Regional®, brings together industrial buyers and suppliers on a national, regional, and local level.

For industrial buyers, ThomasNet is an industrial search engine that provides one source for finding the exact product, service, or supplier they need - at the exact time they need it. ThomasNet also gives buyers direct access to the detailed information they need to make a purchasing or specifying decision, including line-item product details, CAD drawings, and more.

For industrial suppliers, ThomasNet is a leading provider of Internet marketing solutions. The company helps suppliers grow their business online by driving qualified industrial traffic to their Websites, and converting that traffic into customers. ThomasNet's complete range of online catalog, e-commerce, and CAD solutions help suppliers deliver the detailed information buyers expect on the Web.

No Microsoft Desktop Search for New Windows

Looks like M$ have been forced to rethink their integration strategies for Windows as a result of recent antitrust rulings:

Speaking on a panel on search technology at the Harvard Business School's Cyberposium, Mark Kroese, general manager of information services and merchant platform product marketing for MSN, said the federal antitrust battle Microsoft waged with the government has made the company think twice about what technologies it can add to the operating system.

There are some search engines out there right now breathing a little sigh of relief no doubt, though they mention nothing of integrating a toolbar for IE for their MSN Search stuff...

Google & Blogs - Should they Buy Technorati?

Russell Shaw at The Standard posts a little mild speculation on whether Google already have plans to buy Technorati - it's no secret that Google are severely lacking in the blog search department. The way Google works is just not suited to rapid indexing of the kind that Technorati specialize in and, I would say, that flaw is becoming increasingly obvious and making GOOG far less useful for many types of search tasks.

I'm not sure that GOOG would actually buy Technorati; I think it's far more likely they would develop something similar in-house. Technorati may be watching 6.5 million blogs, but with Google's power, if they were to launch a pinging service they'd surpass that number in no time, I think.

From the threadlinked Standard piece:

Google, they of the $52 billion market cap, needs that functionality in the blog search space. They should acquire Technorati, and then do the following:

*Keep Technorati as a distinct URL. Otherwise, pinging anarchy would rule, and that would suck.

*Set up a Blog tab on the Google home page, on the same line as "Web, Images, Groups," and so forth. That way, you can specify a search of Blogs rather than Web, Images, Groups, or even your desktop.

*If you perform just a plain ol' Web search, list the first few Blog hits above the Web hits, in summary form.

Yes, Google should buy Technorati. How about, now??

For me, I rarely use Google for breaking news - I do use of course but it's not the same, and it's not as fast as finding blogs that have already spotted a story and are linking and commenting on it right now.

