Yahoo! Robots-Nocontent Tag


Danny reported on Yahoo's new robots-nocontent attribute, which allows you to tell Yahoo's relevancy algorithms to ignore noisy parts of your pages:

The new robots-nocontent tag now allows you to tell Yahoo to ignore the clutter. Simply use the tag (technically, it's an attribute) to surround text you do NOT want included in searchable content within Yahoo.
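
What "ignoring the clutter" might mean on the engine's side can be sketched roughly. Here is a hypothetical Python illustration (not Yahoo's actual pipeline) that drops text inside any element carrying the robots-nocontent class:

```python
from html.parser import HTMLParser

class NocontentStripper(HTMLParser):
    """Collect indexable text, skipping subtrees marked robots-nocontent.

    A rough sketch only: void tags like <br> (no closing tag) inside a
    skipped subtree would need extra handling.
    """
    def __init__(self):
        super().__init__()
        self.skip_depth = 0      # > 0 while inside a robots-nocontent subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1  # track nesting so we know when we leave

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)

def indexable_text(html):
    parser = NocontentStripper()
    parser.feed(html)
    return " ".join(" ".join(parser.chunks).split())

page = ('<div>Article body '
        '<p class="robots-nocontent">Ad: buy stuff</p>'
        ' more body</div>')
print(indexable_text(page))  # -> Article body more body
```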

How? It's a little complicated, but not too hard. You need to assign a class attribute with the value robots-nocontent to some tag within your document. The attribute looks like this:

<p class="robots-nocontent">

I was going to complain

I was going to complain about this, but I can see how it would be useful on occasion. One could argue that search engines (or at least Yahoo) really ought to be able to tell what's important in a page; on the other hand, I can see that one might like to keep SEs from taking certain text into consideration.

I do wonder whether that means, or might mean in the future, that the page (or the marked part of it) gets discredited in SE eyes. That's my problem with all this stuff -- nofollow, and now robots-nocontent.

But, as a designer, what I really don't like is that this new robots-nocontent is implemented in the same manner as a CSS class; in fact, it is written exactly like one:

<p class="robots-nocontent">

That's CSS; I could use that class and also write it into my stylesheet to make the text all red, for example.

As much as we (web designers) have tried to separate structure and design/display, I find that robots-nocontent somewhat infringes on that.

But what do I know.


>That's CSS;

Exactly! Whilst the intention is clear, it is misusing CSS for something other than presentation. There has to be a better way to do this.

A rigid CSS definition is the wrong tool for the job

Why not augment robots.txt to include a list of CSS classes that would be ignored?

This gives developers the opportunity to use whatever CSS classes they like, including existing ones. Voila! No special robot code sprinkled around.

Robot directives would stay where they belong.
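
As a thought experiment, such a robots.txt extension might look something like this (purely hypothetical syntax -- no engine supports anything of the kind; the directive name and class names are made up):

```
User-agent: Slurp
Nocontent-class: sidebar-ads
Nocontent-class: footer
```

Yahoo's crawler (Slurp) would then drop any element carrying one of those classes from the searchable copy of the page, with no markup changes required.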

Why not augment robots.txt

Why not augment robots.txt to include a list of CSS classes that would be ignored?

Yep, saw that on SEL -- it seems an excellent suggestion. Unless there is a very compelling reason why it's flawed, someone over at the Yahoo think tank ought to get shot for missing it.

augment Robots.txt

>augment Robots.txt

Can you do that?

Can you do that? they just

>Can you do that?

They just did something similar, but in a way that ticks off design purists -- see above ;)

They have already added wildcard support and sitemap discovery to robots.txt, so they could do this too.

but in a way that ticks off

but in a way that ticks off design purists

Not just design purists -- also all those people who don't want to spend time adding new tags to all their pages. Still, it gives extra control, which is always a good thing. I just wish Yandex/Rambler promoted this tag; keyword density is still a significant factor in Runet. Might help bring back readable text that much quicker ;)

Wow! I'm a design purist!

That's okay -- but I think Adrian would find that amusing, given that I've argued for entirely practical but distinctly un-purist HTML solutions on occasion. :)

Just kidding, Aaron. My thought was that it sort of mixes things where they probably shouldn't go -- but that's the way these solutions *are* going.

Crap by design

Abusing the markup with multi-class nonsense is flawed, and not only from a Web design perspective. Yahoo has stolen this idea from a crappy draft of a robots microformat, which didn't become a standard for very good reasons. The sole purpose of this ill-considered thingy is torturing webmasters. Selling it as an implemented change request originating from the robots.txt summit is just laughable. Back to the white board!

It influences the search

It influences the search engine's interpretation of your web page. For that reason, it is a useful addition to the SEO toolbox.

Since when did it matter that search engines didn't adhere to web standards? That's a different game.

Not what I meant

You have a point, John. However, I think it's problematic to have some stuff here and some stuff there. What's next -- encoding search engine instructions into favicons?

Just kidding, but I do see an issue with mixing stuff here and there.

Coming from a search rep near you...

Ads are noisy... if you don't use this on ad sections, you may get penalized.

Would comment tags not have been a better choice

Surely commenting within the page would have provided the same functionality, without impacting the CSS side of things?

CSS is not the issue

Having multiple names assigned to a class attribute doesn't impact CSS. The problem is that to make use of unsearchable page areas, one must edit the markup. Assigning robots-nocontent behavior to existing (CSS) classes via robots.txt would have been far more flexible, and would mean less work for webmasters.

That's mainly what I'm

That's mainly what I'm saying, and also that it's weird to have to implement it in a CSS-like fashion.

"Comment tags" are crap

Google's section targeting, where crawler directives are put in HTML comments, clutters the code and adds extra weight to already heavy/complex pages.
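
For reference, Google's AdSense section targeting works roughly like this -- paired HTML comments wrapped around the content, which is exactly the clutter being complained about:

```html
<!-- google_ad_section_start -->
Text that should influence ad targeting.
<!-- google_ad_section_end -->

<!-- google_ad_section_start(weight=ignore) -->
Text to exclude from targeting.
<!-- google_ad_section_end -->
```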

Page load time is another reason why adding class names to the markup is not the best procedure to handle crawler directives.

I don't think that CSS-like syntax is weird in this context. Actually, it's quite an elegant approach. Say you have a footer block (P, TD, TR, DIV or whatever) you want to make unsearchable, and ads in a sidebar. The HTML looks like

<td id="ads-right-sidebar">
...banners, AdSense code, text links...
</td>
<div class="footer">...</div>

Without touching any page you could assign robots-nocontent behavior to these elements in robots.txt:

td#ads-right-sidebar, div.footer {content:unsearchable;}

And thinking a step further, that could work with the link condom too. Assign rel-nofollow behavior to A elements within a table cell with the DOM-ID "ads-right-sidebar":

td#ads-right-sidebar > a {rel:nofollow;}

robots.txt is a better place to put crawler/indexer directives than the markup. Additional class names are as bad as inline style assignments.

The engines have robots.txt cached already, so the indexing process has a current copy handy without making the crawler refetch it. Gaps could be covered via the last-modified attribute of robots.txt, the last-fetched timestamp, or even a cache-expires statement in robots.txt.
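
Those freshness heuristics could be sketched like this (hypothetical Python; the function name, the cache-expires directive, and the one-day default are all assumptions, not anything any engine publishes):

```python
import time

DEFAULT_MAX_AGE = 24 * 3600  # assumed fallback: refetch after one day

def robots_txt_is_fresh(last_fetched, cache_expires=None, now=None):
    """Decide whether a cached robots.txt can still be trusted.

    An explicit cache-expires timestamp (a hypothetical directive) wins;
    otherwise fall back to a maximum age since the last fetch.
    """
    now = time.time() if now is None else now
    if cache_expires is not None:
        return now < cache_expires
    return now - last_fetched < DEFAULT_MAX_AGE

# Fetched two hours ago, no explicit expiry -> still fresh
print(robots_txt_is_fresh(last_fetched=time.time() - 2 * 3600))  # True
```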
