Yahoo! News Tag Soup

John Herren spent about four or five hours creating Yahoo News Tag Soup, a very neat Yahoo! app using tags and the Y! API. Here's how he did it...

After discovering the Content Analysis web service
I decided it would be fun to see what Yahoo! thought the important
keywords would be from its own news feeds. I was also interested to
see if I could use some simple analysis to see what the 'hot' keywords
were. What I've done is written some code in PHP to do the following:

  1. Grab several RSS feeds from Yahoo! News. I use caching to make sure I only fetch them once an hour.
  2. Chunk them into a MySQL database. I have to be a little tricky here because I don't want to insert the same article
    twice, even if the same article appears in several feeds. MySQL table indexes to the rescue.
  3. Next, I use the Content Analysis web service
    from Yahoo! to extract the keywords from the article titles and
    descriptions. The keywords I get back get shoved in the database and
    associated with the appropriate articles.
  4. Everything else is gravy. I pull out the popular keyword
    tags and use a silly scaling function to assign CSS classes according to how
    often the tag shows up.
  5. Clicking a tag's link will display all the article
    abstracts for that particular tag. For fun I also display the other
    tags associated with each article.
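
For anyone curious what steps 2 and 4 might look like, here is a rough sketch. It is not John's code (his is PHP on MySQL, and you'd have to email him for that). It is written in Python, with an in-memory SQLite database standing in for MySQL. The table layout, the function names (open_store, add_article, tag_classes) and the tag-1 to tag-5 class names are all made up for illustration, and the linear scaling is just one guess at what a 'silly scaling function' could be.

```python
import sqlite3


def open_store():
    # In-memory SQLite stands in for the MySQL database (an assumption).
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE articles (
            id    INTEGER PRIMARY KEY,
            url   TEXT UNIQUE,      -- the unique index does the de-duplication
            title TEXT
        );
        CREATE TABLE tags (
            article_id INTEGER,
            keyword    TEXT,
            UNIQUE (article_id, keyword)
        );
    """)
    return db


def add_article(db, url, title, keywords):
    """Store an article once, even if it appears in several feeds."""
    try:
        cur = db.execute(
            "INSERT INTO articles (url, title) VALUES (?, ?)", (url, title)
        )
    except sqlite3.IntegrityError:
        # The unique index rejected a URL we already have.
        return False
    # Lower-case and de-duplicate the keywords before linking them.
    rows = [(cur.lastrowid, kw) for kw in {k.lower() for k in keywords}]
    db.executemany("INSERT INTO tags (article_id, keyword) VALUES (?, ?)", rows)
    return True


def tag_classes(db, levels=5):
    """Map each keyword to a CSS class from tag-1 (rare) to tag-N (hot)."""
    rows = db.execute(
        "SELECT keyword, COUNT(*) FROM tags GROUP BY keyword"
    ).fetchall()
    if not rows:
        return {}
    lo = min(n for _, n in rows)
    hi = max(n for _, n in rows)
    span = max(hi - lo, 1)  # avoid dividing by zero when all counts match
    return {
        kw: "tag-%d" % (1 + (n - lo) * (levels - 1) // span)
        for kw, n in rows
    }
```

Calling add_article a second time with the same URL returns False and stores nothing, which is the 'table indexes to the rescue' trick from step 2. The class names from tag_classes would then be paired with font sizes in a stylesheet.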

You can email him for the source.

Checking the site out, I thought it was pretty cool. The question is, where is the line drawn before something like this becomes 'bad scraping'? Is it only when you try to monetize the content? Or when you do it on a large scale, like WordPress?

I don't know. I think the 'service' is pretty cool. You?


It's pretty cool

and it always makes me a bit envious when someone makes this in just an afternoon. Real nice.
