According to a blog post from Stone Temple, Google skews towards accounts with higher follower counts and more "authority" when indexing tweets. Makes sense. Also according to the post, the share of tweets that get indexed has increased from 0.6% in February to 3.4% in June. That's a big leap, but it still leaves most of Twitter unindexed.
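To put that leap in perspective, here is a quick back-of-the-envelope calculation. It uses only the two percentages quoted from the post; the function names are my own, purely for illustration, and nothing here comes from Stone Temple's methodology.

```python
def fold_increase(before_pct, after_pct):
    """How many times larger the later percentage is than the earlier one."""
    return after_pct / before_pct


def unindexed_share(indexed_pct):
    """Percentage of tweets left out of the index."""
    return 100.0 - indexed_pct


# Figures quoted from the post: 0.6% indexed in February, 3.4% in June.
feb_pct, june_pct = 0.6, 3.4

print(f"Growth: {fold_increase(feb_pct, june_pct):.1f}x")         # Growth: 5.7x
print(f"Still unindexed: {unindexed_share(june_pct):.1f}%")       # Still unindexed: 96.6%
```

So the indexed share grew nearly sixfold in four months, yet roughly 96.6% of tweets still aren't in the index.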
Stone Temple promises to keep comparing data and monitoring how Google indexes Twitter as time goes on. It's an older post, I know, but I did see it near the front page of Reddit today. The comments there are mostly garbage, though. That's Reddit for ya.
The comments on the blog post itself are more interesting:
Is it possible that as Google indexes more tweets, its algorithm begins to “learn” what’s useful to index and what isn’t? (they probably already have this). If so, would it be possible to get more exposure in Google’s “tweet” index by deliberately optimizing tweets (almost going back to gaming the system)? I guess this could be used both ways, for ethical business purposes, as well as for less desirable (spammy) purposes.
My bet is that no deal for Facebook content will happen. FB and G are deadly enemies, and they both want it all, so I don’t see any coopetition in the works there.
In an online world where Google values long blog posts of 2000 words (which may be as much or more than 16,000 characters) – it may be that a high percentage of this firehose content has actually no value, or can’t be evaluated. When we are seeing over 3% of the data indexed, we may be seeing the most valuable part of it. That ratio is common in other industries (paid advertising has a much worse ratio, overall – percent of data people find valuable enough to click at.)