Google has a post up on the Google News blog today explaining how it recrawls news content to keep stories up to date and eliminate dead links.
"How do you balance looking for new content against the need to update older content? How can you make sure the content is fresh, doesn’t link to dead pages or display headlines that have been changed by the publisher?" asks Google.
Google’s answer is a recrawl feature that lets it focus on finding the newest content while still displaying the most current version of older content. After Google News discovers an article, it keeps crawling that article to look for changes. During the first day, it recrawls the article more frequently because, as the company says, most changes to news stories are made soon after they’re published.
"In some cases, we’ll even revisit articles we had trouble crawling the first time around," says Google. "After that, we visit them less often. Either way, we try hard to present users with the freshest news. (We bet whoever wrote 'Dewey Defeats Truman' wishes they had recrawl!)"
Google says the feature is intended to reduce the number of outdated headlines and dead links. For publishers, it offers assurance that Google will index their latest stories and updates as soon as possible.
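
The pattern Google describes, frequent visits during an article's first day and fewer afterward, can be illustrated with a minimal Python sketch. Google has not published its actual intervals, so the one-hour and one-day values below, along with the function and constant names, are purely hypothetical.

```python
from datetime import timedelta

# Hypothetical values: Google does not disclose its real recrawl intervals.
FIRST_DAY_INTERVAL = timedelta(hours=1)
LATER_INTERVAL = timedelta(days=1)


def next_recrawl_delay(article_age: timedelta) -> timedelta:
    """Return how long to wait before revisiting an article of the given age."""
    # Articles change most often soon after publication, so check them more often.
    if article_age < timedelta(days=1):
        return FIRST_DAY_INTERVAL
    # Older articles rarely change, so visit them less often.
    return LATER_INTERVAL
```

Under these assumed values, a three-hour-old story would be revisited after an hour, while a two-day-old story would wait a full day.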



"Now, with the news-specific crawler, if a publisher wants to opt out of Google News, they don’t even have to contact us – they can put instructions just for user-agent Googlebot-News in the same robots.txt file they have today," says Google.
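
As a rough illustration of that opt-out, the sketch below feeds a sample robots.txt to the parser in Python's standard library and checks which crawlers it admits. The rules and the example.com URL are hypothetical, and Python's parser only stands in for Google's crawlers here, whose matching rules may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block only Googlebot-News, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical article URL used only for the check.
url = "https://example.com/2009/12/story.html"

# The news crawler is shut out, while other crawlers fall through to the "*" group.
blocked_for_news = not parser.can_fetch("Googlebot-News", url)
allowed_for_others = parser.can_fetch("Googlebot", url)
print(blocked_for_news, allowed_for_others)
```

An empty `Disallow:` line in the `*` group means nothing is blocked for other crawlers, so in this sketch only the news-specific user agent is turned away.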








