Google appears to be getting ready to launch another offensive against website scrapers.
Scraper sites are usually operated by spammers, who copy almost all of a scraper site's content from other websites. By doing so, they hope to exploit the popularity of material created by original content makers to steer search engine traffic to their own sites and make money through advertising.
"Scrapers getting you down? Tell us about blog scrapers you see... We need datapoints for testing," Google's web spam leader Matt Cutts said in a recent tweet.
Cutts' call to arms suggests that Google believes more effort is needed to combat scrapers.
Along with his tweet, Cutts included a link to a form that allows web surfers to report scraper pages to Google. Some of that information may be used to test and improve Google's algorithm, the company said.
The form asks for the text of the search query that produced the scraping problem (for example, a scraper site outranking an original content site), as well as the URLs of the original content site and the scraper site. There's also a field for free-form comments.
Some scrapers are so successful that their sites rank higher in search results than the sites of the content makers whose material they pinch. Google attempted to correct that situation in January, when it changed its top-secret search algorithm in part to address the scraping problem.
Scraping, along with search result poisoning, has long been a problem for search engines, although Google has steadfastly defended the quality of its results, saying they are better than they have ever been in terms of relevance, freshness and comprehensiveness.
Earlier this year, Google announced changes to its algorithm, including changes to its filters. The filter changes, referred to as "Panda," didn't quell the problem. On the contrary, they may have made it worse. "We've experienced a significant drop in our traffic (almost 35%) as a result of this change (with an equivalent drop in revenue)," wrote one webmaster after the change took effect. "We believe that our only crime is that we host user-generated content."
Google took another crack at the scraping problem in June, when it rolled out version 2.2 of the Panda filters. Reviews of that move appear to be mixed.
With this latest effort by Google to gather information on scraper sites, perhaps the next version of Panda will finally put the issue to bed.
Join the CIO Australia group on LinkedIn. The group is open to CIOs, IT Directors, COOs, CTOs and senior IT managers.