
Google's war against scraper sites continues

Scraper sites hope to exploit the popularity of the original content makers to steer search engine traffic to their sites.

Google appears to be getting ready to launch another offensive against website scrapers.

Scraper sites are usually operated by spammers, who copy almost all of their sites' content from other websites. By doing so, they hope to exploit the popularity of material from the original content makers to steer search engine traffic to their own sites and make money through advertising.

"Scrapers getting you down? Tell us about blog scrapers you see... We need datapoints for testing," Google's web spam leader Matt Cutts said in a recent tweet.

Cutts' war cry suggests that Google feels more effort is needed to combat scrapers.

Along with his tweet, Cutts included a link to a form that allows web surfers to report scraper pages to Google. Some of that information may be used to test and improve Google's algorithm, the company said.

The form asks for the text of the search query that produced the scraping problem -- such as a scraper site outranking an original content site -- as well as the URLs of the original content site and the scraper site. There's also a form field for off-the-cuff comments.
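
For readers who collect such reports themselves, the fields described above could be modelled roughly as follows. This is only an illustrative sketch: the class name, field names and validation rules are assumptions made here, not Google's actual form or data format.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class ScraperReport:
    """One report with the fields the article describes (all names are hypothetical)."""
    query: str          # search query that surfaced the problem
    original_url: str   # page where the content first appeared
    scraper_url: str    # page that copied it
    comments: str = ""  # optional free-form notes

    def problems(self) -> List[str]:
        """Return a list of validation problems; an empty list means the report looks usable."""
        found = []
        if not self.query.strip():
            found.append("query is empty")
        for name in ("original_url", "scraper_url"):
            parts = urlparse(getattr(self, name))
            # Require an absolute http(s) URL with a host name.
            if parts.scheme not in ("http", "https") or not parts.netloc:
                found.append(f"{name} is not an absolute http(s) URL")
        if self.original_url == self.scraper_url:
            found.append("original and scraper URLs are identical")
        return found
```

A report whose `problems()` list is empty has a non-empty query and two distinct, well-formed URLs; whether the second page really copied the first still has to be judged separately.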

Some scrapers are so successful at what they do that their sites achieve higher search engine rankings than the sites of the content makers from whom they pinch their material. Google attempted to correct that situation in January, when it changed its top-secret search algorithm in a move aimed, among other things, at addressing the scraping problem.

Scraping, along with search results poisoning, has long been a problem with search engine results, although Google has steadfastly defended the quality of its results, saying they are better than they have ever been in terms of relevance, freshness and comprehensiveness.

Earlier this year, Google announced changes, including filter changes, in its algorithm. The filter changes, referred to as "Panda," didn't quell the problem. Quite the contrary, they may have made it worse. "We've experienced a significant drop in our traffic (almost 35%) as a result of this change (with an equivalent drop in revenue)," wrote one webmaster after the change took effect. "We believe that our only crime is that we host user-generated content."

Google took another crack at the scraping problem in June, when it rolled out version 2.2 of the Panda filters. Reviews of that move appear to be mixed.

With this latest effort by Google to garner information on scraping sites, maybe the next version of Panda will finally put the issue to bed.

Follow freelance technology writer John P. Mello Jr. and Today@PCWorld on Twitter.
