Technology Evangelist has been live on Google News for two weeks now, giving us time to analyze how this new syndication source has affected our traffic. While our traffic has increased significantly since we were added to Google News, the most interesting thing we've learned is how Google News together with Google AdSense plays a role in blog spam.
Here is how Blog Spam using Google News is Done:
1. Spammer creates new blog.
2. Spammer chooses theme for the blog.
3. Spammer scrapes headlines off Google News related to blog's theme. Publishes headlines as blog posts.
4. Spammer syndicates their scraped content onto search engines, including blogsearch.google.com and Technorati.
5. Spammer places Google AdSense ads on their site.
6. Spammer makes money off the AdSense clicks. Since the content is marginal, the ads look particularly good to visitors landing on the pages.
This is particularly annoying for news junkies subscribing to queries on blog search engines. It leads to hundreds of false-positive search results from republished news stories. Here's an example:
The above splog (spam blog) has page after page of scraped headlines from Google News with Google AdSense ads running in the right column. A story from our site was one of the scraped stories.
Here's another example:
And another example:
Yes, that site really does post two huge Google AdSense ad blocks before showing any real content. Then they finally post the content they scraped from Google News.
Is this hard to do? Unfortunately, no. In fact, the whole process can be automated to search, scrape, and publish new splog pages at regular intervals, such as once an hour.
Clearly, the only reason this type of spam exists is that spammers can make money off Google AdSense advertising. This wouldn't be a problem, except it wastes my time, and the time of anyone else who happens to end up on pages like this. Chances are pretty good that the advertisers paying for the clicks from sites like this are getting less qualified visitors for their money than they would from sites with great content. This theory is based on the assumption that visitors will click on something, anything, once they land on such marginal web pages. A click away from a quality web site is likely a more considered click, thus offering a more qualified visitor to advertisers.
Why does Google Allow Splogs to Use AdSense?
Money. Money. Money. There was a time when Google hand-approved publishers for inclusion in the AdSense program, but that no longer seems to be the case.
Blogsearch.Google.com is currently in Beta and will likely remain in Beta until Google stops allowing sploggers to publish AdSense ads on their sites. Until Google makes that move, Google Blogsearch will be overrun with too much splog noise to be a usable search tool. As soon as Google does make that move, much of this splogging will disappear overnight.
Technorati Creates Effective Workaround
Technorati has recently added a new feature that helps blog searchers filter out the blog spam. Their new Authority slider allows you to filter out search results from blogs with low or no authority. Authority is measured based on the number of inbound links, and splogs rarely have any inbound links (who would link to them?), so sliding the authority filter to the right quickly cleans up the results. Hat tips to Robert Scoble and Josh Teeters for pointing this out.
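To make the idea behind the Authority slider concrete, here is a minimal Python sketch of filtering search results by inbound-link count. This is not Technorati's implementation or API. The `SearchResult` fields, the `filter_by_authority` function, and the sample URLs are all hypothetical, and the sketch assumes "authority" is simply a raw count of inbound links.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """One blog search hit. Field names are hypothetical, not Technorati's."""
    url: str
    title: str
    inbound_links: int  # assumed stand-in for "authority"


def filter_by_authority(results, min_inbound_links):
    """Keep only results from blogs with at least min_inbound_links."""
    return [r for r in results if r.inbound_links >= min_inbound_links]


# Made-up sample data for illustration only.
results = [
    SearchResult("http://example.com/real-blog", "Original analysis", 42),
    SearchResult("http://example.org/splog-1", "Scraped headline", 0),
    SearchResult("http://example.net/small-blog", "Personal take", 3),
]

# Sliding the filter "to the right" corresponds to raising the threshold.
for r in filter_by_authority(results, min_inbound_links=1):
    print(r.url, r.inbound_links)
```

Even a threshold of one inbound link drops the splog in this made-up sample, which is the effect described above. The trade-off is that a brand-new legitimate blog with no links yet would be filtered out too.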
Update: This article has already been scraped and splogged to a site running Google AdSense ads:
1. Posted by: Tim on February 16, 2006 8:39 AM:
I'd be curious to hear your thoughts (in the future) about how being included in Google News might impact Google's duplicate content filters. Will there be any cases in the future where other sites that have scraped yours rank higher than you for specific content from that article?