Why Content Scraping And Copying Articles Is A Really Bad Idea


You started a new blog, and you don’t have many pages or posts. So you consider content scraping and copying articles from around the Internet.

Yes, you can publish a lot of pages in a short time by copying. But, before you start, do you know what you are about to do?

Is there a copyright notice on the sites you are thinking about copying? Even if there isn’t, published articles are protected by copyright automatically in most countries. So you will be, at best, committing blatant plagiarism and, at worst, stealing.

On top of that, what’s the benefit to you other than more pages on your blog? Copied content has almost no SEO value, so it will not help you at all in getting traffic to your site.

Fight or ignore content scrapers?

Every day I come across sites using content scraping and copying to republish my articles in full.

Could I take action against them?

Yes, I can send an email asking the webmaster to remove the copied content.

There is also the choice of blocking the user’s IP address so they can’t access my site.

Another option is to send a DMCA (Digital Millennium Copyright Act) notice to the site’s web host.
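
For readers curious about what the IP-blocking option involves, here is a minimal sketch in Python, assuming you can run a check on each incoming request. The `BLOCKED_NETWORKS` list and the `is_blocked` helper are hypothetical names for illustration, and the addresses come from ranges reserved for documentation, not from real scrapers.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges,
# not the addresses of any real scraper.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # a whole range
    ipaddress.ip_network("198.51.100.7/32"),  # a single address
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("203.0.113.45", "192.0.2.1"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

A scraper that changes its address, or reads the feed through a third-party service, would slip past a list like this, so it needs constant upkeep.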

But can you imagine how much time all this would take me each day?

I would be doing nothing other than trying to police my site.

I gave up on this time-consuming process a long time ago. Now I take the do-nothing approach to content copiers.

Except for one thing: I take advantage of them.

Yes, in their foolishness and laziness, content copiers help me.

 

How to take advantage of content scrapers

The first step is adding a line of code to your RSS feed. It’s easy to set up if you use Yoast.

The advice from Yoast says this:

This feature is used to automatically add content to your RSS, more specifically, it’s meant to add links back to your blog and your blog posts, so dumb scrapers will automatically add these links too, helping search engines identify you as the original source of the content.

[Image: RSS feed code]

The link will appear when a blog post is viewed in RSS in full or as a summary.

[Image: RSS code in content feeds]

Even if a blogger is smart enough to delete the link when they copy a post, it’s usually too late. Google and other search engines will most likely have indexed my post already and can tell that it is the original version.
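
To show the idea behind the RSS link-back line, here is a rough sketch in Python. It is not Yoast’s code, and a plugin handles this for you through a settings field. The function name `add_source_footer` and its parameters are assumptions made for illustration only.

```python
from html import escape


def add_source_footer(item_html: str, post_title: str, post_url: str,
                      blog_name: str, blog_url: str) -> str:
    """Append a link-back line to the HTML content of one feed item."""
    footer = (
        f'<p>The post <a href="{escape(post_url, quote=True)}">'
        f'{escape(post_title)}</a> first appeared on '
        f'<a href="{escape(blog_url, quote=True)}">{escape(blog_name)}</a>.</p>'
    )
    # The footer travels with the item, so a scraper that republishes
    # the feed unchanged republishes both links too.
    return item_html + "\n" + footer


if __name__ == "__main__":
    print(add_source_footer(
        "<p>Body of the post.</p>",
        "My Post", "https://example.com/my-post/",
        "Example Blog", "https://example.com/",
    ))
```

Because the line is part of every feed item, it appears in both full-text and summary feeds.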

Search engines don’t only use links. They are also very good at identifying copied and plagiarized text.
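
How search engines detect copies is not public, so the following is only a toy illustration of one well-known approach, comparing overlapping word sequences (often called shingles). The function names and the default shingle size of five words are my own assumptions.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat it as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(original: str, candidate: str, size: int = 5) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    a, b = shingles(original, size), shingles(candidate, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    mine = "Every blog you read started with zero posts, even the big ones."
    copy = mine + " Read more here."
    print(round(similarity(mine, copy), 2))
```

A score close to 1.0 suggests the second text is largely a copy of the first, while unrelated texts score at or near 0.0.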

The second tactic is to include links, especially internal links, in my posts. Most bloggers who steal content are lazy, so they rarely go to the trouble of stripping links.

Here’s a great example of an RSS copy scraper helping me.

[Image: Links in scraped and copied content]

The first point to note is that this is a publisher’s site, and a publisher should know better than to copy content.

On the right, you can see that my internal links have helped me gain a few backlinks. That’s good for my SEO but not for this scraping site.

As a bonus for me, because the links are copied, they will all lead back to my site when clicked. On top of that, if I include affiliate links, they work on copies and might earn me a few bucks extra.
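
If you want to check for yourself whether a copied page still carries your links, a short script can list them. This is a sketch using only Python’s standard library. `LinkCollector` and `links_back_to` are hypothetical names, and `example.com` stands in for your own domain.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_back_to(html: str, my_domain: str) -> list:
    """Return the links whose host is my_domain or one of its subdomains."""
    collector = LinkCollector()
    collector.feed(html)
    matches = []
    for href in collector.links:
        host = (urlparse(href).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            matches.append(href)
    return matches


if __name__ == "__main__":
    scraped = (
        '<p><a href="https://example.com/post-1/">one</a> '
        '<a href="https://other.test/page">elsewhere</a></p>'
    )
    print(links_back_to(scraped, "example.com"))
```

Relative links are ignored here because, on a scraper’s site, they no longer point to the original blog.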

 

Some content scraping and copying sites are plain stupid


Google Search Console is the easiest way to find bloggers copying your content.

Go to Links and then Top Linking Sites.

Here’s an example.

[Image: Proof of content scraping and copying of my blog articles]

This site alone has stolen, copied, and published 176 of my articles in full.

I sent numerous emails asking them to cease and desist, but to no avail.

But the owners of the site have no idea at all about how to maintain a website.

If you try to access the site, you get a security warning in most browsers because it doesn’t even have a basic SSL certificate.

[Image: Security risk site]

However, guess what?

Google can access the site without a problem, so I get credited with 176 article backlinks as well as 210 internal links to other articles.

Okay, all together, they are not worth anywhere near one link from the Guardian, the New York Times, or Wikipedia.

But Google recognizes the links, so they help my site a little.

Well, thanks, that’s nice of them. But how stupid can you get?

But if you look at the image above again, the site at the top of the list has linked to 256 of my pages.

The site is a well-managed aggregator blog.

It posts a snippet of the introduction of my articles with a link to continue reading the full article on my site.

That’s the correct way to use my content.
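
As a rough sketch of that snippet-plus-link approach, the Python function below builds a short excerpt followed by a link to the full article. The name `make_excerpt`, the 50-word default, and the “Continue reading” label are assumptions for illustration, not how any particular aggregator works.

```python
from html import escape


def make_excerpt(article_text: str, article_url: str, max_words: int = 50) -> str:
    """Return a short HTML excerpt followed by a link to the full article."""
    words = article_text.split()
    snippet = " ".join(words[:max_words])
    if len(words) > max_words:
        snippet += "..."  # signal that the text continues at the source
    return (
        f"<p>{escape(snippet)}</p>\n"
        f'<p><a href="{escape(article_url, quote=True)}">Continue reading</a></p>'
    )


if __name__ == "__main__":
    print(make_excerpt(
        "You started a new blog, and you don't have many pages or posts.",
        "https://example.com/original-article/",
        max_words=8,
    ))
```

The reader gets a taste of the article, and the original site gets the visit and the link.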

 

Is all copying bad?

No, no, not at all!

In fact, the opposite is true.

Quoting selected text (or using images) from authoritative websites, with a credit and a link, to support your blog post is excellent for your SEO.

But you need to do it correctly.

SmartBug published an article titled “Content, Attribution, and Plagiarism: How to Give Credit Where Credit Is Due.”

It says this about attribution.

If you’re taking text directly from another source, without any paraphrasing or rewording, place in quotes, name the source, and link to its website.

It’s very good advice, but I prefer to use italics instead of quotation marks.

By quoting and linking the right way, you support your post’s subject or your opinions.

You also send a positive signal to search engines that your blog is well-researched and trustworthy.

Used wisely, selectively copying text with correct crediting and linking has very good SEO value.

But scraping or copying an entire article and republishing it has almost no SEO value.

 

Write well, don’t copy and steal


Copying is copying, and plagiarism is plagiarism.

I view the copying and republishing of my articles in full the same way I view the pirating of my books.

In both cases, it’s annoying but causes little real damage in the long term.

But if you are starting a new blog, don’t fall for the temptation to copy and paste the work of other bloggers.

It might seem like a good idea to get some quick posts on your site.

But in reality, you will handicap and diminish the SEO value of your new site very quickly, and it may never recover.

Every blog you read started with zero posts, even the big ones.

I remember when I started this blog with nothing but a theme and an about page. But then I got to work.

The only way to build a new blog is to write fantastic posts and learn to leverage SEO.

Sure, it takes much longer, but it’s the only proven way to succeed in blogging.

 

Summary of content scraping and copying

Everything on the Internet can be copied in seconds, and there is no way to fully protect your content, not even ebooks.

All you can do is accept that it happens and move on.

Regularly writing and publishing high-quality, high-value articles will win in the long run and usually rank much higher than copycat pages.

However, in a weird sort of way, having your content copied is a bit of a backhanded compliment.

It must mean that scrapers think it’s great content, so it’s worth copying.

And as I have learned, I can then take advantage of their inexperience and lack of understanding of how SEO works.

One last point is that most of these copying sites fail and disappear quite quickly once their owners realize the futility.

But there are always plenty of new sites to replace them, so it is a never-ending story.

Derek Haines

A Cambridge CELTA English teacher and author with a passion for writing and all forms of publishing. My days are spent writing and blogging, as well as testing and taming new technology.
