What is duplicate content?
Duplicate content occurs when a sizable block of text on one page is very similar to, or exactly the same as, a block of text on another page. Google can treat duplicate content as spam and take action against the page or site, but only when it is clear that the content was intentionally copied or that the site is intentionally spamming users.
But duplicate content is not always intentionally copied. Google lists some examples of non-deceptive duplicate content in their documentation:
- Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
- Items in an online store that are shown or linked to by multiple distinct URLs
- Printer-only versions of web pages
If your website produces content similar to another page, you won't automatically be marked as spam. If you are bringing value to your readers, then you probably have nothing to worry about as far as a penalty to your site goes. What you should be more concerned about is how duplicate content could affect your organic search channel.
Why is duplicate content bad for SEO?
When it comes to content marketing and fighting for keyword rankings, duplicate content is generally a negative thing. If you want to rank a piece of content, you need to deliver unique value.
Two things you need to know about duplicate content & SEO:
1.) If you accidentally create duplicate content, or your content looks similar to another page, it's not going to seriously hurt your site — Google will simply rank the page that was published first higher.
2.) If you are intentionally creating duplicate content, then Google will mark it as spam and take action against you.
If you copy someone else's content, verbatim, and post it as your own, Google will see that the other person's content was posted first, and will assume that they are the original content author. Google will not penalize your website, but the original content will rank higher. If you do this repeatedly though, Google will see you as a spam site, which can result in Google taking action against you.
But if you want to cite pieces of content from another website and either paraphrase or quote them directly, while also providing value and adding to the conversation, Google will be able to see that you added original content to the existing content, and you can rank that content in Google search results. Google's primary objective is to ensure that search engine users have a good user experience.
To avoid duplicate content issues, here are some things to remember when referencing content from others:
1.) Include a URL link to the site of the referenced content (cite your sources).
2.) Make sure you are adding a new perspective or new information to your content.
If you follow these two principles when you reference the work of others, you are unlikely to run into duplicate content issues, and you will be providing a better user experience.
In the video below, Matt Cutts from Google explains how Google views duplicate content and how it is treated in search results. According to him, around 25-35% of the internet is likely duplicate content, including things like quotes and boilerplate text.
He says "It's not the case that every time there's duplicate content it's spam, and if we made that assumption the changes that would happen as a result would probably end up hurting our search quality rather than helping our search quality."
Google doesn't automatically assume duplicate content is spam, but Google also doesn't want to rank two of the same pieces of content because that wouldn't make sense from a user perspective.
If you are concerned about SEO performance and want to rank your content on the first page of Google search results for a particular keyword, you'll need to make sure that the content is original — or at least provides some valuable, original content or perspective.
Use These SEO Practices to Avoid Duplicate Content Issues
Sometimes duplicate content happens for technical reasons. For these cases there are specific technical SEO practices you can use to solve duplicate content issues.
Use 301 Redirects:
If you have rebuilt your site or have redesigned an old web page (and changed the URL) you can redirect your user and bot traffic — including search engine crawlers — to the new site or URL using 301 redirects.
This will not only ensure that your new web page gets the same traffic that the old page was getting, but it will also help search engines like Google understand that this new page is a replacement web page, not duplicate content.
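As a rough illustration of what a 301 redirect looks like on the server side, here is a minimal sketch using only Python's standard library. The paths in REDIRECTS are hypothetical, and most sites would configure redirects in their web server or CMS rather than in application code like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of retired paths to their replacement paths.
REDIRECTS = {
    "/old-pricing": "/pricing",
    "/blog/2019/seo-tips": "/blog/seo-tips",
}


def resolve_redirect(path):
    """Return the replacement path for a retired URL, or None if there is none."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = resolve_redirect(self.path)
        if target is not None:
            # 301 signals to browsers and crawlers that the move is permanent.
            self.send_response(301)
            self.send_header("Location", target)
        else:
            # This sketch serves no real pages, so everything else is a 404.
            self.send_response(404)
        self.end_headers()


def main():
    # Serve on localhost:8000 until interrupted (port chosen arbitrarily).
    HTTPServer(("localhost", 8000), RedirectHandler).serve_forever()
```

Calling main() and requesting /old-pricing should return a 301 response whose Location header points to /pricing.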
Minimize Similar Content:
If you can, it's a good practice to minimize similar content on your website as much as possible. This could mean consolidating similar articles into one longer piece of content or refreshing a pre-existing article with new content. If you need to write on a similar topic and are worried about duplicate content issues, you can also use a rel canonical tag to make sure that search engine bots know which page you want to pass rank equity to.
Use Rel Canonical Tags:
You may have pages with similar content on your site for various reasons. These pages can also cause duplicate content issues unless each one carries a rel canonical tag that points to the version you want search engines to treat as the original.
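For reference, a rel canonical tag is a link element placed in the head of a page. The sketch below builds one in Python; the helper name canonical_link_tag and the example.com URL are made up for illustration, and in practice your CMS or template system would usually emit this tag for you.

```python
from html import escape


def canonical_link_tag(preferred_url):
    """Build the <link rel="canonical"> element for a page's <head>."""
    # Escape the URL so characters such as & and " stay valid inside the attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


# Every near-duplicate page would carry the same tag, pointing at the preferred URL.
print(canonical_link_tag("https://example.com/shoes/red-sneakers"))
```

The same tag goes on each variant of the page (for example, a printer-only version), so that all of them point to a single preferred URL.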
Keep Internal Linking Consistent:
Here's the excerpt from the Google developer docs on the importance of internal links to avoid duplicate content:
Internal links have three main functions: they aid in website navigation, define the hierarchy of a website, and distribute page authority and ranking equity. Linking to different versions of the same page can cause confusion about which is the authority web page. If you only link to one of these pages, a search engine bot is more likely to treat that page as the authoritative version and rank it rather than the others.
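One way to spot inconsistent internal links is to group the links found on your site by a normalized form of the URL and flag any group with more than one spelling. The following is a minimal sketch under stated assumptions: the normalization rules (ignoring the scheme, a leading "www.", trailing slashes, query strings, and fragments) are illustrative choices, not rules taken from Google, and the function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key (assumed rules, adjust for your site)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat "/pricing" and "/pricing/" as the same page; keep "/" for the root.
    path = parts.path.rstrip("/") or "/"
    return host + path


def find_inconsistent_links(links):
    """Return pages that are linked to under more than one URL variant."""
    groups = defaultdict(set)
    for link in links:
        groups[normalize(link)].add(link)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


links = [
    "https://example.com/pricing",
    "https://www.example.com/pricing/",
    "http://example.com/pricing",
    "https://example.com/about",
]
print(find_inconsistent_links(links))
```

Here the three pricing links collapse to the same key, so they are reported together as candidates for consolidating onto one URL, while the about page is linked consistently and is left out. Because query strings are ignored, pages whose content genuinely depends on a query parameter would need a stricter rule.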