What is Duplicate Content and How to Avoid It

By: Compose.ly — March 24, 2020

Could duplicate content be wreaking havoc on your page rankings in Google? It’s possible — even if you’re not deliberately copying or repeating any content on your site.

That’s why it’s important to understand how duplicates can occur and what you can do to protect your website.

What is Duplicate Content in SEO?

Any web page available in more than one location online is considered duplicate content by search engines. Duplicate content might refer to two pages that are completely identical or two pages that have significant overlap in their wording and other elements. Pages with significant amounts of overlapping data are called “near-duplicate content” or “common content.”

Duplicate content can be internal (multiple pages with the same content on a single website) or external (two or more instances of the same content on different websites).

According to Google, duplicate content is defined as “substantive blocks of content within or across domains that either completely match other content or are appreciably similar.”

The Google definition also says, “Mostly, this is not deceptive in origin.” In other words, you might not be plagiarizing other sites or pirating content, but you could still be hosting material that Google counts as duplicate.

Why Would Your Site Have Duplicate Content?

When you look at how Google checks for duplicate content, you’ll see that it can turn up in a variety of ways.

Sometimes duplicate content occurs for legitimate reasons.

  • Maybe you’re running an A/B test on two landing page designs with the same copy, or you have two very similar products in your web store with a lot of overlap in their descriptions.
  • Your “about me” or “about the author” page might be identical across multiple sites if you have several businesses or do a lot of guest blogging.
  • If you have an online store and use product descriptions supplied by your vendor, that same description might appear in many other locations on the internet.
  • If a guest blogger republishes their post on their own site, that creates a second copy of the same material.
  • Sometimes the issue is technical. For example, you may have the same website available at both www and non-www locations, on HTTP and HTTPS, or via AMP pages. That’s five possible duplicate URLs for every piece of content on your site.
  • You could also wind up with the same content at more than one URL because of the way your system organizes your pages. For instance, suppose you create a blog post at the page blogpost.html. It could appear at both yourdomain.com/blogpost.html and yourdomain.com/blog/blogpost.html even though the post exists only once in your database.
  • Session IDs, used for tracking your visitors’ actions on your site, and URL parameters can also create multiple URLs for the same content in Google’s view.
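To see how session IDs and tracking parameters multiply URLs, consider this minimal Python sketch. It normalizes URLs by stripping such parameters so that variants collapse to one canonical form; the parameter names listed are illustrative assumptions, not a complete list for any real site.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Parameters that commonly create duplicate URLs for the same content.
# Illustrative only -- adjust for your own site's tracking setup.
TRACKING_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}

def normalize_url(url: str) -> str:
    """Strip session IDs and tracking parameters so duplicate
    URL variants collapse to a single canonical form."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

urls = [
    "https://example.com/post?sessionid=abc123",
    "https://example.com/post?utm_source=newsletter",
    "https://example.com/post",
]
# All three variants normalize to the same URL.
print({normalize_url(u) for u in urls})
```

Search engines perform a similar consolidation when they can, but as the article notes, it is safer not to depend on it.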

Or, unfortunately, for many high-quality sites, your content could be duplicated by another publisher, either because they plagiarized your writing or because they are scraping your site and recreating it on their server.

Duplicate Content and SEO

Does duplicate content affect SEO? Internet experts disagree somewhat on this topic. Some experts say that webmasters don’t need to worry about duplicate content SEO impact because the search algorithms ignore the duplicates and show only one copy in results.

Other experts say it’s not quite that simple because the system isn’t foolproof. If the search results get divided between similar pages, then each page’s SEO gets watered down. If this happens, it will also dilute the SEO “juice” those pages could have passed on through internal linking, which can impact the rankings for your most important content.

Most experts agree that rumors of a “duplicate content penalty” are overblown. It seems the only time Google penalizes a site for duplicate content is when the site is scraping other sites and not producing anything original.

However, most webmasters would prefer to decide for themselves which pages on their site to include in search results, rather than depending on an algorithm. Checking for and eliminating duplicate content gets rid of that uncertainty.

How to Check Your Site for Duplicate Content

These free tools will help you find out whether search engines read some of your content as duplicate. Some of them will also tell you whether anyone is scraping or plagiarizing your work.

  • Siteliner: Analyzes your entire site for internal duplicates, as well as broken links, page rank, redirections, and more.
  • SEO Review Tools: Checks for both internal and external duplicates.
  • Copyscape: A plagiarism checker you can use even before you publish content to make sure it’s original.
  • Plagspotter: Checks for plagiarized content across the web and can run monthly checks to be sure none of your content is being scraped or stolen.
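Under the hood, tools like these compare pages by measuring textual overlap. Here is a minimal sketch of that kind of comparison in Python, using word shingles and Jaccard similarity; the shingle length and any "duplicate" threshold are assumptions you would tune, not values these tools publish.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word sequences)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity between two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

desc_a = "Durable stainless steel water bottle keeps drinks cold for 24 hours"
desc_b = ("This durable stainless steel water bottle keeps drinks cold "
          "for 24 hours and fits most cup holders")
# High overlap between two product descriptions suggests near-duplicate content.
print(round(similarity(desc_a, desc_b), 2))
```

Two pages scoring near 1.0 are the "near-duplicate content" the article describes; identical pages score exactly 1.0.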

How to Avoid or Fix Duplicate Content Search Engine Issues

If any of these tools find internal duplicates or scraped content on other websites, you’ll need to take action. Here’s how to tackle the most common issues:

Scraped Content

If you discover that your content is being scraped — meaning another site is automatically stealing posts from your RSS feed — you can report it to Google. Still, you probably don’t need to worry about losing SEO power.

One thing the experts agree on is that anyone who is creating a site filled with stolen content will have trouble getting any SEO traction. Google shouldn’t be confused by people who copy or scrape your content.

Neil Patel offers some handy tricks for turning scrapers into inbound links for your website. One is to use a plug-in that adds a link back to your site in your RSS feed, and the other is to get a similar effect using internal links on your pages.

Internal Duplicates

Some site owners use a cloning plug-in to create multiple landing pages, for example, to describe similar services in different cities. If you use this shortcut, change the URL structure, title, topic headings, images, and some of the text so that each page’s content is unique.

If you’re looking for organic traffic to any of these pages, Google recommends several methods for indicating which is the “canonical” or primary page. The most common technique is adding a special tag to the secondary pages that points to the original. That code is called a rel=canonical tag, and it looks like this:

<link rel="canonical" href="http://www.website.com/originalpost.html" />

You’ll find the rel=canonical tag and other methods explained in detail in Google’s support documentation.

Other People’s Content on Your Site

If you’re using content supplied by other businesses, like product descriptions from a manufacturer or press release information, rewrite them using unique language. If you have a large web store and want your product pages to rank, hire a writer with SEO training to rework those pages for you.

Sometimes a busy blogger is tempted to grab popular material from another blog and quickly spin a new story by making a few changes to it. Google recognizes this for what it is and doesn’t give high rankings to sites that publish this kind of thinly veiled copying.

Guest Posts

If you write a guest post for another blog, they may allow you to republish that content on your site (or vice versa if you publish their post). The usual practice is to wait a few weeks so it’s clear to search engines which page is the original. You can also add a rel=canonical tag to the reprint for clarity.

HTTPS and WWW

When you install an SSL certificate, your whole site migrates from an HTTP protocol to a secure HTTPS protocol. When Google crawls your website, it sees both versions and considers them two separate entities with lots of duplication.

To solve the problem, use 301 redirects on all of your HTTP pages to point to their HTTPS versions. Redirects are a good security practice for SSL in any case.

A similar problem can arise with the www prefix. Google views www.website.com as a sub-domain of website.com and will read the pages of the two URLs as duplicates.

Pick one version and 301 redirect the pages of the other to it. Redirects are the reliable method here, since Google has retired the preferred-domain setting that used to be available in Webmaster Tools (now Google Search Console).
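If your site runs on Apache with mod_rewrite enabled, both redirects can be handled in your .htaccess file. The following is a sketch that assumes you have chosen the non-www HTTPS version as canonical; adapt the rules (or their nginx equivalents) to your own setup.

```apache
RewriteEngine On

# Send all HTTP traffic to HTTPS (301 = permanent redirect)
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

# Send www requests to the non-www version
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ https://%1%{REQUEST_URI} [L,R=301]
```

With these rules in place, every piece of content resolves to a single URL, so crawlers never see the four protocol/prefix variants as separate pages.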

Good Website Hygiene Practices

These development practices can prevent duplicate content search engine problems on your site.

  • Don’t set up a second printer-friendly version of your pages. If something needs special formatting, link to a downloadable PDF instead.
  • Don’t publish content that is heavily “inspired by” someone else’s posts. A few tweaks or even a translation will not fool Google.
  • Allow robots to crawl your URLs so the search engine has the information it needs to resolve duplicates.
  • Turn off comment pagination in your discussion settings to stop extended discussions from creating duplicate pages.
  • Disable session IDs in your settings to avoid creating multiple URLs.
  • Use rel=canonical tags any time you knowingly publish duplicate content.
  • If your site has more than 1,000 pages and uses parameters that create multiple URLs, Google recommends blocking crawling of parameterized duplicate content.
  • Focus on creating lots of fresh, original, high-quality content for your site.

Conclusion

Although non-malicious duplicate content won’t cause search engine penalties, it may still slow down your SEO efforts. Checking your site for any pages that might be seen as duplicates through the eyes of a search crawler, fixing those issues, and practicing good website hygiene are common-sense precautions.

This article was written by Lauren Haas.