Avoiding Duplicate Content

In a previous post, we discussed the fundamentals of duplicate content (what it is, why it’s important, and so on). Now, we’re going to complete that discussion by describing various techniques for avoiding duplicate content.

Solving The WWW vs. Non-WWW Dilemma

Previously, we learned that http://www.webgnomes.org and http://webgnomes.org are distinct addresses. Consequently, if both addresses serve the same content, we have a duplicate content issue on our hands. To avoid this situation, we need to choose one of the addresses as our preferred address. Then, we need to redirect the other address to this preferred address (e.g., http://webgnomes.org will redirect to http://www.webgnomes.org). To accomplish this redirection, we will use a 301 HTTP redirect, which tells browsers and search engines that the move is permanent.

If you’re using the Apache Web server, you can define a 301 HTTP redirect in your .htaccess file, and if you’re using the IIS Web server, you can define it using the administrative console.
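Whichever server you use, the rule behind the redirect is the same. As a rough illustration (written in Python, so it is not an Apache or IIS configuration), here is a minimal sketch of that rule. The names PREFERRED_HOST, ALTERNATE_HOSTS, and redirect_for are made up for this example, and it assumes the WWW address is the preferred one.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the WWW address is the preferred one.
PREFERRED_HOST = "www.webgnomes.org"
ALTERNATE_HOSTS = {"webgnomes.org"}


def redirect_for(url):
    """Return (301, target) if url is on an alternate host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALTERNATE_HOSTS:
        return None  # already on the preferred host (or an unrelated one)
    # Keep the scheme, path, and query string; swap only the host.
    # Simplification: any explicit port in the original URL is dropped.
    target = urlunsplit(
        (parts.scheme, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
    )
    return 301, target


print(redirect_for("http://webgnomes.org/blog?page=2"))
print(redirect_for("http://www.webgnomes.org/blog"))
```

The important detail is that the path and query string survive the redirect, so every page on the non-preferred address lands on its matching page at the preferred address rather than on the home page.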

Once we have chosen a preferred address (WWW or Non-WWW), we’ll also want to inform Google about our decision. To accomplish this, we log into our Google Webmaster Tools account and select our preferred domain (e.g., www.webgnomes.org). This notifies Google that we want our URLs to be displayed in their results pages using our preferred domain.

Here’s a quick tutorial on how to register your site with Google Webmaster Tools (if you haven’t already done so).

General Duplicate Content Solutions

The WWW vs. Non-WWW dilemma is the most common cause of duplicate content, but it is by no means the only potential source of problems. Instead of trying to address every possible scenario, we will focus on the most effective solutions. Then, you will be prepared for any duplicate content that comes your way!

301 For The Win

We’ve already mentioned one of the most effective tools for avoiding duplicate content: 301 HTTP redirects. It solved our WWW vs. Non-WWW problem, but the fun doesn’t end there. Any time we have multiple URLs that are serving the same content (e.g., http://www.domain.com, http://www.domain.net, and http://www.domain.org), we follow a similar procedure. First, we select our preferred URL (e.g., http://www.domain.com), and then, we redirect the other URLs to that preferred URL. Now, instead of having multiple URLs serving the same content, we have multiple pointers to a single version of the content.
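If your site happens to run on a Python application server rather than relying on Apache or IIS rules, the same procedure can be sketched as a small piece of WSGI middleware. This is only an illustration under stated assumptions: the host names come from the example above, and canonical_host_middleware, PREFERRED_HOST, and ALTERNATE_HOSTS are hypothetical names, not part of any library.

```python
# Assumption for this sketch: www.domain.com is the preferred address.
PREFERRED_HOST = "www.domain.com"
ALTERNATE_HOSTS = {"www.domain.net", "www.domain.org"}


def canonical_host_middleware(app):
    """Wrap a WSGI app so requests for alternate hosts get a 301."""

    def wrapped(environ, start_response):
        # The Host header may carry a port (e.g. "www.domain.net:8080").
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host in ALTERNATE_HOSTS:
            scheme = environ.get("wsgi.url_scheme", "http")
            # Simplification: SCRIPT_NAME and URL quoting are ignored here.
            location = f"{scheme}://{PREFERRED_HOST}{environ.get('PATH_INFO', '/')}"
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        # Preferred host: hand the request to the real application.
        return app(environ, start_response)

    return wrapped
```

Wrapping the application once (for example, application = canonical_host_middleware(application)) means every alternate address answers with a pointer to the preferred one, and only the preferred address ever serves the content itself.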

Let’s Get Canonical

301 redirects are your best friend, but unfortunately, they can’t solve every problem. There are numerous situations where duplicate content is generated (e.g., syndication, mobile-friendly pages, and printer-friendly pages), and redirection can’t help because each version has to remain reachable at its own URL. But fear not: there is a simple solution for this problem.

When we have a duplicate content situation, a number of pages have very similar content. As we discussed in our previous post, search engines will resolve this situation by algorithmically choosing one of those pages as the original (this is also called the canonical page). Fortunately, we can influence this selection process by explicitly identifying the canonical page with the canonical tag (a link element whose rel attribute is set to "canonical").

To illustrate, let’s assume we have two very similar pages:

http://www.example.com/canonical (the canonical page)
http://www.syndicate.com/duplicate (the duplicate page)

Since http://www.syndicate.com/duplicate is the duplicate of the canonical page, we need to include the following canonical tag in the head section of the duplicate page’s HTML:

<link rel="canonical" href="http://www.example.com/canonical"/>
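To check that a duplicate page really carries the tag, you can parse its HTML and look for the link element. The following is a minimal sketch using Python’s standard html.parser module; CanonicalFinder and find_canonical are names invented for this example, and the page string is a stand-in for HTML you would fetch yourself.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in html, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Stand-in for the HTML of http://www.syndicate.com/duplicate.
page = (
    "<html><head>"
    '<link rel="canonical" href="http://www.example.com/canonical"/>'
    "</head><body>Duplicate copy</body></html>"
)
print(find_canonical(page))
```

A result of None means the page declares no canonical URL, so search engines are left to pick one on their own.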

Now, when the search engines see the canonical tag, they have an explicit signal that http://www.example.com/canonical is the canonical page (instead of having to select it algorithmically). To learn even more about the canonical tag, watch the video on the subject by Matt Cutts.

Now that you know a few of the best techniques for avoiding duplicate content, let’s get out there and start de-duping the world!

Comments

The Forge says

May 30, 2012 at 5:53 pm

I was looking for an article on how to do this and this article explained what I needed. I am in the process of setting up a 301 HTTP redirect through .htaccess and this has helped immensely. If you have a good tutorial on how to change the .htaccess for a 301 redirect let me know. Thank you for the knowledge.
- steve says
  
  May 30, 2012 at 6:14 pm
  
  I’m glad you enjoyed the post 🙂
  
  Here’s a good resource for 301 redirects: http://www.webconfs.com/how-to-redirect-a-webpage.php (scroll down to the bottom for .htaccess techniques).
