In a previous post, we discussed the fundamentals of duplicate content (i.e., what it is, why it’s important, etc.). Now, we’re going to complete that discussion by describing various techniques for avoiding duplicate content.
Solving The WWW vs. Non-WWW Dilemma
Previously, we learned that http://www.webgnomes.org and http://webgnomes.org are unique addresses. Consequently, if both addresses are serving up the same content, we have a duplicate content issue on our hands. To avoid this situation, we need to choose one of the addresses as our preferred address. Then, we need to redirect the other address to this preferred address (e.g., http://webgnomes.org will redirect to http://www.webgnomes.org). To accomplish this redirection, we will use a 301 (permanent) HTTP redirect.
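To make the redirect rule concrete, here is a minimal Python sketch of the decision a server would make for each request. It assumes the WWW address was chosen as the preferred one; the names PREFERRED_HOST, ALTERNATE_HOSTS, and canonical_redirect are hypothetical and only serve the illustration. In practice, this rule usually lives in the web server's configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the WWW address is the preferred one.
PREFERRED_HOST = "www.webgnomes.org"
ALTERNATE_HOSTS = {"webgnomes.org"}


def canonical_redirect(url):
    """Return (301, location) if the URL uses a non-preferred host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # hostname ignores any port
    if host in ALTERNATE_HOSTS:
        # Keep scheme, path, and query so deep links survive the redirect.
        location = urlunsplit(
            (parts.scheme, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
        )
        return 301, location
    return None


print(canonical_redirect("http://webgnomes.org/about?x=1"))
print(canonical_redirect("http://www.webgnomes.org/about"))
```

The path and query string are carried over unchanged, so every page on the non-preferred address points at its exact counterpart on the preferred address instead of at the home page.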
Once we have chosen a preferred address (WWW or Non-WWW), we’ll also want to inform Google about our decision. To accomplish this, we log into our Google Webmaster Tools account and select our preferred domain (e.g., www.webgnomes.org). This notifies Google that we want our URLs to be displayed in their results pages using our preferred domain.
General Duplicate Content Solutions
The WWW vs. Non-WWW dilemma is the most common cause of duplicate content, but it is by no means the only potential source of problems. Instead of trying to address every possible scenario, we will focus on the most effective solutions. Then, you will be prepared for any duplicate content that comes your way!
301 For The Win
We’ve already mentioned one of the most effective tools for avoiding duplicate content: 301 HTTP redirects. They solved our WWW vs. Non-WWW problem, but the fun doesn’t end there. Any time we have multiple URLs that are serving the same content (e.g., http://www.domain.com, http://www.domain.org, etc.), we follow a similar procedure. First, we select our preferred URL (e.g., http://www.domain.com), and then we redirect the other URLs to that preferred URL. Now, instead of having multiple URLs serving the same content, we have multiple pointers to a single version of the content.
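As a rough illustration of "multiple pointers to a single version of the content," the sketch below is a small WSGI application that answers requests for a duplicate host with a 301 and serves the content only on the preferred host. The names PREFERRED_ORIGIN, DUPLICATE_HOSTS, and app are hypothetical, and the page body is a placeholder; treat it as one possible shape for the idea, not a production setup.

```python
# Assumption for this sketch: http://www.domain.com is the preferred URL
# and www.domain.org serves the same content.
PREFERRED_ORIGIN = "http://www.domain.com"
DUPLICATE_HOSTS = {"www.domain.org"}


def app(environ, start_response):
    """WSGI app: 301 duplicate hosts to the preferred origin, else serve content."""
    host = environ.get("HTTP_HOST", "").lower()
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")

    if host in DUPLICATE_HOSTS:
        # Point at the same path on the preferred origin.
        location = PREFERRED_ORIGIN + path + ("?" + query if query else "")
        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Length", "0")],
        )
        return [b""]

    body = b"The single version of the content."  # placeholder page body
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() can host the app; more duplicate hosts can be added to DUPLICATE_HOSTS as they turn up.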
Let’s Get Canonical
301 redirects are your best friend, but unfortunately, they can’t solve every problem. There are numerous situations where duplicate content is generated (e.g., syndication, mobile-friendly pages, printer-friendly pages, etc.), and you are unable to avoid it using redirection. But fear not: there is a simple solution for this problem.
When we have a duplicate content situation, a number of pages have very similar content. As we discussed in our previous post, search engines will resolve this situation by algorithmically choosing one of those pages as the original (this is also called the canonical page). Fortunately, we can influence this selection process by explicitly identifying the canonical page with the canonical tag.
To illustrate, let’s assume we have two very similar pages:
http://www.example.com/canonical (the canonical page)
http://www.syndicate.com/duplicate (the duplicate page)
Since http://www.syndicate.com/duplicate is the duplicate of the canonical page, we need to include the following canonical tag in the head section of the duplicate page’s HTML:
<link rel="canonical" href="http://www.example.com/canonical"/>
Now, when the search engines see this tag, they will know that http://www.example.com/canonical is the canonical page (instead of attempting to select it algorithmically). To learn even more about the canonical tag, see Matt Cutts’s video on the topic.
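If you want to confirm that a duplicate page really declares the canonical page you intended, a short check along these lines can help. It is a sketch using Python's standard html.parser module; the class CanonicalFinder and the function find_canonical are hypothetical names, and the page markup is a made-up stand-in for the duplicate page in the example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also arrive here by default.
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Hypothetical markup standing in for http://www.syndicate.com/duplicate.
duplicate_page = """
<html><head>
<title>Syndicated copy</title>
<link rel="canonical" href="http://www.example.com/canonical"/>
</head><body>Same content as the original.</body></html>
"""

print(find_canonical(duplicate_page))
```

A result of None means the page carries no canonical tag, in which case the search engines are left to pick the canonical page on their own.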
Now that you know a few of the best techniques for avoiding duplicate content, let’s get out there and start de-duping the world!