In a previous post, we discussed how to use the robots.txt file to prevent search engine crawlers from crawling sections of your website. But what if you only want to restrict how crawlers handle specific pages on the site? Is there a granular way to control how crawlers handle individual pages? Fortunately, there is, and it’s called the robots meta tag. Now that you know what it’s called, I’m sure your mind is full of questions. So let’s start answering them…
What is a robots meta tag?
A robots meta tag is a line of HTML code that is included in a Web page to instruct search engine crawlers how they should process that page. Specifically, the tag tells crawlers if they are allowed to index the page, follow its links, and/or archive its contents. If you don’t want to restrict the crawlers, you shouldn’t include a robots meta tag.
What does a robots meta tag look like?
As previously mentioned, the tag is a line of HTML code. Here’s a very simple example:
<meta name="robots" content="noindex" />
This robots meta tag tells search engines not to index the corresponding page (i.e., don’t show the page in the search results).
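To get a feel for how a crawler might spot this tag, here is a minimal Python sketch that pulls the content of every robots meta tag out of a page. The class and function names (RobotsMetaParser, robots_meta_contents) are made up for illustration; real crawlers use their own, far more robust parsers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute names arrive lowercased; attribute values do not.
        if (attributes.get("name") or "").lower() == "robots":
            self.contents.append(attributes.get("content") or "")


def robots_meta_contents(html):
    """Return the content strings of all robots meta tags found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.contents


page = '<html><head><meta name="robots" content="noindex" /></head></html>'
print(robots_meta_contents(page))
```

A page without a robots meta tag simply yields an empty list.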
What are the most common content values for a robots meta tag?
The following values are supported by the most popular search engines (Google, Bing, Yahoo!, Ask, and Yandex):
- index – allows search engines to index the page
- noindex – prevents search engines from indexing the page
- follow – allows search engine crawlers to follow (i.e., crawl) links on the page
- nofollow – prevents search engine crawlers from following (i.e., crawling) links on the page
- archive – allows search engines to store a cached copy of the page (and show it in the search results)
- noarchive – prevents search engines from storing a cached copy of the page
Are there any other content values for a robots meta tag?
Actually, there are a lot more. Here is an exhaustive list:
- all – allows search engine crawlers to index the page and follow its links (i.e., it combines the index and follow values) – supported by Google and Yandex
- none – prevents search engine crawlers from indexing the page and following its links (i.e., it combines the noindex and nofollow values) – supported by Google, Ask, and Yandex
- nosnippet – prevents search engines from displaying a descriptive snippet for the page in the search results (it also prevents the crawlers from storing a cached copy of the page) – supported by Google
- nocache – prevents search engines from storing a cached copy of the page (i.e., the same as the noarchive value) – supported by Bing (and Yahoo!, now that it’s powered by Bing)
- noodp – prevents search engines from using the Open Directory Project description as the page’s descriptive snippet in the search results – supported by Google, Bing (and Yahoo!)
- notranslate – prevents search engines from translating the page in the search results – supported by Google
- noimageindex – prevents search engines from indexing the page’s images – supported by Google
- unavailable_after – prevents search engines from showing the page in the search results after a specified date/time – supported by Google
- noydir – prevents search engines from using the Yahoo! Directory description as the page’s descriptive snippet in the search results – previously supported by Yahoo! (before it was powered by Bing)
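Several of these values are just shorthand for values we have already seen. As a rough sketch (the SHORTHAND table and the expand_values function are hypothetical names, and the mapping only encodes the equivalences described in the list above), the expansion could look like this in Python:

```python
# Shorthand values and what they stand for, as described in the list above.
SHORTHAND = {
    "all": {"index", "follow"},
    "none": {"noindex", "nofollow"},
    "nocache": {"noarchive"},
}


def expand_values(content):
    """Expand shorthand values in a content string into basic values."""
    expanded = set()
    for raw in content.split(","):
        value = raw.strip().lower()
        if not value:
            continue
        # Values without a shorthand entry are kept unchanged.
        expanded |= SHORTHAND.get(value, {value})
    return expanded


print(sorted(expand_values("none")))
print(sorted(expand_values("all, nocache")))
```

Keep in mind that not every search engine supports every shorthand, so the expansion only matters for the engines named next to each value.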
Is there a sweet table summarizing robots meta tag content values?
Why yes, yes there is:
If I want to use multiple content values, do I have to use multiple robots meta tags?
No. If you want to use multiple values, you can combine them in a comma-separated list. To illustrate, let’s assume you want to prevent search engines from indexing a page and following its links. You can accomplish this with multiple robots meta tags:
<meta name="robots" content="noindex" />
<meta name="robots" content="nofollow" />
Or you can accomplish the same task with one robots meta tag that contains multiple content values:
<meta name="robots" content="noindex, nofollow" />
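If your pages are generated by a script, the single-tag form is easy to build. Here is a small Python sketch; combined_robots_tag is a hypothetical helper name, not part of any library.

```python
def combined_robots_tag(values):
    """Build one robots meta tag from a list of content values."""
    unique = []
    for value in values:
        cleaned = value.strip().lower()
        # Skip blanks and repeated values, keeping the original order.
        if cleaned and cleaned not in unique:
            unique.append(cleaned)
    return '<meta name="robots" content="{}" />'.format(", ".join(unique))


print(combined_robots_tag(["noindex", "nofollow"]))
```

The helper only joins the values; it does not check whether they are supported or consistent with each other.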
What happens if I don’t include a robots meta tag in my page?
Absolutely nothing. By default, the search engines assume you don’t want any restrictions placed upon your page. More specifically, if you omit the robots meta tag, the search engines assume you want them to index the page, follow its links, and archive its contents.
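To make the defaults concrete, here is a minimal Python sketch of the idea: start with everything allowed and only switch a permission off when a restrictive value shows up. The names (DEFAULTS, RESTRICTIONS, page_permissions) are invented for this example, and only the three basic restrictions are modeled.

```python
# With no robots meta tag, everything is allowed.
DEFAULTS = {"index": True, "follow": True, "archive": True}

# Each restrictive value switches off one permission.
RESTRICTIONS = {"noindex": "index", "nofollow": "follow", "noarchive": "archive"}


def page_permissions(meta_contents):
    """Return the permissions implied by a page's robots meta tag contents."""
    permissions = dict(DEFAULTS)
    for content in meta_contents:
        for raw in content.split(","):
            permission = RESTRICTIONS.get(raw.strip().lower())
            if permission is not None:
                permissions[permission] = False
    return permissions


print(page_permissions([]))           # no robots meta tag on the page
print(page_permissions(["noindex"]))  # one restrictive value
```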
What happens if a robots meta tag has conflicting content values?
Unfortunately, the answer to this question is more complicated than it should be. Google and Yandex are the only two search engines that have publicly commented on this situation, and each handles it completely differently. When presented with conflicting content values, Google will choose the most restrictive value, whereas Yandex will choose the default value instead. To illustrate, let’s assume a page has the following robots meta tag:
<meta name="robots" content="index, noindex" />
In this example, Google will respect the noindex value and NOT index the corresponding page because that is more restrictive. Yandex, on the other hand, will respect the index value because that is the default value for this attribute. Confusing, right? Fortunately, you can completely avoid this chaos by making your values consistent.
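The two approaches can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not code from either search engine; the function name should_index and the policy labels "most_restrictive" and "default_value" are assumptions made for this example.

```python
def should_index(content, policy):
    """Decide whether to index a page, given its robots content values.

    policy is "most_restrictive" (the behavior attributed to Google above)
    or "default_value" (the behavior attributed to Yandex above).
    """
    values = {v.strip().lower() for v in content.split(",") if v.strip()}
    wants_index = "index" in values
    wants_noindex = "noindex" in values

    if wants_index and wants_noindex:
        if policy == "most_restrictive":
            return False  # noindex is the more restrictive value
        if policy == "default_value":
            return True   # index is the default value
        raise ValueError(f"unknown policy: {policy!r}")

    # No conflict: index unless noindex was requested.
    return not wants_noindex


print(should_index("index, noindex", "most_restrictive"))
print(should_index("index, noindex", "default_value"))
```

When the values do not conflict, both policies give the same answer, which is exactly why consistent values keep you out of trouble.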
What happens if a page’s robots meta tag conflicts with robots.txt?
Google handles this situation the same way they handle conflicting robots meta tags: they respect the most restrictive value. Therefore, if a page is blocked in robots.txt, Google will respect that, regardless of whether the page has a robots meta tag with an index value (because the page’s appearance in robots.txt is more restrictive). Keep in mind that a crawler blocked by robots.txt never fetches the page, so it never reads the page’s robots meta tag in the first place. None of the other search engines have publicly commented on how they handle this situation, so the best way to avoid a problem is to make your robots meta tags consistent with your robots.txt file.
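Here is a rough Python sketch of that order of operations, using the standard library's urllib.robotparser to evaluate robots.txt. The function name crawl_decision, the ExampleBot user agent, and the example.com URLs are all placeholders, and the returned strings are just labels for this example.

```python
from urllib.robotparser import RobotFileParser


def crawl_decision(robots_txt, url, meta_content, user_agent="ExampleBot"):
    """Check robots.txt first; only look at the meta tag if crawling is allowed."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    if not rules.can_fetch(user_agent, url):
        # The page is never fetched, so its meta tag is never seen.
        return "blocked by robots.txt"

    values = {v.strip().lower() for v in meta_content.split(",")}
    if "noindex" in values or "none" in values:
        return "crawled, not indexed"
    return "crawled, indexable"


robots_txt = "User-agent: *\nDisallow: /private/\n"
print(crawl_decision(robots_txt, "https://example.com/private/page.html", "index"))
print(crawl_decision(robots_txt, "https://example.com/public/page.html", "noindex"))
```

In the first call the index value never gets a chance to matter, because the robots.txt rule stops the crawler before it reaches the page.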
Is there any other minutiae you’d like to tell me?
Actually, yes. Thanks for asking. I have two more quick things. First, it’s important to note that content values are case-insensitive. Specifically, noindex will be interpreted the same as NOINDEX and NoInDeX. Second, if you use multiple content values, they must be comma-delimited (as mentioned above), but the spaces surrounding those commas will be ignored. Thus, noindex, nofollow is treated the same as noindex,nofollow.
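Both rules boil down to a tiny bit of normalization. Here is a minimal Python sketch (normalize_content is a hypothetical name) that lowercases each value and trims the spaces around the commas:

```python
def normalize_content(content):
    """Split a content string on commas, trimming spaces and lowercasing."""
    return [part.strip().lower() for part in content.split(",") if part.strip()]


print(normalize_content("NoInDeX"))
print(normalize_content("noindex, nofollow") == normalize_content("noindex,nofollow"))
```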
Did you just make all of this up?
I would really like to say, “Yes.” But I actually researched this post (boring, I know). If you’d like to double check my work, here are relevant resources for each of the major search engines:
If you have any other questions, please leave a comment below, and we’ll keep this little Q&A going!