Robots Meta Tag – The Definitive Guide

In a previous post, we discussed how to use the robots.txt file to prevent search engine crawlers from crawling sections of your website. But what if you only want to restrict how crawlers handle specific pages on the site? Is there a granular way to control how crawlers handle individual pages? Fortunately, there is, and it’s called the robots meta tag. Now that you know what it’s called, I’m sure your mind is full of questions. So let’s start answering them…

What is a robots meta tag?

A robots meta tag is a line of HTML code included in a web page to tell search engine crawlers how they should process that page. Specifically, the tag tells crawlers whether they are allowed to index the page, follow its links, and/or archive its contents. If you don’t want to restrict the crawlers, you don’t need to include a robots meta tag.

What does a robots meta tag look like?

As previously mentioned, the tag is a line of HTML code. Here’s a very simple example:

<html>
<head>
<meta name="robots" content="noindex" />
</head>

This robots meta tag tells search engines not to index the corresponding page (i.e., don’t show the page in the search results).
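To see which values a page actually declares, the tag can be read programmatically. Here is a minimal sketch that uses Python's standard html.parser module. The class name RobotsMetaParser, the helper find_robots_meta, and the sample page are illustrative assumptions, not part of any search engine's tooling.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content value of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            self.contents.append(attributes.get("content") or "")


def find_robots_meta(html):
    """Return the content strings of all robots meta tags found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.contents


# Hypothetical page used only for illustration.
page = '<html><head><meta name="robots" content="noindex" /></head></html>'
print(find_robots_meta(page))  # ['noindex']
```

An empty list means the page has no robots meta tag at all.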

What are the most common content values for a robots meta tag?

The following values are supported by the most popular search engines (Google, Bing, Yahoo!, Ask, and Yandex):

  • index – allows search engines to index the page
  • noindex – prevents search engines from indexing the page
  • follow – allows search engine crawlers to follow (i.e., crawl) links on the page
  • nofollow – prevents search engine crawlers from following (i.e., crawling) links on the page
  • archive – allows search engines to store a cached copy of the page (and show it in the search results)
  • noarchive – prevents search engines from storing a cached copy of the page

According to the Baidu Search Help Center, Baidu does not support the noindex value.

Are there any other content values for a robots meta tag?

Actually, there are a lot more. Here is an exhaustive list:

  • all – allows search engine crawlers to index the page and follow its links (i.e., it combines the index and follow values) – supported by Google and Yandex
  • none – prevents search engine crawlers from indexing the page and following its links (i.e., it combines the noindex and nofollow values) – supported by Google, Ask, and Yandex
  • nosnippet – prevents search engines from displaying a descriptive snippet for the page in the search results (it also prevents the crawlers from storing a cached copy of the page) – supported by Google
  • nocache – prevents search engines from storing a cached copy of the page (i.e., the same as the noarchive value) – supported by Bing (and Yahoo!, now that it’s powered by Bing)
  • noodp – prevents search engines from using the Open Directory Project description as the page’s descriptive snippet in the search results – supported by Google, Bing (and Yahoo!)
  • notranslate – prevents search engines from translating the page in the search results – supported by Google
  • noimageindex – prevents search engines from indexing the page’s images – supported by Google
  • unavailable_after – prevents search engines from showing the page in the search results after a specified date/time – supported by Google
  • noydir – prevents search engines from using the Yahoo! Directory description as the page’s descriptive snippet in the search results – previously supported by Yahoo! (before it was powered by Bing)

Is there a sweet table summarizing robots meta tag content values?

Why yes, yes there is:

Value              Google  Bing  Yahoo!  Ask  Yandex  Baidu
index              YES     YES   YES     YES  YES     YES
noindex            YES     YES   YES     YES  YES     NO
follow             YES     YES   YES     YES  YES     YES
nofollow           YES     YES   YES     YES  YES     YES
archive            YES     YES   YES     YES  YES     YES
noarchive          YES     YES   YES     YES  YES     YES
all                YES     NO    NO      NO   YES     NO
none               YES     NO    NO      YES  YES     NO
nosnippet          YES     NO    NO      NO   NO      NO
nocache            NO      YES   YES     NO   NO      NO
noodp              YES     YES   YES     NO   NO      NO
notranslate        YES     NO    NO      NO   NO      NO
noimageindex       YES     NO    NO      NO   NO      NO
unavailable_after  YES     NO    NO      NO   NO      NO
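
If you want to check support from a script, the table can be encoded as a small lookup. This is only a sketch: the names ENGINES, SUPPORT, and engines_supporting are made up for illustration, and the flags simply restate the table above rather than any live data from the search engines.

```python
ENGINES = ("Google", "Bing", "Yahoo!", "Ask", "Yandex", "Baidu")

# One row per content value; each string holds Y/N flags in ENGINES order,
# copied from the table in this post.
SUPPORT = {
    "index": "YYYYYY",
    "noindex": "YYYYYN",
    "follow": "YYYYYY",
    "nofollow": "YYYYYY",
    "archive": "YYYYYY",
    "noarchive": "YYYYYY",
    "all": "YNNNYN",
    "none": "YNNYYN",
    "nosnippet": "YNNNNN",
    "nocache": "NYYNNN",
    "noodp": "YYYNNN",
    "notranslate": "YNNNNN",
    "noimageindex": "YNNNNN",
    "unavailable_after": "YNNNNN",
}


def engines_supporting(value):
    """List the engines that the table marks as supporting a content value."""
    flags = SUPPORT[value.strip().lower()]
    return [engine for engine, flag in zip(ENGINES, flags) if flag == "Y"]


print(engines_supporting("nocache"))  # ['Bing', 'Yahoo!']
print(engines_supporting("none"))     # ['Google', 'Ask', 'Yandex']
```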

If I want to use multiple content values, do I have to use multiple robots meta tags?

No. If you want to use multiple values, you can combine them in a comma-separated list. To illustrate, let’s assume you want to prevent search engines from indexing a page and following its links. You can accomplish this with multiple robots meta tags:

<meta name="robots" content="noindex" />
<meta name="robots" content="nofollow" />

Or you can accomplish the same task with one robots meta tag that contains multiple content values:

<meta name="robots" content="noindex, nofollow" />
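
To illustrate why the two forms are equivalent, here is a rough sketch of how a tool might merge the values from one or more robots meta tags into a single list. The function name merge_robots_values is hypothetical, and real crawlers may do this differently.

```python
def merge_robots_values(contents):
    """Merge the content strings of several robots meta tags into one list."""
    merged = []
    for content in contents:
        for value in content.split(","):
            value = value.strip().lower()
            # Skip empty pieces and values that were already collected.
            if value and value not in merged:
                merged.append(value)
    return merged


separate_tags = ["noindex", "nofollow"]  # two tags, one value each
combined_tag = ["noindex, nofollow"]     # one tag, two values
print(merge_robots_values(separate_tags))  # ['noindex', 'nofollow']
print(merge_robots_values(combined_tag))   # ['noindex', 'nofollow']
```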

What happens if I don’t include a robots meta tag in my page?

Absolutely nothing. By default, the search engines assume you don’t want any restrictions placed upon your page. More specifically, if you omit the robots meta tag, the search engines assume you want them to index the page, follow its links, and archive its contents.
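
Those defaults can be written down as a tiny sketch. The names DEFAULTS and effective_settings are assumptions made for this example, and only the six common values are handled. Conflicting values are not treated specially here.

```python
# Defaults described above: with no tag, crawlers index, follow, and archive.
DEFAULTS = {"index": True, "follow": True, "archive": True}


def effective_settings(content=None):
    """Return the index/follow/archive settings implied by a content string."""
    settings = dict(DEFAULTS)
    for value in (content or "").split(","):
        value = value.strip().lower()
        # Only the restrictive values change anything; index, follow, and
        # archive simply restate the defaults.
        if value.startswith("no") and value[2:] in settings:
            settings[value[2:]] = False
    return settings


print(effective_settings())           # {'index': True, 'follow': True, 'archive': True}
print(effective_settings("noindex"))  # {'index': False, 'follow': True, 'archive': True}
```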

What happens if a robots meta tag has conflicting content values?

Unfortunately, the answer to this question is more complicated than it should be. Google and Yandex are the only two search engines that have publicly commented on this situation, and each handles it completely differently from the other. When presented with conflicting attribute values, Google will choose the most restrictive value, whereas Yandex will choose the attribute’s default value instead. To illustrate, let’s assume a page has the following robots meta tag:

<meta name="robots" content="noindex, index" />

In this example, Google will respect the noindex value and NOT index the corresponding page because that is more restrictive. Yandex, on the other hand, will respect the index value because that is the default value for this attribute. Confusing, right? Fortunately, you can completely avoid this chaos by making your values consistent.
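
The two behaviors can be compared side by side in a short sketch. The function resolve_index and the policy names "most_restrictive" and "default_value" are labels invented for this example; they model the behavior described above for Google and Yandex, not either engine's actual code.

```python
def resolve_index(content, policy):
    """Decide whether a page may be indexed when index/noindex may conflict."""
    if policy not in ("most_restrictive", "default_value"):
        raise ValueError("unknown policy: " + policy)
    values = {value.strip().lower() for value in content.split(",")}
    has_index = "index" in values
    has_noindex = "noindex" in values
    if has_index and has_noindex:
        # Conflict: the restrictive policy picks noindex, while the
        # default-value policy falls back to the default, which is index.
        return policy == "default_value"
    return not has_noindex


conflicting = "noindex, index"
print(resolve_index(conflicting, "most_restrictive"))  # False
print(resolve_index(conflicting, "default_value"))     # True
```

Without a conflict, both policies give the same answer, which is the point of keeping your values consistent.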

What happens if a page’s robots meta tag conflicts with robots.txt?

Google handles this situation the same way it handles conflicting robots meta tags: it respects the most restrictive value. Therefore, if a page is blocked in robots.txt, Google will respect that, regardless of whether or not the page has a robots meta tag with an index value (because the page’s appearance in robots.txt is more restrictive). None of the other search engines have publicly commented on how they handle this situation, so the best way to avoid a problem is to make your robots meta tags consistent with your robots.txt file.

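One caveat: a crawler that obeys robots.txt never fetches a blocked page, so it cannot read that page’s robots meta tag or the links in its HTML. If you want a noindex or nofollow value to be honored, the page itself must remain crawlable.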
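
The precedence described for Google can be sketched as a two-step check: robots.txt first, then the page's robots meta tag. The example below uses Python's standard urllib.robotparser module. The function may_index, the example.com URLs, and the sample robots.txt rules are all hypothetical and serve only as an illustration.

```python
from urllib.robotparser import RobotFileParser


def may_index(url, robots_txt, meta_content, user_agent="*"):
    """Apply the most restrictive of robots.txt and the robots meta tag."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    # A robots.txt block wins, whatever the meta tag says.
    if not rules.can_fetch(user_agent, url):
        return False
    values = {value.strip().lower() for value in meta_content.split(",")}
    return "noindex" not in values and "none" not in values


# Hypothetical robots.txt that blocks one directory for all crawlers.
robots_txt = "User-agent: *\nDisallow: /private/"

print(may_index("https://example.com/private/page.html", robots_txt, "index"))  # False
print(may_index("https://example.com/public.html", robots_txt, "index"))        # True
print(may_index("https://example.com/public.html", robots_txt, "noindex"))      # False
```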

Are there any other minutiae you’d like to tell me?

Actually, yes. Thanks for asking. I have two more quick things. First, it’s important to note that content values are case insensitive. Specifically, noindex will be interpreted the same as NOINDEX and NoInDeX. Second, if you use multiple content values, they must be comma-delimited (as mentioned above), but the spaces surrounding those commas will be ignored. Thus, noindex, nofollow is treated the same as noindex,nofollow.
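
Both rules are easy to mimic if you ever compare tags in a script. This is a small sketch with a hypothetical helper named normalize. It assumes values are separated by commas, as described above.

```python
def normalize(content):
    """Lowercase each value and drop the whitespace around the commas."""
    return ",".join(value.strip().lower() for value in content.split(","))


print(normalize("NoInDeX , NOFOLLOW"))  # noindex,nofollow
print(normalize("noindex, nofollow") == normalize("noindex,nofollow"))  # True
```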

Did you just make all of this up?

I would really like to say, “Yes.” But I actually researched this post (boring, I know). If you’d like to double check my work, here are relevant resources for each of the major search engines:

If you have any other questions, please leave a comment below, and we’ll keep this little Q&A going!


About The Author: Steve is an SEO audit specialist at Web Gnomes. He received his Ph.D. from Georgia Tech, where he published dozens of articles on Internet-related topics. Professionally, Steve has worked for Google and various other Internet startups, and he's passionate about sharing his knowledge and experiences with others. You can find him on Twitter, Google+, and LinkedIn.



6 Responses to “Robots Meta Tag – The Definitive Guide”

  1. Nishant February 7, 2013 at 4:32 am #

    What’s the difference between robots.txt and robots meta tags?

    • steve February 8, 2013 at 5:53 pm #

      Hi Nishant,

      That’s an excellent question.

      The robots.txt file is used to restrict search engine crawlers from accessing entire sections of your website. You can read more about it here: A robots.txt File Guide That Won’t Put You to Sleep.

      Robots meta tags, on the other hand, are used to control search engine access for individual pages on your site. Specifically, you use the tag to tell search engines if you want them to index and/or follow the links found on a given page.

      Thanks for the comment!

  2. Jennifer Sibley August 28, 2013 at 12:51 pm #

    I am using Screaming Frog to crawl my site, and the results do not include a list of robots meta tags. Is there any tool that will provide the robots meta tag for each page on a site, or do I have to manually check the tag on each page?

    Thank you.

    • steve August 28, 2013 at 1:08 pm #

      Hi Jennifer,

      I use a custom crawler for my audits (and it automatically checks the robots meta tag for each page), but it also looks like Screaming Frog provides that information.

      If you look under the “Directives” tab, you should see a column titled “Meta Data 1” (if the column is empty, you don’t have any explicit values set for the robots meta tag, and you should be good to go).

      I hope that helps… and thanks for commenting :-)

      -Steve

      • Jennifer Sibley August 28, 2013 at 1:58 pm #

        Thanks so much, Steve! Lifesaver!

        • steve August 28, 2013 at 4:57 pm #

          You’re very welcome… I’m glad I could help :-)
