Robots Meta Tag - The Definitive Guide

In a previous post, we discussed how to use the robots.txt file to prevent search engine crawlers from crawling sections of your website. But what if you only want to restrict how crawlers handle specific pages on the site? Is there a granular way to control how crawlers handle individual pages? Fortunately, there is, and it’s called the robots meta tag. Now that you know what it’s called, I’m sure your mind is full of questions. So let’s start answering them…

What is a robots meta tag?

A robots meta tag is a line of HTML code that is included in a web page to instruct search engine crawlers how they should process that page. Specifically, the tag tells crawlers if they are allowed to index the page, follow its links, and/or archive its contents. If you don’t want to restrict the crawlers, you don’t need to include a robots meta tag.

What does a robots meta tag look like?

As previously mentioned, the tag is a line of HTML code. Here’s a very simple example:

<html>
<head>
…
<meta name="robots" content="noindex" />
</head>
…

This robots meta tag tells search engines not to index the corresponding page (i.e., don’t show the page in the search results).
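
If you want to check which values a page actually declares, here is a minimal sketch that pulls the content of every robots meta tag out of a page's HTML. It uses Python's standard html.parser module; the names RobotsMetaFinder and find_robots_meta are made up for this illustration, and the sample page is just the example from above.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            self.contents.append(attributes.get("content") or "")


def find_robots_meta(html):
    """Return the content strings of all robots meta tags, in page order."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.contents


page = '<html><head><meta name="robots" content="noindex" /></head></html>'
print(find_robots_meta(page))  # prints ['noindex']
```

An empty list means the page has no robots meta tag at all.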

What are the most common content values for a robots meta tag?

The following values are supported by the most popular search engines (Google, Bing, Yahoo!, Ask, and Yandex):

index – allows search engines to index the page
noindex – prevents search engines from indexing the page
follow – allows search engine crawlers to follow (i.e., crawl) links on the page
nofollow – prevents search engine crawlers from following (i.e., crawling) links on the page
archive – allows search engines to store a cached copy of the page (and show it in the search results)
noarchive – prevents search engines from storing a cached copy of the page

According to the Baidu Search Help Center, Baidu does not support the noindex value.

Are there any other content values for a robots meta tag?

Actually, there are a lot more. Here is an exhaustive list:

all – allows search engine crawlers to index the page and follow its links (i.e., it combines the index and follow values) – supported by Google and Yandex
none – prevents search engine crawlers from indexing the page and following its links (i.e., it combines the noindex and nofollow values) – supported by Google, Ask, and Yandex
nosnippet – prevents search engines from displaying a descriptive snippet for the page in the search results (it also prevents the crawlers from storing a cached copy of the page) – supported by Google
nocache – prevents search engines from storing a cached copy of the page (i.e., the same as the noarchive value) – supported by Bing (and Yahoo!, now that it’s powered by Bing)
noodp – prevents search engines from using the Open Directory Project description as the page’s descriptive snippet in the search results – supported by Google, Bing (and Yahoo!)
notranslate – prevents search engines from translating the page in the search results – supported by Google
noimageindex – prevents search engines from indexing the page’s images – supported by Google
unavailable_after – prevents search engines from showing the page in the search results after a specified date/time – supported by Google
noydir – prevents search engines from using the Yahoo! Directory description as the page’s descriptive snippet in the search results – previously supported by Yahoo! (before it was powered by Bing)

Is there a sweet table summarizing robots meta tag content values?

Why yes, yes there is:

Value	Google	Bing	Yahoo!	Ask	Yandex	Baidu
index	YES	YES	YES	YES	YES	YES
noindex	YES	YES	YES	YES	YES	NO
follow	YES	YES	YES	YES	YES	YES
nofollow	YES	YES	YES	YES	YES	YES
archive	YES	YES	YES	YES	YES	YES
noarchive	YES	YES	YES	YES	YES	YES
all	YES	NO	NO	NO	YES	NO
none	YES	NO	NO	YES	YES	NO
nosnippet	YES	NO	NO	NO	NO	NO
nocache	NO	YES	YES	NO	NO	NO
noodp	YES	YES	YES	NO	NO	NO
notranslate	YES	NO	NO	NO	NO	NO
noimageindex	YES	NO	NO	NO	NO	NO
unavailable_after	YES	NO	NO	NO	NO	NO
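
If you would rather look things up in code, the table can be carried around as a small dictionary. This is only a sketch: the data is copied from the table above, while the names SUPPORT, engines_supporting, and unsupported_by are invented for the example. Values that are not in the table are reported as unsupported.

```python
ENGINES = ("Google", "Bing", "Yahoo!", "Ask", "Yandex", "Baidu")

# One string per content value; each character is Y/N for the engine in the
# same position in ENGINES (copied from the table above).
SUPPORT = {
    "index":             "YYYYYY",
    "noindex":           "YYYYYN",
    "follow":            "YYYYYY",
    "nofollow":          "YYYYYY",
    "archive":           "YYYYYY",
    "noarchive":         "YYYYYY",
    "all":               "YNNNYN",
    "none":              "YNNYYN",
    "nosnippet":         "YNNNNN",
    "nocache":           "NYYNNN",
    "noodp":             "YYYNNN",
    "notranslate":       "YNNNNN",
    "noimageindex":      "YNNNNN",
    "unavailable_after": "YNNNNN",
}


def engines_supporting(value):
    """Engines that support a single content value, per the table."""
    flags = SUPPORT[value.strip().lower()]
    return [engine for engine, flag in zip(ENGINES, flags) if flag == "Y"]


def unsupported_by(engine, content):
    """Values in a content attribute that the given engine does not support."""
    column = ENGINES.index(engine)
    values = [v.strip().lower() for v in content.split(",") if v.strip()]
    unknown = "N" * len(ENGINES)  # anything not in the table counts as unsupported
    return [v for v in values if SUPPORT.get(v, unknown)[column] == "N"]


print(engines_supporting("none"))                   # ['Google', 'Ask', 'Yandex']
print(unsupported_by("Baidu", "noindex, nofollow"))  # ['noindex']
```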

If I want to use multiple content values, do I have to use multiple robots meta tags?

No. If you want to use multiple values, you can combine them in a comma-separated list. To illustrate, let’s assume you want to prevent search engines from indexing a page and following its links. You can accomplish this with multiple robots meta tags:

<meta name="robots" content="noindex" />
<meta name="robots" content="nofollow" />

Or you can accomplish the same task with one robots meta tag that contains multiple content values:

<meta name="robots" content="noindex, nofollow" />

What happens if I don’t include a robots meta tag in my page?

Absolutely nothing. By default, the search engines assume you don’t want any restrictions placed upon your page. More specifically, if you omit the robots meta tag, the search engines assume you want them to index the page, follow its links, and archive its contents.
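
To make those defaults concrete, here is a rough sketch of how a page's effective settings could be worked out from its robots meta tags. It is a simplified model, not any search engine's real logic: it only looks at the six common values, and it assumes that only the "no" values change anything (the positive values just restate the defaults). The name effective_directives is hypothetical.

```python
# Defaults from the text: with no tag, engines index, follow, and archive.
DEFAULTS = {"index": True, "follow": True, "archive": True}


def effective_directives(contents):
    """contents: one content string per robots meta tag found on the page."""
    directives = dict(DEFAULTS)
    for content in contents:
        for raw in content.split(","):
            value = raw.strip().lower()
            # "noindex" switches off "index", "nofollow" switches off "follow", etc.
            if value.startswith("no") and value[2:] in directives:
                directives[value[2:]] = False
    return directives


print(effective_directives([]))
# {'index': True, 'follow': True, 'archive': True}

# Two separate tags and one combined tag give the same result.
print(effective_directives(["noindex", "nofollow"]))
print(effective_directives(["noindex, nofollow"]))
```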

What happens if a robots meta tag has conflicting content values?

Unfortunately, the answer to this question is more complicated than it should be. Google and Yandex are the only two search engines that have publicly commented on this situation, and each handles it completely differently from the other. When presented with conflicting attribute values, Google will choose the most restrictive value, whereas Yandex will choose the attribute’s default value instead. To illustrate, let’s assume a page has the following robots meta tag:

<meta name="robots" content="index, noindex" />

In this example, Google will respect the noindex value and NOT index the corresponding page because that is more restrictive. Yandex, on the other hand, will respect the index value because that is the default value for this attribute. Confusing, right? Fortunately, you can completely avoid this chaos by making your values consistent.
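
The two behaviors described above can be sketched as a tiny function. This is only a model of what the text says about Google and Yandex, not their actual code, and the function name and the policy labels "most_restrictive" and "default_value" are made up for the example.

```python
def is_indexable(content, policy):
    """Decide indexing for one content string under a conflict policy.

    policy is "most_restrictive" (the behavior attributed to Google above)
    or "default_value" (the behavior attributed to Yandex above).
    """
    values = {v.strip().lower() for v in content.split(",")}
    if "index" in values and "noindex" in values:
        if policy == "most_restrictive":
            return False  # noindex is the more restrictive value
        if policy == "default_value":
            return True   # index is the default value
        raise ValueError(f"unknown policy: {policy!r}")
    # No conflict: the page is indexable unless noindex is present.
    return "noindex" not in values


print(is_indexable("index, noindex", "most_restrictive"))  # False
print(is_indexable("index, noindex", "default_value"))     # True
```

With consistent values, both policies give the same answer, which is the whole point of keeping your values consistent.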

What happens if a page’s robots meta tag conflicts with robots.txt?

Google handles this situation the same way it handles conflicting robots meta tags: it respects the most restrictive value. Therefore, if a page is blocked in robots.txt, Google will respect that, regardless of whether or not the page has a robots meta tag with an index value (because the page’s appearance in robots.txt is more restrictive). None of the other search engines have publicly commented on how they handle this situation, so the best way to avoid a problem is to make your robots meta tags consistent with your robots.txt file.

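Keep in mind that a crawler that obeys robots.txt never fetches a blocked page, so it cannot read that page’s robots meta tag or follow the links in its HTML. If a robots meta tag must be seen (for example, a noindex value), the page cannot also be blocked in robots.txt.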
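
Here is a small sketch of the "most restrictive wins" rule for robots.txt versus the robots meta tag. It uses Python's standard urllib.robotparser module to evaluate the robots.txt rules; the function name may_index, the example.com URLs, and the sample robots.txt are all assumptions made for the illustration, and the logic is a simplified model rather than Google's implementation.

```python
from urllib.robotparser import RobotFileParser


def may_index(robots_txt, user_agent, url, meta_content):
    """True only if neither robots.txt nor the meta tag restricts the page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # Blocked in robots.txt: the meta tag's value no longer matters.
        return False
    values = {v.strip().lower() for v in meta_content.split(",")}
    return "noindex" not in values and "none" not in values


ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"

# Blocked by robots.txt, so an "index" value does not help.
print(may_index(ROBOTS_TXT, "Googlebot", "https://example.com/private/page.html", "index"))  # False

# Not blocked and no restrictive value.
print(may_index(ROBOTS_TXT, "Googlebot", "https://example.com/public.html", "index, follow"))  # True
```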

Is there any other minutiae you’d like to tell me?

Actually, yes. Thanks for asking. I have two more quick things. First, it’s important to note that content values are case insensitive. Specifically, noindex will be interpreted the same as NOINDEX and NoInDeX. Second, if you use multiple content values, they must be comma-delimited (as mentioned above), but the spaces surrounding those commas will be ignored. Thus, noindex, nofollow is treated the same as noindex,nofollow.
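
Both rules fit in a couple of lines of code. The sketch below normalizes a content string by splitting on commas, trimming the surrounding spaces, and lowercasing each value; the function name parse_robots_content is hypothetical.

```python
def parse_robots_content(content):
    """Split a content attribute into lowercase values, ignoring extra spaces."""
    return [value.strip().lower() for value in content.split(",") if value.strip()]


print(parse_robots_content("noindex, nofollow"))    # ['noindex', 'nofollow']
print(parse_robots_content("noindex,nofollow"))     # ['noindex', 'nofollow']
print(parse_robots_content("NoInDeX ,  NOFOLLOW"))  # ['noindex', 'nofollow']
```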

Did you just make all of this up?

I would really like to say, “Yes.” But I actually researched this post (boring, I know). If you’d like to double check my work, here are relevant resources for each of the major search engines:

Google – Using the robots meta tag and Robots meta tag specifications
Bing – Prevent a bot from getting “lost in space”
Ask – The Ask Website Crawler FAQ
Yandex – How to manage the robot
Baidu – Baidu Search Help Center

If you have any other questions, please leave a comment below, and we’ll keep this little Q&A going!

Comments

Nishant says

February 7, 2013 at 4:32 am

What’s the difference between robots.txt and robots meta tags?
- steve says
  
  February 8, 2013 at 5:53 pm
  
  Hi Nishant,
  
  That’s an excellent question.
  
  The robots.txt file is used to restrict search engine crawlers from accessing entire sections of your website. You can read more about it here: A robots.txt File Guide That Won’t Put You to Sleep.
  
  Robots meta tags, on the other hand, are used to control search engine access for individual pages on your site. Specifically, you use the tag to tell search engines if you want them to index and/or follow the links found on a given page.
  
  Thanks for the comment!
Jennifer Sibley says

August 28, 2013 at 12:51 pm

I am using Screaming Frog to crawl my site and the results do not include a list of robots meta tags. Is there any tool that will provide the robots meta tag for each page on a site or do I have to manually check the tag on each page?

Thank you.
- steve says
  
  August 28, 2013 at 1:08 pm
  
  Hi Jennifer,
  
  I use a custom crawler for my audits (and it automatically checks the robots meta tag for each page), but it also looks like Screaming Frog provides that information.
  
  If you look under the “Directives” tab, you should see a column titled, “Meta Data 1” (if the column is empty, you don’t have any explicit values set for the robots meta tag — and you should be good to go).
  
  I hope that helps… and thanks for commenting 🙂
  
  -Steve
  - Jennifer Sibley says
    
    August 28, 2013 at 1:58 pm
    
    Thanks so much, Steve! Lifesaver!
    - steve says
      
      August 28, 2013 at 4:57 pm
      
      You’re very welcome… I’m glad I could help 🙂
leana says

March 20, 2014 at 2:37 pm

Hey – I have a question for you. I would like to block some pages of my site without using robots.txt. How does one do that using meta tags? For example, Googlebot is in every folder of my site, including /gallery/album, and I want to block that part of my site from robots using meta tags.

I have already done the following – disallowing it from following.

a. Will this remove the existing photos from search results? (These were extracted from the gallery folder, which is really password controlled, so the bot literally embarrassed me: clients thought my site was password protected, only to find their pics in Google search results.)
b. Will it remove archived data?
c. Does nofollow mean not following the linked pages only, or will it stop the search engine from revisiting my site? I want to make sure that it revisits but does not index the links.

Many thanks for helping us out here.

leana
- steve says
  
  March 20, 2014 at 6:06 pm
  
  To noindex a page, add the following markup in the <head> section of the page in question:
  
  <meta name="robots" content="noindex" />
  
  As for your specific questions…
  
  If you want to remove existing pages from Google’s index, you can expedite the process by explicitly removing the content (in addition to adding noindex tags).
  
  If you use nofollow in a robots meta tag, it will only prevent crawlers from following the links on that specific page. But it’s important to note that crawlers might still find the destination pages for those links (if other links exist that don’t use nofollow).
