In a previous post, we discussed how to use the robots.txt file to prevent search engine crawlers from crawling sections of your website. But what if you only want to restrict how crawlers handle specific pages on the site? Is there a granular way to control how crawlers handle individual pages? Fortunately, there is, and it’s called the robots meta tag. Now that you know what it’s called, I’m sure your mind is full of questions. So let’s start answering them…
What is a robots meta tag?
A robots meta tag is a line of HTML code that is included in a Web page to instruct search engine crawlers how they should process that page. Specifically, the tag tells crawlers if they are allowed to index the page, follow its links, and/or archive its contents. If you don’t want to restrict the crawlers, you shouldn’t include a robots meta tag.
What does a robots meta tag look like?
As previously mentioned, the tag is a line of HTML code. Here’s a very simple example:
<head>
…
<meta name="robots" content="noindex" />
</head>
…
This robots meta tag tells search engines not to index the corresponding page (i.e., don’t show the page in the search results).
What are the most common content values for a robots meta tag?
The following values are supported by the most popular search engines (Google, Bing, Yahoo!, Ask, and Yandex):
- index – allows search engines to index the page
- noindex – prevents search engines from indexing the page
- follow – allows search engine crawlers to follow (i.e., crawl) links on the page
- nofollow – prevents search engine crawlers from following (i.e., crawling) links on the page
- archive – allows search engines to store a cached copy of the page (and show it in the search results)
- noarchive – prevents search engines from storing a cached copy of the page
Are there any other content values for a robots meta tag?
Actually, there are a lot more. Here is an exhaustive list:
- all – allows search engine crawlers to index the page and follow its links (i.e., it combines the index and follow values) – supported by Google and Yandex
- none – prevents search engine crawlers from indexing the page and following its links (i.e., it combines the noindex and nofollow values) – supported by Google, Ask, and Yandex
- nosnippet – prevents search engines from displaying a descriptive snippet for the page in the search results (it also prevents the crawlers from storing a cached copy of the page) – supported by Google
- nocache – prevents search engines from storing a cached copy of the page (i.e., the same as the noarchive value) – supported by Bing (and Yahoo!, now that it’s powered by Bing)
- noodp – prevents search engines from using the Open Directory Project description as the page’s descriptive snippet in the search results – supported by Google, Bing (and Yahoo!)
- notranslate – prevents search engines from translating the page in the search results – supported by Google
- noimageindex – prevents search engines from indexing the page’s images – supported by Google
- unavailable_after – prevents search engines from showing the page in the search results after a specified date/time – supported by Google
- noydir – prevents search engines from using the Yahoo! Directory description as the page’s descriptive snippet in the search results – previously supported by Yahoo! (before it was powered by Bing)
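Several of the values above are shorthand for values from the earlier list. As a rough sketch in Python, the expansion could look like the following. The names `ALIASES` and `expand_aliases` are made up for this illustration, and the mapping only reflects the descriptions in this post; each engine decides for itself which shorthand it honors.

```python
# Shorthand values and the explicit values they stand for,
# based only on the descriptions in the list above.
ALIASES = {
    "all": ["index", "follow"],
    "none": ["noindex", "nofollow"],
    "nocache": ["noarchive"],
}


def expand_aliases(values):
    """Replace shorthand values with their explicit equivalents, keeping order."""
    expanded = []
    for value in values:
        # Values without an alias are passed through unchanged.
        for explicit in ALIASES.get(value, [value]):
            if explicit not in expanded:
                expanded.append(explicit)
    return expanded
```

For example, `expand_aliases(["none"])` yields the `noindex` and `nofollow` values, which is how the `none` value is described above.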
Is there a sweet table summarizing robots meta tag content values?
Why yes, yes there is:
Value | Google | Bing | Yahoo! | Ask | Yandex | Baidu |
---|---|---|---|---|---|---|
index | YES | YES | YES | YES | YES | YES |
noindex | YES | YES | YES | YES | YES | NO |
follow | YES | YES | YES | YES | YES | YES |
nofollow | YES | YES | YES | YES | YES | YES |
archive | YES | YES | YES | YES | YES | YES |
noarchive | YES | YES | YES | YES | YES | YES |
all | YES | NO | NO | NO | YES | NO |
none | YES | NO | NO | YES | YES | NO |
nosnippet | YES | NO | NO | NO | NO | NO |
nocache | NO | YES | YES | NO | NO | NO |
noodp | YES | YES | YES | NO | NO | NO |
notranslate | YES | NO | NO | NO | NO | NO |
noimageindex | YES | NO | NO | NO | NO | NO |
unavailable_after | YES | NO | NO | NO | NO | NO |
If I want to use multiple content values, do I have to use multiple robots meta tags?
No. If you want to use multiple values, you can combine them in a comma-separated list. To illustrate, let’s assume you want to prevent search engines from indexing a page and following its links. You can accomplish this with multiple robots meta tags:
<meta name="robots" content="noindex" />
<meta name="robots" content="nofollow" />
Or you can accomplish the same task with one robots meta tag that contains multiple content values:
<meta name="robots" content="noindex, nofollow" />
What happens if I don’t include a robots meta tag in my page?
Absolutely nothing. By default, the search engines assume you don’t want any restrictions placed upon your page. More specifically, if you omit the robots meta tag, the search engines assume you want them to index the page, follow its links, and archive its contents.
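To make the default behavior concrete, here is a minimal Python sketch that reads a page's robots meta tags and falls back to the defaults described above when none are present. It uses the standard library's `html.parser`. The names `RobotsMetaFinder`, `DEFAULTS`, and `robots_values` are invented for this example, and a real crawler would certainly do more.

```python
from html.parser import HTMLParser

# Defaults assumed when a page has no robots meta tag, as described above.
DEFAULTS = ["index", "follow", "archive"]


class RobotsMetaFinder(HTMLParser):
    """Collects the content attribute of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() == "robots":
            self.contents.append(attr_map.get("content") or "")


def robots_values(html):
    """Return the page's robots values, or the defaults if no tag is found."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    if not finder.contents:
        return list(DEFAULTS)
    values = []
    for content in finder.contents:
        # Values are comma-separated; case and surrounding spaces are ignored.
        values.extend(v.strip().lower() for v in content.split(",") if v.strip())
    return values
```

A page with no robots meta tag comes back with the three default values, while a page with one or more tags comes back with exactly the values those tags list.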
What happens if a robots meta tag has conflicting content values?
Unfortunately, the answer to this question is more complicated than it should be. Google and Yandex are the only two search engines that have publicly commented on this situation, and each handles it completely differently from the other. When presented with conflicting attribute values, Google will choose the most restrictive value, whereas Yandex will choose the attribute’s default value instead. To illustrate, let’s assume a page has the following robots meta tag:
<meta name="robots" content="index, noindex" />
In this example, Google will respect the noindex value and NOT index the corresponding page because that is more restrictive. Yandex, on the other hand, will respect the index value because that is the default value for this attribute. Confusing, right? Fortunately, you can completely avoid this chaos by making your values consistent.
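The two policies can be sketched in a few lines of Python. This is only an illustration of the behavior as described in this post, not either engine's actual implementation; the function name `resolve_indexing` and the engine labels `"google"` and `"yandex"` are assumptions made for the example.

```python
def resolve_indexing(values, engine):
    """Return "index" or "noindex" for a list of robots content values.

    engine is "google" (most restrictive value wins) or "yandex"
    (the default value wins), following the description in this post.
    """
    normalized = {v.strip().lower() for v in values}
    has_index = "index" in normalized
    has_noindex = "noindex" in normalized

    if has_index and has_noindex:
        if engine == "google":
            return "noindex"  # the more restrictive of the two values
        if engine == "yandex":
            return "index"  # the default value for this attribute
        # Other engines have not documented how they handle the conflict.
        raise ValueError("conflict handling is not documented for " + engine)

    # No conflict: an explicit noindex applies, otherwise the default is index.
    return "noindex" if has_noindex else "index"
```

With the conflicting pair of values, the Google branch returns `noindex` and the Yandex branch returns `index`. With consistent values, both engines agree.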
What happens if a page’s robots meta tag conflicts with robots.txt?
Google handles this situation the same way they handle conflicting robots meta tags: they respect the most restrictive value. Therefore, if a page is blocked in robots.txt, Google will respect that, regardless of whether or not the page has a robots meta tag with an index value (because the page’s appearance in robots.txt is more restrictive). Keep in mind that a crawler that obeys robots.txt never fetches a blocked page, so it never gets to read that page’s robots meta tag in the first place. None of the other search engines have publicly commented on how they handle this situation, so the best way to avoid a problem is to make your robots meta tags consistent with your robots.txt file.
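One way to check for this kind of conflict is to ask whether a crawler could fetch the page at all, since a crawler that honors robots.txt cannot read the meta tag of a page it is not allowed to request. The sketch below uses Python's standard `urllib.robotparser` module. The function name `meta_tag_is_visible`, the sample rules, and the example.com URLs are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser


def meta_tag_is_visible(robots_txt, user_agent, url):
    """Return True if robots.txt lets this crawler fetch the page.

    If the page cannot be fetched, its robots meta tag is never read.
    """
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules that block one section of a site for every crawler.
SAMPLE_ROBOTS_TXT = "User-agent: *\nDisallow: /gallery/\n"
```

Here, `meta_tag_is_visible(SAMPLE_ROBOTS_TXT, "Googlebot", "https://example.com/gallery/album")` is false, so any robots meta tag on that page would go unread, while a page outside /gallery/ remains fetchable.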
Are there any other minutiae you’d like to tell me?
Actually, yes. Thanks for asking. I have two more quick things. First, it’s important to note that content values are case insensitive. Specifically, noindex will be interpreted the same as NOINDEX and NoInDeX. Second, if you use multiple content values, they must be comma-delimited (as mentioned above), but the spaces surrounding those commas will be ignored. Thus, noindex, nofollow is treated the same as noindex,nofollow.
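Both rules amount to a small normalization step. Here is a minimal Python sketch of it; the function name `parse_content` is made up for this example, and it only mirrors the two rules described above.

```python
def parse_content(content):
    """Split a robots content attribute into normalized values.

    Values are comma-delimited, matched case-insensitively, and any
    spaces around the commas are ignored.
    """
    return [value.strip().lower() for value in content.split(",") if value.strip()]
```

Under these rules, `parse_content("NoInDeX, nofollow")` and `parse_content("noindex,nofollow")` produce the same list.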
Did you just make all of this up?
I would really like to say, “Yes.” But I actually researched this post (boring, I know). If you’d like to double check my work, here are relevant resources for each of the major search engines:
- Google – Using the robots meta tag and Robots meta tag specifications
- Bing – Prevent a bot from getting “lost in space”
- Ask – The Ask Website Crawler FAQ
- Yandex – How to manage the robot
- Baidu – Baidu Search Help Center
If you have any other questions, please leave a comment below, and we’ll keep this little Q&A going!
Nishant says
What’s the difference between robots.txt and robots meta tags?
steve says
Hi Nishant,
That’s an excellent question.
The robots.txt file is used to restrict search engine crawlers from accessing entire sections of your website. You can read more about it here: A robots.txt File Guide That Won’t Put You to Sleep.
Robots meta tags, on the other hand, are used to control search engine access for individual pages on your site. Specifically, you use the tag to tell search engines if you want them to index and/or follow the links found on a given page.
Thanks for the comment!
Jennifer Sibley says
I am using Screaming Frog to crawl my site and the results do not include a list of robots meta tags. Is there any tool that will provide the robots meta tag for each page on a site or do I have to manually check the tag on each page?
Thank you.
steve says
Hi Jennifer,
I use a custom crawler for my audits (and it automatically checks the robots meta tag for each page), but it also looks like Screaming Frog provides that information.
If you look under the “Directives” tab, you should see a column titled, “Meta Data 1” (if the column is empty, you don’t have any explicit values set for the robots meta tag — and you should be good to go).
I hope that helps… and thanks for commenting 🙂
-Steve
Jennifer Sibley says
Thanks so much, Steve! Lifesaver!
steve says
You’re very welcome… I’m glad I could help 🙂
leana says
Hey – I have a question for you. I would like to block some pages of my site without using robots.txt – how does one do that using meta tags? For example, Googlebot is in every folder of my site, including /gallery/album, and I want to block that part of my site from robots using meta tags.
I have already done the following – disallowing it from following.
a. Will this remove the existing photos from the search results? (These were extracted from the gallery folder, which is really password controlled, so the bot literally embarrassed me: clients thought my site was password protected, only to find their pics in Google’s search results.)
b. Will it remove archived data?
c. Does nofollow mean not following the linked pages only, or will it stop the search engine from revisiting my site? I want to make sure that it revisits but does not index the links.
Many thanks for helping us out here.
leana
steve says
To noindex a page, add the following markup in the <head> section of the page in question:
<meta name="robots" content="noindex" />
As for your specific questions…
If you want to remove existing pages from Google’s index, you can expedite the process by explicitly removing the content (in addition to adding noindex tags).
If you use nofollow in a robots meta tag, it will only prevent crawlers from following the links on that specific page. But it’s important to note that crawlers might still find the destination pages for those links (if other links exist that don’t use nofollow).