Webmaster Central Blog
Official news on crawling and indexing sites for the Google index
New robots.txt feature and REP Meta Tags
Wednesday, August 15, 2007
Posted by John Blackburn, Webmaster Tools and Matt Dougherty, Search Quality
We've improved Webmaster Central's robots.txt analysis tool to recognize Sitemap declarations and relative URLs. Earlier versions weren't aware of Sitemaps at all and understood only absolute URLs; anything else was reported as "Syntax not understood". The improved version now tells you whether your Sitemap's URL and scope are valid. You can also test against relative URLs with a lot less typing.
Reporting is better, too. The tool now reports every problem on a line if there are several, whereas earlier versions reported only the first problem encountered. We've also made other general improvements to analysis and validation.
Imagine that you're responsible for the domain www.example.com and you want search engines to index everything on your site, except for your /images folder. You also want to make sure your Sitemap gets noticed, so you save the following as your robots.txt file:
disalow images
user-agent: *
Disallow:
sitemap: http://www.example.com/sitemap.xml
You visit Webmaster Central to test your site against the robots.txt analysis tool using these two test URLs:
http://www.example.com
/archives
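Earlier versions of the tool would have reported this:
[Screenshot: results from the earlier version of the tool]
The improved version tells you more about that robots.txt file:
[Screenshot: results from the improved version of the tool]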
See for yourself at http://www.google.com/webmasters/tools.
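To see why the file above does not do what its author wanted, here is a minimal sketch using Python's standard urllib.robotparser module (site_maps() needs Python 3.8 or later). It is not the Webmaster Central tool, and Googlebot's own parser may treat edge cases differently. The INTENDED file and the /images/logo.png test URL are assumptions added for illustration; they do not appear in the post.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the example above, mistakes included.
AS_SAVED = """\
disalow images
user-agent: *
Disallow:
sitemap: http://www.example.com/sitemap.xml
"""

# Assumption: one way to express the stated intent (block only /images).
INTENDED = """\
User-agent: *
Disallow: /images/
Sitemap: http://www.example.com/sitemap.xml
"""

# The two test URLs from the post, plus a hypothetical image URL.
TEST_URLS = ["http://www.example.com", "/archives", "/images/logo.png"]


def load(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def report(text: str, urls: list[str]) -> dict[str, bool]:
    """Map each URL to whether any crawler ("*") may fetch it."""
    parser = load(text)
    return {url: parser.can_fetch("*", url) for url in urls}


for label, text in (("as saved", AS_SAVED), ("intended", INTENDED)):
    print(label, report(text, TEST_URLS), load(text).site_maps())
```

With this parser, the file as saved allows all three URLs: the misspelled first line has no colon and is skipped, and the empty Disallow: value blocks nothing. The assumed corrected file blocks only the image URL. Unlike the analysis tool, this parser drops the bad line silently, which is the kind of gap the tool's reporting is meant to close.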
We also want to make sure you've heard about the new unavailable_after meta tag announced by Dan Crow on the Official Google Blog a few weeks ago. This allows for a more dynamic relationship between your site and Googlebot. On www.example.com, for example, any time you have a temporarily available news story, a limited-offer sale, or a promotion page, you can specify the exact date and time you want specific pages to stop being crawled and indexed.
Let's assume you're running a promotion that expires at the end of 2007. In the <head> section of the page www.example.com/2007promotion.html, you would use the following:
<META NAME="GOOGLEBOT"
CONTENT="unavailable_after: 31-Dec-2007 23:59:59 EST">
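If you have many expiring pages, you may prefer to generate the tag rather than type it by hand. The following is a rough sketch only: the function name unavailable_after_meta and the tz_label parameter are hypothetical, and the date layout simply copies the example above.

```python
from datetime import datetime

# English month abbreviations, spelled out so the result does not
# depend on the machine's locale settings.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def unavailable_after_meta(expiry: datetime, tz_label: str = "EST") -> str:
    """Build a GOOGLEBOT meta tag in the same layout as the example above.

    tz_label is appended as plain text; no time zone conversion is done.
    """
    stamp = (f"{expiry.day:02d}-{MONTHS[expiry.month - 1]}-{expiry.year} "
             f"{expiry:%H:%M:%S} {tz_label}")
    return f'<META NAME="GOOGLEBOT" CONTENT="unavailable_after: {stamp}">'


print(unavailable_after_meta(datetime(2007, 12, 31, 23, 59, 59)))
```

The function writes the tag on one line rather than two; the content is otherwise the same as the hand-written example.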
The second piece of exciting news is the new X-Robots-Tag directive, which adds Robots Exclusion Protocol (REP) META tag support for non-HTML pages! Finally, you can have the same control over your videos, spreadsheets, and other indexed file types. Using the example above, let's say your promotion page is in PDF format. For www.example.com/2007promotion.pdf, you would use the following in the HTTP response headers:
X-Robots-Tag: unavailable_after: 31 Dec 2007 23:59:59 EST
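Because X-Robots-Tag is an HTTP response header, it is set by whatever serves the file, not inside the file itself. As a rough illustration, here is a sketch using Python's built-in http.server module. The EXPIRY_HEADERS table, the robots_headers helper, the PromoHandler class, and the placeholder PDF body are all hypothetical; a production site would more likely set the header in its web server configuration.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical table mapping request paths to X-Robots-Tag values.
EXPIRY_HEADERS = {
    "/2007promotion.pdf": "unavailable_after: 31 Dec 2007 23:59:59 EST",
}


def robots_headers(path: str) -> list[tuple[str, str]]:
    """Return the extra response headers to send for a request path."""
    value = EXPIRY_HEADERS.get(path)
    return [("X-Robots-Tag", value)] if value else []


class PromoHandler(BaseHTTPRequestHandler):
    """Serve a placeholder PDF body, adding X-Robots-Tag where configured."""

    def do_GET(self):
        body = b"%PDF-1.4\n% placeholder body, not a real document\n"
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(body)))
        for name, value in robots_headers(self.path):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

# To try it locally (this call blocks until interrupted):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8000), PromoHandler).serve_forever()
```

Keeping the path-to-header lookup in a separate function makes it easy to check without starting a server.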
Remember, REP meta tags can be useful for implementing noarchive, nosnippet, and now unavailable_after tags for page-level instruction, as opposed to robots.txt, which is controlled at the domain root. We get requests from bloggers and webmasters for these features, so enjoy. If you have other suggestions, keep them coming. Any questions? Please ask them in the Webmaster Help Group.