Webmaster Central Blog
Official news on crawling and indexing sites for the Google index
URL removal explained, Part I: URLs & directories
Tuesday, March 30, 2010
Webmaster level: All
There's a lot of content on the Internet these days. At some point, something may turn up online that you would rather not have out there: anything from an inflammatory blog post you regret publishing to confidential data that accidentally got exposed. In most cases, deleting or restricting access to this content will cause it to drop out of search results naturally after a while. However, if you urgently need to remove unwanted content that Google has indexed and you can't wait for it to disappear on its own, you can use our URL removal tool to expedite its removal from our search results, as long as it meets certain criteria (which we'll discuss below).
We've got a series of blog posts lined up for you explaining how to successfully remove various types of content, and common mistakes to avoid. In this first post, I'm going to cover a few basic scenarios: removing a single URL, removing an entire directory or site, and reincluding removed content. I also strongly recommend our previous post on managing what information is available about you online.
Removing a single URL
In general, for your removal requests to be successful, the owner of the URL(s) in question (whether that's you or someone else) must have indicated that it's okay to remove that content. For an individual URL, this can be indicated in any of three ways:
block the page from crawling via a robots.txt file
block the page from indexing via a noindex meta tag
indicate that the page no longer exists by returning a 404 or 410 status code
Before submitting a removal request, you can check whether the URL is correctly blocked:
robots.txt: You can check whether the URL is correctly disallowed using either the Fetch as Googlebot or Test robots.txt features in Webmaster Tools.
noindex meta tag: You can use Fetch as Googlebot to make sure the meta tag appears somewhere between the <head> and </head> tags. If you want to check a page you can't verify in Webmaster Tools, you can open the URL in a browser, go to View > Page source, and make sure you see the meta tag between the <head> and </head> tags.
404 / 410 status code: You can use Fetch as Googlebot, or tools like Live HTTP Headers or web-sniffer.net, to verify whether the URL is actually returning the correct code. Sometimes "deleted" pages may say "404" or "Not found" on the page but actually return a 200 status code in the page header, so it's good to use a proper header-checking tool to double-check.
If unwanted content has been removed from a page but the page hasn't been blocked in any of the above ways, you will not be able to completely remove that URL from our search results. This is most common when you don't own the site that's hosting that content. We cover what to do in this situation in Part II of our removals series.
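The noindex and status-code checks described above can also be approximated with a short script. The following is only a minimal sketch in Python, not a description of how Google evaluates pages: the names RobotsMetaFinder, has_noindex, is_soft_404 and signals_removal are hypothetical, and the status code and HTML are assumed to have been obtained already with a header-checking tool or an HTTP client.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            # Attribute values can be None (e.g. <meta name>), so default to "".
            attr = {name: (value or "") for name, value in attrs}
            if attr.get("name", "").lower() == "robots":
                content = attr.get("content", "")
                self.directives += [d.strip().lower() for d in content.split(",")]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def has_noindex(html):
    """True if a robots meta tag between <head> and </head> contains noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


def is_soft_404(status, body):
    """True if the page text says 'not found' but the server answered 200."""
    text = body.lower()
    return status == 200 and ("404" in text or "not found" in text)


def signals_removal(status, html):
    """True if the response is a 404/410 or the page carries a noindex meta tag."""
    return status in (404, 410) or has_noindex(html)
```

For example, signals_removal(200, html) is False for a page that merely displays "Not found" without a noindex tag, and is_soft_404(200, html) flags that case so the server configuration can be corrected before a removal request is submitted.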
If a URL meets one of the above criteria, you can remove it by going to http://www.google.com/webmasters/tools/removals, entering the URL that you want to remove, and selecting the "Webmaster has already blocked the page" option. Note that you should enter the URL where the content was hosted, not the URL of the Google search where it's appearing. For example, enter
http://www.example.com/embarrassing-stuff.html
not
http://www.google.com/search?q=embarrassing+stuff
This article has more details about making sure you're entering the proper URL. Remember that if you don't tell us the exact URL that's troubling you, we won't be able to remove the content you had in mind.
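As a rough illustration of the difference between the two kinds of URL, a script could flag addresses that look like a Google search results page before they are submitted. This is a sketch under assumptions: the function name looks_like_google_search_url is hypothetical, and the check (a google.com host with a path starting with /search) is a simple heuristic based on the example above, not a rule used by the removal tool.

```python
from urllib.parse import urlsplit


def looks_like_google_search_url(url):
    """Heuristic: True if the URL points at a Google search results page."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    on_google = host == "google.com" or host.endswith(".google.com")
    return on_google and parts.path.startswith("/search")


# The URL where the content is hosted is the one to submit.
print(looks_like_google_search_url("http://www.example.com/embarrassing-stuff.html"))
# The search results URL should not be submitted.
print(looks_like_google_search_url("http://www.google.com/search?q=embarrassing+stuff"))
```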
Removing an entire directory or site
For a directory or site-wide removal to be successful, the directory or site must be disallowed in the site's robots.txt file. For example, in order to remove the http://www.example.com/secret/ directory, your robots.txt file would need to include:
User-agent: *
Disallow: /secret/
It isn't enough for the root of the directory to return a 404 status code, because it's possible for a directory to return a 404 but still serve out files underneath it. Using robots.txt to block a directory (or an entire site) ensures that all the URLs under that directory (or site) are blocked as well. You can test whether a directory has been blocked correctly using either the Fetch as Googlebot or Test robots.txt features in Webmaster Tools.
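For a quick local check of the robots.txt rules shown above, Python's standard urllib.robotparser module can be used. This is only a sketch: the helper name is_disallowed is hypothetical, the rules are parsed from a string rather than fetched from a live site, and Python's parser is not guaranteed to interpret every rule exactly as Googlebot does, so the Webmaster Tools features remain the authoritative test.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this post, blocking the /secret/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret/
"""


def is_disallowed(robots_txt, url, user_agent="Googlebot"):
    """True if the given robots.txt text blocks user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Every URL under the blocked directory is covered by the single Disallow rule.
for url in (
    "http://www.example.com/secret/",
    "http://www.example.com/secret/plans.html",
    "http://www.example.com/public.html",
):
    print(url, "blocked" if is_disallowed(ROBOTS_TXT, url) else "allowed")
```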
Only verified owners of a site can request removal of an entire site or directory in Webmaster Tools. To request removal of a directory or site, click on the site in question, then go to Site configuration > Crawler access > Remove URL. If you enter the root of your site as the URL you want to remove, you'll be asked to confirm that you want to remove the entire site. If you enter a subdirectory, select the "Remove directory" option from the drop-down menu.
Reincluding content
You can cancel removal requests for any site you own at any time, including those submitted by other people. To do so, you must be a verified owner of this site in Webmaster Tools. Once you've verified ownership, you can go to Site configuration > Crawler access > Remove URL > Removed URLs (or > Made by others) and click "Cancel" next to any requests you wish to cancel.
Still have questions? Stay tuned for the rest of our series on removing content from Google's search results. If you can't wait, much has already been written about URL removals and troubleshooting individual cases in our Help Forum. If you still have questions after reading others' experiences, feel free to ask. Note that, in most cases, it's hard to give relevant advice about a particular removal without knowing the site or URL in question. We recommend sharing your URL by using a URL shortening service so that the URL you're concerned about doesn't get indexed as part of your post; some shortening services will even let you disable the shortcut later on, once your question has been resolved.
Edit: Read the rest of this series:
Part II: Removing & updating cached content
Part III: Removing content you don't own
Part IV: Tracking requests, what not to remove
Companion post: Managing what information is available about you online
Posted by Susan Moskwa, Webmaster Trends Analyst