Webmaster Central Blog
Official news on crawling and indexing sites for the Google index
Reunifying duplicate content on your website
Tuesday, October 06, 2009
Handling duplicate content within your own website can be a big challenge. Websites grow; features get added, changed and removed; content comes and goes. Over time, many websites collect systematic cruft in the form of multiple URLs that return the same content. Having duplicate content on your website is generally not a problem in itself, though it can make it harder for search engines to crawl and index your content. In addition, PageRank and similar information found via incoming links can get diffused across pages we aren't currently recognizing as duplicates, potentially making your preferred version of the page rank lower in Google.
Steps for dealing with duplicate content within your website
Recognize duplicate content on your website.
The first and most important step is to recognize duplicate content on your website. A simple way to do this is to take a unique text snippet from a page and to search for it, limiting the results to pages from your own website by using a site: query in Google. Multiple results for the same content show duplication you can investigate.
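As a rough illustration of the search described above, the following sketch builds such a query from a domain and a text snippet. The function names, the example domain, and the choice to wrap the snippet in quotes for an exact-phrase match are assumptions made for this example, not part of the original advice.

```python
from urllib.parse import quote_plus


def site_query(domain: str, snippet: str) -> str:
    """Build a query that looks for a phrase on a single site only."""
    # Wrapping the snippet in double quotes requests an exact-phrase match
    # (an assumption of this sketch; the post only says to search for the snippet).
    return f'site:{domain} "{snippet}"'


def search_url(domain: str, snippet: str) -> str:
    """Return a Google search URL with the query URL-encoded in the q parameter."""
    return "https://www.google.com/search?q=" + quote_plus(site_query(domain, snippet))
```

Calling `site_query("example.com", "hello world")` yields `site:example.com "hello world"`, which can be pasted into the search box directly.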
Determine your preferred URLs.
Before fixing duplicate content issues, you'll have to determine your preferred URL structure. Which URL would you prefer to use for that piece of content?
Be consistent within your website.
Once you've chosen your preferred URLs, make sure to use them in all possible locations within your website (including in your Sitemap file).
Apply 301 permanent redirects where necessary and possible.
If you can, redirect duplicate URLs to your preferred URLs using a 301 response code. This helps users and search engines find your preferred URLs should they visit the duplicate URLs. If your site is available on several domain names, pick one and use the 301 redirect appropriately from the others, making sure to forward to the right specific page, not just the root of the domain. If you support both www and non-www host names, pick one, use the preferred domain setting in Webmaster Tools, and redirect appropriately.
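The redirect logic can be sketched as a small function that maps a request on a duplicate host to the matching page on the preferred host. The host names and the `redirect_for` helper are hypothetical, and a real site would usually configure this in the web server or framework instead of in hand-written code.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hosts, chosen only for this example.
PREFERRED_HOST = "www.example.com"
ALTERNATE_HOSTS = {"example.com", "example.net", "www.example.net"}


def redirect_for(url: str):
    """Return (301, target_url) if url is on a duplicate host, otherwise None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALTERNATE_HOSTS:
        return None
    # Keep the path and query so the visitor lands on the same specific page,
    # not just on the root of the preferred domain.
    target = urlunsplit(
        (parts.scheme, PREFERRED_HOST, parts.path or "/", parts.query, "")
    )
    return 301, target
```

With these settings, a request for `http://example.com/products/widget?id=7` maps to a 301 redirect to `http://www.example.com/products/widget?id=7`.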
Implement the rel="canonical" link element on your pages where you can.
Where 301 redirects are not possible, the rel="canonical" link element can give us a better understanding of your site and of your preferred URLs. Other major search engines such as Ask.com, Bing and Yahoo! also support this link element.
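For illustration, a template helper along the following lines could emit the link element into the head section of each duplicate page. The `canonical_link` name and the example URL are assumptions of this sketch.

```python
from html import escape


def canonical_link(preferred_url: str) -> str:
    """Render a rel="canonical" link element pointing at the preferred URL."""
    # escape(..., quote=True) keeps characters such as & and " from breaking
    # out of the double-quoted href attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'
```

For a preferred URL of `https://www.example.com/product?id=1`, this produces `<link rel="canonical" href="https://www.example.com/product?id=1">`.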
Use the URL parameter handling tool in Google Webmaster Tools where possible.
If some or all of your website's duplicate content comes from URLs with query parameters, this tool can help you to notify us of important and irrelevant parameters within your URLs. More information about this tool can be found in our announcement blog post.
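Before telling the tool which parameters are irrelevant, it can help to check your own URL list for addresses that differ only in those parameters. The sketch below is a local audit helper, not a description of how the Webmaster Tools feature works; the parameter names in `IRRELEVANT_PARAMS` and both function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed not to change the page content.
IRRELEVANT_PARAMS = {"sessionid", "sort", "ref"}


def normalize(url: str) -> str:
    """Drop irrelevant query parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IRRELEVANT_PARAMS
    ]
    kept.sort()
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Map each normalized URL to the original URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    # Only groups with more than one member indicate likely duplicates.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Any group returned by `group_duplicates` is a candidate set of duplicate URLs; whether the dropped parameters really leave the content unchanged still has to be confirmed by looking at the pages.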
What about the robots.txt file?
One item which is missing from this list is disallowing crawling of duplicate content with your robots.txt file. We now recommend not blocking access to duplicate content on your website, whether with a robots.txt file or other methods. Instead, use the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. If access to duplicate content is entirely blocked, search engines effectively have to treat those URLs as separate, unique pages since they cannot know that they're actually just different URLs for the same content. A better solution is to allow them to be crawled, but clearly mark them as duplicate using one of our recommended methods. If you allow us to crawl these URLs, Googlebot will learn rules to identify duplicates just by looking at the URL and should largely avoid unnecessary recrawls in any case. In cases where duplicate content still leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools.
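To find out whether an existing robots.txt file still blocks known duplicate URLs, a check like the following can be run against its contents. It uses Python's standard `urllib.robotparser`, whose matching rules are simpler than those of a real crawler, so treat the result as a hint only; the `blocked_urls` helper and the sample rules are assumptions of this sketch.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls, user_agent: str = "Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Any duplicate URL reported here is one that crawlers cannot fetch, and therefore one whose redirect or rel="canonical" link element they cannot see.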
We hope these methods will help you to master the duplicate content on your website! Information about duplicate content in general can also be found in our Help Center. Should you have any questions, feel free to join the discussion in our Webmaster Help Forum.
Posted by John Mueller, Webmaster Trends Analyst, Google Zürich