Webmaster Central Blog
Official news on crawling and indexing sites for the Google index
Advanced Q&A from (the appropriately-named) SMX Advanced
giovedì, agosto 06, 2009
Webmaster Level: Intermediate to Advanced
Earlier this summer
SMX Advanced
landed once again in our fair city—Seattle—and it was indeed advanced. I got a number of questions at some Q&A panels that I had to go back and do a little research on. Here, as promised, are answers:
Q.
We hear that Google's now doing a
better job
of
indexing Flash content
. If I have a Flash file that pulls in content from an external file and the external file is blocked by
robots.txt
, will that content be indexed in the Flash (which is
not
blocked by robots.txt)? Or will Google not be able to index that content?
A.
We won't be able to access that content if it's in a file that's disallowed by robots.txt; so even though that content would be visible to humans (via the Flash), search engine crawlers wouldn't be able to access it. For more details, see our blog post on
indexing Flash that loads external resources
.
Q.
Sites that customize content based on user behavior or clickstream are becoming more common. If a user clicks through to my site from a search results page, can I customize the content of that page or redirect the user based on the terms in their search query? Or is that considered
cloaking
? For example, if someone searches for [vintage cameo pendants] but clicks on my site's general vintage jewelry page, can I redirect that user to my vintage cameo-specific page since I know that's what they were searching for?
A.
If you're redirecting or returning different content to the user than what Googlebot would see on that URL (e.g., based on the google.com referrer or query string), we consider that cloaking. If the searcher decided to click on the 'vintage jewelry' result, you should show them the page they clicked on even if you think a different page might be better. You can always link between related pages on your website (i.e., link to your 'vintage jewelry' page from your 'vintage cameos' page and vice versa, so that anyone landing on those pages from any source can cross-navigate); but we don't believe you should make that decision for the searcher.
Q.
Even though it involves showing different content to different visitors, Google considers ethical website testing (such as A/B or multivariate testing) a legitimate practice that does not violate Google's
guidelines
. One reason for this is because, while search engines may only see the original content of the page and not the variations, there's also a percentage of human users who see that same content; so the technique doesn't specifically target search engines.
However, some testing services recommend running 100% of a site's traffic through the winning combination for awhile after an experiment has completed, to verify that conversion rates stay high. How does this fit in with Google's view of cloaking?
A.
Running 100% of traffic through one combination for a brief period of time in order to verify your experiment's results is fine. However, as our
article on this subject
states, "if we find a site running a single non-original combination at 100% for a number of months... we may remove that site from our index." If you want to confirm the results of your experiment but are worried about "how long is too long," consider running a follow-up experiment in which you send most of your traffic through your winning combination while still sending a small percentage to the original page as a control.
This is what Google recommends
with its own testing tool, Website Optimizer.
Q.
If the character encoding specified in a page's HTTP header is different from that specified in the <
meta equiv="Content-Type"
> tag, which one will Google pay attention to?
A.
We take a look at both of these, and also do a bit of processing/guessing on our own based on the content of the page. Most major browsers prioritize the encoding specified in the HTTP header over that specified in the HTML, if both are valid but different. However, if you're aware that they're different, the best answer is to fix one of them!
Q.
How does Google handle triple-byte UTF-8-encoded international characters in a URL (such as Chinese or Japanese characters)? These types of URLs break in some applications; is Google able to process them correctly? Does Google understand keywords that are encoded this way—that is, can you understand that
www.example.com/%E9%9D%B4
is just as relevant to shoes as
www.example.com/shoes
is?
A.
We can correctly handle %-escaped UTF-8 characters in the URL path and in query parameters, and we understand keywords that are encoded in this way. For international characters in a domain name, we recommend using
punycode
rather than %-encoding, because some older browsers (such as IE6) don't support non-ASCII domain names.
Have a question of your own? Join our
discussion forum
.
Posted by
Susan Moskwa
, Webmaster Trends Analyst
Hey!
Check here if your site is mobile-friendly.
Etichette
accessibility
10
advanced
195
AMP
13
Android
2
API
7
apps
7
autocomplete
2
beginner
173
CAPTCHA
1
Chrome
2
cms
1
crawling and indexing
158
encryption
3
events
51
feedback and communication
83
forums
5
general tips
90
geotargeting
1
Google Assistant
3
Google I/O
3
Google Images
3
Google News
2
hacked sites
12
hangout
2
hreflang
3
https
5
images
12
intermediate
205
interstitials
1
javascript
8
job search
2
localization
21
malware
6
mobile
63
mobile-friendly
14
nohacked
1
performance
17
product expert
1
product experts
2
products and services
63
questions
3
ranking
1
recipes
1
rendering
2
Responsive Web Design
3
rich cards
7
rich results
10
search console
35
search for beginners
1
search queries
7
search results
140
security
12
seo
3
sitemaps
46
speed
6
structured data
33
summit
1
TLDs
1
url removals
1
UX
3
verification
8
video
6
webmaster community
24
webmaster forum
1
webmaster guidelines
57
webmaster tools
177
webmasters
3
youtube channel
6
Archive
2020
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2019
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2018
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2017
dic
nov
ott
set
ago
giu
mag
apr
mar
feb
gen
2016
dic
nov
ott
set
ago
giu
mag
apr
mar
gen
2015
dic
nov
ott
set
ago
lug
mag
apr
mar
feb
gen
2014
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2013
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2012
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2011
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2010
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2009
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2008
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2007
dic
nov
ott
set
ago
lug
giu
mag
apr
mar
feb
gen
2006
dic
nov
ott
set
ago
Feed
Follow @googlewmc
Give us feedback in our
Product Forums
.
Subscribe via email
Enter your email address:
Delivered by
FeedBurner