Webmaster Level: Intermediate to Advanced
As the web evolves, Google’s crawling and indexing capabilities also need to progress. We improved our indexing of Flash, built a more robust infrastructure called Caffeine, and we even started crawling forms where it makes sense. Now, especially with the growing popularity of JavaScript and, with it, AJAX, we’re finding more web pages requiring POST requests -- either for the entire content of the page or because the pages are missing information and/or look completely broken without the resources returned from POST. For Google Search this is less than ideal, because when we’re not properly discovering and indexing content, searchers may not have access to the most comprehensive and relevant results.
We generally advise using GET for fetching resources a page needs, and this remains by far our preferred method of crawling. We’ve experimented with rewriting POST requests to GET, and while this remains a valid strategy in some cases, the content a web server returns for GET vs. POST is often completely different. Additionally, there are legitimate reasons to use POST (e.g., you can attach more data to a POST request than to a GET). So, while GET requests remain far more common, to surface more content on the web, Googlebot may now perform POST requests when we believe it’s safe and appropriate.
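To illustrate why naively rewriting POST to GET doesn’t always work, here’s a toy request handler (purely hypothetical, not taken from any real site) where the two methods yield entirely different content for the same URL:

```javascript
// Hypothetical handler: the same URL returns different content
// depending on the HTTP method used.
function handleRequest(method, body) {
  if (method === 'POST') {
    // The useful content is only generated from the POSTed data.
    return '<div>Results for ' + (body.query || 'nothing') + '</div>';
  }
  // A GET of the same URL returns only an empty shell.
  return '<div>Loading...</div>';
}

console.log(handleRequest('GET', {}));
console.log(handleRequest('POST', { query: 'sundae' }));
```

A crawler that rewrote the POST to a GET here would only ever see the empty shell, never the real content.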
We take precautions to avoid performing any task on a site that could result in executing an unintended user action. Our POSTs are primarily for crawling resources that a page requests automatically, mimicking what a typical user would see when they open the URL in their browser. This will evolve over time as we find better heuristics, but that’s our current approach.
Let’s run through a few POST request scenarios that demonstrate how we’re improving our crawling and indexing to evolve with the web.
Examples of Googlebot’s POST requests
- Crawling a page via a POST redirect
<html>
  <body onload="document.foo.submit();">
    <form name="foo" action="request.php" method="post">
      <input type="hidden" name="bar" value="234"/>
    </form>
  </body>
</html>
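When the form above auto-submits, the browser (and now, where appropriate, Googlebot) sends a request along these lines (the host is shown as example.com purely for illustration):

```
POST /request.php HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 7

bar=234
```

The hidden field’s name and value become the form-encoded body, so the server only produces the page’s real content in response to this POST.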
- Crawling a resource via a POST XMLHttpRequest
In this step-by-step example, we improve both the indexing of a page and its Instant Preview by following the automatic XMLHttpRequest generated as the page renders.
- Google crawls the URL, yummy-sundae.html.
- Google begins indexing yummy-sundae.html and, as a part of this process, decides to attempt to render the page to better understand its content and/or generate the Instant Preview.
- During the render, yummy-sundae.html automatically sends an XMLHttpRequest for a resource, hot-fudge-info.html, using the POST method.
<html>
  <head>
    <title>Yummy Sundae</title>
    <script src="jquery.js"></script>
  </head>
  <body>
    This page is about a yummy sundae.
    <div id="content"></div>
    <script type="text/javascript">
      $(document).ready(function() {
        $.post('hot-fudge-info.html', function(data) {
          $('#content').html(data);
        });
      });
    </script>
  </body>
</html>
- The URL requested through POST, hot-fudge-info.html, along with its data payload, is added to Googlebot’s crawl queue.
- Googlebot performs a POST request to crawl hot-fudge-info.html.
- Google now has an accurate representation of yummy-sundae.html for Instant Previews. In certain cases, we may also incorporate the contents of hot-fudge-info.html into yummy-sundae.html.
- Google completes the indexing of yummy-sundae.html.
- User searches for [hot fudge sundae].
- Google’s algorithms can now better determine how yummy-sundae.html is relevant for this query, and we can properly display a snapshot of the page for Instant Previews.
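The queueing step above, where a URL requested through POST is added to the crawl queue along with its data payload, can be sketched as a toy model (this is an illustration only, not Google’s actual implementation):

```javascript
// Toy crawl queue: requests observed while rendering a page are
// recorded with their method and payload, so the crawler can later
// replay the exact same POST.
const crawlQueue = [];
const seen = new Set();

function enqueue(url, method, payload) {
  const key = method + ' ' + url + ' ' + (payload || '');
  if (seen.has(key)) return false; // skip requests we've already queued
  seen.add(key);
  crawlQueue.push({ url: url, method: method, payload: payload || '' });
  return true;
}

// The rendered page issued: $.post('hot-fudge-info.html', ...)
enqueue('hot-fudge-info.html', 'POST', '');
enqueue('hot-fudge-info.html', 'POST', ''); // duplicate, ignored
```

Keying the queue on method and payload as well as URL matters here: a GET and a POST to the same URL can return different content, so they must be treated as distinct fetches.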
Improving your site’s crawlability and indexability
General advice for creating crawlable sites can be found in our Help Center. For webmasters who want to help Google crawl and index their content and/or generate Instant Previews, here are a few simple reminders:
Controlling your content
If you’d like to prevent content from being crawled or indexed for Google Web Search, traditional robots.txt directives remain the best method. To prevent the Instant Preview for your page(s), please see our Instant Previews FAQ, which describes the “Google Web Preview” user-agent and the nosnippet meta tag.
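For example, to block crawling of a section of your site (the path here is illustrative), a robots.txt file might contain:

```
User-agent: Googlebot
Disallow: /private/
```

And to opt a page out of snippets, the nosnippet meta tag goes in that page’s head (see the Instant Previews FAQ for how this interacts with previews):

```html
<meta name="robots" content="nosnippet">
```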
Moving forward
We’ll continue striving to increase the comprehensiveness of our index so searchers can find more relevant information. And we expect our crawling and indexing capability to improve and evolve over time, just like the web itself. Please let us know if you have questions or concerns.
Written by Pawel Aleksander Fedorynski, Software Engineer, Indexing Team, and Maile Ohye, Developer Programs Tech Lead