| DIRECTIVE | IMPACT | USE CASES |
| --- | --- | --- |
| Disallow | Tells a crawler not to crawl your site or specific parts of it. Your site's robots.txt file still needs to be fetched to find this directive, but the disallowed pages themselves will not be crawled. | 'No Crawl' pages on a site. In the default syntax, this directive prevents specific path(s) of a site from being crawled. |
| Allow | Tells a crawler which specific paths on your site may be crawled; typically used in combination with Disallow. | Particularly useful in conjunction with Disallow clauses, where a large section of a site is disallowed except for a small section within it. |
| $ Wildcard Support | Tells a crawler to match the end of a URL, so one pattern can cover a large number of files without listing each page individually. | 'No Crawl' files with specific patterns, for example, files of a certain filetype that always has a particular extension, such as .pdf. |
| * Wildcard Support | Tells a crawler to match any sequence of characters within a URL. | 'No Crawl' URLs with certain patterns, for example, disallowing URLs that contain session IDs or other extraneous parameters. |
| Sitemaps Location | Tells a crawler where it can find your Sitemaps. | Points crawlers to the location of Sitemap files (or feeds) that help them discover URLs on a site. |
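To show how these directives fit together, here is a minimal robots.txt sketch combining all of them. The paths, filenames, and sitemap URL are hypothetical and only illustrate each rule's syntax:

```
# Hypothetical robots.txt combining the directives above.
User-agent: *

# Block an entire section of the site...
Disallow: /private/
# ...but allow one subsection back in (Allow used with Disallow).
Allow: /private/public-reports/

# $ wildcard: block every URL ending in .pdf.
Disallow: /*.pdf$

# * wildcard: block URLs carrying a session ID parameter.
Disallow: /*?sessionid=

# Tell crawlers where the Sitemap lives.
Sitemap: https://www.example.com/sitemap.xml
```

Note that the $ and * wildcards are extensions to the original Robots Exclusion Protocol; crawlers that implement only the original syntax may treat these characters literally.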