Official Google Webmaster Central Blog: Google's robots.txt parser is now open source

Webmaster Central Blog

Official news on crawling and indexing sites for the Google index

Google's robots.txt parser is now open source

Monday, July 01, 2019

For 25 years, the Robots Exclusion Protocol (REP) was only a de facto standard, which sometimes had frustrating implications. For webmasters, it meant uncertainty in corner cases, such as when their text editor included BOM characters in their robots.txt files. For crawler and tool developers, it brought uncertainty too: for example, how should they deal with robots.txt files that are hundreds of megabytes in size?

Today, we announced that we're spearheading the effort to make the REP an internet standard. While this is an important step, it means extra work for developers who parse robots.txt files.
We're here to help: we open sourced the C++ library that our production systems use for parsing and matching rules in robots.txt files. This library has been around for 20 years and contains pieces of code that were written in the '90s. Since then, the library has evolved; we learned a lot about how webmasters write robots.txt files and about the corner cases we had to cover, and where it made sense, we added what we learned over the years to the internet draft.
We also included a testing tool in the open source package to help you test a few rules. Once built, the usage is very straightforward:
robots_main <robots.txt content> <user_agent> <url>
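To get a feel for the three inputs the tool takes (robots.txt content, user agent, URL) before building the C++ package, here is a minimal sketch using Python's standard-library urllib.robotparser. This is a different, independent implementation, so its verdicts may differ from those of the open sourced library in corner cases; the helper name is_allowed, the sample rules, the ExampleBot user agent, and the example.com URLs are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt body.

    Hypothetical helper: it uses Python's stdlib parser, not Google's C++ library.
    """
    parser = RobotFileParser()
    # parse() expects an iterable of lines rather than one string.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "https://example.com/index.html",
        "https://example.com/private/report.html",
    ):
        allowed = is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleBot", url)
        print(f"{url}: {'ALLOWED' if allowed else 'DISALLOWED'}")
```

With the sample rules above, the first URL is reported as allowed and the second as disallowed, because only paths under /private/ are blocked for all user agents.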
If you want to check out the library, head over to our GitHub repository for the robots.txt parser. We'd love to see what you can build using it! If you build something using the library, drop us a comment on Twitter, and if you have comments or questions about the library, find us on GitHub.
Posted by Edu Pereda, Lode Vandevenne, and Gary, Search Open Sourcing team
