Google wants to establish an official standard for the use of robots.txt via @MattGSouthern

Google has proposed an official Internet standard for rules included in robots.txt files.

These rules, described in the Robots Exclusion Protocol (REP), have been an unofficial standard for the past 25 years.

Although the REP has been adopted by all major search engines, it has never become official, which leaves it open to interpretation by developers. In addition, it has never been updated to cover modern use cases.
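For context, the rules in question live in a plain-text robots.txt file at the root of a site and tell crawlers which paths they may or may not fetch. The example below is a generic illustration with made-up paths and a placeholder domain, not something taken from Google's draft:

    # Illustrative robots.txt (example.com is a placeholder domain)
    User-agent: *
    Disallow: /private/
    Allow: /private/annual-report.html

    User-agent: Googlebot
    Disallow: /internal-search

    Sitemap: https://www.example.com/sitemap.xml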

It's been 25 years, and the Robots Exclusion Protocol never became an official standard. Although it was adopted by all major search engines, it didn't cover everything: does an HTTP 500 status code mean that the crawler can crawl anything or nothing? pic.twitter.com/imqoVQW92V

– Google Webmasters (@googlewmc) July 1, 2019

As Google says, this creates a challenge for website owners, because the ambiguous de facto standard made it difficult to write the rules correctly.

To address this challenge, Google documented how the REP is used on the modern web and submitted it to the Internet Engineering Task Force (IETF) for review.

Google explains what is included in the draft:

"The proposed REP project reflects more than 20 years of actual experience using robots.txt rules , used by both Googlebot and other large robots, as well as about half a billion websites that rely on REP. These refined controls give the editor the power to decide what they would like to be crawled on his site and potentially shown to interested users. "

The draft does not change any of the rules established in 1994; it has simply been updated for the modern web.

Here are some of the updated rules (a rough sketch of how a crawler might apply them follows the list):

- Any URI-based transfer protocol can use robots.txt: it is no longer limited to HTTP and can also be used for FTP or CoAP.
- Developers must parse at least the first 500 kibibytes of a robots.txt file.
- A new maximum caching time of 24 hours, or the value of the cache directive if one is present, gives website owners the flexibility to update their robots.txt whenever they want.
- When a robots.txt file becomes inaccessible due to server failures, known disallowed pages are not crawled for a reasonably long period of time.
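The Python snippet below is a minimal sketch of how a crawler might apply these rules, assuming a simple in-memory cache and the standard urllib module. It is an illustration, not Google's implementation: it stops reading robots.txt after 500 kibibytes, reuses a cached copy for up to 24 hours, and falls back to the last known rules when the server returns a 5xx error.

    import time
    import urllib.error
    import urllib.request

    MAX_BYTES = 500 * 1024      # the draft asks parsers to handle at least the first 500 kibibytes
    DEFAULT_TTL = 24 * 60 * 60  # maximum caching time of 24 hours proposed by the draft

    _cache = {}  # hostname -> (fetch_time, robots_txt_body or None)

    def get_robots(host):
        """Return the robots.txt body for a host.

        Returns "" when there are no rules (e.g. a 404 response) and None when
        the crawler should be conservative and not crawl (server failure with
        no previously cached copy).
        """
        now = time.time()
        cached = _cache.get(host)
        if cached and now - cached[0] < DEFAULT_TTL:
            return cached[1]  # reuse the cached copy for up to 24 hours
        try:
            with urllib.request.urlopen(f"https://{host}/robots.txt", timeout=10) as resp:
                # Stop reading after 500 KiB, per the size limit in the draft.
                body = resp.read(MAX_BYTES).decode("utf-8", errors="replace")
        except urllib.error.HTTPError as err:
            if err.code >= 500:
                # Server failure: fall back to the last known rules so pages
                # already known to be disallowed stay uncrawled; with no cached
                # copy, return None and treat the site as off limits for now.
                body = cached[1] if cached else None
            else:
                # 4xx such as 404: behave as if no robots.txt exists.
                body = ""
        except (urllib.error.URLError, OSError):
            # Network-level failure: same conservative fallback as a 5xx.
            body = cached[1] if cached else None
        _cache[host] = (now, body)
        return body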

Google is open to comments on the proposed draft and says it is committed to getting it right.