Webmonkey is a property of Wired Digital.
Search Engine Robots Agree Over Standards

How I would like to be a fly on the wall during meetings between Microsoft, Yahoo, and Google, like the one where they agreed on the Robots Exclusion Protocol (REP). Do they trade barbs, quips, and underhanded comments? In my imagination, the gathering is much like a Three Stooges episode.

However these meetings actually go, web developers reap the benefits when the three competitors agree, as evidenced by the announcement of a standard robots.txt protocol.

Over the past week, Microsoft, Yahoo, and Google each announced their involvement in the protocol and published documentation describing it.

Search engines gather their information by creating small programs, or robots, that scan the internet. When a robot finds a website, it fetches the pages it can reach, copies them to a local cache, scans the data, and categorizes it for inclusion in search results. Robots.txt is a file placed at the root of a website that tells search engine robots which directories they may and may not access. If the robots.txt file is absent, the robot assumes you are allowing search engines to access the site's contents.
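To make the mechanics concrete, here is a minimal sketch of how a well-behaved robot might consult robots.txt rules before fetching a page. It uses Python's standard `urllib.robotparser` module. The robot name `ExampleBot`, the `example.com` URLs, the `/private/` path, and the `build_parser` helper are all made up for illustration; real search engine robots use their own implementations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: every robot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
no_rules = build_parser("")  # stands in for a site with no rules at all

public_url = "http://example.com/index.html"
private_url = "http://example.com/private/notes.html"

print(rules.can_fetch("ExampleBot", public_url))      # True
print(rules.can_fetch("ExampleBot", private_url))     # False
print(no_rules.can_fetch("ExampleBot", private_url))  # True
```

The last line shows the default described above: when no rules are present, the robot treats everything as open to crawling.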

The REP standardizes how search engines interpret the robots.txt file. It gives web developers more control over privacy and over how their data appears in search results.

All parties benefit from the newly agreed-upon protocol because inconsistencies between search engines are erased. Now robots.txt files will be honored equally among the biggest search engines, and presumably by the rest of the web-crawling robot community.
