How I would like to be a fly on the wall during Microsoft, Yahoo, and Google meetings like the one where they agreed upon the Robots Exclusion Protocol (REP). Do they trade barbs, quips, and underhanded comments? In my imagination, the gathering is much like a Three Stooges episode.
However these meetings actually go, web developers reap the benefits when the three competitors agree, as evidenced by their joint announcement of a standard robots.txt protocol.
Microsoft, Yahoo, and Google each announced their involvement in the protocol over the past week, along with documentation describing it.
Search engines gather their information with small programs, or robots, that crawl the web. When a robot visits a web server, it fetches pages, follows their links, and indexes the content for inclusion in search results. Robots.txt is a file placed in a web server's root directory that tells search engine robots which parts of the site they may access. If no robots.txt file is present, a robot assumes it is allowed to crawl everything on the site.
The REP standardizes how search engines interpret the robots.txt file, giving web developers more control over privacy and over how their data appears in search results.
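For the curious, Python's standard library already speaks robots.txt. Here's a quick sketch of how a well-behaved crawler checks the rules before fetching — the bot name, site, and rules below are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: keep all robots out of /private/.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A polite robot asks permission before fetching each URL.
print(parser.can_fetch("ExampleBot", "http://example.com/private/data.html"))  # False
print(parser.can_fetch("ExampleBot", "http://example.com/index.html"))         # True
```

In practice a crawler would point the parser at the live file with `set_url()` and `read()` instead of pasting the rules in by hand.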
All parties benefit from the newly agreed-upon protocol because inconsistencies between search engines are erased. Robots.txt files will now be honored equally by the biggest search engines, and presumably by the rest of the web-crawling robot community.
Starting this week, a lucky group of pilot customers in Texas will get 5 gigabytes of traffic per month for $29.95. After they exceed that cap (on day one, no doubt) each additional gigabyte will run them $1. There’s also a “high-end” package: 40 gigs for $55.
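Back-of-the-envelope, the numbers above make the break-even point easy to find. A minimal sketch, using only the prices quoted ($29.95 for 5 GB plus $1 per extra gigabyte, versus the $55 tier) and assuming the high-end tier's overage rate isn't relevant below its 40 GB cap:

```python
def monthly_cost(gb_used, base_price=29.95, included_gb=5, overage_per_gb=1.00):
    """Monthly bill on the pilot plan: base price plus $1 per GB over the cap."""
    overage = max(0, gb_used - included_gb)
    return base_price + overage * overage_per_gb

# Where does the $55, 40 GB tier start to win?
for gb in (5, 20, 30, 31):
    print(f"{gb} GB -> ${monthly_cost(gb):.2f}")
```

Once you pass about 30 GB in a month, the cheap plan's bill tops $55 — so anyone doing regular video watching would be pushed straight to the "high-end" package.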
I ran a home server on Time Warner Cable for several years; now, mercifully, I have a much better provider. Even serving nothing but IMAP, as I did, would be quite costly under this new plan, which I'm sure is part of the rationale for the cap: TWC doesn't want any users running servers. Their representative pitches it as a measure to tax the most gluttonous users of bandwidth: "5 percent of the company's subscribers take up half of the capacity on local cable lines." But even average browsing, YouTubing and Flickring, is going to rack up the gigabytes pretty fast.
Except where users are locked in by monopolies, they’ll doubtless be jumping ship to non-capping ISPs. Time Warner ought to be competing with the threat of cheap, uncapped floods of bandwidth brought by FIOS. Instead, the new rate system is competitive with burning DVDs and FedExing them. It’s not their first dubious business decision.
People with new babies love to play with their babies. People with nice cars love to drive them. People with servers love to watch their servers’ logs, which tell the story second-by-second of all the cool things that are happening.
LogAnalysis.org is the site for us. Its Library contains tons of information about logging, log analysis, real-time log monitoring, log parsing software, even log rotation.
What tools or tricks do you use to keep an eye on your logs?
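My own favorite trick is a few lines of Python that boil an access log down to a status-code tally, so a burst of 404s or 500s jumps out at a glance. A minimal sketch — the sample lines below are invented, in Apache's common log format:

```python
import collections
import re

# Hypothetical access-log lines in Apache "common log" format.
SAMPLE_LOG = """\
127.0.0.1 - - [08/Jun/2008:10:00:00 -0500] "GET / HTTP/1.1" 200 512
127.0.0.1 - - [08/Jun/2008:10:00:01 -0500] "GET /missing HTTP/1.1" 404 221
127.0.0.1 - - [08/Jun/2008:10:00:02 -0500] "GET / HTTP/1.1" 200 512
"""

# The status code is the three digits right after the quoted request.
STATUS_RE = re.compile(r'" (\d{3}) ')

def status_counts(lines):
    """Tally HTTP status codes across a batch of log lines."""
    counts = collections.Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

print(status_counts(SAMPLE_LOG.splitlines()))  # Counter({'200': 2, '404': 1})
```

Point the same function at a real file (or pipe `tail -f` into it) and you get a running picture of what your server is doing.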
From my inbox: “I believe that hosts and registrars should not control their customers’ content. Please recommend a non-censoring, respectable, and of course cheap domain registrar!”
There have been some recent cases of controversial domains being shut down by Godaddy and Network Solutions, two of the largest Internet registrars. Godaddy also makes enemies for itself, despite its low prices, with the outspoken, controversial opinions its CEO airs on his prominently linked blog.
Godaddy shut down RateMyCop.com; Network Solutions shut down a site hosting a Dutch “anti-Koran” film.
Worse yet, Register.com charges $35/year for domain registration!