Archive for the ‘Backend’ Category

Behind the Scenes at Instagram: Tools for Building Reliable Web Services

In case you missed it, yesterday Facebook acquired Instagram, a photo-sharing service with some 30 million users and hundreds of millions of images on its servers.

The reported sale price of one billion dollars no doubt has many developers dreaming of riches, but how do you build a service and scale it to the size and success of Instagram? At least part of the answer lies in choosing your tools wisely.

Fortunately for outside developers, Instagram’s devs have been documenting the tools they used all along. The company’s engineering blog outlined its development stack last year and has further detailed how it uses several of the tools it’s chosen.

Instagram uses an interesting mashup of tried-and-true technologies alongside more cutting-edge tools, mixing SQL databases with NoSQL tools like Redis, and choosing to host its traditional Ubuntu servers in Amazon’s cloud.

In a blog post last year Instagram outlined its core principles when it comes to choosing tools, writing, “keep it very simple, don’t reinvent the wheel [and] go with proven and solid technologies when you can.”

In other words, go with the boring stuff that just works.

For Instagram that means a Django-based stack that runs on Ubuntu 11.04 servers and uses PostgreSQL for storage. There are several additional layers for load balancing, push notifications, queues and other tasks, but overwhelmingly Instagram’s stack consists of stolid, proven tools.

Among the newer stuff is Instagram’s use of Redis to store hundreds of millions of key-value pairs for fast feeds, and Gunicorn instead of Apache as a web server.

All in all it’s a very impressive setup that has, thus far, helped Instagram avoid the downtime that has plagued many similar services hit with the same kind of exponential growth. (Twitter, I’m looking at you.) For more details on how Instagram looks behind the scenes and which tools the company uses, be sure to check out the blog post as well as the archives.

Microsoft Unveils New Plan to Speed Up the Web

Photo: Lindsey Turner/Flickr

Microsoft wants in on the drive to speed up the web. The company plans to submit its proposal for a faster internet protocol to the standards body charged with creating HTTP 2.0.

Not coincidentally, that standards body, the Internet Engineering Task Force (IETF), is meeting this week to discuss the future of the venerable Hypertext Transfer Protocol, better known as HTTP. On the agenda is creating HTTP 2.0, a faster, modern approach to internet communication.

One candidate for HTTP 2.0 is Google’s SPDY protocol. Pronounced “speedy,” Google’s proposal would replace the HTTP protocol — the language currently used when your browser talks to a web server. When you request a webpage or a file from a server, chances are your browser sends that request using HTTP. The server answers using HTTP, too. This is why “http” appears at the beginning of most web addresses.
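To make the request-and-answer exchange concrete, here is a minimal Python sketch of what a plain HTTP/1.1 request looks like on the wire and how the first line of a server's answer can be read. The function names and the example host are hypothetical, chosen only for illustration; this shows the text-based HTTP format that SPDY aims to replace, not SPDY itself.

```python
from typing import Tuple


def build_http_get(host: str, path: str = "/") -> bytes:
    """Return the bytes a client would send for a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # the Host header is required in HTTP/1.1
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> Tuple[str, int, str]:
    """Split the first line of an HTTP response into version, status code and reason."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    # example.com is a placeholder host used only for illustration.
    print(build_http_get("example.com").decode("ascii"))
    print(parse_status_line(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"))
```

Every request repeats these human-readable headers in full, which is one reason a more compact protocol has room to be faster.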

The SPDY protocol handles all the same tasks as HTTP, but SPDY can do it all about 50 percent faster. Chrome and Firefox both support SPDY and several large sites, including Google and Twitter, are already serving pages over SPDY where possible.

Part of the IETF’s agenda this week is to discuss the SPDY proposal, and the possibility of turning it into a standard.

But now Microsoft is submitting another proposal for the IETF to consider.

Microsoft’s new HTTP Speed+Mobility lacks a catchy name, but otherwise appears to cover much of the same territory SPDY has staked out. Though details on exactly what HTTP Speed+Mobility entails are thin, judging by the blog post announcing it, HTTP Speed+Mobility builds on SPDY but also includes improvements drawn from work on the HTML5 WebSockets API. The emphasis is on not just the web and web browsers, but mobile apps.

“We think that apps — not just browsers — should get faster,” writes Microsoft’s Jean Paoli, General Manager of Interoperability Strategy.

To do that, Microsoft’s HTTP Speed+Mobility “starts from both the Google SPDY protocol and the work the industry has done around WebSockets.” What’s unclear from the initial post is exactly where HTTP Speed+Mobility goes from that hybrid starting point.

But clearly Microsoft isn’t opposed to SPDY. “SPDY has done a great job raising awareness of web performance and taking a ‘clean slate’ approach to improving HTTP,” writes Paoli. “The main departures from SPDY are to address the needs of mobile devices and applications.”

SPDY co-inventor Mike Belshe writes on Google+ that he welcomes Microsoft’s efforts and looks forward to “real-world performance metrics and open source implementations so that we can all evaluate them.”

Belshe also notes that Microsoft’s implication that SPDY is not optimized for mobile “is not true.” Belshe says that the available evidence suggests that developers are generally happy using SPDY in mobile apps, “but it could always be better, of course.”

The process of creating a faster HTTP replacement will not mean simply picking any one vendor’s protocol and standardizing it. Hopefully the IETF will take the best ideas from all sides and combine them into a single protocol that can speed up the web. The exact details — and any potential speed gains — from Microsoft’s HTTP Speed+Mobility contribution remain to be seen, but the more input the IETF gets the better HTTP 2.0 will likely be.

Twitter Catches the ‘SPDY’ Train

Photo: dark_ghetto28/Flickr

Twitter has embraced Google’s vision of a faster web and is now serving webpages over the SPDY protocol to browsers that support it.

SPDY, pronounced “speedy,” is a replacement for the HTTP protocol — the language currently used when your browser talks to a web server. When you request a webpage or a file from a server, chances are your browser sends that request using HTTP. The server answers using HTTP, too. This is why “http” appears at the beginning of most web addresses.

The SPDY protocol handles all the same tasks as HTTP, but SPDY can do it all about 50 percent faster.

SPDY started life as a proprietary protocol at Google and worked only in the company’s Chrome web browser. SPDY has since won support elsewhere. Firefox will have SPDY support when version 11 hits prime time in the near future [Update: As Mozilla's Chris Blizzard points out, SPDY is disabled by default in Firefox 11. If you're using the beta and want to give it a try, you'll need to visit about:config, search for network.http.spdy.enabled and set the value to true. If all goes well SPDY will be turned on by default in Firefox 13.]. Amazon also baked SPDY support into its Silk browser for the Kindle.

The IETF’s HTTPbis Working Group — the standards body charged with creating and maintaining the HTTP specification — is now considering adding SPDY to HTTP 2.0, which will improve the speed of HTTP connections.

Despite the web standards backing, SPDY still has a long way to go before it’s an everyday part of the web. With only Chrome and Firefox behind it, SPDY is still only available for about 40 percent of desktop users. But with large services like Twitter throwing their weight behind it, SPDY may well start to take the web by storm — the more websites that embrace SPDY the more likely it is that other browsers will add support for the faster protocol.

If you’d like to follow Twitter’s lead and get your own site serving over SPDY, check out mod_spdy, a SPDY module for the Apache server (currently a beta release).

OpenDNS and Google Working with CDNs on DNS Speedup

A group of DNS providers and content delivery network (CDN) companies have devised a new extension to the DNS protocol that aims to more effectively direct users to the closest CDN endpoint. Google, OpenDNS, BitGravity, EdgeCast, and CDNetworks are among the companies participating in the initiative, which they are calling the Global Internet Speedup.

The new DNS protocol extension, which is documented in an IETF draft, specifies a means for including part of the user’s IP address in DNS requests so that the nameserver can more accurately pinpoint the destination that is topologically closest to the user. Ensuring that traffic is directed to CDN endpoints that are close to the user could potentially reduce latency and congestion for high-impact network services like video streaming.
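The idea of sending only part of the user's address can be sketched in a few lines of Python. This is an illustration of the concept, not the draft's wire format: the /24 prefix length, the function names and the endpoint table are all assumptions made for the example.

```python
import ipaddress

# Hypothetical mapping from client subnets to CDN endpoints, for illustration only.
ENDPOINTS = {
    "203.0.113.0/24": "cdn-east.example.net",
    "198.51.100.0/24": "cdn-west.example.net",
}


def client_subnet(ip: str, prefix_len: int = 24) -> str:
    """Return the truncated network prefix a resolver might forward
    instead of the user's full IPv4 address (prefix length is an assumption)."""
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network)


def pick_endpoint(ip: str, default: str = "cdn-default.example.net") -> str:
    """Choose an endpoint based only on the truncated prefix of the client address."""
    return ENDPOINTS.get(client_subnet(ip), default)


if __name__ == "__main__":
    print(client_subnet("203.0.113.77"))   # the host part of the address is dropped
    print(pick_endpoint("203.0.113.77"))
    print(pick_endpoint("192.0.2.10"))     # unknown subnet falls back to the default
```

Because only the prefix is shared, the nameserver learns roughly where the user sits on the network without seeing the full address.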

The new protocol extension has already been implemented by OpenDNS and Google’s Public DNS. It works with the CDN services that have signed on to participate in the effort. Google and OpenDNS hope to make the protocol extension an official IETF standard. Other potential adopters, such as ISPs, are free to implement it from the draft specification.

It’s not really clear in practice how much impact this will have on network performance. It’s worth noting that GeoIP lookup technology is already used by some authoritative DNS servers for location-aware routing. The new protocol extension will reportedly address some of the limitations of previous approaches.

This article originally appeared on Ars Technica, Wired’s sister site for in-depth technology news.

Google’s New Cloud Storage Service Takes on Amazon S3

Google plans to go head to head with Amazon’s popular S3 cloud storage service with the new Google Storage for Developers. Like S3, Google’s new service offers developers a cheap, scalable way to store data online.

While it isn’t exactly the fabled “GDrive,” Google Storage for Developers certainly lays the groundwork for Google to create a user-friendly online storage service.

Google Storage for Developers offers a RESTful API, backups across multiple data centers and even has support for storing large files up to hundreds of gigabytes in size.

Google Storage for Developers is currently an experimental Google Labs project. For now the service is available by invitation only and limited to U.S. developers. You can head over to the sign up page to request an invite which will give you access to 100GB of data storage and 300GB per month of data-transfer bandwidth.

After your application hits those limits a pay-as-you-go scheme kicks in. The pricing is roughly analogous to Amazon’s S3 service. Google’s version will run you 17 cents per GB per month for simple storage, 10 cents per GB for uploading data and 15 to 30 cents per GB for downloads. There’s also a fee for the number of requests: $0.01 per 1,000 PUT, POST or LIST requests and $0.01 per 10,000 requests using GET or HEAD.
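To see how those rates add up, here is a rough Python estimator built from the prices quoted above. It is a sketch under stated assumptions: the function name is hypothetical, the free invitation allowance is ignored, and the download rate is a single figure you pick from the quoted 15 to 30 cent range.

```python
def monthly_cost(
    stored_gb: float,
    uploaded_gb: float,
    downloaded_gb: float,
    write_requests: int,
    read_requests: int,
    download_rate: float = 0.15,
) -> float:
    """Estimate one month's bill in US dollars from the quoted rates.

    Assumption: the free allowance is not subtracted, and download_rate is
    a flat per-GB figure chosen from the quoted $0.15 to $0.30 range.
    """
    if not 0.15 <= download_rate <= 0.30:
        raise ValueError("download_rate should fall in the quoted 0.15-0.30 range")

    storage = stored_gb * 0.17               # 17 cents per GB per month
    upload = uploaded_gb * 0.10              # 10 cents per GB uploaded
    download = downloaded_gb * download_rate
    writes = write_requests / 1_000 * 0.01   # PUT, POST or LIST requests
    reads = read_requests / 10_000 * 0.01    # GET or HEAD requests
    return round(storage + upload + download + writes + reads, 2)


if __name__ == "__main__":
    # 500 GB stored, 50 GB uploaded, 200 GB downloaded,
    # 100,000 write requests and 1,000,000 read requests.
    print(monthly_cost(500, 50, 200, 100_000, 1_000_000))
```

With those example numbers the parts are $85 for storage, $5 for uploads, $30 for downloads and $1 each for write and read requests, or about $122 for the month at the lowest download rate.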

Unfortunately that’s just different enough from Amazon’s pricing structure (which decreases the per GB price as your usage goes up) that it’s hard to say which is cheaper. At first glance Amazon’s S3 service looks marginally cheaper for storage, but in the end the total cost — and which is cheaper — will vary depending on the nature of your web app and how you use either storage service.

Hopefully, now that there’s some competition in the cloud storage space, both services will eventually become even cheaper.

Google does offer some extra tools that Amazon doesn’t have — the BigQuery API and the Prediction API.

According to the Google Code announcement, BigQuery is designed to explore the history of your data, and the more interesting Prediction API gives you access to Google’s machine learning algorithms, which are designed to “make your apps more intelligent.”

The Prediction API can help make real-time decisions “such as recommending products, assessing user sentiment from blogs and tweets, routing messages or assessing suspicious activities,” says the Google Code blog.

For now there is no charge for using the extra APIs, though the fact that the announcement points this out suggests there will be an additional charge once Google Storage for Developers moves out of Labs.

Because Google Storage for Developers is a beta Labs project, you won’t want to switch from Amazon’s services just yet, but if you’d like to take Google Storage for Developers for a spin, head over to the sign up page and request an invite.
