We recently mentioned that over 100 presentations from Portland’s OSCon made it to the web. Included was a surprisingly open presentation on Facebook’s backend.
There was a timely discussion on the subject over at Metafilter. The question posed was “How does Facebook handle or simplify the presumably complicated database queries involved so that me loading my page doesn’t bring it to its knees?”
The answer, according to the presentation by Facebook’s Lucas Nealan? Memcached. In fact, a staggering 400-plus memcached hosts with over 5 terabytes of memory. That’s far more than your typical network setup. Memcached caches the results of frequently used database queries in memory, so repeat requests can be served without hitting the database again.
It’s not just Facebook’s secret. I recently had the opportunity to speak with Garrett Camp, founder of StumbleUpon, who praised Memcached’s ability to dramatically reduce their load.
“Memcached has helped a lot. We added memcached to a lot of our most common queries and load has dropped off to about a third,” Camp admitted. “Between basic partitioning, a lot more RAM, database boxes and memcached, a lot of our most urgent pain points in the past six months have kinda disappeared.”
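To make the idea concrete, here is a minimal sketch of the usual "check the cache first, fall back to the database" pattern. It is not Facebook's or StumbleUpon's code: `TinyCache` is a hypothetical in-process stand-in for a memcached client (a real deployment talks to memcached servers over the network), and `FakeDatabase`, `get_profile` and the `profile:` key format are made up for illustration.

```python
import time


class TinyCache:
    """Hypothetical in-process stand-in for a memcached client."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        # Treat expired entries as misses and drop them.
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else time.monotonic() + ttl
        self._store[key] = (value, expires_at)


class FakeDatabase:
    """Hypothetical database that counts how often it is queried."""

    def __init__(self):
        self.calls = 0

    def load_profile(self, user_id):
        self.calls += 1  # each call stands for an expensive query
        return {"id": user_id, "name": f"user{user_id}"}


def get_profile(cache, db, user_id):
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: the database is never touched
    row = db.load_profile(user_id)  # cache miss: run the real query
    cache.set(key, row, ttl=60)     # keep the result for 60 seconds
    return row
```

Calling `get_profile` twice for the same user runs the query only once; the second call is answered from memory, which is the kind of load reduction Camp describes.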
The latest addition to Google’s growing open-source code library, as of Monday, is an interface description language (IDL) technology called Protocol Buffers. That’s computer science lingo for an in-house Google technology akin to XML.
If you like tongue twisters, other computer science terms for it are language-neutral software stacks or serialized structured data. It is the delivery mechanism used between servers on the backend of a network to minimize latency. In other words, it is a way for applications to communicate with one another quickly.
Despite all the jargon, you don’t have to be a computer science whiz to understand it and you’ll most likely have to get a good understanding of the technology if you want to use Ajax-like applications.
Protocol Buffers, the company promises, are scalable and portable. They are compatible with most programming languages and designed around simplicity. The definition files contain structured data and are distinguishable by their file extension, .proto.
“As nice as XML is, it isn’t going to be efficient enough for [Google's] scale. When all of your machines and network links are running at capacity, XML is an extremely expensive proposition. Not to mention, writing code to work with the DOM tree can sometimes become unwieldy.”
We’ve never had to deal with XML at a scale where programming for it would become unwieldy, but we’ll take Google’s word for it.
Perhaps the biggest value-add of Protocol Buffers to the development community is as a method of dealing with scalability before it is necessary. The biggest development drain on any start-up is success. How do you prepare for the onslaught of visitors that companies such as Google or Twitter have experienced? Scaling for numbers takes critical development time, usually at a juncture where you should be introducing much-needed features to stay ahead of the competition rather than paralyzing feature development to keep your servers running.
Over time, Google has tackled the problem of communication between platforms with Protocol Buffers and data storage with Bigtable. Protocol Buffers is the first open release of the technology making Google tick, although you can use Bigtable with App Engine. Google’s spokesman and blogger Matt Cutts describes Google’s usage of Protocol Buffers this way:
“You can think of the Google cluster architecture as a bunch of moderately powerful personal computers connected by ethernet. That’s not quite correct, but it’s a pretty good abstraction. In that model, you have pretty good disk/RAM/computational throughput, but network communication is much more limited. That leads to the first nice thing about Protocol Buffers: they’re very compact going over-the-wire via network.”
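A rough sketch of why the format is so compact on the wire: Protocol Buffers encodes an integer field as a small key (the field number combined with a wire type) followed by a base-128 "varint", so small numbers take very few bytes. The code below hand-rolls only that one piece and is not Google's library; the field number `1` and the XML element names used for comparison are assumptions chosen for illustration.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F          # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(low7)         # last byte: high bit stays clear
            return bytes(out)


def encode_int_field(field_number, value):
    """Encode one integer field: key varint, then value varint."""
    wire_type_varint = 0
    key = (field_number << 3) | wire_type_varint
    return encode_varint(key) + encode_varint(value)


# The same value, once in the binary encoding and once as hypothetical XML.
binary = encode_int_field(1, 150)
xml = b"<user><id>150</id></user>"

print(len(binary), "bytes as a Protocol Buffers field")
print(len(xml), "bytes as XML")
```

The binary form carries no tag names at all, only the field number; both sides need the shared .proto definition to know what field 1 means. That trade is what buys the size savings Cutts describes.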
Google announced and released the code surrounding Protocol Buffers Monday, signaling the company’s dedication to sharing its best technology with the industry. The strategy, however, has its critics. When outside developers build on Google’s open technology, the company effectively gains free development. Applications built on the technology (like Protocol Buffers, but especially Google-controlled technology such as App Engine) would eventually have to be ported should Google change its terms or become disagreeable in any way in the future. Similarly, Microsoft’s proprietary technology and APIs have been known from time to time to be a thorn in developers’ sides.
Another benefit to Google (and hungry developers) is the unique situation where applications built on Google technology are more likely to be bought up by the mammoth company — the technology already works on the company’s infrastructure.
Facebook, another high-profile, engineering-centric company with a massive number of users, offers its own version in its open-sourced Thrift. The compiling program is also ultra-portable, fast and efficient. As blogger Sean McCollough notes, it is probably no coincidence that Mark Slee, one of the developers of Thrift at Facebook, was also a Google intern: the technology and its goals are very similar.