LinuxWorld

Zvents releases open-source cluster database

An already-working open source database project could let other web companies join Google for bragging rights as the owners of a thousand-node database cluster.

Event search firm Zvents is releasing a massively parallel database server, based on a published Google design, as an open source project. The new software, Hypertable, is designed to scale to 1000 nodes, all commodity PCs, said Doug Judd, principal search architect for Zvents, in a LinuxWorld.com podcast.

Moving the project from in-house to open source is a way for a relatively small company to get the infrastructure software it needs, Judd says. "We aren't in the database business. This is the kind of infrastructure that should be in open source. This is not company proprietary stuff," he says.

The current Hypertable version is a 0.9 alpha release and has been tested on about 10 nodes so far, Judd says. But Yahoo developers have expressed interest in "kicking the tires" and testing on more nodes. Yahoo developers are already involved in another way: Hypertable stores its data on a distributed filesystem, and the database developers are currently using the Apache Software Foundation's Hadoop, which Yahoo supports with infrastructure and by employing lead Hadoop developer Doug Cutting and his team.

The Google database design on which Hypertable is based, Bigtable, attracted a lot of developer buzz and a "Best Paper" award from the USENIX Association for "Bigtable: A Distributed Storage System for Structured Data," a 2006 publication from nine Google researchers including Fay Chang, Jeffrey Dean, and Sanjay Ghemawat. Google's Bigtable uses the company's in-house Google File System for storage.

The API for Hypertable is slightly different from Bigtable's, Judd says. Although it is not a full SQL database, it is more featureful than a simple key/value store such as Brad Fitzpatrick's memcached. Memcached is widely used along with a conventional SQL database in high-traffic web sites, to cache chunks of HTML and XML and save an application from having to query the main database.
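For readers unfamiliar with the caching pattern described above, here is a minimal sketch of it in Python. It is an illustration only, not code from Zvents, Hypertable, or memcached: the FragmentCache class is a hypothetical in-process stand-in for a memcached client, the events table and its columns are invented for the example, and SQLite stands in for the "main" SQL database.

```python
import sqlite3


class FragmentCache:
    """Hypothetical stand-in for a memcached client: a plain in-process dict."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a miss, like a typical cache client.
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def render_event_html(db, cache, event_id):
    """Return (html, was_cached). Query the database only on a cache miss."""
    key = f"event_html:{event_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached, True  # served from cache, no SQL query issued

    row = db.execute(
        "SELECT title, city FROM events WHERE id = ?", (event_id,)
    ).fetchone()
    if row is None:
        return None, False

    html = f"<div><h2>{row[0]}</h2><p>{row[1]}</p></div>"
    cache.set(key, html)  # store the rendered chunk for later requests
    return html, False


# Invented sample data; SQLite stands in for the main SQL database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT, city TEXT)")
db.execute("INSERT INTO events VALUES (1, 'Jazz Night', 'San Mateo')")

cache = FragmentCache()
first = render_event_html(db, cache, 1)   # miss: hits the database
second = render_event_html(db, cache, 1)  # hit: served from the cache
```

The first call queries the database and stores the rendered HTML under a key; the second call finds that key and skips the query. A store of this kind can only look up a value by its exact key, which is the limitation Judd is contrasting Hypertable against.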

Brian Aker, director of architecture for open source database supplier MySQL AB, says that he can see a development path that would bridge the gap from the Hypertable API to a full SQL database. In an email interview, he wrote, "Someone could turn this into a backend for MySQL without a lot of effort. You would gain an SQL interface by doing this." For Hypertable as is, Aker says he can see several applications. Besides log data, Hypertable could be useful for image and object servers, and for pre-rendering responses to Representational State Transfer (REST) queries produced by web applications.
