BangDB – new high-performance database?

The developer of BangDB, Sachin Sinha, recently put up a performance paper on the High Scalability website:

Performance Data for LevelDB, Berkeley DB and BangDB for Random Operations

Berkeley DB is now owned by Oracle, but it’s been around for a long time. It’s a library that provides an embedded database for key/value data, written in C with bindings for a number of languages. Using it in proprietary software requires a commercial license, at around $900 to $14,000 per processor.

LevelDB is a key-value library written by Google, somewhat faster in operation than Berkeley DB. It has built-in compression (using Snappy), is written in C++, and is BSD licensed.

BangDB is a new entrant to the key-value database world, with embedded, network service, and distributed cluster modes. Only the embedded version has been released so far, and it’s BSD licensed. It’s written in C++.
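For readers unfamiliar with what a "random operations" benchmark measures, here is a rough sketch in Python. It does not use Berkeley DB, LevelDB, or BangDB, and it is not the method from the paper: a plain `dict` stands in for the embedded store, and the function name `bench_random_ops` and its parameters are made up for illustration. The idea is simply to write keys in shuffled order, read them back in a different shuffled order, and report throughput.

```python
import random
import time


def bench_random_ops(store, n_keys, seed=0):
    """Time n_keys random-order writes, then n_keys random-order reads, on a dict-like store."""
    rng = random.Random(seed)
    keys = [f"key{i:08d}".encode() for i in range(n_keys)]

    rng.shuffle(keys)                     # write in random order
    t0 = time.perf_counter()
    for k in keys:
        store[k] = b"value-" + k
    write_s = time.perf_counter() - t0

    rng.shuffle(keys)                     # read back in a different random order
    t0 = time.perf_counter()
    hits = sum(1 for k in keys if store[k] == b"value-" + k)
    read_s = time.perf_counter() - t0

    return {"writes_per_s": n_keys / write_s, "reads_per_s": n_keys / read_s, "hits": hits}


# A dict is only a placeholder (assumption); any object with dict-style
# get/set on bytes keys could be passed instead.
result = bench_random_ops({}, 10_000)
print(result)
```

Numbers from an in-memory `dict` say nothing about the three libraries above; the sketch only shows the shape of the measurement.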

 

Mordor, C++ I/O library

Mozy has open-sourced several of its libraries, including Ruby Protocol Buffers and Mordor. Mordor is a high-performance C++ I/O library that uses cooperatively scheduled fibers instead of callbacks, which is supposed to put less load on the kernel.
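To make the fiber idea concrete, here is a minimal sketch of cooperative scheduling in Python, with generators standing in for fibers. It does not use Mordor or mirror its API; the names `Scheduler`, `spawn`, and `worker` are invented for illustration. Each task runs until it voluntarily yields, and the scheduler then resumes the next one in round-robin order, so the task body reads top to bottom rather than as a chain of callbacks.

```python
from collections import deque


class Scheduler:
    """Round-robin scheduler for generator-based tasks (hypothetical, not Mordor's API)."""

    def __init__(self):
        self.ready = deque()

    def spawn(self, task):
        # A task is a generator; each `yield` is a point where it gives up control.
        self.ready.append(task)

    def run(self):
        while self.ready:
            task = self.ready.popleft()
            try:
                next(task)               # resume the task until its next yield
            except StopIteration:
                continue                 # task finished; drop it
            self.ready.append(task)      # still alive; requeue at the back


def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                            # cooperative yield, e.g. where I/O would block


log = []
sched = Scheduler()
sched.spawn(worker("a", 2, log))
sched.spawn(worker("b", 3, log))
sched.run()
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

In a real fiber library the yield points would presumably sit inside the I/O calls, so that a fiber waiting on a socket lets others run; here the bare `yield` only marks where that would happen.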

There are some Mordor presentations on YouTube.

Like any library of this kind, it comes with a lot of supporting infrastructure in addition to its stated purpose.

Shake, a build system written in Haskell

I was skimming through some recent ACM proceedings, and an article caught my eye.

Shake is not massively parallel, but its central idea is a dynamic dependency graph (dependencies can be added while the build is running) instead of static dependencies à la make.
This is a video of an older version: http://vimeo.com/15465133
The Haskell package is here: http://hackage.haskell.org/package/shake
It’s notable for the brevity of its notation.
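The difference between static and dynamic dependencies can be sketched in a few lines. The toy engine below is written in Python and is not Shake's API or notation; `Build`, `need`, and the file names are hypothetical. Each rule receives a `need` function and may call it at any point, including after reading the result of an earlier dependency, which is what a static, make-style dependency list cannot express directly.

```python
class Build:
    """Toy build engine with dynamic dependencies (illustrative; not Shake's API)."""

    def __init__(self, rules):
        self.rules = rules      # target name -> function(need) returning the built value
        self.results = {}       # memoized values of targets built so far
        self.deps = {}          # dependencies recorded while each rule ran

    def build(self, target):
        if target in self.results:
            return self.results[target]
        found = []

        def need(dep):
            # Called from inside a rule, possibly after inspecting earlier results.
            found.append(dep)
            return self.build(dep)

        self.results[target] = self.rules[target](need)
        self.deps[target] = found
        return self.results[target]


rules = {
    "a.txt": lambda need: "alpha",
    "b.txt": lambda need: "beta",
    "list.txt": lambda need: "a.txt b.txt",
    # The archive's inputs are only known once list.txt has been built and read.
    "out.tar": lambda need: [need(name) for name in need("list.txt").split()],
}

b = Build(rules)
print(b.build("out.tar"))   # ['alpha', 'beta']
print(b.deps["out.tar"])    # ['list.txt', 'a.txt', 'b.txt']
```

The sketch leaves out cycle detection, change tracking, and parallelism; it only shows dependencies being discovered during the build and recorded afterwards.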

Visualization tools

Tools

D3.js (successor to Protovis) is a JavaScript library for in-browser visualizations. The GitHub project contains source and examples, and the examples page gives an indication of what can be done with D3.js. One particularly useful application would be using treemaps to browse hierarchies of data. The biggest challenge is getting data to the browser: because this is the web, access to remote data is, paradoxically, easier than access to data on the local machine. So if I were using this to visualize data, I would need a server to provide the data, even if it were only a local server.
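One way to approach the data-delivery problem is a small local server. The sketch below uses only the Python standard library. It walks a directory into a nested `name`/`children`/`value` structure, a shape that suits hierarchical layouts such as treemaps (the field names are an assumption for this example, not something D3.js mandates), and defines a `serve` helper that exposes a directory over HTTP on localhost. The function names and the port are hypothetical.

```python
import functools
import http.server
import json
import os


def build_tree(path):
    """Return a nested dict describing `path`: files carry a size, directories carry children."""
    name = os.path.basename(os.path.abspath(path))
    if os.path.isdir(path):
        children = [build_tree(os.path.join(path, entry))
                    for entry in sorted(os.listdir(path))]
        return {"name": name, "children": children}
    return {"name": name, "value": os.path.getsize(path)}


def write_tree(path, out_file):
    # Dump the hierarchy as JSON so a page in the browser can fetch it.
    with open(out_file, "w") as f:
        json.dump(build_tree(path), f)


def serve(directory=".", port=8000):
    # Serve static files (the HTML page and the JSON) from `directory` on localhost.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    with http.server.ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        httpd.serve_forever()
```

Calling `write_tree("some/dir", "tree.json")` and then `serve()` from the same working directory would make the file available at `http://127.0.0.1:8000/tree.json`, where a page served from the same place could request it.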

Processing is a Java-based tool that can be used for visualization. One advantage to using Processing is that there are a lot of users, a lot of tutorials, and a lot of books. However, it is not a visualization toolkit per se.

Raphael is another JavaScript library. Its forte is creating SVG graphics from JavaScript, so it is more akin to Processing in that you can build up visualizations with it.

InfoVis is a JavaScript library for visualization. While it doesn’t seem as slick as D3.js, it has an impressive pedigree and some high-profile users (the Obama White House, for example). The author of InfoVis has two other related projects: PhiloGL (a JavaScript WebGL framework) and V8-GL (JavaScript bindings for creating 2D and 3D hardware-accelerated graphics).

R is a software environment for statistical computing and graphics. If you’re working with large amounts of data and you want to analyze or visualize it, R can handle it. It’s a language and a set of libraries which are being added to all the time.

Nodebox is a family of visualization tools.

Gephi is a cross-platform open-source visualization toolkit.

Other resources

visualizing.org is a website/community for people who do visualization.

visualizing data is a blog on visualization, with a monthly collection of “best visualizations”.

R Spatial Tips is a collection of tips for handling spatial data in R.

Tulp Interactive is the work of Jan Willem Tulp, an amazing data visualization expert.

visual complexity is a web site devoted to the visualization of complex networks. In particular, the blog is worth reading.

Graphic Sociology is a blog by Laura Norén where she analyzes visualizations in very long, thoughtful deconstruction articles.

information aesthetics is a blog that explores the relationship between creative design and visualization.

visual.ly is a showcase and marketplace for visualization.

vizualize is a blog about visualization.

Ben Fry’s blog (he is a co-creator of Processing).

Eyeo Festival is perhaps the premier conference for visualization.