
Big Computing

Friday, July 2nd, 2010

As an old hand at distributed computing (for my PhD I worked on compilers for large-scale distributed-memory parallel machines), I have been watching with interest the newfound popular enthusiasm for parallel computing typified by Map-Reduce and Hadoop. After reading the code, experimenting, reading papers, and attending talks, I have become convinced of a few things:

  • Map-Reduce is a poor fit for a great many applications
  • Hadoop is deeply batch-oriented, like the timeshare systems of the days of yore

These systems are still very popular because, at their core, they possess two important features. They are:

  • composed from redundant, share-nothing, fallible nodes
  • programmed as a network of message passing share-nothing actors

The first feature makes them relatively easy to build out, administer, and maintain. The second makes them relatively easy to program and, more importantly, makes it possible to recover and continue the computation in the face of node failures.
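To make the recovery point concrete, here is a minimal Python sketch of share-nothing, message-passing workers. It is not how Hadoop or any real system is implemented; the names `Node`, `NodeFailure`, and `run` are hypothetical and exist only for illustration, and the "failure" is simulated with a flag. The idea it tries to show is that, because a node owns no shared state, a task whose node dies can simply be resent as a message to another node.

```python
import queue


class NodeFailure(Exception):
    """Raised when a simulated node dies while handling a message."""


class Node:
    """A share-nothing worker: it owns a private inbox and nothing else."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy      # hypothetical flag used to simulate a crash
        self.inbox = queue.Queue()  # private to this node

    def send(self, message):
        """Deliver a message to this node's inbox."""
        self.inbox.put(message)

    def step(self):
        """Handle one message and return the reply as a new message."""
        task_id, numbers = self.inbox.get()
        if not self.healthy:
            raise NodeFailure(self.name)
        return task_id, sum(numbers)


def run(tasks, nodes):
    """Send each task to a node; if that node fails, resend to the next one."""
    results = {}
    for task_id, numbers in tasks.items():
        for node in nodes:
            node.send((task_id, numbers))
            try:
                reply_id, value = node.step()
            except NodeFailure:
                # The sender still holds the original message, so recovery
                # is just a resend to a different node.
                continue
            results[reply_id] = value
            break
    # A task is missing from the results only if every node failed on it.
    return results


nodes = [Node("a", healthy=False), Node("b")]
results = run({"t1": [1, 2, 3], "t2": [10, 20]}, nodes)
print(results)
```

Node "a" fails on every message here, yet both tasks complete on node "b" without any cleanup, since there is no shared state to repair.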

This blog, and the project it covers, will examine the good and bad aspects of modern Big Computing (Big Data + Big Computers + Big Programs) and propose some solutions.

Stay Tuned!