Update on Snowdoop, a MapReduce Alternative

Mad (Data) Scientist

In blog posts a few months ago, I proposed an alternative to MapReduce systems such as Hadoop, which I called “Snowdoop.” I pointed out that systems like Hadoop and Spark are very difficult to install and configure, are either too primitive (Hadoop) or too abstract (Spark) to program, and above all, are SLOW. Spark is of course a great improvement on Hadoop, but it still suffers from these problems to varying degrees.

The idea of Snowdoop is to

retain the idea of Hadoop/Spark to work on top of distributed file systems (“move the computation to the data rather than vice versa”)
work purely in R, using familiar constructs
avoid using Java or any other external language for infrastructure
sort data only if the application requires it
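
To make these goals concrete, here is a minimal sketch of the general pattern in plain R, using only the base "parallel" package. It is not the Snowdoop package itself: the chunk file names (words.1, words.2), the helper count_chunk(), and the two-worker local cluster are all assumptions made up for illustration, with a temporary directory standing in for a distributed file system. Each worker reads only its own chunk and tabulates it locally, and the manager then adds up the partial counts, with no sorting step anywhere.

```r
library(parallel)

# Hypothetical setup: a tiny data set already split into chunk files
# words.1 and words.2, standing in for the pieces of a distributed file.
chunk_dir <- tempdir()
chunks <- list(c("a", "b", "a"), c("b", "c", "b"))
for (i in seq_along(chunks)) {
  writeLines(chunks[[i]], file.path(chunk_dir, paste0("words.", i)))
}

# Hypothetical helper: worker i reads only its own chunk and counts words.
count_chunk <- function(i, chunk_dir) {
  w <- readLines(file.path(chunk_dir, paste0("words.", i)))
  table(w)
}

# One local worker per chunk; the computation goes to the data.
cl <- makeCluster(length(chunks))
partial <- clusterApply(cl, seq_along(chunks), count_chunk,
                        chunk_dir = chunk_dir)
stopCluster(cl)

# Manager side: flatten the per-chunk tables and sum the counts by word.
all_counts <- unlist(lapply(partial, function(tb) {
  setNames(as.vector(tb), names(tb))
}))
total <- tapply(all_counts, names(all_counts), sum)
print(total)
```

On a real cluster the workers would sit on the machines holding the chunks, but the calling code would look much the same, which is the point of staying within familiar R constructs.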

I originally proposed Snowdoop just as a concept, saying that I would slowly develop it into an actual package. I later put the beginnings of a…


Ariel Fuentes Díaz

A blog about Geography and more
