Tag Archives: Madlib

A DBI for PostgreSQL on R

Between the capabilities of R there is the possibility of querying databases thorough R. The DBMS that I know more it’s PostgreSQL. What I like about it, that it is an open source object-relational DBMS. It’s so simple, an also it has an extension for Spatial and Geographical objects called PostGIS.

Thus, the DBI (Database Interface) package I’ve chosen for querying PostgreSQL is RPostgreSQL. To work with it, just I have to download the package from the Repository and use the following code:

library(RPostgreSQL)

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, host = "localhost", user= "***", password="***", dbname="Procesos_UChile")

dbListConnections(drv)
dbGetInfo(drv)
summary(con)

df = dbReadTable(con,'etapas_sept_2013')
summary(df)
head(df, 3)
dbDisconnect(con)

This DBI is a nice product, but it’s limited by the ram, this problem appears when I tried to read a table over 10GB. So, I’m stuck on here. I know, this year was released a library called PivotalR, which allows you to manage big amounts of data with the library madlib.

Pivotal is a software company that provides software and services for the development of custom applications for data and analytics based on cloud computing technology.

And they made a an open-source library for scalable in-database analytics that provides data parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data called Madlib.

The next step is trying to installing this library on ubuntu to see how it works. The instructions are on this URL:

https://gist.github.com/thinkerbot/8699369

You can also watch a presentation of PivotalR with a demo on the following video:
https://www.youtube.com/watch?v=6cmyRCMY6j0

Advertisements