Saturday, January 19, 2013

Knowing Extra Data Analytic Tools can be Handy

Lately, I have been using R to prepare and analyze large data sets.  It has been a reliable tool for many of my needs.  The many blogs (e.g. r-bloggers) that share useful tips make troubleshooting reasonably fast.

Memory Allocation Notification - R

R is not without flaws, though: I just realized that it does have its limits.  At the moment, it cannot allocate more than 2.2 GB of working memory, regardless of how powerful your computer is (see the notification pictured above).  Even if you have 8 GB of RAM, R will raise an error once the memory in use reaches the allowed limit.  For ways to work around this (i.e. coping with "big data" analysis in R), Ryan Rosario's video is a good reference.

EM Clustering - Weka
I haven't really experimented with what the video suggests (I am not in the mood for learning a new package :p).  Instead, for my problem at the time (EM clustering on a data set of ~25,000 records), I just used my old "fellow" Weka and solved the problem within 30 minutes or so.
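
For readers who have not met EM clustering before, here is a minimal sketch of the idea in plain Python.  It is not what Weka does internally and not the analysis from this post: it assumes one numeric attribute and exactly two clusters, and the function names and the synthetic sample are made up for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(data)
    # Crude initialisation: means at the data extremes, shared overall variance.
    means = [min(data), max(data)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: how responsible is each cluster for each record?
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300  # guard against division by zero
            resp.append([p[0] / total, p[1] / total])

        # M-step: re-estimate each cluster from its weighted records.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # keep the variance strictly positive
            weights[k] = nk / n

    return weights, means, variances


# Synthetic sample (hypothetical data): two groups centred near 0 and 5.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(300)]
sample += [random.gauss(5.0, 1.0) for _ in range(300)]

weights, means, variances = em_two_gaussians(sample)
print(weights, means, variances)
```

Each pass only loops over the records once per cluster, so memory use stays proportional to the number of records.  Real tools add a lot on top of this (several attributes, choosing the number of clusters, better starting points), so treat it as a way to see the E-step and M-step, nothing more.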

Because you can run into a tool's limitations at unexpected times, it is always handy to know more than one statistical learning tool (e.g. R, Weka, RapidMiner, etc.).  Happy data crunching then :).
