Finding Delta

Discovering gems of wisdom from massive data sets

Posts Tagged ‘education’

Training To Deal With Mega-Scale Data


From Revolutions….

In a New York Times article (subscription required) published over the weekend, IBM and Google expressed doubts that students graduating from US universities today have the chops to deal with the multi-terabyte datasets that are becoming commonplace online and in domains like bioscience and astronomy. From the article:

For the most part, university students have used rather modest computing systems to support their studies. They are learning to collect and manipulate information on personal computers or what are known as clusters, where computer servers are cabled together to form a larger computer. But even these machines fail to churn through enough data to really challenge and train a young mind meant to ponder the mega-scale problems of tomorrow.

The article describes how Google and IBM are promoting internet-scale research at places like the University of Washington and Purdue. But a curious omission from the article is any mention of the open-source technologies that are spurring innovation in processing and analyzing these data sets. Tools like Hadoop, for processing internet-scale data sets, and R, for analyzing the processed data (most likely in some parallelized form), along with other open-source projects not yet conceived, are going to be critical in this endeavour.

Written by mattalcock

November 5, 2009 at 12:29 pm

Posted in Data Analysis
