BIOS Seminar: R for Big Data Analysis
Chuanhai Liu, PhD, professor of statistics at Purdue University, will present this week’s biostatistics seminar titled, “A Multithreaded and Distributed R for Big Data Analysis.”
Abstract: R is one of the most popular computing tools for data analysis. Over the past decade or so, tremendous effort has gone into making R useful for big data analysis. Examples include Tessera, Revolution-R, and SparkR, to name a few. As is well known, they all rely on Java-based software such as Hadoop and Spark. In this talk, we introduce an entirely new alternative, a multithreaded and distributed R called SupR. The prototype of SupR was made possible by modifying the existing internal system implementation of R (R-3.1.1). The key features of the prototype include (1) an R-style front end obtained by maintaining the existing R syntax and basic internal data structures, (2) a Java-like multithreading model, (3) a Spark-like cluster computing environment, and (4) a simple built-in distributed file system. With simple examples, including multithreaded Expectation-Maximization and distributed linear regression, we show how SupR can be useful for big data analysis.