BIOS Seminar: R for Big Data Analysis

March 7, 2019 @ 3:30 pm - 4:30 pm

Chuanhai Liu, PhD, professor of statistics at Purdue University, will present this week’s biostatistics seminar, titled “A Multithreaded and Distributed R for Big Data Analysis.”

Abstract: The computer software R is one of the most popular computing tools for data analysis. Over the past decade or so, tremendous efforts have been made to make R useful for big data analysis. These include Tessera, Revolution-R, and SparkR, to name a few. As we know, they all make use of Java-based software such as Hadoop and Spark. In this talk, we introduce an entirely new alternative, a multithreaded and distributed R called SupR. The prototype of SupR was made possible by modifying the existing internal system implementation of R (R-3.1.1). The key features of the prototype include (1) an R-style front end obtained by maintaining the existing R syntax and basic internal data structures, (2) a Java-like multithreading model, (3) a Spark-like cluster computing environment, and (4) a simple built-in distributed file system. With simple examples, including multithreaded Expectation-Maximization and distributed linear regression, we show how SupR can potentially be useful for big data analysis.

Details

Date:
March 7, 2019
Time:
3:30 pm - 4:30 pm

Location

Blue Cross and Blue Shield Auditorium (0001 Michael Hooker Research Center)
Michael Hooker Research Center
Chapel Hill, NC 27516 United States