MS-20773 - Analyzing Big Data with Microsoft R

The main purpose of the course is to give students the ability to use Microsoft R Server to create and run analyses on large datasets, and to show how to use it in Big Data environments such as a Hadoop or Spark cluster or a SQL Server database.



In addition to their professional experience, students who attend this course should have:
  • Programming experience using R, and familiarity with common R packages.
  • Knowledge of common statistical methods and data analysis best practices.
  • Basic knowledge of the Microsoft Windows operating system and its core functionality.
  • Working knowledge of relational databases.

Detailed Class Syllabus

Module 1: Microsoft R Server and R Client

What is Microsoft R Server?
Using Microsoft R Client
The ScaleR functions

Module 2: Exploring Big Data

Understanding ScaleR data sources
Reading data into an XDF object
Summarizing data in an XDF object

Module 3: Visualizing Big Data

Visualizing in-memory data
Visualizing big data

Module 4: Processing Big Data

Transforming Big Data
Managing datasets

Module 5: Parallelizing Analysis Operations

Using the RxLocalParallel compute context with rxExec
Using the RevoPemaR package

Module 6: Creating and Evaluating Regression Models

Clustering Big Data
Generating regression models and making predictions

Module 7: Creating and Evaluating Partitioning Models

Creating partitioning models based on decision trees
Testing partitioning models by making and comparing predictions

Module 8: Processing Big Data in SQL Server and Hadoop

Using R in SQL Server
Using Hadoop MapReduce
Using Spark on Hadoop