Levi, Ray & Shoup, Inc.

MS-20773 - Analyzing Big Data with Microsoft R

This course teaches students to use Microsoft R Server to create and run an analysis on a large dataset, and shows how to use it in Big Data environments such as a Hadoop or Spark cluster or a SQL Server database.

Prerequisites


In addition to their professional experience, students who attend this course should have:
  • Programming experience using R, and familiarity with common R packages.
  • Knowledge of common statistical methods and data analysis best practices.
  • Basic knowledge of the Microsoft Windows operating system and its core functionality.
  • Working knowledge of relational databases.

Detailed Class Syllabus


Module 1: Microsoft R Server and R Client


What is Microsoft R Server
Using Microsoft R Client
The ScaleR functions

Module 2: Exploring Big Data


Understanding ScaleR data sources
Reading data into an XDF object
Summarizing data in an XDF object

Module 3: Visualizing Big Data


Visualizing in-memory data
Visualizing big data

Module 4: Processing Big Data


Transforming Big Data
Managing datasets

Module 5: Parallelizing Analysis Operations


Using the RxLocalParallel compute context with rxExec
Using the RevoPemaR package

Module 6: Creating and Evaluating Regression Models


Clustering Big Data
Generating regression models and making predictions

Module 7: Creating and Evaluating Partitioning Models


Creating partitioning models based on decision trees
Testing partitioning models by making and comparing predictions

Module 8: Processing Big Data in SQL Server and Hadoop


Using R in SQL Server
Using Hadoop MapReduce
Using Hadoop Spark