The main purpose of this course is to teach students how to use Microsoft R Server to create and run analyses on large datasets, and how to use it in Big Data environments such as a Hadoop or Spark cluster or a SQL Server database.
In addition to their professional experience, students who attend this course should have:
- Programming experience using R, and familiarity with common R packages.
- Knowledge of common statistical methods and data analysis best practices.
- Basic knowledge of the Microsoft Windows operating system and its core functionality.
- Working knowledge of relational databases.
Detailed Class Syllabus
Module 1: Microsoft R Server and R Client
What is Microsoft R Server?
Using Microsoft R Client
The ScaleR functions
Module 2: Exploring Big Data
Understanding ScaleR data sources
Reading data into an XDF object
Summarizing data in an XDF object
Module 3: Visualizing Big Data
Visualizing in-memory data
Visualizing big data
Module 4: Processing Big Data
Transforming Big Data
Module 5: Parallelizing Analysis Operations
Using the RxLocalParallel compute context with rxExec
Using the RevoPemaR package
Module 6: Creating and Evaluating Regression Models
Clustering Big Data
Generating regression models and making predictions
Module 7: Creating and Evaluating Partitioning Models
Creating partitioning models based on decision trees
Testing partitioning models by making and comparing predictions
Module 8: Processing Big Data in SQL Server and Hadoop
Using R in SQL Server
Using Hadoop MapReduce
Using Spark