This course is designed for developers who need to build applications that analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, data analytics with Pig and Hive, and an introduction to Spark Core and Spark SQL.
Student Testimonials
Instructor did a great job. From experience, this subject can be a bit dry to teach, but he was able to keep it very engaging and made it much easier to focus.
Student
Excellent presentation skills, subject matter knowledge, and command of the environment.
Student
Instructor was outstanding. Knowledgeable, presented well, and class timing was perfect.
Student
Prerequisites
Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.
Detailed Class Syllabus
DAY 1 - AN INTRODUCTION TO THE HADOOP DISTRIBUTED FILE SYSTEM
Understanding Hadoop
The Hadoop Distributed File System
Ingesting Data into HDFS
The MapReduce Framework
Day 1 - LABS
Starting an HDP Cluster
Demonstration: Understanding Block Storage
Using HDFS Commands
Importing RDBMS Data into HDFS
Exporting HDFS Data to an RDBMS
Importing Log Data into HDFS Using Flume
Demonstration: Understanding MapReduce
Running a MapReduce Job
DAY 2 - AN INTRODUCTION TO APACHE PIG
Introduction to Apache Pig
Advanced Apache Pig Programming
Day 2 - LABS
Demonstration: Understanding Apache Pig
Getting Started with Apache Pig
Exploring Data with Apache Pig
Splitting a Dataset
Joining Datasets with Apache Pig
Preparing Data for Apache Hive
Demonstration: Computing Page Rank
Analyzing Clickstream Data
Analyzing Stock Market Data Using Quantiles
DAY 3 - AN INTRODUCTION TO APACHE HIVE
Apache Hive Programming
Using HCatalog
Advanced Apache Hive Programming
Day 3 - LABS
Understanding Hive Tables
Understanding Partition and Skew
Analyzing Big Data with Apache Hive
Demonstration: Computing NGrams
Joining Datasets in Apache Hive
Computing NGrams of Emails in Avro Format
Using HCatalog with Apache Pig
DAY 4 - WORKING WITH SPARK CORE, SPARK SQL AND OOZIE
Advanced Apache Hive Programming (Continued)
Hadoop 2 and YARN
Introduction to Spark Core and Spark SQL
Defining Workflow with Oozie
Day 4 - LABS
Advanced Apache Hive Programming
Running a YARN Application
Getting Started with Apache Spark
Exploring Apache Spark SQL
Defining an Apache Oozie Workflow