HW DEV-302 - HDP Developer Apache Pig and Hive

This course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, using Pig and Hive to perform data analytics on Big Data, and an introduction to Spark Core and Spark SQL.

Student Testimonials

Instructor did a great job, from experience this subject can be a bit dry to teach but he was able to keep it very engaging and made it much easier to focus. Student
Excellent presentation skills, subject matter knowledge, and command of the environment. Student
Instructor was outstanding. Knowledgeable, presented well, and class timing was perfect. Student

Prerequisites


Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

Detailed Class Syllabus


DAY 1 - AN INTRODUCTION TO THE HADOOP DISTRIBUTED FILE SYSTEM


Understanding Hadoop
The Hadoop Distributed File System
Ingesting Data into HDFS
The MapReduce Framework

Day 1 - LABS


Starting an HDP Cluster
Demonstration: Understanding Block Storage
Using HDFS Commands
Importing RDBMS Data into HDFS
Exporting HDFS Data to an RDBMS
Importing Log Data into HDFS Using Flume
Demonstration: Understanding MapReduce
Running a MapReduce Job

DAY 2 - AN INTRODUCTION TO APACHE PIG


Introduction to Apache Pig
Advanced Apache Pig Programming

Day 2 - LABS


Demonstration: Understanding Apache Pig
Getting Started with Apache Pig
Exploring Data with Apache Pig
Splitting a Dataset
Joining Datasets with Apache Pig
Preparing Data for Apache Hive
Demonstration: Computing Page Rank
Analyzing Clickstream Data
Analyzing Stock Market Data Using Quantiles

DAY 3 - AN INTRODUCTION TO APACHE HIVE


Apache Hive Programming
Using HCatalog
Advanced Apache Hive Programming

Day 3 - LABS


Understanding Hive Tables
Understanding Partition and Skew
Analyzing Big Data with Apache Hive
Demonstration: Computing NGrams
Joining Datasets in Apache Hive
Computing NGrams of Emails in Avro Format
Using HCatalog with Apache Pig

DAY 4 - WORKING WITH SPARK CORE, SPARK SQL AND OOZIE


Advanced Apache Hive Programming (Continued)
Hadoop 2 and YARN
Introduction to Spark Core and Spark SQL
Defining Workflow with Oozie

Day 4 - LABS


Advanced Apache Hive Programming
Running a YARN Application
Getting Started with Apache Spark
Exploring Apache Spark SQL
Defining an Apache Oozie Workflow