AWS Data Analytics Bootcamp

This bootcamp combines four one-day AWS courses: Building Data Lakes on AWS, Building Batch Data Analytics Solutions on AWS, Building Data Analytics Solutions Using Amazon Redshift, and Building Streaming Data Analytics Solutions on AWS. You will learn how to modernize data solutions end to end and how to use AWS data services to store, process, analyze, stream, and query data at scale, so that you can make better-informed decisions, respond faster to the unexpected, and uncover new opportunities. The courses take a deep dive into AWS Lake Formation, AWS Glue, Amazon EMR, Amazon Kinesis, and Amazon Redshift, and cover current thinking on building and operating data analytics pipelines that turn data into insights.

Student Testimonials

Instructor did a great job. From experience, this subject can be a bit dry to teach, but he was able to keep it very engaging and made it much easier to focus. Student
Excellent presentation skills, subject matter knowledge, and command of the environment. Student
Instructor was outstanding. Knowledgeable, presented well, and class timing was perfect. Student


Prerequisites


We recommend that attendees first complete the following courses:
AWS Technical Essentials
Architecting on AWS

Detailed Class Syllabus


Building Data Lakes on AWS – Day One


Module 1: Introduction to data lakes
Describe the value of data lakes
Compare data lakes and data warehouses
Describe the components of a data lake
Recognize common architectures built on data lakes
Module 2: Data ingestion, cataloging, and preparation
Describe the relationship between data lake storage and data ingestion
Describe AWS Glue crawlers and how they are used to create a data catalog
Identify data formatting, partitioning, and compression for efficient storage and query
Lab 1: Set up a simple data lake
Module 3: Data processing and analytics
Recognize how data processing applies to a data lake
Use AWS Glue to process data within a data lake
Describe how to use Amazon Athena to analyze data in a data lake
Module 4: Building a data lake with AWS Lake Formation
Describe the features and benefits of AWS Lake Formation
Use AWS Lake Formation to create a data lake
Understand the AWS Lake Formation security model
Lab 2: Build a data lake using AWS Lake Formation
Module 5: Additional Lake Formation configurations
Automate AWS Lake Formation using blueprints and workflows
Apply security and access controls to AWS Lake Formation
Match records with AWS Lake Formation FindMatches
Visualize data with Amazon QuickSight
Lab 3: Automate data lake creation using AWS Lake Formation blueprints
Lab 4: Data visualization using Amazon QuickSight
Module 6: Architecture and course review
Post-course knowledge check
Architecture review
Course review

Building Batch Data Analytics Solutions on AWS – Day Two


Module A: Overview of Data Analytics and the Data Pipeline
Data analytics use cases
Using the data pipeline for analytics
Module 1: Introduction to Amazon EMR
Using Amazon EMR in analytics solutions
Amazon EMR cluster architecture
Interactive Demo 1: Launching an Amazon EMR cluster
Cost management strategies
Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage
Storage optimization with Amazon EMR
Data ingestion techniques
Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR
Apache Spark on Amazon EMR use cases
Why Apache Spark on Amazon EMR
Spark concepts
Interactive Demo 2: Connect to an EMR cluster and run Scala commands using the Spark shell
Transformation, processing, and analytics
Using notebooks with Amazon EMR
Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR
Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive
Using Amazon EMR with Hive to process batch data
Transformation, processing, and analytics
Practice Lab 2: Batch data processing using Amazon EMR with Hive
Introduction to Apache HBase on Amazon EMR
Module 5: Serverless Data Processing
Serverless data processing, transformation, and analytics
Using AWS Glue with Amazon EMR workloads
Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions
Module 6: Security and Monitoring of Amazon EMR Clusters
Securing EMR clusters
Interactive Demo 3: Client-side encryption with EMRFS
Monitoring and troubleshooting Amazon EMR clusters
Demo: Reviewing Apache Spark cluster history
Module 7: Designing Batch Data Analytics Solutions
Batch data analytics use cases
Activity: Designing a batch data analytics workflow
Module B: Developing Modern Data Architectures on AWS
Modern data architectures

Building Data Analytics Solutions Using Amazon Redshift – Day Three


Module A: Overview of Data Analytics and the Data Pipeline
Data analytics use cases
Using the data pipeline for analytics
Module 1: Using Amazon Redshift in the Data Analytics Pipeline
Why Amazon Redshift for data warehousing?
Overview of Amazon Redshift
Module 2: Introduction to Amazon Redshift
Amazon Redshift architecture
Interactive Demo 1: Touring the Amazon Redshift console
Amazon Redshift features
Practice Lab 1: Load and query data in an Amazon Redshift cluster
Module 3: Ingestion and Storage
Ingestion
Interactive Demo 2: Connecting to your Amazon Redshift cluster from a Jupyter notebook using the Data API
Data distribution and storage
Interactive Demo 3: Analyzing semi-structured data using the SUPER data type
Querying data in Amazon Redshift
Practice Lab 2: Data analytics using Amazon Redshift Spectrum
Module 4: Processing and Optimizing Data
Data transformation
Advanced querying
Practice Lab 3: Data transformation and querying in Amazon Redshift
Resource management
Interactive Demo 4: Applying mixed workload management on Amazon Redshift
Automation and optimization
Interactive Demo 5: Resizing an Amazon Redshift cluster from dc2.large to ra3.xlplus
Module 5: Security and Monitoring of Amazon Redshift Clusters
Securing the Amazon Redshift cluster
Monitoring and troubleshooting Amazon Redshift clusters
Module 6: Designing Data Warehouse Analytics Solutions
Data warehouse use case review
Activity: Designing a data warehouse analytics workflow
Module B: Developing Modern Data Architectures on AWS
Modern data architectures

Building Streaming Data Analytics Solutions on AWS – Day Four


Module A: Overview of Data Analytics and the Data Pipeline
Data analytics use cases
Using the data pipeline for analytics
Module 1: Using Streaming Services in the Data Analytics Pipeline
The importance of streaming data analytics
The streaming data analytics pipeline
Streaming concepts
Module 2: Introduction to AWS Streaming Services
Streaming data services in AWS
Amazon Kinesis in analytics solutions
Demonstration: Explore Amazon Kinesis Data Streams
Practice Lab: Setting up a streaming delivery pipeline with Amazon Kinesis
Using Amazon Kinesis Data Analytics
Introduction to Amazon MSK
Overview of Spark Streaming
Module 3: Using Amazon Kinesis for Real-time Data Analytics
Exploring Amazon Kinesis using a clickstream workload
Creating Kinesis data and delivery streams
Demonstration: Understanding producers and consumers
Building stream producers
Building stream consumers
Building and deploying Flink applications in Kinesis Data Analytics
Demonstration: Explore Zeppelin notebooks for Kinesis Data Analytics
Practice Lab: Streaming analytics with Amazon Kinesis Data Analytics and Apache Flink
Module 4: Securing, Monitoring, and Optimizing Amazon Kinesis
Optimize Amazon Kinesis to gain actionable business insights
Security and monitoring best practices
Module 5: Using Amazon MSK in Streaming Data Analytics Solutions
Use cases for Amazon MSK
Creating MSK clusters
Demonstration: Provisioning an MSK Cluster
Ingesting data into Amazon MSK
Practice Lab: Introduction to access control with Amazon MSK
Transforming and processing in Amazon MSK
Module 6: Securing, Monitoring, and Optimizing Amazon MSK
Optimizing Amazon MSK
Demonstration: Scaling up Amazon MSK storage
Practice Lab: Amazon MSK streaming pipeline and application deployment
Security and monitoring
Demonstration: Monitoring an MSK cluster
Module 7: Designing Streaming Data Analytics Solutions
Use case review
Class Exercise: Designing a streaming data analytics workflow
Module B: Developing Modern Data Architectures on AWS
Modern data architectures