Advanced Apache Spark

Build data pipeline applications using Spark Streaming, Spark SQL, Spark GraphFrame, and MLlib

About this Course

This course teaches you how to build data pipeline applications using Spark Streaming, Spark SQL, Spark GraphFrame, and MLlib. You’ll learn about Spark Streaming architecture, data pipeline use cases, DStreams, and property graph operations.

This is the third and final course in the Apache Spark v2.1 Series.

What's Covered

6: Create an Apache Spark Streaming Application

Course Lessons:
  • Describe Spark Streaming Architecture
  • Create a Spark Structured Streaming Application
  • Apply Operations on Streaming DataFrames
  • Define Windowed Operations
  • Describe How Streaming Applications are Fault Tolerant

Lab Activities:
  • Load and Inspect Data Using the Spark Shell
  • Use Spark Streaming with the Spark Shell
  • Build and Run a Streaming Application with SQL
  • Build and Run a Streaming Application with Windows and SQL

7: Use Apache Spark GraphFrames

Course Lessons:
  • Describe GraphFrame
  • Define Regular, Directed, and Property Graphs
  • Create a Property Graph
  • Perform Operations on Graphs

Lab Activities:
  • Analyze Data with GraphFrame

8: Use Apache Spark MLlib

Course Lessons:
  • Describe Apache Spark MLlib Machine Learning Algorithms
  • Use Collaborative Filtering to Predict User Choice

Lab Activities:
  • Load and Inspect Data Using Spark Shell
  • Use Spark to Make Movie Recommendations
  • Analyze a Simple Flight Example with Decision Trees

Prerequisites

  • Completion of the previous Spark courses in the on-demand series: DEV 360 and DEV 361
  • Basic Hadoop knowledge and intermediate Linux knowledge
  • Experience using a text editor such as vi
  • Terminal program installed; familiarity with commands such as mv, cp, ssh, grep, cd, and useradd
  • Knowledge of functional programming with Scala, and experience with SQL

Curriculum

  • Lesson 6 - Create an Apache Spark Streaming Application
  • Quiz 6
  • Lesson 7 - Use Apache Spark GraphFrames
  • Quiz 7
  • Lesson 8 - Use Apache Spark MLlib
  • Quiz 8
  • Course Materials
  • Lab Guide
  • Lab Environment Setup Guide
  • Course Sandboxes
