- Lesson 1 - Introduction to Apache Spark
- Quiz 1
- Lesson 2 - Create Datasets
- Quiz 2
- Lesson 3 - Apply Operations on Datasets
- Quiz 3
- Course Materials
  - Lab Guide
  - Lab Environment Setup Guide
  - Course Sandboxes

Introduction to Apache Spark
Learn the benefits of using Spark 2.1 for developing streaming applications
This introductory course, targeted at developers, enables you to build simple applications for Apache Spark version 2.1. It introduces the benefits of Spark for developing big data processing applications, demonstrates loading and inspecting data with the Spark interactive shell, and walks through building a standalone application.
This is the first course in the Apache Spark v2.1 Series.
What's Covered
Course Lessons and Lab Activities

Lesson 1: Introduction to Apache Spark
- Describe Features of Apache Spark
- Define Spark Components
- Explain Spark Data Pipeline Use Cases
Lab Activities: No labs

Lesson 2: Create Datasets
- Define Data Sources, Structures, and Schemas
- Create Datasets and DataFrames
- Convert DataFrames into Datasets
Lab Activities:
- Load Data and Create Datasets Using Reflection
- Bonus Lab: Word Count Using Datasets (Optional)

Lesson 3: Apply Operations on Datasets
- Apply Operations on Datasets
- Cache Datasets
- Create User Defined Functions (UDFs)
- Repartition Datasets
Lab Activities:
- Explore SFPD Data
- Create and Use UDFs
- Analyze Data Using UDF and Queries
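To give a flavor of the Lesson 2 and Lesson 3 topics, the sketch below creates a typed Dataset from a case class via reflection and defines a UDF. This is a minimal illustration, not the course's lab code: it assumes Spark 2.1+ on the classpath, and the `Incident` class and sample data are invented for the example.

```scala
// Minimal sketch: Dataset creation via case-class reflection, plus a UDF.
// The Incident class and sample rows are illustrative assumptions.
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.udf

case class Incident(category: String, count: Long)

object DatasetSketch {
  val spark = SparkSession.builder()
    .appName("DatasetSketch")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // Lesson 2: create a Dataset from a local collection using reflection
  def sampleData: Dataset[Incident] =
    Seq(Incident("THEFT", 120L), Incident("FRAUD", 45L)).toDS()

  // Lesson 3: a UDF usable in DataFrame expressions or Spark SQL queries
  val isHighVolume = udf((n: Long) => n > 100L)

  // Typed Dataset operations: filter and map, then collect to the driver
  def highVolumeCategories(ds: Dataset[Incident]): Array[String] =
    ds.filter(_.count > 100L).map(_.category).collect()

  def main(args: Array[String]): Unit = {
    // Apply the UDF as a derived column and print the result
    sampleData.withColumn("highVolume", isHighVolume($"count")).show()
    println(highVolumeCategories(sampleData).mkString(","))
    spark.stop()
  }
}
```

Running this with `spark-submit` (or pasting the body into `spark-shell`) exercises the same Dataset, DataFrame, and UDF APIs the labs cover.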
Prerequisites
- Basic Hadoop knowledge and intermediate Linux knowledge
- Experience using a text editor such as vi
- Terminal program installed; familiarity with commands such as mv, cp, ssh, grep, cd, and useradd
- Knowledge of functional programming with Scala, and experience with SQL