- Lesson 1 – Apache Pig in the Hadoop Ecosystem
- Lab Environment Setup Guide
- Lesson 2 – ETL with Apache Pig
- Quiz 2
- Lesson 3 – Manipulate Data in Apache Pig
- Quiz 3
- Course Materials
  - Lab Guide
  - PIG Lab Files

Transform Data with Apache Pig
Use Pig to analyze structured data without writing MapReduce code: work with data pipeline tools, load and manipulate relations, and apply UDFs to relations.
This course covers how to use Pig to analyze structured data without writing MapReduce code. It starts with a review of data pipeline tools, then covers how to load and manipulate relations and how to apply UDFs to relations in Pig. Together with DA 440 – Query and Store Data with Apache Hive, it teaches you how to use Pig and Hive as part of a single data flow in a Hadoop cluster.
What's Covered
| Course Lessons | Lab Activities |
| --- | --- |
| 1: Pig in the Hadoop Ecosystem – Hive Use Cases; Use Cases of Pig; Steps in the Data Pipeline; Data Types Used in Pig | Connect to the Hive CLI; Connect to the Grunt Shell |
| 2: Extract, Transform, and Load Data – Load Data into Relations; Debug Pig Scripts; Perform Simple Manipulations; Save Relations as Files | Load Data into Pig Relations; Examine Pig Relations; Basic Data Manipulations; Store Data |
| 3: Manipulate Data – Subset Relations; Combine Relations; Use UDFs on Relations | Load and Filter Relations; Transform and Join Relations; Explore Data |
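To give a flavor of the load, filter, transform, and UDF topics listed above, here is a minimal sketch of a Python (Jython) UDF that could be used from Pig. The file name `string_udfs.py`, the relation `people`, its schema, and the input/output paths are illustrative assumptions, not part of the course lab files; the `@outputSchema` decorator is supplied by Pig's Jython UDF support when the script is registered.

```python
# string_udfs.py -- a hypothetical Jython UDF for Pig.
# The @outputSchema decorator (provided by Pig's Jython UDF runtime)
# declares the schema of the value the function returns.

@outputSchema("name:chararray")
def to_upper(value):
    """Return the input chararray in upper case; pass nulls through."""
    if value is None:
        return None
    return value.upper()

# From the Grunt shell, the UDF could be registered and applied roughly
# like this (Pig Latin shown as comments; names and paths are assumptions):
#
#   REGISTER 'string_udfs.py' USING jython AS string_udfs;
#   people  = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray, age:int);
#   adults  = FILTER people BY age >= 18;
#   shouted = FOREACH adults GENERATE string_udfs.to_upper(name) AS name, age;
#   STORE shouted INTO 'output/shouted' USING PigStorage(',');
```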
Prerequisites
- Linux skills, including familiarity with basic commands such as ls, cd, cp, and su
- Beginning to intermediate proficiency with SQL
- Basic Hadoop knowledge