Description
The key objectives of this course are as follows:
Learn Spark Architecture
Learn Spark Execution Concepts
Learn Spark Transformations and Actions using the Structured API
Learn Spark Transformations and Actions using the RDD (Resilient Distributed Datasets) API
Learn how to set up your own local PySpark Environment
Learn how to interpret the Spark Web UI
Learn how to interpret DAG (Directed Acyclic Graph) for Spark Execution
The Python Spark project that we will build together:
Sales Data
Create a Spark Session
Read a CSV file into a Spark Dataframe
Learn to Infer a Schema
Select data from the Spark Dataframe
Produce analytics that show the top sales orders per Region and Country
Who this course is for:
Python Developers who wish to learn how to use the language for Data Engineering and Analytics with PySpark
Aspiring Data Engineering and Analytics Professionals
Data Scientists / Analysts who wish to learn an analytical processing strategy that can be deployed over a big data cluster
Data Managers who want to gain a deeper understanding of managing data over a cluster
Requirements
A basic laptop or PC with at least 6 to 8 GB of RAM
Basic programming knowledge

