Apache Spark for Data Scientists

* Looking for a flexible schedule (after hours or weekends)? Please call 858-208-4141 or email us: sales@ccslearningacademy.com.

Student financing options are available.

Transitioning military and Veterans, please contact us to sign up for a free consultation on training and hiring options.

Looking for group training? Contact Us

psinghal

Last Update December 12, 2023

About This Course

Course Description

Learn Spark skills from a data science perspective to build unified big data applications combining batch, streaming, and interactive analytics on your data.

Apache Spark is a powerful, open-source processing engine for data in a Hadoop cluster, optimized for speed, ease of use, and sophisticated analytics. The Spark framework supports streaming data processing and complex iterative algorithms, and for in-memory workloads its applications can run up to 100x faster than traditional Hadoop MapReduce programs. With Spark, you can write sophisticated applications that support faster decisions and real-time actions across a wide variety of use cases, architectures, and industries.

This hands-on course explores using Spark for common data-related activities from a data science perspective. You will learn to build unified big data applications combining batch, streaming, and interactive analytics on your data.
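
To give a flavor of the batch processing model mentioned above, here is a minimal sketch of the map/reduce pattern over partitioned data. It is written in plain Python and does not use Spark's API. The function names (`partition`, `count_words`, `merge_counts`) and the sample lines are illustrative assumptions. In Spark, the engine would distribute the partitions across a cluster rather than loop over them locally.

```python
from collections import Counter
from functools import reduce


def partition(records, num_partitions):
    """Split records into roughly equal chunks, round-robin."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(lines):
    """Map step: count the words within a single partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine the partial counts from two partitions."""
    return left + right


# Hypothetical sample data standing in for a large distributed dataset.
lines = [
    "spark handles batch jobs",
    "spark handles streaming jobs",
    "interactive analytics with spark",
]

# Each partition is counted independently, then the partial results are merged.
partial_counts = [count_words(p) for p in partition(lines, 2)]
word_counts = reduce(merge_counts, partial_counts, Counter())
```

Because each partition is counted independently before the merge, the same two steps could run on separate machines, which is the idea the course builds on with RDDs and DataFrames.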

Learning Objectives

The essentials of Spark architecture and applications

How to execute Spark programs

How to create and manipulate both RDDs (Resilient Distributed Datasets) and DataFrames

How to integrate machine learning into Spark applications

How to use Spark Streaming

Inclusions

Instructor-led training
Training Seminar Student Handbook
Collaboration with classmates (not currently available for self-paced course)
Real-world learning activities and scenarios
Exam scheduling support*
Enjoy job placement assistance for the first 12 months after course completion.
This course is eligible for CCS Learning Academy’s Learn and Earn Program: get a tuition fee refund of up to 50% if you are placed in a job through CCS Global Tech’s Placement Division*
Government and Private pricing available.*

Pre-requisites

Introduction to Java Programming (at least exposure to basic Java syntax)
Introduction to SQL (familiarity with SQL basics)
Basic knowledge of Statistics and Probability
Data Science background

Target Audience

Data Scientists, System Administrators, Testers, and other technical business professionals who seek to use Spark for data processing and analysis.

Your Instructors

psinghal

Price: $1,995.00

Level: Intermediate

Duration: 24 hours

Lectures: 40

Subject: Live Courses by Practice Area, Analytics & BI

Curriculum

1. Spark

Data Science: The State of the Art

Hadoop, YARN, and Spark

Architectural Overview

Spark and Storm

MLlib and Mahout

Distributed vs. Local Run Modes

Hello, Spark

2. Spark Overview

Spark Core

Spark SQL

Spark and Hive

MLlib

Mahout

Spark Streaming

Spark API

3. DataFrames

DataFrames and Resilient Distributed Datasets (RDDs)

Partitions

DataFrame Types

DataFrame Operations

Map/Reduce with DataFrames

4. Spark SQL

Spark SQL Overview

Data stores: HDFS, Cassandra, HBase, Hive, and S3

Table Definitions

ETL in Spark

Queries

5. Spark MLlib

MLlib Overview

MLlib Algorithms Overview

6. Spark Streaming

Streaming overview

Real-time data ingestion

State

Window Operations

7. Spark GraphX

GraphX overview

ETL with GraphX

Graph computation

8. Performance and Tuning

Broadcast variables

Accumulators

Memory Management

9. Cluster Mode

Standalone Cluster

Masters and Workers

Configurations

Working with large data sets
