Developing with Spark for Big Data (TTSK7505)

* Looking for a flexible schedule (after hours or weekends)? Please call 858-208-4141 or email us: sales@ccslearningacademy.com.

Student financing options are available.

Transitioning military and Veterans, please contact us to sign up for a free consultation on training and hiring options.

Looking for group training? Contact Us

psinghal

Last Update April 15, 2024

About This Course

Course Description

Learn advanced Big Data and Spark skills to access disparate databases, integrate Machine Learning (ML), and establish streaming solutions.

Apache Spark is an important component in the Hadoop Ecosystem as a cluster computing engine used for Big Data. Building on top of the Hadoop YARN and HDFS ecosystem, Spark offers faster in-memory processing for computing tasks when compared to Map/Reduce. It can be programmed in Java, Scala, Python, and R along with SQL-based front-ends.

With advanced libraries such as Mahout and MLlib for machine learning, GraphX or Neo4j for rich graph data processing, and access to other NoSQL data stores, rule engines, and related components, Spark is a linchpin of modern Big Data and Data Science computing.

This course introduces you to enterprise-grade Spark programming and the components needed to craft complete data science solutions. You’ll learn core big data and Spark development techniques and industry practices. This course is offered in Java and, with some alterations, in Python, Scala, and R.
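As a rough illustration of the transformation-then-action style of processing that Spark programs follow, here is a minimal sketch in plain Python. It does not use Spark at all: the function names `flat_map_words` and `reduce_by_key` are hypothetical stand-ins chosen to echo Spark's operations, and the sample lines are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def flat_map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Lazy 'transformation' (hypothetical helper): yields (word, 1) pairs only when consumed."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def reduce_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Eager 'action' (hypothetical helper): pulls every pair through and sums counts per word."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# Made-up sample input, held in memory as a list of strings.
lines = ["Spark builds on Hadoop", "Spark keeps data in memory"]

pairs = flat_map_words(lines)   # nothing is computed yet (generator is lazy)
counts = reduce_by_key(pairs)   # the work happens here, when results are needed

print(counts["spark"])  # 2
```

In real Spark code the same shape appears as a chain of transformations followed by an action, with the work distributed across a cluster rather than run in a single Python process.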

Learning Objectives

The essentials of Spark architecture and applications

How to execute Spark Programs

How to create and manipulate both RDDs (Resilient Distributed Datasets) and Spark 2 Unified DataFrames

How to persist and restore data frames

Essential NoSQL access

How to integrate machine learning into Spark applications

How to use Spark Streaming and Kafka to create streaming applications

Inclusions

Instructor-led training
Training Seminar Student Handbook
Collaboration with classmates (not currently available for self-paced course)
Real-world learning activities and scenarios
Exam scheduling support*
Enjoy job placement assistance for the first 12 months after course completion.
This course is eligible for CCS Learning Academy’s Learn and Earn Program: get a tuition fee refund of up to 50% if you are placed in a job through CCS Global Tech’s Placement Division*
Government and Private pricing available.*

Pre-requisites

Java programming experience
Python programming experience
Basic understanding of SQL
Comfort with navigating the Linux command line
Basic knowledge of Linux editors (such as vi or nano) for editing code

Target Audience

Experienced Developers and Architects who seek proficiency in working with Apache Spark in an enterprise data environment.

Curriculum

73 Lessons, 40h

1. Spark Overview

2. Spark Component Overview

3. RDDs: Resilient Distributed Datasets

4. DataFrames

5. Spark Applications

6. DataFrame Persistence

7. Spark Streaming

8. Accessing NoSQL Data

9. Enterprise Integration

10. Algorithms and Patterns

11. Spark SQL

12. GraphX

13. Alternate Languages

14. Clustering Spark for Developers

15. Performance and Tuning

Your Instructors

psinghal

$2,445.00

Level: Intermediate

Duration: 40 hours

Lectures: 73

Subject: Live Courses by Practice Area Analytics & BI

Curriculum

1. Spark Overview

Hadoop Ecosystem

Hadoop YARN vs. Mesos

Spark vs. Map/Reduce

Spark with Map/Reduce: Lambda Architecture

Spark in the Enterprise Data Science Architecture

2. Spark Component Overview

Spark Shell

RDDs: Resilient Distributed Datasets

DataFrames

Spark 2 Unified DataFrames

Spark Sessions

Functional Programming

Spark SQL

MLlib

Structured Streaming

SparkR

Spark and Python

3. RDDs: Resilient Distributed Datasets

Coding with RDDs

Transformations

Actions

Lazy Evaluation and Optimization

RDDs in Map/Reduce

4. DataFrames

RDDs vs. DataFrames

Unified DataFrames in Spark 2.0

Partitioning

5. Spark Applications

Spark Sessions

Running Applications

Logging

6. DataFrame Persistence

RDD Persistence

DataFrame and Unified DataFrame Persistence

7. Spark Streaming

Streaming Overview

Streams

Structured Streaming

DStreams and Apache Kafka

8. Accessing NoSQL Data

Ingesting data

Parquet Files

Relational Databases

Graph Databases (Neo4j and GraphX)

Interacting with Hive

Accessing Cassandra Data

Document Databases (MongoDB and CouchDB)

9. Enterprise Integration

Map/Reduce and Lambda Integration

Camel Integration

Drools and Spark

10. Algorithms and Patterns

MLlib and Mahout

Classification

Clustering

Decision Trees

Decompositions

Pipelines

Spark Packages

11. Spark SQL

Spark SQL

SQL and DataFrames

Spark SQL and Hive

Spark SQL and JDBC

12. GraphX

Graph APIs

GraphX

ETL in GraphX

Exploratory Analysis

Graph computation

Pregel API Overview