Apache Spark for Data Scientists
*Looking for a flexible schedule (after hours or weekends)? Please call or email us: 858-208-4141 or email@example.com.
Student financing options are available.
Transitioning military and Veterans, please contact us to sign up for a free consultation on training and hiring options.
Learn Spark skills from a data science perspective to build unified big data applications combining batch, streaming, and interactive analytics on your data.
Apache Spark is a powerful, open-source processing engine for data in a Hadoop cluster, optimized for speed, ease of use, and sophisticated analytics. The Spark framework supports streaming data processing and complex iterative algorithms, enabling some applications to run up to 100x faster than traditional Hadoop MapReduce programs. With Spark, you can write sophisticated applications that support faster decisions and real-time actions across a wide variety of use cases, architectures, and industries.
This hands-on course explores using Spark for common data-related activities from a data science perspective. You will learn to build unified big data applications combining batch, streaming, and interactive analytics on your data.
- Data Science: The State of the Art
- Hadoop, YARN, and Spark
- Architectural Overview
- Spark and Storm
- MLlib and Mahout
- Distributed vs. Local Run Modes
- Hello, Spark
- Spark Core
- Spark SQL
- Spark and Hive
- Spark Streaming
- Spark API
- DataFrames and Resilient Distributed Datasets (RDDs)
- DataFrame Types
- DataFrame Operations
- Map/Reduce with DataFrames
- Spark SQL Overview
- Data stores: HDFS, Cassandra, HBase, Hive, and S3
- Table Definitions
- ETL in Spark
- MLlib overview
- MLlib Algorithms Overview
- Streaming overview
- Real-time data ingestion
- Window Operations
- GraphX overview
- ETL with GraphX
- Graph computation
Performance and Tuning
- Broadcast variables
- Memory Management
- Standalone Cluster
- Masters and Workers
- Working with large data sets
Data Scientists, System Administrators, Testers, and other technical business professionals who seek to use Spark for data processing and analysis.
What You'll Learn
Join an engaging hands-on learning environment, where you’ll learn:
- The essentials of Spark architecture and applications
- How to execute Spark Programs
- How to create and manipulate both RDDs (Resilient Distributed Datasets) and DataFrames
- How to integrate machine learning into Spark applications
- How to use Spark Streaming
Before attending this course, you should have:
- Introduction to Java Programming (at least exposure to basic Java syntax)
- Introduction to SQL (familiarity with SQL basics)
- Basic knowledge of Statistics and Probability
- Data Science background
With CCS Learning Academy, you’ll receive:
- Instructor-led training
- Training Seminar Student Handbook
- Pre- and post-course assessments/evaluations
- Collaboration with classmates (not currently available for self-paced course)
- Real-world learning activities and scenarios
- Exam scheduling support*
- Job placement assistance for the first 12 months after course completion
- This course is eligible for CCS Learning Academy’s Learn and Earn Program: get a tuition fee refund of up to 50% if you are placed in a job through CCS Global Tech’s Placement Division*
- Government and Private pricing available.*
*For more details call: 858-208-4141 or email: firstname.lastname@example.org; email@example.com