This course will help you learn the in-depth concepts of Spark's Resilient Distributed Datasets (RDDs) and develop and run Spark jobs quickly with Python. By the end of this course, you can expect to understand how to scale up to larger data sets using Amazon's Elastic MapReduce service and how Hadoop YARN distributes Spark across computing clusters.
- Frame Big Data analysis problems as Spark problems.
- Use Amazon's Elastic MapReduce service to run your job on a cluster with Hadoop YARN.
- Install and run Apache Spark on a desktop computer or on a cluster.
- Use Spark's Resilient Distributed Datasets to process and analyze large data sets across many CPUs.
- Implement iterative algorithms such as breadth-first-search using Spark.
- Use the MLlib machine learning library to answer common data mining questions.
- Understand how Spark SQL lets you work with structured data.
- Understand how Spark Streaming lets you process continuous streams of data in real time.
- Tune and troubleshoot large jobs running on a cluster.
- Share information between nodes on a Spark cluster using broadcast variables and accumulators.
- Understand how the GraphX library helps with network analysis problems.
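To give a feel for what "framing a problem as a Spark problem" means, here is a minimal sketch of a word count written in plain Python so that it runs without a Spark installation. The helper names `flat_map`, `reduce_by_key`, and `word_count` are hypothetical stand-ins written for this illustration; they only mimic the shape of PySpark's `RDD.flatMap` and `RDD.reduceByKey` and are not the course's own code.

```python
import re
from itertools import chain


def flat_map(func, items):
    """Apply func to each item and flatten the results (mimics RDD.flatMap)."""
    return chain.from_iterable(func(item) for item in items)


def reduce_by_key(func, pairs):
    """Combine values that share a key using func (mimics RDD.reduceByKey)."""
    totals = {}
    for key, value in pairs:
        totals[key] = func(totals[key], value) if key in totals else value
    return totals


def word_count(lines):
    """Count words across lines, most frequent first, ties broken alphabetically."""
    # Step 1: split every line into lowercase words (one line -> many words).
    words = flat_map(lambda line: re.findall(r"[a-z']+", line.lower()), lines)
    # Step 2: turn each word into a (word, 1) pair.
    pairs = ((word, 1) for word in words)
    # Step 3: add up the 1s for each distinct word.
    counts = reduce_by_key(lambda a, b: a + b, pairs)
    # Step 4: sort by descending count, then by word.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample input, used only for this illustration.
sample_lines = ["Spark makes big data simple", "Big data, big clusters"]
top_words = word_count(sample_lines)
```

In PySpark the same steps would typically be chained on an RDD with `flatMap`, `map`, and `reduceByKey`, with Spark spreading the work across the cluster instead of running it in one process.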
Developers around the world use the Spark framework from different languages, such as Scala, Java, and Python. Apache Spark gives them the flexibility to write applications in the language they prefer and to build new apps faster.
Large organizations around the globe have taken Spark seriously. Companies such as Amazon, Yahoo, Alibaba, eBay, Hitachi, Shopify, and many more have invested in Spark talent. The figures cited for how Spark is put to work are 78% for batch processing of large data sets, 60% for event stream processing, around 56% for fast, real-time data querying, and 55% for improving programming productivity. There are also significant opportunities across industry segments, including:
- Banking and Finance
- Media and Entertainment
- Professional scientific and technical services
Check out Big Data jobs here: https://www.naukri.com/big-data-jobs
What Will You Cover?
How to Use This Course
Getting Set Up: Installing Python, a JDK, Spark, and its Dependencies
Installing the MovieLens Movie Rating Dataset
Run your first Spark program! Ratings histogram example.
Introduction to Spark
The Resilient Distributed Dataset (RDD)
Ratings Histogram Walkthrough
Running the Average
Running the Minimum Temperature
Running the Maximum Temperature
Counting Word Occurrences using flatMap
Improving the Word Count Script with Regular Expressions
Sorting the Word Count Results
Customer Order Assignments
Customer Order Solutions
Customer Order Sorted
Find the Most Popular Movie
Use Broadcast Variables to Display Movie Names Instead of ID Numbers
Find the Most Popular Superhero in a Social Graph
Run the Script
Superhero Degrees of Separation: Introduction
Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
Superhero Degrees of Separation: Review the Code and Run it
Item-Based Collaborative Filtering in Spark, cache and persist
Running the Similar Movies Script using Spark's Cluster Manager
Improve the Quality of Similar Movies
Introducing Elastic MapReduce
Setting up your AWS
Create Similar Movies from One Million Ratings - Part 1
Create Similar Movies from One Million Ratings - Part 2
Create Similar Movies from One Million Ratings - Part 3
Troubleshooting Spark on a Cluster
More Troubleshooting, and Managing Dependencies
Executing SQL commands and SQL
Using DataFrames instead of RDDs
Using MLlib to Produce Movie Recommendations
Analyzing the ALS Recommendations Results
Using DataFrames with MLlib
Spark Streaming and GraphX
Learning More about Spark and Data Science
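The degrees-of-separation lectures in the outline above rest on breadth-first search (BFS). As a rough idea of the underlying algorithm, here is a minimal single-machine sketch in plain Python. The function name `degrees_of_separation` and the toy graph are hypothetical and chosen for this illustration; the course's Spark version, which uses accumulators and runs across a cluster, will look different.

```python
from collections import deque


def degrees_of_separation(graph, start, target):
    """Return the number of hops from start to target, or None if unreachable."""
    distances = {start: 0}      # nodes seen so far and their hop count from start
    frontier = deque([start])   # nodes whose neighbors still need to be explored
    while frontier:
        node = frontier.popleft()
        if node == target:
            return distances[node]
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                frontier.append(neighbor)
    return None


# Hypothetical toy social graph: each key lists the characters it is connected to.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
    "F": [],
}

hops = degrees_of_separation(toy_graph, "A", "E")
```

Because BFS explores the graph one level at a time, the first time it reaches the target it has found the shortest path, which is why the algorithm suits a "degrees of separation" question.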