Big Data: Apache Spark and Python

Technology - Frank Kane

This course will help you learn Spark's Resilient Distributed Datasets (RDDs) in depth and develop and run Spark jobs quickly with Python. By the end of the course, you can expect to understand how to scale up to larger data sets using Amazon's Elastic MapReduce service and how Hadoop YARN distributes Spark across computing clusters.


Learning Objectives

  • Frame Big Data analysis problems as Spark problems.
  • Use Amazon's Elastic MapReduce service to run your job on a cluster with Hadoop YARN.
  • Install and run Apache Spark on a desktop computer or on a cluster.
  • Use Spark's Resilient Distributed Datasets to process and analyze large data sets across many CPUs.
  • Implement iterative algorithms such as breadth-first search using Spark.
  • Use the MLlib machine learning library to answer common data mining questions.
  • Understand how Spark SQL lets you work with structured data.
  • Understand how Spark Streaming lets you process continuous streams of data in real time.
  • Tune and troubleshoot large jobs running on a cluster.
  • Share information between nodes on a Spark cluster using broadcast variables and accumulators.
  • Understand how the GraphX library helps with network analysis problems.
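
The objectives above include running iterative algorithms such as breadth-first search on Spark. As a rough illustration, the sketch below expresses BFS as repeated map and reduce passes, which is the shape such a job takes with an RDD's flatMap and reduceByKey. It is plain Python so it runs without a cluster. The WHITE/GRAY/BLACK coloring scheme, the record layout, and the function names expand, merge, and bfs_distances are illustrative assumptions, not code from the course.

```python
# Plain-Python sketch (not Spark code) of breadth-first search as repeated
# map/reduce passes. Record layout is an illustrative assumption:
#   node -> (neighbors, distance, color)
# WHITE = not reached yet, GRAY = on the current frontier, BLACK = done.
WHITE, GRAY, BLACK = 0, 1, 2
INF = float("inf")


def expand(node, record):
    """Map step (what flatMap would do in Spark): a frontier node emits a
    GRAY candidate one hop further for each neighbor, then turns BLACK."""
    neighbors, dist, color = record
    emitted = []
    if color == GRAY:
        for neighbor in neighbors:
            emitted.append((neighbor, ((), dist + 1, GRAY)))
        color = BLACK
    emitted.append((node, (neighbors, dist, color)))
    return emitted


def merge(a, b):
    """Reduce step (what reduceByKey would do in Spark): keep the real
    neighbor list, the shortest distance, and the darkest color."""
    return (a[0] or b[0], min(a[1], b[1]), max(a[2], b[2]))


def bfs_distances(graph, source):
    """Return the hop distance from source to every node (INF if unreachable)."""
    records = {
        node: (tuple(nbrs),
               0 if node == source else INF,
               GRAY if node == source else WHITE)
        for node, nbrs in graph.items()
    }
    # Each loop iteration is one map pass followed by one reduce pass;
    # stop once no node is left on the frontier.
    while any(color == GRAY for _, _, color in records.values()):
        merged = {}
        for node, record in records.items():
            for key, value in expand(node, record):
                merged[key] = merge(merged[key], value) if key in merged else value
        records = merged
    return {node: dist for node, (_, dist, _) in records.items()}


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": [], "F": ["A"]}
    print(bfs_distances(graph, "A"))
```

Because merge is associative and commutative, the same reduce logic would be safe to run in parallel across partitions, which is why this formulation maps naturally onto Spark.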


Career Opportunities

Developers around the world use the Spark framework from several languages, such as Scala, Java, and Python. This flexibility lets teams write Spark applications in the language they prefer and build new applications faster.

Around the globe, large organizations have taken Spark seriously: companies such as Amazon, Yahoo, Alibaba, eBay, Hitachi, and Shopify have invested in Spark talent. Spark jobs cluster around a few use cases: about 78% involve batch processing of large data sets, 60% event stream processing, 56% fast, real-time data querying, and 55% improving programming productivity. Opportunities also span many industry segments, including:

  • Telecommunication/Networking
  • Banking and Finance
  • Retail
  • Software
  • Media and Entertainment
  • Consulting
  • Healthcare
  • Manufacturing
  • IT
  • Professional, scientific, and technical services

What Will You Cover?

  • Introduction 00:02:16
  • How to Use This Course 00:01:41
  • Getting Set Up: Installing Python, a JDK, Spark, and its Dependencies 00:14:50
  • Installing the MovieLens Movie Rating Dataset 00:03:35
  • Run your first Spark program! Ratings histogram example. 00:04:52
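
The first program in the course builds a histogram of movie ratings. A minimal sketch of that computation follows, assuming the MovieLens 100K u.data layout (tab-separated user id, item id, rating, timestamp). It mirrors the Spark pipeline in plain Python so it runs without a cluster; the sample lines and the helper names rating_of and ratings_histogram are hypothetical, and the PySpark equivalent appears only as a comment.

```python
from collections import Counter

# Hypothetical sample lines, made up for illustration, in the assumed
# u.data layout: user id, item id, rating, timestamp (tab-separated).
SAMPLE_LINES = [
    "1\t10\t4\t880000000",
    "1\t20\t5\t880000100",
    "2\t10\t3\t880000200",
    "3\t30\t4\t880000300",
    "3\t10\t1\t880000400",
]


def rating_of(line):
    """Map step: extract the third whitespace-separated field (the rating)."""
    return line.split()[2]


def ratings_histogram(lines):
    """Count how many times each rating appears, sorted by rating.

    In PySpark the same idea is roughly:
        sc.textFile(path).map(lambda x: x.split()[2]).countByValue()
    """
    counts = Counter(rating_of(line) for line in lines)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for rating, count in ratings_histogram(SAMPLE_LINES).items():
        print(rating, count)
```

Ratings are kept as strings here, matching what a text split produces; since they are single digits, sorting the string keys also sorts them numerically.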

Frank Kane

Founder: Sundog Education

Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers…
