Big Data: Hadoop

Technology - Frank Kane

95 Lessons (14h 30m)

The world of Hadoop and "Big Data" is made up of hundreds of different technologies with cryptic names that form the Hadoop ecosystem. With this course, you'll not only understand what those systems are and how they fit together, but you'll also go hands-on and learn how to use them to solve real business problems. This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It's filled with hands-on activities and exercises, so you get real experience using Hadoop; it's not just theory. By the end, you can expect to leave the course with a real, deep understanding of Hadoop and its associated distributed systems, and to be able to apply Hadoop to real-world problems.

Learning Objectives:

  • Design distributed systems that manage "big data" using Hadoop and related technologies.
  • Use HDFS and MapReduce for storing and analyzing data at scale.
  • Use Pig and Spark to create scripts to process data on a Hadoop cluster in more complex ways.
  • Analyze relational data using Hive and MySQL.
  • Analyze non-relational data using HBase, Cassandra, and MongoDB.
  • Query data interactively with Drill, Phoenix, and Presto.
  • Choose an appropriate data storage technology for your application.
  • Understand how Hadoop clusters are managed by YARN, Tez, Mesos, Zookeeper, Zeppelin, Hue, and Oozie.
  • Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume.
  • Consume streaming data using Spark Streaming, Flink, and Storm.

Instructor

Frank Kane

Founder: Sundog Education

Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically…

What will you cover?

  •   Introduction 00:16:59
  •   Hadoop Overview 00:07:44
  •   Overview of the Hadoop Ecosystem 00:16:46
  •   Tips and Tricks 00:01:09

Project Description

Our final project for this course is described in the videos: imagine you work for some big website, and your manager wants a graph of the total number of sessions on your website per day. For some reason, they don't want to use Google Analytics or some other existing service, so you need to build your own!

The requirements are:

  • The job will be run daily based on the previous day's activity.
  • Sessions are defined as traffic from the same IP address within a sliding one-hour window.
  • Assume your existing weblogs do not have sessions already assigned to them.
  • The data is only to be used internally for analytic purposes.

How would you design a system using the tools you've learned about in this course to meet this demand? The hard part is maintaining session data as website hits come in.

There is no correct answer, but you will see our approach at the end of the course.
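
To make the session rule concrete before thinking about cluster tooling, here is a minimal single-machine sketch in plain Python. It is not the course's solution. It assumes one reading of "sliding one-hour window": a hit from an IP address extends that IP's current session if it arrives within one hour of the previous hit from the same IP, and otherwise starts a new session. The function name, the (ip, timestamp) input shape, and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a gap of more than one hour between hits from the same IP
# ends the current session and starts a new one.
SESSION_GAP = timedelta(hours=1)


def count_sessions_per_day(hits):
    """Count session starts per calendar day.

    hits: iterable of (ip, timestamp) pairs, in any order.
    Returns a dict mapping each date to the number of sessions that began on it.
    """
    last_seen = {}               # ip -> timestamp of that IP's most recent hit
    sessions = defaultdict(int)  # date -> number of sessions started that day

    # Process hits in time order so each hit is compared to the previous one.
    for ip, ts in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(ip)
        if previous is None or ts - previous > SESSION_GAP:
            sessions[ts.date()] += 1  # first hit, or the gap was too long
        last_seen[ip] = ts            # every hit slides the window forward

    return dict(sessions)


# Hypothetical sample data, for illustration only.
sample_hits = [
    ("10.0.0.1", datetime(2024, 1, 1, 9, 0)),
    ("10.0.0.1", datetime(2024, 1, 1, 9, 30)),   # same session (30 min gap)
    ("10.0.0.1", datetime(2024, 1, 1, 11, 0)),   # new session (90 min gap)
    ("10.0.0.2", datetime(2024, 1, 1, 9, 15)),
    ("10.0.0.2", datetime(2024, 1, 2, 8, 0)),    # new session on the next day
]

daily_counts = count_sessions_per_day(sample_hits)
for day, total in sorted(daily_counts.items()):
    print(day, total)
```

The in-memory `last_seen` dictionary is the piece that does not scale as-is: on a real cluster, that per-IP state has to live somewhere that survives between hits, which is the hard part the project description points to.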
