The Ultimate Hands-On Hadoop: Tame your Big Data!

Frank Kane, Founder of Sundog Education, ex-Amazon

94 Videos (14h 29m)
    • [Activity] Introduction, and install Hadoop on your desktop!

      17:50
    • Hadoop Overview and History

      7:44
    • Overview of the Hadoop Ecosystem

      16:46
    • Tips for Using This Course

      1:09
    • HDFS: What it is, and how it works

      13:53
    • [Activity] Install the MovieLens dataset into HDFS using the Ambari UI

      6:20
    • [Activity] Install the MovieLens dataset into HDFS using the command line

      7:50
    • MapReduce: What it is, and how it works

      10:40
    • How MapReduce distributes processing

      12:57
    • MapReduce example: Break down movie ratings by rating score

      11:35
    • [Activity] Installing Python, MRJob, and nano

      7:43
    • [Activity] Code up the ratings histogram MapReduce job and run it

      7:36
    • [Exercise] Rank movies by their popularity

      7:06
    • [Activity] Check your results against mine!

      8:23
    • Introducing Ambari

      9:49
    • Introducing Pig

      6:25
    • Example: Find the oldest movie with a 5-star rating using Pig

      15:07
    • [Activity] Find old 5-star movies with Pig

      9:40
    • More Pig Latin

      7:34
    • [Exercise] Find the most-rated one-star movie

      1:56
    • Pig Challenge: Compare Your Results to Mine!

      5:37
    • Why Spark?

      10:06
    • The Resilient Distributed Dataset (RDD)

      10:13
    • [Activity] Find the movie with the lowest average rating - with RDDs

      15:33
    • Datasets and Spark 2.0

      6:28
    • [Activity] Find the movie with the lowest average rating - with DataFrames

      10:00
    • [Activity] Movie recommendations with MLlib

      12:16
    • [Exercise] Filter the lowest-rated movies by number of ratings

      2:51
    • [Activity] Check your results against mine!

      6:40
    • What is Hive?

      6:31
    • [Activity] Use Hive to find the most popular movie

      10:45
    • How Hive works

      9:10
    • [Exercise] Use Hive to find the movie with the highest average rating

      1:55
    • Compare your solution to mine.

      4:10
    • Integrating MySQL with Hadoop

      8:00
    • [Activity] Install MySQL and import our movie data

      7:35
    • [Activity] Use Sqoop to import data from MySQL to HDFS/Hive

      7:31
    • [Activity] Use Sqoop to export data from Hadoop to MySQL

      7:16
    • Why NoSQL?

      13:54
    • What is HBase?

      12:55
    • [Activity] Import movie ratings into HBase

      13:28
    • [Activity] Use HBase with Pig to import data at scale.

      11:19
    • Cassandra overview

      14:50
    • [Activity] Installing Cassandra

      11:43
    • [Activity] Write Spark output into Cassandra

      11:00
    • MongoDB overview

      16:54
    • [Activity] Install MongoDB, and integrate Spark with MongoDB

      12:44
    • [Activity] Using the MongoDB shell

      7:48
    • Choosing a database technology

      15:59
    • [Exercise] Choose a database for a given problem

      5:00
    • Overview of Drill

      7:55
    • [Activity] Setting Up Drill

      10:58
    • [Activity] Querying across multiple databases with Drill

      7:07
    • Overview of Phoenix

      8:55
    • [Activity] Install Phoenix and query HBase with it

      7:08
    • [Activity] Integrate Phoenix with Pig

      11:45
    • Overview of Presto

      6:39
    • [Activity] Install Presto, and query Hive with it.

      12:26
    • [Activity] Query both Cassandra and Hive using Presto.

      9:01
    • YARN explained

      10:01
    • Tez explained

      4:56
    • [Activity] Use Hive on Tez and measure the performance benefit

      8:35
    • Mesos explained

      7:13
    • ZooKeeper explained

      13:10
    • [Activity] Simulating a failing master with ZooKeeper

      6:47
    • Oozie explained

      11:56
    • [Activity] Set up a simple Oozie workflow

      16:39
    • Zeppelin overview

      5:01
    • [Activity] Use Zeppelin to analyze movie ratings, part 1

      12:28
    • [Activity] Use Zeppelin to analyze movie ratings, part 2

      9:46
    • Hue overview

      8:07
    • Other technologies worth mentioning

      4:35
    • Kafka explained

      9:48
    • [Activity] Setting up Kafka, and publishing some data.

      7:24
    • [Activity] Publishing web logs with Kafka

      10:21
    • Flume explained

      10:16
    • [Activity] Set up Flume and publish logs with it.

      7:46
    • [Activity] Set up Flume to monitor a directory and store its data in HDFS

      9:12
    • Spark Streaming: Introduction

      14:27
    • [Activity] Analyze web logs published with Flume using Spark Streaming

      14:20
    • [Exercise] Monitor Flume-published logs for errors in real time

      2:02
    • Exercise solution: Aggregating HTTP access codes with Spark Streaming

      4:24
    • Apache Storm: Introduction

      9:27
    • [Activity] Count words with Storm

      14:35
    • Flink: An Overview

      6:53
    • [Activity] Counting words with Flink

      10:20
    • The Best of the Rest

      9:24
    • Review: How the pieces fit together

      6:29
    • Understanding your requirements

      8:02
    • Sample application: consume webserver logs and keep track of top-sellers

      10:06
    • Sample application: serving movie recommendations to a website

      11:18
    • [Exercise] Design a system to report web sessions per day

      2:52
    • Exercise solution: Design a system to count daily sessions

      4:24
    • Books and online resources

      5:32
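
Several of the MapReduce lessons above revolve around building a histogram of MovieLens movie ratings with Python and MRJob. As a rough idea of what that style of job looks like, here is a minimal sketch (an illustration only, not necessarily the course's exact solution) that assumes the tab-separated MovieLens u.data format of userID, movieID, rating, and timestamp:

    # Minimal MRJob sketch: count how many times each rating score appears.
    # Assumes tab-separated MovieLens u.data lines (userID, movieID, rating, timestamp).
    from mrjob.job import MRJob

    class RatingsBreakdown(MRJob):
        def mapper(self, _, line):
            # Emit each rating score with a count of 1.
            user_id, movie_id, rating, timestamp = line.split('\t')
            yield rating, 1

        def reducer(self, rating, counts):
            # Sum the 1s emitted for each rating score.
            yield rating, sum(counts)

    if __name__ == '__main__':
        RatingsBreakdown.run()

A job like this runs locally with "python RatingsBreakdown.py u.data", and MRJob's -r hadoop runner submits it to the cluster instead.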

About This Class

Learn and master the most popular big data technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We'll go way beyond Hadoop itself and dive into all sorts of distributed systems you may need to integrate with.

  • Install and work with a real Hadoop installation right on your desktop with Hortonworks and the Ambari UI
  • Manage big data on a cluster with HDFS and MapReduce
  • Write programs to analyze data on Hadoop with Pig and Spark (a short Spark sketch follows this list)
  • Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto
  • Design real-world systems using the Hadoop ecosystem
  • Learn how your cluster is managed with YARN, Mesos, ZooKeeper, and Oozie, and interact with it through notebooks and UIs like Zeppelin and Hue
  • Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm
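
To give a flavor of the Spark material, here is a small, hedged sketch (not the course's exact solution) that loads the MovieLens ratings from HDFS into a Spark DataFrame and lists the lowest-rated movies that have a reasonable number of ratings. The HDFS path and the ten-rating cutoff are illustrative assumptions:

    # Hedged PySpark sketch: find the worst-rated MovieLens movies by average rating.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, IntegerType, LongType

    spark = SparkSession.builder.appName("WorstMovies").getOrCreate()

    # Column layout of the MovieLens u.data file; the HDFS path below is illustrative.
    schema = StructType([
        StructField("userID", IntegerType()),
        StructField("movieID", IntegerType()),
        StructField("rating", IntegerType()),
        StructField("timestamp", LongType()),
    ])
    ratings = (spark.read
               .option("sep", "\t")
               .schema(schema)
               .csv("hdfs:///user/maria_dev/ml-100k/u.data"))

    worst = (ratings.groupBy("movieID")
             .agg(F.count("rating").alias("numRatings"),
                  F.avg("rating").alias("avgRating"))
             .filter(F.col("numRatings") >= 10)   # assumed cutoff to skip rarely rated movies
             .orderBy("avgRating"))

    worst.show(10)
    spark.stop()

A script like this would typically be submitted to the cluster with spark-submit; mapping movie IDs back to titles (via the u.item file) is a natural extension.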

145 Students

Frank Kane

Founder of Sundog Education, ex-Amazon

Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers around the clock. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology and on teaching others about big data analysis.
