The Ultimate Hands-On Hadoop: Tame your Big Data!

Frank Kane, Founder of Sundog Education, ex-Amazon

96 Lessons (14h 30m)
    • 1. Introduction, and Install Hadoop on Your Desktop

      17:47
    • 2. Hadoop Overview and History

      7:44
    • 3. Overview of the Hadoop Ecosystem

      16:46
    • 4. Please Follow Me on Skillshare!

      0:16
    • 5. Tips for Using This Course

      1:09
    • 6. HDFS: What it is, and how it works

      13:53
    • 7. [Activity] Install the MovieLens dataset into HDFS using the Ambari UI

      6:20
    • 8. [Activity] Install the MovieLens dataset into HDFS using the command line

      7:50
    • 9. MapReduce: What it is, and how it works

      10:40
    • 10. How MapReduce distributes processing

      12:57
    • 11. MapReduce example: Break down movie ratings by rating score

      11:35
    • 12. [Activity] Installing Python, MRJob, and nano

      7:43
    • 13. [Activity] Code up the ratings histogram MapReduce job and run it

      7:36
    • 14. [Exercise] Rank movies by their popularity

      7:06
    • 15. [Activity] Check your results against mine!

      8:23
    • 16. Introducing Ambari

      9:49
    • 17. Introducing Pig

      6:25
    • 18. Example: Find the oldest movie with a 5-star rating using Pig

      15:07
    • 19. [Activity] Find old 5-star movies with Pig

      9:40
    • 20. More Pig Latin

      7:34
    • 21. [Exercise] Find the most-rated one-star movie

      1:56
    • 22. Pig Challenge: Compare Your Results to Mine!

      5:37
    • 23. Why Spark?

      10:06
    • 24. The Resilient Distributed Dataset (RDD)

      10:13
    • 25. [Activity] Find the movie with the lowest average rating - with RDDs

      15:33
    • 26. Datasets and Spark 2.0

      6:28
    • 27. [Activity] Find the movie with the lowest average rating - with DataFrames

      10:00
    • 28. [Activity] Movie recommendations with MLlib

      12:16
    • 29. [Exercise] Filter the lowest-rated movies by number of ratings

      2:51
    • 30. [Activity] Check your results against mine!

      6:40
    • 31. What is Hive?

      6:31
    • 32. [Activity] Use Hive to find the most popular movie

      10:45
    • 33. How Hive works

      9:10
    • 34. [Exercise] Use Hive to find the movie with the highest average rating

      1:55
    • 35. Compare your solution to mine.

      4:10
    • 36. Integrating MySQL with Hadoop

      8:00
    • 37. [Activity] Install MySQL and import our movie data

      7:35
    • 38. [Activity] Use Sqoop to import data from MySQL to HDFS/Hive

      7:31
    • 39. [Activity] Use Sqoop to export data from Hadoop to MySQL

      7:16
    • 40. Why NoSQL?

      13:54
    • 41. What is HBase

      12:55
    • 42. [Activity] Import movie ratings into HBase

      13:28
    • 43. [Activity] Use HBase with Pig to import data at scale.

      11:19
    • 44. Cassandra overview

      14:50
    • 45. [Activity] Installing Cassandra

      11:43
    • 46. [Activity] Write Spark output into Cassandra

      11:00
    • 47. MongoDB overview

      16:54
    • 48. [Activity] Install MongoDB, and integrate Spark with MongoDB

      12:44
    • 49. [Activity] Using the MongoDB shell

      7:48
    • 50. Choosing a database technology

      15:59
    • 51. [Exercise] Choose a database for a given problem

      5:00
    • 52. Overview of Drill

      7:55
    • 53. [Activity] Setting Up Drill

      10:58
    • 54. [Activity] Querying across multiple databases with Drill

      7:07
    • 55. Overview of Phoenix

      8:55
    • 56. [Activity] Install Phoenix and query HBase with it

      7:08
    • 57. [Activity] Integrate Phoenix with Pig

      11:45
    • 58. Overview of Presto

      6:39
    • 59. [Activity] Install Presto, and query Hive with it.

      12:26
    • 60. [Activity] Query both Cassandra and Hive using Presto.

      9:01
    • 61. YARN explained

      10:01
    • 62. Tez explained

      4:56
    • 63. [Activity] Use Hive on Tez and measure the performance benefit

      8:35
    • 64. Mesos explained

      7:13
    • 65. ZooKeeper explained

      13:10
    • 66. [Activity] Simulating a failing master with ZooKeeper

      6:47
    • 67. Oozie explained

      11:56
    • 68. [Activity] Set up a simple Oozie workflow

      16:39
    • 69. Zeppelin overview

      5:01
    • 70. [Activity] Use Zeppelin to analyze movie ratings, part 1

      12:28
    • 71. [Activity] Use Zeppelin to analyze movie ratings, part 2

      9:46
    • 72. Hue overview

      8:07
    • 73. Other technologies worth mentioning

      4:35
    • 74. Kafka explained

      9:48
    • 75. [Activity] Setting up Kafka, and publishing some data.

      7:24
    • 76. [Activity] Publishing web logs with Kafka

      10:21
    • 77. Flume explained

      10:16
    • 78. [Activity] Set up Flume and publish logs with it.

      7:46
    • 79. [Activity] Set up Flume to monitor a directory and store its data in HDFS

      9:12
    • 80. Spark Streaming: Introduction

      14:27
    • 81. [Activity] Analyze web logs published with Flume using Spark Streaming

      14:20
    • 82. [Exercise] Monitor Flume-published logs for errors in real time

      2:02
    • 83. Exercise solution: Aggregating HTTP access codes with Spark Streaming

      4:24
    • 84. Apache Storm: Introduction

      9:27
    • 85. [Activity] Count words with Storm

      14:35
    • 86. Flink: An Overview

      6:53
    • 87. [Activity] Counting words with Flink

      10:20
    • 88. The Best of the Rest

      9:24
    • 89. Review: How the pieces fit together

      6:29
    • 90. Understanding your requirements

      8:02
    • 91. Sample application: consume webserver logs and keep track of top-sellers

      10:06
    • 92. Sample application: serving movie recommendations to a website

      11:18
    • 93. [Exercise] Design a system to report web sessions per day

      2:52
    • 94. Exercise solution: Design a system to count daily sessions

      4:24
    • 95. Books and online resources

      5:32
    • 96. Let's Stay in Touch

      0:46

About This Class

Learn and master the most popular big data technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We'll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.

  • Install and work with a real Hadoop installation right on your desktop with Hortonworks and the Ambari UI
  • Manage big data on a cluster with HDFS and MapReduce
  • Write programs to analyze data on Hadoop with Pig and Spark
  • Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto
  • Design real-world systems using the Hadoop ecosystem
  • Learn how your cluster is managed with YARN, Mesos, ZooKeeper, Oozie, Zeppelin, and Hue
  • Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm