Big data analysis with Apache spark - PySpark Python | Ankit Mistry | Skillshare

Ankit Mistry, Big data and machine learning engineer

24 Videos (2h 16m)
  • Introduction (1:04)
  • Big data Overview (10:18)
  • Traditional Data Storage and Processing Software vs Big data (3:32)
  • Time Line of Big data and Hadoop based Eco-Systems (4:37)
  • What is Apache Spark (7:03)
  • Spark API Overview (3:55)
  • Getting started with Data bricks - For eager Sparker (11:38)
  • Different Ways of Installation (3:48)
  • Cloud Digital Ocean Setup - Installation -1 (8:01)
  • Python3 and Jupyter notebook Installation -2 (6:23)
  • Install Java, Scala, Py4j, Spark - Installation -3 (7:01)
  • Set Path variable and start Jupyter notebook - Installation -4 (6:27)
  • Spark Data frame API - Introduction (2:38)
  • Spark Session (3:03)
  • Import JSON data into Dataframe (4:52)
  • Define Custom schemaType (4:09)
  • Data frame as SQL Table (3:48)
  • Data frame Operation - 1 (2:15)
  • Data frame Operation - 2 (8:50)
  • Filter data (3:03)
  • Handling Missing data (6:11)
  • Dealing with datetime in Dataframe (4:41)
  • Introduction to Structured Streaming (6:56)
  • Streaming example (11:52)

About This Class

Spark can run up to 100x faster than the Hadoop MapReduce data processing framework, which makes Apache Spark one of the most in-demand skills.

Top companies such as Google, Facebook, Microsoft, Amazon, and Airbnb use Apache Spark to solve their big data problems. Analyzing huge amounts of data is one of the most valuable skills nowadays, and this course teaches the skills needed to compete in the big data job market.

This course will teach:

  • Introduction to big data and Apache Spark
  • Getting started with Databricks
  • Detailed installation steps on an Ubuntu Linux machine
  • Python refresher for newcomers
  • Apache Spark DataFrame API
  • Apache Spark Structured Streaming with an end-to-end example
  • Basics of machine learning and feature engineering with Apache Spark

This course is not yet complete; new content related to Spark ML will be added.

Note: This course teaches only the Spark 2.0 DataFrame-based API, not the RDD-based API, because the DataFrame-based API is the future of Spark.

Regards

Ankit Mistry

10 Students, 1 Project


I am Ankit Mistry. I completed my master's degree at IIT Kharagpur in the area of machine learning and artificial intelligence. I now work as a software developer and big data engineer at a leading private investment bank, with 8+ years of experience in the software industry.
Over time, I developed an interest in the data discipline and learned about data analysis and machine learning model development.

I am so excited to be on the Skillshare online learning platform.

I hope y...
