

The Theory of Deep Learning - Become a Data Scientist

Merishna Suwal, Teaching Data Science on Skillshare


Lessons in This Class

8 Lessons (32m)
  • 1. Welcome to the course! (1:14)
  • 2. Inspiration behind deep learning (1:42)
  • 3. What are neurons? (4:40)
  • 4. Computations in a neuron (5:01)
  • 5. Activation functions (5:11)
  • 6. How does a neuron learn? (4:46)
  • 7. Deep Neural Network (4:31)
  • 8. How does a Deep Neural Network learn (4:31)


About This Class

Learn the theory of deep learning in this detailed course created by The Click Reader.

In this course, you will learn about the inspiration behind deep learning and how it relates to the human brain. You will also gain a clear understanding of the building blocks of deep learning, called neurons, along with how they compute, make predictions, and learn.

We will then move on to the theory of deep neural networks, including how data is fed into them, how neurons compute on the data, and how predictions are made. We'll end the course by learning how deep neural networks learn, or train, using a combination of feed-forward and back-propagation cycles.

Also, do not worry if you're not great at mathematics: we've covered all the necessary mathematical concepts in the course itself, along with real-life examples.

Meet Your Teacher


Merishna Suwal

Teaching Data Science on Skillshare


Hi! I'm Merishna, a data scientist from Nepal dedicated to providing top-notch educational courses related to data visualization, data science, and machine learning on the Skillshare platform.

I hope to impart the knowledge that I have gained in my professional career through the courses I've put up on Skillshare.




Transcripts

1. Welcome to the course!: Hello and welcome to this course on the theory of deep learning. I'm Merishna, a data scientist at The Click Reader, and I'll be your instructor for this course. Deep learning is a type of machine learning algorithm that powers the state-of-the-art technological solutions of the present decade. Becoming a deep learning researcher or an industrial data scientist is one of the best career decisions you can make: most Fortune 500 tech companies, such as Facebook, Google, and Amazon, have been heavily investing in deep learning research. In this course, we will be learning about the theory of deep learning, including what a neuron is and how it learns, what activation functions are, and how to build and train your own deep neural network from scratch. This course is aimed at aspiring data scientists looking to build powerful solutions. So if you are one, feel free to join us in this course. And if you're ready, let us hop on to the course and start learning.

2. Inspiration behind deep learning: Hello and welcome to this lesson on the inspiration behind deep learning. In this lesson, we will discuss some of the introductory concepts related to deep learning and its origin. So let's get started. Deep learning algorithms have been heavily inspired by the human brain and its ability to generalize learning. Generalization allows the human brain to abstract knowledge from previously seen problems for solving new and unseen problems. Before the discovery of deep learning algorithms, machine learning algorithms couldn't generalize well. A lot of researchers spent time learning how to mimic the human brain, and deep learning soon came to light. The most obvious similarity between deep learning algorithms and the human brain is the presence of neurons. Although the working of a neuron in the human brain differs from that of a neuron in a deep learning algorithm, they have an abstractly similar learning mechanism: in both cases, neurons learn from the input provided to them, be it through the five basic human senses or through rows of numerical data. However, deep learning algorithms are still very different from a human brain. In the next few lessons, we will understand what neurons are in deep learning and how they learn. We will also understand how a deep neural network is formed by a combination of neurons. So I'll see you in the next lesson.

3. What are neurons?: Hello and welcome to this lesson on what neurons are. In this lesson, we will have a brief introduction to the key concepts related to the structure and working of a neuron. So let's get started. Neurons are computational units which take in real-valued inputs and compute real-valued outputs using mathematical formulae. The input x, also known as a feature, when passed through a neuron, gives an output value ŷ, which is the prediction. As an example, we can take the prediction of a flood, with rainfall and temperature as inputs to a neuron, which then predicts an output value determining the possibility of a flood. The features are weighted based on their significance in determining the output, as shown in the figure: the weight for the input feature x1, the rainfall, is w1, and the weight for the feature x2, the temperature, is w2. Looking at the full picture, a neuron takes in inputs passed from the input layer with their respective weights and processes the values to provide the prediction as its output through the output layer. So let us understand how a neuron computes its output given a set of inputs.
It follows a basic two-step process, which we will be discussing shortly in this lesson. The first step involves the calculation of the linear combination of the inputs and their weights, plus a bias. The second step involves the calculation of the activation function, resulting in the neuron's output. So let us talk about each of these steps in detail.

The first step in computing the output of a neuron involves the calculation of the linear combination of the inputs and their weights, plus a bias. A linear combination is essentially the combination obtained by adding the products of each of the inputs and their weights together. A bias is added so as to prevent the resulting value from being 0 when the linear combination is 0. In the equation z = w1·x1 + w2·x2 + ... + b, the sum of the w·x terms is the linear combination, and b is the bias. This can be written in matrix notation as z = wᵀx + b, where wᵀx represents the linear combination of the weights and the inputs, wᵀ is the transpose of the one-dimensional matrix of weights, and x is the one-dimensional matrix of inputs. This is quite similar to the equation of a straight line with intercept b. As we can see, the plot on the left is essentially the equation of a straight line, y = mx + b, with the line passing through the origin, meaning the value of b is 0. On the other hand, the plot on the right represents the equation of a straight line, y = mx + b, with the line raised a certain height b above the origin, that is, the intercept.

The second step in computing the output of a neuron involves the calculation of the activation function. The output, denoted by ŷ, is calculated based on the activation function, that is f(z), which is essentially the value wᵀx + b passed through a function f. I hope all of these notations were not too overwhelming for you. As a quick recap, we learned about the basic structure and working of a neuron: the input layer for passing the input features along with the weights, the neuron for processing the input, and the output layer for predicting the output value. We also discussed the steps for computation of the output, that is, calculation of the linear combination plus the bias, and calculation of the activation function. In the next lesson, we will understand what all of these mathematical notations mean through a real-life example.
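As a quick illustration of this two-step process, here is a minimal Python sketch of a single neuron's forward computation. The input values, weights, bias, and the choice of a sigmoid activation are placeholders assumed for the example, not values from the course.

```python
import numpy as np

def sigmoid(z):
    # Activation function: squashes z into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b):
    # Step 1: linear combination of inputs and weights, plus bias
    z = np.dot(w, x) + b    # z = w^T x + b
    # Step 2: pass z through the activation function
    return sigmoid(z)       # y_hat = f(z)

# Example with two input features and illustrative weights
x = np.array([1.5, 2.0])    # input features
w = np.array([0.4, -0.3])   # weights (assumed for illustration)
b = 0.1                     # bias
print(neuron_output(x, w, b))
```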
4. Computations in a neuron: Hello and welcome to this lesson on computations in a neuron. In this lesson, we will learn in depth how a neuron works and computes its output values. So let's get started. As we learned in the previous lesson, a neuron computes its output in a total of two steps. The first step involves the calculation of the linear combination of the inputs and their weights, along with the bias. The second step involves the calculation of the activation function, resulting in the neuron's output. Now, the two features, or independent variables, that we discussed in our flood prediction example earlier were rainfall and temperature. So let's consider the following data point: we have the rainfall in millimeters as the first input, 80, and the temperature in degrees as the second input, 30. The condition, or the label for prediction, denoted by y, is flood. So let us see how we can predict the condition using only the information of the rainfall and temperature. Let us have a brief look at the diagram for the flood prediction problem. We have two input features, rainfall and temperature, with their respective weights, and a neuron which processes the inputs and predicts the output value as flood or no flood. Here, the value of the weight determines the importance of the feature, so we'll assign a higher weight to the feature rainfall, as it is bound to affect the possibility of a flood more than temperature does. So, the first step: calculate the linear combination of the inputs and weights along with the bias, which is given as the sum of the products of the inputs and the weights, plus the bias. Now let us go ahead and substitute the values of the inputs, their weights, and the bias into the function. We have our value of input x1 as 80 mm and x2 as 30 degrees, as we supposed earlier. Now, as we discussed, we know that rainfall affects the possibility of a flood, that is, our prediction, more than the second feature, the temperature. We will hence assign the weight for rainfall as 0.2, while for temperature it is 0.00001, which is quite a lot lower than w1. Also, we have added a value of 1 as the bias. So, on substituting and computing these values in our equation for the two input features, that is z = x1·w1 + x2·w2 + b, we get a value of 17.0003 as the value of z. The second step is to calculate the activation function to get the output from the neuron, that is ŷ. We will be computing the activation function of our value z, which is also a function of wᵀx + b. We will discuss activation functions in detail in our next lesson. Here we are using the sigmoid function, given by the equation f(z) = 1 / (1 + e^(−z)), as the activation function. The sigmoid function gives the probability of the input data belonging to a certain class. That is, for two classes A (flood) and B (no flood), if the probability of the output is greater than 0.5, the input data falls in class A; otherwise, it falls in class B. So we will now substitute the value of z that we previously calculated into the function 1 / (1 + e^(−z)), which gives us an output prediction value of approximately 0.99. Our output prediction has a probability greater than 0.5, and thus the output of the neuron states that the possibility is flood. But how did the prediction become so accurate, just because we passed the input data through two mathematical functions? Well, this is purely the result of luck. If the weights we had randomly initialized had been different from the ones in the above example, then our prediction may have been different. Our neuron here is not learning anything from the data; it is just computing on the input data given to it. Keep this in mind as we move on to future lessons for learning how to train a neuron to find the best values of the weights. But for now, congratulations on knowing what neurons in deep learning are.
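To make the arithmetic concrete, here is a minimal Python sketch of this flood prediction example, using the inputs, weights, and bias from the lesson; the 0.5 decision threshold follows the sigmoid classification rule described above.

```python
import math

def sigmoid(z):
    # Sigmoid activation: f(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Inputs and parameters from the worked example
x1, x2 = 80.0, 30.0        # rainfall (mm), temperature (degrees)
w1, w2 = 0.2, 0.00001      # weights (rainfall weighted far more heavily)
b = 1.0                    # bias

# Step 1: linear combination plus bias
z = x1 * w1 + x2 * w2 + b  # 16 + 0.0003 + 1 = 17.0003

# Step 2: activation
y_hat = sigmoid(z)         # ~0.9999..., i.e. greater than 0.5

print("Prediction:", "flood" if y_hat > 0.5 else "no flood")
```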
5. Activation functions: Hello and welcome to this lesson on activation functions. In this lesson, we will learn about the various activation functions used in deep learning. So let's get started. In our previous lesson, we discussed activation functions only briefly, since they require a lesson of their own. So let's dive into the concept of activation functions before learning how to train a neuron. An activation function is a non-linear function that determines the output of a neuron. It changes the output of the linear combination of inputs and weights to a non-linear range of values. So let us quickly refresh our knowledge of linear and non-linear functions. A linear function is a function that has a constant slope. As we can see, the function in the figure, f(x) = 2x + 2, has a straight-line plot with a constant slope. On the other hand, a non-linear function is a function whose slope varies across data points. As we can see, the function in the figure, f(x) = 2x², has a parabolic curve with different values of the slope at different points of the curve. Now let us understand why activation functions are important and used in deep learning. Suppose we have data points of two classes, A and B, arranged in a graphical plot as shown in the figure, and we want a function that can separate these data points into two groups. Differentiating these classes using a linear function, we can see that the line does not do a good job of dividing the two classes into different groups. Now let's see what happens when we try to divide them with the help of a non-linear function: it can clearly be seen that the curve of this function does the job quite well, dividing the two classes into different groups. So let us relate this concept to why we use activation functions in a neuron. The linear combination of inputs and their weights is a linear function, so in order to make the output of the linear function non-linear, we make use of an activation function. Some of the popular activation functions are sigmoid, tanh, ReLU, and Leaky ReLU, so let us discuss them one by one. The first activation function we're going to discuss is the sigmoid function, which we used as an example in the previous lesson. The sigmoid activation function gives a value between 0 and 1 and uses the function f(z) = 1 / (1 + e^(−z)). This function is also known as the logistic function and has an S-shaped curve known as the sigmoid curve. Next, we have the hyperbolic tangent function, also known as the tanh function. This activation function gives a value between −1 and 1 and uses the function f(z) = (e^z − e^(−z)) / (e^z + e^(−z)). The function has a curve similar to the sigmoid that goes from y = −1 to y = 1. Next, we have ReLU, the rectified linear unit function. This activation function gives the input directly as the output if it is positive; otherwise, it outputs 0. It can be represented by the function f(z) = max(0, z), with the plot as shown here. The last activation function we're going to discuss in this lesson is the Leaky ReLU activation function, which is a variation of the ReLU activation function. The Leaky ReLU activation function gives the input directly as the output if it is positive; otherwise, it outputs a smaller version of the input. For a value z smaller than 0, a small value α is multiplied with z to create a slight information leak in the left part of the ReLU, that is, the part where the output is always 0, which can be seen slanted here in the figure. This is done so as to deal with the dying ReLU problem, which causes the ReLU to output 0 for every data point. This problem is caused when a neuron is continuously supplied with negative input values, which results in the neuron getting stuck at outputting 0 every time. So that's it for activation functions. In the next lesson, we will understand how a neuron learns using an algorithm called gradient descent. So I'll see you in the next lesson.
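Here is a minimal Python sketch of the four activation functions named in this lesson. The leak coefficient alpha = 0.01 is a common illustrative choice, not a value specified in the course.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Hyperbolic tangent: output in (-1, 1)
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def relu(z):
    # Rectified linear unit: passes positive values, outputs 0 otherwise
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but leaks a small multiple of z for negative inputs
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(z))
```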
6. How does a neuron learn?: Hello and welcome to this lesson on how a neuron learns. In this lesson, we will discuss the foundational concepts of how a neuron actually learns from data. So let's get started. In order to understand how a neuron learns, let us first understand how we humans learn to solve mathematical problems. Using this knowledge, we can easily break down the topic of this lesson in relation to its inspiration, the human brain. So let us consider the following mathematical equation, which contains two unknown variables, x and y. A simple yet time-consuming method to solve for both x and y is to randomly guess their values. This is similar to what we did in the previous lesson when learning how a neuron computes its output. An alternative method is to use a metric called the error to determine how to change our values with respect to it. It can be done by performing the following steps in sequence. We start by randomly choosing a value for both x and y. Using the values of x and y in the equation, we calculate the value of the error between the guess, that is, the left-hand side, and the actual value on the right-hand side. If the error is positive, we decrease the values of x and y alternately until the values of the left-hand side and the right-hand side are equal. If the error is negative, we increase the values of x and y alternately until the values of the left-hand side and the right-hand side are equal. This is much better than random guessing because we move towards the solution gradually. Also, if we made the guess very close to the actual values, we might find the solution in just two or three iterations. The step size of the increment or decrement speeds up the process; however, choosing the right amount for the increment or decrement steps becomes essential to ensure a gradual decrease in the error and prevent any cases of overshooting, or deviation from the optimum value of zero error. This method is also called variable optimization, where we try to optimize the variables by analyzing their change in error. A neuron uses the same technique to learn, in an effort to mimic how the human brain solves a problem. If we substitute the variables x and y with the weights w1 and w2 and set the bias b to 0, we are essentially computing the linear combination of the weights. So let us now extend this concept to understand how a neuron learns. It starts by choosing a random set of weights initially. It then calculates the error between the guess and the actual value. The neuron then updates the weights assigned to each input gradually, and repeats each iteration until the loss is minimized. The error is calculated using a loss function, denoted by G(W). For our previous example, the loss function can be written as the difference between the predicted value, that is, the guess, and the actual value. For the increment or decrement of the weights, we use an optimization function called gradient descent. The formula for gradient descent to update the weights is as follows: the value of the new weight, W(n+1), is given by the difference between the current weight, W(n), and the product of the learning rate, denoted by α, and the partial derivative of the error, ∇G(W(n)), also known as the gradient of the loss function. In other words, W(n+1) = W(n) − α·∇G(W(n)). The gradient of the error, or the loss function, is the matrix consisting of the partial derivatives of the loss function with respect to each of the weights, as shown here. This entire process is iterated multiple times until the loss of the neuron reaches a certain threshold. Once the model reaches the threshold, the learning process, or the training, is stopped. Therefore, a neuron learns simply by updating its weights to minimize its error.
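As a rough illustration of this update rule, here is a minimal Python sketch of gradient descent for a single linear neuron. The toy data, learning rate, and the use of a squared-error loss (rather than the plain difference mentioned above) are assumptions made so the example converges cleanly; it is a sketch, not the course's exact procedure.

```python
import numpy as np

# Toy data: one feature, with the true relationship y = 3x
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([3.0, 6.0, 9.0, 12.0])

w = np.random.randn()   # start from a random weight
alpha = 0.01            # learning rate

for step in range(200):
    y_hat = w * X                    # the neuron's guess
    error = y_hat - Y                # guess minus actual value
    loss = np.mean(error ** 2)       # squared-error loss G(W)
    grad = np.mean(2 * error * X)    # dG/dw, the gradient
    w = w - alpha * grad             # W(n+1) = W(n) - alpha * gradient
    if loss < 1e-6:                  # stop once the loss passes a threshold
        break

print(f"learned weight: {w:.4f} (true weight is 3)")
```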
In the next lesson, we will finally see how the neuron acts as the backbone of deep learning through the use of deep neural networks. So I'll see you in the next lesson.
8. How does a Deep Neural Network learn: Hello and welcome to this lesson on how a deep neural network learns. In this lesson, we will be learning about how a deep neural network learns from its input to give an output prediction. So let's get started. The learning process of a deep neural network is very similar to the learning process of a single neuron. It goes through multiple finite iterations, in which data is fed forward into the network and the weights are then updated using backpropagation. What we have here is the feed-forward view of a simple deep neural network with interconnected layers of neurons. The aim of the deep neural network is to update the value of the output, that is ŷ, such that the error, or the loss function, is minimized. This is done by backpropagation, which involves fine-tuning the values of the weights based on the error obtained in the previous iteration. This is quite similar to the concept of how a neuron learns that we discussed earlier. So let us understand how a neural network learns. It starts by choosing a random set of weights initially. Upon computation of the output value, that is, the guess, it calculates the error between the guess and the actual value. Based on the error obtained, the network then backpropagates towards the input layer in order to update and adjust the weights such that the error is minimized. This process is repeated until the loss is minimized, or in other words, a desired value of error is reached. Here we can see the example of a deep neural network with an input layer connected to two hidden layers with two neurons each, and an output layer. Each of the connections has a respective weight associated with it. These weights are taken at random for the first iteration of the network, from the input layer to the output layer. Upon calculating the output value, we calculate the error metric, based on which further iterations and the values of the weights are determined. The error is calculated using a loss function, denoted by G(W), which is also known as the cost function. The loss function can be calculated using the formula G(W) equals the difference between the predicted value and the actual value. Now, for the increment or decrement, that is, the tuning of the weights, we use an optimization algorithm called gradient descent. We've already discussed gradient descent in our previous lesson, and the formula for gradient descent to update the weights is the same: the new weight W(n+1) is given as the difference between the current weight W(n) and the product of the learning rate α and the gradient of the cost function, ∇G(W(n)); that is, W(n+1) = W(n) − α·∇G(W(n)). This entire process is iterated multiple times until the loss of the neural network reaches a certain threshold. Once the deep neural network reaches the threshold, the learning process, or the training, is stopped.
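To tie the lesson together, here is a minimal NumPy sketch of repeated feed-forward and backpropagation cycles for a tiny network with two hidden layers of two neurons each, matching the architecture described above. The sigmoid activations, squared-error loss, toy data, and learning rate are assumptions for illustration, not values from the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Tiny network: 2 inputs -> 2 hidden -> 2 hidden -> 1 output,
# with weights taken at random for the first iteration
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(2, 2)), np.zeros(2)
W3, b3 = rng.normal(size=(1, 2)), np.zeros(1)

x = np.array([0.5, -1.0])   # toy input features
y = np.array([1.0])         # toy target (the actual value)

alpha = 0.1
for step in range(1000):
    # Feed-forward: each layer computes z = Wx + b, then an activation
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    y_hat = sigmoid(W3 @ a2 + b3)

    loss = np.sum((y_hat - y) ** 2)    # squared-error loss

    # Backpropagation: apply the chain rule from the output layer back
    d3 = 2 * (y_hat - y) * y_hat * (1 - y_hat)   # dLoss/dz3
    d2 = (W3.T @ d3) * a2 * (1 - a2)             # dLoss/dz2
    d1 = (W2.T @ d2) * a1 * (1 - a1)             # dLoss/dz1

    # Gradient descent update: W(n+1) = W(n) - alpha * gradient
    W3 -= alpha * np.outer(d3, a2); b3 -= alpha * d3
    W2 -= alpha * np.outer(d2, a1); b2 -= alpha * d2
    W1 -= alpha * np.outer(d1, x);  b1 -= alpha * d1

print(f"final prediction: {y_hat[0]:.3f}, loss: {loss:.5f}")
```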
So congratulations on completing all the lessons of this course. With this, we've come to the end of this course on the theory of deep learning. As a recap, we learned in detail about the inspiration behind deep learning, along with the theory associated with neurons and neural networks. As a reminder, we'll be constantly updating this course, so make sure to check in at a future date for updated materials and lessons. We are really glad to have been a part of your journey in learning the theory behind deep learning. So feel free to send us any queries you have regarding this course, and we wish you the very best in putting your newly learned skills into practice.