Transcripts
1. What will you learn?: Hello, everyone. Welcome to the machine learning class. My name is the Shawl Raj food and I am deformed riddles. Think it's academy dot com. So what are you going to learn from this whole class? Well, I have been working on this machine learning course for a long time, and now I have posted some of the videos in this class, so you will learn the machine learning concepts right from scratch. I explain each and every topic in detail, and I also use real-life examples to explain the concepts. You will learn supervised learning, linear regression, logistic regression, and unsupervised learning, and there are a lot of topics that are coming soon on Skillshare. So that's all for this tutorial. In the next tutorial, we're going to start with the definition of machine learning. We will define what machine learning is, and we will dive right into it.
2. What is Machine Learning?: So now we're going to start our machine learning course with the definition of machine learning. I'm going to show you what machine learning is, and basically you will get an idea of what exactly we're going to do throughout the whole course. Machine learning is the study of computer algorithms that improve automatically through experience, and it is seen as a subset of artificial intelligence. So basically, in machine learning we create or study some computer algorithms. Now, what is special about these algorithms is that they improve automatically, so we don't need to do any programming for any further steps. They will automatically improve themselves through experience. So just like a child learns to walk through experience, the machine learning algorithms will also learn through experience. So that's a pretty good thing. The next thing is that machine learning algorithms build a mathematical model based on sample data, and this sample data is known as the training data. So essentially we're going to build the machine learning algorithms in the upcoming tutorials, and what these algorithms will do is create a mathematical model. A mathematical model can be a function, it can be a probabilistic model, or something else. So consider this mathematical model, and suppose we are given some sample data. We call it the training data because we're going to use that data as experience. We will pass the whole data to the mathematical model, the mathematical model will do some computations on it, and it will be able to make some predictions or decisions without being explicitly programmed to do so. For example, let's say we have training data where we have some emails, and we know whether a particular email is spam or not spam. We're going to use this example a lot of times.
So we have training data, and in the training data we have emails that are already classified as spam or not spam. We will pass the whole data to this mathematical model, and now if I provide some additional email, which is not in our training data, this model will be able to help me predict or decide whether the given email is spam or not spam. There are different types of machine learning: supervised learning, unsupervised learning, reinforcement learning, and recommender systems. From the next tutorial onwards, we're going to study supervised learning, and inside supervised learning we have linear regression and logistic regression. So let's start with supervised learning in the next tutorial.
3. Supervised Learning: Let's discuss supervised learning. What is supervised learning? In supervised learning, we have a dataset, and this dataset is labeled. Let's suppose I create a very small dataset here, and this dataset contains area as the first column, and let's say the area is in square feet. The second one is the price. So now we have area and price. Let's say the area is, let me zoom in, 250 square feet. This is basically a dataset of house prices. So let's suppose we have a house and the area of that house is 250 square feet, and suppose the price is $300. Let's zoom back. It's just an assumption. Let's consider some more examples: 300, 400, 700, 1200, and so on. For each of these areas we have a corresponding price, so let's say these will be 500, 700, 900, 1200. Okay, so now we have a dataset and it is labeled: you can see that for each area we have some price allocated to it. Now, this column is represented as x, and this column is represented as y. So for a given x, I have a y. The y is also called the output. So a house which has an area of 250 square feet has a price of $300, and so on. Now let's say there is a person who wants to buy a house, and his demand is a house with 500 square feet of area. You can see that 500 square feet is not currently in the data. So now we want to predict what the price of that house will be. For that we will build a model, and it will be able to predict the price for 500 square feet. Now, this dataset in machine learning is known as the training dataset.
This is known as the training dataset, and this table that you can see here is our sample dataset. Sometimes it is represented as (x_i, y_i), and this holds for i equal to 1 to m. So this is a representation of this dataset. Now for a given x, let's say i is one here, we have a particular value y_i corresponding to it. So in supervised learning we have a dataset that has a mapping from x to y, and we will try to learn a mapping from this to this. Our task is mainly to predict the price. You cannot actually predict the price with 100% accuracy, so we will also see how to increase the accuracy of the model. So this is basically supervised learning. Let's talk about unsupervised learning. Now, remember the second point that we discussed in machine learning: it is all about understanding patterns. In unsupervised learning we also have a dataset. The difference is that this time we have only the values of x. We do not have any particular output or a mapping for the x. Remember this x: these are the attributes of the dataset. So in unsupervised learning, we have a dataset that goes from x1, x2, and so on to xn. What we try to do is find the patterns among these values, and we try to group them. Let's suppose I create a group, let's say C1, and let's say the values x2, x4, and x6 are forming a pattern, so they are related, and similarly we can form more groups. These groups are known as clusters. So in unsupervised learning, we try to build algorithms that will make groups of the data. And let's suppose I give a value.
Let's say I give a value x and I want to know which cluster it belongs to. There are algorithms that can predict that. These algorithms are known as clustering algorithms. We have K-means clustering, then we have the partitioning algorithms, the hierarchical algorithms, and so on. So this is unsupervised learning, and we're not going to cover unsupervised learning in this stream. We're going to cover supervised learning first. In supervised learning, we have a dataset, and we want to learn a mapping to the price. So now let's discuss supervised learning further. Supervised learning is divided into two sections: one is regression, and the second one is known as classification. So we have further divided supervised learning into regression and classification.
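The labeled dataset described above can be written down as a list of (x_i, y_i) pairs. This is only a minimal sketch: the row values are the lecture's toy numbers as best they can be read from the recording, and the names `training_set`, `xs`, `ys`, and `query_area` are assumptions made for illustration.

```python
# Labeled training set: each pair is (x_i, y_i) = (area in sq ft, price in $).
# The values are illustrative toy numbers, not real house prices.
training_set = [(250, 300), (300, 500), (400, 700), (700, 900), (1200, 1200)]

m = len(training_set)                # number of training examples
xs = [x for x, _ in training_set]    # inputs (the attribute column x)
ys = [y for _, y in training_set]    # outputs (the label column y)

# Supervised learning uses both xs and ys; unsupervised learning would see only xs.
query_area = 500                     # the buyer's request, which is not in the table
print(m, query_area in xs)           # prints: 5 False
```

Because 500 is not one of the stored areas, its price cannot be looked up and has to be predicted by a model.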
4. Regression: Well, in this tutorial, note that regression is also of two types. It can be linear regression, or it can be logistic regression. In this stream, we're going to cover linear regression. So we're going to build a model that will be able to predict the price of the house using the regression technique. The next one is classification. But first let's take a look at what regression is. We have this table here. Let's suppose these are the values of x and these are the values of y. Now let's suppose I try to draw a graph. The x-axis will represent the area in square feet, and the y-axis represents the price. Now what I can do is use this dataset to plot all of these points on this graph. So let's suppose the points in the graph are something like this. I'm just drawing something random here, because it would take a lot of time to actually plot all of them. So let's suppose these are the data points, and they are drawn from our dataset. Now, what does the person desire? The person wants to know the price at 500 square feet. So let's say 500 lies somewhere here. What the person desires is to learn a mapping from 500 to the y-axis. He wants to predict the y. That's what we do in supervised learning. So what is regression? How can you predict what the price will be for the value 500? The answer is that we will try to draw a line. It can be a line, and it can be a curve also. So we will try to draw a line or a curve on this graph. Let's suppose there is a person who says, I will draw a line.
Something like this; let me change the color on this one. So let's say the person says, I'm going to draw a line here, and this line represents the solution or the output. I have drawn this line at random. We will learn how to actually find the best line. But for now, let's suppose there is a person who draws this line, and what he does is map something like this: he goes up from 500 to the line, and at this point he can find the price. Let's say the price is $1000. So this person draws a line, let's say this is line one, and using this line he has calculated the price. So now our problem has been converted from predicting the value to drawing the line or curve. If we find a way to draw the line, or the curve, because it can be a curve also, we can make the prediction. Let's suppose there is another person who says, I will draw a curve here, like this. He will try to find the value at 500 like this, and this person says that the value is lower, at this place. So if we draw a line or a curve, then we can predict the price for the value 500. This plotting of a line or curve is known as regression. Regression is plotting the line or the curve on the graph. If you're drawing a line, it is known as linear regression. If you're drawing a curve, it is known as polynomial regression. Now, in regression, since this is a line, we know that a line can contain only real-valued output. So this line here, or this curve, will have real-valued output. I'm going to write "real value" here. This is what regression is, and this is what we're going to do. This is the main objective of the stream: we're going to build a linear regression model, and we're going to learn linear regression. We will try to draw a line that will be best suited for our dataset, one that best predicts the most accurate value of price for the value 500.
In classification, though, we do not draw a line. We get some discrete values. So what we actually do is this: let's suppose this is a graph, and I draw some points here and I draw some points here. I will say that this is class one and this is class two. There are situations when you cannot draw a line or a curve, because in this case you can see that if you try to draw a line, it will not give a good output. Why? Because the best output is the one which passes through, or tries to pass through, all the data points in our dataset. So in classification, we sometimes have a dataset for which drawing a line will not work; it will not model the data. So what we do is classify the points into classes. That's why it is known as classification. If we have divided the data into two classes, we say it is binary classification. If we have more classes, we say it is multi-class classification.
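The difference between the two kinds of output can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's model: the slope of 2.0 is chosen only so that the line gives the lecture's example price of $1000 at 500 square feet, and the class boundary of 5.0 is an arbitrary made-up threshold.

```python
def predict_price(area, slope=2.0, intercept=0.0):
    """Regression: read a real-valued price off a straight line (hypothetical line)."""
    return slope * area + intercept


def classify(x, boundary=5.0):
    """Binary classification: return one of two discrete labels (hypothetical boundary)."""
    return 1 if x < boundary else 2


price = predict_price(500)   # a real number on a continuous scale
label = classify(3.2)        # a discrete label, either 1 or 2
print(price, label)          # prints: 1000.0 1
```

The regression function can return any real number, while the classifier can only ever return class 1 or class 2.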
5. Linear Regression Hypothesis Function: In this tutorial, we are going to cover linear regression. Let's try to understand linear regression. What I'm going to do is again draw a graph, and again we will have area in square feet here and price here. And again we will have some data points. Let's say they are like this, and we want to find out what the price will be for the value 500. So I want to draw a line. Our first objective is to draw a line. The second objective is to draw a line which best fits our dataset. Let's suppose a person says, I will draw a line here. You can see it is so far from the data points, so this is not a desired line. Another person says, I will draw a line here. This is also not a good line. And when we say that it's not a good line, we mean it's not the best fit for our dataset. The best line will be a line that crosses these data points such that the distance from all these points is minimum. If you look at the distances between these points and the line, like this, they should be minimum. We will have to minimize them. The smaller this distance, the better the line fits. So our objective has gone from prediction, to drawing a line or curve, to best fitting the line to the dataset. So let's first see how to draw a line. What do we need to draw a line? Well, we need the equation of a line. Let me change the color. Let's suppose we have the equation of a line: we say y = mx + c is the general representation of any line in 2D space. So if there is a way to find the equation of the line that will best fit our dataset, our work of prediction is done.
Then we can easily plug in the value and get our price. Then it is easy. Now, we have x, which is 500, and y is what we want to predict. So what we actually want to find out is m and c. Here m is the slope of the line and c is the intercept. So our whole objective has been boiled down from prediction, to drawing a line, to finding m and c. So now if someone asks you how you can predict the price, you can easily say that if there is a way to find the slope and the intercept, then it will be very easy for me to plot the equation. I get the line, I plug in the value, and I get the y. So that's easy. Now, this equation of a line, you can see, is actually a function. This equation of a line does not mean that we will get a perfect line. Best fit means it fits best, but it is not 100% accurate. Remember, in prediction it's almost impossible to be 100% accurate. So what I'm saying is, since we cannot know exactly what this line is, I'm giving a hypothesis. I hypothesize that this equation can help me predict the value of the price given the area as 500. So this is a hypothesis. That's why the equation we build is called the hypothesis function. Now, the hypothesis function is similar to this line equation; we just write h(x). We put x here, and we get the hypothesis as the answer. But instead of slope and intercept, which are m and c, we call them parameters, because we actually want to find them. The whole game is all around these two: find the parameters and the work is done. So in the hypothesis function, we write the slope as theta zero and the intercept as theta one, for the sake of simplicity: h(x) = θ0 x + θ1. So this is the hypothesis equation.
It is very important, so we will mark it as equation one. To represent linear regression, we can use the hypothesis function. Now I have to find theta zero and theta one. The correct choice of the pair theta zero and theta one will make the hypothesis closest to our answer. You can see that in this case the intercept, which is theta one, is large, and here you can see that the intercept is small, and that's why we have got a very good line, a best-fit line. So now again our objective has boiled down to finding theta zero and theta one. To find theta zero and theta one, we make use of a function which we call the cost function. Why cost function? Because this function calculates the cost of theta zero and theta one. So this cost function, I'm going to write it here, helps in determining the cost of theta zero and theta one. So we have the cost function, which will help us find the values of theta zero and theta one. We will study this cost function, which we represent as J(θ0, θ1), in detail in the next tutorial.
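The hypothesis function h(x) = θ0 x + θ1 can be sketched as a small Python function. It follows the lecture's convention that theta zero is the slope and theta one is the intercept (many textbooks use the opposite order). The two candidate parameter pairs below are made-up values for illustration only.

```python
def hypothesis(x, theta0, theta1):
    """h(x) = theta0 * x + theta1, with theta0 as slope and theta1 as intercept."""
    return theta0 * x + theta1


# Two hypothetical candidate lines, each defined by a (theta0, theta1) pair.
line_one = (2.0, 0.0)
line_two = (1.5, 100.0)

# Each candidate gives its own predicted price for a 500 sq ft house.
for theta0, theta1 in (line_one, line_two):
    print(hypothesis(500, theta0, theta1))
```

Different parameter pairs give different predictions for the same area, which is why a way of scoring each pair is needed.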
6. Cost Function Linear Regression: It will be a real-valued curve, and it is basically going to help us find the actual output. In linear regression we try to draw a line, and we gave the equation of a line as y = mx + c, and we formulated it as a hypothesis function. The hypothesis function is h(x) = θ0 x + θ1. So this is basically our hypothesis function. And we said that if I have the perfect values of the parameters theta zero and theta one, then I will be able to get a line which will give me the output. So we defined the cost function. Well, we have not defined it yet. We will define the cost function now, and the cost function will be represented as J(θ0, θ1). With the cost function, we say that given these two parameters theta zero and theta one, we are trying to find the cost. Let's see what that means. First of all, I would like to draw a graph, and in this graph we are going to take the previous example of the area versus price graph. So we have area in square feet and we have the price, and we will plot the data points. I'm going to plot them randomly here. These are the data points which we have seen in the dataset. Now we want to plot the hypothesis function, and we want the perfect pair of theta zero and theta one. Let's suppose there is a person, let's say person A, and he says, I think that this line, which I'm going to draw here, is the best fit. It means that in order to draw this line, he must have chosen some theta zero and theta one, which are nothing but slope and intercept. So he must have chosen some intercept from here, and he must also have used some slope value. So I'm going to write it here.
He used some theta zero and theta one to draw this line. This is person A. Now there is another person, person B, and he claims, I think that this line is the best fit, and he has also chosen some theta zero and theta one parameters in order to draw this line. From this graph, we can say that person B is giving us a solution which is far better than A's, because line B is passing through most of the data points and is best fitting our data points. So I can determine from the situation that B is better than A. We can compare these two lines. But what will happen if there is another person who claims that this line, which is in green, is the best fit? Let's say this is person C. He has chosen some values of theta zero and theta one. You can choose theta zero and theta one to be anything, and you will be able to draw any line on this graph. So now, which is the better fit, B or C? In this case, we're not able to determine that. So we need some metric or some measure to find out which one is better, and that is why we have introduced the cost function. What the cost function does is this: when you have picked values of theta zero and theta one, like we have done in these three cases, this function will output a cost. It will output a cost in each of these cases, and we can use this cost to compare all of these cases. So A will have some cost, B will also have some cost, and C also has some cost, and we can compare all three costs. Whichever is the minimum, we're going to say that this is the best fit. And then we will put that theta zero and theta one in here.
And we will say that this is the best-fit line, and so the problem will be solved. The first question is how to find that cost. What is the cost function? The cost function has a mathematical formula. In short, the mathematical formula is known as the mean squared distance. So what is the mean squared distance? Let's see what exactly this is. Let's suppose we have a graph again here, and I will again draw some data points. Now let's suppose we have a line, and obviously I have chosen some pair (θ0, θ1) to draw this line. Now, with the mean squared distance, in order to find the cost for this pair, you will first have to find the distance from this point to the line. You can see that this point has a projection on this axis, which is our area. Let's suppose this is, let's say, 400. And if you see it here on the y-axis, which is the price, let's say it has a value of 3000. So this point basically represents that when the area is 400 square feet, we have the price 3000. So what we're trying to do is find the distance from this point to this line, and this line is nothing but our hypothesis. Let's suppose there is a point here on the line. We will try to find the distance between the actual point and the predicted value; this point on the line is the prediction. So I'm going to say that this is the predicted value, and this point is the actual value. Now, this distance can be calculated like this. What you need to do is just find the projection. For this hypothesis function there will be some value; I'm going to say it is (x_i, y_i).
And since this is the hypothesis function, I'm going to put a cap on it, so that we know that this is the predicted value and not the actual one. Here we have the projection, and this is the actual value, so I'm going to write (x_i, y_i). So on the y-axis we have y_i cap for the prediction, and for the actual value we have y_i. And y_i cap is nothing but h(x_i). When you are considering the parameters, you write theta in front of x: hθ(x_i). You write theta because the parameters are the thetas. Now what we want is to find the distance, and the distance is the difference between hθ(x_i) and y_i. This is a little bit complex, but this is how we calculate the distance. We find the projection y_i cap, which is nothing but hθ(x_i) when you have considered theta zero and theta one, and then you have y_i, which is the actual data point. So what you have to do is subtract: hθ(x_i) - y_i. This is the distance. Now, if you calculate the same from this point to this one, you will find that the value of hθ(x_i) is greater than y_i, and in that case we get a positive distance. But in this case, you will get a negative distance. That's why we square, so that we don't have negative distances. Now we know what "squared" is and what "distance" is. This is the squared distance. After calculating the distances from all the points to the hypothesis line, we will add them up. So I'm going to write a summation here, and it will start from i equal to 1, which is this point, let's say this is the first point. And let's suppose there are m points, so it goes from i equal to 1 to m. This is how you calculate it for the values of theta zero and theta one. This is how you calculate the squared distance.
Now, when you add all these distances up, and you are also squaring them, this will give you a very huge value. And since we will plot the cost function on a graph, the plotting will become difficult. So what we do is multiply it by 1/m. That's why we say that it is a mean, because we are adding all the squared distances and then taking the mean, which is 1/m. And now I can easily plot the cost function. So this is known as equation two. This is the cost function equation: J(θ0, θ1) = (1/m) Σ (hθ(x_i) - y_i)^2, summed over i from 1 to m. And I hope this concept of why we are taking the difference of hθ(x) and y is clear. So when you consider theta zero and theta one here, you will be able to get a cost using this formula. And similarly, if you have some other line, let's say like this, what you have to do is again find the distances from all the actual points to this line, for these other values of theta zero and theta one, and then you again put the values here, and you will get the cost. So for each pair (θ0, θ1), or for each hypothesis function, I can calculate the cost. If I can calculate the cost, I will just have to find the minimum value of the cost, and that will help me in finding the best hypothesis function, the best fit for our dataset.
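The mean squared distance described above, J(θ0, θ1) = (1/m) Σ (hθ(x_i) - y_i)^2, can be sketched in Python as follows. The data points and the two candidate lines are made-up values chosen for illustration, and the names `cost`, `points`, `candidates`, and `best` are assumptions, not anything from the lecture.

```python
def hypothesis(x, theta0, theta1):
    """h(x) = theta0 * x + theta1 (theta0 is the slope, theta1 the intercept)."""
    return theta0 * x + theta1


def cost(points, theta0, theta1):
    """Mean of the squared distances between predictions and actual values."""
    m = len(points)
    total = sum((hypothesis(x, theta0, theta1) - y) ** 2 for x, y in points)
    return total / m


# Hypothetical data points (x_i, y_i) and two hypothetical candidate lines.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
candidates = {"A": (1.0, 0.0), "B": (2.0, 0.0)}

costs = {name: cost(points, t0, t1) for name, (t0, t1) in candidates.items()}
best = min(costs, key=costs.get)   # the candidate with the minimum cost
print(costs, best)
```

Squaring keeps every term non-negative, so a point above the line and a point below it cannot cancel each other out, and the candidate with the smallest cost is taken as the best fit.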
7. Cost Function Example: So let's try to do an example on the cost function, a very simple example. First of all, let me draw a graph here, and since the measurements will be useful, let me draw something like this from here to here. Now we're going to take actual values for these data points, not some random values. So let me mark one here, this will be two, and this will be three. And similarly, this will be one, this will be two, and this will be three. Now I'm going to mark some points from the dataset. Let's consider this point, which is (1, 1), then (2, 2), and then finally (3, 3). These points have come from our dataset: 1, 1, then 2, 2, and so on. And it will have some labels, let's say label x and label y. I'm not going to write the labels explicitly; you can find them from here. All right, so we have a graph now, and these are the data points. What we're going to do is assume some values of theta zero and theta one here, and then we will find the cost using the formula. Let's take our first case. In the first case, I'm going to take theta zero as 0.5. That means the slope is 0.5. Next, take theta one as zero, because we don't want an intercept right now. So if theta zero is 0.5 and theta one is zero, what will our hypothesis function be? Theta zero is 0.5, so it is 0.5x plus zero, which is equal to 0.5x. If I put one in x, I get 0.5. If I put two, I get one. If I put three, I get 1.5. Let's plot these points, and then we will be able to get a line. So this is 0.5, this will be one, and then we have 1.5, which will be here.
So now what I'm going to do is draw a line which will pass through all these points, something like this. I've taken the values of theta zero and theta one as 0.5 and zero, so this is our hypothesis function h(x). Now what we have to do is find the squared distance. So we're going to put it in the cost function formula to find out the cost when we have h(x) = 0.5x. So what will the distances be? You can see here that the distances go from this point to this point, and from this point to this. Since this is a graph, we can easily find these distances. The distance is hθ(x_i) - y_i. If i equals one, which is this case, we have hθ(x) as 0.5, so it is (0.5 - 1) squared, multiplied by 1/m. How many points are there? We have three points, so 1/3. And then we're going to take another value. One important thing about the live chat is that after the end of this video, I will be taking the live chat. So if you have any doubts or anything you want to ask, you can use the live chat. I will be taking all the questions there, because I don't want to break the flow of this whole tutorial. So let's start from where we left off. This was for our first data point. Now let's consider the second one. For the second one, we have one and two, so the distance is calculated as (1 - 2) squared, because this point has the value one here and two here. Plus, what is the final value? It is 1.5 and three, so (1.5 - 3) squared. This is the mean squared distance. This is for the particular case where we consider theta zero as 0.5 and theta one as zero.
We have this cost function. One important thing about the cost function is that, since we are going to find the best cost out of all the costs, we will use the derivative. We will differentiate this cost function, and since it has a square, the derivative will bring this two down here. Then again in the graph, it will be difficult for us to plot such a slope, because the derivative gives us a slope. So in the cost function, we also divide by two. Now, dividing by two does change the cost, but it does not change it for one particular case only; it changes all the costs in the same way. And since the cost is just a comparison value, it does not matter if we divide by two or not. So here I'm going to multiply the m by two. So what will the answer be here? It is 1/6 times ((0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2), and I have already calculated this answer: it gives us 0.58. So 0.58 is the cost when we are considering theta zero as 0.5 and theta one as zero. Now let's consider an ideal case. In the ideal case, we're going to write a hypothesis function that will pass through all the actual data points. In that case, I will assume theta zero as one and theta one as zero. So the slope is one. If I draw a graph with these values, I know that h(x) will be θ0 x + θ1, which gives me x. So h(x) = x will give me the points (1, 1), (2, 2), (3, 3). Let me change the color for the ideal case, let's say green. So now we're going to draw a line here. This is the ideal case. Why?
Because in this case the line of the hypothesis function is passing through all the data points. And since it is passing through all the data points, it is the best fit for our data. So let's see what the cost will be in the ideal case. We again find the difference between this and this, and you can observe that when you apply this formula, the result will be zero. Why? Because this will give me zero, since one minus one is zero. This will give me zero, and this will give me zero. So zero plus zero plus zero is zero. In the ideal case the cost is zero, and in the other case you can see the cost is 0.58. So if someone asks you which of this line and this line is the better one, you can say: I have calculated the cost, and the cost is minimum in this case, so obviously this case is better. So I'm going to use this case, in which the value of theta zero is one and theta one is equal to zero, and the function is h of x equals x. That's how we try to find the best fit for our data set. That's the whole idea. So now we know how to calculate the cost. Now, we have only considered two cases here, so it is easy to find the minimum of these costs. But what happens when we have many different values of theta zero and theta one? We have a whole range of these pairs, so there will be many different lines. There can be a line like this, or this, or this, because there are various values of theta zero and theta one. So how will we be able to find the minimum cost out of all of them? The answer is gradient descent.
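The two cost calculations above can be checked with a short R sketch. This is only an illustration: the function names hypothesis and cost are made up for this example, and the hypothesis is written in the form used in this class, h of x equals theta zero times x plus theta one.

```r
# Data points from the example: (1, 1), (2, 2), (3, 3)
x <- c(1, 2, 3)
y <- c(1, 2, 3)

# Hypothesis in this class's convention: h(x) = theta0 * x + theta1
# (hypothetical helper name, not from the original lesson)
hypothesis <- function(theta0, theta1, x) {
  theta0 * x + theta1
}

# Cost: J(theta0, theta1) = 1 / (2m) * sum((h(x_i) - y_i)^2)
cost <- function(theta0, theta1, x, y) {
  m <- length(x)
  sum((hypothesis(theta0, theta1, x) - y)^2) / (2 * m)
}

cost_first <- cost(0.5, 0, x, y)  # the first line, h(x) = 0.5x
cost_ideal <- cost(1, 0, x, y)    # the ideal line, h(x) = x

print(round(cost_first, 2))  # 0.58
print(cost_ideal)            # 0
```

The first line gives 3.5 divided by 6, which rounds to 0.58, and the ideal line gives zero, matching the hand calculation.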
8. Gradient Descent Linear Regression: Just understand that gradient means slope and descent means going down. So if you have a hill like this, descent means going down and ascent means going up. What we're going to do is plot this cost function on a graph. Now, this cost function is not a simple function; it takes two parameters. So if I try to draw the graph, it will be a graph that you have to analyze in 3D. This axis will represent the value of the function J of theta zero comma theta one. Then, since we have theta zero and theta one as well, I will take another axis like this and another axis like this. This axis represents theta zero and this axis represents theta one. Remember, this is in 3D. For every pair of theta zero and theta one, we have the formula to calculate J of theta zero comma theta one. So let's suppose we take any point on this plane, a pair of theta zero and theta one. Then you have to picture the value coming up out of this plane: above that point we will have some value of J. And we try to plot such points for all the values. When you plot all the points and connect them, you will get a figure like this, or like this. It can be anything. So what are these? These are graphs of the cost function. So what is the idea behind gradient descent? The idea behind gradient descent is very simple. We pick a point on our graph, for example.
I have taken a point A. Then I take a step, and the size of the step is known as alpha. It is the learning rate. And then I find the gradient, or the slope, of the cost function. In 2D you can visualize it like this. Suppose we have a graph that goes from, let's say, here, and curves like this. We pick a point, let's say this is our point A, and then we take a step of alpha. Remember, we multiply by alpha, which is the learning rate, because we are actually taking learning steps. After multiplying by alpha I will get some point here, and I will calculate the slope of this line. Now, you can see that the gradient descent algorithm only works when you are going downwards, so at every step you can see it is actually going down. Then we again take a step, which is the alpha step or the learning rate alpha, and we go down again. We will see the gradient descent algorithm and how it actually works later; for now I'm just showing you the intuition behind all of this. We do this again and again until we reach a point where, when we take the slope, the slope is no longer descending, or I can say it is ascending. At that point we stop and return this value, and we say that these values of theta zero and theta one will give me the minimum cost. That's the basic idea behind gradient descent. Now, the learning rate is important to understand, and the choice of learning rate matters. For example, suppose you take very small steps; say you choose a very small value of alpha. Let's suppose we are at point A and you say, I'm going to take a small value of alpha, let's say like this.
Now, since this value is so small, the step will be a lot smaller, so it will take a lot of time for the gradient descent algorithm to reach the minimum value. And if you take alpha too large, let's say you take 1000 as alpha, then from this point it will jump to this point. Then again it takes an alpha step, and since the alpha step is 1000, it will overshoot the minimum value and go somewhere out here, because we multiplied by a thousand. So it will actually miss this point; it will overshoot. Then again we take an alpha step, and it will be this much, so we reach here. Now we know that the slope is descending again and we have not reached the minimum yet, so we go down again, and again we will land here, and then again here. This is known as overshooting, and it is actually a big problem. Taking a value of alpha which is very small is also a big problem. These are the problems of gradient descent. In the next tutorial we are going to see the gradient descent algorithm. I will write the expression for the gradient descent algorithm and explain the working behind gradient descent. So we try to find the bottom, the minimum of J of theta zero comma theta one. In these graphs you can see there is a minimum point here, and here you can also get a minimum point. So you just try to go down, down and down by taking alpha steps, or the learning rate. You take steps to go down by taking the slope.
So, since we are taking the slope, to calculate the slope we will have to take the derivative of our cost function. And that's the main reason why we have the one by two m in it. So that's the whole idea behind the cost function and gradient descent. We will see the gradient descent algorithm in the next tutorial.
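To get a feel for the learning rate, here is a small R sketch that walks downhill on the cost of the three example points (1, 1), (2, 2) and (3, 3). It is a simplification made for illustration: theta one is held at zero so that only theta zero moves, the helper names cost_slope and descend are hypothetical, and the three alpha values are assumptions chosen to show the behaviour, not values from the lesson. The slope used is the derivative of the cost with respect to theta zero, which the next tutorial works out.

```r
x <- c(1, 2, 3)
y <- c(1, 2, 3)

# Slope of the cost with respect to theta0, with theta1 held at 0.
# For J = 1/(2m) * sum((theta0 * x - y)^2) this is
# 1/m * sum((theta0 * x - y) * x).
cost_slope <- function(theta0) {
  mean((theta0 * x - y) * x)
}

# Take a fixed number of alpha steps starting from a point A.
descend <- function(start, alpha, steps) {
  theta0 <- start
  for (i in seq_len(steps)) {
    theta0 <- theta0 - alpha * cost_slope(theta0)
  }
  theta0
}

# The minimum is at theta0 = 1 (the ideal line h(x) = x).
small_alpha <- descend(start = 0.5, alpha = 0.001, steps = 20)
good_alpha  <- descend(start = 0.5, alpha = 0.1,   steps = 20)
large_alpha <- descend(start = 0.5, alpha = 1,     steps = 20)

print(c(small = small_alpha, good = good_alpha, large = large_alpha))
```

After 20 steps the very small alpha has barely moved away from 0.5, the moderate alpha ends very close to 1, and the large alpha overshoots further on every step and ends far away from the minimum.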
9. Gradient Descent Algorithm: And now we have this axis, which represents a pair theta zero comma theta one. So now the graph will be something like this, and our point A is lying here. What I'm trying to do is visualize this graph and these points in two-dimensional space. Actually, this is in three-dimensional space, but just for the purpose of visualization I am using a 2D space. So there is a point B. Now, this point B is the minimum point. So from point A we will have to reach point B. Only then will the gradient descent algorithm stop and say, okay, we have now found the minimum value. So the question is: how does the algorithm that we are going to write know that B is the minimum point? This is a very important question. Now, at point B, if I try to take the slope, and the slope is nothing but a tangent passing through only this point, you can see on this line that if you change theta zero and theta one slightly, there is no change in the cost at this point. So it means that if we find the derivative, d by d theta of J of theta zero comma theta one, at point B and it comes out to be zero, then we have got the point B. That's what the gradient descent algorithm uses. It will go on repeating until the derivative of the cost function becomes zero. We saw in the previous tutorial that if we start from A, we take alpha steps. So let's say there is another point here, and I am now taking an alpha step. Alpha is the learning rate.
So what happens is that the gradient descent algorithm will choose a step of alpha and go to this point, and it will keep using the step alpha again and again until it reaches the minimum point. And how will it know that it has reached the minimum point? At that point the derivative, or the slope, is zero. So essentially that's the intuition of the gradient descent algorithm. Now, how can we design such an algorithm? Let's take a look at the algorithm. The algorithm is very simple. It works by repeating until convergence. Convergence here means that we have reached point B. If we have not converged, the gradient descent algorithm will not know that it has to stop, so it will keep on taking these alpha steps at each iteration until convergence. That's why I have written repeat until convergence. So basically the algorithm is written here, and what it says is theta j, and then we use this symbol. This is not the equals symbol; you have to see the colon here. We will also discuss what this symbol means. Then theta j minus, and here I'm going to write the partial derivative with respect to theta j, and here I am going to write the cost function, which is J of theta zero comma theta one. And this whole expression is multiplied by the learning rate, which is alpha. So this is essentially the gradient descent algorithm: theta j, colon equals, theta j minus alpha times the partial derivative of J of theta zero comma theta one with respect to theta j. We're going to see how this expression works and how it will be able to give us point B. All right, so let's start. I'm going to draw this graph again, like this. And let's suppose we have chosen a point A here, randomly.
And this is our function J of theta zero comma theta one. Now, at this point, if I try to project it on this axis, it will give me some value of theta zero comma theta one. Actually, this is in 3D, so if I project A onto this plane, I will get the pair theta zero comma theta one. Now we will have to take alpha steps to reach this point. And this point, if I project it on the axis, will have its own values of theta zero and theta one. So we want to design an algorithm that will go from here to here. That means what we are actually trying to do is update the values of theta zero and theta one to these values; only then will we be able to move from this point to this point. Now, at this point A, if I take the slope, which is nothing but a tangent going through this point, I get a direction. So I take the slope, which essentially first gives me the direction in which I have to go, and then we multiply this direction by alpha. Now remember, this cost function has two parameters, so we cannot differentiate it in the ordinary single-variable way; we use partial differentiation. You can see this is the symbol of partial differentiation. What we do is partially differentiate the cost function. First we take theta zero as the variable, so any other variables inside this cost function are treated as constants, except theta zero. And in the next step I will take theta one. So now, what is j here? The values of j are zero comma one.
Now, how have we come up with this, that j is zero comma one? It is because we have two parameters, theta zero comma theta one. You can see that if I want to go from this point to this point, I have to update both parameters, theta zero and theta one. And since I have to update both of them, I have to run the algorithm for j equals zero as well as for j equals one. So this gradient descent algorithm basically does nothing but update the values of theta zero and theta one, because if you are able to update the values of theta zero and theta one, then your work is actually done: you get the next point as the next step. That's why we are using j equals zero comma one. Now let's expand the algorithm into separate steps for theta zero and theta one. I will again write here: repeat until convergence. First, for j equals zero, it will be theta zero, then we use the symbol, and then theta zero minus alpha times the partial derivative with respect to theta zero, which means that inside this cost function we consider theta zero as the variable and all other variables are treated as constants. Next, for j equals one, we have theta one, and again we are trying to update. So this is basically updating, and that's what gradient descent does: it updates the values of theta zero and theta one, because if you update those values you will be able to get to the next step. Then again we write alpha times the partial derivative, this time with respect to theta one, of our cost function. So at every iteration, at every step, we update theta zero and theta one. There is one important thing to note about gradient descent.
The algorithm's update of theta j, that is of theta zero and theta one, must be a simultaneous update. What does this mean? It means that the gradient descent algorithm updates the values of theta zero and theta one at the same time, in the same instant. The reason is that if you update theta zero first and then use the new theta zero to update theta one, you will get the wrong coordinates; you won't get this point. After the alpha step I want to get this point, so I have to update theta zero and theta one together, like this. So what is this symbol? Let's try to understand it. Let's say I have a equal to three and I write a equals a plus one. What is the value of a now? I put three here, three plus one becomes four, so the value of a becomes four. This is the use of the equals sign. With the colon-equals symbol, the idea is that the right-hand side, a plus one, which is four, is first computed and stored in a temporary variable, and only then does a take that value. Similarly here, you can see that we are not directly updating theta zero in place. Instead, we store the result that comes from this whole expression in some temporary variable, and we do the same for theta one. Then, after the step, we simultaneously update the values of theta zero and theta one from these two temporary values, and I get this point here. Then we repeat this again and again to get the further points until we reach B. At B the algorithm has converged, so the gradient descent algorithm will stop. All right, so since we're studying linear regression, we have the cost function.
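The difference between a simultaneous update and a one-after-the-other update can be seen in a small R sketch. The cost used here is a made-up function chosen only because its two parameters interact; it is not the regression cost, and the names dJ_dtheta0, dJ_dtheta1, simultaneous_step and sequential_step are hypothetical.

```r
# Hypothetical cost, used only for illustration (not the regression cost):
# J(theta0, theta1) = (theta0 - 1)^2 + (theta1 - 2)^2 + theta0 * theta1
dJ_dtheta0 <- function(theta0, theta1) 2 * (theta0 - 1) + theta1
dJ_dtheta1 <- function(theta0, theta1) 2 * (theta1 - 2) + theta0

alpha <- 0.1  # learning rate, an assumed value

# Simultaneous update: both slopes are evaluated at the same point,
# stored in temporary variables, and only then assigned.
simultaneous_step <- function(theta0, theta1) {
  temp0 <- theta0 - alpha * dJ_dtheta0(theta0, theta1)
  temp1 <- theta1 - alpha * dJ_dtheta1(theta0, theta1)
  c(theta0 = temp0, theta1 = temp1)
}

# Sequential update: theta1 is computed from the already changed theta0.
sequential_step <- function(theta0, theta1) {
  theta0 <- theta0 - alpha * dJ_dtheta0(theta0, theta1)
  theta1 <- theta1 - alpha * dJ_dtheta1(theta0, theta1)
  c(theta0 = theta0, theta1 = theta1)
}

print(simultaneous_step(0, 0))  # theta0 = 0.2, theta1 = 0.40
print(sequential_step(0, 0))    # theta0 = 0.2, theta1 = 0.38
```

Starting from the same point, the two versions land on different coordinates after a single step, which is why the temporary variables matter.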
So now let's see what the result will be if I evaluate the derivative of the cost function by substituting it from equation one. I will take this equation, differentiate it, and put the result into this algorithm. So let's do this. Repeat until convergence: theta zero stays here, no change, then theta zero minus alpha, which is the learning rate, because to get to this point we take alpha steps, and then we have this term. So let's evaluate this particular derivative, which is the partial derivative, for j equal to zero and for j equal to one. We have the partial derivative with respect to theta zero, and I'm going to substitute the cost function here, which is one by two m, then sigma of h theta of x i minus y i, where the sigma goes from i equals one to m, and there is also a square here. What I've done is just take the value from here and put it in this place, so we get this result. Now, h theta of x is the hypothesis function, so let's expand that as well. With respect to theta zero, again I write one by two m, then a bracket, then sigma from i equals one to m. What is our hypothesis function? I'm going to take it from this equation; you can see here that this is our hypothesis function in the case of linear regression. Actually, what we are doing is applying this algorithm to linear regression. In case you have some other curve which is not a straight line, the algorithm will remain the same; the cost function will change because the hypothesis function will change. So let's put the hypothesis function here. This h theta of x is theta zero times x plus theta one, and then we have minus y i, squared, like this.
Since we're doing a partial differentiation with respect to theta zero, theta zero will be the variable and all the other values will be treated as constants. On differentiation, the square comes down in front as a multiplication by two, and it cancels the two here in the denominator. Then we differentiate what is inside the bracket. Why are we doing this? Suppose I want to differentiate x plus one, squared. First I write two times x plus one, differentiating the outer part, and then I differentiate x plus one, and the derivative of x plus one is one. That's what we're doing here as well. Now, inside the bracket, theta one is a constant, so its derivative is zero. y i is also a constant, so that is also zero. Now we have the term theta zero times x, and I differentiate it. This gives me x, because the derivative of theta zero times x with respect to theta zero is x. If you have, let's say, two x and you differentiate it, you get two, because there x is the variable. In our case theta zero is the variable and x is a constant, so we get x. So this x will be appended here: one upon m, then sigma from i equals one to m, then h theta of x i minus y i, multiplied by x, and since it is the i-th sample, I write x i. So in the case of theta zero, we use this expression to update the value of theta zero, storing it in a temporary variable. Similarly, let's calculate what it will be for theta one. You can try to evaluate this yourself. If we consider theta one as the variable, then theta zero times x becomes zero and y i becomes zero, and only theta one remains.
The derivative of theta one gives me one. So the rest of the expression remains the same, h theta of x i minus y i, and we multiply it by one, because differentiating the theta one term gives me one. Multiplying by one leaves the same result. So you can see this result here: this is the gradient descent algorithm for linear regression. For linear regression we have this algorithm, which updates the values of theta zero and theta one, and it keeps on doing this every time. And remember, it changes them simultaneously, so you have to update the values of theta zero and theta one simultaneously. You can see I have calculated the partial derivatives here. As a practice assignment, you can try to do this on your own: put the cost function here and do the partial differentiation with respect to theta zero and theta one, and you will get this result. This is our final result, the gradient descent algorithm, and if we keep repeating it, we will reach point B. So this is what the gradient descent algorithm is, and you can use it for different types of cost functions. Let's say you have a hypothesis function which is not a straight line but some curve. Say you have points something like this, and you want to draw a hypothesis function which goes like this; this is h theta of x. The equation of this curve will not be the same; it will change. So we put that in here, and that changes the cost. What you need to do is put this cost in this place, calculate the derivative and get the result.
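For reference, the derivation described above can be written out as follows. This is a reconstruction from the spoken steps, using this class's convention that theta zero multiplies x and theta one is the constant term; the superscript (i) notation for the i-th sample is an assumption about how it appears on the board.

```latex
\begin{align*}
h_\theta(x) &= \theta_0 x + \theta_1 \\[4pt]
J(\theta_0, \theta_1) &= \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \\[4pt]
\frac{\partial J}{\partial \theta_0}
  &= \frac{1}{2m} \sum_{i=1}^{m} 2 \left( \theta_0 x^{(i)} + \theta_1 - y^{(i)} \right) x^{(i)}
   = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} \\[4pt]
\frac{\partial J}{\partial \theta_1}
  &= \frac{1}{2m} \sum_{i=1}^{m} 2 \left( \theta_0 x^{(i)} + \theta_1 - y^{(i)} \right) \cdot 1
   = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \\[8pt]
\text{repeat until convergence:} \quad
\theta_0 &:= \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} \\
\theta_1 &:= \theta_1 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)
\end{align*}
```

Both updates are applied simultaneously, as explained earlier.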
And in this manner you will be able to run gradient descent and find the minimum value on this curve, where the slope is equal to zero. So d by d theta will be equal to zero at this point. So far we have covered supervised learning and unsupervised learning, and we have covered linear regression as our first model. Now, finally, we have found the minimum value, and we can put the values of theta zero and theta one into our hypothesis function, and we will get the best fit for our data set.
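Putting the pieces of this tutorial together, here is a minimal R sketch of gradient descent for linear regression on the three example points (1, 1), (2, 2) and (3, 3). It is an illustration under assumptions, not the course's own code: the function name gradient_descent, the starting point, the learning rate, the tolerance and the iteration cap are all chosen for this example. The hypothesis follows this class's convention, h of x equals theta zero times x plus theta one.

```r
x <- c(1, 2, 3)
y <- c(1, 2, 3)

# Gradient descent for h(x) = theta0 * x + theta1 (hypothetical helper).
gradient_descent <- function(x, y, alpha = 0.1, tolerance = 1e-9,
                             max_iterations = 10000) {
  m <- length(x)
  theta0 <- 0  # starting point A, chosen arbitrarily
  theta1 <- 0
  for (iteration in seq_len(max_iterations)) {
    error <- (theta0 * x + theta1) - y  # h(x_i) - y_i for every sample
    grad0 <- sum(error * x) / m         # partial derivative w.r.t. theta0
    grad1 <- sum(error) / m             # partial derivative w.r.t. theta1

    # "Until convergence": stop once both slopes are practically zero.
    if (abs(grad0) < tolerance && abs(grad1) < tolerance) break

    # Simultaneous update through temporary variables.
    temp0 <- theta0 - alpha * grad0
    temp1 <- theta1 - alpha * grad1
    theta0 <- temp0
    theta1 <- temp1
  }
  list(theta0 = theta0, theta1 = theta1, iterations = iteration)
}

fit <- gradient_descent(x, y)
print(fit)
```

On this data the result should come out very close to theta zero equal to one and theta one equal to zero, which is the ideal line h of x equals x found by hand in the cost function tutorial.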
10. Linear Regression in R [IMPLEMENTATION]: Hi. In this tutorial, we're going to implement linear regression in R. So we're going to implement the linear regression machine learning model in R. We have already discussed briefly what exactly linear regression is, and we have discussed the cost function and gradient descent. So now let's get started. There are some steps in this tutorial. First of all, we download R and RStudio; I'll show you where you can download them from. Then you will have to install the important libraries. Basically, there are some libraries in R which give us important features that we can use to perform linear regression. The third step is to load a data set, which is the housing prices CSV file. I will give the link to this data set in the description below; you can check it out. So we load that data set in RStudio. Then we will build a linear regression model. We're going to use a predefined function: we have an inbuilt function in R which can be used to build a linear regression model. Then, finally, we're going to plot the linear regression model on the graph, and then we will start predicting. So now let's get started with RStudio and see how we can do the linear regression work. First of all, you will have to go to the browser and type download R. Then you go in here. You can see there is a link here, the first link, which is Download R 4.0.0 for Windows. You can go here and download the .exe file. You can see this link, Download R 4.0; just click here, download the .exe file and run it.
Basically, it will create a development environment for you where you can actually write the script. But I would strongly suggest you use this IDE, which is RStudio. So download RStudio for Windows. Remember, first you will have to install R, which installs all the tools related to R, and then you can start RStudio. And here, if you click here, it will show you some benefits of RStudio, as well as RStudio Desktop and the other versions. I just use the RStudio Desktop free version, downloaded from here. You can also download it from here, and you can see it also shows the steps: first it requires R, so you have to install R, and then you have to install RStudio. After installing R and RStudio, you will see a screen something like this. Now what you can do is just go to New Project. Let's create a new R project. Let's name this project housing prices linear, since we are doing regression, and click on Create Project. Now it will create a project for us. You can see it will create a console, and if you click here, it will open up an Untitled1 file. Just save it as hello world dot R. Now you will be able to write some scripts here, but most of the work that we're going to do, we're going to do in the console. The benefit of using RStudio is that you will be able to see the variables here, the plots here, and the packages which are inside your R environment, and there are a lot of other things that you can see from here. So there are advantages to using RStudio. All right, so let's move on to our next step, which is to load the data set. I have given the link to the data set in the description. What you just need to do is go to File, and there is an Import Dataset option.
And here you can see some options. We will click on From Text (base). If we click here, it will open a file explorer, and we just have to locate the file. All right, so we have this CSV file, housing prices. Just open it, and you can see it shows a dialog box like this, with the input file and the data frame. This is our input file, and this is the way our file will look: it has area, it has price, and the data is tidy. So just click on Import. After clicking on Import, you will see that it has opened housing prices here, which has area and price. I have deliberately chosen a data set on which we can actually perform linear regression. After importing this data set, you will see in the Environment tab that there is a new entry, which is housing prices, and it says we have 498 observations of three variables. If we double-click here, we can see that there are three variables, X, area and price, and there are 498 records. All right, so now we have successfully loaded the data set of housing prices. The next step is to plot the data, so let's try to plot the housing prices data. I will go to the console. If you want to plot something, you have to use the plot function, and you can see it shows me plot, and it also has very good documentation; you can see here that it is a generic function for plotting R objects. So we're going to use this plot function. In the plot function you will have to provide the x and the y axis, because we are going to plot the points in the form of a graph, and we know that a graph has an x axis and a y axis. So on the y axis, we have the price.
So I will write price here. Then I will use this operator, the tilde. You can find it on the left-hand side of the number one key. In the data set, price is actually y, and on the x axis we have the house area, so I'm going to write area. Now we have to provide it with the data, so data equals the housing prices. Remember, it will only be able to understand what housing prices is if you have loaded housing prices and it is showing in the environment here. So here I will just press Enter. Now you can see in the Plots tab that it plots all the points we have in the housing prices data onto this graph. You can see the price on the y axis, and on the x axis we have the area. All right, so we have got something here: we have plotted a graph. Now what we need to do is linear regression, to find and plot a line here which will best fit my data set. Let's see how you can do that. The next step is to build a model. I'm going to create a variable called lmodel, which is basically our linear model. So let me just clean this up. Now, you can see here lmodel, and with this variable I will use this operator, the arrow. This operator is used to push something into the variable. In R we usually use this operator for assignment rather than the equals sign. Now I'm going to call a model-fitting function, which is lm. If you just type lm, you can see it shows the documentation of the lm function. The lm function is an inbuilt function used to build a linear model. So what do we need to provide to this function? First of all, we need to provide the y axis versus the x axis again.
We're going to use the same operator here, the tilde: price, tilde, area. Then again we provide the data, which is the housing prices. What this line will do is build our linear model and push all the details, including the intercept, the slope and all the residual errors, into this linear model, which is the lmodel variable. So let's see what happens. After pressing Enter, you can see here that it has created an environment variable, and this is the linear model. Here you can see the other details: coefficients, residuals, effects. So let's see what this linear model has. We can use the summary function to see everything about this variable. So summary of lmodel gives me this. You can see here that we have an intercept estimate of 0.427, something like this. We have other values here, some standard errors, and the median, minimum and maximum values are here. So just by using the summary function, it will give you all the details that you require about the linear model and the linear regression. The next step: now we have lmodel, which is basically our linear model, and our task is to plot this linear model on this graph. So we have to plot the linear regression line. For that we have to do some overplotting. Overplotting means this is the graph and I want to plot a line on it; it is known as overplotting when I plot more things on the graph again and again. For that we use the library known as ggplot2. First, you have to run this command, install dot packages, and inside it, in quotation marks, you provide ggplot2. So this is the package that we need to install to plot a lot of things on the graph.
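The plotting and model-building steps described so far can be sketched in R as follows. The real housing prices CSV is not reproduced here, so this sketch simulates a made-up stand-in data frame with the same column names, area and price; the numbers in it are invented purely so the code runs on its own. The data frame name housing.prices is also an assumption about what the import step produces, while lmodel follows the name used in the tutorial.

```r
# Made-up stand-in for the housing prices data (columns: area, price).
# With the real file you would import the CSV instead of simulating.
set.seed(1)
area <- runif(100, min = 500, max = 3000)
housing.prices <- data.frame(
  area  = area,
  price = 50 + 0.1 * area + rnorm(100, sd = 10)
)

# Scatter plot: price on the y axis against area on the x axis.
plot(price ~ area, data = housing.prices)

# Build the linear model with the inbuilt lm function and inspect it.
lmodel <- lm(price ~ area, data = housing.prices)
print(summary(lmodel))

# The fitted intercept and the slope for area.
print(coef(lmodel))
```

The formula price ~ area reads as price on the y axis versus area on the x axis, and summary shows the coefficient estimates, their standard errors and a summary of the residuals.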
After hitting Enter, this package will be installed. Since I have already installed it, I'm not going to run this again. After installing, use the library function and just type ggplot2 here to import the package into the environment. If I hit Enter, it will load ggplot2 into the environment. If you go to Packages here and scroll down, you can see that ggplot2 is now selected. So instead of writing this command, you can simply check this box and it will be done for you automatically. All right, so now we have the ggplot2 library. Let's use it to plot the linear regression line. First of all, I will create a variable, housing.graph, which is the graph that will contain all these points plus the plotting of the line. We are going to push the whole graph into this variable, housing.graph. We use the ggplot function, and here you can see the help text saying that this allows you to add your own methods for adding custom objects to a ggplot. Let's see what we have to provide in this function. The first parameter is the data set, which is housing prices. The second step is to use aes. You can see here that aesthetic mappings describe how variables in the data are mapped to visual properties, or aesthetics, of geoms. Geoms are basically the points that we are going to plot on this graph, and aesthetic mappings are used to map or customize the whole graph. So we use the aes function, and in it we provide x equals area and y equals price. Now we are going to plot all the points, so we use geom_point. You can see here that this is the function we are going to use.
What this function does is create scatter plots: the point geom is used to create scatterplots. So basically these are the scatter plots, and that is what we're going to use. After I hit Enter, you can see it has created a new variable here, housing.graph. If I type housing.graph and press Enter, you can see that it has produced a scatter plot, just like the plot function did. But we also want to overplot a line on the scatter plot, which the plot call on its own did not give us, and that's the main reason why we're using ggplot2. After running this line, we can open housing.graph, which contains all the information about the points of this graph. On the y-axis we have price, and here we have area. Now we have to plot the line, which is the linear regression model, and ggplot2 allows me to do the overplotting and draw the line here. To add the line, we have to change housing.graph: we will push some changes into this graph, so the graph remains the same but it also gets the line. So we write housing.graph plus; housing.graph is this graph, and with plus we add the line. We will use the geom_smooth function. You can see the help text here, which says that if you're creating a new geom, position or scale in another package, you have to do this. We use this function to draw the line. Here I provide one parameter, method, which equals the linear model function, lm. We have to provide this first parameter to geom_smooth to push the line onto the housing graph. The second one is the color, which I'm going to initialize as, let's say, red. So the housing graph itself will remain the same.
The scatter plot will remain the same, but we will push a red line for the linear model onto this graph. Let's hit Enter, and it will make some changes to the housing.graph variable. Let's see how our graph has changed. Now you can see we have the scatter plot as before, plus a red line, which is our linear model, lm. We used the geom_smooth function to draw this line on the graph, and the color is red. So that's how we can perform linear regression in RStudio, and in this manner we can plot the whole linear regression line on this graph. The last thing still remaining is to make predictions. What I'm going to do is choose a value of area and then use my model to test what the price of a house with that specific area will be. We have an inbuilt function, predict, to do that. The first thing is to provide the linear model that we trained before. You can see that we trained our linear model earlier; I have cleared the console since, but we built lmodel, and you can see this variable here, which contains all the information about our linear model. So we have built the linear model, and now we have to do the testing: we have trained our linear model, and now we have to do the prediction. In the predict method, the first thing is to provide the linear model, and the second step is to provide it with the new data. We use the data.frame function to provide the new data. The new data is the place where we provide our area. Let's say I want the area to equal five. This means that in the predict function I'm saying: okay, this is the linear model that we have trained.
Now we have a new data frame, which is area equals five. So we have a house with, let's say, five square feet of area, somewhere around here, as you can see. What we're trying to do is read off the price at this point on the line; we just do the projection onto the y-axis. All right, let's hit Enter. Now you can see that it has displayed 3.773398. This means that if we have a house with an area of five square feet, the price is $3.77, which is somewhere around here, as you can also see in the graph. So that's how we do the training and testing of our linear model, and that's how we plot a linear regression line in RStudio. That's all for this tutorial. Thanks for watching.
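The overplotting and prediction steps from this tutorial can be sketched as below. This is a hedged sketch under assumptions: the data values are made up, so the predicted number will not match the 3.773398 shown in the video, and the names `housing_prices`, `lmodel` and `predicted_price` are illustrative. The plotting part only runs if the ggplot2 package is installed.

```r
# Hypothetical stand-in data; names and values are assumptions.
housing_prices <- data.frame(
  area  = c(1, 2, 3, 4, 5, 6, 7, 8),
  price = c(1.1, 1.8, 2.4, 3.2, 3.7, 4.5, 5.0, 5.9)
)
lmodel <- lm(price ~ area, data = housing_prices)

# Overplotting with ggplot2: scatter plot first, then the regression line.
# Guarded so the rest of the sketch still runs without ggplot2 installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  housing.graph <- ggplot(housing_prices, aes(x = area, y = price)) +
    geom_point()
  # Add a red linear-model line on top of the existing points.
  housing.graph <- housing.graph + geom_smooth(method = "lm", color = "red")
  print(housing.graph)
}

# Prediction: the trained model first, then a data frame with the new area.
predicted_price <- predict(lmodel, data.frame(area = 5))
print(predicted_price)
```

The column name inside `data.frame(area = 5)` has to match the feature name used in the formula, otherwise `predict` cannot find the variable.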
11. Multiple Linear Regression in R [IMPLEMENTATION]: Hi. In this tutorial we're going to perform multiple regression in RStudio. We have already discussed multivariate linear regression previously. In this tutorial we're going to pick a problem related to multiple regression and solve it in RStudio using R. So let's start with the problem and see what the problem statement is. First of all, in the problem statement you are given a data set that looks something like this. This is the car data set, and you can see there are some labels here. In the first column there are cars, and in the second column we have the different models of those cars; for example, if we have Mercedes, we have the model A-Class. The next column is the volume. Volume is the engine volume of that particular car model. Weight is the next column, which gives the weight of the car. And finally we have CO2, which is basically our prediction value: we will predict the amount of carbon dioxide released by a car with some particular weight and some particular volume. The CO2 numbers are given in grams. What you have to do is build a hypothesis function and perform regression, and you can see we have multiple features, which are volume and weight. In this type of problem we have to build a hypothesis function, and we know that the hypothesis function in the case of two variables, or two features, x1 and x2, looks something like this. x1 will represent the volume.
Finally, x2 will be used to represent the weight, and the hypothesis function will give us the output, CO2, in grams. Let me read the problem statement for you. Consider the data set that we already have, the car data set. I have provided it in the description of this video, or in the Resources tab, so you can download it and we will use it in RStudio. So you are given the car data set, which you have already seen here. I'll just close it. Now, the problem statement says that you have this data set and you have to use linear regression with multiple variables to predict. So our objective is to predict, and what we need to predict is how many grams of CO2 will be released per kilometer. If a car covers a distance of one kilometer, we have to predict from the given data set how many grams of CO2 will be released for that kilometer. Obviously, we will be given some volume and weight. In the question it is given that the car has a 1.3 liter engine, so the volume is 1300, and the second one is the weight, which is given as 2300 kg. So given the weight and the volume, we have to predict how much carbon dioxide will be released. That's the main idea of multiple linear regression: we have more than one feature. Let's start with the main part, which is the implementation. I have RStudio installed here. I go to the File menu here.
You can see we have an option, Import Dataset. Here I will go to From Excel, because we have a CSV file. I click here, and it opens something like this. You have to click on Browse and go to the file that we have. I think I have stored it here, and here is the data set. You can see it is showing me this data set as it retrieves the data. Now we just click on the Import button here, and it will import the file for me. You can even do this from the console; here you can see the commands to do that. But that would be a tougher task, and I actually rarely do that myself. After importing, you can see in the environment that there is a data set with the name cardataset. If you double-click on it, you will be able to view the data set here. All right, so now we have our data set. The next task is to start with the prediction. Here in the console, I'm going to create a model, and I'm going to assign to this model using the lm function. We have already seen the lm function in the implementation of simple linear regression in R: lm is used to perform linear regression, so if I provide the values of x and y, it will build a model out of them. We don't need to explicitly run a gradient descent algorithm or build the cost function; lm handles the fitting for us automatically. In the lm function, we first need to provide the value of y. You can see this is my car data set, and I know that the y value is CO2. So I write cardataset here, then I use the dollar sign, which is used to refer to a column, and the column name is CO2. So basically this is my y. Then I'm going to use this symbol, which you can see here.
This is a symbol that we have used previously as well, the tilde. After it I have to write the features. In simple linear regression we had one single feature, which was x. Here we have multiple features, which are the volume and the weight. So inside the lm function we can simply write the first feature, which is cardataset dollar Volume, then I use the plus operator, and then cardataset dollar Weight. What I'm trying to do in this particular line is build a linear model that has, on the y-axis, the carbon dioxide column of the car data set, and whose features are volume and weight. Now I have to provide it with the data, which goes like this: data equals cardataset. So now I have created a model. I just press Enter, and here in the environment you will be able to see the model. You can view this model and see that it has some coefficients in it. After building the model, we have to look at the summary of the model, so I use the summary function, provide the model inside it and press Enter. You can see there are a lot of details given to us. Remember that when we're performing multiple linear regression, we're interested in the intercept, which is the value of theta zero. To do the prediction, the things I need are the intercept, then the value of theta one, and then the value of theta two. I need these values because the volume and the weight are already given in the question, and we just need to find the values of these thetas: theta zero, theta one and theta two. Then we will be able to predict how many grams of carbon dioxide will be released per kilometer.
So this is the summary of the model, and in the summary we have all of these. You can see the intercept here: 79.694719 is the intercept, which is the value of theta zero. Then you can see we have a value here, about 0.0078, which belongs to cardataset dollar Volume. It is basically the theta parameter of volume, and in our hypothesis function it corresponds to theta one. Similarly, the value for cardataset dollar Weight is theta two. So now we have the values of theta zero, theta one and theta two, and I can use my hypothesis function formula to find the final result. Let me just clear the console first. Here I'm going to create a variable, final, which will give us the final output: the grams of carbon dioxide released per kilometer. We will need the summary, so I run summary of model again, because we have to use these values. I create the final variable, and in the formula we first use theta zero. Theta zero is this one, so I just copy it from here and paste it there. Then I have the value of theta one, which is this one here; I again copy and paste it. This has to be multiplied by x1, and x1 is the volume. In the question we are given the volume as 1300, so we multiply this value by 1300. Finally we use theta two, which is this one here; again I copy this value and paste it here.
We have to multiply it by the given weight. You can see the weight is 2300 kilograms, so I multiply it by 2300. I hit Enter, and now you can see it gives me a value of roughly 107.208. This is the grams of carbon dioxide that will be released per kilometer when a car's engine has a volume of 1300, or 1.3 liters, and the weight of the car is 2300 kilograms. So that's how we do our prediction in multiple regression. I'll just show you that you can also print the value of final in the console, like this. This is the final output. So that's it for this tutorial. Thanks for watching.
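The whole workflow of this tutorial can be sketched as follows. This is a minimal sketch under assumptions: the small `cardataset` below holds illustrative values, not the course's file, so the coefficients and the final number will differ from the roughly 107.208 obtained in the video. Column names `Volume`, `Weight` and `CO2` are also assumed.

```r
# Illustrative stand-in for the car data set; values are assumptions.
cardataset <- data.frame(
  Volume = c(1000, 1200, 1000, 900, 1500, 1600, 1100, 2000),
  Weight = c(790, 1160, 929, 865, 1140, 1500, 980, 1705),
  CO2    = c(99, 95, 95, 90, 105, 104, 94, 110)
)

# Multiple linear regression: CO2 explained by Volume and Weight.
model <- lm(CO2 ~ Volume + Weight, data = cardataset)
print(summary(model))

# Read theta0, theta1 and theta2 from the model instead of copying by hand.
theta <- coef(model)

# Hypothesis function: theta0 + theta1 * volume + theta2 * weight.
final <- theta[["(Intercept)"]] +
  theta[["Volume"]] * 1300 +
  theta[["Weight"]] * 2300
print(final)

# The same prediction through predict(), as a cross-check.
via_predict <- predict(model, data.frame(Volume = 1300, Weight = 2300))
print(via_predict)
```

The formula here uses bare column names together with `data = cardataset` rather than `cardataset$Volume`. Both forms fit the same model, but the bare-name form is what lets `predict` accept a new data frame.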
12. Multivariate Linear Regression: So now we're going to start off with multiple variables in linear regression. The very first thing I'm going to show you is what exactly multivariate means. Multivariate basically means that you have multiple features. I'm going to write here: multiple features. So what exactly is a feature? I'm going to take our very old example of housing prices, where we have seen that we have price as the output and area as the input, which was our x. So we had the prices, and we also had the area in square feet. We can take some values of prices and areas in it; I'm not going to write the values, because we're not going to look into the values. In the linear regression tutorial I showed you that price is the output and area is the input: if you input the area, you get the price as the output. When we drew the graph corresponding to this data set, we plotted area here and price here. The reason we put area on the x-axis instead of the y-axis is that area is an independent variable. It means that area does not depend on price. If you have a house that is, let's say, 500 square feet in area, it does not matter what the price is; the area will remain 500. But price is a dependent variable, so it depends on the value of area. If I say that the price is, let's say, $1000 for a 500 square foot area, then this price is dependent on the area. So area is known as an independent variable, and the output is a dependent variable. All the independent variables in our data set are known as features. They're known as features because they are actually featuring the outputs of our data set.
Now, in the previous tutorial, where we discussed linear regression in a single variable, we had only one feature, which was area. This was the only feature, and we called it x, and this was y. We used i to represent a particular record, or training example, like this, x(i), and we had m training records. We discussed what the hypothesis is in the case of a single variable, which is a single feature, and we also discussed the cost function and gradient descent in that case. Now we're going to see what happens if you have multiple variables. Let's say I have one more independent variable, which is, let's say, the number of floors. So let's say we have four floors here, then two, then three, and so on. Now the output, which is the price, depends on the area as well as the number of floors. In this case we have multiple variables, or multiple features, and that's why we call it multivariate linear regression. Now let's see what our hypothesis function will be in the case of linear regression with multiple features. We have studied h(x) in the case of a single variable: it was θ0 + θ1·x. So think for a moment about what the hypothesis function should be in the case of multiple features. In the case of multiple features, the hypothesis is h(x) = θ0·x0 + θ1·x1 + θ2·x2 + ... + θn·xn, where n is the number of features, not the number of training examples. So we have got our hypothesis function for linear regression in multiple variables. Now, there is one more, fancier way to reduce this whole expression. Let's say this is equation one.
You can reduce it to this form, which is θ transpose x. Theta is nothing but a matrix of the form θ0, θ1 and so on up to θn, and x denotes x0, x1, x2 up to xn. So we have theta and we have x. Why have we taken the transpose, the T that you can see here? The reason is that in order to expand this, we have to multiply theta by x, and according to the rules of matrix multiplication you cannot directly multiply two column matrices like this. So you have to take the transpose: theta transpose becomes the row θ0, θ1 and so on up to θn, and now you can multiply it by the column x0, x1, x2 and so on. Now the matrix multiplication is possible: θ0 gets multiplied by x0, θ1 gets multiplied by x1, and so on, so we obtain all the terms of equation one using this matrix multiplication. So in the case of multivariate linear regression, we have this hypothesis function. Next we're going to discuss the cost function in the case of multiple features. For the cost function and the gradient descent algorithm, the formulas remain the same. What changes is the hypothesis function, because the hypothesis function is actually the curve that will be used for prediction. So in the case of multiple variables the hypothesis changes, and accordingly so does the cost function, which was J of θ0, θ1 and so on up to θn; I write it like this. It equals 1 upon 2m, times the sum from i = 1 to m of (h(x(i)) − y(i)) squared. This was our general cost function. Let's say the single-variable hypothesis h(x) is equation A.
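Written out, the hypothesis and cost function described above look like this. This is a transcription of the spoken formulas into standard notation, with m as the number of training examples and n as the number of features; the bracket layout is a presentation choice, not something shown in the transcript.

```latex
% Hypothesis for n features: a sum of terms, or equivalently a row vector
% (theta transposed) multiplied by a column vector x.
\[
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n
= \begin{bmatrix} \theta_0 & \theta_1 & \cdots & \theta_n \end{bmatrix}
  \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}
= \theta^{T} x
\]

% General cost function over m training examples.
\[
J(\theta_0, \theta_1, \dots, \theta_n)
= \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^{2}
\]
```

Substituting the single-variable hypothesis gives the single-variable cost function, and substituting the transposed form gives the multivariate one; the outer formula does not change.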
If you put the single-variable h(x) from equation A into this, you get the cost function for a single variable. And if you put h(x) = θ transpose x inside this cost function, you obtain the cost function for multiple features. So simple, right? We just make some tweaks and we obtain the cost function. Now the last thing is gradient descent. We know that gradient descent is used to find the minimum cost: we take steps of size alpha, the learning rate, and try to reach the minimum value by following the slopes. If you remember, the gradient descent algorithm was: repeat until convergence, and inside this we had θj, or let me write θ0. Using the assignment symbol, whose use we have also explained, θ0 becomes θ0 minus alpha, which is the learning rate, times the partial derivative, with respect to θ0 in the first case, of our cost function, which is this one. We know what we find when we differentiate it: taking the partial derivative cancels the 2 here. If you solve that, the result you obtain is 1 upon m times the sum from i = 1 to m of (h(x(i)) − y(i)) times x0, and here I'm going to write i, so x0(i). We have already seen how we calculated the partial derivative of this with respect to θ0, so we get something like this. Now, this thing here is x0 with the i on it. The subscript represents the feature index, so this means we have x0.
Let's say that the number of floors is x0 and the area is x1. So the subscript 0 refers to the feature index, which is floors here, and the i represents the i-th training example. So if i equals two, we are trying to get this value, the value of x0 in the second training example, and that is the value we apply here. Similarly, we have the same formula for θ1, because in gradient descent we do a simultaneous update: we update all the theta parameters simultaneously, and the formula is the same. We are just going to write x1. Now, i is the training example, so instead of writing it in the subscript we should write it in the superscript, like this: (h(x(i)) − y(i)) times x1(i). Here we are picking the x1 feature of the i-th training example. Similarly, we can construct the updates all the way up to θn, and so we obtain the gradient descent algorithm, which repeats until convergence. That's all for now.
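The update rule described above can be sketched in R as follows. This is a hedged sketch, not the course's code: the toy data, the learning rate and the iteration count are all assumptions, and the sketch uses the common convention of a constant column x0 = 1 for the intercept term, which the transcript does not spell out.

```r
# Toy data: two features (area, floors) and a price; values are assumptions.
X_raw <- cbind(area = c(1, 2, 3, 4, 5, 6), floors = c(1, 2, 1, 3, 2, 3))
y <- c(3.1, 5.9, 6.2, 10.1, 10.8, 14.0)

# Assumed convention: x0 = 1 for every example, so theta0 acts as intercept.
X <- cbind(x0 = 1, X_raw)
m <- nrow(X)

theta <- rep(0, ncol(X))  # start with all parameters at zero
alpha <- 0.05             # learning rate (assumed value)

for (iter in 1:20000) {
  # h(x) = theta^T x for every training example at once.
  error <- as.vector(X %*% theta) - y
  # Partial derivative for each theta_j: (1/m) * sum(error * x_j).
  gradient <- as.vector(t(X) %*% error) / m
  # Simultaneous update of all parameters.
  theta <- theta - alpha * gradient
}

# Cost J(theta) = 1/(2m) * sum of squared errors at the final parameters.
cost <- sum((as.vector(X %*% theta) - y)^2) / (2 * m)
print(theta)
print(cost)

# Cross-check against R's built-in least-squares fit.
print(coef(lm(y ~ X_raw)))
```

Computing the whole gradient vector before touching `theta` is what makes the update simultaneous: every parameter is updated from the same old values.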
13. Feature Scaling: Now see, feature scaling is not a prediction algorithm, and it is not even a machine learning algorithm. Feature scaling is actually a method, or I should say a technique. So let's first see the idea behind feature scaling. The idea, or the objective, of feature scaling is to speed up gradient descent. So what is the meaning of speeding up gradient descent? Say I have a graph something like this, with a curve here, and here we get the minimum value. Our theta parameters are going to be updated continuously to get to the next point, again and again, so it will obviously take some time for the gradient descent algorithm to reach the minimum position. We have to find some way to speed up gradient descent, and feature scaling is one of the techniques to do so. We already know what a feature is, and scaling means that we are going to reduce the range of the features. We're going to take a look at how this technique speeds things up. The way we can speed up gradient descent is by having each of our input values in roughly the same range. Remember, an input is nothing but a feature. So what does this mean? I'm going to take an example to explain this. Let's suppose I have a data set with two features. This is the price, and this is the area. In the area column we have, let's say, 10, then 1000, then 3400 square feet, then 500 square feet, and let's take one more value.
Let's say 7000 square feet. Now, if someone asks what the range of the input is, in our case the range is from 10 to 7000. This range is very high; you can see there are so many numbers in between. There is an observation that the gradient descent algorithm performs slowly if the range of the inputs is high. If the range of the input is very high, then it does not much matter which algorithm you're using; with gradient descent, it is an observation that it will reach the minimum position only after a very long time. That's why we're introducing feature scaling: it is going to reduce this range, making it shorter by using some mathematical formula, and that will speed up gradient descent. Now let me write down a definition of feature scaling. Feature scaling is a method used to normalize the range of the inputs, and instead of inputs I'm going to be more specific and write independent variables. Normalize means we reduce the range to some much smaller range, and we do this because, as observed, it speeds up gradient descent. We can also say variables or features of our data. So that is the definition of feature scaling: we normalize the range. This technique is a part of what is known as data preprocessing. If you have a machine learning algorithm and you want to perform prediction on a data set, you cannot use just any data set as it is.
You have to preprocess the data set into such a form that when you implement machine learning on it, it performs well. You can see that in this data set we have a range that is very high, and this becomes a hindrance that slows down gradient descent. That's why we're going to perform feature scaling: we're going to reduce the range. So let's see how to reduce this range. It is generally observed that if x, the independent variable, lies between minus 1 and 1, or between minus 0.5 and 0.5, the gradient descent algorithm performs faster. Here you can see that the range of x is from 10 to 7000, which is very high. So let's see our very first technique to normalize this range of 10 to 7000. The first technique is very simple. We pick the maximum element of the range, which here is 7000, and divide each training example by this value. So we divide 10 by 7000; then we have 1000 here, so we divide that by 7000; and so on, until 7000 is divided by 7000. In this way I have scaled the feature from the range 10 to 7000 down to a scale of at most one. Every value of x divided by 7000 is at most one, because 7000 divided by 7000 gives the maximum, one, and every other value is smaller than that. So if I have a data set like this, the first task is to find out whether the features in our data set are scaled or not.
If the range is very high, we pick the maximum element, divide each value by it, and obtain a lower range. Then we apply our machine learning hypothesis, then the cost function, and then gradient descent, which will perform faster. So this is the very first technique. The second technique is popularly known as mean normalization. In this technique, too, we scale the range down to some smaller values. What we do is update each x(i). Again I'm going to use the assignment symbol: we do a simultaneous update of all the training examples, and i represents the i-th training example. Mean normalization says that what you have to do is subtract mu from x(i). Now, what is mu? Mu is the mean, or average. You take out the mean of the feature x by adding up its values and dividing by the number of training examples; if you have multiple features, you do this for each feature. Then you subtract that mean from each of these values, from each of these training examples. After subtracting, you divide by s, where s equals maximum minus minimum. What is the maximum value? It is 7000. What is the minimum value? It is 10. So we get 6990, and we divide each of these terms by 6990. This is known as mean normalization, and this is how we scale our feature. It is a very good technique to improve the performance of the gradient descent algorithm.
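Both scaling techniques from this tutorial can be sketched on the area values used in the example. The variable names below are assumptions chosen for illustration; the five area values and the value s = 6990 come from the example itself.

```r
# Area values from the example, in square feet.
area <- c(10, 1000, 3400, 500, 7000)

# Technique 1: divide every value by the maximum element (7000).
scaled_by_max <- area / max(area)
print(scaled_by_max)

# Technique 2: mean normalization, (x - mu) / s with s = max - min.
mu <- mean(area)
s <- max(area) - min(area)  # 7000 - 10 = 6990
mean_normalized <- (area - mu) / s
print(mean_normalized)

# Ranges after scaling: technique 1 tops out at 1,
# technique 2 is centered on 0 with a total width of 1.
print(range(scaled_by_max))
print(range(mean_normalized))
```

With several features, the same two lines would be applied to each feature column separately, each with its own maximum, mean and range.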
14. Logistic Regression in Machine Learning: In this tutorial, we're going to understand logistic regression in detail. We have covered linear regression so far in this machine learning playlist, and we have discussed that supervised learning is of two types, regression and classification. Linear regression belongs to the regression models. In a regression model, we try to obtain a real-valued output, which is a line or a curve that helps us do the prediction. In classification, we have a discrete outcome. Logistic regression is going to be our model that belongs to the classification models. Although the name logistic regression has regression in it, which may confuse us into thinking it is a regression model, it is not. It is called logistic regression because we're going to use regression to do the classification. You'll be able to understand that in a few minutes. In a classification model, we try to classify a particular data point as belonging to some class. For example, take an email spam classification model. Because it is supervised learning, we have a data set, and in that data set we have examples of emails which are spam or not spam. Then, in our classification model, you are given a particular email and you have to classify it as spam or not spam. So let's see how logistic regression can help us do that. Let's get started. Consider this graph here. I have plotted some points on this graph using the housing price data.
Now, in the data set, we have the area and, corresponding to the area, the price of every house. So we have these data points. The data set also mentions whether a house with a particular area and a particular price is a luxurious house or not, which means that we are going to do supervised learning. You can see the red points here: they represent houses that are not luxurious. The blue crosses represent houses that are luxurious. You can observe from this graph that if I take the area somewhere around here, then for houses with an area greater than this point, most of the houses are actually luxurious. Now I'm going to ask you a question: can you draw a line that separates all these red points from all these blue points? The answer is yes. Let me show you how. A line something like this can be drawn, and now you can observe that we have successfully separated both of these classes, the luxurious houses and the non-luxurious houses. This line is helping us do the classification of these two classes. Let's suppose I have a green point somewhere around here. If I want to classify this green point as luxurious or not, I can make use of this line: if the point is on this side of the line, it is not luxurious, and if it is on the other side, it is luxurious. So using this line, I can do the classification. We can draw an important point here: in regression we also try to draw a line or a curve, so we will essentially use regression to perform classification.
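
As a rough sketch of classifying a point by the side of a line it falls on, assume the separating line is written as theta0 + theta1 * area + theta2 * price = 0. The coefficients, the units, and the choice of which side counts as luxurious are all hypothetical; the lecture only draws the line on a graph.

```python
def side_of_line(theta, area, price):
    """Return theta0 + theta1*area + theta2*price; its sign tells the side."""
    theta0, theta1, theta2 = theta
    return theta0 + theta1 * area + theta2 * price


def classify(theta, area, price):
    """Points on the positive side are labeled luxurious (an assumption)."""
    return "luxurious" if side_of_line(theta, area, price) > 0 else "not luxurious"


# Hypothetical line: area + price = 10, in arbitrary units.
THETA = (-10.0, 1.0, 1.0)

print(classify(THETA, 8.0, 6.0))  # large area and price: positive side
print(classify(THETA, 2.0, 3.0))  # small area and price: negative side
```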
If somehow I find this line, then the work of the classification model has been accomplished. Now let's take a look at this second graph, and let's suppose it corresponds to some data set having two classes. The red points represent one class and the blue crosses represent another class. In this case, can you draw a line which separates all the blue points from the red points? You can see in this graph that it's not possible to draw a line which separates them. But is there any other figure which can help us do that? Can I draw some curve or some other figure which separates the blue points from the red points? The answer is yes. What I can do is draw a circle. Just assume that this is a circle. This circle separates the red points from the blue points, so again we can use it to separate one class from the other. Let's suppose I again have an unknown point somewhere here, the green point, and I want to predict whether it belongs to the red class or the blue class. Since I have the circle, I can say that this point does not lie inside the circle. It lies outside, so it belongs to the blue class. That's how we can use regression. Regression is about drawing a line, or in this case a circle or any other figure, which can help us perform classification. In this tutorial we're going to discuss logistic regression, and we're going to take a simple example of linear regression to perform the classification. If we are able to find this line, and the cost function and gradient descent of the classification model, we can just change the hypothesis function to this other figure. So now focus on this graph.
In linear regression, we know that we have a hypothesis function, h(x) = θ^T x. This comes from multivariate linear regression, so we're assuming that we have multiple variables. This is the hypothesis function of this line. Now, my task in classification is to get this line here, so I need the theta parameters, where θ^T stands for all the theta parameters in the hypothesis function. If somehow I can find theta parameters that give me the equation of this line, then I can perform classification. So we're essentially going to find the theta parameters here as well. But now my question is: can we use the cost function of linear regression to find the theta parameters? In the linear regression model, we have a cost function and gradient descent, which find the theta parameters with minimum cost, and we use them to get the best line. The cost function is used to find the line that best fits our data points, which means that the line it produces is going to pass as close to as many data points as it can. It does that because of the underlying concept of the cost function, which is minimum squared error, or minimum squared distance. But you can see that this desired line is not passing through any of the data points. So this line is not a best fit according to the cost function, but according to the classification problem, this is the line that we want. The same is the case with the second graph: this curve is not passing through any data point. So if I try to use the cost function of linear regression to find this line, I will not get it. Instead, I will get a line which tries to pass through most of these data points, something like this.
And this is not a desired line, because it cannot help me do the classification work. So the cost function of linear regression is not going to work. What should we do? We have to transform this whole linear regression graph into a graph in which we can do two important things. I'm going to mark these two things here. First, separate the two classes. You can see that this line can separate these two classes. Second, also best fit our data points, our data set. You can see that this line is not a best fit of our data set, because it is not passing through the points. So we have to transform this graph into one where we can separate these two classes and where the line, or any curve, also passes through most of the points, because only then will we be able to create a cost function that tries to best fit our data set. This is the major problem that we have: we have to find the cost function, and for that the line should pass through most of the data points, which means we have to find the best fit. To solve this problem, we take this whole graph, this hypothesis function and all these data points, and we transform it into a logistic regression graph. We will then obtain a graph which both separates the two classes and best fits our data set. So let's see how we can transform it and what the graph will look like if we perform logistic regression. Consider this graph here. This graph is another representation of the previous graph.
The difference is that instead of having y as the price, we have y as a discrete outcome. Because it's a classification model, we use discrete outcomes: if the value of y is zero, I say that the house is not luxurious, and if the value of y equals one, I say it is luxurious. You can see that all these points are now plotted in this graph. When y equals one we have the blue points, so all the blue points are up here, and similarly all the red points are down here at y equals zero. What we do is find the x coordinate of each of these points and place the point at that position along the x axis. So now we have all the data points. Now let's say that I want to draw a line which tries to best fit these data points. Let's presume that we have a line something like this. You can see that this line passes through only one or two points of our data set, or perhaps none at all. But it is actually trying to pass through both of our classes, so in that sense it is fitting our data set, and it is also separating the two classes. Let's see how this line separates the two classes. Suppose we have a point, the green point from before, plotted here. This point represents a house with some area, since the x axis represents area. We project it onto this line and find the corresponding value of y. Let's assume the value comes out to be 0.3. What we say is that when the value of y is less than 0.5, we assume that it is not a luxurious house.
And if it is greater than 0.5, we assume that y equals one. Since we know that a probability ranges from 0 to 1, we can also say that there is a 0.3 probability that this house is luxurious. In other words, there is a 30% chance of this house being luxurious, so we classify it as not luxurious. In this way we try to find this line. Now you can see there is a problem with this line: it can pass through only two points at most. We need a curve or a line or any other figure that fits most of these data points. This line is not fitting these points, so we cannot use linear regression. One more major disadvantage of using linear regression shows up with an outlier. An outlier is nothing but a data point which belongs to some class but resides outside its cluster. You can see that this class is actually a group, and this point is far away from the group. If you plot this point here, it will be somewhere far out along this level, and you can see that this line will not be able to fit it. So we have to come up with some other curve, and that curve is the logistic function. Instead of using this line as the output, we're going to use the curve given by the logistic function to fit these data points. Then we will use the same method to find y, and then the probability, or percentage chance, to predict whether the point belongs to class zero or class one. All right, so what is the logistic function?
We will have to discuss that first. The logistic function is a mathematical function. It is also known as the sigmoid function, and it is defined as f(x) = 1 / (1 + e^(-x)). So what is special about this function? Let's try to plot it and see what graph we get. This is a website which plots the function you give it. Our function is one divided by one plus e raised to the power minus x, so I will use the keyboard here: we have e, and in the exponent I write minus x. You can see that it has drawn a curve which goes something like this. Let me zoom in on this graph. You can see that it has a range of 0 to 1, but it never reaches one. It approaches one and zero only at infinity. You can also see that the midpoint is 0.5. So we can actually use this curve to best fit our data points. Let's get back to our example. We know that this graph will look something like this: it stays between one and zero and will never be equal to one or zero. It passes through this midpoint, 0.5, and similarly continues here. So now we're going to use this logistic function to best fit our data set, instead of using this line. Let's see what our graph will look like now. I'm going to use another color to represent this curve. Instead of this line, we get a curve which goes like this and then goes down on this side. This is the midpoint, 0.5, here, and this is how the curve goes. I will make it a little bit bold here so that you can clearly see how this curve goes.
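
The behavior of the logistic function described above can be checked numerically. This is a minimal sketch using only the Python standard library; the sample inputs are arbitrary, and the simple formula is fine for moderate inputs (a very large negative x would overflow `math.exp`).

```python
import math


def sigmoid(x):
    """Logistic (sigmoid) function: f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# The output stays strictly between 0 and 1, with the midpoint 0.5 at x = 0.
for x in (-6, -2, 0, 2, 6):
    print(x, round(sigmoid(x), 4))
```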
So now you can see that this straight line was not best fitting our data set, but this function can do that for us. You can see that it will also fit the outlier: it continues all the way to infinity, so it also covers the outlier. Now, if I want to find the predicted value, I can project the point onto this curve and find the corresponding y value, and again classify it as luxurious or not. So what we're going to do is transform this linear regression line into this curve. For that, I'm going to pass the hypothesis function of the line to the logistic function as its input. Instead of x, we will have θ^T x: one divided by one plus e raised to the power minus θ^T x. So h(x) = 1 / (1 + e^(-θ^T x)) is going to be our final hypothesis function, which represents logistic regression. Our task is to find the right theta parameters. Using the cost function and gradient descent, we will be able to choose these parameters so that this curve best fits our data set. Let's see what impact the coefficient has on this curve, to observe how the fitting can be done. Here in the graph, I'm going to multiply x by some coefficient. Let's try 7. You can see that when the value is 7, the curve is actually shrinking toward the center. If I write minus 7, you can see that it shrinks again. This shrinking happens because we multiply x by the theta parameters. Since the curve shrinks toward this side, if there are data points on this side, it will also cover those data points, which means it will fit them. If I take some larger values, you can see it shrinks more and more. So this is how we try to find the fit.
This is the reason why we're using the theta parameters: they shrink this graph somewhere around here, and in this manner we will be able to best fit our data set. We have this hypothesis function, and after that we will use the cost function and gradient descent to calculate the theta parameters such that our curve best fits all the points while minimizing the cost. If you put these theta parameters into this equation, it will also give you this line, and this line will be able to best separate our data set. So this is the basic idea behind logistic regression. We use the logistic function in combination with linear regression to get a curve which solves our problem, which was to satisfy these two conditions. You can see that this curve is separating the two classes, because we're using y as a discrete value, zero or one, and it is also best fitting our data set. In the next tutorial, we're going to study the cost function in detail. That's it for this tutorial. Thanks for watching.
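
A small sketch of the logistic regression hypothesis and the 0.5 threshold, assuming each input x carries a leading 1 for the intercept and a single scaled feature. The theta values are hypothetical, picked to echo the coefficients 1, 7 and 26 tried on the graph.

```python
import math


def hypothesis(theta, x):
    """h(x) = 1 / (1 + e^(-theta^T x)), where x[0] = 1 is the intercept term."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x, threshold=0.5):
    """Class 1 (luxurious) when h(x) reaches the threshold, else class 0."""
    return 1 if hypothesis(theta, x) >= threshold else 0


# A larger coefficient makes the curve steeper around its midpoint, so the
# same input is pushed closer to 1 (or to 0 on the other side).
x = [1.0, 0.5]
for slope in (1.0, 7.0, 26.0):
    print(slope, round(hypothesis([0.0, slope], x), 4))
```

The output h(x) can be read as the estimated chance that the point belongs to class one.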
15. Cost Function and Gradient Descent in Logistic Regression Machine Learning: Now we're going to start with the cost function in logistic regression, and we will also see gradient descent in logistic regression. In the previous tutorial, we came up with a hypothesis function for logistic regression, which was h(x) = 1 / (1 + e^(-θ^T x)). We know that this comes from the logistic function, and now we're going to find the theta parameters, because that's the only thing we need in order to get this hypothesis function. If we get this hypothesis function, we will be able to get the curve which best fits the data set. Now, we know that to find the theta parameters, we first have to find the cost when we choose certain theta parameters. For example, say I choose some value for the theta parameter, such as seven. If you go here, you can see that this is just a website, desmos.com, where you can write the function and it will give you the corresponding curve. Here you can see I have written the logistic function. Instead of θ^T x, I have written minus x in the exponent, which means that the theta parameter is one in this case. Now if I change the theta parameter to two, observe the change in the graph: the graph is shrinking. If I change it to a bigger value, let's say 26, you can see it shrinks more. So if we choose different parameters, we get different curves. What we have to do is find the cost with respect to each of these curves, and then we can apply the gradient descent algorithm to find the minimum cost.
In that way, if we find the minimum cost, we find the theta parameters corresponding to it, and then we will be successful in getting a curve which best fits the data set. So now we're going to discuss the cost function. The cost function that we use for logistic regression is quite different from the cost function that we discussed for linear regression. I'm going to ask you a question: can you use the cost function of linear regression to find the cost of this hypothesis function? The answer is no. If you try to use the cost function of linear regression on this hypothesis function, it is generally observed that you get a curve which is wavy, something like this. This curve corresponds to the cost function. Now, the problem with this type of curve is that if you run the gradient descent algorithm, say starting from a point here, it goes down and reaches a minimum here. This minimum is a local minimum, and you can see that we have another minimum here, which is the lowest point of all. We need that point as the answer, not this one. So the gradient descent algorithm will run into problems because of the local minima. I'm going to write an important point here: we cannot use the linear regression cost function, and the reason is that the cost function will be a wavy curve producing many local minima.
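
One way to see why the squared-error cost misbehaves with the sigmoid hypothesis is to check its curvature numerically. The sketch below is an assumption-laden illustration, not the lecture's own demonstration: it uses a single hypothetical training example (x = 1, y = 0) and one theta. It shows only that the cost is not convex, which is the property that allows local minima to appear once many examples are summed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squared_cost(theta, x=1.0, y=0.0):
    """Linear-regression style cost on one example: (h - y)^2 / 2."""
    return 0.5 * (sigmoid(theta * x) - y) ** 2


def curvature(cost, theta, step=1e-3):
    """Central second difference: positive where the cost curves upward."""
    return (cost(theta + step) - 2.0 * cost(theta) + cost(theta - step)) / step**2


# A convex cost would have non-negative curvature everywhere. Here the sign
# flips between theta = -2 and theta = 2.
for theta in (-2.0, 0.0, 2.0):
    print(theta, curvature(squared_cost, theta))
```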
So we have to come up with a cost function for logistic regression which will not create this wavy curve, but a curve which has only one minimum, also known as the global minimum. The cost function J(θ) in the case of logistic regression is J(θ) = (1/m) Σ_{i=1}^{m} Cost(h_θ(x^(i)), y^(i)): the sum goes from i equals 1 over all m training examples, and inside I find the cost of h_θ on the i-th example x^(i) with respect to y^(i). You can see the differences: I'm not using 1/(2m), that's the first difference, and we are not taking the square. So what we need to do is find Cost(h_θ(x^(i)), y^(i)) such that we get a curve which is not wavy. Now, logistic regression is basically a classification problem. If you look at this graph here, you can see that y can be equal to zero or equal to one. If y equals zero we have one cost, and when y equals one we have a different cost. So we're going to write two cases, y equals zero and y equals one, because the cost we assign is different in each. When y equals zero, I assign Cost(h_θ(x), y) = -log(1 - h_θ(x)). When y equals one, the cost is -log(h_θ(x)). You might wonder why we use these two costs in these two cases. The confusion will be resolved when we plot the cost function in the two cases and see what the graphs look like. So take y equals one, and let's plot the cost against the hypothesis function h_θ(x).
If I plot the graph when y equals one, using this cost, I get a curve which goes something like this: it meets the x axis at one, and it never meets the y axis. Here the x axis represents h_θ(x) and the y axis represents the cost. So when y equals one, you get a curve something like this. Now let's find out what the graph looks like when y equals zero. If you plot the graph corresponding to this case, you get a curve which starts from zero, where it touches the x axis, and goes up something like this. It never reaches the line where h_θ(x) equals one; as h_θ(x) approaches one, the cost tends to infinity. Again the axes are h_θ(x) and the cost. So these are the graphs we get for the cost function in the two cases. If we combine these two cases, we get a cost function J(θ) whose graph is something like this, like a parabola. The special thing about this graph is that it has only one minimum, which we can call the global minimum. You can see that it is in contrast with the earlier graph, which had several local minima. So now I can run the gradient descent algorithm to find this global minimum, and then I take the theta parameters at this point and put them in the hypothesis function, which gives me the curve corresponding to them. Now, instead of having these two cases, y equals zero and y equals one, we can combine both of them into a single equation. Observe this equation really carefully: Cost(h_θ(x), y). I'm going to combine both of these, so I write -y log(h_θ(x)), and
I'm going to write the second term as -(1 - y) log(1 - h_θ(x)). So the combined cost is Cost(h_θ(x), y) = -y log(h_θ(x)) - (1 - y) log(1 - h_θ(x)). Note that the second term must be (1 - y) log(1 - h_θ(x)); what I wrote at first was a mistake. If you put y equals zero in this equation, the first term becomes zero, and the second term gives -log(1 - h_θ(x)), which is exactly the cost for the y equals zero case. If you put y equals one, the second term becomes zero and you get -log(h_θ(x)), which is the other case. So you can see I have a single equation that represents both cases. If I plot the cost function, I get a graph something like this, like a parabola. This is the simplified version of the cost, and now I'm going to finalize the cost function of logistic regression by putting this whole equation inside the sum. The final version of the cost function is J(θ) = (1/m) Σ_{i=1}^{m} [ -y^(i) log(h_θ(x^(i))) - (1 - y^(i)) log(1 - h_θ(x^(i))) ]. This is the final cost function for logistic regression, which I'm going to actually use, and it gives a graph with a single minimum, something like this. So let's see how we can run the gradient descent algorithm.
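
The combined cost and the final J(θ) above translate fairly directly into code. This is a sketch under stated assumptions: the toy data set is made up, each x carries a leading 1 for the intercept, and no guard is included for h being exactly 0 or 1 (where the logarithm is undefined).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def example_cost(h, y):
    """Combined cost: -y*log(h) - (1 - y)*log(1 - h), for y equal to 0 or 1."""
    return -y * math.log(h) - (1 - y) * math.log(1 - h)


def cost_function(theta, xs, ys):
    """J(theta) = (1/m) * sum of example costs, with h = sigmoid(theta^T x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        total += example_cost(h, y)
    return total / len(xs)


# Hypothetical toy data: x[0] = 1 is the intercept term, x[1] a scaled area.
XS = [[1.0, -0.4], [1.0, -0.1], [1.0, 0.2], [1.0, 0.5]]
YS = [0, 0, 1, 1]

print(cost_function([0.0, 0.0], XS, YS))   # every h is 0.5 here
print(cost_function([0.0, 10.0], XS, YS))  # a better fit gives a lower cost
```

With y = 0 only the second term of `example_cost` survives, and with y = 1 only the first, which matches the two separate cases.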
Now let's move on to gradient descent. There is a very good thing about gradient descent: the algorithm is independent of which hypothesis function you're using. That's the fortunate thing about gradient descent. So I'm going to ask you a question: can you use the gradient descent algorithm that we used in linear regression in this case also? The answer is yes, because gradient descent is basically an algorithm that tries to update the values of theta with some learning rate alpha, and it keeps repeating until it converges, so it will be able to find this minimum point. So the gradient descent algorithm is not different in the case of logistic regression. It is the same as in linear regression. One important point about the cost function is that it is a convex function, and a convex function gives you a global minimum, something like this. We already know how the gradient descent algorithm works. If you don't know about gradient descent in linear regression, you can check out the video where we discussed linear regression. The algorithm is: repeat until convergence, updating the values of all the theta parameters simultaneously using this update symbol, and it is again the same as the previous one, with 1/m in front: θ_j := θ_j - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_j^(i). So this is going to be my gradient descent algorithm. Nothing changes, and this algorithm will run and find the global minimum.
Then we find the theta parameters at this position and place them inside the h(x) function, and we get a graph that best fits our data points. Now look at the equation that we used here, the linear regression equation. If you put the theta parameters that you came up with using the gradient descent algorithm into the hypothesis function of linear regression, which is θ^T x, you get a line that separates our two classes. This line is known as the decision boundary, because it helps us decide which class a particular point belongs to. In the upcoming tutorials, we're going to see some important concepts, namely the problem of overfitting, and how you can resolve this problem using regularization. In the next tutorial, we will start with the problem of overfitting. That's all for this tutorial. Thanks for watching.
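
The update rule and the decision boundary can be sketched together as follows. Several things here are assumptions rather than anything the lecture specifies: the toy data, the learning rate, starting from theta = 0, and running a fixed number of iterations instead of testing for convergence.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def gradient_descent(xs, ys, alpha=0.5, iterations=5000):
    """Repeat theta_j := theta_j - alpha * (1/m) * sum((h - y) * x_j)."""
    m = len(xs)
    theta = [0.0] * len(xs[0])
    for _ in range(iterations):
        errors = [hypothesis(theta, x) - y for x, y in zip(xs, ys)]
        # Build all new values first so the update is simultaneous.
        theta = [
            t - alpha * sum(e * x[j] for e, x in zip(errors, xs)) / m
            for j, t in enumerate(theta)
        ]
    return theta


# Hypothetical toy data: x[0] = 1 (intercept), x[1] = scaled area,
# y = 1 means luxurious. The classes overlap slightly on purpose.
XS = [[1.0, -0.5], [1.0, -0.3], [1.0, -0.1], [1.0, 0.1], [1.0, 0.2], [1.0, 0.4]]
YS = [0, 0, 1, 0, 1, 1]

theta = gradient_descent(XS, YS)
# Decision boundary: the point where theta0 + theta1 * x1 = 0, i.e. h = 0.5.
boundary = -theta[0] / theta[1]
print(theta, boundary)
```

Points with a scaled area above `boundary` get h above 0.5 and are classified as luxurious; points below it are classified as not luxurious.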
16. Regularization - Problem of overfitting: Let's talk about the problem of overfitting in the machine learning models that we have made so far. In this tutorial, we're going to consider linear regression, and we're going to see what the problem of overfitting is and how we can resolve it. In linear regression we used the housing prices example, and we plotted a graph of the housing prices data set. Now I'm going to show you some graphs from which we can understand what overfitting is. Consider this first graph here. It is a price versus size graph, and the red points represent the sample data, the training set of housing prices. Let's say we come up with a linear regression model, and after calculating the cost function and gradient descent we get a line something like this. You can see that this line passes through only two points, and most of the points do not lie on the line. Now consider the second graph, which shows price versus size for the same data set, but instead of a straight line we have a curve. The hypothesis function of this curve is not linear, and you can see that the curve is actually trying to fit most of the data points. Now consider the third graph. We again have the same training data, but the hypothesis function we have come up with is a very complicated function, with many curves in it. The reason is that the hypothesis function is too complex and is trying to fit all the data points, no matter how many curves that takes. In the first graph, we have the hypothesis function h(x) = θ0 + θ1 x, which is a linear function. You can see that this type of hypothesis function is not a good fit for the data set.
So we call this first example underfit. Now, consider this graph here. In this graph, we have just one additional square term, θ2 x². You can see it's not a linear function now, so this curve is trying to pass through most of the data points. Now consider the third graph, which is a complex function. You can see it contains polynomial terms of power three and even four, which leads to so many curves. And since it is trying to overfit my data set, that's why we have a complicated function. So the first example is of underfit, or it is sometimes referred to as high bias. The second one, you can see, is just right for our data set, or I can say it's the best fit for my data set. The third one is an example of overfit, because I am using a function which is just too complex. With the curves you can see here, there are so many curves in it that it will pose some problems in predicting or in generalizing to new examples. So overfitting is also known as high variance. Now let's try to define underfitting and overfitting. So underfitting, or high bias, is when the form of our hypothesis function h maps poorly to the trend of the data, which means that it is not passing through most of the data points, and it is usually caused by a function that is too simple or uses too few features. You can see here it is using only a single feature, which is x. In this example, we're using two features, x1 and x2, and in the last one we have x1, x2, x3 and x4. So we have so many features. So we need to just reduce some of the features, and we can come up with this equation. So if we have an overfit example, and we remove these terms somehow, so that we do not consider the x3 and x4 features, we will get the equation of the best fit. So overfitting is basically caused by a hypothesis function that fits the available data.
But it does not generalize well to predict new data, and you know that it is caused by a complicated function that creates a lot of unnecessary curves and angles which are unrelated to the data. So now, how can we solve this problem of overfitting? How can we address overfitting? Now, the first way is to reduce the number of features manually. So you know that if you remove the x3 and x4 features manually, you will get the best fit. But it happens most of the time that we require all the features, so we require x1, x2, x3 and x4 in the equation because they have their significance. In that case, I will use regularization to address overfitting. Now in regularization, we can keep all the features and reduce the magnitude of the parameters theta. We have features x3 and x4 multiplied by theta in the equation, so if I decrease the magnitude of theta, the terms of x3 and x4 will become insignificant. Now, this works well when we have a lot of slightly significant features. So x3 and x4 can be those features which are also significant, and they help in generalizing to new examples.
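To make the idea of reducing the magnitude of theta a little more concrete, here is a small Python sketch. It is only an illustration under assumptions: the parameter values are invented, and the feature is treated as a single size x with polynomial terms up to power four. It compares a four-term-plus hypothesis against the quadratic "just right" form, once with large higher-order parameters and once with those parameters shrunk.

```python
def hypothesis(theta, x):
    """Polynomial hypothesis: theta[0] + theta[1]*x + theta[2]*x**2 + ..."""
    return sum(t * x ** k for k, t in enumerate(theta))


def max_gap(theta_a, theta_b, xs):
    """Largest absolute difference between two hypotheses over the points xs."""
    return max(abs(hypothesis(theta_a, x) - hypothesis(theta_b, x)) for x in xs)


# Hypothetical parameter values, chosen only for illustration.
quadratic = [1.0, 2.0, 0.5]                  # the "just right" form
quartic_large = [1.0, 2.0, 0.5, 0.8, 0.3]    # large theta_3 and theta_4
quartic_small = [1.0, 2.0, 0.5, 1e-4, 1e-4]  # theta_3 and theta_4 shrunk

xs = [i / 10 for i in range(31)]  # sizes from 0.0 to 3.0
gap_large = max_gap(quartic_large, quadratic, xs)
gap_small = max_gap(quartic_small, quadratic, xs)
print(gap_large, gap_small)
```

With the shrunk parameters the x3 and x4 terms are still present, but the curve stays very close to the quadratic one, which is the effect regularization is aiming for.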
17. Regularized Linear Regression Cost Function: Hi. In the previous tutorial, we briefly discussed regularization and the problem of overfitting. So now we have a good idea of how we can prevent the problem of overfitting using regularization. In regularization, we have two very important steps. The first step is that we can try to eliminate all the features which have higher polynomials in our hypothesis function. In the second step, we can reduce the cost of the variables which are of higher polynomials. So we're going to see the second step, which is to reduce the cost of the polynomials which are there in the hypothesis function. So let's start. Let's assume that we have come up with a hypothesis function which is something like this: θ0 + θ1 x + θ2 x² + θ3 x³. And since these are different features, I can write x1, x2, x3. And let's suppose we have one more additional term, which is θ4 x⁴, x raised to the power four. So we have a function like this. We have seen in the previous graphs that the hypothesis function which contains only these three terms is leading to our best fit. If we have fewer terms than these, we have underfit, and if we have these two terms, we will be in the case of overfitting, right? So in the second step of regularization, we know that we will have to reduce the influence of these two terms. Now why is that? The first thing is that these features, which are x3 and x4, are significant features in building the hypothesis function, or the prediction model. So these are significant, but we can see that since they have a higher polynomial, they will cause the overfitting.
So what we can do is, instead of eliminating these terms, we can assign the values of theta in these terms a very low cost. And if the cost is low, then you can see the influence of these two terms will be less. So what we want to do with these two terms is reduce their influence, because if we reduce the influence of these two terms, the significant terms are these three terms, and we can get the best fit of our prediction model. Now, how can we do that? See, the hypothesis function will remain the same in regularization. We will have to change the cost function, because only by using the cost function can we change the cost of these thetas, θ3 and θ4, and make them very low. And in doing so, we call the cost function the regularized cost function. All right, let's see how our cost function will change in order to achieve a best fit. Our main objective is to achieve a best fit, and we will have to reduce the influence of these terms. So the cost function which we already know is J(θ), and we want the minimum of that, right? In linear regression we have one upon 2m, and then we have the summation from i = 1 to all the training examples, which is m. Inside this I have h_θ(x^(i)) minus y^(i), squared. All right, so the square is outside of this bracket. So we have J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))². So this was our cost function in the previous tutorials on linear regression.
We are already familiar with this cost function. Now what we have to do is reduce the influence of these features, which are x3 and x4, so we will have to get a very low value for θ3 and θ4. We will have to change this cost function such that we are going to reduce the cost of θ3 and θ4. All right, so you can see here in the cost function, we have inflated it with two terms, which are θ3² and θ4². Now here, instead of writing 1000, I will write a lambda, which is a variable, or we sometimes call lambda the regularization parameter. And we call it the regularization parameter because you can see that this parameter will actually help us do the regularization. The gradient descent algorithm will try to reduce the cost function, and we know that the cost function will go nearly to zero. So when the gradient descent algorithm tries to minimize the cost function, it will also try to minimize these two terms, and in order to minimize these two terms, if we have a regularization parameter which is of a very high value, it will have to reduce the values of θ3 and θ4 by a very significant measure. And so it will be able to prevent the problem of overfitting by reducing the influence of these two parameters. So now we can see that in this hypothesis function, which is here, there are only two higher polynomials. But instead of two polynomials, we can have some more polynomials or some more features.
So in that case we will generalize our regularized cost function as J(θ) = (1/2m) [ Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))² + λ Σ_{j=1}^{n} θ_j² ]. The sigma goes from i = 1 to m, and I write h_θ(x^(i)) minus y^(i), the whole squared, plus lambda. Now, instead of writing these two terms, I can generalize it to θ_j squared, and the summation, I should say j because we have already used i here, goes from j = 1 to n. All right, so this penalizes the model's parameters of the polynomials that are causing the problem of overfitting. So in this manner, now we have a cost function that will be able to give me the cost in such a manner that it has reduced the influence of the higher polynomials, making our hypothesis function a best fit.
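The regularized cost function above can be sketched in a few lines of Python. This is a minimal sketch rather than a definitive implementation: it assumes a single input x with polynomial terms, a 1/(2m) factor in front of both the squared error and the penalty, and no penalty on θ0 (the penalty sum starts at j = 1). The function names and the sample numbers are made up for the example.

```python
def hypothesis(theta, x):
    """Polynomial hypothesis: theta[0] + theta[1]*x + theta[2]*x**2 + ..."""
    return sum(t * x ** k for k, t in enumerate(theta))


def regularized_cost(theta, xs, ys, lam):
    """J(theta) = (1/2m) * [sum of squared errors + lam * sum of theta_j^2, j >= 1]."""
    m = len(xs)
    squared_error = sum((hypothesis(theta, x) - y) ** 2 for x, y in zip(xs, ys))
    penalty = lam * sum(t ** 2 for t in theta[1:])  # theta_0 is not penalized
    return (squared_error + penalty) / (2 * m)


# Hypothetical training data that the line 1 + 2x fits exactly.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]
theta = [1.0, 2.0]

cost_without_penalty = regularized_cost(theta, xs, ys, lam=0.0)
cost_with_penalty = regularized_cost(theta, xs, ys, lam=3.0)
print(cost_without_penalty, cost_with_penalty)
```

Even though the line fits the data exactly, the cost with lambda set to 3 is not zero, because the size of θ1 itself now contributes to the cost. That extra cost is what pushes gradient descent towards smaller parameters.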
18. Regularized Gradient Descent: Now, the next step is to consider the gradient descent algorithm, because the gradient descent algorithm works on the cost function. And since we have changed our cost function, and it now contains some more variables, the gradient descent algorithm will also change. We'll study the change in the gradient descent algorithm now. So in gradient descent, you're going to see that we have these theta parameters. What we try to do in gradient descent is minimize the cost function. This is the basic idea behind the gradient descent algorithm. So what will the algorithm be? We know that the algorithm says: repeat until convergence. Now we have some changes in the gradient descent algorithm, because our cost function has also changed. And the algorithm says that for θ0, we will simultaneously update the value of θ0 as θ0 := θ0 - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_0^(i). So the update of θ0 remains unchanged. We're going to update θ0, but you can see that the cost function's penalty is not considering this value, and the reason is that we are not penalizing, we are not assigning any penalty to θ0. This is an important point: in gradient descent, the update of the θ0 variable will remain unchanged, because we're not penalizing θ0 with our regularization parameter. We're not making any changes to θ0, right? We're only making changes to θ3 and θ4. So in that case, for θ_j, we're going to update θ_j, because you can see in this equation we have j = 1 to n. So we will update only these parameters in such a way
that we will be able to regularize this term. So it will go like this: θ_j := θ_j - α [ (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_j^(i) + (λ/m) θ_j ]. This term is again the same, but now we have to add the regularization parameter lambda divided by m, and then we write θ_j, right? So this is the only change that will come in our gradient descent algorithm, and the reason is that we are penalizing the θ_j values, so it will change the values of θ_j. And remember, it is lambda by m. So this term performs regularization. And there are some important things to discuss here, mainly concerning lambda, which is the regularization parameter. So we have seen how the gradient descent algorithm performs regularization by introducing this term in the update. All right, so there is one more important thing: we can actually simplify this update of θ_j. So let's see how. On simplification, we can write it something like this: θ_j := θ_j (1 - α λ/m) - α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_j^(i). So you can see here, we're multiplying lambda by m with θ_j, our regularization term, and we're considering this term, and both have θ_j. So we have taken θ_j as common and written one minus alpha times lambda by m, as you can see here. And then we write our remaining term, which is alpha times one upon m, sigma from i = 1 to m, of h_θ(x^(i)) minus y^(i), not squared, times x_j^(i). All right, so now on simplification, you can see we have not done anything major here. We have really
just taken θ_j common in these two terms, and we have written one minus alpha times lambda by m. Now, why have we done this? If we look at this term here, this whole equation and the equation here, you can see they are the same, but we have only one difference, which is the introduction of this factor, this whole (1 - α λ/m). So there is an important point about this. The reason we did this simplification is that this term, 1 - α λ/m, has a very important property: if this value is less than one, then it can be seen as reducing the value of θ_j, because it is multiplying θ_j. So it will be less than one; it can be 0.1 or something like that. So it will be able to reduce the values of θ_j. And since it reduces the values of θ_j, which are in our cost function, it is going to prevent the problem of overfitting, and it will keep on reducing the value of θ_j after every iteration. We know that we're going to update the values of θ_j, and at every step it is going to decrease the value of θ_j on every update. And the second term you can see here, this term is exactly the same
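The regularized update described in this tutorial can be sketched as a single gradient descent step in Python. This is a hedged sketch, not the course's own code: it assumes each training row already contains the constant feature x_0 = 1, it uses the simplified form with the factor (1 - alpha * lambda / m), and the function names and numbers are made up for illustration.

```python
def predict(theta, x):
    """Linear hypothesis theta^T x, where x[0] is the constant feature 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def gradient_step(theta, X, y, alpha, lam):
    """One simultaneous update of all parameters with regularization."""
    m = len(X)
    errors = [predict(theta, row) - target for row, target in zip(X, y)]
    new_theta = []
    for j in range(len(theta)):
        grad_j = sum(e * row[j] for e, row in zip(errors, X)) / m
        if j == 0:
            # theta_0 is not penalized, so its update is unchanged.
            new_theta.append(theta[j] - alpha * grad_j)
        else:
            # theta_j is first shrunk by (1 - alpha * lam / m), then updated.
            shrink = 1 - alpha * lam / m
            new_theta.append(theta[j] * shrink - alpha * grad_j)
    return new_theta


# Hypothetical data that theta = [0, 1] already fits exactly.
X = [[1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.0]
updated = gradient_step([0.0, 1.0], X, y, alpha=0.1, lam=2.0)
print(updated)
```

In this example the prediction errors are all zero, so the only thing that changes is the shrinking of θ1 by the factor 1 - 0.1 * 2 / 2 = 0.9, while θ0 stays where it was.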
19. Unsupervised Learning: Hi. Welcome to ThinkX Academy. In this video, we're going to discuss unsupervised learning. We have already discussed supervised learning, and we have discussed linear regression, logistic regression and a lot of other topics in supervised learning. So now we have a good idea of how we can do prediction and how we can train models to perform prediction. We even performed linear regression in R to see how we can train a model. So we saw supervised learning in action. Now we're going to head towards unsupervised learning. We're going to see exactly why unsupervised learning is needed and how we can apply it. Now, unsupervised learning is a really interesting topic, it has a lot of applications, and a lot of big companies use unsupervised learning to find and understand patterns inside data. So if I write something about supervised learning: in supervised learning, we have a very important thing, which is a labeled data set. Take the example that we have been using for all the tutorials in supervised learning, which is the housing prices. The housing prices were our data set that we were using to train the model. Now, in the housing prices data set, we have area and we have price. So for a house with a given area, I have a mapping to a price, and I can use this mapping to train the model by doing the same thing, which is finding the hypothesis function, the cost function and gradient descent. And then I can predict the price of a house given some area. So in supervised learning, I can say we have the supervision of labels in a data set.
But in unsupervised learning, we do not have a labeled data set, so the data set is unlabeled. Now, most of the data sets that are available in most applications are unlabeled. Unlabeled means that the data set does not have labels such as price. So what we have is only one column, let's say x, with some features, but I do not have a mapping of those features to a price, right? So we have an unlabeled data set; we do not have a labeled data set in unsupervised learning. So what exactly do we do in unsupervised learning? We try to understand patterns in the data set. In the second tutorial of this machine learning playlist, we discussed that in machine learning we try either to predict something from our data set or to understand patterns in the data. So basically, unsupervised learning is really helpful to find and understand the patterns in data, and more specifically, unlabeled data, right? So there are a lot of applications of unsupervised learning. The very first application that I'm going to show you is how Google uses unsupervised learning to group the websites related to some keywords. You can see that on the Internet we have a lot of sites. The Internet is a connection of a lot of websites and a lot of web pages, right? And those websites contain some more pages. So let's suppose we have a Google search box here and I enter some keyword. Let's suppose I enter "data science" as the keyword.
So what Google does is, Google has a collection of all those websites and web pages, and they are basically an unlabeled data set. So we have only those websites; we just have the links for them and we have some data for them. Now, what Google does is use unsupervised learning to understand the group of websites or web pages that belong to this keyword, data science, and it will also perform some ranking methods and some complex algorithms to show you some of the best results related to data science. So that's one good application of unsupervised learning. Just as supervised learning is classified into linear regression and logistic regression, unsupervised learning can be performed, or I can say we can understand patterns in data, using two very important techniques that we are going to use. The first technique is known as clustering, and the second technique is association. So from the next tutorial onwards, we will study these two topics, which are clustering and association. We will also discuss some more techniques and algorithms, including k-means clustering, hierarchical clustering, and a lot of other topics such as dimensionality reduction. And these are very important techniques because they're used in unsupervised learning. So in the next tutorial, we will discuss what clustering is, and we'll get a basic idea of how we can do clustering if we have an unlabeled data set. So the basic idea is that we use unsupervised learning to find the groups of data on the basis of some similarities, and we will discuss those similarities in detail in the next tutorial on clustering. So before that, I just want to cover one more very important application of unsupervised learning.
And it is to detect security breaches. So let's suppose we have a data set of some good users, right? Good users means they are authenticated by the security system, and since they're authenticated, the unsupervised learning will draw some similarities within these users, right? Now, let's say there is a bad user, some anonymous user, let's say user Z, and this user is not authenticated, and it is using some vulnerabilities in the system to extract some of the very important data of these good users, right? So since these good users have some similarities in them, we can use unsupervised learning to understand this pattern of similarities. And if there is any anonymous user that tries to enter the system, our unsupervised learning will be able to understand that there is some anomaly, or I can say there is some different type of pattern that is being generated in our data. In that case, you can detect that this user is actually causing the security breach. So this is one more application of unsupervised learning. As we discuss more and more topics, including clustering and association, we will get to see some more applications of unsupervised learning.
20. Clustering Analysis: In the previous tutorial, we discussed unsupervised learning in detail, and we saw how unsupervised learning can be used to find and understand interesting patterns in a data set. So if we have an unlabeled data set and we want to find some interesting patterns in it, we use unsupervised learning. One way of understanding patterns is to do clustering. So in this tutorial we will discuss what clustering is, what cluster analysis is, and I will give some examples to explain what exactly clustering is. In the future tutorials we will discuss some clustering algorithms like k-means clustering, and we will discuss the applications of clustering, and in this manner we will be able to do unsupervised learning. So let's start with cluster analysis with the help of an example. I'm going to give you a very simple example to understand what clustering is and how it is done. So let's suppose I have a box here, and inside this box I have some shapes. Let's suppose I have some triangles. It can be any type of triangle, so the box has different types of triangles with different dimensions, right? Then I have some more figures, like some squares and rectangles of different sizes like this. And finally, I have some circles inside this box of different sizes, like this. Let me add some more shapes here. So inside this box, I have different shapes, right? So what I will try to do is make three clusters, or three groups; basically, clusters are just groups. So in these clusters I will try to place the shapes in the order of similarity. We know that these are the triangles of different shapes, and I will assign them to a single cluster, right?
So all the triangles with different shapes, I'm going to assign them inside this cluster, and the circles of different sizes will be assigned to this one. And then finally I will assign all the squares and rectangles here. So now you can see what I'm trying to achieve: in this whole box, I have different shapes, and I'm assigning them to different clusters because these shapes are similar to each other. You can see triangles have three sides, so they're similar to each other. The circles are there, and the squares and rectangles are in this cluster. Now we can make an analogy here. Let's suppose I have an unlabeled data set, and I try to plot that data set in a graph like this. Let's suppose this is my graph, and I will plot some points of that unlabeled data set in this graph, like this. I will also add some more points. All right, so let's suppose I have this unlabeled data set on this graph. Now, in clustering, what we do is find the similarity among the data objects of the unlabeled data set, and we try to organize them into different clusters. So after clustering, this graph will look something like this. I will just mark the points. Basically, a clustering algorithm is used to convert the whole unlabeled data set into different clusters, right? This is done by many different clustering algorithms, which we will study in the future tutorials. One of the algorithms is k-means clustering, which converts the whole data set into k clusters, k being the number of clusters. So let's suppose I have an unlabeled data set like this. After running a clustering algorithm on this graph, it will be able to find different clusters in this data set. So let's suppose this is the first cluster, and let me include this point.
So this is the first cluster. And let's suppose this is our second cluster, which is here, and let's suppose we have another cluster as well, which is the last one, which is here. So in this manner we can convert an unlabeled data set into different clusters. Now, there are some important facts about clustering analysis which we can look at in this graph. There are two important terms: the first is the inter-cluster distance, and the second is the intra-cluster distance. The clustering algorithms convert this whole data set into these clusters on the basis of the inter-cluster and intra-cluster distances, right? So let's see what intra-cluster distance is first. Inside this cluster, you can see these data points. The distance between these data points is known as the intra-cluster distance. And the distance between two different clusters, for example this cluster and this cluster, is known as the inter-cluster distance. So one thing that a clustering algorithm does is reduce the intra-cluster distances, so that the points in a cluster are more similar to each other. The lower the intra-cluster distances, the higher the inter-cluster distances will be, right? So if you shrink down this whole cluster, you reduce the intra-cluster distances and you increase the inter-cluster distances, and you will get clusters which are far away from each other. Now, in this same graph, I can also obtain some different clusters. For example, we can create a cluster which is this one, and I can create one cluster like this. So clustering is actually ambiguous.
Clustering is ambiguous in nature, because even we do not know how to create these clusters. There is no specific method of creating these clusters, right? So it is ambiguous how to create the clusters, but the algorithms will try to find the distances among the objects, the similarities among the data points, and the dissimilarities between the clusters, to find the different types of clusters. Now, the important thing is what happens if we are running a clustering algorithm and we have some new data object, or a new data point, right? Let's suppose there is a green point somewhere here. You can see the distance of this green point from this cluster is this much, from this one it is this much, and from this cluster it is large enough. So if I have some new data point, I can tell how similar it is with respect to this cluster, this cluster and this cluster. This data point is very near to this cluster, so I can tell that this point is similar to this cluster. So that's a very essential point, and this is what helps in achieving unsupervised learning. We create the clusters to achieve this, to tell which cluster this new data object or point belongs to. Now, in the next tutorial we will discuss the applications of clustering. After we have discussed the applications of clustering, we will study the different types of clusters; even these clusters have different types, so we will study them. Then we're going to move on to the clustering algorithms, and mostly the different types of clustering algorithms like k-means clustering. And basically, k-means clustering will be able to help us convert the whole data set.
That is, the whole unlabeled data set, into different clusters. So that's all for this tutorial. Thanks for watching.
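The ideas from this tutorial, intra-cluster distance, inter-cluster distance, and deciding which cluster a new point is nearest to, can be sketched in Python. This is only an illustration under assumptions: the clusters are given by hand rather than found by an algorithm, the distance is the ordinary straight-line distance, the distance between two clusters is measured between their centers, and all names and coordinates are invented for the example.

```python
import math


def distance(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Center of a cluster: the average of its points, coordinate by coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def mean_intra_distance(points):
    """Average distance between every pair of points inside one cluster."""
    pairs = [(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return sum(distance(p, q) for p, q in pairs) / len(pairs)


def nearest_cluster(point, clusters):
    """Name of the cluster whose center is closest to the given point."""
    return min(clusters, key=lambda name: distance(point, centroid(clusters[name])))


# Hypothetical, hand-made clusters.
clusters = {
    "A": [(1.0, 1.0), (1.5, 1.2), (1.2, 0.8)],
    "B": [(8.0, 8.0), (8.5, 7.6), (7.8, 8.3)],
}
inter_distance = distance(centroid(clusters["A"]), centroid(clusters["B"]))
intra_a = mean_intra_distance(clusters["A"])

new_point = (2.0, 1.5)  # the "green point"
print(nearest_cluster(new_point, clusters), intra_a, inter_distance)
```

Here the points inside cluster A are much closer to each other than the two clusters are to one another, and the new point is assigned to the cluster whose center it is nearest to.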
21. Similarity and Dissimilarity: Hi. In this tutorial, we're going to study similarity and dissimilarity. So in the previous tutorials in this machine learning playlist, we discussed what exactly unsupervised learning is, and now we have a good idea that unsupervised learning is all about finding interesting patterns in data. We also covered clustering analysis in detail in the previous tutorial, and what we saw was that we can form clusters that have some common pattern or some similar features in them. So let's suppose I have a few data points, something like this. I'm just drawing some random points. Let's suppose these are the data points that I have drawn from a data set. Now, one important thing is that this data set is not a labeled data set, and that's why it is unsupervised learning. Now, what I try to do is find out that, okay, these points are similar, so I enclose them in one cluster, and then I find some more data points which are similar to these, and I start expanding this cluster like this. So now I have obtained a cluster here. Now again, I will find a cluster somewhere here. Let's say this is one more cluster, and let's include this point also. And then finally, we have created one more cluster, something like this. And there is one more cluster which has these two points. So we know that these are the clusters which have some common patterns. Now, one thing about clustering is that clustering is ambiguous. Ambiguous means it is confusing how to actually create clusters. For example, with these data points, someone could have come up with different clusters; someone could include this cluster in this one, and there can be some different clusters that can be formed. So it is ambiguous in nature.
Now, one important thing about clustering that we have discussed is similarity and dissimilarity. So I will write: data points inside a cluster. Let's suppose I have found a cluster and there are some data points inside it. Data points within a cluster are more similar to each other than to data points in different clusters. All right, so what does this point mean? Let's suppose I have two data points in this cluster. You can see these two red points. These points in this cluster are more similar to each other than to data points which are in different clusters. So this point and this point are basically dissimilar to each other, and these two data points are similar to each other. So one thing is sure: I can use this similarity and dissimilarity to form clusters. If I can measure how similar two data points are with respect to each other, I can tell that, okay, these two points are similar to each other, so they are within a cluster. And similarly, if I have some measure, some metric, to calculate the dissimilarity between two different points, I can say, okay, they belong to different clusters. So basically these two measures are important in building clusters, and that's what the clustering algorithms do. That's how the clustering algorithms work. In the future tutorials, we will discuss some of the clustering algorithms. One is k-means clustering. In k-means clustering, we divide the data points into k clusters, so we will use the dissimilarity and similarity measures. Basically measures, or I can say metrics. Let's take an example.
Let's say we want to calculate the distance between two points A and B. If I just say that these two points are close to each other, this will be an ambiguous statement, right? Instead, what I can do is say that these are close because the distance between them is only one meter. So basically, I have a measure to calculate how close these two points are. And that's the case with similarity and dissimilarity. I need some measure to calculate how similar data points are to each other and how dissimilar they are to each other. So that's the basic meaning of similarity and dissimilarity, and the clustering algorithms will use these measures of similarity and dissimilarity. So what are these measures? In clustering, we have some measures that can calculate similarity and dissimilarity. First of all, one important thing is that if you can calculate the dissimilarity, you can also tell whether the data points are similar to each other, and vice versa is also true. One more important point, which I forgot to discuss, is this: we have previously discussed that the data points inside a cluster are similar to each other. The greater the similarity between the data points within a cluster, the more they differ from data points in different clusters. This is an important point also. All right, so let's come to the measures of similarity and dissimilarity. The question that arises is how we can actually figure out how similar two data points are, because if we are able to do that, we can devise algorithms that will be able to find the clusters, and then we will be able to understand different patterns. So what are these measures? Some of the important measures are Euclidean distance, Manhattan distance, Hamming distance and Minkowski distance.
These are some of the measures, and basically they have formulas which can calculate how similar two points are with respect to each other. So we can use these measures to calculate whether two points are similar or dissimilar, and in this manner we can devise the clustering algorithms. So indeed it is necessary that we first understand these measures, because only then will we be able to devise clustering algorithms like k-means clustering. So first of all, I would like to define what similarity is. I will write similarity or dissimilarity, something like this, and give the formal definition. Similarity or dissimilarity is defined as a measure between two objects or data points, and it is a numerical measure. So it is basically a number: a numerical measure between two data points or objects of the degree to which the two objects are different or similar. In the case of similarity, we say the degree to which the two objects are similar. In the case of dissimilarity, we say the degree to which the two objects are different. So this is the formal definition of similarity and dissimilarity. In the next tutorial, we will study these measures, such as Euclidean distance and Minkowski distance.
22. Measures of Dissimilarity: In the previous tutorial, we discussed what similarity and dissimilarity between objects are, and why it is necessary to calculate them. The reason is that we want to use similarity and dissimilarity to form clusters within data. In this tutorial, we will see the measures of similarity and dissimilarity, like Euclidean distance and Minkowski distance, and we will study their properties. And finally, we're going to see the formulas that will help us calculate similarities and dissimilarities. The first term that I'm going to define is the proximity measure. It is a very important term. A proximity measure is basically a function of the proximity between two objects within the data, and we use it to measure similarity and dissimilarity. So this is a function of the proximity between two objects, used to determine similarity and dissimilarity. All right, so the proximity measures that we will study are the Euclidean distance and its variation, which is the Minkowski distance. One important thing is that these two distances are useful when we have dense data, for example temperature readings such as time series, or 2D points. For this type of data, these distances are helpful for finding similarity and dissimilarity. And then come the Jaccard and cosine distances. These two distances are for discrete data. Basically, instead of discrete data, we should say sparse data, so let me correct it: these are for sparse data, for example likes and documents. So these are the proximity measures, and now the question is how to calculate them.
So these are the proximity measures that we have, and now we are going to start with the first one, which is the Euclidean distance. The Euclidean distance d is the distance between two data points x and y, which is defined as follows. I write the square root, and inside it the summation symbol, which goes from k = 1 to n, and here I write x_k minus y_k, the whole squared: $d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}$. In this formula, n is the number of dimensions that we have, for example 2D or 3D or even higher. So n represents the number of dimensions, and x_k and y_k represent the k-th attributes, or we can simply say components, of x and y. We use this Euclidean distance to find the distance between two data points, and we will study the properties of Euclidean distance. There are some very interesting properties of Euclidean distance, and we will see them. Basically, this formula is very useful, and you should remember it, because we're going to use it in the k-means clustering algorithm. Another variation of the same formula is basically a generalized form of it, which was given by Minkowski, and it is known as the Minkowski distance. The formula is that if you want to calculate the distance d(x, y), you write the summation from k = 1 to n of the absolute value of x_k minus y_k, raised to the power r. So x_k minus y_k comes under the modulus sign. And finally, this whole sum is raised to the power 1 over r: $d(x, y) = \left( \sum_{k=1}^{n} |x_k - y_k|^r \right)^{1/r}$. So this is the distance which was given by Minkowski. If you put different values of r in it, you will get different distances, including the Euclidean distance.
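As a minimal sketch of the two formulas above, assuming data points are plain Python tuples of numbers (the function names are illustrative and not part of the course material):

```python
import math

def euclidean_distance(x, y):
    """Square root of the summed squared differences over all n components."""
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))

def minkowski_distance(x, y, r):
    """Generalized form: the sum of |x_k - y_k|^r, raised to the power 1/r."""
    return sum(abs(xk - yk) ** r for xk, yk in zip(x, y)) ** (1 / r)

x = (1.0, 2.0)
y = (4.0, 6.0)
print(euclidean_distance(x, y))     # 5.0, since sqrt(3^2 + 4^2) = 5
print(minkowski_distance(x, y, 2))  # 5.0, r = 2 gives the Euclidean distance
```

Here n is simply the length of the tuples, so the same functions work for 2D, 3D or higher-dimensional points.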
So let's put in the value r = 1 first. If you put r = 1, it is most commonly known as the L1 norm. Because the value of r is one, we call it the L1 norm. If we have r = 2, we generally call it the L2 norm. And if r is infinity, we call it the L-infinity or L-max norm. So these are just some fancy names for different values of r. All right, so what happens if we have r = 1? When r is equal to one, one very common example is the Hamming distance. This means that if the value of r is one, this formula will be able to calculate the Hamming distance. Basically, the Hamming distance is used to calculate the number of bits that are different between two binary objects. So if we have r = 1, we call it the Hamming distance, and if r = 2, we call it the Euclidean distance. So you can see that this single formula, the Minkowski distance, gives us different types of distances. If you put r = 2 here, you get the Euclidean distance formula. All right, so the next one is r equal to infinity, and this is known as the supremum distance, or L-max. So now we have a measure. Most importantly, we're going to use Euclidean distance in clustering algorithms. So now we have a formula which we can use to calculate the distance between two points, and then we can use it to judge whether the two points are similar or dissimilar. Now, let us study some of the properties of Euclidean distance, starting with the first one, which is known as positivity.
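The different values of r can be tried out with a short sketch. It assumes tuples of numbers as points, treats r = infinity as the largest component difference, and uses illustrative function names that are not from the course:

```python
import math

def minkowski_distance(x, y, r):
    """Minkowski distance; r = math.inf gives the largest component difference."""
    diffs = [abs(xk - yk) for xk, yk in zip(x, y)]
    if math.isinf(r):
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1 / r)

def hamming_distance(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(1 for ak, bk in zip(a, b) if ak != bk)

x, y = (1, 2), (4, 6)
l1 = minkowski_distance(x, y, 1)            # L1 norm: 3 + 4 = 7
l2 = minkowski_distance(x, y, 2)            # L2 norm: Euclidean distance, 5
l_max = minkowski_distance(x, y, math.inf)  # L-max norm: max(3, 4) = 4

bits_a = (1, 0, 1, 1, 0)
bits_b = (1, 1, 1, 0, 0)
# For 0/1 vectors, the r = 1 distance counts the bits that differ.
same = minkowski_distance(bits_a, bits_b, 1) == hamming_distance(bits_a, bits_b)
print(l1, l2, l_max, same)
```

The comparison at the end only holds for vectors of 0s and 1s; for general numeric data the r = 1 case is the Manhattan distance mentioned in the previous tutorial.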
The positivity property basically says that the distance d from a point x to a point y is greater than or equal to zero for all values of x and y, and it is zero only when x and y are the same point. So the positivity property says that the Euclidean distance between two points is always greater than or equal to zero. Now, the second property is symmetry. The symmetry property says that the distance from a point x to a point y is always equal to the distance from y to x: $d(x, y) = d(y, x)$ for all values of x and y. So now we have two properties, and we have the last property, which is the triangle inequality. The triangle inequality says that the distance from a point x to a point z is always less than or equal to the distance d(x, y) plus the distance d(y, z), for all points x, y and z: $d(x, z) \le d(x, y) + d(y, z)$. So these are basically the properties of Euclidean distance. Now, why have we studied these properties? Because if a measure such as the Euclidean distance satisfies all of these properties, then we call it a metric. Right? So if the distance measure that we have found satisfies all of these properties, we call it a metric. The next thing is about the third property. I'm going to write that if the triangle inequality is satisfied, then it can be used to increase the efficiency of clustering algorithms. So that's why these properties are necessary: now we can tell which measures are metrics, and we can also use this third property to increase the efficiency of clustering algorithms. So that's it for this tutorial. In the next tutorial, we will study different types of clustering algorithms. That's all for this tutorial. Thanks for watching.
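The three properties can be checked numerically on a handful of sample points. This is only a sketch: it tests a finite sample rather than proving anything, and the helper name and the sample points are assumptions made for illustration:

```python
import itertools
import math

def euclidean_distance(x, y):
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))

def satisfies_metric_properties(points, dist, tol=1e-9):
    """Check positivity, symmetry and the triangle inequality on sample points."""
    for x, y in itertools.product(points, repeat=2):
        if dist(x, y) < 0:                      # positivity
            return False
        if (dist(x, y) == 0) != (x == y):       # zero only for the same point
            return False
        if abs(dist(x, y) - dist(y, x)) > tol:  # symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:  # triangle inequality
            return False
    return True

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (-1.0, 2.5)]
print(satisfies_metric_properties(points, euclidean_distance))

# The squared Euclidean distance breaks the triangle inequality on these points.
squared = lambda a, b: euclidean_distance(a, b) ** 2
print(satisfies_metric_properties(points, squared))
```

The second call shows why the properties matter: squaring the distance keeps positivity and symmetry but loses the triangle inequality, so the squared distance is not a metric.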
23. Partitional Clustering And Hierarchical Clustering: In the previous tutorial we covered Euclidean distance, and before that we covered clustering analysis. So we have seen how clusters are made on the basis of similarity and dissimilarity, and now we also have a measure to calculate similarity and dissimilarity, using Euclidean distance or Minkowski distance. Now we're going to study the different types of clustering that are available, and we will study them as one type versus another. For example, the first one is partitional versus hierarchical clustering. In this manner we will learn the different types of clustering. The first type is partitional clustering, and we'll see how it is different from hierarchical clustering. There are some more clustering methods, such as exclusive clustering and partial clustering, and we'll study them one by one. In this tutorial, we will study partitional versus hierarchical clustering. Both of these clustering methods are very popular, but partitional clustering is the most commonly used. So let's start with partitional clustering. What is partitional clustering? In partitional clustering, we have a data set with D objects, and what we do is divide these D objects into some partitions, K partitions. So basically we have a set of data objects, and we try to divide them into partitions. For example, let's say these are the data points. This is the whole data set, so let's say these are the D objects, and let's suppose that on these axes we have the features. So that is what a partitional clustering algorithm works on. There are different types of clustering.
And we have different types of clustering algorithms to perform them. So, in partitional clustering, these are the objects, and what a partitional algorithm will do is try to divide these D objects into K clusters, or K partitions. You can see these are the data points that we have, and now they are divided into three clusters. There can be more clusters or fewer clusters, but the simple idea is to divide the D objects into partitions. Now, one common algorithm that does this partitioning is known as the k-means clustering algorithm, and in the future tutorials we will see how k-means clustering works. K-means means that we divide the D objects into K clusters. Here you can see that in this example K is equal to three. So we will develop a clustering algorithm, we will just give the whole data set to the algorithm, and the algorithm will automatically create these types of clusters. I will just supply the value of K, and it will divide the data into clusters. Now, the idea is that in k-means clustering, for the value of K, we choose some points, which are also called centroids. For example, I choose three points here, and these points, the centroids, are used to calculate the Euclidean distance from the various objects, and in this manner we try to develop clusters. We will see k-means clustering in detail in the coming tutorials. So now the next type of clustering is hierarchical clustering. In hierarchical clustering, we again have data objects. Let's suppose this is my data, and these are the features on these two axes. What I try to do in hierarchical clustering is build a hierarchy. A hierarchy means that first I will say that, okay, this whole cluster is my single cluster.
Inside this cluster, I make some more clusters. For example, this is one cluster, let's say this is one cluster, and then again, this is one cluster. And within these clusters, we can make some more clusters: this can be one cluster and this can be one cluster, and this can go on and on like this. So basically hierarchical clustering is a type of nested clustering, which means the clusters are nested inside other clusters. Partitional clustering, in contrast, we also call unnested clustering. Now, in this type of clustering, which is hierarchical clustering, what we do is try to build a tree, or, more formally, a dendrogram. We build a dendrogram, which represents the hierarchy of the clustering. For example, the big cluster is cluster A. I will write A here, and then I will divide it into three branches, because inside this big cluster we have three more clusters. Let me use a different color to show this. This is my first cluster, this is the second cluster, and this one is the third cluster. I give these clusters different names, like B, C and D. So I will write B here, then C here, and D here. Now you can see that cluster D has some more clusters in it. This is the first cluster which we have inside D, and this is the second cluster. Let's name them E and F. So again, I will create another level of the hierarchy here, with E and F. In this manner, this hierarchy can become very big. It can go on like this from every cluster. So these hierarchies are represented using dendrograms, and a dendrogram is just a way to show the hierarchical clustering. It also has some other applications in different fields, but here we are just concerned with hierarchical clustering.
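The nested structure in this example can be written down as a small tree. This is only an illustrative sketch: the nested dictionary is an assumed representation of the dendrogram, with A containing B, C and D, and D containing E and F:

```python
# Hypothetical hierarchy for the example: A contains B, C and D; D contains E and F.
dendrogram = {"A": {"B": {}, "C": {}, "D": {"E": {}, "F": {}}}}

def render(tree, depth=0):
    """Return one indented line per cluster, parents before their children."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(render(children, depth + 1))
    return lines

def leaves(tree):
    """Clusters that contain no further nested clusters."""
    result = []
    for name, children in tree.items():
        result.extend(leaves(children) if children else [name])
    return result

print("\n".join(render(dendrogram)))
print(leaves(dendrogram))  # ['B', 'C', 'E', 'F']
```

Each level of indentation in the printed tree corresponds to one level of nesting in the drawing.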
Now, in the next tutorial, we will study some more types of clustering. So that's all for this tutorial. Thanks for watching.
24. Agglomerative Vs Divisive Clustering: Hi. In this tutorial, I'm going to give a summary of the difference between agglomerative and divisive clustering. We have studied hierarchical clustering and partitional clustering. Hierarchical clustering is further divided into different types of clustering: one is the agglomerative and the second one is the divisive algorithm. So, in agglomerative clustering, let's suppose we have some data points, something like this, and initially they are individual data points. Now, agglomerative and divisive are both hierarchical clustering, so what we will obtain is a dendrogram. In the dendrogram, we have a tree-like structure, so basically we have nested clusters. Now, in agglomerative clustering, the idea is to take all the data points individually at first, and then start combining them into small clusters. For example, let's combine B and C into a single cluster, something like this. Then we will combine D and E into a single cluster. Then, in the next iteration, I will combine D, E and F into a single cluster. And then I will combine, let's say, A, B and C into a single cluster. And finally I will combine these two clusters into a single cluster, something like this. So from this diagram, we can read that A, B, C, D, E, F is a cluster, and inside it we have two clusters, A B C and D E F. So let's suppose we have a big cluster here with points A, B, C, D, E and F. Inside it, A B C is a cluster and D E F is also a cluster. Inside A B C, B C is a single cluster and A is separate. And similarly, inside D E F, D E is one cluster and F is separate. So indeed, this is a hierarchical clustering. But the idea is this.
We take the points individually and then we agglomerate them into clusters. Agglomerative basically means that we are combining them into clusters. So finally we get the whole cluster, like this. In the divisive algorithm, in divisive clustering, we do the opposite of agglomerative. What we do is put all the data points into a single cluster. Initially, we say that this is my single cluster, and then I start dividing this whole cluster into different clusters. We basically call this splitting a cluster. So initially I have this whole cluster. Then let's say I divide it into two clusters, which are A B C and D E F. And then again I split them into different clusters, something like this. And here also I will do the same: I will split this cluster in two. So divisive means dividing the cluster. You can see that in both of these hierarchical clustering algorithms, agglomerative and divisive, we obtain a hierarchy. This is the hierarchy, and basically dendrograms are used to represent it. Now, in divisive clustering, you can see I am going down here, so this is known as the top-down approach, because what we are doing is dividing the whole cluster into sub-clusters. Here it will be divided into A and B C, and this will be divided into D E and F. In agglomerative clustering it goes the other way, something like this, so I call that the bottom-up approach. So basically these are the two types, agglomerative and divisive. We will discuss the agglomerative hierarchical clustering algorithm in detail in the coming tutorials. So that's all for this tutorial. Thanks for watching.
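The bottom-up idea can be sketched in a few lines. Everything specific here is an assumption made for illustration: the six points are placed on a line at made-up coordinates, and the distance between two clusters is taken as the smallest distance between their members (single linkage), which the tutorial does not specify:

```python
def single_link(c1, c2, coords):
    """Smallest distance between any point of c1 and any point of c2."""
    return min(abs(coords[p] - coords[q]) for p in c1 for q in c2)

def agglomerate(coords):
    """Bottom-up: start with one cluster per point, then keep merging
    the closest pair of clusters until a single cluster remains."""
    clusters = [(name,) for name in sorted(coords)]
    merges = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: single_link(
            clusters[ij[0]], clusters[ij[1]], coords))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        merges.append(merged)
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical 1-D positions, chosen so the merge order follows the example.
coords = {"A": 0.0, "B": 2.0, "C": 2.5, "D": 10.0, "E": 10.8, "F": 12.0}
for step, merged in enumerate(agglomerate(coords), start=1):
    print(step, "".join(merged))
```

With these coordinates the merges come out as BC, DE, DEF, ABC and finally ABCDEF. Reading that list from the last merge back to the first gives the order in which a divisive, top-down method would split this hierarchy.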
25. K Means Clustering Algorithm in Machine Learning: Hi, in this tutorial we're going to start with the k-means clustering algorithm. The first thing about the k-means clustering algorithm is that it is a partitional clustering algorithm. We have already studied partitional clustering, where we divide the given set of objects into some partitions. In this tutorial, we will cover all the important things to consider about k-means clustering, which include gradient descent and the optimization criterion, the time complexity, and the advantages and disadvantages of k-means clustering. The first thing is that we need to understand what k-means clustering is. In the k-means clustering algorithm, we are given some value of k. Let's say we are considering the value of k as three in this tutorial. So the value of k is given to us, and what we do is divide the given objects into k clusters. I'm going to write here: k clusters. Let's suppose we are given some data objects, and I'm going to represent them using a set. Let's say we have a set of objects which goes from x1, x2, up to xm, so we have m objects. I'm going to write it something like this. These are the data points or objects, and sometimes they are also known as instances. So we are given some instances or objects, and m represents the number of objects in my data set. After we run the k-means clustering algorithm on these data objects, it will group them into k clusters. So what will be the output after running this k-means clustering algorithm? It will give us the clusters as output, and we're going to represent the clusters with the name S. So the clusters will go from S1, S2, up to Sk, k clusters. So we use S to represent the clusters.
So if we have k clusters, the output will be k clusters. All right, so this is what the k-means clustering algorithm does, which is essentially to divide the whole set of data objects into the given number k of clusters. So let's see how this algorithm works and how we can actually generate these k clusters. First of all, I'm going to draw some points. Let's say we are given some data objects, and I'm going to draw them here. Let's say this is the given data set that we have. These are the points that we are given, and we are given k as three. All right, so in the first iteration there will be some steps, and we're going to repeat those steps again and again in the further iterations to improve or optimize the clustering. So what is step one in this algorithm? How do we actually start the clustering algorithm? We are given this whole data set, these objects. Step one is to choose k points randomly. So I'm going to write here: choose k points randomly. In our case, you can see that the value of k is three. So let's say I choose this one as the first point. Let's say I choose this one as the second. We are just choosing any random values. And let's choose this one as the third point. So we have chosen three points according to step one. So what is step two? In step two, we try to find the distance between the points that we have chosen and the other points. If a point is near to a chosen point, that is, if the distance is smaller, we say that it belongs to the same cluster.
So what we do is calculate the distances and assign each point to the closest cluster, or chosen point. We have chosen three points, and you can see them here. Now we will calculate the distances and assign the points to these clusters. Let's say I assign this point to this one, and this point to this one. Similarly, this little one goes here, and this point is also assigned here. And similarly I will assign all these points which are near to it to this one. The next chosen point is this one. You can see these points are near to it, so I will assign them here, like this, and these will also go to this point. And the last one is this one. We will calculate the distances of these points, and since they are near to this point, we will just assign them to it. After doing this assignment, what we observe is that we have obtained three clusters, something like this. All right, so the question is, how do we calculate the distance between a point and the point that we have chosen? We have already studied the Euclidean distance. I taught this in a previous video, and I have given the link to that video in the description below, and also to the whole machine learning playlist, so you can check that out also. So we use Euclidean distance to find the distance between the two points. It was used as a similarity measure, or more precisely, we know that Euclidean distance was used as a dissimilarity measure. So we calculate the distances and assign each point to the closest chosen point, or cluster. So this was our step two. So now we have obtained these clusters.
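Steps one and two can be sketched as follows. The sample points, the function names and the fixed random seed are all assumptions made for illustration; the tutorial itself works from a hand-drawn data set:

```python
import math
import random

def euclidean_distance(x, y):
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))

def choose_initial_points(points, k, seed=0):
    """Step 1: pick k of the data objects at random (seed fixed for repeatability)."""
    return random.Random(seed).sample(points, k)

def assign_to_nearest(points, chosen):
    """Step 2: give each object the index of its closest chosen point."""
    return [min(range(len(chosen)),
                key=lambda i: euclidean_distance(p, chosen[i]))
            for p in points]

# Hypothetical data set with three visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
          (1.0, 9.0), (0.5, 8.0), (1.5, 8.5)]
chosen = choose_initial_points(points, k=3)
labels = assign_to_nearest(points, chosen)
print(chosen)
print(labels)
```

Because the starting points are random, two of them can land in the same group, in which case the clusters after this first pass will not match the visible groups. That is the situation the later iterations are meant to improve.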
So the question is: now that we have these clusters, should the k-means clustering algorithm stop and return these three clusters? The answer is no, because at this point, after iteration one, the clustering is not optimized. So how can we say whether it is optimized or not? We have to set up some criterion, using which we can say that at this iteration the clusters are optimized, and the points and the distances between them are now optimized. So there should be some optimization criterion. First of all, let me write again what the Euclidean distance is. In the formula for Euclidean distance, the summation goes from 1 to n, and inside it we write the difference between x_i and x_j, where the points belong to a cluster S. Then we square it, and finally we take the square root of the sum. So this will give me the Euclidean distance, and I can use it to calculate the distances and assign the points to a chosen point. Now, we have to set up an optimization criterion. So what is this optimization criterion? We have to understand this, because it is a very important part of the k-means algorithm. We have seen that at this point we have obtained three clusters. But it may happen that some of the points that are assigned to these clusters should not actually belong to them. So we have to keep optimizing the clustering, and at every iteration we try to optimize it. So there should be some criterion that tells us when the k-means clustering algorithm should stop. The optimization criterion that we're going to use is known as WCSS, which is the within-cluster sum of squares. I will write that: within-cluster sum of squares. All right, so what is this WCSS? It is the within-cluster sum of squared distances. So how do we calculate it?
First of all, we take the distances between the points and the point that we have chosen. We square those distances and add them up. We do this for each cluster, for this cluster and this cluster and this cluster, and finally we add them all up. So first we find the sum of squares for one cluster, which is given by the sum of x minus mu_i, the whole squared. What is mu_i? It is the centroid of the cluster. Each cluster S is represented by its cluster center, which is given by mu_i. So we have the cluster center, and x represents the points, the data objects, that are in the cluster. We calculate those distances, square them, and add them up, where x belongs to S. Now, this is for a single cluster. Since we have k clusters, we sum this as well, and we run this outer sum from i = 1 to k: $\mathrm{WCSS} = \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$. So what we are essentially doing is setting up a criterion which we will minimize. You can think of this like linear regression, remember? In the linear regression tutorial, we defined a hypothesis function and a cost function, and we had to set up an optimization criterion to optimize the hypothesis function. There, we used the cost function to find the cost for each and every set of parameters, and finally we found the minimum cost. And how do we find the minimum cost? By using gradient descent. Wherever gradient descent converges, we say that the cost is minimum at that point. And we use the parameters at that point in the hypothesis function to get the result.
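The criterion can be computed directly from its definition. In this sketch the two small clusters and their centers are made-up values, and `wcss` is an illustrative name rather than anything from the course:

```python
def squared_distance(x, y):
    return sum((xk - yk) ** 2 for xk, yk in zip(x, y))

def wcss(clusters, centers):
    """Sum, over every cluster S_i, of the squared distances
    from its points to the cluster center mu_i."""
    return sum(squared_distance(x, mu)
               for cluster, mu in zip(clusters, centers)
               for x in cluster)

clusters = [[(1.0, 1.0), (3.0, 1.0)],    # S_1
            [(8.0, 8.0), (8.0, 10.0)]]   # S_2
centers = [(2.0, 1.0), (8.0, 9.0)]       # mu_1, mu_2
print(wcss(clusters, centers))           # 4.0: every point is 1 unit from its center
```

Note that the squared distances are summed directly, so no square root is needed here.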
Similarly, in this tutorial we are going to calculate the squared distance from each and every point to the center of its cluster. For each cluster we first calculate this sum of squares, and then finally we add them up. In the first iteration we have obtained these clusters, so we calculate the sum of squared distances for this particular iteration and plot it on the graph. Then we do step three, and from step three we go back to step one. In this manner we move through more iterations. So step one is to choose the k points randomly. Step two is to calculate the distances and assign each point to the nearest chosen point. After doing this, we calculate the optimization criterion, the within-cluster sum of squared distances, and plot it on the graph. And then we recompute the cluster centers. At this point, you can see these are the clusters that we have obtained. Now we compute their centers: the center of this cluster is around here, the center of this cluster is around here, and the center of the third cluster is around here. So instead of choosing the k points randomly again, in the next iteration we use these recomputed cluster centers in place of the points we chose in the first iteration. Then we again calculate the distances and assign the points to these centers. So the third step is to recompute the cluster centers and move back to step one.
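One pass through steps two and three might look like the sketch below. The function names assign_points and recompute_centers are invented for this example, and the choice to leave a center where it is when no point is assigned to it is an assumption of the sketch, not something the tutorial specifies.

```python
def assign_points(points, centers):
    """Step 2: label each point with the index of its nearest center.

    Squared distances are compared, which picks the same nearest
    center as the Euclidean distance itself.
    """
    labels = []
    for p in points:
        sq = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(sq.index(min(sq)))
    return labels

def recompute_centers(points, labels, centers):
    """Step 3: move each center to the mean of the points assigned to it."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centers.append(old)  # assumption: an empty cluster keeps its center
            continue
        new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centers

points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0), (10.0, 8.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]          # pretend these were chosen in step 1
labels = assign_points(points, centers)
centers = recompute_centers(points, labels, centers)
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(1.0, 2.0), (9.0, 8.0)]
```

After this pass the centers sit in the middle of their clusters, and the next iteration starts from these centers instead of from random points.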
Now, after completing step three, which is to recompute the cluster centers, we move back to step one. This time we do not choose the k points randomly, because we have already computed the cluster centers, so we use those cluster centers instead. After the first iteration, the clusters are going to be something like this. These are my clusters and these are the cluster centers that we have recomputed. Now, according to step two, we calculate the distances of the points with respect to these centers. More specifically, it is the Euclidean distance. We assign each point to its nearest center, which gives something like this. Let's say there was a point here which is now part of this other cluster. So a reassignment takes place at each and every iteration. Remember that at each iteration we calculate the optimization criterion, which is the within-cluster sum of squares. It is similar to the cost function, not in its formula, but logically. We calculate this cost at each and every iteration of the k-means clustering algorithm, and we keep optimizing it. By optimizing, I mean that we are going to minimize this sum of squared distances. The next step is to find the convergence of the k-means clustering algorithm. When we calculate the optimization criterion at each and every iteration, it is generally observed that its value keeps decreasing, something like this. It starts going down, but it might not reach the global optimum. Let's say this one here is the global minimum.
And this one is a local minimum. At each and every iteration of the k-means algorithm, we recompute the cluster centers and recalculate the distances, so the sum of squared distances also changes. It is observed that after each and every iteration this cost goes downwards. It keeps on decreasing and reaches some optimum. It is not necessary that it will reach the global optimum, but one thing is sure: the cost keeps decreasing until it reaches a local optimum. That is the main reason why we call this a limitation of the algorithm. In the gradient descent tutorial we also stated this basic limitation, that the global optimum may not be reached. At the point where we have reached the local optimum, the slope is 0. So if I differentiate this function, I get the slope, dy by dx, and this slope will be equal to 0. This will help me in working out the convergence. If we calculate the optimization criterion, and at some step its derivative becomes equal to 0, I will say that the k-means clustering algorithm will stop and return the clusters. So let me write the convergence criterion. First of all, we write the optimization criterion for one cluster, which is the sum of (x i minus mu j) squared over the points x i in the cluster. I was going to write a instead of mu, where a is also the cluster center in a different notation, but let's not create confusion here. Let's keep it as (x i minus mu j) squared.
Here you can see our optimization criterion, which was the sum of squared distances. Now we have to differentiate it with respect to the cluster center mu j. The convergence idea says that if the derivative of this whole term is equal to 0, the algorithm has converged, and at that point we stop the k-means clustering algorithm and return the obtained clusters. So let's differentiate. On differentiation, the 2 comes down, so we get minus 2 times the summation of (x i minus mu j), and we set this equal to 0: $-2 \sum_{x_i \in S_j} (x_i - \mu_j) = 0$. We can cancel the 2 and apply the summation to both terms, so the summation of x i is equal to the summation of mu j. Now, let's say the cluster has m j members, that is, m j objects that belong to this particular cluster center. Since mu j is added once for each of these objects, the summation of mu j is just m j times mu j. That's why I have written m j here. So the summation of x i is equal to m j times mu j. To get the cluster center, m j goes into the denominator, and we write mu j equal to 1 upon m j times the summation of x i: $\mu_j = \frac{1}{m_j} \sum_{x_i \in S_j} x_i$. So whenever the center of every cluster is equal to 1 upon m j, where m j is the number of members of that cluster, times the sum of all the objects in it, we can say that the convergence criterion is met.
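Written out step by step for a single cluster, the differentiation looks like this. The symbols S_j (the cluster), mu_j (its center) and m_j (its number of members) follow the spoken derivation; J is a name introduced here only for the per-cluster sum of squares.

```latex
\begin{align*}
J(\mu_j) &= \sum_{x_i \in S_j} (x_i - \mu_j)^2 \\
\frac{\partial J}{\partial \mu_j} &= -2 \sum_{x_i \in S_j} (x_i - \mu_j) = 0 \\
\sum_{x_i \in S_j} x_i &= \sum_{x_i \in S_j} \mu_j = m_j \, \mu_j \\
\mu_j &= \frac{1}{m_j} \sum_{x_i \in S_j} x_i
\end{align*}
```

In words, the slope is zero exactly when each cluster center is the mean of the points assigned to it.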
At this point I can say that the algorithm is converging. Since it is converging, we can stop the k-means algorithm, and after stopping, we return the clusters that we have. So this is the k-means clustering algorithm. Let me give a summary of what we have done, because we have done a lot of things here. In the first step, we choose the k points randomly. In the second step, we calculate the Euclidean distances and assign each point to the nearest chosen point, and in this manner we obtain the clusters. Then we compute the optimization criterion, the within-cluster sum of squares: we calculate it for each cluster and then add them up. Then we move to step three, which is to recompute the cluster centers, and we go back to the first step with these centers. We again calculate the distances, assign the points to the nearest center, and again calculate the sum of squared distances. It is generally observed that from iteration to iteration the cost keeps decreasing, and after several iterations it stops decreasing at some point, where the slope is equal to 0, that is, the derivative is equal to 0. The convergence criterion basically says that we stop the k-means algorithm at the point where the slope is 0. So whenever the change in this sum of squared distances is 0 or about 0, we say that we have reached an optimum solution, and at that step the k-means clustering algorithm stops and returns the clusters as the output, in this form. And to get the convergence criterion, all we did in this step was differentiate the optimization criterion.
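Putting the three steps together, a minimal sketch of the whole loop might look like this in plain Python. The function name kmeans, the max_iters and seed parameters, and the stopping rule are assumptions made for illustration: the sketch stops when no point changes cluster, which is a practical stand-in for the slope-is-zero test, because at that point every center already equals the mean of its members.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance; it ranks centers the same way the distance does."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iters=100, seed=0):
    """Minimal k-means sketch. Returns (centers, labels, wcss_history)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # step 1: k random points as initial centers
    labels, history = None, []
    for _ in range(max_iters):
        # Step 2: assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:      # nothing moved, so treat this as converged
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:               # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Optimization criterion (WCSS) for this iteration.
        history.append(sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels)))
    return centers, labels, history

points = [(1.0, 1.0), (1.0, 3.0), (2.0, 2.0), (8.0, 8.0), (10.0, 8.0), (9.0, 9.0)]
centers, labels, history = kmeans(points, k=2)
print(labels)   # the first three points share one label, the last three the other
print(history)  # WCSS per iteration, never increasing
```

The history list is what would be plotted on the graph: it goes down from iteration to iteration and then flattens out.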
This is the optimization criterion, the within-cluster sum of squares. We differentiated it, and after differentiating we came up with this equation. So whenever mu j, the cluster center, is equal to 1 upon m j times the summation of x i, it means that the algorithm has converged. At that point we stop and return the output, which is the given number of clusters. All right, so now we have done the k-means clustering algorithm. Let's try to understand the time complexity of this algorithm. There are several steps, and for each step we can calculate the time complexity. In the first step, we calculate the Euclidean distance between an object and a center, and each such distance takes big O of n time, where n is the dimensionality of the vectors. Next, we reassign the objects to clusters. This takes big O of k m distance computations, where k is the number of clusters and m is the number of objects. The third step is to recompute the cluster centers, as we did in step three. It takes big O of m n time, where m is the number of objects and n is the dimensionality of the vectors. Finally, assume that steps one, two and three are repeated for t iterations. We know that at each and every iteration the cost decreases monotonically. So over t iterations, we come up with a time complexity of big O of t k m n.
So this is the time the k-means algorithm takes to compute the clusters, or partitions, given the m objects. All right, we have covered the time complexity. Now let us see some advantages and disadvantages of the k-means clustering algorithm. The first advantage is that it is a fast and robust algorithm: it always converges, and it usually converges quickly. It is also relatively efficient. We know that the time complexity is big O of k m n per iteration, so for t iterations it is big O of t k m n. The last point is that it gives its best results when the dataset is well suited to the algorithm, which means that the data is clearly separated into clusters. If the data is well separated, it works very efficiently. Now let's take a look at the disadvantages of using the k-means clustering algorithm. The first disadvantage is that it requires us to specify the number of cluster centers. We have to specify k, the number of clusters, explicitly and supply it as a value. The second disadvantage is the hard assignment of data points to clusters. We use the Euclidean distance to assign each data point to a particular cluster, and it is a hard assignment because every point is placed in exactly one cluster, the nearest one, with no way to say that a point partly belongs to two clusters. And the most important disadvantage is that it may converge to a local optimum. There may be a global optimum, but the algorithm can miss it and settle in a local optimum. We have seen that this is also a disadvantage of gradient descent.
So these are the disadvantages of using the k-means clustering algorithm. That's it for this tutorial. Thanks for watching.
26. Conditional Probability Basics in Machine Learning: Welcome to the machine learning playlist. In this tutorial, we are going to understand conditional probability. We are studying probability because, from now onwards in this playlist, we are going to study the probabilistic machine learning models. So basically we are going to use probability to do the prediction. One such probabilistic model that we are going to study is the Gaussian naive Bayes classifier, which is very important in machine learning. But before jumping onto naive Bayes classifiers, we have to cover the basics, since naive Bayes classifiers depend on Bayes theorem, and Bayes theorem is built upon conditional probability. So we will cover conditional probability first, then Bayes theorem and the Gaussian distribution. Then we can see how probability can be used to do the prediction, and we will study the naive Bayes classifiers. All right, so let's get started with conditional probability. First of all, I will define what exactly probability is. When all outcomes are equally likely, probability is defined as P equal to the number of favorable outcomes divided by the total number of outcomes. This is the basic definition of probability. Now let us take an example. Let's suppose we are tossing a coin. When we toss a coin, we can get a head or a tail. Considering that this coin is unbiased, the number of outcomes is two, so I will write the total number of outcomes as 2. Let's say we want to find the probability that a head will come up when we toss this coin. What is the number of favorable outcomes? It is one.
So one half is the probability of getting a head by tossing a coin, and the same goes for a tail. This is how we basically define probability. Now let's try to understand what conditional probability is, and then we will use it to understand Bayes theorem. Let's suppose we have two events, A and B. More specifically, we are going to take them as independent events, which means that they do not rely on each other. Let's say event A is tossing a coin and event B is rolling a dice. Both of them are independent of each other: we toss the coin, and separately we roll the dice. So what is conditional probability? Conditional probability answers the type of question that I am going to write here: what is the probability that event B will occur when event A has already occurred? In this case, we cannot directly use the basic formula of probability. We have to use conditional probability. Why? Because the question asks for the probability that event B will occur when event A has already occurred. The fact that event A has already occurred is our given condition, and based on this condition I will find the probability that event B will occur. This conditional probability is represented as P of B given A, written P(B | A).
So B is the event whose probability we have to find, and A is the given condition. This is how conditional probability is represented. The formula for conditional probability is P of B given A equals P of B intersection A divided by P of A, the probability that event A will occur: $P(B \mid A) = \frac{P(B \cap A)}{P(A)}$. Now I'm going to solve an example that uses conditional probability. First we will solve it without using this formula, and then we will recheck the answer with the formula. All right, consider this example. Let's say we are tossing a coin three times, and we define an event E which says that there are at most two heads. This is the first toss, this is the second, and this is the third one. In the first one we can get a head or a tail, in the second one we can get a head or a tail, and similarly in the third one. At most two heads means that out of these three tosses we can have 0 heads, 1 head, or 2 heads, but not more than that. Now let's define another event F, which says that the first throw shows a head. Since we are tossing the coin three times, and each toss is either a head or a tail, we have 2 raised to the power 3, which is 8 combinations. I'm going to write all the combinations here. For the first toss I write H four times and T four times, and similarly for the second toss H two times and T two times, repeated. These are all the combinations.
I'm just writing out all the combinations for tossing the coin three times. For the third toss I write H and T one at a time, alternately. So these are the eight combinations that we can have: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT. So what is the total number of outcomes? You can see there are 8. Now the question is, what is the probability of event E when F has occurred? This means that we have to find the probability of E given F. What does this mean? It means that the first throw always shows a head, and the rest can be a head or a tail, and given that, we have to find the probability of E, which says that at most two heads come up. So let's see what the answer is. What is given to us is event F, which says that the first throw shows a head. How many outcomes have a head in the first throw? There are four: 1, 2, 3, 4. So the total number of outcomes is now 4. Event E says that at most two heads should come, which means 0 heads, 1 head, or 2 heads. Out of these four, you can see that this one has three heads, which is not allowed. Except this combination, all the others are valid: this one has two heads, this one again has two heads, and this one has only one head. So there are 3 favorable outcomes, and the answer to this problem is 3 by 4.
Now we are going to use the formula and check whether our answer is 3 by 4 or not. The formula is P of E given F equals P of E intersection F divided by P of F. Let me first look at P of E, where E says that there are at most two heads. Since this is not a conditional probability, I consider all 8 combinations. Every combination is valid except the one with three heads, so 7 outcomes are valid. But note that the denominator of the formula should not be P of E, it should be P of F, because in the formula P of B given A, E plays the role of B and F plays the role of A. So let us calculate P of F. F says that the first throw shows a head. The total number of outcomes is again 8, and there are only 4 favorable outcomes that have H in the first throw. So P of F is 4 by 8. Next, let's calculate P of E intersection F. Since we are again calculating an ordinary probability, we consider all 8 combinations. E intersection F means that the first throw shows a head and there are at most two heads. There are 7 combinations with at most two heads, and out of these 7 only 3 show a head in the first throw. So P of E intersection F is 3 by 8. If I put these two into the formula, I get 3 by 8 divided by 4 by 8, which gives 3 by 4. So the probability of event E occurring given event F is 3 by 4.
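The same check can be done by listing the eight outcomes in code. This is only a sketch: the helper name prob is made up, and E is coded as "no more than two heads", matching the count in the walkthrough where the three-heads outcome is excluded. If E is read as "two or more heads" instead, the conditional probability also comes out to 3 by 4, so the sketch computes both.

```python
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

def prob(event):
    """Favorable outcomes divided by total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def E(o):            # no more than two heads
    return o.count("H") <= 2

def E_alt(o):        # alternative reading: two or more heads
    return o.count("H") >= 2

def F(o):            # the first throw shows a head
    return o[0] == "H"

p_f = prob(F)                                  # 4/8
p_e_and_f = prob(lambda o: E(o) and F(o))      # 3/8
p_e_given_f = p_e_and_f / p_f
p_alt_given_f = prob(lambda o: E_alt(o) and F(o)) / p_f

print(prob(E))         # 7/8
print(p_e_given_f)     # 3/4
print(p_alt_given_f)   # 3/4
```

Using Fraction keeps the arithmetic exact, so the result prints as 3/4 rather than 0.75.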
So in this manner we can calculate the conditional probability. In the next tutorial, we will study Bayes theorem. We will use this formula and conditional probability to see what Bayes theorem is, and we will also do an example on the theorem. That's all for this tutorial. Thanks for watching.
27. Total Probability Theorem Machine Learning Probability Basics: Welcome to the machine learning playlist. In this tutorial, I'm going to explain the law of total probability. I'm covering some of the probability basics so that we can get started with the probabilistic machine learning models, like the naive Bayes classifiers. So let's talk about total probability and understand what exactly it is. Let's say I'm given some sample space S. You can think of it as a set that contains some events. Let's say we have n events. In this tutorial I'm going to take n equal to 3, but the formula that we derive for total probability will be applicable to any number n of subsets that we make from this sample space S. What I'm going to do is divide this whole set into subsets. Let's say this one is B1, this one is B2, and this one is B3. So B1, B2 and B3 are three subsets of the sample space S. Now I'm going to show that if I take some event A inside my sample space, I can represent its probability as the sum of the probabilities of A intersection B i. One important point about the subsets B1, B2 and B3 is that they are mutually exclusive. This means that they do not have anything in common, they are disjoint sets. The second point is that they are also exhaustive.
Exhaustive means that if I add them up, I get the whole sample space. That's why we call them mutually exclusive and exhaustive subsets. Now let's say A is a part of this sample space, and I want to find out what A is. If I look at what is common between A and each of B1, B2 and B3, I can divide A into three parts. This is the first part, this is the second part, and this one is the third part. The first part represents A intersection B1. The second part is A intersection B2, because this is my B2 subset. And the last one is A intersection B3. Since we want the whole event A, we add all of them up. If we have n subsets of the sample space, this goes on in the same way up to A intersection B n. Now let's represent this in the form of probabilities. I want to find the probability of A, and I'm given the whole sample space and these subsets. So I write the probability of A as the probability of A intersection B1, plus the probability of A intersection B2, and this goes on up to the probability of A intersection B n: $P(A) = P(A \cap B_1) + P(A \cap B_2) + \dots + P(A \cap B_n)$. In this manner, I'm representing the probability of the event A as a sum of probabilities. This is known as total probability. Now, in the previous tutorial we looked at conditional probability. If you haven't watched that tutorial, I will give the link to the video in the description below, or you can find it in the machine learning playlist.
For conditional probability, we came up with the formula that the probability of B given that A has already occurred is the probability of A intersection B divided by the probability of A. From here, I can get the probability of A intersection B by taking the probability of A to the other side: it becomes the probability of A times the probability of B given A. In the same way, swapping the roles of the two events, the probability of A intersection B is also the probability of B times the probability of A given B. The reason I'm concerned with the value of the probability of A intersection B is that I'm going to represent the formula of total probability in the form of conditional probabilities. The first term here is the probability of A intersection B1, so I can write it as the probability of B1 times the probability of A given that B1 has already occurred. I do the same with B2, and in this manner it goes on for all the subsets, up to B n. So you can see that I have represented the total probability using conditional probabilities. Let's say we know the probability of B1, and we are given the probability of A when B1 has occurred, and likewise for the other subsets. Then we can find the probability of A using that. All right, now let's generalize this form using the sigma symbol. I can write it as the probability of A equals sigma, with i going from 1 to n, of the probability of B i times the probability of A given B i: $P(A) = \sum_{i=1}^{n} P(B_i)\, P(A \mid B_i)$. This is my final formula for total probability. In the next tutorial, we will do an example on total probability, and we will use this formula.
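A small numeric sketch of the law of total probability, P(A) as the sum over i of P(B i) times P(A given B i), with n equal to 3. All the numbers below are made up for illustration; they are not from the tutorial.

```python
from fractions import Fraction

# Hypothetical probabilities of the three mutually exclusive, exhaustive subsets.
p_b = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]          # P(B1), P(B2), P(B3)
# Hypothetical conditional probabilities of A within each subset.
p_a_given_b = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]  # P(A|B1), P(A|B2), P(A|B3)

# Exhaustive and mutually exclusive subsets must have probabilities that add up to 1.
assert sum(p_b) == 1

# Each term P(Bi) * P(A|Bi) is the probability of A intersection Bi.
terms = [pb * pab for pb, pab in zip(p_b, p_a_given_b)]
p_a = sum(terms)

print(terms)  # [Fraction(1, 20), Fraction(3, 50), Fraction(1, 10)]
print(p_a)    # 21/100
```

The three terms are the three parts that A was divided into, and adding them gives the whole of A.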
We will also study Bayes theorem, which also uses this formula. We will use Bayes theorem when constructing the naive Bayes classifiers, which will help us in making predictions. That's all for this tutorial. Thanks for watching. And please make sure to like this video and subscribe to the channel for more machine learning videos.
28. Naive Bayes Classifier Intuition in Machine Learning: Hi everyone. Welcome to the machine learning playlist. In this playlist, we have already covered some of the probability basics, like conditional probability and total probability. Now we're going to use probability to do classification using the naive Bayes classifiers. In this tutorial, I will discuss all the important points related to naive Bayes classifiers, and I will also show how the classifier actually works, how it actually does the classification. The core of the naive Bayes classifier is Bayes theorem. So we first have to discuss what exactly Bayes theorem is, and then we can use it to do the classification. In the previous tutorial, we studied conditional probability, and from there we came up with a formula, which I'm going to write here. But before writing the formula, let's take an example of a dataset that we will use to do the classification. We will create a classifier that predicts sex: it finds out whether a given student is male or female. In the dataset we are given some attributes like height, weight and foot size, and then we have the output y, the sex, which is male or female. So let's say I'm given a dataset like this, with the attributes height, weight and foot size, and the sex. We are given some values in the dataset, and for each row we are told whether the student is male or female. In conditional probability, we have the formula P of Y given X equals the probability of X intersection Y divided by the probability of X: $P(Y \mid X) = \frac{P(X \cap Y)}{P(X)}$. This is the conditional probability.
Now let's see what Bayes' theorem is. Bayes' theorem says that if we know the probability of Y given X, and we want to find the probability of X given Y, we can actually do that, just by using conditional probability. I can take the conditional probability formula, P(Y | X) = P(X ∩ Y) / P(X), as equation (1). From it I can find the value of P(X ∩ Y), which is P(Y | X) multiplied by P(X). In the same way, P(X | Y) is P(X ∩ Y) divided by P(Y). Substituting from equation (1), I can write P(X | Y) = P(X) P(Y | X) / P(Y): in the numerator I have P(X) multiplied by P(Y | X), and in the denominator I have P(Y). This formula is known as Bayes' theorem. It states that if you know the probability of Y given X, you can use it to find the probability of X given Y, and vice versa. So let's consider the dataset that we have and work out how we can do the classification using Bayes' theorem. You can see we have some features here: height, weight and foot size. Let's call them X1, X2, X3, up to Xn. X represents the attribute set, or you can say the feature set. So we are going to assume that X is a feature set which consists of height, weight and foot size; let's say height is X1, weight is X2, foot size is X3, and so on. Y is the output, the sex, which can be male or female. What we have to do is use Bayes' theorem to find out whether a given student is male or female when we are given some height, weight and foot size. So what we are interested in is the probability that the given student is male or female, given some attributes.
We are interested in finding the value of this one, P(Y | X). If I write it using Bayes' theorem, I just move the terms around: in the numerator we have P(Y) multiplied by P(X | Y), and P(X) goes in the denominator, so P(Y | X) = P(Y) P(X | Y) / P(X). We are interested in the probability of Y, which is male or female, so I can write it like this: what is the probability that the given student is male when we are given some attributes, namely height, weight and foot size? This is what we use to find the probability of Y given X, and it is the basis of the naive Bayes classifier. One important point about the naive Bayes classifier is that it assumes the features are conditionally independent, which means that, once the class is given, they do not depend on each other. So height, weight and foot size are all treated as independent features. In this equation, we can find the value of P(Y | X) if we are given P(Y), P(X | Y) and P(X). Since we have some n number of features, X1, X2, X3 and so on, we have to rewrite this equation so that it considers all of these features. Because we are assuming that the features are conditionally independent, we can write P(X | Y) as a product: P(X | Y) is equal to the product, from i = 1 to n, where n is the number of features, of P(Xi | Y). This is the conditional independence equation.
So we can replace the probability P(X | Y) in Bayes' theorem with this product, and then the equation considers all the features that I have. The product is P(X1 | Y) multiplied by P(X2 | Y), and it goes on until P(Xn | Y). In this manner we are able to consider all of the attributes, and the equation that we get is known as the naive Bayes classifier. So I'm going to rewrite the equation that we are interested in: P(Y | X) is equal to P(Y) multiplied by, instead of P(X | Y), the product from i = 1 to n of P(Xi | Y), all divided by P(X). I have just replaced one term. This is really just Bayes' theorem, where we have used the conditional independence equation to replace P(X | Y), so that it considers all the attributes. We call it the naive Bayes classifier because we can now use it to do the classification.
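The spoken derivation above can be written out compactly as follows. This is only a restatement of the steps in the tutorial; the lower-case $x_i$ for the individual features and the equation numbers are notational choices made here, not something shown on the original slides.

```latex
\begin{align}
P(Y \mid X) &= \frac{P(X \cap Y)}{P(X)}
  && \text{conditional probability} \tag{1}\\
P(X \cap Y) &= P(Y \mid X)\,P(X) = P(X \mid Y)\,P(Y)
  && \text{rearranging (1) both ways} \tag{2}\\
P(Y \mid X) &= \frac{P(Y)\,P(X \mid Y)}{P(X)}
  && \text{Bayes' theorem} \tag{3}\\
P(X \mid Y) &= \prod_{i=1}^{n} P(x_i \mid Y)
  && \text{conditional independence} \tag{4}\\
P(Y \mid X) &= \frac{P(Y)\,\prod_{i=1}^{n} P(x_i \mid Y)}{P(X)}
  && \text{naive Bayes classifier} \tag{5}
\end{align}
```

Equation (5) is just (3) with (4) substituted for $P(X \mid Y)$, which is the replacement step described in the tutorial.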
29. Gaussian Naive Bayes Sex Classifier in Machine Learning: Welcome to the machine learning playlist. In the previous tutorial, we studied how naive Bayes classifiers work. We studied Bayes' theorem and how we can use it to construct a naive Bayes classifier. In this tutorial, we're going to perform sex classification. We are given a dataset, and we will find out whether a given height and weight correspond to a male or a female, so we're going to classify each sample as male or female. This formula comes from the tutorial where we discussed the naive Bayes classifier. I have given the link to that video in the description, so please make sure to check it out so that you understand what this formula is and how we derived it from Bayes' theorem. The formula says that P(Y | X) is equal to P(Y), multiplied by the product from i = 1 to n of P(Xi | Y), divided by P(X). Now I will rewrite this equation. The numerator is still P(Y) multiplied by the product of all the conditional probabilities. In the denominator, I have used the total probability theorem to expand P(X) as P(Y1) multiplied by P(X | Y1), plus P(Y2) multiplied by P(X | Y2), where Y1 and Y2 are two different classes. It can go up to n: if we have n classes, we add one such term for each class. This is the formula that we're going to use to do the classification. Now let's take a look at the training set that we have. In this table each person is labeled male or female.
You can see we have four males and four females, and we have the features height, weight and foot size with their values. We will use this training dataset to train our classifier, and then we are given a sample. You can see this person is the sample: the height is 6 feet, the weight is 130 pounds and the foot size is 8 inches. Given this data, I have to use the naive Bayes classifier to find out whether the sample belongs to a male or a female. How will we do that? We will use this formula to find the probability of male given this sample X, and then we will calculate the probability of female given this sample X, and then we will compare the two. If the probability of male given X is greater than the probability of female given X, we can say that the sample belongs to a male. Things will become clearer as we go through the next slides. In this formula, I'm taking X1 as height, X2 as weight, and X3 as foot size. Then we have Y1 as male and Y2 as female, because we have two classes, namely male and female. Since there are two classes with four people each, the probability of male and the probability of female are both equal to 0.5, and if you add them up, the total probability is one. So now we have the probability of male and the probability of female. Let's see how we can use this dataset, these features and these classes to rewrite the formula so that we can do the classification. You can see a formula here that I have written, which says probability of male given X, where X is the set of features. We can also call this the posterior of male: instead of saying probability of male given X, you can say posterior of male.
It comes from the formula that we studied previously. Here, male is Y. In the general formula we had Y, and in our case Y is male, so we replace Y by male, and X1, X2, X3 by height, weight and foot size. You can see that I have replaced Y with male everywhere in the numerator. In the denominator, I have written the probability of male, which is Y1, multiplied by the probability of height given male, the probability of weight given male, and the probability of foot size given male, plus the probability of female, which is our second class, multiplied by the same three probabilities given female. So in this equation we have P(Y1) and P(Y2), which are the probability of male and the probability of female, and each is multiplied by the probability of X given that class. Since we have three features, X1, X2 and X3, we write P(X | Y1) as P(X1 | Y1) multiplied by P(X2 | Y1) multiplied by P(X3 | Y1), where the last term is foot size given male. This is the equation for the probability of male given X, or the posterior of male. The next formula is the probability of female given X. Here we replace Y by female in the numerator, and the denominator remains the same. One important point to note is that in both of these probabilities the denominator is the same. Since we are only going to compare the two values, we do not need to calculate the denominator. We will only calculate the numerators and compare them. So we can ignore the denominator in both, since it is the same.
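Written out, the two posteriors described on these slides look roughly like this. The abbreviations $h$, $w$, $f$ for the height, weight and foot size of the sample, and the name "evidence" for the shared denominator, are labels introduced here for readability rather than notation from the slides.

```latex
\begin{align}
\text{posterior(male)} &=
  \frac{P(\text{male})\,p(h \mid \text{male})\,p(w \mid \text{male})\,p(f \mid \text{male})}
       {\text{evidence}} \\[4pt]
\text{posterior(female)} &=
  \frac{P(\text{female})\,p(h \mid \text{female})\,p(w \mid \text{female})\,p(f \mid \text{female})}
       {\text{evidence}} \\[4pt]
\text{evidence} &=
  P(\text{male})\,p(h \mid \text{male})\,p(w \mid \text{male})\,p(f \mid \text{male})
  \;+\;
  P(\text{female})\,p(h \mid \text{female})\,p(w \mid \text{female})\,p(f \mid \text{female})
\end{align}
```

Because the evidence is identical in both fractions and positive, comparing the two numerators gives the same decision as comparing the full posteriors.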
Then we will compare the posterior of male with the posterior of female. If the posterior of male is greater than the posterior of female, we will say that the sample belongs to the male category. The first question that arises is this: we know that the probability of male is 0.5, and we know that the probability of female is also 0.5, but how do we calculate the probability of height given male, the probability of weight given male, and the probability of foot size given male? And similarly for female: how do we calculate the probability of height given female, weight given female and foot size given female? The answer is the Gaussian distribution. The important thing here is that height, weight and foot size are not discrete outcomes, they are continuous outcomes. So we will use the Gaussian distribution formula for height, weight and foot size. And since height, weight and foot size are continuous outcomes, the value that we get can exceed one. We already know that a probability lies in the range 0 to 1. But since height, weight and foot size are continuous outcomes, we are not going to get a probability. We are going to get a probability density, which can exceed the value of one. Alright, so let's see what the Gaussian distribution is. In this table, you can see that I have calculated the mean and variance of height, weight and foot size, for male and for female respectively. The mean is represented by the symbol mu and the variance is represented by sigma squared. Now let's see how to calculate the Gaussian distribution. Let's say we want to calculate the probability of height given male.
The formula of the Gaussian distribution is 1 over the square root of 2 pi sigma squared, where sigma squared is the variance, multiplied by the exponential of minus (6 minus mu) squared over 2 sigma squared. This gives approximately 1.5789. The 6 here comes from the sample data: in the sample, the height is 6, so I have taken 6 for height. For weight we will take 130, and similarly for foot size we will take 8. This value is the probability of height given male. Mu is the mean, and since we are dealing with height, we use the mean of height, and similarly the variance of height. We use the values for male, because this is height given male. Similarly, we calculate the value of probability of weight given male: the weight is 130 from our sample data, and we put in the mean of the weight given that it is a male, and likewise the variance. In the same way we can find foot size given male by using this table, and we can do the same for female. So using this table and the Gaussian distribution formula, we get all the probabilities. You can see that this value is 1.5789, which is greater than one, and we have already studied that a probability always ranges between 0 and 1. So why is it greater than one? The reason is that height, weight and foot size are continuous outcomes. In that case we do not say that this is the probability: 1.5789 is the probability density. Now that we have all these values, we can put them into the two equations and ignore the denominator. We will calculate these products and then compare the values of posterior male and posterior female.
So now let's see the result of the classification. After calculating the product of the terms in the numerator of the posterior of male, the value comes out to be 6.1984 multiplied by 10 raised to the power minus 9. Then we calculate the posterior numerator of female, which is the product of all the terms in that numerator, and the answer comes out to be 5.3778 multiplied by 10 raised to the power minus 4. On comparing the posteriors of male and female, we can see that the posterior of female has a greater value than the posterior of male. So we can conclude the classification: the given sample belongs to a female. That's all for this tutorial. Thanks for watching.
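The calculation in this tutorial can be sketched in a few lines of plain Python. One loud assumption: the transcript never lists the full training table, so the eight rows below are the widely circulated textbook version of this example, chosen because they match the records and results quoted in the tutorial. The function and variable names are illustrative, and the variance used is the sample variance (dividing by n - 1), which is what reproduces the quoted 1.5789.

```python
import math
from statistics import mean, variance  # variance() divides by n - 1

# ASSUMED training table (height in feet, weight in pounds, foot size in inches).
# The transcript does not show every row; these values are an assumption.
train = {
    "male": [(6.00, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5.00, 100, 6), (5.50, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}
prior = {"male": 0.5, "female": 0.5}  # four males and four females
sample = (6.0, 130, 8)                # the sample from the tutorial


def gaussian_density(x, mu, var):
    """Gaussian probability density; this is a density, so it may exceed 1."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior_numerator(label):
    """Prior times the product of per-feature densities (denominator ignored)."""
    result = prior[label]
    for i, x in enumerate(sample):
        column = [row[i] for row in train[label]]
        result *= gaussian_density(x, mean(column), variance(column))
    return result


numerators = {label: posterior_numerator(label) for label in train}
prediction = max(numerators, key=numerators.get)  # class with the larger numerator

male_heights = [row[0] for row in train["male"]]
p_height_male = gaussian_density(6.0, mean(male_heights), variance(male_heights))

print("density of height given male:", p_height_male)
print("posterior numerators:", numerators)
print("prediction:", prediction)
```

Under the assumed table, the density of height given male and the two numerators agree with the figures quoted in the tutorial to the precision given there, and the larger numerator belongs to female.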
30. Gaussian Naive Bayes Sex Classifier in Python sklearn Machine Learning: Hello thinkers, welcome to Think X Academy. This is the machine learning playlist, and so far we have studied the Gaussian naive Bayes algorithm and how we can do classification with it. In this tutorial, we are going to use Python to implement that. Alright, so let's get started. I'm using Jupyter Notebook, and my setup already has sklearn, the scikit-learn toolkit, installed. If you do not have it, I would suggest you first install scikit-learn, because we are going to use its naive_bayes module and the GaussianNB class to implement this. Implementing Gaussian naive Bayes is pretty easy. It's only three to four lines of code. So let's do this. In the first step here, I have created X and Y. X is a NumPy array: you can see this is the numpy library, and we're using an array to hold the data, which contains all these items. In each record the first value is the height, the second is the weight, and the third is the foot size. In the previous tutorial we saw a dataset with these features, height, weight and foot size, and this is the same dataset here. So 6, 180, 12 is one record, and similarly 5.92, 190, 11, and you can see this is our whole dataset. Corresponding to these features, we have some labels. The record 6, 180, 12 is for a male. So Y is the class: the class labels are male and female, and you can see how I have represented them. The next part is to create a classifier, which is a Gaussian naive Bayes classifier.
You can see I have created a variable clf, which is equal to GaussianNB(), the class from the library that implements Gaussian naive Bayes for us. The next step is to train on our dataset. In training, we would just use the formula of Gaussian naive Bayes, but instead of writing that ourselves, we can use the library. So let us do the training first. The clf.fit function is used for training, and we supply the features and labels, X and Y, to it. In this way we train on the dataset: I provide the features and the labels to the Gaussian naive Bayes classifier, and in fit(X, Y), X holds the attributes or features and Y holds the class labels. After training, it's time to test. We can use the clf.predict function, which does the prediction. Inside this function, we provide a NumPy array with some sample data. Let's supply 6 feet as the height, 130 as the weight and 8 as the foot size. This is the testing data, the sample data. Because we have built a classifier, clf, it will run the prediction on this sample. We have already studied the working behind all of this, including the probability concepts, so it is easier to implement than to follow what happens behind the scenes of this code. This call makes a prediction that corresponds to a label, male or female, so it returns a class label. We're going to print that, so I will put the call inside the print function.
Now if I hit Control+Enter, you can see in the output we have female, because after running the classification, it found that the posterior of female was greater than the posterior of male. That's why the classifier has returned female. Let's change some of the data. I will give 5 as the height, and here I will write 110, and I'm just using a random number for the last value. Let's run this again, and again it is female. Alright, let's change it again so that we see a different result. Let's change it to 6, 180, 12 and hit Control+Enter. Now you can see it gives me the male category. For some data it will give male and for some it will give female, on the basis of the probability calculation. In this manner, we can do the training and testing of Gaussian naive Bayes. If we want to find the score of this classifier, we can use the clf.score function. Inside it, I provide X, Y, and it returns the accuracy, or score, of the model on that data. If I hit Control+Enter, you can see the score is one; note that this is measured on the same data we trained on. So in this way we can do Gaussian naive Bayes classification using the GaussianNB class from the sklearn.naive_bayes module. That's all for this tutorial. Thanks for watching.
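The notebook cells described in this tutorial are not reproduced in the transcript, so here is a minimal sketch of what they might look like. `GaussianNB`, `fit`, `predict` and `score` are real scikit-learn calls; the full training table, the string labels and the variable names are assumptions (only two of the records are actually read out in the video).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# ASSUMED dataset: height (ft), weight (lb), foot size (in).
# Only the first two records are read out in the tutorial; the rest are assumed.
X = np.array([
    [6.00, 180, 12],
    [5.92, 190, 11],
    [5.58, 170, 12],
    [5.92, 165, 10],
    [5.00, 100, 6],
    [5.50, 150, 8],
    [5.42, 130, 7],
    [5.75, 150, 9],
])
# Class labels for each record, in the same order as the rows of X.
y = np.array(["male"] * 4 + ["female"] * 4)

clf = GaussianNB()  # create the Gaussian naive Bayes classifier
clf.fit(X, y)       # training: estimates a mean and variance per class and feature

# Testing: predict the class of the sample from the tutorial.
prediction = clf.predict(np.array([[6, 130, 8]]))
print(prediction)

# Accuracy on the training data itself (not an estimate of real-world accuracy).
accuracy = clf.score(X, y)
print(accuracy)
```

With this assumed table the sample is classified as female, in line with the hand calculation. Since the score here is computed on the training rows, a held-out test set would be needed for a fair accuracy estimate.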
31. Neural Networks Representation in Machine Learning: Hello thinkers, welcome to Think X Academy. In this machine learning playlist, we're now going to understand what artificial neural networks are and how we can use neural networks to detect objects around us, for speech recognition and for text recognition. There are various applications of neural networks. So far in this playlist we have studied a number of learning algorithms, like linear regression and logistic regression. These learning algorithms are too simple to handle the many features that are required for object detection and some other problems. Neural networks, or artificial neural networks, solve that problem, and that is what we are going to address. Alright, you can see that there is an image on the screen here, and we know that this image is of a mango. What we're going to do is see how humans actually recognize objects around us, and then see how a neural network recognizes an object in a similar fashion. It is really fascinating to know that artificial neural networks are actually motivated by the neural networks inside the human brain. So first we will see how humans interact with the objects around us to understand what an object is, and then we will see how artificial neural networks understand this object. This object is a mango. We know that it is a mango because for several years we have seen mangoes, we have eaten mangoes, we have seen different types of mangoes, and people talk about mangoes. We know because we are trained, and since we are trained, we know that this is a mango. But let us consider a child here, say a three or four year old child.
He has never seen a mango in his life. So I take a mango and show it to him, and I ask him, can you recognize this object? He will say that he does not recognize what this object is. Then I tell him that this is actually a mango. After that, he looks at the object with his eyes, so his eyes capture the image of the object and its texture. He can even touch the mango and feel the shape, the texture and the different features of this object. Then I tell him that all of these features belong to a mango, so this is a mango. Now, after some weeks, let's say three to four weeks, we take this child to a shopping mall or a market, and in the market he sees a mango again. He tries to remember what this object is, but he cannot quite remember, because he has only seen it once before. He will say, I do remember this object, but I do not remember the name. So we tell him that this is a mango. Similarly, after years and years, this child becomes trained, because the neural networks inside his brain learn to recognize this object, so that whenever he sees it, he understands that it is a mango. This is how humans interact with different objects and understand what exactly they are. Instead of a mango, it can be some words, some text, some voice, or some face. There are different industries that use neural networks. For example, let's take Tesla, or let's say Google. We have all used Google or Siri to give voice commands.
You can see how fascinating it is that they can process our voice and give answers. This is because they are trained neural networks, and they are such deep neural networks that they can understand your voice. Tesla uses object detection neural networks, and they are so precise that object detection has been enabled in the cars, which is what makes them self-driving cars. So this is enough motivation, and it shows the actual scope of artificial neural networks. You can see how essential it is to understand the basic idea behind neural networks. Alright, so that was the human interpretation of objects. Now let's see how we can convert this human interpretation into an artificial neural network. The first thing is to collect some images of the object. It can be any object, or in the case of text classification it will be some text. So we will already have a dataset which contains all the images at the same size, let's say 64 by 64 pixels. We're going to supply these images to the neural network at the input layer. The neural network starts with the input layer, where we have some nodes. Let's say we have three nodes in the input layer. To this first layer we pass all the images, which here are images of mangoes, as a vector or a matrix of pixel values, as X1, X2, X3. So this is the input layer, where we just supply the input values to the neural network.
From here, the neural network uses the input layer for the further evaluation, which happens in the hidden layer. After getting all the inputs, we move to the hidden layer, where we also have some nodes. Let's say we have four nodes here. What happens is that we apply a matrix transformation to the input layer and pass the result to the hidden layer nodes. For example, take X1. We make a channel from X1 to this node of the hidden layer, and another channel to this other node, and we do it for all the hidden nodes like this. While making these channels, we also assign some weights to them. Let's say these are weight 1, weight 2, weight 3 and weight 4. So in the hidden layer, which is the second layer of our neural network, we compute the nodes after making channels from the inputs and applying weights to the inputs. These weights are not fixed. They get adjusted to make our predictions more accurate, which we will see later on. Here we are making a channel from X1 to all the nodes of the hidden layer, with weights 1, 2, 3 and 4. The operation we apply is X1 multiplied by the weight, which is weight 1, and then we add some bias. After calculating this, we also calculate for weight 2, so it will be X1 times W2 plus some bias. After adding that, we get the nodes in the hidden layer. Let's call them B1, B2, B3 and B4. The same thing happens with all the nodes of the input layer. X2 makes a channel to this node, and this node, and this node, and similarly to all the hidden nodes, and it has some weights on these channels.
Similarly, the X3 node also has some channels with some weights. We perform the same operation, the input multiplied by the weight, and each hidden node adds up the weighted values arriving on all of its channels and adds some bias. That gives us this hidden layer. There can be more than one hidden layer. Let me enclose all these nodes in a single layer. If there are more layers, we again make channels to those nodes and again perform this operation. But let us consider that we have only one hidden layer. Finally we have the output layer, which is the final layer of a neural network. In the output layer we have all the classes or outputs that we want our neural network to predict. In this case we are predicting a mango, so the output will contain two classes: mango or not a mango. Here I make channels to the output layer in the same manner, and we get the output as mango or not a mango. Let's say this node represents that it is a mango and this node represents that it is not a mango. So these are the basics of a neural network. We supply the images to the input layer, we apply this operation with the weights and biases to get the hidden layer, and then we get the output, which will be a mango or not a mango. But this is not the end, because in the first iteration you may get the wrong answer. Let's say I supplied an image of a mango, but after calculating with these biases and weights, the network predicted that it is not a mango. If this happens, we tell the neural network. We can create a table of actual versus predicted values.
So let's say that the neural network has predicted that it is not a mango, but actually it was a mango. When the prediction is wrong, the network back propagates towards the input layer. This is called backpropagation, and I'm going to write that here. It back propagates and adjusts the weights: weight 1, weight 2, weight 3 and all the weights of all the channels. How does it adjust them? Using this table, we tell the neural network that there is some error. So I can create another column in the table, and here you can see the new column says that the error between the prediction and the actual value is, let's say, plus 0.5, plus 0.6 or minus 0.4, and so on. For different iterations I have different values of the error. Here the error is plus 0.5, so the network uses this error to adjust the weights, and by adjusting the weights it is able to make the prediction more accurate. That's how it improves the accuracy. After seeing the errors, it back propagates and adjusts the weights according to the error that was supplied. Then it makes a prediction again, and again we check it against the table. Repeating this is what we call training with backpropagation. Initially the computation went in the forward direction, which is the forward propagation. One forward propagation followed by one backpropagation is one cycle, and the more of these cycles we run, the more accurate our neural network generally becomes. This is how we train a neural network to make predictions, and the more a neural network is trained, the better it is able to predict.
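The forward propagation and backpropagation cycle described above can be sketched with a very small network in plain Python. Everything specific here is an assumption made for illustration: the 3-4-2 layer sizes follow the drawing in the tutorial, but the toy inputs, the sigmoid activation (borrowed from logistic regression), the squared-error measure, the learning rate and the number of cycles are not from the video.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


N_IN, N_HID, N_OUT = 3, 4, 2  # X1..X3, B1..B4, (mango, not a mango)

# One weight per channel and one bias per node, started at random values.
w_hid = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HID)]
b_hid = [0.0] * N_HID
w_out = [[random.uniform(-1, 1) for _ in range(N_HID)] for _ in range(N_OUT)]
b_out = [0.0] * N_OUT


def forward(x):
    """Forward propagation: weighted sum plus bias, then activation, per node."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w_hid[j], x)) + b_hid[j])
              for j in range(N_HID)]
    output = [sigmoid(sum(w * h for w, h in zip(w_out[k], hidden)) + b_out[k])
              for k in range(N_OUT)]
    return hidden, output


def train_step(x, target, lr=0.5):
    """One cycle: forward pass, measure the error, back propagate, adjust weights."""
    hidden, output = forward(x)
    errors = [o - t for o, t in zip(output, target)]  # predicted minus actual
    # Gradient of the squared error through the sigmoid of each output node.
    d_out = [e * o * (1 - o) for e, o in zip(errors, output)]
    # Push the error back through the output channels to the hidden nodes.
    d_hid = [h * (1 - h) * sum(d_out[k] * w_out[k][j] for k in range(N_OUT))
             for j, h in enumerate(hidden)]
    for k in range(N_OUT):
        for j in range(N_HID):
            w_out[k][j] -= lr * d_out[k] * hidden[j]
        b_out[k] -= lr * d_out[k]
    for j in range(N_HID):
        for i in range(N_IN):
            w_hid[j][i] -= lr * d_hid[j] * x[i]
        b_hid[j] -= lr * d_hid[j]
    return sum(e * e for e in errors)


# Made-up toy inputs standing in for image features; targets are (mango, not a mango).
data = [([0.9, 0.8, 0.1], [1, 0]), ([0.1, 0.2, 0.9], [0, 1])]

first_loss = sum(train_step(x, t) for x, t in data)
for _ in range(2000):
    last_loss = sum(train_step(x, t) for x, t in data)

print("squared error in first cycle:", first_loss)
print("squared error in last cycle:", last_loss)
print("outputs for the first sample:", forward(data[0][0])[1])
```

A real image classifier would have one input per pixel and far more data; the point of the sketch is only that repeating the forward and backward cycle shrinks the error column from the table.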
It is the same as with a human: the more you see an object and the more you are introduced to that object, the better the prediction you will be able to make. One more important point to understand here is something known as the activation function. Activation function, or threshold function, right? An activation or threshold function can be a sigmoid function. We have seen the sigmoid function in logistic regression; this function basically converts any supplied value to a value between zero and one, right? So what this activation function does is it actually activates or deactivates some particular node in the hidden layer. The reason activation functions are used is that in some cases we actually want to switch off some nodes. Let's say we want to switch off these two nodes. So when we supply the activation function with some value, it will give me the result as 0, and 0 means that the node is deactivated. Sometimes this activation function is also known as the threshold function, and it is basically used to activate or deactivate some of the nodes so that the network will be able to make better predictions. Besides the sigmoid function, there is a wide variety of functions that are used for activation. So when we introduce activations inside of a neural network, we introduce some nonlinearity into our neural network model, because these functions are nonlinear in nature. You can see this is a nonlinear function. And with the advancement of technology in neural networks, it was observed that different choices of activation function behave differently. For example, currently ReLU is an activation function which is often considered more reliable than the sigmoid function, because the sigmoid has its limitations too. But that will be an advanced topic.
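To make the two activation functions mentioned above concrete, here is a small Python sketch of the sigmoid and of ReLU. The function names are chosen just for this illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    """Return 0 for negative inputs (node switched off), otherwise the input itself."""
    return max(0.0, z)


for z in (-2.0, 0.0, 3.0):
    print(z, sigmoid(z), relu(z))
```

Both functions are nonlinear. ReLU outputs exactly 0 for every negative input, which is the "deactivated node" idea, while the sigmoid only approaches 0 and 1 without reaching them.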
And here we are just covering the basics of neural networks. So now we know how a neural network will take the images in the input layer, multiply by the weights and add the bias, use the activation function to activate or deactivate the unnecessary nodes, and then compute the output as a mango or not a mango. Or we can have some more outputs as well. So that's how a neural network makes a prediction. Let me give you a real-life example of a neural network at work. Inside Microsoft PowerPoint, I will go to Insert, and here to Equation, and then to Ink Equation. Now you can see that I get this dialog box. Let me expand it. Let's say I write an equation here, x squared plus y squared equals two. You can see that I am writing this whole equation in my own handwriting, and since this is an already trained artificial neural network, the whole equation is recognized as x squared plus y squared equals two, which you can see is 100% accurate. There are some cases where it can go wrong. Let's say I write a summation sign with i equals one to n. It is able to do that, but in some cases it fails to understand what I am actually writing. So let's see here, let's extend this equation. You can see that when I write this, we know that it is y i minus x, but the recognition is not that accurate. So basically this is a trained neural network, and for most cases it will work, but it is not 100% accurate. Still, it is a good example for learning how the handwriting is converted into this equation and what is actually done. As you can see, there are some boxes here; they are visible to you. Now what happens with these boxes is that they are supplied to the neural network at the input layer.
The neural network will do the processing as we have already studied, and it will finally give us these results in the output layer. So that's all for this tutorial. If you liked the video, please make sure to like it, share it with your friends, and subscribe to my channel to support us. Thanks for watching.
32. Perceptron Learning Algorithm in Machine Learning Neural Networks: Hi thinkers, Welcome to think it's academy. In this machine learning playlist, we have already discussed artificial neural networks. In this tutorial, we're going to cover the perceptron, which is basically a simplified version of the neural networks that we have studied. So what is a perceptron? A perceptron is a supervised learning algorithm. It is a supervised learning algorithm, which basically means that we are given a dataset in the form of features, and we also have a mapping, right? So we will have a dataset, something like this. We have already discussed this kind of dataset in the previous tutorials on linear regression, logistic regression, etc. So let's say we have two columns in this dataset. The first column represents, let's say, some feature x, and it also has a mapping to y. So for every value of x, I have a mapping to y. So the perceptron is basically a learning algorithm, and it is a supervised learning algorithm because we have the supervision of an output mapping, right? The second thing about the perceptron is that we use it to do classification, and more specifically, binary classification. So perceptrons are binary classifiers, which means that they separate the data into two classes. First we train the perceptron learning algorithm on the given dataset with the given inputs and output mappings, and then we test it with some sample data, as we did in linear regression and all the learning algorithms that we have studied. A perceptron looks similar to a neural network, so I'm now going to show you a basic representation of a perceptron so that you will be able to understand how it is similar to neural networks. It is just simpler than the neural networks. In a neural network, we know that we have the input features or the input layer. Right?
So let us say we have these input features, which are, let's say, x1, x2, and x3. So we have one input layer here. Now what we do is make a channel from each of these inputs. These are the channels, and each of these channels has some weight, assigned as w1, w2, and w3. If we have n features, we will have inputs up to xn, and similarly we will have weights up to wn, right? So what we do in a perceptron is make a component here. Let's make another component here. This component is basically a circuit which does the operation that I am going to write here: x1 w1 plus x2 w2, and it goes on till xn wn. So this circuit is basically an adder circuit; sometimes it is simply called an adder. What it does is multiply the input features by the weights and add all of the products together. One more thing is that we also add a bias, which is b. So we add a bias b to the output of this whole component. When we pass through this component, we do the multiplication of all of these features and weights, and we also add the bias. After calculating this result, we pass it to the next component, which is our activation function. We have already studied the activation function; it is used to activate or deactivate, and here it is basically a sigmoid function. So this is an activator or a thresholder; you can even say that this is a thresholder. Now, this input here, let me simplify the equation. We can write it as the sum from i equals 1 to n of xi wi, plus b. Now let's say these inputs sit inside some vector x. So I will create a vector like this.
It holds all the inputs. And let's say we have another vector w, which contains all the weights w1, w2, up to wn, right? So we have these two vectors, and this operation that you can see here is just the scalar product of x and w. So x dot w will give me the scalar product, which multiplies x1 by w1, x2 by w2, and so on, and adds them all up. So we can write it as x dot w plus b, right? This is the final result, and we pass it to the threshold function. What this threshold function does is classify: it gives a value of either 0 or 1. If the value of x dot w plus b, where b is the bias, is less than 0, it will give 0, and if this value is greater than 0, it will give 1, right? So the activation function classifies whatever real input you give it. Since it is based on a sigmoid function, we treat the output as 0 or 1, and it classifies like this. The sigmoid function is f of x equals one upon one plus e raised to the power minus x, all right? If we give it any real value of x, we get a graph which looks something like this. We have already studied this type of graph, and we know that this level is the value 1 and this is 0. So it can map the value towards 0 or 1, which stand for two classes. We can say that 0 represents one class, for example whether an email is spam or not spam, and 1 represents the other. So it is basically classifying the input into two different classes. These are the basic calculations involved. Now, there are iterations of this. So first, let's say we take the input here; I have some input, and this input has some mapping here. What we will do is pass these inputs to this circuit, and it will add them up. We will choose some weights. Now remember that these weights are adjustable, right?
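The calculation described above, x dot w plus b followed by a threshold, can be sketched in Python as follows. The function names and the sample numbers are hypothetical, and treating a net input of exactly 0 as class 0 is an assumption, since the explanation only covers the cases below and above 0.

```python
def net_input(x, w, b):
    """Scalar product of inputs and weights, plus the bias: x . w + b."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def threshold(z):
    """Classify the net input: 1 if it is greater than 0, otherwise 0.

    Mapping z == 0 to class 0 is an assumption of this sketch.
    """
    return 1 if z > 0 else 0


# Hypothetical features, weights, and bias for illustration only.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]
b = 0.5

z = net_input(x, w, b)  # 0.5 - 2.0 + 6.0 + 0.5
print(z, threshold(z))
```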
Adjustable means that we can change them, and we will change them because we get some errors when we match what we have calculated against the classes in the dataset. So if I pass the input, the adder gives me this result and hands it to the activation function. Let's say that after the first iteration, after calculating the first input, it classifies the input as 0, but in the dataset it was mentioned that it is actually 1. In that case, what we will do is apply some learning rate. We will calculate the error from here and adjust these weights according to it, right? So this will be w new. We calculate w new from w old by adding some learning rate alpha multiplied by the error, which comes from y i minus y i bar. So we basically just add the learning rate multiplied by this difference, which is the error, scaled by the corresponding input. Here, y i bar means the predicted output, which we are getting from here, and y i is the right output, the output that is present inside of the dataset. So we calculate this and then update the values of the weights. Then in the next iteration, having updated these weights, we again calculate the result, and again we pass it to the threshold function. Again we get some output and tally it with the dataset or table that we already have here, and again we calculate the error and multiply it by the learning rate. The learning rate should be chosen wisely, because if the learning rate is too high, it is going to overshoot. We have already studied this in the gradient descent video. And if the learning rate is very low, training will take a lot of time.
So there are some trade-offs in choosing the learning rate. So we multiply the learning rate by the error and add it to the old weights. And we do this again and again until we get the right output, which means that the error comes out to be 0. After this, the perceptron is trained. The learning algorithm, which is the perceptron learning algorithm, gets trained, and then we will be able to do the classification, the binary classification. In the next tutorial we will solve an example where I will give a very simple dataset, and we will see how we can actually perform these operations and how we can train the perceptron so that it converges to the solution. So in the next video we're going to do that. That's all for this tutorial. If you have understood the concept, please make sure to like this video and subscribe to my channel to support us. Thanks for watching.
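The training loop from this tutorial can be sketched in Python as below. It is a rough illustration under stated assumptions: the error is y minus y bar as described, each weight change is also multiplied by its input (the usual form of the perceptron rule), a net input of exactly 0 is classified as 0, and the small OR-style dataset and all function names are made up for the example.

```python
def threshold(z):
    """Return class 1 if the net input is greater than 0, otherwise class 0."""
    return 1 if z > 0 else 0


def predict(x, weights, bias):
    """Net input x . w + b passed through the threshold."""
    return threshold(sum(xi * wi for xi, wi in zip(x, weights)) + bias)


def train_perceptron(samples, labels, alpha=1.0, max_epochs=20):
    """Repeat predict, compare, and update until an epoch has no errors."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(x, weights, bias)  # y_i minus y_i bar
            if error != 0:
                # w_new = w_old + alpha * error * x, and the same for the bias.
                weights = [wi + alpha * error * xi for wi, xi in zip(weights, x)]
                bias += alpha * error
                mistakes += 1
        if mistakes == 0:  # error is 0 for every input: training is done
            break
    return weights, bias


# Hypothetical toy dataset (logical OR with 0/1 labels), for illustration only.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(x, weights, bias) for x in samples]
print(predictions)
```

The `max_epochs` cap is a safety limit so that the loop also stops on data that a single perceptron cannot separate.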
33. Implementation of AND function using Perceptron Model: Hi thinkers, welcome to think it's academy. In this machine learning playlist, we have already studied the perceptron learning algorithm. In the previous tutorial, we briefly studied the representation of a perceptron. In this tutorial, we are going to do a solved example which will help us understand more about how a perceptron exactly works. So let's take a look at the question. The question says: implement the AND function using a perceptron network for bipolar inputs and targets. So we have to use the perceptron here, and it is for bipolar inputs and targets. Bipolar basically means 1 and minus 1. So we are not given any dataset; we have to make a dataset using the question itself. All right, so let's make a table here, and in this table I will write all the inputs and the outputs from the question. First we have to see what the inputs are. We will have two inputs, x1 and x2, and the target t can be 1 or minus 1, which means that we are doing a binary classification. The inputs can also be 1 or minus 1, since they are bipolar. So let's name the first input x1 and the second x2, and put the target in the last column. Bipolar means 1 and minus 1, so for x1 I will write 1, 1, minus 1, and minus 1, and for x2 I will write 1, minus 1, 1, minus 1. Now, since we have to implement the AND function, we take the AND of these two inputs. So 1 and 1 will give me 1. If either input is negative, we write minus 1. So the second row is minus 1, the third is minus 1, and the last is also minus 1. This is how we implement the AND function, and this is basically our dataset.
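The bipolar AND table built above can also be generated in Python. This is just a sketch; the names `bipolar_and` and `dataset` are chosen for the illustration.

```python
from itertools import product


def bipolar_and(x1, x2):
    """AND for bipolar values: 1 only when both inputs are 1, otherwise -1."""
    return 1 if x1 == 1 and x2 == 1 else -1


# Rows in the same order as the table: (1, 1), (1, -1), (-1, 1), (-1, -1).
dataset = [((x1, x2), bipolar_and(x1, x2)) for x1, x2 in product((1, -1), repeat=2)]

for (x1, x2), t in dataset:
    print(x1, x2, t)
```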
You can consider this to be the dataset, which we have just built from the question. All right, so now we want to train the perceptron. We know that in a perceptron we have the input features x1 and x2, then we have the weights w1 and w2, then we add the adder component, then the activation function, and then we get the output y, right? So the perceptron performs this operation to get the output y. The net input will be equal to x dot w plus the bias. All right, so step one of solving this problem is to make some assumptions. Step one is to initialize all the weights as 0, and the bias as well. We know that in each iteration, after computing the values and getting the output, we check it against the dataset, and if there is any error, we make an update to the weights by adding alpha times the error term to the weight. Then we go back and do it again, iteration after iteration. So we are now performing the first iteration. I am going to write: iteration one. Step one is to initialize all the weights and the bias as 0, and for the sake of simplicity we will take the learning rate as 1. So the learning rate alpha is equal to 1, and the weights and bias are 0. All right, so in the first iteration we have performed step one. Let's move on to step two. We have initialized the weights w1 and w2 and the bias b to 0, and alpha equals 1. The next task is to calculate the net input, right?
This means we are going to perform this operation, x dot w plus b. Before passing anything to the activation function, we have to calculate the net input from here, and then we pass this net input to the activation function, right? So let's calculate the net input, which is y in. You can see the bias is 0, and since the weights are initialized as 0, I will get 0 here, so y in equals 0. Or I can write it out as x1 w1 plus x2 w2 plus the bias, which is 0 plus 0 plus 0, which is equal to 0. So I have just applied this formula and got the net input. Now we pass this net input on, which is step three: in step three we pass this input to the activation function, or threshold function. The final output y will be equal to f of y in, and here y is equal to f of 0. Now, we know how this function is defined. Since the targets are bipolar, the value of y can be 1, 0, or minus 1, and we have three cases for the net input y in. The output is 1 if y in is greater than theta, where theta is the threshold that we choose. The output is 0 if y in lies between minus theta and theta. And the output is minus 1 if y in is less than minus theta. For the sake of simplicity, I will take theta as 0. So the output is 1 if the value of y in is greater than 0.
And if the value of y in is less than 0, the output is minus 1. You could put an equals sign in one of these cases as well, but here, if the value of y in, the net input, is exactly 0, the value of f of y in is 0. All right, so now we have the value of the activation function. We passed the net input through it, and it has classified the input as 0. So the value of y is equal to 0, which is the predicted value. Now we tally it with the dataset that we have. This is the first input, so the value of x1 is 1 and x2 is 1, and the target t is equal to 1. You can see that the y that is coming here is equal to 0, and since it is not equal to the value that we desire, we move to step four, which is to update the weights. This is an important step. After doing all this, we check the value of y against the target, and if it is not equal, we update the weights. We do this iteration after iteration until the values of y coming out of the calculation become equal to the target values in the dataset. All right, so updating the weights. Let's see how we should update them. The new weight is going to be equal to the old weight plus alpha times t times x1. Because we are on the first weight, we write it like this, or in place of this term you can also write delta w, which means the change in the weight. We also have to update the bias, right? The new bias is equal to the old bias plus alpha times t. So in this manner we update the bias and the weights.
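Steps three and four can be written down as two small Python functions. This is a sketch under the assumptions used in this example: the threshold theta is taken as 0, the activation returns 1, 0, or minus 1, and the update adds alpha times t times x to each weight and alpha times t to the bias. The function names are hypothetical.

```python
def activation(y_in, theta=0):
    """Three-valued threshold: 1 above theta, -1 below -theta, 0 in between."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def update(weights, bias, x, t, alpha=1):
    """Step four: w_new = w_old + alpha * t * x, b_new = b_old + alpha * t."""
    new_weights = [wi + alpha * t * xi for wi, xi in zip(weights, x)]
    new_bias = bias + alpha * t
    return new_weights, new_bias


# First input of the AND table: x1 = 1, x2 = 1, target t = 1.
weights, bias = [0, 0], 0
x, t = (1, 1), 1

y_in = sum(xi * wi for xi, wi in zip(x, weights)) + bias  # net input
y = activation(y_in)
if y != t:  # prediction does not match the target, so update
    weights, bias = update(weights, bias, x, t)

print(y_in, y, weights, bias)
```

With everything initialized to 0, the net input is 0, the output is 0, and the update moves both weights and the bias to 1.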
And this is a very important step. Step four is the update of the weights and the bias. All right, so let's do this step. Let's update the values of the weights and the bias. So w new equals w old plus alpha times t times x, right? Let's see what the new value of w1 will be. The old value of w1 was 0, because we initialized it as 0 in step one. So it is 0 plus alpha, which we have taken as 1, times t, the actual output or the target, which is 1, times x1, which is also 1. So the value of w1 becomes 1 now. Previously it was 0; now it is 1. Let's see what happens to w2; we also have to update it. Previously it was also 0, and the learning rate is 1. Then t is 1, and for w2 we multiply by x2, so in general I will write x i here. And x2 is also 1, so the weights have now become 1 and 1. Let's see what the new bias is. The value of the old bias was initialized as 0, the value of alpha is 1, and the value of the target is also 1, so the new bias is equal to 1. Now, this was still the first iteration; we have only done the first input. For the next set of inputs we will use these new weights and this new bias. All right, so let's move on to the next input, which is x1 equals 1 and x2 equals minus 1, and the value of the target is minus 1, which you can see here. Now we go back to step two, which is to calculate the net input, right? So let's calculate the net input, which is equal to x1 w1 plus x2 w2 plus the bias. And remember, the bias and weights have changed in the previous step. So x1 is 1, and w1 is no longer 0; it is now 1, as you can see here. Plus x2 times w2: x2 is minus 1 and w2 is 1. Plus the bias. What is the bias here?
You can see it has come out to be 1. So the 1 and the minus 1 cancel, and we get the net input as 1. Now, after calculating the net input, we pass it to the activation function, f of 1. The value is greater than 0, so the output is 1. You can see this case here, right? So the final value of y is equal to 1. Now we have to tally this value with the dataset. The target was equal to minus 1, which does not match, so we have to move to step four, which is to update the weights, and similarly we will update the bias as well. I'm going to do all the steps here. The new w1 will be equal to the old weight, which was 1, as you can see here, plus alpha times t, and the target is now minus 1, times x1, which is 1. So this gives me 1 minus 1, which is 0, right? For w2, we write the old weight, which was equal to 1, plus alpha times t times x2, and x2 was equal to minus 1. So w2 will be equal to 1 plus 1, which is 2. So these are the new weights. Then we get the new bias. The old bias was equal to 1, plus alpha times t, where alpha was 1 and t was minus 1. So what does this give me? It gives me 1 minus 1, which is again 0. After this we move on to the third input and then the fourth input. When you move to the third and fourth inputs, these are the same steps that you have to follow. I'm not doing them here; I'm leaving them for you. What you have to do is use this new bias and these new weights, go through the same steps, this one and this one, and then check whether the value of y is equal to the target value that you tally against in the dataset.
All right, so let's look at these two tables that I've created for iteration one and iteration two. You can see we have four inputs in both iterations, and I've just noted down the values that we have calculated. You can see the values of x1, x2, and the target, the net input y in, the output y which comes from the activation function, the changes in w1, w2, and the bias, and then the weights w1, w2, and the bias. So this is the table that I have constructed. In this tutorial we have only calculated input one and input two, but you can follow the same steps to get the rows for the other inputs as well, and in the same way you can construct the whole table. Now look at input four. You can see that for input four there is no change in the weights and the bias, which means that the output is matching the target. And you can see the weights that we have got are 1 and 1, and the bias is minus 1. So it means that if I use the weights w1 and w2 as 1 and a bias of minus 1 in the perceptron, I will get a line which is able to do the binary classification, something like this. Say these are the two classes; it will be able to separate them. In the second iteration you can see that we again have the same values, and the change is 0 for every input. There is no change in weight one, weight two, or the bias, because there are no errors, and you can see that the weights stay constant here. So iteration two is basically to confirm that for all inputs we are getting the right output and are not getting any error, right? So that's iteration two. That's all for this tutorial. If you have understood the concept, please make sure to like this video and subscribe to my channel for support. Thanks for watching.
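For anyone who wants to check the two tables, the whole solved example can be replayed with a short Python sketch. It follows the steps of this tutorial (weights and bias start at 0, alpha is 1, theta is 0, the update adds alpha times t times x); the function names and the printed row layout are assumptions made for the illustration.

```python
def activation(y_in, theta=0):
    """Three-valued threshold: 1 above theta, -1 below -theta, 0 in between."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_and_perceptron(alpha=1, theta=0, max_iterations=10):
    """Train on the bipolar AND table and return ((w1, w2, b), iterations used)."""
    data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
    w1 = w2 = b = 0  # step one: initialize weights and bias as 0
    for iteration in range(1, max_iterations + 1):
        changed = False
        for (x1, x2), t in data:
            y_in = x1 * w1 + x2 * w2 + b  # step two: net input
            y = activation(y_in, theta)   # step three: activation
            if y != t:                    # step four: update on a mismatch
                w1 += alpha * t * x1
                w2 += alpha * t * x2
                b += alpha * t
                changed = True
            # One table row: inputs, target, net input, output, weights, bias.
            print(iteration, x1, x2, t, y_in, y, w1, w2, b)
        if not changed:  # a full pass with no change confirms the solution
            return (w1, w2, b), iteration
    return (w1, w2, b), max_iterations


final, iterations = train_and_perceptron()
print(final, iterations)
```

Running it reproduces the rows worked out by hand: the weights change on the first three inputs of iteration one, nothing changes on the fourth input or anywhere in iteration two, and the final values are w1 = 1, w2 = 1, and bias = minus 1.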