
Applied Data Science - 5 : Modeling and Prediction

Kumaran Ponnambalam, Dedicated to Data Science Education

20 Lessons (4h 40m)
    • 1. About Applied Data Science Series (8:12)
    • 2. Types of Analytics (12:08)
    • 3. Types of Learning (17:16)
    • 4. Analyzing Results and Errors (13:46)
    • 5. Linear Regression (19:00)
    • 6. R Use Case : Linear Regression (18:01)
    • 7. Decision Trees (10:42)
    • 8. R Use Case : Decision Trees (19:36)
    • 9. Naive Bayes Classifier (19:21)
    • 10. R Use Case : Naive Bayes (19:12)
    • 11. Random Forests (10:31)
    • 12. R Use Case : Random Forests (18:47)
    • 13. K Means Clustering (11:53)
    • 14. R Use Case : K Means Clustering (16:24)
    • 15. Association Rules Mining (11:30)
    • 16. R Use Case : Association Rules Mining (13:11)
    • 17. ANN and SVM (4:35)
    • 18. Bagging and Boosting (11:27)
    • 19. Dimensionality Reduction (7:28)
    • 20. R Use Case : Advanced Methods (17:18)

About This Class

This class is part of the "Applied Data Science Series" on SkillShare presented by V2 Maestros. If you wish to go through the entire curriculum, please register for all the other courses and go through them in the sequence specified.

This course focuses on Modeling and Prediction. Different algorithms for supervised and unsupervised learning are explored. Use cases are presented for the major types of algorithms.

Transcripts

1. About Applied Data Science Series: Hey, welcome to the course Applied Data Science with R. This is your instructor, Kumaran Ponnambalam, from V2 Maestros. Let's go through and understand what this course is all about. The goal of the course is to train students to become full-fledged data science practitioners — people who can execute an end-to-end data science project, right from acquiring data, all the way to transforming it, loading it into a final data destination, performing analytics on it, and finally achieving some business results from that analysis.

What do you gain by taking this course? You understand the concepts of data science and the various stages in the life cycle of a data science project. You develop proficiency in R and use R in all stages of analytics — from exploratory data analytics to descriptive analytics to modeling to finally doing prediction using machine learning algorithms. You learn the various data engineering tools and techniques for acquiring, cleansing, and transforming data. You acquire knowledge about the different machine learning techniques and, most importantly, how you can use them. You become a full-fledged data science practitioner who can immediately contribute to real-life data science projects — not to mention that you can take this knowledge to your interviews so that you can get a position in data science.

Theory versus practice: we wanted to touch upon this particular topic. Data science principles, tools, and techniques emerge from different science and engineering disciplines — computer science, computer engineering, information theory, probability and statistics, artificial intelligence, and so on. The theoretical study of data science focuses on the scientific foundations and reasoning of the various machine learning algorithms: understanding how these algorithms work in a deep sense, and being able to develop your own algorithms and your own implementations of them to solve real-world problems. It dwells into a lot of equations, formal derivations, and reasoning. The applied part of data science, on the other hand, focuses on applying the tools, principles, and techniques in order to solve business problems. The focus is on using existing techniques, tools, and libraries, and on how you can take these and apply them to real-world problems and come out with business results. It requires an adequate understanding of the concepts, knowledge of what tools and libraries are available, and the ability to use those tools and libraries to solve real-world problems. This course is focused on the practice of data science, and that is why it is called Applied Data Science.

Inclination of the course: data science is a trans-disciplinary and complex subject. It has mainly three technical areas to focus on: math and statistics, machine learning, and programming. This course is oriented towards existing software professionals, and it is heavily focused on programming and solution building.
It has limited, as-required exposure to the math and statistics, and it covers an overview of machine learning concepts: it gives you an adequate understanding of how these machine learning algorithms work, but the focus is on using existing tools to develop real-world solutions. In fact, 90 to 95% of the work that data scientists do in the real world is the practice of data science, not the theory of data science. This course strives to keep things simple and easy to understand. We have definitely made it very simple; we either toned down the complex concepts or stayed away from them, so that it is easy to understand for people of all levels of knowledge in the data science field. It is a kind of beginner's course, if I may say that.

The course structure: it goes through the concepts of data science to begin with — what exactly is data science, and how does data science work. It looks at the life cycle of data science with its various stages. It then goes into some basics of statistics that are required for doing data science. It then goes into R programming, with a lot of examples of how you would use R programming for the various stages in a data science project. Then there is the data engineering part of the effort: what you typically do in data engineering and what the best practices are — it covers those areas. Finally, there is the modeling and predictive analytics part, where we delve into the machine learning algorithms; we also look at end-to-end use cases for these algorithms, and there are some advanced topics that we touch upon. Finally, there is a resource bundle that comes as part of this course. The resource bundle contains all the data sets, data files, and sample code for the examples we teach as part of this course. So do download the resource bundle — it has all the data and all the code samples you need to experiment with the same things yourself.

Guidelines for students: the first thing is to understand that data science is a complex subject — it needs significant effort to understand. So if you are getting stuck, do review the videos and exercises again, and do seek help from other books, online resources, and support forums for your queries. If you have questions or concerns, send a private message or post a discussion question, and we will be happy to respond as soon as possible. We are constantly looking to improve our courses, so any feedback you have is welcome — please provide it through private messages or emails. At the end of the course, if you like it, do leave a review; reviews are helpful for other prospective students considering this course, and do expect more future courses from V2 Maestros — we want to make that easy for our students.

Relationship with the other V2 Maestros courses: our courses are focused on data science related topics — the technologies, processes, tools, and techniques of data science — and we want to make our courses as self-sufficient as possible.
What that means is, if you are an existing V2 Maestros student, you may see some content and examples repeated across courses. We want to make the courses self-sufficient, so rather than saying at some point in a course, "okay, go look at this topic in the other course — register for it and learn about it there", we would rather keep things in the same course itself, unless that other concept is a big enough subject to deserve a separate course. So you might see some content that is repeated across courses. Finally, we hope this course helps you to advance your career. So best of luck, happy learning, and do keep in touch. Thank you.

2. Types of Analytics: Hi, welcome to this session on analytics and predictions. This is your instructor, Kumaran. In this section we are going to be talking about a lot of machine learning — how to use various machine learning algorithms for performing predictive analytics. Just to forewarn you, the concepts explained in this particular session are going to be a little complex. We have toned down the complexity of these algorithms as much as possible for easy understanding. However, if you do feel confused or stuck, please go through the presentations again, and feel free to use other references, either on the web or in books, to cross-check or cross-validate these concepts, because they are in general a little difficult to understand. I hope this whole session is helpful to you.

Moving on: we keep hearing about analytics all over the place, and everybody talks about analytics. But the question is, what exactly is this analytics we are talking about? Analytics, of course, is anything that you do with data: you look at the data in various forms and then try to make inferences and take some actions. But there are a number of types of analytics that you keep hearing about all the time, so let's make sure we all understand the different types of analytics that exist today.

The first type of analytics is called descriptive analytics, and this is just understanding what happened. This is basic reporting. Any time a report comes out, you look at it and say: okay, I see this is what happened yesterday. Yesterday we made a sale of $1,000; this week we have been making sales of $10,000; today's performance compared to last week was up by 10%. This is descriptive — just looking at and seeing what exactly happened.

The next level of analytics is called exploratory analytics, where you are trying to explore and find out why something is happening. Yesterday I made a sale of $1,000, which is 10% more than last week — so why was there a 10% increase in my sales? Was it because yesterday was a holiday? Was it because I had some marketing going on yesterday, because of which there was an increase? Or, more importantly, where did my new sales come from — was it from the web, from online sales, or from my store? Was it from a specific region — was it mostly from the west of the US or the east of the US?
So why exactly something happened, and where exactly something happened, is what exploratory analytics is all about.

Inferential statistics, or inferential analytics, is trying to understand an entire population from a sample. Population here refers to everybody. Let's say I am trying to analyze cancer patients in the US. When I am doing this analysis, I cannot collect data about all the patients and work on them; rather, I take a sample of this population — a few sets of 100 or 200 patients — analyze them, and once I get some findings, I take those findings and extrapolate them to the entire population. So it is trying to understand a population from a sample. This usually happens with drug testing: you test the drug on a sample of people, and you make sure the sample has a nice mix of all types of people — all age groups, all ethnicities, equal percentages of men and women. Then you test the drug and say: this drug works better on men than on women. You only looked at a small sample, but you are taking that finding and extrapolating it to the entire population. That is called inferential analysis.

The next level is predictive analytics, which we are going to be talking about in detail in this session. Predictive analytics is about forecasting what is going to happen: we use past data to understand relationships between various features or variables, and use that past data to predict what is going to happen in the future.

Causal analysis is where you try to figure out what will happen if you change one variable — how is it going to impact another variable? Suppose, for example, in marketing we have three things: price, discount, and total sales. I am trying to understand how my total sales will be impacted if I change the price of my product by, say, $20. What will happen if I give a discount of 30 percent — how is that going to impact the sales? I can run the analysis for various slabs of discount — 30%, 40%, 50% — and see how each of these discounts impacts my overall sales. This is called causal analysis.

The final term we keep hearing is what we call deep analytics. Deep analytics is really not a type; it is just a term used in popular parlance for the use of advanced techniques to understand large and multi-source data sets. Deep analytics can involve any of the other types we talked about — exploratory, inferential, predictive, or causal analysis. In general, deep analytics is an advanced level of analytics that you do.

Let's look at what exploratory data analysis (EDA) is. EDA is one of the first steps you do once you get your data into shape. The main goal of EDA is to understand the predictors and targets in the data set: what the predictors look like, what the targets look like, what the relationships are between the predictors and the targets, how the predictors are correlated with the target, and how the predictors are correlated with each other.
You try to look at the relationships between these variables and understand how, when one goes up, the other also goes up, or when one goes up, the other does not get impacted — you try to understand these relationships when you do EDA. EDA is used to uncover patterns and trends, which is again the relationships between these various variables. It is used to identify key variables and eliminate unwanted variables. In a typical data set coming in, there may be 20 different variables and one target, and you want to look at the predictors and see which ones are highly correlated with the target and which ones have no correlation with the target. If you see certain variables that have no correlation with the target, you want to eliminate them. Why? Because even if you pass these variables to a machine learning algorithm, the algorithm will simply ignore them — but it still has to spend a lot of time and resources working on these variables and looking for patterns, which means it is going to take more time, more CPU power, and more memory to execute. Plus, if you are using a big data set, you also have to store these unwanted variables in the data store. So you can eliminate all those unwanted costs if you find out ahead of time that there are unwanted variables and simply drop them.

EDA is used to detect outliers: it is a great tool for finding out whether the data contains outliers and whether you want to eliminate them or not. It is also a tool to verify whether the previous data conditioning processes made any mistakes. When you do a lot of data processing, you can do a quick EDA on the final results to see that the data looks okay — that the data ingestion process does not have a bug that introduced unwanted data, or that a data transformation did not mess something up. Those kinds of checks are pretty simple and straightforward with EDA. Suppose you are doing a date transformation: dates are coming in various formats and you are converting them all into a proper format, and the logic has an error because of which it always sets the date to day one — so instead of the actual date, it puts day one everywhere. An EDA analysis by day will immediately tell you that all the dates are day one — so what happened to the dates? You can go back, look at the data, and see why things are not happening as expected.

EDA is also used to test assumptions and hypotheses. You typically have a lot of assumptions — hypotheses about what is happening. Suppose your sales went up by 10% last week, and immediately people are talking: "I think the sales went up because we made good advertisements", or "I think the sales went up because a particular region performed a lot better". People start making these assumptions and hypotheses, and EDA is a great way to verify and validate whether the hypothesis is true and why exactly certain things are happening.

The tools used for EDA are the tools we have already looked at and that we will be using as part of our class. Correlation matrices are one of the primary tools for EDA; we will be using them in all our use cases to look at and understand what the data looks like. Box plots tell you the variations in the data that is coming in. Scatter plots let you analyze the relationship between two variables. Principal component analysis is an automated way of looking at your data, keeping the variables with high predictability and throwing away variables with low predictability — a good automated way of eliminating unwanted data. Histograms, again, are a great way to look at data and understand trends and patterns. Thank you.
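As a rough illustration of the EDA tools just listed — a sketch only, using R's built-in mtcars data set rather than one of the course's own data files — the corresponding base R commands look like this:

    # Minimal EDA sketch using the built-in mtcars data set
    data(mtcars)

    cor(mtcars)                      # correlation matrix of all numeric variables
    boxplot(mtcars$mpg)              # box plot: spread and possible outliers
    plot(mtcars$wt, mtcars$mpg)      # scatter plot: relationship between two variables
    prcomp(mtcars, scale. = TRUE)    # principal component analysis on scaled data
    hist(mtcars$mpg)                 # histogram: distribution of a single variable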
3. Types of Learning: Hi. In this session, we are going to be looking at the various types of machine learning and, in general, what machine learning is all about. You have seen some of this when we went through the data science concepts, so let us reiterate some of it. Data contains a lot of attributes, and these attributes basically show relationships or correlations between the entities. If you look at data, there are predictors and there are outcomes, and looking at the data you can see how one variable influences another variable. Learning — the process of learning — is about understanding these relationships or correlations between these entities. This is what we generally call learning, at least in data science terms. Machine learning is using a computer to do the same: using a computer to analyze the data automatically, learn about the relationships, and deliver the corresponding results. That is what learning and machine learning mean in data science parlance.

When you do machine learning on data, it typically builds what is called a model. A model is nothing but a definition of the relationships between the various attributes — a definition or an explanation of the relationships. A model can be an equation specifying how you can derive one variable from the other. A model can be a decision tree that shows how, by applying decisions on these variable values, you can arrive at the final target. Models can be built in a number of ways, and that is what we are going to see in the rest of the class. Models can either be used for grouping data — you can use a model to group similar data, like grouping similar customers or similar products together — or you can use models to predict an outcome.

Before we go into machine learning, one thing we have already seen and are just reiterating here: machines only understand numbers, and text data needs to be converted to numerical representations for machine learning to work. Machines do not understand text; we have to go through a lot of processing to convert text into a numerical or equivalent representation for machine learning algorithms to look at and work with. So numbers need to be used. Even if you are using classifications like excellent, good, and bad, they need to be converted into a numeric representation. When you convert data into categorical data and use it as categorical data, the machine learning algorithms internally convert it into a numerical representation. So if you are wondering, "I am passing categorical data in there" — the categorical data is marked specifically, for example marked as factor data, so that the machine learning algorithms can understand it. Boolean variables or indicator variables are another thing you create: indicator variables, boolean variables, or dummy variables are when you take ratings like excellent, good, and bad and convert them into indicator variables, usually with values of zero and one. Or you create a document-term matrix: if you have a lot of text documents, you convert them into a document-term matrix and use that for analysis.
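As a small sketch of how that conversion looks in R — the ratings vector below is invented purely for illustration — a text rating can be marked as a factor, and model.matrix() can expand it into 0/1 indicator (dummy) variables:

    # Hypothetical ratings, for illustration only
    ratings <- c("excellent", "good", "bad", "good", "excellent")

    # Mark the text values as categorical (factor) data
    ratings <- factor(ratings, levels = c("bad", "good", "excellent"))

    # Expand the factor into 0/1 indicator (dummy) variables, one column per level
    indicators <- model.matrix(~ ratings - 1)
    print(indicators)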
Now let's get into understanding the types of learning. There are two types of learning, called supervised learning and unsupervised learning. If you are wondering what the word "supervised" means — is there a supervisor who is going to sit there and grade the learning, things like that — there is nothing like that. The difference between supervised learning and unsupervised learning is that in supervised learning there is a target variable: you are trying to predict a specific target variable, like predicting sales, or predicting whether a person has a disease or not. There is a target variable, and you, the external person, specify that target variable. That is all the supervision you do — there is nothing complex there. You are just saying what the target variable is, and that is why it is called supervised learning. In unsupervised learning, there is no target variable; rather, you are just looking at the attributes and trying to group the records together — you are trying to create groups of five or groups of ten based on the attributes. Suppose you have data that has 100 attributes: you cannot visually inspect and group them. Rather, you give them to a machine learning algorithm, which analyzes the data, sees which of the samples — the rows in the data — are closer to each other, and comes back and says: these are the various groups I found based on similarity in values. Similarity in values means something like: they belong to the same country, they are the same gender, they are in the same age group. It tries to find similarity between things and group them together.

So observations are grouped by the similarity exhibited by the entities. Similarity again comes from the data, and similarity is typically based on distance between values. Say there is person A with age 15 and person B with age 16: the distance between the values is just 15 minus 16, which is 1. If there is another person whose age is 25, the distance between 15 and 25 is 10. So similarity is based on distance — in other words, how far away these values are from each other. Obviously 16 is closer to 15 than the value 25 is, so 16 is a lot more similar to 15 than 25 is. Those are distance-based values. The presence or absence of a value is a kind of yes-or-no measure.
If two people are both male, then the "is male" kind of indicator variable will be yes for both, whereas if a person is female, that value is going to be zero. So the presence or absence of a value can also be used to understand similarity.

What are the types of unsupervised learning? The first one is called clustering, where the idea is to just group based on the data. The second is called association rules mining; in association rules mining you are trying to find out how things are used together. The classic example is market basket analysis, where you figure out, in a supermarket, which items are bought together — the similarity here is in terms of usage, in terms of the buying pattern: how items are bought together. In collaborative filtering, you are again trying to find similarity between people, or similarity between items, based again on usage. You are trying to find similar customers. The classic example of collaborative filtering is amazon.com: you try to analyze and find out who the similar users are — the people who do similar things. What similar things do they do? They look at the same kind of products, they buy the same kind of products, they give the same kind of comments — so they are similar people. Those are the three types of unsupervised learning, and we are going to be exploring each of them later in the class.

In the case of supervised learning, you are trying to predict an unknown attribute, also called an outcome, based on known attributes. Suppose you have a data set that has three or four items: the age of the customer, the price of the product, and whether the customer bought or not. In the past data, you will know all three of these variables. What you are going to do is build a model that will predict whether the customer is going to buy or not, based on the age of the customer and the price of the product. In the future, you do not know whether the customer is going to buy, but you will know the customer's age and the price of the product, so you try to predict whether the customer is going to buy based on the values of the age and the price of the product. The models are built using training data: training data is past data where you know both the outcome and the predictors. You always learn from past data, and the model is then used to predict future outcomes where you only know the predictor variables. You know who your customer is and what their attributes are, but you do not know whether they are going to buy or not — yet you want to make a prediction as to whether this customer will buy, based on which you will take some business action. The whole idea of doing a prediction is to take some business action. What kind of business action might that be? I might do some marketing or a sales pitch to that customer — reach out to the customer, make a phone call, send an email — if I know the customer has a higher propensity to buy than a customer who will not care about the product.

There are two types of supervised learning. One of them is called regression: in the case of regression, you are trying to analyze and predict continuous outcome values. In the case of classification, you are trying to find classes.
In the case of regression, you are trying to predict values — like predicting the age of a person, the price of a product, or the total value of something. In the case of classification, you are trying to predict a class or a group that a record might belong to. Typically it starts with binary classification, like "will the customer buy or not buy". It can also be a classification into something like very good, good, and bad; or you can try to predict whether a bank customer should be a gold customer, a silver customer, or a platinum customer, based on the various attributes you know about the customer. Those are all supervised learning examples.

So what is the process of supervised learning? Supervised learning has a big process — please spend some time understanding how this process works. To begin the process, you have historical data, past data; in fact, you should be using a significantly large amount of data if your predictions need to be accurate. In the past data you have both predictor values and outcome values: looking at the past data, you have attributes of a customer, attributes of a product, and whether a sale was made or not. The first thing you do is split this data into a training set and a testing set. Splitting the data is typically done using some random mechanism — a random number generator, a random split. The idea of using a random split is that when you split the data into a training and a testing data set, both the training data set and the testing data set should individually retain the characteristics of the historical data. What do I mean by that? Suppose in the historical data 30% of the customers actually bought the product, so the ratio between buy versus not buy is 30 to 70. When you do the split, the training and testing data sets should each individually have the same ratio of 30 to 70, or a similar ratio. That is what is called a proper split, and this is not just for one variable: if there are, say, 10 different variables in the data set, all of them need to exhibit the same thing — the split should retain the patterns. And the only way you can retain the patterns is by using a random number generator to decide which records go into the training data set and which records go into the testing data set.

Once you split into the training data set and the testing data set, you use the training data set for the learning process. What I mean by the learning process is that you pass the training data set to a machine learning algorithm, and that algorithm builds a model. Say you pass some continuous data: it builds a model, which might be something like an equation or a decision tree. So I have built a model — how do I test the model? How do I make sure that the model is good at predicting what it is supposed to predict? I use the testing data set: I apply the model on the testing data set. Remember, the testing data set already has the outcome known, but I also use the model to predict the outcome. So now I have a predicted value and the actual value.
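As a minimal sketch of this split-train-predict flow in R — assuming a data frame named df with a target column y (placeholder names, not from the course's resource bundle):

    set.seed(42)                                   # make the random split reproducible

    # Randomly pick 70% of the rows for training; the rest form the testing set
    train_rows <- sample(nrow(df), size = floor(0.7 * nrow(df)))
    train_data <- df[train_rows, ]
    test_data  <- df[-train_rows, ]

    # Build a model on the training data (linear regression as one possible algorithm)
    model <- lm(y ~ ., data = train_data)

    # Apply the model to the testing data, where the actual outcome is already known
    predicted <- predict(model, newdata = test_data)
    actual    <- test_data$y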
Then what I can do is compare the predicted value with the actual value and find out how accurate my prediction is. That is called the testing process. I look at how good my prediction is — whether it is really accurate or not — and then I can make a decision to go back. This is a continuous, iterative process. You look at the results, then you go back and tweak your learning process. How do I tweak my learning process? I might eliminate some variables, I might add some new variables, I might try techniques like creating indicator variables or centering and scaling and see if my model performs better, I might try different machine learning algorithms, or for the same machine learning algorithm I can tweak the parameters to see if it improves my predictions. So there is an iterative process you go through until you come up with a satisfactory level of prediction. What is a satisfactory level? It depends on the use case; there is no global formula that says 80% is good or 90% is good — it depends on what kind of use case you have.

Then, once you are reasonably confident that your model is good, it becomes your final model. Once I have a final model, whenever new data comes in — data where I know the predictor variables but not the outcome or target variable — I apply that new data to the final model and use it for doing my prediction, and that prediction drives the actual business outcome. You can also try multiple models. Sometimes, rather than having one model, you might have an entire model set, try multiple models, and see which one works better. Or you might use, say, five different models built out of five different algorithms and then take a vote as to which particular outcome comes out most often across these models. There are a lot of combinations you can do, which we will see later.

Training and testing data — again, just reviewing what we just talked about: historical data contains both predictors and outcomes. You split the data into training and testing data; training data is used to build the model, and testing data is used to test the model. How do you test it? You apply the model on the testing data, predict the outcome, compare the prediction with the actual value, and that is how you measure accuracy. The best practice for training and testing is that you typically do a 70/30 split: if you have 100 records, 70 records go to the training set and 30 records to the testing set, and you have to do random selection of records in order to maintain the same kind of data spread in both data sets. We will be doing this training and testing split in our use cases, of course, so you can see how exactly it is done. This concludes the discussion on the types of learning. Thank you.

4. Analyzing Results and Errors: Hi. In this section we are going to be talking about how we compare the results of our supervised learning exercises and what kinds of errors are possible during this exercise. When you want to compare the results of your training and testing exercise, what you build is what is called a confusion matrix.
I don't know why they called it a confusion matrix, but that is what you build. How is the confusion matrix built? It plots the predictions against the actuals for the test data. You basically build a model with the training data set, then use it against the test data to actually test the model. Then you plot this confusion matrix, in which the actual values of the outcome — the target variable — are plotted as columns. In this particular case it is a boolean outcome, a true or false outcome; maybe we are trying to predict patients who have a specific disease. So the actuals are plotted as columns, true or false, and what you predicted is plotted as rows, true or false. You are comparing the actuals to the predictions, and then you fill out this table as to how many actuals you predicted correctly and how many you predicted incorrectly. This is what you call the confusion matrix.

The confusion matrix tells you the correct predictions and the incorrect predictions. What you see on the diagonal are the right predictions: you predicted true as true and false as false. The ones you see off the diagonal — the six and the nine in this example — are incorrect predictions. So when you do a testing exercise, you take your model, apply it on the test data, and then build this confusion matrix to understand how accurate your algorithm is, and it clearly tells you where your algorithm is going wrong. In this case it is only true or false, but sometimes this might be categorical data like excellent, very good, good, and so on. You look at where exactly it is going wrong: sometimes it may be predicting all the false values correctly as false, but also predicting some of the true values as false — one count might be high, another low. Those kinds of differences will happen, and you can take a deeper look at the confusion matrix to understand how your predictions are working. These predictions, of course, can be boolean or classes, and we will see both kinds of confusion matrices as part of our case studies.

What are the various prediction types? When it comes to the confusion matrix, there is standard terminology used in the field; you will have heard a lot about false positives and true positives, especially in the medical field, where these terms are heavily used. So let us understand what they are. If you look at the table on the right side, you see what a true positive is: a true positive is a correct prediction of a positive — "true" here stands for the correctness of the prediction, and "positive" is the outcome you predicted. A true positive means you correctly predicted the true. A false negative is something that is actually true but you predicted as false. Then you have a false positive, where something is actually false and you incorrectly predicted it as true. And a true negative is a correct prediction of a negative. When you say "true" in front, it means a correct prediction; when you say "false" in front, it is an incorrect prediction. These are the terms used to define each of these boxes, and these boxes play a very important role when you are discussing prediction outcomes.
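In R, a confusion matrix like the one described here can be built with the base table() function. A minimal sketch, using made-up predicted and actual TRUE/FALSE vectors rather than output from a real model:

    # Hypothetical predicted and actual values from a test data set
    predicted <- c(TRUE, TRUE,  FALSE, FALSE, TRUE, FALSE)
    actual    <- c(TRUE, FALSE, FALSE, TRUE,  TRUE, FALSE)

    # Rows are the predictions, columns are the actuals
    confusion <- table(Predicted = predicted, Actual = actual)
    print(confusion)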
True positives and true negatives, of course, are the ones you expect, but you have to focus more on false positives and false negatives to understand the accuracy of your predictions. Why false positives and false negatives are important is that sometimes it is about what is acceptable and what is not acceptable. Depending on your use case, false positives may be OK while false negatives are not; in some other field, false negatives may be OK while false positives are not. For example, in the medical field you run some tests and try to predict whether the patient has a disease or not. A false negative in the medical field is critical: somebody has a disease — the actual is true — but your prediction algorithm predicts false. That is unacceptable. You do not want false negatives, because that means you are not going to treat someone who has the disease, and that can be fatal. So false negatives are not acceptable in the medical field, but false positives are acceptable: somebody does not have the disease, the algorithm predicts that they do, you take that person in as a patient, you usually do more tests and figure out that this person does not have the disease — that is OK. In the judicial field, on the other hand, false positives are not acceptable: somebody has not committed a crime, and you are predicting that they did — you are predicting an innocent person as a criminal, and that kind of prediction is again not acceptable. So it depends on the use case as to which one is acceptable and which one is not.

Some formulas — confusion matrix metrics — that you keep hearing about, and which you will use a lot in data science parlance, are these. The first is accuracy: the accuracy of a prediction is basically the correct predictions, true positives plus true negatives, divided by the total number of samples — accuracy = (TP + TN) / (TP + TN + FP + FN). Please commit these formulas to memory, because if you go to interviews, these are the questions they typically ask. Sensitivity measures how good you are at predicting the true positives: sensitivity = TP / (TP + FN) — sensitivity is about the "actual true" column. Then comes specificity, which is about the "actual false" column: specificity = TN / (TN + FP). Then comes precision, which is about the "predicted true" row: precision = TP / (TP + FP). These are the various formulas used to describe the accuracy of your predictions, and as I said, these are things you will typically be asked about in interviews.
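A short sketch of these formulas in R, computed from hypothetical counts (the numbers below are invented for illustration):

    # Hypothetical counts taken from a confusion matrix
    tp <- 100; tn <- 50; fp <- 9; fn <- 6

    accuracy    <- (tp + tn) / (tp + tn + fp + fn)  # correct predictions over all samples
    sensitivity <- tp / (tp + fn)                   # how well actual positives are caught
    specificity <- tn / (tn + fp)                   # how well actual negatives are caught
    precision   <- tp / (tp + fp)                   # how many predicted positives are right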
Prediction errors: what kinds of errors do you get in predictions? There are two types of errors you want to worry about: one is called bias and the other is called variance. What is bias? Bias happens when the model skews itself towards certain aspects of the predictions while ignoring others. What do I mean by skewed? That is a slightly complex explanation, so let me give you an example. Suppose you are trying to predict the age of a person. When you run a test, predicting the age and finding the difference between the prediction and the actual, you see that the difference is always somewhere around minus five — always like minus four, minus three, minus four. The error is skewed towards that minus-five range; everything predicted has a difference around minus five. That is what is called bias. In the example of high bias, you see that instead of hitting the target, the error is always skewed towards one end — always around minus five, minus six, minus four.

Variance, on the other hand, refers to the stability of a model — how consistently it predicts. Take the same example of age: when the model tries to predict the age, the error might be all over the place — for one record it is minus five, for the next person it is plus five, then plus three, then minus six. It is all over the place, and that is called variance. In the chart on the right, you see a comparison of high bias versus low bias and high variance versus low variance. You really want to be in the bottom left corner, where there is low bias and low variance. With high bias, the whole thing is skewed towards one end. With high variance, you see a high spread in the predictions, whereas with low bias and low variance there is still some spread, but the spread is around the center. With high bias and high variance, the spread is also skewed — high spread and high skewing at the same time. Bias and variance are two important aspects that come up when you are comparing various machine learning algorithms and how good they are. Certain machine learning algorithms tend to have high bias, and certain ones tend to have high variance, so these are things you want to watch out for.

Now, the types of errors faced during a prediction exercise. The first is in-sample error. What is in-sample error? You build a model in supervised learning, then you use this model on the training data set itself — you apply the model on the same data you built the model from, and see how well the model can predict the data from which it was built. Ideally, given that the model is built from the training data set, it should be very accurate on the training data set itself, and that error is measured as in-sample error. If the model has high in-sample error, something really bad is going on — there are not enough signals in the data or something like that — because at the minimum, the model should predict the training data set accurately. Out-of-sample error is basically everything else: whenever the model is used to predict on a new data set — like a test data set or real-world data — the error you get in predicting the actuals is called out-of-sample error.
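To make the distinction concrete, here is a hedged R sketch on simulated data: it fits a deliberately flexible polynomial model, then measures its error on the data it was trained on (in-sample) and on held-out data (out-of-sample). The data and the degree-8 polynomial are made up purely to show the effect described next.

    set.seed(1)

    # Simulated data: a simple linear signal plus noise
    x <- runif(60)
    y <- 2 * x + rnorm(60, sd = 0.3)
    train <- data.frame(x = x[1:40],  y = y[1:40])
    test  <- data.frame(x = x[41:60], y = y[41:60])

    # A deliberately flexible model that is likely to overfit the training data
    model <- lm(y ~ poly(x, 8), data = train)

    rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

    rmse(train$y, predict(model, newdata = train))  # in-sample error (usually low)
    rmse(test$y,  predict(model, newdata = test))   # out-of-sample error (often higher)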
Overfitting is a concept that refers to a situation where there is very low in-sample error but very high out-of-sample error. What that means is that when you use the model to predict the training data set itself, it does very well, but when you try to predict on a new data set, the error is very high — the model has overfitted itself to the training data set. If you look at the data set, it has signals — good signals — but it also has noise: it shows some false patterns and false trends too. The model tries to adapt itself too much to the training data set, so it models both the signals and the noise. But what happens when you go to a new data set? The new data set is going to retain the same signal characteristics, but it may not retain the same noise characteristics. So when the model tries to predict on the new data, the error you get is pretty high. That is what overfitting is: very low in-sample error and very high out-of-sample error. This is something that happens when the data set you use is very small — you may not have enough data to characterize everything that is happening in the real world — or, second, when the training data set is not a reflection of the situation you are trying to predict. For example, you take data about your phone customers and then try to predict how your web sales will be: if the patterns of phone sales and web sales are different, obviously the model built on your phone data set is not going to predict your web data very accurately. Those are some of the reasons why you might see overfitting when you try to do machine learning predictions. Thank you.

5. Linear Regression: Hi. In this lecture we are going to be discussing the first machine learning algorithm, called linear regression. Regression analysis is a very popular, very old, mature, and widely used method when it comes to analyzing the relationship between two variables — or, in fact, multiple variables. In regression analysis, the goal is to build an equation where the outcome is considered the y and all the predictors are considered the x, and you are trying to predict y from x using that equation. So it tries to estimate the value of the dependent variable from the independent variables using a relationship equation. That relationship equation is the model in linear regression: when you do modeling in linear regression, you are trying to build nothing but an equation that explains the relationship between the dependent variable and the independent variables — dependent here being the outcome variable and independent being the predictor variables. It is typically used when both the dependent and independent variables are continuous: everything is numbers, and both of them are continuous numbers.
That is where regression analysis comes into play: you are trying to predict a number, as opposed to doing classification. In regression analysis, you always need to look at something called the goodness of fit — how well the regression equation explains the relationship between the predictor and the target variables. We will see how this goodness of fit is determined and how you look at it to verify how good a model the regression analysis has produced.

Let's start by understanding what a linear equation is. You might have already seen this kind of linear equation in your mathematics classes, either in school or in college. A linear equation explains the relationship between two variables with an equation. Consider that x is an independent variable and y is the dependent variable. You can explain the relationship between x and y using an equation of the form y = alpha * x + beta. y is the dependent variable — the outcome or target variable — and x is the independent variable, the predictor. By determining the right values for alpha and beta, you can predict y using the values of x. Alpha is called the slope, because alpha equals the change in y divided by the change in x — that is the formula typically used; it is a number, and if you look at the plot on the right side, you see a line whose slope is given by the change in y over the change in x. The intercept of the line is beta: beta is the value of y when x equals zero. The moment you put x = 0 into this equation, the alpha * x term becomes zero, so y equals beta. So beta is the value where the line intercepts the y-axis. When you do linear regression model building, you are pretty much trying to find the values of alpha and beta, because you already know x; once you know the values of alpha and beta, you can determine y. When you build a model, the model-building process looks into the data and tries to come up with the values of alpha and beta.

Fitting a line: you have now seen what a linear equation is, so how do we use that concept to do model building in machine learning? It is called the concept of fitting a line. What is fitting a line? Given a scatter plot of y versus x, fit a straight line through the points so that the sum of squares of the vertical distances between the points and the line is minimized. What exactly does that mean? Suppose you have two variables x and y — say x is age and y is weight — and you plot these points in a graph; the points are going to be scattered all over the place. The goal of fitting a line is to draw a straight line through the points such that — such that what? — you find the vertical distance between each point and the line. Now, this distance can be positive or negative, so you square each of those measures.
Then you sum up those squared measures — that is called the sum of squares of the vertical distances: find the vertical distances, square them, and sum them up. Now, through this particular set of points you can draw any line you like — a line like this, a line like that, any way you like. But the goal is to draw a line such that this sum of squares of vertical distances is the lowest possible value. If you draw, say, five lines through these points and find the sum of squares of vertical distances for each of them, the goal is to pick the line for which this sum of squares of vertical distances is the lowest — the minimum value. That is how you draw a line through the points. Obviously, when the sum of squares of vertical distances is minimized, the line will travel through the set of points almost through the middle; you will get a line where the distance between the points and the line is minimized. So once again: draw a line through the set of points, find the vertical distance between each point and the line, and make sure you draw the line in such a way that this total distance is minimized.

The best line is the one with the least residuals. The residual in regression is nothing but this sum of squares of the vertical distances — it is called the residual because it is the part that is still not mapped, the difference that remains: the line is actually your model, and the points are the actual values, and the differences between the model and the actual values are the residuals. So the best line is the line where the residuals are the least. Remember that a line can be fitted to any set of points — it is not necessary that the points almost fall on a line; the points can be all over the place and you can still draw a line. The only thing is that if the points are all over the place, the line is not a good predictor for the points. That is something we will see in the next slide. What we find here is that the equation of the line you draw through the set of points — the equation y = alpha * x + beta — becomes the predictor of y. That equation becomes the model by which you can use the values of x, alpha, and beta to determine y. When you build a model, you are basically finding out the values of alpha and beta. Now you take this model, and when new data comes in, the new data gives you x; you apply x to the model — you already have alpha and beta available in the model — and once you have the equation, you can pretty much find the value of y.

Goodness of fit: as we have been saying, how do I find out whether the line is a good predictor of the points? You measure something called the goodness of fit, and that measure is called R-squared. R-squared is a measure based on the sum of squares of the distances we talked about — its formula is a bit more than that, but it uses that residual value to find how good a fit the line is. R-squared is a value that goes from 0 to 1. It has its own formula, which we are not going to go into here.
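As a minimal sketch of this in R, using simulated age and weight values invented for illustration: lm() finds the intercept (beta) and slope (alpha) that minimize the sum of squared residuals, and summary() reports the R-squared:

    set.seed(7)

    # Simulated example: weight loosely related to age
    age    <- 20:69
    weight <- 50 + 0.4 * age + rnorm(50, sd = 5)

    fit <- lm(weight ~ age)       # finds the intercept (beta) and slope (alpha)

    coef(fit)                     # the fitted intercept and slope
    sum(residuals(fit)^2)         # the minimized sum of squared vertical distances
    summary(fit)$r.squared        # goodness of fit, between 0 and 1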
Typically, when a machine learning algorithm gives you the model, it also gives you the R-squared for that model. The higher the R-squared value, the better the fit. At the bottom here you have three sets of data, and you can see what R-squared looks like for each of them. In the first one the points fall almost exactly on the line; its fitted equation is shown alongside, and its R-squared is 0.95 — very high, very close to one. In the second plot the points sit a little away from the line; they still roughly follow it, but they waver, and the R-squared is about 0.74. In the third one the points are all over the place, and the R-squared is about 0.24. So obviously, when the points fall almost on a line, the R-squared is going to be higher, and that line is going to be a very good model for predicting Y from X. Why? Because if the training points lie close to the line, any new pair of X and Y values that comes in will also fall almost on the same line — provided the training data and the production data have the same characteristics — so when you apply the equation it will predict with much higher accuracy. In the third graph, where the points are all over the place, a new point may land here, there, or anywhere, so a predicted Y is not going to be that accurate; in that case a linear model simply will not fit. In fact, the reason it is called a linear model is that you should be able to fit the points with a straight line, and that is only possible if the points already fall approximately along one. Remember that you can always fit a line — the algorithm will happily do so — but you have to use R-squared to find out how good a fit that model really is. Higher correlation usually means a better fit: typically, if the correlation coefficient between the two variables is high, the R-squared will also be high. So the moment you are doing exploratory data analysis and you see a high correlation coefficient, you can be fairly sure you can predict one of those variables from the other using linear regression. Then we move on to multiple regression. What is multiple regression? It is regression where there is more than one independent variable — multiple predictors — used to predict a single dependent variable. This is by far the more common case, because you will hardly ever have a situation with only one predictor; typically you have a number of predictor variables and one target variable. In that case the equation extends itself: Y = beta + alpha1 * X1 + alpha2 * X2 + ..., where beta, the intercept, is still there.
What you see here is that alpha1, alpha2, alpha3 and so on become the coefficients of the respective predictor variables — every predictor variable gets its own coefficient. So when you do linear regression in this case, you are determining the value of beta, the intercept, plus the coefficients for each of the predictor variables. This is the most common use case. The only thing is that if you wanted to draw this on a plot, it would have to be a multidimensional plot, not a two-dimensional one, and it is very difficult to visualize or even draw such a plot once you go past three dimensions. The goal is still the same: draw a straight line through that multidimensional cloud of points such that the distances are minimal and the R-squared is high. The same process of prediction holds good as with a single independent variable: you use R-squared the same way, and you find the intercept and the coefficients the same way. With multiple predictors, different predictors have different levels of impact on the dependent variable. When you do correlation analysis you will see that the different independent variables — the different predictor variables — have different correlation coefficients with the target. The higher the correlation coefficient, the higher the impact of that independent variable on the dependent variable, and that level of impact is usually reflected in the values of the coefficients alpha1, alpha2, alpha3: the magnitude of each coefficient shows how much that particular independent variable impacts the dependent variable. Suppose X1 has a high impact on Y; then alpha1 would be fairly significant. If X2, let's say, does not have much impact on Y, alpha2 will be something very close to zero, so when it is multiplied by X2 the resulting value is small and does not change the value of Y significantly; whereas if X1 has a significant impact on Y, alpha1 will be significantly bigger compared to alpha2 and alpha3. When you do the exercises and look at real data, and see how these coefficients come out, you will get a better picture of what I am talking about. Using linear regression for machine learning: this is a very popular supervised learning technique for predicting continuous data. The predictors and outcomes are provided as input in the training data set — you build a training data set, you give the predictors and outcomes to the algorithm, and you tell it which variable is the target and which are the predictors — and when the data is analyzed it comes up with a linear equation. That equation is nothing but the model. The model that is output consists of the values of the coefficients for the predictor variables, the value of the intercept, and the value of R-squared; those are the typical outputs when you use an algorithm to do linear regression. The coefficients and the intercept form the model: you take these values and apply them in the linear equation when new data comes in.
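As a hedged illustration of this multiple-predictor form, here is a tiny R sketch with invented variables — x1, x2 and y are placeholders, not from the course data — showing that lm returns one coefficient per predictor plus the intercept:

# Invented example: two predictors, one continuous target
set.seed(42)
x1 <- runif(100, 0, 10)
x2 <- runif(100, 0, 5)
y  <- 3 + 2.0 * x1 - 1.5 * x2 + rnorm(100, sd = 0.5)

fit <- lm(y ~ x1 + x2)    # beta is the intercept, alpha1 and alpha2 are the coefficients
coef(fit)                 # should recover roughly 3, 2.0 and -1.5
summary(fit)$r.squared    # how much of y the two predictors explain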
The R-squared gives you an indication of how good a model you have built. This model is then used for prediction. Linear regression is typically fast at building models; it is a very old and very popular method that has long been used for model building. So let us now look at the summary of linear regression. What are the advantages? It is pretty fast in terms of building models, and it has very low cost in terms of memory and CPU usage. It is excellent for linear relationships: when the relationship between the predictors and the target is linear — that is, the points all fall close to a straight line — it is excellent at predicting. And it is reasonably accurate for continuous variables; there is no other basic algorithm that handles them as well as linear regression. What are the shortcomings? It can only be used on numeric, continuous variables; it cannot be used on textual data, and it does not work well for class-type data like male/female — the data has to be continuous. It cannot model nonlinear or fuzzy relationships; it is limited in what it can model, so if the relationship is not linear, plain linear regression cannot handle it. There are other, more advanced regression algorithms for modeling nonlinear or quadratic relationships and the like. And it is very sensitive to outliers. For example, in the plot we saw, almost all the points fell on a straight line; but suppose just one point sits far away. The line will tilt itself to try to accommodate that one outlier point, and that messes up the entire equation — it changes all the coefficients just because one point sits far away and the whole line adjusts to accommodate it. So it is better to eliminate outliers before you start building a linear regression model. As for its uses: it is the oldest predictive modeling technique, used in a wide variety of applications. Wherever there is continuous data to predict, it has been used for a long time. So it is a very popular algorithm for modeling continuous variables, especially when the relationship between them is linear. Thank you. 6. R Use Case : Linear Regression: Hi. Welcome to this use case for machine learning; in this one we are going to be talking about linear regression. Typically, all the examples you see as use cases will follow the same path: they go through sections explaining the problem you are trying to solve, the techniques used in the particular case — which of the techniques you learned in the data science course are being applied — then data engineering, analysis, modeling and prediction, testing, and conclusions. You have a PDF file of the same material as part of the resource bundle, so you can always take a look at it, copy the code over to R, and play around with it. Let's see what we are going to be doing in this particular example.
The problem statement here is that you have an input data set — a CSV file — containing data about various car models. Based on this data we are going to build a linear model, a linear equation that can be used to predict the miles per gallon for a model. If new data comes in, you can take the data about that car and use it to predict its MPG. The techniques you will be using for analysis are linear regression — multivariate linear regression — plus data imputation, and we are also going to be doing variable reduction. So let's go and see what we do in this example. The first thing, of course, is that you always start by setting your working directory. Then you read the file auto-mpg.csv into a data frame called auto_data; the CSV is again available as part of your resource bundle, so you can take a look at it. The first thing you want to do immediately after loading the data is to inspect it, so I print the structure of auto_data and take a look at what is there. That shows you the various columns in this data frame, the type of data in each, and some example values. It has miles per gallon, which is the value you are going to predict — you will build a model on this data and come up with an equation — and it is a number, so that looks fine. Cylinders is an integer; fine. Then the displacement of the car, and the horsepower of the car — and horsepower is coming in as a factor. That needs a closer look, because horsepower should be a numeric value, and a factor only shows up when there are non-numeric, character values in the column. Look at the example data and you see a question mark: a question mark is one of the values, which means there is missing data here, and we need to do something about it. Then comes weight, which looks OK, acceleration, which looks OK, the model year, which starts from 1970, and the name of the model. That is one way of looking at the data. Next we look at the same data in a different way, with summary(auto_data). The summary gives you, for every numeric column, the quartiles, and looking at the quartiles tells you whether the value ranges are fine. MPG is anywhere between 9 and 46; given what I know about cars — they typically do get somewhere between 9 and 46 miles per gallon — that looks OK. Look at something like cylinders: it ranges from three to eight, which makes sense; that is how cars are built. If I saw a value like minus 50, or a value like 170, I would be worried, because those are not valid values for the number of cylinders. You go through the rest of the columns the same way: displacement looks fine, horsepower we already saw comes in as a factor with a question mark, so we have to do something about it, weight seems OK, acceleration anywhere between 8 and 24 looks fine, and the model year between 1970 and 1982 looks fine.
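A quick sketch of this loading and inspection step, assuming the file and data frame names mentioned in the lecture (auto-mpg.csv and auto_data); the path is a placeholder and the column names are assumed:

# Load and inspect the data (names follow the lecture; adjust to your own setup)
setwd("path/to/your/working/directory")                       # placeholder path
auto_data <- read.csv("auto-mpg.csv", stringsAsFactors = TRUE) # TRUE was the default in older R

str(auto_data)       # column types; horsepower shows up as a factor because of "?" values
summary(auto_data)   # quartiles and ranges, a quick sanity check on the values
head(auto_data)      # the first six records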
This is how you look at the data with the summary and make sure it is good — that there is no junk sitting in there. You can also do head(auto_data) and look at the actual records — the top six — and they look OK. So we came up with only one issue: the question mark values in horsepower, and we are going to do something about them. Wherever the horsepower value is missing, we will replace it with the mean horsepower computed from all the other records. How do we do that? First we convert the column into a numeric column — it is a factor at this point — using the as.numeric conversion and storing the result back into the same column. Once you do that, the question marks get converted to NA, "not available". Then I compute the mean of the column with mean(auto_data$horsepower, na.rm = TRUE); the na.rm = TRUE tells it to ignore the NA values and compute the mean from the remaining ones. Then I assign that mean back: I access only the horsepower column of the data frame, filter it down to the rows where is.na(horsepower) is true — only those where horsepower itself is NA — and set those entries to the mean. So it fills exactly those records where horsepower is NA with the mean value. Then run the summary again to make sure everything looks OK. Look at horsepower now: it no longer shows factor levels; it shows the quartiles, and the range looks reasonable. These are really old cars, 1970 to 1982, so obviously you are not going to see horsepower values like 300 or 400; it looks fine. Once you get this cleansing out of the way, you start doing some exploratory analysis — some plots to see how things are related to each other. The first plot I am going to do looks at MPG by the number of cylinders: I divide the data by the number of cylinders a car has, and for each group I draw a box plot of MPG. This is the command used — a ggplot call where I give it factor(cylinders) and mpg, draw a box plot, and color by the factor of cylinders — and this is what I get. The hypothesis is that the more cylinders a car has, the lower its miles per gallon will be, and you can see that here: for four cylinders the MPG sits anywhere between roughly 25 and 35, and as the number of cylinders increases the typical MPG keeps coming down — you can see the ranges moving down as the number of cylinders increases. This tells you the data seems to be following a sensible pattern. You also see that for the six- and eight-cylinder cars there are a bunch of outliers sitting up at the top — a six- or eight-cylinder car giving something like 35 MPG, and these are 1970-to-1982 models, so there may be something special about those.
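Here is a compact sketch of the imputation and the box plot just described. It is not necessarily the exact course code: as.numeric(as.character(...)) is the safe way to convert a factor to numbers, and the column names are assumed:

# Impute missing horsepower values with the column mean
auto_data$horsepower <- as.numeric(as.character(auto_data$horsepower))  # "?" becomes NA
hp_mean <- mean(auto_data$horsepower, na.rm = TRUE)                     # mean of the known values
auto_data$horsepower[is.na(auto_data$horsepower)] <- hp_mean            # fill the gaps
summary(auto_data$horsepower)                                           # now numeric, no NAs

# Box plot of mpg by number of cylinders, colored by cylinder count
library(ggplot2)
ggplot(auto_data, aes(x = factor(cylinders), y = mpg, fill = factor(cylinders))) +
  geom_boxplot()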
You can go and query the data frame to see exactly which cars match that condition — query with some conditions and see which records are producing this kind of pattern. I recommend you explore more on your own: play with the data and make more plots like this, so that you get a really good picture of the data that is there. Once you have done that, you move on to correlations: correlations between the predictor variables and the target variable — the target here being MPG — and also the correlations between the predictor variables themselves, because ideally the predictors should not have high correlation with one another; ideally they would have zero correlation. Let's take a look. We call pairs.panels(auto_data) and see what happens. We have seen this kind of output before: the target variable MPG is here, the rest of the variables are lined up after it, and the cells of the matrix show the correlation coefficients. Look at miles per gallon and its correlation with the other variables, one by one: with cylinders it is about -0.78, a good correlation; with displacement about -0.80, again good; medium with horsepower; high again with weight; and medium with acceleration, model year and name. One thing you notice is that for weight, displacement and cylinders, the correlation coefficients with MPG are all pretty high. Then look at the correlations between the predictor variables themselves, and you notice the high values — around 0.75, 0.9 and 0.93 — between cylinders, displacement and weight. There seems to be high correlation among those three variables. If you apply some logic, some domain knowledge, you will see why: the more cylinders there are, the greater the displacement, and the greater the weight. What that means is that, among these three variables, any one of them is effectively a proxy for the other two. So what we can do here is variable reduction: given that these predictor variables are highly correlated with each other, we can eliminate two of them and keep only one, which makes all the processing downstream a lot easier and a lot faster. That is what we do next: we set auto_data$displacement <- NULL and auto_data$cylinders <- NULL, which deletes those columns from the data. Now when you do a summary of the data you see that those two columns are gone; you are left with six variables — one target and the rest predictors. Once you get here, the next step is actually building the linear model, and for that R has, in its base packages, a function called lm, for linear model. You call this function lm, and the first thing you tell it, through the formula, is what you want to predict — what is going to be your target variable and what are the predictor variables. The tilde reads as "predicted by": mpg ~ . says predict miles per gallon by this set of columns, where the dot means everything else, so predict mpg by everything else. You could instead name a single column — predict mpg by horsepower — or say predict mpg by horsepower plus weight, something like that.
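A compact sketch of the correlation check and variable reduction just described, assuming the psych package for pairs.panels; the exact calls may differ slightly from the course script:

# Correlation matrix with scatter plots (pairs.panels comes from the psych package)
library(psych)
pairs.panels(auto_data)

# Or a plain numeric correlation matrix for the numeric columns
round(cor(auto_data[sapply(auto_data, is.numeric)]), 2)

# Drop two of the three highly correlated predictors, keeping weight
auto_data$displacement <- NULL
auto_data$cylinders    <- NULL

# Formula variants for the model that follows:
#   mpg ~ horsepower              one predictor
#   mpg ~ horsepower + weight     two predictors
#   mpg ~ .                       all remaining columns as predictors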
You can include two variables, or you can put in all the variables. Finally, you say which data to use: I am going to use auto_data[-6], the data frame minus the sixth column, which means I am leaving out the name column. Name is a text column, and the linear model will not take a textual column — it needs all the variables to be numbers and will give an error if you pass it text. So I take out that text column, pass in the remainder of the data, and ask it to predict miles per gallon from everything else, and out comes the linear model. The linear model is stored in a variable, and that variable can then be used for further analysis and for predictions. In fact, you can save this variable to a file and later read it back into memory: build a model, save it, then use it for further analysis. Let's look at what this model is actually saying, with summary of the model. The first thing it shows is the call — the same command you gave, just repeated back. Then it tells you about the residuals in the data. Residuals, as we discussed, are the vertical distances between the fitted line and the actual points; what you see here are the quartiles of that list of distances, so it shows you the range over which those distances run. Remember that this is now a multidimensional fit — there are five variables involved — so it is very hard to actually visualize it as a single plot. Then it tells you the alphas and the beta. We talked about the equation Y = alpha1 * X1 + alpha2 * X2 + ... + beta; the intercept is the value of beta, shown first, and then come the coefficients: this one is the coefficient for horsepower, this one for weight, this one for acceleration, and this one for the model year. So if you get new data where you know everything except MPG — somebody gives you the horsepower, the weight, the acceleration and the model year and asks for the miles per gallon — you take those values and apply this formula: horsepower times its coefficient, plus weight times its coefficient, plus the others, and finally plus the intercept — alpha1 X1 plus alpha2 X2 plus alpha3 X3 and so on, plus beta. That is the linear equation that is output as your linear model. Once you have the linear model, the next thing to look at is the R-squared, because that is what tells you how accurate your model is going to be, and the R-squared value here shows 0.809, which is really pretty high — it is a good model.
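A minimal sketch of this model-building step, continuing the auto_data object from above; the car-name column is assumed to be the sixth one after the two drops, as in the lecture:

# Build the linear model, leaving out the non-numeric car-name column
model <- lm(mpg ~ ., data = auto_data[-6])

summary(model)              # call, residual quartiles, coefficients, R-squared
coef(model)                 # intercept (beta) plus one coefficient (alpha) per predictor
summary(model)$r.squared    # around 0.8 in the lecture's run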
It should accurately predict the data that you have. OK, the model is now built; next we need to predict on some new data. For this example I am just going to predict on the same data the model was built with. I could have split the data into training and testing sets, but I did not do that in this specific example: I take the same auto_data and predict it using this model, to see what my in-sample error is. predict is the command — predict with the lm model and this data — and it returns a vector called predicted: for every row in auto_data it predicts a value of MPG. If the data had, say, 100 rows, 100 MPG values come back in this predicted variable, and a summary of predicted gives you the range of the values it came up with. What you can then do is plot the predicted values against the actual values, because you know the actual values for this data. Ideally, if the predictions and the actuals are really close, this plot should look like a straight line — and that is what it looks like here, almost a straight line, which means the predictions are really good. That is one way to check. The second way is to compute the correlation between the predicted and actual values: if they are close to each other, the correlation coefficient should be close to one, and what you see is that it is about 0.89 — close to 0.9, a really high correlation coefficient. So the in-sample error is pretty small. You would have to try it on new data to see how it really performs, but in general the R-squared is high and the correlation between predicted and actual is high — in other words, the in-sample error is very low — so it looks like a really good model. So that is how you do linear regression in R. Again, this file is available to you as a PDF as part of the resource bundle, so do go and explore it more. Thank you. 7. Decision Trees: Hi. In this lecture we are going to be looking at decision trees — a very popular, very simple, and very easy to explain machine learning technique. It is popular precisely because it is simple, easy to understand and easy to explain, which makes my job pretty easy in this course. What happens in a decision tree is that, again, you have predictor variables and a target variable. You use the predictor variables to build a decision tree: a tree in which you keep checking the values of the predictor variables and, depending on those values, you keep making decisions, progressively, until you reach a leaf node where you actually predict or classify something. When you build the tree, it typically starts at the root node, and then branches keep coming off it; at each branch you ask a question — do some logical comparison — make a decision based on it, and keep moving along.
And then finally there are the leaf nodes, which actually give you the decisions. Decision trees are popular for classification — they are mostly used for classification purposes; they can be used on continuous data, but classification is the main use. Again, a training data set is used to build the decision tree, and the tree itself is the model: in the case of decision trees, the tree is the model that predicts the target, and you then use this model to predict for new data. Here is an example of a decision tree. On the left side you have data with three variables: age and BMI, which are the predictors, and a variable called "is diabetic" — whether the person is diabetic or not — which is the target. From this data we build the decision tree on the right. In this case we start with "is age greater than 41?" — that is the first question we ask. If the answer is yes, we go on to the next decision: "is BMI greater than 24?" If that is yes, the value of "is diabetic" is Y; if not, it is N. Similarly, you build out the left side of the tree, which also ends in N and Y leaves. The tree then becomes the model. Suppose you get new data: somebody gives you an age and BMI combination and asks, is this patient diabetic? Say the age of the person is 32 and the BMI is 40. You walk through the tree to make the prediction: is the age greater than 41? No, so take the left side. Is the BMI greater than 28? Yes, so follow that branch — and the prediction is that this person would be diabetic. It is pretty easy to walk through the tree and come up with the answer. The tree itself is the model, and you use the model to predict on any new data. The challenge in building the tree is: in what sequence do you use the variables? How do you decide that age should be at the root node rather than BMI? I could build another tree from the same data but start with a question on BMI — "is BMI greater than something?" — and only then make a decision on age. By using different variables in different sequences you can build different trees, and different trees can have different numbers of levels; it depends on the complexity of the data you are dealing with. Here the data is pretty simple, so the tree has just two levels. Trying to build the tree manually is not easy — there is a lot of complexity involved. Fortunately, machine learning algorithms handle this complexity: they figure out internally, within their library, which variable to use first and which to use second, based on the selectivity of the variables, and they come out with an optimized decision tree. So you do not really have to worry about which variable goes where; you just pass the algorithm the predictor variables — age and BMI — and the target, and it builds the decision tree pretty quickly. And of course, when you have to predict something, you just give it the predictor values, and the prediction step walks through the tree with the supplied values and comes up with the answer.
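As a quick, hedged sketch of this idea in R — using the rpart package rather than the algorithm used later in the course, and a tiny invented data set modeled loosely on the age/BMI table above (the values, thresholds and resulting tree are illustrative only):

library(rpart)

# Tiny invented data set in the spirit of the age / BMI / diabetic example
patients <- data.frame(
  age      = c(25, 35, 45, 52, 60, 30, 48, 55),
  bmi      = c(22, 36, 26, 31, 20, 27, 23, 33),
  diabetic = factor(c("N", "Y", "N", "Y", "N", "N", "N", "Y"))
)

# Relax the defaults so a tree can be grown on such a small sample
tree <- rpart(diabetic ~ age + bmi, data = patients, method = "class",
              control = rpart.control(minsplit = 2, cp = 0))
print(tree)

# Predict for a new person, e.g. age 32 and BMI 40
predict(tree, data.frame(age = 32, bmi = 40), type = "class")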
To reiterate what we saw: the depth of the tree is highly influenced by the sequence in which the predictors are chosen for the decisions. Sometimes the trees end up really large, sometimes they are pretty small; it also depends on the number of predictor variables you have. Choosing the predictors with high selectivity first typically gives you a smaller tree. Again, the algorithms figure this out for you, so unless you are really eager to learn the theory behind these algorithms, you do not have to bother about it — machine learning algorithms automatically make the decisions about sequence and preference. So this is a pretty simple and straightforward machine learning algorithm: easy to build, easy to use and easy to explain. What are the advantages of decision trees? First, they are easy to interpret and explain. Why is that a big deal? Take an example: you are a bank using a machine learning algorithm to approve or reject loan applications — it looks at the different attributes of the person applying and then approves or rejects the loan. Now the person asks, "why was my application rejected?" With a decision tree you can easily look at the model and tell them: this is the reason your application was rejected, because you can walk through the tree and show at which points decisions were made and based on which of their attributes — say the age, the income, the past credit history — and exactly how the decision came about. That is not possible with every other machine learning algorithm; possibly with linear regression, but not with something like neural networks or support vector machines, where it is not easy to explain why the algorithm behaved the way it did. So this is pretty important. Second, decision trees work very well with missing data: if a value is missing — an NA — that is OK; they can handle NAs and still walk through the tree. Third, they are sensitive to local variations. What do I mean by local variations? Different ranges of a variable can show different behavior. For example, suppose you are predicting something based on age, and the pattern for ages 21 to 40 is different from the pattern for ages 41 to 60. A decision tree adapts itself to this: it will build a branch for age less than 40 and handle that behavior separately, and another branch for age greater than 40 and look at that pattern separately. Something like linear regression, on the other hand, is not local: you have to draw one straight line through all the points, so local variations are not adjusted for. When you draw that line and try to predict, it will either predict the 21-to-40 range accurately or the 41-to-60 range accurately, but not both, if the two have different kinds of patterns. A decision tree adapts to these local patterns quite quickly. And it is, of course, fast.
It is pretty fast at building the decision tree, so model building is quick. Why does fast matter? If you have to build models in real time, for whatever reason, then that speed is one of its advantages. The shortcomings of decision trees: they have fairly limited accuracy — accuracy is not that great with decision trees; bias builds up very quickly — bias toward a variable or a particular value, and we already saw what bias is in an earlier section; and they are not good with a large number of predictors. If you have 40 or 50 predictor variables, a decision tree will not function very well, because it becomes difficult to work out which variable to use first, which to use second, and so on. Typical uses: decision trees are used in things like credit approvals, and in situations where there is a legal need to explain decisions. Suppose you reject a person's loan application and that person files a lawsuit claiming the rejection was biased toward something — then you can use the decision tree to explain why that particular application was rejected, so it has a real advantage in those legal situations. They are also used for preliminary categorization. Because a decision tree is sensitive to local variations, you can use it first to separate your data into two or three subsets — a kind of preliminary decision making or preliminary categorization. You split the data using a decision tree, and then on each split you can apply some other machine learning algorithm: split one can use algorithm A, split two can use algorithm B, so you can mix and match algorithms as you want, and decision trees usually sit somewhere up front in that chain — you first split the data with a decision tree, and for each split you apply different algorithms and come up with different predictions. So those are the advantages, shortcomings and uses of decision trees. Thank you. 8. R Use Case : Decision Trees: Hi. In this lecture we are going to be looking at a use case in R for decision trees, and we are going to be predicting flower types. The input data set here is the world-famous iris data set, which contains 150 samples of flowers of three types — setosa, versicolor and virginica — and for each sample you know the petal length, petal width, sepal length and sepal width. So you have four attributes, four predictors, and you are trying to predict the type of the flower based on those four predictors. In this example we are going to be using decision trees; decision trees have a number of algorithm implementations, and in this case we are going to be using the C5.0 algorithm. We will see how to do the training and testing split, how to use the training data set to build the model and the testing data set to test it, and we will also look at the confusion matrix and how to use it. The data for this is the standard iris data set that comes as part of the data bundled with R, so we just load it into a data frame called iris_data and then start inspecting it with str(iris_data).
The four measurement columns look pretty similar in their types and value ranges, and Species is a factor with three levels: setosa, versicolor and virginica. If you look at summary(iris_data), you see the ranges of sepal length, sepal width, petal length and petal width, and finally the species — there is an equal split between setosa, versicolor and virginica in this data set. Then do a head() on the data and it shows you the first few rows; it all looks pretty straightforward. Everything looks OK: there do not appear to be any outliers, there is no missing data, and the data set seems really clean and of high quality — there seems to be nothing to fix. Once that is done, we start doing some exploratory data analysis, because we said we have four different predictor variables to look at. The first thing I am going to do is simply plot. One question you might have is which plots to do — you can do anything you want; you have your own assumptions, and you validate them: "I think this should increase when that increases", and so on. You make your assumptions and start plotting things out. In this case, the first plot is petal length against petal width, with the points colored by the type of species. With petal length and petal width plotted against each other and colored by species, you immediately notice how the separation happens between the three classes: petal width and petal length really do separate the classes out. That means that if you just know the petal length — say, for a new flower, the petal length is around two — it cannot be anything other than setosa, because petal length really differentiates the three types of flowers, and the same seems to happen with petal width. Setosa is especially distinct, and virginica versus versicolor is also reasonably well separated; there is a little overlap between those two, but overall petal length and petal width look good. Let us try the same thing with sepal length against sepal width, and here you see a bit of a problem: setosa still separates itself into its own cluster, but when you plot sepal length against sepal width, versicolor and virginica are all mixed up. So sepal length and sepal width do not seem to be good indicators for separating these three types of flowers. To explore more, let us do box plots — a box plot of every variable we have. This is the brute-force method: take every predictor variable and do a box plot of it by species, so for every predictor you look at how its range varies with the type of species. We do that for petal length, petal width, sepal length and sepal width, ending up with four different plots. What you see for petal length is that the ranges for setosa, versicolor and virginica are really distinctive — there is not even any overlap: this one sits in one range, that one in another, and the third in yet another. What that tells you is that they are clearly distinct.
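A short sketch of the exploratory plots just described, assuming iris_data is a copy of R's built-in iris data frame and that ggplot2 is installed:

# Exploratory plots (iris_data assumed to be a copy of the built-in iris data)
library(ggplot2)
iris_data <- iris

# Scatter plots: the petal dimensions separate the species well, the sepal dimensions much less so
ggplot(iris_data, aes(Petal.Length, Petal.Width, color = Species)) + geom_point()
ggplot(iris_data, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point()

# Box plot of one predictor by species; repeat for each of the four predictors
ggplot(iris_data, aes(Species, Petal.Length, fill = Species)) + geom_boxplot()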
In other words, petal length seems to be a very good predictor: just by knowing the petal length of a flower, you can immediately say "I think this is setosa" or "I think this is versicolor". The same applies to petal width — the species differentiate pretty quickly from each other there too. Coming down to sepal length, the plots look similar except that the boxes are a little thicker and there is a lot more overlap between the ranges — it is not as clean as before. And sepal width seems to be poor: there is overlap all over the place, so just by knowing the sepal width I do not think I can predict much about the flower. Next, let us go on to correlations; correlations can reconfirm what we just saw in the exploratory analysis. We call the same pairs.panels on these four variables plus the species, and here are the correlations with the species. Petal length and petal width have correlations of about 0.95 and 0.96 with the species — reiterating what we just saw, that these values really separate the three species types, which means there is high correlation between the type of the species and these values. There is also an excellent correlation, around 0.96, between petal length and petal width themselves; that is kind of an interesting one — you might choose to reduce one of them or you might not, and in this case we are not going to do any variable reduction, though we could if we wanted to. Between species and sepal length the correlation is around 0.7-something — a medium level; sepal length does differentiate somewhat, but not at the level of petal length and petal width. And finally sepal width is about minus 0.4; we saw it was all over the place, so it does not have a high correlation. So what you see in the correlations matches the exploratory analysis. This one number — the correlation coefficient — can immediately tell you whether a predictor is a good predictor or not; it is pretty simple and straightforward to tell just by looking at it, which is why the correlation coefficient is so widely used. In fact, just by looking at the correlation coefficients you can already guess that, when the decision tree algorithm has to decide what the top node of the tree will be — which predictor variable it will use in the top decision box — it is going to be either petal length or petal width, because those are the ones giving really high predictive power. Moving on to modeling and prediction: we are going to be using an R package for splitting the data into training and testing sets. The package is a library called caret. Caret gives you a large number of machine-learning-related functions, and one of the things it does is give you the ability to take a data set and split it randomly into a training and a testing data set. It can do this in a stratified way: suppose you are trying to predict a class variable — in this case the species of the flower — and that class variable has, say, four different classes, and those classes occur in some specific ratio to one another.
Species here has three classes — setosa, versicolor and virginica — and they occur in an almost equal ratio, one to one to one, in the original data set. When caret splits the data, it will make sure that the training and the testing data sets each individually keep that same ratio: it splits in such a way that both sets continue to have the same class ratio for the target variable. The way you use it is to call the function createDataPartition and tell it which target to partition on — it inspects that target and then splits in the proportion you ask for. You see p = 0.7, which means 70% of the data goes to training; I am doing a 70/30 split here. The function returns a vector containing the row IDs — the row numbers — of the rows that should go into the training set. If you inspect what is in in_train, you will see row numbers like 1, 3, 5, 6: it is telling you which rows should go into the training data set, and whichever rows are missing from it should go into the testing data set. Then you use that vector to do the split: I create a new data set called training by subsetting iris_data to only those rows that are in in_train, and I create testing the same way except with a minus sign — every row that is not in the in_train vector goes into testing. I split at 0.7, so 70% of the 150 rows is 105. Do a dim() on both training and testing and you see that training has 105 rows and testing has the remaining 45. Furthermore, let us see how the species — my target variable, the one I split on — is spread between the training and the testing data sets. You will see 35, 35, 35 — again a 1:1:1 ratio — for the training data set, and 15, 15, 15 — again 1:1:1 — for the testing data set. That is the magic createDataPartition does for you: it picks rows randomly while ensuring that the class ratio is still maintained. That is how you get your training and testing data. Once you have the training data, we build the model on it and then test the model — predict — on the testing data. Let's see how this works. First, I load the C50 library; this is the library we will be using, so install it with install.packages if needed and then load it. Then I call the function C5.0, passing it all my predictors — training minus the fifth column, which is all the columns except the target variable — and then the target variable itself. I am passing in just the training data set here, and that builds the model.
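A sketch of the split-and-train flow just described; the set.seed call is my addition for reproducibility, not something mentioned in the lecture, and the object names are assumptions:

# Stratified 70/30 split with caret, then a C5.0 tree
library(caret)
library(C50)

set.seed(123)  # added so the random split is reproducible
in_train <- createDataPartition(iris_data$Species, p = 0.7, list = FALSE)
training <- iris_data[in_train, ]
testing  <- iris_data[-in_train, ]

dim(training); dim(testing)              # roughly 105 and 45 rows
table(training$Species)                  # the 1:1:1 class ratio is preserved
table(testing$Species)

# All columns except the 5th (Species) are predictors; Species is the target
model <- C5.0(training[, -5], training$Species)
summary(model)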
Once the model is built, let us do a summary of the model and figure out what is inside it. It starts with the call that was made — the same command repeated back. Then it says it read the 105 cases and built a tree from them, and then it shows you the actual tree it built, with the top level first, then the second level, then the third. The top level uses petal length: it says that if petal length is less than or equal to 1.9, everything is setosa — it makes that decision right there, and the tree stops on that branch. Then it takes the other branch, petal length greater than 1.9, and within that it makes another decision, this time on petal width: petal width greater than 1.7 it says is virginica, but petal width less than or equal to 1.7 goes down one more level, again using petal length — less than or equal to a threshold of around five is versicolor, and greater than that is virginica. So this is how it works: it takes a node, makes a decision — less than some value, greater than some value — branches on the yes side and the no side, and keeps growing the tree. And this is the tree that will be used on any new data: if you have new data and you pass in the values of petal length, petal width, sepal length and sepal width, these are the conditions that will be applied to figure out what type of flower it is. Then it gives you what is called the in-sample error, which is what you get by applying this same decision tree to the training data itself; here it is about 1.9%. And this is the confusion matrix for the training data: the columns give you the actual values of the species classes, the rows are the predictions, and it shows how well the predictions and actuals match. The diagonal entries are the correct matches — setosa, versicolor and virginica across the columns are the actuals, setosa, versicolor and virginica down the rows are the predictions, and the diagonal is where prediction and actual agree. It predicted pretty accurately: it has two errors out of 105, which is an error rate of about 1.9%. Then it shows attribute usage — which attributes were used: it used petal length for 100% of the rows and petal width for about 66% of the rows, and it looks like it did not use sepal length or sepal width at all. So that is how your model looks; this is the model that has been built. Now, how do you do testing with this model? You call the predict function and pass it the model — the actual model to use for prediction — and the testing data. Predicting on any new data follows the same pattern: suppose you get new data that has only the four measurement columns and no species column — you just create a data frame out of it and pass it in exactly the same way you pass the testing data. So you call the predict method with the model and the testing data, and it comes back with a vector in which, for every row in the testing data frame, there is a predicted value.
Look at the table of the predicted values and you see there are 15 setosas, 17 versicolors and 13 virginicas — that is how the table looks. Now you compute your confusion matrix: there is a function in caret called confusionMatrix, where you pass it the predicted values and the actual values — the actual species from the testing data — and it comes back with these results. This is the confusion matrix, and it also comes with a lot of statistics. In the confusion matrix, the reference is on the columns — the actual values setosa, versicolor and virginica — and the prediction is on the rows, again setosa, versicolor and virginica. Everything on the diagonal is a correct prediction: setosa/setosa is 15, versicolor/versicolor is 15, virginica/virginica is 13, and the only errors are the two cases where the actual value is virginica but the prediction is versicolor. Then you get the overall statistics. The main thing to look at is the accuracy of the prediction: the accuracy is 0.956, or about 95%, which is a really high accuracy for this algorithm on this prediction. There are other statistical quantities there, like the 95% confidence interval and p-values — we have not covered those and will not go into them at this point — and finally things like sensitivity and specificity, which we saw earlier; those are their values. So in general this is how classification works: you build your training and testing data, build the model on the training data, and predict on the testing data. Now, just to take the experiment one level further: we saw that petal length and petal width have high correlation with the species, and we saw that the model used only them for its decisions — it did not use sepal length and sepal width, which did not have that high correlation. Suppose we only had data about sepal length and sepal width, not petal length and petal width — how would the whole thing work? How accurate would the prediction be? Let us try that experiment. I take just sepal length, sepal width and species — a subset of the data — and we know that sepal length and sepal width do not have high correlation with the species. On this subset I do the same training, testing, model building and, finally, testing: repeating the same steps, split into training and testing data, then build a model. Let's see how this model looks. It only has sepal length and sepal width to work with, and you see the tree it builds — sepal length less than or equal to 5.5, greater than 5.5, and so on — and its in-sample error is 24.8%, a much higher error, which is not surprising because we know the correlations are not that strong. Then do the testing and look at the confusion matrix: the accuracy is 0.6, or 60%. Earlier it was about 95%, but with sepal length and sepal width alone it is only 60%. This again goes to show that the correlation coefficient tells you how well a predictor variable explains the target: with really strong predictors we got 95%, and with moderate-to-weak predictors we got only 60%. You can compare them like this.
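A brief sketch of the prediction and evaluation steps, continuing the model, testing and iris_data objects from the earlier snippet:

# Predict on the held-out data and evaluate
predicted <- predict(model, testing)
table(predicted)

# Confusion matrix and accuracy from caret
library(caret)
confusionMatrix(predicted, testing$Species)

# For the reduced experiment: keep only the sepal columns plus the target and repeat the same steps
iris_sepal <- iris_data[, c("Sepal.Length", "Sepal.Width", "Species")]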
You can do trials like this and find out how good or bad your predictor variables are. So that is decision trees for you. As you can see, a decision tree does not have to use all the variables available for prediction — it uses only as many as it needs and figures out which ones it should be using; that is another of the distinctive things about decision trees. That is all we have for decision trees. Please do go and try this; it is again available to you as a PDF, and the data file is available in the resource bundle, so go and run your own experiments with the data and see what kind of results you can get. 9. Naive Bayes Classifier: All right. In this lecture we are going to be talking about the Naive Bayes machine learning algorithm. Naive Bayes is based on Bayes' theorem from probability and statistics, and Bayes' theorem is a subject in itself — entire books have been written on Bayes' theorem alone, and the theory can be applied to a lot of real-world situations. For this particular class, though, I am going to really simplify Bayes' theorem down to just the gist of how the algorithm is supposed to work. To begin, let us start with some probability. We talk about the probability of an event A occurring; we call it P(A), and it is a value between 0 and 1 — or, equivalently, between 0 and 100%. Let us say we are at the soccer World Cup, and Argentina and Germany are playing. You can ask: what is the probability of Argentina winning the World Cup? The event A we are talking about is "Argentina wins the World Cup", and that might be something like 0.4 — a 40% chance of Argentina winning. Now comes Bayes' theorem, where we talk about what is called conditional probability. Conditional probability is about predicting an event A given that another event B has already occurred. Rather than just predicting the event A on its own, you have some prior knowledge of certain other things that have happened, and given that those things have happened, how does that change the probability of event A? Going back to Argentina winning the World Cup: the probability of Argentina winning is, say, 40%, or 0.4. But suppose you know that Messi is not going to be playing in the game — a prior event has occurred; imagine Messi got injured and is out of the World Cup. What is the probability of Argentina winning the World Cup, given that Messi is not playing? That will be a different probability — maybe 0.1 or 0.2, just 10 or 20 percent. That is what Bayes' theorem is all about: it is about predicting the probability of an event given that certain prior events have already occurred. The formula for Bayes' theorem gives the probability of A given B. What does "A given B" mean? It means the probability of event A occurring given that B has already occurred — the probability of Argentina winning given that Messi is injured. So event B has already occurred, and you know that it has occurred,
And based on that, you try to predict A. What happens is that when you are trying to predict something, you start with some percentage, and then as you learn that certain events around it have already happened, that changes the probability of the final event. Suppose again that Argentina and Germany are playing the World Cup final. You start with the prediction that the probability of Argentina winning is 40%. Then, during the match, certain things happen, and you will hear the commentators say things like: the team that scores the first goal typically has a 70% probability of winning the match. That is actually conditional probability. You start with both teams having an equal chance of winning, 50-50, but if somebody scores the first goal, their probability of winning goes up by some percentage; if somebody scores the first two goals, the probability changes again; if somebody is leading at halftime, it changes again. All of these are conditional probabilities: you are predicting the probability of a future event given that something else has already occurred.

Here is another example. Suppose there are 100 patients, and the probability of a patient having diabetes in general - the overall probability - is 0.2. Now, the probability of a patient having diabetes given that the patient's age is greater than 50 - the age condition being what we call the prior event - is 0.4. Obviously, the more you know about the patient, the more the probability of that patient having diabetes keeps changing. And it is not just one prior event - you can have many. The patient's age is greater than 50: that changes the probability. What if the age is greater than 50 and the weight is greater than 150 pounds? What if you add a third condition, that the person is male? As you keep learning prior information about the patient - the prior events - the probability of the outcome you are trying to predict keeps changing. That is the gist of Bayes theorem; I have really simplified the whole Bayesian world into one slide.

Now let us move on to Naive Bayes. What is Naive Bayes classification? It is the application of Bayes theorem to machine learning, in order to do classification predictions. The target variable you are trying to predict becomes the event A, and every predictor you use in the prediction becomes one of the events B1 to Bn. That is how you map your predictors and target to Naive Bayes classification: the target becomes the event A you are trying to predict, and the values of the predictors act like the prior events. You are trying to predict the probability of event A occurring given that B1 to Bn have already occurred.
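Before moving on, here is a minimal sketch in R of how the conditional probability in the diabetes example relates to Bayes theorem. The patient counts are made up purely to match the probabilities quoted in the lecture; they do not come from a real data set.

# Hypothetical counts chosen to reproduce the 0.2 and 0.4 figures above
n_patients     <- 100
n_diabetic     <- 20   # overall P(diabetic) = 0.2
n_over_50      <- 30   # patients with age > 50
n_over_50_diab <- 12   # patients with age > 50 who are diabetic

p_diabetic            <- n_diabetic / n_patients       # 0.2
p_diabetic_given_over <- n_over_50_diab / n_over_50    # P(diabetic | age > 50) = 0.4

# The same quantity via Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_over_given_diabetic <- n_over_50_diab / n_diabetic   # P(age > 50 | diabetic) = 0.6
p_over_50             <- n_over_50 / n_patients        # P(age > 50) = 0.3
p_bayes <- p_over_given_diabetic * p_diabetic / p_over_50

c(direct = p_diabetic_given_over, bayes = p_bayes)     # both come out to 0.4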
So you have a customer coming into your store and you are trying to predict whether the customer is going to buy something. The B1 to Bn are basically the prior information you already know about the customer: the customer's age, the customer's income, the customer's preferences and so on. For example, if we look at the table we created in the earlier example, with age, BMI and diabetic, the way you would state these is: the probability that diabetic equals yes given that the age is 24 and the BMI is 22; for the second record, the probability that diabetic equals yes given that the age is 41 and the BMI is 36; and so on.

One thing Naive Bayes does differently from other algorithms: instead of simply declaring that a particular record is class A or class B, yes or no, it gives you the probability that the record is a yes and the probability that it is a no. It estimates the probability of each possible outcome of the target variable. If the target is yes/no, it gives you the probability of yes and the probability of no; if the target is something like high/medium/low, it gives you the probability for each of them. You then pick the outcome with the highest probability as your prediction.

So how does the Naive Bayes algorithm work? It builds a kind of probability table by reading through all the training data. Let me explain this table. It is built on three columns: we are trying to predict the salary of a person, so salary is the target variable, with two classes - less than 50K and greater than 50K - and the predictors we are going to use are age and gender. Looking at the training data set, you build the table as follows. You first find the overall probabilities of the outcomes within the training data: the overall probability of "less than 50K" is 0.75 and of "greater than 50K" is 0.25. Suppose the training data set has 200 records; a probability of 0.75 for less than 50K means that 150 records in the training data actually have a salary less than 50K - you already know the outcomes, so this is just the overall proportion below and above 50K. You do the same thing for age: for every age range you find the overall probability in the training data - the probability that somebody's age is between 20 and 30 comes out to, say, 0.24, for 30 to 40 it is 0.26, and so on for every age class. Similarly, for every possible value of gender, male and female, you find the overall probability. Once you have the overall probabilities, you find what is called the joint probability between each predictor value and each target value - that is what you see in the middle cells of the table. Let us take, say, the value 0.25.
That 0.25 is the joint probability that a person's age is between 30 and 40 and that the person's salary is less than 50K. The same goes for all the other middle cells: the middle rows are the joint probabilities between the target and the predictors. So you find the overall probabilities, and then you find the joint probabilities. All of these probabilities are computed simply as the number of records that satisfy the condition divided by the total number of records - pretty simple and straightforward. When we say the joint probability of age 30-40 and salary less than 50K is 0.25, it just means: take the number of records where that combination applies and divide by the total number of records.

One thing you will notice is that these probabilities always sum to one. The overall outcome probabilities, 0.75 plus 0.25, add up to one; the same with gender, say 0.33 plus 0.67; and the same applies across the joint-probability rows, because each row covers all possible outcomes.

Once you have this table built, doing a prediction is straightforward. When a new record comes in, you apply the conditional probability formula we just talked about: the probability of A given B equals the probability of B given A times the probability of A, divided by the probability of B. Suppose I want to predict whether the salary of a person is less than 50K or greater than 50K, given that I know the person's age is 25. That means I compute both: the probability of salary less than 50K given age 25, and the probability of salary greater than 50K given age 25. I just borrow the numbers from the table and plug them into the formula: the joint probability that the salary is less than 50K and the age is 25, divided by the overall probability that the age is 25 - which is the same as applying the Bayes formula, since the probability of age 25 given salary less than 50K is that joint probability divided by the overall probability of salary less than 50K. All of these values come straight out of the table, so I just compute both values and compare them. If, say, 0.92 is higher than the other value of 0.08, we predict that the person's salary is less than 50K given that the person's age is 25. This is how the Naive Bayes algorithm works: it first builds this conditional probability table, and then, whenever you have new data, it applies the table - the numbers go into the formula and out come probabilities for each possible outcome, and you choose the outcome with the highest one.
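Here is a minimal sketch of that table-based calculation in R. The individual table values are illustrative - they are chosen only so that the comparison reproduces the 0.92 versus 0.08 result mentioned in the lecture.

p_lt50k <- 0.75                 # overall P(salary < 50K)
p_gt50k <- 0.25                 # overall P(salary >= 50K)
p_age25 <- 0.20                 # overall P(age in the 25 bin)

p_age25_and_lt50k <- 0.184      # joint P(age bin AND salary < 50K), illustrative
p_age25_and_gt50k <- 0.016      # joint P(age bin AND salary >= 50K), illustrative

# Bayes: P(class | age) = P(age | class) * P(class) / P(age) = joint / P(age)
p_lt50k_given_age25 <- p_age25_and_lt50k / p_age25   # 0.92
p_gt50k_given_age25 <- p_age25_and_gt50k / p_age25   # 0.08

# Pick the class with the higher posterior probability
ifelse(p_lt50k_given_age25 > p_gt50k_given_age25, "<50K", ">=50K")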
Now let us go through some of the advantages of Naive Bayes. It is simple, straightforward and fast. It works well with noisy and missing data. It gives probabilities as its result, which can be very helpful. Say I am trying to predict yes or no: sometimes the probability of yes is 0.9 and no is 0.1, and sometimes yes is 0.55 and no is 0.45. Based on that I can make decisions: when Naive Bayes gives a very high probability, say greater than 0.7, I can just go with it; but if it comes back with something closer to a 50-50 split, maybe I bring in an additional algorithm to help make the decision. I can make those kinds of choices because I get the probability for each possible outcome, whereas other algorithms just predict yes or no and I do not know how close the yes was to the no.

Now the shortcomings. It has limited accuracy. It expects the predictors to be independent, and this is an important one - the reason the algorithm is called "naive" is that it makes a naive assumption, namely that the predictors are fully independent of each other. The assumption is that the prior events B1 to Bn are each independent of the others: B1 should not impact B2, B2 should not impact B3 - one prior event should not influence another. What do we mean by independent? Suppose one prior event involves age and another involves weight: age should not in any way influence weight. Now suppose you have a third variable, say cholesterol level. It may well be that weight has an impact on cholesterol level, so weight and cholesterol are not really independent - they have some amount of correlation. When we say two variables are independent of each other, it means their correlation coefficient is pretty low. Naive Bayes makes that independence assumption, so when the variables really are independent it works very well, but if they have internal dependencies it is not going to work that well.

It is also not good with a large number of numeric predictors. Naive Bayes works well with categorical predictor variables, so when you have numeric variables you have to bin them, like we did for age just now: a continuous variable like age gets converted into ranges - 20 to 30, 30 to 40 - and the classification is built on those bins. So where is Naive Bayes used?
It is typically used in medical diagnosis, where you want to predict whether a person has a disease or not. This is where knowing the probabilities of both outcomes is useful: if the probability of "no disease" is very high, you can leave the patient alone; if the probabilities of yes and no are close to each other, maybe you subject the patient to further tests and more medical analysis. That is where Naive Bayes is pretty useful. It is used in spam filtering, to figure out whether a particular email is spam or ham, and again the probabilities help: if the spam filter says the probability that an email is spam is very high, typically the process will kill that email and not deliver it at all; but if it comes out with, say, a 60% chance of spam, you might still deliver the email with a marking - that is why you sometimes see an email title marked as "possibly spam" with a question mark. Those kinds of decisions are made on the probability values produced by spam filtering algorithms, and they use Naive Bayes for that. It is used for document classification, classifying, say, news articles into sports or politics or the like - again it produces a probability for each of the possible classes, which is very useful. And finally, sports prediction, as I talked about: predicting the outcome of a game given that certain events have already occurred - the probability that a team wins given that they scored the first goal, or given that they are leading at halftime. Naive Bayes comes into play in all of these cases. Thank you.

10. R Use Case : Naive Bayes: Hi. Now we will be seeing a use case for Naive Bayes in R, and the use case we will be looking at is spam filtering. Spam filtering is a very common activity on any kind of textual data - email data, SMS data, Twitter data, whatever it may be - and Naive Bayes is one of the popular algorithms used for it. In this example we have a data set containing SMS messages that have been pre-classified as either ham or spam, and using this data we are going to build a model that can identify new messages as ham or spam. The idea behind this kind of analysis is that ham messages and spam messages differ in the words that typically occur in them: a spam message typically has words like "deals", "money", "offer" - something that is more about selling - compared to a ham message. The techniques used in this use case are Naive Bayes classification, training and testing, the confusion matrix, and the new thing we will be seeing: text pre-processing.
How do you process and prepare text and convert it into a numerical representation that can be consumed by machine learning algorithms? We start by setting the working directory. Then we read the SMS spam file, which is available as part of the resource bundle, with read.csv and load it into sms_data. The type column currently holds the values ham and spam; we just make sure it is of type factor. Why is it not a factor already? Because we loaded the data with stringsAsFactors set to FALSE, which loads strings as characters - I did this deliberately to show you how to do the conversion: you take sms_data$type and make it into a factor. Now look at the structure of the data: there are 500 observations of two columns - a type column, which is a factor with levels ham and spam, and a text column holding the SMS text. In the summary of the data you see 437 ham messages versus 63 spam messages. Look at the head of the data and at an individual message, and you see a lot going on in there: numbers, currency symbols, lots of punctuation. That is why we have to go and clean all of this up - text cleansing.

For text cleansing in R, the most popular library is the library called tm, and when we load tm it also loads the NLP package. Once the library is loaded, we have to convert the text we have into what is called a corpus - the tm library works on a corpus, and it provides functions to build one: you wrap the text in a source object and then call the corpus constructor. This is the convention the library expects, so you just follow it and convert the text column into a message corpus. Once you have the corpus, you can take a peek at what it contains using the function called inspect - here I am just looking at the top five messages. For every message it shows some metadata that it keeps internally, plus the actual content.

Once you have the corpus, we go and cleanse the data - we talked about how text is cleansed in the presentation, and now we are actually going to do it. There is a function called tm_map which applies cleaning transformations to the corpus. You call tm_map, passing the message corpus and a transformation called removePunctuation; this removes the punctuation symbols, and the output is another corpus, which we assign to a variable, the refined corpus. Then you repeat this for the other processing steps. The next step is removing extra white space: again you call tm_map, with the refined corpus as input and the transformation called stripWhitespace, which strips out the extra white space. Then lower-case conversion: there is a wrapper called content_transformer, again something built into the tm library - you call tm_map with content_transformer(tolower), and that gives you the text in lower case.
Then you remove the numbers in the text using removeNumbers. Then you remove stop words: you call tm_map with removeWords and, as the list of words to remove, the function stopwords(), which supplies the standard stop word list. Then, if you want to remove some special words of your own, you again use removeWords and pass the words you want removed as a vector, and it goes and removes them from the whole corpus. Once all of this is done, let us take another peek with inspect at the transformed data. Now the data looks a lot cleaner: the numbers are gone, the extra spaces are gone, a lot of the dots are gone.

Once the data is ready in this fashion, the next thing you do is create the document-term matrix. A document-term matrix is the collection of documents converted into a matrix in which every document is a row and every word is a column. You just call the corresponding function and it converts the cleaned corpus into a document-term matrix. Now look at the dimensions of the document-term matrix: it shows 500 rows, each representing an input document - the document here is the SMS message - and close to 2,000 columns, because every word becomes a column. That is an interesting problem, because any machine learning algorithm would need to process all of those columns, which may be pretty hard. So what we do now is focus only on words that occur at least 10 times across all the documents: you take all the documents, count how many times each word occurs across the 500 messages, and keep only the words that occur at least 10 times. That is what I am doing here: I call the function findFreqTerms on the document-term matrix, passing the parameter value 10, which gives me the list of words occurring at least 10 times in the corpus, and then, using that list as a dictionary, I rebuild the document-term matrix from the refined corpus restricted to only those words. That reduces the columns from 1,966 down to only the frequent words. After this filtering, the dimensions of the filtered DTM are 500 and 59 - from 1,966 columns we have come down to just 59, which is a decent, workable number. We have removed a lot of data that is very sparse and probably not useful in the learning process, because a word needs to occur enough times to have any impact on the machine learning algorithm. Finally, inspect this filtered document-term matrix to see exactly what it looks like: you see the documents as rows and the words as columns, and in the cells the counts - if a word occurs in a document one time, the cell just holds a one. A consolidated sketch of these pre-processing steps is shown below.
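Here is a consolidated sketch of the text pre-processing steps above, assuming the tm package and an sms_data data frame with "type" and "text" columns; the object names are illustrative and the custom word list is hypothetical.

library(tm)

msg_corpus <- VCorpus(VectorSource(sms_data$text))

refined_corpus <- tm_map(msg_corpus, removePunctuation)
refined_corpus <- tm_map(refined_corpus, stripWhitespace)
refined_corpus <- tm_map(refined_corpus, content_transformer(tolower))
refined_corpus <- tm_map(refined_corpus, removeNumbers)
refined_corpus <- tm_map(refined_corpus, removeWords, stopwords())
refined_corpus <- tm_map(refined_corpus, removeWords, c("can", "will"))  # example custom words

# Document-term matrix: one row per message, one column per word
dtm <- DocumentTermMatrix(refined_corpus)
dim(dtm)

# Keep only words that occur at least 10 times across the corpus
freq_words   <- findFreqTerms(dtm, 10)
filtered_dtm <- DocumentTermMatrix(refined_corpus, list(dictionary = freq_words))
dim(filtered_dtm)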
A matrix like this is called a sparse matrix because the data is very sparsely populated: mostly zeros, with ones here and there and the occasional two. (And apologies - I printed out the whole matrix there, so I had to scroll down a lot to get back to the next piece of code.)

Once I have the document-term matrix, let me do some exploratory data analysis. One thing you typically do with words is what is called a word cloud - you have probably seen these, where the words are plotted and the size of each word depends on the number of times it occurs in the data set. We are going to do the same thing. We use the library called wordcloud and we set the palette - the palette tells it the colors to use; brewer.pal has a number of color schemes, and I am picking the scheme called Dark2. Then I plot a word cloud where I pick from the refined corpus only those messages whose type is ham - note that I use the refined corpus here, not the document-term matrix - and I tell it to show only words that occur at least five times. This is the plot that comes out: it shows the words that typically occur in ham messages - everyday words like "will", "get" and "now" occur a lot. Then I do the same word cloud for the spam messages and see how that looks: "call" seems to be very frequently used, "free" is very frequently used, "claim" too. So there are certain words that occur specifically in spam messages, which differentiate spam from what ham messages look like. That is how you do a word cloud. There is hardly any other exploratory analysis to do here - you cannot really do correlations and so on, because the data is so sparsely populated - so we go straight to the training and testing split. We again use the library caret and do a data partition with a 70/30 split, and we actually split three things into training and testing using the same partition: the raw data frame, the corpus, and the document-term matrix. So the SMS data, the refined corpus and the filtered DTM are each split into training and testing.
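Here is a minimal sketch of the word clouds and the three-way split, assuming the wordcloud and caret packages and the sms_data, refined_corpus and filtered_dtm objects created above; the seed and object names are illustrative.

library(wordcloud)
library(caret)

pal <- brewer.pal(8, "Dark2")

# Word clouds for ham and spam messages (words occurring at least 5 times)
wordcloud(refined_corpus[sms_data$type == "ham"],  min.freq = 5, colors = pal)
wordcloud(refined_corpus[sms_data$type == "spam"], min.freq = 5, colors = pal)

# 70/30 split applied consistently to the raw data, the corpus and the DTM
set.seed(100)
idx <- as.vector(createDataPartition(sms_data$type, p = 0.7, list = FALSE))

raw_train    <- sms_data[idx, ];       raw_test    <- sms_data[-idx, ]
corpus_train <- refined_corpus[idx];   corpus_test <- refined_corpus[-idx]
dtm_train    <- filtered_dtm[idx, ];   dtm_test    <- filtered_dtm[-idx, ]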
The next thing we do is convert the counts into factors. The document-term matrix holds, as its cell values, the count of the number of times each word occurs in each document. We are going to convert that into a yes or no: irrespective of how many times a word occurs in a document, we only record whether it occurred. For that we write a small function, convert_counts, which takes an input x and, if x is greater than zero, returns one, else zero - so it does not matter whether the word occurred five, six or ten times; anything greater than zero becomes one. The function then converts the zeros and ones into a factor, mapping 0 and 1 to the labels "No" and "Yes". Once I have this convert_counts function, I use the apply function, which applies a function to every row or every column of the data: I apply convert_counts to the training DTM with MARGIN = 2, which means it is applied to every column, and I do the same for the testing DTM, which gives me train and test objects. These are matrices, so I convert them into data frames using as.data.frame, because a data frame is what the algorithm will take as input. Then I add the type column - the actual target we are going to predict - to both the training and testing data frames, because the document-term matrix was built purely from the text and does not contain the type. That is all the processing. Once all of that is done, let me take a peek at the data frame - the first ten rows and the first ten columns. You see the rows (this is the training data set, so some row numbers are missing because they went to the test set) and the columns, and instead of counts you now see Yes or No because of the conversion we did.

Once this is done, it is a simple matter of building the model and predicting with it, for which we use the library e1071. e1071 is a library that gives me a naiveBayes function, so I call naiveBayes to build a model, passing all my predictor variables - in this case the 59 word columns in the data frame, everything except the 60th column, which is the type column - and then my target variable, which is the type. That builds my model, and then I take a look at what the model contains. When we looked at the Naive Bayes presentation, we talked about the overall and conditional probabilities; here you see the actual values it figured out. First there is the call, which is straightforward. Then it shows the a-priori probabilities, the overall split between ham and spam in the training data: the probability that a message is ham is about 0.87, or 87%, and spam about 0.13. Then it moves on to the conditional probabilities, where for every column in the data it gives the probability split given ham and given spam - and every column here is a word, because we made all the words into columns. So you start with a word, say "anything", and a table of values: the probability that "anything" is No (absent) given the document is ham comes out to about 0.97, and the probability that it is Yes given the document is ham is the remaining 0.03 or so.
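Here is a minimal sketch of the count-to-factor conversion and the model building, assuming the e1071 package and the dtm_train / dtm_test / raw_train / raw_test objects from the split sketched earlier; the names are illustrative.

library(e1071)

convert_counts <- function(x) {
  x <- ifelse(x > 0, 1, 0)                       # any non-zero count becomes 1
  factor(x, levels = c(0, 1), labels = c("No", "Yes"))
}

train <- apply(as.matrix(dtm_train), MARGIN = 2, convert_counts)
test  <- apply(as.matrix(dtm_test),  MARGIN = 2, convert_counts)

df_train <- as.data.frame(train)
df_test  <- as.data.frame(test)
df_train$type <- raw_train$type                  # add back the target column
df_test$type  <- raw_test$type

# Predictors are every column except the last (type); target is type
model <- naiveBayes(df_train[, -ncol(df_train)], df_train$type)
model$apriori                                    # overall ham/spam split in the training data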
Similarly, there is the probability that "anything" is No given the document is spam, and the probability that it is Yes given spam. You see all of these probabilities, and each pair always adds up to one. So first there is the overall probability that a message is ham or spam, and then, within ham and within spam, the probability that each word is absent or present. You will see such a table for every word in the model - there are 59 words, so there will be 59 such tables. Using this set of tables, the model applies the Bayes formula to compute the actual probabilities we saw in the presentation.

Now we call the predict function - internally, predict computes those probabilities - passing it the model and the testing data, and it comes out with the predictions. Then we use confusionMatrix to tabulate the predictions against the actuals and see how well the model performed on the testing data. In the confusion matrix you have ham and spam as the reference and ham and spam as the prediction, and there are only seven errors: seven spam messages were identified as ham. What that means in practice is that those seven spam messages would actually be delivered as ham to the recipient, who would look at them and wonder whether the spam filter is working. The accuracy is about 0.95 - 95%, which is very good for what we are trying to predict. So that is what we have. In real life, what happens is that you save this model to some kind of file, and at run time, when a real message comes in, you convert it into a data frame with the same structure as the test data frame and pass it to the same predict function. It comes out with a prediction - in this case a vector of length one, because you are only passing one message - and that is what you use to decide whether the message is ham or spam, and based on that you send it to the inbox, mark it as spam, or whatever you want to do. So this is how a real ham/spam filter would work using Naive Bayes. Thank you.

11. Random Forests: Hi. In this lecture we are going to be talking about random forests, another very useful machine learning classification algorithm. Random forest is one of the most popular and accurate algorithms available - popular in the sense that in data science competitions it is one of the algorithms that is very commonly used. It is an ensemble method that is used to build decision trees. What is an ensemble method? An ensemble method is one where you do not make a single decision: you make a number of decisions and then take a vote. Let me explain a little more.
Suppose I want to buy a laptop and I want some opinions about whether I should buy it. If I go and ask just one friend, "should I buy this laptop or not?", and he gives me a yes or no, that is basic decision making: each friend is a model, I use just one model to get an opinion, and I get one answer. Instead, suppose I go and ask ten of my friends whether I should buy the laptop. I get ten different opinions and then take a vote: seven of them said yes, three said no, yes is higher than no, so I go and buy the laptop. That is ensemble decision making - you are using multiple people, multiple brains. In this analogy, random forest is an ensemble method built on decision trees: every friend is a decision tree, and a random forest is nothing but a collection of trees - that is why it is called a forest.

So in random forest you build multiple models - multiple decision trees, using the same decision tree algorithm - and we will see shortly how they end up different from one another. For prediction, you actually use each of these models: suppose I build 500 decision trees, I get 500 models, and I use all 500 to make a decision, so I get 500 individual results. If I am trying to figure out whether a patient is sick or not, I get 500 answers of yes and no, and then I take a vote: however many yeses, however many nos, and whichever is higher is my prediction. That is why it is called an ensemble method: you have multiple models, and every time you have to make a decision you apply all of them and take a vote among the results.

How does it work? Say you have a data set that contains m samples (rows) and n predictors (columns). You build x trees, but each tree is built with a different subset of the data - that is how the trees turn out different. If you used the same data to build each of the trees, all the trees would look alike; instead, for each tree you pass a different subset of the m rows and n columns. How is this subset chosen? Randomly. For each tree you randomly choose a subset of rows and a subset of columns, and that is why the algorithm is called random forest: random because a random mechanism selects the rows and columns, forest because you have multiple trees. For example, if you have a data set with 1,000 rows and 5 columns, each tree might be built using 700 randomly chosen rows and 3 randomly chosen columns. You do not have to worry about writing code to choose these rows and columns - the random forest implementations and libraries do that for you; this is just for your understanding.
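Here is a minimal sketch of these ideas with the randomForest package in R, using the built-in iris data purely as an illustration; the parameter values shown are examples, not recommendations.

library(randomForest)

set.seed(100)
rf_model <- randomForest(Species ~ .,       # target and predictors
                         data     = iris,
                         ntree    = 500,    # number of trees in the forest
                         mtry     = 2,      # predictors randomly tried at each split
                         sampsize = 100)    # rows randomly sampled for each tree

rf_model                        # shows the out-of-bag confusion matrix and error estimate
predict(rf_model, head(iris))   # each tree votes; the majority class is returned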
Sometimes you can control how many rows and what fraction of the columns are chosen for each tree, but mostly the libraries optimize the selection process themselves. A different data subset is used to build each tree, and for prediction the new data is passed to each of the x trees, you get x possible results, and you take a vote among them. For example, if you are predicting whether somebody will buy a product and you build 500 trees, you might get 350 yeses and 150 nos; 350 is greater than 150, so you go with yes. It is a voting, democracy-style decision process: the most frequent result is the overall prediction. So random forest is about building multiple trees, each with a randomly chosen subset of the data, and whenever you have to predict, running a voting mechanism to arrive at the final answer.

Now for a summary of random forest - first the advantages. It is highly accurate: every tree is built with a different subset of data, which means a lot of the noise in the data gets eliminated and only the real signal keeps getting built into the model. It is efficient with a large number of predictors: it does not matter much if you have 40 or 50 of them, given that each tree uses only a subset - the number of predictors chosen for a given split is typically the square root of the number of predictors available. If you have 4 predictors, each tree works with 2; if you have, say, 16 predictors, the square root of 16 is 4, so each tree works with only 4. It adjusts itself quite flexibly to a large number of predictors. Another big advantage is that it is fully parallelizable: since each tree is built independently, the tree building can use multiple CPUs running in parallel, and then you collect the results back. The same applies to predictions - you can run them in parallel across the trees and then gather the results. So you can use parallel processing techniques to really speed things up with random forest. It is also very good with missing data, and the benefits of the individual decision trees also accrue to random forests.

The shortcomings: it is, of course, very time and resource consuming - you are building 500 trees instead of one, which takes a lot of time even if you apply parallel processing. And for categorical target variables, bias might still exist if the levels are disproportionate. Let me explain what that means. Suppose you are trying to predict yes or no.
There are two levels, yes and no, in that categorical variable - those two possible values are what we call levels. If the training data set has 50% yes and 50% no, that is equal proportion. But suppose the training data has 95% of the values as yes and only 5% as no - that is disproportionate. When you have that kind of disproportionate data set, where one class value dominates the others - in this case yes - pretty much any model you build on that data will bias itself toward that dominant value. If you have 95% yes and 5% no in the training data, the model you build will tend to predict yes all the time. The risk is that the accuracy number that comes out of the prediction will still look very good: if the same 95/5 proportion exists in the testing data and you blindly predict everything as yes, you still get 95% accuracy. The problem is that whenever a no actually happens, the model is not sensitive to it at all. These are places where you have to be careful. One technique usually applied when the levels are disproportionate is to choose a subset of the training data in which the levels are closer to equal - rather than taking, say, 100 records with 95 yes and 5 no, take 20 records with 5 no and 15 yes, so a better proportion is achieved. The only issue is that you then need a fairly large data set to start with, so that you are not left with too little data and new prediction errors. Those are some of the challenges you face with disproportionate class levels.

Where is random forest used? It is used a lot in scientific research, where you are not really worried about the speed of prediction or of model building. It is used in competitions - accuracy is what matters there, and the speed of training or prediction does not. It is also used in medical diagnosis, to predict whether a patient is sick or not, that kind of decision making. So: random forest - highly accurate, but very time consuming. That is the summary of random forest. Thank you.

12. R Use Case : Random Forests: Hi. In this lecture we are going to be looking at a use case for random forest: predicting prospective loan customers. The problem we are trying to solve is this: we have a bank, and the bank has a list of prospective customers it wants to go after for a bank loan - customers who might be interested in taking a loan. The bank could take this list of, say, 1,000 prospective customers and start calling them: "Hey, we are calling from this bank - are you interested in a loan?" But what they are asking themselves is: do I want to call every possible person on this list?
Or do I want to find those customers who have a high probability of being converted into actual customers - in other words, choose only the people who will probably take a loan from me - and focus only on that list? So what I am going to do is build a model that predicts whether a prospective customer will become an actual customer for a bank loan. The techniques used in this example are random forest, training and testing, the confusion matrix, indicator variables, binning and variable reduction.

We start by setting the working directory, and then we read the file called bank.csv into bank_data. This data has 17 variables and holds information about previous campaigns the bank ran against these people: they take a list of prospective customers from previous campaigns and try to filter it for a new campaign, to see which customers from the old campaigns would be good targets for the new one. What data do they have about the customers? Age; job; marital status; education; default - whether they had a previous loan with the bank and defaulted on it; balance - the account balance; whether they have a housing loan; whether they have a personal loan. Then there is contact information: how they were contacted - cellular or telephone; the day and month of the last contact; duration; the campaign identifier - banks typically run email campaigns, phone campaigns, web campaigns, each with an id; whether any previous campaigns were run against the same people before this one, and if so, when the previous contact was, the previous campaign id, and the previous campaign's outcome. And finally there is the yes/no column: when the last campaign was run against these customers, did the customer actually take a loan or not? So that is the data - 17 variables, one target variable y, and the rest all potential predictors.

Let us take a look at the summary of the bank data. The data is clean: age is between 19 and 87; job is management, blue-collar, technician and so on; marital status is divorced, married, single; education; default yes or no; balance - there is a balance of -3,313, which is possible, sometimes people are overdrawn, so that is OK; housing yes/no, loan yes/no; and similar reasonable values for all the other parameters. Finally, the data set has about 4,521 customers, of which roughly 4,000 are no and about 500 are yes. For more inspection of the bank data, just look at the head; that looks fine too.
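Here is a minimal sketch of loading and inspecting the data, assuming a bank.csv file in the working directory; the path is illustrative, and the delimiter may be a comma or a semicolon depending on the version of the file you have, so adjust sep accordingly.

setwd("C:/data")                       # illustrative path, not from the course files
bank_data <- read.csv("bank.csv", sep = ";")

str(bank_data)                         # 17 variables including the target y
summary(bank_data)                     # ranges, levels and counts per column
head(bank_data)
table(bank_data$y)                     # roughly 4,000 no versus 500 yes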
The same file is, of course, available for you in the resource bundle, so you can take a further look and explore the data more. Now we move on and look at the correlation coefficients, using the library psych and the function pairs.panels. pairs.panels tends to hang, or at least take a long time, when the number of variables is large, so I am going to split the data into two sets: we have 16 predictor variables, so I first look at the first eight predictors against the target, and then do another pairs.panels for the remaining ones, so that it runs easily and the plot fits in the frame. Let us look at the first plot: y is the variable I am trying to predict, and here are the correlation coefficients - and what do you see? They are uniformly weak. All the predictor variables are bad: -0.07, -0.01, 0.02, 0.04 - hardly anything of any significance, because all the predictors look pretty poor against y. Then you do the rest, variables 9 to 17, and again only duration has a somewhat reasonable correlation of about 0.4; the rest are very small - 0.01, 0.02, a -0.13. So what we have, in general, is a set of weak predictors. It is possible that a combination of them together will actually be a strong predictor. When you have a set of weak predictors, basic algorithms like Naive Bayes or decision trees may not give you good results; this is where we go to ensemble methods like random forest, because random forest works well on a set of weak predictors.

The first thing we do is eliminate the variables whose correlation is really low, less than about 0.1 - default, balance, day, month, campaign. We take the data, drop those columns, keep the rest, and create a new data set called new_data, keeping the target variable and only those predictors whose correlation is above roughly 0.1. After this filtering, we look at the correlations again: everything is still quite weak, even the best predictors are low, so we will see how this goes.

Next we do some data transformations. The first is binning: we bin age into ranges, using cut on age with the ranges 1-20, 20-40, 40-60 and 60-100, and replace the original age with this binned age. Then we create indicator variables for marital status. To create an indicator variable you create a new variable, say is_divorced, and use an ifelse condition on new_data$marital: if marital equals "divorced", put 1, else put 0. All of these then become indicator variables.
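Here is a minimal sketch of the binning and indicator-variable transformations, assuming a new_data data frame with age and marital columns; the break points follow the ranges mentioned in the lecture.

new_data$age <- cut(new_data$age, breaks = c(1, 20, 40, 60, 100))

new_data$is_divorced <- ifelse(new_data$marital == "divorced", 1, 0)
new_data$is_single   <- ifelse(new_data$marital == "single",   1, 0)
new_data$is_married  <- ifelse(new_data$marital == "married",  1, 0)

new_data$marital <- NULL     # drop the original column once the indicators exist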
Once you create these indicator variables, you delete the original marital variable. Now look at what the new data looks like: age has become a factor with four levels, and you have the new variables is_divorced, is_single and is_married, all of them zeros and ones. So much for the data transformations.

Now let us do some exploratory data analysis on how y trends against the other variables. We plot housing against y, then contact (the type of contact) against y, then box plots of duration against y and pdays against y, and see how these turn out. For housing, "no" seems to have more yes outcomes - somebody who does not have a housing loan has a greater tendency to take a loan, which may make sense. You also see that, for some reason, the cellular and telephone contact types have more yeses than the "unknown" contact type - maybe a signal, maybe not, we do not know. In the box plots, the yeses typically have a larger duration, while pdays looks fairly flat, and all of them have a lot of outliers, so we really do not know what is happening. This is an example where the exploratory process does not give you a lot of confidence: you look at the charts and cannot confidently say you will be able to make good predictions. But we cannot just stop there, so let us go to the model building process.

For model building, we use the library caret and do the training and testing split, using createDataPartition to create the training data and the testing data, and then look at the dimensions of each: it splits into about 3,165 training and 1,356 testing records, and of course the yes and no classes are split in the same proportions in both, which is why we use createDataPartition. Moving on, there is a library called randomForest that I am going to use - if you are missing it, install it using install.packages - and then load the library. There is a function called randomForest, to which I pass my target and predictor variables - "y ~ ." means predict y using everything else - and the data it uses is the training data frame. The model comes out, so let us look at it. The first thing you see is the call. Then it shows what type of random forest it is, regression or classification - this is a classification random forest. The number of trees it built is 500. The number of variables tried at each split is 3 - which reflects what we discussed about using a subset of columns: of the 12 columns, it tried 3 at each split.
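Here is a minimal sketch of the split and model building for this use case, assuming the transformed new_data frame from the previous steps; the seed and object names are illustrative. The importance output mentioned just below is included as well.

library(caret)
library(randomForest)

set.seed(100)
in_train <- createDataPartition(new_data$y, p = 0.7, list = FALSE)
training <- new_data[in_train, ]
testing  <- new_data[-in_train, ]

rf_model <- randomForest(y ~ ., data = training)   # 500 trees by default
rf_model                                           # type, ntree, mtry and OOB confusion matrix
importance(rf_model)                               # Gini-based variable importance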
Then it gives you the in-sample error - it predicts, using the model it has built, on the training data itself and shows the resulting confusion matrix. And it is pretty high: you see that the yes class in particular does not do well, so there is a high in-sample error. A high in-sample error does not by itself mean the model is bad - we also have to see what the out-of-sample error looks like. Then there is the command importance(model), which gives you the importance of the various predictor variables. For each variable it gives a value - the Gini importance - and the higher the value, the more important the variable: duration is the highest at around 224, pdays is next at around 52, previous around 28. That looks reasonable. That is the importance list for the model.

Then comes the testing part, where we do the prediction. I call the predict function, giving it the model and the testing data, and it comes out with a vector of predicted values, the nos and yeses. Then, of course, I run my confusion matrix and see what the results look like. It comes out with an accuracy of about 0.89 - 89%, which looks very good. But wait - let us inspect a little more and see what is actually happening. Look at the matrix: of the nos, around 1,168 were predicted correctly and only around 30 incorrectly. But of the yeses, only about 42 were predicted correctly and around 149 incorrectly. So even though the overall accuracy looks good, the yes class is not being predicted accurately - most of the yeses are being predicted as no. Why do you see high accuracy when the matrix looks this bad? Because of the disproportionate number of yeses and nos in the original data. In the split, the testing data set had around 1,200 nos and only around 150 yeses - mostly no, roughly a 90-to-10 ratio - and when that kind of ratio also exists in the training data, the model tends to skew itself toward the heavily dominant class. In this case the model skewed toward the no class: since there are so many nos, it starts predicting records as no even when they could be yes. That is the problem you will have when your data has disproportionate classes. One possibility is to reduce the number of nos in the training data, but to do that you really need a larger data set to start with. You can see the skew is happening simply because the class values are disproportionate: in the iris data set the classes were equal - setosa, versicolor and virginica were one-to-one-to-one - whereas here it is more like nine to one, and that is why you run into this kind of problem. So this is where you go and try various things.
You can try various other algorithms, try filtering the data, try bagging and boosting, and see what helps; those are the kinds of things you do. There is one more experiment we are going to do after this prediction. We saw that it built 500 trees, and for building every tree it has a lot of work to do, because it has to build 500 of them. The question is, do you really need 500 trees? Can you get the same level of accuracy with maybe 100 trees, 10 trees, five trees? That again depends on the data set you have: if the data set is really good, you need only a smaller number of trees, and if the data set is bad, you need more trees. So what I am going to do is go through a loop where I iterate through the values 1 to 50. I am going to build forests with 1 to 50 trees, calling the same randomForest method and passing this parameter called ntree, which tells it how many trees to build. By default ntree is 500, but here I am setting it to 1 through 50. Then, for each forest that is built, I find the accuracy. I am doing everything here in one line, but it is basically the same thing: build the model, do the prediction using the predict function, then call the confusion matrix. The confusion matrix actually gives me a data object that I can query to find the overall accuracy, which is what I pick up here and add to this vector called accuracy. Once I have that, I plot the accuracy, with the tree count on the X axis and the accuracy on the Y axis, and you see how the accuracy increases as the number of trees goes up. What you see here is that at tree counts of two or three the accuracy is low, but somewhere around four it jumps straight up to something like 0.89, then 0.90, and then keeps going like that. Also remember that this data has a disproportionate number of classes, so that might be influencing these overall results too. Whenever you use random forest, it is good to do this kind of exercise to find out how many trees you really have to build, because there is no point building five hundred trees if a hundred trees, or even fifty, are enough to make good predictions; every tree is going to cost you more resources in terms of CPU and memory. So this is a view into random forest. Random forest is a pretty accurate method when you have a lot of predictors, but watch out for the amount of time it takes to run the algorithm, and the best way to control that is to find out how many trees you really need to build for the given data set and the given model. Thank you.

13. K Means Clustering: Hi. In this lecture we will be looking at what we call k-means clustering. Clustering is an unsupervised machine learning technique in which the goal is to group data, and k-means clustering is a popular method of grouping data into subsets. You can group data into sets of three or sets of four based on the similarity between the records, and the similarity between records is basically the similarity of the values of their variables. So there are no separate predictor and target variables here.
Everything can be considered a predictor variable, and we look at the similarity between the values of those variables to determine how the groups should be formed. So in k-means clustering, suppose you have n observations, or rows, in your data set and m variables, or columns. You then group them into K clusters, and you group them in such a way that every observation, or every row, is finally put into one and only one cluster. If every row represents, say, a customer, you go and create K clusters, where K can be five or six or whatever value, and every row, or every customer, is assigned to one and only one cluster. That is called k-means clustering.

How does it work? In k-means clustering you create an m-dimensional space, where m is the number of variables or columns, and you plot each record in that space based on the values of its variables. You plot every point in that space and then you do the clustering using what are called distance measures between the points: in that m-dimensional space you measure the distance between the points and then use those distances to group the data. In the next slide we are going to see exactly how k-means clustering is done with an example. There are multiple types of distance measures available for calculating the distance between point A and point B. Some examples are the Euclidean distance and the Manhattan distance. Euclidean distance measures the distance as the crow flies, whereas Manhattan distance works like a step-by-step, block-by-block kind of distance. There are a number of other distance measures also available, but the most popularly used one is the Euclidean distance: it is just like drawing a straight line between two points and using that to measure the distance between them.

So how does the clustering work? Let us go through it stage by stage. In the first block, consider that the data set we are talking about has only two columns, so it is a two-dimensional space that is easy for you to visualize; that is why we are considering just two variables. In this particular chart I have an X axis and a Y axis, and I plot the points. Suppose the two variables are age and weight, and I am plotting every patient I have based on their age and weight; each of these green dots represents one patient. Now I want to group this data into two clusters. I can choose any number of clusters I want, but in this particular example I am only going to choose two, so I am seeking two clusters here, and then I start the clustering. How do I start? The first stage is that I choose two points at random; I call them centroids. The first time I choose the centroids, I just place them at random, anywhere in this chart. I am choosing these two points here, but I could choose anywhere, and the algorithm typically chooses them at random. Once I have chosen the centroids, the next thing I do is measure the distance of every point to every centroid. Suppose I take this particular point.
I measure the distance between this point and this centroid, and then again between this point and the other centroid. For every point I repeat this process of finding the distance between the point and each centroid. Then what do I do? I assign each point to the nearest centroid. At the end of this assignment, every point has been assigned to one of the centroids: the blue dots are assigned to this centroid and the red dots to the other one. This becomes your set of clusters for round one of the clustering. Clustering happens in many rounds, and in the first round these are your clusters: the red dots form the red cluster and the blue dots form the blue cluster.

Now what do you do in round two, once the clusters are formed? You find the center of each cluster, the centroid of each cluster. The way you find the centroid is to find a point within the cluster such that the total distance from each of the points in the cluster to that center point is minimum. In other words, you are trying to find the true center of that particular group of points. When I find the true center of each set of points, I end up with new centroids; the centroids have actually moved from their original locations to new locations. Now you see these are the new centroids. What happens next? We go and repeat the process: finding the distance between each point and each of the centroids, and then assigning each point to the nearest centroid. When I do this again, some of the points will move between clusters. For example, this blue dot that earlier belonged to the blue cluster would now be moved to the red cluster, and similarly some points that were in the red cluster now become blue. As the centroids move, the distances between the points and the centroids change, and as a result some points move from one cluster to another.

Now I have a new set of clusters, and for these new clusters what do I do? Again I go and find new centroids, again find the distances, and again reassign points. This cycle of finding the distances between the points and the centroids, forming new clusters, and then finding the centroids of those clusters keeps repeating, and the process goes on for some number of iterations. When does the iteration stop? When the points no longer move between clusters. It stops when the centroids have become stable: they do not move anymore, and then the points do not move between clusters either. At that place the clustering process has come to a standstill, and that is where you get the real clusters. In this case the points are nicely spread apart from each other, so within a few iterations you arrive at the final outcome, the final centroids and point assignments. If the points are all intermingled with each other, it may take a lot more iterations before you end up with the real clusters. Not all clustering implementations will go until the very end; they typically stop after some fixed number of iterations.
They have some internal measures through which they figure out when the clustering process is more or less optimal, or complete, and then they stop at that point. But this is the basic mechanism by which k-means clustering works: finding the centroids, assigning points to the centroids, and then reassigning them, with the process repeating. So that is your clustering process and how k-means clustering works.

Now, the advantages of k-means clustering. There are other types of clustering, like hierarchical clustering, and there are other variants of k-means available, but most of them work with the same kinds of concepts. The advantages of k-means clustering are that it is fast and that it handles a large number of variables: even if there are 20 or 30 variables, k-means clustering can work fine. It is also explainable; you can explain fairly easily why points have been assigned to particular clusters. The shortcoming is that you need to know K up front. If you have a set of data, how do you determine the value of K? How do you know whether this particular data set actually has three logical groupings or four logical groupings? You do not know that up front, yet you need to know K beforehand in order to do k-means clustering. One way of overcoming this is to try clustering for a whole range of values: take the same data set and do two, three, four, five, six clusters, and each time measure what is called the separation between the clusters, which is measured by a sum-of-squares kind of logic. You do that repeatedly and then draw a curve of this sum of squares, and wherever there is a knee in the curve is where you determine that the clustering is complete. We will see that example, and how to find the knee, when we look at the use case.

Another shortcoming is that the initial centroid positions have an influence on the actual clusters formed. Wherever you put those initial points, they can sometimes influence the eventual size and shape of the clusters: if you choose centroids starting at different positions, the points might actually end up in different clusters. There are other clustering variants that take care of this kind of shortcoming, but plain k-means by default relies a lot on the initial centroids, and if the points are too close to each other, then the final clusters formed are influenced by the initial centroids that were used.

K-means is often used for preliminary data grouping. A lot of times clustering is used as a preliminary grouping of data: you first group the data into three or four clusters and then do machine learning on the individual clusters and see how they behave. It is also used for general grouping of any kind of data, like grouping documents, finding groups of documents and things like that, or groups of website searches, search text, things like that.
It is also used for geographical clustering, where you have longitude and latitude as the variables and use them to find logical groupings of data, so that you find true centers and true groups of data in a geographical setup. So that is another use of k-means clustering.

14. R Use Case : K Means Clustering: Hi. In this lecture we are going to look at how we can do k-means clustering in R. For k-means clustering we are picking an example based on auto data, and we are keeping the example deliberately simple so that it is easy for you to understand and visualize how the clustering mechanism exactly works. In this use case the input data contains information about cars, some technical and price information about them. The goal of this problem is to group them into four clusters, four logical groups, based on these attributes. The main techniques we are going to use here are k-means clustering, and centering and scaling.

We start with the data engineering: loading and understanding the data set. We set the working directory and load this auto_data.csv file from the resource bundle, and then look at what the attributes of this data set are. There is the make of the specific car, the fuel type, the aspiration (whether it is standard or turbo), the number of doors, the body type (convertible, four-door, two-door), the type of drive (four-wheel drive, front-wheel drive, rear-wheel drive, things like that), the number of cylinders, horsepower, rpm, miles per gallon for city, miles per gallon for highway, and the price. We are going to use this data to group the cars into four clusters, and then we will see how to determine the ideal number of clusters. Taking a look at the summary of the data: the make, the fuel type, all of them look pretty good. You should definitely take a look at this data set yourself and even try other algorithms on it. The head of the data shows you what the data looks like, and there seem to be no cleansing requirements here.

One of the first things clustering needs is for all the numeric values to be in the same range, because clustering is based on distance measures. For that to happen, all the numeric data in this data set, which is things like horsepower, rpm, mileage per gallon city, mileage per gallon highway, and price, should be in the same range. You see that horsepower runs from about 48 up to a couple of hundred, rpm is in the 4,000 to 6,000 range, and price goes up into the tens of thousands. We need to get them all into the same range, and the way we do that is centering and scaling. For centering and scaling there is a method available called scale. We pass those numeric variables, which are columns 8 to 12 of the data, to scale, get back the scaled numbers, and then use them to replace the original columns 8 to 12 in the auto data. Now when we look at the summaries again, horsepower, rpm and the rest are all roughly within a minus-three to plus-three range, and price has come down to the same kind of range too. You can see that the scaling has happened: centering and scaling has brought them all down to pretty much the same range.
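As a rough sketch of that step (assuming the data frame is called auto_data and that the numeric variables really are columns 8 to 12; the names here are illustrative, not the exact script):

    # center and scale the numeric columns so they land on a comparable range
    auto_data[, 8:12] <- scale(auto_data[, 8:12])
    summary(auto_data[, 8:12])    # means near 0, values roughly within -3 to +3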
So this is the data you have. Next we do some exploratory data analysis to look at whether there are outliers or any kind of errors in the data. We are going to do a number of box plots, typically one for every variable. We are not going to predict anything here; there are no predictor variables. We are just trying to see what kind of range of values we have for each variable: horsepower, rpm, miles per gallon city, mpg highway, and the price. Since we have centered and scaled all of them, we can actually put them side by side and compare them, because they are all now on the same scale. You see how horsepower, mpg and the rest look in the box plots: a decent spread in all of them, and outliers in price. There are actually a lot of outliers in price; maybe there are some costly models in there.

What we are trying to do here is group the cars into clusters, and when you do clustering, outliers can create an issue, because an outlier sitting somewhere far away can start forming its own small cluster. If I am trying to make four clusters, there is a possibility these outliers might influence that: one dot far away basically creates its own cluster, and the rest of the points then have only three clusters left to group themselves into. Those kinds of outliers can be detrimental to the clustering problem. But given that we have so many outliers, it is okay for us to keep them and see how the clustering comes out. If you end up seeing that the clustering is not that good, in other words, when you create the clusters and look at the number of members in each one you see that one or two clusters have only very few members, that may be due to outliers, and you may want to go back to your data, clean out the outliers, and try clustering again. All these machine learning algorithms are trial and error: you iterate by making modifications and see which version gives you the best output.

Let us go to the actual cluster building. The first thing I am going to do, just so that it is easy for us to visualize on a two-dimensional plot, is build the clusters with only two variables. Purely for visualization, so we can clearly see the points laid out on a nice two-dimensional plot, I am going to pick 100 samples and use only horsepower and price to create four clusters, just so that in this first round we can take a look and see how exactly the clustering works and what it looks like: only 100 samples and only horsepower and price. So here I load a library called class and use the kmeans function for k-means clustering. In k-means clustering it is important to set your random seed, because the initial positions of the centroids, as we talked about during the lecture, are chosen at random.
The algorithm uses a random number generator, basically the system's random number generator, to pick random numbers and choose these initial centroids. So if you want repeatable results, which is to say that every time you execute this code you end up with the same clusters, you set the seed explicitly so that the same seed is always used to choose the initial centroid positions; once those initial positions are the same, the rest of the clustering process will also be the same. You can also try not setting the seed: it changes the initial centroid positions, which might actually influence the groups that are formed if the data does not segregate itself very well into logical groups. So it is always good to set the seed up front to some number, so that the initial centroid positions are always the same.

We choose a subset of the data, the first 100 rows and the horsepower and price columns (columns 8 and 12), and then we simply run kmeans on this data to create four clusters. Very simple. Then you print the clusters object, and it gives you the actual information about the clusters that were formed: k-means clustering with four clusters of sizes 14, 45, 28 and 13. There are four clusters found with these sizes, which looks pretty okay; all the clusters have a good, sizable number of members. If one cluster had only, say, one member, you might wonder why; that may be an outlier sitting somewhere outside that is influencing your clustering process. So 14, 45, 28 and 13 looks okay. The cluster means are the means of the clusters, which are the centroids: these points are the center points of your clusters. Then it gives you the clustering vector, which, for each of the 100 records, tells you which cluster that record belongs to: the first car belongs to the second cluster, the next to the first cluster, the next to the second cluster, and so on. It is just telling you which centroid each record was assigned to.

Then you come to what is called the sum of squares of the clusters. It comes up with a formula, between-cluster sum of squares divided by total sum of squares, and what that shows is how much cohesion there is within the clusters. In other words, you want a lot of cohesion within a cluster and very little cohesion between clusters. To repeat, you need a lot of cohesion within a cluster and very little cohesion between the clusters, and the best way to measure that is this formula, which comes out as a percentage. The higher the percentage, the better the clustering. I do not want to get too deep into the formula, but what you need to know is that this value ranges between 0 and 100, and the higher the value, the better the clustering; 87 is a really pretty good value. Now, since we clustered on only two variables, let us go and plot it and see how it looks. I am going to plot the horsepower on the X axis and the price on the Y axis,
coloring each point with the number of the cluster it was assigned to. I am using a dot as the point type, setting the size of the point to 2, and then adding the cluster centers to this plot in purple. What you see here is the horsepower here, the price here, and then the clusters: each cluster is colored differently, and the cluster centers are the triangles you see there. These are the clusters that were formed, and you can see that the data has nicely grouped itself into four sets. So that is clustering explained with just two variables.

Now let us go and do the clustering for all the data. Clustering only takes numeric data, so for the first eight columns I am going to convert them into numeric variables. To convert them, I simply loop through columns 1 to 8 and apply as.numeric to each column of the auto data, so each of them is read into numeric data. Now, in the summary of this data, you see that make, body type and everything else has been converted into a numeric equivalent: wherever there was text, or factors, it is converted into its ID form (remember the ID-and-name concept), so everything is now a number. Once you have everything in this numeric format, you go and do the clustering on the full data. So I do k-means clustering on the auto data, and here I am only choosing a handful of columns, columns 7 to 12, to limit the variables; you could actually have clustered on all of them, but in this example I am focusing on columns 7 to 12 only and creating the clusters. This again gives me the clusters, and the cluster centroids are again given here; you have several variables now, so it is a multi-dimensional centroid that it comes up with. The goodness ratio is not as high this time, around 60 percent, but that is still reasonably good, and those are the clusters it gives me. So this is how k-means clustering works; you can try different cluster counts and different variables and see how the clusters come out.

One of the biggest challenges in k-means clustering is finding out how many clusters I really have in my data. Is it four? Is it ten? Is it just two? How many logical clusters does this data actually group into? Since k-means is a process where you have to give the number of clusters as a prior input, it is difficult to come up with that number. The only way to find out is to actually run the clustering process with many different cluster counts: try one cluster, two clusters, three, four, and then look at this sum-of-squares value and at something called the knee that the curve takes. So what does this knee look like? Let us take a look.
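Here is a rough sketch of that check, written from the description that follows; the data frame name and the choice of columns 7 to 12 are assumptions for illustration, not the exact course script:

    # run k-means for k = 1..15 and record the total within-cluster sum of squares
    set.seed(100)
    wss <- numeric(15)
    for (k in 1:15) {
      wss[k] <- kmeans(auto_data[, 7:12], centers = k)$tot.withinss
    }
    plot(1:15, wss, type = "b",
         xlab = "Number of clusters", ylab = "Total within-cluster sum of squares")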
So 1 to 15 is going to try anywhere, all the way from one cluster to 15 testers one by one on each time it does this clustering process, it is going to come find this water is called the giddiness off the cluster which is a part off the clustering process. So it is going toe duty clustering and get this within us value on it is going to plot this within its value against the number of clusters. So what happens? No. So here you see the number of clusters here and then once I do, my came in clustering on. I get the witness value from the cluster object and I'm plodding it here. This graph looks typically like this, but just use either It start the within its value starts from where very higher on then straight of a ghost really Don't And then somewhere it takes a knee. So it somewhere it turns from south to best. So that is where what we call us. Any. So it seemed to have taken and me on this value off three. So what that means is wherever it takes to me Me. So what? What What does this kind of telling us? As you keep increasing the number of clusters first, what happens is it starts getting into more and more logical grouping. So within us will start dropping drastically once it has achieved that logical grouping. After that, you're just creating artificial splits. And that does not change the between us that much. You're dis trading artificial spread, So if you look at year until the value of three drops really smooth, really big. And then here it takes and me are pigs goes a goes eastward, so from south and goes to east. This is the point in which afterwards every other new Costa that has been created are more optimistic clusters. What does justice use? Three is the ideal number of clusters. Three is the optimal logical number of cluster that this data set has. So this is how you find the logical are optimal number of clusters in a given data set. So this is our came in clustering works. It is very powerful for doing any kind of grouping you can group like our documents you can group customers based on their attributes is a financer has got a wide variety of functions on, as you see that are as very simple functions That makes its use very, very easy and simple. Thank you 15. Association Rules Mining: Hi. In this lecture, you are going to be looking at what we call us Association drools mining, which is a popular clustering techniques typically used in a lot off retail business. So what is association dolls? Mining in association Does mining? You're trying to find things that typically occur together. There is a set off items at a set of things that typically occur together. You're tryingto find the things that most frequently occurred together. For example, in a supermarket you're trying to find items are food items that are frequently bought together, like like milk and eggs, bread and cheese, or bread and jam, things that have very frequently bought together. Why do you want to find them? The supermarket want to maybe stock those items bought together usually their stock together, so that when somebody buy something, it is easy for them to buy other things that they also intend to buy. It is also used to find fraudulent transactions. So what? Why do we wasted you? So finding fertile and transactions is that there are certain patterns that occurring fraudulent transaction. Suppose I have like 50 variables about transactions are created card transaction that happening, fraudulent transactions typically have a pattern that typically there are things like their ages. 
so-and-so, the location being such-and-such, the time of day, and so on. Things that frequently occur together like this are identified using association rules mining. It is also used for what is called frequent pattern mining. What is frequent pattern mining? When you look at a data set, let us say a data set of patients, there are certain things that occur together, like when column one has a value X, column two has a value Y. There are relationships between the columns that keep happening: whenever somebody's age is less than 30, it is also found that they do not have diabetes, it is also found that their income is less than 50,000, and so on. You see that when a particular value occurs in one column, the other columns for that record will have similar kinds of values, or they always occur together. The idea is to find things that occur together, and whenever you have that kind of challenge, association rules mining is the solution for it.

It is also used to predict the next word. When you look at search engines and you want to predict the next word, you start typing a word and the search engine suggests the next possible set of words; that also probably comes out of association rules mining, because you mine the data to find words that frequently occur together. When somebody starts typing, the prediction then becomes easy: the engine looks at the scores of the words that occur close together and prompts you while you are typing the string you want to search. It is one of the unsupervised, clustering-style techniques. One assumption association rules mining makes is that all the data is categorical; it needs all data to be categorical, not continuous, for it to work. So if you have numeric data, you need to convert it into categorical data, with bins and things like that, before you can pass it on to association rules mining. It is also popularly called market basket analysis, given that it is popularly used in the retail business. When you run association rules mining, it comes up with a set of outputs called association rules, and these association rules can then be used by the business. Let us see what these association rules are in the next slides.

The input data that goes into association rules mining is a different kind of data set. When you look at market basket transactions, an association rules mining algorithm takes as input a file that has transactions: each line contains a transaction, and that transaction has possibly a transaction ID and the items that occurred in the transaction. Typically it will look like this: transaction one has bread, cheese, milk; transaction two has apple, eggs, yogurt. That is what the input data to an arules algorithm looks like: a transaction ID, then, separated by commas, a list of the items that occurred in the transaction. You can also use it for textual data, say bag-of-words data. For every document you come up with what is called a bag of words, the keywords in that particular document. Suppose you are trying to group a set of news articles: you can pick the keywords in each news article and form them into your bag of words,
and that then looks like a transaction. This is what is then given as input to the association rules mining algorithm. When it comes to arules, there is a set of metrics, or measures, used for measuring how frequently items occur together; there are some measures by which the rules are rated. What are these measures? Let us go and explore them. Let N be the number of transactions in your data set, and let X and Y be individual items in the data, items like milk or butter or eggs.

The first measure is called support. Support measures how frequently a combination of items occurs in the data set; it may be one item or multiple items. Support(X) equals the count of transactions with X, divided by N. Support(X, Y), where X and Y occur together, is the count of transactions with both X and Y, divided by N. That is how you measure support. The next measure is called confidence. Confidence measures the expected probability that Y will occur when X also occurs. This is the association probability: every time X occurs, what is the probability that Y also occurs, that they occur together? The formula is Confidence(Y given X) equals Support(X, Y) divided by Support(X). We have already computed support, and this is how you compute confidence. The third measure is called lift. Lift measures how many more times X and Y occur together than expected: there is an expectation of how often they would occur together just on average, and lift tells you how many more times they actually occur together. Lift(Y given X) equals Confidence(Y given X) divided by Support(Y). So that is how lift is measured.

Whenever you give a list of transactions to an arules algorithm, it is going to compute support, confidence and lift for all combinations of items, and then it is going to give you as output a set of rules. The rules with the highest support and highest confidence are the ones that come out as the top rules; it typically gives you all the combinations, all the rules and all the measures, sorted in descending order of support and confidence. The rules essentially state that when one item occurs, the others also occur. When you look at the output of arules, you can form hypotheses and make decisions based on it, something like: when bread is bought, milk is bought 33 percent of the time, or when the word India occurs in a bag of words, such-and-such other word occurs 20 percent of the time. That is what the rules that come out of an arules algorithm look like, and you will see more of that when we look at the use case. Now, the goal of an arules algorithm: when you run an arules algorithm, you specify a minimum level of support and a minimum level of confidence, which is to say you tell the algorithm to go and
find all the rules, or all the combinations that occur, which have a minimum support of X or more and a minimum confidence of Y or more. Typically you might say support is 0.1 and confidence is 0.3, and it will go and look for all the combinations with a minimum support of 0.1, that is, ones that occur in more than 10 percent of the transactions. You have to be careful with the support and confidence levels you give: if the support or confidence is too low and your list of items is large, say 50,000 items, the arules algorithm is going to run forever; it may run into memory issues, crashes and things like that. So you always want to start at a fairly high level for support and confidence and look at the number of rules that are generated. Sometimes the number of rules generated is too few, because the data set itself does not have that many frequently occurring combinations; then you can gradually lower the support and confidence levels until you get a desirable number of rules. But always start at a high level, say a confidence of 0.3 or so, and keep going down from there.

A frequent itemset: the output of an arules algorithm is what we call the frequent itemsets, the things that occur most frequently, where the support is greater than the minimum support level provided. You give the parameter, saying you want all the frequent itemsets that have a support level of X or more, and it goes and does the analysis and comes out with the results. The algorithm typically used for arules is what is called the apriori algorithm. It does its magic internally, and there is typically an implementation of the apriori algorithm available in whatever language you use for machine learning. You pass this algorithm the list of transactions, you provide the support level and the confidence level, and it comes back and gives you the set of frequently occurring rules. That is how arules works.

Association rules mining is a very popular technique. It is used a lot in the retail industry to find things that are bought together, as we talked about, it is used for fraud detection, and it is used in exploratory analysis. Suppose you have 50 different predictor variables and you are trying to go through each of them to understand how they relate to each other. One thing you always do is look for correlations, but correlation is a global-level measure; there can be more granular internal patterns where, when X has one particular value, Y has another particular value, that kind of closeness between two variables. Those kinds of relationships can be discovered using association rules mining. We will also see an example of that in the use case that follows. Thank you.

16. R Use Case : Association Rules Mining: Hi. In this example we are going to look at association rules mining, where the problem statement is based on an accident data set. On that accident data set we are going to use association rules mining to do frequent pattern mining. Association rules mining is also used for market basket analysis, but you can pretty much find any number of market basket examples on the web; a lot of those examples come up whenever you go searching for arules examples.
So I chose to use a different example to demonstrate the market basket capability, along with an additional capability of converting regular data into market basket transactions. If you go and look for arules examples you will find a lot about the regular market basket, talking about milk and eggs and butter all the time, so I am trying to use a different example here. In this case the problem statement is that I have a data set with information about 1,000 fatal accidents, and there are a number of variables associated with each accident. What I am trying to find is frequent patterns in these accidents: what kinds of conditions always occur together. The variables describe the conditions: what kind of weather there was, what day of the week it was, what time of day it was. I am going to find which kinds of patterns typically go together, that is, where variable A equals value X and variable B equals value Y most of the time. Those co-occurring patterns are what I am trying to find here. The techniques I am going to use are association rules mining, as well as converting feature data, meaning regular table-style data, into the basket data format.

The data I am going to use is a file called accidents.csv, available in your resource bundle. I load it up into this variable called accident data and then take a look at the structure. It shows you the variables: the police force that attended, the severity of the accident, the number of vehicles involved, the number of casualties, the day of the week, the local authority or district where the accident occurred, the type of road, the speed limit, the type of junction, pedestrian crossing facilities, light conditions, weather conditions, road surface conditions, urban or rural area, and whether a police officer attended the scene of the accident. I am trying to find the most frequent, common patterns in this whole data set. I could have gone and done exploratory data analysis manually, trying to compare each variable with every other variable, but a correlation coefficient does not give you these kinds of fine-grained things; the correlation coefficient is more of a global, 'as X increases, Y also increases' kind of number. Here I am trying to find the most frequently occurring values and which combinations of values occur together most frequently, and you do not get that from those kinds of analysis; you need something like this to find the most frequently occurring patterns.

Looking at the accident data: the accident index is an ID (the IDs start from some large number), and then you have all the other variables, police force, severity and so on. Pretty straightforward data, nothing looks noisy here, so we just go with the data as it is. Doing a head of the data: again, pretty straightforward stuff. The first thing I am going to do is convert this data, which is a regular table, into what we call market basket data. How am I going to convert it? This is the destination format I want to convert into: a CSV where every row represents a transaction, and the transaction has a transaction ID and then what you call the market basket, the list of items in the basket.
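Just to make that target layout concrete, a couple of illustrative lines would look something like this (the item names and values here are made up for illustration, not taken from the actual file):

    1,police_force=1,accident_severity=3,number_of_vehicles=2,day_of_week=5,weather_conditions=1
    2,police_force=4,accident_severity=2,number_of_vehicles=1,day_of_week=6,weather_conditions=2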
The way I am going to convert this data is to turn each column into a name=value pair. So instead of a plain column value, say for police force, I convert it into something like police force = 1, accident severity = 3, number of vehicles = 3. Each of these name=value pairs becomes an item, so you end up with a comma-separated list of the items in the basket. That is how you convert a regular table into the market basket, or arules transaction, format: a transaction ID followed by the items in that transaction. To do this kind of conversion I have written my own code. The code basically walks through every row, iterating through every row in the data, and for each row it iterates through every column, builds the name=value pairs, and builds the whole string for that record. It builds the CSV content in memory and then finally writes it out to the basket file. It takes care of the commas, the newline characters and things like that, so you can go through this code in detail; it is just regular code that writes a CSV file out of this data. I showed you what the final format looks like, and that is saved in this file called accident basket dot csv.

Now that we have converted the regular data into market basket data, let us start doing some analysis. For reading and analyzing transactions there is a library called arules, the association rules library, which is what we are going to use. We load up this library and do a read.transactions. When you do read.transactions, it expects the data in the transaction format, a transaction ID followed by the list of items; it picks up that file and loads it into the variable accidents. Once I have loaded it into accidents, I can run summary on accidents, and it shows me a summary of the data that was read from this transactions file. There are about 1,000 rows in there, that is what it shows, along with the most frequently occurring items. It starts with the item for the police officer attending the scene of the accident being equal to 1, which occurred in 902 transactions; 902 times this single pattern, this single value, occurred. The same goes for the next highest, and so on down the top items. You can also look at the same data by doing what is called an item frequency plot. The item frequency plot shows you the top 10 items, or the top 20, whatever you ask for. You call itemFrequencyPlot on the accidents transaction set, telling it topN = 10 to show the top 10, type 'absolute' to show absolute counts, a colour of dark green, and horiz = TRUE to plot it horizontally; otherwise it will plot vertically.
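A minimal sketch of those two steps, assuming the basket file written above is called accident_basket.csv with the transaction ID in the first field (the exact file name is an assumption):

    library(arules)

    # read the basket-format file: one transaction per line, items separated by commas
    accidents <- read.transactions("accident_basket.csv",
                                   format = "basket", sep = ",", cols = 1)
    summary(accidents)    # number of transactions and the most frequent items

    # top 10 items by absolute count, drawn horizontally
    itemFrequencyPlot(accidents, topN = 10, type = "absolute",
                      col = "darkgreen", horiz = TRUE)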
So now you see the most frequently occurring items. It starts with the police officer attending the scene of the accident equal to 1, which happens in around 900 transactions, and it shows you the most commonly occurring single-item patterns: accident severity equal to 3 seems to be among the highest, road type equal to 6 seems to be pretty high, number of casualties equal to 1 is among the highest, and one particular speed limit value is by far the most common; no other speed limit shows up here. As you start looking at this, you start understanding the kinds of patterns in this data. Now, this is just single items, looking at one item at a time and seeing how often it occurs. Next you want to start looking at combinations of items that occur together, the combinations of conditions that go together. For that we find the rules, stored in this object called rules, and the rules are found using this command called apriori. You pass it the accidents data and then tell it the support and confidence levels you want to look at. What you are saying is: find only those patterns where the minimum support is 0.1 and the minimum confidence is 0.3. We looked at what support and confidence mean in the concepts lecture, so you are only going to be looking at rules that meet those thresholds. If I give these values too low, the algorithm can go for a toss; it may just go into a spin because it is trying to analyze too many different combinations, and it can end up running out of memory and things like that. So you want to start the confidence and support at a fairly high level and look at the total number of rules generated; if the number of rules generated is not enough, then drop the support and confidence. Otherwise, 0.1 and 0.3 are typically pretty good. The lower the values you give, the more time this is going to take to run, because it has to find many more patterns when you give it very low support and very low confidence.

Once it runs, it gives me the rules and prints the output of how the algorithm ran. Once the rules are generated, you can actually look at what they are by calling inspect on them. I am only going to inspect rules 1 to 40 (you can inspect all the rules if you want), and they come back in descending order of support and confidence. Let us look at what this looks like. The first set of rules are basically the single-item rules, single items only, which we have kind of already seen; road type equal to 6, for example, has a support of around 0.75 and the same value for confidence. If there is only one item, the support and confidence will be equal and the lift will always be one. Then the story starts with the multiple-item rules. In this one, what it means is that when the day of the week equals 5, the police officer attended the scene of the accident (equal to 1), with a support of 0.1, which is 10 percent of the transactions having this combination, and a confidence of 0.91, 91 percent. In other words, when the day of the week equals 5, 91 percent of the time a police officer attended the scene of the accident. You read it like that, and then you start finding more and more patterns and looking at some interesting ones. Look at this one: it says that when the weather condition equals 2, which is maybe a cold or snowy kind of weather, the accident severity equals 3. So accidents of this particular severity happen with this specific weather condition, and that immediately gives you some insight.
It tells you, okay, this kind of weather seems to be the most problematic. What this means is that whenever this weather happens, there have to be more safety precautions, more traffic precautions that have to be taken, or maybe in places where this kind of weather condition occurs you need more safety measures, maybe level crossings or signals. That is for the police department to figure out, how they can minimize these accidents by doing something, but this rule gives you a good indication of what is happening: the severity of accidents seems to be pretty high when this specific condition occurs. So you start reading through these rules, finding some interesting patterns, and then you can start making decisions about what you want to do with them. These are the patterns you see, and this is also how you would do market basket analysis. If you had done a regular market basket, the items would be things like milk, eggs and bread, you would look at how many transactions each occurred in, and the combination rules would say things like 'when milk is bought, eggs are bought 90 percent of the time'. But that example, as I said, you will find all over the internet; just go look for arules examples and that is what you will find. So I tried to use a new example here to give you a different experience. Do explore the regular market basket examples on the web too, and go try multiple support levels and multiple confidence levels and see how the algorithm behaves differently. So that is what we have for association rules mining, market basket analysis, or frequent pattern mining. Thank you.

17. ANN and SVM: Hi. In this lecture we are going to look at two advanced machine learning techniques, called artificial neural networks and support vector machines. These two techniques are what are called black box methods, and the reason they are called black box methods is that they look like a black box inside which some magic happens: you give them the input data, and they magically do something and come up with the predictions. It is not that easy or simple to explain or understand how artificial neural networks or support vector machines work; they typically need some solid understanding of foundational computer science and mathematics, based on which you can then build up and try to understand how these methods work.
The relationships of fuzzy data is not always correct, and it does not always complete on. It is now a green extended into the use for mission learning. It helps discovery, no complex correlations hidden in date I that works similar to the human brain. It helps discovered pretty complex correlations with, you know, incomplete data and faceted and fussy relationships, all of their dis accorded, for it works very well with my C data and works very well with variable social relationships are not that easy to understand the production part is fast building. The model is slow. Predictions are fast building. The model ist low on it is very easy. Teoh wolf it. It was used in a lot of artificial intelligence situations, off Mission learning, like learning about facial recognition, character recognition. Our sense us and stuff like that Support Vector missions is under the black box mattered. It is again, the inner workings are tricky and complex and difficult to understand. It is called one off the colonel matter. There is something called Colonel Programming or Colonel Mathematics that goes in to explaining all these off stuff, and I'll guard the most, based on what is called Vector German Tree and Statistical Learning. Terry again, you need some basics off these fields before you start toe. Understand? Explain what support Muktar Mission Rector missions do. It can model really complex relationships, and it is very popular for usage in a pattern recognition like facial recognition and text recognition in those kind of areas of machine learning. Not really in the business situations, but in these kind of pattern recognition situations is where support vector missions are usually used and a successful implant applications soft support vector missions happens in Biomet Informatics and a major ignition those kind of areas. And it is used for both and or classifications and regulation problems for both discrete and continuous outcomes. So and they are also pretty popular in these areas in the caves of businesses. Again, these are available implemented in libraries. We just have to pass in the variables and you're going to get the output on doubt on then use the output for your work support. Vector missions obviously take a long time to run because their complex but predictions are fairly a great when you use support vector missions. Eso again. We are not going to get in dept. In in this particular of course, because this is more the begin, of course. But then there are a lot off material that is available to you. If you are going to go through them and understand these ones for the more Thank you 18. Bagging and Boosting: hi. In this section, we are going to be seeing about two ensemble methods called Bagging and Boosting. We already saw an and symbol method like random forest and similar to random forest. Bagging and boosting are also in sambal mattered in that you go on, build multi people models using the same data set, and then you take a boat among these models when you're trying to predict the difference between bagging and boosting is how the is that. How do you take this building's data set for every model that you build that Stoneleigh difference? And we're going to see how, exactly their differ in terms off, the data said that is being selected so bagging it is called bootstrap aggregating, and it is an and symbol matter, and it always uses a base. Classifier based classified like a distant trees, are named by a regression. It always uses the based algorithm on using that algorithm. 
18. Bagging and Boosting: Hi. In this section we are going to look at two ensemble methods called bagging and boosting. We already saw an ensemble method, random forest, and similar to random forest, bagging and boosting are also ensemble methods, in that you build multiple models using the same data set and then take a vote among these models when you are trying to predict. The difference between bagging and boosting is in how you select the data set for every model you build; that is the only difference, and we are going to see exactly how they differ in terms of the data set being selected.

Bagging is called bootstrap aggregating, and it is an ensemble method. It always uses a base classifier, a base classifier like decision trees or naive Bayes or a regression. Using that base algorithm, it does multiple rounds of training and builds multiple models, and prediction is done using each model. So whenever a prediction needs to be made, it is made using each of the models; if there are n models, you produce n results, and then you take a vote among the n results to see which one is the best result.

The way you select the data set for each round of model building is this: for each round of model building and training, you build what is called a bootstrap replicate data set. How do you build a bootstrap replicate data set? If the original data set has m examples, meaning m number of rows, you do n rounds of sampling on the data, and in each sampling round you select m by n examples. Suppose your original data set has ten rows and you do two rounds of sampling; in each round of sampling you select m by n, that is ten by two, five examples each. So you do two rounds of sampling, in each round you get five rows, and then you put those two sets of five rows together to form the data set. The final data set has the same number of rows as the original data set, except that there is a possibility that some values are repeated. We will see how that is done in the next slide.

Suppose we want to run training five times, that is, build five models, on a data set that has eight records. For each round we do two sets of sampling, and that is called sampling with replacement. Why is it called sampling with replacement? When you pull a sample out of the overall population, you put the sample back, so the next time you sample, the row you pulled earlier can occur again. Let's say, for training round one, the record IDs are one to eight. In sampling round one you select 1, 4, 5, 7, and in sampling round two you select 2, 4, 6, 7. The same rows can occur again because you are doing sampling with replacement: you sample and replace it back in the original data set, so you might get the same values back. Then you put sample one and sample two together to create what is called the bootstrap replicate, and as you can see, some of the samples are repeated: row four is repeated and row seven is repeated. This forms your data set for training round one. Now you go on to training round two and repeat the same process, in which you might get another two sets of samples, and with those two sets of values you again form a bootstrap replicate; some values may be repeated, like the value two and the value six here. Then you go build a model. You repeat this to build five different models, and whenever you want to predict, you pass the data to these five different models and take a vote: whichever result occurs the maximum number of times wins.

The thing to note about bagging is that it might produce improved results over the base classifier, the base algorithm, which you typically run only once.
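As a minimal sketch of this idea in R, the ipred package implements bagging with decision trees as the base classifier; the iris data set and the number of replicates here are just stand-ins.

```r
# Minimal sketch: bagging with decision trees as the base classifier.
library(ipred)

set.seed(42)
train_idx  <- sample(nrow(iris), 100)
train_data <- iris[train_idx, ]
test_data  <- iris[-train_idx, ]

# nbagg = number of bootstrap replicates, i.e. how many models are built and voted over
bag_model <- bagging(Species ~ ., data = train_data, nbagg = 25)
bag_pred  <- predict(bag_model, test_data)

# Accuracy of the voted prediction
mean(bag_pred == test_data$Species)
```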
If the base algorithm gives you unstable results, that is, you run the algorithm again and again on the same data set and it keeps giving you different results, in those cases bagging is a better option: you use the same base algorithm and then apply the bagging concept on top of it. It has high resource requirements, and it takes longer to build models, obviously, because it is going to build multiple models. There are various implementations of bagging available, and they all use different base classifiers. Here are some examples: AdaBag; bagged CART, which is bagging using decision trees; bagged flexible discriminant analysis, where flexible discriminant analysis is a regression-based variant of discriminant analysis onto which you apply bagging; bagged logistic regression; and another called model averaged neural networks, which is a variant of neural networks onto which you apply bagging. Please note that among the algorithms available out there, even though the basics are the same, there are a number of variants that keep coming out. A lot of research is going on to produce newer and newer algorithms, and these are typically variants of the original ones, so you will see tons of these algorithms around. How do you know which algorithm is best for your use case? Simply by trial and error: use an algorithm and see whether it predicts better. So you will see a lot of these algorithms, but don't worry about them, because all you have to do is choose the algorithm, call it with the parameters, and it is going to do the magic for you.

The next thing we are going to look at is boosting. Boosting is also very similar to bagging: it is an ensemble method, and the only difference between bagging and boosting is how you come up with the data set for the training process. It again creates multiple models, prediction is again done with multiple models, and then you take a vote on the results to deliver the final prediction. In this case, the difference is that you assign something called a weight to each sample. Your data set contains a number of records, and each record is given a weight. Typically you start with all records having a weight of one, and then, as you keep going and building models, you keep increasing the weights of some of the records. How do you increase the weight of a record? You can simply duplicate the record. Suppose the data set has eight records and you want to increase the weight of the third record: you simply duplicate it, which means you end up with nine records. When that particular record gets duplicated, the values of the variables in that record get a higher weightage because they occur more times, and typically that will influence the machine learning algorithm a lot more. That is how you increase the weight of a particular row or a particular sample. And how is the weight used? As you do multiple rounds of training and prediction, whenever misclassification happens, you increase the weight of those misclassified records. How does that work? Let's look at it in the next slide. So again, there are multiple rounds of training, and you start with the weights of all records being equal in the first round.
So you don't take a subset; you take all the records, with the weights of all the records equal, and you go build your first model. Once you build your first model, you find the in-sample error of the model you built. What is in-sample error? You use the model to predict the training data set itself and see how many of the records are wrongly predicted. If a record is wrongly predicted, that means it was not modeled well enough in the model that was built, so you go and increase the weight of those misclassified records; you increase the weight, like I said, by adding a duplicate of that record, and now you have a new data set with the newly added records. That data set becomes the input for the second round of modeling. You do the modeling again on the new data set, again find the in-sample error, find the misclassified records, increase the weight of the misclassified records, and then go on to round three, round four, round five. As you keep building these models, in each round you keep increasing the weight of the misclassified records, so each model that is built uses a different data set, with different weights for the misclassified records. You ultimately end up with a number of models, and once you have them, the prediction process is the same as bagging: you run your input through the multiple models, take a vote on the results, and that is your final prediction.

Things to note: boosting has high resource requirements, similar to bagging, and takes longer to build the models, because you are building multiple models. The good thing about it is that you can use a set of weak learners; weak learners are nothing but weak predictors. Suppose you do your initial correlation analysis and you find that the correlation is weak for all the predictors. Boosting may be a good option to try, to see whether you can use a set of these weak predictors to come up with a strong predictor. Where a normal algorithm might not work, boosting can possibly work, so it is a good thing to try if you see that the correlation coefficients between the predictors and the target are not that good. It also handles bias: whenever there is misclassification, those records get more weightage, so the bias the algorithm originally had towards the other rows kind of reduces; it manages bias very well. And again, there are different implementations of these algorithms available, like boosted classification trees, boosted GAM, boosted linear models; a number of variants of these algorithms exist. Which one is the best? You have to try and see, and you basically learn from experience how to use these algorithms. Thank you.
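A minimal sketch of boosting in R is shown below, using the adabag package, which implements an AdaBoost-style algorithm that reweights misclassified rows between rounds; the iris data set and the number of rounds are just stand-ins.

```r
# Minimal sketch: boosting with the adabag package.
library(adabag)

set.seed(42)
train_idx  <- sample(nrow(iris), 100)
train_data <- iris[train_idx, ]
test_data  <- iris[-train_idx, ]

# mfinal = number of boosting rounds, i.e. how many models are built
boost_model <- boosting(Species ~ ., data = train_data, mfinal = 20)
boost_pred  <- predict(boost_model, newdata = test_data)

# The prediction object carries the voted class labels
mean(boost_pred$class == test_data$Species)
```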
19. Dimensionality Reduction: Hey, in this lecture we are going to look at what is called dimensionality reduction. So what is dimensionality reduction, and what are dimensions? Dimensions, in this case, are nothing but predictors: the number of predictor variables you have in the data set is what we call dimensions. When you have a large number of predictor variables, there are a number of issues associated with them, and it is because of that that we want to reduce the number of predictors.

So what are the issues with having too many predictors? You need a lot more memory, a lot more storage, and more CPU. The time taken for machine learning algorithms to run is a lot more when the number of predictor variables is higher. Then there is correlation between predictors, that is, between the predictors themselves, not between a predictor and the target: one predictor might have a high correlation with another predictor, which means they are dependent on each other, and that might influence your algorithm. Typically you want the predictors to be independent of each other; there shouldn't be that kind of correlation, so those kinds of complexities arise. There is a chance of overfitting, because some predictors will influence the model more than other predictors, and some machine learning algorithms simply don't work well when there are too many predictors.

So how do we reduce the number of predictors, and what are the various options available to reduce them? One thing you can do is manual selection. In this case you use domain knowledge: you know the field, and because you know the field, you can make certain calls and say, this is not going to influence my target. For example, in the medical field, you are trying to predict whether somebody is going to have diabetes or not. There is an attribute for the patient called height, and a doctor may say that the height of a patient has no influence on whether the person has diabetes or not. That is domain knowledge, so you apply that domain knowledge and say, I am going to take height out of my data set, because I know for sure that height is not going to influence the outcome. But you have to watch out: it might be possible that there actually is a correlation and nobody has reported it yet. That is the risk of removing columns or variables without proper consideration. Second, you can look at the correlation coefficients between the predictor variables and the target, and simply throw out those predictor variables that do not have high correlation; that is an easy thing to do, and it is one possibility for throwing out variables based on their correlation with the target variable. A third thing is to use decision trees, and let the decision tree actually choose the predictors. Irrespective of how many predictors there are, you can give them to the decision tree and try to build a model; even though decision trees do worse, or take a lot of time, with a lot of predictors, the final tree that comes out typically won't use all the predictors. If you have, say, fifty predictor variables, the final decision tree might not use all of them; it might only use five or ten. It will only pick enough variables, those that have high correlation or a high tendency to predict the outcome, and use only those to build the tree. So a decision tree can give you some insight about which predictors actually have a high influence on the outcome: you can build a decision tree once, look at which variables the tree actually used, then filter your data set down to that set of variables and use other algorithms to do the final predictions. A quick sketch of these two simpler approaches follows below. The other, more popular, scientific method available for dimensionality reduction is what is called principal component analysis.
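Before getting into PCA, here is a minimal sketch of those two simpler ideas, keeping only predictors with reasonable correlation to the target and letting a decision tree report which predictors it found useful; mtcars is just a stand-in data set with mpg as the target, and the 0.6 cutoff is arbitrary.

```r
# Minimal sketch: two simple ways to shortlist predictors.
library(rpart)

target     <- "mpg"
predictors <- setdiff(names(mtcars), target)

# (1) Correlation of each predictor with the target; keep only the stronger ones
cors <- sapply(predictors, function(p) cor(mtcars[[p]], mtcars[[target]]))
keep <- names(cors[abs(cors) > 0.6])   # arbitrary cutoff, for illustration only
keep

# (2) Fit a tree on everything and see which variables it actually leaned on
tree_model <- rpart(mpg ~ ., data = mtcars)
tree_model$variable.importance
```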
In principal component analysis, you are trying to find those principal components which have a high influence on the outcome. This is a very scientific method used to reduce the number of predictors. A full explanation of all the formulae and concepts involved is, I would say, advanced-level machine learning at this point, so I am not going to go into that, but it is based on what are called eigenvectors and eigenvalues. It involves a lot of complex matrix work, inverses of matrices and transposes of matrices, that kind of thing, before you come up with the result. Fortunately, it is implemented for you in libraries that perform principal component analysis for you.

So what does principal component analysis do? Suppose you have a data set of m predictors. PCA takes this set of m predictors and transforms them into a set of n new predictors. Now, if you look at these n predictors, it will not be possible for you to relate a single value or a single column in the n new predictors to a single column in the original m predictors; it is not possible to connect them, because the data is totally transformed and diffused. You come up with a new set of values and a new set of columns, and the new predictors are the derived predictors, called PC1, PC2, PC3 and so on. The good news is that these derived predictors show high correlation to the target variable, and they come out ordered: PC1 will have the highest correlation with the target variable, the second one will have the next highest, and the third the next highest after that. For each predictor that comes out of PCA there is a score that says how much of the variation in the final target is explained by that variable. PC1 may be able to explain 50 percent of the variation in the target variable, PC2 might explain another 20 percent, and PC3 another 10 percent, so using just PC1, PC2 and PC3 you would be able to explain 80 percent of the variation in the target variable. What that means is that you just pick the top three or top four of these derived predictors and ignore the rest of them; you use just those derived predictors and go ahead with your machine learning and model building exercises. With the derived predictors, you also have to do the same thing when you come to the prediction part: you have to use PCA to do the conversion in order to do the prediction. The new predictors retain a similar level of correlation and predictability, and the good thing is that they come in decreasing order of how much they influence the target variable, so you only pick the top x of those new predictor variables and use them for your analysis. If you look at the use case we have for this module, you will see exactly how this is done.

20. R Use Case : Advanced Methods: Hi. In this lecture we are going to look at the advanced methods we talked about and an example use case for those advanced methods. For that, the first thing I want to talk about is the caret package in R. The caret package is a very useful package in R, in which you can run all the machine learning algorithms using just this one package.
It offers a number of functions we have already seen. It gives you the ability to split between training and testing data, it gives you pre-processing like principal component analysis, scaling and centering, those kinds of activities, and the most important thing the caret package does is that it has taken all the other machine learning packages and put a nice wrapper around them, so that you only have to call one function, in the same way, irrespective of which algorithm you want to use. The algorithm you want to use itself becomes a parameter to that function: you just change the parameter value from decision trees to naive Bayes and you get naive Bayes, change it to another algorithm name and you get that algorithm. With all the other algorithms we saw, each of those function calls had different ways in which you had to call the function and different ways in which you passed the predictor variables and the target variable; sometimes they go in the same parameter, sometimes it is something separate, so all kinds of confusion happens there. Here you have only one way of doing things, and the machine learning algorithm itself is a parameter.

Given that it makes the machine learning algorithm a parameter, what are the various algorithms it actually supports? For that, you can go and look at the train model list, and what you see is a really exhaustive list of the models it supports. You have learned so far about, what, four models, and here you see something like two or three hundred of them. Don't be alarmed by looking at so many algorithms and saying, oh, I don't know all these algorithms. Remember that all of them are just variants of what we have already seen. We saw the base algorithm, let's say classification trees; all these other algorithms are implementations where people have taken the base algorithm and tried to tweak it to make it better for something here, something there. These are people doing research, in their PhD theses, in universities; they keep coming up with new algorithms for different use cases, but they pretty much use the same basic concepts. And given that we are focused on practice, not theory, all we have to know is that this set of algorithms exists, and all we do is try different algorithms and see how they behave. For example, take something like classification trees, and let us look at all the algorithms that mention trees. You see that there are boosted classification trees, which is boosting applied on classification trees; then you see random forest by randomization, and there are other tree types: stochastic gradient boosting, C4.5-type trees, logistic model trees. There are many different tree algorithms. Similarly, for every base algorithm you will find something like ten different variants of that base algorithm, so you don't have to worry about it, and I will also show you an easy way by which you can try all of them and see which one suits your use case best; a small sketch of the idea follows below.

With that, let us move to the example for the advanced methods, and here we are going to be looking at breast cancer data.
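Before the use case, here is a minimal illustration of what "the algorithm is a parameter" means in caret. The iris data set is only a stand-in, and the two method strings are just examples; any name from caret's model list could be swapped in (for instance "nb" for naive Bayes, which needs the klaR package installed).

```r
# Minimal sketch: the same train() call, with only the method string changing.
library(caret)

set.seed(42)
fit_tree <- train(Species ~ ., data = iris, method = "rpart")  # a decision tree
fit_knn  <- train(Species ~ ., data = iris, method = "knn")    # k-nearest neighbours

# Same call, same kind of fitted object; only the method string changed
fit_tree$results
fit_knn$results
```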
What you have here is a set of observations made about a set of breast cancer patients: observations and diagnostic measurements that were performed on them, the values that came out, and finally whether that patient was benign, that is healthy, or malignant, that is possibly having the disease. Looking at these parameters, you are trying to figure out whether a patient might have the disease or not. The techniques we are going to use are principal component analysis, a training and testing split, and the confusion matrix, and we will look at neural networks, support vector machines, bagging and boosting, all of them in one shot, because we will be using the caret package for all of this.

First we load the data; the file is called breast_cancer.csv in the resource bundle. You look at the data, and you see there are thirty-odd different variables in there. There is the ID, the ID of the patient, which we can simply ignore, and then the diagnosis, which tells you whether it is benign or malignant: benign, okay, they don't have the disease; malignant, they have the disease. And you have around thirty different predictor variables in there, things like radius, area, smoothness, compactness; these are measurements taken on these patients from some kind of examination. So obviously the number of variables here is huge. We do a summary of the data; again, thirty different variables, and it all looks okay, you can inspect it on your own when you do your analysis. Similarly, head gives you the same kind of picture. The data looks pretty straightforward, except that there are a lot of variables.

Let's go down and do the correlations. Again, thirty different variables, so I am going to break them into sets of ten and study them. I plot the second variable, which is the diagnosis, against the next ones, variables three to ten, and see how the diagnosis correlates with the rest of them: 0.70, 0.42, 0.74, kind of a medium-to-okay range. Then I do the next set of ten, variables eleven to twenty; this again looks pretty okay, and then the rest of them again look kind of okay. So what do you do with thirty different variables? You have to get into variable reduction. You can use the earlier method, where you visually inspect them and figure out which variables you want to manually remove, or you can use principal component analysis.

Principal component analysis is like magic. It is going to come up with a new set of variables, and this new set of variables explains the old set of variables. When I say explain, it means the patterns in the old set of variables are captured in the new variables, and that is done using eigenvectors and eigenvalues and the mathematics underneath. So how does it work? The first thing you do for principal component analysis is scale the data: scale the cancer data using the scale function. Then there is a function called prcomp that does principal component analysis on the scaled data. For all the complexity we have been talking about in the math of these algorithms, this step is pretty simple, because it is already implemented for you; you just run this prcomp command on the scaled data.
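A minimal sketch of that scaling plus PCA step might look like the following; the file name, the data frame name cancer_data, and the column positions (column 1 = ID, column 2 = diagnosis, rest = predictors) are assumptions about the resource bundle, not the lecture's exact script.

```r
# Minimal sketch: scale the predictors and run principal component analysis.
cancer_data <- read.csv("breast_cancer.csv")

# Scale only the numeric predictor columns (drop the assumed ID and diagnosis columns)
scaled_predictors <- scale(cancer_data[, -c(1, 2)])

# Principal component analysis on the scaled data
pca_result <- prcomp(scaled_predictors)

plot(pca_result)      # variance captured by each component
summary(pca_result)   # proportion and cumulative proportion of variance explained
```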
Later on, you get this data out; call it the PCA data. The PCA data again has a set of variables which capture the patterns in the original set of variables. How well do they capture them? Let us see. If you do a plot of the PCA data, these are the variables that come out: they are called PC1, PC2, PC3, PC4, principal component one, two, three, four, and each of these new variables captures patterns from all of the old thirty variables. Each of these variables explains the variance in those old variables to a different degree. The first variable, PC1, explains a really high amount of variance; the next one explains some more, and the amount of explanation each new variable gives about the old variables keeps decreasing like this. The one at the top, PC1, explains the most, then the next, then the next, something like that.

Now, if you do a summary of the PCA data, you will see how much of the explanation really happens. Look at PC1: the proportion of variance explained by PC1 is 0.443, which means 44 percent of the patterns you see in those old thirty variables are explained by this one variable alone. The same way, go to PC2: 19 percent of the variation you see in the old set of thirty variables is explained by that one variable alone. And here is the cumulative percentage: 44 plus 19 is 63, and by the time you get to the fourth variable you see that 80 percent of the variance in those old thirty variables is explained by these first four principal components. So with just four variables you are explaining most of the patterns you see in those old thirty variables; that is sort of the magic here. Even though PCA reduces the number of variables, it is still able to explain 80 percent of the variance you see in the old variables. That tells you that even though PCA came up with thirty new variables, I don't have to pick all thirty of them; maybe if I just pick the top four or five, they explain enough of the variation for me.

So that is what I am going to do: I am going to take the first three variables alone and make my final data set from the first three variables alone. The values of these variables are again just numbers; if you inspect them, you will see some numbers in there. You create a new data frame, pick the first three principal components alone, put them in the final data frame, and then you add the diagnosis, the one that needs to be predicted, the target variable; you add that also to the final data. Now you do a pairs panels plot and see what comes up. The first thing you see is the diagnosis: it has a very high correlation with PC1 because, as we know, PC1 is able to explain about 44 percent of the patterns, and that shows really good correlation to the diagnosis value. But what is most important to see are the zeros here: there is no correlation between the predictor variables at all. That is one of the greatest things that comes out of PCA: each of the new variables that come out has correlation to the target.
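Continuing the assumed objects from the previous sketch (pca_result and cancer_data, with the diagnosis assumed to be in column 2), keeping the first three components and checking the correlation panel might look like this; pairs.panels comes from the psych package.

```r
# Minimal sketch: keep the first three principal components and inspect correlations.
library(psych)

final_data <- data.frame(pca_result$x[, 1:3],               # PC1, PC2, PC3 scores
                         diagnosis = factor(cancer_data[, 2]))  # target variable

# pairs.panels works on numeric columns, so code the diagnosis numerically for the plot
pairs.panels(data.frame(final_data[, 1:3],
                        diagnosis = as.numeric(final_data$diagnosis)))
```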
At the same time, they capture the patterns in the original old variables, but they do not have any correlation among themselves. That again is brilliant, because it helps a lot in the machine learning world: these three predictor variables are totally independent of each other, so this is really good. With just three variables we are able to explain the patterns you see in the diagnosis; we converted thirty variables into three pretty quickly.

Now that we have converted thirty to three, the next thing we are going to do is modeling and prediction using the caret package. The first thing is to load the package. Then you do the training and testing split, seventy to thirty, similar to how we have done in the other examples, training and testing, and going down you will see that the proportion between benign and malignant comes out almost the same in the split.

We are going to use four different algorithms. What I am going to do here is this: I have a piece of code, and what this code does is model building and prediction for each of the four algorithms, and for each of the four algorithms it measures the time it took to build the model and predict, and also the accuracy of the model. So I am running a comparison test here between those four models: I am using all four models to predict the breast cancer outcome, benign or malignant, and I am going to compare how these models perform against each other.

Let's take a look at what this code does. First, I just create a vector of the machine learning method names; I am picking the tags for each of these algorithms. There is a bagging algorithm I pick, a boosting algorithm, a neural network algorithm, and then a support vector machine, and these I picked from the caret model list we saw earlier; the actual algorithm value you have to use as a parameter is in the second column, so that is what I picked from there. I picked four algorithms, but you can use the same code with all the other algorithms too. Then I create a final results data frame, an in-memory data set to capture the final results; this data set is going to have the algorithm name, the duration and the accuracy. I am going to loop through this list, one to four, and for each member of the list, I first capture the start time, the current system time, and then build the model. I have one function call, train, to which I pass the target and all the predictors, the data set called training, and I say method equals the name of the method. This is the one training function I call, in which I can pass the algorithm name as a parameter, so by just passing the algorithm name I can try different algorithms with the same train method, and that is what I am doing here: I loop through this list of algorithms and call them one by one. Once I build a model, I predict with the model, I take the confusion matrix on the predictions, and I capture the end time, basically to find out how long this one is taking; I capture the start time and the end time. And then I populate the results; I have this results data frame.
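The lecture's actual script is not reproduced here, but a minimal sketch of the comparison loop just described could look like the following. The four method strings are my assumptions (any valid names from caret's model list would do), and train_data and test_data come from a 70/30 split of the final_data frame assumed in the earlier sketches.

```r
# Minimal sketch: time and score several caret methods with one loop.
library(caret)

set.seed(42)
train_idx  <- createDataPartition(final_data$diagnosis, p = 0.7, list = FALSE)
train_data <- final_data[train_idx, ]
test_data  <- final_data[-train_idx, ]

ml_methods <- c("treebag",    # a bagging method
                "gbm",        # a boosting method
                "nnet",       # a neural network
                "svmRadial")  # a support vector machine

results <- data.frame(algorithm = character(),
                      duration  = numeric(),
                      accuracy  = numeric(),
                      stringsAsFactors = FALSE)

for (m in ml_methods) {
  start_time <- Sys.time()

  # Same train() call every time; only the method parameter changes
  model <- train(diagnosis ~ ., data = train_data, method = m)
  preds <- predict(model, test_data)
  cm    <- confusionMatrix(preds, test_data$diagnosis)

  end_time <- Sys.time()

  results <- rbind(results, data.frame(
    algorithm = m,
    duration  = round(as.numeric(difftime(end_time, start_time, units = "secs")), 1),
    accuracy  = as.numeric(round(cm$overall["Accuracy"] * 100, 2))))
}

results
```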
After each run, you just populate the algorithm used, the total time it took, which is the end time minus the start time, and the accuracy I get from the confusion matrix. The confusion matrix has this "overall" member, which in turn has, as its first element, the accuracy. I just capture that and multiply it by a hundred, because accuracy comes out as point something; I multiply by a hundred and round it off. That is what I do here, capturing the results. When I run this code, the different algorithms will spew out different kinds of output on the console; you can ignore all of that. Then finally I go and look at the final output, and here you see it: algorithm is the type of algorithm used, then the duration it took, and the accuracy of the algorithm. You see that bagging actually took 80 seconds to execute and gave 96 percent accuracy; logistic boosting took two seconds to execute and came in at 97 percent accuracy; the neural network took eight seconds, 97 percent; and the SVM three seconds, 94 percent. You can go and try the rest of the algorithms too and see which one gives what kind of accuracy. Typically, as the data size increases, the ensemble methods like bagging and boosting, as well as random forest, will take more time; neural networks will also take a lot more time. The fastest you would find may be something like decision trees or a regression.

So this is how the algorithms work, and this is how you can use the caret package for doing predictions. It is one package that is all you need. For the rest of the algorithms we tried different libraries just for the sake of it; we could have done all of them using the caret package itself, and I recommend that you try all of them also with the caret package and see how they come out. We also saw how principal component analysis reduces the number of variables while maintaining all the signal that is required. So this is all for the advanced methods. Thank you.