Transcripts
1. Course Overview: Hi. Welcome to the course. This is the course overview lecture, and I'm your instructor, Nizamuddin Sindical. This is a very concise course giving you an idea about data science. To be very honest, this course does not teach you coding, but you will get to know the important concepts of data science. Before listening to the lectures, I highly recommend taking the data understanding test given in the projects. This test is created to check your understanding of data. After that, in the next lecture, we will discuss some common things, like what data science is and why it is needed, myths about data science, and mistakes made by other people who want to enter the data science world. Then we will move on to the conceptual material, where we will discuss types of data, descriptive analysis, data cleaning, feature engineering, and an important topic: how to develop data thinking. After that, in the last section, we will discuss a real-life problem you are facing and how that problem can be solved using data science. We will define a problem that is related to you, then understand the use of algorithms in data science and figure out which algorithm can be used to solve that problem. Then we will discuss how to make predictions and different learning methods. These concepts are very useful for a beginner in data science, so you must listen to each and every lecture very carefully. A huge thank you for enrolling in this course. Take the data understanding test now. I will see you at the next lecture.
2. Data Science in Layman Language: Welcome, everyone. In this lecture, we will discuss the technical definition of data science and take a look at each of its components. So let's start with the definition. Data science is simply a process in which we make use of available data and the scientific characteristics of that data, then apply different tools of decision making, such as mathematics, statistics, and machine learning, to solve real-life problems. The science of data science is basically a combination of mathematical science and computer science. Mathematics and statistics are familiar parts of mathematical science, and machine learning is a part of computer science. Both of these sciences deal with the manipulation of data using different methods. Now the first thing you need to understand is what data is and what its scientific characteristics are. Data is defined as a set of values of subjects. These subjects can be people, things, website pages, services, or anything about which some kind of information can be recorded or obtained. An example of data is a collection of the heights of the students in a class. Since data is a set or a collection, it is plural. The singular form of data is datum, for example, the height of any one student in a class. Since a datum is a single value, it cannot be analyzed, so data, not a datum, is used for analysis purposes. Data also has scientific characteristics. Do not confuse the scientific characteristics of data with data characteristics. Data characteristics define the quality of data, and the scientific characteristics of data define the nature of data. There are three most common scientific characteristics, and the first one is measurement units. Data is always measured in some units. For example, weight can be measured in kilograms, height can be measured in centimeters, and time can be measured in seconds.
Currency can be measured in dollars, et cetera. You need to remember these measurement units during calculations, because ignoring them will affect your analysis. The second characteristic is the type of data for your subject: whether the data is continuous or categorical. This is very important because the methods of analysis you will use in data science projects have a direct relation to the type of data you have. The last thing is limitations. Sometimes you will have to work on data that has some limitations, like body temperature. The normal body temperature is 37 degrees Celsius, and if it is greater than 37 degrees, then we do not call it normal. Similarly, there are other processes where some limitations are imposed, and these limitations are based on the statistics of the subject under study. After data, we have mathematics. Mathematics is the base for understanding the concepts of statistics and machine learning. You need to have a good command of linear algebra, calculus, and matrix theory if you want to become a great data scientist. Once you are done with mathematics, the next very important subject is statistics. Statistics is a very broad subject, but some concepts of statistics are the base of data science, such as descriptive statistics, hypothesis testing, and regression analysis. Since statistics plays a major role in data science projects, you are most likely to get interview questions from statistics. Finally, we have machine learning. Machine learning performs the analysis of real-time data. Real-time data means data that is continuously created, for example, sales on an e-commerce website. Due to the large amount of data floating on the Internet, it is hard to create offline systems that can use previous data and automatically learn the patterns in the data to perform necessary actions.
Machine learning provides a solution to this problem. By applying machine learning algorithms, we can infer the characteristics of new data without using explicit commands. So I hope you now understand what data science is all about. If you have any questions, please ask in the question and answer section. Thanks, everyone. I will see you at the next lecture.
3. Why do we need Data Science?: Hi, welcome, everyone. I hope that by now you have got an idea of what data science is. In this lecture, we will discuss why we need data science. So let's consider an example. Suppose you spent $800 in January, an amount ending in $50 in February [figure unclear in the recording], and $700 in March. Now you want to know how much you will spend in April. You can guess 700, because it is the last month's spending, or any of the three values at random, considering that each of them is equally likely, or the average of these three values. Anyone can do this without using any prediction tool. The problem with this is that we do not have any accuracy measure. For example, if you take the average of the three values, how can you be sure that April's actual spending will be equal to that average or close to it? There is no goodness measure or accuracy meter for your prediction. Data science is used in this case to solve this particular problem. We can use the time series concepts of data science to make predictions for April's spending. There are many time series methods that can be used in this scenario. Therefore, we need to learn many concepts, because the accuracy of the prediction will vary depending on the method used. This is just an example where we can make use of time series methods. But this is not enough, because the data problems we face are very different, and a variety of methods is required to solve them. That is the reason we should learn multiple algorithms to become a data scientist. In this course, we will discuss the use of some algorithms that are preferred by data science professionals. Thanks, everyone. I will see you at the next lecture.
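The naive guesses described in this lecture, and the idea of attaching a goodness measure to them, can be sketched in a few lines of Python. This is only an illustration: the February figure of 850 is an assumed placeholder because the recording is unclear, and the function names are hypothetical. The backtest simply asks how far off the "use last month's value" rule would have been on the months we already know.

```python
from statistics import mean


def naive_forecasts(spending):
    """Return two simple guesses for the next month: last value and average."""
    return {"last_value": spending[-1], "average": mean(spending)}


def last_value_backtest_mae(spending):
    """Mean absolute error of the 'repeat last month' rule on past months."""
    errors = [abs(spending[i] - spending[i - 1]) for i in range(1, len(spending))]
    return mean(errors)


# January and March come from the lecture; February (850) is an assumption.
monthly_spending = [800, 850, 700]

print(naive_forecasts(monthly_spending))
print(last_value_backtest_mae(monthly_spending))
```

With only three months the error estimate is very rough, which is one reason proper time series methods are evaluated on longer histories.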
4. Mistakes by Aspiring Data Scientists: Hi. Welcome, everyone. So far, we have discussed the definition of data science and its need. In this lecture, we will cover the common mistakes that you must avoid to become a data scientist. These mistakes are made by other students of data science, so you should take a lesson from them and avoid these mistakes. The first mistake is learning theoretical concepts without practicing them on a real data set. To understand data science concepts, you must go through books, blogs, and courses, but that is not sufficient if you don't practice them side by side. Application of these concepts is equally important to become a data science professional. Whenever you learn a new concept, just google real-life problems based on that concept, and you will find that your grasp of that concept is stronger than before. You can also search for problems based on those concepts on different competition platforms like Kaggle, DrivenData, CrowdANALYTIX, etc. Don't try to learn everything in one go; it is more important to go slow and steady instead of rushing. The second mistake is moving on to machine learning without having the prerequisites. A lot of learners are amazed by machine learning techniques, but if you really want to understand them, you should understand how an algorithm works from scratch. Mathematics and statistics play an important role here, and the concepts you need to learn are linear algebra, calculus, matrix theory, statistics, and probability. There are tons of online and offline resources for these topics, but don't get confused by the options. Initially, choose one and follow only that; eventually you will get to know the concepts and their applications. Once you feel a little comfortable, exploring different books and courses will help you a lot. The third mistake is using data science terms inappropriately in your resume. Hiring managers are looking for your background in data science.
Using too many data science terms without explaining why you used them will make your resume weak. For example, "I am proficient in random forest." This states your knowledge without giving any proof of practical experience. Write it instead as "Applied random forest to predict loan approval." This shows that you have really used the random forest algorithm, and they can ask you more about your project. It is important to mention data science terms, but the way you mention them needs to be more specific, and using too many terms will make your interview difficult, because you will have to explain so much information. In that case, only list those terms about which you feel very confident, and always write a line about why you used each one. Check out the resumes or profiles of other data scientists on LinkedIn to understand how they present themselves. The fourth mistake is thinking that real data science projects are similar to data science competition problems. This is one of the common mistakes. The data sets provided in competitions are almost clean, with a few missing values that don't require too much work to fill in. On the other hand, real data science problems are not like that. You will get unclean data, and you will have to spend around 50% of your time on cleaning it. This is frustrating, but it is the truth behind data science. You can avoid this mistake only by gaining experience. Initially, you will feel that data science competitions are difficult and that real projects are like them, but when you look at some real data science projects, your thinking will change automatically. If you properly focus on your programming skills, you are more likely to become good at data cleaning, and that will solve your problem for real data science projects. The fifth mistake is that most people think the accuracy of predictions is the most important thing and forget about the business. Let's understand this through an example.
Suppose you have 100 variables to predict the sales of an e-commerce company, such as type of products, locations, delivery date, etc. It is possible that you are not aware of what some variables really mean, but you build a model with good accuracy and drop some variables. Some of these dropped variables might be important to the business, but you dropped them because they were not contributing to higher accuracy of the model. Therefore, the model accuracy is good, but the problem is not solved as it should be, and that is a big mistake. So having an understanding of the crucial elements of a business is necessary, and we call it domain knowledge. To avoid this mistake, focus on only one industry and google how data science is used in that particular industry. Also search for relevant data sets and analyze them for practice. The sixth mistake is trying to learn many tools. Some learners are very confused about choosing a tool, and many think they should learn R and Python both from the start. This is wrong. Just think: if you have a tooth problem, would you go to a dentist or a general physician? Of course the dentist, because the dentist is a specialist. In the same way, you need to be a specialist in R or Python. Once you are a master of one of the two, you can learn the other to add tool expertise, and you can use either according to the need. I would like to share an old saying: "Jack of all trades, master of none." You should be a master of one tool instead of a jack of all tools. I recommend to my students a very simple solution: if you are inclined towards programming, then choose Python, but if you are not much inclined towards solving coding problems, then you should learn R. The seventh mistake is not working on public speaking and communication skills. As per my observation, no one is talking about public speaking in the data science industry, yet everyone has to do it.
We want to fill the demand for data scientists, so you should focus on how to publicly communicate your project needs and insights to non-technical people. That is the reason confident public speaking is a must, and the way you do this is through your communication skills. To be honest with you, the interviewer will monitor your communication skills throughout the interview process, and they might also ask for a presentation. To avoid this mistake, explain your projects to a non-technical person. This can be any person who understands the English language, whether your mother, a friend, or anyone else. The main objective is to make them understand what you want to say. The eighth mistake is not solving case studies in the final interview round. Most people fail at solving case studies. You have knowledge and tool expertise, but it is not enough, because a data scientist's job is to solve a business problem. The case studies are copies of real problems faced by the company, and they want you to propose a possible solution to the problem. To avoid this mistake, you should practice case studies from the same industry. Since you are not part of the team, they already know you are less likely to solve the problem, but they are looking for your problem-solving approach. Whatever approach you have, it should be logical, and you must be able to clearly communicate your steps. These were the eight mistakes that must be avoided if you want to become a data scientist, and avoiding them will save you a lot of time. Thanks, everyone. I will see you at the next lecture.
5. Data Science Career Myths: Hi, welcome, everyone, to this lecture on myths about data science careers. There are three common myths people have about careers in data science, and the first one is that a PhD degree is necessary to become a data scientist. This is not true. It depends on which type of job profile you want to work in. If you want to work in an applied data science role, where existing algorithms are used, then you don't need a PhD degree, and this role also has very high demand. On the other hand, if you want to work in a research role, where you need to create new algorithms and write scientific papers or dissertations, then a PhD might be required. In most of the roles, a master's degree in a quantitative subject is considered. The second myth is that your experience will be considered if you transition to data science. If you have solid experience in some industry and want to enter data science, then you have two options: change your domain completely, or stick with your domain and search for a data science role. If you are changing your domain completely, then you are a newcomer to data science and your experience won't be counted. Recruiters look for the value you can add to the organization, and they will find nothing, because you are changing your domain as well as your role. Hence, this is most likely to be a one-way ticket to failure. But if you stick with your domain, then your chances of being considered are high, because you already know the industry and you will understand the data you will be working with. This is a very strong factor that the hiring manager will take into account to make the final decision about your selection in the interview. The third myth is that tool skills are enough to become a data scientist. The most common question among aspiring data scientists is, "Which tool should I learn, R or Python, to become a data scientist?" You need a combination of multiple skills, so the right question is, "What skills should I have to become a data scientist?" Focusing only on tools will lead you nowhere. Tools are not the central point in data science, but they are very widely used to implement the techniques. You must focus on concepts and their real-life applications, problem-solving skills, structured thinking, and communication skills. This will build your foundation for data science. Just remember that these three myths are very common, so don't get into the trap. Thanks, everyone. I will see you at the next lecture.
6. Data and Variables: Hi. Welcome, everyone. In this lecture, we will discuss the types of data and variables. There are four types of data that exist in nature, and we need to understand them before applying any concept, because understanding the data type will help us decide what we can do with the actual data values. In data science, you will find that the data falls into one of these four categories. The first one is nominal data. It is also called categorical or classified data. For example, gender: it has two categories, male and female. Next, consider hair color, which is generally defined with four categories: brown, black, gray, and others. Similarly, religion has many categories, like Christianity, Islam, Buddhism, Hinduism, Sikhism, and others. The measure of central tendency for this type of data is the mode. The second type is ordinal data. It is also called categorical, but this type of data has a ranking in itself. For example, level of education. Here we have four categories of education. In this case, we know that graduation is better than school, post-graduation is better than graduation, and PhD is better than post-graduation, but we cannot say how much better they are than each other. Hence they can be represented as an order, whether decreasing or increasing, so it is called ordinal. The measure of central tendency for this type of data is either the mode or the median, but it can never be the mean. The third type is interval data. Data that has a ranking in itself, with known differences between the values, and that has some physical significance is known as interval data. For example, temperature can be represented as two degrees Celsius, three degrees Celsius, and five degrees Celsius. Here the difference between each degree and the next consecutive degree is equal. The most important thing about this type of data is that it includes a zero level, but zero does not refer to the absence of the quantity. For example, zero degrees Celsius has some physical meaning.
The measure of central tendency for this type of data can be the mean, median, or mode, depending on the distribution of the data. The last type is ratio data. Data that follows all the properties of the other data types, and in which a value of zero means absence, is known as ratio data. For example, if you have $5, $10, or any other amount in your purse, it represents ratio data, because if you remove all the dollar amounts from your purse, you have $0, which implies the absence of cash. The measure of central tendency for this type of data can be the mean, median, or mode, depending on the distribution of the data. Now we will discuss the types of variables. There are two types of variables used in data science, and the first one is the continuous variable. Any variable that has infinitely many values between two consecutive values is known as a continuous variable. The easiest way to identify a continuous variable is that it has a unit of measurement associated with it, for example, height, which can be measured in centimeters. There can be an infinite number of people having heights between 160 centimeters and 161 centimeters. Similarly, we have weight, which can be measured in kilograms, time, which can be measured in seconds, et cetera. The second type is the categorical variable. Any variable that takes different categories is known as a categorical variable. For example, consider a variable having three categories [category names unclear in the recording]. Here we have three categories; hence, it is a categorical variable. Thanks, everyone. I will see you at the next lecture.
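The pairing of data types with measures of central tendency described in this lecture can be illustrated with Python's standard `statistics` module. The sample values and the numeric codes for education levels below are hypothetical, chosen only to make the sketch runnable. `median_low` is used for the ordinal example so that the result is always one of the actual ranks.

```python
from statistics import mean, median, median_low, mode

# Nominal data: only the mode is meaningful.
hair_color = ["brown", "black", "black", "gray", "black"]

# Ordinal data: ranks (assumed coding: 1 = school ... 4 = PhD); mode or median.
education_rank = [1, 2, 2, 3, 4]

# Interval data: mean, median, or mode, depending on the distribution.
temperature_celsius = [2.0, 3.0, 5.0]

# Ratio data: zero dollars really means "no cash".
purse_dollars = [0, 5, 10, 10]

print("nominal mode:", mode(hair_color))
print("ordinal median rank:", median_low(education_rank))
print("interval mean:", mean(temperature_celsius))
print("ratio mean/median/mode:",
      mean(purse_dollars), median(purse_dollars), mode(purse_dollars))
```

Taking the mean of `education_rank` would run without error, but the result would not be interpretable, because the gaps between ranks are not known to be equal.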
7. Data Cleaning: Hi. Welcome, everyone. In this lecture, we will discuss the most important aspect of a data science project, that is, data cleaning. Data cleaning is the process of preparing data for analysis by removing or modifying the data that is incorrect, incomplete, irrelevant, duplicated, or improperly formatted. Simply, we can say that we need to perform data cleaning to check whether the data has any errors, and if it has some errors, then we should remove them. The first error in a data set that we need to look for is typos. Sometimes data is incorrectly entered, and such entries are referred to as typing errors. For example, the gender of a customer is male but is incorrectly entered as female. It can also be unwanted information, such as an email entered in place of a name in the name field. The second error is missing data: some information in the data is not available. For example, a customer did not enter his or her gender during registration. The last kind is spelling errors or formatting errors: it is possible that categorical variables have categories whose values are entered incorrectly, or the date format is not consistent, or numeric values are not in an appropriate number format. We need to fix these errors before applying any data analysis technique to the data. Thanks, everyone. I will see you at the next lecture.
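As a rough illustration of the three kinds of errors mentioned in this lecture, the following Python sketch checks one customer record at a time. The field names, the accepted spellings, and the email pattern are all assumptions made for the example, not part of the course. Note that a typo such as "male" entered as "female" cannot be detected by code like this; it needs another source to compare against.

```python
import re

# A deliberately simple email check, used only to spot emails in the name field.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical mapping of inconsistent spellings to one standard category.
GENDER_SPELLINGS = {"m": "male", "male": "male", "f": "female", "female": "female"}


def clean_record(record):
    """Return a cleaned copy of the record and a list of issues found."""
    cleaned = dict(record)
    issues = []

    name = (record.get("name") or "").strip()
    if not name:
        issues.append("missing name")
        cleaned["name"] = None
    elif EMAIL_PATTERN.match(name):
        issues.append("email entered in name field")
        cleaned["name"] = None
    else:
        cleaned["name"] = name

    gender = (record.get("gender") or "").strip().lower()
    if not gender:
        issues.append("missing gender")
        cleaned["gender"] = None  # keep unknown values as None, not a guess
    elif gender in GENDER_SPELLINGS:
        cleaned["gender"] = GENDER_SPELLINGS[gender]
    else:
        issues.append("unrecognized gender value")
        cleaned["gender"] = None

    return cleaned, issues


print(clean_record({"name": "jane@example.com", "gender": " M "}))
print(clean_record({"name": "Ravi", "gender": ""}))
```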
8. Descriptive Analysis: Hi. Welcome, everyone. In this lecture, we will discuss descriptive analysis. Once you understand the type of data you have, you need to understand what you can do with the data and how it can be done. The first step would be descriptive analysis of each of the variables. It gives us an idea about the distribution of the variable. If the variable is continuous, then we can create the summary statistics, where we find the minimum value, maximum value, first quartile, median, third quartile, total number of values, and missing values. After that, we create a histogram or box plot to visualize the data. If the variable is categorical, then the summary statistics are frequencies, proportions, and marginal values, and a bar plot is used to visualize the data. If we want to visualize the behavior of two continuous variables together, then a scatter plot is used. But if one variable is categorical and the other is continuous, then a box plot of the continuous variable is created for the categories of the categorical variable, and if both variables are categorical, then a stacked bar plot can be created. Once descriptive analysis is done, we move on to inferential statistics or machine learning, as per the objective of the project. Descriptive analysis helps us decide which procedure of inferential statistics or machine learning should be used for further analysis. Thanks, everyone. I will see you at the next lecture.
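The summary statistics listed in this lecture can be computed with the Python standard library alone. The sketch below is one possible way to do it; the function names and sample values are hypothetical, and the "inclusive" quartile method is just one of several common conventions, so other tools may report slightly different quartiles.

```python
from collections import Counter
from statistics import quantiles


def summarize_continuous(values):
    """Summary statistics for a continuous variable; None marks a missing value."""
    present = [v for v in values if v is not None]
    q1, med, q3 = quantiles(present, n=4, method="inclusive")
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": max(present),
    }


def summarize_categorical(values):
    """Frequency and proportion of each category."""
    counts = Counter(values)
    total = len(values)
    return {category: (count, count / total) for category, count in counts.items()}


heights_cm = [150, 160, None, 170, 180, 190]             # hypothetical sample
learning_method = ["online", "offline", "online", "online"]

print(summarize_continuous(heights_cm))
print(summarize_categorical(learning_method))
```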
9. Feature Engineering: Hi. Welcome, everyone. In this lecture, we will discuss feature engineering, which is one of the most interesting things in data science. The objective of feature engineering is to create new independent variables that help us to create better predictive models. There are three ways in which a new variable is created: creation of a new independent variable using existing ones; transformation of an existing variable in a way that improves the performance of that variable in the model; and extraction of independent variables from some other data. For example, suppose you want to create a predictive model for health outcomes such as blood pressure, and you have a person's weight and height. You can create another variable called body mass index by dividing the weight by the square of the height. Since BMI is a very good predictor for many health outcomes, you must include it in your model; otherwise your model is less likely to perform well. In this example, we have created a new variable, BMI, using the existing variables height and weight. Similarly, the other ways can be used. There are many transformation and extraction techniques available, and you will need to practice them while doing feature engineering. Just keep in mind that your objective is to create a feature, or independent variable, in a way that will make your model good. This is more of a theoretical concept than a practical one, so you need to consult the relevant literature, talk to domain experts, and use brainstorming. Thanks, everyone. I will see you at the next lecture.
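The BMI example from this lecture can be written as a small Python sketch. The lecture does not state units, so this assumes weight in kilograms and height in centimeters (converted to meters), which matches the usual definition of BMI as kg/m². The field names and sample people are hypothetical.

```python
def body_mass_index(weight_kg, height_cm):
    """BMI = weight in kilograms divided by the square of height in meters."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Existing variables: weight and height. New feature: bmi.
people = [
    {"weight_kg": 70.0, "height_cm": 175.0},
    {"weight_kg": 90.0, "height_cm": 180.0},
]

people_with_bmi = [
    dict(person, bmi=round(body_mass_index(person["weight_kg"], person["height_cm"]), 1))
    for person in people
]

for person in people_with_bmi:
    print(person)
```

The original records are left unchanged and the new feature is added to copies, which makes it easy to compare a model with and without the engineered variable.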
10. How to Develop Data Thinking?: Hi. Welcome, everyone. In this lecture, we will discuss what you should do to develop data thinking. If you want to become a data scientist, then it is a must to develop data thinking. In terms of data science, data thinking is the observation of a problem and mentally planning the process to solve that problem. Data thinking is developed by learning with application. In this process, you learn a technique of data science and try to apply that technique to your day-to-day activities, such as sleeping, walking, eating, reading, etc. For example, suppose you studied the application of the one-sample t-test of statistics. Now you should try to hypothetically create problems in your mind and find out whether you can solve them with the help of the one-sample t-test. The one-sample t-test is a hypothesis testing procedure in which we take a random sample from a population, and on the basis of testing the sample, we can comment on the average value of a characteristic of the population. This characteristic can be the height of the population or anything else. One such daily-basis problem can be the sleeping hours in a day. Suppose someone says that students sleep six hours per day on average. So you record the sleeping hours of some students or your friends for a month, then use this data to test whether the average sleeping time is six hours or not. In a similar way, you can use the two-sample t-test to check whether the sleeping hours are greater for male students or for female students. This was a hypothetical example I defined in my mind where we can apply the t-test. You should define this type of problem and find out how you can use data science topics to solve it. If you think that you can use data science topics to solve that problem, then think about how it can be done.
Once you are able to define and understand how to solve problems in your mind, your creativity and problem-solving skills will start developing, and it will also help you to improve your analytical thinking and boost your confidence as well. Do not forget that this is important for your analytical thinking also, and analytical thinking will help you to improve your data science skills. Thanks, everyone. I will see you at the next lecture.
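The sleeping-hours example from this lecture can be worked through with a one-sample t statistic. The sketch below uses only the Python standard library; the recorded sleep hours are made-up values for illustration, and the function name is hypothetical. It computes the statistic t = (sample mean − hypothesized mean) / (sample standard deviation / √n) and the degrees of freedom n − 1.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, hypothesized_mean):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    t_statistic = (mean(sample) - hypothesized_mean) / standard_error
    return t_statistic, n - 1


# Hypothetical average daily sleep (hours) recorded for eight students.
sleep_hours = [6.5, 7.0, 5.5, 6.0, 7.5, 6.5, 7.0, 6.0]

t_statistic, degrees_of_freedom = one_sample_t(sleep_hours, hypothesized_mean=6.0)
print(round(t_statistic, 3), degrees_of_freedom)
```

To decide whether the claim of six hours is plausible, the statistic is compared with a critical value of the t distribution for the given degrees of freedom, or converted to a p-value with a statistics package.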
11. Real-Life Problem: Hi. Welcome, everyone. In this section, we will discuss a real-life problem faced by aspiring data scientists and how it can be solved using data science itself. The problem statement is: "I want to learn data science, but I don't know where I should learn it." As soon as you get this question in your mind, "Where should I learn data science?", you have already jumped into the ocean of the data science world, because now you have to ask yourself the following questions. How many institutes are there offering data science courses? Which one should you choose? Should it be online or offline, according to your comfort? How much can you pay? Do they provide job support and programming tools training, and do they provide certificates? Now we will convert these questions into a form which is used in data science. Here we have one question which is dependent, that is, which one you should choose. The other questions are whether the course should be online or offline according to your comfort, how much you can pay, whether they provide job support and programming tools training, and whether they provide certificates. These questions can help us to make the decision about choosing the best institute that fits your requirements, so we can call them explanatory questions or independent questions, because they will explain to us which institute should be chosen. The next step is to convert these questions into variables. This is called variable formation. Here I have created hypothetical names and values for your understanding, as follows. The question of which one you should choose can be named Institute Name, with seven hypothetical categories: A, B, C, D, E, F, and G. The question of whether it should be online or offline can be named Learning Method, with two categories: online and offline. The question of how much you can pay can be named Fees. The question of whether they provide job support can be named Job Support, with two categories: yes and no. Programming tools training can be named Tool, with three hypothetical categories: R, SAS, and Python. Whether they provide certificates can be named Certification, with two levels: yes and no. Now that we have listed all the variables, let's have a look at the data table. I have created this hypothetical data to make it easier for you to understand. Here we have all the variables with their different values. You might have noticed that there are a few institutes that are mentioned more than once. The reason is that they provide courses through both mediums, online and offline, and also on more than one programming tool, and their prices are different. This is one of the most important steps in data science: you need to make the data usable in such a way that the analysis becomes easier. I am repeating this: you need to make the data usable in such a way that the analysis becomes easier. This is a very, very important part of a data science project, and you are likely to spend 80% of your time on this. So just make sure that you spend enough time to understand the form of data that will make your analysis easier. I am sharing some key points that you should remember while collecting the data: always record the original values and don't change anything initially; data should be collected from an authentic source; collection must be random and should not be influenced by personal beliefs; find and point out the inconsistent values in the data; and don't set unknown values to zero. These points are very important to get reliable results. The next step is to find out how we can make use of this data to decide which institute you should choose. For that, we will need algorithms. Hence I will give you an overview of some algorithms that are widely used in data science. Thanks, everyone. I will see you at the next lecture.
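The variable formation described in this lecture can be pictured as a small table. The Python sketch below builds a few rows of such a table; every value in it is invented for illustration, since the lecture's own data table is not part of the transcript. It shows two points from the lecture: one institute can appear in several rows, and an unknown fee is stored as `None` rather than zero.

```python
from collections import Counter

# Hypothetical rows; one row per (institute, learning method, tool) offering.
courses = [
    {"institute_name": "A", "learning_method": "online", "fees": 500,
     "job_support": "yes", "tool": "Python", "certification": "yes"},
    {"institute_name": "A", "learning_method": "offline", "fees": 900,
     "job_support": "yes", "tool": "R", "certification": "yes"},
    {"institute_name": "B", "learning_method": "online", "fees": None,
     "job_support": "no", "tool": "SAS", "certification": "yes"},
    {"institute_name": "C", "learning_method": "offline", "fees": 700,
     "job_support": "no", "tool": "Python", "certification": "no"},
]


def rows_per_institute(rows):
    """Count how many offerings each institute has in the table."""
    return Counter(row["institute_name"] for row in rows)


def within_budget(rows, budget):
    """Keep offerings with a known fee that does not exceed the budget."""
    return [row for row in rows if row["fees"] is not None and row["fees"] <= budget]


print(rows_per_institute(courses))
print([row["institute_name"] for row in within_budget(courses, budget=800)])
```

If the unknown fee for institute B had been stored as 0, the budget filter would wrongly treat it as the cheapest option, which is why unknown values are kept as missing.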
12. Algorithms: Hi. Welcome, everyone. In the last lecture, we discussed a problem statement, that is, "I want to learn data science but don't know where I should learn it," and created hypothetical data to solve this problem. Now the next step is how we can make use of algorithms to solve this problem. In this lecture, we will discuss different types of algorithms and the situations where they are used. There are five types of algorithms used in data science, and these types are regression algorithms, classification algorithms, clustering algorithms, boosting algorithms, and dimensionality reduction algorithms. Regression algorithms are used in situations when the dependent variable is continuous. Classification algorithms are used in situations when the dependent variable is categorical. Clustering algorithms are used in situations when we need to group a set of items having similar characteristics. Boosting algorithms are used to improve the accuracy of predictive models, and dimensionality reduction algorithms are used to reduce a large number of variables to a smaller number of variables to achieve a smaller data set, which can be easily analyzed. Some algorithms can be used to predict continuous as well as categorical variables. Since the dependent variable in our problem, the institute name, is categorical, we need to make use of classification algorithms to solve the problem of the aspiring data scientist. We will not only discuss classification algorithms but also understand some of the other algorithms that you must learn to become a data scientist. Please keep in mind that I will explain when to use these algorithms and won't go into practicing them, because this course is specifically created with the objective of helping you understand the approach of data science only. So we will not perform any coding exercise in this course. The first algorithm is linear regression.
Linear regression is a type of regression algorithm with which we can make a predictive model for one dependent variable that depends on one or more independent variables. These independent variables must be linearly related to the dependent variable individually and should not be linearly related to each other. A linear relation means that as the value of the independent variable increases or decreases, the value of the dependent variable also increases or decreases, and that is also called correlation between the two variables under consideration. Also, we should make sure that if we have more than one independent variable, then these independent variables are not linearly related to each other. An example of a linear regression equation is y = β0 + β1x1 + β2x2 + ... + βkxk, where y represents the dependent variable and x1, x2, x3 up to xk represent different independent variables. That means there are k variables. We need to find the values of the beta coefficients in this model, and these beta coefficients will help us to predict the values of the dependent variable in terms of the independent variables. The second algorithm is stepwise regression. This is also a linear regression method. But in this case, each independent variable is either added to or removed from the set of independent variables to get the best set of independent variables, which are predictors of the dependent variable. Addition of variables is called forward stepwise regression, and removal of variables is called backward stepwise regression. In forward stepwise regression, we make the regression model with only one variable and then start adding other variables one by one to the model. On the other hand, in backward stepwise regression, we make the regression model with all variables and then start removing variables one by one from the model.
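To make the idea of "finding the beta coefficients" concrete, here is a minimal Python sketch for the simplest case of linear regression, with one independent variable, so the model is y = beta0 + beta1 * x. It uses the standard least-squares formulas; the function names and the small data set are assumptions chosen for illustration and do not come from the lecture.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) that minimise squared error for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums used by the least-squares solution.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


def predict(beta0, beta1, x):
    """Predict the dependent variable for a new value of x."""
    return beta0 + beta1 * x


# Hypothetical data that follows y = 1 + 2x exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(beta0, beta1)
print(predict(beta0, beta1, 6))
```

With several independent variables the same idea applies, but the coefficients are usually found with a library rather than by hand.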
Linear regression requires that the independent variables are not correlated with each other, but sometimes this is not possible. When the independent variables are correlated with each other, the situation is referred to as multicollinearity. Hence, ridge regression and lasso regression are used to make predictive models for data which suffers from multicollinearity. These regression methods use different regularization techniques to shrink the coefficients of the independent variables, and lasso can drop the least useful variables entirely. The regularization techniques used in ridge and lasso regression are out of the scope of this course, so we will not discuss them here. The next algorithm is polynomial regression. Polynomial regression is a type of regression technique in which the independent variables are not linearly related to the dependent variable; rather, they have a curvilinear relationship with the dependent variable. For example, one independent variable has power one, another is the square of the first, and the rest also have different powers, as shown here, where y represents the dependent variable and x1, x2, x3 up to xk represent different independent variables. The next algorithm is logistic regression. Logistic regression is a type of regression in which the dependent variable is categorical and has only two categories. For example, we want to predict whether a credit card user wants to take a loan or not, so there are two categories, yes or no. The general model for logistic regression is shown here, where p is the probability of choosing one of the two categories and the right-hand side is the familiar linear combination of the independent variables. Now suppose you have more than two categories. For example, we want to predict which type of computer game is preferred by college students, and there are four categories: adventure games, puzzle games, action games and educational games.
In this case, we can make use of multinomial logistic regression to predict the type of game. The next algorithm is naive Bayes. Naive Bayes is a dedicated classification algorithm, and it should be used in cases where we have multiple classes or multiple categories of the dependent variable. It can also be used in cases when we want to perform text classification. This algorithm works on Bayes' theorem, in which we use the prior probability for calculating the posterior probability of an event. The general model for naive Bayes is P(c|x) = P(x|c) P(c) / P(x), where P(c|x) is the posterior probability of the class (target) c given the predictor x, P(c) is the prior probability of the class, P(x|c) is the probability of the predictor given the class, and P(x) is the prior probability of the predictor. The prior probability is the probability that an event occurs, and the posterior probability is the conditional probability, that is, the probability of occurrence of an event given that another event has already occurred. For example, the probability that there is peace in the world is a prior probability, and the posterior probability for this case is the probability that God exists given that there is peace in the world. The next algorithm is decision trees. The main purpose of decision trees is to provide a visual representation of all possible alternatives for a decision. The method works for both categorical and continuous independent and dependent variables. In this method, we divide the population or sample into two or more homogeneous groups based on the most significant independent variable. Suppose we have a sample of 50 students with three independent variables, gender, course and height, and one dependent variable, whether they play basketball or not. 25 of these students play basketball in their free time. Suppose we want to create a model to predict who will play basketball during free time.
To do that, we have to divide these 50 students based on gender and course, and this will help us to understand which students play basketball. This can be done with the help of a decision tree. The most frequently used decision trees are CART, CHAID and C4.5. The next algorithm is random forest. The random forest algorithm builds multiple decision trees and merges them to get higher prediction accuracy. In classification, all trees give their results, and the forest chooses the result that gets the highest number of votes, which is then shown as the output predicted value. In regression, all trees give their results, and the forest chooses the average of those results, which is then shown as the output. The next algorithm is support vector machine. The support vector machine is mainly used for classification, but it can solve regression problems as well. It uses the kernel trick to transform the data and then, based on these transformations, finds an optimal boundary between the possible outcomes to classify these outcomes. The next algorithm is gradient boosting. Gradient boosting is one of the most important algorithms because it improves the accuracy of prediction. It means that if you want to predict a categorical variable which has two categories, and your model predicts only 70% of the observations correctly, then you can use gradient boosting to increase that figure. But you cannot push the accuracy to 100%, because gradient boosting sometimes causes overfitting of the model. So we should be very careful with the model fitting. The next algorithms are principal component analysis and linear discriminant analysis. While working on a data science project, you will have many variables, and dealing with a large number of variables is a difficult task. So we try to reduce the number of variables.
This can be done with the help of principal component analysis and linear discriminant analysis. These algorithms are applied to the whole set of variables in the data and produce a new set of variables which are not correlated with each other. The next algorithms are k-means clustering and hierarchical clustering. Clustering is the method of finding homogeneous groups. K-means clustering and hierarchical clustering are the two algorithms which serve this purpose. K-means clustering groups the data into clusters using a centroid for each cluster. The clusters are found by repeating the centroid calculation. Once all the cluster centroids stop changing, the k-means algorithm stops, and the number of groups is the k in k-means. In hierarchical clustering, the merging of two clusters is done based on the Euclidean distance between them, and this process is repeated until all the clusters are merged together. We will discuss an example of clustering in the unsupervised learning lecture. Once you understand the use of algorithms, the next step is to choose the best algorithm to solve a problem. The best way is not to stick to one algorithm for creating predictive models, whether we predict continuous variables or categorical variables. For example, if we apply logistic regression to predict whether a customer will be eligible for a loan or not, it is not necessarily true that it will work for all problems where the dependent variable has two categories. Hence, we should check all the algorithms known to us to find the best model. An algorithm that is best for one problem may not provide the best results for a similar type of problem, because the characteristics of different datasets are not the same in real-life problems. I hope you now understand the theory behind these algorithms. I suggest you register on kaggle.com and practice these algorithms. You can access many datasets and competitions for free on Kaggle.
You will also find a lot of help there, because thousands of data scientists are registered on kaggle.com. Now it's time to understand the prediction for the aspiring data scientist problem: how it is calculated and how we can decide its accuracy. Thanks, everyone. I will see you at the next lecture.
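Looking back at the random forest description in this lecture, the way a forest combines the results of its trees can be sketched in a few lines of Python. This is only an illustration of the voting and averaging step, not a full random forest: the individual tree outputs below are hypothetical values typed in by hand, and the function names are assumptions.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_predictions):
    """Classification: return the label that received the most votes."""
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_regress(tree_predictions):
    """Regression: return the average of the trees' numeric results."""
    return mean(tree_predictions)


# Hypothetical outputs of five trees for one observation (classification).
votes = ["yes", "no", "yes", "yes", "no"]
# Hypothetical outputs of three trees for one observation (regression).
fees = [90, 100, 110]

print(forest_classify(votes))
print(forest_regress(fees))
```

In practice a library builds the trees and performs this combination internally; the sketch only mirrors the rule stated in the lecture.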
13. Making Predictions: So far, we have discussed the situations where different algorithms are used. Now we will discuss a hypothetical model, including predictions, for the problem of the aspiring data scientist. This will help you to understand how you need to measure the prediction and its accuracy. We know that there are five independent variables, hence the general model for your problem is as shown above. Here you want to find the value of each beta. Once we get these values, you can predict the institute name on the basis of learning method, fees, job support, tool and certification. The hypothetical equation for prediction is also shown here. Suppose that the beta values are given and you put in the online learning method, $100 fees, yes to job support and yes to certification. In this case, the model predicts institute C. Hence, you should go to institute C to learn data science. We know that this is a classification problem, so we can use classification algorithms to make predictions. Suppose we use random forest and find that institutes C and G are incorrectly predicted by the model. Then we use support vector machine and find that institutes C, F and G are incorrectly predicted. Now, to choose the best model between the two, we have to find the error rate. The error rate is calculated as the number of incorrect predictions divided by the total number of predictions. The number of incorrect predictions by the random forest model is two, and the number of incorrect predictions by the support vector machine model is three. Therefore, the error rate for the random forest model is 0.2 and the error rate for the support vector machine model is 0.3, which corresponds to 10 predictions in total. Since the error rate of the random forest model is less than that of the support vector machine model, the accuracy of the random forest model is better, so we will use the random forest model for future predictions.
This was a classification problem because the dependent variable, institute name, is a categorical variable with seven categories. If we want to know "how much should I spend on learning data science?", then the same problem turns into a regression problem, because now the dependent variable is continuous, and the data can be viewed as shown here. Thanks, everyone. I will see you at the next lecture.
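The model comparison in this lecture can be reproduced with a short Python sketch. The counts of incorrect predictions (two and three) come from the lecture; the total of 10 predictions is an assumption inferred from the stated error rates of 0.2 and 0.3, and the function and variable names are illustrative.

```python
def error_rate(incorrect, total):
    """Error rate = incorrect predictions / total predictions."""
    return incorrect / total


# Assumption: 10 predictions in total, inferred from 2 errors -> 0.2.
total_predictions = 10

# Incorrect predictions per model, as given in the lecture.
incorrect_by_model = {
    "random forest": 2,            # institutes C and G
    "support vector machine": 3,   # institutes C, F and G
}

rates = {
    name: error_rate(count, total_predictions)
    for name, count in incorrect_by_model.items()
}

# The best model is the one with the lowest error rate.
best_model = min(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: error rate {rate:.1f}, accuracy {1 - rate:.1f}")
print("chosen model:", best_model)
```

Accuracy is simply one minus the error rate, so the model with the lowest error rate is also the one with the highest accuracy.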
14. Learning Methods: Hi. Welcome, everyone. So far, we were trying to predict the institute name, and we know that this name will come from the seven names. This is called supervised learning. Let's say you want to learn online, you can pay $90 for the course, you need job support, you want to learn the course using R, and you don't need certification. Therefore, the values of the independent variables are online, $90, yes, R and no. Here we predict the institute name, and we know that there are only seven institutes, A, B, C, D, E, F and G, and one of the seven institutes will be predicted for your choice of independent variables. Since the outcome is known, this is referred to as supervised learning. In unsupervised learning, we don't know anything about the outcomes. Now suppose that there are ten students learning data science at an institute called Data Science Educators, which provides online classes only. All of these students are learning data science through their choice of tools. The institute wants to recommend courses to its students without asking about their interests, because if the institute asks, a student might think that it wants to earn more from him or her. In this case, the institute doesn't know the students' interests, so it will choose a clustering method or association rules, which result in groups of students having similar characteristics. The clusters formed by these techniques will have less variation within clusters and high variation between clusters. That means these are homogeneous clusters. Let's understand what homogeneous clusters are. Suppose that you have a bag of apple, guava, banana, tomato, potato, onion, peanuts, walnuts and cashews. Here we can make homogeneous clusters as follows. The first cluster will contain apple, guava and banana because they are fruits, the second cluster will contain tomato, potato and onion because they are vegetables, and the third cluster will contain peanuts, walnuts and cashews because they are dry fruits.
Here, all the items are edible, but they are from different categories. Therefore, they make different homogeneous clusters. The variation within a cluster will be less as compared to the variation between clusters. For example, the variation in the characteristics of apple, guava and banana will be less because they come from the same family. But if we compare fruits with vegetables or dry fruits, the variation will be higher, which is called variation between clusters. If the variation within a cluster is high, then the clusters are called heterogeneous, and we cannot recommend other products because of the high variation in the characteristics of the products. Let's have a look at the Data Science Educators institute's data of ten students. You can see that Sophia is learning R, and the institute wants to recommend other courses that she might be interested in, and to do the same for the other students as well. In this case, the institute can use clustering algorithms to find homogeneous clusters. Suppose they used the k-means clustering method and found these three clusters: cluster one contains Sophia, Ava and Ethan; cluster two contains Jacob, William, Anna and Jack; cluster three contains James, Elizabeth and Logan. It means that Sophia, Ava and Ethan have similar characteristics, so they are in the same cluster, and the other clusters are formed in the same way. The recommendation of a tool is made based on the other students in the same cluster. If we want to recommend a tool to Sophia, then we will look for the courses in which Ava and Ethan are enrolled. We know that Sophia is enrolled in the R course, Ethan is also enrolled in the R course, and Ava is enrolled in the Python course. As Sophia and Ethan are enrolled in the same course, we will recommend Python to Sophia and Ethan. Thanks, everyone. I will see you at the next lecture.
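The recommendation step described above can be sketched in Python. The clusters and the enrolments of Sophia, Ethan and Ava follow the lecture's example; the clustering itself is taken as given rather than computed, the enrolments of the other students are not stated in the lecture and are therefore left out, and the function name is an assumption.

```python
# Clusters as stated in the lecture (taken as the output of k-means).
clusters = {
    1: ["Sophia", "Ava", "Ethan"],
    2: ["Jacob", "William", "Anna", "Jack"],
    3: ["James", "Elizabeth", "Logan"],
}

# Enrolments given in the lecture; other students' courses are not stated.
enrolled = {
    "Sophia": {"R"},
    "Ethan": {"R"},
    "Ava": {"Python"},
}


def recommend(student, clusters, enrolled):
    """Recommend courses taken by cluster peers that the student lacks."""
    for members in clusters.values():
        if student in members:
            peer_courses = set()
            for peer in members:
                if peer != student:
                    peer_courses |= enrolled.get(peer, set())
            return peer_courses - enrolled.get(student, set())
    return set()  # student not found in any cluster


print(recommend("Sophia", clusters, enrolled))
print(recommend("Ethan", clusters, enrolled))
print(recommend("Ava", clusters, enrolled))
```

The same rule also suggests R to Ava, since both of her cluster peers are enrolled in the R course.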