Making Numerical Predictions for Time Series Data - Part 1/3 | Partha Majumdar | Skillshare

Making Numerical Predictions for Time Series Data - Part 1/3

Partha Majumdar, Just a Programmer

Lessons in This Class

34 Lessons (6h 7m)
    • 1. Introduction (4:28)
    • 2. Course Contents (3:23)
    • 3. Time Series Data (8:12)
    • 4. Descriptive Statistics (13:24)
    • 5. Moving Averages (17:15)
    • 6. Centered Moving Averages (9:54)
    • 7. Weighted Moving Averages (12:11)
    • 8. Standard Deviation of Moving Averages (4:52)
    • 9. Seasonality (7:39)
    • 10. Correlation (12:03)
    • 11. Linear Regression (10:30)
    • 12. Linear Regression Demonstration (17:03)
    • 13. Linear Regression using LINEST (12:45)
    • 14. Predicting with TREND (5:16)
    • 15. Linear Regression using Data Analysis ToolKit (13:21)
    • 16. Multi Variate Linear Regression (10:39)
    • 17. Exponential Regression with Linear Model (15:57)
    • 18. Optimising Exponential Regression using Solver (9:41)
    • 19. Exponential Regression using LOGEST (7:08)
    • 20. Multi Variate Exponential Regression (14:13)
    • 21. Power Regression (10:58)
    • 22. Multi Variate Power Regression (8:27)
    • 23. Logarithmic Regression (12:23)
    • 24. Quadratic Regression (4:30)
    • 25. Polynomial Regression (4:22)
    • 26. Selecting a Model through Experimentation (40:59)
    • 27. Guidelines for Selecting a Model (5:54)
    • 28. Outliers (12:43)
    • 29. Degrees of Freedom (4:38)
    • 30. Normal Distribution (14:46)
    • 31. Standard Error of Mean (11:15)
    • 32. Confidence Interval (16:21)
    • 33. Next Steps (2:27)
    • 34. About Me (Optional) (7:32)
54 Students

About This Class

Predictive analytics is the branch of advanced analytics used to make predictions about unknown future events. It uses many techniques from data mining, statistics, modelling, machine learning, and artificial intelligence to analyse current data and make predictions about the future.

One class of Predictive Analytics is making predictions on Time Series Data. Studying historical data, collected over a period of time, can help in building models with which the future can be predicted. For example, from historical data on Temperatures in a City, we can make decent predictions of what the Temperature could be on a future date. Or, for that matter, from data collected over a reasonably long period of time regarding various lifestyle aspects of a Diabetic patient, we can predict the volume of Insulin to inject on a given date in the future. One example from the Business world is predicting the Volume of In-Roamers in a Telecom Network in any given future period from the historical details of In-Roamers in the Network.

The applications are innumerable, as these techniques are applicable in every sphere of business and life.

In this course, we go through various aspects of building Predictive Analytics Models. We start with simple techniques and gradually study very advanced and contemporary techniques. We cover using Descriptive Statistics, Moving Averages, Regressions, Machine Learning and Neural Networks.

This course is a series of 3 parts.

  • In Part 1, we use Excel to make Numerical Predictions from Time Series Data.

We start by using Excel for 2 reasons.

  1. Excel is easy to use, so we can understand complex concepts through exercises that are easy to replicate.

  2. Excel is expected to be available to everyone taking this course.

  • In Part 2, we use R Programming to make Numerical Predictions from Time Series Data.

  • In Part 3, we use Python Programming to make Numerical Predictions from Time Series Data.

The course uses simple data sets to explain the concepts and the theory. As we go through the various techniques, we compare them and identify the circumstances in which a particular technique should be applied. We will also apply the techniques discussed in the course to some publicly available data sets.

From time to time, we will add bonus videos of our real-world work on industrial data, in which we apply the Predictive Analytics techniques to create Models for making predictions.

Meet Your Teacher

Partha Majumdar

Just a Programmer

Teacher

Partha started his career in 1989 as a programmer. In his first assignment, he was involved in the development of a Cricket Tournament management system as part of the team from the Centre for Development of Telematics (C-DOT), as requested by the Prime Minister of India, Mr. Rajiv Gandhi. Since then Partha has developed a Tea Garden automation solution, a Hospital Management solution, a Travel Management solution, a Manufacturing Resource Planning (MRP II) solution, an Insurance Management solution and a Tax automation solution (for the Government of Thailand).

Partha got involved in Telecom solutions with a project from Total Access Communications, Bangkok, in 1996. Partha developed the complete solution architecture and designed and developed the complete infrastructure services and primitives on top of whic...

Transcripts

1. Introduction: Making numerical predictions from time series data [inaudible]. The mathematics involved in making predictions has been available since the last century. [inaudible] Now it is possible for almost everyone to make predictions from data. There are many techniques by which we can use the data and predict the future. One such technique uses time series data. Time series data is data collected over a period of time, generally a long time, for a particular kind of observation which is being tracked. It is useful for many purposes: for example, predicting stock [inaudible], how many customers an operator will get, or, for any business, how many customers you can expect to get in a period of time sometime in the future. In this course, we will see how to use time series data for making numerical predictions. We will use various tools. We start with Excel because it is simple, almost everyone has it, and it does not require much programming. Once you understand the concepts involved by making predictions using Excel, we subsequently get into hard-core programming using the R and Python languages, applying the same concepts. So sit back, relax and enjoy the program. Thank you for listening.

2. Course Contents: Hello once again. Welcome to the course Making Numerical Predictions for Time Series Data, Part 1 of 3. My name is Partha Majumdar. Let me walk you through what we will cover in this course. It starts with understanding what time series data is.
Then we begin our journey of making predictions. We start with using moving averages. We cover many techniques: simple moving averages, centered moving averages, weighted moving averages and some more. Then we move on to regression models. We will cover linear regression, exponential regression, power regression, logarithmic regression, quadratic regression, which is a non-linear regression, and polynomial regression, which is also a non-linear regression. We will end the course with how we can select a model with which to make predictions, based on the data that we have, from the different models that we have discussed. So welcome aboard once again. Sit back, relax and enjoy the course.

3. Time Series Data: Generally, it is seen that making numerical predictions, or forecasting, involves studying the behaviour of a characteristic over time and examining the data for any pattern. Forecasts are made by assuming that in future the characteristic will continue to behave according to the same pattern. The data gathered could be sales per day, units of production per week, running costs of machines per month, etcetera. A time series is a collection of observations made sequentially over a period of time. The data on any characteristic collected with respect to time over a span of time is called a time series. Normally, it is assumed that the observations are available at equal intervals, for example on an hourly, daily, monthly or yearly basis. Sometimes a series covers a period of several years as well.

We take a look at some common types of time series data. We have time series data with trend effect, time series data with seasonal effect, and time series data with cyclic effect. We will take a look at each of these in some detail. We first take a look at time series data with trend effect. Trend is a long-term, smooth variation in the time series. A trend can be an upward trend.
For example, the population of India from 1951 till date. Or a trend can be a downward trend, like mortality rates in India. We can have a trend which is neither upward nor downward, or we can have a trend which is sometimes upward and sometimes downward. When values in a time series, on average, show an increasing or decreasing trend over a long period of time, the time series is called a time series with trend effect. Shown here is time series data of the Bombay Stock Exchange. Clearly, this is an upward time series. [inaudible]

Next, we take a look at time series with seasonal effect. If the values of the time series reflect seasonal variation with respect to a given period of time, such as a quarter or a month, the time series is called a time series with seasonal effect. Shown here is the graph of the number of malaria cases over a period of one year. Clearly, you can see that there is seasonality in this time series. Lastly, we take a look at time series with cyclic effect. If a time series exhibits cyclic trends, the time series is called a time series with cyclic effect. Shown here is a cyclic [inaudible] trend.

So far, we have seen different types of time series. You would have noticed that there are various factors due to which there are variations in the time series. These factors, which cause the time series to vary, are also known as the components of a time series. The first component is the trend component. Usually time series data show random variation, but over a long period of time there may be a gradual shift in the mean level to a higher or lower level. This gradual shift in the level of the time series is known as the trend. The second component is the seasonal component. In a time series, variations which occur due to rhythmic or natural forces or factors, and operate in a regular and periodic manner over a span of less than or equal to one year, are termed seasonal variation.
Although the data may be recorded over a span of three months, one month, a week or a day, the total period should be one year to assess the seasonal variation properly. The third component is the cyclic component. Sometimes time series exhibit variation over a fixed period of time due to physical causes. For example, economic data are sometimes thought to be affected by business cycles with a period varying from three years to 10 years. These cycles could be caused by a period of moderate inflation followed by a period of high inflation. However, the existence of such business cycles leads to some confusion about cyclic and seasonal trends. To avoid this confusion, we shall term a pattern in a time series a cyclic component only when its duration is longer than one year. Long-term variations, that is the trend component, and short-term variations, that is the seasonal and cyclic components, are known as the regular variations. Apart from these, random or irregular variations, which are not accounted for by the trend, seasonal and cyclic components, exist in almost every time series. These variations we will refer to as the irregular component.

So we have seen that many factors, natural and man-made, affect a time series. We would now like to describe the effect of these factors on the components of a time series mathematically. We take a look at two very commonly used mathematical models which describe time series data reasonably well. The first model is called the additive model. The second model is called the multiplicative model. We will take a look at both these models in a little more depth. The additive model is based on the assumption that at any time t, the time series value Y_t is the sum of all the components of the time series. Mathematically, we can express this as Y_t = T_t + S_t + C_t + I_t. Here T_t is the trend component. S_t is the seasonal component.
C_t is the cyclic component, and I_t is the irregular component. This is based on the following assumptions. One, the cyclic effect remains constant for all cycles. Two, the seasonal effects remain constant during any year or corresponding period. Three, the effect of irregular variations remains constant. The multiplicative model is based on the assumption that at any time t, the time series value Y_t is the product of all the components. Mathematically, we can write it as Y_t = T_t × S_t × C_t × I_t, where T_t is the trend component, S_t is the seasonal component, C_t is the cyclic component and I_t is the irregular component. The multiplicative model is used when seasonal variations exhibit increasing or decreasing trends. With this brief introduction to time series data, let us next move to making some predictions on time series data using various techniques. Thank you for watching. See you in the next lecture.

4. Descriptive Statistics: In this course, we are trying to predict future outcomes. We start by using descriptive statistics. Descriptive statistics are brief descriptive coefficients which summarize a given data set, either the entire data or a sample of the population. The data we consider for our illustration is taken from a school chain. The school chain orders printers and cartridges for its requirements. Given here are the numbers of printers and cartridges which have been ordered on different days of the year. We have the data about the number of printers ordered, the number of cartridges ordered on the different days, and the total amount spent on school supplies by the school. Our purpose is to be able to predict how many printers and cartridges we should order in the future, such that it does not create a shortfall, nor is it more than what is required. This has to be done based on the historic data that we have collected. We start with some simple techniques. While discussing these techniques,
we will also establish how we find out whether our predictions are good or not. These methods we will carry on to the other techniques that we discuss later in the program. One of the easiest ways to make predictions is to study the central tendencies of the data. We can study central tendencies like the mean, median, mode, etcetera, and base our decision on the future prediction on these values. Let us apply this technique and find out how good or how bad our predictions would be if we use it. Even today, there are many people who find the mean or the median of the data and use it as the predicted value for ordering quantities for a future day. It turns out that the mean or median may give a value which will not be very accurate. However, the chances that it could lead to a shortfall will be minimal, and the chances that it leads to over-ordering will also be very minimal. We will then find the standard deviation of the data, which gives a measure of the spread of the historic data. By finding the standard deviation, we can make a judgment regarding by how much our mean could be off the value which we are looking for. To check the goodness of the prediction, we will compute the mean absolute error, the mean squared error and the mean absolute percentage error. We will discuss later, after the illustration, which of these errors is more appropriate in which circumstance.

Now let's look at the data. We have the data regarding the day, the number of printers ordered, the number of cartridges ordered and the school supplies spending. We have 75 data points. We find the mean, the median and the standard deviation. To find the mean, we use the Excel function AVERAGE. So we say AVERAGE and then give all the data for the printers ordered on all the days. We do the same thing for the number of cartridges ordered on all the days, and we find the mean of the number of cartridges ordered.
To find the median, we use the Excel function MEDIAN and give the complete range of data. The median of the number of printers ordered is 15. Similarly, we find the median of the number of cartridges ordered, and it is 57. Now, to find the standard deviation, we use the Excel function STDEV and give the complete range of data. The standard deviation is 2.97. So if the distribution is normal, we can say that the mean will vary by one standard deviation, that is by about three printers. We do the same for cartridges as well.

Now, let's use the mean as our forecast. That is the first strategy which we will employ. We say that our forecast for every day is the mean value that we have. So we will forecast ordering 15 printers on every day. Let's see how good this forecast is. We find the error in the forecast. To find the error, we subtract the forecast value from the actual. Now we copy this for all the days. We see that sometimes the forecast is accurate, there are forecasts where we have ordered more, and there are forecasts where we have ordered less. As we have both positive and negative figures, we find the absolute value, that is the absolute error. So we say ABS on the error. That gives the absolute error. We copy this across to find the absolute error for all the days. To find the mean absolute error, we find the average of all the absolute errors. So the mean absolute error is 2.5067.

Next, let's find the mean squared error. To find the mean squared error, we find the squared error for each day, that is, we take the error and square it. Now we copy this for all the different days. So we have the value of the squared error on each day. To find the mean squared error, we take the average of all the squared errors. So there we have the mean squared error. Now, the mean squared error is in squared units.
So to bring it into units of number of printers, we find the root mean squared error. To do that, we take the square root of the mean squared error. This is the root mean squared error. Next, let's find the mean absolute percentage error. To do this, we find the absolute value of the error divided by the actual number of printers ordered. Now we copy this for all the different days. We express this in percentage terms and increase the number of decimal places to two. There we have the absolute percentage error. To find the mean absolute percentage error, we take the average of the absolute percentage errors. The mean absolute percentage error is 17.74%. So we have seen a simple way in which we could forecast from historical data.

Now, let's take a closer look at what the data looks like. We will do some visualization. We will begin by creating a chart from the data to see if we can find any trends in the data. Then we will create a histogram from the data, and in the process of creating the histogram, we will get introduced to the Data Analysis ToolPak provided by Excel. So we are back in Excel. Here we have the data. First, let's create the chart. We select the data for the day number, the printers and the cartridges, and then we insert a line chart. It is inserted here; let's move it to a convenient location. Let's make some adjustments to the data series. You see that three data series have been taken, but actually the day should be on the X axis. So we put the day on the X axis and remove it from the data series being charted. Now we have the chart, which shows the trend of how the printers and the cartridges have been ordered over a period of 75 days. Next, let's make the histogram. To make the histogram, we need to create the bins.
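The mean-as-forecast strategy and the four error measures from this lesson (mean absolute error, mean squared error, root mean squared error, mean absolute percentage error) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `mean_forecast_errors` and the four order quantities are made up for the example and are not the course's 75-day data set.

```python
from math import sqrt
from statistics import mean


def mean_forecast_errors(actual):
    """Use the mean of `actual` as the forecast for every day and score it.

    Assumes every actual value is non-zero, because the percentage error
    divides by the actual value.
    """
    forecast = mean(actual)
    errors = [a - forecast for a in actual]      # actual minus forecast
    abs_errors = [abs(e) for e in errors]        # like Excel's ABS
    mse = mean(e * e for e in errors)            # average of squared errors
    return {
        "forecast": forecast,
        "mae": mean(abs_errors),
        "mse": mse,
        "rmse": sqrt(mse),                       # back in units of printers
        "mape": 100 * mean(ae / a for ae, a in zip(abs_errors, actual)),
    }


# Hypothetical order quantities, not taken from the course workbook.
orders = [12, 15, 18, 15]
result = mean_forecast_errors(orders)
print(result)
```

The same function can score any other forecasting rule if the single mean is replaced by a list of per-day forecasts, which is how the later lessons compare techniques.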
Let's first find out the maximum number of printers ordered and the minimum number of printers ordered. So we say MAX and give the range of the data regarding the number of printers ordered. The maximum number of printers ordered on a single day was 20. Similarly, we find the minimum number of printers ordered on a single day, and it is 10. So now we can create the bins. We start with nine and create bins for every individual integer number of printers ordered. As the maximum number of printers ordered is 20, we create bins up to 20.

To create the histogram, we will use the Data Analysis ToolPak. So go to the add-ins and include the Data Analysis ToolPak. Once it is included, go to the Data tab, and there you will see that the Data Analysis ToolPak is available. From the Data Analysis ToolPak, select Histogram. For the input range, we provide the data regarding the number of printers ordered on the different days. As we have included the label, we tick Labels. Now we define the range of the bins, and we define a place where the output will be provided. Among the remaining options, we ask it to also create a chart of the histogram. Now we click OK. Here we see that the frequency table of the different numbers of printers ordered has been created, and here is the histogram chart which is created by Excel. Let's move it to a convenient location. So that concludes our demonstration of how to create histograms. Thank you for listening. See you in the next lecture.

5. Moving Averages: Hello. Welcome back. In the last lecture, we saw that we could use the average of all the data points as the prediction of what we should order in a given situation. Using averages has the disadvantage that we do not consider what is happening recently with the data. One way to overcome this limitation is to use moving averages.
We will see that in this lecture. In a moving average forecast, we consider the n most recent data points for use as the forecast, instead of taking the average of the complete data set. We decide how many points of recent data we will consider, and we take the average of those to use as the forecast. It is denoted by MA(n). Here n is a number which determines how many points we will consider. For example, MA(3) means we will consider the three most recent data points. To understand the mechanics, let's go through an illustration. Here we have the data regarding the day and the number of printers ordered on that particular day. We have 75 data points.

Now let's find the moving average for two days, that is MA(2). For the 75th day, we take the 73rd and 74th day data, that is 11 plus 14 divided by two, which gives 12.5. We round it off to get 13. Now for the 74th day we take the previous two days, that is the 73rd and 72nd day, and we get 14 plus 10 divided by two, which is equal to 12. So the predicted value for the 74th day would be 12 printers. We do the same for the 73rd day. Here we take the data for the 72nd and 71st day, that is 10 plus 14 divided by two, which gives 12. So according to moving averages, the prediction for the 73rd day is 12 printers to be ordered.

Now let's do the same for moving averages for three days. For the 75th day we take the data from the 74th, 73rd and 72nd day. So we get 11 plus 14 plus 10 divided by three, which is 11.67. Rounding it off, we get 12. So for the 75th day our prediction is to order 12 printers. We do the same for the 74th day, where we take the data from the 73rd, 72nd and 71st day. This gives 13 printers to be ordered on the 74th day. Now that we know how to use moving averages to predict a value in the future, we will use three-day moving averages to predict the future value for day 76.
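The hand calculations in this lesson can be reproduced with a short Python sketch. The order quantities for days 71 to 75 are the values read out in the lecture; the function names `moving_average_forecast` and `round_half_up` are made up for this example. Rounding half up is an assumption chosen to match the lecture's rounding of 12.5 to 13, since Python's built-in `round` would give 12.

```python
from math import floor


def moving_average_forecast(orders, day, n):
    """Forecast `day` as the mean of the n days immediately before it."""
    window = [orders[d] for d in range(day - n, day)]
    return sum(window) / n


def round_half_up(x):
    """Round to the nearest whole printer, with .5 going up."""
    return floor(x + 0.5)


# Printers ordered on days 71 to 75, as read out in the lecture.
orders = {71: 14, 72: 10, 73: 14, 74: 11, 75: 14}

ma2_day75 = moving_average_forecast(orders, 75, 2)   # (14 + 11) / 2
ma3_day75 = moving_average_forecast(orders, 75, 3)   # (10 + 14 + 11) / 3
ma3_day76 = moving_average_forecast(orders, 76, 3)   # (14 + 11 + 14) / 3

print(ma2_day75, round_half_up(ma2_day75))
print(ma3_day75, round_half_up(ma3_day75))
print(ma3_day76, round_half_up(ma3_day76))
```

Keeping the forecast for a day strictly to earlier days mirrors the lecture: the value being predicted is never part of its own window.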
So we take the day 75, 74 and 73 values, and they are 14 plus 11 plus 14 divided by three, which gives us the day 76 prediction by moving averages of three days as 13.

Now, let's do these calculations in Excel. We have our data, and we see that we have 75 points. To calculate moving averages for 10 days, we take the 10 most recent days before the 75th day, that is from the 74th day to the 65th day. So this is the 10-day average that we are finding. We copy this across up to the first day for which we can find this moving average, that is day 11. So we have found the moving averages from the 11th day to the 75th day. Now let's see how good this is for making predictions. To find the error in the prediction, we subtract the predicted value from the actual value. So we find actual value minus predicted value; that is the error in the prediction. We copy this across all the predictions that we have made. As we see, the error is both negative and positive; sometimes we overestimated and sometimes we underestimated.

Let's find the mean absolute error. To do that, we find the absolute value of the error in each prediction. We copy this across. Here we have the absolute value of the error in each prediction. To find the mean absolute error, we take the average of all the absolute errors that we have found. There we have the mean absolute error, which is 2.6385. Next, let's find the mean squared error. To find the mean squared error, we square the error that we have. We copy this formula across to all the predictions that we have made. Here we have the squared error of each prediction. To find the mean squared error, we find the average of all the squared errors. So we say AVERAGE of all the squared errors. There we have the mean squared error. Now we find the root mean squared error to bring it to the same units as the number of printers.
So we take the square root of the mean squared error. Next, let's find the mean absolute percentage error. To do that, we find the absolute percentage error of each of these predictions. It is the absolute value of the error divided by the actual number ordered. There we have the figure. Let's copy this across for all the predictions that we have made. Let's express this as a percentage and increase the number of decimal places to two. Here we have the absolute percentage error of each prediction. To find the mean absolute percentage error, we find the average of all the absolute percentage errors. There we have the mean absolute percentage error, which is 18.96%.

So that is one method of calculating moving averages using Excel. In Excel, it is far easier to use the Data Analysis ToolPak to calculate moving averages. Let's see how that is done. To use the Data Analysis ToolPak, we first need to add this add-in to Excel. Once we have the Data Analysis ToolPak available in Excel, we can use it to invoke the Moving Average tool. After invoking the Moving Average tool, we need to pass the required parameters in the dialog which is displayed, and then select the options we require for the output. With that, we get the moving averages. Let's see a demonstration of this.

We are back in Excel, and this time we find 15-day moving averages. First we add the add-in, that is the Data Analysis ToolPak. We go to Tools and then Add-ins. Once it is added, go to the Data tab, and there you find the Data Analysis ToolPak. Invoke the Data Analysis ToolPak to find the tool called Moving Average. Once you invoke that, it will ask you for the range of data for which the moving averages have to be found. We give the data regarding the printers, and as we have included the header, we tick Labels in First Row. We need to find 15-day moving averages.
So I keep 15 as the interval. Now we select the output range, that is, the cell from which the moving averages have to be displayed. Once you have done that, we select the options so that a chart and the standard errors are displayed, and we say OK. It calculates the moving averages. You see that for the initial rows it is not able to calculate, because it doesn't have 15 days of data, and then it starts computing the moving averages from the 16th day onwards. Now look at this error, which is also calculated. It is using an Excel function. Now let's look at the chart. It has provided the chart of the moving averages. We make a modification to the chart: on the X axis we have days, and on the Y axis we have the number of printers that we have ordered. [inaudible]

We take a look at the formula used by the Moving Average tool of the Data Analysis ToolPak for calculating the standard error. The formula uses the square root of SUMXMY2. The function SUMXMY2 calculates the sum of squared errors. In this context, we pass the actual values as the first parameter and the predicted values as the second parameter, so it gives the sum of squared errors between the actual values and the predicted values.

So we have seen how to compute moving averages. We also saw how to predict using moving averages for one period ahead. Now let's see how we can predict deep into the future using moving averages. To do this, we first plot a trendline of the calculated moving averages. Then we determine the equation for this trendline. Once we have the equation, we will use it for making the required predictions. Let's see an illustration. Here we have the moving averages for 10 days. I have just inserted a row here. This is not required, but nevertheless, for simplicity of illustration, I show this.
So I put the day number there, and we put the same day number as we have got in column A, and copy this across, so that we have all the day numbers. Now we have the day number and the moving average for 10 days next to each other. We use this data for creating the chart, so we select the range of data for which we want to create the chart, and then we insert an XY scatter. There we have it. We move it to a convenient location. Now we edit the data series so that we are only pointing to the data in the range of moving averages that we calculated. So we take the X-axis values and change from row number 2 to row number 12. We do the same for the Y-axis values also. So now we have it. Now we try to add the trendline. You see that when I right-click on the chart, it does not show the option for creating a trendline. If you click inside the chart, the same thing happens. What you should do is click on the line in the chart, and then you see the option for adding a trendline. We select that option. We see that we are creating a linear trendline. Now choose the option to show the equation. Now you see that the equation is displayed for the trendline that we have generated from this data. So we see that the equation for the trendline is a straight line with an intercept of 15.165. Here x is the day number and y is the number of printers that we should order on a given day. So when we give a particular day number, we can determine how many printers we should order on that particular day. For example, if we wanted to predict the number of printers to order on day number 85, we replace x with 85, and on evaluating the equation we get y equal to 15.63. Thank you for listening. See you in the next lecture. 6. Centered Moving Averages: Hello. Welcome back. In the last lecture we discussed moving averages. A variation of moving averages which is more commonly used is called centered moving averages.
We will discuss centered moving averages in this lecture. We discuss centered moving averages through an illustration. So here we have the data regarding the number of printers ordered on given days. We have data for 75 days, and we have the details regarding how many printers were ordered on each of the days. So here is the data. We will try to calculate the centered moving averages for three days. As we are calculating the centered moving average for three days, we require three data points. One data point is required to be on the day for which we are computing, one data point is required before the day, and one data point is required after the day, so that the centermost point is on the day for which we are calculating. So for calculating the centered moving average for day 74, we require the data from days 73, 74 and 75. So the centered moving average for day 74 is 14 plus 11 plus 14, divided by three, which is equal to 13. Let's go through one more illustration. To compute the centered moving average for three days for day 73, we require the data from days 72, 73 and 74. So we get 10 plus 14 plus 11, divided by three, which is 11.667. Now that you have understood how to calculate the centered moving average for three days, I leave the rest of the calculations to you. I have put the figures as calculated. You can verify them by calculating yourself. As you would have guessed, the centered moving averages method is suitable for an odd number of periods, like the three days for which we are computing the moving averages. We can compute centered moving averages for an even number of periods as well; however, we will not discuss that here. It is quite simple. You can study it on your own. Now let's see this illustration using Excel. So here we have the data. We will calculate moving averages for nine days, so we take the last point for which it is possible, that is, the 71st day.
We calculate the average for the nine days: four days before, four days after, and the center point on the 71st day. So here we have our nine data points, and we take the average. Now we copy this up to the point where it is possible to calculate the nine-day centered moving average. So we go up to the fifth day, and we calculate the centered moving average for nine days like before. Let's check how good this prediction is. So we find the error by subtracting the predicted value from the actual value. We copy this for all the data points where we have calculated the nine-day centered moving average. So there we have it. Let's calculate the mean absolute error. For that, we get the absolute error by using the ABS function. We calculate the absolute error for all the moving averages that we calculated. We have it there. We find the mean absolute error by finding the average of all the absolute errors that we have computed. So the mean absolute error is 2.5091. Next we compute the mean squared error; that is, we take the error and square it. We copy this across for all the predictions that we have made using the centered moving average for nine days. Now we compute the mean squared error by averaging all the squared errors that we have computed. We see that the root mean squared error is also computed; it is 2.856. Now let's calculate the mean absolute percentage error. So we take the absolute value of the error and divide it by the actual number of printers ordered. So we have the absolute percentage error for one prediction. We copy this for all the predictions. Now we find the mean absolute percentage error by finding the average of all the absolute percentage errors that we have computed. So the mean absolute percentage error is 17.64%.
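The centered moving average and the three error measures used above (mean absolute error, root mean squared error and mean absolute percentage error) can be written compactly in Python. This is a minimal sketch, assuming NumPy; the helper names `centered_moving_average` and `error_metrics` are hypothetical, and the only data used are the four order counts for days 72 to 75 that are quoted in the worked examples (10, 14, 11, 14), not the full 75-day sheet.

```python
import numpy as np

def centered_moving_average(values, window):
    """Centered moving average for an odd window; NaN where the window does not fit."""
    if window % 2 == 0:
        raise ValueError("this sketch only handles an odd window")
    values = np.asarray(values, dtype=float)
    half = window // 2
    result = np.full(len(values), np.nan)
    for i in range(half, len(values) - half):
        result[i] = values[i - half : i + half + 1].mean()
    return result

def error_metrics(actual, predicted):
    """Return (MAE, RMSE, MAPE in percent), skipping positions without a prediction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = ~np.isnan(predicted)
    error = actual[mask] - predicted[mask]           # actual minus predicted
    mae = np.mean(np.abs(error))
    rmse = np.sqrt(np.mean(error ** 2))
    mape = np.mean(np.abs(error) / actual[mask]) * 100
    return mae, rmse, mape

# Orders on days 72, 73, 74, 75 as quoted in the worked examples.
tail = [10, 14, 11, 14]
cma = centered_moving_average(tail, 3)               # values exist only for days 73 and 74
mae, rmse, mape = error_metrics(tail, cma)
print(cma[1], cma[2])
print(mae, rmse, mape)
```

With only two usable predictions the metrics here are purely illustrative; on the full sheet the same functions would be applied with a window of nine.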
Now let's determine the equation for making predictions in the future. So we take the data and we insert an XY graph. There we have it. We move it to a convenient location. We just format the data series so that we only plot the data for the days for which we have computed the moving averages. So we go to the data. The X axis contains the day number. We just change the row numbers, from and to, that we need to consider. We have done it for the X axis. Now we do the same for the Y axis. On the Y axis we have the moving averages for nine days, that is, the centered moving averages for nine days as we have computed. So now we have our chart. Let's find the equation for the trendline. We select a linear equation, and we say show the equation. So there it is. We just format the equation so that we can see it more clearly. Now let's use this equation to make predictions. The equation that we have got is a straight line with an intercept of 15.142. So if we replace x by the day number, we should get the number of printers to order on that particular day, which is the value y. Using the equation, let us predict the number of printers to order on day number 85 as an example. So we replace x by 85 in the equation, and y comes to 15.474. So according to the centered moving averages for nine days, we should order 15.474 printers on day number 85. Thank you for listening. See you in the next lecture. 7. Weighted Moving Averages: Hello. Welcome back. Now we look at another variation of moving averages, which is known as weighted moving averages. Weighted moving averages are especially useful when we have to give extra weightage to a particular characteristic of the data, for example, when we have to give extra weightage to the recency of data. We will see how this is done using weighted moving averages. In the simple moving averages and the centered
moving averages that we have studied so far, we give equal weightage to all the observations. As a result, the prediction may not be very close to the latest value. Now we know that the recency of data plays a huge part in making accurate predictions, so we need to give consideration to the recency of data. Here you could refer to the RFM model; though it is a marketing model, it can be used in various circumstances. RFM stands for recency, frequency and monetary. In the weighted average method, unequal weights are given in such a way that all the weights are non-negative and their sum is equal to one. Mathematically, we can write it as follows: $y_t = \sum_{i=-k}^{k} w_i \, x_{t+i}$, where $w_i \ge 0$ and $\sum_{i=-k}^{k} w_i = 1$. Let's take the formula and look at an example. Consider that we are calculating the weighted moving average for three periods. To calculate the value for $y_2$, we require $y_2 = w_1 x_1 + w_2 x_2 + w_3 x_3$, where $w_1, w_2, w_3$ are weights and $x_1, x_2, x_3$ are actual values. As one strategy, we can assign $w_1 = 1/6$, $w_2 = 2/6$ and $w_3 = 3/6$, so that $w_1 + w_2 + w_3 = 1$. Let's understand weighted moving averages through an illustration. Here we have the data regarding the printers ordered on different days, so we have the day number and the number of printers ordered on each of the days. So here is the data. We have data for 75 days. We will calculate the centered weighted moving average for three days. Let's start with calculating for day number 74. For day number 74, we require data from days 73, 74 and 75. So it is one sixth of 14 plus two sixths of 11 plus three sixths of 14, which is equal to 13. So the prediction is 13.
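The three-day centered weighted average for day 74 can be checked with a few lines of Python. This is a minimal sketch, assuming NumPy; `centered_weighted_average` is a hypothetical helper name, and the order counts are the ones quoted in the worked examples for days 72 to 75.

```python
import numpy as np

# Orders on days 72, 73, 74, 75 as quoted in the worked examples.
orders = {72: 10, 73: 14, 74: 11, 75: 14}

# Weights 1/6, 2/6, 3/6: non-negative, summing to one, heaviest on the latest day.
weights = np.array([1, 2, 3]) / 6

def centered_weighted_average(day):
    """Weighted average of the day before, the day itself and the day after."""
    window = np.array([orders[day - 1], orders[day], orders[day + 1]], dtype=float)
    return float(np.dot(weights, window))

print(round(centered_weighted_average(74), 2))  # 13.0
```

The dot product plays the same role as a sum of products over the weights and the window, so the same function works for any odd window once the weight vector is extended.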
Let's consider one more example to understand this. Let's calculate for day number 73. To calculate for day number 73, we require data from days 72, 73 and 74. So we take one sixth of 10 plus two sixths of 14 plus three sixths of 11. This is equal to 11.83. So the prediction for day number 73 is 11.83. Next, let's see an illustration of how to calculate the weighted moving average using Excel. Well, here we are in Excel. You will be familiar with the data by now. We will calculate the weighted moving average for seven days. So first we set the weights. Let's give the weights as 1, 2, 3, 4, 5, 6, 7 for the seven days, with seven being the weight given to the most recent day. We first sum up all the weights, and it gives us 28. Now we give the multiplier for each day. For doing that, we divide the weight by the total sum of all the weights. So there we have it. For day one, it is 0.0357. Now we do the same for all the other days. So there we have the weights that are assigned to each of the days for which we have to calculate the moving average. Now we start from the fourth day, because there are three days before and three days after this day, with the fourth being the centermost. So we are finding a centered moving average which is weighted. We find the sum product here, because we have to multiply by the weights. So we take the seven days of data for which we have to calculate the weighted moving average, and we multiply by the multipliers. Notice here the use of the function SUMPRODUCT. Now we copy this across up to the last day for which it is possible to calculate the centered weighted moving average for seven days. So there we have the weighted moving average for all the days for which we can calculate it, based on the data that we have. Next, like we have done before, we will see how good this estimation, or prediction, has been. For this we first find the error. To find the error, we subtract the predicted value from the actual number of printers ordered.
Now we copy this formula for all the days for which we have computed the prediction. Next we find the mean absolute error. For that, we find the absolute value of the error that we have found. We use the ABS function to find the absolute value, and we copy this formula across for all the predictions that we have made. And there we have it. Now, to find the mean absolute error, we take the average of all the absolute errors that we have just calculated. So it is AVERAGE, and then we give the range of all the absolute errors that we have calculated. So the mean absolute error is 2.5331. Next we calculate the mean squared error. For this, we calculate the square of the error of each prediction. So we square the error, and then we copy it across for all the predictions that we have made. Now we take the average of all the squared errors to find the mean squared error. There we have the mean squared error. To find the root mean squared error, we take the square root of the mean squared error. Next we calculate the mean absolute percentage error. So we find the absolute value of the error divided by the actual number of printers ordered. Now we calculate the absolute percentage error for all the predictions that we have made by copying this formula. We find the mean absolute percentage error by taking the average of all the absolute percentage errors that we have just calculated. The mean absolute percentage error is 17.93%. Now, let us use these predictions for making predictions in the future. For that, we draw an XY graph of the predicted values. So we select the data, and then we insert an XY scatter. We move it to a convenient location first. As we have done before, let us give only the data in the range for which we have computed. We started from day number four, that is, row number five, and go till day number 72. So we go to the data series and make the change. See that we have the day number on the X axis and the weighted moving average on the Y axis.
Now that we have done that, let us add the trendline. We make a linear trendline, and we say show the formula. Just move the formula to a convenient location so that we can see it. So there we have done it. Now let's use the formula to make predictions using centered weighted moving averages. The formula we got was: y is equal to minus 0.005x plus 15.196. So for every x we can find the value of y, x being the day number and y being the number of printers that we have to order. Let us see an example. We predict for day number 85 using centered weighted moving averages. So y is equal to minus 0.005 into 85 plus 15.196. So we get y equal to 14.771. So according to the centered weighted moving averages for seven days that we have computed, we should be ordering 14.771 printers on day 85. Thank you for listening. See you in the next lecture. 8. Standard Deviation of Moving Averages: So far, we have discussed a few techniques for making predictions. Now it should be recognized that all the techniques we have discussed so far are good for making short-term predictions. These are not techniques for making long-term predictions. We need to further find out with what confidence we can use these predictions. For that, we need to find the standard deviation of the predictions that we have made. We will discuss it in this lecture. Before we discuss standard deviation, let's look at the results we obtained from the different examples that we discussed. We used different methods: simple moving averages, centered moving averages and weighted moving averages. We used different periods for each of the examples, and we have the results. We saw the mean absolute error, the root mean squared error and the mean absolute percentage error. All of these are shown here. Notice that the results are better in some cases than in others.
So one of the things which is the groundwork of machine learning or artificial intelligence is that we have to conduct a number of experiments before we can select the technique which we will use for making our predictions. Also, it needs to be noted that, depending on the data, different techniques will work better under different circumstances. Though it should be obvious, I need to state that the smaller the error in the prediction, the better the prediction technique. Please note that for explaining the concepts we have been using a data set of 75 data points. Now this is a very, very small data set compared to what we would normally use for machine learning and artificial intelligence. Now, when we have a large data set, the law of large numbers applies. One conclusion commonly drawn from it is that the data will tend to fall under the normal distribution curve. The normal distribution curve has the property that if a value is within one standard deviation of the mean, then there is roughly a 68% chance that the prediction would be correct. Also, if we make a prediction which is within two standard deviations of the mean, then the prediction has a 95% chance of being correct. For the time being, let's consider that the data we have used in the example is normally distributed. Under that circumstance, we have made predictions. If we find the standard deviation of these predictions, then we can confidently say to what extent our predictions will be correct, that is, whether a figure is within roughly 68% or within 95%, depending on whether it is one standard deviation away or two standard deviations away. Let's discuss the formula to use for calculating the mean and standard deviation of predictions made using moving averages. The mean of the prediction is equal to the mean of the observations. Now, when you are using moving averages, the number of observations needs to be
the number of periods considered in the moving averages. So when we are doing a moving average of 10, we need to take 10 observations as the observation set for which we have to find the mean. The standard deviation of the prediction is equal to the standard deviation of the observations plus the standard deviation of the observations divided by the square root of n, where n is the number of observations that we are considering. Let's understand this through an example. To understand the calculations, we will use the example used in discussing weighted moving averages for seven periods. Now, I will not do the calculations in Excel. As I have provided the Excel sheet as a resource in this course, I would expect that you do the calculations yourself and verify the results that I discuss here. The mean of the prediction is equal to the mean of the observations, which is about 15.09. The standard deviation of the observations is 3.0334. So the standard deviation of the prediction is equal to 3.0334 divided by the square root of seven, plus 3.0334, which is equal to 4.1799. So we notice that the mean plus one standard deviation is equal to 19.272 and the mean minus one standard deviation is equal to 10.913. So if our prediction is in the range of 10.913 to 19.272, there is roughly a 68% chance that our prediction will be correct. Thank you for watching. See you in the next lecture. 9. Seasonality: Hello. Welcome back. So far in the program, we have been discussing techniques for predicting when we see trends in the time series data. We have gone through the techniques of using moving averages and some variations of the moving averages technique. Later in the program, we will discuss more techniques for predicting trends. Now we take a short diversion and look at another component of time series data, that is, seasonality.
Though there is overlap, there are specific techniques which we need to apply when we are trying to predict for data which has a seasonality aspect in it. We will discuss these techniques in this lecture. We will use the same data that we have been using in the program so far and see how we can apply techniques for seasonality. So we have the data regarding the number of printers ordered on the days of the year. As we have seen before, we have 75 points of data for the number of printers ordered on each of the days. We will first re-tabulate this data so that we can look for seasonality. Now we have the day numbers from 1 to 75. We will consider that each of these is a day of the year, and so each must have a weekday, which is Sunday, Monday, Tuesday, etcetera. For the weekday numbers, we will consider 0 as Sunday and 6 as Saturday. We will start by assigning a weekday to each of the days in the data. For calculating the weekday, we will use the formula day modulo 7, which in Excel is MOD(day, 7). You can see the weekday as we have calculated. So now we have data where we know how many printers were ordered on Monday, Tuesday, Wednesday, up to Sunday. Now each of the weekdays will belong to a particular week in the year. We will assume that the days start from the beginning of the year, and so we will start with week number one. For calculating the week number, we will divide the day by seven, take the integer part and add one to it. In Excel we can use the FLOOR function: FLOOR(day / 7, 1) + 1. So now we see that each data point has been given a weekday and a week number. So we have data from week number 1 to week number 11, and in each of the weeks we have data for weekdays 0 to 6. Next we re-tabulate the data such that we put all the data regarding a particular weekday in a single cluster. We will see how this is done. On the X axis, we will put the week number. On the Y axis, we will put the weekday.
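The weekday and week-number formulas, and the re-tabulation into a weekday-by-week grid, can be sketched in Python as follows. This is a minimal sketch, assuming NumPy; the order counts are randomly generated placeholders and are not the course data, and the variable names are hypothetical.

```python
import numpy as np

days = np.arange(1, 76)            # day numbers 1..75
weekday = days % 7                 # same as MOD(day, 7): 0 = Sunday ... 6 = Saturday
week = days // 7 + 1               # same as FLOOR(day / 7, 1) + 1

# Placeholder order counts (NOT the course data), one per day.
rng = np.random.default_rng(0)
orders = rng.integers(10, 21, size=days.size).astype(float)

# Grid with one row per weekday (0..6) and one column per week (1..11).
grid = np.full((7, week.max()), np.nan)
grid[weekday, week - 1] = orders

print(grid.shape)                  # (7, 11)
```

Two cells of the grid stay empty (NaN) because 75 days do not fill 7 x 11 = 77 slots: there is no day 0 in week 1 and no day 76 in week 11.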
So we will have all the data regarding weekday number zero on a single line, all the data regarding weekday number one on a single line, and so on up to weekday number six. Now, since we have data for 11 weeks, we will put the 11 weeks of data on the X axis. Our goal is to see whether we can see any seasonality in the data across the different weekdays that we have got data for. Now we plot the data on this chart, all the 75 data points. Now, this is test data. However, it will be sufficient for us to explain the concept. Now you can see the graph of this data. We can see that there is a little variation. Maybe this is a seasonal variation. The first step is to determine the sample mean. We find that the sample mean is 15; that is, we take the average of all the data points, and that is 15. The next step is to find the seasonal means. In this case, we calculate the mean of all the data points for a particular day of the week. So now we have the values of the seasonal means. The next step is to calculate the seasonal factor. The seasonal factor is calculated as the seasonal mean divided by the sample mean. We see that the seasonal factor for day zero is 0.88. That means we have been ordering 88% of the mean value on day zero. Similarly, on days five and six, we have been ordering 108% of the mean demand. Now, if you have data regarding tourists visiting hill stations, for example, you will see that there is a huge variation between the tourists arriving in summers and the tourists arriving in winters. The next step in the process of predicting for time series with a seasonal effect is deseasonalizing. So here we take each of the individual observations and divide it by the seasonal factor; for example, 19 divided by a seasonal factor of about 1.02 gives about 18.7. So we do it for all the data points that we have got in our observation set. Once we have got the deseasonalized data, we will use the deseasonalized data for making predictions. We will see how this is done.
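The steps so far (sample mean, seasonal means, seasonal factors and deseasonalizing) can be put together in a short Python sketch. This is a minimal sketch, assuming NumPy; the 14 order counts are made up for illustration and are not the course data, and `seasonal_factors` and `deseasonalize` are hypothetical helper names.

```python
import numpy as np

def seasonal_factors(values, season):
    """Seasonal mean of each season divided by the overall sample mean."""
    values = np.asarray(values, dtype=float)
    season = np.asarray(season)
    sample_mean = values.mean()
    return {int(s): values[season == s].mean() / sample_mean for s in np.unique(season)}

def deseasonalize(values, season, factors):
    """Divide each observation by the seasonal factor of its season."""
    values = np.asarray(values, dtype=float)
    return values / np.array([factors[int(s)] for s in season])

# Hypothetical orders for days 1..14 (NOT the course data).
days = np.arange(1, 15)
orders = np.array([16, 15, 14, 15, 17, 18, 12, 15, 16, 13, 14, 16, 17, 13], dtype=float)
weekday = days % 7                     # 0 = Sunday ... 6 = Saturday

factors = seasonal_factors(orders, weekday)
adjusted = deseasonalize(orders, weekday, factors)

for day_of_week, factor in sorted(factors.items()):
    print(day_of_week, round(factor, 3))
```

A factor below 1 marks a weekday with lower-than-average orders, and a factor above 1 marks a weekday with higher-than-average orders, which is how the 0.88 and 1.08 figures above are read.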
First, we will put this deseasonalized data into a normal series, just like the original data set. So here we have the original data. We put in the weekdays that we computed earlier, and then we put in the deseasonalized data that we have just computed. You see that we had the seasonal factors. We will require these for making the prediction, so we put them here. Now we make a prediction for day 76 using moving averages for four days. We notice that for day 76 the weekday is number six, so the seasonal factor will be 1.08. Using moving averages for four days, we get 12.1 as the moving average. So to get the prediction for day 76, we take the moving average that we have found and multiply it by the seasonal factor, 12.1 into 1.08, which on the sheet comes to 13.12. So the prediction for day 76 is 13.12. This completes our discussion on making predictions for seasonal data. Thank you for watching. 10. Correlation: Hello and welcome. So far we have used descriptive statistics and variations of moving averages to make numerical predictions. Now we move to the next stage, where we will try to use linear regression. However, before we discuss linear regression, let's understand what correlation is. Correlation is a statistical association. Correlations are very useful because they can indicate a predictive relationship that can be exploited in various analyses. Correlation is synonymous with dependence, and it is represented by the Greek letter rho. So when we say $\rho_{XY}$, it means the correlation between the variables X and Y. Now we discuss the mathematical formula for correlation: $\rho_{XY} = \frac{\sum_i (x_i - \mu_X)(y_i - \mu_Y)}{(n-1)\,\sigma_X \sigma_Y}$, where $\mu_X$ is the mean of X, $\mu_Y$ is the mean of Y, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of
X and Y. This can be rewritten as $\rho_{XY} = \frac{\sum_i (x_i - \mu_X)(y_i - \mu_Y)}{\sqrt{\sum_i (x_i - \mu_X)^2 \, \sum_i (y_i - \mu_Y)^2}}$. For the sake of completeness, let us note that $\mu_X$, the mean of X, is equal to $\frac{1}{n}\sum_i x_i$. Similarly, $\mu_Y = \frac{1}{n}\sum_i y_i$. Now the correlation $\rho_{XY}$ is always between minus one and one. If it is one, it means absolute positive correlation; if it is minus one, it represents absolute negative correlation. If the value of $\rho_{XY}$ is zero, there is no correlation between the two variables. If the value of $\rho_{XY}$ is greater than zero, then we say that X and Y are positively correlated. If the value of $\rho_{XY}$ is less than zero, we say that the X and Y variables are negatively correlated. Now let's see how we can calculate the value of correlation using the formula that we have just discussed. So here we see the data which we will use for the illustration of the calculation of correlation. We have the data regarding printers, cartridges and school supplies. We have data for 75 days, so you can see the 75 data points that we have got. The first thing we will do is calculate the mean for each of the data columns, that is, printers, cartridges and school supplies. Now, to calculate the mean, or mu, we take the sum of all the data points that we have for printers, divided by the total number of data points that we have got. We can find the total number of data points by using the COUNT function in Excel. So we say COUNT and give the range of data. So SUM divided by COUNT gives us the value of mu. We could also have found this by using the AVERAGE function in Excel, but we are using the other functions as we have been discussing so far, and to make the calculation more automatic, so that the numbers get calculated automatically. Now we have found the value of mu, that is, the mean.
Next, as per the formula, we are going to calculate $x_i - \mu_X$. So this is $x_i$ minus the mean, which I have anchored. So there we have it. Now I can calculate $x_i - \mu_X$, $y_i - \mu_Y$ and $z_i - \mu_Z$ by just copying this formula across. So there we have it. Here we are representing printers by X, cartridges by Y and school supplies by Z. Next we find $(x_i - \mu_X)$ into $(y_i - \mu_Y)$. Similarly, we find $(x_i - \mu_X)$ into $(z_i - \mu_Z)$, and lastly we find $(y_i - \mu_Y)$ into $(z_i - \mu_Z)$. I will calculate these values for all the data points by just copying this formula across. So there we have it. So we have got the terms which we require for the numerator. Now let's calculate the terms for the denominator. We have to find $(x_i - \mu_X)$ whole squared, so we say $x_i$ minus $\mu_X$, caret two. So we get that number. Now I can copy this across to get all the values for $(x_i - \mu_X)^2$, $(y_i - \mu_Y)^2$ and $(z_i - \mu_Z)^2$. So we have that. Now we are supposed to find the sum of $(x_i - \mu_X)^2$. So here we find the sum. We copy this formula to get the sum of $(y_i - \mu_Y)^2$ and the sum of $(z_i - \mu_Z)^2$. Now we find the product of the sum of $(x_i - \mu_X)^2$ and the sum of $(y_i - \mu_Y)^2$, and similarly for the other two pairs. Next, we need to find the square root, so we use the SQRT function. Now we need to find the sum of $(x_i - \mu_X)$ into $(y_i - \mu_Y)$. So we sum up all these numbers to get that value. Similarly, we find the other two sums also. So now we have all the values for the numerator and the denominator. Let's find the correlation. For printers versus cartridges, that is, X versus Y, we take the value of the numerator and divide it by the value of the denominator, and there we have the correlation between printers and cartridges.
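The worksheet steps above (deviations from the mean, their products, the squared deviations, the sums and the square root) translate directly into Python. This is a minimal sketch, assuming NumPy; the two columns are made-up numbers and are not the course data, and `correlation` is a hypothetical helper name. The last line cross-checks the result against NumPy's own `np.corrcoef`.

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation from the deviation sums, following the worksheet steps."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.sum() / x.size          # x_i - mu_X, with the mean as SUM / COUNT
    dy = y - y.sum() / y.size          # y_i - mu_Y
    numerator = np.sum(dx * dy)
    denominator = np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    return numerator / denominator

# Hypothetical columns (NOT the course data).
printers = [14, 16, 15, 13, 17, 15, 14, 16]
cartridges = [28, 33, 29, 27, 35, 31, 27, 30]

r = correlation(printers, cartridges)
print(round(r, 4))
print(np.isclose(r, np.corrcoef(printers, cartridges)[0, 1]))  # cross-check against NumPy
```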
Similarly, we find the correlation between printers and school supplies. And lastly, we find the correlation between cartridges and school supplies. You must note that the correlation of a data set against itself is always one; that is, the correlation of printers to printers, cartridges to cartridges, or school supplies to school supplies is always going to be equal to one. We saw that to find the correlation we had to do a number of calculations. Fortunately, Excel provides a function called CORREL, using which we can find the correlation in a very straightforward manner. We will see how to use this function in Excel. Now we see how to use the CORREL function to find correlation. So we have the same data with 75 points. We first put the labels. We want to find the correlation between printers, cartridges and school supplies. So we put printers, cartridges and school supplies, and the same we put on the X axis also. First we find the correlation between the printers and the cartridges. So we say CORREL. As the first parameter, we give all the values of the printers. As the second parameter, we give all the values of the cartridges. So there we have the correlation between the printers and the cartridges. We can repeat the same exercise for printers and supplies, and now we can do the same for cartridges and supplies. There we have all the correlations calculated. We found that this method saves us a lot of effort. Using the CORREL function, finding correlation became much simpler. However, if a data set has a very large number of variables, then finding the correlation for each pair of variables can be very tedious using the CORREL function. Fortunately, Excel provides something known as the Data Analysis ToolPak, using which we can find correlations.
Let us see how to use the Data Analysis ToolPak to find correlations. Now we use the Data Analysis ToolPak to calculate correlation. The Data Analysis ToolPak is found on the Data tab. It is an add-in, so you need to add it in Excel. In the Data Analysis ToolPak you will find an option for Correlation. Select that and press OK. You get a dialog which asks for the input range. In the input range, provide all the data for which you require to find the correlation. Here you do not need to go pair by pair; you can give all the data in a single go. After you have done that, you say that the data contains labels, because we included the header also in the data which was specified. Then specify the output range and click OK. So there you have the correlations calculated. That concludes our discussion. Thank you for listening. 11. Linear Regression: Hello and welcome. My name is Partha Majumdar. Linear regression is a very powerful and popular method for predictive analytics. In this video, we discuss linear regression, and we see how we can conduct linear regression using Excel. We will go through all the theory, and then we will see a demonstration in Excel. We start with understanding what linear regression is. Linear regression is one of the most popular and most commonly used predictive modeling techniques. This is because the technique is simple and it yields reasonably good results. The aim of linear regression is to find a mathematical equation for a continuous response variable Y as a function of one or more X variables. So we are looking for a function like y = a + bx, from which, by providing the value of x, we can get the value of y. In a separate video, we will discuss multivariate linear regression, where we will have multiple X variables for which we will have to find the value of Y. So using linear regression, we can predict the value of y when x is known to us. So we will give a value of x and get the value of y.
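To make the idea of finding y = a + bx concrete, here is a minimal sketch in Python, assuming NumPy. It estimates a and b by ordinary least squares, which is an assumption at this point since the fitting method has not been named yet; the (x, y) pairs are made up for illustration, and `predict` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical (x, y) pairs (NOT the course data).
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Least-squares estimates for y = a + b * x.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

def predict(x_new):
    """Value of y on the fitted line for a given x."""
    return a + b * x_new

print(f"y = {a:.3f} + {b:.3f}x")
print(f"Predicted y at x = 7: {predict(7.0):.3f}")
```

Once a and b are known, any value of x can be plugged in, which is exactly the "give a value of x and get the value of y" use described above.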
For example, suppose X is the day of the year and Y is the temperature in a city. We can give the value of the day of the year to get the temperature in the city. Now, look at the generic mathematical equation for linear regression. This is the equation when we are dealing with one variable X, based on which we will predict the value of variable Y. The equation is Y = β0 + β1X + ε. Here β0 is the intercept, β1 is the slope and ε is the error term. β0 and β1 are known as the regression coefficients. You must remember that Y is the dependent variable, which is dependent on the independent variable X. We try to understand this graphically. Suppose this is the graph of X values and Y values. The model line depicts the relationship between the X and Y values. The point where the model line cuts the Y axis is known as the intercept, that is, β0. The slope β1 is the increase in the value of Y for every unit increase in the value of X. For predictions using linear regression to be accurate, there are three assumptions which are required to be satisfied by the data that is considered for the linear regression. The assumptions are as follows. The first assumption is known as the normality assumption. The normality assumption requires that the errors around the line of regression be normally distributed at each value of X. As long as the distribution of errors around the line of regression at each level of X is not extremely different from a normal distribution, inferences about the line of regression and the regression coefficients will not be seriously affected. The second assumption is known as the homoscedasticity assumption. The homoscedasticity assumption requires that the variation around the line of regression be constant for all values of X. This means that the errors vary by the same amount when X is a low value as well as when X is a high value.
The third assumption is known as the independence of errors assumption. The independence of errors assumption requires that the errors around the regression line be independent for each value of X. This assumption is particularly important when data is collected over a period of time. In such situations, the errors for a specific time period are often correlated with those of the previous time period. Next, we discuss the functions available within Excel for dealing with linear regression. The first function is known as INTERCEPT. This function gives us the intercept of the linear regression line. The SLOPE function gives the slope value of the linear regression line. The STEYX function gives the standard error of the linear regression, and RSQ gives the R squared. The value of R squared is an indicator of whether the linear regression will give reliable predictions or not. R squared can take a value between zero and one. If the value of R squared is close to one, that means the regression will give a fairly reliable prediction. If the value of R squared is less than 0.7, that means that the prediction from linear regression cannot be relied upon too much. Now we get down to the steps we need to follow for conducting linear regression. The first step is to gather the data. The data that we will be using is shown here. We have data regarding the day, the number of printers ordered on the day, the number of cartridges ordered on the day, and the amount spent on school supplies. In our sample data, we have 75 data points, that is, data for 75 days. However, linear regression works well when the amount of data is really large. The second step is to remove the outliers. The presence of outliers can seriously affect the predictions from linear regression. One of the best mechanisms to identify outliers is to create box plots of the data. Watch out for a video on how to remove outliers. So here we have created a box plot using Excel, as you can see.
In the box plot, we can find the median, the first quartile and the third quartile, and through the first quartile and third quartile, we can find the interquartile range, or IQR. Generally, an outlier is any data point that lies more than 1.5 times the IQR outside the quartiles. So we should remove the data points which are above the third quartile plus 1.5 times the IQR, or which are below the first quartile minus 1.5 times the IQR. The third step is to study the correlation between the variables. If the correlation is very weak, that is, the correlation is close to zero, then there is no point in applying linear regression. Correlation can take a value between minus one and one. In this data, you can see that the correlation between the number of printers and the day is very small. Supposing we want to find the number of printers to order on a particular day, there is no point in taking X as the day number. The next step is to conduct graphical analysis. One useful graphical analysis is to create scatter plots between the dependent variable and the independent variables. In this scatter plot, you can see that there seems to be some sort of relationship between the dependent variable, that is, printers to order, and the independent variable, that is, cartridges ordered. So there may be a good chance that, to predict the number of printers to order, we can base our prediction on the number of cartridges ordered on a particular day. Another useful graphical analysis is to create the density plot of the response variable. Here you can see that the response variable is not very normal. If the response variable is not very normal, the regression will not work effectively. The fifth step is to create the linear regression model. So we are trying to create a linear regression equation like Y = a + bX. We can find R squared for the model to check whether the model will be reliable or not.
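The 1.5 times IQR rule from the second step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `iqr_outliers` is hypothetical, the order values are made up, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which may differ slightly from the quartile convention Excel's box plot applies.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Split values into kept points and outliers using the k * IQR fences."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr  # anything below this fence is an outlier
    upper = q3 + k * iqr  # anything above this fence is an outlier
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers, (lower, upper)


# Made-up daily order counts (assumption), with one obviously extreme day.
orders = [12, 14, 15, 13, 16, 14, 15, 13, 48]
kept, outliers, fences = iqr_outliers(orders)
print("fences:", fences)
print("outliers:", outliers)
```

With these made-up values the quartiles are 13 and 15, so the fences are 10 and 18 and only the value 48 falls outside them.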
In this example, the R squared of 56% indicates that this will not be a very reliable linear regression model. The final step is to confirm whether the assumptions of linear regression are met by the data. The first check is the homoscedasticity assumption check. You can see in this example that the homoscedasticity assumption is not very well fulfilled by this data. We will see in the demo how this assumption is checked. The next check is the normality assumption. We will see in the demo how we can do residual analysis to find whether the normality assumption is true or not for the given data. We will also see what a residual is and how we can calculate residuals. The final check is for the independence of errors assumption. This check is also performed through residual analysis. We will see how this is conducted. In this particular case, you can see that the independence of errors assumption seems to have been met. Before we go to the demo in Excel, let us summarize the steps. First, we have to gather the data. Then we have to remove outliers. Then we have to study the correlations. Then we conduct a graphical analysis. Then we create the linear regression model. And lastly, we confirm whether the assumptions are true or not. Now that we have discussed the theory, let's go to Excel and check how we can conduct linear regression.

12. Linear Regression Demonstration: So here we are in Excel. You can see the data that we will use as a sample. We have data regarding the day number, the number of printers ordered, the number of cartridges ordered and the amount spent on school supplies. We have data for 75 days. We start by trying to identify outliers, so we begin by creating the box plots. You can see that among the charts we have Box and Whisker charts. We first select printers and see whether we have any outliers here: Insert, Chart, Box and Whisker.
There we have it. We view the box plot, and we can make out that there are no outliers in this data. We do the same for school supplies: Chart, Box and Whisker. There will be a separate session in which we discuss exactly how we can determine outliers. For now, this technique is good enough. We do the same for cartridges. Next we find the correlations among all the four variables that we have. We go to the Data Analysis ToolPak and select Correlation. For the input range we give all the data that we have with us, and we select the area where the correlations will be displayed. So there we have the correlations. What we will try to do is predict the number of printers to order on a given day. From the correlations we have calculated, we find that the correlation between printers to order and the day number is very weak. It is close to zero. So we will not take the day number as the X parameter from which we predict the Y, that is, printers to order. We see that the best relationship exists between printers to order and the amount spent on school supplies, which is close to 35%. It is not a very good correlation, though. Nevertheless, if we try to predict the number of printers based on the amount spent on school supplies, we have a reasonable chance of making a decent prediction. Now we come to the third stage, where we do graphical analysis. We plot a scatter chart between printers to order and money spent on school supplies. We go to Insert and select X Y Scatter. So there we have the scatter plot between printers to order and school supplies. We add the trend line. You can ask for the equation to be displayed, and also the value of R squared. We move the labels to a position where we can see them clearly. We have a problem with this scatter chart: we have the printers on the X axis and school supplies on the Y axis. We will rectify this.
Now let's make the density chart for the response variable, that is, the number of printers to order. We calculate the maximum and minimum values for the printers data that is available. Based on this, we create the bins for which we will calculate a histogram. We start near the minimum and go in increments of two. We switch to automatic calculation so that the calculations happen immediately. Now, using the Data Analysis ToolPak, we create a histogram. The input range is the data for the printers, and the bin range is what we have just created. We tick Labels because we have included the labels. Now we give the output range, ask for the chart output also so that we can see the histogram, and click OK. There we have our histogram; we move it to a more convenient location. Now right-click on the chart and change the chart type to an area chart. Under Line you find the area charts. We see that the density plot is not very normal, so we should not expect a very good prediction from linear regression for this data. Now we will create our linear regression model. We will try to predict the number of printers to order from the money spent on school supplies. We will compute the values of the intercept and slope, from which we can get the predicted value, and we will also calculate the standard error and R squared. We start with the intercept: we say =INTERCEPT. We give the range of Ys that we have, that is, the printers ordered, which is what we are trying to predict. These are the known Ys. Then we give the Xs for the known Ys. So we get the value of the intercept. Next we need to find the slope. We use SLOPE. Again, we give the Y values, that is, the number of printers ordered in the past, and the corresponding Xs, that is, the amount spent on school supplies. Now we find the standard error. We use the STEYX function.
We give the Y values, that is, the printers ordered historically, and the corresponding amounts spent on school supplies. Lastly, we calculate the R squared using RSQ. So we have computed the intercept, slope, standard error and R squared. Now, using the intercept and slope values, we can compute a predicted value for the past data that we already have, and we can use the same equation for predicting the future. So we say intercept, which I am anchoring because it is going to be the same for all values of X, plus slope, which I also anchor, and we multiply the slope by the value of X, that is, the amount spent on school supplies. The predicted value for the first day is 19.40. Now we copy this formula across all the days of data that we have. There we have a predicted value for all the days. We see that the predicted value is different from the actual value that was ordered. So we find the difference between the actual value and the predicted value. This difference is known as the residual. We compute the residual for all the observations that we have. Next, let us see how to analyze the residuals.

Now that we have a linear regression model, let us see whether the assumptions of linear regression are met or not. For this we will do residual analysis. First, we do the homoscedasticity assumption analysis. For this we plot a graph between the residuals and X, that is, the amount spent on school supplies. We create an X Y scatter. So there we have the X Y scatter. We see that the homoscedasticity assumption is not quite met by this particular data, as the error is not constant across all values of X. Next we check the normality assumption. For checking the normality assumption, we have to make a histogram of the residuals. We find the maximum and minimum values of the residuals. There we have them. Now, based on these values, we can create the bins for making the histogram. We create the bins, starting at around minus four.
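The INTERCEPT, SLOPE, STEYX and RSQ calculations, together with the predicted values and residuals computed above, can be sketched in Python like this. It is a minimal sketch under assumptions: the function name `simple_linear_regression` is hypothetical, the five data points are made up for illustration, and the standard error uses the usual least-squares form, the square root of the residual sum of squares divided by n - 2.

```python
from math import sqrt


def simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x with basic diagnostics."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # counterpart of SLOPE
    intercept = mean_y - slope * mean_x    # counterpart of INTERCEPT
    predicted = [intercept + slope * x for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]  # actual minus predicted
    ss_resid = sum(r ** 2 for r in residuals)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1 - ss_resid / ss_total,      # counterpart of RSQ
        "std_error": sqrt(ss_resid / (n - 2)),     # counterpart of STEYX
        "predicted": predicted,
        "residuals": residuals,
    }


# Made-up illustrative points (assumption), not the course data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
fit = simple_linear_regression(xs, ys)
print(round(fit["intercept"], 4), round(fit["slope"], 4))
print(round(fit["r_squared"], 4), round(fit["std_error"], 4))
print([round(r, 2) for r in fit["residuals"]])
```

The residual list is what the homoscedasticity, normality and independence checks are run on: plotted against X, binned into a histogram, and plotted against the day of observation respectively.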
So there we have the bins. Now we use the Data Analysis ToolPak to create the histogram. For the input we select the residuals, and for the bins we select the bins we have just created. We select the output range, and we also ask for the chart to be created for this histogram. There we have the frequency distribution, and we have our histogram. You can see that the histogram shows that the normality assumption check is met by this data. Lastly, we check the independence of errors assumption. For this, we have to see the behavior of the residuals over the period of time for which the data has been gathered. So we plot an X Y chart between the residuals and the days of observation. We insert an X Y scatter. From the graph, you can see that the residuals are quite independent of the days of observation, so we can say that the independence of errors assumption has been met. With this, we conclude our discussion on linear regression using Excel.

13. Linear Regression using LINEST: Hello and welcome. My name is Partha Majumdar. In the last lecture, we saw how we can do linear regression using Excel. We used various formulas which were available, and with them we could do it. However, in Excel it is far easier to do linear regression using the LINEST function. In this video, we will see how we can use the LINEST function to do linear regression using Excel. Before we get down to using the LINEST function, let us understand what it is all about. The first thing is that LINEST is an array function. An array function means that you have to press Ctrl+Shift+Enter after entering the function in a cell, and you will then see curly braces around the formula. The function returns its values as an array. The LINEST function takes four parameters. The first parameter is known_y's. The second is known_x's. Then there are two optional parameters, const and stats. The LINEST function uses the method of least squares to generate an equation of the form Y = a + bX. By default, the value of const is TRUE.
If we make const equal to FALSE, then the LINEST function generates an equation of the form Y = bX. By default, stats is set to FALSE. If stats is set to TRUE, then the LINEST function produces additional regression statistics. As I mentioned earlier, LINEST uses the method of least squares to find the best fit for the data which we have provided. LINEST can be used when we are doing linear regression for one dependent variable with one independent variable or with more than one independent variable. If the data for the dependent variable is in a column, then the data for the independent variables should be in separate columns. If the data for the dependent variable is in a row, then the data for each independent variable should be in separate rows. Now we take a look at the output provided by the LINEST function. LINEST provides an array of output, so you need to select the required number of rows and columns for the output to be displayed. The first output is the slope. We have already seen this when we were doing linear regression earlier. The slope is the amount by which the value of Y increases for every unit increase in the value of X. We are discussing using the LINEST function for one independent variable; later, we will see how we can use LINEST for multiple independent variables also. The second output is the intercept, which is given in the first row, second column. The intercept is the point where the line of linear regression cuts the Y axis. The third output from the LINEST function, which is given in the second row, first column, is the standard error on the slope. The lower this value, the better the prediction we will get from the linear regression equation produced by the LINEST function. The fourth output provided by the LINEST function is the standard error on the intercept. It is given in the second row, second column. The lower this value, the better the prediction we will get from the linear regression equation produced by the LINEST function.
The fifth output produced by the LINEST function is R squared. This is given in the third row, first column. R squared is the coefficient of determination. Its value ranges from 0 to 1. If the value of R squared is close to one, it indicates that we have a better chance of getting a good prediction for Y. The sixth output, given in the third row, second column, is the standard error on the Y estimate. Again, the lower this value, the better the prediction of Y we can expect from the linear regression equation. The seventh output provided by the LINEST function is in the fourth row, first column. It is the F statistic. The F statistic can be used to determine whether the relationship indicated by R squared occurred by chance or not. If the F observed value is relatively large, we can be confident that R squared is a reliable indicator of the goodness of the linear regression equation. For further analysis of the F observed value, we can use the Excel function FDIST, the F distribution. It can indicate whether the F value can be relied upon or not. The next output is degrees of freedom. The degrees of freedom indicates the number of independent values that can vary without breaking any constraint in the regression analysis. Degrees of freedom is computed as n - k - 1 in case const is TRUE, or n - k in case const is FALSE, where n is the number of observations and k is the number of independent variables. Degrees of freedom is a very important idea which occurs in many applications of statistics. The ninth output is the regression sum of squares. We saw in the last lecture that the difference between the actual value and the estimated value of Y is known as the residual. The residual sum of squares is therefore the sum of the squares of the residuals. Now take the difference between each Y value and the average of the Y values. If you sum these up after squaring them, it gives us the total sum of squares.
Now, the regression sum of squares is equal to the total sum of squares minus the residual sum of squares. R squared is equal to the regression sum of squares divided by the total sum of squares. If we rearrange this formula, it becomes one minus the residual sum of squares divided by the total sum of squares. So we can see that if the residual sum of squares is small, then R squared is closer to one. So if the residual sum of squares is small, then the regression equation will produce a better output. The last output from the LINEST function is the residual sum of squares. We have just discussed what this is. You have also seen this in the previous lecture, when we calculated the residuals manually. Now that we have discussed all the output from the LINEST function, let us see it in action.

So here we are in Excel. You have seen this data before. First we compute the regression equation. We know that the regression equation will have a slope and an intercept, so we take two cells and enter the LINEST function. The first parameter is known_y's, that is, what we want to predict: the number of printers to order. The independent variable is school supplies, so we select that range and press Ctrl+Shift+Enter. Once we press Ctrl+Shift+Enter, we get the slope and the intercept. We can also compute the slope and intercept separately, using LINEST in conjunction with the INDEX function. We say INDEX followed by LINEST; then we give our known Ys, then our known Xs, and then say comma one, for the first output, the first index. This gives us the slope. Similarly, we can compute the intercept. This is the second index of the LINEST function. So we say INDEX, LINEST, give the known Ys, then the known Xs, and say comma two. So we get the intercept. This is the same value as what we computed using the LINEST function for the regression equation. Now let's compute the statistics that LINEST provides.
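The relationships among the sums of squares, R squared, degrees of freedom and the F statistic described above can be checked with a short Python sketch. The function name `anova_from_predictions` is hypothetical, the numbers are made up, and the sketch covers only the case where const is TRUE, that is, a model with an intercept.

```python
def anova_from_predictions(ys, predicted, k):
    """Sum-of-squares breakdown for a fitted model with an intercept.

    ys        -- observed values of the dependent variable
    predicted -- values estimated by the regression equation
    k         -- number of independent variables in the model
    """
    n = len(ys)
    mean_y = sum(ys) / n
    ss_total = sum((y - mean_y) ** 2 for y in ys)               # total sum of squares
    ss_resid = sum((y - p) ** 2 for y, p in zip(ys, predicted)) # residual sum of squares
    ss_reg = ss_total - ss_resid                                # regression sum of squares
    df = n - k - 1                                              # degrees of freedom (const TRUE)
    return {
        "ss_reg": ss_reg,
        "ss_resid": ss_resid,
        "ss_total": ss_total,
        "df": df,
        "r_squared": ss_reg / ss_total,   # same as 1 - ss_resid / ss_total
        "f_stat": (ss_reg / k) / (ss_resid / df),
        "v1": n - df - 1,                 # first degrees-of-freedom argument for FDIST
        "v2": df,                         # second degrees-of-freedom argument for FDIST
    }


# Made-up observations and the predictions of a fitted line (assumption).
ys = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
stats = anova_from_predictions(ys, predicted, k=1)
for name, value in stats.items():
    print(name, round(value, 4))
```

The smaller the residual sum of squares, the closer R squared gets to one and the larger the F statistic becomes, which is the behaviour described above.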
We know that this is a linear regression for one independent variable, so we will have the output in two columns. We select that range and enter our LINEST function. First we give the known Ys, then the known Xs. For const we say TRUE, because we want an equation of the form Y = a + bX, and for stats we say TRUE this time. So there we get the output. Here you can see the output: the slope, the intercept, then the standard errors. Then we have an R squared of about 0.13, that is, 13%. Not a very good number. Then we have the F statistic, the degrees of freedom, etc. Now we can check whether the F statistic is a good one by finding the F distribution. So there we have the F observed value. We can compute v1 as n, the number of observations, which you can get using the COUNT function, minus the degrees of freedom minus one, and we get v2 as equal to the degrees of freedom. Now we use the FDIST function: the first parameter is the F value, the second parameter is v1 and the third is v2. We see that the FDIST value is very small. The probability is less than one percent; in fact, it is less than 0.1 percent. So the F value can be fairly relied upon. That concludes our discussion. Thank you for listening.

14. Predicting with TREND: Hello and welcome. My name is Partha Majumdar. In the last video, we saw how we can use the LINEST function for conducting linear regression in Excel. Now we look at another function, TREND, using which we can predict values generated through linear regression. The usage of the TREND function is as follows. We say TREND, which in Excel should be preceded by an equals sign. Then we give the Ys that are known. Then we give the Xs that are known. And then we give the X for which we want to predict the Y. Let us see this in action in Excel. So here we are in Excel.
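The TREND usage just described (known Ys, known Xs, then the new Xs to predict for) can be sketched in Python for the single-variable case. The function name `trend` is a hypothetical stand-in, the data is made up, and this sketch does not attempt to reproduce everything Excel's TREND supports, such as multiple X columns or fitting without an intercept.

```python
def trend(known_ys, known_xs, new_xs):
    """Fit y = a + b * x by least squares and predict y for each new x."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in new_xs]


# Made-up history (assumption): known Xs and the Ys observed for them.
known_xs = [1, 2, 3, 4, 5]
known_ys = [2, 4, 5, 4, 5]

# Predictions for the already known data, to compare against the actual values.
fitted = trend(known_ys, known_xs, known_xs)
print([round(v, 2) for v in fitted])

# Predictions for future X values that were not in the history.
future = trend(known_ys, known_xs, [6, 7])
print([round(v, 2) for v in future])
```

Passing the known Xs back in as the new Xs reproduces the column of predictions for past data, while passing fresh values gives the forecast.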
Now we create a new column where we will make the predictions of Y for the known Xs that we have. We enter the formula =TREND. We give the known Ys, that is, the printers ordered historically, and the known Xs, that is, the amount spent on school supplies historically, and then the value of X for which we would like to predict, that is, the school supplies on day one. We anchor the known Ys and known Xs so that we can use this formula for all the other days also. Now we copy this for all the observations. So there we have the predictions for the already known data. You can see that there is a gap between what was actually ordered and what is predicted. Now let's predict the number of printers to order for the future. We will be supplied the amount spent on school supplies, and based on that amount we will try to predict how many printers we should order. So we enter dollar values like 500, 6,000, et cetera. Now, if this is the amount spent on school supplies, how many printers do we have to order? That is our problem to solve. We want to find the predicted number of printers to order, so we enter the TREND function here. We say TREND and give the known Ys, that is, the printers ordered historically. Then we give the known Xs, that is, the amounts spent on school supplies for the corresponding numbers of printers ordered, and then the amount we expect to spend on school supplies in the future. Again, as we will use this formula for the other values also, we anchor the ranges of known Ys and known Xs. Don't forget to press Ctrl+Shift+Enter, as this is an array function. So there you have the number of printers to order in future if you spend so much money on school supplies. That concludes our discussion. Thank you for listening.

15. Linear Regression using Data Analysis ToolKit: Hello and welcome. My name is Partha Majumdar. We have already discussed how we can do linear regression using Excel.
Also, we have seen that we can use a special function in Excel called LINEST for conducting linear regression. Now we move one step further and conduct linear regression using Excel with the Data Analysis ToolPak. Previously in the course, we have seen that the Data Analysis ToolPak is an add-in in Excel, and we have seen how we can add it to Excel. Also, we have already discussed many concepts related to linear regression, so we will not repeat those discussions once again. In this video, we will jump straight into the discussion of the output produced by the Data Analysis ToolPak with regard to linear regression. The first set of output produced by the Data Analysis ToolPak for linear regression is known as the regression statistics. We will discuss each of these statistics individually. The first statistic is Multiple R. It is the correlation coefficient, which measures the strength of the linear relationship between two variables. A correlation coefficient has a value between minus one and one. One means the correlation is very strong and positive, minus one means the correlation is negative but strong, and zero means there is no relationship between the two variables. The second regression statistic is R Square. We have already seen what R squared is. It is the coefficient of determination, which is used as an indicator of goodness of fit of the regression analysis. Its value ranges from 0 to 1. A value of 0.139, as shown here, means that 13.9% of the variation in Y is explained by the regression model. The third regression statistic is Adjusted R Square. This statistic is applicable when we are using multiple independent variables in the regression model. In our case, as we are dealing with one independent variable, this is not applicable to us at this point of time. When we discuss multivariate linear regression, this will become applicable to us.
The fourth regression statistic is Standard Error. We have already studied what standard error is. It is a goodness-of-fit measure that shows the precision of the regression analysis. The smaller the number, the more certain we can be that the regression equation is a good one. The next set of output is ANOVA, or analysis of variance. Analysis of variance is done for the regression and the residuals. We have already seen what these mean. The first output is df, or degrees of freedom. We have seen this in the previous video. The next output is the sum of squares; we get the sum of squares for the regression, for the residuals and in total. The next output is the mean square, which is the sum of squares divided by the degrees of freedom. The fourth output is the F observed value. We have seen this in detail while discussing the LINEST function. The last output here is Significance F, which we know is the probability that the F statistic occurred by chance. Shown here is the next set of outputs. The first output is the coefficients. We know that these are the regression coefficients. The regression coefficients, when we are dealing with one independent variable, are the intercept and the slope. You will remember that we denoted these by β0 and β1 in the initial lecture. The second column gives the standard error. What we get here is the standard error on the intercept and the standard error on the slope. This is because we are dealing with one independent variable right now. The smaller the number, the better it is. The third column gives the result of the t-test. We will not discuss the t-test now, as it is a big topic of discussion. Watch out for my future video where I discuss this in detail. I have given the formula for how the t-test is performed. The fourth column is the P-value. This is the level of marginal significance. We will discuss this in a separate lecture, where we will discuss hypothesis analysis.
Next in the output, we get the residual output shown here, where the prediction for each observation point is provided. Based on the predictions, it gives the residual values. We know that the residual is the difference between the observed value and the predicted value. The standard residual is the residual, that is, the difference between the observed value and the predicted value, divided by the standard deviation of the residuals. Now we look at the graphs produced by the Regression tool of the Data Analysis ToolPak. The first graph is the residual plot. Here you can see the outliers clearly. The next graph is the line fit plot, where we see the predicted values against the actual values. The third plot is the normal probability plot. Now that we have discussed all this, let us see it in Excel.

So here we are in Excel. You will be familiar with this data already. We have 75 points for printers to order, cartridges ordered and school supplies spend. Now we use the Data Analysis ToolPak. You can see that we have an option for Regression. We select that. First we have to give the Y values. So we go to where we have the Y values; we include the header and give the complete data for the printers that we have. Next we have to give the X values. Here we take the school supplies column as the X values. We include the header and give the complete range. We tick Labels to indicate that we have labels in the first row, and we say that the confidence level is 95%. Next we state the output area where the output will be placed. We select all the options, so that we can see exactly what output is produced. We have discussed all of it in the lecture; we will see it in reality now. We press OK. So there you can see the complete output has been provided. We have seen this output while discussing it during the lecture. You can see that we have the regression statistics first, with the values for Multiple R, R Square, Adjusted R Square, Standard Error, etc.
Then you have the ANOVA. After the ANOVA, we can see that we have the intercept and slope coefficients, standard errors, t stats, P-values, etc. After that we have the residual output: the predictions, the residuals and the standard residuals. We have seen how these are calculated. You can see that the predictions have been made for all the observations and the residuals have been calculated. And then we see the probability output. The probability output is used for creating the normal probability plot. You can see that the tool has made the plots for us also. We move the plots to a clear area so that you can see them properly. So we have the three plots that we discussed on the screen. This is the line fit plot; you can clearly see the outliers. This is the normal probability plot. Here also you can see that the outliers are clearly standing out. This is the residual plot. Now, in the data, let's see where we have the outliers. You can see, here is one outlier. You see the second outlier here. Here we have the third outlier. These outliers were clearly standing out on the graph. Just take the predictions for them: you see that the residuals are high. You will have noticed in the statistics that R Square is only about 13.8%. It is because of the outliers that the regression model has not been very good. That concludes our discussion. Thank you for listening.

16. Multi Variate Linear Regression: Hello and welcome. My name is Partha Majumdar. We have been discussing linear regression and linear regression using Excel for the last few videos. Now we come to the last discussion on linear regression in this course. We discuss multivariate linear regression. So far, we have seen linear regression using one independent variable. Now we will see linear regression using multiple independent variables. We have discussed the required theory in the past few videos.
So now we will look at the linear regression equation in multivariate linear regression, and then we will see a demo of how to conduct multivariate linear regression using the LINEST function and the Data Analysis ToolPak in Excel. The equation we are looking at is y = β0 + β1·x1 + β2·x2 + … + βn·xn + ε. Here y is the dependent variable, x1 to xn are the independent variables, β1 to βn are the regression coefficients, β0 is the intercept and ε is the error term. So let's jump straight into Excel and see how to conduct multivariate regression using Excel. Now we are in Excel. First we conduct regression using LINEST. So we enter the LINEST function. We have chosen three cells because now we will get two regression coefficients, as we have two independent variables. On entering the LINEST function, we get the intercept and the regression coefficients β1 and β2. Now we can form the equation that these regression coefficients generate. So there you can see the regression equation. Now let us see the statistics generated by the LINEST function for this multivariate linear regression. We enter the LINEST function once again and provide its parameters; here we use the two independent variables. We set TRUE for constant and TRUE for statistics. So there we have the statistics. We have got the R squared. This is better than when we did linear regression with a single variable. Notice also the degrees of freedom. The F statistic is much higher. We see that the significance is also greater; that means the chance that the F statistic arose by chance is very low. Now that we have seen regression using LINEST, next let's see how to do the regression using the Data Analysis ToolPak. Okay, so we start with this clean sheet once again with only the data available to us.
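As a rough cross-check outside Excel, the multivariate equation y = β0 + β1·x1 + β2·x2 can be fitted by ordinary least squares in a few lines of Python. This is only a sketch under stated assumptions: the function names and the sample numbers are invented for illustration (they are not the course data), and the code solves the normal equations directly rather than reproducing how LINEST works internally.

```python
def solve_linear_system(a, b):
    """Solve a·z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def fit_multivariate_linear(xs, ys):
    """Return [b0, b1, ..., bn] minimising the sum of squared residuals.

    xs holds one row of independent variables per observation.
    """
    design = [[1.0] + list(row) for row in xs]  # leading 1 gives the intercept
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Evaluate b0 + b1*x1 + ... + bn*xn for one observation."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Hypothetical data: (cartridges, school supplies spend) -> printers.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [3 + 2 * x1 + 0.5 * x2 for x1, x2 in xs]  # exact plane, no noise
coeffs = fit_multivariate_linear(xs, ys)
```

The sketch returns the intercept first, whereas LINEST lists the coefficients in reverse order (last variable first, intercept last), so the two outputs need to be matched up with care.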
Now we invoke the Data Analysis ToolPak and select Regression. We give the values for Y and the values for X. Here we select both cartridges and school supplies. We forgot to include the labels, so we will go back and do that once again. So we select printers along with the label in the first row. We do the same for the independent variables, cartridges and school supplies. Now we tick Labels to indicate that we have included the labels. We select the output area and then we select all the different outputs that the Data Analysis ToolPak provides. There we have the output. The first thing is that, as we are working with multivariate regression, we should be concerned with the adjusted R square. So we look at this figure, which is more relevant in terms of whether the regression equation is a good fit or not. We can look at the ANOVA. We see that the regression and the residual sums of squares are given. Next we look at the coefficients. Now we have a coefficient for cartridges and for school supplies. So we have the regression coefficients for x1 and x2, apart from the intercept. We see that the prediction is given and the residuals are also calculated. You can see the different graphs: the residual plots for cartridges and for school supplies. The line fit plot is also created for cartridges and for school supplies. Lastly, we can see the normal probability plot. That concludes our discussion. Thank you for listening. 17. Exponential Regression with Linear Model: Hello and welcome. My name is Partha Majumdar. So far in the course, we have discussed moving averages, many variations of them, and linear regression. You will have realized by now that to find a proper prediction model, we should experiment with various techniques before we can arrive at a model which gives us good predictions. In this video we discuss another technique, which is known as exponential regression using a linear model. We will go through the nuances of this model in this video.
So what is the exponential regression model? We will first discuss this model when there is only one independent variable x. The exponential regression model expresses the dependent variable as y = α·e^(βx). Just to clarify, we are trying to predict the value of y based on a known value of x. We will transform the formula that we just discussed and see what we get. So our model is y = α·e^(βx). If we take the natural log on both sides, we get ln(y) = ln(α·e^(βx)). Solving this equation further, we get ln(y) = ln(α) + ln(e^(βx)). Continuing with the transformation, we get ln(y) = ln(α) + βx. We notice that now we have got a linear equation in two variables, that is, ln(y) and x. So for a given value of x, we can find the natural log of y, and from the natural log of y we can get the value of y by taking the exponential of the natural log of y. Take careful note of the following observations. Observation number one: you notice that α·e^(β(x+1)) = α·e^(βx+β) = α·e^(βx)·e^β. So we notice that for every increment of x by one unit, y increases by a multiple of e^β. Observation number two: we already saw that the model takes the form ln(y) = δ + βx. Here δ and β are the regression coefficients. We call this log-level regression. Now that we have discussed all the prerequisites, let's go to Excel and see how to perform exponential regression using the linear model.
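The transformation above can be sketched in Python as follows. This is a minimal illustration, not the course workbook: the helper names and the sample data (generated from α = 2 and β = 0.3) are assumptions made for the example. It regresses ln(y) on x, recovers α as e^δ, and checks observation number one.

```python
import math


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_exponential(xs, ys):
    """Fit y = alpha * exp(beta * x) by regressing ln(y) on x."""
    delta, beta = fit_simple_linear(xs, [math.log(y) for y in ys])
    return math.exp(delta), beta  # alpha = e^delta


def predict_exponential(alpha, beta, x):
    """Predicted y in original units: the exponential of delta + beta * x."""
    return alpha * math.exp(beta * x)


# Hypothetical data generated from alpha = 2, beta = 0.3.
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.3 * x) for x in xs]
alpha, beta = fit_exponential(xs, ys)

# Observation number one: a one-unit step in x multiplies y by e^beta.
ratio = predict_exponential(alpha, beta, 3) / predict_exponential(alpha, beta, 2)
```

Because the fit is done on ln(y), every y value has to be positive for this approach to work.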
So we are here in Excel. You are familiar with this data by now. We will try to predict the number of printers to order based on the school supplies spend. So we create more columns. In one we take the log of y, that is, the log of the number of printers ordered, and in the other we have the values for x, that is, school supplies, as it is. We can calculate the log of the number of printers ordered using the LN function of Excel. LN of the number gives the natural log of the number of printers ordered. We copy this across for all the data. There we have it. Now we need to perform linear regression with x being the amount spent on school supplies and y being the log of y, that is, the log of the number of printers ordered historically. So we go to the Data Analysis ToolPak and select Regression. For the known y's, we give the log of y that we have computed. Next we give the amount spent on school supplies. We say labels are available, mark the confidence level as 95% and define the output area. Then we select all the different outputs that the Data Analysis ToolPak provides and click OK. So there we have the output. We can see the regression statistics here. The one we are interested in is R square. The R square is about 56 percent, which is fairly good; not very nice, but fairly good. You can see the ANOVA, and you can see that the intercept and the slope are also calculated. You can see the residual output, where the prediction is provided. Note that the predicted value is for log of y. Let's look at the charts generated. We see that three charts are generated. The first is the residual plot. From the residual plot, you can see that the condition of homoscedasticity is not quite met by this data. This we have realized in the past as well. Next is the line fit plot. The last one you see is the normal probability plot. We add a column for the predicted value of
y that we have generated through this technique. The predicted value of y can be found by taking the exponential of the predicted value that we got through the Data Analysis ToolPak. So you see that the predicted value for the number of printers to order on day one is around 19.9. We copy this across for the entire data set, and there we get the predicted value for all the days of the known data. For easier visualization, let's put the actual values of y next to the predicted values. There we have them. Now let's try to predict values into the future. We are required to predict the number of printers to order for a given amount that we plan to spend on school supplies. So we enter the school supplies amount that we plan to spend and see how many printers we need to order in that case. We take some arbitrary values right now, but this is the way you can do the prediction with this particular model. So we try to find out, if we spend 2000, 2500 or 6000 on school supplies, how many printers should be ordered. We know that this is log-level regression, so we first find the natural log of the printers to order. We have the intercept and the slope, so we can form the linear regression equation from which we can make the prediction: that is, the intercept, which we anchor, plus the slope, which we anchor as well, multiplied by the amount spent on school supplies. What we get is the natural log of the number of printers to order. From this, we can find out how many printers we should order by taking the exponential of this number. There you have it. So this is the way we can make predictions using the model that we have just created. There is another way to get to this. Select the y and the x variables, that is, printers and school supplies, and insert an X Y scatter chart. There you have it. Now we have the printers on the X axis. This is not quite correct, so we edit the data series. We put printers on the Y axis and school supplies on the X axis.
So now we have the correct chart. When we have the correct chart, we insert a trendline. By default Excel gives a linear regression trendline; we want an exponential trendline. This is the linear regression trendline between the number of printers to order and the amount spent on school supplies. So we select Exponential. There, you can see that the regression line for exponential regression is now showing. We have changed the color of this line to distinguish it from the linear regression line, and we choose to display the equation and also the R squared. You see that we have the equation in the form that we were discussing, that is, y = α·e^(βx). That concludes our discussion. Thank you for listening. 18. Optimising Exponential Regression using Solver: Hello and welcome. My name is Partha Majumdar. In the last video we saw how we can do exponential regression using a linear model in Excel. The model we discussed has one shortcoming. The shortcoming is that exponential regression using a linear model does not try to minimize the sum of squares of the residuals. We can overcome the shortcoming by using nonlinear models. To apply a nonlinear model, we can use Solver in Excel. We'll go straight to Excel and see how this is done. So here we have the data set which we just used for demonstrating exponential regression using the linear model. Now I will add some columns here. The first column will store the predicted value of y. Next we will compute the residual, and then we compute the square of the residual. Now, we concluded our exponential regression model using the linear model, so we got alpha equal to 8.2062 and our beta as 0.0[inaudible]. From alpha and beta, we can compute the predicted value of y: alpha multiplied by the exponential of beta multiplied by x. We anchor alpha first, and we anchor beta also. So there we have computed the number of printers to be ordered according to this model. We compute the residual as well.
We copy the formulas down together, and we compute the square of the residual. So there we have all our values. Now we compute the sum of squares of all the residuals. This is SS residual. Now we invoke Solver from the Data tab. We say that we want to minimize the sum of squares of residuals. So we select the cell in which we have computed the sum of squares of residuals, and we say we minimize it by changing the values of alpha and beta. So there we have set up a nonlinear programming problem. We click OK to solve, and we get the results. You see that the values of alpha and beta have changed. Also, the value of the sum of squares of residuals has come down. You can see the new predicted values according to the new alpha and beta. Let us check whether the prediction is better after optimization. For that, we need to compute the R squared. For computing R squared, we require the sum of squares total and the sum of squares regression. To compute the sum of squares total, we have to find the average first. We find the mean of y using the AVERAGE function. So there we have the mean of y. Next we compute y minus the mean of y. We do this for all the observations. Next we have to compute y minus the mean of y, whole squared. We compute this for all the observations. Now we take the sum of all these values. This gives us the sum of squares total, SS total. Now we compute the sum of squares regression, SS regression. It can be found by subtracting SS residual from SS total. Now we can find R squared: this is SS regression divided by SS total. We find that R squared is 55%. So actually this has not been of much benefit; in fact, it has become worse. That concludes our discussion. Thank you for listening. 19. Exponential Regression using LOGEST: Hello and welcome. My name is Partha Majumdar. We have already seen the theory behind exponential regression.
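Looking back at the Solver lesson, the quantities built up column by column there (SS residual, SS total, SS regression and R squared) can be written out as a short Python sketch. The data, the parameter values and the function names are invented for illustration. The sketch only evaluates the objective that Solver minimizes; it does not perform the minimization itself.

```python
import math


def predict_exponential(alpha, beta, x):
    """Predicted y for the model y = alpha * exp(beta * x)."""
    return alpha * math.exp(beta * x)


def ss_residual(alpha, beta, xs, ys):
    """Sum of squared residuals in the original units of y (Solver's target cell)."""
    return sum((y - predict_exponential(alpha, beta, x)) ** 2 for x, y in zip(xs, ys))


def r_squared(alpha, beta, xs, ys):
    """R squared as computed in the lecture: (SS total - SS residual) / SS total."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_regression = ss_total - ss_residual(alpha, beta, xs, ys)
    return ss_regression / ss_total


# Hypothetical observations, not the course data.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 4.1, 6.2, 8.8, 13.5]

# Two candidate parameter pairs: the lower SS residual fits better in y units.
ssr_a = ss_residual(2.0, 0.38, xs, ys)
ssr_b = ss_residual(2.0, 0.30, xs, ys)
```

One point worth keeping in mind when comparing figures: this R squared is measured in the original units of y, whereas the R square reported for the log-level fit is measured on ln(y), so the two numbers are not strictly like for like.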
We have also seen how we can compute exponential regression using the Data Analysis ToolPak. We also saw how this could be optimized using Solver. There is yet another way of computing exponential regression, that is, using the LOGEST function. The LOGEST function is similar to the LINEST function we used in linear regression. Let us study this in detail. We know that the regression equation we are talking about is as follows: ln(y) = δ + βx. If we take the exponential on both sides of the equation, we get exp(ln(y)) = exp(δ + βx). This gives us y = exp(δ)·exp(βx). We can write this as y = exp(δ)·(exp(β))^x. So we have our equation in x and y, where x is the independent variable and y is the dependent variable. We also know that e^δ is nothing but α. The LOGEST function produces output based on this equation. So let us look at the output produced by the LOGEST function. Just like the LINEST function, LOGEST produces ten outputs arranged in two columns of five rows each. The first output is e^β. The second output is e^δ, or α. On the second row, first column, we get the standard error for e^β. On the second row, second column, we get the standard error for e^δ. In the next row we get R squared, and then we get the standard error of the estimate of y. On the fourth row, first column, we get the F statistic. On the fourth row, second column, we get the degrees of freedom. In the last row, we get the sum of squares of regression and the sum of squares of residuals. We have discussed all of these terms in our previous discussions. Do these terms sound strange?
At this point, I would suggest that you go back to the previous lectures and go through the detailed discussion. Apart from the LOGEST function, we have another function called GROWTH. The GROWTH function can be used for making predictions. The usage of the GROWTH function is as follows: we say GROWTH, then we provide the known y's, then we provide the known x's, and then we provide the value of x for which we want to make a prediction. We will see all of this using Excel. So let's dive straight into Excel. So here we are in Excel. We will see the output from the LOGEST function. We know that LOGEST produces ten outputs in two columns, so we select ten cells in five rows and two columns. We enter the LOGEST function. We give the known y's first, then give the known x's as the second parameter. Then we set TRUE for constant and TRUE for stats. It is the same as the LINEST function. Remember to press Ctrl+Shift+Enter, because this is an array function. So there we have the output from the LOGEST function. You can see that we have got an R squared of 56 percent. We have discussed all the different outputs before, so you can make the computations based on the discussions we had earlier. Now we try to make predictions using the GROWTH function. Our x is the money spent on school supplies. So we will see, if we spend different amounts of money on school supplies, what would be the number of printers we need to order. We take values such as 2000 rupees or dollars, 7000 and 4000. The GROWTH function is also an array function, so after you enter the function, you need to press Ctrl+Shift+Enter. So we say GROWTH. We give the known y's first. Then we need to give the known x's. Our x is the money spent on school supplies, not cartridges, so we will make that change. Then we need to give the value of x for which we want to make a prediction. We will
anchor the known x's and known y's, because we will use the same ranges for the predictions with other values of x as well. Now we press Ctrl+Shift+Enter. There we have the prediction when the amount spent on school supplies is 2000 dollars. We copy this for the other values also. There we have the predictions. That concludes our discussion. Thank you for listening. 20. Multi Variate Exponential Regression: Hello and welcome. My name is Partha Majumdar. Now we come to the last discussion on exponential regression in this course. So far, we have discussed exponential regression using one independent variable. In this lecture, we will discuss exponential regression when we have to deal with multiple independent variables. You will notice that the basic concepts remain more or less the same, and the tools we use are also the ones we have already used. So let's get started. We will take a look at the regression equation involved in multivariate exponential regression. The equation is y = exp(β0 + β1·x1 + … + βn·xn). Here x1 to xn are the independent variables. If we take natural logs on both sides of the equation, we get ln(y) = β0 + β1·x1 + … + βn·xn. So now we have a linear equation. You will recall that we call this log-level regression. We will now dive into Excel and see how to conduct multivariate exponential regression. So we are back in Excel. We will start multivariate exponential regression using LOGEST. First, we see that we have two independent variables, so we will select an array of 15 cells, five by three. Now we will enter the LOGEST function. The known y's are the data for the printers to order, and the known x's are the cartridges ordered and the amount spent on school supplies. Then we give TRUE for constant and TRUE for stats. Remember to hit Ctrl+Shift+Enter. So here we get the output.
Look at the R squared. This is much better than when we did single-variate exponential regression. Now we make predictions using exponential regression with multiple independent variables. We will use the GROWTH function. First, let us see the predictions for the already existing data, based on which we have made the model. So we enter GROWTH. We provide the known y's. Then we provide the known x's, that is, cartridges and school supplies. Now we have to give the cartridges and the school supplies as the values of x based on which y will be predicted. We anchor the known x's and known y's, because we will use these for all the observations. Press Ctrl+Shift+Enter. Now we copy this for all the observations. So there we have the predictions for the existing data. Now let's try to predict for some future data. We take as input the number of cartridges to order and the amount to spend on school supplies. Based on these values, we will predict how many printers to order. So again we enter the GROWTH function. We give the known y's and the known x's. You can see that it makes sense to name the ranges which you use frequently in such situations; find out how to name ranges in Excel. For the third parameter, we have to give the future values of both cartridges and the amount spent on school supplies. So there you see, the prediction has been generated. Next we perform exponential regression using the Data Analysis ToolPak. We know that this is a log-level regression, so we need to find the log of y, that is, the log of the number of printers ordered historically. So we find the natural log of the number of printers ordered that is there in our data set. We use the LN function and give the value of the number of printers ordered. We copy this across for all the observations. So now we can perform linear regression between the log of y and the independent variables, namely the number of cartridges ordered and the amount spent on school supplies.
So we go to the Data Analysis ToolPak and invoke Regression. For the known y's, we give the values in column C, that is, the natural log of y. For the x, we give the values in columns D and E, that is, the number of cartridges ordered and the amount spent on school supplies. We select the output range and hit OK. There we have the output. We know that in multivariate regression we need to use the value of adjusted R squared. The adjusted R squared is around 67%. You can see the other outputs. Notice that we have got the coefficients for the intercept, the number of cartridges and the amount spent on school supplies. Here we see our predictions. Notice that the prediction is for the natural log of y. So from this value, we have to extract the number of printers to order by taking the exponential of this value. Let us compute this value and see what we get. We will insert a new column where we will keep the predicted value of y. As discussed, we can find this value by taking the exponential of the predicted value of log y given by the Data Analysis ToolPak. So there we have it. We copy this across. We see that we get the same prediction from the Data Analysis ToolPak as with the GROWTH function. Lastly, we take a look at the graphs that are generated. This is the normal probability plot. This is the line fit plot for school supplies. This is the line fit plot for cartridges. This is the residual plot for school supplies, and last is the residual plot for cartridges. That concludes our discussion. Thank you for listening. 21. Power Regression: Hello and welcome. My name is Partha Majumdar. So far we have discussed linear regression and exponential regression. In this video, we take a look at another form of regression, that is, power regression. We will start our discussion with power regression when we have one independent variable. As before, let us first take a look at the regression equation in power regression.
The regression equation is y = α·x^β. Here x is the independent variable and y is the dependent variable. If we take the natural log on both sides, we get ln(y) = ln(α) + β·ln(x). We know that ln(α) is a constant, so we can denote it by a constant δ. So the equation becomes ln(y) = δ + β·ln(x). So now our power regression equation has taken the form of a linear model. We call power regression log-log regression as well. Unlike linear regression and exponential regression, Excel does not provide any special tools for power regression, so we have to convert the problem to a linear regression problem and solve it in Excel. However, using the TREND function, we can compute future values from power regression by the expression given here. We will see how to use this in Excel. So let us dive straight into Excel. So here we are in Excel. First we find the log of y, which is the log of the number of printers ordered. We also need to find the log of x. In our case, we find the log of the amount spent on school supplies. This is log-log regression. We find the log using the LN function, with which you can find the natural log. We copy this for all the observations. Now we have log y and log x, so we can form a linear regression model using log y and log x. We go to Data and invoke the Data Analysis ToolPak. We select Regression. For the known y's, we give log y, and for the known x's we give column G, that is, log x. We tick Labels to include the labels, and we select the output area. Now we hit OK. So there we have the output. You can see we get the same kind of output as we got for linear regression. You can see that we got the intercept, which is the log of alpha, that is, what we were denoting as delta, and the slope for log of x. In the predictions, we see that we have the prediction for the value of
log of y. Now let's try to predict the value of y using the TREND function that we discussed in the lecture. We take the exponential of TREND. First we give the log of the known y's, which we have already computed. Then we give the known x's; for these also we give the natural log of the known x's. We cannot just give the known x's; we have to give the natural log of the known x's. Lastly, we give the log of the x for which we have to make a prediction. So there you can see the complete formula. Press Ctrl+Shift+Enter and you get the predicted values for all the observations. Using the same formula, you can predict values in the future as well. Now let's see what the prediction is according to the linear regression that we conducted using the Data Analysis ToolPak. We know that the output provided by the Data Analysis ToolPak is the predicted value for log y. So if we take the exponential of that, we will get the predicted value for the number of printers to order. So we take the exponential of the natural log of y. There, you see, we got the predicted value. It is the same as what we got using the TREND function. There is another way to find the power regression equation. Select the y values and the x values and plot an X Y scatter chart. You see that we have got the printers on the X axis and school supplies on the Y axis. That is not quite what we want, so we change the data series. We put the printers on the Y axis and school supplies on the X axis. Now it is correct. Now we add a trendline. By default it gives a linear trendline; we want a power regression trendline. So we select Power and choose to display the equation. There we have the trendline from power regression. You can forecast into the future also, and display the equation on the chart. We move it to a convenient location so that we can see it clearly, and enlarge the font so that it is visible to you on the video. So there we have the equation for power regression. You can see that it is in the form α·x^β. That concludes our discussion.
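The log-log procedure from this lesson can be sketched in Python as well. This is an illustrative sketch under stated assumptions: the helper names are hypothetical and the sample data is generated from α = 3 and β = 1.5 rather than taken from the course workbook. The prediction step mirrors the idea of taking the exponential of a linear trend fitted on ln(y) against ln(x).

```python
import math


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_power(xs, ys):
    """Fit y = alpha * x**beta by regressing ln(y) on ln(x); x and y must be positive."""
    delta, beta = fit_simple_linear([math.log(x) for x in xs],
                                    [math.log(y) for y in ys])
    return math.exp(delta), beta  # alpha = e^delta


def predict_power(alpha, beta, x):
    """Exponential of the linear prediction delta + beta * ln(x)."""
    return math.exp(math.log(alpha) + beta * math.log(x))


# Hypothetical data generated from alpha = 3, beta = 1.5.
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 1.5 for x in xs]
alpha, beta = fit_power(xs, ys)
```

Both x and y are logged here, which is why neither may be zero or negative; that is the practical difference from the log-level (exponential) model, where only y is transformed.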
Thank you for listening. 22. Multi Variate Power Regression: Hello and welcome. My name is Partha Majumdar. In the last video we saw how we can conduct power regression when we have one independent variable. Now let us see how to conduct power regression when we have multiple independent variables. We take a look at the regression equation and go straight to Excel to see a demo of how to conduct multivariate power regression. The regression equation is y = β0·x1^β1·x2^β2·…·xn^βn. If we take the natural log on both sides of the equation, we get ln(y) = ln(β0) + β1·ln(x1) + … + βn·ln(xn), where ln(β0) is a constant. So we now have a linear equation. As you will recall, we call this log-log regression. Now let's head to Excel and see how we can conduct multivariate power regression. So we're back in Excel. The data is familiar to you by now because we have been using the same data. First we find the log of y, that is, the log of the number of printers to order, and then we find the log of x1 and x2. Here x1 is the number of cartridges ordered and x2 is the amount spent on school supplies. To find the log of x1, x2 and y, we find the natural log. We will use the LN function in Excel. There we have it. Now we copy this across so that we have the natural log of y, x1 and x2. Now we can conduct linear regression between the natural log of y and the natural logs of x1 and x2. The natural logs of x1 and x2 are the independent variables and the natural log of y is the dependent variable. So we go to Data and invoke the Data Analysis ToolPak. We select Regression and give the known y's as column G, that is, the natural log of y. We give the known x's, that is, columns H and I. As we have selected the labels, we need to tick Labels, and we select the output area.
You are familiar with all of this by now, as we have been discussing it so many times. We click OK. There we have the output. In multivariate linear regression, we should note the value of adjusted R square. You see that the adjusted R square is around 70%. This is an improvement. Let us look at the rest of the output given by the Data Analysis ToolPak. Look at the ANOVA. Then we come to the coefficients. The coefficients comprise the intercept, the natural log of x1 and the natural log of x2. Then there are the residuals. You can see that it has predicted the value of the natural log of y. Using the value of the natural log of y, we can establish what the prediction for y is for a particular observation. We add a column where we predict the number of printers to order using the data from the ToolPak. What we have to do is find the exponential of the natural log of y that has been calculated. There you have it. Copy this for all the observations, and you will get the prediction for all the observations based on this power regression model, which is a multivariate power regression model. We can also compute the predicted value of y by using the TREND function, like we did when we were doing power regression with a single variable. With TREND, you know that we have to give the exponential, then give the natural log of y. The next parameter is the natural log of the x's; here that is the natural log of the number of cartridges ordered and of the amount spent on school supplies. The last parameter is the value of x. In our case, we have x1 and x2, so we give the natural log of the number of cartridges ordered and of the amount spent on school supplies for each individual observation. This is an array function, so we have to press Ctrl+Shift+Enter. But before that, we anchor the known y's and known x's. So there we have the prediction. It is the same as what we got using the Data Analysis ToolPak. That concludes our discussion. Thank you for listening. 23. Logarithmic Regression: Hello and welcome.
My name is Partha Majumdar. Continuing our discussion on regression, we now discuss logarithmic regression. This form of regression is very useful in certain circumstances. We will see all about that, and we will see a demo of this in Excel. So when do we use logarithmic regression? When the data demonstrates a pattern that grows or declines very fast initially and then settles down to a slower rate over time, we use this form of regression. Let us look at the general form of the regression equation for logarithmic regression. It is y = β0 + β1·ln(x1) + β2·ln(x2) + … + βn·ln(xn). In this form, all of x1, x2, up to xn should be positive. We call this regression level-log regression as well. Now let's move over to Excel and see how to conduct this form of regression. So here we are in Excel. It is a level-log regression, so we have to find the natural log of x. Here x is school supplies. We can find the natural log using the LN function. So we say LN of the school supplies amount. We copy this across for all the observations. Now this becomes a linear regression problem. Since it becomes a linear regression problem, we can find the regression coefficients and other statistics using the LINEST function. We know that LINEST gives ten outputs in five rows and two columns. So we highlight five rows and two columns and enter the LINEST function, where we give the known y's as the values of the printers ordered historically and the known x's as the log of the school supplies amount that we have just found. For constant and for statistics, we give TRUE. There we have the output. You can see the R squared value here. Now let's predict the value of y based on logarithmic regression. So we add a column for storing the predicted value. We will use the TREND function for predicting the value of y.
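For comparison with the worksheet steps, the level-log fit described above can be sketched in Python. The function names and the sample data (generated from β0 = 5 and β1 = 2) are assumptions made for this illustration, not values from the course. Only x is transformed, so the prediction comes out directly in units of y.

```python
import math


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_logarithmic(xs, ys):
    """Fit y = b0 + b1 * ln(x); every x must be positive."""
    if any(x <= 0 for x in xs):
        raise ValueError("logarithmic regression needs positive x values")
    return fit_simple_linear([math.log(x) for x in xs], ys)


def predict_logarithmic(b0, b1, x):
    """No exponential is needed here, because y itself was never transformed."""
    return b0 + b1 * math.log(x)


# Hypothetical data generated from b0 = 5, b1 = 2.
xs = [1, 2, 4, 8, 16]
ys = [5 + 2 * math.log(x) for x in xs]
b0, b1 = fit_logarithmic(xs, ys)
```

The explicit check on the sign of x reflects the requirement stated in the lecture that all of the independent variables must be positive.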
So we say TREND. The known y's are the values of the printers ordered historically. For the known x's, we give the log of the X values, that is, the log of the amount spent on school supplies. And for X we give the log of X. This is an array function, so we have to press Ctrl+Shift+Enter. We lock the known y's and known x's so that we can copy the formula across, and hit Ctrl+Shift+Enter. There we have the predicted value for Y. We get the prediction for all the observations. Now let's predict for some future values. We put in values of the amount spent on school supplies and predict the number of printers to order. We take some arbitrary values for the amount spent on school supplies, assumed to be in dollars. Here the X will be the amounts on school supplies that we expect to spend in the future. We lock the known y's and known x's as before, so that we can copy the formula. There we have the predictions if the amounts spent on school supplies are as we have entered. Now let's use the Data Analysis ToolPak. So we invoke the Data Analysis ToolPak and select Regression. For the known y's, we have to give the values of the printers ordered, so we select the values in column B. For the known x's, we have to give the values in column H that we have computed, that is, the log of the amount spent on school supplies. There we have the output from the Data Analysis ToolPak. You notice that we got the same R squared value. Also, the sum of squares of the residuals is the same. In the logarithmic regression output, the prediction is the value of Y itself. You can also see the residuals and the regression coefficients. So far, we have conducted logarithmic regression with one independent variable. Now let us conduct multivariate logarithmic regression, where we have X1 and X2. X1 will be the number of cartridges ordered historically and X2 will be the amount spent on school supplies. 
So we have to find the log of X1 and X2. We find the log of the number of cartridges ordered historically. There we have it for all the observations. Now we can perform linear regression. We go to the Data Analysis ToolPak and select Regression. For Y, we give the same as before, that is, column B, where the printers data is available. Next comes X. Now we have to select two columns, X1 and X2, that is, the log of X1 and the log of X2. For the output, we select an area where the output can be displayed and click OK. You see that we have a major improvement in the R squared value. As this is multivariate regression, we should consider the adjusted R squared. Scroll over to see the rest of the output. You see here the coefficients for X1 and for X2. The predicted value is the value of Y. That concludes our discussion. Thank you for listening.

24. Quadratic Regression: Hello and welcome. My name is Partha Majumdar. So far, all the regression models that we have discussed were linear models. Now we discuss some nonlinear models. The first in the series is the quadratic regression model. We start with the regression equation for the quadratic regression model before we go to Excel to see how we can perform regression using the quadratic regression model. The equation is Y = β0 + β1·X + β2·X² + ε, where ε is the error term. β0, β1 and β2 are our regression coefficients. Now let's go to Excel and perform quadratic regression. We are back in our familiar Excel. So first we need to find X and X squared. We will use the amount spent on school supplies as X. There we have it. Now we copy this for all observations. Now we can perform linear regression. So we invoke the Data Analysis ToolPak and select Regression. For Y, we will give the values in column B, that is, the number of printers ordered historically. For X:
We will give columns F and G, which contain the values of X and X squared, X being the amount spent on school supplies. We select the output area and click OK. So there we have the output from the Data Analysis ToolPak. Let's see the regression coefficients. The intercept is nothing but β0, the slope of X is nothing but β1, and the slope of X squared is equal to β2. Using β0, β1 and β2, we can form the quadratic equation that we just discussed. Also see the R squared and the sum of squares of residuals. That concludes our discussion. Thank you for listening.

25. Polynomial Regression: Hello and welcome. My name is Partha Majumdar. In this video, we discuss polynomial regression. This is very similar to the quadratic regression we discussed in the last video. Polynomial regression is also a nonlinear regression model. We will see its nuances through an example in Excel. Now, polynomial regression is very similar to the quadratic regression that we have done, so we will not go through the steps. Instead, we take a shortcut where we use the graphical method to determine the equation required. Our Y is the printers ordered and X is the amount spent on school supplies. So we select the Y and the X columns, and then we insert an XY scatter chart. We want to predict the number of printers to order, so we would like to have the number of printers ordered on the Y axis and the amount spent on school supplies on the X axis. We just rectify the graph. Now it's OK. Next we add a trendline. By default, the trendline is linear regression. We choose polynomial regression. When we choose polynomial regression, we can provide the degree of the polynomial that we want. By default, the degree is two, which means it becomes a quadratic regression equation. We can change the degree to a number that we are comfortable with, that we desire to have. So you can see the number of degrees is three. 
Then let's see what equation we get from this regression. We get a polynomial of degree four, as we have selected four. If we select three, our equation changes to a polynomial of degree three. That concludes our discussion. Thank you for listening.

26. Selecting a Model through Experimentation: Hello and welcome. My name is Partha Majumdar. So far in the program, we have discussed many models using which we can make predictions. However, this is just the tip of the iceberg. There are so many more models which have already been created, and there are so many more models which people like you will create in the future. Also, we will see that by combining the models that we have discussed, we can create an innumerable number of models. Now the most difficult part is to select the model that will reliably predict outcomes in the future from the data that is available to us. We will discuss this aspect of predictive analytics in this video. Now you must realize that what we are doing is traditional programming. In traditional programming, we write programs to which we input the rules and the data. The rules and the data are inputs to the program. Using the rules and data that were provided to it, the program generates the answers. So this is traditional programming. You would have realized that we are picking up a model, then we are giving the data, applying the model on top of the data and getting answers. So what we have been doing is basically traditional programming. In Part 2 and Part 3 of this program, we will see a different paradigm, which we know as machine learning. In machine learning, we also have programs. However, now we input the answers and the data. Based on the answers and the data that we input, the program generates the rules, which it will use in the future for predicting on the data that is supplied to it. After this short detour, we return to the topic of our discussion for this video. 
That is, for a given dataset, how do we select a model? We will go straight to Excel for continuing this discussion. So now we are in Excel. We will use the same data that we have been using so far in this program, and we will try to select a model which best fits this data. So I am creating a new worksheet with the same data. Let's start our hunt with the moving averages method. Within moving averages also, we actually need to try different values of n, that is, moving averages for different numbers of days. However, as this is just an illustration, we will only find moving averages for one value of n, that is, n equal to nine. This number nine has been chosen arbitrarily. It needs to be determined through experimentation. So now we find the moving averages for nine days. We go to the tenth day and find the average of the previous nine values. This is the simple moving average. There we have found the moving averages for nine days. Now we find the residual. The residual is nothing but the difference between the actual number of printers ordered and the predicted number of printers to order. We find the square of the residual and copy this for all the observations. Next we find the sum of the squares of the residuals. We can use this value of the sum of squares of residuals for comparison. We make a summary sheet where we record the performance of each of the methods that we try. The first method we have tried is simple moving averages. We track the sum of squares of residuals and the R squared where it is applicable. With this illustration, we now move to the next method. We will use centered moving averages. Again, we will take centered moving averages for nine days. So first I create a new worksheet with the same data and name this worksheet centered moving averages. We find the centered moving averages for nine days. We go to the fifth day and find the average of the four values above it, the day's own
value, and the four values below the fifth day. We calculate this value for as long as it is possible to compute centered moving averages for nine days. As before, we find the residual, and we also find the square of the residual, because we want to compute the sum of squares of residuals. We know that the residual is the actual value minus the predicted value. So we record the value of the sum of squares of residuals. You see that the sum of squares of residuals has reduced drastically from the first method that we used. Next we will find the weighted moving averages. We create one more worksheet and rename it to weighted moving averages. We will find the weighted moving average for nine days. We first assign the weights. We give simple weights of 1, 2, 3, 4, 5, 6, 7, 8 and 9 for the nine days. We sum the weights and then find the multipliers. There we have found all the multipliers. We verify the multipliers: the sum of the multipliers should be equal to one. Now we find the weighted moving averages. We will take the centered weighted moving average. So we use SUMPRODUCT of the values and the multipliers. We copy this for all the values where it is possible to find the centered weighted moving average. As before, we find the residual and the sum of the squares of residuals. So we need to find the square of the residuals also, and we make space for that. We find the residuals and then the squares of the residuals. There we have the sum of squared residuals in the case of centered weighted moving averages. So we record this in the summary sheet. So we tried three models of moving averages on the data. We saw that we got the smallest value for the sum of squares of residuals when we used weighted moving averages. Now let us try regression models. We start with level-level regression. So we create a new sheet and name it level-level regression. 
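The three moving-average variants tried above, each scored by its sum of squared residuals, can be sketched as follows. This is a rough Python illustration under stated assumptions: the function names and the `orders` series are hypothetical, and the weights 1 to 9 are applied from the earliest to the latest day of each centered window, which the transcript does not confirm.

```python
import numpy as np

def trailing_moving_average(y, n):
    """Prediction for day t is the mean of the n values before it."""
    y = np.asarray(y, dtype=float)
    return np.array([y[t - n:t].mean() for t in range(n, len(y))])

def centered_weighted_average(y, weights):
    """Weighted mean over a window centered on each day (odd window length)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # multipliers that add up to one
    return np.correlate(np.asarray(y, dtype=float), w, mode="valid")

def sum_squared_residuals(actual, predicted):
    """Residual = actual minus predicted; return the sum of the squares."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sum(residuals ** 2))

def compare_moving_averages(y, n=9):
    """Sum of squared residuals for the three moving-average variants."""
    y = np.asarray(y, dtype=float)
    half = n // 2
    centered_actual = y[half:len(y) - half]  # days that have a full window
    return {
        "simple": sum_squared_residuals(y[n:], trailing_moving_average(y, n)),
        "centered": sum_squared_residuals(
            centered_actual, centered_weighted_average(y, np.ones(n))),
        "weighted": sum_squared_residuals(
            centered_actual, centered_weighted_average(y, np.arange(1, n + 1))),
    }

# Hypothetical daily printer orders, not the course workbook.
orders = [23, 25, 22, 28, 30, 27, 31, 35, 33, 36, 38, 34, 40, 42, 39, 44]
print(compare_moving_averages(orders, n=9))
```

The dictionary returned plays the role of the summary sheet: one number per method, smaller being better.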
We will use the number of printers to order as the dependent variable, and we will use the amount spent on school supplies as the independent variable. So we go to Data, Data Analysis ToolPak, and choose Regression. We give the Y range as the number of printers ordered historically and the X range as the amount spent on school supplies. We select the output area. There we have the result of the regression. Notice that we have got an R squared of about 56%. Notice that the sum of squares of residuals has come down drastically from the moving averages methods that we were using so far. We record this in the summary sheet. Now let's move to the next method. We will try level-log regression, so we create a new sheet for that. In level-log regression, Y remains the same and we take the log of the X values. So we find the log of X. We can find this using the LN function on the amount spent on school supplies. So now we conduct a regression between the number of printers ordered and the log of the amount spent on school supplies. There we have the output. We record the value of R squared and also the sum of squares of residuals. Next we will conduct log-level regression. In log-level regression, we have to take the log of Y and keep X as it is. So we take the natural log of Y using the LN function. Now we create a regression model. Here the Y's will be column F and the X's will be column B. So there we have the output. We record the values of R squared and the sum of squares of residuals in the summary sheet. Now we try our fourth regression model. It is log-log regression. In log-log regression, we have to take the log of Y and the log of X, and regress the log of Y on the log of X. So we create a new sheet and set this up. The sheet is ready. We conduct the regression with known y's as column F and known x's as column G. There we have the output. 
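The four regressions just described differ only in whether the log is applied to X, to Y, to both or to neither. A minimal Python sketch of that comparison follows. The function names and the data are hypothetical, and R squared is computed on whatever scale Y has in each regression, which is what a regression output would report for the transformed columns.

```python
import numpy as np

def simple_ols(x, y):
    """Straight-line fit; returns (intercept, slope, r_squared)."""
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    return intercept, slope, 1.0 - ss_res / ss_tot

def compare_forms(x, y):
    """Fit level-level, level-log, log-level and log-log models (x, y > 0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    forms = {
        "level-level": (x, y),
        "level-log": (np.log(x), y),
        "log-level": (x, np.log(y)),
        "log-log": (np.log(x), np.log(y)),
    }
    return {name: simple_ols(u, v) for name, (u, v) in forms.items()}

# Hypothetical data: amount spent on school supplies and printers ordered.
spend = [120, 150, 210, 260, 340, 420, 510, 640]
printers = [14, 17, 20, 24, 26, 30, 31, 35]

for name, (intercept, slope, r2) in compare_forms(spend, printers).items():
    print(f"{name:12s} intercept={intercept:9.4f} slope={slope:8.4f} R^2={r2:.4f}")
```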
We track the values of R squared and the sum of squares of residuals. So we have applied three models using the moving averages method and four models of linear regression on the data that we have got. Now let's try a few more models which are polynomial, and then we will also try a model which is a hybrid between polynomial and logarithmic. So here we are back in Excel once again. We now try to create a model where Y is equal to A times X squared plus some constant, and we will call it the X squared model. So as a first step, we find X squared, where X is the amount spent on school supplies. Now we create a regression model using the Data Analysis ToolPak, where Y is the number of printers ordered and X is the X squared that we have just computed. There we have the output. We capture the value of R squared and the sum of squares of residuals for analysis. We try one more model. Here we take Y equal to A times X plus B times X squared plus some constant plus the error. So this is a polynomial regression, with a polynomial of degree two. It requires X and it requires X squared, so we find X and X squared and conduct the regression between Y and X and X squared. For X, we select columns F and G. So we get the output. You notice that the R squared is marginally better compared to when we made the model with Y equal to A times X squared plus a constant. We capture these values in our summary sheet. We try one more model: Y is equal to A times X plus B times X squared plus C times X cubed plus some constant plus the error. So this is a degree-three polynomial. We find X, X squared and X cubed, and now we form our linear regression model. You see that the R squared has not changed very much. Take a look at the sum of squared residuals of the three models we just tried. Now we will try one last model, where we take the log of Y equal to A times X plus B times X squared plus C times the log of X. 
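The three polynomial models tried above are all linear regressions on computed columns (X squared, X cubed), so they can be fitted with one least-squares helper. The sketch below is only an illustration in Python: the helper names and the data are hypothetical, and `np.linalg.lstsq` stands in for the regression tool.

```python
import numpy as np

def fit_with_intercept(columns, y):
    """Least-squares fit of y on the given columns plus a constant.

    Returns (coefficients, sum_of_squared_residuals, r_squared); the first
    coefficient is the constant term.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack(
        [np.ones(len(y))] + [np.asarray(c, dtype=float) for c in columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coef, ss_res, 1.0 - ss_res / ss_tot

def polynomial_candidates(x, y):
    """Fit the X squared model, the degree-two and the degree-three polynomial."""
    x = np.asarray(x, dtype=float)
    candidates = {
        "y = a*x^2 + c": [x ** 2],
        "degree 2": [x, x ** 2],
        "degree 3": [x, x ** 2, x ** 3],
    }
    return {name: fit_with_intercept(cols, y) for name, cols in candidates.items()}

# Hypothetical data: spend in hundreds of dollars and printers ordered.
spend = [1.2, 1.5, 2.1, 2.6, 3.4, 4.2, 5.1, 6.4]
printers = [14, 17, 20, 24, 26, 30, 31, 35]

for name, (coef, ss_res, r2) in polynomial_candidates(spend, printers).items():
    print(f"{name:14s} SSR={ss_res:8.3f} R^2={r2:.4f}")
```

Because each larger polynomial contains the smaller one, its sum of squared residuals cannot be higher on the same data, which is one reason a small gain in R squared alone is weak evidence for the bigger model.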
So you can notice that what I am doing is trying different models to find out which model fits the data the best. So this is the experimentation which needs to be done. No matter what data is provided to you, no model will fit the data immediately. A lot of experimentation needs to be done to come up with the model which will fit the data properly and which can be used for making reliable predictions. Now we need to find the log of Y, X, X squared and the log of X. Check the equation once again. We find the log of Y using the LN function and copy the formulas for all the observations. We create the data set for regression, and now we conduct the regression using the Data Analysis ToolPak. Y is in column E and the X's are in columns F, G and H. There we have the output of the regression. Notice that this model is better than all the other models we have tried so far. So we have to keep trying different models till we come to the model which is most appropriate. There are other tricks also, which we will discuss later in Part 2 and Part 3 of the course. The main purpose of this video was to elaborate the point that we need to try a lot of models before we can come to a conclusion on which model we will use for our prediction from a time series dataset. That concludes our discussion. Thank you for listening.

27. Guidelines for Selecting a Model: Hello, welcome to the class. My name is Partha Majumdar. Without any further ado, let's get started. In the last video, I emphasized the point that we need to do a lot of experimentation before we can settle on a model that can be used reliably for making predictions on the given data. Now I lay down a set of guidelines which can help in selecting the model for forecasting. These guidelines have been prescribed by statisticians who have practiced the art of forecasting from time series data over a long period of time. 
These guidelines are a four-step process, where the steps have to be executed in sequence, one after the other. In the first step, we perform residual analysis. We know that the residual is the difference between the actual value and the predicted value. Once we find the residuals, we should check whether the residuals are randomly distributed for the time series data. If they are randomly distributed, the model fits appropriately. The model would not be appropriate if the residuals demonstrate a trend, a cyclic effect or a seasonal effect. By graphing the residuals, we can check whether they are randomly distributed or demonstrate a trend, a cyclic effect or a seasonal effect. If, from the residual analysis, we find that the residuals are randomly distributed for two or more models, we go to step two, where we measure the magnitude of the residuals through squared differences. The squared differences, denoted by S_YX, is equal to the summation of the actual value minus the predicted value, the whole squared, that is, S_YX = Σ (Y_i − Ŷ_i)². If we find that S_YX is equal to zero, then we can say that the model fits perfectly. However, if S_YX is large, then we can be sure that the model will not fit. So we must select the model which has the least S_YX. You will notice that S_YX is computed by squaring a value, so S_YX penalizes individual errors in prediction very heavily. If we find S_YX to be similar for two or more models, we move to step three. In step three, we measure the magnitude of the residual error through absolute differences. We compute the mean absolute deviation. The formula is: mean absolute deviation is equal to the summation, over i equal to 1 to n, of the absolute value of Y_i minus Ŷ_i, divided by n, that is, MAD = (Σ |Y_i − Ŷ_i|) / n. Here Y_i is the actual value which is observed and Ŷ_i is the predicted value. If the mean absolute deviation is equal to zero, then the model fits perfectly. If the mean absolute deviation is large, then the model does not fit. 
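Steps two and three above reduce to two small calculations on the residuals. The sketch below shows them in plain Python, following the formulas as stated in the lesson. The function names, the `actual` values and the two sets of model predictions are hypothetical.

```python
def residuals(actual, predicted):
    """Residual = actual value minus predicted value, one per observation."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]

def squared_differences(actual, predicted):
    """S_YX as stated in the lesson: the sum of squared residuals."""
    return sum(r ** 2 for r in residuals(actual, predicted))

def mean_absolute_deviation(actual, predicted):
    """MAD: the mean of the absolute residuals."""
    res = residuals(actual, predicted)
    return sum(abs(r) for r in res) / len(res)

# Hypothetical actual values and predictions from two candidate models.
actual = [10, 12, 14, 16]
model_a = [11, 12, 13, 18]
model_b = [10, 13, 15, 15]

for name, predicted in (("model A", model_a), ("model B", model_b)):
    print(name,
          "S_YX =", squared_differences(actual, predicted),
          "MAD =", mean_absolute_deviation(actual, predicted))
```

Because S_YX squares each residual, one large miss (the residual of 2 in the hypothetical model A) contributes more than several small ones, which is the heavy penalty the lesson mentions.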
So we select the model which has the least mean absolute deviation. After this step, if we still have two or more models which have a similar MAD, we move to step four. Step four is adopting the principle of parsimony. The principle of parsimony states that we should select the simplest model. Statisticians consider the models of linear regression and quadratic regression to be simple. I hope you find these guidelines useful. Try them out and see whether you get good models using these guidelines. Thank you for listening.

28. Outliers: Hello and welcome. My name is Partha Majumdar. To perform analytics, we require data to be collected over a period of time. The data may be collected by machines, or the data may be collected manually. In either case, there is a possibility that we make some mistakes while recording or finding the data. These errors may give rise to outliers. Outliers cause a big difference in the results we obtain from any analytics, so we need to have a strategy for how to deal with them. In this video, we discuss all about outliers: how to find them and how to deal with them. Let's start. We begin our discussion by trying to understand what outliers are. An outlier is a data point that differs significantly from the other observations in the data. For example, if you take a random sample of salaries earned by different people, you may find that the salary of Mr Bill Gates is an outlier, as it would be far outside the range of normal salaries. Given here are some common definitions of outliers. The first definition considered is that an outlier is a data point that lies more than 1.5 times the interquartile range outside the quartiles. We have discussed what the interquartile range is earlier in the course. 
A second commonly used definition is that an outlier is any data point that is more than two standard deviations from the mean. We know that when the data is normally distributed, about 95% of the data lies within two standard deviations of the mean. So data outside two standard deviations is termed an outlier. It is intuitive that, in the presence of outliers, the impact on measures such as the mean and the standard deviation is very heavy. This can be understood from the fact that the salary of Mr Bill Gates may be equal to about 60% of the total salary of the sample. However, the median value does not change too much. For our purposes, we will use definition one of outliers. Next, let us understand the common causes due to which outliers are present in data. The most common reason for outliers to be present in data is measurement errors. Measurement errors can happen due to malfunctioning of the measuring equipment. This is a very common cause. Another reason could be an error in data transmission or data transcription. Outliers can also arise due to flaws in the theory that is assumed. This situation calls for additional research to find the right theory to work with. Another common cause of outliers being present in data is data entry errors. These data entry errors can be intentional or unintentional. When it is intentional, we call it fraudulent behavior. When it is unintentional, we call it a human error. In either case, the statistics we derive from the data are heavily impacted, so we need to take care of these errors. Outliers can be genuine values also. For example, if we are tracking sales in a particular department store, it may happen that on a particular day there is a spike in sales. Or the number of hits on a website may spike on a particular day or at a particular time of the day. 
Now, these need to be handled through specific statistical processes, and we cannot ignore these outliers. So we come to the realization that there is no strict or rigid mathematical definition by which we can say what an outlier is. Classifying a data point as an outlier is ultimately a subjective matter. However, we will study some well-defined methods for making these decisions. So let us study the methods for detecting outliers. The first method we study is the graphical method. Normally we create normal probability plots like the one shown here. When we plot the data points and a point is far away from the line of reference, we can classify it as an outlier. The second set of techniques is known as model-based methods. There are many model-based methods for detecting outliers. However, it is outside the scope of this particular video to discuss each of these methods in detail. Look out for separate videos for discussion on each of these model-based methods for detecting outliers. You could research them on your own as well. The method we will use is to create box plots. The box plot is a hybrid method, which uses both a graphical method and a model-based method. You will be familiar with box plots by now, as we have discussed them previously in the course. Box plots are very easy to create. They can be created using Excel, R programming, Python programming or most graphical display tools. Now let us discuss the technique we will utilize for identifying outliers. It is called Tukey's fences. In this method, we first create the box plot. From the box plot, we get the first quartile and the third quartile. The first quartile is marked as Q1 and the third quartile is marked as Q3. Q1 is the 25th percentile mark and Q3 is the 75th percentile mark. From Q1 and Q3, we can determine the interquartile range, or IQR. IQR is equal to Q3 minus Q1. We denote an outlier as O. 
So an outlier is a data point which is greater than Q3 plus k times IQR or less than Q1 minus k times IQR. Normally we consider k equal to 1.5 to classify a data point as an outlier. If we consider k equal to 3, then the data point is said to be far out. Now that we have defined the method for identifying outliers, let us see an illustration of the same. First we see the data which we will consider for this illustration. Suppose we got this original data set, for which we have to find outliers. The first thing we need to do is to sort the data in ascending order. Normally you will be using a tool, and the tool should have functions by which you can sort the data. So here I am showing you the sorted data after having used such a tool, in this case Excel. There we have the sorted data. Once we have the sorted data, we can easily find the median. The median is the center-most observation in the data. Here we have 10 observations, so the median will be the average of the two middle-most observations, that is, the fifth and sixth observations. In Excel, you can simply use the MEDIAN function. Once we have found the median, the next step is to find the first quartile. The first quartile is nothing but the middle value of the top half of the sorted data, that is, the half with the smaller values. To understand this, the first quartile is itself the median of the top half of the data, that is, the midpoint of the top half. In Excel, you can compute the first quartile using the QUARTILE function. Next, we find the third quartile, or Q3. The third quartile is nothing but the median of the lower half of the data. So if you take the midpoint of the lower half of the data, that will be the third quartile. 
However, when you are using Excel, you will find that you get the value 19.75. This is because, in finding the third quartile here, we only consider the data beyond the median. Now that we have found the first quartile and the third quartile, we can compute the IQR, or the interquartile range. The interquartile range is nothing but Q3 minus Q1. So this is 20 minus 15, which is equal to 5. We know that for using Tukey's fences, the IQR is vital information. Now that we have found the IQR, we can set the Tukey's fences. For outliers, the Tukey's fences are set at Q1 minus 1.5 times IQR and Q3 plus 1.5 times IQR, that is, 7.5 and 27.5. So we can see that we get two outliers, 32 and 51. Next, let's see if we have any far outs. To find the far outs, we set the Tukey's fences at Q1 minus 3 times IQR and Q3 plus 3 times IQR. So we get the values 0 and 35. When we draw the Tukey's fences, we find that the data point 51 is a far out. Now that we have found outliers and far outs using Tukey's fences, let us see their impact on our data. We do an analysis of the central tendency. First we find the mean of all the data points. We see that the mean is 21. Next we remove the far out, and the mean without the far out works out to 17.667. Next we remove all outliers, and without outliers the mean works out to 15.875. So we see that there is a significant impact due to the presence of outliers in our data. Now that we know how to find outliers, let us see how to work with these outliers. The choice of how we deal with outliers should depend on the context. Some estimators are highly sensitive to outliers. We will consider two techniques for dealing with outliers. The first choice we have is to retain outliers. We know that, by the law of large numbers, when we have a large number of observations, the data normally tends to a normal distribution. 
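The Tukey's fences illustration above can be retraced in a few lines of Python. This is a sketch under assumptions: the transcript does not show the lesson's data set, so the ten values in `data` are hypothetical, chosen only so that the quartiles, fences and means agree with the figures quoted in the lesson. The quartiles are taken as the medians of the two halves of the sorted data, as described, rather than by Excel's QUARTILE interpolation.

```python
from statistics import mean, median

def quartiles_by_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves of sorted data."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(upper)

def tukey_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = quartiles_by_halves(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def points_outside(values, k=1.5):
    """Data points below the lower fence or above the upper fence."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]

# Hypothetical data set, consistent with the numbers quoted in the lesson.
data = [11, 13, 15, 16, 16, 17, 19, 20, 32, 51]

outliers = points_outside(data, k=1.5)
far_outs = points_outside(data, k=3)
print("fences (k=1.5):", tukey_fences(data, 1.5), "outliers:", outliers)
print("fences (k=3):  ", tukey_fences(data, 3), "far outs:", far_outs)
print("mean of all points:     ", mean(data))
print("mean without far outs:  ", mean(v for v in data if v not in far_outs))
print("mean without outliers:  ", mean(v for v in data if v not in outliers))
```

The drop in the mean as the far out and then all outliers are removed shows how sensitive the mean is to a few extreme values.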
So even when the normal distribution is appropriate for the data being analyzed, outliers are expected in large samples, and so they should not be automatically discarded. We should use appropriate techniques for dealing with them, as we have discussed earlier. The second choice that we have is to exclude outliers. This is a very controversial option, and many scientists do not agree with it. That said, we have mathematical criteria which can help us in excluding outliers objectively, using quantitative methods. Nevertheless, when the observation set is very small, we should not consider exclusion. Also, when the observation set does not conform to a normal distribution, we should avoid the exclusion option. With this, we come to the end of the discussion. I hope you found this useful. Thank you for listening.

29. Degrees of Freedom: Hello. Welcome to the class. My name is Partha Majumdar. Without any further ado, let's get started. Earlier in the course, we discussed many techniques for making predictions on time series data. We saw that each of the techniques produced a lot of outputs. We also discussed what those outputs mean and, in brief, touched upon the implications of those outputs. However, those outputs need to be studied in a little more detail, and the theoretical background of the same can help in making better decisions. In this series of bonus lectures, we discuss those outputs in a little more detail, both from a theoretical and a practical point of view. I start this series with the discussion on degrees of freedom. First, let's go through some definitions of degrees of freedom. Generically, degrees of freedom is required in any physical modelling. It is the number of independent ways by which a dynamic system can move without violating any constraint imposed on it. In other words, degrees of freedom is the minimum number of independent coordinates that can specify the position of a system completely. 
Let us look at the mathematical definition of the same. In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. In general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself. This will become clear through an example. Most of the time, the sample variance has n minus 1 degrees of freedom. It is computed from n random scores minus one parameter estimated as an intermediate step, which is the sample mean. Let us understand this through an illustration. We know that the formula for the sample variance is the summation of X_i minus X̄, the whole squared, divided by n minus 1, that is, S² = Σ (X_i − X̄)² / (n − 1), where X̄ is the mean of the sample. So, as we stated, S² is the sample variance and X̄ is the sample mean. From this formula, it is clear that to calculate the sample variance, the sample mean has to be known. Suppose that n is equal to 5 and the sample mean is equal to 20. Then we know that the summation of all the X_i is equal to 20 times 5, or 100. If we vary X1, X2, X3 and X4, then X5 has to be 100 minus (X1 plus X2 plus X3 plus X4), because X1 plus X2 plus X3 plus X4 plus X5 has to equal 100. So we can vary only four values. In other words, the degrees of freedom is 5 minus 1, which is equal to 4, that is, n minus 1. Thank you for listening.

30. Normal Distribution: Hello. Welcome to the class. My name is Partha Majumdar. Without any further ado, let's get started. Whether you have studied statistics and mathematics or not, this shape should be very familiar to you. This bell-shaped curve is used so much in regular life and in business. We know it as the normal distribution curve. It is also known as the Gaussian distribution curve. 
In this video, we'll discuss the normal distribution in detail. The normal distribution is the most common continuous distribution. It is represented by the bell-shaped curve. Let us understand continuous distributions a little more. A continuous distribution is the distribution of a continuous variable. Continuous variables are measured, while discrete variables are counted. As an example, the time taken to download a web page is measured and not counted. The probability that a web page will download in 7 to 10 seconds can be computed, for example, while the probability that it will download in exactly eight seconds is always zero. The normal distribution is very important in statistics. Numerous continuous variables common in business have distributions that closely resemble the normal distribution. So the normal distribution has a very wide application in business. The normal distribution can also be used to approximate various discrete probability distributions. The normal distribution has a very close relationship with the central limit theorem. The central limit theorem states that when the number of observations increases to a very large number, the distribution of their mean tends to be normal. So the normal distribution provides the basis of statistical inference. Next, let's look at some properties of the normal distribution. We have already stated that the normal distribution is a bell-shaped curve and, as a result, it is symmetrical in its appearance. In the normal distribution, all the measures of central tendency, that is mean, median and mode, are identical. The middle spread, that is the interquartile range, is equal to 1.33 standard deviations. The random variable associated with the normal distribution has a range from minus infinity to plus infinity. The figure on the right should be understandable from the previous discussions we have been having in this course. Now let's get into the meat of the discussion. The normal distribution probability density function can be stated as follows: f(x) is equal to 1 divided by (sigma times the square root of 2 pi), into e to the power
minus half of ((x minus mu) by sigma) whole squared. Here mu is the population mean and sigma is the population standard deviation. Since this formula is fairly complex, we have normal distribution tables from which we can read off the values we need. We will discuss that later in this video. Since the values of mu and sigma will be different for different distributions, we have something known as the standardized normal distribution. The standardized normal distribution has a density function as follows. We apply the transformation Z is equal to (X minus mu) by sigma where, as in the normal distribution density function, mu is the population mean and sigma is the population standard deviation. For the standardized variable Z, the mean is always equal to zero and the standard deviation is always equal to one. So the standardized normal probability density function is f(Z) is equal to 1 by the square root of 2 pi, into e to the power minus half Z squared. On the right, you can see the standardized normal distribution curve. See, here mu is equal to zero, and the standard deviations are marked at minus 1, minus 2, minus 3, 1, 2, 3, etcetera. As you would have realised, computing the value of the normal distribution function for different values of X is quite a task because of the complexity of the formula. So we have the table for the standardized normal distribution. You'll find different normal distribution tables in different books. The table shown here is a table from zero to Z. I introduce this table to you because I find it to be quite concise and easy to use. Let me introduce you to how to use this table. First, find the population mean mu. Then, for the given data, find the population standard deviation sigma. Once we have mu and sigma, we can find the value of Z for an argument X. For the Z obtained in step three, we can find the value from the standardized normal distribution table. In the normal distribution table, on the Y axis we have the values of Z to one place of decimal.
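Before reading the table further, the formulas described so far can be sketched in Python. This is only an illustration: Python is an assumption here (the lecture itself uses a printed table and Excel), and the function names `normal_pdf`, `z_score` and `table_value` are hypothetical. `table_value` mimics a "zero to Z" table by returning the area under the standardized curve between 0 and Z.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density: 1 / (sigma * sqrt(2*pi)) * e^(-0.5 * ((x - mu) / sigma)^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def z_score(x, mu, sigma):
    """Standardizing transformation Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def table_value(z):
    """Area between 0 and |z| under the standardized normal curve (a 'zero to Z' table entry)."""
    return NormalDist().cdf(abs(z)) - 0.5


# Hypothetical inputs, chosen only to exercise the functions.
print(round(normal_pdf(50, mu=50, sigma=10), 5))
print(round(z_score(60, mu=50, sigma=10), 2))
print(round(table_value(1.0), 4))
```

The standard library's `NormalDist` does the integration that the printed table stores, so the sketch needs no third-party packages.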
On the X axis, we have the second place of decimal. So each cell in the table provides the value for Z equal to a number with two places of decimal. For example, the top left corner of the table is the value for Z equal to 0.00. The cell in the first row, second column is for Z equal to 0.01, and so on. Let's see the utilisation of the table through an example. Suppose mu is equal to 50. Also suppose sigma is equal to 10. Now, for this distribution, we want to find the probability that X is less than or equal to 55.2. We first find Z, which is equal to 55.2 minus 50, divided by 10, which is equal to 0.52. So on the table we go to the row where we have 0.5, and we go to the column where we have 0.02. This makes 0.52. So we have a value, that is 0.1985. Now remember that the total area under the normal distribution curve is equal to one, so the area to the left of the mean is 50%. So when Z is equal to 0.52 and the table value is 0.1985, the probability that X is less than or equal to 55.2 is 50% plus 19.85%, which is 69.85%. Let us visualize this through a graph. So here we have a normal distribution curve for this particular distribution. We draw a line at Z equal to 0.52 on the X axis. The area on the left is equal to the probability that X is less than or equal to 55.2. Again, remember that the total area under the curve is equal to one. Let's take another example. Suppose we have mu equal to 50 and sigma equal to 10, as we had before. We want to find the probability that X is greater than 55.2. So intuitively you can gather that the probability that X is greater than 55.2 is equal to one minus the probability that X is less than or equal to 55.2. Now, we had already found the probability that X is less than or equal to 55.2, so we can use this equation to find that the probability that X is greater than 55.2 is 30.15%. The graph on the right should help in visualization. Next, let's take another example, where X is less than mu.
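The two examples worked so far (X at most 55.2, and X greater than 55.2, with mu 50 and sigma 10) can be checked without a table. The following is a minimal sketch in Python, which is an assumption since the lecture uses a table and Excel; the helper names `prob_at_most` and `prob_greater` are hypothetical.

```python
from statistics import NormalDist


def prob_at_most(x, mu, sigma):
    """P(X <= x) for a normal distribution with the given mean and standard deviation."""
    return NormalDist(mu, sigma).cdf(x)


def prob_greater(x, mu, sigma):
    """P(X > x), using the fact that the total area under the curve is 1."""
    return 1 - prob_at_most(x, mu, sigma)


p_at_most = prob_at_most(55.2, mu=50, sigma=10)
p_greater = prob_greater(55.2, mu=50, sigma=10)
print(f"P(X <= 55.2) = {p_at_most:.4f}")
print(f"P(X >  55.2) = {p_greater:.4f}")
```

Here the cumulative probability is computed directly, so there is no need to add 50% to a "zero to Z" table value; the two routes should agree to the precision of the table.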
Again, suppose mu is 50 and sigma is 10. We want to find the probability that X is less than or equal to 46.7. So we find the value of Z, which is equal to minus 0.33. So on the standardized normal distribution table, we find the value for 0.33. We see that the value is 0.1293. So the probability that X is less than or equal to 46.7 is equal to 50% minus 12.93%, which is equal to 37.07%. Remember that this is the zero to Z table. Next, let's visualize this. The area in the orange shade is the area for the probability that X is less than or equal to 46.7. Notice that we have drawn the vertical line at Z equal to minus 0.33. Now we consider one last example. Suppose mu is equal to 50 and sigma is equal to 10. We want to find the probability that X is between 46.7 and 55.2. Now, the probability that X is greater than 46.7 and less than or equal to 55.2 is equal to the probability that X is less than or equal to 55.2, minus the probability that X is less than or equal to 46.7. That is, 69.85% minus 37.07%, which is equal to 32.78%. You can see the visualization in the graph provided here. The area between the green line and the orange line is the area we are looking for. I hope this introduction enables you to find the probabilities for any other problems that may be possible in this particular space. Now, we can calculate these probabilities using Excel as well. Excel provides a function called NORM.S.DIST. We have to pass to it the Z value, and pass the cumulative argument as TRUE, to get these probabilities for different values of Z. I leave it to you to try this out for yourself. Thank you for listening.

31. Standard Error of Mean: Hello. Welcome to the class. My name is Partha. Without any further delay, let's get started. In this video we discuss the standard error of mean. We have used this in our analysis earlier. We know that the smaller the standard error of mean, the better the chance that the prediction will be reliable. The sample mean is said to be unbiased.
This is because the average of all sample means of a given sample size n will be equal to the population mean. Let us verify this through an illustration. For the illustration, we take a trivial data set. Suppose we track the data regarding the number of errors made in typing the same document by four different typists. Let the typists be A, B, C and D. Let the number of errors made by the typists be 3, 2, 1 and 4 respectively. So we have four data points. Let's name them x1, x2, x3 and x4. So this is our population. Let's first find the population mean mu. We know that the population mean is the summation of x i divided by N. So, for our case, the population mean is 3 plus 2 plus 1 plus 4, divided by 4, which is 2.5. Next, let's find the population standard deviation. We know that the population standard deviation is the root mean square deviation, that is the root of the mean of the squared deviations from the mean. So we have computed our population mean and the population standard deviation. Now let's find the sample means. We will take n equal to 3 for the illustration. So I tabulate the samples, with the sample number, the typists considered in the sample, the outcomes and the sample mean we will determine. So we have all the combinations here. Now, for each of the outcomes, we can find the sample mean by taking the average of the three observations. So the sample means are given here. You can calculate them yourself to verify the same. Now we find the mean of the sample means, which is nothing but the average of the sample means that we have just computed. So we find that the mean of the sample means is nothing but 2.5, which is equal to the population mean. So this shows that the sample mean is an unbiased estimator of the population mean. Next, we compute the standard error of mean. But first, let's see the histogram of all the sample means we calculated. You see that it is nearly a normal distribution. That is, if the population is normal, the distribution of the sample means would also be normal.
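The illustration above can be reproduced with a short sketch. It is written in Python (an assumption, as the lecture works in Excel) and assumes that the samples are ordered and drawn with replacement, which gives 4 × 4 × 4 = 64 possible samples of size 3. The function name `sample_means` is hypothetical.

```python
from itertools import product
from statistics import fmean

# Errors made by typists A, B, C and D, as given in the lecture.
errors = {"A": 3, "B": 2, "C": 1, "D": 4}
population = list(errors.values())


def sample_means(values, n):
    """Mean of every ordered sample of size n drawn with replacement (assumed sampling scheme)."""
    return [fmean(sample) for sample in product(values, repeat=n)]


means = sample_means(population, n=3)

print("number of samples:", len(means))
print("population mean:", fmean(population))
print("mean of sample means:", round(fmean(means), 6))
```

The last two printed values should agree, which is the sense in which the sample mean is unbiased.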
The standard error of mean is nothing but the population standard deviation divided by the square root of n. From this equation, we gather that as the sample size increases, the standard error of mean reduces. Now, you would have noted that we made the samples with replacement. We can use the same equation for computing the standard error of mean if the sample is generated without replacement. This is possible provided the sample size is at the most 5% of the population. Now let's verify whether this equation holds good with the data that we have got. As there are a lot of computations involved, we will use Excel for this. So let's go to Excel. So here we are in Excel. I have tabulated the samples, as I had shown you earlier. The sample means are also tabulated on this sheet. So the plan is to compute the standard error of mean. On the right hand side, you can see the original data, that is the typists and the number of errors they had made. First, we compute the mean of the sample means. Now we can start computing the standard error of mean. For this, we compute the sample mean minus the mean of the sample means. So there we have it. Now we take the square of this value for further computation. So there we have it. Now, what we must do is take the sum of all these squares. So we have the sum of all these squares. We need the mean of the sum of these squares. So we take the sum of all these squares divided by the total number of values, that is 64. Now, we need to find the square root of this to get the standard error of mean. So there we have it. It is 0.65. Instead of doing all this calculation, we could have also used the Excel function STDEV.P. STDEV.P gives the standard deviation of the population. So if you give the range, you see that it produces the same value, 0.65. So now we know how to compute this value in the manual manner.
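The manual route just described, and the shortcut formula it is meant to confirm, can be compared side by side. This is a hedged sketch in Python rather than the lecture's Excel sheet, assuming ordered samples of size 3 drawn with replacement from the four typists' error counts; the variable names are illustrative only.

```python
import math
from itertools import product
from statistics import fmean, pstdev

population = [3, 2, 1, 4]   # errors made by the four typists
n = 3                       # sample size used in the lecture

# Every ordered sample of size n drawn with replacement, and its mean.
means = [fmean(sample) for sample in product(population, repeat=n)]
grand_mean = fmean(means)

# Manual route: root of the mean of squared deviations of the sample means.
squared_deviations = [(m - grand_mean) ** 2 for m in means]
se_manual = math.sqrt(sum(squared_deviations) / len(means))

# Shortcut: population standard deviation divided by the square root of n.
se_formula = pstdev(population) / math.sqrt(n)

print(f"manual:  {se_manual:.4f}")
print(f"formula: {se_formula:.4f}")
```

`pstdev` plays the role of Excel's STDEV.P here, since both treat the data as the whole population.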
But it could be very difficult to compute this if we have to consider a large sample, or data having a large number of data points in the population. So we will use the formula that we last discussed. That is, to find the standard error of mean, we can take the population standard deviation divided by the square root of n. You can see that we have found the population mean. Now, this population mean is the same as what we computed from the average of the sample means. Like we had mentioned before, this shows that the sample mean is an unbiased estimator of the population mean. Now let's find the population standard deviation. We can use the function STDEV.P. So we get the population standard deviation. We know that we are using a sample size of three. So we find the square root of that. So we get the square root of three. Now we can find the standard error of mean by dividing the population standard deviation by the square root of n. So you see that we get this value, which is the same as what we computed using the manual mechanism. So this shows that the formula is working. You can try the same for n equal to 2. Now that we saw how to calculate the standard error of mean, let's see what the distribution of the sample means is. As we had discussed earlier, if the data is normally distributed, the distribution of the sample means will also be normally distributed. Let's see if this is the case. Now, I have worked out the data for n equal to 1, 2 and 3. If you use the histogram of the Data Analysis ToolPak, you can get the distribution of the sample means. I hope you have worked out the case of n equal to 2 also, and then you can get the distribution of the sample means. If you look at the graph here, you can see that the sample means are increasingly normal. Also see that as n increases, the sample means are closer to the population mean. Thank you for listening.

32. Confidence Interval: Hello. Welcome to the class. My name is Partha. Without any further delay, let's get started. Consider a case where we have a very large number of observations.
For example, when the data is coming from a production line, it is very difficult to establish the population mean. So we need to go by a method of estimation regarding the population mean. We try to establish the population mean by knowing a confidence with which we can establish what its value would be. This is where the concept of confidence interval comes into the picture. While doing regression analysis, you would have noticed that we mark the confidence interval as 95%, 99%, etcetera. Now we will see what exactly this is. But before we discuss confidence intervals, we need to discuss a few topics. We have discussed the normal distribution in the previous lecture. Now we need to find out what the X value is, given that the expected probability is known to us. Also, we need to find out what the X values will be that include 95% or 99% or 90% of the given data. Once you know the concept, you will be able to do the same for any percentage of the given data. Earlier, given a particular value of X, we found the value of Z. Now we do the reverse. We find the value of X knowing the value of Z. Suppose the probability is 35%, that is 35% measured from the center, as we are using the zero to Z normal distribution chart. And suppose mu is 50 and sigma is 10. We find the Z. So for 35%, we find the Z on the table. From the table, we find that there are two values which are near 35%. We find that Z is between 1.03 and 1.04. So we can assume that Z is the midpoint of these two, that is 1.035. Now that we have found Z, we can find X. We know that Z is equal to (X minus mu) by sigma, which means X is equal to mu plus Z sigma. So we get X is equal to 60.35 in this particular case. Finding Z is far more simple if you are using Excel. Excel provides a function called NORM.S.INV, using which we can find the value of Z. We need to pass to it the cumulative probability, which here is 0.5 plus 0.35, that is 0.85, and it gives us the value of Z.
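The reverse lookup can also be sketched in Python (an assumption, since the lecture uses a table and Excel's NORM.S.INV). The function names `x_for_area_from_mean` and `central_range` are hypothetical. The first converts a "zero to Z" area into a cumulative probability before inverting; the second addresses the other goal stated above, finding the X values around the mean that include a given percentage of the data.

```python
from statistics import NormalDist


def x_for_area_from_mean(area, mu, sigma):
    """Return (X, Z) such that the area between the mean and X equals `area` (0 < area < 0.5)."""
    z = NormalDist().inv_cdf(0.5 + area)   # zero-to-Z area converted to cumulative probability
    return mu + z * sigma, z


def central_range(coverage, mu, sigma):
    """Lower and upper X values, symmetric around mu, that include `coverage` of the data."""
    tail = (1 - coverage) / 2              # e.g. 2.5% in each tail for 95% coverage
    z = NormalDist().inv_cdf(1 - tail)
    return mu - z * sigma, mu + z * sigma


x, z = x_for_area_from_mean(0.35, mu=50, sigma=10)
print(f"Z = {z:.3f}, X = {x:.2f}")

low, high = central_range(0.95, mu=50, sigma=10)
print(f"95% of the data lies between {low:.1f} and {high:.1f}")
```

The exact inverse gives a Z slightly different from the table midpoint, so X may differ from the table-based 60.35 in the second decimal place.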
Once we have found the value of Z, we can establish the value of X. I have attached the Excel file along with the lecture so you can play with it. Now that we know how to find the value of X for a given value of Z, let us see how to find the X values that include a certain percentage of the given data. We take 95% as an example. We need to find the lower and upper values of X which would include the percentage of the given data that we are looking at. Now, these lower and upper values should be symmetric around the mean. We denote the lower value of X as XL and the upper value of X as XU. For our illustration, we will consider mu equal to 50 and sigma equal to 10. When we are finding the range that includes 95% of the given values, we have to note that 2.5% of the values will be below XL and 2.5% of the values will be above XU. So this is a chart. This is a standardized normal distribution chart. We are considering a special case where mu is equal to 50 and sigma is equal to 10. Now we find that Z for XL is equal to minus 1.96 and Z for XU is equal to plus 1.96. So, having established the value of Z for XL and for XU, we can proceed to compute the values of XL and XU. So XL is equal to mu plus Z sigma, that is 50 plus (minus 1.96) into 10, which gives us 30.4. XU is equal to mu plus Z sigma, which is equal to 50 plus 1.96 into 10, which is equal to 69.6. This means that between X of 30.4 and 69.6, 95% of the given data is included. Similarly, you can find out what the X values will be which include 90% of the given data, or 99% of the given data. Now let's move forward in our discussion on confidence intervals. First, we need to find the Z for a sampling distribution of the mean. We will understand this through an example. Suppose we have a production line where we are making boxes of cereal. Suppose the machine is required to pack 368 grams of cereal in each box. As this is a production line, the amount of data will be large.
By the law of large numbers, we can say that it will be normally distributed. Let this be our assumption as well. We have a target that we want each box to have 368 grams, that is the population mean is 368 grams, with a standard deviation of 15. This is a production line, and so we must be packaging thousands of boxes every day. As we discussed earlier, under this circumstance, knowing the population mean is very difficult. The population mean is important to us because that is our target. We want to pack 368 grams per box. So we take samples at random to determine the sample mean and the sample standard deviation. Selecting a sample at random is not a big issue, and once the sample is established, we can find the sample mean and the sample standard deviation. Let's start with an example. Say we want to compute the probability that a sample of 25 boxes will have a mean below 365 grams. This will be an undesirable condition for the production line. Now, as we are finding the Z value for the sampling distribution of the mean, the Z formula becomes Z is equal to (X bar minus mu of X bar) divided by sigma of X bar. Here X bar is the sample mean, mu of X bar is the mean of the sample means, and sigma of X bar is the standard deviation of the sample means. From our earlier discussion on the standard error of mean, we know that the sample mean is unbiased, so we can take the mean of the sample means as equal to the population mean. Or we can say that mu is equal to mu of X bar. We also established that the standard deviation of the sample means is equal to the population standard deviation divided by the square root of n, where n is the sample size. So we get Z is equal to (X bar minus mu) divided by (sigma divided by the square root of n). Here mu is the population mean, sigma is the population standard deviation and n is the sample size. So now we have our formula for Z.
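The formula just derived can be sketched as a small function and applied to the question posed above: the probability that a sample of 25 boxes has a mean below 365 grams when mu is 368 and sigma is 15. This is a minimal Python sketch (Python is an assumption; the lecture uses Excel), and the function name `z_for_sample_mean` is hypothetical.

```python
import math
from statistics import NormalDist


def z_for_sample_mean(x_bar, mu, sigma, n):
    """Z = (x_bar - mu) / (sigma / sqrt(n)) for the sampling distribution of the mean."""
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu) / standard_error


z = z_for_sample_mean(365, mu=368, sigma=15, n=25)
p_below = NormalDist().cdf(z)   # cumulative probability to the left of Z

print(f"Z = {z:.2f}")
print(f"P(sample mean < 365) = {p_below:.4f}")
```

Note that the denominator is the standard error of mean, not the population standard deviation, which is why a shortfall of only 3 grams already corresponds to a full unit of Z for this sample size.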
So, for our example, Z is equal to (365 minus 368) divided by (15 divided by the square root of 25). 365 is the value of the sample mean below which we want to find the probability. This gives us Z is equal to minus 1. Now, using Excel or any other method, you can find that for Z equal to minus 1, the probability is 15.87%. So there is a 15.87% chance that a sample of 25 boxes will have a mean of less than 365 grams of cereal. Now let's start our discussion on the confidence interval, with which we want to establish an estimate for the population mean. We will take an illustration. We will find the 95% confidence interval. So first we need to find the range of the sample mean X bar, for the sample size of 25, within which 95% of the sample means should lie. Now, we know that we want a mu of 368 grams and a standard deviation of 15. The Z for XL is minus 1.96 and the Z for XU is plus 1.96. So we get XL is equal to 362.12 and XU is equal to 373.88. So if our sample mean lies within 362.12 and 373.88, we can be confident that our production process is doing okay. Let us mark out these values on the normal distribution curve. So we have a lower limit of 362.12 and an upper limit of 373.88, with the mean at 368.0. Now suppose we pick up a sample of 25 boxes and find that the mean is 362.3 grams. Then the estimate for mu will be between 356.42 and 368.18. We find that this interval includes the population mean of 368. So we should be confident that the production line is okay. Now, suppose we take another sample of 25 boxes, and here the sample mean is 369.5 grams. Then mu should be within 363.62 and 375.38. If you plot this, we find that the population mean of 368, which is our target, still lies within this range. Now let's consider the third case, when we take a sample of 25 boxes and the mean of the weight is 360 grams.
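The three sample checks in this part of the lecture (sample means of 362.3, 369.5 and 360 grams, each from 25 boxes with sigma 15) can be reproduced with one small function. This is a sketch in Python, which is an assumption since the lecture uses Excel; the function name `confidence_interval` and the constant `TARGET` are hypothetical, and the Z value is computed exactly rather than rounded to 1.96.

```python
import math
from statistics import NormalDist


def confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Interval x_bar +/- Z * sigma / sqrt(n), with sigma the known population standard deviation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


TARGET = 368   # target weight per box, in grams

for x_bar in (362.3, 369.5, 360.0):
    low, high = confidence_interval(x_bar, sigma=15, n=25)
    verdict = "includes" if low <= TARGET <= high else "does not include"
    print(f"sample mean {x_bar}: {low:.2f} to {high:.2f}, {verdict} the target")
```

Passing `confidence=0.99` or `confidence=0.90` to the same function gives the wider and narrower intervals for other confidence levels.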
Then our estimate for mu would be that mu is between 354.12 and 365.88. Now you notice that the population mean does not lie within this range. So we should conclude that our assumption is not working with this particular sample. Under these circumstances, we should check our production line for correctness. The confidence interval is symbolized by (1 minus alpha) into 100%. Here alpha is the proportion in the tails that is outside the confidence interval. For example, for a 95% confidence interval, 2.5% on either side is outside the confidence interval. So we get X bar minus Z into sigma by the square root of n is less than or equal to mu, which is less than or equal to X bar plus Z into sigma by the square root of n. So this is the confidence interval. Let's see an example. Suppose we find that X bar for a sample size of 25 boxes is 366.7. We already know that sigma is equal to 15. We want to find the 95% confidence interval. We use our formula. So we get XL and XU for the sample mean, so we know that 360.82 is less than or equal to mu, which is less than or equal to 372.58. We know that mu is within the range of 360.82 and 372.58, and this range includes 368, which is our target. So if we get a sample mean of 366.7, we should say that our production line is working properly. Next, let's compute the 99% confidence interval. For the 99% confidence interval, Z will be equal to minus 2.58 and plus 2.58, so we get mu between 358.96 and 374.44. I hope the calculations are clear to you by now. Lastly, we take an example to find the 90% confidence interval. Here, 10% will lie outside the confidence interval, that is 5% on either side. So we have to find Z for 5% and for 95%. You notice that Z is equal to minus 1.64 and plus 1.64. So we get mu between 361.78 and 371.62. Thank you for listening.

33. Next Steps: Hello and welcome.
My name is Partha Majumdar. We have covered a lot of ground in this course, Making Numerical Predictions for Time Series Data, Part 1 of 3. However, this is just the tip of the iceberg. Also, we used Excel in this particular course. Now, when we have very large data sets, we cannot use Excel, as there is a limitation on the number of rows that Excel can support. So when we have large data sets, we will have to use some programming language. We will deal with the R language and the Python language in this program. In part two, we will see how we can use the R language, and in part three, you will see how we can use Python. However, I hope this course has helped in understanding the concepts which we will be requiring later on. Also, in this course we have mainly covered two topics. The first topic we covered was moving averages. Next, we went into reasonable detail regarding linear regression. We will see both these topics in more detail in part two. Also, we will discover new areas, that is logistic regression, Deming regression, recurrent neural networks and many more, in part two and part three of this program. The most important aspect that we discussed in this course is that we need to experiment with various methods and techniques before we can settle on a method to use for making predictions. So let's explore new techniques and continue on our journey. See you in the course Making Numerical Predictions for Time Series Data, Part 2 of 3. Till then, goodbye. Happy learning. Happy predicting.

34. About Me (Optional): Hello and welcome. Let me take you on a short journey of myself. I am basically a programmer, a computer programmer. However, over the years of writing computer programs, I have come to view everything in life as programming. Whether it is playing a game of cricket or billiards, or whether it is running a department or a company, everything for me is programming. I like to look for patterns, and I normally find patterns.
Then I make those patterns repeatable, so that they can make a winning formula in whatever the situation is. So publicly I say I am just a programmer. My name is Partha Majumdar. I am a Bengali, and so we pronounce the last "a" as "o". So it's Partho Majumdar. I'm basically from Calcutta. However, I've been living in Bangalore, or Bengaluru, for the last 10 years. So I can say I'm from Bangalore. Recently, I've been spending a lot of time in Riyadh, so Riyadh has become a second home for me for the time being. I've been working in the computer software industry for close to 30 years now. I started from the time when we used to use eight-inch floppy disks. The Winchester disk of 3.5 inches was a major innovation at our time. Now we have progressed a lot, and we are in the age of iPhones and iPads. This journey has been very exciting, and I have thoroughly enjoyed the same. I still continue to enjoy it, as I'm learning every day. There is so much to learn, as there is so much development going on in the industry at all points of time. So this has been a very good industry for me, and I have been fortunate to be part of it. I haven't even realized how time has just flown by. In all these years, everything seems like it just happened yesterday. My programming journey began in 1989, when a cricket tournament was held in India called the Jawaharlal Nehru Centenary Cricket Cup. This tournament was played by six nations, and it was hosted across 19 locations in India. Now, this tournament was to be managed through computers, as was desired by the then Prime Minister, Mr Rajiv Gandhi. So we studied the computerization that had been done for the Seoul Olympics and, on the same lines, we tried to automate, or computerize, the tournament management for this tournament. For this job, I was contracted by the Centre for Development of Telematics, C-DOT, and worked on the very big names BU Slipper novel A trade on IBM Sarita Lights A total developed us off in this tournament.
You know, this was the first time an electronic scoreboard was used in India, and that scoreboard was installed in Eden Gardens in Calcutta. So my career had a good start, and subsequently I got the opportunity to work on very exciting projects. Up to 1996, I worked on MIS projects for a variety of different industries. Sometimes it was for the army. Sometimes it was for the operation of a hospital, or for tea gardens, etcetera. Now, this enabled me to learn a number of domains over these years. I got the opportunity to go abroad for the first time when I was sent to Thailand to develop a solution for a company called Computer Integrated Manufacturing Company. Thereafter, I worked on a Government of Thailand project, and then I got sent to Europe for the first time, to Belgium, where I was part of the Euro conversion project. In 1996, I joined Total Access Communication, Thailand, to develop an end-to-end telecom billing system. This was a huge project. The aim was to replace a very popular telecom billing system from Kingston. Now, this software took us three years to develop, and we had more than 125 people in the team. I used Informix, Tuxedo, Crystal Reports, etcetera, for the first time in this project. The product was also awarded as a best software product. Ever since 1996, I've been purely working in the telecom domain, or rather the telecom billing domain, and I have provided solutions to over 40 telcos around the world. Now, here is a list of a few telcos to whom I have given solutions, and this includes a very large number of solutions, including mobile money, etcetera. I also helped many telcos to start up, like Getco, for example. For the first time, I was part of a telecom operator when I joined Mobily in Saudi Arabia in 2019.
Though I am still associated with some telecom operators for giving solutions, I am mainly working on government projects in Saudi Arabia. So I've been working with Customs, MCIT or the Ministry of Communications and Information Technology, the Ministry of Planning, etcetera. And these are all big data solutions. I've been fortunate to work in the companies listed here. In each of these companies, I got the opportunity to develop a number of software products and be involved in very many projects, and I learned a lot during my stay in each of these companies. In 2014, I started my own company, Imaginary Consultancies, Private Limited. Now, this was driven by circumstances, personal needs, ambitions and a lot of luck. Since then, I have also partnered with others to form a sign Solutions. And now I have also partnered with a company in Saudi Arabia, tools and solutions, to work as a director in that company. My work has taken me to many countries in the world. Besides, I've got the opportunity to visit many countries for pleasure. Now, during all my travels, I have been able to work with people of very many countries, many more than I have visited, and each of them has been a very enriching learning experience. On the education front, I'm still studying. I've enrolled in a Master of Science in Artificial Intelligence. I recently completed a Postgraduate Diploma in Business Analytics. And I love MOOCs. I have over 300 certificates from MOOCs, and I continue to study from MOOCs. It helps me to refresh my knowledge and also enhance my knowledge. I highly recommend MOOCs for everyone. My current hobbies include share trading. I've been trading in shares since 1996, and now I've been applying AI. I love to look for patterns. I've been successful in forming patterns which are giving results of about 77% accuracy. I love blogging, and I love traveling. I have an ambition to travel to almost all the countries in the world.
So for me it's a formal off aren't spent and share experience on again through the experiences shape. Thank you for listening.