Transcripts
1. Introduction: What does this course cover?: Hi, and welcome to a course on statistics for business analytics and data science. My name is Karima, and I'm a business intelligence analyst who has spent a significant amount of time working on business intelligence topics such as statistics. In this course, we'll learn what the normal distribution is and why it is important in statistics. We'll learn about the central limit theorem. We'll also learn about the Student's t distribution, and finally we'll learn how to create, use, and interpret confidence intervals. These are some of the indispensable tools you need when making business decisions that rely on data. You must be able to make predictions under uncertainty, and that's precisely what you will be able to do after completing this course. So what are you waiting for? Let's begin this journey together. See you there.
2. Distribution: Inferential statistics refers to the methods that rely on probability theory, and distributions in particular, to predict population values based on sample data. This definition may not be completely clear just yet, so in this lesson we'll define what a distribution is. I will go through a couple of distributions you will likely use at work. This will naturally lead us to point estimates. I will conclude this course with confidence intervals. We will not only present these topics but will also develop a deeper understanding of the statistical processes. This will help you a great deal if you decide to get into data science. I recommend that you complete all the exercises provided, as they will reveal additional subjects you will be in charge of researching. This insight will likely be your gateway to the fundamentals of quantitative research and data-driven decision making. So let's begin. Before we can talk about any of this, we have to learn what a distribution is. In statistics, when we use the term distribution, we usually mean a probability distribution. Good examples are the normal distribution, the binomial distribution, and the uniform distribution. All right, let's start with the definition. A distribution is a function that shows the possible values of a variable and how often they occur. Think about a die. It has six sides, numbered from 1 to 6. We roll the die. What is the probability of getting a one? It is one out of six, so one sixth, right? Easy. What is the probability of getting a two? Once again, one sixth. The same holds for 3, 4, 5, and 6. We have an equal chance of getting each of the six outcomes. Now what is the probability of getting a seven? It is impossible to get a seven when rolling a single die. Therefore, the probability is zero. Okay, let's generalize. The distribution of an event consists not only of the input values that can be observed, but is made up of all possible values.
So the distribution of the event "rolling a die" will be given by the following table. The probability of getting a one is 1/6, or 0.17, the probability of getting a two is 0.17, and so on. We are sure that we have exhausted all possible values when the sum of the probabilities is equal to one, or 100%. Similar to what we discussed about getting a seven, for all other values the probability of occurrence is zero. And that's the probability distribution of rolling a die. By the way, it is called a discrete uniform distribution. All outcomes have an equal chance of occurring. Okay, each probability distribution has a visual representation. It is a graph describing the likelihood of occurrence of every event. Here is the graph for our example. It is crucial to understand that the graph is just a visual representation of a distribution. Often when we talk about distributions, we make use of the graph. That's why many people believe that a distribution is the graph itself. However, that's not true. A distribution is defined by the underlying probabilities and not by the graph. The graph is just a visual representation. All right. After this short clarification, let's explore a different example. Think about rolling two dice. What are the possible outcomes? One and one, two and one, one and two, and so on. Here is a table with all possible combinations. Let's play a game where we try to guess the sum of the two dice. What's the probability of getting a sum of one? It is zero, as this event is impossible. The minimum sum we can get is two. So what's the probability of getting a sum of two? There is only one combination that would give us a sum of two: when both dice are equal to one. So it is one out of 36 total outcomes, or 0.03. Similarly, the probability of getting a sum of three is given by the number of combinations that give a sum of three, divided by 36. Therefore, it is two divided by 36, or 0.06. We can continue in this way until we have the full probability distribution. Let's see the graph associated with it.
Looking at it, we can easily understand that when rolling two dice, the probability of getting a seven is the highest. Moreover, we can also compare different outcomes, such as the probability of getting a 10 and the probability of getting a five. It is evident that we are less likely to get a 10. The examples that we saw here were of discrete variables. In the next lesson we'll focus on continuous distributions, as they are more common in inferences. See you there.
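As a small illustration of the two-dice example from this lesson, the sketch below enumerates all 36 combinations and builds the probability distribution of the sum. The variable names are illustrative choices and not part of the course material.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many combinations produce each sum, then divide by 36.
counts = Counter(a + b for a, b in outcomes)
distribution = {
    total: Fraction(n, len(outcomes)) for total, n in sorted(counts.items())
}

for total, prob in distribution.items():
    print(f"P(sum = {total:2d}) = {str(prob):>5s} (about {float(prob):.2f})")

# Sums that cannot occur, such as 1, are absent from the table,
# which is the same as saying their probability is 0.
prob_of_one = distribution.get(1, Fraction(0))

# The probabilities of all possible values add up to exactly 1.
total_probability = sum(distribution.values())
```

The table is the distribution; a bar chart of these values would only be its visual representation.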
3. Normal Distribution: Okay, here we go. So far, we've learned that the distribution of a dataset shows us the frequency at which possible values occur within an interval. We also said that there are dozens of distributions. Experienced statisticians can immediately distinguish a binomial from a Poisson distribution, as well as a uniform from an exponential distribution, in a quick glimpse at a plot. In this course we will rather focus on the normal and Student's t distributions, for the following reasons. They approximate a wide variety of random variables. Distributions of sample means with large enough sample sizes can be approximated by the normal. All computable statistics are elegant. Decisions based on normal distribution insights have a good track record. If this sounds too general or technical, don't worry, I assure you things will be much easier once we get started. Here is a visual representation of the normal distribution. You have surely seen a normal distribution before, as it is the most common one. The statistical term for it is the Gaussian distribution, but many people call it the bell curve, as it is shaped like a bell. It is symmetrical, and its mean, median, and mode are equal. If you remember skewness, you will recognize that it has no skew. It is perfectly centered around its mean. All right. It is denoted in this way, X ~ N(μ, σ²): N stands for normal, the tilde sign shows that it is a distribution, and in brackets we have the mean and the variance of the distribution. On the plane, you can notice that the highest point is located at the mean, because it coincides with the mode. The spread of the graph is determined by the standard deviation. Now let's try to understand the normal distribution a little bit better. Let's look at this approximately normally distributed histogram. There is a concentration of the observations around the mean, which makes sense as it is equal to the mode. Moreover, it is symmetrical on both sides of the mean.
We used 80 observations to create this histogram. Its mean is 743 and its standard deviation is 140. Okay, great. But what if the mean is smaller or bigger? Let's zoom out a bit by adding the origin of the graph. The origin is the zero point. Adding it to any graph gives perspective. Keeping the standard deviation fixed, or in statistical jargon, controlling for the standard deviation, a lower mean would result in the same shape of the distribution, but on the left side of the plane. In the same way, a bigger mean would move the graph to the right. In our example, this results in two new distributions: one with a mean of 470 and a standard deviation of 140, and one with a mean of 960 and a standard deviation of 140. All right, let's do the opposite. Controlling for the mean, we can change the standard deviation and see what happens. This time the graph is not moving, but is rather reshaping. A lower standard deviation results in a lower dispersion, so more data in the middle and thinner tails. On the other hand, a higher standard deviation will cause the graph to flatten out, with fewer points in the middle and more towards the ends, or in statistics jargon, fatter tails. Great. These are the basics of the normal distribution. In our next lesson, we'll use this knowledge to talk about standardisation. Stay tuned.
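The effect of changing the mean versus changing the standard deviation can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the means 743, 470, and 960 and the standard deviation 140 come from the lesson, while the standard deviations 70 and 280 are assumptions added purely for illustration.

```python
from statistics import NormalDist

# Figures quoted in the lesson: mean 743, standard deviation 140.
base = NormalDist(mu=743, sigma=140)

# Controlling for the standard deviation: only the location changes.
left = NormalDist(mu=470, sigma=140)
right = NormalDist(mu=960, sigma=140)

# Controlling for the mean: only the shape changes.
# 70 and 280 are illustrative assumptions, not values from the lesson.
narrow = NormalDist(mu=743, sigma=70)
wide = NormalDist(mu=743, sigma=280)

curves = {"base": base, "left": left, "right": right, "narrow": narrow, "wide": wide}

for name, dist in curves.items():
    peak = dist.pdf(dist.mean)             # height of the curve at its mean
    tail = 1 - dist.cdf(dist.mean + 300)   # chance of landing 300 or more above the mean
    print(f"{name:6s} mean={dist.mean:5.0f} sd={dist.stdev:4.0f} "
          f"peak={peak:.5f} P(X > mean + 300)={tail:.4f}")
```

Shifting the mean leaves the peak height and tail probability untouched, while a larger standard deviation lowers the peak and fattens the tails.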
4. Standardisation: Let's talk about standardisation. Every distribution can be standardised. Say the mean and variance of a variable are mu and sigma squared, respectively. Standardisation is the process of transforming this variable into one with a mean of zero and a standard deviation of one. This simple formula allows us to do that: z = (x − μ) / σ. Logically, a normal distribution can also be standardised. The result is called a standard normal distribution. In the last section, we explored shifts in the mean and standard deviation. So if we shift the mean by mu and the standard deviation by sigma for any normal distribution, we will arrive at the standard normal distribution. Great. We use the letter Z to denote it. As we said previously, its mean is zero and its standard deviation is one. The standardised variable is called a z-score, and it is equal to the original variable minus its mean, divided by its standard deviation. Let's see an example that will help us get a better grasp of the concept. We'll take an approximately normally distributed set of numbers: 1, 2, 2, 3, 3, 3, 4, 4, and 5. Its mean is 3 and its standard deviation is 1.22. Now let's subtract the mean from all data points. We get a new dataset: minus 2, minus 1, minus 1, 0, 0, 0, 1, 1, and 2. Let's calculate its new mean. It is zero, exactly as we anticipated. Showing that on a graph, we have shifted the curve to the left while preserving its shape. Clear? Okay, so far we have a new distribution, which is still normal, but with a mean of zero and a standard deviation of 1.22. The next step of the standardisation is to divide all data points by the standard deviation. This will drive the standard deviation of the new dataset to one. Let's go back to our example. Both the original dataset and the one we obtained after subtracting the mean from each data point have a standard deviation of 1.22. Remember, adding or subtracting a value to all data points does not change the standard deviation. Now let's divide each data point by 1.22.
We get minus 1.63, minus 0.82, minus 0.82, 0, 0, 0, 0.82, 0.82, and 1.63. If we calculate the standard deviation of this new dataset, we'll get one, and the mean is still zero. In terms of the curve, we kept it at the same position, but reshaped it a bit. Great. This is how we can obtain a standard normal distribution from any normally distributed dataset. Using it makes predictions and inferences much easier, and this will help us a great deal in what we'll see next. Thanks for watching.
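The two standardisation steps from this lesson can be reproduced in a few lines. This is a minimal sketch using the lesson's nine data points; it assumes the quoted 1.22 is the sample standard deviation (dividing by n − 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

mu = mean(data)       # 3
sigma = stdev(data)   # sample standard deviation, about 1.22

# Step 1: subtract the mean. This shifts the data but leaves the spread unchanged.
centered = [x - mu for x in data]

# Step 2: divide by the standard deviation. This rescales the spread to 1.
z_scores = [c / sigma for c in centered]

print("centered:", centered)
print("z-scores:", [round(z, 2) for z in z_scores])
print("mean of z-scores:", round(mean(z_scores), 10))
print("standard deviation of z-scores:", round(stdev(z_scores), 10))
```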
5. Central Limit Theorem: Say you have a population of used cars in a car shop. We want to analyze the car prices and be able to make some predictions about them. Population parameters that may be of interest are the mean car price, the standard deviation of prices, covariance, and so on. Normally in statistics we will not have data on the whole population, but rather just a sample. Let's draw a sample out of that data. The mean is $2,617.33. Now a problem arises from the fact that if I take another sample, I might get a completely different mean: $3,201.34. Then a third, with a mean of $2,844.33. As you can see, the sample mean depends on the incumbents of the sample itself. So taking a single value is definitely suboptimal. What we can do is draw many, many samples and create a new dataset comprised of sample means. These values are distributed in some way, so they have a distribution. When we refer to a distribution formed by samples, we use the term sampling distribution. In our case, we can be even more precise: we are dealing with a sampling distribution of the mean. So far, so good. Now, if we inspect these values closely, we will realize they are different but are concentrated around a certain value, right? In our case, somewhere around $2,800. Since each of these sample means is nothing but an approximation of the population mean, the value they revolve around is actually the population mean itself. Most probably none of them is the population mean, but taken together, they give a really good idea. In fact, if we take the average of those sample means, we expect to get a very precise approximation of the population mean. Nice. Let me give you some more information. Here is a plot of the distribution of the car prices. We haven't seen many distributions, but we know that this is not a normal distribution. It has a right skew, and that's about all we can see. Here comes the big revelation.
It turns out that if we visualize the distribution of the sample means, we get something else, something familiar, something useful: a normal distribution. That's what the central limit theorem states. No matter the distribution of the population, binomial, uniform, exponential, or another one, the sampling distribution of the mean will approximate a normal distribution. Not only that, but its mean is the same as the population mean. That's something we already noticed. What about the variance? Well, it depends on the size of the samples we draw, but it is quite elegant. It is the population variance divided by the sample size. Since the sample size is in the denominator, the bigger the sample size, the lower the variance, or in other words, the closer the approximation we get. So if you are able to draw bigger samples, your statistical results will be more accurate. Usually, for the CLT to apply, we need a sample size of at least 30 observations. Excellent. Finally, let's finish off with why the central limit theorem is so important. As we already know, the normal distribution has elegant statistics and an unmatched applicability in calculating confidence intervals and performing tests. The central limit theorem allows us to perform tests, solve problems, and make inferences using the normal distribution, even when the population is not normally distributed. The discovery and proof of the theorem revolutionized statistics as a field, and we will be relying on it a lot in the subsequent lessons. That's all for now. Thanks for watching.
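The claims of the central limit theorem can be checked with a short simulation. The sketch below is only an illustration: instead of the car prices from the lesson, it assumes a hypothetical right-skewed population (an exponential distribution with mean 1 and variance 1), and the seed and number of samples are arbitrary choices.

```python
import random
from statistics import mean, pvariance

rng = random.Random(42)   # fixed seed so the run is repeatable

# Hypothetical right-skewed population: exponential with mean 1 and variance 1.
POP_MEAN, POP_VAR = 1.0, 1.0
SAMPLE_SIZE = 30          # the lesson's rule-of-thumb minimum
NUM_SAMPLES = 5000        # how many samples we draw


def draw_sample_mean() -> float:
    """Draw one sample of SAMPLE_SIZE values and return its mean."""
    return mean([rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)])


# The sampling distribution of the mean, approximated by many sample means.
sample_means = [draw_sample_mean() for _ in range(NUM_SAMPLES)]

mean_of_means = mean(sample_means)
var_of_means = pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.4f} "
      f"(population mean: {POP_MEAN})")
print(f"variance of sample means: {var_of_means:.4f} "
      f"(population variance / n: {POP_VAR / SAMPLE_SIZE:.4f})")
```

Plotting a histogram of `sample_means` would show the familiar bell shape even though the underlying population is heavily skewed.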
6. Standard Error: In the previous lesson, we showed that no matter the underlying distribution of the dataset, the sampling distribution of the mean would be approximately normal, with a mean equal to the original mean and a variance equal to the original variance divided by the sample size. All right, this lesson will be very short and has the sole purpose of defining what the standard error is. The standard error is the standard deviation of the distribution formed by the sample means, in other words, the standard deviation of the sampling distribution. So how do we find it? We know its variance: sigma squared divided by n. Therefore, the standard deviation is sigma divided by the square root of n. Like a standard deviation, the standard error shows variability. In this case, it is the variability of the means of the different samples we extracted. You can guess that since this term has its own name, it is widely used and very important. Why is it important? Well, it is used for almost all statistical tests, because it shows how well you approximated the true mean. More on that in the next lesson. Note that it decreases as the sample size increases. This makes sense, as bigger samples give a better approximation of the population. That's all for now. Thanks for watching.
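The formula from this lesson, sigma divided by the square root of n, is easy to try out. In the sketch below the function name and the population standard deviation of 10 are hypothetical choices made for illustration.

```python
from math import sqrt


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / sqrt(n)


# Hypothetical population standard deviation, used only to show the trend.
SIGMA = 10.0

for n in (10, 30, 100, 1000):
    print(f"n = {n:4d}  standard error = {standard_error(SIGMA, n):.3f}")
```

Each time the sample size grows, the standard error shrinks, but only with the square root of n: a sample 100 times larger cuts the standard error by a factor of 10.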
7. Estimators and Estimates: Okay, great. Let's continue by introducing the concept of an estimator of a population parameter. It is an approximation depending solely on sample information. A specific value is called an estimate. There are two types of estimates: point estimates and confidence interval estimates. The point estimate is a single number, while the confidence interval, naturally, is an interval. The two are closely related. In fact, the point estimate is located exactly in the middle of the confidence interval. However, confidence intervals provide much more information and are preferred when making inferences. Don't worry, we'll have a separate lesson dedicated to confidence intervals. All right, have we seen estimates so far? Sure we have. The sample mean x-bar is a point estimate of the population mean mu. Moreover, the sample variance s squared is an estimate of the population variance sigma squared. There may be many estimators for the same variable. However, they all have two properties: bias and efficiency. We will not prove them, as the mathematics associated with them is really out of the scope of this course. However, you should have an idea about the concepts. Estimators are like judges: we are always looking for the most efficient, unbiased ones. An unbiased estimator has an expected value equal to the population parameter. Let's think of a biased estimator to explain that point. What if somebody told you that you will find the average height of Americans by taking a sample, finding its mean, and then adding one foot to that result? So the formula is x-bar plus one foot. Well, I hope you won't trust them. They gave you an estimator, but a biased one. It makes much more sense that the average height of Americans is approximated just by the sample mean, right? We say that the bias of this estimator is one foot. Clear? Okay, great. Let's move on to efficiency.
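The idea of bias can be made concrete with a simulation. The sketch below compares the sample mean with the "sample mean plus one foot" estimator from the height example; the population mean of 5.6 feet, the standard deviation of 0.3 feet, and the sample counts are all assumptions invented for illustration.

```python
import random
from statistics import mean

rng = random.Random(7)   # fixed seed so the run is repeatable

# Hypothetical population of heights in feet (assumed values).
TRUE_MEAN, TRUE_SD = 5.6, 0.3
SAMPLE_SIZE, NUM_SAMPLES = 50, 2000


def sample_mean() -> float:
    """Draw one sample of heights and return its mean (x-bar)."""
    return mean([rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)])


x_bars = [sample_mean() for _ in range(NUM_SAMPLES)]

unbiased_estimates = x_bars                     # estimator: x-bar
biased_estimates = [x + 1.0 for x in x_bars]    # estimator: x-bar + 1 foot

# Bias = expected value of the estimator minus the true parameter,
# approximated here by averaging the estimates over many samples.
bias_unbiased = mean(unbiased_estimates) - TRUE_MEAN
bias_biased = mean(biased_estimates) - TRUE_MEAN

print(f"bias of x-bar:          {bias_unbiased:+.4f} ft")
print(f"bias of x-bar + 1 foot: {bias_biased:+.4f} ft")
```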
The most efficient estimators are the ones with the least variability of outcomes. Among the estimators we know so far, we haven't seen any with problematic variance, so it's hard to exemplify. It's enough to know that the most efficient estimator is the unbiased estimator with the smallest variance. A final note worth making is about the difference between estimators and statistics. The word statistic is the broader term. A point estimate is a statistic. All right, this is how we can describe estimators and point estimates. Okay, so you've learned about point estimators, right? But as you can guess, they're not very reliable. Imagine visiting 5% of the restaurants in London and saying that the average meal is worth £22.50. You may be close, but chances are that the true value isn't exactly £22.50, but somewhere around it. It's much safer to say that the average meal in London is somewhere between £20 and £25, isn't it? In this way, you have created a confidence interval around your point estimate of £22.50. The confidence interval is a much more accurate representation of reality. However, there is still some uncertainty left, which we measure in levels of confidence. So getting back to our example, you may say that you are 95% confident that the population parameter lies between 20 and 25 quid. Keep in mind that you can never be 100% confident unless you go through the entire population. And there is, of course, a 5% chance that the actual population parameter is outside of the £20 to £25 range. We will observe that if the sample we have considered deviates significantly from the entire population. All right, there is one more ingredient needed: the level of confidence. It is denoted by one minus alpha and is called the confidence level of the interval. Alpha is a value between zero and one. For example, if we want to be 95% confident that the parameter is inside the interval, alpha is 5%.
If we want a higher confidence level of, say, 99%, alpha would be 1%. Don't worry, we'll discuss this in more detail in our next lesson. Can't wait until the next lesson? Okay, here is the formula for all confidence intervals: the interval runs from the point estimate minus the reliability factor multiplied by the standard error, to the point estimate plus the reliability factor multiplied by the standard error. We know what a point estimate is: a value like x-bar, right? We also know what the standard error is. What about the reliability factor? We'll have to introduce it in our next lesson. Thanks for watching.
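The general confidence interval formula from this lesson translates directly into a small function. This is only a sketch: the function name is an illustrative choice, and the reliability factor of 2.0 and standard error of 1.25 are hypothetical inputs picked so that the result reproduces the £20 to £25 range from the restaurant example.

```python
def confidence_interval(
    point_estimate: float, reliability_factor: float, standard_error: float
) -> tuple[float, float]:
    """Return (lower, upper) = point estimate -/+ reliability factor * standard error."""
    margin = reliability_factor * standard_error
    return point_estimate - margin, point_estimate + margin


# Hypothetical inputs: only the point estimate of 22.50 comes from the lesson.
low, high = confidence_interval(22.50, 2.0, 1.25)
print(f"interval: {low:.2f} to {high:.2f}")

# The point estimate sits exactly in the middle of the interval.
midpoint = (low + high) / 2
```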
8. Confidence Intervals: A confidence interval is a range within which you expect the population parameter to be, and its estimation is based on the data we have in our sample. There are two main situations in which we calculate confidence intervals for a population: when the population variance is known and when it is unknown. Depending on which situation we are in, we would use a different calculation method. Now, the whole field of statistics exists because we almost never have population data. Even if we do have population data, we may not be able to analyze it. It may be so much that it doesn't make sense to use it all at once. In this lesson we'll explore the confidence interval for a population mean with a known variance. An important assumption in this calculation is that the population is normally distributed. Even if it is not, you should use a large sample and let the central limit theorem do the normalization magic for you. Remember, if you work with a sample that is large enough, you can assume normality of the sample means. All right, let's say you want to become a data scientist and you are interested in the salary you are going to get. Imagine you have certain information that the population standard deviation of data science salaries is equal to $15,000. Furthermore, you know the salaries are normally distributed, and your sample consists of 30 salaries. The formula for the confidence interval with a known variance is as follows. The population mean will fall between the sample mean minus z of alpha divided by two, multiplied by the standard error, and the sample mean plus z of alpha divided by two, multiplied by the standard error. The sample mean is the point estimate. You know about the standard error already. So let's compute the interval using the formula. What we have left is the so-called reliability factor, z of alpha divided by two. Z is the statistic we described earlier, the standardised variable that has a standard normal distribution, right?
What about alpha? This is the same alpha we saw when we defined the confidence level. So for a confidence level of 95%, alpha will be equal to 5%. Similarly, for a confidence level of 99%, alpha will be equal to 1%. It all fits into place now, doesn't it? Let's go back to our example. The sample mean is $100,200. The standard deviation is known to be $15,000. Thus the standard error is $15,000 divided by the square root of 30, or $2,739. Having calculated these values, we can take the next step and choose our confidence level. Common confidence levels are 90%, 95%, and 99%, with respective alphas of 10%, 5%, and 1%, or in other words, values of alpha of 0.1, 0.05, and 0.01, respectively. Keep in mind that a 95% confidence level means you are sure that in 95% of the cases, the true population parameter would fall into the specified interval. Okay. The z of alpha comes from the so-called standard normal distribution table. It is best to see it and then comment on it. Let's say we want to find the values for a 95% confidence interval. Alpha is 0.05. Therefore, we are looking for z of alpha divided by two, or 0.025. In the table, this will match the value of one minus 0.025, or 0.975. The corresponding z comes from the sum of the row and column headers associated with this cell. In our case, the value is 1.9 plus 0.06, or 1.96. A commonly used term for the z is critical value. So we have found the critical value for this confidence interval. Now we can easily substitute into the formula. The final confidence interval becomes $94,833 to $105,568. The interpretation is the following: we are 95% confident that the average data scientist salary will be in the interval from $94,833 to $105,568. Let's repeat the exercise using a higher confidence level. Say we want to be 99% certain of the outcome. Alpha is 0.01. We look in the table for the value of one minus 0.005, which is equal to 0.995. There is no such value. When this happens, we just have to round up to the nearest value available.
The corresponding critical value is 2.5 plus 0.08. Thus we have 2.58. We plug it into our formula, and the new confidence interval is $93,135 to $107,266. This means that we are 99% confident that the average data scientist salary is going to lie in the interval between $93,135 and $107,266. Please note that in this case there is a trade-off between the level of confidence we choose and the estimation precision. The interval we obtained is broader. The opposite is also true: a narrower confidence interval translates to higher uncertainty. Makes sense, right? If we are trying to estimate the population mean and we are picking a larger interval, we are increasing our chances of having an interval that actually includes the mean, and vice versa. If you want to be more specific about the range of the population mean, you have to give up some of your confidence in the statement. Okay. This lesson was a bit longer, but very insightful. Please make sure you practice with the exercises provided. They will help you reinforce your knowledge of the concept, which is fundamental for anyone who wants to work with numbers in their day-to-day job. Thanks for watching.
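The salary example can be recomputed with the standard library. The sketch below uses the lesson's figures (sample mean $100,200, population standard deviation $15,000, 30 salaries); the function name is an illustrative choice, and rounding the critical value to two decimals is an assumption made to mimic reading it off a printed z table, so the bounds may differ from the quoted ones by a dollar depending on rounding.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean: float, sigma: float, n: int, confidence: float):
    """Confidence interval for a mean with known population standard deviation."""
    alpha = 1 - confidence
    # Critical value z of alpha/2, rounded to two decimals like a printed z table.
    z = round(NormalDist().inv_cdf(1 - alpha / 2), 2)
    standard_error = sigma / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin, z


low95, high95, z95 = z_interval(100_200, 15_000, 30, 0.95)
low99, high99, z99 = z_interval(100_200, 15_000, 30, 0.99)

print(f"95%: z = {z95}, interval = {low95:,.1f} to {high95:,.1f}")
print(f"99%: z = {z99}, interval = {low99:,.1f} to {high99:,.1f}")
```

The 99% interval is wider than the 95% one, which is the trade-off between confidence and precision described above.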
9. Confidence Intervals Clarification and Student's T Distribution: Let's take a step back and try to understand confidence intervals a bit better. Here is the graph of a normal distribution. You know that the sample mean is in the middle of the graph. Now, if we know that a variable is normally distributed, we are basically making a statement that the majority of observations will be around the mean and the rest far away from it. Let's draw a confidence interval. There is a lower limit and an upper limit. A 95% confidence interval would imply that we are 95% confident that the true population mean falls within this interval. There is a 2.5% chance that it will be to the left of the lower limit and a 2.5% chance that it will be to the right of the upper limit. Overall, there is a 5% chance that our confidence interval does not contain the true population mean. So when alpha is 0.05, or 5%, we have alpha divided by two, or a 2.5% chance that the true population mean is to the left of the interval, and a 2.5% chance that it is to the right. Okay, great. Using the z-score in the formula, we are implicitly starting from a standard normal distribution. Therefore, the mean is zero. The lower limit is minus z, while the upper one is z. For a 95% confidence interval, using the z table, we can find that these limits are minus 1.96 and 1.96. That's exactly what we did in the previous lesson. Finally, the formula makes sure that we get back to the original range of values, so we get the interval for our particular dataset. Okay, what if we're looking at a 90% confidence interval? In that case, the interval looks like this, and there is a 10% chance that the true mean is outside the interval, actually 5% on each side. This causes the confidence interval to shrink. So when our confidence is lower, the confidence interval itself is smaller. Similarly, for a 99% confidence interval, we would have a higher confidence but a much larger confidence interval.
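The link between the confidence level, alpha, and the limits on the standard normal curve can be verified numerically. This is a minimal sketch using `statistics.NormalDist`; the dictionary name `limits` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

limits = {}
for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    # Upper limit z leaves alpha/2 in the right tail; the lower limit is -z.
    z = standard_normal.inv_cdf(1 - alpha / 2)
    limits[confidence] = z

    # Area under the curve between -z and z, plus the area in each tail.
    covered = standard_normal.cdf(z) - standard_normal.cdf(-z)
    each_tail = standard_normal.cdf(-z)
    print(f"{confidence:.0%} confidence: limits -{z:.3f} and +{z:.3f}, "
          f"covered = {covered:.3f}, each tail = {each_tail:.3f}")
```

Lower confidence gives limits closer to zero (a narrower interval), and higher confidence pushes them further out.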
Let's see an example just to make sure we have solidified this knowledge. I don't know the age of you, the students, but I am 95% confident that you are between 18 and 55 years old. This comes only from the fact that you are taking an online statistics course. There is no more information to begin with, because I don't have any information about the age of any of the students. It's quite a wide interval. Okay, I am 95% confident that you are between 18 and 55 years old. Also, I'm 99% confident that you are between 10 and 70 years old. I am 100% confident that you are between zero and 118 years old, which is the age of the oldest person alive at the time of recording. Finally, I'm 5% confident that you are 25 years old. Obviously, this is a completely arbitrary number. As you can see, there is a trade-off between the level of confidence and the range of the interval. 100% confidence means that our interval is completely useless, as we must include all possible ages in order to gain 100% confidence. 99% confidence gives me a much narrower range, but it is still not insightful enough for this particular problem. 25 years old, on the other hand, is a pretty useful estimate, as we have an exact number, but the level of confidence of 5% is too small for us to make use of it in any meaningful analysis. There is always a trade-off, which depends on the problem at hand. 95% is the accepted norm, as we don't compromise on accuracy too much but still get a relatively narrow interval. It's time for a short break from all these numbers and calculations. I would like to tell you a story. William Gosset was an English statistician who worked for the brewery of Guinness. He developed different methods for the selection of the best-yielding varieties of barley, an important ingredient in making beer. Gosset found big samples tedious, so he was trying to develop a way to extract small samples but still come up with meaningful predictions.
He was a curious and productive researcher and published a number of papers that are still relevant today. However, due to the Guinness company policy, he was not allowed to sign the papers with his own name. Therefore, all of his work was published under the pen name Student. Later on, a friend of his and a famous statistician, Ronald Fisher, stepping on the findings of Gosset, introduced the t-statistic, and the name that has stuck with the corresponding distribution even today is Student's t. The Student's t distribution is one of the biggest breakthroughs in statistics, as it allowed inference through small samples with an unknown population variance. This setting can be applied to a big part of the statistical problems we face today, and it is an important part of this course. All right. Visually, the Student's t distribution looks much like a normal distribution, but generally has fatter tails. Fatter tails, as you may remember, allow for a higher dispersion of variables, and there is more uncertainty. In the same way that the z-statistic is related to the standard normal distribution, the t-statistic is related to the Student's t distribution. The formula that allows us to calculate it is: t with n minus one degrees of freedom and a significance level of alpha equals the sample mean minus the population mean, divided by the standard error of the sample. As you can see, it is very similar to the z-statistic. After all, it is an approximation of the normal distribution. The last characteristic of the Student's t statistic is that it has degrees of freedom. Usually, for a sample of n, we have n minus one degrees of freedom. So for a sample of 20 observations, the degrees of freedom are 19. Much like the standard normal distribution table, we also have a Student's t table. The rows indicate different degrees of freedom, abbreviated as d.f., while the columns indicate common alphas. Please note that after the 30th row, the numbers don't vary that much.
Actually, after 30 degrees of freedom, the t-statistic table becomes almost the same as the z-statistic table. As the degrees of freedom depend on the sample, in essence, the bigger the sample, the closer we get to the actual numbers. A common rule of thumb is that for a sample containing more than 50 observations, we use the z table instead of the t table. All right, in our next lesson we'll apply our knowledge in practice. Thanks for watching.
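The t-statistic formula described in this lesson can be sketched as follows. The sample reuses the nine numbers from the standardisation lesson and the hypothesised population mean of 2.5 is an assumption made up for illustration. The small lookup table holds the two-sided 95% critical values as they appear in standard printed t tables, included only to show how they approach the z value of 1.96 as the degrees of freedom grow.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample: list[float], population_mean: float) -> tuple[float, int]:
    """Return (t, degrees of freedom) for a sample and a hypothesised population mean."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # sample standard deviation / sqrt(n)
    t = (mean(sample) - population_mean) / standard_error
    return t, n - 1


# Hypothetical example: the data and the population mean of 2.5 are illustrative.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]
t, df = t_statistic(sample, population_mean=2.5)
print(f"t = {t:.4f} with {df} degrees of freedom")

# Two-sided 95% critical values (alpha / 2 = 0.025) from a standard t table.
T_CRITICAL_95 = {5: 2.571, 10: 2.228, 19: 2.093, 30: 2.042, 60: 2.000}
Z_CRITICAL_95 = 1.96

for d, critical in T_CRITICAL_95.items():
    gap = critical - Z_CRITICAL_95
    print(f"d.f. = {d:2d}: t critical = {critical:.3f} (z + {gap:.3f})")
```

With few degrees of freedom the t critical value is noticeably larger than 1.96, which reflects the fatter tails; by 30 degrees of freedom the gap is already under 0.1.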