Basic Statistics for Engineers | Dana Knight | Skillshare


Basic Statistics for Engineers

Dana Knight



Lessons in This Class

30 Lessons (1h 30m)
    • 1. Lecture #1

      1:36
    • 2. Lecture #2

      3:24
    • 3. Lecture #3

      1:09
    • 4. Lecture #4

      4:28
    • 5. Lecture #5

      5:38
    • 6. Lecture #6

      1:33
    • 7. Lecture #7

      2:35
    • 8. Lecture #8

      3:27
    • 9. Lecture #9

      1:46
    • 10. Lecture #10

      1:21
    • 11. Excel #2

      1:25
    • 12. Minitab #1

      1:12
    • 13. Excel #1

      5:54
    • 14. Lecture #11

      3:17
    • 15. Lecture #12

      2:41
    • 16. Lecture #13

      8:30
    • 17. Lecture #14

      0:40
    • 18. Lecture #15

      1:23
    • 19. Lecture #16

      1:29
    • 20. Lecture #17

      7:09
    • 21. Minitab #2

      1:48
    • 22. Lecture #18

      3:17
    • 23. Lecture #19

      6:39
    • 24. Minitab #3

      2:08
    • 25. Lecture #20

      2:26
    • 26. Lecture #21

      1:10
    • 27. Lecture #22

      5:09
    • 28. Minitab #4

      1:20
    • 29. Lecture #23

      1:05
    • 30. Lecture #24

      4:19

236

Students


About This Class

In this course, you will learn the basics of statistics, the types of statistics, and descriptive statistics such as the measures of central tendencies, measures of statistical dispersion, and measures of statistical shape. You will also learn about the normal distribution and how to standardize it. You will learn how to do hypothesis testing for a population and for a sample, for a two-tailed test and for a one-tailed test, and use p-values and confidence intervals. You will be able to apply tests such as the z-test and t-test using Minitab and learn how to interpret the results.

By the end of this course, you will be comfortable with statistics and be able to apply your knowledge in real life. Anyone can be the ideal student here, as there are no prior requirements.

Meet Your Teacher


Dana Knight

Teacher

Hi! I'm Dana. I'm currently a PhD student in Industrial Engineering. I finished my B.S. in Architectural Engineering and my M.S. in Industrial Engineering, and I am Lean Six Sigma Green Belt certified. I enjoy learning new things. My research interest is Data Science, including Deep Learning, Machine Learning, and Artificial Intelligence.

See full profile



Transcripts

1. Lecture #1: Hello and welcome. My name is Dana, and I'll be the instructor for this course. "In God we trust; all others must bring data." A famous quote attributed to Dr. W. Edwards Deming. Deming is widely known as the leading management thinker in the field of quality. He was a statistician and business consultant whose methods helped lay the basis for Japan's recovery after World War II. "All others must bring data" shows you the importance of data; it's used every day in our lives. The topics we'll cover in this course: first we'll introduce the different measurement scales and the difference between continuous and discrete variables. We'll introduce the types of statistics, such as descriptive statistics and inferential statistics; the measures of central tendency, which are mean, median, and mode; then statistical dispersion, such as standard deviation and variance; and statistical shape, which covers skewness and kurtosis. We'll also talk about some popular kinds of plots, such as histograms and scatter plots, and about the differences between regression and classification tasks. Then we'll move on to the normal distribution and its z-scores. We'll introduce the Central Limit Theorem and then the topic of hypothesis testing. I hope you will enjoy this course and gain valuable information. Thank you.

2. Lecture #2: Welcome back. In this lecture we'll talk about the different types of measurement scales, numerical versus categorical, parametric versus non-parametric, and, lastly, the difference between population and sample. Measurement scales are used to categorize and/or quantify different variables. We have nominal, ordinal, interval, and ratio. Nominal scales are used to label variables without any quantitative value. The name nominal comes from the Latin nomen, which means name. Examples are gender (male and female) and colors: you have red, blue, green, yellow, and others.
With ordinal, there exists an order of importance and significance. They are usually categorical, meaning that they belong to a definable category. Examples are low, medium, and high, or satisfaction and rating scales. Interval scales are numeric scales where we know the order and the differences between the numbers, like one being numerically higher than another. An example is temperature. The problem with interval scales is that they do not have a true zero; there is no such thing as "no temperature." Without a true zero, it is impossible to compute ratios. Ratio scales, such as weight or height, tell us about the order and the exact values between units, and they do have an absolute zero, which allows for a wide range of descriptive and inferential statistics. Data comes in two shapes: either numerical or categorical. Numerical data are either discrete or continuous. Discrete numbers are integers; they can be counted, such as one apple, two apples, three apples, and so on. However, continuous data are infinite; they cannot be counted, such as 1.42368..., and the numbers go on; the values between any two discrete numbers are infinite. Categorical data can take numerical values, but those values would not have any mathematical meaning; for example, low, medium, and high could be represented as 1, 2, and 3. Parametric values are assumed to follow a specific distribution, while non-parametric ones don't. Parametric tests make assumptions about the properties of the population distribution from which the data was drawn, while non-parametric tests make no such assumptions. Interval and ratio fall within parametric, while nominal and ordinal do not. Population versus sample: let's say you wanted to know the percentage of smokers on campus at some university. It would be almost impossible to ask all the students. Therefore, instead of asking the entire population, a sample is taken. Sometimes all you have to work with is the sample.
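The campus-smokers idea can be sketched in Python. This is a toy simulation with made-up numbers (a hypothetical population of 10,000 students, 20% of whom smoke), not data from the course:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 students, 2,000 of whom smoke (1 = smoker).
population = [1] * 2_000 + [0] * 8_000
random.shuffle(population)

# We can't survey everyone, so draw a random sample of 200 students.
sample = random.sample(population, 200)

# The sample proportion is our estimate of the population proportion.
estimate = sum(sample) / len(sample)
print(f"Estimated smoking rate: {estimate:.1%} (true rate: 20.0%)")
```

Run it a few times without the fixed seed and the estimate wobbles around 20%; that sampling variability is exactly what inferential statistics has to account for.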
And with that sample you can make conclusions about the population; you infer. That's why one type of statistics is called inferential statistics. Thank you.

3. Lecture #3: Welcome back. In this lecture we'll learn about descriptive statistics, inferential statistics, and the differences between them. Descriptive statistics describe the basic features of a data set. They aim to summarize a sample rather than use the data to learn about the population the data was drawn from, so they don't care about the population; they only summarize the sample you have. They are described by the central tendency, which will be discussed in the next lecture; by the statistical dispersion of the data; and by the shape of its distribution. On the other hand, inferential statistics is used when you are trying to reach conclusions that extend beyond the sample you have. You conclude the properties of an underlying probability distribution: by analyzing the data, you infer properties about the population from your sample. This includes hypothesis testing and deriving estimates. Thank you.

4. Lecture #4: Welcome back, everyone. In this lecture we'll talk about the central tendencies. The measures of central tendency are mean, median, and mode, where the mean is the average of all the numbers that you have, the median is the center of the numbers that you have, and the mode is the number that occurs most often within a set of numbers. First of all, let's start with the mean. The mean is the average of all the numbers: you add up all the values in a data set and divide by the total number of observations. So you add all the x's, x1, x2, x3, x4, up until xn, and you divide by their count. For example, your classroom has six students, and the test scores out of 100 are as follows: 95, 85, 82, 93, 64, 71. So what is the mean of the test scores?
Well, you add all the numbers and divide them by six, which is their count, and you get 81.66. So what does this mean? It means that if you wanted to give everybody an equal score, everybody would end up with 81.66. Another example: let's say the table below shows five different weights for girls in kilograms and five different weights for boys in kilograms. The question is, do females have lower weights than males? How do you know? For the girls' weights in kilograms, the mean is: you add all of them up and divide by five, because there are five girls, and you get 68. For the boys, you add them all up, divide by five, and you get 72. So, do females have lower weights than males? The answer is: we don't know, because you cannot infer a property of the population (males versus females) based on a small sample size. Maybe, for example, you accidentally took a sample of five that does not reflect anything about the population. Without a large sample size, you do not know anything about the population, so you cannot make accurate conclusions. The median is the data's center, where 50% is on the left and 50% is on the right; it is the middle number in a sequence of numbers. Your classroom has five students, and the test scores out of 100 are as follows: 98, 73, 64, 89, and 88. What is the median of the test scores? Well, the median would be 88, because you have 50% on one side and 50% on the other, so it is in the middle. Now, your classroom has six students, and the test scores are 95, 85, 82, 93, 64, 71. What is the median of the test scores? Since here you have two numbers in the middle, because the count is even, what do you do? You take the two numbers in the middle, add them up, and divide by two; that is, you take the average of the two numbers in the middle. So the median is 83.5.
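The classroom examples above can be checked with Python's standard statistics module, using the six scores read out in the lecture:

```python
from statistics import mean, median

scores = [95, 85, 82, 93, 64, 71]  # the six test scores from the example

# Mean: sum of 490 divided by the count of 6.
print(round(mean(scores), 2))  # 81.67

# Median: sorted, the two middle values are 82 and 85; their average is 83.5.
print(median(scores))  # 83.5
```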
And to prove that the median is different from the mean: the mean of these numbers is 81.66, so they are different. The mode is the number that occurs most often within a set of numbers. So let's say these are the numbers, and this is the frequency right here; the number that occurred most often is the mode. But sometimes you can have two modes or three modes, which is bimodal or trimodal. So what is the mode in this given set? Well, the mode would be 3, because 3 occurred the most, three times, versus one and two times for the others. If 2 occurred one more time, making it three times as well, you'd have a bimodal answer, and 3 and 2 would both be your mode. Thank you.

5. Lecture #5: Welcome back, everyone. In this lecture we'll talk about statistical dispersion, or the variability in a data set. First of all, there are three measures of statistical dispersion: standard deviation, variance, and range and quartiles. The standard deviation is how far away from the mean each observation is; it measures the deviation around the mean. Variance is how spread out the data is. For range and quartiles, the range represents the spread of the data, the difference between the highest and the lowest observations, and we'll talk about quartiles in a bit. The standard deviation is roughly the average of how far each observation is from the mean, where this is the formula right here: each observation minus the mean, squared, all added together, divided by n minus 1, and then the square root of that. Find the standard deviation of the following set of numbers: 10, 6, 3, 11, 5, and 3. First you take the mean: you add them all up and divide by six, and you get the mean, 6.33. Now that you have the mean, for each observation you subtract the mean and square it, add these up, divide by n minus 1, and take the square root of that, and you get the standard deviation, 3.44. This is roughly the average of how far the observations are from the mean: on average, each observation is away from the mean by 3.44. The variance represents the spread in the data, and it is the standard deviation to the power of two. So you either calculate the standard deviation and square it, or you use the same formula without the square root. Find the variance of the following set of numbers: 10, 6, 3, 11, 5, and 3. First of all, you get the mean: you add them all up, divide by n, and get 6.33. Then you subtract the mean from each observation, square it, add them up, divide by six minus one, which is n minus 1, and you get 11.86. Or you could calculate the standard deviation and just square it, which is the same thing. The range is the spread in the data, and it's the simplest to calculate: you take the max and subtract the min from it. Find the range of the following set of numbers: 10, 6, 3, 11, 5, and 3. The range would be the maximum minus the minimum, which is 11 minus 3. Quartiles: let's say you have this set of numbers right here. Remember, the median would be in the middle. Since this is an even count, these two values would be in the middle, so the median is this plus this over two, the average of these two numbers, which gives 71.5 as the median. Now the median of the lower section is 64; this is called the lower quartile, or the first quartile. And the median of the upper section is 77; it's called the upper quartile, or the third quartile. The reason we compute these quartiles is so we can find the outliers and where they are. Outliers are observations or data points that are far away from the rest; basically, outliers do not represent the true data set.
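The dispersion measures walked through above can be reproduced with Python's statistics module. The data set here is the one from the lecture's examples, reconstructed from the audio as 10, 6, 3, 11, 5, 3 (it matches the stated mean of 6.33 and the range of 11 minus 3):

```python
from statistics import stdev, variance, quantiles

data = [10, 6, 3, 11, 5, 3]  # the set used in the lecture's examples

print(round(stdev(data), 2))     # 3.44: sample standard deviation (n-1 in the denominator)
print(round(variance(data), 2))  # 11.87: the standard deviation squared
print(max(data) - min(data))     # 8: the range, max minus min

# Quartiles: Q1, the median (Q2), and Q3.
q1, q2, q3 = quantiles(data, n=4)
print(q1, q2, q3)
```

Note that quantiles supports more than one convention for interpolating quartiles, so Q1 and Q3 may differ slightly from the hand method used in the lecture; the median (Q2) is the same either way.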
So we need to get rid of them, because if they stay in the data they skew the results of the data set. Let's give an example. Let's say you have 10 people; persons one through nine each have only $1, and the 10th person has $1 billion. The 10th person is an outlier, because if you compute the mean or the standard deviation, it would be enormous, and technically that does not represent the group. So the billionaire is the outlier, and you want to remove him in order to understand the data correctly; that outlier could even be a mistake in the data. This is also another reason why we sometimes use the median instead of the mean: in the same example, with 10 people where nine of them have $1 and the 10th has $1 billion, the mean would be around a hundred million dollars, but the median would be 1, because you arrange them all from lowest to highest and then take the middle number, which is 1. So sometimes the median is more representative of the data than the mean, especially if the data is skewed to the right or left, which we'll talk about in a bit. Thank you.

6. Lecture #6: Welcome back, everyone. This is a short lecture on the measures of statistical shape. The measures of statistical shape are skewness and kurtosis, where skewness is a measure of symmetry, or lack of symmetry, in the distribution, and kurtosis is the shape of the peak, a measure of the peakedness or flatness of the distribution. Lately some people have been saying that it is really the shape of the tails rather than the peak. Skewness: we said it measures the lack of symmetry in the distribution around the mean; it represents the amount and direction of the skew. So, for example, this is a negatively skewed distribution, skewed to the left. This is the normal distribution, which is not skewed: skewness is equal to zero.
And this is a positively skewed distribution, or skewed to the right, where skewness is greater than zero. And this is the formula right here. Kurtosis is the sharpness of the peak. You have the platykurtic distribution, with a low degree of peakedness, where kurtosis is less than zero. This is the normal distribution, which is the mesokurtic distribution, where kurtosis is equal to zero. And this is the leptokurtic distribution, with a high degree of peakedness, where kurtosis is greater than zero. Thank you.

7. Lecture #7: Welcome back. In this lecture we'll discuss some widely used plots for visualizing multiple data points. First off, the histogram. A histogram is a diagram that consists of rectangles whose areas are proportional to the frequency of a variable and whose width is equal to the class interval. For the data points right here, you create bins of the same range, like 20 to 30, 30 to 40, 40 to 50, and you put each data point in its corresponding bin. So, for example, the data points between 20 and 30, which are 25 and 22, have a frequency of two, and you do the same thing with all of them. In the bin from 20 to 30 you have two elements, hence a frequency of two; the width of the rectangle is equal to the width of the bin's range, and the height corresponds to the frequency. From 80 to 90, for example, you don't have any data points, so you have a zero there. A scatter plot is a diagram in which the values of two variables are plotted along two axes. For example, let's say you want to plot temperature against the sales of ice cream: the higher the temperature, the more sales you'll have. Here you plot, with the x-axis representing the temperature and the y-axis representing the sales, and, say, here is a point at 11.9 degrees where you have $185 in sales.
So this shows you some kind of linear relationship and some correlation in the data, where you can obviously see that the higher the temperature goes, the higher the sales go. Also, a box plot, as we discussed previously, helps show the median and the quartiles, which then help reveal outliers. And as we said, an outlier is a data point that is distant from the other observations. It may be due to variability in the measurement or some kind of error; since outliers can cause problems in statistical analysis, they are usually removed. Thank you.

8. Lecture #8: Welcome back, everyone. As a request from one of the students, who asked when to use what, like when to use the mean, when to use the standard deviation, when to use the variance, and so on, let's go ahead and start. For the mean: if the question asks for the average, like "what is the average score?" or "if we wanted to give everybody the same score, what would it be?", you use the mean, the average of whatever you're trying to find out. I think this is pretty straightforward. Now, if we wanted to know how the distribution is spread out, the question would ask either for the standard deviation or the variance. Standard deviation and variance are pretty much the same thing; the only difference is that variance is the standard deviation to the power of two, the squared standard deviation. So if we wanted to know how the data is spread out from the mean, we use the standard deviation or the variance, where a low standard deviation means that most of the numbers are close to the mean and a high standard deviation means that most of the numbers are not close to the mean; basically, how the data spreads from the mean itself. Talking about range: it's the difference between the two extreme values, the lowest and the highest. If somebody asks you what the range is, they want a rough idea of how widely the data is spread out.
So basically, the lowest value and the highest value, and the difference between them, so that they can know how widely your data is spread out. Now for skewness: the question asks about the shape of the distribution, like: are both sides of the mean symmetrical? Are they alike? Or is one side more skewed than the other? Basically, where does your data lie: is most of the data to the right of your mean or to the left of your mean? So skewness tells you where the majority of your data lies. And kurtosis: kurtosis tells you about the frequency of the data. If you remember the leptokurtic distribution, which has a high peak, the frequency of the data is mostly centered towards the mean; that means that most of your data points are at the mean and around it. And when you have a low kurtosis, which is the flat, platykurtic distribution, you can see that the data is spread out: yes, most of it is at the mean, because it's a bell curve, but the data is also frequent in a lot of places, not just centered around the mean. So this is when you use kurtosis. I hope I gave you a rough idea of how and when to use these descriptive statistics, and please let me know if you have any questions. Thank you.

9. Lecture #9: Welcome back. In this lecture we'll talk about random variables. A random variable is a variable whose possible values are outcomes of a random phenomenon. Let's say you throw a die: it's completely random whether you get one, two, three, and so on up to six. This rules out certain cases where the quantity the random variable returns is infinitely sensitive to small changes in the outcome; nothing can sway the outcome. If something swayed the outcome to a specific value, then it is not a random variable anymore.
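The die-throw example can be simulated to see a discrete random variable in action (a small sketch, not part of the course materials):

```python
import random
from collections import Counter

random.seed(7)  # fixed seed so the run is reproducible

# A die roll is a discrete random variable: each roll maps an uncertain
# physical outcome to one of the values 1..6, each equally likely.
rolls = [random.randint(1, 6) for _ in range(6_000)]

counts = Counter(rolls)
for face in range(1, 7):
    print(face, counts[face])  # each face lands near 1,000 out of 6,000
```

No single roll can be predicted, but the long-run frequencies settle near one sixth each, which is what makes the variable "random" rather than swayed toward any value.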
There are two kinds of random variables, discrete and continuous. Discrete is like flipping a coin, where you get heads or tails, or throwing a die; continuous is like height or weight. For example, when tossing a coin, you either get heads or tails, and it depends on uncertain physics; which outcome will be observed is not certain, it's completely random. The coin could get caught in a crack in the floor, but such a possibility should be excluded from consideration, because, again, a random variable has to be completely random. So a random variable is a function whose domain is the sample space and whose range is a set of real numbers. For the coin-toss example, the sample space would be heads or tails, and the range would be 0 or 1; heads and tails take possible values such as zero and one. This is a random variable. Thank you.

10. Lecture #10: Hello and welcome back. In this lecture we'll talk about regression versus classification tasks. With regression, the output is a continuous variable. For example, let's say you want to predict the price of a house based on the number of rooms. You have data: two rooms, the house is 100,000; four rooms, 150,000; and so on. So you want to predict the continuous output, and with this kind of problem you get questions like: predict the price of the house if it had two rooms. For classification tasks, the output is a discrete output. For example, let's say you want to predict whether an email is spam or not based on the number of exclamation points. We have the number of exclamation points and whether the email is spam or not, where one means spam and zero means not spam. So if an email has six exclamation points, it's most likely to be spam; with two or one, it's most likely to be zero. The question here would be something like: predict whether an email is spam or not if it had eight exclamation points. And we'll show how to do this in Excel in the next lecture.
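The house-price idea can be sketched as a least-squares line fit in plain Python, which is essentially what Excel's FORECAST function computes. The five data points below are hypothetical, made up for illustration (the course's own spreadsheet values are not shown in the transcript):

```python
# Hypothetical data: number of rooms (x) versus house price (y).
rooms  = [1, 2, 3, 4, 5]
prices = [80_000, 105_000, 125_000, 155_000, 170_000]

n = len(rooms)
mean_x = sum(rooms) / n
mean_y = sum(prices) / n

# Least-squares slope: covariance of x and y over the variance of x.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(rooms, prices)) \
        / sum((x - mean_x) ** 2 for x in rooms)
intercept = mean_y - slope * mean_x

# Regression question: predict the price of a two-room house.
print(slope * 2 + intercept)  # 104000.0
```

For the spam example, you would fit the same kind of line on exclamation-point counts against 0/1 labels and round the prediction to 0 or 1, mirroring the INT(FORECAST(...)) trick shown in the Excel lecture.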
Thank you.

11. Excel #2: Here we're going to show you how to forecast a prediction based on previous values. Take the question we had earlier: what is the price of the house, given that it has two rooms? In Excel there's a function called FORECAST. You type FORECAST, and it asks for the x whose y it should predict, because these are x's and these are y's; you have an x and a y. So you give it the x you want to forecast for, then it asks for the known y's, the already known y's, and then for the known x's, and you press Enter. So the house would be almost $110,000. Now let's do the same thing for the spam/not-spam example. You want a forecast, but first of all wrap it in INT, because we want either a zero or a one and don't want any decimal points. Inside, FORECAST wants the x for which to calculate the y, then the known y's, and then the known x's. So most likely it will be spam: one equals spam, zero equals not spam. Thank you.

12. Minitab #1: Let's learn to get some descriptive statistics in Minitab. Minitab is a very powerful software package for statistical testing and statistical tools in general. We have this data set here, from a question we'll be having later in the course. What we do is go to Stat > Basic Statistics > Display Descriptive Statistics, for the column C1. Let's go ahead: yes, we want the mean; we don't want these; we want the standard deviation, variance, minimum, and maximum; we don't need these; the median, first quartile, and third quartile; and let's also get skewness and kurtosis. We click OK, and OK. So here they are: the mean is 28.76, the standard deviation is 12.26, the maximum is 57, the minimum is 14.10, and the median is 27.5; you can see the median and mean are different from each other. And there are skewness and kurtosis. So this is how you get descriptive statistics in Minitab; it's pretty straightforward. Thank you.

13. Excel #1: Hello and welcome back.
This lecture will show how to get the descriptive statistics from Excel. I started out by putting in these numbers completely at random. Let's start with the average: you type equals, then AVERAGE, double-click it, specify the range you want, close the bracket, and press Enter. So the average of these numbers is 6.45. Now let's get the median: =MEDIAN, same thing; what it does is arrange the values from smallest to largest and take the number in the middle, so the median would be 6. Let's also get the mode: =MODE of this; it should be 6, because 6 occurred three times, and yes, it's 6. Let's get the standard deviation, =STDEV, with open brackets, same thing. And for variance, you can either raise the standard deviation to the power of two (square it) or use =VAR. Let's now try the first quartile: =QUARTILE; you give it the array first, and then 1 for the first quartile. The third quartile is the same thing, except you put 3 for the third quartile, and you get 9. Skewness, =SKEW, gives you the skewness of a distribution; take the skewness of all of these. It's positive, so it's probably skewed to the right. And let's do kurtosis, =KURT. So this is how you get your descriptive statistics in Excel. And if you want to do it in Python: Python works with packages, so we're going to import a package for numerical procedures: import numpy, the numerical package of Python, as np. We will also need to plot a graph, so: import matplotlib.pyplot as plt. Let's start out with completely random data; say a is equal to 1, 2, 3, 4, 5, 6, 7, 8, and 9, for example. So let's get the mean.
We'll say here: mean = np.mean(a), where np is the numerical package and you call the mean function from the numpy package to get the mean of a. Let's also do the median: call it median (this is a variable, so you can call it whatever you want), np.median(a). Let's say the standard deviation is np.std(a); these are functions, so you cannot play with their names. Let's also say the variance is np.var(a), and lastly the quartile: quartile = np.percentile(a, 25), where you ask for the 25th percentile. And let's also draw a histogram of the data: plt.hist, for histogram, where you plot a with, say, nine bins. Actually, you know what, I'm not going to do this one now. Let's go ahead and run this. You see here: a is your array, the mean is 5, the median is 5, the quartile is 3, the standard deviation is 2.58, and the variance is 6.66. Now let's say you have something like, call it a2, equal to 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, because we want to plot frequencies, then 5, 5, 5, 6, 6, and 7. So here you do plt.hist, you plot a2, and you want seven bins. We also need to call plt.show. Let's go ahead and run that, and this is how your histogram looks: one occurred one time, two occurred two times, four occurred four times, and so on. So this is your histogram. This concludes the first section, which is the introduction, and I hope you enjoyed it. Thank you.

14. Lecture #11: Welcome back. In this lecture we'll talk about the most famous and widely used distribution, which is the normal distribution. The normal distribution is also sometimes called the Gaussian distribution, or the bell curve; you can see why they call it a bell curve.
It's a continuous probability distribution, where you get the probability of a continuous variable from the area under the curve. And when the distributions of real-valued random variables are not known, they're often assumed to be normal, especially for real-life problems. This is the probability density function of a continuous random variable for the normal distribution; you don't need to get any headaches from this formula, as we will teach you how to get probabilities from the normal distribution. The PDF, the probability density function, is used to specify the probability of a random variable falling within a particular region of values, which is the area under the curve. So you can ask, for example, for the probability around X equals five for some problem, and we will give examples in the next lectures. The properties of the normal distribution are that the median is equal to the mean is equal to the mode; you can see here the mean equals the median equals the mode because the curve is symmetrical, 50/50. And 68% of all the values are within one sigma, which is one standard deviation; 95% of the values are within two sigma; and 99.7% are within three sigma. So: assuming a normal distribution with 95% of the students at a school between 1.1 meters and 1.7 meters tall, calculate the mean and standard deviation. Given that it's a normal distribution, this should tell you a lot. First of all, on the plot, 95% are between 1.1 and 1.7. Because this is a normal distribution, the middle point, which is the mean (and the median, since the mean is equal to the median), is the midpoint between 1.1 and 1.7. That means halfway between 1.1 and 1.7, making the mean 1.4.
If you add the two endpoints together and divide by two, you get a mean of 1.4 meters. Now the standard deviation: since 95% of the students are between 1.1 and 1.7, they lie within two standard deviations to the right of the mean and two standard deviations to the left of the mean, which makes four standard deviations in total: one, two, three, four. So 1.7 minus 1.1, divided by 4, makes each section, which is one standard deviation, equal to 0.15. Thank you.

15. Lecture #12: Welcome back. In this lecture we'll talk about why and how we standardize the normal distribution. First, from the previous example: if one student had a height of 1.85, we would say his height has a z-score of 3, because 1.85 is three standard deviations away from the mean: 0, 1, 2, and 3. The number of standard deviations from the mean is also called the standard score, sigma score, or z-score. So again, a person with a height of 1.85 would be said to have a z-score of 3. Why do we standardize? First, it helps us make decisions about our data. It also serves as the standard by which all other normal distributions are measured, and it's universally understood. How do we standardize? z equals x minus the mean (mu), divided by sigma. For example, here the mean is 1010 and the standard deviation is 20. If you compute (1030 minus 1010) / 20, you get 1; (1050 minus 1010) / 20 gives 2; and so on. The z-distribution is a normal distribution with a mean of zero and a standard deviation of one, and, like we said, 99.7% of its values lie within three standard deviations to the right and three standard deviations to the left.
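The standardization formula from this lecture, z = (x - mu) / sigma, is easy to check in code; the values 1030 and 1050, mean 1010, and standard deviation 20 are the ones from the example above.

```python
def z_score(x, mu, sigma):
    # Number of standard deviations x lies above (+) or below (-) the mean
    return (x - mu) / sigma

print(z_score(1030, 1010, 20))  # 1.0
print(z_score(1050, 1010, 20))  # 2.0
print(z_score(990, 1010, 20))   # -1.0
```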
So a z-value, or z-score, represents the number of standard deviations a particular value lies above or below the mean. For example, if we say that X has a z-score of 1, it means it is one standard deviation above the mean; if we say minus 1, it is one standard deviation below the mean. Here's a standard normal distribution with percentages for every half standard deviation, plus cumulative percentages. Like we said, 68% of the data lies between minus 1 and 1, while 19.1% of the data lies between 0 and 0.5, and you can see that 99.7% of the data lies between minus 3 and 3. Thank you.

16. Lecture #13: Welcome back. In this lecture we'll continue talking about the standard score of the normal distribution. We said that in order to standardize, we subtract the mean from the value and divide by the standard deviation. (You'll understand what is meant by sample mean versus population mean as we go through the course.) We said that z represents how many standard deviations you are away from the mean. It also gives the probability of a particular random value occurring when the variable is normally distributed: when you ask for the probability that X is equal to or less than something, you take the area under the curve to the left, which you then find using the z-table. So say you want the probability up to z = 1.65. You come to the z-table, where z is the standardized score, and the table gives you the area to the left of z. For 1.65 you go to the 1.6 row and the 0.05 decimal column, and you get about 95%. And what is the area to the left of z = 0? It's 50%, because the distribution is symmetrical: z = 0 is the mean, so 50% lies to the right and 50% to the left. You can also go the other way around.
We would ask: what is the z-value when 90% of the data lies below it? It's like we reverse-engineer: we find 0.90 inside the table, and it lands at about z = 1.28. In case you have a small probability, let's say 0.05, you have two ways: either use the other half of the table, which shows negative z-values, or use symmetry. Since the normal distribution is symmetrical, the z whose left-tail area is 0.05 is the negative of the z whose left-tail area is 0.95. So if you find the z with 0.95 to its left, which is 1.65, you can take its negative: the z with 0.05 to its left is minus 1.65. Here's an example. The time it takes a driver to react to the brake lights on a decelerating vehicle is critical in helping to avoid rear-end collisions. An article suggests that the reaction time for an in-traffic response to a brake signal from standard brake lights can be modeled with a normal distribution having a mean of 1.25 seconds and a standard deviation of 0.46 seconds. What is the probability that the reaction time will be between 1 second and 1.75 seconds? Here 1.25 is what is called the population mean: it describes the reaction time in general, not just for a small sample. The first thing you should do is draw. So here you have 1 and 1.75, and this is the mean, 1.25, which is called mu. Since we're on the topic: the population mean is called mu, and the sample mean is called x-bar. We need the area between 1 and 1.75, that is, the probability that the reaction time is between these two numbers. You take the area to the left of 1.75 minus the area to the left of 1. So first, standardize: (1 minus 1.25) divided by the standard deviation gives negative 0.54, and the same thing for the other end.
(1.75 minus 1.25) divided by sigma gives about 1.09. So we use the z-table to find the areas: we need the area to the left of 1.09 and the area to the left of minus 0.54. By symmetry, the area to the left of minus 0.54 equals the area to the right of positive 0.54. And since the area under the whole distribution is 1, the area to the right of 0.54 is 1 minus the area to the left of 0.54, which is 1 minus 0.7054 = 0.2946. You end up with the same number if you use the negative z-table, the other half of the distribution, which gives areas below 50%: look up the minus 0.5 row and the 0.04 decimal column and you get 0.2946. The area to the left of 1.09 is 0.8621, so subtracting, 0.8621 minus 0.2946 = 0.5675, which is the area between 1 and 1.75, the probability we wanted. Another example: the breakdown voltage of a randomly chosen diode of a particular type is known to be normally distributed. What is the probability that a diode's breakdown voltage is within one standard deviation of its mean value? We want the probability of X falling within one standard deviation on each side of mu: the area between mu minus one sigma and mu plus one sigma.
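The brake-light calculation done with the z-table can also be checked numerically. Python's standard library has no normal CDF, but the area to the left of z follows from the error function (a standard identity, not something from the lecture):

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF: area under the curve to the left of z
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 1.25, 0.46            # reaction time ~ N(1.25, 0.46) seconds
z_low = (1.00 - mu) / sigma       # about -0.54
z_high = (1.75 - mu) / sigma      # about  1.09
p = phi(z_high) - phi(z_low)      # area between the two z-scores
print(round(p, 4))                # about 0.568, matching the table's 0.5675
```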
So now we standardize the X we're trying to find, at mu minus sigma and mu plus sigma, to get the area in between. At the lower end, z = ((mu minus sigma) minus mu) / sigma: the mu's cancel and the sigma's cancel, and you end up with minus 1. The same way, at the upper end you end up with plus 1. So we want the area between minus 1 and 1. From the z-table, the area to the left of 1 is 0.8413 and the area to the left of minus 1 is 0.1587; after subtracting, we end up with an area of 0.6826, about 68%. And if you remember, the share of the data within one standard deviation to the left and right of the mean is 68.26%. So there you go: 68%. Thank you.

17. Lecture #14: Welcome back. Here you'll take an exercise question and try to solve it on your own. The amount of distilled water dispensed by a certain machine is normally distributed with a mean value of 64 ounces and a standard deviation of 0.78 ounces. What container size c will ensure that overflow occurs only 0.5% of the time? Just give it a try, and the solution will be in the next lecture. Thank you.

18. Lecture #15: Welcome back. Here we'll go over the answer to the last exercise. Remember the question: the amount of distilled water dispensed by a certain machine is normally distributed with a mean value of 64 ounces and a standard deviation of 0.78 ounces. What container size c will ensure that overflow occurs only 0.5% of the time? First of all, of course, you draw. Where is the x above which overflow is only 0.5%? You need to find the z whose area to the left is 0.995, and from the table, z = 2.58.
Now we standardize. We have z, we have the mean, and we have sigma. So x, which is what we're looking for, minus 64 (the mean), divided by the standard deviation 0.78, should give us 2.58. Do the multiplication and the addition and you get x of about 66 ounces. So a container size c of 66 ounces ensures that overflow occurs only 0.5% of the time. Thank you.

19. Lecture #16: Welcome back. In this lecture we'll talk about the famous central limit theorem. You've probably heard of it; in case you haven't, let me make it very simple for you. If you have sample data with a large number n of observations, then the larger n is, the more normally distributed the picture becomes. (Strictly speaking, the theorem concerns the sample mean: as n grows, the distribution of the sample mean approaches a normal distribution, whatever the shape of the population.) For example, say you have three students whose weights are 70, 72, and 65 kilograms. Does this give you a normal distribution? Of course not. But say you had data for 20,000 weights; most probably the plot would look normally distributed. So the idea is that when n is large, the distribution becomes approximately normal, and when n is small to moderate, it is only almost normal. Basically, that's why, when n is large, the data starts to look more normally distributed. Thank you.

20. Lecture #17: Welcome back, everyone. This is our first lecture on hypothesis testing, where we'll test a normal population with a known sigma. First of all, a statistical hypothesis is a claim either about the value of a single parameter, about several parameters, or about the form of an entire population distribution. There are two contradictory hypotheses: the null hypothesis, which you try not to reject (that would be the best way to put it), and the alternative hypothesis, which would reject our null hypothesis.
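The container-size answer from Lecture 15 can be verified in a couple of lines; 2.58 is the z-value with area 0.995 to its left, read from the table exactly as in the lecture.

```python
mu, sigma = 64, 0.78     # ounces dispensed ~ N(64, 0.78)
z = 2.58                 # table value: P(Z <= 2.58) is about 0.995
c = mu + z * sigma       # container size with only 0.5% overflow
print(round(c, 2))       # about 66.01 ounces
```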
So the null hypothesis is a claim that is initially assumed to be true. The alternative hypothesis is the assertion that contradicts the null hypothesis. The null hypothesis will be rejected in favor of the alternative hypothesis only if sample evidence suggests that the null hypothesis is false. If the sample does not strongly contradict the null hypothesis, we will continue to believe in the plausibility of the null hypothesis. So there are two possible conclusions: we either reject the null hypothesis, or we fail to reject the null hypothesis. As always, we learn by example. A manufacturer of sprinkler systems used for fire protection in office buildings claims that the true average system activation temperature is 130 degrees Fahrenheit. A sample of n = 9 systems, when tested, yields a sample average activation temperature of 131.08 degrees. If the distribution of activation temperatures is normal with a standard deviation of 1.5 degrees Fahrenheit, does the data contradict the manufacturer's claim at a significance level of 0.01? The significance level is the probability of rejecting the null hypothesis given that it is true, i.e. the critical area. So first of all, you draw, and you identify the parameter of interest, which is the true average activation temperature: the claim is that it is 130, and when they tested nine systems, it yielded 131.08. The null hypothesis, which we would like to fail to reject, is that mu is equal to 130; the alternative hypothesis is that mu is not equal to 130. Next, there is something called the test statistic. We calculate it as z = (x-bar minus mu) / (sigma / sqrt(n)): (131.08 minus 130) / (1.5 / sqrt(9)) = 2.16.
Now, alpha is 0.01, and we work with alpha over two. The reason is that we are asking whether mu is equal or not equal to 130: it could be below 130 or above 130. So alpha/2 goes on each side, because we could reject the null hypothesis above the mean or below the mean. We state the rejection region by finding the z on each side: the z with a tail area of 0.005 is 2.575 on the right and minus 2.575 on the left. Remember, it's symmetrical, so you can find one and take the negative for the other. So we draw the rejection lines at 2.575 and minus 2.575, find the test statistic, and place it on the plot: 2.16 is not in the rejection region, meaning that, based on the evidence, we fail to reject the null hypothesis. Equivalently, the test statistic is less than the critical value in absolute terms: since |2.16| is less than 2.575, we again say we fail to reject the null hypothesis. Another method is applying the p-value method. You've probably heard of p-values; we'll discuss them more in the next lecture. The area to the right of z_calc = 2.16 is 0.0154, and we multiply it by two because this is a two-tailed test, with half of alpha on each side, giving a p-value of 0.0308. If the p-value is more than alpha, we fail to reject the null hypothesis; since alpha was 0.01, we therefore fail to reject the null hypothesis. Another method is finding the confidence interval.
So this is the formula for the confidence interval: the hypothesized mu plus or minus z critical times sigma over the square root of n. So 130 plus or minus 2.575 times 1.5 over sqrt(9): add and subtract to get the two endpoints, 131.2875 and 128.7125. Then you check the value you were trying to test, the sample mean x-bar = 131.08, and you see that it lies within the confidence interval, so you fail to reject the null hypothesis. Thank you.

21. Minitab #2: To illustrate this example in Minitab, which is a very powerful statistical tool (and I highly recommend that you learn it), let's do it in Minitab. First, go to Stat, then Basic Statistics, then 1-Sample Z. We don't have the raw data, only one set of summary values, so we choose summarized data and put in the numbers: the sample size is 9, the sample mean is 131.08, the known standard deviation is 1.5, and the hypothesized mean is 130, since we want to check whether mu is equal to 130 or not. Under Options, alpha was 0.01, so let's change the confidence level to 0.99, and the alternative is "mean is not equal to hypothesized mean", right here. So let's go ahead and run it. Here you go: the z-value we got was 2.16, the same test statistic as before, and the p-value is 0.031, close to the 0.0308 we calculated by hand. There you have it; in Minitab it was much easier to do, so go and learn it. Thank you.

22. Lecture #18: Welcome back. In this lecture we'll discuss what p-values are.
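Before moving on to p-values, the whole sprinkler-temperature z-test (statistic, two-sided p-value, and confidence interval) can be sketched in one place; the normal CDF via the error function is a standard identity, not part of the lecture.

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF (area to the left of z)
    return 0.5 * (1 + erf(z / sqrt(2)))

n, xbar, mu0, sigma, alpha = 9, 131.08, 130, 1.5, 0.01

z = (xbar - mu0) / (sigma / sqrt(n))       # test statistic: 2.16
p = 2 * (1 - phi(z))                       # two-tailed p-value: about 0.031

z_crit = 2.575                             # table value for alpha/2 = 0.005
half_width = z_crit * sigma / sqrt(n)
ci = (mu0 - half_width, mu0 + half_width)  # (128.7125, 131.2875)

print(round(z, 2), round(p, 4), ci)
# p > alpha and xbar falls inside ci: fail to reject the null hypothesis
```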
So the p-value is the probability, calculated assuming that the null hypothesis is true, of obtaining a value of the test statistic at least as contradictory to the null hypothesis as the value calculated from the available sample. The p-value is the smallest significance level at which the null hypothesis can be rejected; because of this, the p-value is alternatively referred to as the observed significance level for the data. To use the rejection-region method, you must first have a significance level alpha. The null hypothesis H0 is rejected if the test statistic value falls in the rejection region, and is otherwise not rejected. This is the rejection region right here, with alpha over two on each side, because it's a two-tailed test. If your result lies in either tail, your test statistic is significant, so we reject the null hypothesis: the p-value is less than alpha. But if the p-value is greater than alpha, i.e. the result lies in the middle region, then we fail to reject the null hypothesis. So what is the significance level? The significance level is decided beforehand by the analyst. When interpreting whether a p-value is significant or not, we need to know the alpha being used for the test; usually alpha is set to 0.05 or 0.01. Here alpha was set to 0.05, so half of it lies in each tail, and if the p-value falls in either tail, that means we reject the null hypothesis. To make what we just said much simpler: p-value stands for probability value. It indicates how likely it is that a result occurred by chance alone. If the p-value is small, the result was unlikely to have occurred by chance alone; such results are described as statistically significant. A small p-value means the effect is beyond chance alone, meaning something happened: the test is significant.
A large p-value means that the result is within chance or normal sampling error: nothing happened, and the test is not significant. So given a significance level alpha, this shaded area is the rejection region, and say this is the observed result: it lies in the rejection region, which means it is statistically significant. This is the observed p-value, and out here are the unlikely observations. If a sample or observed value lies there, it is very unlikely to have arisen by chance alone, and that is what we mean by statistically significant. Thank you.

23. Lecture #19: Welcome to another lecture on hypothesis testing. We'll start with another example. A dynamic cone penetrometer (DCP) is used for measuring material resistance to penetration, in millimeters per blow, as a cone is driven into pavement or subgrade. Suppose that for a particular application it is required that the true average DCP value for a certain type of pavement be less than 30. The pavement will not be used unless there is conclusive evidence that the specification has been met. Let's state and test the appropriate hypotheses using the following data, which come from "Probabilistic Model for the Analysis of Dynamic Cone Penetrometer Test Values in Pavement Structure Evaluation"; these are the values right here. The first thing we need is the mean and standard deviation: we have n = 52 observations, the mean of these numbers is 28.76, and the standard deviation is 12.26. Then we draw. We want conclusive evidence that the true average DCP is less than 30; pavement with an average of 30 or above will not be used. So the value we care about is 30.
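The DCP test just set up can be computed directly from these summary statistics (the raw 52 observations are on the slide and are not retyped here; the error-function identity for the normal CDF is a standard one, not from the lecture):

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF (area to the left of z)
    return 0.5 * (1 + erf(z / sqrt(2)))

n, xbar, s, mu0, alpha = 52, 28.76, 12.26, 30, 0.05

z = (xbar - mu0) / (s / sqrt(n))   # test statistic: about -0.73
p = phi(z)                         # lower-tail p-value: about 0.233

print(round(z, 2), round(p, 3))
# z sits right of -1.645 and p > alpha: fail to reject the null hypothesis
```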
So our null hypothesis is that mu is equal to 30, and since we only care about whether the true average is less than 30 (30 or above, we are not interested in), the alternative hypothesis is that mu is less than 30. The first step is that we draw: here is 30, the hypothesized mean; this is x-bar, which is 28.76; and s is 12.26. So the null hypothesis is mu = 30, and the alternative is that mu is less than 30. Because the alternative concerns only one side, "less than", this is a one-tailed test. We'll talk about one-tailed versus two-tailed tests in a later lecture, but you should know that since this is only concerned with one side of the distribution, it's a one-tailed test. So we calculate the test statistic with the same formula, z = (x-bar minus mu) / (s / sqrt(n)), and get minus 0.73. Alpha is 0.05; if alpha is not given, always assume it to be 0.05. Now, because it is a one-tailed test (we only care about the lower side, not the upper), we do not divide alpha by two; we keep it as it is. We state the rejection region: the z with a left-tail area of 0.05 is minus 1.645. And even though this is a one-tailed test, just to be on the safe side we also calculate the z for the 0.95 area, which is plus 1.645. So we have the rejection region: anything between minus 1.645 and 1.645 is not rejected, and anything beyond either line is rejected. The z statistic, minus 0.73, lands here, not in the rejection region.
And like we said, even though it's a one-tailed test, we mark both sides for safety. So, based on the evidence, we fail to reject the null hypothesis. What does this mean? It means we fail to reject mu = 30; but that doesn't mean we accept it. We fail to reject it, because it is one out of several possibilities. Or, equivalently, we compare z_calc with z critical: since the absolute value |minus 0.73| is less than |minus 1.645|, we fail to reject the null hypothesis. Another method is the p-value method we talked about: the area to the left of z_calc = minus 0.73 is 0.2327, and since that is above alpha = 0.05, we fail to reject the null hypothesis. Remember, if the p-value were inside the tail area we would reject, but since alpha (0.05) is less than the p-value, we fail to reject the null hypothesis. Another method is the confidence interval, with the formula we stated before: 30 plus or minus 1.645 times 12.26 over sqrt(52), which gives 32.80 and 27.20. And x-bar = 28.76, the mean of the data, lies in between these two; it is within the confidence interval, so we fail to reject the null hypothesis. Thank you.

24. Minitab #3: Hi. To illustrate this in Minitab, let's go to Minitab first. We imported the data here, the 52 observations, and then we go to Stat, Basic Statistics, 1-Sample Z. We have one sample, one column, so our column is DCP1, the known standard deviation is 12.2647, and the hypothesized mean is 30. Then, under Options:
We want a confidence level of 95; we said if alpha is not given, assume 0.05, and 100 minus 5 is 95. And we want the alternative "mean is less than the hypothesized mean". Then we click OK, and OK, and here it is. We have, first of all, a z statistic of minus 0.73, same as ours, and a p-value of 0.233, matching the 0.2327 we calculated. You can see that because this is one-tailed, we didn't multiply the tail area by two like in the previous example. So based on the z statistic and the p-value, we fail to reject the null hypothesis. Thank you.

25. Lecture #20: In this lecture we'll talk about the difference between one-tailed and two-tailed tests. A two-tailed test allots half of the alpha, the significance level, in one direction and the other half in the other direction: regardless of the direction of the relationship you hypothesize, you are testing for the possibility of the relationship in both directions. Remember when we said the null hypothesis was mu = 130 and the alternative hypothesis was that it is not equal to 130? That means you are looking for the mean to be either above 130 or below 130, so both directions. For example, suppose we would like to compare a mean to a given value x using a t-test, where H0 is that the mean is equal to x. The two-tailed test will flag significance if the test statistic is in the top 2.5% or the bottom 2.5% of the probability distribution, resulting in a p-value of less than 0.05. So, like we said, the null hypothesis would be mu = x and the alternative would be mu not equal to x: it looks for the mean being either above x or below x.
A one-tailed test allots all of the alpha to testing statistical significance in one direction: you are testing for the possibility of the relationship in one direction and completely disregarding the possibility of a relationship in the other direction. For example, if our null hypothesis is that the mean is equal to x, a one-tailed test will test either whether the mean is significantly greater than x, or whether the mean is significantly less than x, but not both. So with the null hypothesis mu = x, the alternative would be either mu greater than x or mu less than x. The one-tailed test provides more power to detect an effect in one direction, by not testing the effect in the other direction. Thank you.

26. Lecture #21: Welcome. For the normal distribution we use two types of tables: either the z-table, like we used in the previous examples, or the t-table. So what is the difference? First, the z-score uses the population standard deviation, while the t-score uses the sample standard deviation. If the population standard deviation is not known and the sample size is less than 30, the t-table is used. If the sample size is above 30, the z-table is used, due to the central limit theorem; remember, when n is large, the sampling distribution becomes more normal, so you can use the z-table. But if n is less than 30 and the population standard deviation is not known, we should use the t-table. So: use the t-table if the sample size is less than 30 and the population standard deviation is not known; but if the population standard deviation is known, or the sample size is more than 30, use the z-table. Thank you.

27. Lecture #22: Welcome to another lecture on hypothesis testing.
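Before the next example, Lecture 21's rule of thumb can be written as a tiny helper function (the name which_table is mine, not from the course):

```python
def which_table(n, sigma_known):
    # Use the t-table only when the population sigma is unknown AND n < 30;
    # otherwise the z-table applies (central limit theorem for large n).
    return "t" if (not sigma_known and n < 30) else "z"

print(which_table(5, sigma_known=False))   # t
print(which_table(52, sigma_known=False))  # z
print(which_table(9, sigma_known=True))    # z
```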
So now we're going to use the t-test and the t-table in an example, because we learn with examples. Glycerol is a major by-product of ethanol fermentation in wine production and contributes to the sweetness, body, and fullness of wines. An article includes observations on glycerol concentration for samples of uncertified white wines. Suppose the desired concentration value is 4. Does the sample data suggest that the true average concentration is something other than the desired value? So our null hypothesis is mu = 4, and our alternative hypothesis is mu not equal to 4: it's a two-tailed test. These are our samples, five of them, so n = 5, and we have an alpha of 0.05. The first thing we should do is find the average and standard deviation of those samples, calculated the way we learned: the mean is 3.8 and the standard deviation is 0.712. First we draw: mu here is 4, and x-bar is 3.8. The parameter of interest is the true average glycerol concentration, and, like we said, the null hypothesis is mu = 4 and the alternative is mu not equal to 4, so it's a two-tailed test. Because, first of all, we only have five observations, so n is less than 30, and second, we have the standard deviation of the sample, not the population, we use the t-test and the t statistic. You calculate it the same way, (x-bar minus mu) / (s / sqrt(n)), and you get minus 0.58. And since this is a two-tailed test, we use alpha over two, 0.025. Now, here's the difference between the t-table and the z-table. Remember, with the z-table, the area we read off was the area to the left; with the t-table, you get the area to the right. So remember this.
So if you want the area to the left, you have to take 1 minus the area to the right. Now we state the rejection region. We have alpha over two, which is 0.025, and something called degrees of freedom. The degrees of freedom here are n minus 1, which is 5 minus 1 = 4. So we look up t at a two-tailed alpha of 0.05 (or 0.025 per tail, after dividing by two) with 4 degrees of freedom, and we get 2.776. That's your critical value. So we draw the rejection lines at the critical values minus 2.776 and 2.776, take the t statistic, and place it on the plot: it is not in the rejection region, meaning we fail to reject the null hypothesis. Another method is that we compare the absolute value of the t statistic with the absolute value of the critical value, and since |minus 0.58| is less than 2.776, we fail to reject the null hypothesis. Another method is the confidence interval, with the same formula as with z critical: 4 plus or minus this t critical times the standard deviation over the square root of n, which gives 4.89 and 3.11. And you compare the mean of the samples, 3.8, with the confidence interval; since it lies within the confidence interval, you fail to reject the null hypothesis. So remember: when you're dealing with the confidence interval, you compare it to x-bar; when you're dealing with the rejection lines, you compare the t statistic with t critical. Thank you.

28. Minitab #4: Welcome back.
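The glycerol t-test can be sketched from the rounded summary statistics quoted in the lecture (mean 3.8, s 0.712). With these rounded inputs the statistic comes out near minus 0.63 rather than the minus 0.58 Minitab reports from the raw data, but the conclusion is the same; 2.776 is the t-table entry for 4 degrees of freedom at alpha/2 = 0.025.

```python
from math import sqrt

n, xbar, s, mu0 = 5, 3.8, 0.712, 4

t = (xbar - mu0) / (s / sqrt(n))   # about -0.63 with these rounded inputs

t_crit = 2.776                     # t-table, df = n - 1 = 4, two-tailed alpha 0.05
half_width = t_crit * s / sqrt(n)
ci = (mu0 - half_width, mu0 + half_width)  # roughly (3.12, 4.88)

print(round(t, 2), ci)
# |t| < t_crit and xbar lies inside ci: fail to reject the null hypothesis
```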
So regarding this example right here: this is a sample where we apply the t-test, because we only have five samples and the population standard deviation, which is sigma, is not known. That's why we apply the t-test. So let's go ahead and apply it in Minitab. These are the five samples. You go to Stat, Basic Statistics, 1-Sample t. Our column is C3, and our hypothesized mean is 4. In Options, we want an alpha of 0.05, so a confidence level of 95 (100 minus 5), and we want the alternative "mean is not equal to the hypothesized mean". OK, and then OK, run. So we have here: the t-value is −0.58, same as ours, −0.58, and the p-value, which we did not calculate by hand, is almost 0.6. Since the p-value is greater than alpha, we fail to reject the null hypothesis. Thank you. 29. Lecture #23: Here's an exercise question for you to apply what you learned. Light bulbs of a certain type are advertised as having an average lifetime of 750 hours. The price of these light bulbs is very favorable, so a potential customer has decided to go ahead with the purchase arrangement unless it can be conclusively demonstrated that the true average lifetime is smaller than what is advertised. A random sample of 50 bulbs was selected. What conclusion would be appropriate for a significance level of 0.05? The mean of the sample is 738.44 and the standard deviation is 38.20. Just a clue: although we have the standard deviation of a sample rather than the population, n is more than 30, so you should apply the z-test, not the t-test. Thank you. 30. Lecture #24: Welcome back. So let's answer the question that we took in the previous lecture. To state it again: light bulbs of a certain type are advertised as having an average lifetime of 750 hours. The price of these bulbs is very favorable, so a potential customer has decided to go ahead with the purchase arrangement unless it can be conclusively demonstrated that the true average lifetime is smaller than what is advertised. A random sample of 50 bulbs was selected. What conclusion would be appropriate for a significance level of 0.05? The mean of the sample is 738.44 and the standard deviation is 38.20. So this customer will buy these light bulbs unless it can be conclusively demonstrated that the true average lifetime is smaller than the advertised 750 hours. First of all, we draw: 750 hours, the standard deviation s is 38.20, and the mean of the sample is 738.44. The parameter of interest is the true average lifetime of the light bulbs, where the null hypothesis is that mu is equal to 750 and the alternative is that mu is less than 750, because he said "unless the true average lifetime is smaller than what is advertised." So let's go ahead. First of all, we calculate the test statistic value, which is x-bar minus mu over the standard deviation over the square root of n: z = (738.44 − 750)/(38.20/√50) = −2.14. And your alpha is 0.05, so for this lower-tailed test we state the rejection region as everything to the left of the critical value −1.645. So we draw the rejection line at −1.645 and we place the test statistic. It is outside the acceptance region here; it is in the rejection region. So based on the evidence, we reject the null hypothesis, which means we reject the claim that the average lifetime is 750 hours. Another way we can solve it is to see if the absolute value of the z statistic is less than z critical: 2.14 is definitely not less than 1.645, so that also means we reject the null hypothesis.
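The light-bulb z-test above can be sketched the same way (again assuming Python with SciPy as an illustration, not the course's Minitab workflow), using the summary statistics from the exercise:

```python
from scipy import stats

# Summary statistics from the light-bulb exercise
n = 50          # sample size (> 30, so we use the z-test)
xbar = 738.44   # sample mean lifetime (hours)
s = 38.20       # sample standard deviation
mu0 = 750.0     # advertised mean lifetime
alpha = 0.05

# z statistic: (x-bar - mu0) / (s / sqrt(n))
z_stat = (xbar - mu0) / (s / n ** 0.5)

# Lower-tailed critical value: the rejection region is z < z_crit
z_crit = stats.norm.ppf(alpha)

print(round(z_stat, 2))   # -2.14
print(round(z_crit, 3))   # -1.645
# z_stat < z_crit, so z falls in the rejection region: reject the null hypothesis
```

Since −2.14 lies to the left of −1.645, the statistic is in the rejection region, matching the lecture's conclusion.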
Now, based on the p-value method: we take the area to the left of the statistic, which is the area to the left of −2.14, and it is 0.0162. Since the p-value is less than alpha, we reject the null hypothesis. Another method is the confidence interval, where we take mu, the population mean, plus or minus z critical times the standard deviation over the square root of n. Mu is 750, the population mean; z critical we got from your alpha, the significance level; and the standard deviation over the square root of n. You end up with 758.887 and 741.113. And you see where your mean is at, which is here, 738.44, the mean of the sample that you took of the 50 light bulbs. It is outside the confidence interval. Therefore, you reject the null hypothesis. Thank you.
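The p-value and interval methods above can also be checked numerically (a sketch assuming Python with SciPy; note the interval is built around mu = 750 with the one-tailed critical value 1.645, mirroring the lecture's approach):

```python
from scipy import stats

# Summary statistics from the light-bulb example
n, xbar, s, mu0, alpha = 50, 738.44, 38.20, 750.0, 0.05
se = s / n ** 0.5                      # standard error
z_stat = (xbar - mu0) / se

# p-value for a lower-tailed test: area to the left of the z statistic
p_value = stats.norm.cdf(z_stat)

# Interval around mu0 using the critical value 1.645, as in the lecture
z_crit = stats.norm.ppf(1 - alpha)
lower, upper = mu0 - z_crit * se, mu0 + z_crit * se

print(round(p_value, 4))               # 0.0162, less than alpha: reject H0
print(round(lower, 2), round(upper, 2))  # 741.11 758.89
print(lower <= xbar <= upper)          # False: x-bar is outside, so reject H0
```

Both checks agree with the lecture: the p-value 0.0162 is below 0.05, and the sample mean 738.44 falls outside (741.11, 758.89).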