Statistics Lesson  SPSS  Outliers & Boxplots
Ton Van Voorden, MSc.
3 Lessons (16m)


1. 12 (5:36)
2. 12 (3:33)
3. 12 (7:03)

About This Class
What are outliers, and what are boxplots? How do you use boxplots to find outliers?
Class Projects
Use the attached dataset to try it yourself. More exercise videos will follow in the future.
Transcripts
1. 12: Hello, this is Ton van Voorden, your statistics teacher, and in this video I'm going to explain something about outliers and their effect. So first, what are outliers? Let's look at a simple example with two universities, University 1 and University 2. University 1 has 100 students of 19 years old, and 100 each of 20, 21, 22 and 23. If you calculate the average of this sample, you get x-bar, the average, equal to 21. University 2 also has 500 students, and if you calculate their average you get x-bar equal to 20. But now University 2 has run a special promotion for the elderly, and ten people of 80 years old came and joined the university. So what happened to the mean? It's only 10 people next to the 500 regular students, but those 10 people alone push the average up to 21.176. And if you look closely at the averages, you see that the average age at University 2 is now higher than at University 1.
But what's important is what kind of investigation you are doing. These 80-year-olds are only 10 people, while University 1 has 100 people of 23 years old and no people of 18 years old. That's also really important. So for your investigation, is it important to really look at the elderly, or do you just want to know something about the regular university students, who are younger? For example, if you do an investigation into student housing, taking these 10 people into account will screw up your investigation, because 80-year-olds can't really be counted as part of student housing; I don't expect them to go and sleep in student dorms. You can run an analysis in SPSS on this, and in that analysis you will see this group flagged. In statistics they're called outliers, and in a lot of cases it's better to either lower them or remove them. Those are the two options. In my last example, an investigation into student housing, I would recommend removing them. But suppose you investigate the studying behaviour of all people, younger and older, and your hypothesis is that older people are maybe more serious. Then of course you can include these 80-year-olds, but don't include them as 80-year-olds: change their data to match the oldest regular group. So you would change these data into 10 more people of 22 years old. Changing it this way, you will see that the mean of the group is much lower. I'm just calculating it now for you: the mean becomes just a little bit higher than 20, and that is much closer to what the actual data look like.
So what you learned in this video is a lot of things about outliers. It's important to check for outliers. Sometimes you see in your data that people are 165 years old. Of course, then something went wrong, and you just have to delete those data. But if they are 80 years old, you have to check what kind of investigation you are doing and decide how you will treat the outliers: either you delete them, or you lower them to the highest normal group. In the next videos I'm going to show you how you can do this in SPSS. I'll also explain more about the boxplot and how to use a boxplot to check for outliers within SPSS. Thank you so much for watching this video.
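The arithmetic in this example can be checked with a short Python sketch. The video gives University 2's mean of 20 for the 500 regular students and an oldest regular group of 22, but not the full age breakdown. The 100 students at each age from 18 to 22 used below is an assumption that happens to be consistent with those figures.

```python
from statistics import mean

# University 1: 100 students at each age from 19 to 23 (as stated in the video).
uni1 = [age for age in range(19, 24) for _ in range(100)]

# University 2: ASSUMED breakdown of 100 students at each age from 18 to 22,
# which gives the stated mean of 20 for the 500 regular students.
uni2_regular = [age for age in range(18, 23) for _ in range(100)]

# The ten 80-year-olds who joined after the promotion.
newcomers = [80] * 10
uni2_all = uni2_regular + newcomers

# Option 1: remove the outliers (leaves only the regular students).
uni2_removed = uni2_regular

# Option 2: lower the outliers to the oldest regular group (22 here).
oldest_regular = max(uni2_regular)
uni2_lowered = uni2_regular + [oldest_regular] * len(newcomers)

print("University 1:", mean(uni1))
print("University 2, regular students only:", mean(uni2_removed))
print("University 2, with the 80-year-olds:", round(mean(uni2_all), 3))
print("University 2, outliers lowered to 22:", round(mean(uni2_lowered), 3))
```

Under these assumptions, ten people out of 510 move the mean by more than a full year, while lowering them to 22 keeps them in the sample and leaves the mean close to 20.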
2. 12: Hello, this is Ton van Voorden, your statistics teacher, and in this video I'm going to explain how boxplots work. From all your data you can make a boxplot, and a boxplot looks just like this. In this plot you see a little box, and in this part of the boxplot the middle 50% of the observations fall. You can split the box in half: you see a little dividing line, and this one is the median. This is the first quartile and this is the third quartile. From the box there are two lines coming out. They're called whiskers, and one whisker holds almost 25% of the observations and the other one also almost 25%. It is almost 25% because each 25% also includes some of the outliers. And over here you see outliers. That means they are so far away from the rest of the data that you should reconsider these values.
For example, I just put temperature here, and you see that the median temperature, and probably also the average, is around eight. You see that in 50% of the cases the temperature is between six and 10, and in almost all cases between four and 12. But there are five cases here where that's not so. If a point is still quite close to the whisker you see a little circle, and if you go farther away you see a star. On the other side it's exactly the same. A star means this outlier really, really needs attention. But actually all outliers need attention, and you have to think about what action you are going to take, because many statistical techniques cannot cope well with outliers. See the previous video with my example about ages at universities, and what happened to the average when just a couple of people with a really high age joined.
So what can you do with these points? Either you lower the points, or you delete the points if you think temperatures like these could not exist. But if you think they could exist, most books recommend lowering the points to the level of the whisker. Move the points on this side to here and the points on the other side to there, so they still have quite a good effect, but not such an enormous one. In this case even the warm days will have an influence on your analysis, and the cold ones too, but not too much, and that way your analysis will work out well. In the next video I'm going to show you exactly how to do this in SPSS. So stay tuned for more information. Thank you so much for watching this video.
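The parts of a boxplot described in this video can be computed by hand. The sketch below is a minimal illustration in Python, not SPSS's own procedure. The temperature values are hypothetical, chosen to match the plot described (median 8, box from 6 to 10, whiskers at 4 and 12, five flagged points). It assumes the common convention that points more than 1.5 box lengths from the box edge are outliers (circles) and points more than 3 box lengths away are extremes (stars). The quartile method is also an assumption, and SPSS may place the box edges slightly differently on other data.

```python
from statistics import quantiles

# Hypothetical daily temperatures matching the plot described in the video.
temperatures = [-8, -1, 4, 5, 5, 6, 7, 7, 7, 8, 8, 8,
                9, 9, 10, 10, 11, 12, 17, 18, 23]


def boxplot_summary(values):
    """Quartiles, box length (IQR) and whisker ends for a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers end at the most extreme observations within 1.5 box lengths.
    inside = [v for v in values if q1 - 1.5 * iqr <= v <= q3 + 1.5 * iqr]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "low_whisker": min(inside), "high_whisker": max(inside)}


def classify(value, s):
    """Label one value as 'normal', 'outlier' (circle) or 'extreme' (star)."""
    if value < s["q1"] - 3 * s["iqr"] or value > s["q3"] + 3 * s["iqr"]:
        return "extreme"
    if value < s["low_whisker"] or value > s["high_whisker"]:
        return "outlier"
    return "normal"


def cap_to_whiskers(values, s):
    """Move every point beyond a whisker onto that whisker."""
    return [min(max(v, s["low_whisker"]), s["high_whisker"]) for v in values]


summary = boxplot_summary(temperatures)
flagged = {v: classify(v, summary) for v in temperatures
           if classify(v, summary) != "normal"}
capped = cap_to_whiskers(temperatures, summary)

print(summary)
print(flagged)
print(min(capped), max(capped))
```

Capping keeps all 21 observations in the data set, so the very cold and very warm days still count, but none of them sits further out than the whisker.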
3. 12: Hello, this is Ton van Voorden, your statistics teacher, and in this video I'm going to explain something about outliers. In the last videos we checked the frequencies for values that are objectively wrong. For example, there's a distance and the distance is minus five, which would be impossible, so we removed those. In this video I'm going to show you how you can let SPSS check these things for you. First, you still have to do the frequencies, but sometimes data are quite far away from the normal range of the data. SPSS can tell you which data those are, and I'm going to explain how to get this result within SPSS.
So here we have SPSS. Instead of Analyze, this time we go to Graphs, Legacy Dialogs, and then Boxplot. Over here, Simple is already selected, and you have to select the lower option, Summaries of separate variables. Normally it's set to groups; we use the separate variables. Over here you see I have already selected all the scale variables, and all the scale variables go into the Boxes Represent box, because this works only for scale variables. Just press OK, and then I get my output over here.
What you see, for example, if you look at the number of exams, is that the lowest number of exams people can have is zero and the highest is 12, but the normal amount is between zero and five. So there's one thing: this value of 12, which you see over here, is a really high value. And if you think about it yourself, you also see a star. You can see a little circle or a star, and a star means it's an outlier that is really far out, and you probably have to do something about it. Now you can do two things. You can call the person and ask, "Did you really have 12 exams these days?" But most of the time you don't do that; you just delete it, certainly in a case like this, where the value is really too far away. And here you see 136. If I go to case 136 in this list, I see a 12 at number of exams. In this case I would believe it, but it's always your own judgment. What you can also do, instead of deleting it, if you think it's a correct value: outliers are in general not that good for your analysis in SPSS, especially not if it's a star, so you can decide that this person has too big an influence. If he really had 12 exams, let's give him the same influence as another person with five exams, which is also already really a lot, in my opinion. So you can either delete it or change it into a five. If you change it into a five, you treat this person as someone who just had a lot of exams. You should delete it if you think it is really wrong: maybe he had zero exams and just filled in some strange number, and the data are corrupted. In this case I'll just leave it at a five.
If I look at the next variable, I see "met a nice person", going from 1 to 7. All the normal values are between the lower whisker and the upper whisker, in this case 1 and 7. And you see that there are one, two, three, four outliers with a nine. I think it's quite possible to give meeting a nice person a nine, and nine is not that much more than seven. So in this case I will just leave it at this number. For wanting to eat a burger and for the grade of the first exam, you see over here that the data are okay. And if you look at the grade of the second exam, you see that all the grades are between a five and a 10, and you see over here that some people, three people, had a one. In this case too, you can decide what you want to do with it. Some people will say: these people did not study a lot, so they got a one, which could be a valid value. Other people will say: omit them. Suppose you do an investigation into emotional uncertainty, and there are some lazy people who have simply dropped out and don't study anymore. They got the one and have already left the study, so they're not representative for your real study case. Then you could delete them. In another case you can just leave the one as a one, or you can decide to change the one into a five, so that the impact of these data on your data set is not too big, because a lot of techniques within statistics cannot withstand outliers.
Sometimes, if you make this kind of graph, you get a graph that's impossible to read. That can easily happen. For example, if I add the average minutes of sport, which I should have added in the first place, you'll see that you can read the average minutes of sport quite well, but now the other boxes become too small. In this case I would recommend first drawing your conclusion about the average minutes of sport, then as a second step removing that one from your list, and then saying something about the other five, because now they're easier to read. Thank you so much for watching this video.
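The decisions made in this video (look up the case number, then keep, delete, or change the value) can be sketched outside SPSS as well. The Python below is only an illustration under stated assumptions. The case numbers and exam counts are hypothetical, apart from case 136 with 12 exams and the replacement value of five, which come from the video. The 1.5 box-length rule and the quartile method are assumptions too, and `flag_cases` and `treat` are names made up for this sketch.

```python
from statistics import quantiles

# Hypothetical "number of exams" column, keyed by case number.
# Only case 136 with 12 exams is taken from the video.
exams = {130: 0, 131: 1, 132: 2, 133: 2, 134: 3, 135: 3,
         136: 12, 137: 4, 138: 5, 139: 1, 140: 0}


def flag_cases(column):
    """Return {case_number: value} for values beyond 1.5 box lengths."""
    q1, _, q3 = quantiles(list(column.values()), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {case: v for case, v in column.items() if v < low or v > high}


def treat(column, flagged, action, new_value=None):
    """Apply one of the three options from the video to the flagged cases."""
    if action == "keep":      # the value is plausible: leave it alone
        return dict(column)
    if action == "delete":    # the value is judged to be wrong: drop the case
        return {c: v for c, v in column.items() if c not in flagged}
    if action == "recode":    # plausible but too influential: lower it
        return {c: (new_value if c in flagged else v)
                for c, v in column.items()}
    raise ValueError(f"unknown action: {action}")


flagged = flag_cases(exams)
print("Flagged cases:", flagged)

# The choice made in the video for this variable: change the 12 into a 5.
recoded = treat(exams, flagged, "recode", new_value=5)
print("Case 136 after recoding:", recoded[136])

# The alternative if the value is judged to be an entry error.
deleted = treat(exams, flagged, "delete")
print("Cases left after deleting:", len(deleted))
```

The code only finds the candidates. Which of the three actions is right for a variable remains a judgment about the investigation, as the video stresses.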