Transcripts
1. Introduction to Statistical Functions: okay on the section for statistical functions. This section is all about operations that you perform on numerical data. So we'll start with our basic stats functions. These are things like Max men. Average counts some ah, standard deviation, variants, percentiles, basic stuff. They're all very straightforward. They were going to go into some more interesting ones, small and large functions that rank and the percent rank functions. We'll talk about how to build random number generators with Rand and ran between Go into some product, which is, Ah, pretty powerful calculation, tool talk count. If some if and average if and then finish up with count ifs, some ifs and average ifs would basically do the same thing, except they allow for additional criteria. Ah, and then last, but not least, will build out a demo showing how to build a dashboard just using some If in Countess functions eso to files for download. As usual, we've got our PdF called Excel for analysts statistical functions and the Excel file called Section four statistical functions. So go ahead and open those up and let's dive in
2. Basic Statistical Functions: All right, let's kick things off with some basic stats Functions use the functions like count, average median mode, maximum percentile, etcetera. Rather than dive into each one. Specifically, I just want to show them all here. This isn't a comprehensive list. This is kind of a list of the ones that are more commonly used. But the syntax is pretty straightforward. Pretty simple to understand, really. The only component and most of these formulas is your reference. All right, So, for instance, if I want to find the median of data from a two through a 20 that's exactly how I'd write the formula, that the only difference is for ones like percentile, you have an additional component of the formula to specify which percentile you want to return. So if I'm looking for the 25th percentile within a given data range had put a 0.25 I'm looking for the 75th and put a 0.75 Standard deviation is STD Dev Variance is V a r. I'm. So let's jump into excel and see what we're working with for this section. I've got the section for statistical functions, Workbook open as you can see there three tabs here right now. I'm in the salary data time, which is a raw data. And as you can see, we've got a five years worth of Major League Baseball salary data. So there's a year column, team, league player and salary. You'll also see to blank columns F N G for rank and percent. Rank will be covering those in the next lecture on the second tab called Basic Stats Functions. I've just got some blank cells where we're gonna be calculating some of the formulas we just walked through. We'll do that in a minute. And then last but not least, first tab his team seller dashboard, This is gonna be our section wrap up demo. We're gonna use data validation, count ifs and some ifs, toe aggregate, and visualize some of this raw salary data. So fun stuff there. Let's jump into basic stats, functions and just populate these cells. So I'm going to start with sample size, which basically requires account of my my data set. So I'm gonna type equals, count, open the parenthesis. And then here's just worry and certain data, right? So I'm going to jump over to the salary data tab and select the entire column E, which is my salary column. Close the parenthesis and hit Enter and it jumps back evaluates the Formula. One thing to point out is that when you're linking data or referring to data across tabs, Excel will automatically add the tab labels there. So that's why it says salary data in single quotes, followed by the Reference E. Um, so I don't have to type many of that as you click through. It will automatically populate for you. So let's do the average here. It's just gonna be equals average of salary data. Call me average salary just over $3.5 million which is totally nuts. Let's look at meeting here, which is just the median function. So as you can see, it's a very similar pattern here, so I'm going to start moving a little bit quicker. Mode is just mode, and mode is one of those functions that takes quite a while that calculate. So I'm kind of pausing here to let excel do its thing. Usually takes 5 to 10 seconds. There you go. Um, not quite sure why it takes so much longer than the others. Essentially the mode is just which number appears most often in our data set. In this case, we see that it's $1 million. Let's take a look at what the max salary is. So for maximum, I could just right, Max 33 million, Which believed was Alex Rodriguez. Minimum is at my end, 400,000. Which was that and will be minimum salary in 2010. So now we're getting into 75th percentile. So I'm gonna type out percentile, select my array, just like normal. But now I'm gonna add a comma. You see, this little helper box popped up, and it's looking for the K value. That's just what percentile I'm looking to return. In this case, it's 75th. Gonna put 0.75 Close it off. There you go. Now to do 25th percentile. Let's save ourselves a few seconds. Just copy the formula above, paste it down, change the 722 and we've got a 25th percentile, which is just under 500,000. Standard deviation is STD, Dev, Uh, this is gonna basically describe the spread of our data around the mean and then variants is basically just calculated as standard deviations squared, which is gonna give us a huge, huge number. Not particularly meaningful in this context. But that's the variance function. Um, these other 2/10 highest and 100 lowest salary we're gonna cover in the next lecture. But there you go. Basic stats functions.
3. SMALL/LARGE & RANK/PERCENTRANK: All right, so we're gonna talk about a few more functions now. We're in talk rank, small, large and percent rank. So I'm gonna start with the small and large functions which return the 10th smallest or largest values within an array formula. Syntax wise, there's two components. You have your data array and then the rank of the number you're trying to return. So in this case, pyrite large a two through a eight comma two, it's gonna look for the second largest number within the array A two through a eight. It looks at that array finds that 90 is the second largest number. It returns 90 same story with small. It's just kind of flipped. So if I write small a two through a eight comma three, it's gonna look for the third smallest number in the array, which in this case is 50 outrank works in a similar way. It kind of gives you a similar insight again. Two components of the formula. You start with kind of the cell of interest or the value that you're trying to find the rank of, followed by the array in which that value lives. So by writing rank 82 comma A to eight. We're looking for the rank of the value in a to in the context of the array for May 2 through a eight. So that returned a two because the value any two, which is 90 is the second highest value in that array. Um, and it kind of follows. From there. You'll note that rank of a five within a 238 equals one, which indicates that it's the highest value in the array and the rank of a three using a seven, which indicates that it's the lowest value. Um, next, the percent rank. It works just like rank, except it returns the rank of a value as a percentage of a given array or data set. So this formula syntax is a little bit different, but again, pretty simple. Just two components. It starts with the array. So what range of data are you looking at? And then the second piece is just noted with an X, and that's which value within the range do you care about. And so it's gonna tell me what the percent rank of that value X is within the context of the given array. So if I wrote a percent rank formula and call him be here in this little screenshot, uhm, I'm going to see 100% for the value. That's the highest in the array. So in this case it's a 14 which is 6827. You can see it's the highest value here, which corresponds to a 100% percent rank. Same thing. The lowest value will return a 0% percent rank. So let's jump into excel and, uh, play with dysfunctions. So first back to the Basic Stats Functions tab where we left off. So for the 10th highest salary I'm going to use a large function here so equals large array . It's going to be the salary column and then I want to be 10th largest seven right? 10 $24,642,857 is the 10th highest salary in my data set. Same deal to find the 100th lowest salary. Use this small function my rig and it's gonna be calling E and then type 100. So 405,000 which is just above minimum, is the 1/100 lowest value So there you go. There's small and large. Let's jump to the salary data Tabin fill in columns F and G which are calculate rank and percent rank. So what I want to do is find the rank of each associated salary as part of the larger range . So for this one, I'm gonna start with equals rank. So it's asking for the number which in this case is sell eat two and then the reference, which is the full array in which it lives. So here's an important distinction. First off, I'm going to use a shortcut that I taught you in the first section. We're gonna select E to hold control shift and arrow down to give me the entire contiguous data array. The other important piece here that I need to do is fix this reference. So without touching anything, I'm gonna press f four. And as you can see, it fixed that entire array from E two through E 4135. But I'm leaving the E two as a relative reference and you'll see why in just a minute I close it out. I'm done with my rank formula and hit enter. This is telling me that at this first player I think it's Bobby Abreu in 2010 as the 4014th highest salary in the data set. So now, because I set my references and thought about them as I was writing the formula, I should be able to just double click to apply this formula down. And as you can see, the only thing changing in my formula is the reference to my value. So that's what I want to be changing. But I don't want the entire array to be shifting down or changing as I move this formula. So this is working properly. Percent rank. I'm gonna do a really similar thing here, and type equals percent rank. It kind of flips. I'm gonna start with the array clicky too. Control shift Arab down. Fix it with that four comma to the X, which is the valley that I care about. Just gonna page over to eat, too. Close it off, press enter and apply it down. So there you go. If you look at percent of drank and compare it against salaries, you'll see that it makes sense of the higher percentage ranks 86.3%. For instance, um, it's tied to salary of 8.25 million. Um, whereas the lowest salaries 400,000. Ah, equate to a percent rank of zero. So there you go. Small, large rank and percent rank.
4. RAND() & RANDBETWEEN: okay, Time to talk about some of my favorite functions and excel the rand and rand between functions. So these are essentially random number generators, and they work in slightly different ways. The first is the rand function, and when you write it, you literally just open and close the parentheses without anything in between. Just like you see here. There are a few functions in Excel like today and now, which will cover in the date and time section where you write it this way. There's there's actually no components to the formula. Besides the formula name itself on what the Rand function does is it returns a random number between zero and one out to, I believe 15 digits ran between is similar, except that you can specify the range that you want to determine on debt will return whole numbers between your range. So if I wrote, ran between zero common 100 filled out this array from a 136 here's an example of what I might see. Keep in mind, these are volatile functions, meaning that they'll re calculate or re randomize with any workbook change. So just be aware I'm gonna jump into just a blank excel workbook to really quickly show you what these look like and sell a one gonna do a rand function So just r a nd open closed parentheses Enter as well sealed. Drag this out, drag it over. And it's just giving me a Siris of random numbers between zero and one. And if I go to the formulas tab, click calculate. Now you'll see they consistently change. I'll just change instead of Rand, I'll do rand between and let me just say zero to ah 100 and then close the princess. Once I apply that to the same range, you'll see that I'm dealing with a similar random number generator. Except in this case, all my numbers are whole numbers between zero and 100. So there you go. Two examples of random number generators in Excel using Rant and Randy
5. The SUMPRODUCT Function: all right. I want to quickly cover formula here, called some product, and what some product formula does is multiply corresponding cells from multiple arrays and return the some of the products. That probably sounds like complete gibberish. It's a little complicated to describe in words, but realistically, it's not that bad once you understand what it's doing. So first, the syntax is actually quite simple. You just list out the arrays that you want to take some product of. Typically, these air entire columns note that all your rays have to have the same dimensions here. But to look at an example, let's say you've got a very simple data set here, just three products, along with quantities and prices. If I want to take the total revenue for all my data, all my products I could go row by row and say it's calculate the revenue for Apple's two times 50 cents revenue for bananas four times a dollar, oranges three times 80 cents and then add those product level revenues to get my total of $7.40. Some product is the same thing just in one step. So if I take the some product of column B quantity and column C price. It's doing that exact same process. It's calculating products row by row and then summing the results to get the same value. So, uh, some product is often used with filters, so that was a simple example. But sometimes you might want to say, Take the some products, but only in cases where, uh, this column takes on a certain value. To do that, you can just replace your array reference with the array reference, followed by an equal sign followed by what you want to filter on. So in this case, I want to filter where call him a only equal shots, Um, and you can write it just like the normal formula. At the only difference. And this is a weird little nuance with Excel is that if you don't have a filter, you separate your raise with comments. If you do have a filter, you separate your raise with multiplication signs. Um, and you can add as many filters as you want here in this last example filtering. We're taking a some product Onley where, uh, column Aiquile Shaw's and column B equals apple. And then when those two are true. It's also multiplying. Column C and D. So one kind of peak under the hood to understand how it's really working is to compare how you see the data versus how Excel is interpreting the data. So any time you've got Filter here, it's essentially, um, a logical test that Excel evaluates behind the scenes. So when I say eight to Through a 17 equals Shaw's, that gets translated on the back end into a series of zeros and ones where it zero if my testes false. So if the store does not equal shots and a one if my test is true, same deal with Colin B s er five b equals Apple. It's gonna look like a one every time the product is apple a zero. Otherwise and again, you don't actually see these ones and zeros. This is just kind of how Excel is doing the math behind the scenes. Um, so if we were to have a some product formula like this, which is taken to some product of a times B times C times D, as you can see, Onley rows that satisfy all conditions or all filters will produce a non zero sum because as soon as one of my filters fails, I'm going to get a zero in that row, which is gonna make the entire row zero out or be excluded from the sun. So basically, that's how the filters working behind the scenes, and that's some product in a nutshell.
6. COUNTIFS/SUMIFS/AVERAGEIFS: All right, now we're getting into the really good stuff. Count if some if average If this is where we're starting to build formulas that really opened the door toe, let us create some really cool, powerful, dynamic things in Excel. So right now we're looking at count. If some if and average if and I'm going to skip this slide and I'll tell you why, it's because each of those functions also have a plural ized version s account ifs, some ifs and average ifs within s at the end. The only difference is that these functions allow you to do the exact same thing, but include multiple criteria instead of just one. And the syntax is kind of different. So rather than teaching you the syntax for the singular versions and then immediately teaching you the difference in tax for the plural ized versions, let's just focus on count ifs, summits and average ifs, because you can use these versions of the formulas even if you're just using one criteria. But they opened the door for you to include additional ones if you choose. So these air kind of the ideal wants to use actually haven't used. Ah singular version in God. I don't even know how long. Um, so let's focus on these for now. Um, basically, what these do is they allow you to evaluate account summer average based on multiple conditions or criteria. Eso sin tax wise count if basically consists of a combination of criteria, ranges and criteria so you'll see criteria range one criteria. One criteria arranged to criteria to in this extends for as many criterias want to include . So let's say our criteria that we want to a factor into this count is ah, the campaign name. So I want to take the count where the campaign name equals Google, for instance, in that case, the criteria range is gonna be Whichever column contains my campaign names and the criteria . One criteria itself is going to be Google in quotes as a string, um, some ifs an average of work the same way. There's just one additional piece, and that's right up front. You lead with either your some range or your average range, so some range is the range of actual values that you want to take some of followed by all of your criteria, ranges and individual criteria. So I know it's a little bit confusing at first, but I promise, once you read a few of these, they'll start to become really, really natural. So it's just kind of slowly take some time to walk through these three examples listed here . Um, first off Countess. So again we're just listing the criteria ranges, followed by the criteria. So my first criteria ranges Column B I'm just looking at 13 rows of data. So be to thirties be 13 which is different tactics either Search your display this case on Lee take the count where be equal search and where d, which is. My CLICKS column is greater than 200 and you're probably wondering, Hey, that's a, uh, new miracle filter. Why you putting quotes around? It's not a string. And ah, I wish I could tell you why that's the case. It's just the case when you use greater than or less than in a summit for accounted for an average if, for some crazy reason you got it surrounded in quotes. Um, trust me. I tried 1000 other ways, kept getting errors, didn't know why, and then realized this was how it works. So again another one of those little nuances that you learn once and then you never forget it. So that's count if it returns the result of three. Because there were three instances where be equal search and click volume is greater than 200. So some ifs. Now we need my some range first, remember? So I'm indicating Artemus some ifs column D This is what I want to some where column A equals February and column B equals display. So it's gonna look at all the roads where both those cases air true. And then take the some of the clicks column, which is my some range same exact thing with average ifs. So the average range is gonna be calling demon clicks again on average. Column D when column A equals January and column C equals MSN. So there you go. That's count, ifs, some ifs and average ifs. In the next lecture, we're gonna talk about putting these into practice to build our first dynamic dashboard, so get pumped
7. Project Shocase: Building a Basic Dashboard: All right, let's build ourselves a dashboard. We're gonna use count ifs here, renews some ifs here. We're also gonna build in some data validation which we covered in the first section on. Basically, the idea here is to take the raw data that we have on and create a dynamic little dashboard along with the visual to kind of give you an idea of how these functions can be used as a really cool, actual analytical tool. So I'm on the team salary dashboard tab of the section for statistical functions Excel, workbook and ah, first step that I'm going to do is to turn cell C two into a data validation. So and basically, I wanted to allow a list of all the unique values for each team. Um and that's it. So first thing I need is that source list to get that, I'm gonna click the salary data tab. Starting be to control shift arrow down and hit control C to copy. No, I'm just going to create a new tab here and paste that data. This is kind of just a scrap sheet on without de selecting it. I'm just gonna go to the data tab hit the removed duplicate buttons. OK, that will tell me that 33 unique values remained, which are each of the individual team names. So Okay, that's gonna become the source list for my data. Validation. Drop down. So it's jump back to team salary. Dashboard back to cell C two, go into data data validation. Click the data validation button and then in the allow drop down, we're gonna choose list. And in this source section, choose the data selector there, Jump to sheet too. Select a one control shift Arab down to a 33. Impress that same button again. Toe lock it in. It's now free press. OK, we've got a drop down list of each team, so that's step one. Step two is actually populating this array of data using count if in a summit formula to pull in the correct data based on the team and the year. So as you can tell, they're two criteria that we're dealing with here, which means we have to use the some ifs and count ifs versions of the functions. So let's get started for total salary. Since we're gonna be summing an actual data range, we need to use this. Some ifs here, so I'm going to start with some ifs formula open the princess. And remember, my first component is the sum range. So what am I taking some of? Well, this is the salary column, so I'm gonna be taking the sum of salary. That data lives in the salary data sheet. In column E select that whole column gonna press f four to lock it in, just in case this formula moves from column to column. Um, I want always calculate salary in this case, Gonna hit the comma to get to my criteria range one. So I have two criteria that I'm working with here team and year doesn't matter which order I go in. Uh, I could set my team criteria first, then year or vice versa. So in this case, let's just do team first. So criteria range One is the range in which my first criteria values live. So I'm doing team first. That's gonna be column beef. Gonna lock it in with the F four shortcut. And then when comma over the next piece that they need is the actual criteria that I'm looking to pull out of range one or looking to identify within range one. Um and that value is going to be found in cell C two. So I'm gonna lock that again. That means whatever team name eyes populated in cell C two is gonna be fed into this summits formula. So comma over and I'm on criteria arranged to. So my second criteria is the year. So where did my years live? They live in salary data column A again afford to fix it and comment in my last piece of the function, which is criteria to So how do I know what year I care about in this instance? Go back to the dashboard tab. I care about the year in column B. And in this case, it's the value in B five because I'm dealing with the total salary for 2010. Um, now an important point of distinction here. If I were to lock, be five in completely with you, have four short cut, close the parenthesis and enter the formula. I'm getting the correct salary for 2010 for Arizona. If I drag us down, you'll see that I get the exact same salary every single time The reason that's the case is because if you watch the formula itself as a click down, nothing's changing because every reference is completely fixed. Um, that's almost correct. Except the one thing that I do want to shift is the reference to the year name. It's kind of tough to see there, but if I scroll down and hit F to to kind of dive in, you could see that in every case I'm referring to be five and Onley be five. So if I go back into my original formula and C five, I want the B two b fixed. But I want the five to shift to row 678 and nine as they move this formula down. So I'm just going to remove the fixed reference before the five and press enter again. Now, if I apply this down, it will update, uh properly. So that's my total salary column for the number of players in this case, we don't want to some any values. We just want to take account of the number of players under those same two criteria team and year. So much going to do equals count ifs, criteria range one which is team. So where to my teams live Salary data, Colin, Be fix it. And what's my criteria? Whatever team have selected in cell C two criteria range to is your where to my years Live salary data column A and then criteria to is whatever year that I care about, which isn't be five. And I'm gonna hit F 43 times to cycle through until I fixed the B but not the five. Close the parenthesis hit. Enter. And there you go. So this is telling me that Arizona has 26 players in the data sample for 2010. 27 for 2011 and 12 and so on and so forth. So for average salary, um, you could use an average ifs here. What I'm gonna do since I have the total salary and I have the number of players just gonna take the simple route and just use simple, uh, division here, So take C five divided by the number of players. So team salary divided by a number of players equals average salary per player. And I'm just gonna apply that down. And if I want to get rid of thes decimals. I can go into the home tab and just hit this decrease decimal button twice. And I've rounded to the nearest dollar. So there you go. This this chart is already tied Teoh data in those ranges so will update automatically. As you cycle through. Now you can see how each team has kind of adjusted their salaries and how the average salaries have changed over time. So there you go. Pretty simple example of how to build a dynamic dashboard and how to use these formulas and more of an analytical way. Um, all right, so that wraps up the statistical function section. Next up, we have look up reference functions, which is one of my favorite sections, so get excited.
8. HOMEWORK: Statistical Functions: All right. Time for section for homework. Go ahead and open up the homework exercises file. Now we're on the third tab. Stats functions. And here we've got a basic data set in columns, A through E. We're looking at product category and skews by location, and we're measuring sales and revenue. So what we're gonna do here's walk through three steps to use functions like some if some ifs and count ifs to pull out specific subsets of this data and populate the cells above. And remember, if you get stuck, you've got the answer file available to you in the course Resource is good luck.