Transcripts
1. Trailer: Im Alle Chen, a solutions architect at CODA in New Type of Doc, that brings words, data and teams together. I have a passion for data and analysis and the tools we use to analyze data by the numbers . I have been an Excel in Google Sheets Power User For 10 plus years, I've taught over 25,000 students on Scotia through my Excel classes, and I've also taught in person Excel classes to over 5000 MBA students around the United States. I also host my own podcast about excel on data analysis called Dear Analyst, and I frequently blawg about excel in data analysis. At the key cuts dot com and also on LinkedIn. I'll be teaching intermediate formula skills, which are commonly use for cleaning up a large data sets. This scale is really helpful because we are inundated with data at work and elsewhere, and I want to empower you to be able to clean the data yourself without the help of a did engineer or in Excel Booth. 1st 2/3 of this class will be spent on the formulas you use for cleaning up a data set, and we'll be diving into some more advanced nested formulas with if statements and other conditional operators in the latter third of this class will be creating a scatter plot chart and doing some basic linear regression and creating some regression statistics. I'm really excited about teaching this class because you can only analyse and present your thoughts about the data. If the data set is clean, really excited in this class to be able to get into some of the more intermediate formulas that I use for cleaning data in Excel or Google sheets. So please join this class, gain some new skills for cleaning data and be empowered to clean and analyze data in Excel .
2. Class Overview + Project: for each lesson in this class, you'll be jumping right into excel and you'll watch me apply the skill from the lesson right inside the Excel interface. Each lesson will be covering an intermediate formula, and you'll watch the formulas bill on top of each other. Lesson after lesson For the purposes of cleaning up a descent, you should feel comfortable navigating around Excel, writing basic formulas and creating simple charts. If you don't have the skills, I would highly recommend taking my fundamentals. Excel Class on Skill Share It's called Excel for the Real World 12 and three. The class project involves cleaning up a data sets of hotel bookings for two hotels based in Portugal. One hotel is a city hotel, and one hotel is kind of like a resort spa tip of hotel. The data set is around 50,000 rows, so relatively large and you'll take the techniques and let skills from this class and clean up the data set to do some exploratory data analysis. After doing some of the cleaning up of the data, such as getting the proper dates and we're showing the proper hotel categories, you'll build some summary tables to summarize the data about hotels. Finally, you'll attempt to forecast certain metrics, like when the hotels should expect cancellations based on the lead time for a customer making a booking. It's a really fun project, and I really hope that you'll after taking this class, that you'll jump in and start doing some of this exploratory data analysis that I just mentioned. So without further ado, let's get started with the first lesson.
3. Named Ranges & Offsets: lesson three you'll be learning about Named, arranges and why they make you more productive in Excel will also explore the offset formula and why you might use this formula in conjunction with a name range in exile. Be setting a name range for a range of cells to more easily reference those cells. You do a similar thing with the offset function, and we'll discuss the differences between those two for making her your life more productive in Excel. So I have this table of TV characters, the show their from and then also just a numerical rating for that TV character. And before I get into the first step of this lesson, let's just do a basic V look up to get Steve Oracle's rating from the table. And if you don't know how to use if you look up, please take my first Excel classic self for the real World Number one, and you could get a quick review on how to use. We look up so I'm gonna start writing a V look up left parentheses, a reference Steve Urkel here, which is written in the cell comma. I'm gonna head over to my table and sucked everything here. Making an absolute reference is in case comma three to pull back 1/3 column and then zero for false. And I get Steve Oracle's reading, which is right there. Now let's delete this value for now. And, well, poor do that instead of referencing the table will like this with regular cell references. Let's make this easier for ourselves by creating a named range, and we're gonna do that by we're gonna call the name range characters. So let's go to insert. And this is in Mac Excel on Windows. The menus will be different name to find name, and I'm going to start writing My name of my range is going to be called characters and then in the range of cells and click on this little icon here and then select everything from L five to end 23 and then hit this symbol here again. It's also hit, okay, and there we go. We have center name range. If I go back to inserts name, find name, you'll see a character's name range Now. Now, instead of doing that traditional cell reference for the view. Look up. I'm gonna reference that range. You look up Steve Urkel comma. And if I start typing characters, you'll see that it will show up as well those auto fill options. And if I hit Tab, you notice how it highlights this range of TV shows and characters and the ratings, so I don't have to navigate to the cells and the sore references. I could just type in characters comma three, comma zero like we did before. And now I can get Steve Oracle's rating, which is just by using the simple name range. So this really is useful when you have a static list of data that you want a constant reference in your formulas. Now the next one is a little more advanced. What if we are constantly adding to this table below? This name range only references up to Row 23. But if I add new Rose, this character's name ranging won't pick up those new rose. So that's where we have to use a combination of the offset and the count a function to build what we call a dynamic name named Range just to review. The offset function is basically kind of shift around what cell you're looking at so if I write equals offset down here and a pick Will Smith as a reference, which is l five. And I say comma three comma zero. What this tells me to do is look at cell five, Move down three rows, which is 1230 shop just jazz and move over. Move over a zero columns. So stay in the column that my reference cell is in silicate jazz. If I change this to be three, that I'm going to go from Will Smith down toe 123 to jazz and then one two. Okay. Should be to Sorry about that. It will be jazz zero Freshpet spellers, one rating us to so hit that I'll get the rating for jazz. So there to additional options that we're gonna use for our dynamic range Name. Really? We just height and with that tells us from the reference cell, How many rows of data do you wanna pull back? Which is height and how many columns don't pull back, which is with So we're gonna do that with our new name range. So if I go to insert name to find name, I'm now going to add a new one. Call the dynamic characters. So it's that right here, dynamic underscore characters. And in the cell reference, this is gonna be a little more advance. I'm going to delete this junk right here and right, Offset Left parentheses. My reference cells slow. Gonna be l five, which is Will Smith. Now I'm gonna put a comma. Zero comma zero. Let's see what that does that tells Excel. Don't move the cell around from l five. Don't move down. Move it down. Zeros and zero columns. So put a comma again. Remember, our last two parameters for his formula is height or height and with so high it is how many cells you want to pull back. And since we want to make this I dynamic range, I'm gonna write Count it. Just trust me on this one. It's a nested inside offset for makin county, and we're stuck everything from l five and you can select as many empty cells below. I'm just gonna go down Teoh l 30 and you can pick as many as you want You can. Maybe something entire column. In some cases, if you're tables at the very top, I'm just going to It's like two l 30 and then I don't I want to say I want to pull back the the with should be three columns of data. So 123 Um, if I leave this as one, that's only gonna pull back calm. L so want three columns of data. And just to recap, what this does is it says go to the Will Smith cell. Don't move around from that cell. Count the number count a counts number of cells that are non empty or have data in them And then comma three just goes back gives us back three columns of data. Now let's just close this out. Hit OK And this formula might take some time getting used to. So definitely pause the video and watch this formula being written out to make sure that you get it down directly. Okay? And you notice how I have a new character here called Miles Crane and I want to get that characters rating and if I say equals V look up. Now it's crane comma. I'm gonna start typing in my new dynamic range dynamic characters and, you know, it shows up right there. Tab and references everything from L five to l 30 comma three comes areas, just like we did before. I noticed. I get an error of n Slash a. Let's put in a name. It's already on this list a mixture. Our new vehicle v look of formula with the dynamic characters. Reference works. Let's type in Carl Winslow. Make sure we get that reading. Yep, seven Is Carl's rating that sort of Cory Matthews Really quick, All rights, working correctly. But if I want to pull back Niles Crane, which is one of my one of my favorite TV show characters from a TV show back in the 90 years called Fraser. Uh, not sure if you guys have seen before, but definitely check it out. So if I start adding new Rosa data to this list, start typing in Niles Crane. Fraser. You should have seen already that the four months are changing. This picks up new data that's being added this list. So the key difference between this range and this dynamic range this name bridges that this name range is really useful. If you have a static table that doesn't change, no data is being added, and you can just constant reference that table in your Excel files. This one allows for new data to be at your table. If let's say every week you're getting new data for stats or for reporting, this may be kind of useful to use for academic ranges. So just to, uh, a few other tips is that offset is typically used. We looked at the dynamic range. The offset function is typically used for financial models, but we're using, in this case as a way to build a dynamic range. And if your data is growing a lot, this is one way to capture all that extra data. Another way is to use Excel tables, not dis attritional table. But there is actually a Excel menu for tables. I believe it's an insert somewhere right here. You can use his table option, but we're not going to talk about that in this in this class, and a few key takeaways is that name ranges again should be applied to died to static data and your reference in the same thing all the time. A dynamic range is applied to, uh, more dynamic data, has an implies, and so definitely try to use both of these in your Excel files that your using at work or at home
4. Date-Related Formulas: in less than four, you'll be learning how to use date related formulas such as year, month and day to quickly label time stamps and categorize your data by time. Have a list of advertising data. Looks like if you just kind of scanned the table will be using this same data set for the next few lessons. So it's important that you kind of get used to looking at ah in understanding the use case for the say that we have a time Sam column age or income and clicked on ads. So this is going to be data you might get from Google AdWords or Facebook ads or linked in ads were Twitter ads and you want. And it basically gives you an idea of what time a given person click on the add their income, what city they click, the add from and also the country and whether or not the person was male or not, and will be doing some Excel formulas here to write simple if statements to see if a given ad was clicked within a certain time that we specified. So for the first step, we're going to create formulas for the year month and day for the data to the right. Right here Is that right here in columns are s anti. We have time stem here, which is month, day year, followed by the time and the promise to myself here. Sometimes you want to sort data not by the time stamp, which is typically when something happens in database or in your inner business. You want to sort of by year, day or month by those specific values. So we're going to start creating this formless from in these columns. So let's start with by saying, Let's create the rear formula. And as you can imagine, it's a very simple formula. Teoh build out. We're just going to say equals year and you'll see a year function tab impressed left a review times reference the time stamp he referenda sees hit, enter and you can see 2016. Now, instead of feelings from the down, I'm going to go in a month and with same right The month. The formula for the month, which is equals month left parentheses over time stamp again. Rep. Parentheses hit, enter and you get the month, which is march and then day is this day left parentheses. Head over to timestamp, right parentheses, and there we have it. So we built these three formulas out, and it seems to work. And let's select these cells and let's fill it down to the rest of our data, impressing command D to fill the formula down. This is spot check the used to make sure that the formal Zahra correct tell in 16 here months, January and the day is 28th. Okay, so we have this data. We have the year, month and day built out for a table. And let's see it was the next step. Create an it firm like to see if the ad was clicked on in Q one, which is generally through March. And if you don't know how to use it formulas yet, please check out Excel for the real World number one, which is my first class about fundamentals. And let's go ahead and build out this if formula. So when we look at data from Q one, Q. One applies the generator march, and that's any month that falls within the range of 12 or three because one is generated to a savory three is March. So in Q one, that's gonna be the formula we're gonna focus on right now. So let's start reading. The formula equals if and we want to focus on the month and the way the test I'm going to say here is we want to capture anything that's 12 or three. So what's one condition weaken Due for capturing all those values? I can say, if S five in this case is less than or equal to three. And now why does that work? Because if the month is three, it's going to capture because of less than or equal to. If it's one or two than this, less than will capture that as well. So after I say if S five is less than or equal to three, I can now say Comma, I'm just going to say, Is it if it's yes, then we'll say yes. If it's listen. If it's Q three, if it's not, then we'll say no and then close it out. And now we have this from leftward seeing if it's in Q one, and I'm gonna drag this from let down as well Command E. And let's just spot check this to make sure we have it right. So June is not a Q one. It's not less than or equal three. So it's no. One is. Yes, seven is no. So it looks like we did our job. The last step is sort of data to see all ads that were clicked on in Cuban, regardless of year. So this is where we can do some more interesting things on her data to see bit to, basically analyzer data. There's multiple ways of doing this, and I'm going to first. Let's first filter by the the Clicks for in Q. One. Now I'm doing a bunch of command. I'm doing some keyboard shortcuts here, some sucking on my data pressing command shift F to create a filter. And it's different. Excel on the Windows Boat on the Mac Control Command Shift F creative filter, and I can select here and say I want to look at only the yeses, and I can quickly see just filter to the ads that were clicked with in Q one. By putting this to yes, I'm gonna undo that. The steps those the step is to sort. The data by ads were clicked in Q one. And so what I can do here is against slick my data do command shift our to sort based on, ah a month and do small, so largest. And now you can see that my months now are all my data. My had adds data is sorted by month 12 or three, regardless of the year. They all happen to be intelligent 16. But this is a a better way of looking at data if you want to see how data was clicked by month, even if it's not in chronological order in the TIMESTAMP column. So just some tips here, dates and times, like in this time, some calm are very important to see customer and product usage data. You want to pay really close attention to the format of the time stamp because sometimes it may not be this easy to get out the month, year and day in your data. Sometimes there's time zone differences, and there others these cases where you want to find the differences between two time stamps . I really liked ability to create these month, days and years because it allows me if you've a lot of data, this is a small set. If you have a lot of data, you can compare data across months or quarters instead of just chronological order. And that's helpful. If you want to compare, you know, the Christmas season from last year to Christmas season from two years before. You just want to focus on quarter over quarter growth, and this type of data is very useful for putting into a pivot table as well. So just to recap a timestamp shows you when the data was typically entered into the database. This is in our cases when the ad was clicked and these date related formulas can help you simplify when the sale, click or customer was logged into a database.
5. Nested If Formulas: in less than five, you'll be learning what happens when you put if formulas inside each other, also known as nested. If formulas in the previous lesson learned some basic if formulas, and we'll see what happens when you combine these formless with other date related formulas . We learned in the previous Claessens putting this standard if formula in this lesson. But we're going to be in nesting the formula inside other if formulas, to build some really powerful logic. As we clean up and categorize our online advertising data here into more useful data set than we can then put into a pivot table filter sort whatever you want to do. Now we're gonna write some pretty complicated, necessary formulas. And instead of doing instead of writing a very long, necessary formulas, we can actually break this out into multiple columns. Here. We're gonna do this or hear later in this lesson, and by breaking out to multiple columns, will make it easier to debug the formula. But let's start. Let's first start with the hard formula, which is create unnecessary formal, its output. The quarter for the ad clicks. We know that this column time sand contains the time in which the ad was clicked. So it's creative formula to get the quarter. Let's get the quarter of that Timestamp. So we didn't in the last lesson. We just figured out if the timestamp waas Q one or not. But in this us and we're going to say whether or not the time Septus Q. One Q two q three q four. And so it's gonna be a little more complicated than just doing one simple if statement. So let's start by writing equals if left parentheses and in order to get the month from this timestamp, remember how we had to do a month so once a month, click on the reference, which is J five red parentheses. And just to recap, this formula right here gives us the number of the month in this cell, which is march. So let's say if month and just like we did with the last lesson lesson equals 23 tells us if that month is 12 or three. So if it is that I'm gonna say Q one now, how do I start saying if it's you know, 45 or six, then I want you to. If it's 789 i one Q three. We can't just say if we can just write Q to Q three Q four here because this is valued his false. So we're gonna write another if statement, and this is where we're going to start nesting if statements inside each other. So if left parentheses month. Same thing. If month most are typing this orphans Now J five is less than or equal to six, then Q. Two. Now let's think about what this does here you might be. Think yourself will want this less regal. Six. Want this? Capture everything from one through 6123456 The answer is yes. But when Excel evaluates this hole is statement. It first looks at this this condition. So if anything is from one through three, automatically classified as Q one and then it checks to see this if statement. And so this. If steam will capture anything from four through six because this one already capture everything from one through three. So there's a very important distinction to understand here and how Excel evaluates these nested if statements. So let's go on to building out the Q three option. So if month of J five is less than or equal to nine than Q three, so this will handle everything from 79. And instead of writing this same thing out again for Q four, I'm just gonna write Q four here, oops you for as the false option for this necid of statement. Because if it doesn't fall within one of these three options thes three conditions, then we know that's that the month value has to be 10 11 or 12 which is Q four. Some Now it's gonna write a bunch of rep entities to clear out my close up My if statement and you could see how these, if statements nested inside each other, allow us to build is really powerful logic. And then we're gonna fill this down and just do some spot checking. There we go. Let's do some quick spot checking to make sure that this is right. So marches Q. One June is cute, too. November is Q four. So our if semen was built out correctly, are necid of statements filled out correctly. Let's get the quarter this time by creating a separate column for the month so this formula is definitely kind of long and kind of complicated read. And if something goes wrong, it's kind of harder debug. So I'm gonna create a second to more calms here once a month. We did this already in the last set. Last lesson. So this should be review for you. Tell us down. And now for the quarter, I want to I'm gonna do the same thing as this reference this formula. But I'm already have the month here, so it makes it easier to actually build out this formula because I could just right equals . If eligible tests less than equal to three, then Q one oops, if are five, is less than equal to six, then cute, too, if are five, is less or equal to nine. Thank you, three. Otherwise, you for you can see how this is. Ah, lot simpler to read, and it's just a lot less complicated. Fill this down and you'll see that this should batch aren't alive. These keep insurance blank. You'll see that these two columns match up exactly. But the key thing about this one is that I have my month column here, so I can quickly see that number and know if that number falls within that that quarter. And this becomes much easier to debug if something goes wrong with your message message of formulas. So that's what I was saying. That as your message for unnecessary formula as it gets more complicated, it might make sense of breaking out into columns. So that's easier to see, how each branch of that necessary from evaluates the numbers and the message of formula. And this is actually kind of similar to coding. In many ways. If you ever go from Excel Teoh writing JavaScript or other programming languages, you want toe kind of see how branches of if statements will evaluate over time eso just to recap, if formalist any place inside each other to create some very powerful conditional statements, as we saw with this necessary formula here very, very powerful. And you can use this message from law in conjunction with other formulas like left if you look up and right, which I cover an excel for the real world one. So definitely check that out if you want to reveal what those other formulas. So what we've done so far is sometimes we get data. It doesn't exactly look the way we wanted to. We don't have the quarters. So we manually created is if statements to add the quarter so it makes it easier to filter and categorize the data and label the data once we get it from a database and we need to do some reports or analysis on it.
6. Using AND/OR In If Formulas: in a lesson. Six. You'll learn how to add in multiple conditions to your formulas using the and and or operators. You can also ness these formulas inside each other with necid. If formulas addition to using necessary formulas, I'm going to talk about how you can add the and and or operators to your message formulas to create even more powerful advanced conditions, which allows you to count of further categorize er data the way you want. So school with the first step created or form of that indicates whether the country is in Africa or not. So we have our our ads a day here, and we're going to try to fill out. Column are and we have a country column and you can see they're just from various countries around the world. And we want to label whether or not this country is Africa and just put on a simple yes or no in this column, And I put in a row all the countries in our data set that are in Africa's. We have Burundi, Cameroon, Egypt, Ghana, Tunisia and let's start righting out what this formula looks like. So we're going to start with a if statement again. But we want to be able to tell Excel whether or not the we want to tell Excel what the whether or not the countries in Africa, without writing like a super super long nested if statement and because one way I could do this is I could say equals If this equals Tunisia, then yes, then write another. If statement. If this equals Cameroon, you come second to this over and over again to see if it covers it. But that's kind of long complicated. So I want to be able to do this really simply and that's where we can use an or for Melo. And I'm going to copy these these countries the top of this I have. It isn't easy reference when I write out this formula. All right, so I'm gonna say equals presumable basic and see exactly doing equals if and we're gonna put in the or operator this time. Or And what this tells excel is if if the condition matches any of these conditions, then we're gonna go with the if statement, the true and the false part of this statement. So or if that cell equals Burundi comma you can see how just look references selling, writing all the countries Cameroon comma equals Egypt, comma Ghana and then to you. So what this tells Excel is if any of these, if p five equals any of these countries and you separate them by commas, then we can say value to the true condition, which is yes, otherwise no. So this is a really powerful way to build. Instead of investing multiple, it seems, within each other, the or function simply just gives you an option to put all different options. You want inside that condition. No fate. Enter dragging someone down. We could quickly check that. Do a quick spot check. Tunisia is in Africa, and Naru is not No, no, no, no. This. Make sure these air not Africa. Ghana's Yes, Egypt do. Yes, and Cameron burned. So this this formula is working out as we expected to. So let's go back to the steps. Now we want to create an if from like to classify ad clicks originating in Africa and occurring in Q one. So we know how to build out the Q one from that. We use that month formula, but now we want to know if the add originated in Meet Matt matches. Two conditions. One is it started in Africa. We originated in Africa and also was in Cuba. One. So we're here in key one Africa must already my formula. And this is where we're going to start using the and operator equals if instead of writing or I'm gonna say and because I want to say if this cell equals yes, to say that this click Origen in Africa and and the month of the time stamp, probably that in the in less than three or listen for rather month is less than or equal to three. And what this tells excel is that our if seven has a match. Both of these conditions in in this in Africa cell we were saying if it matches any of these conditions not all these conditions it would be impossible. I guess you could say if it matches all of these, but it would be we're data. But this is saying if in f equals yes and the month of the timestamp is less or equal three , then I want this to say yes. Otherwise, no, thats dragging. Stanley, Quick, do a spot check and let's look at the yeses. So this is yes, because it's in Africa and it's also in Q one. This one is in Q one as well. It's in Africa. This this one is in Africa. But it happened in July, which is not Q one. So therefore, this is a no. All right, let's go on to the last one. This is gonna be our most complicated if statement and we're going to say created Necid, a formula toe label. The clicks from African countries is either high income greater than 50,000 or low income less than 2000. All of their country should be labeled as non Africa. So we want to focus. We're baking now deeper and deeper into already to set. So we want to see a few conditions. If the ad was clicked in Africa and we're going to see whether or not the income is greater than 50,000 or less of 2000. And Aiken, you quickly format this. These are these numbers, so it's a little easier. See? Issue currency zero. Whoops. All right, that's right. So we wanna see if that ad was clicked in Africa and if what the income was, and we're gonna label it as high income or low income based on that number, everything else is not Africa. So let's start building out our formula, and this is gonna be using a nested if statements. So let's start with trying to find the high income countries. So we're gonna right equals if and we have two conditions this time, just like we did with Q one Africa. We want if and in Africa equals yes. And also if, um if area income is greater than let's go back to my formula greater than 50,000 let's take greater than equal to 2000. So this captures the high income earners. Aiken, say hi, income. Now, how do I capture the low income ad clicks? I'm gonna start writing unnecessary statement here again, and I'm some type of so references. If our five equals yes, comma M five, less than 50,000 again. Number M five is our income. So this part, this branch, if you will, of our message if statement captures anything, anything that is in Africa but is less 50,000 So therefore I can say this will be low income And what happens if this if it doesn't match any of these conditions, then was going to say non African country because it doesn't match our conditions of being in Africa? Well, if it doesn't match, if it doesn't match as being in Africa, then that's just gonna default. Putting into the this bucket, which is not African country. So close that out. Close it out and, uh, editor, and we can now fill this formula down. And I can see I've classified my data based on a few different columns now. So let's take a look in spot. Check one of our few of these values. So this one is high income because it's in Africa and the income is above 2000 not having country. We know that as if it's not even not even in Africa that were so. Label is on after country. High incomes double check that one is in Africa. Correct. It's about 50,000 US school down here. Try to find a low income option. Here is low income. It's in Africa, but the theme income is less if 2000 which is why it got classified as low income. Zoom out a little bit and you can see now we have a simple data set of ad clicks, but we're now able to apply some really advanced if statements with Orrin and and also using message of statements that build some really powerful classifications for data. Because again, sometimes me pull the data from the database. It's not in the perfect form that you want, and you want to add in these extra columns that make it more useful for you for reporting purposes and and for for forecasting, whatever it may be. A few tips is that syntax here is really important. So please pause the video and take a look at how I'm placing the left, parentheses and the commas if you place these parentheses and commas in the wrong places that it was completely messed up formulas. So this is where attention, attention to detail. It is really important as you build more complicated. If statements and just a recap, we could have used a message of statement to build out this function here. This, uh, this categorization, but using the or is this a little easier because you don't have to nest? Five. Different if statements in inside each other. And so once you kind of master the if statement with and and the or option you can now start seeing very powerful ways to clean up your data set. And you're kind of creating these so called rules for a data set to cost fire, Daito.
7. Handling Formulas Errors: in Lesson seven, you'll learn how to deal with pesky errors in Excel like n slash a and div slash zero. You can get rid of these errors by using nested if statements. But there is more convenient air handling formulas you can use, such as is error and if error but crates in air value. You seen these before the end slash A's, and the DIV slashed his euros. You can make that error more informative than a traditional like hashtag and such hair def slash zero. We can do this by using a few for formulas to handle these errors, the most common of which are is error, which is typically uses an if statement if error takes on the complexity out and we'll see that in just a second and the most common ways of using if error is by just putting in a empty string like double quotes or zeros, and we'll see what that means in a second. So let's start with the first step created. The look of that indicates whether the country is in Africa or not using this list below. If you're calling the past lesson, we use an if statement with or to indicate if these are all African countries. But the more commonly doing this, the right way of doing this, in my opinion, is to creative. You look up to see if that country falls within your data set. So let's go over here and start writing that you look up Formula Someone to start writing equals him. He's zoom in a little bit here, to the right equals V. Look up, looking up this value right here, the country comma one head over to my country table and apply an absolute reference to that . So see it d 12 in this case, which is just these two columns of data, and then I'm gonna pull back the second calm to indicate if it's Africa and then zero for false and Phyllis one lay down. All right, so what do we have here? We see that Tunisia's in Africa. These are really annoying Entei formulas, and that's because these countries don't exist in this table right here, which is why we get this pesky error. But we can see how it's classic properly classifying the countries that are in Africa as Africa because we're pulling back the second call so that this one looks like it's working out. Okay, now we're going to write, and if formula with is aired, indicated the country is in Africa, but classifying that click as yes or no. So this is, Ah, somewhere that we did in the last lesson, But we're now using. We're now building for most on top of each other by first looking at this column. So we know that if the value is an error, then that country is not in Africa. So we're gonna write that if from a starting with if if is error, which is this option right here. And is there only takes in one value. So I'm just gonna click on this cell right here. So what this tells Excel is if if this values an error, then do the true option. If it's not then false. So we know that if this values and error, then we want this. We know it's not an African country, and therefore we're going to say no. Otherwise, say yes, because if it's not an error that we know is in Africa, if I feel this down, you'll see that this properly classifies this row or this country rather as yes or no based on the kids in Africa. Because if this is an error, no Africa, yes. Now let's go for the third step, which is right and if air formula to indicate that the country is in Africa. So this is where the if error, if error function is probably one of my favorites functions because it simplifies things a lot. So if I go to my if Eric column, this is typically used right with. The Celtic may contain the error if error, so if error also takes in takes into values. But the first value is just the cell that you want. Evaluated hasn't Ahronot, and it says if this cell contains an error, then we want to say no because that means that this country is on after. So if I hit, repentance is here. What happens is if this country, if the cell is an error than this, has no otherwise it just gives you the value of that cell. Let's see what that happens to have to do this for this down. So it's not exactly yes, no. But what happens is you see how all these error values now show up as no. Because I'm telling Excel. If this value evaluates to an error, then give me a know. Otherwise, Excel just give you the actual sell. Actual value in the cell is Africa. I could change this to not in Africa, and that would be what shows up in this column instead of no. So this is a really useful way to do some categorization on your data when you want to quickly evaluate the cell for errands, and it gives you better valiant sort of end sashay by giving you the option to put in your own value for the error. A really common common with using this formula for numeric values. Is this leading a double blink because sometimes you don't want toe have any error, show up or any value, and you could just leave it as blank. Or you can put in a zero and this will show up as this zero, and this is useful again for numerical values. So another tip to keep in mind is if you are formatting, if you are leaving back zeros or blanks like this, keep in mind what the formatting is of the cell because Sometimes this may show up as a dash or zero and just make sure the formatting is correct and you're not putting in a value that you're not expecting, uh, just to recap. Errors are never fun to deal with, so you should be able to handle errors gracefully. And that's what we did here with looking at using yes or no for is error and then handling this with if error as well, I would say if there is easiest to use but is error is also useful if you want to use it in the broader context of an if statement and in terms of using this, these functions elsewhere in the class project see if you can use if error is error to get rid of pesky and slash Azour div zeros and you decide
8. Formulas For Aggregation: in Lesson eight, you'll be learning some formulas for aggregation. Sometimes you want to create a quick, summary table off your big data set, and you can use functions like count if some, if average. If an aggregate too quickly summarized data in a table, this will be really useful for the class project using our advertising. They just sit here and as your cleaning data formulas like count if some if and average if can be very helpful to make sure that you're classifying your data correctly and you're on the right track. So we're gonna look at some from us for aggregation. This is also helpful to see if your message if formulas are functioning properly, which we've been doing a lot in our last few lessons. We're also gonna talk about an arrogant function, which is a super powerful way to handle errors in your data. But there are some nuances about this are good function, which will cover in a second. So let's first start off with Step one, count the number of ad clicks for each quarter below using count. If so, we have Q on Q to Q three Q four, and I've already incom are already created. The conditions message if statement to see if this ad click is in Q one Q. Four and we want to be able to count the number of clicks. Remember, a click is each individual row because this is data that's coming from a database about ad clicks. So weaken. Assume that each of these clicks is a row, and so we can try to count the number of clicks that fall into one of these quarters. So let's start writing the formula here. Zoom in equals count if and count of takes in two options range and criteria. So let's look at the range, which is going to be everything from cells are one Sorry are five down to are 34 and you'll see it shows up in this reference. Now here I'm gonna ply by person Command T I'm going to and the Mac apply the dollar sign Teoh, make this naps reference you can do F four on the PC and then the criteria. Is this going to be this cell right here? Because we want anything that classifies in Q one and hit Enter and this will give us the number off rose that fall within this category of Q one. Just a recap count. If you the first supply that the the column that you want kind of match, uh, or count the number of rows that fit within this criteria, which is B seven, which is Q one. So if I feel this from letdown, it will automatically show this formal automatically pick up the new cell reference to see what court I'm looking at, and it properly picks each one up. And the reason why I have this check some here in this Excel file is because a really common way to make sure you're on the right track again. We just did a simple summary formula Summary table to see how many clicks fall within each quarter after we compared that court. This column quarter to our condition. This check some. This make sure that we captured all our different quarters is a way to see that we captured all rows of data. So if I select all my Rosa data you see here it says count 30 down on the toolbar. I could also say equals count A like this, and this is a quick way to see that. Okay, I've accounted for all 30 rows of data, and this is a really common practice to do in, uh, financial modeling to make sure you're capturing all your data set. So we know that these numbers are accurate because our some equals 30. All right, so now we're gonna do step to use. Some have to get the total area income of clicks coming from Africa. So this is summing up everything in this column right here, and we want to be ableto look at clicks on Lee coming from Africa. So if I go back to here, I have a calm here for Africa is gonna write this from a really quick here if you look up and we did this in the last lesson and we'll say these cells right here that that's table won't pull from comma two commas zero to get the African country African continent rather, and we're going to get these errors for now. But that's OK. Um, I'm going to skip that for now. It's OK. We don't have that. It's OK. We don't have. This is a perfectly labeled country one option, which you learn in the previous lesson is we could do it if error like this if error and then say non Africa and Phyllis down. And this gets rid of that and a And just to recap what this does is it looks and evaluates this this formula. If that's an error, give me back, not Africa. Otherwise I just do that. You look up as it normally does. So we're not gonna find the total area income, which is column M here in our data set. So let's start writing the summit formula equals some if and were going to compare, we're gonna first look at the criteria range, which is this entire calm right here, and we can apply and absolute difference if you want, But we're not going to moving cells around, so that's okay. If we don't, then we're going to say the criteria is Africa because we want to compare this calm just like just like we did with the count. If to see if this has the value, Africa, if it does have the cell, if it doesn't the valley Africa, then the some ranges. What we actually want to sum up in our case, which is the area income and apply that APS reference just in case and close that parentheses and you're gonna get $389,000. Yes, a recap. What this does here is it compares the S five s severity four column, which is our African country column. It sees if that value in that column contains the word Africa. If it doesn't know individually, some up these numbers in column em. All right, And now let's do the average. If not, this is probably gonna make more sense because we don't really care about the total income that's coming from these ad clicks you want? No, from a business standpoint, what is the average income of individual that is clicking on ads from Africa? And so that's where the average if formula, might make more sense in our scenario average if same way, we're good. We did the formula for Harry income. We're going to select all of our cells here and calm. Asked to indicate if it's in Africa, apply and after reference here, Africa And then though range we want to average is calm F, which is our area income, planets, reference, rep, parentheses. And this is more useful because it's tells us, on average, the person that clicks on an ad from an African country has an income of 55,000 dollars. So again, these forms a really useful to kind of summarize a data set in a way to give you some high level data high little stats about your data. Now we're gonna use the arrogant function and accounting function to get the same results, um, to count the number of clicks coming from coming from Africa. Now, the arrogant function is one of those interesting functions because it requires a few different parameters requires ah, function. Ah, the the function number and the options. So I put these tables out here. What? What the arrogant function needs is values to ignore. So typically, I put this as six because I want ignore their values, and I, depending on what you want, account or average or some you can take one of these fuckin numbers. So what we do here in order to make sure we get the right numbers we can say equals AGR it ? The function number is going to be three because we want to count the number of countries that are in Africa, and we want to omit any errors. Actually, this is the reason why I want to get rid of. I want to keep this back and as our original with errors. Let's get out of this. Let's keep these enemies here and the arrogant function. What it can do is the function number is again. We're going to say, three. We want to count the number of of cells three. But we want to omit any errors, which is the six number here. And there's different options here. You, depending on what you want, you can ignore nothing and ended and rose. I typically use six, and then the options, the array rather. Is this going to be this? Sell these these cells right here. It's your family should look like this now, arrogant 36 and then that you get seven. And that's because there's seven countries 1234 567 that have the word Africa in them. But you notice that the arrogant function automatically omits anything with errors, which is what this some, which is what this six does for us. The other option is just do a simple account. If discount the number of cells that contain Africa here, and you're from little looks like that, and you get the same exact number 27 But this this one is a little more useful if you have , ah, bunch of errors in your cells and and or you want to ignore other specific things in our data set. So not as commonly used, but could be useful in certain scenarios. And you can look at these two tables down here, Teoh. Good idea of how you can use different functions within the arrogant formula. So just to recap, um, head, this is different ways of summarizing your your data using count of some if average if and the arrogant function, you should try to using these formulas on different columns in the class project when you're trying to summarize it about the hotels.
9. Scatterplots With Trendlines: in less and nine, we'll be talking about scatter plots with trend lines will cover a kind of chart that you can use for forecasting and predicting results, which is thes scatter plot chart, and you'll add a regression trend line to the scatter plot to help you with forecasting a chart based on data points we have on this table to the right and will try to predict with sales will look like by adding a trend line to the charts and also seeing what the formula is for that trendline. So a lot of things will be happening in this in the next two lessons. Um, and this is what we can. These two lessons are really focused on how you can start analyzing and forecasting data once you've cleaned the data up. So for the first step, it's quickly analyze the data to the right and see what we can make any hypotheses about the data. So we have three columns. We have week, week of instagram posts and product sales. So when I say Instagram post, I'm referring to kind of sponsor Post that look like this when you see the minutes a gram where a company you know is paying toe show up in your instagram feed. And what's the hypothesis that we can make about this data? One hypothesis is that the Mawr instagram posts that you pay for the company pays for the more product sales will result from those instagram posts. So our job is to figure out well how how tightly correlated are the number of instagram posts with the product sales, there could be a scenario where, if you show too many instagram posts than that might actually maybe decreased sales and not increase sales literally, because the use the instagram users are just kind of saturated and flooded with your ads and they won't want to buy your product. But we have this data here, so let's turn to analyze it and see if we can draw any conclusions about the data. So the first step we're gonna do is create a scatter plot and at a trend line to the data we're going to ignore this week, Convert now and just select calm xem through n like this after it's like those those columns say insert. And then if you look in the middle of your ribbon, you should see this scatter plot shirt, and it's really just a scatter a chart with a bunch of dots on it. So click on scatter and you're going to see your chart That looks like this. I was gonna move my ribbon for now. Me kind of expand this a little more, and it's kind of useful, but let's make this chart a little more descriptive so that we know exactly what we were looking at. So I'm going to relabel this title this chart as regression of product sales and it's reimpose. Maybe make this, ah, font a little bigger so that we can may be very clear about what this is. Stick even bigger. Okay, um, now we have the Y axis. This is sales. So let's actually play around with the chart and mixer. We're adding in the right axes titles. So primary or vertical, this is sales. And this Lipsitz at in this title here for Mary Horizontal. This is Instagram pose and let's make these it's a little bigger, and one thing I like to do here well, first of all, let's take a look of data. We can see there's kind of like a trend here. We can see that as INSTAGRAM posts increases, the number of sales increase was not a perfect straight line as we could, as you might expect, but we do see somewhat of a somewhat of a trend here. One thing I like to do with this is there's all this empty space here some might. This is not a best practice because for the purposes, purposes of this lesson, I'm going to actually Ah, let's see here chart title X title axes. More exceptions. Um, I actually want Teoh edit this so that here we go. I want to make the minimum, Let's say 20 so we can see more of our dots. That's really all it is. It's to see more of her dots. You see how it kind of, like shifted the dots a little bit, But now I Seymour of my data in one place. The next thing I want to do is add a trend line to the state of you can do this by clicking on the dots, right click, and then go to add trendline and you'll start to see this straight line that shows up within your chart and by default Excel will give you a linear regression trend line. And this is probably the most common use case. I mean, you can look at all these other graphic exponential like rhythmic, But for the purposes of our exercise, we're gonna focus on the linear chart and, ah, few things I want you to do here is click on display. Quicken and chart and display are scored value on chart. Those two options and you're going to see a formula. Some data show up on your on her chart. Let's make this little bigger. Let's make this also 15 and work. I'm not gonna walk into in depth about what these numbers mean, because this does get a little bit into statistics. But the why equals 25.438 x minus four for April 19 This is it. Remember, from your middle school kind of ah, algebra days or geometry days. Uh, this is the equation for this line right here. It's a straight line and the are square. This is a measure of how tightly correlated that your two variables are In this case are variables are instagram posts and sales. And so a general rule of thumb is anything above a 0.9 means that the two variables are tightly correlated. So in our case, it means that as the number of INSTAGRAM posts increases, we should expect sales to also increase in a somewhat linear fashion. As described by this this formula for this trend line here. Now let's go down here. Some three extended trendline out to 10 periods predict what more Instagram Post could do for sales. So I go back and click on my trendline. You have this forecast, which allows you to forecast things forwards and backwards. If I change, it's a 10. You'll see this lying it extended out past. What are our data set? Has I can even I can also make this go backwards, too. That's right, Click and for much in line and make this battle, say five. And I can see that if I have, that's a 45 instagram post. I should expect my sales to be around 700 this is a nice way. Teoh. Tell your colleagues of your boss that if we put him or advertising dollars for sponsored Instagram post, then we should expect to see a commensurate level of sales even though our data set does not have instagram poser cells at that level. Similarly, we can say if we have instagram sale instagram post below 26. This is our least number of instagram post in our data set. We should expect a decrease in sales down here. So just to recap when you want to see how your data is related to each other and how one variable is correlated with another variable in this case, how Instagram posts are correlated our Sorry how sales are correlated with instagram posts . Ah, a regression trend line can help us predict what those four cat what the forecast is and also to see if there is any any correlation between our two different variables and so question I would pose for you is are there any metrics from your company that you think are correlated with each other? And can you try doing a similar scatter plot charts for those two variables to see if there's a trend line that exists between those two variables? Ultimately, find out if there's any correlation between those two variables in your data
10. Formulas For Regression: in Lesson 10 will be talking about formulas for regression. In the last lesson, you were able to get the equation from the trend line on the scatter plot chart. In this lesson, you'll be comparing that equation with the results of the slope and intercept functions, and you also see the results of the analysis tool pack Adan in Excel formulas to get thesafeside equation that we got for a trendline in their last lesson. If you recall, we built this chart the scatter plot chart right here. And we were able to get this formula from the trendline. And we'll also be using the analysis tool pack to get the same results and get some more regression statistics about our product sales and instagram posts in our original table here. And you can see how you can use thes formulas to forecast future data points on our trendline. So to start off for step one, I want you to install the analysis tool pack. I put instructions for both Windows and Mac over here on the Mac. I'm on the macro now office 3 65 I can say if you go to tools and then go to Excel Adan's. I already have my analysis to a pack installed, but if you don't have this check box checked off, just hit. Check it off and then hit. Okay? And you'll know that you installed the analysis tool pack when you go to data and you have a data analysis option in your ribbon. So just ah, good way to a good tool to have in your tool belts in terms of doing statistical analysis and the windows. I can't show the Windows instructions, obviously, but you can follows instructions here, and we'll give you the same analysis tool pack, um, in your Excel environment. So let's go to Step two. We want to be able to write formulas in Excel that allow us to get this same data that we were able to get from our trendline. If you recall in our trendline, if I go to form a trendline, I was simple. I was able to get these values by simply saying display equation on chart or display are scored on chart. But if I want to get thes data thes values formulae Klay from a data set, I can use Excel formulas for that as well so we're going to first focus on Slope. Let's right it off. Slow function. As you can imagine, pretty simple functions that use equals slope and Slope takes into arguments known, wise and known X is known. Wise is what we call our dependent variable, and that case is going to be product sales because product sales depends on our sponsored INSTAGRAM posts. So it's first like all of these values known wise as our first option and then known exes is basically are independent variable, which isn't superimpose because instagram post Dr product sales. So if I had represents is there, that's the slope of our our line that would go through these data points. Intercept equals intercept. Pretty simple. Also, it also takes the same arguments known wise and no nexus. So it's like known wise comma known exes are instagram posts, referent sees center, and that's going to be our intercept. Negative for 48.19 and then R squared. This is our Eskew, which stands for R Squared also takes in known wise and no nexus. Um so select known wise comma and then no taxes. 0.99 point 9112 Now, if I drag in my chart really quick over here, you're going to see that these values match up. Exactly. So my slope is 25.438 which is right there. Intercept is negative for for 8.19 right there. And then r r squared, which is are a statistical measure of correlation also matches up with what we have on the trend line. But the key difference here is that we were all able to get this these numbers we available to get these numbers by using formulas of using slope intercept an R squared and just to do some recap on your kind of Ah, your algebra days if you remember of them from back in middle school or high school is this number This intercept indicates the the Y axis on where when the ah when the very when the waxes will when the line will cross the y axis. So if I go back to my trendline here and let's say I I shall want to fork, I want to format my axes again. Let's go toe Whoops. Chart design, turn element axes. More exceptions. I want my access to be. Where is that option? Here we go. Zero. And if you notice that if I were to extend this trend line backwards, it's just make it go all the way back to, like, 50. It hits the Y axis at negative for 4 April 1 1906 And that stern line all way back down through zero. Be negative for for 8.19 Ah, another kind of I guess Wait, an analyzes result is X is our interim post. So for every ink for every increase in one instagram posts, we could expect $25 euros in sales. So if I plug in a five into X, so if I say equals five times 25.438 plus this negative for 4 April +19 that's the number of sales I should expect if I only have five instagram posts. But if I have 50 then I'll have 100 $23 or euros of sales. So this family could be really useful when you want to predict and calculate what your sales will be, or whatever metric trying to calculate whatever the dependent variable is that you want to get into data set. Um, using this formula as a way to forecast that number. I'm gonna move this formula. This chart back down here and Step three is we're gonna now use the analysis tool pack that we installed in step one to get more regression statistics about already to set. So let's go to data. Did analysis and you'll see a bunch of different options that announces tools. Did analysis tools, gives you I want you click on regression because we're gonna be doing a regression on our INSTAGRAM and product sales data It okay, And you're gonna get a menu that looks like this and for input wiring is you want to select all the everything from M 23 through M 35. So you want include the label. This is our again dependent variable are known wise for input X range. You should select everything from l 23 through l 35. This is our X range and you want to make sure you check off labels because we have included the labels for our two columns here. The output range. You should make that select this outward. Basically wanna say where we're going to output are results and I'm going to select a 34 as our starting point for that. And then just click on residuals just to see some additional data set data you can get from this analysis. So positive. Pause the video if you want. And just make sure you're regression settings. Looks like this window that I have here and once you have this set hit okay. And you're going to see the announces tool packs spit out a bunch of data about our instagram posts and product sales. You'll see that we have the are square, which is 0.9112 which matches up with our score here. But you noticed this is a hard coded number and that's important to know is that the announces tool pack gives you a snapshot of your data and time. So if you have, if your instagram posting product sales changes, these formulas wirelessly change because this is tired of the data set. But the analysis tool packed it won't change because this is like a one time analysis. Just one thing to keep note about. The analysis told pack, You also see down here we have these coefficients which match up with what we calculated here with our formulas. And they also show up here on our trendline formulas. So to note, we have our intercept, which is negative for 48 which is what we got here negative for for eight, which also shows up in her formula here. Negative for for eight. And our slope 25.438 is tied to our instagram post right here. It's gonna bolt easily. Quick. 25.43 is, which is also what we show up here and are slope formula. You also all these others physical, uh, metrics that I'm not gonna talk over right now because this is definitely gets more in advance in terms of statistics. But the key thing I want to walk your walk away with is that you can get a lot of really interesting statistics. Ah, and requestion numbers from doing this analysis tool pack on your data set. So just to recap, you there multiple ways to get sophisticates. Ah, bio datas that we did. We did this through putting on a trend line and getting the firm left with a trend line running the formulas out here and then also actually using the announces tool pack to do a one off analysis on the data set. Another recap is that you can start to forecast eight up by getting the trend line, getting the formula and then actually using this formula to calculate what individual values of X in this case, our instagram posts can yield in terms of park cells by putting in that formula into X and then getting the why on that line. One caveat is you want to be very careful about using trend lines of forecast, everything. If this our score is below 0.9, it may not. You may want explore different trend lines that can fit your data. But in general, when you're forecasting data with trend lines, you don't want to make it too aggressive and you can't use from lands to forecast everything. There are many examples of forecasts gone wrong, and I'll show you some examples in the class description for the this. Ah, this lesson would just be very careful when you're forecasting things on trend lines because not every not every single data site should be should have a forecast. That's this perfectly tied to it. You want to make sure you take into count multiple variables when you're doing forecasting to make sure that you get an accurate forecast for your data.
11. Congratulations & Conclusion: Congratulations on taking Intermediate Excel for the real world Just to give you a recap. We covered everything from formulas to manipulate your dates to creating advanced formulas for cleaning up your data for analysis to doing some basic or aggression and forecasting or your descent. A few key takeaways is that clean data and data quality are really important for your you and your colleagues To be able to trust your data. Excel in Google Sheets formulas give you statistical tools to help you forecast data such as analysis, tool, pack. But you should always question assumptions and inputs that go into a forecast or prediction . I just want to say thank you for taking this class. Hopefully, you learn some intermediate skills for data analysis and cleaning up data, and these skills can also be applied to Google sheets. And in addition to taking this class, I would say a lot of learning comes from you playing with Excel on Google sheets every day on your job. So definitely get the habit of doing that and just dedicate some time to experimentation in your spreadsheets, read through the class projects and feel free to message me or leave a comment on the discussion forum for this class. If you have questions about project, and I look forward to seeing your results and exploratory data analysis on the class project and thank you very much.