Learn Power Query for Excel & SAVE Time | Chandoo | Skillshare


Learn Power Query for Excel & SAVE Time

teacher avatar Chandoo, Become Awesome in your Work

Watch this class and thousands more

Get unlimited access to every class
Taught by industry leaders & working professionals
Topics include illustration, design, photography, and more

Lessons in This Class

9 Lessons (2h 32m)
    • 1. Introduction

      0:51
    • 2. Getting Started & 5 data cleanup tasks with Power Query

      17:27
    • 3. Adding Columns in Power Query

      13:07
    • 4. Homework Solution

      2:42
    • 5. Merging Tables and Filtering Data in Power Query

      14:06
    • 6. Automating Data Combine & Clean-up

      20:50
    • 7. Unpivoting (reshaping) Data

      17:44
    • 8. Bonus lesson: automate stuff with Power Query

      64:40
    • 9. Don't forget this video!!!

      0:20

443

Students

2

Projects

About This Class

Power Query is THE MOST amazing feature of Excel. Using Power Query, you can:

  • automate data clean-up tasks
  • combine data coming from various places
  • remove unnecessary data points
  • spot data quality issues early on
  • spend more time on doing "actual" work instead of data "cleaning"

In this short, focused course, you'll learn how to use and take advantage of this beautiful aspect of Excel.

What previous students say about Chandoo:

"AWESOME trainer"

"Wholesome & passionate"

"like watching a magic show"

"I love Chandoo's neat and tidy presentation style"

About your trainer (Chandoo):

My name is Chandoo & I have been using, learning and teaching Excel for the last 15 years. I run a very popular website (chandoo.org) and YouTube channel (Chandoo). In 2022 alone, I have trained more than 1.4mn people with my articles, videos & courses.

I am a 13-time recipient of the prestigious Microsoft MVP Award.

When I am not teaching Excel or Power BI, I like to play with LEGO, take our dog (named Excel) on walks or solve crosswords. I live in Wellington, New Zealand with my family (wife & twin kids).

Meet Your Teacher

Chandoo

Become Awesome in your Work

Teacher

Chandoo is an award-winning Microsoft Excel & Power BI trainer. His life's mission is to make people AWESOME in their work. He has been teaching data analysis, visualization & dashboards for over a decade to professionals all over the world.

Chandoo runs a popular website for Microsoft Excel + Power BI at Chandoo.org

He has received the prestigious Microsoft MVP award for his contributions to the tech community.

Chandoo lives with his wife (Jo) & twins (Nishanth & Nakshatra) in beautiful & occasionally windy Wellington, New Zealand.

You can catch Chandoo on his YouTube channel, where he regularly publishes videos on all things data.


Transcripts

1. Introduction: Power Query is the most awesome feature to be added to Excel in a while. In this Power Query course, I'm going to teach you how to use Power Query to connect to any type of dataset, bring the data in, and clean it using a step-by-step automation technique. You have five video lessons in this course. So this is really a mini-course and you can binge watch it in an afternoon. By the end of this course, you will have some special powers within you. You will be able to take any kind of data that looks dirty or mangled or all misshapen, and then you can say, yes, I can clean it up and I can analyze my data with automations. So thank you so much for getting this course. Let's jump into our lessons now.

2. Getting Started & 5 data cleanup tasks with Power Query: Here is the sample data that we will be using for the first two videos. In fact, we will use the same data for all of the videos, but I will introduce some additional datasets for videos 3 and 4. This is a typical employee dataset; because it's quite neutral, we can all relate to it. As you can see, while it does look clean, there are some problems with this data. To get this data into Power Query, you just have to select all the data, and then from the Data ribbon, click on the From Table/Range option. Power Query insists that your data should be in a table format, so it lets you create the table. At this point, we just say OK, and our data gets turned into a table. This will open up the Power Query Editor, and this is the screen through which we can tell Excel how we want our data to be cleaned using data processing rules. We can see the table name here. It is called Table1 because we created the table on the fly, and Excel named it Table1. It will show you a preview of the data. Think of the Power Query Editor as a separate screen where we can go and configure various things.
We will use this to answer the first five data cleanup questions, and you will become familiar with the screen in the process. Our first question: we have got an employee ID here, but it has two different values mixed up. The first two letters of the employee ID are the cost center code, and the next five characters are the employee number itself. So we want to split the cost center and employee number into two separate columns for analysis purposes. To split the column, you can select the column and then right-click on it, and here you will have a Split Column option. Notice that there are different ways to split the column. For example, we can say I want to split by number of characters; this is really what we need, since I want to split two characters and five characters. Alternatively, if you have got a delimiter, you can use that, or some other exciting options here as well. We'll use the number of characters and specify that I want to split two characters, once, as far left as possible. This is just going to split the column into two separate columns: the first two letters and everything else. When you click OK, this is going to create two separate employee ID columns. So your original employee ID column is now gone, and instead Power Query gives you two separate columns to work with, named Employee ID.1 and Employee ID.2. You can double-click on the column and rename this one Cost Center, and this one ID Number. Through this process, what we have done is we have told Power Query that, given this data, the employee ID should be split into two columns. So while it happens with the current dataset, whenever you have new data, if you run the Refresh process, Power Query will grab the new data and automatically split that data as well. Our next problem is that in the Department column we can see that several people have a null department. These people should not even be in our list of people.
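The split rule described above can be sketched outside Power Query too. Here is a minimal pure-Python analogue (not the M code the editor generates); the layout of two cost center letters plus five ID characters follows the lesson, and the sample ID is made up:

```python
# Sketch of the "Split Column by Number of Characters" rule: the first 2
# characters are the cost center, the rest is the employee number.
def split_employee_id(emp_id: str) -> tuple[str, str]:
    return emp_id[:2], emp_id[2:]

cost_center, id_number = split_employee_id("PR00007")
print(cost_center, id_number)  # PR 00007
```

Like the Power Query step, the same rule applies unchanged to every new row on refresh.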
They are no longer working here, but somehow our data import system brings them over. So we want to remove these people. To remove any rows from the data, you can use the filter option. So click on the filter, and anything we uncheck will not be part of the final data that Power Query provides. So if I uncheck the null, I no longer see it. I can use the same screen to, for example, exclude any training staff and support staff if I don't want to see them for a specific analysis or data situation. In this case, we will keep all of these but just exclude the null employees. And when you click OK, those people are also gone from the data. To see the steps that we have applied on the data, you can look at the Applied Steps area right here in this corner. It will list all the steps that you have taken so far to clean the data. The steps that we have done are: we sourced the data from Excel, so this is the Source step. Then we split the column by position, so the column is split into two columns, Employee ID.1 and Employee ID.2. When you click on a specific step in the query, it will show you the data as of that step. At this step, when we split the columns, they are still called Employee ID.1 and Employee ID.2. Then in the third step we renamed the columns, and in the fourth step we applied a filter on the Department column so that we remove the null department. Let's say I change my mind: I get a mail from the finance team that our cost center code is not the first two letters, but the first three letters. How do we change it? Simply look at the step where you have made the change. For example, Split Column by Position is the step. We locate that, and there is a gear or cog button right next to the step. You click on that, and here you can just say the first three letters are my cost center code. So change the 2 to a 3, click OK, and now the first three letters are here, and the rest of them are there.
Technically, this is not correct, so I'm just going to undo this step by changing it back to two. Irrespective of which step you are viewing, whenever you finish your work within Power Query, it will only give you the data as of the last step. It's not going to give you the intermediate values, just the last step alone. Our third cleanup problem is that, if you notice the Name column, you can see that some names have some extra space at the end or at the beginning. So here I have got my extra spaces. This is, again, a common problem in many data cleanup scenarios: when you bring the data from an external system, sometimes it will have some extra spaces. So we can clean up those spaces. To clean them, right-click on the Name column, and then here, under Transform, you have the option to trim the data. What this will do is remove any extra spaces at the beginning or at the end. So we run the Trim, and those extra spaces are gone. So far we have done three data cleanup steps: we have split the employee ID, we have taken out the extra spaces, and we have taken out any employees in the null department. For our next data cleanup scenario, we have the employees' start date, and we just want to calculate how long they have been with the organization. We have a date column here, but notice that the data type here says ABC123. Before we do any operations on the dates, it's a good idea to convert the column into the date data type. To do this, right-click on Start Date, Change Type, and then select the Date option. When you have the date data type, the icon here will automatically show a calendar logo instead of the ABC123 it was showing earlier. Now that the start date is treated as a date, I can go and introduce a new column which looks at the start date and my current date (the date as of recording this video is 31st of March 2022) and tells me how long each of these employees has been with us.
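The filter and trim rules from the last few paragraphs can also be sketched in pure Python. This is an analogue of what the two steps do, not Power Query's own code, and the row dicts and names below are hypothetical stand-ins for the lesson's table:

```python
# Sketch of two rules from this lesson: the Department filter that drops
# null rows, and the Trim transform on the Name column.
def clean_rows(rows: list[dict]) -> list[dict]:
    cleaned = []
    for row in rows:
        if row["Department"] is None:      # "uncheck null" in the filter
            continue
        cleaned.append({**row, "Name": row["Name"].strip()})  # the Trim step
    return cleaned

sample = [
    {"Name": " Torrance Scalia ", "Department": "Training"},
    {"Name": "Minerva Stafford", "Department": None},
]
print(clean_rows(sample))  # only the Training row remains, name trimmed
```

Note that, like Power Query's Trim, `str.strip()` only removes leading and trailing spaces, not doubled spaces in the middle of a name.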
To do that kind of an operation, you need to go to the Add Column ribbon, and from here I can introduce a calculation as a new column. From here, on the date, I'm just going to select the first option: Add Column, Date, Age. What this will do is calculate the age of the date as of the current day; today minus that date is what the age would be. When you add Age, you will get the age added here. The age will be shown in the format of days, hours, minutes, and seconds. So it says this employee has been with us for 503 days; this person has been here for 1344 days. While this is a good way to calculate the employee tenure, 503 is not really meaningful. Ideally, we want to calculate how long they have been with us in years or something like that. Keeping the Age column selected, we can go to the Transform ribbon, and from there I can change the age to another format. So Add Column will add the column, whereas Transform will change things in place. From Transform, the age is shown in a duration format; you can see the stopwatch kind of icon there. From Duration, I can just say show this age to me in Total Years. This is going to calculate the employee tenure in years: this person has been here for 1.378 years and that person has been here for 3.68 years. At this point, you can also double-click on Age and rename it Tenure. This age calculation is a dynamic calculation, so every time you refresh Power Query, it's going to redo that calculation and update the tenure as of that refresh date. Right now it is 31st of March 2022, but in the future, whenever you are watching this video and refreshing this data, you will see a different value here. One thing that you might notice, if you're following along with me, is that your Power Query screen is probably looking slightly different and the steps are also looking slightly different. There are a couple of things that I have done at my end.
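The tenure numbers above can be reproduced directly: as the formula bar shows later in this lesson, the logic is simply total days divided by 365, with no leap-year handling. A pure-Python sketch using the lesson's example start date and recording date:

```python
from datetime import date

# Sketch of the Age -> Total Years step: total days / 365, matching the
# logic shown in the Power Query formula bar (leap years are ignored).
def tenure_years(start: date, as_of: date) -> float:
    return (as_of - start).days / 365

start = date(2020, 11, 13)   # Torrance's start date from the lesson
as_of = date(2022, 3, 31)    # the video's recording date
print((as_of - start).days)                   # 503 days, as in the lesson
print(round(tenure_years(start, as_of), 2))   # about 1.38 years
```

Swapping `as_of` for `date.today()` makes this dynamic, just like the refreshed Power Query calculation.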
And let me just reveal that. The number one difference that you might notice is you may not have the formula bar at the top. This is something that is optional, and I like to keep it on. To see the formula bar, you can go to the View ribbon and enable the Formula Bar option. This is a one-time thing, and when you do that, every time you open Power Query from any other Excel file, it's going to show you the formula bar. You might wonder, what's the point of the formula bar? Power Query uses its own language called the M language, and because the M language is quite different from the Excel formula language, I like to understand what's going on from time to time. If you have the formula bar on, you can read what's happening and make sense of the steps, because each step has a piece of logic or code attached to it. For example, the Renamed Columns step here says we took the column Age and renamed it to Tenure. So that's the logic that Power Query is using. Likewise, to calculate the total years, if I select that step, you can see that it took the duration's total days and then divided it by 365 to calculate the age. Here you can see that this is not paying attention to leap years or anything; irrespective of what the year is, it's always dividing by 365. Having the formula bar is a great way to get a sneak peek into what's happening behind the scenes, and I like to keep it on for my own amusement. Let's conclude this lesson by adding one more calculation: given the FTE of an employee, I want to add a column that tells me whether they are a full-time employee or a part-time employee. This kind of calculation is called a conditional calculation, because if their FTE is one, then they are full-time; anything less than one is part-time. We can use the Add Column, Conditional Column option to introduce such a calculation. So Add Column, Conditional Column. From here, specify the column name.
So this is my Full/Part column; that is the name of the column. Then we build the condition using this particular screen. So if the FTE column (and here it says equals, less than, greater than, etcetera) is less than one, meaning their full-time-equivalent value is under one, then they are a part-time employee. You can build a ladder of conditions here; for now, we'll just do if-else: if they are under one, they're part-time, else they are full-time. Then we'll click OK, and it will introduce a Full/Part calculation here as a column for me, tagging the employees as part-time or full-time. This kind of thing is very useful if I want to do some analysis on the employees' nature downstream. Now we have five cleanup steps: splitting the employee's cost center and ID number, cleaning the name, removing the null department, adding their tenure, and adding their nature of work, whether full-time or part-time. Let's go and load this back to Excel so that we can see what the loaded data really looks like. Before we do that, note that it's called Table1. Table1 is not a very good name, so I'm going to rename my query. You can just select the name option here and then type the name. I'll call this Staff and hit Enter. So now the query will be called Staff. From Home, there is a big Close & Load button. For now, we will click that; in the subsequent videos, I'll tell you how to use the other loading options as well. So we'll say Close & Load. What this will do is load up a green-colored table (I think this is the default color that Power Query chooses all the time) into Excel in a new tab. The tab is called Staff, and the table itself that it creates is also called Staff. This green table here is the cleaner version of your data. Your original data is still here in the data tab. Notice that when you created the Power Query, it turned that into a tabular format.
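The full-time/part-time conditional column above is a single if/else on FTE. As a pure-Python sketch (the label text is mine, and this mirrors the dialog's logic rather than being Power Query's own code):

```python
# Sketch of the Conditional Column: FTE under 1 means part-time, otherwise
# full-time, mirroring the if/else built in the dialog.
def full_or_part(fte: float) -> str:
    return "Part Time" if fte < 1 else "Full Time"

print(full_or_part(0.3), full_or_part(1.0))  # Part Time Full Time
```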
So this is my original data and that's my new data. You can see that here, Torrance Scalia's employee ID is PR00007. In the cleaner version, it got split into PR and 00007. Likewise, Torrance Scalia is at 0.3 FTE, and they joined on 13th of November 2020. So based on that information, we calculated their tenure and tagged them as a part-time employee as well, because they have a department that is listed here. But if I go here and look at some of these people, like Minerva, Stafford, and David, they will not be here in this data because they belong to the null department. So when the data got cleaned, they got excluded from the data. You can see from this filter here that we don't have the null department, nor will we have Minerva in the search results here. This is how the data gets cleaned if something changes. So, for example, we get the message from corporate that Minerva is not a null-department employee; she moved to Legal. I change my data. Likewise, for Torrance, we'll update their FTE from 0.3 to 0.6. Actually, I'm going to make it one, and we will see that they become full-time. We made two changes: we made Torrance a full-time employee and we moved Minerva to the Legal department. To fetch the corrected data again, you just have to right-click and then refresh the data. You can do it in multiple ways: you can right-click and refresh, you can go to Data and then Refresh, or if you have the Queries & Connections panel shown on the right-hand side, you can also right-click on the query there and refresh. Irrespective of how you do it, you're doing the same process. So we'll refresh this, and boom, we get Minerva here, we get Torrance here with FTE 1, and they become a full-time employee. There is no change in their tenure because we're still rerunning the process on 31st of March.
But if I rerun this again tomorrow or the day after, I'm going to see completely different values for that as well. So that is the first video, where we loaded the data into Power Query, did five cleanup tasks, and then loaded it back to Excel. Let's move on to the next one.

3. Adding Columns in Power Query: In the previous lesson, we took our staff data and applied five data cleanup steps: splitting the employee ID into cost center and ID number, removing spaces from the Name column, taking out any people who are in the null department, calculating their tenure based on the start date, and figuring out from the FTE whether they are a full-time or part-time employee. This is the data that we generated after cleaning with Power Query. In this lesson, let's continue the cleanup process and add a few more steps to it. Now that we're here in Excel, how do we go back to Power Query? You can do this in a few different ways. Number one, you can go to the Data ribbon, and from here you have got Queries & Connections. If you click on that, you will see all the queries in your workbook. So this is my Staff query; right now we only have one, but potentially you could have multiple queries as well. When you look at a particular query, you can right-click and then say Edit. The second option is you can click on the table, the green table that Power Query has generated. Whenever you click inside this kind of Power Query table, you will see that a Query ribbon appears at the top, and if you go there, you have got a nice big Edit button as well. You can use this to get back into our query. Let's edit this and start doing a bit more cleanup of the data. Our first cleanup step in this video is going to be on the Salary column. Notice that the Salary column has got some null values.
Anybody who has got a null salary, this is because of a particular HR policy: anyone who's getting paid exactly $45 thousand comes up as a null. Don't ask me why; it's just a made-up scenario here. We want to replace the nulls with 45 thousand, so that should be their number. But before we do that, let's quickly understand what these green bars at the top mean. These green bars tell us the column data quality. If a bar is fully green, that means the column has got data all the way through. But if some columns have got some null values, like here where I have got null values, the green bar is not all the way full. When you hover, it tells me that 96% of rows have values and 4% are empty. It also gives me an option to remove the empty rows. We don't want to remove anything empty; we just want to deal with the salary problem here. So here, within the salary, for example, 40 values are empty. It would also highlight if there are any errors; right now we don't have any errors, but if there is an error, it will highlight that as well. Let's go and add a rule that says if the salary is null, then the value should be 45 thousand. This kind of process is called replacement. So all you have to do is right-click on the column and then use the Replace Values option. You might be thinking, why can't I just select this and type over the data? You are not allowed to do that kind of thing within Power Query. Power Query is a rule-based engine, so every cleanup step that you do must be a rule. So I'm going to right-click and then choose Replace Values. The value to find is null; we want to replace it with 45,000. A note of caution here: Power Query is case sensitive. So if you type null exactly the way it appears, only then will it work. If I were to type this in capital letters, even though to our eyes they are the same, this is a different value and that's not going to work.
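The Replace Values rule above can be sketched in pure Python, where `None` plays the role of Power Query's lowercase `null` (this is an analogue of the rule, not the generated M code):

```python
# Sketch of the Replace Values rule: a null salary becomes 45,000.
def fill_salary(salary):
    return 45_000 if salary is None else salary

print([fill_salary(s) for s in (None, 52_000)])  # [45000, 52000]
```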
So we just say null, 45 thousand, and then click OK. Then it will add a rule that says if someone's salary is null, it becomes 45 thousand. Now our Salary column is green all the way through. Our next cleanup scenario: given the employee name, I want to extract just their first name and print it in a different column. We could then use that for maybe sending out a letter or whatever, where you just want to print their first name. This is a case where I want to extract everything up to the first space; that's the first name. You could do this in a couple of different ways. Number one, you can select the column and then, from Add Column, use Column From Examples. This way, you let Power Query figure out what the logic for such an operation is. Think of Column From Examples as the Flash Fill of Excel, but here, with Column From Examples, it is repeatable. That means if your data changes and you rerun the process, it will extract the first names for the new data as well. We'll do this Column From Examples, and from here I'm just going to type the first name, Torrance. At this point, it has kind of predicted what we're going for, and it shows all the other values that it's going to generate if the first value is to be Torrance. Let's read the formula as well, to see what formula it is using: it's saying Text.BeforeDelimiter, looking at the Name column, and the delimiter is a space. That's the actual M language formula that Power Query is using here. When you are happy with this, you click OK, and it gets added as a new column called Text Before Delimiter. I'm just going to double-click on this and rename it First Name; that's the name of this column. And we will get the employee first name here. It will still keep your original Name column, but it adds a new column called First Name.
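The `Text.BeforeDelimiter` formula that Column From Examples produced has a simple equivalent in most languages. A pure-Python sketch of the same rule (keep everything before the first space):

```python
# Sketch of Text.BeforeDelimiter([Name], " "): everything before the first
# space is the first name.
def first_name(name: str) -> str:
    return name.split(" ", 1)[0]

print(first_name("Torrance Scalia"))  # Torrance
```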
Now that we added this, I want to see this first name right next to the full name, not all the way over here. One way to move this column is to click on the column, hold your mouse, and just move it right next to the Name column. This will rearrange the columns, with the name here and the first name here. If you look at the M language code here, it says Table.ReorderColumns. We'll continue our discussion with the Name column again. Given the name, I want to rewrite it as, for example (Torrance Scalia is their name), Scalia, comma, Torrance, and likewise for the other names. This is how, in certain situations, you want to have last name, comma, first name as an option. We will make another column, Name2, and that's where this kind of rewording will be done. Again, we could use the Column From Examples option, but because we have already done that, I'm going to show you a different technique. This different technique involves, from the Add Column ribbon, the Extract option. From Extract, I have got the Text Before Delimiter and Text After Delimiter options. I'm going to use these to map out the name into two different columns. We already have the First Name column, so we'll use the first name, just generate the last name, and then combine them. So: Extract, Text After Delimiter. The delimiter is a space, so I'm going to just press space here and click OK. This is going to extract the last names and put them all the way at the end; the column is called Text After Delimiter, and it will have these last names printed here. Notice that some of these last names have some extra space and things like that as well. This is because when the name was entered into the system, it had multiple spaces in the middle, and those spaces are not removed by Trim; that's why those spaces are coming up here. Now that the Text After Delimiter column is here, I can right-click and quickly trim this as well.
So those excess spaces are gone. Now we have got a last name here and a first name here. What we want to do next is take the Text After Delimiter, place a comma, and then make these into the final column. Here we will use the Custom Column option. We have used the Conditional Column, we have applied Column From Examples; now we'll try the Custom Column. This is where it opens a screen asking you to write the M language yourself. This is my Name2 column. Here, we'll pick the Text After Delimiter (that's the last name), then ampersand, then within double quotes a comma and a space, and then we'll pick the First Name. This is the M language formula to introduce a new name that takes two columns and puts a comma and a space in the middle. Click OK, and this will give you Scalia, Torrance and so on for the other names. Now that this column is generated, we no longer need the Text After Delimiter column here; we only used it to get there. At this point, you can right-click on Text After Delimiter and remove it. Or if you want to keep it, you can keep it and rename it Last Name. I'm just going to remove it. Now we have got a Name2, and again we will move it right next to the Name. We have Name, Name2, and then First Name here, all nicely listed. Our last transformation, or our last data cleanup in this particular video, is going to look at the start date. We know how long an employee has been with us, but some of the employees have left us as well. So, for example, we have a date of termination telling us that as of September 24th, 2021, this particular employee, Ali, has left the organization. Based on the date of termination, I want to know if an employee is a current employee or not. We will have an Active indicator column. First step: you can see that Date of Termination is ABC123, so I'm just going to right-click, Change Type, Date. This way, you can have a date or null. And again, the column quality tells me that only 80 people have a date of termination; 892 are empty.
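The Name2 custom column built earlier in this lesson (last name, comma, first name, with a trim on the extracted last name) can be sketched in pure Python; this is an analogue of that combination, not the M formula itself:

```python
# Sketch of the Name2 custom column: last name & ", " & first name, with a
# strip() standing in for the extra Trim applied to the extracted last name.
def last_comma_first(name: str) -> str:
    first, _, last = name.partition(" ")
    return f"{last.strip()}, {first}"

print(last_comma_first("Torrance Scalia"))  # Scalia, Torrance
```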
That means they are all still current employees. Based on this, I want to add a column that tells me whether they are current or not. This kind of a thing is perfect for the Conditional Column. So we'll just use Add Column, Conditional Column, and then say: if Date of Termination equals (and here you can just type null), the output is Yes, meaning they're an active employee; No means they're no longer active. Click OK. Anyone who left will have No there, because they will have a date of termination. That concludes this particular video, but before you vanish, I have got a homework assignment for you. The homework assignment is on the Salary column. I want you to look at the Salary column and create a salary group as a new column. The salary group logic is simple. If you observe the Salary column, our salaries go from $28 thousand all the way up to $119 thousand, so let's call it up to $120 thousand. We want to group our employees into four buckets: anybody under 50 thousand, anybody under 80 thousand, anybody under 100 thousand, and then more than 100 thousand. So those are the four groups: under 50, 50 to 80, 80 to 100, and more than 100. Based on the salary, you need to introduce an extra column here called Salary Group, where such a value can be maintained. I leave it to your imagination how to do this, but if you have some trouble with it, do watch the homework solution video that explains the process. For now, I'm just going to close this particular one by clicking on Close & Load. This is just going to update our green table with those extra calculations and extra name columns, so you can see that the original name is here, then Name2, First Name, and all of these other things as well. So if I want to update someone's termination date in my original data, for example, Torrance decided to leave on 31st of March 2022, we put their termination date here, come here, right-click, and then Refresh. Immediately.
They will be marked as no longer active, and the date of termination will appear here as well. So good luck with your homework. I'll catch you in the next video.

4. Homework Solution: To add the salary grouping as a column, you can use the Conditional Column option and build that kind of a ladder. So we'll use the Conditional Column. Here, the column name is Salary Group. If my salary is less than 50 thousand, then the output would be "< 50K". Then we will add one more clause: if salary is less than 80 thousand, then the output would be "50-80K". And a third clause: if salary is less than 100 thousand, then it would be "80-100K". We don't need one more clause after that; we can just use the else clause, and then we will say "> 100K". So those are the outputs. You can type literally anything here; you could, for example, have labels like low salary, medium, high, very high salary, or whatever. When you finish doing that and click OK, it will add those salary groupings here for you. This is an excellent way to bring in some extra values based on business rules and then use them in Excel. You could have built this kind of a column in Excel as well, but because this is a Power Query video, I'm teaching how to do it within Power Query. If you change the conditions, let's say our criteria are no longer 50/80 but 50/75, then you can use the cog button here. I can click on it, and it will have those boundaries here, and I can simply change the value. So here, from 80, I'm going to change this to 75 thousand, and rename the values here as "50-75K" and "75-100K", like that. Make sure to adjust all the things that need to be adjusted whenever these kinds of things change, and when you click OK, that will update.
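The conditional-column ladder above checks its clauses top to bottom and the first match wins. A pure-Python sketch of the same ladder, using the homework's thresholds (label text is mine):

```python
# Sketch of the Salary Group conditional column: clauses checked in order,
# first match wins, with a final else clause, just like the dialog.
def salary_group(salary: float) -> str:
    if salary < 50_000:
        return "< 50K"
    if salary < 80_000:
        return "50-80K"
    if salary < 100_000:
        return "80-100K"
    return "> 100K"

print([salary_group(s) for s in (28_000, 60_000, 95_000, 119_000)])
# ['< 50K', '50-80K', '80-100K', '> 100K']
```

Changing a boundary (say 80 to 75) means editing one threshold and its label, the same adjustment the cog button or formula bar makes in Power Query.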
Another way to do these kinds of changes is, instead of using the cog button: if you have got the formula bar visible on the screen, you can directly edit the items in the formula bar as well. Maybe our lower range is not 50 thousand but 45 thousand. So I can just type 45 thousand here and then rename my labels as well, to "< 45K", and that's going to adjust and update as well. Whenever you finish typing the formula, press Enter and that's just going to change those things too.

5. Merging Tables and Filtering Data in Power Query: So far we have cleaned up the data within one dataset. In this lesson, I'm going to introduce another dataset and show you some more techniques. For the purpose of this lesson, I'm going to assume that each of our employees is going to receive a bonus based on their salary. The bonus percentage is decided based on the department. So if you are in the Training department, let's say you receive a 2% bonus, but in the Human Resources department you might receive a 3% bonus. To make all of this happen, we have got to follow a specific policy. The policy is that the bonus is only awarded to employees who are currently active and who are not temporary. So they need to be the permanent or fixed-term staff who are active. Let's go into Power Query and do that process, where we take out anybody who is a temporary or an inactive employee, and then bring in the bonus details as well. If you go to the Query Editor — this query in here — let's go ahead and remove anyone who is not active. So I'll select this and then say I don't want any inactive employees. Likewise, we will come to the employee type and uncheck the temporary. At this point, we have reduced our employees to just the permanent and fixed-term staff who are active. Now we need to calculate the bonus based on their salary. For the purpose of this, we do have a bonus mapping table. Let me quickly load this back so that I can show you the bonus mapping table. Here is my bonus mapping table.
It tells me, by each department, what percentage of bonus we are going to give. We need to bring this data into the dataset that we already have and then combine the two to generate the bonus calculations. You could, for example, Control+C this, go here, and then paste the data. But because the data is in another file, we can directly connect to it as well. Instead of copy-pasting, we'll use Power Query to first bring in the data and then combine it with this data. Go to Data and then get data from file, from Excel workbook. This is because my bonus mapping table is in an Excel file. If your bonus mapping is in a text file, or it is in a database, you can use those options. So I'll just use From Workbook and then point to my bonus mapping file. I have a more advanced version of the bonus mapping problem in the bonus mapping 2 file, which we will talk about in the next video. This will show you a Navigator screen asking you to pick what data you want. I want my bonus mapping table, so I'll select it. We don't need to just load this table; instead we need to calculate the bonuses. So we'll get into Transform Data, and this will load up the bonus table into my Power Query. You can see that our staff table is also there. Now, my bonus table tells me what the bonus is for each department. In the staff table, I would like to see what department each person belongs to. If I can get the bonus percentage right next to department, then I can take the percentage and multiply it with salary to get the bonus value as well. This kind of operation is called merging two tables, where I want to match this table on the department column with the bonus table here. To do this, you can use the Home ribbon's Merge Queries option. This will open a dialog asking which columns you want to match. So on the staff table, I want to match Department with my bonus table's Department. This is going to quickly provide you feedback here, saying that it is able to match up all the rows here with those.
Then just click OK. In here we will have the associated bonus table for each employee. If I click on this table, I can see that, because that person is in Human Resources, their bonus rate is there; for this person in Product Management, it is 0.05, or 5%. We don't need the whole table; we need to just extract the bonus value. To extract the value, you need to click on this sideways-arrows expand button, uncheck Department, keep Bonus, and also uncheck the option "Use original column name as prefix". This way we will just get a column that says Bonus. Boom — we get the bonus value. The bonus value shows up here as a decimal number, so I'm just going to right-click here and change type to percentage. This will show me what the bonus percentage is. That solves only part of the problem. We do get the bonus percentage, but what we need is actually the bonus value. To get the bonus value, we need to take the salary and multiply it with the bonus percentage. To do this, select the salary column first, hold down your Control key, then go and pick the bonus column. This way you have selected both of the columns. Now from Add Column, you can do an arithmetic operation of multiplication: Add Column, Standard, Multiply. This is going to create a multiplication of those two numbers, which is salary times bonus, and add the result as a number here. We can just change this back to currency if you want to, and then you will see the multiplication result as a currency value showing how much bonus each person receives. I'm going to rename this as my Bonus Amount. Now that the bonuses are calculated, our next job as part of the data analysis is to split these bonus calculation values into two tables: one for all our employees in USA locations and another for all the employees in New Zealand locations. Let's do that bit.
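Before moving on, here is what the merge-and-multiply just described looks like in M — a sketch, assuming the two queries are named Staff and Bonus and the matched column is Department (step names are hypothetical):

```m
// Merge on Department, pull in the Bonus rate, then compute Salary * Bonus.
#"Merged Queries" = Table.NestedJoin(
    Staff, {"Department"}, Bonus, {"Department"}, "Bonus", JoinKind.LeftOuter),
#"Expanded Bonus" = Table.ExpandTableColumn(
    #"Merged Queries", "Bonus", {"Bonus"}, {"Bonus"}),
#"Bonus Amount" = Table.AddColumn(
    #"Expanded Bonus", "Bonus Amount", each [Salary] * [Bonus], type number)
```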
What we want is this: instead of loading the staff table as a single item — which is what we will get if I load this now with Close and Load, just the staff table updated with the bonus values here, plus the bonus table here — we want something else. This current way of loading the data is not what we want. We don't want a single staff table; we want two separate staff tables, one for all the USA locations, that is Bellevue and Los Angeles, and one for the New Zealand locations, of which there is only one, Wellington, New Zealand. So I'm going to go back to my Edit Query. First step, let's split this data. To split the data, you have two techniques. Number one is we can create an exact replica of the query. This is called duplicating. To duplicate a query like the staff query, I can right-click on it and then say Duplicate. I'll get Staff (2). Staff and Staff (2) are exactly the same queries. If I go to Staff, you can see that it has all these steps; Staff (2) will also have all these steps. They both begin from the same Excel file and apply all the cleanup steps to come to the final stage. Then in each table I can go ahead and do the necessary bits. So for example here, I'll say that I want to just see my USA locations. In Staff (2), I will uncheck Wellington, New Zealand, and then we will call this Staff-USA. Then we can go to the original staff table, go to the location, and uncheck the USA locations so that only the New Zealand location remains here, and rename this as Staff-NZ. This is one approach. The problem with this approach is that both queries will need to run. Staff-NZ needs to run all these steps, and Staff-USA will also run all these steps. This is kind of duplicating the work, and that is why it is called a duplicate. There is also another option called reference.
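The two duplicated queries end up identical except for their final filter step. In M, each filter is a Table.SelectRows call — roughly like this, assuming the location label reads "Wellington, New Zealand" and a hypothetical previous step name:

```m
// In Staff-NZ: keep only the Wellington rows.
#"Filtered NZ" = Table.SelectRows(#"Previous Step",
    each [Location] = "Wellington, New Zealand")

// In Staff-USA: keep everything except Wellington.
#"Filtered USA" = Table.SelectRows(#"Previous Step",
    each [Location] <> "Wellington, New Zealand")
```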
First, let's just use the duplicate option, load up the data, and then I'll come back and explain to you how to use the reference queries. Before we load the data, we also want to make sure that the bonus table doesn't get loaded into Excel. We don't need to see the bonus table on the screen; we just used it to do the multiplications. So instead of Close and Load, we can just use the Close and Load To option. For now, I'm just going to say Only Create Connection and then click OK. At this point we will have all three queries listed here. Staff-USA is only a connection, but Staff-NZ and the bonus table are loaded. Now, we don't want the bonus table, so I'm right-clicking on the bonus table and going to Load To. Here, I'll just say that this should be only a connection — we don't want any table — and we'll click OK. This is going to give me a warning saying possible data loss. This is because the data is already loaded. That is fine with me, so I'm just going to click OK, and the bonus table is now just a connection. I can delete this worksheet. And then we have Staff-NZ already loaded. Now I'm just going to right-click and say Staff-USA should also be a table in a new worksheet. Boom — we get two of the queries loaded as tables, and the other query just maintained as a connection. As I mentioned earlier, while this was all good, both the Staff-NZ and Staff-USA queries need to run all the steps. This is where the reference query comes into the picture. Let us see how to use the reference query. For the purpose of this, I have created a separate file, staff data 3 reference query. Notice that this staff table has all my people, not just the New Zealand staff — here, if I go to location, I can see everybody. Let's right-click and edit this query. Here we will leave the original staff table as it is, right-click on it, and make a reference query.
My original staff table is the master query, and then we're going to build a separate query called Staff-NZ. Notice that Staff-NZ, because it is a reference query, has only a single step. The Source step simply says that this query is the same as the Staff table. This way, the Staff-NZ query will not run until the Staff table has finished its processing, and then it just reuses those values within this query. I'm now going to the location and saying this is only the Wellington staff. That is Staff-NZ. We will make another reference to the original Staff — right-click and then Reference — and this one we will call Staff-USA. Here we will apply the same kind of filtering, the other way around. We now have four queries: the original query, my bonus table, and then two reference queries, Staff-NZ and Staff-USA. I'm going to do one other thing on both of my Staff-NZ and Staff-USA queries. Let's say for the purpose of bonus analysis, we don't need all of these columns; we only need certain columns. We can go to Staff-NZ to begin with, and then from Home, Choose Columns, and tell Power Query which columns we want to see in the final output. I will deselect everything and keep Cost Center, Employee Number, and then Name2, which has my last name comma first name, Gender, Department, Salary — we don't need any of these other things — maybe Employee Type and Location, and Bonus Amount. Those are the only columns that we want, and this is going to give me that output. Now I want to repeat the same exact step on Staff-USA as well. You could go to Staff-USA and again use the Choose Columns thing, but here is a little sneaky trick. If you have got your formula bar, this is something that you can readily use. To begin, go to the formula bar, making sure that you have selected the Removed Other Columns step in this query. Then you can see the M language code that Power Query has generated to keep those columns.
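That generated step looks roughly like this (the column list is taken from the lesson, and the previous step name is an assumption; yours may differ slightly):

```m
// Keep only the columns needed for the bonus analysis.
= Table.SelectColumns(#"Filtered Rows",
    {"Cost Center", "Employee Number", "Name2", "Gender", "Department",
     "Salary", "Employee Type", "Location", "Bonus Amount"})
```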
So it says Table.SelectColumns, the previous step, and then these are the only columns I want. I'm just going to select all of this formula, Control+C to copy it, and then I'll go to my Staff-USA table. Here we'll click on the fx button. This is going to add a step, and in this step we will paste that formula. Make sure that there's no extra equals sign or anything, and then hit Enter. Boom — my Staff-USA table also has the same columns. A quick note of caution, though: this method will not work if both queries do not have exactly the same steps in the same order. That is all. Now I'm just going to say Close and Load To. For now we will just create a connection, and then I'm going to load up this data: Staff-NZ should become a table in a new worksheet, and likewise Staff-USA should also become a table in a new worksheet. We get both of our cuts of data, Staff-USA and Staff-NZ, here.

6. Automating Data Combine & Clean-up: So far we have only worked with data that is in one place altogether. In this video, let's talk about how to deal with scenarios where your data could be split in different ways. First example: instead of having all the staff data in a single tab in one continuous set, I have got individual tabs — one for Wellington, one for Los Angeles, and one for Bellevue data. We would like to combine everything into one big staff table and present that in the output file. For this example, I'm going to treat my staff append.xlsx file as the master data file. We will open a blank workbook, and in this workbook we will combine all three individual tabs of data using the append method. In the subsequent parts of this video, I will show you other techniques for combining the data. Depending on how your data is structured and available, you can use any of these individual techniques as you see fit. For now, I'm just going to close my staff append file and go back to the blank file, and from here, say Data.
I want to get the data from a file, which happens to be an Excel workbook. Even though I'm demonstrating the technique with an Excel file, I want to remind you that whatever you are learning, you can apply it to database tables or cloud solutions like Azure or other things. We will go to From Excel Workbook and select the staff append file — these other files are the ones that we will use later on in the video — and then say Import. This is going to show me that there are three individual worksheets in the file and ask which one I want to bring in. Now, we need to get all of this data and combine it, so I will use the "Select multiple items" option and check all three tabs. We don't need to load this data; we need to transform it, because this is not how I want it — I want to combine everything into one big table. We'll get into the Transform Data option first. This opens the familiar Power Query Editor screen with three queries: one for Bellevue, one for Los Angeles, and one for Wellington. All of these tables look good, but there are a couple of problems. Number one, my header row is all wrong. It is saying Column1, 2, 3, 4, 5 instead of using the row number one values. Notice that, depending on how you have set up your own Power Query preferences, this particular behavior might be slightly different for you. If you already see Employee ID in the column header, you don't need to do this next step; otherwise you will have to do it. We want to take row number one here and make that the header of this table. To do this, we can go to the Home ribbon and then use the Use First Row as Headers option right there. This is going to promote the Employee ID header. We'll repeat that on the Los Angeles table and then the Wellington table as well. Now all three tables have the data with proper headers. At this point, I just want to combine all three tables into one big table, and this is where the Append Queries functionality comes into the picture. We will use Append Queries.
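Behind the ribbon button, an append is a single M library call. A sketch of what Append Queries as New generates, assuming the three queries keep their sheet names:

```m
// Append is one Table.Combine over a list of tables;
// columns are matched by name, not by position.
Source = Table.Combine({Wellington, #"Los Angeles", Bellevue})
```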
If you click it directly, what it's going to do is append the other two datasets into the Wellington dataset. Instead of doing that, we want to create a brand-new query that holds the combination of these three tables. I'll use the dropdown and select Append Queries as New. This will ask me how I want to append — whether I want to append two tables, or three or more tables. Irrespective of which option you pick, the process is similar, but in this case we do have three tables, so I'll select the "Three or more tables" option. Wellington is already added, so I'm going to add my Los Angeles and Bellevue. Then, when you click OK, magically all the data from these three individual tables is combined into one giant table that has everything in one place. It still has the other problems — the gender nulls are there, the null departments are there, and all of that — but at least it has the data in one place. At this point, you can go ahead and apply those cleanup steps, but I'm just going to rename this query to Combined Data. Then click on Close and Load To. This option will make a connection for everything first, so this is going to show four connections. Then right-click on Combined Data and change the load behavior for that to a table, so that only the combined data is loaded up here. The individual tables are not loaded — just the combined table. As I mentioned, you can go ahead and apply the additional data cleanup steps on this for your gender, department, name corrections, and all of that if you want. So that is how the combine, or append, method works. You might have a question about the append process. Let's just open the file first so that we can look at this. Notice the order of the columns here: everything has employee ID, name, gender, department, and ends with the location, in all the tabs. You might think, what if my order of the columns is different?
It doesn't matter which order these columns are in. As long as all the columns are present in all the tables, the order doesn't matter — Power Query will rearrange it. To prove this point, I'm going to go to my Los Angeles table, select the manager column, and move it, so that location is in the middle and manager is at the end. For my Bellevue tab, I'm going to move the gender column all the way to the end. The column order is now kind of jumbled up in the original data file. I'll save this file, we go back here, right-click, Refresh — and no errors, nothing. It is still going to work. It is going to give you all the data as it is in the original files; it's just that Power Query will automatically rearrange it. Now comes the next question: what if the columns don't match? Imagine, for example, here in my Bellevue data tab, instead of Department, I'm going to call this Group — we have a different name here. Now let's see what happens. I'll save this, we'll come back here, click Refresh, and this is going to add an extra column called Group. It's going to keep Department, but it will introduce a new column and put it at the end. That Group column will have values for just the Bellevue people, and those Bellevue people won't have a Department. So if the column names don't match, then we have a problem. But if the column names match, the ordering of the columns doesn't matter — Power Query will automatically adjust these things. This is one quick thing that you need to keep in mind: whenever you are appending, you need to make sure that the column names do match. So let's go ahead and fix this problem. I'll come back here, I'll just say Department again, save this file, go back here, right-click, and Refresh. And boom, that extra column is now gone and the data is all sitting here, nice and pretty. That's the first thing to note: when we combine using the append method, we need to know upfront how many tables we are appending.
That's not always possible in some business situations. To give you a sense of that, let me show you another example. Here I have got a different file. Instead of three locations, we now have extra employees coming in from our Columbus office as well. Now, I want to combine all these tabs into one big table. The challenge here is that, while we can use the append method, this file may have some extra tabs added later, and your query will not work if a new tab comes in. We need to think about this problem in a different light: I want to take all this data and combine it, so that irrespective of how many tabs there are, everything gets combined. Let me show you how to do this. Again, we will treat the staff by location file as the master data file. We will insert a blank workbook, and here we will combine the data. In this workbook, I'm going to go to my Data, get the data from file, from workbook, and select the staff by location file. Again, you might be tempted to select all four items, but instead just pick one of them — it doesn't matter which one. I'm just going to pick my Bellevue for now and then go to Transform Data. This is just going to show me only the Bellevue tab data. At this point, we do need all four tabs, not just the Bellevue one. So I'm going to delete the Navigation step and go to just the Source step. We will delete the step using the X icon right there — if you click on it, that step is gone. At the Source level, you will see that Power Query has found four different worksheets: not just Bellevue, but Wellington, Los Angeles, Columbus, and Bellevue. It also found a filter database item, which is not required. So what I will do first is filter the Kind column to just Sheet, so that we are looking at all the individual sheets in the staff by location workbook. At this point, all the data of the individual tabs can be accessed through the Table item here.
If I click on this, you can see that all my Wellington data shows up at the bottom; likewise my Los Angeles, Columbus, and Bellevue data here. All we have to do is get this data and expand it. For that, we will just select Name, hold down Control, select Data — so we only select these two columns — and then right-click and say Remove Other Columns. We don't need those other columns, just these two. Now that these two columns are there, I'm going to expand this data — just say columns 1 to 11. The column names don't really look correct, but we will have to fix that problem next. When you click OK, you're going to get all the data for each of the worksheets here. The Wellington data — my employee ID, name, gender, department, etc. — is coming up here; that's the header. Like that, there will be another header farther down, at the Los Angeles rows: as soon as the Los Angeles data begins, we've got one more header, and this is going to continue like that. There are some null values as well, which is probably some blank rows that it is fetching in, so we will have to delete all of those things. We're going to make this particular row a header. At this point it depends on how your data is; in my data, I do have the location, Wellington, New Zealand, tagged here, so I don't need this Name column, and I'm going to delete this column as well. Then, with only this left, select from the Home ribbon Use First Row as Headers. Now that the headers are done, we will have to remove any extra occurrences of the header. Because there will be another "Employee ID" row farther down in the Los Angeles data, and so on, there are going to be multiple "Employee ID" rows, one for each worksheet. Here, all we have to do, using the filter, is uncheck the "Employee ID" value. Likewise, I'm going to uncheck the null values, because if there are any blank rows brought in, we don't need those as well. Those filters are going to clean up the data, and at this point this is my completed data.
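Written out by hand, the whole sheet-combine above is only a few M steps. A sketch, assuming a hypothetical file path and an "Employee ID" first column:

```m
let
    // Read the workbook and keep only real sheets (drops FilterDatabase items).
    Source = Excel.Workbook(
        File.Contents("C:\Data\staff by location.xlsx"), null, true),
    Sheets = Table.SelectRows(Source, each [Kind] = "Sheet"),
    // Expand every sheet's table; raw sheets share Column1..ColumnN names.
    Expanded = Table.ExpandTableColumn(
        Table.SelectColumns(Sheets, {"Data"}), "Data",
        Table.ColumnNames(Sheets{0}[Data])),
    // Promote the first row, then drop repeated header rows and blanks.
    Promoted = Table.PromoteHeaders(Expanded, [PromoteAllScalars = true]),
    Cleaned = Table.SelectRows(Promoted,
        each [Employee ID] <> "Employee ID" and [Employee ID] <> null)
in
    Cleaned
```

This only works when every sheet has the same column order, which is the same caveat the lesson calls out.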
So we will just say "all data" as the name of the query, and then you can go ahead and load it, or you can apply some additional data cleanup steps. For example, we're going to remove anybody in the null department. Right-click on the gender, Replace Values: null with Other. Likewise, you can also use date of termination to tag an employee as active or not, use the FTE to figure out whether someone is full-time or part-time, and replace the salary nulls with the 45 thousand figure if you want. I'll leave all of those things for you as extra homework problems. Now that all the data is combined, I can load this. Before I load, I just want to quickly explain again what we have done. We started from the Excel file, then we filtered to just the sheets, and then we expanded all the data. We get one dataset that combines the individual sheet data in one go. Now, for this particular technique to work, the column order should match in all the sheets. If the columns are out of order, then this is not going to work very well. This is because when we are expanding, we are only combining column one of each sheet into one column — there are no headers at that point, so it's not going to automatically match the columns. This is why the column order should match when you are using this particular technique. And finally, this is my output table, ready. I can just Close and Load this, and it's going to load up all the data into one table nicely. Now let's see what happens if you introduce another branch as a new tab in the original file. I'll go to my staff by location file. Here I have introduced the Auckland branch data — these are just some of the people from my Bellevue branch that I moved into Auckland. All of them have their location, and we now have an extra worksheet called Auckland. I'll save this and then close the file.
Come here, make a note of all the locations — so we have only four locations. Now right-click, Refresh, and this is going to bring those extra rows in; you can see the number of rows has changed. And if you go here, you can see that we now have the Auckland people also added. They should probably be all the way at the end — like here, all my Auckland people are listed nicely. Appending, as well as using that expand method, are two ways to combine the data when you have data in the same file. But many times your data may not even be in the same file. So for example, in here, instead of having the data in one file like that, I might have individual files, one per location: a Bellevue CSV file, a Los Angeles CSV file, and a Wellington CSV file, each file giving me the data of the staff within that location. If the data is like this, how do we combine? That's our next topic. For this to work, we will open a blank workbook and go to Data, get data from file, and instead of using the Excel workbook option, we will use the From Folder option. This is a powerful automation technique built right into Power Query, using which I can take all the files in a folder and combine all of them into one giant dataset. So we'll say From Folder and select the folder that your data is in — here, my "location data" folder is the one — and then click Open. This is going to show you a screen where it says: I found three different files in the folder — Bellevue, Los Angeles, and Wellington. What do you want to do? You have an option to combine; you can also load or transform the data. Because we want to combine, we will use this option and say Combine and Transform Data. You can also combine and load directly, but I'm just going to use the transform option, because then we can go ahead and do a little bit more cleanup if needed. I'll do that. This will again open one more screen, and it says: here is the first workbook within your folder. It's going to show you a sample file.
It could be the first file, but you can also choose any one of the files. And it says: this is how the data is — are you happy with this? If so, I'll go ahead and get it. Everything looks nice and tidy for me here, so I'll just say OK. This is going to load up Power Query, do its magic, and combine all the data into one query. This particular query that we are watching right now has already combined all the data. It has my Bellevue data on the top, but if I keep scrolling, I'll find all my Los Angeles data there, and further down we will have our Wellington data as well. But let's just understand this, because Power Query will have added many other items on the left-hand side to facilitate this process. The way the folder automation works is that Power Query builds a whole bunch of helper queries to make it happen. Your final query will be in the Other Queries folder, and it'll be called "location data", or whatever the folder name is. To make this process work, it uses the sample file that you have seen on the earlier screen — it showed you one file and asked, is this what you want? So it will use that sample file and build a Transform Sample File query. This is the sample query, so it is only working on one file. Power Query builds the query for one file and then runs the same process on all the files. So if I want to change anything at the individual-file level, we just have to do it in the Transform Sample File query. For example, if I look here, I can see that my column headers are wrong. We could promote this row as a header here, but we would still have to delete the extra headers in the other files, which is a pain. We might as well just do that at the Transform Sample level. That way, Power Query is going to repeat that process on each of the files in the folder.
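Stripped of the helper queries, the folder combine amounts to reading every file in the folder and stacking the results. A minimal hand-written equivalent, assuming a hypothetical folder path and CSV contents:

```m
let
    // List every file in the folder; [Content] holds each file's binary.
    Source = Folder.Files("C:\Data\location data"),
    // Parse each file and promote its first row to headers.
    Parsed = Table.AddColumn(Source, "Data",
        each Table.PromoteHeaders(Csv.Document([Content]))),
    // Stack all parsed tables into one; new files are picked up on refresh.
    Combined = Table.Combine(Parsed[Data])
in
    Combined
```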
So we'll go to the Transform Sample and apply the Use First Row as Headers step here, so that this row becomes the header. Likewise, if we want to do anything else at an individual file level, we can do those processes here as well — like, for example, remove the null employees. Once we make the change on the Bellevue file, if I go to my location data query, you will see that it has done the same for all the other files. So while only the Bellevue data is what we cleaned, Power Query reuses that process and applies it to my Los Angeles file. In the Los Angeles data also, the null genders will be Other, and nobody will be in the null department anymore. Because there are still some blank departments, I'm going to uncheck those — those people are all gone. And that's pretty much it: our data is now clean and we can go ahead and load it. So again, we will just say Close and Load To and create a connection for now, then right-click on location data and load to a table. Only this query gets loaded; none of the other in-between queries are loaded — they are all maintained as connections. This is a very powerful automation technique. What it means is, if you add any extra files to that folder and refresh, this process is going to pick up those files and add them at the end. I'm not going to demo that, but I will leave it to you as an extra homework exercise. So there you go — three different ways to combine the data: one using the Append Queries option, one using the multiple-sheet combine option, and one using the folder combine option. Depending on how your data is, use one of these three techniques to consolidate the data and generate the bigger datasets that you want to work with.

7. Unpivoting (reshaping) Data: In this video, let's talk about one of the most powerful techniques in Power Query: the ability to unpivot data. Remember our bonus calculation example? We had bonuses defined as one bonus percentage per department. But what if you have bonuses defined like this?
Here I have got a matrix style of bonus allocation, where for each department and each category of staff, we know what percentage of bonus should be allocated. This is where we are going to cover two different techniques: number one, how to take this data and reshape it into a format that works best for doing the bonus calculation; and number two, how to match tables on multiple columns. Let's go ahead and do that. I'll close this file. This is one of the earlier workbooks — not the workbook from the folder combine or anything like that; it is the workbook from lesson number two. But it doesn't matter: you could use the original data as well and still do this kind of thing, because we do have the manager column as well as the department column, and both columns are required to figure out what percentage of bonus to award. Here I will just go to Data, get data from file, from Excel workbook, and point to my bonus mapping 2 file, which has the extra matrix-style bonus mapping information. Let's click Open. This is going to open up the Power Query Navigator and ask you what you want. There are a couple of different things: we have got our main bonus table, and there is also a named range called bonus matrix that comes up here. I'm just going to pick the table, because this is the correct format for our situation, and then get into Transform Data. This brings up the table here nicely. While this kind of table is very good for maintaining the information, it's not really good for calculating the bonus. If I go to my staff table, you notice that we have got our department column as well as the staff information column — what kind of person they are, whether they are a manager or not, the Manager column. We need to use these two columns to figure out what percentage of bonus each employee is getting.
The orientation of information here is: department has its own column, manager has its own column. Whereas in this matrix over here, department has its own column, but manager kind of goes across the screen. This kind of format is referred to as a pivoted format. It kind of looks like a pivot table. In order to map the data from this, we need to first unpivot the data. Power Query offers many powerful features to unpivot the data. To unpivot a dataset like this, what you want to do is click on the department column and then right-click. You need to have already selected the department column, and then say Unpivot Other Columns. This is going to keep the department column and then take the rest of it and unpivot it. You will see exactly what it does when we apply it. So it's going to split this into three columns. So your original table is now remapped to three different columns. One is the department, which is retained. The next one is the Attribute column, which will tell you what sort of person they are, and the Value column has what percentage bonus they're getting. All combinations are featured in here. Now that the formatting is right, I can go and map it out if I want. I can rename Attribute to Manager, question mark. But this is not necessary, because we won't be using this table directly; we will be directly calculating the bonus. Now that the unpivoting is done, let's go ahead and map it. Before we jump there, I just want to highlight the other unpivot options. So if I right-click here, you can see that we have got three kinds of unpivot. Depending on how your data is, you may want to try one of these unpivots, and they all work on the same principle, but depending on what you select, the unpivot outcome changes. In this case, Unpivot Other Columns was the option for us, so that we get to keep the department column and rearrange everything else. Let's go back here, and now we're going to use the Home, Merge Queries option.
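Under the hood, that Unpivot Other Columns step is a single M function call. A minimal sketch — the step and column names here follow our example, but the names Power Query generates in your query may differ:

```m
// Keep Department as-is; fold every other column into
// Attribute/Value pairs, one row per combination.
#"Unpivoted Columns" = Table.UnpivotOtherColumns(
    Source,           // the previous step
    {"Department"},   // column(s) to keep fixed
    "Attribute",      // gets the old column headers (staff category)
    "Value"           // gets the cell values (bonus percentage)
)
```

Table.Unpivot also exists, which takes an explicit list of columns to fold; Unpivot Other Columns is usually the safer choice when new category columns may be added to the matrix later.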
Pick the department column, hold down Control, and pick the manager column. The order in which you pick also matters: department is the first column, manager is the second column. We have mapped it like that. From my bonus table I'll pick Department, then Control, then Manager. Now both of these columns are mapped one-to-one. What Power Query will now do is take the combination — say, Training and Manager — and then give you the matching Training, Manager, 0.2 as the output. So it will take care of all the combinations and map them correctly. Once this is done, we'll click OK. We'll get a small bonus table for each row that tells me what the percentage for that person is. We'll just expand this and uncheck these options so we only get the value. We don't need to see the department and manager again. And there is my percentage of bonus. Now that the percentage is here, I can calculate the bonus value by selecting the salary column, holding down Control, and selecting the value column. Now that both columns are selected, we will go to Add Column, Standard, Multiply. Boom, we get our bonus information here that comes from the matrix data. Here I can just rename this as bonus $. Then, if I don't want this value anymore, I can remove this column as well. I'm just going to keep that value there so that we can refer to it when the data is sent to Excel. Now that our unpivoting is all done, I'm just going to say Close and Load, Load To, and just say Connection Only. This way, my bonus table doesn't get loaded here, but the original staff table now has these extra columns: Value and Bonus $. Value is my bonus percentage, and Bonus $ is my dollar information. If you want, you can also apply some additional formatting or rounding of the values within Power Query. Or once the values are in Excel, you can do whatever else you want with this data. This is one kind of unpivoting, and unpivoting is quite useful.
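For reference, the two steps above — a merge on two key columns and the Standard > Multiply column — generate M roughly like this. Query and step names such as Staff, BonusTable, and Expanded are placeholders for whatever your own queries are called:

```m
// Left-join on two columns at once; the key lists must be in the
// same order on both sides (Department first, then Manager).
Merged = Table.NestedJoin(
    Staff, {"Department", "Manager"},
    BonusTable, {"Department", "Manager"},
    "Bonus", JoinKind.LeftOuter
),
// After expanding only the Value (percentage) column from the
// nested Bonus table, Add Column > Standard > Multiply becomes:
WithBonus = Table.AddColumn(
    Expanded, "Bonus $", each [Salary] * [Value], type number
)
```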
So I thought we will take a look at another example of unpivot as the very last example of this course. For that I've got a separate file altogether. I'm just gonna save this and open that file. I call this "unpivot this". This is a good challenge. If you are feeling a bit adventurous, pause the video right here and go ahead and unpivot this data. But if you are still coming to terms with the whole unpivot concept, let's get into this particular example. We are doing a corporate games week at Awesome Chocolates. And this corporate games week is happening between 11th of April and 15th of April, the full week just before Easter. We have got some of our staff playing various kinds of games. So for example, on Monday 11th of April, we're playing cricket, and these are the people that are playing cricket in the corporate games week. On 12th of April, these are the people that are playing volleyball. These people are playing basketball, and these, tug of war. So we have got matrix style data. You can think of this like pivot table style data. The big problem is each cell itself contains multiple names. What if I wanted to ask a question like, how many games is Michael playing, or how many individuals are playing volleyball? If the same person plays multiple times, we don't want to count them again. For example, Phyllis is playing volleyball on Monday, and she's again playing on Wednesday and on Friday. So we don't want to count Phyllis thrice, we just want to count her once. How do you answer those questions? It becomes nearly impossible with this kind of format. Whereas if I have a tabular format with sport, date, and name — three columns — then we could do it. So, a regular format rather than a pivoted format. I'm going to select all of this data. I don't want to turn this into a table. But if I try to get this into Power Query as-is, it will say you need to have the data as a table.
So another alternative: if you are working with datasets like this, select the data and give it a name from the name box. You could either do it from here or from Formulas, Define Name. Either of those options works. I'm going to name this as my corporate games week names range. Once you have it as a named range, you can go to Data, From Table/Range, and Power Query will not ask you to make a table. The data comes in here nice and clean. We know that this row is the header row, so I'm just going to use Use First Row as Headers so that it becomes the header. Now that that is done, we'll keep the sport column and unpivot the rest. So right-click on it and Unpivot Other Columns. So we get sport, attribute, value. Attribute here is the date, so I'm just gonna rename it to Date. Then Value has my names. Notice that each cell has all the names comma separated. Here is a good thing: we have followed consistent formatting, so the names are always separated with a comma and then a space. So the delimiter technically is comma and space. If we could take each name and then spill it down — like cricket, 11th of April, Erin; cricket, 11th of April, Karen; David; Andy; like that, in separate rows — then our data is in good shape and we can go and analyze it. So all we have to do is right-click on the value column and then Split Column, By Delimiter. Here instead of comma, I'm just gonna say Custom and then type comma space. So that's the whole delimiter. Then by default, if I just click OK here, I'm gonna get individual columns, one per person. And this also makes it impossible for us to ask questions like who all are playing volleyball, because then we would have to deal with multiple columns. Some of them will have null values because we only have four people playing on that day. This is not going to cut it. I'll delete that step using the X mark, go back here, right-click, Split Column, By Delimiter, and then Custom, comma space as the delimiter.
Then we use the Advanced option, and instead of splitting into columns, we'll say split this into rows. That's it. This way what it will do is split that into individual rows, and everything is gonna work nice and beautiful. Then click OK. Your entire table gets expanded and you'll get a larger table, sourced from the original data. So you've got your sport, date and value. Rename the value column as Person. I'm going to right-click on the date and change type to Date as well — and that's coming up as an error. This is interesting. This is not something that I was counting on encountering in the video, nor planning to explain. But now that it is there, let's go ahead and see. I'm just gonna point at this error: "We couldn't parse the input provided as a Date value" — for 4/11/2022 12:00 AM. I guess the reasoning is this format is month, day, year, whereas my Power Query is probably looking for a day, month, year or year, month, day format. So it's not really happy with this format. We could go back and delete the Changed Type step, but I'm going to undo the change of type and show you another technique. We can right-click on the date and then go to Change Type, and instead of Date, I'm going to use this last option, Using Locale. Whenever you have dates that are not in your system's default format, it's a good idea to use the locale option, because that way you can tell Excel that you are dealing with data that is formatted in another country's default date format. Here my datatype is going to be Date. If I leave it here as English (United States), you can see that the default formatting of English (United States) is month, date, and year. So this is the formatting. I think this should work. Not working as well. I suspect because our date also has this time component. This was not part of the spreadsheet, but somehow Power Query is thinking it's at exact midnight.
That's really where things are going into a little bit of confusion. So maybe if we go to the Changed Type with Locale step, use that gear icon, and change the datatype from Date to Date/Time and click OK — that works. Here I now have a proper date and time value. We don't need the time, because the time has actually no meaning. So now that it is treated as a date, I can convert this back to Date alone, and that should do it. So we first take it as date and time and then push it to become a date, and that seems to make Power Query happy. Some of the date and time conversions are a little tricky depending on how your data is formatted. And this is where I find that Power Query gives you a little more power and ability to process the data than Excel alone. So understanding these nuances can go a long way in working with challenging datatypes that you will encounter from time to time. So, a good challenge which I was not even planning to talk about. Sport, date and person are now here. I can just Close and Load, and I'll get a nice little green table here that tells me who's playing what. Given this table, we can make a quick pivot table from it. So I'm just going to insert a pivot from this data right here — for example, to see who's playing what. If I pick the sport and then put in the person, I get the count of person. So this is how many people are playing each sport. You can see that, for example, volleyball is our most popular sport with 30 people playing. Next is tug of war. This counts duplicates, but you could also count the unique values or whatever you want. You could also, instead of sport, go and put Person and then count the number of sports they're playing. Jan is playing on five different occasions. Some of our people are playing nine times. And if I sort this, I can see Kelly and Stanley — some of these people are fairly athletic — and Andy is only playing twice across all of the games.
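The two trickier steps in this example — splitting the comma-separated names into rows, and typing the US-format dates — come out in M roughly as below. Column and step names follow our example; the code Power Query actually generates will be a bit more verbose:

```m
// Split "Erin, Karen, David" into a list per cell, then expand
// the lists so each name gets its own row.
SplitToRows = Table.ExpandListColumn(
    Table.TransformColumns(
        Unpivoted,
        {{"Value", Splitter.SplitTextByDelimiter(", ", QuoteStyle.None)}}
    ),
    "Value"
),
// Change Type with Locale: parse month/day/year text as a
// datetime under the en-US culture, then truncate to a date.
AsDateTime = Table.TransformColumnTypes(
    SplitToRows, {{"Date", type datetime}}, "en-US"
),
AsDate = Table.TransformColumnTypes(AsDateTime, {{"Date", type date}})
```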
This is a very good way to analyze the data now that it is in a better format; we would not be able to get these kinds of answers from the original layout. Let's just quickly test this: if I change anything, will it appear there or not? I'm going to make some notes here. For example, Dwight is playing only twice, Michael is only playing three times. I'm going to add their names to some of the other cells, and then we will see what happens. So here I'm just gonna add Dwight. Then in this one, we will put Dwight and Michael, both of them. We have added Dwight twice, so Dwight should go up by two, and Michael should go up by one. Once this data is changed, we'll come here, right-click and then Refresh. This is gonna refresh this table and add the extra information. But this pivot table will not update, because it still says Dwight two, Michael three. Now that the data is updated, you need to refresh this too, and then Dwight moves up to four and Michael to four. This is where you need to double refresh: because of the way this thing is set up, your first refresh will get the updated data, and then your second refresh will update the pivot tables. You could also use some of the settings within Power Query to delay the pivot table update until the data update is done — that would be wonderful — but I'll leave all of those things for you to figure out on your own. There you go: unpivoting something a little more complicated than the usual datasets that you see. 8. Bonus lesson automate stuff with Power Query: Hi there. This is a bonus video in the Power Query mini-course. In this video I'm gonna talk about how we can take a boring and highly repetitive data task and then use Power Query to automate it so that you can just sit back and relax. This is actually not like the rest of the videos in the class, because this video is extracted from a live stream that I did on my YouTube channel. On my YouTube channel, I do live streams every month.
So this month's topic is how to automate boring and repetitive tasks. I took the live stream, edited out the portions that are not relevant for this lesson, and kind of shrunk it into one video. Although this is slightly longer than the rest of the videos in the class, I highly, highly encourage you to watch this whole thing, because there is tons of value in there. And when you finish this particular video, you will be like, oh my God, I can use Power Query to do so many more awesome things. So please go ahead and watch this. There are sample files that you can download and practice the concepts, or — I highly recommend — apply them to your work situations as well. Yeah, thank you so much. I'll catch you in the video. Hello, everyone. Good morning, good afternoon, good evening. It is so good to have you in our automating boring data stuff live stream for the month. I am really excited. Let's get into the session. You can grab the files, like I mentioned, from the link in the video description. Let's get into the problem. This problem originated with one of the situations in the Awesome Chocolates company. Awesome Chocolates is a made-up company that I use in many of my videos and examples, and this is also one of the scenarios that I created for Awesome Chocolates. So what happens in Awesome Chocolates? We sell chocolates, and we are not a retail chocolate company but a wholesale chocolate company. We make chocolates that eventually get sold in supermarkets to you, the customers. Many times, our chocolate orders come from bulk supermarkets and we just ship them boxes. Most of these orders are automated. But sometimes what happens is our customers also send in a traditional order form for the chocolates. One of those order forms looks like this. So here in this particular challenge, I will show you how to deal with data in situations where things can be a little bit more unconventional and how to handle that.
So as you can see, this is my customer order form for Awesome Chocolates, and it doesn't look very clean. It's not super dirty data we're talking about, but you will quickly understand that this kind of format is a pain to work with if you want to do anything meaningful when it comes to data analysis. Let me quickly zoom in so we can see what is going on here. The order form has a kind of standard template, which has your salesperson name and customer name. So here we have got the salesperson, Barr Faughny, and the customer. Their details are only mentioned once in the order form. And then further down, we have got all of these products and how many boxes they want on each of the days. The dates themselves may not be continuous. Like you see here: second of May, 114 boxes of Fruit & Nut bars. Then fifth of May, they want 108 of 99% Dark & Pure. Like that, the dates will go on. Sometimes there will be dates for up to 15 different days, because this order form can only take up to 15 days of data. So it's kind of like a traditional order form. Some of our clients fill it up and send it to the salespeople, and then we will have to fulfill the orders. Now the big challenge — this is where you and I come in — is for us to look at this data and do some analysis work. The problem here is, the data is all there. For example, if I wanted to know how many total Mint Chip chocolates we need to ship, we could do some simple analysis of adding this row up and then saying whatever that number is — six hundred boxes. But the problem is, because of the way it is structured, it gets a little bit clunky and messy. We don't know how far to sum, and all of that. And if I have got multiple order forms from Barr Faughny in different files, then we don't know exactly how to act, and that's where the challenge is. This is one order form.
But as I mentioned, we have got hundreds of forms. So this is the folder where I have kept some more order forms. In this session, we're going to understand how you can take data that is kind of messed up, confused and garbled like this, and then how you can create a system through which you can combine all of this data in one go into a final master table. That's really what the live session is going to be all about. We'll take the files and build a repeatable process that will get all the data and make sure that it ends up in a neat single table format at the end of it, so that with any number of files you feed to it, it's going to just take everything, combine it, and put it in one place. If you think that is the most awesome thing you heard all week, all day or all year, please stay on and enjoy the rest of the session. If you already know some parts of it, but not all of it, again, stay on. I'm going to get into actually zooming the screen and making sure that you can read things later, but for now, because I'm only demoing the file, there is not much to really see. The files are already provided to you: when you download them, the zip file contains all these order forms and everything. I'll quickly talk about the rest of the files. Among the files that I gave you, there is also a combined data Excel workbook that has the final output, if you want to see how the finalized output looks. That's the file, and that's what we're gonna construct in the next 60 to 90 minutes. Along the way, you are going to learn some really powerful features. You'll understand how Power Query works. You will understand how to build a function within Power Query. This is one of the most powerful and advanced functionalities of Power Query.
And then, how to run that function over and over again, and some of the best practices when it comes to building such functions and using all of that. What I'm gonna do is open up Excel, and this is just a blank file. Into this blank file we will bring all the data together. That's really what we need to achieve. So I'm here in Excel, and from here it's all a fairly straightforward process. The first thing that we want to do is bring the files here and then combine them. I'll show you the process for a single file first, because then you will understand the true power of Power Query. From the first file, you will understand certain key concepts, and then we will see how to repeat that process for the entire set of files. This way you get kind of a repetition of it, and if anything was confusing the first time, the second time it will become clear. We can go to the Data ribbon, and from here you can access all the Power Query related functionality. They are sitting here in this little corner called Get and Transform Data. Our data, even though it is a little bit messed up, is all in spreadsheets, so it's easy for me to just Get Data, From File, From Excel Workbook. That's really the path; we'll connect to that. Then here you just want to point to the file location where the individual file is. Right now we're only going to do the process for a single file; then we'll repeat the process for the whole bunch. I'm just going to go to my live stream folder, order forms, and then pick any order form. It doesn't matter. For the sake of simplicity and consistency, I'm going to pick the 001, the very first order form. Then click on Import. This is going to open up the Navigator screen.
And within the Navigator screen you can now do a preview of the data. Power Query will show you: this is the data that I found, are you happy with this? Another key note here: even though I'm demoing this with an Excel file as the source data, whatever you do within Power Query, you can apply the same ideas and concepts to your SQL datasets, your text files, your web pages, or your XML files. Whatever your source data is, you can apply this. So don't think, oh, this Awesome Chocolates example is not for me. That's not the case, because the technique and application of this is so powerful that if you ignore this feature of Excel, you will be doing a lot of repetitive work manually or using some other older technologies. That's really important here. In the Navigator, we just select this and then it's gonna show you what the data looks like. As you can see, this is really messed up — even a single file by itself. To our eyes it looks super clean, but when we're looking at it from the Power Query perspective, it only sees the data: there are a lot of null values, and it's all mismatched. We will need to do the Transform Data step. If I loaded it directly, it's going to be useless for me. I can't do any analysis for this one file alone, forget about doing it for hundreds of files. We need to use the Transform Data button. I'm going to click on that and it'll jump into the query editor screen within Power Query. And here's a shortcut — this is something that's not part of the session at all, but if you see things in Power Query and you're thinking, oh, everything is super small here, how do I zoom? Within Excel I can hold Control and scroll the mouse wheel to zoom in and out. Here in Power Query, you can use the Control plus and Control minus keys to adjust the screen zoom.
As I said, if you've never done any Power Query work, then the task that we're gonna do right now might be a little confusing, but I'll try to explain it at a high level. This is a query editor screen through which I can tell Power Query how I want my data to be cleaned. At any point in time, you can have multiple queries; that's why this panel here lists all the queries. Right now we only have one query, so this is the only one that is listed here. But if you have multiple queries, then they will all come up here. It doesn't have your usual ribbon stuff on the top; it has its own ribbon, and you can read things and understand. There are some important buttons here. Close and Load means you finished your process and you want to load the data to Excel. Merge Queries means I want to combine two different queries, like a join operation. Append means I want to combine data one after another, like stitching tables together. Within the query, this big box here shows me how my data looks at that point in time. You can apply different steps on the data, and whatever steps you apply, they will appear here in the Applied Steps. Right now we haven't really done anything, so all of this is really blank. All we did is load up the source file, and the Navigation step is basically what the Navigator screen was doing. That's really where we are. You can give your query any name; you can change all of this. Whatever you do here, it will only apply locally within this Excel file. It will not change the source; the source file will still be as it is, but this one will have all these extra things added. For example, I see that columns 1 and 2 are null. There is nothing there, it's just null. So I can hold down my Shift key, select both columns, then right-click and say Remove Columns.
As I said, what this will do is just remove those columns locally for me; the original file remains as it is. Now there is a Removed Columns step logged here, which simply says remove two columns. Likewise, you can see that there's really nothing in the first row and second row — it says customer order form, please enter up to 15 days of orders — and the third row is also null. So the real data begins from row number four onwards. We could, for example, take away the first three rows and we wouldn't really lose anything valuable, because there's nothing interesting there. So I can use the Remove Rows button here, then say Remove Top Rows, and say the number of rows is three. Boom, those rows are also gone. Essentially, we have added an extra step that says: I don't need those top three rows, get rid of them. Let's say all I want when I clean up this data is to know who the salesperson is and who the customer is. I don't care about the numbers and everything; we only want rows 1 and 2, everything else is irrelevant for us. Then I can just use the Keep Rows button, Keep Top Rows. Earlier we used Remove Rows; now we are using Keep Rows — keep row numbers 1 and 2, because that's where my salesperson and customer information would be. At this point, again, we don't need anything else. None of these other columns are required; only these two columns are what we need. So I can just use Shift and select columns 3 and 4, right-click, Remove Other Columns. That means keep these and get rid of everything else. This is how we are basically transforming the data: we took the portion of data we need out of everything that is there. At this point the data goes sideways — salesperson, customer. Ideally I want the salesperson and the customer as two columns. We can go to the Transform ribbon, and from here there is a Transpose button. What it will do is flip the table sideways.
If I do it like that, then the data changes its orientation. So now we have Barr Faughny here and the customer there. If I go back to an earlier step, you will see that it was horizontal. Now we know that column one should be the salesperson and column two should be the customer. Again, from Home, we can use the Use First Row as Headers option to set the headers. There is an extra character at the end — a colon — and I can just delete that. This step is called renaming the columns. So we have renamed. We took a big bunch of data and applied one step after another, so that by the end of it all, we were able to extract the two pieces of information. Imagine the order file like a mountain of really messy information: we were able to go in, dig through it, find the two items of valuable information, and drag them up. This is just a sample case, but it gives you a sense of what we are achieving. At this point, I'm happy with the way things are. I can just say Close and Load. This is going to flicker the screen a few times and then it'll bring me the final data. There is only one row of information here. It's obvious that there will only be one row, because in each order there is only one customer and one salesperson, and they come up here. Remember where it all started? It started from the 001 file. Now I'll quickly go back and open my file. This is the original source file, and I'm gonna change a few things. Instead of Barr Faughny, I'm going to type my name there, and instead of the customer name, we'll type something else. These two names don't exist anywhere else in the data, so it's easy for us to spot them. I'll save this file and close it. Now here, if I want to get the current data now that the file has changed, it's simply a matter of right-click, Refresh. And boom, we get the updated data here.
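Put together, the Applied Steps for this single order form correspond to a short M pipeline. A sketch — the file path, column numbers, and step names here are illustrative, not the exact ones from the video:

```m
let
    // hypothetical path — point this at one of your own order forms
    Source   = Excel.Workbook(File.Contents("C:\OrderForms\001.xlsx"), null, true),
    Sheet    = Source{0}[Data],
    NoTop3   = Table.Skip(Sheet, 3),                          // Remove Top Rows: 3
    Top2     = Table.FirstN(NoTop3, 2),                       // Keep Top Rows: 2
    TwoCols  = Table.SelectColumns(Top2, {"Column3", "Column4"}),
    Flipped  = Table.Transpose(TwoCols),                      // flip sideways
    Promoted = Table.PromoteHeaders(Flipped)                  // Use First Row as Headers
in
    Promoted
```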
None of those steps need to be repeated again, because what we've done essentially is teach Excel how we want to do this process. When the data changes, it can come back and give you the updated data. While it is all super beautiful and awesome, there are some downsides to this approach. The biggest thing that you need to remember is we coded in various assumptions to get here. For example, if I go here, notice that we removed the first three rows because they are all blank. But let's just say we get an order file where row number three is missing, like this. Save this, refresh — once the file is closed — yes, I already get an error. It's not telling me outright that there is an error, but that's really what is happening, because it was trying to do something that doesn't really work. I can, for example, go in and figure out what was going on. Because we are working under an assumption here, there is an error. You can see that at each step, data is available up until a point, and then my name is completely gone. We are only left with the customer information; the "Chandoo" salesperson is completely gone because that row was removed. This is where, because that assumption is broken, when it tried to do the subsequent steps, it's not able to properly transform the table. And here it's giving me the real error. If you remember the very last step that we did: after we promoted the headers, we saw, oh, salesperson column, customer column, so we renamed the columns. The actual underlying programming code that Power Query is writing is this one. It says: take the header that reads "Salesperson:" — with a colon, or whatever that is — and make it Salesperson, and "Customer:" becomes Customer. It was trying to look for a column named "Salesperson:", which is not even there because of the way the data is now coming in.
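One way to make that fragile rename step survive a missing column — not shown in the video, but a standard option in M — is the optional third argument of Table.RenameColumns. The step name and header text here are illustrative:

```m
// By default, Table.RenameColumns errors out if "Salesperson:" is
// absent. MissingField.Ignore skips any rename whose source column
// is missing, so one malformed order form won't break the query.
Renamed = Table.RenameColumns(
    #"Promoted Headers",
    {{"Salesperson:", "Salesperson"}, {"Customer:", "Customer"}},
    MissingField.Ignore
)
```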
For example, if I don't bother with the renaming, then my query would still work. If I delete that step using the X mark, it'll still work. At this point, my data looks like this. But if we want correct column names, so we rename, then that's gonna break. So this is one of the things that you need to keep in mind when you are working with Power Query: essentially you're telling a computer how you want it done. Computers are not smart; they just follow the instructions. So it all depends on how clear your instructions are. Power Query is super smart, but it is still dumb. It has no sense of what could break. So you would have to, as an analyst, anticipate what would break and then build a repeatable process that works through that. You don't have to break your back anticipating all of this; it depends on your reality. If your reality is that the rows can disappear, then you will have to build a better process — or you just have to get on the phone, talk to the salesperson who's doing the order form, and make them understand, or give them a piece of your mind, so that they stop deleting rows. Just tell them: keep the template as it is, it will make our life simple. So there are different ways to deal with this kind of problem. But whatever may be happening, this is something that you need to keep in the back of your mind. Irrespective of what is going on, this is still way better, way faster and smarter than manually doing any of this work. This is why I really enjoy it. Now we understand the process for one file and one part of the problem — we didn't even get to how to extract the actual boxes and products and everything, but we kind of understand where it is all going. Now let's take this and apply it to the whole set of files in one go. Before we do that, I'm just gonna read through some questions to see if there is anything else going on.
Chip asks: can if-then steps be used? Yes, you can add extra rules and logic — "if this, then do that" — it is all possible. I don't think we have any conditions in the queries for this session, but I've got another video where I talk a little more about that, so definitely check that one out. Sean asks: is the reason you didn't keep all the data that you just wanted the salesperson and customer name in a new column? Yeah, that's really what we're going to do, but we didn't want to do it for just one file — we have to do this for all the files and then combine everything. So instead of tackling all the problems at once, the first time around we just get the customer name and salesperson name; the next time we'll do everything together. Another viewer asks whether what we've been seeing can be used in the Query Editor in Power BI. Yes, it can. Whatever you are learning here, you don't have to use only in Excel — you can use the same ideas in Power BI. This is why I find learning Power Query the biggest bang for the buck: you save time twice, once in Excel and once in Power BI. So why wouldn't you? Imagine you always wanted to visit France or Italy or some exotic country, but the biggest hold-back was the language — everybody there speaks a different language than you do. Now imagine you can go to Paris, you can go to Italy and live there, and everybody speaks the same language you speak at home. That's really what this is: you can use both tools, and they both speak the same language. Whatever you learn — literally no changes, exactly the same screens, same steps — you just do the same thing in Power BI. So that's one file done, but our job is not finished; we are only getting warmed up. Now we will do it for all the files in one go. This is my original query.
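Coming back to Chip's question for a moment: conditional logic in M is written with `if ... then ... else`, most commonly inside a custom column. A hypothetical sketch — the table, column names, and threshold are all made up for illustration:

```m
let
    // Illustrative sample data
    Source = #table(
        {"Product", "Quantity"},
        {{"Boxes", 120}, {"Tape", 8}}
    ),
    // "each" evaluates the if/then/else once per row
    #"Added Flag" = Table.AddColumn(
        Source,
        "Order Size",
        each if [Quantity] >= 100 then "Large" else "Small"
    )
in
    #"Added Flag"
```

The same construct also works inside filters, replacements, or any other step that takes an expression.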
I want to do one little thing before we move on. Let's go back here and add that blank row again — restore the sanity of our order form — and refresh, so that things come back up. Yeah, the column names still have that extra character at the end; we'll just live with that for now. All right, I'll go to a new worksheet — actually, we don't even need to — and from the Data tab, this time we want to do the same process, but for all the files: not just the salesperson and customer names, but the entire list of orders, and then continue that process across every file so that we end up with one giant table. A quick reminder: this is my Order Forms folder. It has many, many order forms, so Orders01 is just the tip of the iceberg. If I open a random form here, you'll see that for this one the data runs a little longer — they had 14 days' worth of information in their order form, whereas form one didn't have 14; I think it only went up to here. They were also ordering even more products than the original one, and the names and order of those products are different too, so everything is messed up. That is the big challenge we need to get around. Here we'll say Get Data, but instead of a single file, we want to get the data from a folder. The process is similar to what happens with one file, but now instead of saying "get the data from this file," you're saying "here is a bunch of files — get the data and put everything together." That's what the Folder option will do. If you are using a really old version of Excel, you may not see the Folder option, but you can still technically do it. Honestly, the easier thing is probably to upgrade your Excel so Power Query is built in, rather than trying to twist things around — the internal workings are the same. I don't know exactly when the Folder option started appearing; it has been in Power BI since very early versions, I believe, but for Excel I'm not sure.
Maybe someone can confirm in the chat window. If you do see it: say From Folder, then browse to my Order Forms folder and open it — that's the folder where my data lives. This is going to show you a listing of all the files it found. The Content column says Binary, which means bits and bytes; basically, that's what files are to computers. It's saying: I found all of these files — and it only previews the first 20 or 25 rows or so. What do you want to do? Do you want to combine? Load? Transform? Cancel? Combine is really what we want, but we don't want to combine as-is, because we know our data is not correctly shaped. We need to combine and transform, so that's the first step: Combine & Transform. You will see some similar options under Load as well — don't load directly; say Combine & Transform Data. This opens up a page, if and when it comes up. I said I wanted to combine files; now it asks what to combine from each file. Here we are building that repeatable process: essentially, you tell Power Query what to pick in one file, and it will go and pick the same data from all the files. Each file has this one worksheet, and this is what I see — "do you want this data?" is what it's really asking. There isn't much of a choice, so we'll say: yeah, cool, that's fine, this is what I want, and click OK. The screen is going to flicker a few times again, like a tube light, and eventually it comes up and boom — all your data is merged. Let me hide my face so we can see the data. There's a lot of garbage in this data and we still have to clean it up, but at least it is all there.
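Behind the From Folder button, Power Query records something along these lines. This is a sketch, not the exact generated code, and the path is a placeholder for your own order-forms folder:

```m
let
    // One row per file found in the folder; the raw bytes
    // (the "Binary" content) sit in the [Content] column
    Source = Folder.Files("C:\Data\Order Forms"),
    // Optional safety net: keep only Excel workbooks, in case
    // stray files (e.g. a Thumbs.db) land in the folder
    Workbooks = Table.SelectRows(Source, each [Extension] = ".xlsx")
in
    Workbooks
```

Everything that follows — the sample query, the generated function, the final append — is built on top of this file listing.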
If I jump to the bottom of the data here, you can see that this has basically taken Orders01, Orders02, and so on up through 06 — all the data has been taken and stacked one underneath another. Imagine somebody stitching the files together: file one, file two, file three, like that. They're not really paying any attention, not doing any work on it, not changing the data — they just attach one after another. That's really what Power Query has done for us so far. This is fairly useless on its own, but it is a great starting point: at least I don't have to deal with 100 different files of data; I have to deal with one set of messy data. From here it's easy for me to apply things and do stuff. Ian says: my Excel is crashing when data types differ and I automate using a folder — any solution? I'm not really sure why it would crash. It's a little tricky to dissect the problem or figure out where things are going wrong from one message alone; there could be any number of reasons, from simple things like insufficient memory or running too much at once, to more complicated things. One tip that is kind of universal: if you get into Power Query and there is an error at the last step, backtrack. Keep going back one step at a time until you find the last step that is not an error; the next step after it is the one with the error. Once you see that, that's where the problem is, and you start figuring out why. I hope that gives you some idea. If you already knew that and you're thinking, "man, this is not helping me," feel free to add more detail — but if it's not directly connected to the topic at hand, I won't be able to help you out here. Another viewer asks: can the source data be changed every time? Yes — that's the whole point. You can have different source data.
In my case, for example, I might want to point to a bunch of files in a different month's folder. Even though I'm showing the process for just one file and one folder, the idea can be extended; it just gets a little more technical. I didn't want to stretch this masterclass to three hours to show the multiple-folder scenario as well, but once you know this is the process for one folder, you just have to think, "what if I have six folders?" and it becomes very easy to get the answer. Learn the technique, then extrapolate it to multiple folders — there are probably YouTube videos or tutorials on the web that will give you the missing pieces of that puzzle. Pooja says the Folder option is not available for Mac users. Well, what can I say — I'm surprised even some features are available, because I only recently saw that they are adding Power Query support to Mac. It could be a while before you see Folder support there. The only thing I can say is: my sympathies are with you. Ken asks: can Power Query deal with different headers? It can — you just have to know the technique, so everything is possible. But you will have to think. It's basically like playing chess with Power Query: for every step you take, you have to anticipate what Power Query will do when your assumption doesn't hold. If I'm renaming a column from "Salesperson" to "Sales Officer", then I'm inherently assuming — even though I'm not explicitly saying it — that there will be a column called Salesperson and that it should be renamed to Sales Officer. But what if there is no Salesperson column? We saw that earlier. Then you would have to play the chess two levels deep, starting from the assumption that you want to rename the second column rather than the column named Salesperson. That's how you start thinking. What if it's not the second?
What if it's the seventh? Then you have to go even further and spot that too. Everything is possible, but at that point you're essentially into programming-framework territory. This is something I do on my M live streams — you can see I've got my witty saying on the board here: "Power Query is the M!" — a deliberate wordplay there, because M is the name of the language Power Query uses. The M language is what Power Query uses to build the code infrastructure behind the scenes, and you can go and write your own M code if you become a little comfortable with it. I don't know much M — M is fairly drastic — but I can get by. With that said, as I mentioned, this is like somebody stitched the files together. It is barely useful — I can't really do anything meaningful with it yet — but it's a better problem to have than a hundred individual problems: one big problem. Now let's understand what really happened behind the scenes, because that's where the clue is. If I expand the Queries pane, suddenly I see a whole bunch of queries instead of a single one. You might think, "we just did one folder connection, there should be only one query," but internally Power Query organizes a whole bunch of different things to facilitate the mechanism. Here you can see several items, and the key one is an fx "Transform File". This is the function that Power Query will apply to each of the files in the folder: whether your folder has 20 files or 40, on each file it will apply this function, called Transform File. And "Transform Sample File" is the query on which that function is based. Explaining this takes a little doing — if we were in the same room it would be easier, but since we're all in different rooms in different parts of the world, I'm going to try my best. I think this is where my face will help; you can see my expressions as well.
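As a small taste of M, here is the "two levels deep" chess move from a moment ago — renaming whatever the second column happens to be, instead of assuming it is called "Salesperson". The `#table` sample data and step names are purely illustrative:

```m
let
    // Stand-in table whose second column could be named anything
    Source = #table(
        {"Region", "Salesperson:", "Amount"},
        {{"West", "Chandoo", 100}}
    ),
    // Table.ColumnNames returns the headers as a list;
    // {1} picks the second item (M lists are zero-based)
    SecondColumn = Table.ColumnNames(Source){1},
    Renamed = Table.RenameColumns(Source, {{SecondColumn, "Sales Officer"}})
in
    Renamed
```

The trade-off: this version survives a changed header name, but now assumes the column's *position* is stable instead — you pick which assumption your template is more likely to keep.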
One way to think about it is this: within Transform Sample File, I tell Power Query what I want done to a single file. Once I finish doing that, Power Query builds the function for me — the Transform File function. It's an M language function, and we don't need to know any of the technical stuff for this to work; but if we want to change it, then we do. If you are happy with the setup as it is now — "my job is done, my data is combined, I don't need anything more" — you don't even have to know the rest exists. But because our data is still messed up, we need to know what's going on. Transform File is the function, and Transform Sample File is the query from which that function gets internally created. If I change something in the sample file query — add an extra step, or do something to that individual file — then the function gets updated, the new function is applied to all the files in the folder, and new data gets generated. So if I'm not happy with how my final data looks, the place to start changing things is the sample query. We will adjust the sample query until our final output suits us — that's really what we're going to do. When you go to the sample, it's built off one file. This way you're not overwhelmed — you're not looking at all 100 datasets; you just look at one file and tell Power Query what you want done. This is what we really did earlier in the session: we looked at one file and built the process to get the bits of data out of it. Now, in this file, the way our data sits is a little messed up. For now — if you're following along, you don't have to do this step — I'm just going to open the file here so I can show you how our data looks. Our data is technically not a single dataset; it's two datasets.
There is this salesperson-and-customer data, which appears only once per file. Then there is this giant table, which can have some blank rows, blank columns, and lots of blank values — this is where the numbers live. It's basically two different datasets. There are different ways to solve the problem, but the easier option here would be to build two functions — one to get this data, the other to get that data — then apply both functions, one after the other, to all the files, and combine the results. That's one way of solving it. As I said, there is no single correct answer in these situations; it's a technical implementation, really — you could do it differently and still get the same result, which is all fine. (Though if you're doing it over thousands of files, performance and optimization come into the picture.) Now, to go back and edit the query: once you want to come back from Excel into Power Query, you can — if you have the Queries pane, right-click any query and choose Edit; alternatively, from the Data tab you can enable Queries & Connections to show that panel, and then use the Edit option. Back here, we know we'll build two functions. Even though only one function has been built for us so far, we will customize it so it works for one set of the data, and then add a new function — then you'll see the whole spectrum: Transform Sample File, and this file. In the already-built one, we're going to extract this data — the product matrix — and then reshape it. To do that, we don't need the first two columns, so I'll select those two and remove the columns. Then: we know our product matrix grid always starts from row number nine, so anything before row nine is irrelevant. Again, we're working under the assumption that the template will always hold and nothing changes in it — that nobody adds or deletes blank rows.
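Inside the Transform Sample File query, the two clean-up steps just described might look roughly like this. The file path, the sheet position `{0}`, and the default `Column1`/`Column2` names are assumptions about the template, not the exact recorded code:

```m
let
    // Read one order form without promoting headers, so the
    // columns come in as Column1, Column2, ...
    Source = Excel.Workbook(
        File.Contents("C:\Data\Orders01.xlsx"), false
    ){0}[Data],
    // Drop the first two columns we don't need
    NoSideColumns = Table.RemoveColumns(Source, {"Column1", "Column2"}),
    // Static assumption: the product grid starts at row 9,
    // so skip the 8 rows above it
    Grid = Table.Skip(NoSideColumns, 8)
in
    Grid
```

Hard-coding the skip count keeps the query simple, at the cost of breaking if the template shifts.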
Otherwise, what we would have to do is scan the table for the first row where the product grid appears, and then delete everything up until that row.

9. Don't forget this video!!!: Thank you so much for watching these Power Query lessons. I hope you thoroughly enjoyed them. Please don't forget to leave a review on the Skillshare platform so that other prospective students can also enjoy this particular course. I wish you all the best, and more power to you in your future. Bye bye.