Transcripts
1. Introduction: Power Query is the most awesome feature to be added to Excel in a while. In this Power Query power course, I'm going to teach you how to use Power Query to connect to any type of dataset, bring the data in, and clean it using a step-by-step automation technique. You have five video lessons in this course, so this is really a mini-course and you can binge-watch it in an afternoon. By the end of this course, you will have some special
powers within you. You will be able to take
up any kind of data that looks dirty or mangled
or all misshapen. And then you can say, Yes, I can clean it up
and I can analyze my data with automations. So thank you so much for
getting this course. Let's jump into our lessons now.
2. Getting Started & 5 data cleanup tasks with Power Query: Here is the sample
data that we will be using for the first two videos. In fact, we will use the same
data for all of the videos, but I will introduce some additional datasets for videos 3 and 4. This is a typical
employee dataset because it's quite neutral. We can all relate to it. As you can see, while
it does look clean, there are some problems
with this data. To get this data into Power Query, you just have to select all the data. And then, from the Data ribbon, click on the From Table/Range option. Power Query insists that your data should be in a table format, so it lets you create the table at this point. We just say okay, and our data gets turned into a table. This will open up the
Power Query Editor. And this is the
screen through which we can tell Excel how we want our data to be cleaned using
data processing rules. We can see the table name here. It is called Table1; because we created the table on the fly, Excel named it Table1. It will show you a
preview of the data. Think of the Power Query Editor as a separate screen where we can go and configure
various things. We will use this to answer the first five data cleanup
questions and then you will become familiar with
the screen in that process. Our first question is: we have got an employee ID here, but it has two different values mixed up. The first two letters of the employee ID are my cost center code, and then the next five letters are the employee number itself. So we want to split the cost center and employee number into two separate columns for analysis purposes. To split the column, you can select the column and then right-click on the column. In here, you will have a Split Column option. Notice that there are different
ways to split the column. For example, we can say I want to split by number of characters. This is really what we need: I want to split two characters and five characters. Alternatively, if you have got a delimiter, you can use that, or some other exciting options here as well. We'll use the number of characters and specify that I want to split two characters, once, as far left as possible. This is just going to split the column into two
separate columns, first two letters
and everything else. When you click okay, this is just going to create two separate
employee ID columns. So your original employee
ID column is now gone. And instead, Power Query will give you two separate
columns to work with. And they will be named
ID one and ID two. You can double-click
on the column and rename this as cost center, and this one would be ID number. Through this process, what we have done is we have told Power Query that, given this data of employee ID, it should split it into two columns. So while it happens with the current dataset, whenever you have new data, if you run the Refresh process, Power Query will
grab the new data. It will automatically split that data as well. Our next problem is: in the department column, we can see that several people have a null department. These people should not even be in our list of people. They are no longer working here, but somehow our data import system brings them over. So we know we want to remove these people. To remove any rows from the data, you can use the filter option. So click on the filter; anything we uncheck will not be part of the final data that Power Query provides. So if I uncheck the null, I no longer see it. I can use the same screen to, for example, exclude training staff and support staff if I don't want to see them for a specific analysis or data situation. In this case, we will
keep all of these, but just exclude the null employees. And when you click OK, those people are also gone from the data. To see the steps that we have applied on the data, you can look at the Applied Steps area right here in this corner. It will list all the steps that you have taken so far to clean the data. The steps that we have done are: we sourced the data from Excel, so this is the Source step. Then we split the column by position, so this column is split into two columns, Employee ID.1 and Employee ID.2. When you click on a specific step in the query, it will show you the data as of that step. At this step, when we split, the columns are still called ID one and ID two. Then in the third step
we renamed the columns. And then in the fourth step, we have applied a filter on the department column so that we remove the null department. Let's say I changed my mind. I get the mail from
the finance team that our cost center code
is not first two letters, but it is the first
three letters. How do we change it? Simply look at the step where
you have made the change. So for example, Split Column
by position is the step. We locate that, and then there is a gear or cog button right next to the step. You click on that, and here you can just say the first three letters are my cost center code. So change the 2 to a 3 and click Okay. And now we have the first three letters here, and then the rest of them are there. Technically, this is not correct, so I'm just going to undo this step by changing it back to two. Irrespective of which
step you are viewing. Whenever you finish your
work within Power Query, it will only give you the data as of the last step. It's not gonna give you the intermediate values, just the last step alone. Our third cleanup problem is, if you notice
the name column, you can see that some names have some extra space in the
end or in the beginning. So here I have got
my extra spaces. This is again, a
common problem in many data cleanup
scenarios when you bring the data from
an external system, sometimes it will have
some extra spaces. So we can clean up those spaces. To clean them, right-click on the name column, and here, from Transform, you have the option to Trim the data. What it will do is remove any extra spaces at the beginning or at the end. So we run the Trim, and those
extra spaces are gone. So far we have done three
data cleanup steps. We have split the employee ID, we have taken out the extra
spaces, and we have taken out any employees in the null department. For our next data cleanup scenario, we have the employee start date, and we just want to calculate how long they have been with the organization. As you can see, we have a date column here, but notice that the data type here says ABC123. Before we do any
operations on the dates, it's a good idea to convert the column into the date data type. To do this, right-click on Start Date, Change Type, and then select the Date option. When you have the date data type, the icon here will automatically show a calendar logo
instead of ABC123, which was what it
was showing earlier. Now that the start date
is treated as a date, I can go and introduce
a new column which looks at the start date
and then my current date. The date as of
recording this video is 31st of March 2022. And then tell me how long each of these employees
have been with us. To do that kind of an operation, you need to go to the Add Column ribbon, and from here I can introduce a calculation as a new column. From here, under Date, I'm just going to select the first option: Add Column, Date, Age. What this will do is calculate the age of that date as of the current day; today minus that date is what the age would be. When you add Age, you will get the Age column added here. The age will be shown in the format of days, hours, minutes, and seconds. So it says this employee has been with us for 503 days. This person has been here for 1,344 days. While this is a good way to calculate the employee tenure, 503 days is not really meaningful. Ideally, we want to calculate how long they have been with us in years or something like that. Keeping the age column selected, we can go to the
Transform ribbon and from there I can change the
age to another format. So add column will
add the column, whereas transform will
change things in place. In Transform, age is shown in a duration format; you can see the stopwatch kind of icon there. From Duration, I can just say: show this age to me in total years. This is gonna calculate the employee tenure in years. This person has been here for 1.307 years, and that person has been here for 3.68 years. At this point, you can also double-click on the age and then rename this as tenure. This age calculation is
a dynamic calculation. So every time you
refresh Power Query, it's gonna redo that calculation and update the tenure as of that refresh date. Right now it is 31st
of March, 2022. But in future, whenever you are watching this video and
you're refreshing this data, you will see a
different value here. One thing that you might notice, if you're following along with me, is that your Power Query screen is probably looking slightly different, and the steps are also looking slightly different. There are a couple of things that I have done at my end, so let's just reveal them. The number one difference that you might notice is you may not have
the formula bar on the top. This is something
that is optional, and I like to keep this on. To see the formula bar, you can go to the View ribbon and enable the formula bar option. This is a one-time thing, and when you do that, every time you open
every time you open Power Query from any
other Excel file, it's going to show
you the formula bar. You might think: what's the point of the formula bar? Power Query uses its own
language called M language. And because M language is quite different from
Excel formula language, I like to understand what's
going on from time to time. If we have the
formula bar on there, you can read what's happening and make sense of the steps, because each step has a piece of logic or code attached to it. For example, renamed
Columns step here says You took the column age and
then renamed that to tenure. So that's the logic
that part 20 is using. Likewise to calculate
the total ears. If I select that step, you can see that it took
the durations total days, and then divide it with
365 to calculate the age. Here you can see that this
is not paying attention to the leap years or anything irrespective of what the ear is, it's always dividing with 365. Having that formula bar
is a great way to kind of get a sneak peek into what's
happening behind the scenes. And I like to keep this one on for my own amusement purposes. Let's conclude this lesson by adding one more calculation: given the FTE of an employee, I want to add a column that tells me if they are a full-time employee or a part-time employee. This kind of calculation is called a conditional calculation, because if their FTE is one, then they are full-time. Anything less than
one is part-time. We can use the Add Column
Conditional column option to introduce such a calculation. So add column
Conditional column. From here, specify the column name. So this is my Full/Part; that is the name of the column. Then we build the condition using this particular screen. So, the FTE column, and then here it says equals, less than, greater than, etcetera. I'm gonna say: is less than one. That means if their full-time equivalent value is under one, then they are a part-time employee. You can build a ladder of conditions here; for now, we'll just do if-else. If they are under one, they are part-time; else they are full-time. Then we'll click Okay, and it will introduce a Full/Part calculation here as a column for me, tagging the employees
as part-time or full-time. This kind of thing is very useful if I want to do some analysis on employee nature downstream. Now that we have five cleanup steps (splitting the employee ID into cost center and ID number, cleaning the name, removing the null department, adding their tenure, and adding their nature of work, whether full-time or part-time), let's go and load this back to Excel so that we can see how the loaded data really looks. Before we do that: the query is called Table one, and Table one is not a very good name, so I'm gonna rename my query. You can just select the name option
here and then type the name. I'll call this as
staff and hit Enter. So now the query will be called staff. From Home, we have a big Close and Load button. For now, we will click that; in the subsequent videos, I'll tell you how to use the other loading options as well. So we'll say Close and Load. What this will do is load up a green color table (I think this is the default color that Power Query chooses all the time) into Excel in a new tab. The tab is called staff, and the table itself that it creates is also called staff. This green table here is the cleaner
version of your data. Your original data is still
here in the Data tab. Notice that when you created the Power Query, it turned that into a tabular format. So this is my original data, and that's my new data. You can see that here for Torrance Scalia: the employee ID is PR00007, and in the cleaner version it got split into PR and 00007. Likewise, Torrance Scalia is at 0.3 FTE and joined on 13th of November 2020. So based on that information, we got their tenure calculated and then tagged them as a part-time employee as well. Because they have a department, they are listed here. But if I go here and
then look at some of these people, like Minerva, they will not be here in this data, because they belong to the null department. So when the data got cleaned, they got excluded from the data. And you can see from this filter here, we don't have the null department anymore. Nor will we have Minerva in the search results here. This is how the data gets
cleaned if something changes. So for example, we get a message from corporate that Minerva is not a null-department employee; she moved to legal. I change my data. Likewise, Torrance will update their FTE from 0.3 to 0.6; actually, I'm going to make them 1, so that we will see them become full-time. We made two changes: we made Torrance a full-time employee, and we moved Minerva to the legal department. Now, to fetch the corrected data, you just have to right-click and then refresh the data. You can do it in multiple ways. You can right-click and refresh. You can go to Data and then refresh. Or, if you have the Queries & Connections panel shown on the right-hand side, you can also right-click here
on this and then refresh. Irrespective of how you do it, you're just doing the same process. So we'll refresh this, and boom, we get Minerva here. We get Torrance here with FTE 1, and they become a full-time employee. There is no change in their tenure, because we're still rerunning the process on 31st of March. But if I rerun this again tomorrow or the day after, I'm gonna see completely different values for that as well. So that is the first video where we loaded the
data into Power Query. We did five cleanup tasks and then we loaded
it back to Excel. Let's move on to the next one.
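If you keep the formula bar on, the whole lesson can also be read as one M query in the Advanced Editor. Here is a rough sketch of the M code Power Query generates for these five cleanup steps. This is an illustration, not the exact generated code: the table name Table1 and the column names (Employee ID, Name, Department, Start Date, FTE) are assumptions based on the dataset described above, and your Applied Steps will use their own auto-generated names.

```m
let
    // Source step: the table created via From Table/Range
    Source = Excel.CurrentWorkbook(){[Name = "Table1"]}[Content],
    // 1. Split Employee ID: first two characters vs. the rest
    SplitID = Table.SplitColumn(Source, "Employee ID",
        Splitter.SplitTextByPositions({0, 2}, false),
        {"Employee ID.1", "Employee ID.2"}),
    Renamed = Table.RenameColumns(SplitID,
        {{"Employee ID.1", "Cost Center"}, {"Employee ID.2", "ID Number"}}),
    // 2. Filter out the null-department employees
    NoNullDept = Table.SelectRows(Renamed, each [Department] <> null),
    // 3. Trim extra spaces from the names
    Trimmed = Table.TransformColumns(NoNullDept, {{"Name", Text.Trim, type text}}),
    // 4. Change Start Date to a date, add the age, then show it in total years
    Typed = Table.TransformColumnTypes(Trimmed, {{"Start Date", type date}}),
    AddedAge = Table.AddColumn(Typed, "Tenure",
        each Date.From(DateTime.LocalNow()) - [Start Date]),
    InYears = Table.TransformColumns(AddedAge,
        {{"Tenure", each Duration.TotalDays(_) / 365, type number}}),
    // 5. Conditional column: FTE under 1 means part-time
    FullPart = Table.AddColumn(InYears, "Full / Part",
        each if [FTE] < 1 then "Part Time" else "Full Time")
in
    FullPart
```

Because every step is a rule like the ones above, a Refresh simply reruns the whole chain on the new data, which is why the split, the filter, and the tenure all update automatically.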
3. Adding Columns in Power Query: In the previous lesson, we took our staff data and then applied five data cleanup steps: splitting the employee ID into cost center and ID number, removing spaces from the name column, taking out any people who are in the null department, calculating their tenure based on the start date, and figuring out based on the FTE whether they are a full-time or part-time employee. This is the data that we generated after cleaning
using Power Query. In this lesson, let's continue the cleanup process and add a few more steps to it. Now that we're here in Excel, how do we go back to Power Query? You can do this in a few different ways. Number one, you can go to the Data ribbon, and from here you have got your Queries and Connections. If you click on that Queries and Connections, you will see all the queries in your workbook. So this is my staff query; right now we only have one, but potentially you could have multiple queries as well. When you look at a particular query, you can right-click and then say Edit on the query. Likewise, the second option is you can click on the table, the green table that Power Query has generated. Whenever you click inside this kind of a Power Query table, you will see that a Query ribbon appears on the top. And if you go there, you have got the nice big Edit button on that ribbon as well. You can use this to get
cleanup of the data. Our first cleanup step in this video is going to be on the salary column. That salary column has got some null values, and this is because of a particular HR policy: anyone who's getting paid exactly $45 thousand comes up as a null. Don't ask me why; it's just a made-up scenario here. We want to replace the nulls with 45 thousand, so that shows their number. But before we do that, let's just understand
quickly what these green bars
on the top mean. These green bars tell us the column data quality. If a bar is fully green, that means it has got data all the way down the column. But if some columns have got some null values, like here, the green bar is not all the way full. When you hover, it tells me that 931 rows have values and 4% are empty. It also gives me an option to remove the empty rows. We don't want to remove anything empty; we just want to deal with the salary problem here. So here, within the salary, for example, 40 values are empty. It would also highlight if there are any errors; right now we don't have any errors. But if there is an error, it'll highlight that as well. Let's just go and add a rule that says if
the salary is null, then the value should be 45 thousand. This kind of process is called replacement. All you have to do is right-click on the column and then use the Replace Values option. You might be thinking: why can't I just select this and type over the data? You are not allowed to do that kind of thing within Power Query. Power Query is a rule-based engine, so every cleanup step that you do must be a rule. So I'm going to right-click and then choose Replace Values. The value to find is null. The value we want to replace it with is 45,000. A note of caution here: Power Query is case sensitive. So only if you type null exactly the way it appears will it work. If I were to type this in capital letters, even though to our eyes they are the same, this is a different value, and that's not really going to work. So we're just gonna say null, 45 thousand, and then click Okay. Then it's going to add a rule that says if someone's salary is null, then it's going to be 45 thousand. Now, our salary column is
green all the way through. Our next cleanup scenario: given the employee name, I want to extract just their first name and then print that in a different column. We could then use that for maybe sending out a letter or whatever, where you just want to print their first name. This kind of a thing is where I want to extract everything up to the first space. That's the first name. You could do this in a
couple of different ways. Number one is you can select the column, and then you can use Column From Examples. In this way, you will let Power Query figure out what the logic for doing such an operation is. Think of Column From Examples as the Flash Fill of Excel. But here, with Column From Examples, it is repeatable. That means if your data changes and you rerun the process, it's going to extract the first names for the new data as well. We'll do this Column From Examples, and from here I'm just going to type my first name, Torrance. At this point, it's kind of predicted what we're going for, and it shows all the other values that it's going to generate. And let's just read
the formula as well. The formula it is using is Text.BeforeDelimiter, looking at the Name column, and the delimiter is space. That's the actual M language formula that Power Query is using here. When you are happy with this, you click Okay, and then that gets added as a new column called Text Before Delimiter. I'm just going to double-click on this and then say First Name; that's the name of this column. And we will get the employee first name here. It will still keep your original Name column, but it's going to add a new column called First Name. Now that we added this, I want to see this first name right next to the full name, not all the way here. One way to move this column
is click on the column, hold your mouse and then just move it right next
to the name column. This will rearrange the columns, with the name here and the first name here. You can look at the M language code here; it says Table.ReorderColumns. We'll continue our discussion
with the name column again. Given the name, I want to rewrite the name. For example, Torrance Scalia is their name, and I want to call them Scalia, comma, Torrance, and do the same for the other names. This is how, in certain situations, you want to have last name, comma, first name as an option. We will make another column, which will be Name2, and that's where this kind of rewriting will be done. Again, we could use the
column from examples option. But because we have already done that, I'm gonna show you a different technique. This different technique involves the Extract option on that column: from Extract, I have got Text Before Delimiter and Text After Delimiter options. I'm going to use this to map out the name into two different columns. We already have the first name column, so we'll just generate the last name. So: Extract, Text After Delimiter. The delimiter is space, so I'm going to just press space here and click Okay. This is going to extract the last names and put them all the way at the end, in a column called Text After Delimiter, with the
last names printed here. Notice that some of these last names have some extra space at the beginning as well. This is because when the name was entered into the system, it had multiple spaces in the middle, and those spaces are not removed by Trim. That's why those spaces are coming up there. That is all fine; now that the Text After Delimiter column is here, I can right-click and quickly trim this as well. So this excess space is gone. Now we have got a last name here and a first name here. Then what we want to do is take the text after delimiter, place a comma, and then make these into the final column. Here we will use the
from examples. We'll try the custom column. This is where it
will open a screen asking you to write the
M language yourself. This is my name2 column. Here. Will pick the text
after delimiter. That's the last name, ampersand, within double-quotes
comma space. And then we'll pick
the firstName. This is the M language formula to introduce a new name that takes two columns and then put the comma space in
the middle there. Click Okay, this has gone to
call your parents chancy, shame, but gati like that. Now that this column
is generated, we no longer need this helper column; we only used it to get here. At this point, you can right-click on Text After Delimiter and then remove this column. Or, if you want to keep it, you can keep it and rename it as Last Name. I'm just gonna remove it. Now we have got a Name2, and again we will move it right next to the name. We have Name, Name2, and then First Name here, all nicely listed. Our last transformation, or
our last data cleanup in this particular
video is going to be looking at the start date. We know how long an
employee has been with us, but some of the employees
have left us as well. So for example,
we have a date of termination telling us
that as of September 24th, 2021, this particular
employee, Ali, has left the organization based on the date
of termination. I wanted to know if an employee is the current employee or not. We will have a active
indicator column. First step, you
can see that date of termination is ABC123. And so I'm just going to
right-click Change Type to date. This way, you can
have a date or lull. And again, the column
quality tells me that only 80 people have
a date of termination. 892 are empty. That means they're all
still current employees. Based on this, I want to add a column that tells me whether
their current dot naught. This kind of a thing is perfect for the
conditional column. So we'll just use add
column, Conditional column, and then see if date
of termination equals. And then here you
can just type null and outputting is yes, that means they're
active employee. No, no means they're no
longer act to click Okay. Anyone who laughed and
they will have no there because they will have
a date of domination. That concludes this
particular video. But before you vanish, I have got a homework
assignment for you. The homework assignment
ys on the salary column. I want you to look at the
salary column and then create a salary group has a new column. The salary group
logic is simple. If you absorb the salary column, our salaries go from $28 thousand all the way
up to $119 thousand. So it may be a $120 thousand. We want to group our
employees into four buckets. Anybody under 50 thousand, anybody under 80 thousand, anybody under a 100 thousand, and then more than
a 100 thousand. So those are the
four groups under 5050 to 808100 and
more than a 100. Based on the salary, you need to introduce an extra column here
called salary group, where such a value
can be maintained. I leave it to your imagination
on how to do this. But if you have some
trouble with this, do watch the homework
solution video that explains that process. For now, I'm just going to close this particular one by clicking on Close and Load. This is just going to update our green table with those extra calculations and extra name columns, so you can see that the original name is here, Name2, First Name, and all of these other things as well. So if I want to update someone's termination
date in my original data. So for example, Torrance decided to leave on 31st of March 2022, so we put their termination date here. You come here, right-click, and then refresh. Immediately, they will be marked as no longer active, and the date of termination will appear here as well. So good luck
with your homework. I'll catch you in
the next video.
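As a recap, the four cleanup steps from this lesson map to M along these lines. This is a sketch, not the exact generated code: the query name Staff and the column names Salary, Name, and Date of Termination are assumptions, and Power Query's auto-generated step names will differ from mine.

```m
let
    // Start from the query built in the previous lesson (assumed to be called Staff)
    Source = Staff,
    // 1. Nulls in Salary actually mean exactly 45,000, per the made-up HR policy
    FixedSalary = Table.ReplaceValue(Source, null, 45000,
        Replacer.ReplaceValue, {"Salary"}),
    // 2. First name: everything before the first space
    FirstName = Table.AddColumn(FixedSalary, "First Name",
        each Text.BeforeDelimiter([Name], " "), type text),
    // 3. Name2: "Last, First", trimming the extracted last name
    Name2 = Table.AddColumn(FirstName, "Name2",
        each Text.Trim(Text.AfterDelimiter([Name], " ")) & ", " & [First Name],
        type text),
    // 4. Date of Termination as a date, then an Active indicator
    Typed = Table.TransformColumnTypes(Name2, {{"Date of Termination", type date}}),
    Active = Table.AddColumn(Typed, "Active",
        each if [Date of Termination] = null then "Yes" else "No")
in
    Active
```

Note that this sketch skips the intermediate Text After Delimiter helper column by nesting the calls, which is the same logic the UI builds in separate steps.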
4. Homework Solution: To add the salary
grouping as a column, you can use the
conditional column option and build that kind of a ladder. So we'll use that
conditional column. Here, the column name is salary group. If my salary is less than 50 thousand, then the output would be less than 50 K. Then we will add one more clause: if salary is less than 80 thousand, then the output would be 50 to 80 K. A third clause: if salary is less than 100 thousand, then it needs to be 80 to 100 K. For the last bucket, we don't need one more clause; we can just use the else clause, and then we will say greater than 100 K. So those are the outputs. You can type literally
anything here. You could, for example, have
a label like low salary, medium, high, very high
salary or whatever. When you finish doing that, if you click Okay, it's going to add those salary
groupings here for you. This is an excellent way
to bring some sort of extra values based on business rules and then
use them in Excel. You could have
built this kind of a column in Excel as well, but because this is
a Power Query video, I'm teaching how to do
this within Power Query. If you change the conditions, let's just say our boundary is no longer 80 but 75, then you can use the cog button here. I can click on it, and this is going to show those boundaries here, and I can simply change the value. So here, from 80, I'm gonna just change this to 75 thousand, and then rename the values here as 75 K and 75 to 100 K, like that. Make sure to adjust all the things that need to be adjusted whenever these kinds of things change, and when you click Okay, that will update. Another way to do these kinds of changes is, instead of using the cog button, if you have got the formula
bar visible on the screen, you can directly edit the items in the
formula bar as well. Maybe our lower range is not 50 but 45. So I can just type 45 thousand here and then rename my labels as well to 45 K, and that will adjust and update as well. Whenever you finish typing the formula, if you press Enter, that's just going to change those things as well.
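The conditional-column ladder from this homework corresponds to a nested if-then-else in M, which is also what you see in the formula bar. A sketch with the original 50/80/100 boundaries (the step name Previous is a placeholder for whatever your prior step is actually called):

```m
let
    // Placeholder for the prior step in your query
    Previous = Staff,
    // The ladder built in the Conditional Column dialog
    SalaryGroup = Table.AddColumn(Previous, "Salary Group", each
        if [Salary] < 50000 then "< 50 K"
        else if [Salary] < 80000 then "50-80 K"
        else if [Salary] < 100000 then "80-100 K"
        else "> 100 K")
in
    SalaryGroup
```

Editing the boundaries through the cog button or typing directly in the formula bar are both just ways of editing this one expression.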
5. Merging Tables and Filtering Data in Power Query: So far we have cleaned up
the data within one dataset. In this lesson, I'm going to introduce another dataset and show you some more techniques. For the purpose of this lesson, I'm going to assume that each of our employees is going to receive a bonus based on their salary. The bonus percentage is decided based on the department. So if you are in the training department, let's say you receive a 2% bonus, but in the human resources department you might receive a 3% bonus. To make all of this happen, we have got to follow a specific policy. The policy is that the bonus is only awarded to employees who are currently active and who are not temporary. So they need to be permanent or fixed-term staff who are active. Let's go into Power Query and do that process, where we take out anybody who is a temporary or an inactive employee and then bring in the bonus details as well. If you go to the Query Editor, this query in here, let's go ahead and remove
anyone who is not active. So I'll select
this and then say, I don't want any
inactive employees. Likewise, we will come to the employee type and
uncheck the temporary. At this point, we have reduced our employees to just permanent and fixed-term staff who are active. Now we need to calculate the bonus based on their salary. For the purpose of this, we do have a bonus mapping table. Let's quickly load this back so that I can show you the bonus mapping table. Here is my bonus mapping table. It tells me, for each department, what percentage of bonus we are going to give. We need to bring this data into the dataset that we already have and then combine the two to generate the bonus calculations. You could, for example, Control+C this, go here, and then paste this data. But because the data is in another file, we can directly connect to it as well. Instead of copy-pasting, we'll use Power Query to first bring the data and then combine it with this data. Go to Data, and then Get Data, From File, From
Excel workbook. This is because my bonus
mapping table is an Excel file. If your bonus mapping is in a text file or in a database, you can use those options. So I'll just use From Workbook and then point to my bonus mapping file. I have a more advanced version of the bonus mapping problem in the bonus mapping 2 file, which we will talk about in the next video. This will show you a Navigator screen asking you to pick what data you want. I want my bonus mapping table, so I'll select this. We don't want to just load this table; instead, we need to use it to calculate the bonuses. So we'll click Transform Data, and this will load up the bonus table into my Power Query. You can see that our staff
table is also there. Now, my bonus table tells me what the bonus is for each department. In the staff table, we can look and see what department the person belongs to. If I can get, right next to the department, the bonus percentage, then I can take the percentage and multiply it by the salary to get the bonus value as well. This kind of an operation is called merging two tables, where I want to match this table on the department column with the bonus table's department column. To do this, you can use the Home ribbon Merge Queries option. This will open a dialog asking you which columns you want to match. So on the staff table, I want to match the Department column with my bonus table's Department column. This is going to quickly give you feedback here, saying that it is able to match up all the rows here with those. Then just click Okay. In here we will have the associated bonus table for each employee. If I click on this table, I can see that because that person is in human resources, the bonus is 0; for this person in product management, it is 0.05, or 5%. We don't need the whole table, we need to just extract the bonus value. To extract the value, you need to click on this sideways-arrows (expand) button, uncheck the department, keep the bonus, and also uncheck the option to use the original column name as prefix. This way we will just get a column that says bonus. Boom, we get the bonus value. The bonus value shows up here as a decimal number. I'm just going to
right-click here and change the type to percentage. This will show me what the bonus percentage is. This solves only part of the problem. We do get the bonus percentage, but what we need is actually the bonus value. To get the bonus value we need to take the salary and then multiply that by the bonus percentage. To do this, select the salary column first, hold down your control key, and go and pick the bonus column. This way you have selected both of the columns. Now from Add Column, you can do an arithmetic operation of multiplication. So Add Column, Standard, Multiply. This is going to create a multiplication of those two numbers, which is salary times bonus, and add that as a new column here with the bonus value. We can just change this back to currency if you want to. And then you will see the multiplication result as a currency value of how much bonus each person receives. I'm going to rename this as my bonus amount.
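To make the logic concrete, here is a rough Python sketch (with made-up names and numbers, not the course's actual data) of what the merge-then-multiply steps are doing under the hood:

```python
# Hypothetical staff rows and bonus mapping, for illustration only.
staff = [
    {"Name": "Asha", "Department": "Training", "Salary": 50000},
    {"Name": "Ben", "Department": "Human Resources", "Salary": 60000},
]
bonus_map = [
    {"Department": "Training", "Bonus": 0.02},
    {"Department": "Human Resources", "Bonus": 0.03},
]

# Merge Queries: match each staff row to the bonus table on Department,
# then "expand" just the Bonus column out of the matched table.
lookup = {row["Department"]: row["Bonus"] for row in bonus_map}
for row in staff:
    row["Bonus"] = lookup.get(row["Department"])      # None if no match
    # Add Column > Standard > Multiply: Salary times Bonus.
    row["Bonus Amount"] = row["Salary"] * row["Bonus"]

print(staff[0]["Bonus Amount"])  # 50000 * 0.02 = 1000.0
```

Power Query does the same matching declaratively; the dictionary lookup here just mirrors a one-to-one merge on a single column.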
Now that the bonuses are calculated, our next job as part of the data analysis is to split these bonus calculation values into two tables: one for all our employees in USA locations and another for all the employees in New Zealand locations. To do that bit, what we want is, instead of loading the staff table as a single item (which we will see if I load this now with Close and Load: we're just going to get the staff table updated with the bonus values here, and you'll also get the bonus table here), this current way of loading the data is not what we want. Instead, we don't want a single staff table; we want two separate staff tables, one for all the USA locations, that is Bellevue and Los Angeles, and then all the New Zealand locations, which is only one, Wellington, New Zealand. So I'm gonna go back
to my Edit Query. First step, let's
split this data. To split the data, you
have two techniques. Number one is we can create
an exact replica of this. This is called duplicating. To duplicate a query, like the staff query here, I can right-click on it and then say Duplicate. I'll get Staff (2). Staff and Staff (2) are exactly the same queries. If I go to Staff, you can see that it has all these steps. Staff (2) will also have all these steps. They both begin from the same Excel file and apply all the cleanup steps to come to the final stage. Then in each table I can go ahead and do the necessary bits. So for example here, I'll say that I want to just see my USA locations. So in Staff (2), I will uncheck my Wellington, New Zealand, and then we will call this Staff-USA. We can go to the original staff table here, go to the location, and uncheck the USA bit so that only the New Zealand locations remain here. And then rename this as Staff-NZ. This is one approach. The problem with this approach is both queries will need to run. So Staff-NZ needs to run all these steps. Staff-USA will also run all these steps. This is kind of duplicating the work, and that is why it is called a duplicate. There is also another
option called reference. First, let's just use the duplicate option and load up the data, and then I'll come back and explain to you how to use reference queries. Before we load the data, we also want to make sure that the bonus table doesn't get loaded into Excel. We don't need the bonus table to be seen on the screen; we just used it to do the multiplications. So instead of Close and Load, we can just use the Close and Load To option. For now, I'm just going to say only create connection and then click Okay. At this point we will have all the three queries listed here. Staff-USA is only a connection, but Staff-NZ and the bonus table are loaded. Now what we want is, we don't want the bonus table. I'm right-clicking on the bonus table and going to Load To. Here I'll just say that this should be only a connection. We don't want any table; we'll click Okay. This is gonna give me a warning saying possible data loss. This is because the data is already loaded. This is fine with me. I'm just going to click Okay, and the bonus table is now just a connection. I can delete this worksheet. And then we have Staff-NZ already loaded. I'm just going to right-click and then say Staff-USA should also be a table in a new worksheet. Boom, we get two of the queries loaded as tables, and then one other query just
maintained as a connection. As I mentioned earlier, while this was all good, both the Staff-NZ and Staff-USA queries need to run all the steps. This is where the reference query comes into the picture. Let us see how to use the reference query. For the purpose of this, I have created a separate file, staff data3 reference query. Notice that this staff file has all my people, not just the New Zealand staff. So here if I go to location, I can see everybody. Let's right-click and edit this query. Here we will leave the original staff query as it is, right-click on it, and make a reference query. My original staff query will be the master query, and then we're gonna build a separate query called Staff-NZ. Notice that Staff-NZ, because it is a reference query, has only a single step. The source step simply refers to the staff query, so this query starts out the same as the staff table. This way, the Staff-NZ query will not run until the staff query has finished its processing, and then it just reuses those values within this query.
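As a loose analogy in Python (invented data, and a simplification of what the engine actually does), the difference between duplicating and referencing, as this lesson describes it, is whether the heavy cleanup steps run once per query or once in total:

```python
# Count how often the "expensive" cleanup steps run.
calls = {"cleanup": 0}

def cleanup(raw):
    calls["cleanup"] += 1
    return [r for r in raw if r["Status"] == "Active"]

raw = [
    {"Name": "A", "Status": "Active", "Location": "Wellington, New Zealand"},
    {"Name": "B", "Status": "Active", "Location": "Bellevue"},
    {"Name": "C", "Status": "Inactive", "Location": "Bellevue"},
]

# Duplicate: each query repeats every step from the source.
staff_nz_dup = [r for r in cleanup(raw) if "New Zealand" in r["Location"]]
staff_usa_dup = [r for r in cleanup(raw) if "New Zealand" not in r["Location"]]
assert calls["cleanup"] == 2       # cleanup ran twice

# Reference: run the master query once, then both queries start from it.
# (How much real Power Query caches between queries depends on the engine;
# this only illustrates the idea.)
calls["cleanup"] = 0
staff = cleanup(raw)               # master query
staff_nz = [r for r in staff if "New Zealand" in r["Location"]]
staff_usa = [r for r in staff if "New Zealand" not in r["Location"]]
assert calls["cleanup"] == 1       # cleanup ran only once
```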
I'm now going to the location and just saying this is only the Wellington staff. This is Staff-NZ. We will make another reference to the original staff query, so right-click and then Reference. This one we will call Staff-USA. Again, here we will apply the same kind of filtering. We now have four queries: the original query, my bonus table, and then two reference queries, Staff-NZ and Staff-USA. I'm gonna do one other
thing on both of my Staff-NZ and Staff-USA queries, which is: let's say for the purpose of bonus analysis, we don't need all of these columns. We only need certain columns. We can go to Staff-NZ to begin with, and then from Home, Choose Columns, and tell our query which columns we want to see in the final output. I will deselect everything, and keep cost center, employee number, and then the name2 (which has my last name, comma, first name), gender, department, salary. We don't need any of these other things, maybe employee type and location, and the bonus amount. Those are the only columns that we want, and this is going to give me that output. Now I want to repeat the same exact step on the Staff-USA as well. You could go to Staff-USA and again use the Choose Columns thing. But here is a little sneaky trick, if you have got your formula bar showing, that you can readily use. To begin, go to the formula bar, making sure that you have selected the Removed Other Columns step from this query. Then you can see the M language code that Power Query has generated to keep those columns. It says Table.SelectColumns, the rows, and these are the only columns I want. So I'm just gonna select all of this formula and Control+C to copy it. Then I'll go to my Staff-USA table. Here we'll click on the fx button. This is going to add a step, and in this step we will paste that formula. Make sure that there's no extra equals sign or anything, and then hit Enter. Boom, my Staff-USA table also has the same columns. A quick note of caution though: this method will not work if both of these queries do not have exactly the same steps in the same order. That is all.
Now I'm just going to say Close and Load To. For now, we will just create a connection. Then I'm just going to load up this data. Staff-NZ should become a table in a new worksheet. Likewise, Staff-USA should also become a table in a new worksheet. We get both of our chunks of data, Staff-USA and Staff-NZ, here.
6. Automating Data Combine & Clean-up: So far we have only worked with data that is in one place altogether. In this video, let's talk about how to deal with scenarios where your data could be split in different ways. The first example is, instead of having all the staff data in a single tab in one continuous set, I have got individual tabs: one for Wellington, one for Los Angeles, and one for Bellevue data. We would like to combine everything into one big staff table and present that in the output file. For this example, I'm gonna treat my staff append .xlsx file as the master data file. We will open a blank workbook, and in this workbook we will combine all the three individual tabs of data using the append method. In the subsequent parts of this video, I will show you other techniques for combining the data. Depending on how your data is structured and available, you can use any of these individual techniques as you see fit. For now, I'm just going to close my staff append file and go back to the blank file. From here, say Data, I want to get the data from a file, which happens to be an Excel workbook. Even though I'm demonstrating the technique with an Excel file, I want to remind you that whatever you are learning, you can apply to database tables or cloud solutions like Azure or other things. We will go to From Excel Workbook and select the staff append file. These other files are the ones that we will use later on in the video. Then say Import. This is going to show me that there are three individual worksheets in the file, and which one do you want to bring? Now we need to get all this data and then combine it. So I will use the select multiple items option and check all the three tabs. We don't need to load this data, we need to transform it, because this is
not how I want it. I want to combine everything into one big table. We'll get into the Transform Data option first. This opens the familiar Power Query editor screen with three queries: one for Bellevue, one for Los Angeles, and one for Wellington. All of these tables look good, but there are a couple of problems. Number one, my header row is all wrong. It is saying Column1, Column2, Column3 and so on instead of using row number one. Notice that depending on how you have set up your own Power Query preferences, this particular behavior might be slightly different for you. If you already see Employee ID in the column header, you don't need to do this next step; otherwise you will have to do it, which is: we want to take row number one here and make that the header of this table. To do this, we can go to the Home ribbon and then use the Use First Row as Headers option right here. This is going to promote the Employee ID header. We'll repeat that on the Los Angeles table and then the Wellington table as well. Now all three tables have the data with proper headers. At this point, I just want to combine all these three tables into one big table, and this is where the Append Queries functionality comes into the picture. We will use Append Queries. If you click it directly, what it's gonna do is append the other two datasets into the Wellington dataset. Instead of doing that, we want to create a brand new query that has the combination of these three tables. I'll use the arrow option and then select Append Queries as New. This will ask me how I want to append, whether I want to append two tables or three or more tables. Irrespective of which option you pick, the process is similar, but in this case we do have three tables, so I'll select the three or more tables option. Already Wellington is added, so I'm going to add my Los Angeles and Bellevue. Then when you click Okay, magically, all the data from these three individual tables is combined into one giant table that has all the data in one place. It still has the other problems, which is my gender nulls are there, my training department nulls are there, and all of that. But at least it has the data in one place. At this point, you can go ahead and apply those cleanup steps. But I'm just going to rename this query from its default name to Combined Data. Then click on Close and Load To. This option will make a connection
for everything first. This is gonna show four connections. Then right-click on Combined Data and change the load behavior for that to a table, so that only the combined data is loaded up here. The individual tables are not loaded, just the combined table. As I mentioned, you can go ahead and apply the additional data cleanup steps on this for your gender and department name corrections and all of that, if you want. So that is how the combine
or append method works. You might have a question about the append process. Let's just open the file first so that we can look at this. Notice the order of the columns here. Everything has employee ID, name, gender, department, and ends with the location, in all the tabs. You might think, what if my order of the columns is different? It doesn't matter which order these columns are in; as long as all the columns are present in all the tables, the order doesn't matter. Power Query will rearrange it. To prove this point, I'm gonna go to my Los Angeles table, select the manager column, and then move it here, so that location is in the middle and manager is at the end. For my Bellevue tab, I'm going to move the gender column all the way to the end. The column order is now kind of jumbled up in the original data file. I'll save this file. We go back here, right-click, Refresh, and no errors, nothing. It is still going to work. It is going to give you all the data as it is in the original files, just that Power Query will automatically rearrange it. Now comes the next question, which is: what if the columns don't match? Imagine, for example, here in my Bellevue data, instead of department, I'm gonna call this Group. We have a different name here. Now let's see what happens. I'll save this, we'll come back here, click Refresh. This is going to add an extra column called Group. It's gonna keep the department, but it will introduce a new column and put it at the end. That Group column will have values for just the Bellevue people, and those Bellevue people don't have a department. So if the column names don't match, then we have a problem. But if the column names match, the ordering of the columns doesn't matter; Power Query will automatically adjust these things. This is one quick thing that you need to keep in mind whenever you are appending: you need to make sure that the column names do match.
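A hedged Python sketch of the append behavior just described (rows invented for illustration): rows are matched up by column name, so per-table column order doesn't matter, but a misnamed column becomes a brand-new column that is null everywhere else.

```python
def append_tables(*tables):
    # Collect the union of column names, in first-seen order.
    all_cols = []
    for t in tables:
        for c in t[0].keys():
            if c not in all_cols:
                all_cols.append(c)
    # Stack every row, filling missing columns with None (like null).
    combined = []
    for t in tables:
        for row in t:
            combined.append({c: row.get(c) for c in all_cols})
    return combined

wellington = [{"Employee ID": 1, "Department": "HR", "Location": "Wellington"}]
bellevue = [{"Location": "Bellevue", "Employee ID": 2, "Group": "HR"}]  # renamed column!
rows = append_tables(wellington, bellevue)
# The Wellington row gets Group = None, and the Bellevue row gets
# Department = None, mirroring the extra-column problem shown above.
```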
So let's go ahead and fix this problem. I'll come back here, I'll just say Department, and save this file. Go back here, right-click and Refresh. And boom, that extra column is now gone and the data is all sitting here, nice and pretty. That's the first thing: when we are combining with the append method, we need to know upfront how many tables we are appending. That's not always true in some business situations. So to give you a sense of that, let me show you another example. Here I have got a different file. Instead of three locations, we now have extra employees coming in from our Columbus office as well. Now, I want to combine all these tabs into one big table. The challenge here is, while we can use the append method, because this file may have some extra tabs added, your query will not work if a new tab shows up. We'll need to think about this problem in a
different light, which is: I want to take all this data and then combine it. So irrespective of how many tabs are there, I just want to combine everything. Let me show you how to do this. Again, we will treat the staff by location file as my master data file. We will insert a blank workbook, and here we will combine the data. In this workbook, I'm gonna go to my Data, get the data From File, From Workbook, and select the staff by location file. Again, you might be tempted to select all the four items, but instead just pick one of the items. It doesn't matter which item you pick. I'm just going to pick my Bellevue for now and then go to Transform Data. This is just going to show me only the Bellevue tab data. At this point, we do need all the four tabs, not just the Bellevue one. So I'm gonna delete the navigation step and go to just the source step. We will delete the step using that X icon right there. If you click on it, that step is gone. At the source level, you will see that Power Query has found the different worksheets, not just Bellevue, but also Wellington, Los Angeles, and Columbus. It also found a FilterDatabase thing which is not required. So what I will do first is filter the Kind column to just Sheet, so that we are looking at all the individual sheets in the staff by location workbook. At this point, all the data of the individual tables can be accessed with the Table thing here. If I click on this, you can see that all my Wellington data shows up at the bottom; also my Los Angeles, Columbus and Bellevue data. Here, all we have to do is get this data and expand it. For that, we will just select Name, hold down Control, and select Data. So we only select
these two columns, and then right-click and just say Remove Other Columns. We don't need those other columns, just these two. Now that these two columns are there, I'm going to expand this data and just say Column1 to Column11. The column names don't really look correct, but we will have to fix that problem next. When you click Okay, you're gonna get all the data for each of the worksheets here. The Wellington data, with my employee ID, name, gender, department, etc., is coming up here, and that's the header. Like that, there will be another header row farther down where the Los Angeles data begins. As soon as Los Angeles begins, we've got one more header, and this is going to continue like that. Then there are some null values as well, which are probably some blank rows that it is fetching in. So we will have to delete all of those things. We're going to make this particular row a header. At this point, it depends on how your data is. In my data, I do have the location Wellington, New Zealand tagged here, so I don't need this Name column, so I'm going to delete this column as well. Then we only have the data. Select, from the Home ribbon, Use First Row as Headers. Now that the headers are done, we will have to remove any extra copies of the header, because there will be another Employee ID row farther down in the Los Angeles data; there are going to be multiple of them, one for each worksheet. Here, all we have to do is use the filter: I'm unchecking the Employee ID value. Likewise, I'm gonna uncheck the null values, because if there are any blank rows brought in, we don't need those as well. Those filters are going to clear up the data. And at this point this is my completed data. So we will just say all data
as the name of the query. And then you can go ahead and load it, or you can apply some additional data cleanup steps. For example, we're gonna remove anybody in the null department. Right-click on the gender, Replace Values, null with Other. Likewise, you can also use the date of termination to tag an employee as active or not, use the FTE to figure out whether someone is full-time or part-time, and take the salary nulls and replace them with the 45 thousand figure if you want. I'll leave all of those things for you as extra homework problems. Now that all the data is combined, I can load this. Before I load, I just want to quickly explain again what we have done. We have started from the Excel file, then we have filtered to just the sheets, and then we just expanded all the data. We are going to get one dataset that combines the individual sheet data into one whole. Now, for this particular technique to work, the column order should match in all the sheets. If the columns are out of order, then this is not gonna work very well. This is because when we are expanding, we are only expanding and combining column one of each sheet into one column, because there are no headers. For this reason, it's not gonna automatically match the columns. So this is why the column order should match when you are using this particular technique.
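Here is a rough Python model (sample rows invented) of the technique recapped above: stack every sheet's positional rows, promote the first row to headers, then filter out the repeated header rows and the blanks coming from the other sheets.

```python
# Each sheet arrives as positional rows, each with its own header row
# and possibly some blank rows.
sheets = {
    "Wellington": [["Employee ID", "Name"], ["1", "Asha"], [None, None]],
    "Los Angeles": [["Employee ID", "Name"], ["2", "Ben"]],
}

# Expand: stack all rows from all sheets into one list.
stacked = [row for rows in sheets.values() for row in rows]

# Use First Row as Headers, then drop the repeated headers and blanks.
header, body = stacked[0], stacked[1:]
records = [
    dict(zip(header, row)) for row in body
    if row[0] is not None and row[0] != "Employee ID"
]
# Note the caveat from the lesson: rows are combined by POSITION here,
# so this only works when the column order matches in every sheet.
```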
And finally, this is my output-ready table. I can just Close and Load this. This is going to just load up all the data into one table nicely. Now let's see what happens if you want to introduce another branch into a new tab in that original file. I'll go back to my staff by location file. Here I have introduced the Auckland branch data; these are just some of the people from my Bellevue branch, and I have just moved them into Auckland. All of them have their location, and we now have an extra worksheet called Auckland. We'll save this and then close this file. Come here, make a note of all the locations. So we have only four locations. Now right-click Refresh, and this is going to bring those extra rows in. Look, the number of rows has changed. And if you go here you can see that we now have Auckland people also added. They should probably be all the way at the end. Like here, all my Auckland people are listed nicely. Appending, as well as using that expanded view, are two ways to combine the data when you have data in the same file. But many times your data may not even be in the same file. So for example, here, instead of having the file data like that, I might have individual files, one per location: we'll have a Bellevue CSV file, a Los Angeles CSV file, and a Wellington CSV file, each file giving me the data of the staff within that location. If the data is like this, how do we combine? That's our next topic. For this to work, we will open a blank
workbook and go to the Data, Get Data, From File. And instead of using the Excel workbook option, we will use the From Folder option. This is a powerful automation technique built right into Power Query, using which I can take all the files in a folder and combine all of them into one giant dataset. So we'll say From Folder and select the folder that your data is in. Here, my location data folder is the one, so click Open. This is going to show you a screen where it says: I found three different files in the folder, Bellevue, Los Angeles, and Wellington. What do you want to do? You have an option to combine. You can also load or transform the data. Because we want to combine, we will use this option and then say Combine and Transform Data. You can also combine and load directly, but I'm just going to use the transform option, because then we can go ahead and do a little bit more cleanup if needed. I'll do that. Then this will again open one more screen, and it says: here is the first workbook within your folder of documents. So it's going to show you a sample file. It could be the first file, but you can also choose any one of the files. And it says, this is how the data is. Are you happy with this? If so, I'm gonna go ahead and get it. So everything looks nice and tidy for me here. I'll just say Okay. This is now going to load up Power Query, do its magic, and then combine all the
data into one file. So this particular query that we are watching right now has already combined all the data. It has my Bellevue data on the top, but if I keep scrolling, I'll find that all my Los Angeles data is there, and further down we will have our Wellington data as well. But let's just understand this, because Power Query would have added many other items on the left-hand side to facilitate this process. The way the folder automation works is Power Query builds a whole bunch of different queries to make it happen. Your final query will be in the other queries folder, and it'll be called location data, or whatever the folder name is. But to make this process work, it will use the sample file that you have seen on the earlier screen. So it showed you one file in the navigator and asked you, is this what you want? It will use that sample file and build a Transform Sample File query. This is the sample query, so it is only working on one file. We build the query for one file and then run the same process on all the files. So if I want to change anything that I'm doing at a file level, we just have to do it in the Transform Sample File query. For example, if I look here, I can see that my column headers are wrong. We could promote this row as a header here, but we will still have to delete the extra headers in the other files, which is a pain. We might as well just make that edit at the Transform Sample File level. That way, what happens is Power Query is gonna repeat that process on each of the files in the folder. So we'll go to the Transform Sample File and here apply the Use First Row as Headers step, so that this row becomes the header. Likewise, if we want to do anything else at an individual file level, you can do those processes here as well, like for example removing the null employees. Once we make the change at the Bellevue file, if I go to my location data, you will see that it has done the same for all other files. So while only the Bellevue data is what we cleaned, Power Query reuses that process and then applies it on my Los Angeles file. In the Los Angeles data also, the null genders will be replaced and nobody will be in the null department anymore. Because there are still some blank departments, I'm going to uncheck that, and those people are all gone. And that's pretty much it. Our data is now clean and we can go ahead and load this. So again, we will just say Close and Load To, and create a connection for now. Right-click on location data, Load To, a table. Only this query gets loaded. None of these other in-between queries are loaded; they are all maintained as connections.
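The overall shape of the From Folder automation can be sketched in Python (using throwaway CSV files, since the real feature is built into Power Query): one "transform sample" routine is written against a single file, and then that same routine runs over every file the folder contains.

```python
import csv
import os
import tempfile

# Create a throwaway folder with two small CSV files, purely for illustration.
folder = tempfile.mkdtemp()
samples = {
    "bellevue.csv": [["Employee ID", "Name"], ["1", "Asha"]],
    "wellington.csv": [["Employee ID", "Name"], ["2", "Ben"]],
}
for name, rows in samples.items():
    with open(os.path.join(folder, name), "w", newline="") as f:
        csv.writer(f).writerows(rows)

def transform_sample(path):
    # The per-file cleanup: edited once, applied to every file in the folder.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Run the sample transform over every file found in the folder, so a file
# added later is picked up automatically on the next run (refresh).
combined = []
for fname in sorted(os.listdir(folder)):
    combined.extend(transform_sample(os.path.join(folder, fname)))
```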
This is a very powerful automation technique. What it means is, if you add any extra files to that folder and refresh, this process is going to pick up those files and add them at the end. I'm not going to demo that, but I will leave that to you as an extra homework exercise. So there you go, three different ways to combine the data: one using the Append Queries option, one using the multiple-sheet combine option, and one using the folder combine option. Depending on how your data is, use one of these three techniques to consolidate the data and generate the bigger datasets that you want to work with.
7. Unpivoting (reshaping) Data: In this video, let's talk about one of the most powerful techniques in Power Query. It is the ability to unpivot data. Remember our bonus calculation example problem. We had bonuses defined as one bonus percentage per department. But what if you have bonuses defined like this? Here I have got a matrix style of bonus allocation, where for each department and each category of staff, we know what percentage of bonus should be allocated. This is where we are going to cover two different techniques. Number one, how to take this data and reshape it into a format that works best for doing the bonus calculation. And the number two technique is how to match tables on multiple columns. Let's go ahead and do that. I'll close this file. This is one of the earlier workbooks, so this is not the workbook that we have from the folder combine or anything like that. This is the workbook from lesson number two, but it doesn't matter. You could use the original data as well and still do this kind of a thing, because we do have the manager column as well as the department column, and both columns are required to figure out what percentage of bonus to award. Here I will just go
to Data and then Get Data, From File, From Excel Workbook, and point to my bonus mapping 2 file, which has the matrix-style bonus mapping information. Let's click Open. This is going to open up the Power Query navigator and ask you, what do you want? There are a couple of different things. We have got our main bonus table. There is also a named range for the bonus matrix that also comes up here. I'm just gonna pick the table, because this is the correct format for our situation, and then get into Transform Data. This brings up the table here nicely. While this kind of a table is very good for maintaining the information, it's not really good for calculating the bonus. If I go to my staff table, you notice that we have got our department column as well as the staff information, like what kind of person they are, whether they are a manager or staff, in the manager column. We need to use these two columns to figure out what percentage of bonus each employee is getting. The orientation of information there is: department has its own column, manager has its own column. Whereas in this matrix table here, department has its own column, but manager goes across the screen. This kind of format is referred to as a pivoted format; it kind of looks like a pivot table. From this, in order to map the data, we need to first unpivot the data. Power Query offers many powerful features to unpivot data. To unpivot a dataset like this, what you want
to do is click on the department column and then right-click. You need to have selected the department column, and then say Unpivot Other Columns. This is going to keep the department column and then take the rest of it and unpivot it. You will see what it does when you apply it: it's going to split this into three columns. So your original table is now remapped to three different columns. One is the department column, which is retained. The next one is the attribute column, which will tell you what sort of person they are, and then the value column holds what percentage bonus they're getting. All combinations are featured in here.
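Conceptually, Unpivot Other Columns does something like this Python sketch (matrix values invented): the selected column is kept, and every other column becomes an attribute/value pair on its own row.

```python
# A small matrix-style table: one row per department, one column per
# staff category, holding the bonus percentage.
matrix = [
    {"Department": "Training", "Manager": 0.2, "Staff": 0.02},
    {"Department": "HR", "Manager": 0.1, "Staff": 0.03},
]

# Unpivot Other Columns: keep Department, turn every other column
# into an (Attribute, Value) pair on its own row.
unpivoted = [
    {"Department": row["Department"], "Attribute": col, "Value": row[col]}
    for row in matrix
    for col in row
    if col != "Department"
]
# Every department/category combination is now its own row.
```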
formatting is right, I can go and map
it out if I want. I can renamed attribute to
manage our question mark. But this is not necessary
because we don't, we won't be using this table. We will be directly
calculating the bolus. Now that Dan pivoting is done, let's go ahead and map it. Before we jump there, I just want to highlight the
other unpivot options. If I right-click here, you can see that we have got three kinds of unpivot. Depending on how your data is shaped, you may want to try one of these, and they all work on the same principle — but depending on what you select, the unpivot outcome changes. In this case, Unpivot Other Columns was the right option for us, because we get to keep the Department column and rearrange everything else.
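If it helps to see the logic behind this step, here is roughly what Unpivot Other Columns boils down to, sketched in plain Python. The department names and percentages below are invented for illustration — they are not taken from the course file:

```python
# A small "pivoted" matrix: Department down the side, manager status
# across the top (hypothetical values, just to show the shape).
matrix = {
    "Training": {"Manager? Yes": 0.20, "Manager? No": 0.10},
    "Sales":    {"Manager? Yes": 0.15, "Manager? No": 0.05},
}

# "Unpivot Other Columns": keep Department, and turn every remaining
# column into an Attribute/Value pair -- one row per combination.
unpivoted = [
    {"Department": dept, "Attribute": attr, "Value": value}
    for dept, row in matrix.items()
    for attr, value in row.items()
]

for record in unpivoted:
    print(record)
```

Each (row, remaining-column) pair becomes one Department/Attribute/Value record — exactly the three-column shape Power Query produces.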
Let's go back now and use Merge Queries from the Home tab. Pick the Department column, hold down Ctrl, and pick the Manager column. The order in which you pick them matters: Department is the first column, Manager is the second column, and we map them in that order. In my bonus table I'll likewise pick Department, hold Ctrl, and pick Manager. Now both of these columns are mapped one-to-one. What Power Query will do is take each combination — say, Training plus Manager — and give you the matching row, Training, Manager, 0.2, as the output. It will take care of all the combinations and map them correctly.

Once this is done, we click OK, and we get the whole bonus table attached to each row, which tells me what the percentage for that person is. We'll just expand this and uncheck the extra options so we only get the Value column — we don't need to see Department and Manager again. And there is my bonus percentage.

Now that the percentage is here, I can calculate the bonus value by selecting the Salary column, holding down Ctrl, and selecting the Value column. With both columns selected, we go to Add Column, Standard, Multiply. Boom — we get our bonus amount here, which comes from the matrix data. I can just rename this as "Bonus $". I don't really need the percentage anymore and could remove that column as well, but I'm going to keep the Value column there so that we can refer to it when the data is sent to Excel.
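Behind the scenes, this Merge Queries step is a two-column lookup followed by a multiply. Here is the same idea written out in plain Python — the staff names, salaries, and percentages are invented for the sketch, not the course's actual data:

```python
# Staff table and the unpivoted bonus table (all values hypothetical).
staff = [
    {"Name": "Jan",    "Department": "Training", "Manager": "Yes", "Salary": 50000},
    {"Name": "Dwight", "Department": "Sales",    "Manager": "No",  "Salary": 40000},
]
bonus = [
    {"Department": "Training", "Manager": "Yes", "Value": 0.20},
    {"Department": "Training", "Manager": "No",  "Value": 0.10},
    {"Department": "Sales",    "Manager": "Yes", "Value": 0.15},
    {"Department": "Sales",    "Manager": "No",  "Value": 0.05},
]

# Merge Queries: the (Department, Manager) combination is the join key,
# just like selecting the two columns in order inside Power Query.
lookup = {(b["Department"], b["Manager"]): b["Value"] for b in bonus}

# Expand the matched Value, then Add Column > Standard > Multiply.
for person in staff:
    person["Value"] = lookup[(person["Department"], person["Manager"])]
    person["Bonus $"] = person["Salary"] * person["Value"]

print(staff)
```

The two-key dictionary plays the role of the merged query: every staff row finds exactly one matching percentage.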
Now that the unpivoting is all done, I'm just going to say Close and Load — Close and Load To — and choose Connection Only. This way my bonus table doesn't get loaded to the worksheet, but the original staff table now has these extra columns: Value is my bonus percentage, and Bonus $ is my dollar amount. If you want, you can also apply some additional formatting or rounding of the values within Power Query — or, once the values are in Excel, you can do whatever else you want with this data.

This was one kind of unpivot. Unpivoting is quite useful, so I thought we would take a look at another example of it as the very last example of this course. For that I've got a separate file altogether, so I'm just going to save this and open that file. I call this one "Unpivot This", and it is a good challenge. If you are feeling a bit adventurous, pause the video right here and go ahead and unpivot this data yourself. But if you are still coming to terms with the whole unpivot concept, let's get into this particular example. We are doing a
Corporate Games Week at Awesome Chocolates. This corporate games week is happening between the 11th of April and the 15th of April — that full week just before Easter — and we have got some of our staff playing various kinds of games. For example, on Monday the 11th of April we're playing cricket, and these are the people playing cricket in the corporate games week. On the 12th of April, these are the people playing volleyball; these people are playing basketball; and these people are playing tug of war. So we have got matrix-style data — you can think of this as pivot-table-style data. The big problem is that each cell itself contains multiple names.

Suppose I wanted to ask a question like: how many games is Michael playing, or how many individuals are playing volleyball? If the same person plays multiple times, we don't want to count them twice. For example, Phyllis is playing volleyball on Monday, and again on Wednesday and on Friday — we don't want to count Phyllis three times, we just want to count her once. How do you answer those questions? It becomes nearly impossible with this kind of format. Whereas if I had a tabular format with three columns — sport, date, and name — then we could do it: a regular format rather than a pivoted format. So I'm going to select
all of this data. This time I don't want to turn it into a table — but if I try to get it into Power Query directly, it will say you need to have the data as a table. So here is an alternative when you are working with datasets like this: select the data and give it a name from the Name Box. You could do it either from there or from Formulas, Define Name — both options work. I'm going to give it a name like cgw.games, for corporate games week. Once you have it as a named range, you can go to Data, From Table/Range, and Power Query will not ask you to make a table.

The data comes in nice and clean. We know that this first row is the header row, so I'm just going to use Use First Row as Headers to promote it into the header. Now that that is done, we'll keep the Sport column
and unpivot the rest. So right-click on the Sport column and choose Unpivot Other Columns, and we get Sport, Attribute, and Value. The Attribute is really the date, so I'm just going to rename it as Date. Then Value has my names. Notice that each cell has all the names comma-separated. Here is the good thing: we have followed a consistent formatting, so the names are always separated with a comma and then a space. The delimiter, technically, is comma-plus-space.

If we could take each name and spill it down — like Cricket, 11th of April, Erin, and then each of the other names on its own separate row — then our data would be in good shape and we could go and analyze it. So all we have to do is right-click on the Value column and then Split Column, By Delimiter. Here, instead of Comma, I'm going to choose Custom and type comma-space — that's the whole delimiter.

By default, if I just click OK here, I'm going to get individual columns, one per person. That also makes it impossible to ask questions like who is playing volleyball, because then we would have to deal with multiple columns, and some of them would have null values when only four people are playing on a given day. This is not going to cut it, so I'll delete that step using the X mark, go back, right-click, Split Column, By Delimiter again, custom delimiter comma-space — but this time we use the Advanced options, and instead of splitting into columns, we say split into Rows. That's it. This way it's going to split the names into individual rows and everything is going to work nice and beautifully. When you click OK, your entire table gets expanded and you get a larger table, sourced from the original data.
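The split-into-rows step is worth seeing in miniature. Here is the same idea in plain Python, with invented sample names — split each cell on the full comma-space delimiter, then give every name its own row:

```python
# Each cell holds several names separated by ", " (comma + space) --
# sample rows invented to mirror the shape of the course data.
rows = [
    {"Sport": "Cricket",    "Date": "11 Apr", "Value": "Erin, Kevin, David"},
    {"Sport": "Volleyball", "Date": "12 Apr", "Value": "Phyllis, Andy"},
]

# Split Column > By Delimiter > Advanced > Split into Rows:
# split each cell on the full ", " delimiter and spill every name
# onto its own row, repeating Sport and Date alongside it.
tall = [
    {"Sport": r["Sport"], "Date": r["Date"], "Person": name}
    for r in rows
    for name in r["Value"].split(", ")
]

for record in tall:
    print(record)
```

Two matrix rows become five tall rows — one per person per sport per day, which is exactly the analyzable shape we were after.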
So we've got Sport, Date, and Value. I'll rename Value as Person. I'm also going to right-click on Date and change its type to Date — and that's coming up as an error. This is interesting: this is not something I was counting on encountering in the video, nor planning to explain, but now that it's there, let's go ahead and look at it. If I point at the error, it says we couldn't parse the input provided as a Date value — for "4/11/2022 12:00 AM". I guess the reasoning is that this format is month, day, year, whereas my Power Query is probably looking for a day, month, year — or year, month, day — format, so it's not really happy with this one.

We could go back and delete that Changed Type step, but instead I'm going to undo it and show you another technique. Right-click on Date, and under Change Type, instead of Date, use the last option: Using Locale. Whenever you have dates that are not in your system's default format, it's a good idea to use the locale option, because that way you can tell Excel you are dealing with data that is formatted in another country's default date format. Here my data type is going to be Date, and if I leave the locale as English (United States), the default formatting of English (United States) is month, day, and year. So this is the formatting; I think this should work.

Hmm — not working either. I suspect that's because our date also has this time component. The time was not part of the spreadsheet, but somehow Power Query is thinking it's at exact midnight, and that's where things are getting a little confused. So maybe if we go into Change Type with Locale, use that gear icon, and instead of Date set the data type to Date/Time and click OK — that works. Now I have a correct date-and-time value. We don't need the time, because the time has no real meaning here, so now that it is treated as Date/Time, I can convert it back to Date alone, and I think that will do it: we first take it as Date/Time and then push it to become a Date. That seems to make Power Query happy.

Some of the date and time conversions are a little tricky depending on how your data is formatted, and this is where I find that Power Query gives you more power and ability to process the data than Excel alone. Understanding these nuances can go a long way when you're working with the challenging data types that you will encounter from time to time. So that was a good challenge that I wasn't even planning to talk about.
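If you want the same fix outside Power Query, the two-step idea — parse with the format the text is actually in, including the time, then throw the time away — looks like this in plain Python. The sample strings are shaped like the values that triggered the error above:

```python
from datetime import datetime

# The cells arrive as month/day/year text with a midnight time stuck on,
# e.g. "4/11/2022 12:00 AM" (sample strings, not the real file).
raw = ["4/11/2022 12:00 AM", "4/15/2022 12:00 AM"]

# Step 1 -- the "Using Locale" idea: parse with the format the data is
# actually in (month first, plus the time component) as a full date/time.
parsed = [datetime.strptime(s, "%m/%d/%Y %I:%M %p") for s in raw]

# Step 2 -- the time means nothing here, so push each value down
# to a plain date.
dates = [dt.date() for dt in parsed]

print(dates)
```

Telling the parser the true incoming format, then discarding the unwanted time, mirrors the Date/Time-then-Date sequence that finally made Power Query happy.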
Anyway — Sport, Date, and Person are now in place. I can just Close and Load, and I'll get a nice little green table here that tells me who's playing what. Given this table, we can make a quick pivot table from it, so I'm just going to insert a pivot from this data right here — for example, to see who's playing what. If I pick Sport and then put Person in as a count of Person, this shows how many people are playing each sport. You can see that, for example, volleyball is our most popular sport, with 30 people playing, and next is tug of war. This counts duplicates, but you could also count the unique values, or whatever you want. You could also, instead of Sport, put Person on rows and count the number of sports they're playing: Jan is playing on five different occasions; some of our people are playing nine times, and if I sort this I can see Kelly and Stanley are fairly athletic, while Andy is only playing twice across all of the games. This is a very good way to analyze the data now that it is in a better format — we could not get these kinds of answers from the original layout.
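That duplicate-versus-distinct distinction is easy to see in a plain-Python sketch (a tiny invented sample, with Phyllis deliberately appearing twice for volleyball):

```python
from collections import Counter

# The tall table from the unpivot/split steps (sample rows only).
tall = [
    {"Sport": "Volleyball", "Person": "Phyllis"},
    {"Sport": "Volleyball", "Person": "Phyllis"},
    {"Sport": "Volleyball", "Person": "Michael"},
    {"Sport": "Tug of war", "Person": "Andy"},
]

# Count of Person per sport -- the pivot table default, duplicates included.
plays = Counter(r["Sport"] for r in tall)

# Distinct count per sport -- each person counted once, however often
# they appear for that sport.
players = {}
for r in tall:
    players.setdefault(r["Sport"], set()).add(r["Person"])
distinct = {sport: len(people) for sport, people in players.items()}

print(plays, distinct)
```

The plain count says three volleyball entries; the distinct count says two volleyball players — the same difference you would see between Count and Distinct Count in the pivot table.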
Let's quickly test this: if I change anything in the source, will it appear there or not? I'm going to make some notes first — for example, Dwight is playing only twice, and Michael is only playing three times. I'm going to add their names to some of the other cells and then we'll see what happens. Here I'll add Dwight, and in this one we'll put Dwight and Michael, both of them. So Dwight has been added twice and Michael once — Dwight should go up to four, and Michael should go up by one.

Once this data is changed, we come back here, right-click, and then Refresh. This is going to refresh the green table and add the extra rows in there. But the pivot table will not update — it still says Dwight two, Michael three, even though the data behind it is now updated. You need to refresh the pivot as well, and then Dwight moves up to four and Michael to four. This is where you need to double refresh: because of the way this thing is set up, your first refresh gets the updated data, and then your second refresh updates the pivot tables. You could also use some of the settings within Power Query to delay the pivot table update until the data update is done, which would be wonderful — but I'll leave all of those things for you to figure out on your own.

There you go: unpivoting something a little more complicated than the usual datasets that you see.
8. Bonus lesson: automate stuff with Power Query: Hi there. This is a bonus video in the Power Query mini-course. In this video I'm going to talk about how we can take a boring and highly repetitive data task and use Power Query to automate it, so that you can just sit back and relax. This is actually not like the rest of the videos in the class, because this video is extracted from a live stream that I did on my YouTube channel. On my YouTube channel I do live streams every month, and this month's topic was how to automate boring and repetitive tasks. I took the live stream, edited out the portions that are not relevant for this lesson, and shrunk it into one video. Although this is slightly longer than the rest of the videos in the class, I highly, highly encourage you to watch this whole thing, because there is a ton of value in there — and when you finish this particular video, you will be like, "Oh my God, I can use Power Query to do so many more awesome things." So please go ahead and watch it. There are sample files that you can download to practice the concepts, and I highly recommend that you apply them to your work situations as well. Thank you so much — I'll catch you in the video.

Hello, everyone. Good morning, good afternoon, good evening. It is so good to have
you in our Automating Boring Data Stuff live stream for the month. I am really excited — let's get into the session. You can grab the files, like I mentioned, from the link in the video description.

Alright, getting to the problem. This problem originated from one of the situations in the Awesome Chocolates company. Awesome Chocolates is a made-up company that I use in many of my videos and examples, and this is one of the scenarios I created for Awesome Chocolates. So what happens at Awesome Chocolates? We sell chocolates — we are not a retail chocolate company but a wholesale chocolate company. We make chocolate that eventually gets sold in supermarkets to you, the customers. Many times our chocolate orders come in bulk from supermarkets and we just ship them boxes, and most of those orders are automated. But sometimes our customers also send in a traditional order form for the chocolates, and one of those order forms looks like this.

In this particular challenge, I will show you how to deal with data in situations where things can be a little more unconventional, and how to handle that. As you can see, this is my customer order form, and for Awesome Chocolates it actually looks fairly clean — it's not like super-dirty data we're talking about. But you will quickly understand that this kind of format is a pain to work with if you want to do anything meaningful when it comes to data analysis. Let me quickly zoom in so we can see what is going on here. The order form has a kind
of standard template, which has got your salesperson name and customer name. Here we have got the salesperson's name and the customer's name, and their details are only mentioned once in the order form. Then further down, we have got all of these product-wise quantities: how many boxes they want on each of the days. The dates themselves may not be continuous, like you see here — on the 2nd of May, 114 boxes of Fruit & Nut bars; then on the 5th of May they want 108 of 99% Pure & Dark; and so on. The dates go like this, and sometimes there will be up to 15 different days, because this order form only lets you enter up to 15 days of data. It's kind of a traditional order form: some of our clients fill it up and send it to the salespeople, and then we have to fulfill the orders.

Now the big challenge — and this is where you and I come in — is for us to look at this data and do some analysis work. The data is all there. For example, if I wanted to know how many total Mint Chip chocolates we need to ship, we could do some simple analysis, adding up the column or adding this row up and saying whatever that number is — six hundred boxes. But because of the way it is structured, it gets a little clunky and messy: we don't know how far to sum, and all of that. And if I have got multiple order forms from the same salesperson in different files, then we don't know exactly how to act. That's where the challenge is: this is one order form, but as I mentioned, we have got hundreds of forms. So this is the
folder where I have kept a number of sample order forms. In this session, we're going to understand how you can take data that is kind of messed up, confused, and garbled like this, and then create a system through which you can combine all of it, in one go, into a final master table. That's really what this live session is all about. We'll take the files and build a repeatable process that gets all the data and makes sure it ends up as one neat, single table at the end of it — so that however many files you feed it, it's going to take everything, combine it, and put it in one place. If you think that is the most awesome thing you've heard all week, all day, or all year, please stay on and enjoy the rest of the session. If you already know some parts of it, but not all of it — again, stay on.

In a moment I'll get into actually zooming the screen and making sure you can read things; right now I'm only demoing the files, so there is not much to read yet. The files are already provided to you: when you download the zip file, it contains all these order forms and everything else. Let me quickly mention the rest of the files. Among the files I gave you, there is also a combined-data Excel workbook that has the final output with all the data combined, in case you want to see how the finished output looks. That's the file we're going to construct in the next 60 to 90 minutes.

Along the way, you are going to learn some really powerful features. You'll understand how Power Query works, and you will understand how to build a function within Power Query — one of the most powerful and advanced functionalities
of Power Query — and then how to run that function over and over again, along with some of the best practices when it comes to building and using such functions.

What I'm going to do is open up Excel with a blank file, and from this blank file we will bring all the data together. That's really what we need to achieve. So I'm in Excel, and from here it's all a fairly straightforward process. The first thing we want to do is bring the files in and then combine them. I'll show you the process for a single file first, because then you will understand the true power of Power Query: from the first file you will understand certain key concepts, and then we will see how to repeat that process for the entire set of files. This way you also get a kind of repetition — if anything was confusing the first time, the second time it will become clear.

We go to the Data ribbon, and from here you can access all the Power Query related functionality. It is sitting in this little corner called Get & Transform Data. Our data, even though it is a little bit messed up, is all in spreadsheets, so it's easy for me to just Get Data, From File, From Excel Workbook. That's really the path we'll use to connect. Then you just point to the location of the individual file — right now we're only going to do the process for a single file, and then we'll repeat it for the whole bunch. I'm just going to go to my live stream folder, Order Forms, and pick any order form — it doesn't matter which. For the sake of simplicity and consistency, I'm going to pick 001, the very first one, and click Import.

This opens up the Navigator screen, and within the Navigator screen you can preview the data. Power Query will show you: this is the data I found — are you happy with this? Another key note here: whatever you do within Power Query — even though I'm demoing this with an Excel file as the source data — you can apply the same ideas and the same concepts to your SQL datasets, your text files, your web pages, or your XML files. Whatever your source data is, you can apply this. So don't think, "Oh, this Awesome Chocolates example is not for me." That's not the case, because the technique and its applications are so powerful that if you ignore this feature of Excel, you will be doing a lot of repetitive work manually or with some other, older technologies. That's really important here.

In the Navigator, we just select the sheet and it shows us what the data looks like. As you can see, even this single file by itself is really messed up. To our eyes it looks super clean, but when Power Query looks at it, it only sees the data: there are a lot of null values, everything is mismatched, and we will need to use the Transform Data step. If I loaded it directly, it would be useless for me — I couldn't do any analysis on this one file alone, forget about hundreds of files. We need to use the Transform Data button. I'm going to click on that, and it jumps into the query editor screen within Power Query.

Here's a quick shortcut that's not really part of the session: if you see things in Power Query and you're thinking, "Oh, everything is super small here — how do I zoom?" Within Excel I can hold Ctrl and scroll the mouse wheel up and down to change the zoom; in Power Query you can use Ctrl-plus and Ctrl-minus to adjust the screen zoom.
This is Power Query. As I said, if you've never done any Power Query work, the tasks we're about to do might be a little confusing, but I'll try to explain them at a brief level. This is a query editor screen through which I can tell Power Query how I want my data to be cleaned. At any point in time you can have multiple queries, which is why this panel here lists all the queries. Right now we have only one query, so that's the only one listed here; if you had multiple queries, they would all come up here. The ribbon on top is not your usual Excel ribbon — it's its own ribbon, and you can read through it and understand it. There are some important buttons here: Close & Load means you've finished your process and want to load the data into Excel. Merge Queries means you want to combine two different queries, like a join operation. Append means you want to combine data one after another, like stitching tables together.

Within the query, this big box shows how my data looks at that point in time. You can apply different steps to the data, and whatever steps you apply will appear here in Applied Steps. Right now we haven't really done anything, so all of this is basically blank — all we did was load the source file and navigate to the sheet, which is what the Navigator screen was doing. You can also give your query a name and change all of this. Whatever you do here applies only locally within this Excel file. It will not change the source — the source file stays as it is, but this query will have all these extra steps added.

For example, I see that a couple of the columns are entirely null — there is nothing there, it's just null. So I can hold down my Shift key, select both columns, right-click, and say Remove Columns. As I said, this just removes those columns locally for me; the original file remains as it is. Now there is a Removed Columns step recorded here, which simply says remove two columns. Likewise, you can see that in the first and second rows there's really nothing there — it says "Customer order form. Please enter up to 15 days of orders" — and the third row is also null. The real data begins from row number four onwards. We could take away the first three rows and we wouldn't really lose anything valuable, because there's nothing interesting there. So I can use the Remove Rows button, choose Remove Top Rows, and say the number of rows is three. Boom — those rows are also gone. Essentially, we have added an extra step that says: I don't need those top three rows, get rid of them. Let's say all I want when I clean up this
data is to know who the salesperson is and who the customer is — I don't care about the numbers or anything else. We only want rows one and two; everything else is irrelevant for us. So I can use the Keep Rows button, Keep Top Rows — earlier we used Remove Rows, now we are using Keep Rows — and say keep rows one and two, because that's where my salesperson and customer information is. At this point, again, we don't need most of the columns; only the first two columns are what we need. So I select them with Shift, right-click, and choose Remove Other Columns — meaning keep these and get rid of everything else. This is basically how we are transforming the data: we took just the portion we need out of everything that is there.

At this point the data goes sideways — salesperson and customer are stacked vertically — and ideally I want Salesperson and Customer as two columns. We can go to the Transform ribbon, where there is a Transpose button. What it does is flip the table sideways. If I do that, the data changes its orientation, so now we have the salesperson's name here and the customer's name there. If I go back to an earlier step, you can see that it was vertical before. Now we know that column one should be called Salesperson and column two should be called Customer. From Home, we can use the Use First Row as Headers option to set those headers. There is a stray colon character at the end of each header; I can just delete that. This step is called renaming the columns, so we have renamed them.

We took a big bunch of data and applied one step after another, so that by the end of it all, we were able to extract the two pieces of information we wanted. Imagine the order file as a mountain of shifting information: we were able to go in, dig, find the two items of valuable information, and drag them out. This is just a sample case, but it gives you a sense of what we are achieving.
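The chain of steps we just applied — remove top rows, keep top rows, keep the first two columns, transpose, promote headers — can be sketched in plain Python to show what each step does to the grid. Every name and value in this toy form is made up for the sketch, not taken from the real order form:

```python
# A toy order form with the same awkward shape: banner text, blank rows,
# then label/value pairs (contents invented for illustration).
sheet = [
    ["Customer order form", None, None],
    [None, None, None],
    [None, None, None],
    ["Salesperson:", "Paula Sutton", None],
    ["Customer:",    "Big Box Mart", None],
    ["Product",      "2-May",        "5-May"],
]

trimmed = sheet[3:]                      # Remove Top Rows: drop 3 junk rows
trimmed = trimmed[:2]                    # Keep Top Rows: salesperson + customer
trimmed = [row[:2] for row in trimmed]   # Remove Other Columns: first 2 cols

# Transpose, then Use First Row as Headers (stripping the stray colon).
flipped = list(zip(*trimmed))
headers = [h.rstrip(":") for h in flipped[0]]
record = dict(zip(headers, flipped[1]))

print(record)
```

Each list operation maps one-to-one onto an Applied Steps entry, which is really all a Power Query transformation is: a recorded list of small, ordered operations.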
At this point I'm happy with the way things are, so I can just say Close & Load. The screen flickers a few times and then it brings me the final data. There is only one row of information here — which is obvious, because in each order form there is only one customer and one salesperson, and they come up here. Remember where it all started: from the 001 file. Now I'll quickly go back and open my file — the original source file — and change a few things. Instead of the salesperson's name, I'm going to type my name there, and instead of the customer's name I'll type something made up. These two names don't exist anywhere else in the data, so they're easy for us to spot. I'll save this file and close it.

Now, if I want the current data here — now that the file has changed — it's simply a matter of right-click, Refresh. And boom, we get the updated data here. None of those steps need to be repeated again, because what we've essentially done is taught Excel how we want this process to run. When the data changes, it can come back and give us the updated data. While that is all super beautiful and awesome, there are some downsides to this approach. The biggest thing that
you need to remember is that we coded in various assumptions to get here. For example, if I go back, notice that we removed the first three rows because they are all blank. But let's say we get an order file where blank row number three is missing, like this. Save that, refresh — yes, the file is closed — and I immediately get an error. It's not telling me exactly what the error is, but that's really what is happening: it was trying to do something that no longer works. I can, for example, go in and figure out what is going on. Because we were working under an assumption, there is an error. You can see that at each step, data is available up until a certain point — and then my salesperson row is completely gone. We are left with only the customer information, because the row containing the salesperson got removed. Because that assumption is broken, when Power Query tries the subsequent steps, it's not able to properly transform the table, and that's where it throws the error.

The real error is in the very last step that we did: after we promoted the headers, we renamed the columns. The actual underlying programming code that Power Query was writing says, take the header that reads "Salesperson:" — with the colon — and make it "Salesperson", and take the "Customer:" one and make it "Customer". It was trying to look for a column named "Salesperson:", which isn't even there because of the way the data now comes in. If I don't bother with the renaming — if I delete that step using the X mark — my query would still work, and my data looks like this. But if we want correct column names, so we rename, then that's going to break.

This is one of the things you need to keep in mind when you are working with Power Query: essentially you're telling a computer how you want things done, and computers are not smart — they just follow the instructions. So everything depends on how clear your instructions are. Power Query is super capable, but it is still dumb: it has no sense of what could break. You, as the analyst, have to anticipate what would break and then build a repeatable process that works through that. You don't have to break your back anticipating all of this — it depends on your reality. If your reality is that rows can disappear, then you will have to build a better process, or you just have to get on the phone with the salesperson who's filling in the order form and make them understand — or give them a piece of your mind — so that they stop deleting rows. Just tell them: keep the template as it is; it will make our life simple. So there are different ways to deal with this kind of problem, but whatever happens, this is something to keep in the back of your mind. Irrespective of all that, this is still way better, way faster, and smarter than doing any of this work manually. This is why I really enjoy it.
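One hedged way to defend against the broken-assumption problem — offered as a sketch of an alternative technique, not what the live stream actually builds — is to select rows by their content instead of their position, so a missing blank row doesn't shift everything:

```python
# Positional steps ("remove top 3 rows") break the moment a blank row
# goes missing. A content-based rule survives that: keep rows by what
# they contain, not where they sit. (Toy rows, invented for the sketch.)
sheet_missing_a_blank = [
    ["Customer order form", None],
    [None, None],                      # only ONE blank row this time
    ["Salesperson:", "Paula Sutton"],
    ["Customer:",    "Big Box Mart"],
]

wanted_labels = ("Salesperson:", "Customer:")

# Instead of slicing by position, filter on the label in the first column.
record = {
    row[0].rstrip(":"): row[1]
    for row in sheet_missing_a_blank
    if row[0] in wanted_labels
}

print(record)
```

The same idea exists in Power Query as filter steps keyed on cell contents; the point is that instructions anchored to content tend to survive layout drift that positional instructions do not.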
the process for one file and one
part of the problem, we didn't even think
about how to get the actual boxes and
products and everything. So we kind of understand
where it is all going. So now that we see that, now let's take this and
apply it for a big file, all set of files in one goal. Before we do that, I'm just gonna read
through some questions to see if there is anything else going on. Chip asks, can IF statements be used? Yes, you can add extra rules and logic, everything that says "if this, then do that"; that is all possible. I don't think we have got any conditions in the coverage for this session, but I've got another video where I talk a little bit more about that, so definitely check that one out and you will find it. Sean asks why we didn't keep all the data, with just the salesperson and customer name in a new column. Yeah, that's really what we're gonna do, but we didn't want to do that for just one file. We have to do this for all the files and then combine everything. So instead of tackling all the problems both times, the first time we got only the customer name and salesperson name; the next time we are going to do everything together. Someone asks whether what we've been seeing can be used in the Query Editor in Power BI too. Yes, you can. Whatever you are learning, you don't have to use it only in Excel; you can use the same ideas in Power BI. This is why I find learning Power Query the biggest bang for the buck: not only are you saving time, you save time twice, once in Excel and once in Power BI. So why wouldn't you do that? Imagine you always
wanted to visit France or Italy or some country that feels exotic to you, but the biggest thing holding you back is the language: everybody there speaks a different language than you do. Now imagine what happens if you can go to Paris, you can go to Italy, live there, and everybody speaks the same language as you do at home. That's really what this is. You can use both tools, and they both speak the same language.
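That shared language is literally true: both Excel's Power Query and Power BI run the M language, so a query like the sketch below (the data is made up purely for illustration) pastes unchanged into the Advanced Editor of either tool.

```m
// Made-up example: the exact same M code runs in Excel and in Power BI.
let
    // An inline table standing in for any real source
    Source = #table({"Product", "Qty"}, {{"Apples", 4}, {"Bananas", 6}}),
    // Keep only the rows with a quantity above 5
    Filtered = Table.SelectRows(Source, each [Qty] > 5)
in
    Filtered
```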
So it's super simple. Whatever you learn, literally no changes; it's exactly the same screens, same steps. You just do the same in Power BI. This is one file, but
our job is not done. We are only getting warmed up now. We will do it for all the files in one go. This is my original query. I wanted to do one little thing before we get to that: we just go back here and add that blank row again, to kind of restore the sanity of our order form, and refresh this so that things come up here. Yeah, the column names still have that extra character at the end; we'll just live with that for now. Alright, so I'll go to a new worksheet. Actually, we don't even need to go there. From Data, this time we want to do the same process, but for all the files, and not just get the salesperson and customer names, but their entire list of orders, and then continue that process for all the files so that we have one giant table here. A quick reminder: this is my order forms folder. The folder has many, many order forms, so 001 is just the tip of the iceberg. I have got different order forms. If I open a random form here, you will see that this person's data is a little bit longer: they had 14 days' worth of information in their order form. The first one didn't have 14; I think it had only up to here. And they ordered even more products than the original one. The name and order of these products is also different, so everything is messed up. This is the big challenge we need to deal with. Here we will say Get Data
instead of a single file; we want to get the data from a folder. The process is similar to what happens with a file, but now instead of saying "get the data from this file", you're saying "here is a bunch of files; get the data and put everything together". That's what the folder option will do. If you are using a really old version of Excel, you may not see the folder option, but you can still technically do it. I would say it's probably easier to upgrade your Excel or Power Query rather than try to twist things around to get it done; the internal workings are the same. I don't really know exactly when the folder option started appearing. The folder thing has been there in Power BI since very early versions, I believe, but for Excel I don't know; maybe someone can confirm in the chat window. If you don't see it,
say From Folder, and then go to my order forms folder and click Open. That's the folder where I want my data. Then this is going to show you a listing of all the files that it found. The Content column is Binary, which means bits and bytes; basically, that's what files on computers are. It says "I found all of these files", and it only shows the first 20 or 25 rows or something like that. What do you want to do?
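Incidentally, that file listing is a single M call under the hood; a rough sketch (the folder path here is hypothetical) looks like this:

```m
// Sketch of what "From Folder" generates; the path is hypothetical.
let
    // One row per file: Name, Extension, dates, and a binary Content column
    Source = Folder.Files("C:\Order Forms"),
    // Optionally keep only Excel files, in case the folder has strays
    ExcelOnly = Table.SelectRows(Source, each [Extension] = ".xlsx")
in
    ExcelOnly
```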
Do you want to combine? Do you want to load or transform? Do you want to cancel? Combine is really what we want, but we just don't want to combine it as it is, because we know that our data is not correctly shaped. We need to combine and transform. So that's the first step: Combine and Transform. This is what we're really going for. You will see some similar options under Load as well. Don't directly load; just say Combine and Transform Data. This is going to open up a page. When it comes up: I said I wanted to combine files, and now it asks what I want to combine from each file. So here we are now building that repeatable process. Essentially, you will tell Power Query: whatever you pick in one file, it's going to go and pick the same data from all the files. Each file has this same kind of worksheet; this is what I see. "So do you want this data?" is what it's really asking here. There is not much of a choice, so we'll just say, yeah, cool, that's fine. This is what I want.
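Conceptually, what the combine step will now do for every file can be sketched in M like this (the path and sheet position are illustrative; Power Query generates its own helper queries rather than this exact code):

```m
// Illustrative sketch of Combine Files: read the same sheet out of every
// file's binary Content, then stack the resulting tables one under another.
let
    Source = Folder.Files("C:\Order Forms"),          // hypothetical path
    WithData = Table.AddColumn(Source, "Data",
        each Excel.Workbook([Content]){0}[Data]),     // first sheet of each workbook
    Combined = Table.Combine(WithData[Data])          // append all the tables
in
    Combined
```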
So we click OK. Then this is again going to flicker the screen a few times, like a tube light, and eventually it'll come up here and, boom, all your data is merged. Now let me hide my face so we can just see the data. There's a lot of garbage in this data; we still have to clean it up, but at least it is all there. If I go down here, you can see that it has basically taken the whole of 001, then the next file, and so on: all the data is taken and put one underneath another. Imagine somebody basically stitching the files together, file one, file two, file three, like that. They're not really paying any attention, they're not doing any work on it, they're not changing the data; they're just attaching one after another. That's really what Power Query has done so far for us. This is fairly useless, but it is a great starting point. At least I don't have to deal with 100 different files of data; I have to deal with one set of messy data. From here it is easy for me to apply things or do stuff. Hello. Ian says that
my Excel is crashing if my data type is different when I automate using a folder; any solution? I'm not really sure why it would crash. It's a little tricky to dissect the problem or figure out where things are going wrong from just one message alone; there could be any number of reasons why it would crash or things could break, starting from simple things like insufficient memory or running too many things at once, up to more complicated things. Here is something that is kind of universal: if you go into Power Query and there is an error at the last step, you basically just want to backtrack. You keep going back one step at a time until you find the step where things are not in error and the next step is in error. Once you look at that, that is where the problem is, and then you start to figure out why it is the problem. I hope that gave you some idea. If you already know that
and you're like, man, this is not helping me, then feel free to add more detail. But if it's not directly connected to the topic at hand, I wouldn't be able to help you out. Another viewer asks: can the source data be
changed every time? Yes, that's the whole point. You can have different source data. In my case, I might want to point at a bunch of files in a different month's folder. So even though I'm showing the process for just one file and one folder, the idea can be extended. It gets a little bit more technical, so I didn't want to extend this masterclass to three hours and show you how to do the multiple-folder thing as well. But once you know that this is the process for one folder, then you just have to think: what if I have six folders? Then it's very easy for you to get the answer to that question. So learn the technique, and then go and apply it for multiple folders; there are probably YouTube videos or tutorials on the web that will give you the missing pieces of that puzzle. Pooja says the folder option is
not available for Mac users. Well, what can I say? I'm even surprised that some of these things are available at all, because I only saw recently that they are adding Power Query support to Mac. It could be a while before you start seeing folder support on Mac. The only thing that I can say is that my sympathies are with you. Thanks for sharing that. The next question: can Power Query deal with different headers? It can deal with
different headers; you just have to know the technique, so everything is possible. But we will have to think. It's basically like you're playing chess with Power Query: for every step that you do, you have to anticipate what Power Query is gonna do if things are not as expected. So if I'm renaming a column from Salesperson to Sales Officer, then I'm inherently assuming, even though we're not explicitly saying it, that there will be a column called Salesperson and that it should be renamed to Sales Officer. But what if there is no Salesperson column? We have seen that earlier. Then you would have to play the chess two levels deep, by starting with the assumption that you want to rename the second column rather than the Salesperson column. That's how you start thinking. What if it's not the second? What if it's the seventh? Then you will have to get even smarter. Everything is possible once you get into the
programming framework. This is something that I do for my live streams. You can see I've got my witty saying on the board here, "Power Query is the M": a deliberate wordplay there. But essentially, M is the name of the language that Power Query uses to build all the coding infrastructure behind the scenes. And you can go and write your own M language code if you become a little bit comfortable. I don't know much M; my M is fairly rustic, but I can get by. With that, as I said, this is like somebody stitched it
together; the files are just appended one after another. It is barely usable, and I can't really do anything meaningful with it, but it is a better problem than 100 individual problems: one big problem. Now let's understand
what really happened behind the scenes, because that is where the clue is. If I expand the Queries pane, suddenly I see a whole bunch of queries instead of a single query. You would think we just did one folder connection, so there should be only one query. But internally, Power Query will now organize a whole bunch of different things to facilitate the mechanism. Here you can see that there is a bunch of things. Explaining this takes a little bit; if we were in the same room it would be easier, but since we're all in different rooms in different parts of the world, I'm gonna try my best. I think this is where my face will probably help: you can kind of see my expressions as well. Here we've got
different folders. The key thing is that there is an "fx Transform File". This is the function that Power Query will apply to each of the files in the folder: whether your folder has 20 files or 40 files, it will apply this function, called Transform File, to each file. And Transform Sample File is the query on which that function is based. One way to think about it: within this Transform Sample File, I tell Power Query what I want to do on a single file. Once I finish doing that, Power Query builds the function for me, the Transform File function. It's an M language function. We don't need to know any of the technical stuff for this thing to work, but if I want to change it, then we need to know. If you are happy with the setup now and you say, okay, my job is done, I don't need anything more, my data is combined, then you don't even have to know what else is there. But because our data is still messed up, we need to know what is going on. Transform File is the function, and Transform Sample File is the query from which the function gets internally created. If I, for example, change something in the sample file query, add an extra step or do something else to that individual file, then the function gets updated, all the files in the folder will have the new function applied, and newer data gets generated. So if I am not happy with the way my final data is, the place where I would start changing things is in the sample query. We will adjust the sample query until our final output is suitable for us. That's really what
we're gonna do. When you go to the sample: the sample is basically built off one file. This way you're not overwhelmed; you're not looking at all the 100 datasets. You just look at one file and tell Power Query what you want to do. This is what we really did earlier in the session: we looked at one file and we built the process to get the bits of data from it. Here, in this file, what we will do is deal with the way our data is, because it's a little bit messed up. So for now, if you're following along, you don't have to do this step, but I'm just going to do it so that we can open the file here. That way I can show
not a single data. It's two datasets. This data, salesperson,
end customer, which only appears
once for the file. Then there is this table, giant table that could
have some blank rows, blank columns, and
lots of blank values. This is where the members are. It's basically two
different datasets. So there's different ways
to solve the problem. But the easier option here to, here would be if I can
build two functions, one to get this data and
the other to get that data. And then apply
both functions one on top of another
on all the files. Then we will combine data. That's one way of
solving this problem. As I said, there is
no one correct answer in this situations, it's
technical implementation, Rachel, you could do differently and still get the same result, which is all fine. But if you're doing it
over thousands of files, the new performance and
optimization comes into picture when you go back
and edit the query. So once you have months
to come back to Excel, to go back into Power Query, you can, if you have
got the query's panel, you can right-click on any query on Edit alternatively from data you can enable
the credits and connections totally
sure this panel, and then use the edit
option. You go back here. We know we'll build
two functions. So even though there
is only one function built for us so far, we will customize that function so that it works for
one set of the data. And then we will
add a new function. So then you will see the
whole spectrum of things. Transform sample file and this file in the already built one, we're gonna extract this data, the product matrix data,
and then reshape it. To do that we don't need
the foster two columns, so I'll just select
these to remove columns. Then. We know that our product matrix good grid always starts
from rule number nine. So anything before row
number nine is irrelevant. Again, here we are working under the assumption that the
end plate will always be true and there is
nothing changes in the template like nobody adds blank rows or
deletes blank rows. Otherwise, what we
would have to do is we would have
to scan the table, the first row where the product, what is appearing, then believed
up until that time zone.
9. Don't forget this video!!!: Thank you so much for watching
these Power Query lessons. I hope you thoroughly enjoyed them. Please don't forget to leave a review on the Skillshare platform so that other prospective students can also enjoy this particular course. I wish you all the best, and more power to you in your future. Bye bye.